Robert's Data Science Blog

The inst Folder in Other R Packages

When making an R package, the inst folder holds files and folders that are copied unmodified into the installed package folder.

One of my use cases is to include test data in inst/testdata. When the package’s test suite is executed, the data is loaded and used as part of the tests.

However, the devil is in the details: things may behave differently when the statements of a test are executed interactively than when the tests are run with devtools::test.

The two packages

To make things concrete, I create two packages, testload and testconceal, located in the folder /home/robert/Documents/R.

The testload package

The testload package has the following content:

testload
├── DESCRIPTION
├── inst
│   └── testdata
│       └── testload_data.csv
├── testload.Rproj
├── NAMESPACE
└── R
    └── lookup.R

The installed testload package is located in /home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testload, and the contents of that folder are as follows (unimportant parts left out):

testload
├── DESCRIPTION
├── help
│   └── ...
├── html
│   └── ...
├── Meta
│   └── ...
├── NAMESPACE
├── R
│   └── ...
└── testdata
    └── testload_data.csv

Note that testdata now sits directly in the installed package folder; the inst level is gone. The location of the file testload_data.csv can be found with the command

system.file("testdata/testload_data.csv", package = "testload")

that returns

"/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testload/testdata/testload_data.csv"
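The mapping from inst/ to the installed package can be checked with a throwaway package. The following is a minimal sketch, assuming a hypothetical package name instdemo and a temporary library so that the real library is left untouched. It installs the package from source and then asks system.file, with lib.loc pointing at the temporary library, where the CSV file ended up.

```r
# Minimal sketch: "instdemo" is a hypothetical throwaway package, installed
# into a temporary library rather than the user's own library.
src <- file.path(tempdir(), "instdemo")
lib <- file.path(tempdir(), "instdemo_lib")
dir.create(file.path(src, "inst", "testdata"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(src, "R"), showWarnings = FALSE)
dir.create(lib, showWarnings = FALSE)

# Bare-bones DESCRIPTION, NAMESPACE and R code
writeLines(c(
  "Package: instdemo",
  "Version: 0.0.1",
  "Title: Demo of the inst Folder",
  "Description: Throwaway package showing where inst files are installed.",
  "Author: Nobody",
  "Maintainer: Nobody <nobody@example.com>",
  "License: GPL-3"
), file.path(src, "DESCRIPTION"))
writeLines("", file.path(src, "NAMESPACE"))
writeLines("demo_fun <- function() NULL", file.path(src, "R", "demo.R"))

# Test data placed under inst/testdata, as in testload
write.csv(data.frame(x = 1:3), file.path(src, "inst", "testdata", "demo.csv"),
          row.names = FALSE)

# Install from the source folder into the temporary library
install.packages(src, lib = lib, repos = NULL, type = "source", quiet = TRUE)

# inst/testdata/demo.csv is installed as <lib>/instdemo/testdata/demo.csv
path <- system.file("testdata/demo.csv", package = "instdemo", lib.loc = lib)
print(path)
print(read.csv(path))
```

The installed folder contains testdata at the top level and no inst folder, matching the tree for testload.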

In my actual use case I have a number of interdependent packages with test data in a specific format that needs to be parsed. It therefore makes sense to have a single function in one of the packages handling this.

The tricky part is locating the test data, so that is all my example function in lookup.R does:

get_testdata <- function(filename, pkg) {
	# Full path to `filename` inside package `pkg`, or "" if nothing is found
	system.file(filename, package = pkg)
}
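Because get_testdata only forwards to system.file, it inherits the convention of returning an empty string, not an error, when nothing is found. A small sketch using the stats package that ships with every R installation; the file and package names are only illustrative.

```r
# Same one-line helper as in lookup.R
get_testdata <- function(filename, pkg) {
  system.file(filename, package = pkg)
}

# A file that exists: stats ships a DESCRIPTION file
found <- get_testdata("DESCRIPTION", "stats")
# A file that does not exist: no error, just ""
not_found <- get_testdata("testdata/no_such_file.csv", "stats")
# A package that does not exist: also ""
no_pkg <- get_testdata("testdata/x.csv", "no.such.package")

print(found)
print(not_found)  # ""
print(no_pkg)     # ""
```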

I noticed an odd behavior with get_testdata when I ran tests. A nice thing about running tests with the testthat package and devtools::test is that it makes all functions of the package available – not just the exported ones.

The code of devtools::test shows that this is accomplished by calling devtools::load_all. (As a side note, I highly recommend load_all when developing packages.) Hence we can emulate the behavior of devtools::test with load_all.

To illustrate what happens, consider these commands when the testload project is open:

> testload::get_testdata("testdata/testload_data.csv", "testload")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testload/testdata/testload_data.csv"
> devtools::load_all()
Loading testload
> testload::get_testdata("testdata/testload_data.csv", "testload")
[1] "/home/robert/Documents/R/testload/inst/testdata/testload_data.csv"

That is, load_all makes get_testdata find testload_data.csv in the source code folder instead of the installed folder.

The testconceal package

The testconceal package has the following content:

testconceal
├── testconceal.Rproj
├── DESCRIPTION
├── inst
│   └── testdata
│       └── testconceal_data.csv
├── NAMESPACE
└── R

Now, with the testconceal project open, looking up the test data of the testconceal package does not work as it did for testload:

> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testconceal/testdata/testconceal_data.csv"
> devtools::load_all()
Loading testconceal
> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] ""

That’s odd! Not finding any test data will (hopefully!) break the tests. However, running the body of get_testdata directly at the console still works:

> system.file("testdata/testconceal_data.csv", package = "testconceal")
[1] "/home/robert/Documents/R/testconceal/inst/testdata/testconceal_data.csv"
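One way to see why the console and get_testdata disagree is R's lexical scoping. The sketch below is a simplified model, not pkgload's actual mechanism: a stand-in system.file defined in the global environment plays the role of the replacement, and an environment whose parent is the base namespace plays the role of an installed package's namespace, whose imports end in base.

```r
# Stand-in for the replacement, visible from code evaluated at the console
system.file <- function(...) "replacement"

# A function created at the console looks names up from the global
# environment, so it finds the stand-in first.
console_call <- function() system.file("DESCRIPTION", package = "stats")

# A function inside an installed package looks names up through its
# namespace, its imports and then the base namespace. An environment whose
# parent is the base namespace stands in for that chain.
pkg_env <- new.env(parent = .BaseNamespaceEnv)
pkg_call <- function() system.file("DESCRIPTION", package = "stats")
environment(pkg_call) <- pkg_env

from_console <- console_call()
from_pkg <- pkg_call()
print(from_console)  # "replacement"
print(from_pkg)      # real path to the DESCRIPTION file of stats

# Remove the stand-in again
rm(system.file)
```

In this model the function living "inside a package" never sees the stand-in, because base::system.file is found before the lookup ever reaches the global environment.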

Silent failure

The explanation is that devtools (via the pkgload package) silently replaces system.file with its own version. Looking up the documentation for system.file before devtools is attached yields only one result, but after attaching devtools there are two choices:

> library(devtools)
> ?system.file
Help on topic ‘system.file’ was found in the following packages:

  Package      Library
  base         /opt/R/3.5.1/lib/R/library
  pkgload      /home/robert/R/x86_64-pc-linux-gnu-library/3.5.1

Choose one

1: Find Names of R System Files {base}
2: Replacement version of system.file {pkgload}

Selection:

From the docs:

[system.file] is meant to intercept calls to base::system.file() … It is made available when a package is loaded with load_all().
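The paths printed earlier (for example /home/robert/Documents/R/testload/inst/testdata/testload_data.csv) suggest what the replacement does for a package loaded with load_all: it looks in the inst folder of the source folder. The following is a rough model inferred from that output, not pkgload's source code; dev_system_file is a hypothetical helper, and a temporary folder stands in for the testconceal source folder.

```r
# Hypothetical helper, a rough model of the replacement for a package
# loaded from its source folder `pkg_root`: prefer inst/<path>, then <path>.
dev_system_file <- function(path, pkg_root) {
  candidates <- c(file.path(pkg_root, "inst", path),  # source layout
                  file.path(pkg_root, path))          # installed layout
  hit <- candidates[file.exists(candidates)]
  if (length(hit) > 0) hit[1] else ""
}

# Temporary folder laid out like the testconceal source folder
root <- file.path(tempdir(), "testconceal")
dir.create(file.path(root, "inst", "testdata"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "inst", "testdata", "testconceal_data.csv")))

# The model finds the file under inst/ ...
via_model <- dev_system_file("testdata/testconceal_data.csv", root)
print(via_model)
# ... whereas looking directly below the package folder, which is what
# base::system.file does with the folder it gets from find.package, does not.
direct <- file.path(root, "testdata", "testconceal_data.csv")
print(file.exists(direct))  # FALSE
```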

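However, get_testdata is taken from the installed testload package, and code inside an installed package calls base::system.file, not the replacement. base::system.file locates the package with find.package, whose documentation states that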

If lib.loc is NULL, then loaded namespaces are searched before the libraries.
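The quoted rule can be checked without devtools. In this sketch, which assumes a hypothetical package name shadowdemo and two temporary libraries lib1 and lib2, the same package is installed twice and its namespace is loaded from lib1, standing in for the namespace that load_all loads from the source folder. With lib.loc = NULL, system.file follows the loaded namespace; with an explicit lib.loc, it only looks in the given library.

```r
# Hypothetical throwaway package "shadowdemo", written to a temporary folder
src <- file.path(tempdir(), "shadowdemo")
dir.create(file.path(src, "inst", "testdata"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(src, "R"), showWarnings = FALSE)
writeLines(c("Package: shadowdemo", "Version: 0.0.1", "Title: Shadowing Demo",
             "Description: Throwaway package.", "Author: Nobody",
             "Maintainer: Nobody <nobody@example.com>", "License: GPL-3"),
           file.path(src, "DESCRIPTION"))
writeLines("", file.path(src, "NAMESPACE"))
writeLines("demo_fun <- function() NULL", file.path(src, "R", "demo.R"))
writeLines(c("x", "1"), file.path(src, "inst", "testdata", "data.csv"))

# Install the same package into two temporary libraries
lib1 <- file.path(tempdir(), "lib1")
lib2 <- file.path(tempdir(), "lib2")
for (lib in c(lib1, lib2)) {
  dir.create(lib, showWarnings = FALSE)
  install.packages(src, lib = lib, repos = NULL, type = "source", quiet = TRUE)
}

# Load the namespace from lib1 (lib1 is not on .libPaths())
invisible(loadNamespace("shadowdemo", lib.loc = lib1))

# lib.loc = NULL: find.package checks loaded namespaces first, so lib1 wins
from_loaded <- system.file("testdata/data.csv", package = "shadowdemo")
# Explicit lib.loc: only lib2 is searched
from_lib2 <- system.file("testdata/data.csv", package = "shadowdemo", lib.loc = lib2)
print(from_loaded)
print(from_lib2)
```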

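After load_all, the loaded testconceal namespace points to the source folder /home/robert/Documents/R/testconceal, which has no testdata folder at the top level (it is inside inst), so the lookup returns an empty string. A solution is therefore to specify where system.file should look for the test data. The library folders are available with the .libPaths function. In my case: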

> .libPaths()
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1"
[2] "/opt/R/3.5.1/lib/R/library"

The get_testdata function is updated as follows:

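get_testdata <- function(filename, pkg) {
	# With an explicit lib.loc, find.package searches only this library and
	# ignores loaded namespaces. Assumes pkg is installed in the first library.
	system.file(filename, package = pkg, lib.loc = .libPaths()[1])
}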

Now we get consistent results:

> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testconceal/testdata/testconceal_data.csv"
> devtools::load_all()
Loading testconceal
> testload::get_testdata("testdata/testconceal_data.csv", "testconceal")
[1] "/home/robert/R/x86_64-pc-linux-gnu-library/3.5.1/testconceal/testdata/testconceal_data.csv"

The downside of this approach is that when the test data is updated, the package must be reinstalled before the new data can be used.

Furthermore, to avoid these silent empty strings, it may be a good idea to make system.file raise an error when no file is found, using its mustWork argument.
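A minimal sketch of get_testdata with mustWork = TRUE, assuming lib.loc = .libPaths() so that every installed library is searched but loaded namespaces are not. The stats package and the file names are only used for illustration.

```r
# Variant of get_testdata that fails loudly: lib.loc restricts the lookup to
# the installed libraries, and mustWork = TRUE turns "" into an error.
get_testdata <- function(filename, pkg) {
  system.file(filename, package = pkg, lib.loc = .libPaths(), mustWork = TRUE)
}

# An existing file is returned as before
found <- get_testdata("DESCRIPTION", "stats")
print(found)

# A missing file now stops with an error, which a test run will report
result <- tryCatch(
  get_testdata("testdata/no_such_file.csv", "stats"),
  error = function(e) e
)
print(inherits(result, "error"))  # TRUE
```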