Hosting a Local CRAN

The original way of installing R packages is install.packages, which works against a CRAN (Comprehensive R Archive Network) style repository. Even though there are more options nowadays, e.g. through the remotes package, it can be easier to work with a CRAN.

In this post I am looking into how to make a CRAN that works with install.packages and remotes::install_version.

What does it mean to be a CRAN?

We make a CRAN by having a certain file structure and a bit of metadata available. To host the CRAN the root of this file structure should be available on the server. See also Karl Broman’s guide to miniCRAN which is where I started.

Consider the following CRAN that has only versions 0.0.3, 0.0.4 and 0.0.5 of the package arithmetic that I have used before. The following files are needed to have a CRAN that serves Linux and Windows (for R 3.6):

.
├── bin
│   └── windows
│       └── contrib
│           └── 3.6
│               ├── PACKAGES
│               ├── PACKAGES.gz
│               ├── PACKAGES.rds
│               └── arithmetic_0.0.5.zip
└── src
    └── contrib
        ├── Archive
        │   └── arithmetic
        │       ├── arithmetic_0.0.3.tar.gz
        │       └── arithmetic_0.0.4.tar.gz
        ├── Meta
        │   └── archive.rds
        ├── PACKAGES
        ├── PACKAGES.gz
        ├── PACKAGES.rds
        └── arithmetic_0.0.5.tar.gz

The folder structure can be made with the miniCRAN package:

miniCRAN::makeRepo(c(), path = "/path/to/cran", type = c("source", "win.binary"))
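If miniCRAN is not at hand, the directories in the tree above can be sketched with base R. The helper name make_cran_skeleton and its r_version argument are my own inventions for this illustration. It only creates folders; the PACKAGES files and archive.rds still have to be generated as described below.

```r
# Hypothetical helper: create the folder skeleton of a CRAN-like repository.
make_cran_skeleton <- function(cran_root, r_version = "3.6") {
    dirs <- file.path(cran_root, c(
        file.path("src", "contrib", "Archive"),
        file.path("src", "contrib", "Meta"),
        file.path("bin", "windows", "contrib", r_version)
    ))
    for (d in dirs) {
        dir.create(d, recursive = TRUE, showWarnings = FALSE)
    }
    invisible(dirs)
}

# Try it in a temporary location
cran_root <- tempfile("cran")
make_cran_skeleton(cran_root)
list.files(cran_root, recursive = TRUE, include.dirs = TRUE)
```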

Package files

The package files are the tar.gz and zip files for each version of the package. If we are inside a project they can be generated with the devtools package. The tar.gz file is the easier one as it only requires a tar command (through Rtools on Windows):

devtools::build(binary = FALSE)

The zip file requires all dependencies installed and a zip command (and compilers, if compiled code is included):

devtools::build(binary = TRUE)

The tar.gz file and zip file can then be copied into the correct folder. Note that there should be only the most recent tar.gz file for each package in src/contrib – the remaining ones/older versions should be in src/contrib/Archive/<package name>.
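The bookkeeping for source packages could be sketched like this: when a new tar.gz arrives, any tarball of the same package already in src/contrib is moved to the Archive folder first. The function name add_source_package is made up, and the demo uses empty placeholder files rather than real packages.

```r
# Hypothetical helper: put a new source tarball in src/contrib and
# move older tarballs of the same package to src/contrib/Archive/<package>.
add_source_package <- function(cran_root, targz) {
    contrib <- file.path(cran_root, "src", "contrib")
    dir.create(contrib, recursive = TRUE, showWarnings = FALSE)
    package <- sub("_.*$", "", basename(targz))

    # Tarballs of this package that are already in src/contrib
    tarballs <- list.files(contrib, pattern = "\\.tar\\.gz$", full.names = TRUE)
    same_package <- startsWith(basename(tarballs), paste0(package, "_"))
    old <- tarballs[same_package & basename(tarballs) != basename(targz)]

    if (length(old) > 0) {
        archive_dir <- file.path(contrib, "Archive", package)
        dir.create(archive_dir, recursive = TRUE, showWarnings = FALSE)
        file.rename(old, file.path(archive_dir, basename(old)))
    }

    file.copy(targz, contrib, overwrite = TRUE)
    invisible(file.path(contrib, basename(targz)))
}

# Demo with empty placeholder files (not real packages)
cran_root <- tempfile("cran")
incoming <- tempfile("incoming")
dir.create(incoming)
for (v in c("0.0.4", "0.0.5")) {
    targz <- file.path(incoming, paste0("arithmetic_", v, ".tar.gz"))
    file.create(targz)
    add_source_package(cran_root, targz)
}
list.files(cran_root, recursive = TRUE)
```

The sketch assumes that packages are added in increasing version order. After each change the PACKAGES files and archive.rds need to be regenerated.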

PACKAGES files

The PACKAGES files contain information about the newest version of the available packages. When the tar.gz files are in src/contrib they can be generated with

tools::write_PACKAGES("/path/to/cran/src/contrib", type = "source")

When the zip files are in bin/windows/contrib/<version> they can be generated with

tools::write_PACKAGES("/path/to/cran/bin/windows/contrib/<version>", type = "win.binary")
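To check what a repository advertises, the plain-text PACKAGES file can be read with read.dcf. A minimal sketch: the function name repo_index is made up, and the demo writes a two-field stand-in file instead of a real index from tools::write_PACKAGES (a real one has more fields).

```r
# Hypothetical helper: read a plain-text PACKAGES index into a data frame.
repo_index <- function(contrib_dir) {
    as.data.frame(read.dcf(file.path(contrib_dir, "PACKAGES")),
                  stringsAsFactors = FALSE)
}

# Stand-in index for illustration only
contrib_dir <- file.path(tempfile("cran"), "src", "contrib")
dir.create(contrib_dir, recursive = TRUE)
writeLines(c("Package: arithmetic", "Version: 0.0.5"),
           file.path(contrib_dir, "PACKAGES"))

index <- repo_index(contrib_dir)
index[, c("Package", "Version")]
```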

Archive

By default, install.packages will install the most recent version of a package. Older versions of a package can be installed by supplying the entire path to the package or by using remotes::install_version.

To determine whether the requested version is available, remotes::install_version uses archive.rds, which is a serialization of a list with information about the tar.gz files in src/contrib/Archive.

The entries in the list are the names of the packages in the Archive folder:

> a <- readRDS("/path/to/cran/src/contrib/Meta/archive.rds")
> names(a)
>  [1] "arithmetic"

Each entry in the list is a data frame with information about each tar.gz file in the corresponding subfolder:

> colnames(a$arithmetic)
>  [1] "size"   "isdir"  "mode"   "mtime"  "ctime"  "atime"  "uid"    "gid"
>  [9] "uname"  "grname"
>
> rownames(a$arithmetic)
> [1] "arithmetic/arithmetic_0.0.3.tar.gz" "arithmetic/arithmetic_0.0.4.tar.gz"

Each row is the output from file.info of the tar.gz file.
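Given this structure, the archived versions of a package can be read off the row names. The following is only a rough sketch of such a lookup, not the actual code in remotes. The function name archived_versions is made up, and the toy list below uses invented file sizes.

```r
# Hypothetical helper: extract version strings from the row names of an
# archive.rds entry, e.g. "arithmetic/arithmetic_0.0.3.tar.gz" -> "0.0.3".
archived_versions <- function(archive, package) {
    files <- basename(rownames(archive[[package]]))
    # Drop the "<package>_" prefix and the ".tar.gz" suffix
    substr(files, nchar(package) + 2, nchar(files) - nchar(".tar.gz"))
}

# Toy stand-in with the same shape as the list read from archive.rds
a <- list(arithmetic = data.frame(
    size = c(1000, 1100),
    row.names = c("arithmetic/arithmetic_0.0.3.tar.gz",
                  "arithmetic/arithmetic_0.0.4.tar.gz")
))

"0.0.3" %in% archived_versions(a, "arithmetic")
```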

The information in archive.rds can be generated with the following functions:

packages_in_archive <- function(cran_root) {
    archived_package_paths <- fs::dir_ls(file.path(cran_root, "src", "contrib", "Archive"), recurse = 1, glob = "*.tar.gz")
    archived_package_filenames <- basename(archived_package_paths)
    
    package_names <- basename_from_targz(archived_package_filenames)
    
    split(package_metadata(archived_package_paths), package_names)
}

where basename_from_targz removes the _<version>.tar.gz from the filenames

basename_from_targz <- function(targz) {
    vapply(strsplit(targz, "_"), `[[`, character(1), 1)
}

and package_metadata generates the metadata for all tar.gz files

package_metadata <- function(package_files) {
    metadata <- file.info(package_files)
    targz_files <- basename(package_files)
    rownames(metadata) <- file.path(basename_from_targz(targz_files), targz_files)

    return(metadata)
}

I suppose the few file manipulations here could be carried out using base R, but I enjoy the fs package and also use it elsewhere in the collection of functions I end up creating for maintaining a CRAN.
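For reference, a base R variant that also saves the result as src/contrib/Meta/archive.rds could look like the sketch below. The name write_archive_rds is made up, it takes the package name from the subfolder rather than from the filename, and the demo again uses empty placeholder files.

```r
# Hypothetical helper: rebuild archive.rds from the Archive folder with base R.
write_archive_rds <- function(cran_root) {
    contrib <- file.path(cran_root, "src", "contrib")
    archive_dir <- file.path(contrib, "Archive")

    # Paths relative to Archive, e.g. "arithmetic/arithmetic_0.0.3.tar.gz"
    rel_paths <- list.files(archive_dir, pattern = "\\.tar\\.gz$",
                            recursive = TRUE)

    metadata <- file.info(file.path(archive_dir, rel_paths))
    rownames(metadata) <- rel_paths

    # One data frame per package, named after the subfolder
    archive <- split(metadata, dirname(rel_paths))

    meta_dir <- file.path(contrib, "Meta")
    dir.create(meta_dir, recursive = TRUE, showWarnings = FALSE)
    saveRDS(archive, file.path(meta_dir, "archive.rds"))
    invisible(archive)
}

# Demo with empty placeholder files (not real packages)
cran_root <- tempfile("cran")
pkg_dir <- file.path(cran_root, "src", "contrib", "Archive", "arithmetic")
dir.create(pkg_dir, recursive = TRUE)
file.create(file.path(pkg_dir, c("arithmetic_0.0.3.tar.gz",
                                 "arithmetic_0.0.4.tar.gz")))

archive <- write_archive_rds(cran_root)
names(archive)
```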

Location aware packages

To make a package aware of which CRAN it is located in, the DESCRIPTION file should include the field

Repository: <CRAN url>

If a package depends on other packages from a specific CRAN it can be specified that R should also look here by including yet another field:

Additional_repositories: <CRAN url>

Installing packages

Installing a package from a specific CRAN can be accomplished with install.packages:

install.packages("arithmetic", repos = "<CRAN url>")

If this repository is often used or if dependencies should be installed from multiple CRANs (e.g. an official and private CRAN), it is probably easier to set both CRANs in the options:

options(repos = c(MYCRAN = "<CRAN url>", CRAN = "<official CRAN url>"))

Installing a specific version of a package can be accomplished by specifying the complete URL of the tar.gz file:

install.packages("<CRAN url>/src/contrib/Archive/arithmetic/arithmetic_0.0.3.tar.gz", repos = NULL, type = "source")

As mentioned earlier, remotes::install_version is a less verbose alternative.

Hosting a CRAN

I like to try out such a homemade CRAN before hosting it. It is possible to specify that R should use a “file protocol” as the URL of the CRAN:

cran_url <- file.path("file://", normalizePath(cran_root, winslash = "/"))

(normalizePath is used to get the correct number of slashes.) This works with install.packages:

install.packages("arithmetic", type = "source", repos = cran_url)

However, the file protocol does not work with remotes::install_version:

remotes::install_version("arithmetic", version = "0.0.3", repos = cran_url)

On Linux I get an error like this:

Downloading package from url: file:////path/to/cran/src/contrib/Archive/arithmetic/arithmetic_0.0.3.tar.gz
Error in utils::download.file(url, path, method = method, quiet = quiet,  :
   cannot open URL 'file:////path/to/cran/src/contrib/Archive/arithmetic/arithmetic_0.0.3.tar.gz'

On Windows the error looks like this:

tar.exe: Error opening archive: truncated gzip input
Warning messages:
1: In utils::untar(tarfile, ...) :
  ‘tar.exe -xf "C:\Users\robert\AppData\Local\Temp\RtmpQTtWcH\file2b5c166e5518.tar.gz" -C "C:/Users/robert/AppData/Local/Temp/RtmpQTtWcH/remotes2b5c59f945e8"’ returned error code 1
2: In system(cmd, intern = TRUE) :
  running command 'tar.exe -tf "C:\Users\robert\AppData\Local\Temp\RtmpQTtWcH\file2b5c166e5518.tar.gz"' had status 1

The CRAN has to be accessed through HTTP. From a running R session, an easy solution is the servr package: servr::httd() can serve the root of the CRAN, and remotes::install_version accepts the resulting local address as repos.
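A minimal sketch of that setup follows. The helper names local_cran_url and serve_local_cran are made up and port 4321 is an arbitrary choice. The functions are only defined here; running them requires the servr and remotes packages.

```r
# Hypothetical helper: URL of a CRAN served on this machine.
local_cran_url <- function(port = 4321) {
    paste0("http://127.0.0.1:", port)
}

# Hypothetical helper: serve the root of the CRAN in the background.
serve_local_cran <- function(cran_root, port = 4321) {
    # daemon = TRUE keeps the R session usable while the server runs
    servr::httd(dir = cran_root, port = port, daemon = TRUE)
    local_cran_url(port)
}

# Usage (needs the servr and remotes packages):
# cran_url <- serve_local_cran("/path/to/cran")
# remotes::install_version("arithmetic", version = "0.0.3", repos = cran_url)
# servr::daemon_stop()  # stop the server when done
```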