Robert's Data Science Blog

Enums and Function Factories in R

I have run into a use case for enums in R. I have a package that provides functions for interfacing with some REST endpoints, that is, it essentially wraps calls made with the httr package.

Some of the query parameters for these endpoints use “magic numbers”. So instead of, e.g., a query with key=foo, I would write key=1. I have the list of magic numbers, but I prefer to use meaningful names and translate these into the magic numbers. This is where enums come in.

In this post I will focus on my approach to using enums, illustrated with an artificial example rather than my actual use case.

The veggie package

I am going to make a small package veggie with two enums Fruit and Vegetable.

usethis::create_package("veggie")

Package data

Next, the magic numbers and their associated memorable names must be collected. I follow the practice from the R Packages book and save the steps needed to generate the data. Set up the structure with usethis::use_data_raw().

I use the script data-raw/enums.R to define the enums, along with a command that saves them as internal package data:

Fruit <- c(
    Apple = 0L,
    Banana = 1L
)

Vegetable <- c(
    Broccoli = 0L,
    Cauliflower = 1L
)

usethis::use_data(Fruit, Vegetable, internal = TRUE)

This generates the file R/sysdata.rda. The enums are made internal so that users are not tempted to use them directly, but instead rely on the abstractions introduced below.

The simplest approach is to use these vectors directly and get the magic numbers with Fruit[["Apple"]] (Fruit$Apple does not work, because $ is not defined for atomic vectors). However, this is pretty verbose and requires users to interact too much with the inner workings of the package. This makes it difficult to change things in a backwards-compatible manner (maybe we want to use lists instead of vectors, or the unofficial R package with an enumeration type).
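A quick check in plain R shows why [[ is the accessor to use on these enums: Fruit is an atomic named integer vector, and $ only works on lists and environments. FruitList is a hypothetical name used only to illustrate the list representation mentioned above; it is not part of the package.

```r
# Same definition as in data-raw/enums.R
Fruit <- c(
    Apple = 0L,
    Banana = 1L
)

# `[[` extracts a single element by name and drops the name: returns 0L
apple_number <- Fruit[["Apple"]]

# `$` is not defined for atomic vectors, so this raises an error
dollar_error <- tryCatch(Fruit$Apple, error = function(e) conditionMessage(e))

# Hypothetical list representation: here both `$` and `[[` work
FruitList <- list(Apple = 0L, Banana = 1L)
list_apple <- FruitList$Apple
```

Switching between these two representations changes which accessors work, which is exactly why I would rather not have users depend on either one directly.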

Instead, I prefer to make a few abstractions. The first is a pair of functions that convert between the text and the magic number. I rely on S3 dispatch to overload the function name convert_fruit:

convert_fruit <- function(fruit) {
    UseMethod("convert_fruit")
}

Now I can make a method that translates the magic numbers (integers) to text:

convert_fruit.integer <- function(fruit) {
    int_match <- which(Fruit == fruit)

    names(Fruit)[int_match]
}

And a method that translates the (text) names to the magic numbers:

convert_fruit.character <- function(fruit) {
    name_match <- which(names(Fruit) == fruit)

    Fruit[[name_match]]
}
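A minimal, self-contained sketch of the generic and the two methods in action. Dispatch happens on the class of the argument: an integer goes to convert_fruit.integer and a string to convert_fruit.character. All three functions use the argument name fruit here, since R expects S3 methods to share the generic's argument names (otherwise a call like convert_fruit(fruit = "Apple") fails, and R CMD check warns). Note also that a plain 1 is a double, not an integer, so it has no matching method.

```r
Fruit <- c(Apple = 0L, Banana = 1L)

# Generic: dispatches on the class of `fruit`
convert_fruit <- function(fruit) {
    UseMethod("convert_fruit")
}

# Magic number -> name
convert_fruit.integer <- function(fruit) {
    int_match <- which(Fruit == fruit)
    names(Fruit)[int_match]
}

# Name -> magic number
convert_fruit.character <- function(fruit) {
    name_match <- which(names(Fruit) == fruit)
    Fruit[[name_match]]
}

banana_name <- convert_fruit(1L)        # "Banana"
apple_number <- convert_fruit("Apple")  # 0L

# 1 without the L suffix is a double, so there is no method to dispatch to
no_method <- tryCatch(convert_fruit(1), error = function(e) "no method")
```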

This is still pretty verbose and tightly coupled to the representation I use. It is also difficult for a user to know which fruits are available without glancing at the Fruit vector. Finally, when converting the fruit names I don’t want to be case sensitive; convert_fruit("apple") and convert_fruit("Apple") should both work.

Therefore, another function is introduced:

list_fruits <- function(lowercase = FALSE) {
    fruit_names <- names(Fruit)

    if (isTRUE(lowercase))
        fruit_names <- tolower(fruit_names)

    return(fruit_names)
}

Now names(Fruit) can be replaced with list_fruits() in both convert_fruit methods:

convert_fruit.integer <- function(fruit) {
    int_match <- which(Fruit == fruit)

    list_fruits()[int_match]
}

And in the method that translates the (text) names to the magic numbers, both sides of the comparison are lowercased:

convert_fruit.character <- function(fruit) {
    name_match <- which(list_fruits(TRUE) == tolower(fruit))

    Fruit[[name_match]]
}
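Putting list_fruits and the updated methods together, a small self-contained sketch shows that the name lookup is now case insensitive while list_fruits still returns the original spelling by default. As in the earlier sketch, the methods use the generic's argument name fruit.

```r
Fruit <- c(Apple = 0L, Banana = 1L)

list_fruits <- function(lowercase = FALSE) {
    fruit_names <- names(Fruit)
    if (isTRUE(lowercase))
        fruit_names <- tolower(fruit_names)
    fruit_names
}

convert_fruit <- function(fruit) UseMethod("convert_fruit")

# Magic number -> name, via list_fruits()
convert_fruit.integer <- function(fruit) {
    list_fruits()[which(Fruit == fruit)]
}

# Name -> magic number, comparing lowercased strings
convert_fruit.character <- function(fruit) {
    name_match <- which(list_fruits(TRUE) == tolower(fruit))
    Fruit[[name_match]]
}

original_names <- list_fruits()     # "Apple" "Banana"
lower_names <- list_fruits(TRUE)    # "apple" "banana"
apple_lower <- convert_fruit("apple")
banana_upper <- convert_fruit("BANANA")
```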

Error handling

In the best-case scenario the convert_fruit methods work as they are. But it is better to handle errors, for example when a non-existing fruit name is given as input:

convert_fruit.character <- function(fruit) {
    name_match <- which(list_fruits(TRUE) == tolower(fruit))

    if (length(name_match) == 0)
        stop("No matching fruit")

    if (length(name_match) > 1)
        stop("Multiple matching fruits")

    Fruit[[name_match]]
}

Similar checks can be introduced in convert_fruit.integer. A consequence of these checks is that the convert_fruit methods only handle a single value at a time; they are not vectorized.
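For completeness, here is a sketch of what similar checks could look like in convert_fruit.integer. Without them, an unknown number such as 5L silently returns character(0). The explicit length check on the input is my own addition, not part of the original methods; it just makes the non-vectorized behaviour fail loudly instead of relying on recycling in ==.

```r
Fruit <- c(Apple = 0L, Banana = 1L)

convert_fruit <- function(fruit) UseMethod("convert_fruit")

# Magic number -> name, with checks mirroring the character method
convert_fruit.integer <- function(fruit) {
    # Assumed extra check: only a single value is supported
    if (length(fruit) != 1)
        stop("Exactly one fruit number must be given")

    int_match <- which(Fruit == fruit)

    if (length(int_match) == 0)
        stop("No matching fruit")

    if (length(int_match) > 1)
        stop("Multiple matching fruits")

    names(Fruit)[int_match]
}

apple_name <- convert_fruit(0L)
unknown_error <- tryCatch(convert_fruit(5L), error = function(e) conditionMessage(e))
length_error <- tryCatch(convert_fruit(c(0L, 1L)), error = function(e) conditionMessage(e))
```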

Function factories

I think the above approach is a reasonable way of interacting with magic numbers. Unfortunately, it also ended up being quite long. Simply copy-pasting everything for the Vegetable enum is not very DRY and it certainly doesn’t scale well if there are more enums.

Instead, I want to use function factories to generate these functions.

Consider the functions for returning the fruit/vegetable names:

list_names_factory <- function(enum) {
    force(enum)  # evaluate enum when the factory is called, not on first use
    function(lowercase = FALSE) {
        stopifnot(is.logical(lowercase))

        enum_strings <- names(enum)

        if (isTRUE(lowercase))
            enum_strings <- tolower(enum_strings)

        return(enum_strings)
    }
}
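To see the factory at work outside the package, here is a minimal sketch that generates both listing functions. The force(enum) line follows the usual advice for function factories: it evaluates the argument when the factory runs, so each generated function holds on to the enum it was created with. Each generated function keeps its own copy of enum in its enclosing environment, which is what makes the one-liners possible.

```r
list_names_factory <- function(enum) {
    force(enum)  # capture the enum now rather than lazily
    function(lowercase = FALSE) {
        stopifnot(is.logical(lowercase))
        enum_strings <- names(enum)
        if (isTRUE(lowercase))
            enum_strings <- tolower(enum_strings)
        enum_strings
    }
}

Fruit <- c(Apple = 0L, Banana = 1L)
Vegetable <- c(Broccoli = 0L, Cauliflower = 1L)

list_fruit <- list_names_factory(Fruit)
list_vegetable <- list_names_factory(Vegetable)

fruit_names <- list_fruit()              # "Apple" "Banana"
vegetable_names <- list_vegetable(TRUE)  # "broccoli" "cauliflower"

# The enum lives in the enclosing environment of each generated function
captured_enum <- environment(list_fruit)$enum
```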

The actual functions are now one-liners (in the same .R file as list_names_factory):

#' List fruits
#'
#' @export
list_fruit <- list_names_factory(Fruit)

#' List vegetables
#'
#' @export
list_vegetable <- list_names_factory(Vegetable)
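The same idea could in principle be applied to the conversion functions. S3 methods are awkward to generate from a factory, so the sketch below instead returns a single function that checks the input type itself. convert_factory is a hypothetical name, not something from my package, and unlike the S3 version it accepts doubles as well as integers.

```r
# Hypothetical factory: builds a converter between names and magic numbers
# for any named integer vector `enum`
convert_factory <- function(enum) {
    force(enum)
    function(value) {
        if (length(value) != 1)
            stop("Exactly one value must be given")

        if (is.character(value)) {
            # Name -> magic number, case insensitive
            match_idx <- which(tolower(names(enum)) == tolower(value))
            if (length(match_idx) != 1)
                stop("No unique match for '", value, "'")
            return(enum[[match_idx]])
        }

        if (is.numeric(value)) {
            # Magic number -> name (integers and doubles both accepted)
            match_idx <- which(enum == value)
            if (length(match_idx) != 1)
                stop("No unique match for ", value)
            return(names(enum)[match_idx])
        }

        stop("Input must be a character string or a number")
    }
}

Fruit <- c(Apple = 0L, Banana = 1L)
Vegetable <- c(Broccoli = 0L, Cauliflower = 1L)

convert_fruit <- convert_factory(Fruit)
convert_vegetable <- convert_factory(Vegetable)

banana_number <- convert_fruit("banana")   # 1L
broccoli_name <- convert_vegetable(0L)     # "Broccoli"
cauliflower_name <- convert_vegetable(1)   # "Cauliflower"
```

Giving up S3 dispatch here trades extensibility (no new methods can be added for other classes) for having one generated function per enum.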

An alternative to generating every function on its own line with a call to list_names_factory is to loop over the enums with lapply:

list_functions <- lapply(
    c("list_fruit" = "Fruit", "list_vegetable" = "Vegetable"),
    function(x) list_names_factory(get(x))
)

The functions are then included in the package with list2env (which must also be executed in one of the package's .R files, so that environment() points to the package namespace):

list2env(list_functions, envir = environment())
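Outside a package, the lapply plus list2env combination can be tried in a fresh environment. In the sketch below, pkg_env is a hypothetical stand-in for the package namespace that environment() returns when the code runs inside the package's R files. lapply keeps the names of the input vector, and those names become the function names.

```r
list_names_factory <- function(enum) {
    force(enum)
    function(lowercase = FALSE) {
        stopifnot(is.logical(lowercase))
        enum_strings <- names(enum)
        if (isTRUE(lowercase))
            enum_strings <- tolower(enum_strings)
        enum_strings
    }
}

Fruit <- c(Apple = 0L, Banana = 1L)
Vegetable <- c(Broccoli = 0L, Cauliflower = 1L)

# Names of the generated functions -> names of the enum objects
list_functions <- lapply(
    c("list_fruit" = "Fruit", "list_vegetable" = "Vegetable"),
    function(x) list_names_factory(get(x))
)

# Stand-in for the package namespace
pkg_env <- new.env()
invisible(list2env(list_functions, envir = pkg_env))

generated <- ls(pkg_env)                         # "list_fruit" "list_vegetable"
vegetables <- pkg_env$list_vegetable(TRUE)       # "broccoli" "cauliflower"
```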

The downside of this approach is that the functions now have to be documented separately from the code that creates them, with an explicit roxygen block for each one:

#' @title List fruit
#'
#' @name list_fruit
#'
#' @export
NULL

I have not settled on which way I prefer…