Robert's Data Science Blog

Videos with lots of images

The {av} package is a nice wrapper around the omnipotent ffmpeg tool. I am using the function av::av_encode_video to combine multiple images into a video.

However, one trick was necessary to speed up the process. In my actual use case with tens of thousands of images, it cut the image generation time from hours to minutes.
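For orientation, here is a minimal sketch of the encoding call this post builds towards. The directory `frame_dir`, the `frame%d.png` names and the `framerate = 1` setting are placeholders made up for this sketch, not part of my actual setup, and the encoding step only runs if {av} is installed.

```r
# Hypothetical stand-in frames; the real images are generated further down.
frame_dir <- file.path(tempdir(), "av_sketch")
dir.create(frame_dir, showWarnings = FALSE)
for (i in 1:3) {
    png(filename = file.path(frame_dir, sprintf("frame%d.png", i)))
    plot(runif(10), runif(10), main = paste("frame", i))
    dev.off()
}
frames <- list.files(frame_dir, pattern = "\\.png$", full.names = TRUE)

# av_encode_video() takes a vector of image paths and writes a video file.
if (requireNamespace("av", quietly = TRUE)) {
    av::av_encode_video(
        input = frames,
        output = file.path(frame_dir, "sketch.mp4"),
        framerate = 1,   # assumed value: one image per second
        verbose = FALSE
    )
}
```

The images are shown in the order of the input vector, so the order of the file paths matters.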

Generate images

Let me first generate some data in the following format: Each row in a tibble has the title of the image and the x and y data to be plotted. I am deliberately using base R plotting – I find this much more convenient than transforming my data into a format that {ggplot2} can handle.

tbl <- tibble::tibble(
    x = list(runif(10), runif(8)),
    y = list(runif(10), runif(8)),
    title = c("a", "b")
)

I then define a function for plotting a row of data:

my_plot <- function(x, y, title) {
    plot(x = x, y = y, main = title)
}

I also need a function for saving the plots. This function is where the potential speed-up lies. My first version opens a png device for each plot, draws the plot and closes the device again:

save_plots <- function(x, y, title, save_dir) {
    png(filename = fs::path(save_dir, title, ext = "png"))
    my_plot(x, y, title)
    dev.off()  # close the device so the file is written
}

The plots can now be generated by sweeping through the rows of tbl:

save_dir <- fs::path_temp("my_images")
fs::dir_create(save_dir)  # png() cannot write into a directory that does not exist
purrr::pwalk(tbl, save_plots, save_dir = save_dir)

When tbl is large, this is very time-consuming.

Faster image generation

The png function has a trick: we can specify a “family” of filenames (just like the default value of png's filename argument, "Rplot%03d.png") and call dev.off() only once, after all images are saved.

png(filename = fs::path(save_dir, "image%03d.png"))
purrr::pwalk(tbl, my_plot)
dev.off()  # a single call closes the device after all images are written

The sprintf format “%03d” means that each image is numbered sequentially with 3 digits, left-padded with zeros if needed (this ensures that the lexicographical ordering matches the index ordering). We can see this in the generated images:

## [1] "a.png"        "b.png"        "image001.png" "image002.png"
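To see why the zero-padding matters for the ordering, here is a small sketch that compares padded and unpadded names. The indices in `idx` are arbitrary values picked for this illustration, and `method = "radix"` is used so that the sorting does not depend on the locale.

```r
# Arbitrary example indices; 10 is included to show the effect of padding.
idx <- c(1, 2, 10)
padded <- sprintf("image%03d.png", idx)
unpadded <- sprintf("image%d.png", idx)

# With padding, the sorted names follow the index order.
sort(padded, method = "radix")
## [1] "image001.png" "image002.png" "image010.png"

# Without padding, image 10 sorts before image 2.
sort(unpadded, method = "radix")
## [1] "image1.png"  "image10.png" "image2.png"
```

Three digits cover up to 999 images; with more images, a wider format such as “%05d” keeps all names at the same length.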