Friday, September 12, 2014

Embedding RData files in Rmarkdown files for more reproducible analyses

For those of us interested in reproducible analysis, Rmarkdown is a great way of communicating our code to other researchers. Rstudio, in particular, makes it very easy to create attractive HTML document containing text, code, and figures, which can then be sent to colleagues or put on the internet for anyone to see. If you aren't using Rmarkdown for your statistical analyses, I recommend you start; you'll never go back to simple script files again (and your colleagues won't want you to).

In this post, I describe how to improve your Rmarkdown by embedding data that can be downloaded by anyone viewing the document in a modern browser with javascript enabled. For a quick look, see the example Rmd file and resulting HTML file.

One of the drawbacks of Rmakdown, from a reproducible analysis perspective, is that the data is not a part of the document itself. Typically, an Rmarkdown file will use R code to load a file from your disk, and when you send the resulting HTML file to a colleague, or put it on the internet, that file is separate. It must be sent in an email or placed on a server to be downloaded.

This raises the possibility that the data could get separated from the code, and I think this is a terrible thing for reproducible analysis. In my mind, the data and the document and data should travel together as a single document. What we would like is a method of encoding R data into the HTML file such that any user who has access to the HTML file can download it, without even having access to the internet.

As it turns out, files can be encoded in an HTML document via the URI data scheme. All we need is an R function that encodes the data, and produces a link to enable downloading the data.
setDownloadURI = function(list, filename = stop("'filename' must be specified"), textHTML = "Click here to download the data.", fileext = "RData", envir = parent.frame()){
  require(base64enc,quietly = TRUE)
  divname = paste(sample(LETTERS),collapse="")
  tf = tempfile(pattern=filename, fileext = fileext)
  save(list = list, file = tf, envir = envir)
  filenameWithExt = paste(filename,fileext,sep=".")

  uri = dataURI(file = tf, mime = "application/octet-stream", encoding = "base64")
  cat("<a style='text-decoration: none' id='",divname,"'></a>
    var a = document.createElement('a');
    var div = document.getElementById('",divname,"');
    a.setAttribute('href', '",uri,"');
    a.innerHTML = '",textHTML,"' + ' (",filenameWithExt,")';
    if (typeof != 'undefined') {
      a.setAttribute('download', '",filenameWithExt,"');
      a.setAttribute('onclick', 'confirm(\"Your browser does not support the download HTML5 attribute. You must rename the file to ",filenameWithExt," after downloading it (or use Chrome/Firefox/Opera). \")');
The first argument of the function, list, is a character vector containing names of variables to save in the RData file.

Once this function is declared, all we need to do is call it in our Rmd file. If we use the argument results = 'asis' in our R code block, it will inject the appropriate HTML code into our compiled HTML document to allow a download of the embedded data as an RData file, and anyone with the HTML file can download it.

Unfortunately, blogger will not allow me to embed the data into a post; therefore, a complete, self-contained example Rmd file can be found here, and the resulting HTML file can be found here.

Keep in mind, however, that the data file is actually embedded in the HTML file. This means that the resulting HTML file can be very large, if your data file is large. Also consider that data are encoded in base64, which increases the size of the file by about a third over the equivalent RData binary file. For very large data sets, one might consider hosting them outside of the HTML file; but for many purposes, the technique I describe will improve the ease with which you can share reproducible analyses.


  1. This is a fantastic idea! Could you also store the Rmd file as an R object in the list of objects to save? That would avoid pasting the Rmd code in the HTML file or keeping it as a separate file.

    1. In my haste (and impatience I guess) I came up with this little snippet:

      1. Import your Rmd script as an R object using
      script1 <- readChar("script.Rmd","script.Rmd")$size)
      2. Add instructions to save this object to the computer using
      cat(script1, file = "script1.Rmd")
      3. And then instruct to open "script1.Rmd" and run it.

  2. Hi Tom, that's an interesting idea. That way, the Rmd can always be had by simply downloading it. But why not compress the Rmd file then insert the compressed file into the HTML document in the same manner that the Rdata file is inserted above?

    1. A bit of searching revealed that on Windows this might not work, since zip might not be in the path. I guess saving it in the Rdata object is the next best thing.

  3. Good point, Richard. That might make it more straight forward.

  4. What’s up, every time i used to check blog posts here in the early hours in the break of day, for the reason that i enjoy to gain knowledge of more and more.
    Lol Elos