Programmatic Lecture Slides Made Even More Difficult with R Markdown

DH, kludgetastic, TeX

In Easy Lecture Slides Made Difficult, I showed how to use markdown to make slides while still retaining some aesthetic flexibility. All that was required was a handful of TeX macros, a little Python script, a few Makefiles, and a maniacal commitment to automation. That was an enjoyable trip into the abyss, but what if your slides regularly include the results of calculations, data visualizations, or computational examples? Then it is time to, how do they say it these days, dive deeper. Thanks to R Markdown and knitr, it is possible to build on the pandoc/beamer system I described before to incorporate program code and its results. Call it, uh, Presentationally Literate Programming.

I am spurred to describe my approach by two things: first, I have had a whole semester of teaching Literary Data to work out the kinks; in that graduate course I regularly used R markdown-based slides to present new material. (Here are a couple examples: slides on network visualization, April 23, 2015; slides on topic modeling and PCA for literary texts, April 16, 2015.) Second, I was just at a conference where almost everyone’s presentation had slides with R output in them, but it seemed like people might…benefit…from an example showing how to make the aesthetics of the R output and the slides more consistent with one another. I’ll also show how the same R markdown can be the basis of slides, speaker notes, and a handout for audience members to take notes on. And I’ll show the settings for incorporating program code into the slides, which I used in my teaching. At the end of this post, I imagine you will question my sanity. The whole setup can be found in this repository subdirectory on github.

[Update, November 2015. I’ve made a slightly easier-to-use version of the same setup available as an R package here.]

The source

The hard part is writing the source file, talk.Rmd. For a presentation, you write an R markdown file that is just like a regular markdown for beamer slides—including all the LaTeX and pandoc tweaks you want, like fancier overlay specifications, arbitrarily positioned blocks, automatic citations,1 and so on—but you also include R code chunks that generate some more markdown from computations. The knitting step runs those computations and yields talk.md. From there everything proceeds as with ordinary markdown slides (see the earlier post).
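To make the knitting step concrete, here is a minimal sketch that knits a tiny in-memory R markdown source rather than a real talk.Rmd. The slide title and the computation are made up for illustration; the only knitr call used is knit() with its text argument.

```r
library(knitr)

# Build the chunk delimiters programmatically so this listing needs no
# nested fences.
fence <- strrep("`", 3)

# A one-slide "talk": a level-1 heading plus one R chunk (made-up content).
rmd <- c(
    "# Word counts",
    "",
    paste0(fence, "{r}"),
    "x <- c(3, 1, 4)",
    "sum(x)",
    fence
)

# knit() runs the chunk and returns plain markdown with the code and its
# output written in as ordinary code blocks.
md <- knit(text=rmd, quiet=TRUE)
cat(md)
```

In the real workflow the input is the file talk.Rmd and the result is written to talk.md instead of being returned as a string.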

Graphics

knitr has excellent support for including figures generated by R code. So excellent, in fact, that we can be quite demanding about the appearance of visualizations in our slides. For a long time I was frustrated by the fact that I preferred light-colored text on a dark slide background, but ggplot2 defaults to dark marks on a light background (with the trademark light-grey grid). Either one has to revert to black-on-white slides (aaggh! so bright! blinding!) or one has ugly white blocks in the middle of one’s elegant black slides. And then by default ggplot2, like the rest of R graphics, is very impoverished typographically. Why typeset slides with LaTeX only to paste in pixelated PDFs or even PNGs?

TikZ and xelatex to the rescue! I’ve mentioned this combination before. With TikZ graphics, the tikzDevice R package, and knitr’s miraculous wrangling of all of the above, you can get your nice ggplot visualizations to render with TeX primitives. That way your charts can match the typeface of the rest of your slides, and your curves will be as smooth as xelatex can make them (pretty smooth). With xelatex, that typeface can be pretty much any font on your system. All you need is the following straightforward addition to the first code chunk in talk.Rmd:

library(knitr)
opts_chunk$set(comment=NA, collapse=TRUE, cache=TRUE, autodep=TRUE,
               dev="tikz", fig.width=4.5, fig.height=2.75,
               size="footnotesize",
               dev.args=list(pointsize=9),
               message=FALSE, warning=FALSE)
options(tikzDefaultEngine="xetex")
options(tikzXelatexPackages=c(
    "\\usepackage{tikz}\n",
    "\\usepackage[active,tightpage,xetex]{preview}\n",
    "\\usepackage{fontspec,xunicode}\n",
    "\\setmainfont{Gill Sans}\n",
    "\\PreviewEnvironment{pgfpicture}\n",
    "\\setlength\\PreviewBorder{0pt}\n"))

Replace Gill Sans with the name of your font of choice. Choose a typeface that’s meant for display and readable at a distance.

A bunch of things happen in the opts_chunk$set() call. The important one for TikZ graphics is the dev="tikz" argument. But the default figure sizing parameters, fig.width, fig.height, are also important. Beamer slides are “physically” 128 mm by 96 mm, so all dimensions are relative to those lengths. After some trial and error I’ve found these figure size defaults (knitr dimensions are always in inches) to look all right in most circumstances. One consequence of the oddness of beamer’s dimensions is that picking the right type size takes some guesswork. And then figuring out how type sizing will work in graphics that are scaled and rescaled by the typesetting process is another layer of trial and error. I haven’t solved this problem in a principled way, but setting the text size to LaTeX’s footnotesize seems to work out all right. I’m not even sure whether the dev.args makes any difference to the tikzDevice.
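The arithmetic behind those defaults is easy to check. The following sketch only converts the slide dimensions quoted above into inches and compares them with the figure defaults; it assumes the figure is placed at its natural size, which the typesetting step may override.

```r
mm_per_inch <- 25.4

slide_mm <- c(width=128, height=96)    # beamer's "physical" slide size
slide_in <- slide_mm / mm_per_inch     # the same page in inches
fig_in <- c(width=4.5, height=2.75)    # fig.width and fig.height above

round(slide_in, 2)            # width 5.04, height 3.78
round(fig_in / slide_in, 2)   # width 0.89, height 0.73
```

So a default figure takes up roughly nine tenths of the slide's width and three quarters of its height, which leaves room for a title line.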

This will improve the type quality of graphics, but does nothing about the colors. If you use the Keynote-esque white-on-almost-black theme I do, you’ll want to adjust ggplot’s colors. All you have to do is make use of ggplot’s extremely simple and streamlined theme function:

plot_theme <- function() {
    base_size <- 9
    dark <- "gray10"
    light <- "white"
    theme_grey(base_size=base_size) %+replace% theme(
      axis.line=element_blank(), 
      axis.text.x=element_text(size=base_size*0.8, color=light,
                               lineheight=0.9, vjust=1), 
      axis.text.y=element_text(size=base_size*0.8, color=light,
                               lineheight=0.9, hjust=1), 
      axis.ticks=element_line(color=light, size = 0.2), 
      axis.title.x=element_text(size=base_size, color=light, vjust=0),
      axis.title.y=element_text(size=base_size, color=light, angle=90,
                                vjust=0.5), 
      axis.ticks.length=grid::unit(0.3, "lines"), 
      axis.ticks.margin=grid::unit(0.5, "lines"),
      legend.background=element_rect(color=NA, fill=dark), 
      legend.key=element_rect(color=light, fill=dark), 
      legend.key.size=grid::unit(1.2, "lines"), 
      legend.key.height=NULL, 
      legend.key.width=NULL,     
      legend.text=element_text(size=base_size * 0.8, color=light), 
      legend.title=element_text(size=base_size * 0.8, color=light), 
      legend.text.align=NULL, 
      legend.title.align=NULL, 
      legend.position="bottom",
      legend.box=NULL,
      panel.background=element_rect(fill=dark, color = NA), 
      panel.border=element_rect(fill=NA, color=light), 
      panel.grid.major=element_line(color="gray20"), 
      panel.grid.minor=element_blank(), 
      panel.margin=grid::unit(0.25, "lines"),  
      # hard-coded colors: adjust by hand if you change dark or light
      strip.background=element_rect(fill="gray40", color="gray20"), 
      strip.text.x=element_text(size=base_size * 0.8, color=light), 
      strip.text.y=element_text(size=base_size * 0.8, color=light,
                                angle=-90), 
      plot.background=element_rect(color=dark,fill=dark), 
      plot.title=element_text(size=base_size*1.2,color=light), 
      plot.margin=grid::unit(c(1,1,0.5,0.5),"lines")
    )
}

Most of this is the work of Jon Lefcheck, but since my slide backgrounds are 90% black I’ve fiddled the colors. It’s sort of flexible, but you’ll have to manually adjust the strip.background colors if you adjust the base dark or light color. I’ve included this function in slide-utils.R. Just add plot_theme() (note the function invocation) to each ggplot you print. (Alternatively you can set up a chunk hook to run theme_set. That would be more elegant but I haven’t gotten around to it.) You can of course apply further theme settings by adding on another theme() invocation.
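For what it's worth, the chunk-hook idea might look something like the sketch below. It is untested in a real slide deck. The dark_theme() function is a hypothetical stand-in for plot_theme() so that the example runs on its own, and the hook name dark is likewise made up.

```r
library(knitr)
library(ggplot2)

# Hypothetical stand-in for plot_theme(), kept tiny on purpose.
dark_theme <- function() {
    theme_grey(base_size=9) +
        theme(plot.background=element_rect(fill="gray10", color="gray10"),
              text=element_text(color="white"))
}

# Chunk hook: make the dark theme the default before the chunk runs and
# restore the previous default afterwards. theme_set() returns the old
# theme, which is kept in the enclosing environment between the two calls.
local({
    saved <- NULL
    knit_hooks$set(dark=function(before, options, envir) {
        if (before) {
            saved <<- theme_set(dark_theme())
        } else if (!is.null(saved)) {
            theme_set(saved)
        }
        invisible(NULL)   # return nothing, so no text is added to the output
    })
})
```

A chunk would then opt in with the chunk option dark=TRUE. If every plot in the talk should be dark, a single theme_set() call in the setup chunk does the same job without any hook.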

That’s only for ggplot2 graphics, of course. On those rare occasions when you go to another graphics system, well, you’ll just have to make similar adjustments. If the system, like base graphics, is stateful rather than (relatively) functional like ggplot2, you have to wrestle a bit with knitr. For example, when I wanted to include igraph network visualizations in a slide set, I added a chunk hook to ensure that some graphical parameters were always set:

knit_hooks$set(igraph=function(before, options, envir) {
    if (before) {
        par(bg="gray10", fg="white")
        igraph.options(plot.asp=1 / 2, vertex.label.family="Helvetica",
                       vertex.label.color="white", vertex.color=NA,
                       vertex.frame.color="white")
    }
})

Then I gave chunks that plotted igraph objects the igraph=T chunk option. Yes, that aspect ratio is pretty weird, and I had to fiddle with the fig.width and fig.height on those chunks too before the results looked sane.2

Tables

Tabular displays of data are important (and, as Saint Tufte says, sometimes superior to graphical displays). To get them to look respectable, you need to know a few fiddly things. First of all, in the YAML metadata block, set tables: true to ensure that the generated LaTeX file will include the right LaTeX packages for typesetting tables (booktabs and longtable, not that you should need the latter). You might also need tabularx, which requires adjusting the generation of the LaTeX preamble yourself.

Then you need a way to turn R tabular data forms into print tables. knitr has an underappreciated function, kable, for generating markup for tables (in markdown, HTML, or LaTeX). I learned about kable late, so I have instead typically used the xtable package. This has more options but is frustratingly poorly designed, with options unpredictably split between the xtable function and the print.xtable method. Here is a wrapper function for my most common use of xtable, printing a data frame as a table and rounding numbers uniformly:

library(xtable)

# Print a data frame as a LaTeX table via xtable, rounding numbers uniformly.
print_tabular <- function (x, digits=0,
                           alignment=paste(ifelse(sapply(x, is.numeric),
                                                  "r", "l"),
                                           collapse=""),
                           include.colnames=TRUE,
                           floating=FALSE, caption=NULL, label=NULL,
                           ...) {
    # check x first: the default alignment is computed from its columns
    if (!is.data.frame(x)) {
        stop("x is not a data frame")
    }
    if (length(alignment) != 1 || !is.character(alignment)) {
        stop("alignment must be a character vector of length 1")
    }
    # a single letter is repeated for every column
    if (nchar(alignment) == 1) {
        alignment <- paste(rep(alignment, length(x)), collapse="")
    }

    # xtable wants an extra leading entry for the (unprinted) rownames
    alignment <- paste("l", alignment, sep="")
    xt <- xtable(x, digits=digits, align=alignment,
                 caption=caption, label=label)
    print(xt, comment=FALSE, include.rownames=FALSE,
          include.colnames=include.colnames,
          floating=floating, booktabs=TRUE,
          tabular.environment=ifelse(floating, "tabular", "longtable"),
          ...)
}

[Edit, 7/29/15: for a tweaked version that doesn’t require you to set the results="asis" chunk option whenever you use this, see the updated code on github.]

Here are the parameters:

x: a data frame. Convert matrices and contingency tables yourself.

digits: the number of digits after the decimal point.

alignment: a single-element character vector giving the column alignments according to the tabular syntax (l, c, r for left, center, right, and p{dim} for text wrapped in a box of width dim—for example p{2 in}; but see below for remarks on dimensions on slides). By default numeric columns are right aligned and the rest are left aligned.

include.colnames: whether to print the data frame column names as column headers. It often makes sense to assign something human-friendly to the colnames first. This function assumes you never want to print rownames. Keeping anything interesting in the rownames of a data frame is a bad idea anyway. If you want rownames, add a data column on the front instead.

floating, caption: whether to put the table in a floating LaTeX environment like table (floating=T) or just to drop it in the flow of text (floating=F, the default). floating=T is probably what you want, with a caption specified; beamer centers floating tables on slides as you’d hope.

label: the LaTeX label of the table (if floating=T), so that you can refer to the table number—but why do that in a talk?

...: the rest of the parameters are passed on to print.xtable.
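The default alignment is the least obvious of these, so here is the same computation pulled out on its own in base R. The data frame is made up for illustration.

```r
# Made-up data frame, for illustration only.
counts <- data.frame(word=c("whale", "sea"),
                     n=c(120L, 45L),
                     share=c(0.727, 0.273),
                     stringsAsFactors=FALSE)

# One letter per column: "r" for numeric columns, "l" for everything else.
spec <- paste(ifelse(sapply(counts, is.numeric), "r", "l"), collapse="")
spec         # "lrr"

# xtable's align argument also counts the rownames column, so it needs
# ncol(x) + 1 entries even though the rownames are never printed here.
full_spec <- paste0("l", spec)
full_spec    # "llrr"
```

That extra leading letter is why the wrapper prepends "l" before calling xtable.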

I’ve included this function in the slide-utils.R script that gets sourced at the start of the main talk.Rmd file.
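For comparison, the kable route mentioned earlier covers the simple cases with much less ceremony. This is only a sketch with a made-up data frame, not a drop-in replacement for the wrapper above; format, booktabs, digits, and row.names are standard kable arguments.

```r
library(knitr)

# Made-up data frame, for illustration only.
counts <- data.frame(word=c("whale", "sea"),
                     share=c(0.7273, 0.2727),
                     stringsAsFactors=FALSE)

# LaTeX output with booktabs rules; numeric columns are right-aligned by
# default and rounded to the given number of digits.
tab <- kable(counts, format="latex", booktabs=TRUE, digits=2,
             row.names=FALSE)
tab
```

Inside a knitr chunk the result is written into the document as-is, so no results="asis" option is needed.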

R code

For my teaching, I frequently wanted to show students my source code as well as my results. knitr is great at this. For light-on-dark slides, the default syntax highlighting wasn’t ideal. I prefer the Zenburn theme, which is easily set as a pandoc option (see below). What was harder was getting even a few lines of code and code output on a slide. The default sizing is very readable—but so large that it crowds even fairly compact little blocks of code off the slide. Adjusting the type size proved a little tricky. For code blocks, we adjust the size with a LaTeX preamble line:

\RecustomVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\},fontsize=\small}

For code output, we need two steps. First we need to enclose the output in a LaTeX environment we can customize later. For this we need to use an output hook. I lifted the following from knitr:::.verb.hook (yes, seven dots there), which is invoked by its render_latex method:

knit_hooks$set(output=function (x, options) {
    paste(c("\\begin{ROutput}",
            sub("\n$", "", x),
            "\\end{ROutput}",
            ""),
          collapse="\n")
})

(h/t to StackOverflow, probably). Then the LaTeX preamble needs to define this environment, using commands from the fancyvrb package:

\DefineVerbatimEnvironment{ROutput}{Verbatim}{frame=single,fontsize=\footnotesize}
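To see what the output hook hands to LaTeX, you can call the same function body directly on a sample string. The name wrap_output and the sample output are made up for this sketch; nothing here needs knitr.

```r
# Same body as the output hook, as a plain function.
wrap_output <- function(x, options) {
    paste(c("\\begin{ROutput}",
            sub("\n$", "", x),     # drop the trailing newline knitr adds
            "\\end{ROutput}",
            ""),
          collapse="\n")
}

# Made-up chunk output, as it looks with comment=NA (no "##" prefix).
wrapped <- wrap_output("[1] 8\n", list())
cat(wrapped)
```

The output lands between \begin{ROutput} and \end{ROutput}, which is the environment the preamble line styles.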

Even with all this I still had trouble getting a sane amount of output onto a slide, and adjusting the width with either knitr option calls or options() seems to avail me nothing. Would be glad for suggested tweaks to this.

Workflow automation

Now. We hardly went through all that effort just to sit around clicking “Knit PDF” in RStudio. Instead, we’ll invoke knitr from a Makefile. This part of the process looks like this:

talk.md: talk.Rmd
	R -e 'library(knitr); knit("$<")'

It would also be possible to specify this workflow differently, using the YAML metadata block in conjunction with the rmarkdown package to control knitr and handle intermediate files: then the Make rule would look like R -e 'rmarkdown::render("$<")'. I haven’t adapted my process to do this, though I may eventually; in any case, there are lots of knobs and dials on the rmarkdown beamer output format.
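As a rough idea of what that alternative could look like from R, here is an untested sketch that sets a few of those knobs in code instead of YAML. The function name render_slides is made up, and the settings are my guess at equivalents of the pandoc options used for the slides in this post; calling it requires the rmarkdown package, pandoc, and the preamble file.

```r
# Hypothetical helper: render talk.Rmd to beamer slides with rmarkdown.
render_slides <- function(input="talk.Rmd") {
    rmarkdown::render(
        input,
        output_format=rmarkdown::beamer_presentation(
            slide_level=1,
            latex_engine="xelatex",
            highlight="zenburn",
            keep_tex=TRUE,    # keep the .tex file for debugging
            includes=rmarkdown::includes(in_header="preamble-slides.tex"),
            pandoc_args=c("--filter", "overlay_filter")
        )
    )
}
```

This goes straight from talk.Rmd to a PDF, so it replaces the separate knit, pandoc, and latexmk steps for the slides, at the cost of less control over the intermediate files.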

Once talk.md is in place, we have what we need to create a set of different output formats. Though we could use pandoc to go directly to PDF, I keep the intermediate TeX files (since otherwise debugging LaTeX problems is tricky), and then invoke xelatex on these via the handy latexmk utility. We have the general rule:

pdfs := talk.pdf talk-slides.pdf talk-handout.pdf

$(pdfs): %.pdf: %.tex
	latexmk -$(latex_msg) -xelatex $(basename $<)

latex_msg is a parameter set to quiet earlier in the Makefile. I can’t tell you how happy I was when I learned that latexmk could quiet down LaTeX’s endless messaging.

But now we need the LaTeX. The slides proper are generated as follows:

talk-slides.tex: talk.md preamble-slides.tex
	pandoc $< \
	    -t beamer \
	    --slide-level 1 \
	    -H preamble-slides.tex \
	    --latex-engine xelatex \
	    --filter overlay_filter \
	    --highlight-style zenburn \
	    -o $@

This rule uses the overlay_filter python script (found in my miscellaneous TeX github repo; place it in your PATH) for processing my kludged syntax for beamer overlays in markdown (things like alert{<1>}{...}). preamble-slides.tex includes the preamble lines for setting the color scheme. It also \inputs a shared file of preamble statements, macros.tex. Zenburn syntax highlighting is applied via an option to pandoc.

With talk-slides.pdf in hand, I then use PDF to Keynote to generate a Keynote presentation so I can use Keynote’s better presentation mode. This is the same as the R-free version of my process, though I always feel even a little bit more smug than I usually do when I go from R code to a Keynote presentation with a single make.3

My notes to myself are generated with a different preamble, which leaves things in black-on-white form for printing and sets beamer to display my notes to myself (marked up as \note{<1>}{...} and so on), interleaved with the slides.

talk.tex: talk.md preamble-notes.tex
	pandoc $< \
	    -t beamer \
	    -H preamble-notes.tex \
	    -V fontsize=8pt \
	    --filter overlay_filter \
	    --latex-engine xelatex \
	    -o $@

Notice the use of 8 pt base font size. This produces smallish but readable type, even when you print the slides two to a page, as I do with the following rule:

nup_suffix := 4up
nup_layout := 2x2
# for portrait, set to --no-landscape
nup_landscape := --landscape

talk-$(nup_suffix).pdf: talk.pdf
	pdfjam $< \
	    $(nup_landscape) \
	    --nup $(nup_layout) \
	    --suffix $(nup_suffix)

pdfjam ships with TeX Live. It uses the pdfpages LaTeX package to stick multiple logical pages on a physical page.4

Handouts

My students asked me for slide handouts. They found it easier to take notes if they could write on a transcript of the slides instead of trying to take down the slide material and add notes. This requires some pgfpages trickery to stick two slides and two blank pages on each physical page. This is carried out in the preamble-handout.tex file, where I followed a model from Guido Diepen.

Then the Makefile rule is simply

talk-handout.tex: talk.md preamble-handout.tex
	pandoc $< \
	    -t beamer \
	    --slide-level 1 \
	    -H preamble-handout.tex \
	    --latex-engine xelatex \
	    -V handout \
	    --filter overlay_filter \
	    -o $@

Elegant and streamlined

There you go: it’s all on github for you if you’d like to use it, improve it, complain about it, etc. I don’t pretend this is particularly easy to use [Edit, May 18, 2016: but my packaged up version, scuro, is a little easier]; synthesizing text, graphics, and automated calculations into something that is intelligible for people is a difficult task. Doing it programmatically allows you to avoid having to duplicate information by hand (including the information of layout): it helps you to be consistent, and not just aesthetically, but there’s just no getting around the complexity of the task. The “simple” alternative is also a simplification of something that may not be worth simplifying. But not everyone is as fussy as I am, or as willing to spend the morning before a talk frantically hunting Stack Overflow for ggplot tricks.


  1. Actually citations are a pain because biblatex and beamer have some quirky interactions wherever a given beamer frame yields multiple slides. You can’t rely on biblatex’s citation tracking to work as expected.

  2. Helvetica?? I hear you cry. Yes, well. This setting actually gets overwritten as long as we’re using TikZ graphics. But network visualizations can get pretty intense pretty quickly; my largest hairball made TeX freak out, so I reverted to dev="pdf" and Helvetica type there.

  3. My smugness is frequently deflated by the unreliability of iWork. When PDF to Keynote generates a Keynote file, it automatically launches Keynote and tries to open the file. The most recent Keynote (the desktop app, not the evil cloud version) tends to balk at this. Yet if you then re-open the same file (with shell open or Keynote’s Open command or whatever), it works fine. [Edit, May 18, 2016: but lately I’ve given up on Keynote and gone over to Présentation.]

  4. The only thing I don’t like about this output is that the white-on-black graphics that I put so much effort into for the slides carry over to these notes pages (and the audience handout). One would have to add an extra switch to alter plot_theme and generate a different markdown file to fix this. Maybe later.