Producing Digital Documents

I have strong feelings about typography. I abominate and despise Microsoft Word, and I long ago developed a preference for systems that separate the task of composing text from the task of typesetting the page. I prepare my text using one program, then use others to produce the final document from that text. Such systems make better use of the capacities of the computer for managing all the information that goes into complex documents. The particular setup I use has changed somewhat over time, though the foundation has always been the TeX typesetting system. Somewhat more labor-intensive than ordinary word-processing—sometimes much more labor-intensive—using a system of this sort yields aesthetically superior results and a righteous feeling of self-satisfaction.

Here are some notes on what I’ve been using lately. It’s not a comprehensive guide. An eminently reasonable (and unself-satisfied) essay in that vein is the sociologist Kieran Healy’s Plain Person’s Guide to Plain Text Social Science, which I recommend.

Text editing

When I write, I write markdown, which is ordinary text plus a few special signs here and there. The extra signs replace the formatting commands one would use in Word. Originally conceived as a shorthand way of producing HTML for web pages, markdown can also be used to generate well-typeset PDFs thanks to pandoc. Pandoc can, among many other things, automatically convert markdown to PDFs; first it converts markdown to TeX, then it runs the TeX typesetting program to make a PDF. I have a few notes on markdown on this site. The pandoc documentation explains the ins and outs in great detail.
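To make the conversion concrete, here is a minimal Python sketch that assembles the pandoc command lines for both the intermediate TeX file and the finished PDF. The file name essay.md and the function names are hypothetical, chosen only for illustration; this is not a script I actually use. The sketch only builds and prints the commands, so it runs without pandoc installed.

```python
import shlex
from pathlib import Path


def pandoc_command(source: str, target: str) -> list[str]:
    """Build a pandoc invocation; pandoc infers formats from the file extensions."""
    # --standalone asks for a complete document rather than a fragment,
    # which matters when the target is a .tex file.
    return ["pandoc", "--standalone", source, "-o", target]


def conversion_commands(markdown_file: str) -> dict[str, list[str]]:
    """Return the command for the TeX file and the command for the PDF."""
    stem = Path(markdown_file).with_suffix("")
    return {
        "tex": pandoc_command(markdown_file, f"{stem}.tex"),
        "pdf": pandoc_command(markdown_file, f"{stem}.pdf"),
    }


for step, command in conversion_commands("essay.md").items():
    print(step, shlex.join(command))
```

To run either command for real, pass the list to subprocess.run(command, check=True). The PDF step needs a working TeX installation as well as pandoc.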

I edit text in MacVim, a Mac version of the venerable programmer’s text editor vim. That is to say that all the writing I do—of e-mails, markdown, LaTeX, HTML, R code for data analysis, R markdown for combined text and analysis—I do in MacVim. Others use the formidable emacs as an editor. Both vim and emacs have excellent, baggy-monster-sized documentation and are potentially (dangerously) infinitely customizable. There are good free text editors that are more straightforward to use than either: BBEdit (free version) on Mac and Notepad++ on Windows.

Working in plain text means I can use sophisticated version control software to keep track of my revisions. I use git to track the changes in all my writing, and I have even used it in a collaborative writing project. There is an excellent free book on git, Scott Chacon’s Pro Git.

Typesetting

As a book person and general fussbudget, I care about typographic niceties like hyphenation, justification, footnote placement, and pagination. TeX—created by the son of a printer—is far better at this than Word. If you wish to convince yourself such things matter, you can do worse than Matthew Butterick’s Typography in Ten Minutes. At a more advanced level of finickiness, I enjoy Robert Bringhurst’s Elements of Typographic Style.

Once you have become sufficiently finicky, you will want to know more about TeX in order to take control of the typesetting process more fully. There are many resources on the website of the TeX Users Group, as well as links to download the software itself; Mac users should use the MacTeX distribution. Tobias Oetiker’s Not-So-Short Introduction to LaTeX2e is a good starting point for learning TeX. (LaTeX is the most widely used variety of TeX and the one ordinarily used by pandoc in making PDFs from markdown.) The TeX Stack Exchange is the first place to search for answers to specific questions, and to pose them to an occasionally testy though usually helpful community. The venerable TeX FAQ is good too.

Bibliography

I use LaTeX’s extraordinarily sophisticated citation-generation system, biblatex. The biblatex-chicago package is a beautifully executed implementation of the Chicago style; I used it to generate the citations in my book and here, however many years on, it is still being kept up to date with new editions of the Chicago Manual. The biblatex bibliographic database format is also plain text, but here I do sometimes use a graphical program, the excellent BibDesk, to manage it. For gathering citations from the web I use Zotero. There are some extensions to Zotero which help in using Zotero and biblatex together: zot2bib or the much more elaborate Better BibTeX. (I also swear by ZotFile for corralling PDFs of articles.)

Pandoc also supports its own citation-generation system, which uses the Citation Style Language. Unfortunately this system is not really flexible enough for citation in literary studies or history except in the simplest cases.

biblatex and biblatex-chicago both come with very extensive documentation, but one can get started simply by using the BibDesk graphical interface to enter information and taking advantage of Pandoc’s ability to generate biblatex citation commands (command-line option --biblatex).
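A rough sketch of that starting point, again as Python that only assembles commands: with --biblatex, pandoc is meant to write a LaTeX file rather than a PDF, and the LaTeX toolchain (including biber) then produces the PDF. The file names are hypothetical, and latexmk is my assumption for the second step; any tool that runs LaTeX and biber in turn would serve. Selecting biblatex-chicago itself happens in the LaTeX preamble or template, which this sketch leaves out.

```python
import shlex


def biblatex_commands(markdown_file: str, bib_file: str, tex_file: str) -> list[list[str]]:
    """Two steps: pandoc writes LaTeX with biblatex citations, then LaTeX builds the PDF."""
    write_tex = [
        "pandoc",
        "--standalone",
        "--biblatex",                  # write biblatex citation commands into the LaTeX
        f"--bibliography={bib_file}",  # record the database in the document metadata
        markdown_file,
        "-o",
        tex_file,
    ]
    # Assumption: latexmk drives the LaTeX and biber runs.
    build_pdf = ["latexmk", "-pdf", tex_file]
    return [write_tex, build_pdf]


for command in biblatex_commands("chapter.md", "sources.bib", "chapter.tex"):
    print(shlex.join(command))
```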

Computer code and outputs

Once I started making tables and graphs in my research, I needed a way to incorporate them into my writing. I use R markdown for this task. The supporting R software packages take a markdown document with R code mixed in and produce a new document that includes the results of that code (numbers, tables, charts). Thus it is not necessary to copy-and-paste tables and graphs into a document, and it is much easier to ensure that the document is up-to-date with the latest version of your analysis.

I use RStudio for interacting with R, though I write most of my code in vim.

Slides, too

Markdown lends itself very well to composing two other kinds of documents: slides and text-centric webpages. I use pandoc to translate markdown into LaTeX for slides using the beamer package. The output is again in PDF, which is a flexible, reliable option (avoiding the vicissitudes of either PowerPoint or the many HTML-based slide formats now in circulation). To display slides for presentation, I use Présentation, which has excellent features for presenting from PDF files of slides. I still have no idea how to use PowerPoint, and I hope I will never learn.
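For the slides case, the pandoc invocation differs mainly in naming beamer as the output format. The following Python sketch builds such a command; talk.md and talk.pdf are hypothetical names, and the slide level of 2 is only an illustrative default, not a recommendation.

```python
import shlex


def beamer_command(markdown_file: str, pdf_file: str, slide_level: int = 2) -> list[str]:
    """Build a pandoc command that typesets markdown as beamer slides in a PDF."""
    return [
        "pandoc",
        "-t", "beamer",                   # write LaTeX that uses the beamer class
        f"--slide-level={slide_level}",   # headings at this level start a new slide
        markdown_file,
        "-o", pdf_file,
    ]


print(shlex.join(beamer_command("talk.md", "talk.pdf")))
```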

Managing the appearance of slides produced via the markdown-beamer combination is tricky. I’ve collected everything needed for my current approach in a repository on GitHub. With R markdown in the mix, things get trickier still; I have tried to simplify some of the process with a small R package that supplies an R markdown template (“Dark on Light Beamer Slides”).

Webpages

Markdown’s original purpose was to be a shorthand for HTML. It is still a good way to compose text-y webpages. Unfortunately, like everything to do with the Web, the landscape of available tools for turning markdown into websites is chaotic and eternally in flux. This website is generated from markdown source using Hugo, which has been around for a few years without turning into vapor.

Automation

All these interacting programs eventually come to need some management themselves. And with data analysis, it is usually best to automate the incorporation of computations into the final document. These programs are by and large Unix command-line programs, and some facility with the Unix shell is necessary to benefit from their capacities, and especially to manage the files they generate. I use zsh, not very expertly. For installing command-line programs, Mac users have a truly outstanding package manager in Homebrew. Finally, I am an inveterate user and abuser of GNU Make for automation.
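The core idea behind Make is simple enough to sketch in a few lines of Python: a target file is rebuilt only when it is missing or older than one of the files it depends on. This is an illustration of that rule under simplified assumptions, not a substitute for Make, and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target: Path, sources: list[Path]) -> bool:
    """Make's basic rule: rebuild when the target is missing or older than any source."""
    if not target.exists():
        return True
    target_time = target.stat().st_mtime
    return any(source.stat().st_mtime > target_time for source in sources)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "essay.md")
    target = Path(tmp, "essay.pdf")
    source.write_text("# Draft\n")
    print(needs_rebuild(target, [source]))  # True: the PDF does not exist yet

    target.write_bytes(b"")
    os.utime(source, (1000, 1000))  # set modification times explicitly
    os.utime(target, (2000, 2000))
    print(needs_rebuild(target, [source]))  # False: the PDF is newer than the source

    os.utime(source, (3000, 3000))  # simulate editing the markdown afterwards
    print(needs_rebuild(target, [source]))  # True: the source is now newer
```

A Makefile expresses the same relationship declaratively, pairing each target and its prerequisites with the command that rebuilds it.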

Of many possible tutorials and starting points on the Unix shell, here is one for historians by Ian Milligan and James Baker: Introduction to the Bash Command Line. Mike Bostock, the author of the ubiquitous web visualization library D3.js, has written a very nice introductory discussion of Make.

Last updated: September 28, 2021.