Producing Digital Documents

I have used LaTeX for many years to typeset all of my writing. It can be used to produce aesthetically superior documents, and, in conjunction with other tools, it can do a great deal to help manage all the information of scholarship. Microsoft Word still does not do much to help with the tasks of scholarly writing, and it is all too good at turning out ugly pages of type. It’s possible to do much better. What is required is to use a system that separates the task of composing text from the task of typesetting the (digital) printed page.

That is just what LaTeX is, and for a long time I did most of my writing directly in LaTeX. My book was produced that way. Over the last couple of years I have switched to composing largely in markdown, which is similar in inspiration, but more economical to type and also useful for writing for the web (I use it on this site, for example). This markdown can be automatically converted to LaTeX before being typeset into a “printed” PDF. I still make heavy use of LaTeX to adjust the appearance of what I produce.

Here are the pieces of the setup I use, with some notes on resources for learning to use them.

Text editing

Composing markdown or LaTeX requires an editor for plain text, also known as a programmer’s text editor. These editors do not show you the printed page as it will finally appear, but offer many useful features for composing and transforming text. Writing plain text has a further benefit: it lets you take advantage of version control. I use git to track the changes in all my writing, and I have even used it in a collaborative writing project.

I also enjoy having all the features of a good programmer’s text editor. I use MacVim, which I find a more fluent way to edit text than ordinary word processing. Others use the formidable emacs. There are good free text editors that are more straightforward to use than either of these: TextWrangler on Mac and Notepad++ on Windows. Both vim and emacs have excellent documentation but take practice.

As for git, there is a reasonable online introductory tutorial, try-git, and an excellent free book, Scott Chacon’s Pro Git.

Typesetting

For me, the original reason to move to LaTeX was typography. I wanted to stop making ugly texts. The computer should allow an ordinary writer to produce a polished typeset page, but Word makes this extremely difficult to achieve. By contrast, LaTeX uses plain text source to create a document with many typographic niceties, including good hyphenation, justification, footnote placement, and pagination. Once you’ve tried it, you’ll never go back to Word.

For a starter lesson on becoming a typographical fussbudget, you can do worse than Matthew Butterick’s Typography in Ten Minutes. At a more advanced level of finickiness, I enjoy Robert Bringhurst’s Elements of Typographic Style.

In order to produce the finished document, the markdown/LaTeX source must be processed into a PDF file. I use xelatex, which is included in standard TeX distributions, to typeset LaTeX into PDF; xelatex runs LaTeX on XeTeX, a variant of TeX with superior capacities for multilingual text in Unicode and for good-quality fonts via OpenType. I use pandoc to convert markdown to LaTeX before typesetting to PDF with xelatex. In fact, pandoc can automate this two-step process, hiding the intermediate LaTeX step.
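The two routes described above can be sketched as command lines. This is only a sketch under stated assumptions, not my exact setup: it assumes Python 3.8 or later, a pandoc recent enough to accept --pdf-engine (older releases spelled the option --latex-engine), and placeholder file names (essay.md, essay.tex, essay.pdf). The functions merely build the commands; nothing is executed unless you hand a command to subprocess.run yourself.

```python
import shlex
from typing import List


def one_step(src: str, pdf: str) -> List[str]:
    """pandoc calls xelatex itself, so the intermediate LaTeX stays hidden."""
    return ["pandoc", src, "--pdf-engine=xelatex", "-o", pdf]


def two_step(src: str, tex: str) -> List[List[str]]:
    """Write a standalone LaTeX file first, then typeset it with xelatex."""
    return [
        ["pandoc", src, "--standalone", "-o", tex],
        ["xelatex", tex],
    ]


if __name__ == "__main__":
    # File names are hypothetical placeholders.
    print(shlex.join(one_step("essay.md", "essay.pdf")))
    for cmd in two_step("essay.md", "essay.tex"):
        print(shlex.join(cmd))
```

The two-step route is the one to reach for when you want to inspect or hand-edit the generated LaTeX before typesetting it.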

Pandoc’s powers make it seem as though writing purely in markdown should be possible, avoiding LaTeX’s fiddliness. You can get quite far this way, but in order to take control of the typesetting process, I end up using LaTeX a lot, both within my markdown documents and in pandoc templates.

Markdown is straightforward to learn. I provide some notes on my markdown page. The pandoc documentation explains the ins and outs thoroughly. LaTeX can be much more complex, though I do not think it is difficult once you adjust to the mindset of marking up text for typesetting. There are many resources on the website of the TeX Users Group, as well as links to download the software itself; Mac users should use the MacTeX distribution. Oetiker’s Not-So-Short Introduction to LaTeX2e is a good starting point for learning TeX. The TeX Stack Exchange is now the first place to search for answers to specific questions, and to pose them to an occasionally testy though usually helpful community. The venerable TeX FAQ is good too.

In 2013 I conducted an introductory workshop about markdown, LaTeX, and HTML for the Rutgers Digital Humanities Initiative: these are my notes from the session.

Bibliography

LaTeX has an extraordinarily sophisticated citation-generation system, biblatex. I store all my citations in a database, which biblatex then uses to generate citations in my documents. In particular, biblatex-chicago is a beautifully executed implementation of the Chicago style; I used it to generate the citations in my book. The biblatex bibliographic database format is also plain text, but here I do use a graphical program, the excellent BibDesk, to manage it. For gathering citations from the web I use Zotero, which can be more or less integrated with bibtex using zot2bib and Better BibTeX. I say “more or less,” because some hand-editing of material exported from Zotero to BibDesk is usually required.

Pandoc also supports its own citation-generation system which uses the Citation Style Language. Unfortunately this system is not flexible enough for citation in literary studies or history. I write biblatex commands directly into my markdown.

biblatex and biblatex-chicago both come with very extensive documentation, but one can get started simply by using the BibDesk graphical interface to enter information and taking advantage of Pandoc’s ability to generate biblatex citation commands (command-line option --biblatex).
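As a rough illustration of the --biblatex route, the sketch below builds the commands involved. It rests on several assumptions that are not part of my description above: Python 3, placeholder file names (essay.md, refs.bib, essay.tex), a pandoc template that passes the --bibliography file on to biblatex, and biber as the biblatex backend. Nothing is executed; the functions only return the argument lists.

```python
from typing import List


def to_latex(src: str, bib: str, tex: str) -> List[str]:
    """Convert markdown to LaTeX, leaving citations as biblatex commands."""
    return [
        "pandoc", src,
        "--standalone",
        "--biblatex",               # emit biblatex citation commands
        f"--bibliography={bib}",    # the plain-text citation database
        "-o", tex,
    ]


def typeset(stem: str) -> List[List[str]]:
    """Typical run order when biber is the backend (an assumption here)."""
    return [
        ["xelatex", stem],  # first pass records which citations are used
        ["biber", stem],    # biber resolves them against the database
        ["xelatex", stem],  # later passes pull in citations and settle references
        ["xelatex", stem],
    ]
```

With this arrangement a pandoc-style citation in the markdown is written out as a biblatex citation command in the LaTeX file, and biblatex commands typed directly into the markdown pass through to LaTeX as well.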

Computer code and outputs

Once I started making tables and graphs in my research, I needed a way to incorporate them into my writing. I use R markdown for this task. The remarkable knitr package for R takes a markdown document with R code mixed in and produces a new document that includes the results of that code. Thus it is not necessary to copy-and-paste tables and graphs into a document, and it is much easier to ensure that the document is up-to-date with the latest version of your analysis.

I use RStudio for interacting with R, though I write most of my code in vim.

Slides, too

Markdown lends itself very well to composing two other kinds of documents: slides and text-centric webpages. I use pandoc to translate markdown into LaTeX for slides using the beamer package. The output is again in PDF, which is a flexible, reliable option (avoiding the vicissitudes of either PowerPoint or the many HTML-based slide formats now in circulation). I used to use PDF to Keynote so that I could use Keynote as my presentation software, but I have recently replaced Keynote (which Apple seems determined to downgrade further with each new version) with Présentation, which has all the features I need for presenting from PDF.
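For the slides, the pandoc invocation differs from the one for ordinary documents mainly in naming beamer as the output format. A minimal sketch, assuming Python 3, a pandoc that accepts --pdf-engine, and placeholder file names; the slide_level parameter is my illustration of pandoc's --slide-level option, not a setting drawn from my own files:

```python
from typing import List, Optional


def beamer_command(src: str, pdf: str,
                   slide_level: Optional[int] = None) -> List[str]:
    """Build a pandoc command that turns markdown into beamer slides (PDF)."""
    cmd = ["pandoc", src, "-t", "beamer", "--pdf-engine=xelatex"]
    if slide_level is not None:
        # Headings at this level start a new slide; higher ones become sections.
        cmd.append(f"--slide-level={slide_level}")
    return cmd + ["-o", pdf]
```

Calling beamer_command("talk.md", "talk.pdf", slide_level=2) yields a command in which second-level headings begin slides and first-level headings divide the talk into sections.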

Managing the appearance of slides produced via the markdown-beamer combination is a bit tricky. I’ve collected everything needed for my current approach in a repository on github. With R markdown in the mix, things get trickier still; I have tried to simplify some of the process with a small R package that supplies an R markdown template (“Dark on Light Beamer Slides”).

Automation

All these interacting programs eventually come to need some management themselves. And with data analysis, it is usually best to automate the incorporation of computations into the final document. These programs are by and large Unix command-line programs, and some facility with the shell is necessary to benefit from their capacities, and especially to manage the files they generate. I use zsh, not very expertly. For installing command-line programs, Mac users have a truly outstanding package manager in homebrew. Finally, I am an inveterate user and abuser of GNU Make for automation.
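The core idea that makes Make useful here is simple: a target such as a PDF is rebuilt only when it is missing or older than one of the files it depends on. The sketch below restates that rule in Python purely for illustration. It is not how Make is implemented, and the function name and file names are hypothetical.

```python
import os
from typing import Iterable


def out_of_date(target: str, sources: Iterable[str]) -> bool:
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    # Any source modified after the target means the target is stale.
    return any(os.path.getmtime(src) > target_time for src in sources)


if __name__ == "__main__":
    # Hypothetical dependency: essay.pdf depends on essay.md and refs.bib.
    # This assumes both source files exist in the current directory.
    if out_of_date("essay.pdf", ["essay.md", "refs.bib"]):
        print("essay.pdf needs rebuilding")
```

A Makefile expresses the same dependency declaratively and runs the rebuilding command for you, which is why it scales to documents with many generated tables and figures.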

Of many possible tutorials and starting points on the Unix shell, here is one for historians by Ian Milligan and James Baker: Introduction to the Bash Command Line. Mike Bostock, the author of the ubiquitous web visualization library D3.js, has written a very nice introductory discussion of Make.

Further Adventures

I have written quite a few blog posts over the years about the ins and outs of using TeX and markdown for my scholarly writing. Some of these used to be on a separate wordpress blog, but I have moved all of those, together with new material, to this site. See all the blog posts filed under TeX.

I accumulate sample TeX, markdown, and R markdown files for others to use and modify on github.

The sociologist Kieran Healy has a rich collection of Resources related to plain-text typesetting, writing with code and data, and so on.