Workshop notes: Digital text

DH, events

Over on dh.rutgers.edu, I’ve posted extensive notes from last week’s Empowerment Part II workshop on plain text formats like markdown, HTML, and LaTeX. I’ve also made available the set of sample files I used in the workshop, plus a few more. These may be useful to anyone who wants to learn any of those three languages or to see what files converted from one to another look like. I’ve put some time into making the workshop notes into something a reader could work through on their own; they also say more about the files. For anyone interested in LaTeX, the template.tex file in that set of sample files supplies a starter template for a LaTeX document (it’s derived from the templates in my git repository of tex files). As a bonus, I have also made available the markdown source used to generate both the workshop-notes webpage and the slideshow for the in-person version.[1]

You might reasonably ask: Why spend any time on this? Computer typesetting and the ins and outs of plain-text composition gadgets are very far from the center of the digital humanities. HTML is an important computer language in DH, but LaTeX is no such thing, and markdown is pretty rudimentary-seeming. Some thoughts on this over the fold…

One reason is frankly aestheticist: master these technologies, and you will be able to present your text more beautifully on the screen or the page. As the complement to this aestheticism, I think the alienating experience of composing in plain text or markup has a useful estrangement effect, and can change your relation to your own writing in interesting ways.

Another reason is pragmatic: composing on the web is of course crucial for digital scholarship, and anyone who takes part in it needs to know some markup.[2] And markdown is currently having a bit of a moment, partly because it is genuinely handy. You can now use it as an input format on wordpress.com; it is the format of choice for GitHub project documentation and GitHub Pages; and, in a sign of the times, R markdown is strongly promoted by the RStudio tool. As for LaTeX, its practical benefits as a composition medium for humanists lie above all in its multilingual capacities, its unequalled citation-generation packages, and its amenability to version control.

But the most important rationale is different. Every humanist writes extensively, and most humanists both read and write extensively in the digital medium. My tutorial skims the surface of three computer languages in order to emphasize fundamental concepts about digital text:

  1. Digital text always involves multiple layers of convention, from character encoding schemes to structural markup.
  2. All formats can be subject to automatic processing and conversion into other formats. One source format can yield multiple, quite distinct destination formats.
  3. Plain-text formats are readable in themselves.

This holds true not just for markup but for programming languages as well: they too are conventions for expression in plain text, which, if followed, also make your text amenable to automatic processing.[3] Thus, even making a start on languages like markdown and LaTeX gives you a different sense of what the machine can do and how you interact with it. The “empowerment” you achieve in becoming proficient in a family of plain text formats, or moving some of your composition outside of Word, is naturally quite modest. Still, this proficiency enriches your digital literacy. It gives you a wider range of expressive choices in the digital medium; it allows you to think about multiple ways to circulate digital texts; and it demystifies some of the operations of the computer by exposing some of the layers of mediation in computer texts.
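As a minimal sketch (not taken from the workshop files), the three concepts listed above can be seen in a few lines of Python. The function names are hypothetical, and the regular expression handles only single-asterisk emphasis; a real converter does far more. The point is only that one readable plain-text source sits on top of an encoding layer and can be processed automatically into two quite different destination formats.

```python
import re

# A plain-text source using the markdown convention for emphasis.
text = "café *au lait*"

# Layer 1: character encoding. The same characters become different
# byte sequences under different encoding conventions.
utf8_bytes = text.encode("utf-8")      # "é" takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # "é" takes one byte in Latin-1

# Layer 2: structural markup. A toy rule (an assumption for illustration)
# that matches *emphasis* spans containing no further asterisks.
EMPHASIS = re.compile(r"\*([^*]+)\*")


def emphasis_to_html(source: str) -> str:
    """Convert *emphasis* spans to HTML <em> elements."""
    return EMPHASIS.sub(r"<em>\1</em>", source)


def emphasis_to_latex(source: str) -> str:
    """Convert *emphasis* spans to LaTeX \\emph{} commands."""
    return EMPHASIS.sub(r"\\emph{\1}", source)


if __name__ == "__main__":
    print(len(text), len(utf8_bytes), len(latin1_bytes))
    print(emphasis_to_html(text))
    print(emphasis_to_latex(text))
```

The source string stays legible on its own, asterisks and all, which is the third concept: nothing has to be rendered before a person can read it.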


  1. The source is a GitHub gist. Annoyingly, GitHub automatically renders gists written in markdown, so there’s no way to use a gist to easily show markdown as source code. You have to click through to the “raw” file (or link directly to the raw file at a specified commit).
  2. The importance of TEI-XML to text digitization work means that HTML-like markup languages have an additional significance for DH.
  3. What I like in the idea of literate programming is its emphasis on expression and communication: “The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer” (Knuth).