LaTeX and My Book: Postscript

TeX, Fictions of Autonomy

So now that my book is in print (read it! buying links on the book page!), I wanted to indulge an urge to look back at the technical, or rather TeXnical, aspects of producing the book. I had the quixotic determination to write the book in LaTeX and to take advantage of what LaTeX can do right through production, and, through the hard work of various parties at Oxford U.P., I had the chance to do so.

This is going to be a nerdy post. You’ve been warned. I’ve also added an equally nerdy e-colophon page describing the technology I used to make the book.

It might seem that nothing could be simpler than producing a book from LaTeX:

> vim book.tex
> git commit -a -m "final revisions from proofreading"
> xelatex book.tex
> send_to_offset_printer book.pdf

Actually, it was a bit more complicated than that. In reality, as book historians and sociologists of publishing teach us, the process that takes a book from manuscript to circulating printed object involves a chain of human and technical mediators. Each link in the chain, nowadays, uses a digital object, but there is nothing like a common format that everyone involved in writing, editing, and producing the book can use. (It probably should be XML, and next time I am going to ask for a DTD from a publisher. And maybe by that time the LaTeX3 project, which promises all kinds of happy XML-LaTeX-PDF coexistence, will have come to fuller fruition.) Anyway, producing my book required moving back and forth among LaTeX source, PDF, and Word. The back-and-forth aspect required considerable human work, but I still managed to do a lot of things automatically.

The workhorse of my file converting was tex4ht. tex4ht offered two things I really needed: support for biblatex, and output to Open Document Format. I am very proud to report that every citation in my book was generated by the biblatex-chicago package from a big .bib database I compiled. biblatex and biblatex-chicago do a brilliant job of handling complex citations, and they provide both a rich data model for the bibliography and an equally rich suite of citation commands. (And they both have very thorough documentation. It is hard to overstate how much the completion of my TeXnical work has depended on the culture of Literate Programming in LaTeX.) This helped me correct bibliographic errors across the whole manuscript when I found them, and it helped me keep my “ibid”s in their place. I didn’t dispense with the Chicago Manual, but I did spend much less time fiddling with footnote formatting in the book than I had in the dissertation (where I wrote all the footnotes by hand). Take that, Chicago.

The tex4ht maintainers (principally CV Radhakrishnan and Karl Berry, I believe) have done an amazing job of tracking biblatex development, which, for reasons discussed here, is what makes conversion from LaTeX that uses biblatex possible. tex4ht outputs to .odt, which I then converted to .docx in NeoOffice.

For a complex manuscript, the tex4ht conversion requires some fine-tuning. First of all, though I normally use xelatex to process my LaTeX source, tex4ht’s xelatex support is fragile, and I didn’t use it. So the first task was to produce a parallel “plain” version of my book.tex for ordinary pdflatex. Fortunately, my actual document bodies didn’t use any xelatex-specific commands; I’d tried pretty hard to write “semantic” LaTeX and to flag any layout tweaks as such using comments. Thus I just needed a minimalist preamble to wrap around the actual chapter document bodies. Mine looked pretty much like this:

\documentclass[12pt]{book}

% We still need UTF-8 encoding support...

\usepackage[english]{babel} 
\usepackage[utf8]{inputenc} 
\usepackage{csquotes}
\usepackage[backend=biber]{biblatex-chicago}
\bibliography{master.bib}

% These are my own macros and utility commands. Nothing fancy, just
% substitution macros for a few book titles.

\input{macros.tex}

\setcounter{secnumdepth}{0} 
\setcounter{tocdepth}{1}

\pagestyle{empty}

\begin{document}

% ...

\input{chapter1.tex} 
\input{chapter2.tex} 
% ...

\end{document}

Then you can run

> htlatex book-plain.tex "xhtml,ooffice" "ooffice/! -cmozhtf" "-coo -cvalidate"

The resulting book-plain.odt file was pretty good but not perfect. Some hand-correcting was necessary. Fortunately, you can unzip the file (it is really a zip archive) and edit the generated Open Document XML directly or postprocess it with a Perl script. (I had some trouble with verse environments.) It is also not too hard to adjust tex4ht’s OpenOffice output; if you have an ooffice.4ht file in the directory where you run htlatex, then tex4ht will use that in place of /usr/local/texlive/2012/texmf-dist/tex/generic/tex4ht/ooffice.4ht. (Your tex/generic/tex4ht/ directory may well live in a slightly different place, depending on your distro.) ooffice.4ht hard-codes some…interesting…font and page layout choices, which you can change once and for all. But some changes by hand in NeoOffice are probably inevitable for a big document. tex4ht is consistent about converting LaTeX environments into ODT styles, so you can also do a lot by modifying the document style sheet in NeoOffice. If you hit problems, it helps to try the development version of tex4ht, which is available from its Subversion repository.
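For the curious, here is a minimal sketch of that unzip-edit-rezip step. It is in Python rather than the Perl I actually used, and the function name patch_odt and the transform callback are my own inventions for illustration, not anything tex4ht provides. The one detail worth knowing is that an Open Document archive should have its mimetype entry first and stored uncompressed, so the sketch preserves that.

```python
import zipfile


def patch_odt(src_path, dst_path, transform):
    """Copy an ODT archive, passing content.xml through transform (str -> str)."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        names = src.namelist()
        # ODF wants "mimetype" as the first entry, stored without compression.
        if "mimetype" in names:
            names.remove("mimetype")
            names.insert(0, "mimetype")
        for name in names:
            data = src.read(name)
            if name == "content.xml":
                # The document body lives here; everything else is copied as-is.
                data = transform(data.decode("utf-8")).encode("utf-8")
            if name == "mimetype":
                compression = zipfile.ZIP_STORED
            else:
                compression = zipfile.ZIP_DEFLATED
            dst.writestr(name, data, compress_type=compression)
```

You would call it with whatever fix-up you need, for example patch_odt("book-plain.odt", "book-fixed.odt", fix_verse), where fix_verse is a hypothetical function of your own that takes the XML as a string and returns the corrected string.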

I never attempted the reverse conversion from an edited Word file to patches on my LaTeX source. I simply entered the editors’ changes into the source by hand. It’s not hard to imagine an intermediate markup form (XML again) that would have made sharing changes between me and the editors automatable, but that was beyond my capacity to implement on my own. In utopia, editors will have the capacity to send authors changes in a form suitable for automatic processing.

Anyway, eventually I needed a mechanism for keeping track of differences between the “development” version of the MS and the versions being shared with editorial or production people. I found svn branching too cumbersome, so I did my “releasing” with (1) a shell script that produced a tarball of all the files needed to typeset my LaTeX on a fresh TeX Live installation and (2) a Makefile target for making the ODT version and tweaking it with scripts. Now that I’ve switched from svn to git, branching and merging are more straightforward, and I do the “development/release” switch on my new writing using git.
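The tarball half of that release step is easy to sketch. What follows is not my original shell script but a rough Python equivalent, under some loud assumptions: it only follows \input and \include commands, it resolves them relative to the directory of the main file, and anything else (the .bib file, images, fonts) has to be listed by hand as extras. The names collect_sources and make_release are hypothetical.

```python
import re
import tarfile
from pathlib import Path

# Matches \input{name} and \include{name}; a sketch, not a full TeX parser.
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def strip_comment(line):
    # Drop everything from the first unescaped % to the end of the line.
    return re.sub(r"(?<!\\)%.*", "", line)


def collect_sources(main_file):
    """Return the main file plus every .tex file it pulls in, sorted."""
    main = Path(main_file).resolve()
    base = main.parent
    pending, found = [main], []
    while pending:
        path = pending.pop()
        if path in found or not path.is_file():
            continue
        found.append(path)
        for line in path.read_text(encoding="utf-8").splitlines():
            for name in INPUT_RE.findall(strip_comment(line)):
                child = base / name
                if child.suffix != ".tex":
                    child = child.with_name(child.name + ".tex")
                pending.append(child)
    return sorted(found)


def make_release(main_file, extras, tarball):
    """Write a gzipped tarball of the sources and return the archived names."""
    base = Path(main_file).resolve().parent
    files = collect_sources(main_file) + [base / extra for extra in extras]
    names = [path.relative_to(base).as_posix() for path in files]
    with tarfile.open(tarball, "w:gz") as tar:
        for path, name in zip(files, names):
            tar.add(path, arcname=name)
    return names
```

Something like make_release("book.tex", ["master.bib"], "release.tar.gz") would then bundle the chapters, the macros file, and the bibliography database. Commented-out \input lines are skipped, which matters if, like me, you leave abandoned chapters lying around in the source.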

What was it all for?

Many of the benefits of using TeX for the book came during the writing process. I found writing LaTeX source in vim stimulating; on the other hand, I sometimes found it an all-too-entertaining source of procrastination (debugging minor glitches, customizing the editor, looking up TeX features). But all in all I think it was an intellectually useful process. It gave me a certain distance from text I’d written, allowing me to mess around with it. I got more disciplined about keeping my bibliographic notes together. And I derived a great deal of use from having a (more or less) proof-quality PDF to read myself and to share with other readers from the beginning. In order to make the book a book, it helped me to be able to see my writing in book-like visual form. And perhaps it helped other readers, readers with a say in the matter, to see it as a book too.