Markdown and biblatex to HTML: some refinements

TeX, TeX4ht, kludgetastic

And if that title doesn’t grab you, I don’t know what will.

After some hacking I improved my earlier tangled pandoc-tex4ht-rinse-repeat process for converting a mostly-Markdown-and-some-LaTeX syllabus with biblatex citations into HTML by taking advantage of pandoc’s scriptability. This required some minor Haskell flailing, which is fun in its way.[1]

Anyway, I had struggled with tex4ht’s eccentric conversion of a biblatex bibliography into a definition list with empty <dt> tags. A short Haskell script (which I’ve uploaded as a GitHub gist) deals with this issue.

I compiled this as a standalone program html_clean, so that I can run:

pdflatex syllabus-web.tex
biber syllabus-web
pdflatex syllabus-web.tex
htlatex syllabus-web.tex syllabus.cfg " -cunihtf -utf8" "-cvalidate"
html_clean < syllabus-web.html > syllabus.html

The details of what syllabus-web.tex consists of are in the earlier post. It’s still pretty kludgy: I didn’t figure out how to stop tex4ht from garbling \begin{enumerate}[1.], and so I am still stripping that out with sed in order to produce the source files that syllabus-web.tex includes. But maybe the Haskell code will be a useful starting point for others working with pandoc on similar tasks. Since I can now actually co-generate a syllabus PDF and a website automatically from the same source, I’m content for the moment, until I need “fun” again.
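For the curious, the sed step mentioned above might look roughly like the sketch below. This is not my exact command: the function name strip_enumerate_label is made up for illustration, and I am assuming that the only thing that needs to go is the optional bracketed label argument right after \begin{enumerate}.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch): remove the optional
# bracketed label argument, e.g. [1.], that follows \begin{enumerate},
# leaving every other line untouched. Reads stdin, writes stdout.
strip_enumerate_label() {
  sed 's/\\begin{enumerate}\[[^]]*]/\\begin{enumerate}/g'
}

# Quick check on a single line of input:
printf '%s\n' '\begin{enumerate}[1.]' | strip_enumerate_label
```

In a real run you would pipe each pandoc-generated LaTeX fragment through the function and redirect the result to whatever file syllabus-web.tex pulls in.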


  1. To be precise, whereas programming normally feels like playing with Legos, programming in Haskell feels more like trying to do a math problem set, with ghc in the role of problem-set grader. So: “fun” for certain values of “fun.” Note that MacFarlane’s pandoc scripting documentation includes—I am not joking—exercises.