New features for dfr-browser/dfrtopics

kludgetastic, dh

For a troublingly long while I have been almost finished with an update to my software for exploring topic models, which consists of an R package, dfrtopics, and a JavaScript thingy, dfr-browser. This update is focused on making it easier to look at a single topic model more than one way or to toggle between topic models of identical or related corpora. Experience suggests that producing and interpreting even one useful topic model of a corpus is so much work that many people, myself included, are tempted to stop short of comparing the outputs of multiple models, even though this is an important way of checking that patterns revealed by any model are not purely artifacts.

The big software update is in dfr-browser, and the documentation on the github project homepage has a new section on setting up a browser visualization of multiple models. But this configuration has enough fiddly bits to it that anyone with the rudiments of R may prefer trying it all out through dfrtopics. The key function is dfr_browser(), which can be invoked on a list of models and/or with more than one metadata variable as the “conditioning” variable (e.g. date of publication, publication venue, etc.). The process for creating and/or loading a model is unchanged and is detailed in the vignette.

Suppose one has several model objects, m1, m2, m3. Then

dfr_browser(list(m1, m2, m3))

will generate and open up a dfr-browser with a menu for swapping among the models.[1] In itself this is probably not much more useful than simply creating three separate dfr-browsers and looking at them in three browser windows, though it can be helpful to look at a particular document or word “view” (i.e. topic breakdown) in the browser and toggle among models.

Or suppose one wishes to be able to look at topics conditional on both publication year and journal of publication:

dfr_browser(m1, condition=c("pubdate", "journaltitle"))

Here the menu will offer the option of swapping between the two covariates (again, this is most revealing when examining a particular topic).
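To make “conditional on” a little more concrete, here is a toy sketch in base R of what it means to break topics down by a metadata variable. Everything in it is made up for illustration: the doc_topics matrix, the meta data frame, and the topic_shares helper are hypothetical and are not part of dfrtopics or dfr-browser, whose own calculations may differ in detail.

```r
# Toy document-topic weights: 4 documents (rows) by 2 topics (columns).
doc_topics <- matrix(
    c(8, 2,
      5, 5,
      1, 9,
      3, 7),
    ncol = 2, byrow = TRUE,
    dimnames = list(NULL, c("topic1", "topic2"))
)

# Hypothetical metadata; the column names mirror the example above.
meta <- data.frame(
    pubdate = c(1990, 1990, 1991, 1991),
    journaltitle = c("A", "B", "A", "B")
)

# Share of each topic within each level of one metadata variable
# (an illustrative helper, not a dfrtopics function).
topic_shares <- function(weights, by) {
    totals <- rowsum(weights, group = by)  # sum weights within each level
    totals / rowSums(totals)               # normalize each level to sum to 1
}

by_year <- topic_shares(doc_topics, meta$pubdate)
by_journal <- topic_shares(doc_topics, meta$journaltitle)

print(by_year)
print(by_journal)
```

The same document-topic weights give two different pictures depending on which covariate does the grouping, which is the kind of comparison the menu is meant to make quick.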

Finally, dfrtopics has a rudimentary way of “aligning” models (by matching topics whose heavily-weighted words are similar). Aligned models are particularly informative to look at together, since the visualizer can give some clues as to how “similar” apparently similar topics are. The process is as follows:

list(m1, m2, m3) %>%
    model_distances(n_words=40) %>% # try other values
    align_topics() %>%
    dfr_browser()

The “About” page of the resulting visualization includes a table showing how topics have been aligned.
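For readers who want an intuition for what matching topics by their heavily-weighted words might involve, here is a deliberately crude sketch in base R. It is not the method dfrtopics uses: the Jaccard distance, the greedy pairing, and the names model_a, model_b, jaccard, and align_greedy are all assumptions chosen for simplicity, and the package's own distance measure and alignment procedure may well differ.

```r
# Top words for each topic in two hypothetical models.
model_a <- list(
    c("poem", "verse", "poet", "line"),
    c("novel", "story", "character", "plot")
)
model_b <- list(
    c("novel", "fiction", "character", "story"),
    c("poet", "poem", "lyric", "verse")
)

# Jaccard distance: 1 minus (shared words / all words in either list).
jaccard <- function(x, y) {
    1 - length(intersect(x, y)) / length(union(x, y))
}

# Distance from every topic in model_a (rows) to every topic in model_b.
d <- outer(
    seq_along(model_a), seq_along(model_b),
    Vectorize(function(i, j) jaccard(model_a[[i]], model_b[[j]]))
)

# Greedy alignment: repeatedly pair the closest still-unmatched topics.
align_greedy <- function(d) {
    pairs <- NULL
    while (any(is.finite(d))) {
        idx <- which(d == min(d), arr.ind = TRUE)[1, ]
        r <- idx[["row"]]
        k <- idx[["col"]]
        pairs <- rbind(pairs, data.frame(a = r, b = k, distance = d[r, k]))
        d[r, ] <- Inf  # take both matched topics out of play
        d[, k] <- Inf
    }
    pairs
}

alignment <- align_greedy(d)
print(alignment)
```

Even in this toy version the matched topics are not identical: each pair shares only three of its four top words, which is the sort of partial similarity worth inspecting side by side in the browser.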

Six years on from the start of this project, the specialization to JSTOR articles looks a little silly. This all started with a specific analysis of a specific JSTOR corpus, but over time the principle of It Must Be Abstract took over. I myself have been most gratified when people have bravely tried to use the thing with models of other kinds of documents. The JSTOR-specialization is baked in here and there in this software (and it’s stuck in the “dfr” in the names), but only lightly; I have tried to make it usable for general purposes but left all the default behaviors to work easily with JSTOR DfR data. In a few places these defaults will trip you up if you try to use it with different kinds of metadata, and I’m afraid the documentation is imperfect, but I’ve given some attention to this in the updates to dfr-browser and dfrtopics, and hope to do more at some point.

I also plan to write up a tutorial-style demonstration of exploring multiple models soon (famous last words). But to prevent me from sitting on this “almost finished” stuff forever when it is possibly already usable, I’ve pushed the current versions of dfrtopics and dfr-browser to github. To try something like the above R examples, install the R package with devtools::install_github("agoldst/dfrtopics"). I’m certainly glad to hear from anyone who does, though at this late stage of human civilization I can’t promise any support or even bug fixes.

  1. “Opening up” now requires installing the servr package. dfr_browser can also be directed to export the necessary files to any location (dfr_browser(list(m1, m2), out_dir="mybrowser", browse=FALSE)) and you can run a web server from there yourself.