Andrew Goldstone

I am an Assistant Professor in the Department of English at Rutgers University, New Brunswick. I study and teach twentieth-century literature in English. My research interests include modernism and non-modernism in English and French, the sociology of literature, literary theory, the history of genre fiction, South Asian literature in English, and the digital humanities, especially computational text analysis. I also have a long-standing interest in digital systems for document preparation and typesetting, especially LaTeX.

My book, Fictions of Autonomy: Modernism from Wilde to de Man (2013), is published by Oxford University Press. For more information about the book and ordering links, see the book’s webpage.

At MSA This Week

I’ll be at the Modernist Studies Association conference in Boston this weekend. I’m looking forward to a lively discussion of modernist disciplinarity in the seminar Jonathan Goodwin and I have organized.

On Sunday 11/22 I’m on a roundtable, “The New Institutionalism in Modernist Studies,” organized and chaired by Robert Higney (CCNY/CUNY), with Merve Emre (American Academy/McGill), Lisi Schoenbach (University of Tennessee), and Lisa Siraganian (SMU).

H1. The number of definitions of “institutionalism” is at least as large as the number of scholars asked to define it.

H2. The amount of monomaniacal insistence on the importance of genre fiction increases linearly (at least) with the amount of time you let Goldstone speak.

It’s at 10:30 a.m. in St. George A. Since we’re a five-cornered roundtable, there should be plenty of time for wider discussion, especially if the chair draws the correct conclusion from H2. Should be fun.

At NYU this Thursday

I’m speaking at NYU this Thursday afternoon, at the invitation of the NYU Digital Experiments group:

Corpus or Field? A Challenge for Quantitative Methods

One of the most promising prospects for quantitative methods in literary studies is that of rigorous and empirically wide-ranging accounts of the relations between literature and society. Yet the boundary between textual interpretation and a sociological analysis of literature has proven surprisingly hard to cross. In this talk, I retrace some sociological traditions of quantitative textual study, from postwar content analyses of political opinion to contemporary field theory, and I argue that they offer literary scholars alternatives to the doxa of “reading” that dominates and limits methodological discussion in our discipline. The sociological traditions turn us from corpus to field, from text collections to social spaces of symbolic competition and collaboration. I will discuss (and exemplify) the many challenges and pitfalls of this shift, technical and conceptual, in my own attempts to quantify the changing status of “reading” in the history of literary scholarship.

If you’re around, the talk is in Bobst Library (2nd fl., Avery Fisher Center, East Room) at 4:30. Come accuse me of naïve positivism, I dare you.

What Might Have Been

Pierre Bourdieu, pioneer of quantitative literary analysis:

Pour vérifier la correspondance entre l’espace des positions et l’espace des prises de position, nous avons recensé 537 textes de 510 auteurs publiés par les éditeurs retenus dans notre étude, qui ont été traduits en français entre juillet 1995 et juillet 1996 et retenu, pour chacun des titres, les variables suivantes : genre (roman, nouvelle, récit, conte), éditeur d’origine et d’arrivée, langue d’origine (pour l’anglais on a distingué entre «anglais» et «américain»), nom du traducteur, nom et sexe de l’auteur, année de parution de l’édition originale, de la traduction française (1995 ou 1996), jugements de la critique, prix, nombre de pages, nombre total d’auteurs étrangers publiés par l’éditeur concerné, nombre d’auteurs ayant la même langue d’origine nationale. L’immensité des recherches nécessaires pour le mener à bien nous a conduits à abandonner ce projet.

In order to verify the correspondence between the space of positions and the space of position-takings, we took a census [survey?] of 537 texts by 510 authors translated into French between July 1995 and July 1996 which were published by the publishers in our study, considering the following variables: genre (roman, nouvelle, récit, conte [novel, short story, (non-fiction) narrative, (fantastic) tale]), original publisher and French publisher, original language (for English we distinguished between “English” and “American”), name of translator, name and sex of the author, year of publication of the original edition, of the French translation (1995 or 1996), critics’ judgments, prizes, number of pages, total number of foreign authors published by the publisher in question, number of authors having the same original national language. The immensity of the research necessary to carry this project off led us to abandon it.

(“Une révolution conservatrice dans l’édition,” Actes de la recherche en sciences sociales 126, no. 1 [1999]: 3–28; here 18n31.)

Nothing beside remains….

(thanks to @rania_tn on twitter for help translating genre terms; I alone am responsible for errors in translation above)

dfrtopics, hold the dfr

It’s gratifying, and a little frightening, when someone else uses your own code. Jonathan Goodwin has built on my dfrtopics and dfr-browser code-blobs to produce a fascinating visualization of topics in fiction, 1920–1922, derived by modeling the genre-specific word frequencies data set from HathiTrust. He’s given a nice description of his process as well. Along the way, Jonathan revealed some unnecessarily restrictive assumptions built into my code. He solved the problem by modifying the code himself: all praise to him! But then I felt bad and wanted to make it possible for others to go further without having to dig into the Area X that is my code. So I made a few adjustments to my new version of dfrtopics. Here are some notes on using the updated version to cope with the issues Jonathan found in processing and modeling the Hathi data.
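
As a rough sketch of the first hurdle, here is one way of getting per-volume wordcount files into the long one-row-per-word-per-document data frame the package works with. This is not Jonathan’s actual code, and the file layout it assumes (a tab-separated word and count on each line, one file per volume, with the filename doing duty as the volume id) is for illustration only; check the Hathi documentation for the real format.

    # read per-volume wordcount files into one long data frame of
    # document id, word, and count (base R only)
    read_hathi_counts <- function(files) {
        volumes <- lapply(files, function(f) {
            counts <- read.delim(f, header = FALSE, quote = "",
                                 col.names = c("word", "weight"),
                                 stringsAsFactors = FALSE)
            # use the filename (minus its extension) as the volume id
            counts$id <- sub("\\.[^.]*$", "", basename(f))
            counts[, c("id", "word", "weight")]
        })
        do.call(rbind, volumes)
    }

    counts <- read_hathi_counts(
        list.files("hathi_wordcounts", full.names = TRUE))

From there, the resulting counts should be able to go through the same steps as wordcounts from DfR.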

Topic modeling: a software update

I have spent a lot of time experimenting with and exploring topic models of text. Aside from an article, some blog posts, and a bunch of strongly held opinions, that time also produced quite a few lines of computer code for handling topic models from MALLET. I started out with a big file of R functions, then escalated to a folder full of R functions. The organization got ever more byzantine, even more so when I collaborated with others. Finally I bit the bullet and, following the gospel of Wickham, converted my pile o’ scripts into an R package, called dfrtopics because I was making models of data from JSTOR’s Data for Research. There it has sat, on github, accumulating bits now and then, plus some function documentation written in a fit of compulsion, but really not in a form that anyone but me could use (and one that, as time went on, became hard for me to use too). My website has had a note promising a tutorial demonstration of how to use my package for nigh-on two years, but no demonstration demonstrated itself.

The package was hard to use and document because of the messy and ad hoc way I represented pieces of the topic model. A hierarchical model is not easy to wrap your mind around, and answering different questions requires slicing the model in different ways. And all the mess of code passing around random collections of data frames, lists, and who knows what else looked like fertile ground for errors and glitches, even when the whole thing seemed more or less to do what I wanted most of the time.

So: in a questionable expenditure of energy, I’ve spent a few days applying some polish to the package. Herewith dfrtopics version 0.2. Install it from github with devtools::install_github("agoldst/dfrtopics").
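
In slightly more copy-and-paste-able form (this assumes you already have devtools installed, along with a working rJava setup for the mallet package):

    # install the development version from github and load it
    devtools::install_github("agoldst/dfrtopics")
    library(dfrtopics)
    # list the package's vignettes, including the tutorial mentioned below
    browseVignettes("dfrtopics")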

Three things are new from a potential user’s perspective—and my hope is that the idea of a potential user is slightly less far-fetched than before. First, there is now an introductory tutorial in the form of a package vignette. Second, the whole package has been rewritten around the idea that a topic model is an object, stored in a single R variable. Third, I have tried to make everything as modular as possible, so that the usefulness of the package is not restricted to MALLET models of DfR data. If you have other textual data in wordcount form—oh, let’s say, 180,000 18th-through-early-20th-century volumes—you might write up some variant file-loading code if necessary, then use the package functions to model those texts with MALLET. And if you have wordcount data that you want to analyze in R in some other way than with MALLET, there are still, I hope, some useful things here for converting those wordcounts into data frames or term-document matrices.
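
For a taste of what that looks like, here is a compressed sketch of the path from a long wordcounts data frame (columns id, word, weight, whether from DfR files or from your own loading code) to a trained model. Take the function names and arguments as approximate rather than authoritative; the vignette and the function documentation are the places to check the details.

    library(dfrtopics)

    # counts: long data frame of wordcounts (id, word, weight)
    texts <- wordcounts_texts(counts)               # one bag of words per document
    insts <- make_instances(texts, "stoplist.txt")  # a MALLET InstanceList; use a stoplist of your choosing
    m <- train_model(insts, n_topics = 50, n_iters = 500)

    # the model object can then be sliced into ordinary data structures
    topic_words(m)   # topic-word weights
    doc_topics(m)    # document-topic weights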