I am an Assistant Professor in the Department of English at Rutgers University, New Brunswick. I study and teach twentieth-century literature in English. My research interests include modernism, the sociology of literature, genre fiction, South Asian literature in English, and the digital humanities. My book, Fictions of Autonomy: Modernism from Wilde to de Man (2013), is published by Oxford University Press.
In my line of work—and in my social-media universe—I seem to see a lot of ranking of categories: most frequent words in a text corpus, most frequent genres in a bibliographic listing, most frequent nationalities in a group of authors… I’ve often wondered how much uncertainty lies behind such rankings but had no idea how to quantify it. I caught a link (via @coulmont) to a short essay on the subject: Michael Höhle, “Rank Uncertainty: Why the ‘Most Popular’ Baby Names Might Not Be the Most Popular” (available in OA preprint), the basics of which I was able to follow, thanks to its clear exposition and accompanying R code.

Höhle explains that even when one has a full census of the names of all the babies born in a year, there is still uncertainty associated with ranking the names: the true propensity in the population to give a particular name is not fully revealed by the actual total. Höhle suggests we can begin by modeling the uncertainty arising from the fact that (1) even if the birth rate is fixed, there are “chance” variations in the number of births and (2) even if the probabilities of names are fixed, each draw from the bag of names has some randomness (little Jacob might well have been William). The rank uncertainty can then be found by simulating a year’s worth of names many times and finding popularity ranks in each simulation.
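Höhle’s own code is in R; as a rough illustration of the simulation idea, here is a minimal sketch in Python instead, using made-up counts (not real registry data). It models the two sources of randomness above: noise in the total number of births, and multinomial draws of names with probabilities estimated from the observed counts.

```python
import random
from collections import Counter

# Hypothetical observed counts for illustration only -- not real data.
observed = {"Jacob": 1050, "William": 1030, "Noah": 1020,
            "Liam": 700, "Mason": 650}

def simulate_ranks(observed, n_sims=1000, seed=42):
    """Simulate many 'years' of births; record each name's rank per simulation."""
    rng = random.Random(seed)
    names = list(observed)
    total = sum(observed.values())
    weights = [observed[n] for n in names]  # estimated name propensities
    rank_samples = {n: [] for n in names}
    for _ in range(n_sims):
        # (1) chance variation in the number of births
        #     (normal approximation to Poisson noise, for simplicity)
        n_births = max(1, round(rng.gauss(total, total ** 0.5)))
        # (2) each birth is a random draw from the fixed name probabilities
        counts = Counter(rng.choices(names, weights=weights, k=n_births))
        ranked = sorted(names, key=lambda n: -counts[n])
        for rank, name in enumerate(ranked, start=1):
            rank_samples[name].append(rank)
    return rank_samples

ranks = simulate_ranks(observed)
for name in observed:
    print(f"{name}: simulated ranks range {min(ranks[name])}-{max(ranks[name])}")
```

With counts this close together, the nominal #2 name comes out on top in a sizable fraction of simulated years, which is exactly the point: the published ranking hides real uncertainty. The spread of each name’s simulated ranks gives an interval rather than a single rank.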
Descriptions of my Fall 2017 courses are now up on the teaching page. I am teaching a new graduate course on detective fiction as a case study in twentieth-century genre, and a new version of my undergraduate early twentieth-century fiction course. The syllabuses are still in the works, but in the meantime I’m always glad to hear from interested students.
I was invited by the new journal Cultural Analytics to respond to an essay by Sarah Allison, “Other People’s Data: Humanities Edition.” Sarah’s essay makes an important argument for what she calls “data recycling,” and she touches on my collaboration with Ted Underwood along the way. My comment expands on the conditions for effective data reuse. I will be happy if the conversation does not get bogged down in the technicalities of “reproducibility,” and we instead insist on asking how we carry on cumulative research programs. My short essay is open access: From Reproducible to Productive.
Two years ago, I designed and taught a graduate seminar on approaches to Literary Data. I was invited to contribute an essay on the course to Debates in the Digital Humanities. The essay was some time in the writing, and it will be some time longer in the publishing: the next DDH volume is now due to appear at the start of 2018. But by kind permission of the editors, I am able to share a preprint of my essay: Teaching Quantitative Methods: What Makes It Hard (in Literary Studies). If you quote it, please cite its forthcoming publication.
The essay explains the rationale of the course, which combined a practicum in computing with literary data, using the R language, with theories of literary data from structuralism to the present. My own evaluation of the course is quite mixed, and I offer my materials and my experience not as a model but as evidence for an argument about the conditions of possibility for a successful quantitative methods pedagogy in literary studies. Pedagogy, in this case, also raises serious questions for research; and I hint at what I take to be the conditions for fruitful quantitative methodology tout court.
I couldn’t have wished for better students—that condition of possibility is indeed already realized. The major lessons I draw are (this is from the essay):
Cultivating technical facility with computer tools—including programming languages—should receive less attention than methodologies for analyzing quantitative or aggregative evidence. Despite the widespread DH interest in the former, it has little scholarly use without the latter.
Studying method requires pedagogically suitable material for study, but good teaching datasets do not exist. It will require communal effort to create them on the basis of existing research.
Following the “theory” model, DH has typically been inserted into curricula as a single-semester course. Yet as a training in method, the analysis of aggregate data will undoubtedly require more time, and a different rationale, than that offered by what Gerald Graff calls “the field-coverage principle” in the curriculum.
Some more remarks on the essay and the course follow after the jump.
Descriptions of my Spring 2017 courses are now up on the teaching page. I am offering a new course geared to non-majors, Introduction to Twentieth-Century Literature, which will home in on a few chosen works of fiction, poetry, and drama from across the century and across the English-speaking world. I am also teaching Principles of Literary Study: Fiction, as I have in past years; for the first time I am teaching an Honors section. When I was a first-year in college I took a Shakespeare course that had honors sections for English majors; as a non-major I could only look on in envy at all the cool extra stuff they got to do. Well, my 359:202:H1 is open to majors and non-majors, and I will be aiming to provide cool extra stuff for all. I will add links to reading lists and syllabuses as soon as they are ready. I’m always happy to hear from students who are interested in either course.