or, The yack of the hack of the yack
How should we go about mining the digital archive of the history of scholarship for theoretical resources? Let’s talk about text-mining journals, quantitatively analyzing metadata about scholarship, and living with closed access as theorists. And perhaps we can work on a dataset or two. I’ll bring some example data and laughably primitive visualizations!
The explanation
One of theory’s major tasks is to describe how scholarship is done—and then to prescribe how it should be done. Often the description leads to the prescription: theory as scholarship about scholarship. Well, yes. It is characteristic of a whole family of genres that belong to theory, from De la grammatologie to Orientalism to Ahmad’s In Theory, Laclau and Mouffe’s Hegemony and Socialist Strategy to Sheldon Pollock theorizing a "Political Philology" in a memorial essay about the scholarship of D.D. Kosambi.
Meanwhile, over in digital-land, one of the richest digital archives we have is the archive of scholarship itself. But we are used to using these archives for search, not as objects of analysis in themselves. That is what I’d like to explore in this session. What does the MLA Bibliography tell us—in the aggregate? What theoretical possibilities can we open up by mining the extraordinary archive represented by JSTOR’s Data for Research service?
I can talk about two example datasets I’ve done a little work on: one from the MLA Bibliography and one from JSTOR’s archive of PMLA. Please feel free to bring your own datasets, or leads, or inspirations, or problems, or concerns.
Over on my Rutgers website I’ve placed a longer version of this proposal with a teaser on those example datasets, along with a link to Andy Abbott’s hilarious hit piece on DH and keyword search, via an analysis of concordances.
5 comments
Patrick
October 12, 2012 at 9:18 pm (UTC 0)
Ooh! I like this! Another possible dataset might be the TEI from MLA’s Variorum editions. Looks like they provide the TEI, and bibliographies are (kinda) easily extracted to manipulate. (Not sure if that’s just a subset of what you already have).
I’m extra curious about how you have manipulated/exposed the datasets you have!
Andrew Goldstone
October 12, 2012 at 9:41 pm (UTC 0)
Also: quantified evaluation of scholarly “impact”–an already existing “theory”?
Andrew Goldstone
October 12, 2012 at 9:47 pm (UTC 0)
Awesome idea about the MLA Variorum data–the bibliography for the Comedy of Errors is in its own xml file in the “NVS Challenge” repository–let’s play with this!
Andrew Goldstone
October 12, 2012 at 9:47 pm (UTC 0)
re exposing dataset–sigh. cf. remark about “closed access.”
Patrick
October 13, 2012 at 12:55 am (UTC 0)
Don’t want to detract attention from the core idea that we don’t treat archives as objects of study themselves, but I’ve done some example playing with NVS Challenge data at BillCritOMatic if interested