The Digital Humanities and Word Clouds

Ever since I joined the Initiative on Neuroscience and Law, I’ve had a growing interest in big data analysis. With so much information being digitized — whether it’s criminal records, government documents, or historical archives — researchers can engage with old resources in new ways and ask questions on scales previously unimaginable. Though I’m not too vocal about it here (yet), right now I’m working to apply what I’ve learned at the Initiative to the Library of Congress’ “Chronicling America” archives. This crossing of fields, for those who are curious, is called the “Digital Humanities.” (If you’d like to know more, I suggest checking out the historian Dan Cohen’s blog. Fred Gibbs also has a helpful introduction to historical data analysis here).

I won’t reveal any of my graphics here (I’m saving them for a future post), but here’s an example of the Digital Humanities that everyone’s familiar with: Word clouds. Technically, these were possible before the digitization of famous works, but it was the kind of work that required ~~slave labor~~ teaching assistants. The following I put together in a few minutes using Project Gutenberg and Wordle.
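For the curious: under the hood, a word cloud is little more than a word-frequency count, with the most common words drawn largest. Here is a minimal sketch of that counting step in Python. It is not Wordle's actual code; the tiny stop-word list, the sample sentence, and the `top_words` function are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real tools use far longer ones.
STOP_WORDS = {"the", "and", "a", "of", "to", "in", "it", "was", "that", "he", "she", "i"}

def top_words(text, n=10):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    # Lowercase everything and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Stand-in sentence; in practice this would be the full text of a book.
sample = "The town was the town she knew, and the town knew her."
print(top_words(sample, 3))
```

Feed it the plain-text file of a whole novel instead of one sentence and the resulting counts are roughly what a word cloud turns into font sizes.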

This is Sinclair Lewis’ Main Street (1920):

Lewis’ Babbitt (1922):

Thomas Paine’s entire collected writings:

My personal diaries (May 2008 to May 2012):

Now, even though all of this big data talk is just an excuse for me to post word clouds, I see in each of these one thing: Opportunity. Imagine doing this same work with thousands of books and newspapers. Imagine tracking keywords across time to measure the political trends in a community (or state or country). We can! We are!
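To make the keyword-tracking idea concrete, here is one rough way it could be sketched in Python: group documents by year and measure how often a word appears per 1,000 words. The function name `keyword_rate_by_year`, the `(year, text)` input format, and the sample snippets are assumptions for illustration, not taken from any real archive or tool.

```python
import re
from collections import defaultdict

def keyword_rate_by_year(documents, keyword):
    """Given (year, text) pairs, return {year: keyword occurrences per 1,000 words}."""
    hits = defaultdict(int)    # keyword occurrences per year
    totals = defaultdict(int)  # total words per year
    target = keyword.lower()
    for year, text in documents:
        words = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(words)
        hits[year] += words.count(target)
    # Normalize by total words so years with more surviving text don't dominate.
    return {y: 1000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Invented snippets standing in for digitized newspaper pages.
pages = [
    (1900, "tariff reform and tariff debate"),
    (1900, "local news"),
    (1910, "the tariff question"),
]
rates = keyword_rate_by_year(pages, "tariff")
for year, rate in rates.items():
    print(year, round(rate, 1))
```

Dividing by total word count matters because archives are uneven: a year with ten times as many scanned pages would otherwise look ten times as interested in everything.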

Every researcher ought to be salivating.