In the comments section below are my notes on a session titled “SEASR Data Analytics” at THATCamp 2009.
The Software Environment for the Advancement of Scholarly Research (SEASR.org), funded by the Andrew W. Mellon Foundation, provides a research and development environment capable of powering leading-edge digital humanities initiatives.
Find more information about THATCamp:
- See the full schedule.
- Contribute to the wiki.
- Follow the Twitter conversation.
- Check out the Flickr stream of tagged photos.
- Browse the relevant bookmarks on Delicious.
Multiple tools for analysis, including visualizations like word clouds.
The data source is up to you.
Everything runs on the web, so you can log in from anywhere online.
The SEASR server is available for download, if you like, so you can run it on your own server.
Point it to a directory of documents and it will analyze all of them.
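To give a rough sense of what "point it to a directory" could mean in practice, here is a minimal Python sketch that tallies word frequencies across every text file in a folder, which is the kind of count a word cloud is built from. This is not SEASR's code; the function name `word_counts` and the `*.txt` pattern are my own assumptions for illustration.

```python
import re
from collections import Counter
from pathlib import Path


def word_counts(directory, pattern="*.txt"):
    """Count word frequencies across every matching file in a directory.

    Hypothetical helper for illustration only; not part of SEASR.
    """
    counts = Counter()
    for path in sorted(Path(directory).glob(pattern)):
        # Lowercase so "The" and "the" are counted together.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        # Words are runs of letters, optionally with one internal apostrophe.
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", text))
    return counts
```

Calling `word_counts("my_corpus").most_common(50)` would return the fifty most frequent words, which could then be handed to whatever visualization you prefer.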
Currently, much of the data is being pulled from http://MONKproject.org (which is much better edited than what you’ll find on http://Gutenberg.org). http://JSTOR.org is also on the horizon.
The API for a text collection is important so that the collection is open to analytical tools like SEASR.
There’s a http://Zotero.org plugin: http://seasr.org/documentation/zotero/
From Zotero, choose an item or a collection of items and send it for analysis.
Extract dates from a sentence and then render them in a Simile timeline (http://www.simile-widgets.org/timeline/). The algorithm understands something as specific as “the summer of 1850” but not “yesterday,” “today,” or “tomorrow,” because the date to which those relative terms refer could be unclear. However, a user can tweak the document XML to make those terms correspond to more specific times.
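The distinction between absolute and relative dates can be sketched in a few lines of Python. This is only my own illustration of the idea, not the algorithm SEASR uses: the regular expressions, the `extract_dates` name, and the choice to recognize only seasons and four-digit years are all assumptions.

```python
import re

# Assumption: only seasons and four-digit years (1000-2099) are recognized.
SEASONS = "spring|summer|fall|autumn|winter"

ABSOLUTE = re.compile(
    rf"\b(?:(?:the\s+)?(?:{SEASONS})\s+of\s+)?(?P<year>1[0-9]{{3}}|20[0-9]{{2}})\b",
    re.IGNORECASE,
)
RELATIVE = re.compile(r"\b(?:yesterday|today|tomorrow)\b", re.IGNORECASE)


def extract_dates(sentence):
    """Return (absolute, relative) date mentions found in a sentence.

    absolute: list of (matched phrase, year) pairs that could go on a timeline.
    relative: list of terms that cannot be placed without more context.
    """
    absolute = [(m.group(0), int(m.group("year"))) for m in ABSOLUTE.finditer(sentence)]
    relative = [m.group(0) for m in RELATIVE.finditer(sentence)]
    return absolute, relative
```

For the sentence "She arrived in the summer of 1850 and left yesterday." this returns the phrase "the summer of 1850" paired with the year 1850, and flags "yesterday" separately as a term that needs a reference date before it can be plotted.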
So far, there has been no formal evaluation to see how teachers and scholars are using (or might use) these tools.
Audience says that they’d gladly provide some testing of these tools.
SEASR has, however, led workshops where participants have gone away with the software installed on their own machines so that they can run their own analyses and tweak how it runs.
Sounds like Tanya Clement made use of these tools for her work with The Making of Americans.
SEASR can provide 5 different “flows” of data. Flows are “data management, analysis and visualization tasks.”
View examples at http://seasr.org/documentation/example-flows/
Potential issue with users downloading their own copy of SEASR and tweaking for their own purposes: how can others replicate the findings of Researcher A if Researcher A’s version of a particular flow is idiosyncratic?
Answer: Researcher A can make their particular flow publicly available for others to read. Still, versioning both flows and components could become overwhelming.
Future desired plan: create “SEASR Central” that will (like Firefox plugins, say, or Apple’s App Store) be the official repository for flows and components, complete with authoritative version numbers.
Dynamic Visualization of Music Classification Systems (uses SEASR) http://nema.lis.uiuc.edu/demo/blinkie/nema.htm
Discussion of modeling, genre, and “mood” wrt music. I’m not sure that what I’m taking away from this part of the conversation is what I’m supposed to be taking away. Kind of out of my area of expertise.
Feed the program more data, train it more extensively, and its accuracy increases.