Without even necessarily being aware of it, scholars of all stripes have become dependent on databases, but while these databases are designed to provide us with the data we need, they don’t provide data in the way that we need it. New tools are available for presenting information, but the commercial publishers of most of the databases we use are not putting these tools to good use. This longer-than-I-intended entry looks at one such database and imagines applying one such tool, the treemap, to that database.
One of the most important resources for scholars of the early modern period is the English Short Title Catalogue, a database whose ultimate goal is to provide bibliographic records for all items published in English anywhere in the world (or in any language in England) between 1470 (“the beginnings of print”) and 1800. Its compilers claim to have already created records for everything published up to 1700. While working on my dissertation, I must have conducted hundreds of searches in the ESTC.
When faced with such a massive number of records, however, it’s important that we develop new tools for end users not just to sift and sort through the huge “piles” of information that result but also to be able to step back, figuratively speaking, and have an overview of the piles themselves. Sometimes it’s the shape of the pile you’re interested in, not the items that make up that pile. Currently*, the ESTC functions are defined by the conventions of print: in response to a search, you get a list of 25 (I think) records per page. If your search returns 1,000 records, you get 40 pages to scroll through; there is no easy way to download all of these records to your own bibliographic database. The ESTC is not unique in providing these sorts of functions; they are common to scholarly databases.
These functions are adequate if all you’re trying to do is compile a list, or find a particular item. But they fall far short of what electronic resources are capable of and what researchers might be after. They do not, for example, allow you to contextualize the printed output of one publisher in relation to others. You cannot get a sense of the relative production of printing houses in Bristol, say, versus those in London between the years 1740 and 1760. You cannot easily find out which author has the most records in the database. (It was only after working on my dissertation for two years that I discovered through an outside source that one of the figures I was writing about had more records in the ESTC than any other author; it would have been nice to know this from the beginning.)
It would be very useful if database publishers of resources such as the ESTC could begin to feed the output into some of the information visualization tools that are becoming available. This would change significantly the nature of the work we do and how we think about knowledge, I believe. In the interests of brevity, I’ll mention one such tool (a treemap), but I’d be interested in hearing about similar tools if others know of them:
A treemap is “a space-constrained visualization of hierarchical structures.” This may sound like an awkward definition, but in practice it means you get an overview of a great deal of information that fits into the confines of your computer screen. The concept was first conceived by UMD professor Ben Shneiderman as a means of analyzing the contents of a hard drive, and then developed at the University of Maryland Human-Computer Interaction Lab for a variety of uses; Prof. Shneiderman has written a history of the concept and the tools developed. Download a free copy (for noncommercial use) for yourself, if you like.
A good place to see the treemap in action is this visualization of the stock market. The map provides you with an overview of the (almost) real-time performance of 500 stocks. Each stock is a rectangle and is a different color, ranging from bright green to bright red. The larger the rectangle, the larger the market cap of the stock; the brighter the green (or red), the more the price has gone up (or down) since a point in time determined by the end user. The rectangles are grouped into market sectors. Mouse over a rectangle and you get a small pop-up window with some basic information; click on the rectangle and you get a menu that will take you to webpages with more information about that stock. It’s amazing how much information you have at your disposal in such a small and pretty easily navigable space.
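For the curious, here is a rough sketch in Python of how rectangles like these can be laid out. It uses the simple "slice-and-dice" approach (alternate the direction of the cut at each level of the hierarchy, and give each item a share of the space proportional to its size). I am not claiming this is how the stock market map itself is built; the `Node` class, the `slice_and_dice` function, and the sector and stock names and sizes are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in the hierarchy: a group (has children) or a leaf (has a size)."""
    name: str
    size: float = 0.0                      # leaf weight, e.g. market cap
    children: list = field(default_factory=list)

    def total(self):
        # A group's weight is the sum of the weights of everything beneath it.
        if not self.children:
            return self.size
        return sum(child.total() for child in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return a list of (name, x, y, width, height) tuples, one per leaf."""
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0                           # fraction of the space already used
    for child in node.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even levels: place children side by side, left to right.
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd levels: stack children top to bottom.
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Invented example: two "sectors" holding three "stocks".
root = Node("market", children=[
    Node("Sector 1", children=[Node("Stock A", 3), Node("Stock B", 1)]),
    Node("Sector 2", children=[Node("Stock C", 4)]),
])

rects = slice_and_dice(root, 0, 0, 80, 40)
for name, rx, ry, rw, rh in rects:
    print(f"{name}: x={rx} y={ry} w={rw} h={rh} area={rw * rh}")
```

The area of each leaf rectangle comes out proportional to its size, and the stocks in a sector stay together inside that sector's block. Color (green for up, red for down) would be a second value attached to each leaf and is left out here.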
Why not apply this technology to a database like the ESTC? After a search returns 638 items, say, you ask the database to give you a treemap where the items are grouped by publisher, or by year, or by author. The size of the rectangle could be the number of pages (or words) in the work; the color could be the distance between the city of publication and some geographical point of your choosing. You would then be able to spot certain patterns, or unusual individual entries in a way that would be much more cumbersome (or even impossible) with the current list-based results.
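To make the idea a little more concrete, here is a small Python sketch of the grouping step that would sit between a search result and a treemap. The records, their field names (`publisher`, `year`, `pages`), and the `group_sizes` function are all made up; I have no idea what form the ESTC's data actually takes behind the scenes, so treat this as an assumption-laden illustration rather than a description of the real database.

```python
from collections import defaultdict

# Invented placeholder records standing in for a search result.
records = [
    {"title": "Work 1", "publisher": "Publisher A", "year": 1741, "pages": 24},
    {"title": "Work 2", "publisher": "Publisher A", "year": 1742, "pages": 96},
    {"title": "Work 3", "publisher": "Publisher B", "year": 1741, "pages": 300},
    {"title": "Work 4", "publisher": "Publisher C", "year": 1750, "pages": 8},
]


def group_sizes(records, key, size_field="pages"):
    """Group records by `key` and total up `size_field` within each group.

    Returns a dict mapping each group to its item count and combined size,
    which is the information a treemap needs to size its rectangles.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return {
        group: {
            "items": len(members),
            "size": sum(member[size_field] for member in members),
        }
        for group, members in groups.items()
    }


by_publisher = group_sizes(records, "publisher")
by_year = group_sizes(records, "year")

# List publishers from largest to smallest combined page count.
for publisher, info in sorted(by_publisher.items(), key=lambda kv: -kv[1]["size"]):
    print(publisher, info)
```

Switching the grouping from publisher to year or author is just a matter of changing the `key` argument, which is the kind of flexibility the current list of results does not offer.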
If you know of any other such visualization tools that prove useful along these lines, I’d like to know about them.
* Disclaimer: It’s been a year since I’ve used the ESTC, so it might have changed. The university where I work now does not, unfortunately, subscribe.
George, an idea that may or may not hold interest: what about sending the permalink to this entry to the C-18 list? It strikes me as something that deserves to reach a secondary audience beyond the blogosphere . . .
That’s worth thinking about, Kari. I haven’t decided if I want to “out” myself as a blogger to the C18-L crowd, yet.
Maybe what I’ll do is just send this entry to the list, rather than give them the link to the blog.
Info viz is, of course, a major field in cs these days. The Maryland people have a useful taxonomy of visualization techniques online here:
My one question to add to your analysis, George, is how or if use of the ESTC differs from use of any other database of comparable size and scope; i.e., do humanities researchers have unique needs that are not accommodated by pre-existing tools? If the answer to that is “no, we don’t have unique needs,” then all that’s keeping viz from taking off in the humanities is a bit of software engineering (and a bit of consciousness raising, of exactly the sort you do here). If, however, the answer is “yes, we *do* have unique needs,” then it’s a much bigger (and much more interesting) challenge. See, for instance, UVa’s Temporal Modeling project:
Take a look at the demo visualization work we are doing for CHLT (http://www.chlt.org) at http://faya.doc.ic.ac.uk:8800/visual/index.html and the write-up at http://www.ariadne.ac.uk/issue34/rydberg-cox/ The demo now is based on LA Times and other news sources. Getting it to work with the Greek, Latin, and Old Norse literary corpora is our main agenda item for July.
Matt, thanks for the link to the taxonomy and to the UVA project. In response to your excellent question, my gut sense is that yes, humanities researchers do have unique needs, but I have not yet thought this through sufficiently (nor have I read widely enough) to be able to argue what those needs are. My first thoughts, though, are to experiment with some of the available visualization tools and see where/if they might fall short of what is needed in the humanities.
Jeff, thank you for the links. The visualizations are very impressive, and the write-up reminds me that no visual representation (an interpretation, after all) of data should be thought of as more ‘natural’ than any other: I needed the explanations to understand exactly what was being shown to me and how to make use of the tools. This observation is not a criticism, however, but an acknowledgment that to be most effective, sophisticated tools require effort on the part of the user.
More thoughts on this topic at a later date, after I’m through with the SHARP conference. I very much appreciate the contributions to this thread, though.
For now, check out this: http://historywired.si.edu/index.html
Just a note that I find this intriguing. Going to have to let it “percolate” for a while.
As Jason points out, “newsmap is cool.”