Without even necessarily being aware of it, scholars of all stripes have become dependent on databases, but while these databases may provide us with the data we need, they are not really designed to provide it in the way that we need it. New tools are available for presenting information, but the commercial publishers of most of the databases we use are not putting these tools to good use. This longer-than-I-intended entry looks at one such database and imagines applying one such tool, the treemap, to that database.
One of the most important resources for scholars of the early modern period is the English Short Title Catalogue, a database whose ultimate goal is to provide bibliographic records for all items published in English anywhere in the world (or in any language in England) between 1470 (“the beginnings of print”) and 1800. Its compilers claim to have already created records for everything published up to 1700. While working on my dissertation, I must have conducted hundreds of searches in the ESTC.
When faced with such a massive number of records, however, it’s important that we develop new tools that allow end users not just to sift and sort through the huge “piles” of information that result but also to step back, figuratively speaking, and get an overview of the piles themselves. Sometimes it’s the shape of the pile you’re interested in, not the items that make up that pile. Currently*, the ESTC’s functions are defined by the conventions of print: in response to a search, you get a list of 25 (I think) records per page. If your search returns 1,000 records, you get 40 pages to scroll through, and there is no easy way to download all of these records to your own bibliographic database. The ESTC is not unique in this respect; these sorts of functions are common to scholarly databases.
These functions are adequate if all you’re trying to do is compile a list or find a particular item. But they fall far short of what electronic resources are capable of and what researchers might be after. They do not, for example, allow you to contextualize the printed output of one publisher in relation to others. You cannot get a sense of the relative production of printing houses in Bristol, say, versus those in London between the years 1740 and 1760. You cannot easily find out which author has the most records in the database. (It was only after working on my dissertation for two years that I discovered through an outside source that one of the figures I was writing about had more records in the ESTC than any other author; it would have been nice to know this from the beginning.)
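Once the records are in hand, questions like these are simple aggregations. As a rough sketch, assume a search’s results could be exported as a list of records with `author`, `city`, and `year` fields. Those field names and the sample records below are invented for illustration and are not the ESTC’s actual export format or data. Counting records per author, or comparing two cities between 1740 and 1760, then takes only a few lines of Python:

```python
from collections import Counter

# Hypothetical exported search results. The field names ("author", "city",
# "year") and the records themselves are invented for illustration; they are
# not the ESTC's actual export format or data.
records = [
    {"author": "Author A", "city": "London", "year": 1745},
    {"author": "Author B", "city": "Bristol", "year": 1752},
    {"author": "Author A", "city": "London", "year": 1758},
    {"author": "Author C", "city": "London", "year": 1701},
    {"author": "Author A", "city": "Bristol", "year": 1760},
]


def records_per_author(recs):
    """Return (author, count) pairs, most prolific author first."""
    return Counter(r["author"] for r in recs).most_common()


def output_by_city(recs, cities, start, end):
    """Count records per city for publication years start..end inclusive."""
    counts = {city: 0 for city in cities}
    for r in recs:
        if r["city"] in counts and start <= r["year"] <= end:
            counts[r["city"]] += 1
    return counts


print(records_per_author(records))
# [('Author A', 3), ('Author B', 1), ('Author C', 1)]
print(output_by_city(records, ["Bristol", "London"], 1740, 1760))
# {'Bristol': 2, 'London': 2}
```

The point isn’t the code itself but that the database already holds everything needed to answer these questions; it just doesn’t expose them.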
It would be very useful if the publishers of databases such as the ESTC could begin to feed their output into some of the information visualization tools that are becoming available. This would, I believe, significantly change the nature of the work we do and how we think about knowledge. In the interests of brevity, I’ll mention just one such tool (a treemap), but I’d be interested in hearing about similar tools if others know of them.
A treemap is “a space-constrained visualization of hierarchical structures.” This may sound like an awkward definition, but in practice it means you get an overview of a great deal of information that fits within the confines of your computer screen. The concept was first conceived by UMD professor Ben Shneiderman as a means of analyzing the contents of a hard drive, and was then developed at the University of Maryland Human-Computer Interaction Lab for a variety of uses; Prof. Shneiderman has written a history of the concept and the tools developed. You can download a free copy of the lab’s treemap software (for noncommercial use), if you like.
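To make “space-constrained” concrete, here is a minimal sketch of the simplest layout, usually called slice-and-dice: the screen is split into strips proportional to each child’s size, and the direction of the split alternates at each level of the hierarchy. This is my own illustration, not code from the HCIL tool; later treemap tools generally use “squarified” layouts that avoid long, thin rectangles. The `Node` class and the toy hierarchy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    size: float = 0.0  # weight of a leaf; internal nodes sum their children
    children: list = field(default_factory=list)

    def total(self):
        """Leaf size, or the summed size of all leaves below this node."""
        if not self.children:
            return self.size
        return sum(c.total() for c in self.children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Give every leaf a rectangle (name, x, y, w, h) with area proportional
    to its size, alternating the split direction at each level."""
    if out is None:
        out = []
    if not node.children:
        out.append((node.name, x, y, w, h))
        return out
    total = node.total()
    offset = 0.0  # fraction of this rectangle already handed out
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: children sit side by side, dividing the width.
            slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            # Odd depth: children are stacked, dividing the height.
            slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


# A toy hierarchy: one large item and a group of two smaller ones.
root = Node("root", children=[
    Node("A", 2),
    Node("G", children=[Node("G1", 1), Node("G2", 1)]),
])
for rect in slice_and_dice(root, 0, 0, 100, 100):
    print(rect)
```

On a 100 × 100 “screen,” A takes the left half, and the group G takes the right half, which is itself split top and bottom between G1 and G2. Every leaf ends up visible at once, with no scrolling.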
A good place to see the treemap in action is this visualization of the stock market. The map provides you with an overview of the (almost) real-time performance of 500 stocks. Each stock is a rectangle whose color ranges from bright green to bright red. The larger the rectangle, the larger the market cap of the stock; the brighter the green (or red), the more the price has gone up (or down) since a point in time determined by the end user. The rectangles are grouped into market sectors. Mouse over a rectangle and you get a small pop-up window with some basic information; click on the rectangle and you get a menu that will take you to webpages with more information about that stock. It’s amazing how much information you have at your disposal in such a small and pretty easily navigable space.
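The stock-market map encodes two numbers per rectangle: area in proportion to market cap, and a diverging green-to-red color for the price change. A minimal sketch of both mappings follows. It assumes, and this is my assumption rather than a description of how that particular map works, that no change is drawn as black and that changes of ±3% or more saturate at full brightness. The function names are hypothetical.

```python
def change_to_rgb(pct_change, max_abs=3.0):
    """Map a percentage price change to an (r, g, b) color.

    0% is black; gains fade up to bright green and losses to bright red,
    saturating once the change reaches +/- max_abs percent.
    """
    t = max(-1.0, min(1.0, pct_change / max_abs))  # clamp to [-1, 1]
    level = round(255 * abs(t))
    return (0, level, 0) if t >= 0 else (level, 0, 0)


def area_shares(values):
    """Fraction of the total area each rectangle gets, e.g. by market cap."""
    total = sum(values)
    return [v / total for v in values]


print(change_to_rgb(0.75))   # a modest gain: dim green, (0, 64, 0)
print(change_to_rgb(-6.0))   # beyond the cap: full red, (255, 0, 0)
print(area_shares([1, 1, 2]))  # [0.25, 0.25, 0.5]
```

The cap matters: without it, one wild outlier would make every other rectangle look nearly black.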
Why not apply this technology to a database like the ESTC? After a search returns 638 items, say, you ask the database to give you a treemap where the items are grouped by publisher, or by year, or by author. The size of each rectangle could be the number of pages (or words) in the work; the color could be the distance between the city of publication and some geographical point of your choosing. You would then be able to spot patterns or unusual individual entries in a way that would be much more cumbersome (or even impossible) with the current list-based results.
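Here is a rough sketch of the data preparation such a feature would need: group the search results, size each item by page count, and compute each item’s distance from a chosen point (here with the standard haversine great-circle formula) to drive the color. The record fields (`title`, `publisher`, `pages`, `place`), the sample records, and the function names are all hypothetical. The coordinates for London and Bristol are approximate.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def build_treemap_input(records, group_key, origin):
    """Group records for a treemap: {group: [(title, size, distance_km), ...]}.

    Rectangle size is the page count; the distance from `origin` (a lat, lon
    pair) is the value a color scale would be driven by.
    """
    groups = defaultdict(list)
    for r in records:
        dist = haversine_km(*r["place"], *origin)
        groups[r[group_key]].append((r["title"], r["pages"], dist))
    return dict(groups)


# Invented records; the field names are assumptions, not the ESTC's schema.
LONDON = (51.5074, -0.1278)   # approximate
BRISTOL = (51.4545, -2.5879)  # approximate
results = [
    {"title": "A sermon", "publisher": "Printer X", "pages": 32, "place": LONDON},
    {"title": "A poem", "publisher": "Printer Y", "pages": 12, "place": BRISTOL},
    {"title": "A treatise", "publisher": "Printer X", "pages": 240, "place": LONDON},
]

tree = build_treemap_input(results, "publisher", origin=LONDON)
for publisher, items in tree.items():
    for title, pages, km in items:
        print(f"{publisher}: {title}, {pages} pp., {km:.0f} km from London")
```

Passing `"year"` or `"author"` as the group key would give the other groupings; the output could then be handed to any treemap layout.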
If you know of any other such visualization tools that prove useful along these lines, I’d like to know about them.
* Disclaimer: It’s been a year since I’ve used the ESTC, so it might have changed. The university where I work now does not, unfortunately, subscribe.