Tuesday, October 30, 2012

Data narratives and structural histories: Melville, Maury, and American whaling

Note: this post is part I of my series on whaling logs and digital history. For the full overview, click here.

Data visualizations are like narratives: they suggest interpretations, but don't require them. A good data visualization, in fact, lets you see things the interpreter might have missed. This should make data visualization especially appealing to historians. Much of the historian's art is turning dull information into compelling narrative; visualization is useful for us because it suggests new ways of making interesting the stories we've been telling all along. In particular: data visualization lets us make historical structures immediately accessible in the same way that narratives have let us do so for stories about individual agents.

I've been looking at the ship's logs that climatologists digitize because it's a perfect case of forlorn data that might tell a more interesting story. My post on European shipping gives more of the details about how to make movies from ship's logs, but this time I want to talk about why, using a new set with about a half-century of American vessels sailing around the world. It looks like this:

I'll repost this below the break with a bit more of an explanation. First I want to ask some basic questions: If this is a narrative, what kind of story does it tell? And how compelling can a story from data alone be: is there anything left from a view so high that no individuals are present?

Thursday, October 18, 2012

Word counts rule of thumb

Here's a special post from the archives of my 'too boring for prime time' files. I wrote this a few months ago but didn't know if anyone needed it; now I'll pull it out just for Scott Weingart, since I saw him estimating word counts using 'the,' which is exactly what this post is about. If that sounds boring to you: for heaven's sake, don't read any further.

Melville Plots

Note: this post is part III of my series on whaling logs and digital history. For the full overview, click here.

The main thrust of my big post on the Maury logs is against using them to try to tell individual stories. But in the interests of Internet Melvilleiana, there are two particular tracks I want to pull out.

The first is the Acushnet, the whaling ship Herman Melville served on for 18 months. It was there he got the bulk of his first-hand experience whaling. Melville's track winds mostly around the old American whaling grounds off the coast of South America: you can see that had he stayed aboard a bit longer, the chase for Moby Dick might have entered colder waters. (And we might have a 19th-century account of the Aleutian Islands as strange as the Encantadas are of the Galapagos.)

Friday, October 12, 2012

Logbooks and the long history of digitization

Note: this post is part II of my series on whaling logs and digital history. For the full overview, click here.

To read the data in ship's logs we first must know where the data came from. The short answer--ICOADS--might be enough. But working with digitized books has convinced me that knowing the full provenance of your data, through all its twists and turns, is one of the most important parts of any digital humanities project.

Like most humanists, I care most about the digitization of books, periodicals, and archives. A major theme on this blog is the attempt to understand how particular choices in digitization history shape the books available to us.

But ship's logs are interesting because they present a wholly alternate digitization history that can help us understand the mechanics of digitization more clearly. Logs are a digitized data source that has been driving large-scale research projects for more than 150 years: because of that, they can be a useful abstraction for reflecting on what digitization means. Logbook digitization is an interesting process in its own right; the particular cast of characters--Confederate technocrats, Nazi data thieves--in the history of shipping logs is unique. But the general problems are the same as those found in other large-scale sources of data. Unless humanists intend only to work with data digitized by our own standards, we have to be better at understanding just what can go wrong.

So before I get to those Nazis, let me lay out the basic themes that the story reinforces.