Digital History

Humanistic Approaches to Data Visualization

In March 2013, Cameron Blevins came to me with a question about how to visualize his research into the U.S. post office. This presented a fantastic opportunity for both of us: as a research collaboration and a chance to learn D3.js. DH likes to ask what to do with a million books. I wanted to ask what to do with 14,000 post offices. We wanted to know – what insight could we gain from visualizing the Post? Visualization, as historian Richard White argues, is a means of doing research: of posing questions we otherwise could not ask without the aid of computers; of identifying patterns that might otherwise go undiscovered.

The design of the project went through several iterations as we tried to solve a key question: how to present the data in a meaningful way. Plotting points on a map is a simple enough exercise, and I find even that process can be arresting – to see the massive network of the Post and where communities tended to cluster in the West. But we wanted more than just the presentation of points on a map – we wanted to represent knowledge.

Our iterations of the project were straightforward. Our test case focused on Oregon, and we started plotting offices onto a Google Map using the JavaScript library D3.js to begin understanding, first, how the technology would work, and second, what sort of things we could begin to do with the data. We experimented with alternative views as well. Rather than viewing points on a map, could we put post offices into a bin and get a sense of geographic concentrations? We tried hex binning – a hexagonal grid for creating histograms that allows us to interpolate values between points on the map. An interesting view, perhaps, but does it help us answer or raise new questions? Such a map helps us visually understand geographical concentrations of post offices, but our data let us ask an even more interesting question: where are people going in the American West? We needed interactivity.
We had a lot of data – complex, messy, incomplete data. The entire dataset is 160,000 post offices for two centuries for the entire US – we focused only on the latter half of the nineteenth century in the western US. In some ways, the data we worked with didn’t present a lot of challenges: it’s complex, yes, but the data is all the same. We were not confronting, say, a digital archive of texts, all of which can be radically dissimilar in their form and content: case files, advertisements, newspapers, diaries. We didn’t have to confront what William G. Thomas has called the document-type problem. But there were other design challenges: for example, meshing together datasets. Cameron began the project with an already-massive dataset of post offices, but roughly eight months ago he purchased another dataset from a stamp collector that massively expanded the amount of evidence we worked with. He now has data for the entire United States. Thus, we are confronted with the architecture of our data. Then we had to ask the really dangerous question: Where does the West begin?

Part of our argument is that we can use the post office as a proxy for understanding settlement patterns in the West. Many of these nineteenth-century towns died years ago; they don’t exist on present-day maps. The West is known for its ghost towns – we can see a lot of them here. I want to make a case to you today about why we should think about the Post as a proxy for communities, and the significance of visualizing that process.

There are, of course, many ways we could represent population figures visually. One of the more popular techniques is choropleths. But there’s a problem with the choropleth when it comes to the West: we have what Cameron has called the West’s “county problem.” The problem with western counties is that they’re huge. Look at San Bernardino County in Southern California, which borders the metropolis of Los Angeles. Lots of people, right? But the actual population is huddled against the western edge of the county rather than evenly distributed through the county. You get a visual representation, then, that can be misleading. What you want is more granularity.

Let’s focus on Colorado and New Mexico in 1870 – keep these places in mind, we’ll return to them a few more times. We get a sense that there are lots of people, a rough idea of where they’re at in the states with the choropleth. But it’s hard to really know unless you compare this with our map. Two things stand out to me here. One, these two things map onto each other well. If we overlaid the population data with the post data, I think we’d see that the shading of the counties would fit well with the location of post offices. But, we also get a better sense of where people are at. They’re not distributed throughout these huge counties in New Mexico; they follow a corridor north to south – probably a railroad line in New Mexico, and nestled against the Front Range in Colorado.

To me, this is significant. If my historical question is about the settlement patterns of the western US, it matters a great deal to me to know where exactly those communities are at. The Post gives me a window into that process.

So, the Post becomes a proxy for a town. You wouldn’t have a post office where there’s no town. This is the way people communicated in the nineteenth century; oftentimes, these towns were not located next to railroads. They needed the Post and its network of postal roads, stagecoaches, and rail lines to distribute news and information. This was your connection to the broader world. It was also a key part of the national government’s process of folding the West into the nation. Rather than an isolated region, it became integrated into a national system of information. And this network connected the West to larger social, economic, and political networks. But maybe you don’t believe me quite yet.
Maybe you look at these offices and say: this doesn’t work. There’s no story here. Let me give you one more example; I have to give a shoutout to Ben Schmidt, a historian at Northeastern, for alerting me to these maps. In 1915, the U.S. Census Bureau published the Statistical Atlas of the United States. It’s a beautiful book filled with some stellar visualizations. One set of visualizations sought to illustrate the population density of the United States for 1870, 1880, and 1890. I looked at these maps and thought: does the postal data map onto these? In other words, can I really treat the Post as a proxy for settlement? If post offices happen to line up well with the Census Bureau’s own statistics, to me that’s further evidence that I can treat these as indications of settlement.

So, let’s look. Here’s our Colorado – New Mexico corridor in 1870. I apologize that my projection is different from that used by the Census Bureau, so you may have to squint a bit to help. Looking at these side-by-side, I think they pretty accurately map onto each other. Notice the small pocket of post offices to the west of Denver; that same blob appears on the Census map to the west of Denver. Notice the collection of offices in northern New Mexico; the Census map distribution shows that same presence.

So, we have the West of 1890 – heavy populations in California and the Pacific Northwest; lots of people along the Rocky Mountains; empty areas in Nevada and Utah and Montana. My map seems to map onto this pretty well. Pay particular attention to WA, OR, CA – my projection isn’t the same as the Census, so they don’t map quite right. But they’re pretty close. And if I fixed the projection, I think they’d map onto each other very closely. So, comparing post offices to other maps and visualizations – I’d say that we can safely use the Post as a proxy for understanding communities. But the question is, why is that important?
Why did I spend all this time trying to convince you that I can safely treat the Post as a proxy for communities? Because the story isn’t just about the rise of communities; it’s also about their decline. You don’t see this in the Census maps. If we look at the progression of the maps from 1870 to 1890, the maps tell a particular story: one of growth, one of progress.

Let’s return to our Colorado – New Mexico corridor. What you don’t see in these is the communities that don’t thrive. Here’s our corridor in the Southwest again; these are post offices that close between 1870 and 1890. If the story I’m interested in is the process by which communities grow and decline in the West, it’s these communities I want to examine. Again, you don’t see these places in the Census maps. I would bet you also wouldn’t see these changes using population data for counties. But the postal data we have can give us that.

And therein lies one of the great benefits that I think visualization lends to the humanities. The Census maps are static, giving me snapshots of particular moments in time. Our map lets you examine any moment in time between 1848 and 1900. Maybe you’re curious about what’s happening in western settlement during the American Civil War? You can select those years. Maybe you have a particular interest in western settlement as conflicts with Native Americans are happening throughout the West in the 1870s and 1880s. You can select those years. You can look at places where communities go away and where they spring up – and that’s key.

What the map does is lead me to questions. It leads me to places on a map that may be overlooked by historians. Some of these places no longer exist; they, quite literally, are removed from the historical record. Some of these places are mining camps – they exist only for a year until the mines run dry, then they’re off the map. What’s happening is a distant and close reading of a spatial experience.
By stepping back and looking at overall patterns, we can then zoom in closer – track down more sources, track down more information, give richness to the fabric of historical experience.
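
For readers curious about the hex binning experiment mentioned earlier, here is a minimal sketch of the idea in plain JavaScript. It is not the project's code: the function name `hexBinCounts`, the `{x, y}` point shape, and the sample values are assumptions made for illustration, and the points are assumed to be already projected to planar coordinates rather than raw latitude and longitude. Each point is assigned to the pointy-top hexagon that contains it, and the offices in each cell are counted.

```javascript
// Count points per hexagonal cell (pointy-top hexagons).
// `radius` is the distance from a hexagon's centre to any corner.
// Points are assumed to be projected (planar) coordinates.
function hexBinCounts(points, radius) {
  const counts = new Map();
  for (const { x, y } of points) {
    // Fractional axial coordinates (q, r) of the point.
    const q = ((Math.sqrt(3) / 3) * x - y / 3) / radius;
    const r = ((2 / 3) * y) / radius;

    // Round in cube coordinates, where cx + cy + cz = 0.
    const cy = -q - r;
    let rx = Math.round(q);
    let ry = Math.round(cy);
    let rz = Math.round(r);

    // Repair the component with the largest rounding error so the
    // three rounded values still sum to zero.
    const dx = Math.abs(rx - q);
    const dy = Math.abs(ry - cy);
    const dz = Math.abs(rz - r);
    if (dx > dy && dx > dz) {
      rx = -ry - rz;
    } else if (dy > dz) {
      ry = -rx - rz;
    } else {
      rz = -rx - ry;
    }

    // Key each cell by its axial coordinates.
    const key = `${rx},${rz}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Hypothetical sample: two nearby offices and one farther away.
const sampleOffices = [
  { x: 0, y: 0 },
  { x: 0.1, y: 0.1 },
  { x: 10, y: 0 },
];
const sampleBins = hexBinCounts(sampleOffices, 1);
```

The resulting counts are what a hexbin map would shade by: dense cells get darker fills, which shows concentration but, as noted above, does not by itself say anything about change over time.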

Research Design and Geography of the Post

[Read this along with Cameron Blevins’s companion post.] After more than a year of work, Geography of the Post is live. I wanted to take a moment at the project’s launch to reflect back on the design decisions we made with the project and to document these changes.1

The design of the project went through several iterations as we sought to solve two problems. The first was finding the most efficient way of presenting the material. Since we are dealing with such a large amount of information (our total dataset approaches 100,000 post offices), we ran into problems very early on with the performance of the map. Dragging, panning, and zooming the map became frustratingly slow – a user experience you always hope to avoid. We built in manual zooming features to work around that problem. Second, our bigger question revolved around how to present the information. We wanted to determine what sort of views we could present to users in order to ask interesting research questions.

Our early design iterations focused on Oregon. We started by loading our data onto a Google map. We experimented with alternative views, such as hex binning, to visually understand geographic concentrations of post offices through histograms. These were useful views, but we had considerations that we wanted to take into account with the offices that simply plotting points doesn’t let us get at. It’s interesting, in one sense, to see the concentrations of post offices. But these points don’t represent much else. If we are using the Post to understand something about the movement of people into the American West, we needed more interaction with the points in order for us to query the information with more granularity.

With the assistance of some amazing undergraduate research assistants – Jocelyn Hickock and Tara Balakrishnan – we created methods for determining the status of a post office at any point in time. Users are presented with two views.
The first is what we called “Duration View,” which uses transparency of the points in order to convey the “age” of a post office. These “ages” update according to the span of time that you draw on the timeline, or you can view the map as a whole and see areas of the West that have had the oldest (or youngest) post offices. A second view of the post offices we built into the project is what we’ve called “Status View.” This view shows us one of four statuses that a post office can be in during a given span of time: closed, opened, open throughout, or open and closed. The view gives us a chance to look for large areas of closings or openings in the context of surrounding post offices and raise questions about why those changes are occurring. Why document our design decisions? Part of my own goal in digital humanities generally is the reusability of approaches, methods, tools, code, and design in projects that may be far afield from my own work. But I also believe that we can make our work more methodologically transparent by presenting the artifacts and iterations of our design process. Not only because designs have implied and explicit arguments, but because sharing the process helps others in their design process. Furthermore, exposing our design and thought process has helped us to think more deeply about our own design decisions.2 In other words, I am trying to answer Trevor Owens’ call that we take “a few moments at the end of a project to reflect on what you wanted to accomplish, what actually happened, and what you learned from the process.”3 Our goal at the outset was to determine what we could about the relationship between the U.S. Post and population growth in the American West. By and large, I think the project goes a long way in giving us an overall picture of population growth at specific areas in the West, a more granular view of populations than we can see in choropleth maps because of the West’s county problem. 
Since counties are so large in the West, a choropleth fails to really give us a sense of where people are at in space.

Population in the West, 1870. Map by Cameron Blevins.

But the choice of using post office points to surmise about the growth of population centers gives us a greater sense of where people are going in the West. To make that process more clear than a static map could convey, we designed a timeline feature that allows users to drag a span of time – from a single year to the entire span of time contained in the dataset – and visualize how these changes occur over the course of the century. You have a specific interest in the West during the Civil War? You can draw the timespan and see those offices between 1860 and 1865. More interested in the late nineteenth century? Select those years. Want to watch year-by-year how post offices grow in the West? Select a year, and drag across the timeline to watch places in the West expand.

There are elements of the map that I wish we had designed in from the beginning. I’d like to see this same information on a terrain map rather than a flat map – to see how the landscape might have determined where post offices located. I would love to add layers to the map – railroads, major roads, postal routes. Other quantifiable information might also be overlaid on the map – population figures, salaries of postmasters, perhaps even voting patterns. We might even have built in more conceptual and experimental visualizations that could have allowed us to distort time and space (think cartograms) to speculate on the ways that the Post shaped how people thought about space. In these ways, we could add more layers of information that may cause us to ask new kinds of questions.

Visualizations are provocations for interpretation. For those who approach the project – researchers, teachers, students, the public – my hope is that the visualization provokes questions and ideas. The sheer scale of the office network is arresting, but interacting with that network provides a chance to view it from different perspectives.
The interactions with the research, I hope, give users a chance to ask different kinds of questions that a static map simply couldn’t prompt because it lacks the ability to reshape the information easily. As an interactive scholarly work, Geography of the Post lets users explore the space of the Post and the growth of the American West.
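
To make the two views described above a little more concrete, here is a rough sketch of how a selected timespan could drive both the Status View categories and the Duration View transparency. This is an illustration under stated assumptions, not the project's actual code: the record shape (`established`, `discontinued`, with `null` for an office that never closes in the data), the function names, the treatment of boundary years, and the linear opacity scale are all hypothetical.

```javascript
// Hypothetical record shape: { name, established, discontinued },
// where `discontinued` is null if the office never closes in the data.

// Status View: classify an office against the selected span [start, end].
// Returns null if the office was not open at any point in the span.
function classifyStatus(office, start, end) {
  const opened = office.established;
  const closed = office.discontinued == null ? Infinity : office.discontinued;
  if (opened > end || closed < start) return null;

  // Assumption: an event in a boundary year counts as inside the span.
  const opensInSpan = opened >= start;
  const closesInSpan = closed <= end;
  if (opensInSpan && closesInSpan) return "opened and closed";
  if (opensInSpan) return "opened";
  if (closesInSpan) return "closed";
  return "open throughout";
}

// Duration View: map the share of the span an office was open to an
// opacity between `minOpacity` and 1 (assumed linear scale).
function durationOpacity(office, start, end, minOpacity = 0.15) {
  const closed = office.discontinued == null ? Infinity : office.discontinued;
  const from = Math.max(office.established, start);
  const to = Math.min(closed, end);
  if (to < from) return 0; // not open during the span: fully transparent
  if (end === start) return 1; // single-year span: open offices are opaque
  return minOpacity + (1 - minOpacity) * ((to - from) / (end - start));
}

// Example with made-up offices and the Civil War years as the span.
const exampleOffices = [
  { name: "A", established: 1850, discontinued: 1880 },
  { name: "B", established: 1862, discontinued: 1864 },
  { name: "C", established: 1863, discontinued: null },
  { name: "D", established: 1840, discontinued: 1861 },
];
for (const office of exampleOffices) {
  console.log(
    office.name,
    classifyStatus(office, 1860, 1865),
    durationOpacity(office, 1860, 1865)
  );
}
```

Keeping both functions dependent only on the office record and the selected years means the map can recompute every point each time the timeline selection changes, which is the interaction a static map cannot offer.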

  1. I have also talked about the project and some of our design decisions on The First Draft Podcast, episode 6.
  2. The code for this project is available on Github. I also have a desire to make this code cleaner and open to wider use by others. Parts of the code are fairly specific to Cameron’s dataset, but I’d like the map to be able to handle any data dropped into it.
  3. Trevor Owens, “Please Write it Down: Design and Research in Digital Humanities,” Journal of Digital Humanities 1 (Winter 2011).

What does Missile Command have to do with Digital History?

I am spending the week at the University of Victoria for this year’s Digital Humanities Summer Institute (DHSI). I was here last year and had the chance to make a lot of new friends – indeed, I think one of the most powerful experiences of attending DHSI is the engagement with the people and the community. The people here are awesome.

I’m in the physical computing course with Jentery Sayers, Bill Turkel, and Devon Elliott. We’re getting introduced to a variety of technologies – 3D printing, Arduino boards, Raspberry Pi, Max, and Makey Makey. One of the projects a group of us are working on is reviving a 1983 gaming desk that originally played Missile Command. Missile Command was a 1980 game published by Atari where the game player had to aim three anti-missile batteries at incoming missiles and bombs. (Want to play? It’s online.)

The one question that I keep returning to: why build this? Well, if you are an historian of technology, or game culture, or want to understand something about the design decisions that went into the production of Missile Command, or you’re thinking about the historical preservation of video games and the devices they were originally played on, then engaging with physical computing makes good sense. After all, the process of recreating the system may lead to understanding the decisions that went into the circuit design, wiring, and programming of the game itself in ways that reading primary sources may not.

It strikes me how often we’ve had to think about certain design decisions for our own technological limitations. For example, the Raspberry Pi only had fifteen pins open that we could interact with, but the controls and other features we were adding on required more than what was available. Our solution was to use a Makey Makey to handle the controls and feed that into the Raspberry Pi. That left enough pins to control coin detection, audio, a way to escape the emulator, and a way to select the number of players.
If we’re running into design and technology considerations with modern computing devices, what sort of things did the original game designers run into? And how do their decisions reflect something about the technological history of the 1980s?

But I don’t study those questions. Maybe they’re obvious questions to those who do. So why am I interested in creating a game emulator within a 1983 gaming desk?

As a teaching tool, I find physical computing immensely compelling. I could envision a course on the history of computing or the history of electronics that asks undergraduates to re-create historical objects using modern-day technology. But our technology today – like any technology – carries limitations of function, energy, availability, affordability, and so on. Design decisions are made throughout the process. As students ask questions about their current technology, that opens up an opportunity to ask questions about historical technology. With those questions in mind, plus their physical interaction with the device, we could turn students towards primary sources to understand the historical limitations of technology. We come to understand the past through a material engagement with the present.

And as a public history tool, physical computing likewise holds promise. I could envision an opportunity to use such devices to prompt users to think about the historical preservation of electronics, electronic media, and games. Or, to completely repurpose something like a gaming cabinet as an alternative interaction device. You could, for example, set up the cabinet to run Google Earth and allow the game controls to be a method of guiding yourself through historically-recreated virtual worlds. Or replace the cabinet with a physical globe that controls Google Earth. What other perspectives could physical computing lend to the ways that we appreciate electronics or interact with explanatory devices?
Anyway, some initial thoughts about the intersection of physical computing with digital history. We’re doing a lot more in the courses and I have many more ideas swirling about – 3D printing, historical reconstruction of electronics and devices, and so on – but those will come later.