Notes on Spatial Data and Representation in the Humanities in a Rapidly Changing Digital Environment

This project seeks to convey spatial data on the internet in a manner that updates the traditional form of the atlas. As a compendium of maps arranged geographically or thematically, often bound as a book or codex, an atlas promises to present spatial information at a variety of scales but with a consistent narrative: even when one delves to the microscale, one retains the feeling that a hand has shaped the narrative pathway down to the most detailed map, and that each map is included for a reason. This project takes that spatial-narrative structure and updates it to offer an alternative to the proliferation of "digital humanities" projects that display spatial data and ask the viewer to reach her own conclusions. Instead, this project does away with the seeming default anonymity of the internet and reintroduces the apparent hand of the author, who provides guidance in the interpretation of the mountain of data.

The atlas was conceived as an instrument to aid in outreach from Dumbarton Oaks to the surrounding community. With an audience of high school students and the nonspecialist public in mind, the atlas is meant to be a resource that provides insight into the way in which water, and the infrastructure that deals with water, shaped various aspects of the design and development of the City of Washington and the region around the District of Columbia. The atlas is primarily a graphic resource, but early in the research and design of the project I realized that text and other kinds of visual material would usefully supplement the maps themselves. Imagining a tenth grader looking for material on Rock Creek Park, I envisioned a path through the atlas that would not only show the park boundaries but also begin to make connections to issues such as topography and drainage. The path would lead past various connections to other aspects of the city and landscape and end at suggestions for further reading or research (which also happened to be the sources I consulted to make the atlas in the first place). As I gathered data early in the project, I came to realize that my job as cartographer and "digital historian" was to carve these pathways into the historical maps and data, and that my tools would be drawing and curating.

Description of Methodology

The libraries and collections of the District contain vast amounts of spatial data; the problem is that much of it has not been digitized. Perhaps half the work of the atlas went into finding, researching, and digitizing historical and current spatial data. About one-third of the data used to produce the atlas was gleaned from public-domain government sources, primarily the District of Columbia Government and the U.S. Geological Survey (USGS), both of which maintain accessible databases of spatial data. One of the primary victories of the water atlas was the compilation of all of this data, previously located on a website here or a server there, into one location. All of the maps were then generated from this single library.

Digitization of spatial data is laborious. The process is largely as follows: a historical map is scanned, or a high-resolution photograph is taken of it. The resulting image is processed to correct any curvature or other distortion from the camera's lens, and any wrinkles or tears in the paper map that would compromise the fidelity of the data the map conveys. The image is then georeferenced: known longitude and latitude coordinates are assigned to corresponding points on the scanned map, and a computer program then reinterprets all of the pixels in the image, in effect redrawing each pixel to correspond with the space on the globe it represents and stretching the digital image to conform to the surface of the virtual earth. Tracing of features may then begin. This is the creation of the true "spatial data," where points, lines, and polygons are defined by latitude and longitude coordinates and arranged in relationship to one another to create features we would recognize on a map. Once digitized, the data can be shown in relation to other data, and spatial features drawn from diverse sources can be compared, overlain, and juxtaposed.
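The georeferencing step described above can be illustrated with a short sketch. What follows is not the routine that QGIS or any other program actually runs; it is a minimal example, in plain Python, of one of the simpler transformations a georeferencing tool might offer: an affine fit, by least squares, from pixel positions on a scan to longitude and latitude. The control points, the function names, and the coordinates are all hypothetical, and the sketch assumes a map of a small area, where treating longitude and latitude as flat coordinates is a tolerable approximation.

```python
"""Fit an affine transform from scan pixels to longitude/latitude (illustrative only)."""


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            # A (numerically) singular system: too few or collinear control points.
            raise ValueError("control points do not determine a transform")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(control_points):
    """Least-squares affine fit.

    control_points is a list of ((pixel_x, pixel_y), (lon, lat)) pairs.
    Returns two coefficient triples, so that
        lon = a * px + b * py + c
        lat = d * px + e * py + f
    """
    if len(control_points) < 3:
        raise ValueError("at least three control points are needed")
    rows = [(px, py, 1.0) for (px, py), _ in control_points]
    # Normal equations: (A^T A) coeffs = A^T targets
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in (0, 1):  # 0 = longitude, 1 = latitude
        atb = [
            sum(r[i] * world[k] for r, (_, world) in zip(rows, control_points))
            for i in range(3)
        ]
        coeffs.append(solve3(ata, atb))
    return tuple(coeffs)


def pixel_to_lonlat(coeffs, px, py):
    """Apply the fitted transform to one pixel position."""
    (a, b, c), (d, e, f) = coeffs
    return (a * px + b * py + c, d * px + e * py + f)


# Hypothetical control points: the four corners of a 1000 x 1000 pixel scan,
# matched to made-up coordinates in the general vicinity of Washington, D.C.
CONTROL_POINTS = [
    ((0, 0), (-77.12, 39.00)),
    ((1000, 0), (-76.90, 39.00)),
    ((0, 1000), (-77.12, 38.80)),
    ((1000, 1000), (-76.90, 38.80)),
]

if __name__ == "__main__":
    transform = fit_affine(CONTROL_POINTS)
    print(pixel_to_lonlat(transform, 500, 500))
```

An affine fit can shift, scale, rotate, and shear an image, but it cannot correct the uneven stretching of a wrinkled or hand-drawn sheet. A historical map of that kind would likely need more control points and a more flexible transformation than the one sketched here.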

Most "spatial humanities" work ends here, with rough presentations of vector and polygonal data. This atlas project takes the interpretation of spatial data a step further. Predigital maps were works of art in themselves. Techniques of representation, such as how to portray landforms or how to convey the sense of a city or town, were skills that took many years to master; they exhibited a variety of subtleties and revealed the talent and dedication of the maker. The care involved in constructing these maps also made them more effective communicators.

To achieve the "blueprint blue" consistency across all of the maps of the atlas, each map went through a significant amount of processing between its output from QGIS and its arrival on the website. Each map was carefully drawn, with attention paid to lineweights, texture, and contrast. I would argue that visual consistency is more than just a convenient standard: consistent representation promotes the view that there is consistent analysis.

Archiving and Data

In the past (only about six years ago), GIS data had to be housed on your local hard drive in order for the program you used to access it and run processor-intensive operations over it to produce maps. The data was also largely of a proprietary filetype, the .shp of ESRI, and one needed an expensive and burdensome (roughly 1 GB) suite of ESRI programs in order to view or manipulate spatial data.

This is thankfully no longer the case. All of the data produced for the atlas, and the maps made from that data, relied on an excellent open-source program called QGIS (www.qgis.org). QGIS incorporates tools such as GRASS GIS, government software originally developed by the U.S. Army Corps of Engineers, and it was revolutionary in that it built an intuitive and well-designed graphical user interface over such black-box software and distributed it freely over the web. QGIS can read the old ESRI filetypes, but it can also make use of the distributed, or cloud-based, data sources that are quickly becoming the norm and that I expect to become the standard way of housing spatial data in the future.

Much of the spatial data I've found is in the process, or on the verge, of being translated into Extensible Markup Language (XML). XML is a versatile data format that is especially useful because it was created to be both human and machine readable. I found it fairly easy to become familiar with, and most of the top-layer interactions in the atlas were created in Adobe Illustrator, exported as XML (which the web browser reads as a Scalable Vector Graphic, or SVG), and then modified by hand.
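Because an SVG file is ordinary XML, the kind of hand modification described above can also be done with a few lines of code. The sketch below is only an illustration and does not reproduce the atlas's actual markup: the sample drawing, the id "rock-creek", the class name "interactive", and the function name are all hypothetical. It uses Python's standard xml.etree.ElementTree module to find one drawn feature, give it a class that a stylesheet or script could respond to, and attach a title element, which browsers commonly show as a tooltip.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without a namespace prefix (<path>, not <ns0:path>).
ET.register_namespace("", SVG_NS)

# A tiny stand-in for an exported drawing; the path is invented for illustration.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <path id="rock-creek" d="M10 90 L40 50 L60 20" fill="none" stroke="#ffffff"/>
</svg>"""


def annotate_feature(svg_text, feature_id, label, css_class):
    """Return the SVG text with a class and a <title> added to one element.

    Raises KeyError if no element carries the requested id.
    """
    root = ET.fromstring(svg_text)
    for element in root.iter():
        if element.get("id") == feature_id:
            element.set("class", css_class)
            title = ET.SubElement(element, f"{{{SVG_NS}}}title")
            title.text = label
            return ET.tostring(root, encoding="unicode")
    raise KeyError(feature_id)


if __name__ == "__main__":
    print(annotate_feature(SAMPLE_SVG, "rock-creek", "Rock Creek", "interactive"))
```

The same change could just as well be typed into the file in a text editor, which is the point of a format that is both human and machine readable.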

At the other end of the spectrum is the USGS, which is using XML to house impossibly vast amounts of data on its servers and to provide the public with easy access to portions of the database. For example, the USGS provides elevation data at a variety of grains for the entirety of the United States. By querying the USGS server (essentially, establishing a connection and telling the server what area I was interested in), I was able to download data and generate the shaded relief maps that form the background of many of the atlas's maps. Though it is beyond my skills as a computer user, I believe that this conception of data, in which human- and machine-readable data lives on an archive's server and can be accessed (but not downloaded) by an outside researcher, is the direction in which these questions of data storage and access are heading.
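For readers curious about how a shaded relief map is derived from elevation data, the following is a minimal sketch in plain Python. It is not the method QGIS uses and not the processing applied to the atlas's maps; it simply estimates the slope of the terrain at each cell of an elevation grid and measures how directly that surface faces an imagined light source. The function name, the default light position (from the northwest, 45 degrees above the horizon), and the sample grids are assumptions made for illustration. The sketch also assumes that row 0 of the grid is the northern edge and that elevations and cell size share one unit, such as meters.

```python
import math


def hillshade(elev, cell_size, azimuth_deg=315.0, altitude_deg=45.0):
    """Return a grid of illumination values between 0 (dark) and 1 (fully lit).

    elev is a list of rows of elevations, row 0 being the northern edge.
    azimuth_deg is the compass direction the light comes from (clockwise
    from north); altitude_deg is its angle above the horizon.
    """
    n_rows, n_cols = len(elev), len(elev[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("the grid needs at least two rows and two columns")

    az = math.radians(azimuth_deg)
    alt = math.radians(altitude_deg)
    # Unit vector pointing toward the light: x = east, y = north, z = up.
    light = (
        math.sin(az) * math.cos(alt),
        math.cos(az) * math.cos(alt),
        math.sin(alt),
    )

    shaded = []
    for r in range(n_rows):
        north, south = max(r - 1, 0), min(r + 1, n_rows - 1)
        out_row = []
        for c in range(n_cols):
            west, east = max(c - 1, 0), min(c + 1, n_cols - 1)
            # Finite differences; at the grid edges these become one-sided.
            dz_dx = (elev[r][east] - elev[r][west]) / ((east - west) * cell_size)
            dz_dy = (elev[north][c] - elev[south][c]) / ((south - north) * cell_size)
            # The surface normal is (-dz/dx, -dz/dy, 1), scaled to unit length.
            length = math.sqrt(dz_dx ** 2 + dz_dy ** 2 + 1.0)
            normal = (-dz_dx / length, -dz_dy / length, 1.0 / length)
            brightness = sum(n * l for n, l in zip(normal, light))
            out_row.append(max(0.0, brightness))
        shaded.append(out_row)
    return shaded


if __name__ == "__main__":
    # A made-up 3 x 4 grid sloping upward to the east, with 30 m cells.
    sample = [[30.0 * c for c in range(4)] for _ in range(3)]
    for row in hillshade(sample, 30.0):
        print(["%.2f" % value for value in row])
```

Each value can then be mapped to a shade of gray or, in the case of the atlas, to a tint of blue. Real elevation data delivered in degrees of longitude and latitude would first need to be projected so that horizontal and vertical distances are comparable.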

Critique of Typical Presentations of Spatial Data

American Panorama, a project out of the University of Richmond, is impressive and well-designed, and represents all that is right and wrong with the pursuit of the digital spatial humanities today. With a $500,000 budget (from a Mellon Foundation grant), two full-time historians, and four technicians, it is a difficult behemoth to compete with, but it does provide a reasonably objective measure of just how much work and how many resources one of these projects needs.

The aspect of American Panorama and other projects of its ilk that I would most like to question is the continental-scale conception of the representation of space. Even if one can zoom in on a particular canal, it remains apparent that the project is essentially a presentation of data without any interpretation. The canal remains a polyline, a series of coordinates linked in a chain, only one step up from a database entry, and it conveys no sense of space or place. These digital humanities projects rest on the laurels of the act of database creation, even if it means the landscape will always remain generic.

The Water Atlas, because of extensive processing, does convey a sense of place and space in the structures and landscapes it considers. A combination of cartography and orthographic drawing conventions, the Water Atlas resists the ubiquitous tendency to reduce events, structures, and landscapes to icons. Instead, it portrays these structures as visible at an urban scale, and the drawings reflect the true impact of these structures on the landscape. Though not as fluid and seamless as the Google Maps-powered Panorama, or Leaflet, or Neatline, the atlas's orthographic, architectural-scale drawings convey information at the middle scale necessary for the representation and analysis of landscape.

Conclusion: The Role of the Scholar

When digital humanities websites are considered in the light of the traditional publishing dynamic, it is easy to become frustrated. Websites like the Digital Hadrian's Villa project start as pet projects of a particular scholar or expert, are worked on piecemeal by graduate students, experience growth spurts when they receive grants, and then become encumbrances to the institutions that get stuck hosting them. This is because we think of these projects as we think of academic articles: we provide manuscripts, and the institution edits, lays out, publishes, and then houses paper replicas of the work in perpetuity. But websites don't work that way. Publishing on the web is more akin to writing on paper that will disintegrate or fragment within three years.

HTML5 has lent some stability to the web, and there is a push toward standardization. But even the most foresighted designer using the best, most versatile technology will have to revisit and fix the machine eventually, and perhaps even give it a full overhaul. Websites become outdated and need to be updated. Relying on the IT department of Harvard or Dumbarton Oaks, or conscripting grad students, has worked in the past, though often at great expense because of the steep learning curve involved in learning how a bespoke machine works.

Media platforms and content management systems are specifically meant to avoid this dilemma by providing a stable infrastructure that will last for eternity. In actuality, these likely won't last for eternity either, and digital humanities is instead relegated to existing as the simplest form of stored data: the archive. The structures of interpretation and narrative built around that archive don't seem to last. There have been a few recent critiques of these "digital humanities" platforms, stating that from videotaped lectures to content management systems, these are all simply tentacles of the neoliberalization of the university, and that the media that scholars are forced to use are increasingly dictated by the whims of Silicon Valley.

I remain skeptical of this interpretation. However, I do believe that scholars should neither rely on whatever medium is currently in fashion nor hand over data to technicians who will faithfully present all of it but are not interested in subtle questions of interpretation and meaning. Instead, the scholar needs a new attitude toward publication and the institution that houses her work. Rather than simply sending a publication out into the ether, the scholar should consider digital humanities projects as uniquely constructed aspects of her presence in the digital world. Maintaining that presence is as important as publishing, and the onus is not on the library to keep copies of your publications, but on you to engage in a continual process of building, rebuilding, and experimenting in the scholarly media that have a presence on the internet. This may require a reevaluation of the old contractual relationship between scholar and press: the scholar is now in charge of maintaining the "paper," but the library, archive, or university may have other ways of supporting scholarly endeavor.