Introducing Endangered Data Week

Last week, April 17-21, was the first annual Endangered Data Week. With the generous support of the Digital Library Federation, CLIR, the Mozilla Science Lab, and Data Refuge, we launched this international effort to help raise awareness around the creation, retention, and sustainability of data and data collection efforts.

The effort was the brainchild of Brandon Locke of Michigan State University, and quickly gained support from many of us across university campuses, nonprofit organizations, libraries, citizen science initiatives, and cultural heritage institutions. The establishment of Endangered Data Week comes at a particular political moment in the United States, when public datasets appear to be in danger of being deleted, repressed, mishandled, or lost. We seek to promote and raise awareness about the sustainability of publicly available datasets; offer opportunities for critical engagement with datasets through analysis and visualization; and boost advocacy for open data policies and essential data skills such as curation, documentation, discovery, access, and preservation.

What is Endangered Data?

The destruction of public data is not just a phenomenon of the digital age. As Locke described recently in Perspectives on History, the famous example of the 1890 American Census represents just one of many historical moments where data was lost to neglect. But there are many ways data can become endangered.

Budgets

Perhaps the biggest threat to data sustainability is financial. The Sunlight Foundation has argued that “Congress [is] defunding agencies in a way that affects their ability to collect or maintain or disclose data.” Further policy maneuvers by the Trump Administration back up that assertion: agencies that study climate change have seen their budgets slashed. The Department of Commerce is set to lose $1.5 billion from its budget, and the Environmental Protection Agency’s budget will be cut by 30%.

Such moves have affected data and data collection in other countries before. In 2010, Canada’s Harper government decided to make its national census voluntary instead of mandatory. Important public health data was suddenly no longer collected by the government, and localities did not have the resources to conduct their own surveys, leaving them to develop and implement policies based on outdated information.

Censorship and repression

Censorship and repression are perhaps the most obvious ways public data is threatened. In mid-February, the Trump Administration scrubbed the open.whitehouse.gov repository of datasets created and stored there under the Obama Administration. The National Archives and Records Administration has made an archived version available, but an assessment by the Sunlight Foundation expresses only “low to moderate confidence” in its completeness. The positions of CIO and White House chief digital officer, created under the Obama Administration, were meant to guide programs for collecting government data and making it freely accessible, but those positions remain unfilled in the Trump Administration. There has been only one confirmed case of the outright removal of data (animal welfare records tracked by the USDA), but other takedowns and disappearances may not be far off. Environmental advocates are particularly wary of data losses, a concern that prompted the work of our Endangered Data Week partners at Data Refuge.

Neglect

Perhaps less nefarious but just as dangerous is the simple neglect of data. This may be willful, but it can also mean that data is simply considered too unimportant to continue collecting or curating. For example, NASA once overwrote precious magnetic tape footage of the Apollo 11 landing when it reused the tapes to record routine satellite data. The loss of NASA data helped lead to the creation of the Open Archival Information System (OAIS) reference model. Nor can you find many early episodes of the television show Dr. Who: the BBC deleted them from its archives, mostly for practical reasons such as limited storage space, the scarcity of material, and a lack of rebroadcast rights.

What You Can Do

While some of our efforts have been spurred on by the current political scene in the United States, dangers to data are neither unique to this moment nor limited geographically.

April 24, 2017 @jaheppler