Last week, April 17-21, was the first annual Endangered Data Week. With the generous support of the Digital Library Federation, CLIR, the Mozilla Science Lab, and Data Refuge, we launched this international effort to help raise awareness of the creation, retention, and sustainability of data and data collection efforts.
The effort was the brainchild of Brandon Locke of Michigan State University and quickly gained support from many of us across university campuses, nonprofit organizations, libraries, citizen science initiatives, and cultural heritage institutions. The establishment of Endangered Data Week comes at a particular political moment in the United States, when public datasets appear to be in danger of being deleted, repressed, mishandled, or lost. We seek to promote and raise awareness about the sustainability of publicly available datasets; offer opportunities for critical engagement with datasets through analysis and visualization; and boost advocacy for open data policies and essential data skills such as curation, documentation, discovery, access, and preservation.
What is Endangered Data?
The destruction of public data is not just a phenomenon of the digital age. As Locke described recently in Perspectives on History, the well-known case of the 1890 U.S. Census is just one of many historical moments when data was lost to neglect. But there are many ways data can become endangered.
Lack of funding
Perhaps the biggest threat to data sustainability is financial. The Sunlight Foundation has argued that “Congress [is] defunding agencies in a way that affects their ability to collect or maintain or disclose data.” The Trump Administration’s budget proposals back up that assertion by targeting agencies that study climate change for deep cuts: under the proposed budget, the Department of Commerce would lose $1.5 billion, and the Environmental Protection Agency’s budget would be cut by 30%.
Such moves have affected data collection in other countries before. In 2010, Stephen Harper’s government in Canada replaced the mandatory long-form census with a voluntary survey. Important public health data was suddenly no longer reliably collected by the government, and localities did not have the resources to conduct their own surveys. This left them to develop and implement policies with outdated information.
Censorship and repression
Censorship and repression are perhaps the most obvious ways public data is threatened. In mid-February, the Trump Administration scrubbed the open.whitehouse.gov repository of datasets created and stored there under the Obama Administration. The National Archives and Records Administration has made an archived version available, but an assessment by the Sunlight Foundation expresses only “low to moderate confidence” in its completeness. The positions of CIO and White House chief digital officer, created under the Obama Administration, were meant to guide programs for collecting government data and making it freely accessible, but both remain unfilled in the Trump Administration. There has been only one confirmed case of outright data removal (animal welfare records tracked by the USDA), but other takedowns and disappearances may not be far off. Environmental advocates are particularly wary of data losses, a concern that prompted the work of our Endangered Data Week partners at Data Refuge.
Neglect
Perhaps less nefarious but just as dangerous is the simple neglect of data. Neglect may be willful, but it also covers data simply considered too unimportant to keep collecting or curating. For example, NASA once overwrote irreplaceable magnetic tapes of the Apollo 11 landing when it reused them to record routine satellite data. Losses of NASA data helped lead to the creation of the Open Archival Information System (OAIS) reference model. Many early episodes of the television show Doctor Who are also gone: the BBC deleted archived recordings, mostly for practical reasons such as storage space, scarcity of material, and lack of rebroadcast rights.
What You Can Do
While some of our efforts have been spurred by the current political scene in the United States, dangers to data are neither unique to this moment nor limited to one country. Here are some ways you can help:
- Help the Internet Archive (IA) preserve web pages. With its Wayback Machine extension for Chrome, IA will point you to an archived copy when you stumble onto an error page (a 404, for example). If no archived copy exists, the extension lets you click “Save Page Now” to submit the page for crawling and archiving by IA.
- Host a Data Refuge event to archive climate data. Data Refuge events combine the work of “seeders,” “baggers,” “tool builders,” “metadata,” and “storytellers and documenters” along the “long trail” of collecting data, telling stories, and thinking about the far future.
- Americans should stay alert to several issues on the horizon: H.R. 1305 threatens to undercount minority populations in federal databases; S. 2852, the Open Government Data Act, passed the Senate last year but stalled in the House; and S. 103 and H.R. 482 would prohibit funding used “to design, build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.” Contact your representatives to let them know that you oppose legislation that puts data collection and publication at risk and that you support sensible government data efforts.
- Don’t forget about your personal data. Europe’s data privacy legislation must be vigilantly defended. In the United States, Congress repealed the FCC’s broadband privacy rules at the end of March, allowing internet service providers to sell their customers’ browsing histories. Take steps to protect your private data.
- Participate in Endangered Data Week! The first annual Endangered Data Week (EDW) ran from April 17 to 21 this year, with local and virtual events around the world. Keep EDW in mind for next year, and check out the event index and map for ideas about what you could host.
- Most importantly, love your data. If you collect research data that raises no privacy concerns, or augment datasets acquired elsewhere in the course of your work, please make that data publicly available. Deposit it in shared, stable, and open repositories so that others can access it and help sustain it.