DuraCloud Helps to Ensure Future of World's Largest Archive of Social Science Data & Historic North Carolina Materials
Winchester, MA: The Inter-University Consortium for Political and Social Research (ICPSR) is an international consortium of about 700 academic institutions and research organizations that houses the largest archive of social science data (500,000+ files) in the world. Drawing on more than 50 years of leadership and training in data access, curation, and methods of analysis, ICPSR continues to paint an evolving picture of social trends and ideas across the United States and the world by making demographic, social, and geographical data available.
The North Carolina Department of Cultural Resources, which includes the North Carolina Division of Archives and Records (http://www.archives.ncdcr.gov
BEST CLOUD PRACTICES AT NC STATE
The North Carolina State Library and Archives needed a preservation solution for a growing corpus of digitized government materials. They were specifically looking for an offsite cloud service that would meet the legal requirements of the State of North Carolina's contracts office. DuraCloud met all of those requirements and also provided direct integration with Amazon Web Services and Chronopolis, a digital preservation network that provides services for long-term preservation and curation of digital content. DuraCloud is the only cloud service that offers direct integrations with multiple cloud storage providers, which was a big advantage for the State of North Carolina. They were able to transfer their content to DuraCloud in one simple step and have copies created and stored seamlessly in both Amazon and Chronopolis.
Typical ICPSR online users include researchers and graduate students in sociology, political science, criminal justice, and other disciplines where social science data is essential. They search data using ICPSR's faceted search, find data sets with documentation, and add them to a shopping cart. Less research-oriented users, such as government employees or media representatives, can perform quick background investigations by looking at comparative data in tables.
But it's not just about the numbers. ICPSR has normalized, scrubbed and increased the value of raw data sets at their web site by adding metadata, documentation, and tools that make "seeing the numbers" easier and more satisfying.
No matter how individuals or organizations use the data, Bryan Beecher, Director, Computing and Network Services for ICPSR and a DuraCloud customer, is required to make sure that all six terabytes of enhanced and raw ICPSR data are available far into the future. ICPSR preservation policy dictates that additional copies of their archival content collection be maintained in various locations: stored locally, stored with partner institutions, as well as stored directly with Amazon. Beecher's ideal "magical future world" would allow him to "push a button to replicate content across six cloud providers."
Moving towards that "magical future world" is a key reason ICPSR has been a DuraCloud user since the very beginning of DuraCloud development and testing. Their initial use case was hosting one of their archival copies in the cloud; it later expanded to take advantage of DuraCloud's integration with multiple storage providers. ICPSR also sought to make use of DuraSpace business relationships with commercial cloud providers rather than managing several vendor relationships independently. Beecher says, "I would much rather engage with a vendor like DuraCloud than a series of agreements with individual cloud operators to ensure long-term duplicate storage options."
WHEN THE POWER GOES OFF
Due to the recent Amazon Web Services power outages, one of ICPSR's independently managed copies of archival content stored directly with Amazon was damaged (http://www.informationweek.
Best cloud practices in the preservation of digital content suggest that replicating content with multiple storage providers in multiple geographic locations is ideal, and sometimes required, to meet government digital preservation guidelines. When ongoing bit integrity checking is added to the mix, valuable content such as the historic ICPSR corpus of social science data and rich North Carolina cultural content will survive to provide future citizens and scholars with an in-depth understanding of social, historic, and demographic trends.
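For readers curious about what bit integrity checking involves, the idea can be sketched in a few lines of Python. This is a minimal illustration only, not DuraCloud's implementation: it assumes a SHA-256 checksum was recorded when the content was first deposited, and the function names (`sha256_hex`, `find_damaged_replicas`) and the replica labels are hypothetical.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_replicas(expected_checksum: str,
                          replicas: Dict[str, bytes]) -> List[str]:
    """Return the names of replicas whose checksum no longer matches.

    expected_checksum: checksum recorded when the content was deposited
                       (an assumption of this sketch).
    replicas: maps a hypothetical storage location name to the bytes
              currently held there.
    """
    return sorted(
        name
        for name, content in replicas.items()
        if sha256_hex(content) != expected_checksum
    )


# Illustrative data: three copies of one file, one of them altered.
original = b"survey data, wave 1"
recorded = sha256_hex(original)
copies = {
    "local": original,
    "partner": original,
    "cloud": b"survey data, wave 0",  # simulated damage
}
damaged = find_damaged_replicas(recorded, copies)  # ["cloud"]
```

Because intact copies remain at other locations, a damaged replica found this way can be replaced from one that still matches the recorded checksum, which is why replication and integrity checking are recommended together.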
* Blog post: "Amazon's Loss is SDSC's Gain": http://techaticpsr.blogspot.