DuraCloud Helps to Ensure Future of World's Largest Archive of Social Science Data & Historic North Carolina Materials

Winchester, MA - The Inter-University Consortium for Political and Social Research (ICPSR) is an international consortium of about 700 academic institutions and research organizations housing the largest archive of social science data (500,000+ files) in the world. With more than 50 years of leadership and training in data access, curation, and methods of analysis, ICPSR continues to paint an evolving picture of social trends and ideas across the United States and the world by making demographic, social, and geographical data available.

The North Carolina Department of Cultural Resources, which includes the North Carolina Division of Archives and Records (http://www.archives.ncdcr.gov) and the State Library of North Carolina (http://statelibrary.ncdcr.gov/), collects, preserves, and utilizes the state's rich historic resources to ensure that the character, cultural identity, and direction of North Carolina will emerge from its historic heritage.

BEST CLOUD PRACTICES AT NC STATE

The State Library and State Archives of North Carolina needed a preservation solution for a growing corpus of digitized government materials. They were specifically looking for an offsite cloud service that would meet the legal requirements of the State of North Carolina's contracts office. DuraCloud met all of those requirements and also provided direct integration with Amazon Web Services and Chronopolis, a digital preservation network that provides services for long-term preservation and curation of digital content. DuraCloud is the only cloud service that offers direct integrations with multiple cloud storage providers, which was a big advantage for the State of North Carolina. They were able to transfer their content to DuraCloud in one step and have copies created and stored in both Amazon and Chronopolis.

USING AND PRESERVING ICPSR DATA

Typical ICPSR online users include researchers and graduate students in the social sciences, including sociology, political science, and criminal justice, to name just a few disciplines where social science data is essential. They search data using ICPSR's faceted search, find data sets with documentation, and add them to a shopping cart. Less research-oriented users, such as government employees or media representatives, can perform quick background investigations by looking at comparative data in tables.

But it's not just about the numbers. ICPSR has normalized, scrubbed and increased the value of raw data sets at their web site by adding metadata, documentation, and tools that make "seeing the numbers" easier and more satisfying.

No matter how individuals or organizations use the data, Bryan Beecher, Director, Computing and Network Services for ICPSR and a DuraCloud customer, is required to make sure that all six terabytes of enhanced and raw ICPSR data are available far into the future. ICPSR preservation policy dictates that additional copies of their archival content collection be maintained in various locations: locally, with partner institutions, and directly with Amazon. Beecher's ideal "magical future world" would allow him to "push a button to replicate content across six cloud providers."

Moving towards that "magical future world" is a key reason ICPSR has been a DuraCloud user since the very beginning of DuraCloud development and testing. Their initial use case was hosting one of their archival copies in the cloud, and it expanded to take advantage of DuraCloud's integration with multiple storage providers. ICPSR also sought to make use of DuraSpace business relationships with commercial cloud providers rather than managing several vendor relationships independently. Beecher says, "I would much rather engage with a vendor like DuraCloud than a series of agreements with individual cloud operators to ensure long-term duplicate storage options."

WHEN THE POWER GOES OFF

Due to the recent Amazon Web Services power outages, one of ICPSR's independently managed copies of archival content stored directly with Amazon was damaged (http://www.informationweek.com/news/cloud-computing/infrastructure/240002170). Rather than spend time and money refreshing that copy, ICPSR decided to simply store another copy in DuraCloud*, but this time within the San Diego Supercomputer Center (SDSC) storage location. Now, ICPSR maintains two replica copies of their content in DuraCloud: one in Amazon and the other in SDSC. Each copy has automated, ongoing bit-level integrity checking that is reported through the DuraCloud web interface.
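As a rough illustration of what bit-level integrity checking across two replicas involves, the Python sketch below records a checksum for each file and later confirms that every replica still matches. It is not DuraCloud's implementation or API: the function names, the choice of SHA-256, the sample file names, and the in-memory "amazon" and "sdsc" stores are all assumptions made for the example.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(content: dict[str, bytes]) -> dict[str, str]:
    """Record the expected checksum of every item at deposit time."""
    return {content_id: checksum(data) for content_id, data in content.items()}


def audit_replicas(
    manifest: dict[str, str],
    replicas: dict[str, dict[str, bytes]],
) -> dict[str, list[str]]:
    """For each store, list the items that are missing or no longer match the manifest."""
    report = {}
    for store, objects in replicas.items():
        failed = []
        for content_id, expected in manifest.items():
            data = objects.get(content_id)
            if data is None or checksum(data) != expected:
                failed.append(content_id)
        report[store] = sorted(failed)
    return report


# Hypothetical deposit: two small files standing in for archival content.
original = {
    "study-0001.csv": b"id,age\n1,34\n",
    "codebook.txt": b"age: respondent age\n",
}
manifest = build_manifest(original)

# Two hypothetical replicas; one byte of the SDSC copy of codebook.txt is altered.
replicas = {
    "amazon": dict(original),
    "sdsc": {**original, "codebook.txt": b"age: respondent agf\n"},
}

report = audit_replicas(manifest, replicas)
print(report)
```

In this sketch the audit flags only the altered file in the second store, which is the kind of result a periodic integrity report would surface so that a damaged copy can be restored from a healthy one.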

ENSURING THE FUTURE OF SCHOLARLY RESOURCES

Best cloud practices in preservation of digital content suggest that replication of content with multiple storage providers in multiple geographic locations is ideal, and sometimes required, to meet government digital preservation guidelines. With the added advantage of ongoing bit integrity checking, valuable content such as the historic ICPSR corpus of social science data and rich North Carolina cultural content will survive to provide future citizens and scholars with an in-depth understanding of social, historic, and demographic trends.

* Blog post: "Amazon's Loss is SDSC's Gain": http://techaticpsr.blogspot.com/2012/07/amazons-loss-is-sdscs-gain.html, Technology at ICPSR
