DuraCloud + DSpace@MIT: Easy Cloud Access for Collection Managers and Curators

Mon, 2010-12-20 13:49 -- carol

Ithaca, NY -- How, exactly, do collections of digital resources get parked in a "cloud"? There's a lot of chatter about cloud storage, particularly among higher education IT groups looking to save money on data storage [1]. But for those accustomed to hands-on hard drives and racks of processors, the idea of looking for sturdy digital back-up and customized services from something as ephemeral-sounding as a "cloud" does not compute.

The fact is that most of us are already "cloud" users without knowing it:

"As you sit hunched over your laptop at home watching a YouTube video or using a search engine, you’re actually plugging into the collective power of thousands of computers that serve all this information to you from far-away rooms distributed around the world. It’s almost like having a massive supercomputer at your beck and call, thanks to the Internet. This phenomenon is what we typically refer to as cloud computing." [2]

Commercial cloud providers like Amazon EC2 [3] patch together the unseen and distributed supercomputer that makes it easy to do things like watch movies online or share photos. They offer users a parking lot in the cloud for digital resources, but little else in the way of security, services, or tools to work with data once it's "up in the cloud." With the launch of DuraCloud in 2011, users will have access to a robust set of tools and services that will allow them to manage copies of their data in the cloud from wherever they are. This concept intrigued MIT Libraries when they signed on to be part of the DuraCloud Pilot Program. They wanted to find out whether DuraCloud could help maintain multiple copies of their DSpace repository off-site. This core DuraCloud service is known as "replication." It is a key benefit for institutions like MIT that are looking for simple and cost-effective strategies for preserving digital content.

DSpace@MIT [4] took a unique approach to deploying DuraCloud technology. Richard Rodgers, Head of Software Development, MIT Libraries, wanted to make it easy for repository collection managers and curators to use cloud storage and replication services without interrupting repository workflow. That workflow carries real traffic: DSpace@MIT has enabled 15.2 million downloads in 2010 to date. With that kind of success, Rodgers decided to integrate access to DuraCloud services into the existing DSpace user interface using DuraCloud’s client Java API. The goal for MIT Libraries is to add cloud replication and storage functionality inside the already familiar DSpace interface so that users without technical experience will be able to use DuraCloud services right away. He feels that this capability will allow repository managers and curators to "fine tune" preservation decisions by allowing for quick collection-by-collection operations, for example, with MIT's now well-known Open Access collections.

MIT's groundbreaking open access policy has led to the creation of an accessible and evolving scholarly record through DSpace@MIT. With the release of the DuraCloud open source APIs and libraries, institutions like MIT are free to develop their own customized solutions for maintaining their collections using cloud technologies.

1. http://www.duraspace.org/node/1070

2. http://www.20thingsilearned.com/cloud-computing/1

3. http://aws.amazon.com/ec2/

4. http://dspace.mit.edu/