Winchester, MA: Heavy fog over Washington caused several attendees to miss the start of the Coalition for Networked Information (CNI) Fall Member Meeting held on December 10-11, 2012. Clifford Lynch, CNI Executive Director, began with an overview of strategic CNI activities in which he reminded the audience that "Scholarly practice does not stand still." This set the stage for several DuraSpace-related presentations that focused on programs, software and services that support access to and preservation of resources for scholarship.
E-Science Institute: An Approach to the Challenge of Digital Research
MacKenzie Smith, University of California, Davis, explained that the ARL/DLF/DuraSpace E-Science Institute (ESI) was designed as a hands-on course to help academic research libraries match their strengths and natural abilities to all aspects of the data lifecycle. The terms "E-Science" and "E-Research" were commingled in this approach, in which participants did research on their own campuses to come up with strategic agendas for their institutions. Taking a wide view of the data lifecycle offered participants a variety of ways to think about moving towards support for research and data preservation.
Part of the issue for libraries is that physical spaces need to be re-envisioned to maximize access to research data for faculty and students with visualization tools such as "walls and caves". Cross-pollination activities, such as librarians doing ontology creation for researchers, are another non-traditional way to consider expanding library services in support of E-Science.
Smith pointed out that support for E-Science is a logical part of libraries' traditional mission: collecting, preserving and making information available for scholars. The E-Science Institute provides methods and tools to complete the analysis and planning required to fit institutional abilities with aspects of support for the data lifecycle.
ESI was first offered in 2011 because demand for help in this area from research libraries had grown. The fall 2012 offering of the DuraSpace/ARL/DLF E-Science Institute was conducted online and in person, with a face-to-face capstone event that concluded on Dec 13 in Arlington, Virginia. Early feedback from this fall's course has been very positive, and the course continues to be in demand. To sign up to receive information about future offerings, visit: http://duraspace.org/esi-contact-form
Many thanks to the E-Science Faculty and Guest Faculty: Jake Carlson, Mike Furlough, Chris Shaffer, David Minor, Bill Michener and MacKenzie Smith for again designing and delivering this useful course.
Academic Preservation Trust (APTrust)
Robin Ruggaber, University of Virginia, explained that the APTrust is a critical piece of emerging preservation infrastructure because our data is in jeopardy. Lack of diversity in geographic regions where data is kept, unexpected weather events and even lagging political support can cause fragile scholarly resources to disappear. Preserving content is not in the commercial interest, which leaves it up to higher education, and in particular libraries, to preserve resources for scholarship over the long haul. That is why a consortium of academic institutions committed to the creation and management of academic and research content for multiple institutions came together to form the APTrust.
Twelve current partners make up the APTrust "Core implementation team", who agree that a successful effort will take more than storage solutions and technology. To make the APTrust sustainable, community building, business planning and marketing will be part of the effort.
Andrew Woods, DuraSpace, offered answers to the provocative question, "What do you get from the APTrust?" In the context of ensuring that content survives into the future, APTrust partners get redundant copies of their content, the ability to leverage economies of scale and generate audit reports, and access to an aggregated repository that will take advantage of DuraCloud technology for consortial work on top of the collection.
Woods, the APTrust technical architect, offered attendees a deep dive into the technical infrastructure. APTrust software will roll out in phases with an end-to-end data flow tie-in to DPN (Digital Preservation Network). How the data flow works:
• Local partner IRs are represented at the top layer in "3 flavors of repositories".
• Ingest packages and associated metadata are held in DuraCloud "Staging areas".
• When an ingest submission is complete and validated, it moves into a preservation space.
Instant and ongoing bit checking along with the ability to store content with multiple providers will be available through DuraCloud.
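The staging-then-preservation flow and the ongoing bit checking described above can be illustrated with a minimal Python sketch. It is not APTrust or DuraCloud code and does not use either project's API: the `Package` and `Store` classes, their method names, and the choice of SHA-256 are all assumptions made for illustration. The idea shown is simply that a package stays in staging until its checksum matches the one declared by the depositor, and that preserved content can be re-checked later.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Package:
    # Hypothetical ingest package: a name, its bytes, and the checksum
    # the depositing partner declared for it.
    name: str
    payload: bytes
    declared_checksum: str


@dataclass
class Store:
    # Hypothetical in-memory stand-ins for a staging area and a
    # preservation space.
    staging: dict = field(default_factory=dict)
    preservation: dict = field(default_factory=dict)

    def submit(self, pkg: Package) -> None:
        """Hold a submitted package in the staging area."""
        self.staging[pkg.name] = pkg

    def promote(self, name: str) -> bool:
        """Move a package to preservation only if its checksum validates."""
        pkg = self.staging[name]
        if sha256_hex(pkg.payload) != pkg.declared_checksum:
            return False  # invalid package stays in staging
        self.preservation[name] = self.staging.pop(name)
        return True

    def audit(self) -> dict:
        """Re-check every preserved package; True means the bits still match."""
        return {
            name: sha256_hex(pkg.payload) == pkg.declared_checksum
            for name, pkg in self.preservation.items()
        }


store = Store()
store.submit(Package("thesis.pdf", b"thesis bytes", sha256_hex(b"thesis bytes")))
store.submit(Package("damaged.tif", b"damaged bytes", sha256_hex(b"original bytes")))
promoted = {name: store.promote(name) for name in list(store.staging)}
print(promoted)
print(store.audit())
```

In this sketch the audit is run on demand; a real service would presumably schedule such checks and compare copies held by multiple storage providers.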
The APTrust is pursuing cost-effective disaster recovery, access services, long-term preservation, tools and best practices, and a hosted portal on top of collections, all to enhance preservation strategies and opportunities for partners.
Using the Cloud for Backup, Storage, and Archiving: Decision Factors, Experiences, and Use Cases Explored
DuraCloud is a DuraSpace service that allows users to store, manage and archive data in the cloud. There are currently 4.8 million files and 31 terabytes of data in DuraCloud. Over the last year 6 versions of the DuraCloud open source software were released with 11 new major features that included SDSC integration and automated health checking. All features are rolled up into the open source code so that anyone may freely download a full version of DuraCloud software.
Geneva Henry, Rice University, uses DuraCloud as part of an overall digital preservation strategy at her institution. All DSpace repository contents are stored in DuraCloud including digitization masters.
Rice University was an early DuraCloud pilot institution. They found that they could easily store and manage faculty publications, digital projects, archives and special collections, theses and dissertations and more.
Before content is stored in DuraCloud, files are packaged in archival information packages (AIPs). Capturing preservation metadata that reflects everything known about a record ensures resource provenance into the future. Rice is able to recreate its repository from what is stored in DuraCloud.
Many questions around digital preservation do not have a single answer. Henry believes that it is critical to be part of, and contribute to, ongoing experiments and new initiatives.
Holly Mercer, University of Tennessee, is using DuraCloud in the context of an overall digital preservation assessment focused on static collections.
They have established a production workflow that sends content to local storage, syncs it to Amazon S3 and SDSC, and finally uploads it to DuraCloud. They have been pleased that DuraCloud maintains the hierarchical structure of their institutional repository.
Mark Leggott, UPEI, Islandora and Discovery Garden, oversees 100+ institutions that are using the Islandora software. UPEI was also a pilot DuraCloud institution. The focus of their work with DuraCloud has been to pursue integration with the Islandora framework. DuraCloud has been added to the Islandora stack, aiming for a "single button disaster recovery" approach. They have five clients using Islandora and have built in "The Vault" to back up and preserve users' content with DuraCloud.
The DuraCloud sync tool is running in continuous mode in the Islandora stack. Leggott is impressed with the quality of the software that DuraSpace creates and has found that if you have a corrupt file in Islandora, you can use DuraCloud as a recovery mechanism.
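The recovery idea Leggott describes can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: it does not call Islandora or DuraCloud, the two dictionaries stand in for a local repository and a cloud backup, and the `recover_corrupt` function and the checksum manifest are hypothetical names. A file is restored from the backup only when the local copy fails its checksum and the backup copy passes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def recover_corrupt(primary: dict, backup: dict, manifest: dict) -> list:
    """Restore items in primary that fail their checksum, using backup.

    primary and backup map file names to bytes; manifest maps file names
    to the expected SHA-256 hex digest. Returns the names restored.
    """
    restored = []
    for name, expected in manifest.items():
        current = primary.get(name)
        if current is not None and sha256_hex(current) == expected:
            continue  # local copy is intact
        copy = backup.get(name)
        # Only trust the backup copy if it verifies against the manifest.
        if copy is not None and sha256_hex(copy) == expected:
            primary[name] = copy
            restored.append(name)
    return restored


# Hypothetical data: "b.jp2" is corrupt locally but intact in the backup.
backup = {"a.pdf": b"first object", "b.jp2": b"second object"}
manifest = {name: sha256_hex(data) for name, data in backup.items()}
primary = {"a.pdf": b"first object", "b.jp2": b"sec0nd obj3ct"}

restored = recover_corrupt(primary, backup, manifest)
print(restored)
```

Keeping the manifest separate from both copies is a design choice in this sketch: it lets the check detect corruption in either location rather than assuming the backup is always correct.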