Road Report from Sun PASIG

Wed, 2008-06-04 15:20 -- Anonymous (not verified)

San Francisco, CA. According to an IBM study, by 2010 the 1,407,724,920 people who use the internet will double the amount of digital information in the world every 11 hours. Should we keep it all? And if the answer is yes, how will that work exactly? Should it be accessible or will it land in a “dark” archive to be mined by future digital archeologists for its meaning and context?
Sun Microsystems hosted a group of leaders from institutions and organizations worldwide in San Francisco May 23-25, 2008 to consider these and other questions at the Preservation and Archiving Special Interest Group (PASIG) meeting. Presentations focused on technical and organizational strategies for managing and sustaining burgeoning collections of global knowledge that now include very large data sets. Institutional capacity for creating, managing and studying many petabytes of scientific and humanities data has increased as mass storage and information technologies have advanced.
Gail Truman, Business Development Manager, ST5800, and Eric Reid, ST5800 Developer, Sun Microsystems, kicked off the PASIG meeting with early adopter updates along with opportunities for organizations interested in the ST5800 “Honeycomb” system to take part in an open forum about content addressable storage trends.
A variety of Sun products address both structured and unstructured storage. The Honeycomb creates a storage cloud that gives you a “claim check” to go find data with object identifiers. The Honeycomb box is a “cell”–two cells fit into a rack. The system is designed to allow for services and applications that would normally be accessed in the compute layer to be available within the “cell.” Accomplishing on-the-fly data transformations is one example of what this bundled hardware/software solution allows.
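The “claim check” idea can be illustrated with a minimal sketch of content-addressable storage. This is not the Honeycomb API: the class name, the method names, and the choice of a SHA-256 digest as the object identifier are all assumptions made for illustration, and the report does not say how the ST5800 actually generates its identifiers.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store (hypothetical, not the ST5800 API).

    The object identifier is derived from the stored bytes, so the caller
    keeps only the identifier (the "claim check"), never a file path.
    """

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Assumption: the identifier is a SHA-256 digest of the content.
        oid = hashlib.sha256(data).hexdigest()
        self._objects[oid] = data
        return oid

    def get(self, oid: str) -> bytes:
        # Raises KeyError if the identifier is unknown.
        return self._objects[oid]


store = ContentAddressableStore()
claim_check = store.put(b"survey results, 2008")
retrieved = store.get(claim_check)
```

Because the identifier in this sketch depends only on the content, storing the same bytes twice returns the same claim check.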
Presentations from PASIG are available here: http://events-at-sun.com/pasig_spring/presentations/. A San Francisco Chronicle article entitled, “Librarians Discuss How to Store World’s Data” by Tom Abate about the meeting can be accessed here.
Humans have never before been challenged with having to save so much information. Historically people have been compelled by circumstance to “let stuff go,” albeit often unwillingly. The list of what many of us leave behind can include almost everything we care about–books, hard drives, memorabilia and artworks–to even bigger items such as houses and cars. Whether selling, donating, recycling, sharing or being forced to abandon our stuff, we are more or less wired to cope with the lifecycle of “things” as they intersect with our lives.
The ancient Alexandria Library, for example, was founded in the third century B.C. and painstakingly developed over time by scribes who hand-copied texts that often arrived by ship. When the library burned, scholars and citizens mourned the passing of an irreplaceable collection of knowledge. In 2003 a new Library of Alexandria was founded based on the same ancient scholarly traditions. The open and unique practice and style of early Egyptian scholarship persists in the new library even if the highly combustible scrolls and artifacts do not. In deciding which data should be preserved for the future, “how” information is developed and made available, and what policies surround its use, may be as significant as “what” the information contains.

Fedora Commons, the DSpace Federation, EPrints and many other organizations that make use of these and other types of repository software presented overviews of priorities, community developments and technical achievements. The Fedora Commons, DSpace, and EPrints presentations from Sun PASIG are available here.

Sandy Payette, Executive Director of Fedora Commons, sketched recent Fedora Commons history and reviewed what’s on the short-term horizon. Scholarly communications, semantic knowledge spaces, data curation, and preservation and archiving are all areas where Fedora Commons has established new and ongoing partnerships to help create technology to sustain the world’s knowledge repositories and archives.
Fedora Commons is establishing community-led Solution Councils for open access publication, data curation, eresearch, and preservation and archiving. In the near term these groups will articulate a mission and develop requirements that will lead to establishing bundles of software and applications to address the needs of organizations whose efforts are focused in these four areas.
What is Fedora right now? Payette reviewed Fedora’s key features, which include a digital object model that can:
– Aggregate content “datastreams” in an object… any type of content
– Intermix both local content and external content
– Express relationships among digital objects (via RDF)
– Register “content models” for known object patterns
As well as Repository Services:
– Modular
– Web service interfaces (REST/SOAP)
– Versioning
– Dynamic service binding based on object content model types
– File-centric (all essential characteristics in XML files)
– RDF-based indexing (semantic triplestore index with query)
– Security with pluggable authentication and XACML policies
– Journaling (replay all events to create replicas of repository)
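
The journaling item above can be illustrated with a small sketch: every change is recorded as an event, and replaying the recorded events against an empty repository produces a replica. This is not Fedora's implementation or API. The `Repository` class, the event tuple layout, and the two action names are hypothetical and chosen only to show the replay idea.

```python
class Repository:
    """Toy repository (hypothetical): maps an object id to its datastreams."""

    def __init__(self):
        self.objects = {}   # pid -> {datastream id -> content}
        self.journal = []   # ordered list of every event executed

    def apply(self, event):
        # An event is a tuple: (action, pid, datastream_id, content).
        action, pid, ds_id, content = event
        if action == "add_datastream":
            self.objects.setdefault(pid, {})[ds_id] = content
        elif action == "purge_object":
            self.objects.pop(pid, None)
        else:
            raise ValueError(f"unknown action: {action}")

    def execute(self, event):
        # Change the repository, then record the event in the journal.
        self.apply(event)
        self.journal.append(event)


def replay(journal):
    """Build a replica by applying the journaled events in order."""
    replica = Repository()
    for event in journal:
        replica.apply(event)
    return replica


primary = Repository()
primary.execute(("add_datastream", "demo:1", "DC", "<dc>title</dc>"))
primary.execute(("add_datastream", "demo:2", "IMG", "binary data"))
primary.execute(("purge_object", "demo:2", None, None))

replica = replay(primary.journal)
```

After the replay, `replica.objects` matches `primary.objects`, including the effect of the purge.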

Andrew Treloar presented an overview of the Australian National Data Service (ANDS). This is a multi-institution, long-term and very large data pilot project established to ensure that researchers are able to identify, locate, access and analyze any available research data. They are looking at issues such as multi-discipline access and use outside of research. Australian funding agencies are tracking the implications of the ANDS work so that they can potentially require a data management plan as a part of new research grants.

Michelle Kimpton, executive director, DSpace Federation, gave a report on the recent release of DSpace 1.5, which uses the Maven build system. This distributed development tool supports “publishing models.” This functionality is useful for those who are looking to use DSpace “beyond institutional repositories.”
The DSpace Federation has changed the model for how they work with their community in order to manage growth. Almost anyone can now contribute code right away because builders do not contribute code back to the core.
Manakin is part of DSpace 1.5, and is the DSpace default user interface. Manakin allows for customized UIs that look more like a digital library. A number of organizations are using Manakin as a front end to their repositories. More information is available in this December 2007 article from D-Lib Magazine: “Manakin, A New Face for DSpace.”

Dave Tarrant, PRESERV and the University of Southampton, introduced a storage controller for EPrints repository software. The controller supports a pluggable storage layer for repositories, providing the ability to store objects in different locations based on metadata or type, and enables direct interaction between the repository software and open storage platforms. Tarrant’s presentation, “From Open Storage to Smart Storage: Enabling EPrints Repository Preservation” can be accessed here.
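
The pluggable storage layer described above can be sketched as a controller that routes each object to a backend according to rules on its metadata. This is an illustration only and not the EPrints storage controller (EPrints itself is written in Perl). The class names, the `mime_type` metadata key, and the backend names are assumptions.

```python
from typing import Callable, List, Tuple


class MemoryBackend:
    """Stand-in for a storage location (local disk, open storage platform, ...)."""

    def __init__(self, name: str):
        self.name = name
        self.blobs = {}

    def store(self, object_id: str, data: bytes) -> None:
        self.blobs[object_id] = data


class StorageController:
    """Hypothetical controller: the first rule whose predicate matches wins."""

    def __init__(self, default: MemoryBackend):
        self.default = default
        self.rules: List[Tuple[Callable[[dict], bool], MemoryBackend]] = []

    def add_rule(self, predicate: Callable[[dict], bool], backend: MemoryBackend) -> None:
        self.rules.append((predicate, backend))

    def store(self, object_id: str, data: bytes, metadata: dict) -> str:
        # Pick the first matching backend, falling back to the default.
        backend = next((b for p, b in self.rules if p(metadata)), self.default)
        backend.store(object_id, data)
        return backend.name


local = MemoryBackend("local-disk")
archive = MemoryBackend("archive")

controller = StorageController(default=local)
# Assumed rule: PDF documents go to the archive backend.
controller.add_rule(lambda m: m.get("mime_type") == "application/pdf", archive)

pdf_location = controller.store("eprint-1", b"%PDF...", {"mime_type": "application/pdf"})
txt_location = controller.store("eprint-2", b"notes", {"mime_type": "text/plain"})
```

Keeping the routing rules separate from the backends means a new storage platform can be added without changing how the repository calls `store`.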