Fedora 4 Deep Dive Number Two: Support for Preservation
Submitted on Thu, 2014-05-15 13:34
The New Streamlined and Strong Flexible Extensible Durable Object Repository Architecture is designed to support preservation of digital assets, one of the primary Fedora 4 use cases.
Winchester, MA. The beta version of the redesigned Fedora 4 will be released at the Open Repositories Conference in June, with a full-production release planned for later in the year. This post is part of a series of articles that will acquaint you with Fedora 4’s significant improvements, made available to the community by Fedora stakeholders working in concert with DuraSpace.
Fedora 4 open source repository software addresses the following top priorities expressed by the international community over the past few years:
Improved performance and scalability
More flexible storage options
Features to accommodate research data management
Better capabilities for participating in the world of linked open data
An improved platform for developers that is easier to work with and will attract a larger core of developers
A full set of significant Fedora 4.0 features can be found on the wiki.
How Does Fedora Provide Durable Storage for Digital Assets?
Long-term durability is one of the main requirements of a good preservation system; simply put, digital assets need to stand the test of time. Doing this effectively requires a multi-pronged approach, including (but not limited to) flexible storage options, fixity checks, and backup and restore capabilities.
Different types of digital assets are often stored in separate back-end file systems; for example, high resolution image files may be stored in one file system, and large video files in another. Wherever they are located, these files need to be managed by repository software that can maintain file integrity and protect against bit rot and other fixity issues. The repository must also support backup and restoration, not just of the data, but of any external indexes that provide search capabilities and maintain any semantic relationships between the assets.
Fedora 4 provides a strong set of features to support durable storage. Policy-driven storage allows administrators to define ingest rules so that files of different types (e.g. images, videos) are routed to different back-end stores. Checksums can be calculated when assets are added to the repository, and fixity checks can be configured to run against these checksums on a regular basis. Fedora 4 also provides a means to back up the entire repository and restore everything, including rebuilding an external search index and/or triplestore, in case of a problem. For extremely large repositories (e.g. multi-petabyte datastores), a full backup and restore may take a very long time, so Fedora also provides object-level and tree-level import and export capabilities that can be used to selectively restore portions of the repository.
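To make the ideas of policy-driven storage and fixity checking more concrete, here is a minimal Python sketch. It is not Fedora's implementation or API: the store names, the STORAGE_POLICY table, and the functions choose_store, compute_checksum, and check_fixity are all hypothetical, and SHA-1 is assumed only as an example digest algorithm.

```python
import hashlib

# Hypothetical policy table: MIME type prefix -> name of a back-end store.
# These store names are illustrative and are not Fedora configuration values.
STORAGE_POLICY = {
    "image/": "image-store",
    "video/": "video-store",
}
DEFAULT_STORE = "default-store"


def choose_store(mime_type: str) -> str:
    """Return the back-end store that the policy assigns to a MIME type."""
    for prefix, store in STORAGE_POLICY.items():
        if mime_type.startswith(prefix):
            return store
    return DEFAULT_STORE


def compute_checksum(path: str, algorithm: str = "sha1") -> str:
    """Compute a hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path: str, expected: str, algorithm: str = "sha1") -> bool:
    """Return True if the file still matches the checksum recorded at ingest."""
    return compute_checksum(path, algorithm) == expected
```

In this sketch, the checksum returned by compute_checksum would be recorded when the asset is ingested, and check_fixity would be run on a schedule; a False result signals that the stored bytes have changed since ingest.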
The other key component of a good preservation system is long-term access. While asset durability is crucial, a perfectly preserved file in a no-longer-supported format is of little use to anyone. Long-term access can also be supported by adopting linked data practices for assets in the repository. These practices ensure both long-term interoperability with other linked data platforms, and accurate, up-to-date asset descriptions.
Fedora 4 adheres to the recommendations of the Linked Data Platform 1.0 (LDP). LDP defines the expectations for accessing, updating, creating, and deleting resources from servers, such as Fedora, that expose resources as linked data. Core to Linked Data interactions is the assumption that Resource Description Framework (RDF) representations of Fedora resources are available. RDF representations are the default response serialization from Fedora. Additionally, descriptions of repository assets and the relationships between resources within and external to the repository are modeled as RDF triples.
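As a rough illustration of what LDP-style interaction looks like over HTTP, the following Python sketch builds (but does not send) requests that read a resource as RDF and create a new resource from an RDF description. The base URL, the resource paths, and the helper names are assumptions made for the example; the mapping of reading to GET and creating to POST against a container follows the general LDP 1.0 pattern, and the exact behavior of a given Fedora deployment should be checked against its own documentation.

```python
from urllib.request import Request

# Hypothetical repository base URL; a real deployment will differ.
BASE_URL = "http://localhost:8080/rest"


def build_read_request(path: str) -> Request:
    """Build a GET request that asks for a Turtle (RDF) representation."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return Request(url, headers={"Accept": "text/turtle"}, method="GET")


def build_create_request(container_path: str, turtle: str) -> Request:
    """Build a POST request that creates a resource in a container from Turtle."""
    url = f"{BASE_URL}/{container_path.lstrip('/')}"
    return Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


# A description expressed as a single RDF triple; <> refers to the new resource.
SAMPLE_TURTLE = '<> <http://purl.org/dc/elements/1.1/title> "Sample image" .'
```

Passing either request to urllib.request.urlopen would send it to the server. The sample Turtle shows how a description of an asset is expressed as a subject, predicate, and object.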