Fedora 4 Deep Dive Number One: Support for Research Data

Mon, 2014-05-05 09:38 -- carol


Get Ready for the New Streamlined and Strong Flexible Extensible Durable Object Repository Architecture

Winchester, MA - The redesigned beta version of Fedora 4 will be released at the Open Repositories Conference in June, with plans for a full production release later in the year. This post is part of a series of articles that will acquaint you with Fedora 4’s significant improvements, made available to the community by Fedora stakeholders working in concert with DuraSpace.

Fedora 4 open source repository software addresses the following top priorities expressed by the international community over the past few years:

• Improved performance and scalability
• More flexible storage options
• Features to accommodate research data management
• Better capabilities for participating in the world of linked open data
• An improved platform for developers: one that is easier to work with and that will attract a larger core of developers.

A full set of significant Fedora 4.0 features can be found on the wiki. Highlights include:

• Authorization
• Durable storage
• Content modeling
• Support for large files
• Support for linked data
• Internal and external search
• Transactions
• Support for external triplestores
• Versioning

How does Fedora 4 support research data?

Advancing knowledge in all fields of research now requires the collection, curation and management of large digital data sets and related materials, along with access to them and their long-term preservation. This goes far beyond burying a flat file on a hard drive. Research institutions are planning how to put digital data policies, workflows and economic models in place to ensure that data will persist to serve researchers and institutions far into the future. Fedora 4 provides a number of key features that support the management and preservation of research data.

Flexible Content Modeling

Research data can take many forms, from spreadsheets and documents to large images and videos, and beyond. These files often have metadata associated with them, and they can have complex relationships with other objects within the repository. Repository managers are required to support this diversity of file formats and relationships in order to properly represent research data sets.

Fedora 4 allows repository managers to support any type of content and model it however they wish. Multiple research data files can be grouped together with a single metadata record, or they can be distributed as separate objects, each with its own metadata. These separate objects can then be associated with any number of other objects within the repository, allowing for maximum flexibility.
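
To make the two modeling options concrete, here is a minimal sketch in Python. It is not Fedora's API: the RepoObject class, the identifiers, the file names and the relationship names ("isPartOf", "documents") are all hypothetical and exist only to illustrate grouping files under one metadata record versus linking separate objects.

```python
from dataclasses import dataclass, field


@dataclass
class RepoObject:
    """Hypothetical repository object: files plus one metadata record."""
    identifier: str
    metadata: dict = field(default_factory=dict)
    files: list = field(default_factory=list)     # file names held by this object
    related: dict = field(default_factory=dict)   # relation name -> identifiers

    def relate(self, relation: str, other: "RepoObject") -> None:
        """Record a named relationship from this object to another one."""
        self.related.setdefault(relation, []).append(other.identifier)


# Option 1: several data files grouped under a single metadata record.
survey = RepoObject(
    identifier="survey-2014",
    metadata={"title": "Field survey 2014"},
    files=["readings.csv", "site-photo.tif", "notes.docx"],
)

# Option 2: separate objects, each with its own metadata, linked together.
readings = RepoObject("readings", {"title": "Sensor readings"}, ["readings.csv"])
photo = RepoObject("site-photo", {"title": "Site photograph"}, ["site-photo.tif"])
project = RepoObject("project-42", {"title": "Hypothetical project"})

readings.relate("isPartOf", project)
photo.relate("isPartOf", project)
photo.relate("documents", readings)
```

In the second option each file carries its own metadata and can be related to any number of other objects, at the cost of more objects to manage.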

Support for Many and Large Files

Research data files can be gigabytes or even terabytes in size. Moreover, there could be millions of files within a single repository, each of which may have its own metadata and relationships with a variety of other files and projects.

Fedora 4 can be scaled up to support many files, each of which can have any number of associations with other files within the repository. Fedora 4 also supports large individual files; tests have been conducted with files up to 1 TB in size.

Fedora 4 can also be projected over an existing file system in order to represent the files as objects and datastreams in Fedora without adding them to a repository. These files can be managed and preserved as if they had been ingested, so this is an attractive option for administrators with large external datastores.
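
The idea of projecting a repository over an existing file system can be sketched as follows. This is an illustration of the concept only, not how Fedora implements it: the project_directory function and the record fields ("location", "size_bytes") are assumptions made for this example. Each file is represented by a lightweight record that points at its current location, and no content is copied.

```python
import os
import tempfile


def project_directory(root):
    """Map each file under root to a lightweight record; nothing is copied."""
    projected = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Use the path relative to root as a stand-in object identifier.
            object_id = os.path.relpath(full_path, root).replace(os.sep, "/")
            projected[object_id] = {
                "location": full_path,                      # content stays in place
                "size_bytes": os.path.getsize(full_path),
            }
    return projected


# Small demonstration against a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "run1"))
    with open(os.path.join(root, "run1", "data.csv"), "wb") as handle:
        handle.write(b"a,b\n1,2\n")
    index = project_directory(root)
    demo_ids = sorted(index)
    demo_size = index["run1/data.csv"]["size_bytes"]
```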

Preservation Features

Research data needs to be protected against both system failures and human error. A good backup solution will allow you to recover your data from a point in time, but it cannot ensure that the data itself is not corrupt or otherwise damaged.

Fedora 4 automatically generates checksums for every file in the repository. Fedora 4 includes a fixity service that can be called on demand to compare recalculated checksums with the stored values to make sure no file corruption has occurred. File versioning can also be configured across the repository; when enabled, Fedora will create a new version of a file whenever it is modified. So if someone accidentally changes or overwrites a file, Fedora can restore a previous version.
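
The checksum, fixity and versioning behavior described above can be illustrated with a small in-memory sketch. It is not Fedora's implementation or API: the VersionedStore class and its method names are hypothetical, and SHA-256 is an assumption, since this post does not say which checksum algorithm Fedora 4 uses.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return a hex digest of the content (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


class VersionedStore:
    """Hypothetical in-memory store that keeps every version and its checksum."""

    def __init__(self):
        self._versions = {}   # name -> list of (content, stored checksum)

    def write(self, name, data):
        """Store a new version and record its checksum at write time."""
        self._versions.setdefault(name, []).append((data, checksum(data)))

    def read(self, name):
        """Return the content of the current (latest) version."""
        return self._versions[name][-1][0]

    def check_fixity(self, name):
        """Recalculate the checksum and compare it with the stored value."""
        data, stored = self._versions[name][-1]
        return checksum(data) == stored

    def restore(self, name, version_index):
        """Make an earlier version current by writing it as a new version."""
        data, _ = self._versions[name][version_index]
        self.write(name, data)

    def corrupt_for_demo(self, name, data):
        """Simulate silent corruption: content changes, stored checksum does not."""
        _, stored = self._versions[name][-1]
        self._versions[name][-1] = (data, stored)


store = VersionedStore()
store.write("results.csv", b"x,y\n1,2\n")
store.write("results.csv", b"oops")           # accidental overwrite
store.restore("results.csv", 0)               # bring back the first version
restored = store.read("results.csv")
ok_before = store.check_fixity("results.csv")
store.corrupt_for_demo("results.csv", b"x,y\n9,9\n")
ok_after = store.check_fixity("results.csv")
```

Versioning guards against human error, because the overwritten content is still available, while the fixity check catches damage that a backup alone would not reveal.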