Fedora and Digital Preservation

Fedora is a community-maintained, open-source repository system that supports durable access to digital objects. Beyond its functionality, Fedora has the following characteristics that bear on its use in long-term preservation.

  • Community-led: Fedora is a product by and of the global cultural heritage community; libraries, archives, and museums around the world use Fedora for diverse purposes with disparate data types, and they collectively drive, develop, and maintain Fedora to meet their information management needs.
  • Open-source: Fedora is made available under the widely used Apache 2.0 open-source license. Its source code is visible to anyone and the project is openly governed; Fedora does not depend on any single organization for its long-term vibrancy or survival.
  • Standards-based: Fedora leverages existing, widely used standards to ensure long-term sustainability, preferring standards developed by the wider Web world to bespoke or niche ones.
  • Interoperable: Fedora allows digital objects to conform to common data models and provides access via RESTful APIs to support interoperability and long-term viability.

This document outlines the functionality provided by Fedora that supports digital preservation practice. Together with the characteristics above, this functionality explains why Fedora is often a key component of a long-term digital preservation strategy.

Fedora provides the following features in support of preservation:

  • Persistence¹
    • Deposited files are stored on the filesystem at predictable locations derived from their checksums, which allows operating-system-level access to the files independently of Fedora
    • Metadata describing objects, including structural metadata, is stored in Fedora’s database and can be exported on demand or at runtime (see Import/export below)
  • Fixity²
    • A range of digest algorithms (MD5, SHA-256, etc.) can be configured for calculating, storing, and recalculating checksums
      • A default algorithm can be configured
      • An algorithm can be specified on resource creation
      • An algorithm can be changed after resource creation
    • A SHA-1 checksum is calculated and stored on file ingest
    • If a SHA-1 checksum is provided with an ingest request, it is compared with the value calculated on ingest; a mismatch causes the ingest to be rejected
    • The SHA-1 checksum can be recalculated and compared with the stored value at any time
  • Audit
    • Preservation metadata, modeled using the RDF-based PREMIS and PROV-O ontologies, can optionally be generated from repository events and stored in the repository; it can also be made available in external triplestores for enhanced querying.
  • Versioning
    • Versions can be created for both objects and files
    • Versions are created on request
    • Previous versions can be restored
  • Import/export
    • Fedora imports and exports resources in standardized RDF serializations. This functionality is aimed at supporting the following objectives:
      • Integration with preservation systems external to Fedora
      • Avoidance of “platform lock-in”
      • Re-import of exported data to support disaster recovery
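
The checksum-based file layout described under Persistence can be illustrated with a short sketch. It assumes SHA-1 as the naming digest and uses a hypothetical storage root and a hypothetical three-level directory fan-out; ModeShape's actual on-disk layout may differ. The point is that the path is a pure function of the file's bytes, so the same file always lands in the same place and can be located and checked with ordinary filesystem tools, without Fedora running.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical storage root; not a Fedora or ModeShape default.
STORAGE_ROOT = "/data/fedora-binaries"


def content_path(data: bytes, root: str = STORAGE_ROOT) -> PurePosixPath:
    """Derive a storage path from the SHA-1 digest of a file's bytes."""
    digest = hashlib.sha1(data).hexdigest()
    # Hypothetical fan-out: three directory levels of two hex characters each,
    # so that no single directory has to hold every file.
    return PurePosixPath(root, digest[0:2], digest[2:4], digest[4:6], digest)


def matches_path(data: bytes, path: PurePosixPath) -> bool:
    """Check, without Fedora, that a file's bytes still match the digest in its name."""
    return hashlib.sha1(data).hexdigest() == path.name


if __name__ == "__main__":
    p = content_path(b"hello world")
    print(p)
    print(matches_path(b"hello world", p))
```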
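
The fixity behaviour listed above (a configurable digest algorithm with SHA-1 as the default, comparison against a client-supplied checksum at ingest, and recalculation at any later time) can be sketched with Python's hashlib. The function names and the FixityError exception are hypothetical and are not part of Fedora's API; in Fedora itself, clients use these features through its REST API.

```python
import hashlib
from typing import Optional


class FixityError(Exception):
    """Raised when a supplied checksum does not match the computed one."""


# Mirrors the SHA-1 default described above; any hashlib algorithm name works.
DEFAULT_ALGORITHM = "sha1"


def compute_digest(data: bytes, algorithm: str = DEFAULT_ALGORITHM) -> str:
    # hashlib.new accepts names such as "md5", "sha1" and "sha256".
    return hashlib.new(algorithm, data).hexdigest()


def ingest(data: bytes, supplied: Optional[str] = None,
           algorithm: str = DEFAULT_ALGORITHM) -> str:
    """Return the checksum to store, rejecting the ingest on a mismatch."""
    computed = compute_digest(data, algorithm)
    if supplied is not None and supplied.lower() != computed:
        raise FixityError(f"expected {supplied}, computed {computed}")
    return computed


def check_fixity(data: bytes, stored: str,
                 algorithm: str = DEFAULT_ALGORITHM) -> bool:
    """Recalculate the digest and compare it with the stored value."""
    return compute_digest(data, algorithm) == stored


if __name__ == "__main__":
    stored = ingest(b"hello world")
    print(stored, check_fixity(b"hello world", stored))
```

Keeping the algorithm as a parameter, rather than hard-coding SHA-1, reflects the document's point that the algorithm can be chosen per resource and changed later.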
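
An audit record can be pictured as a handful of RDF statements about a repository event. The sketch below emits N-Triples for one event using a few real PROV-O terms (prov:Activity, prov:used, prov:wasAssociatedWith, prov:startedAtTime). The example.org event URIs and the audit_event function are hypothetical, and the PREMIS properties that Fedora's audit records would also carry are deliberately left out rather than guessed.

```python
from datetime import datetime, timezone
from typing import List
from uuid import uuid4

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def audit_event(resource_uri: str, agent_uri: str,
                event_base: str = "http://example.org/audit/") -> List[str]:
    """Return N-Triples lines describing one repository event as a prov:Activity.

    The event URI scheme under event_base is hypothetical.
    """
    event = f"<{event_base}{uuid4()}>"
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        f"{event} <{RDF_TYPE}> <{PROV}Activity> .",
        f"{event} <{PROV}used> <{resource_uri}> .",
        f"{event} <{PROV}wasAssociatedWith> <{agent_uri}> .",
        f'{event} <{PROV}startedAtTime> "{now}"^^<{XSD_DATETIME}> .',
    ]


if __name__ == "__main__":
    for line in audit_event("http://example.org/objects/1",
                            "http://example.org/agents/fixity-service"):
        print(line)
```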
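
The on-demand versioning model (see footnote 3) can be illustrated with a toy in-memory class. VersionedResource and its methods are hypothetical and do not mirror Fedora's API; what they show is that ordinary updates overwrite the current state, and a version exists only where one was explicitly requested.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionedResource:
    """Toy model of on-demand versioning: updates never create versions implicitly."""
    content: bytes
    versions: Dict[str, bytes] = field(default_factory=dict, repr=False)

    def update(self, new_content: bytes) -> None:
        # An ordinary update overwrites the current state; no version is recorded.
        self.content = new_content

    def create_version(self, label: str) -> None:
        # A version is a snapshot taken only when explicitly requested.
        if label in self.versions:
            raise ValueError(f"version {label!r} already exists")
        self.versions[label] = self.content

    def list_versions(self) -> List[str]:
        return list(self.versions)

    def restore(self, label: str) -> None:
        # Restoring replaces the current state with the chosen snapshot.
        self.content = self.versions[label]
```

Auto-versioning would amount to calling create_version inside update. On-demand versioning leaves that choice to the caller, who can still version every update by requesting a version each time.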
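
The export and re-import round trip that supports disaster recovery can be sketched with N-Triples, one of the standard RDF serializations. To stay within the standard library, this sketch handles only a restricted subset (IRI subjects and predicates, IRI or plain-literal objects, no blank nodes, language tags, or datatypes); a production tool would use a full RDF library such as rdflib. The Triple type and both functions are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Set


class Triple(NamedTuple):
    """One RDF statement; obj holds an IRI when obj_is_iri is True, else a literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_iri: bool


# A restricted N-Triples line: <s> <p> <o> .  or  <s> <p> "literal" .
_LINE = re.compile(r'^<([^>]*)> <([^>]*)> (?:<([^>]*)>|"((?:[^"\\]|\\.)*)") \.$')
_ESCAPES = {"n": "\n", "r": "\r"}


def _escape(text: str) -> str:
    # Backslash must be escaped first so later escapes are not doubled.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def _unescape(text: str) -> str:
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), text)


def export_ntriples(triples: Iterable[Triple]) -> str:
    """Serialize triples, sorted so repeated exports are byte-for-byte identical."""
    lines = []
    for t in sorted(triples):
        obj = f"<{t.obj}>" if t.obj_is_iri else f'"{_escape(t.obj)}"'
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines) + "\n"


def import_ntriples(text: str) -> Set[Triple]:
    """Parse text produced by export_ntriples back into a set of triples."""
    triples = set()
    for line in text.split("\n"):
        if not line.strip():
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        subject, predicate, iri, literal = match.groups()
        if iri is not None:
            triples.add(Triple(subject, predicate, iri, True))
        else:
            triples.add(Triple(subject, predicate, _unescape(literal), False))
    return triples
```

Sorting on export makes the output deterministic, so two exports of unchanged data can be compared or checksummed directly.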

Upcoming Fedora features to enhance support for preservation

  • Fixity Enhancement
    • A checksum for an object can be calculated, stored, and recalculated at any time
  • Versioning Enhancement³
    • Individual versions of resources can be exported and imported
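
One possible way to give an object, rather than a file, a checksum is to hash a canonical form of its RDF description. The sketch below assumes the description is already available as N-Triples lines in canonical form and contains no blank nodes; it sorts and de-duplicates the lines so that triple order does not affect the result. The object_checksum function is hypothetical and illustrates the idea only, not how the planned Fedora feature works; descriptions containing blank nodes would need a proper RDF canonicalization algorithm first.

```python
import hashlib
from typing import Iterable


def object_checksum(ntriples_lines: Iterable[str], algorithm: str = "sha256") -> str:
    """Hash an object's RDF description independently of triple order.

    Assumes canonical N-Triples lines without blank nodes.
    """
    h = hashlib.new(algorithm)
    # Sort and de-duplicate so the same set of triples always hashes the same.
    unique = {line.strip() for line in ntriples_lines if line.strip()}
    for line in sorted(unique):
        h.update(line.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()
```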

1 Starting with Fedora 4.7.0, the persistence stack beneath Fedora’s community code will consist of ModeShape, which writes files to disk at locations based on their checksums and writes metadata to a user-configured database. PostgreSQL and MySQL have been verified.

2 “Fixity: Property that a Digital Object has not been changed between two points in time.” PREMIS Editorial Committee. PREMIS Data Dictionary Version 2.0, March 2008, p. 212. http://www.loc.gov/standards/premis/v2/premis-2-0.pdf [accessed 26 Oct 2016]

3 There are potentially many versioning options that could be exposed to the user. Each variation introduces complexity and long-term maintenance burden, and one of Fedora’s development goals is to keep the code base simple and sustainable. A fundamental decision regarding versioning was therefore whether Fedora should offer on-demand versioning or auto-versioning. It was eventually agreed that on-demand versioning lets users choose whether they want a version on every update or only at specific points in a resource’s lifecycle.