A NASA Research Framework for Earth Science Data in Digital Object Repository Architecture with Islandora
With thanks to Jerry Pan, ESDORA Principal Investigator, and Joe Velaidum, Vice-President of DiscoveryGarden, for assistance in preparing this blog post.
Ithaca, NY: The ESDORA (Earth Science Data Fedora) site at https://esdora.ornl.gov/content/home-0 is a NASA prototype in its second year. The project is funded to investigate how Fedora can be used to manage Earth Science data and metadata, leveraging Fedora's digital object model and services. Project Principal Investigator Jerry Pan says, "We have found that Fedora has great potentials in science data management, especially when coupled with the Drupal front-end through the Islandora software."
Islandora is an open source framework developed by the University of Prince Edward Island's Robertson Library. It combines the Drupal and Fedora Repository open source software to create a robust digital asset management system that can be tailored to meet short- and long-term requirements for digital data stewardship. Pan worked with collaborators at DiscoveryGarden Inc., a full-featured Islandora/Fedora/Drupal services company and a DuraSpace Registered Service Provider, to develop ESDORA.
There are three high-level objectives in this project:
1. Demonstrate the applicability of a digital object repository technology to science data. Based on our preliminary work, we expect to couple the Fedora Repository and a Drupal-based Graphical User Interface (GUI) as key elements of a next-generation NASA Earth system science data center infrastructure, using datasets collected as part of the NACP Modeling and Synthesis Thematic Data Center (MAST-DC) as the primary science context.
2. Use this implementation to enable better and more consistent access to critical metadata, including processing lineage information and administrative metadata, using the capabilities inherent in a digital repository (multiple streams for a given object and remote data streams). The enhanced metadata access ensures that science digital content becomes more transparent to the end user, with provenance and quality control information readily available.
3. Demonstrate how data providers can more easily and effectively manage science data sets, associated metadata, processing lineage, and quality control/data provenance information. A consistent process, with associated user interfaces and application programming interfaces (APIs), can be used by the data provider to ingest, update, and modify a dataset for metadata changes or additional content dissemination avenues. Particularly in the context of the ORNL DAAC data, this work will demonstrate potential technology migration paths for existing data operations.
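To make the second objective more concrete, the following is a minimal Python sketch of a digital object that carries several streams, one of which points at remotely stored data. It is an illustration only, not Fedora's API: the class names `Datastream` and `DigitalObject`, the stream IDs, the PID, and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Datastream:
    """One named stream of content attached to a digital object."""
    dsid: str
    mime_type: str
    content: Optional[bytes] = None    # bytes held by the repository
    remote_url: Optional[str] = None   # pointer to content stored elsewhere

    @property
    def is_remote(self) -> bool:
        return self.remote_url is not None


@dataclass
class DigitalObject:
    """A dataset object holding data and metadata side by side."""
    pid: str
    datastreams: Dict[str, Datastream] = field(default_factory=dict)

    def add(self, ds: Datastream) -> None:
        # Re-adding an existing ID replaces that stream, e.g. for a metadata update.
        self.datastreams[ds.dsid] = ds

    def remote_ids(self) -> List[str]:
        return sorted(d.dsid for d in self.datastreams.values() if d.is_remote)


# Hypothetical dataset object: the identifiers and URL are placeholders.
obj = DigitalObject(pid="demo:dataset-1")
obj.add(Datastream("DC", "text/xml", content=b"<dc/>"))
obj.add(Datastream("PROVENANCE", "text/xml", content=b"<lineage/>"))
obj.add(Datastream("DATA", "application/x-netcdf",
                   remote_url="https://example.org/data/file.nc"))

print(sorted(obj.datastreams))
print(obj.remote_ids())
```

Keeping descriptive, provenance, and data streams under one identifier is what lets a user reach lineage and quality control information from the same object that delivers the data.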
The current strategy is to use ESDORA to demonstrate the potential of this repository framework in data management and, at the same time, to find ways for common data services (such as bulk downloading or visualization tools) to continue to work for data ingested into the repository. The project aims to use a public API to expose the low-level storage management to these data tools, depending on support and buy-in from the community.
The built-in semantics and the RDF triple store in Fedora have proven to be a powerful way to encode relationships among data constituents, especially between data content and all kinds of metadata. As a result, it becomes possible for a data archive to preserve both data and data lineage knowledge (provenance) over a long time.
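As a rough illustration of how subject-predicate-object triples can carry lineage, here is a small in-memory sketch in Python. It does not use Fedora or its triple store. The `TripleStore` class, the `lineage` and `metadata_for` helpers, and the `demo:` identifiers are hypothetical, and the predicate names `isDerivationOf` and `isMetadataFor` are only modeled on the style of Fedora's relationship vocabulary.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """A tiny in-memory stand-in for an RDF triple store."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # None acts as a wildcard for that position.
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def lineage(store: TripleStore, pid: str,
            predicate: str = "isDerivationOf") -> List[str]:
    """Follow derivation links from pid back to its earliest ancestor."""
    chain: List[str] = []
    seen = {pid}
    current = pid
    while True:
        parents = [o for (_, _, o) in store.match(s=current, p=predicate)]
        if not parents or parents[0] in seen:  # stop at a root or a cycle
            break
        current = parents[0]
        seen.add(current)
        chain.append(current)
    return chain


def metadata_for(store: TripleStore, pid: str) -> List[str]:
    """List objects that describe pid."""
    return [s for (s, _, _) in store.match(p="isMetadataFor", o=pid)]


# Hypothetical objects: a gridded product derived from raw observations.
store = TripleStore()
store.add("demo:gridded-v2", "isDerivationOf", "demo:gridded-v1")
store.add("demo:gridded-v1", "isDerivationOf", "demo:raw")
store.add("demo:qc-report", "isMetadataFor", "demo:gridded-v2")

print(lineage(store, "demo:gridded-v2"))
print(metadata_for(store, "demo:gridded-v2"))
```

Because each relationship is stored as data rather than implied by file layout, the provenance chain can still be queried after the objects themselves have been migrated or reorganized.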