Fedora 4 Upgration Pilot Update

Mon, 2015-04-20 12:12 -- carol

From David Wilcox, Fedora project manager

Winchester, MA: A pilot program to support upgrading to Fedora 4 and migrating content from Fedora 3 to 4 is in progress. The program began in January and runs through the end of April. The term "Upgration" refers to both an upgrade to Fedora 4 and a migration from Fedora 3. Five institutions, representing Islandora, Hydra, and custom front-ends, are running pilot projects:

- Columbia University
- National Library of Wales
- Simon Fraser University
- University of New South Wales
- York University
 
A summary of each project is available on the wiki: https://wiki.duraspace.org/display/FF/Fedora+3+to+4+Upgration+Pilots.
 
These pilot projects are leading the way by participating in data modelling discussions, helping to build and test migration tools, and migrating one or more collections from Fedora 3 to Fedora 4. This brief report provides an update on each of the pilot projects. This update is also available on the wiki: https://wiki.duraspace.org/display/FF/2015-04-20+Upgration+Pilot+Update.
 
Columbia University
 
The fundamental questions behind the March Academic Commons/Fedora 4 pilot were:
 
1. Could the AC 2 data be modeled in Fedora 4?
2. Could Sufia serve as a platform for AC 3?
3. How much existing migration tooling could be brought to bear on this effort?
 
Modeling the AC 2 data in Fedora 4
 
There is a shift from the Fedora 3 approach of storing descriptions primarily as attached files to principally using RDF properties in Fedora 4, but we were able to reproduce an index driven by attached MODS descriptions in Fedora 4. There is also a change in the way identifiers work that makes it difficult to preserve the format of Fedora 3 identifiers, but we were able to do so at the user-facing level of the application. Finally, there was a long-standing desire to bring the AC 2 data into alignment with the rest of the DPTS Fedora content in terms of content models; this was complicated by the emerging Portland Common Data Model (PCDM) specification from Islandora/Hydra. The in-progress nature of PCDM makes it difficult to resolve this issue with much finality, but we made progress in translating AC 2 to the working drafts of the PCDM recommendations.
 
Sufia as a Platform for AC 3
 
Most of the search functionality in AC 2 is in keyword/fulltext search, which we were able to reproduce in the pilot, and in linked browse lists of facet values. While we did not complete the latter, there is nothing apparent in the Sufia platform that would obstruct it. The bigger challenge Sufia presented was in the data model it presumed: it has collections and files, but no intermediate concept that collects, e.g., the files used in support of a conference paper. Fortunately for us in the long run, this concept is required in the PCDM and slated to be built into the next major revision of Sufia (scheduled for development in early May 2015). In the short term, we had to build much of this functionality into a fork of Sufia to support the pilot. We were able to do so, and some of that work should find its way into the project during the work in May.
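
To make the missing intermediate concept concrete, here is a minimal Python sketch of a three-level model in which a collection has member works and each work has files. The class and field names are illustrative assumptions that loosely follow the PCDM working drafts (collection, object, file); they are not Sufia's or PCDM's actual classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    """A single binary, e.g. a PDF or a slide deck."""
    name: str


@dataclass
class Work:
    """Hypothetical intermediate object grouping the files of one item."""
    title: str
    files: List[File] = field(default_factory=list)


@dataclass
class Collection:
    """A collection whose members are works rather than loose files."""
    title: str
    members: List[Work] = field(default_factory=list)

    def all_files(self) -> List[File]:
        # Flatten works -> files; this flat view is all that a
        # collections-and-files model can express.
        return [f for work in self.members for f in work.files]


# Example data (invented for illustration only).
paper = Work("Conference paper", [File("paper.pdf"), File("slides.pptx")])
collection = Collection("Example collection", [paper])
```

With the intermediate `Work` level, the two files stay grouped under the conference paper while still being reachable from the collection through `all_files()`.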
 
Migration with Existing Tools
 
Penn State developed a Hydra-based tool to support their own migration pilot to Fedora 4. We were able to use this tool as the base for our efforts, but it required significant modifications both to address the PCDM concepts discussed above and to navigate the idiosyncrasies of our early Fedora 3 modeling. Future migrations of DPTS material would require further work here, as we use more Fedora 3 features than Penn State does.
 
Deliverables
 
- An outline of feature and data requirements for AC3
- A desktop pilot [1], built to migrate a small subset of data into a local Fedora 4
- A fork of Sufia [2] with preliminary implementations of PCDM Works and Files
- A fork of the fedora-migrate tool [3] that allows lookup of description on other Fedora objects
 
Outstanding Work
 
- Coordinate with LITO-SYS on deployed Fedora 4 infrastructure
- Implement browse lists
- Reproduce homepage, visual style and branding of AC in pilot
- Integrate May 2015 Sufia work into pilot
- Migrate data
- Resolve the issue of authorship/ownership for non-uni authors and multiple authors in support of user-management of deposited data
 
National Library of Wales
 
Fedora 4.1 has been installed on a test server and is ready for data to be loaded. We’ve selected the data we are going to test with:
 
- A newspaper title, Evening Express, which is our largest newspaper with 187,331 objects
- A journal title, ‘Wales’, which has a sufficient number of rights issues for testing
 
The upgration pilot page [4] is nearly complete, and a mapping of the newspapers to PCDM [5] has been created.

A meeting will be organized to map born-digital objects to PCDM.
 
Simon Fraser University
 
The SFU Upgration Pilot will investigate how Islandora content models currently implemented in Fedora 3 will translate into a Fedora 4 environment, particularly those content models that are not widely used elsewhere (e.g., tabular data using DDI descriptive metadata, and the plain-XML OBJ datastreams managed by the Islandora Feeds Solution Pack).
 
The Fedora 4 data models will largely depend on the decisions made by the larger Islandora community on how to model Islandora content in Fedora 4. Once these decisions have been made, the non-standard Fedora 3 object models mentioned above will be mapped to Fedora 4 and the associated objects will be migrated.
 
University of New South Wales
 
The initial documentation phase of the upgration pilot is now complete. This included:
 
- Documenting UNSW Fedora 3 repositories, including data model, metadata, functionality and API usage
- Investigating Fedora 4 design and functionality
 
The following items have also been completed from the investigation and planning phase of the project:
 
- Proposed Fedora 4 models for the UNSWorks and ResData repositories
- Proposed Fedora 3 to Fedora 4 object and datastream property mappings
- Planned management of legacy created/modified dates in Fedora 4 and associated applications
- Proposed URL structure for Fedora 4 nodes
 
The following actions are currently underway:
 
- Testing the use of an external triplestore for Fedora 4
- Coding a migration API that will be used to migrate objects from Fedora 3 to Fedora 4. So far, development has focused on testing ingestion to Fedora 4 using the Fedora 4 API. This includes starting the transaction, creating containers, adding binary files, updating properties and committing the transaction.
- Reviewing the Fedora 3 to Fedora 4 object and datastream property mappings based on feedback
- Further investigating mapping the UNSWorks repository according to the Portland Common Data Model 
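
The ingest sequence mentioned in the list above (start a transaction, create a container, add a binary file, update properties, commit) can be sketched as an ordered plan of HTTP requests. This is a minimal sketch, assuming the Fedora 4 REST conventions of `fcr:tx` and `fcr:tx/fcr:commit` and a local base URL; the function names, paths, and the `dc:title` property are hypothetical choices for illustration. It only builds the plan and sends nothing, and it is not UNSW's migration code.

```python
from typing import Dict, List, NamedTuple, Optional


class Request(NamedTuple):
    method: str
    url: str
    headers: Dict[str, str]
    body: Optional[bytes]


def start_transaction(base_url: str) -> Request:
    # Assumption: the server replies with a Location header that
    # identifies the new transaction; later requests use that URL.
    return Request("POST", f"{base_url}/fcr:tx", {}, None)


def plan_ingest(tx_url: str, path: str, filename: str,
                data: bytes, mime: str, title: str) -> List[Request]:
    """Return the requests to issue inside an open transaction, in order."""
    container = f"{tx_url}/{path}"
    binary = f"{container}/{filename}"
    # Escape backslashes and quotes so the title is a valid SPARQL literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    sparql = (
        "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
        f'INSERT {{ <> dc:title "{safe_title}" }} WHERE {{}}'
    ).encode("utf-8")
    return [
        # 1. Create the container.
        Request("PUT", container, {}, None),
        # 2. Add the binary file beneath it.
        Request("PUT", binary, {"Content-Type": mime}, data),
        # 3. Update properties on the container with SPARQL Update.
        Request("PATCH", container,
                {"Content-Type": "application/sparql-update"}, sparql),
        # 4. Commit the transaction.
        Request("POST", f"{tx_url}/fcr:tx/fcr:commit", {}, None),
    ]


base = "http://localhost:8080/rest"  # assumed local test server
tx = base + "/tx:example"            # placeholder for the Location value
plan = plan_ingest(tx, "objects/1", "thesis.pdf",
                   b"%PDF", "application/pdf", "Example")
```

Building the plan separately from sending it keeps the ordering easy to inspect and test before any request reaches a repository.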
 
York University
 
As the project entered the third month, the focus shifted to migration work, and we also welcomed a new developer to the project, Jared Whiklo (University of Manitoba).
 
The upgration work focused on two parts. The first was collaborating with Mike Durbin (University of Virginia) on fcrepo4-labs/migration-utils [6].
 
migration-utils will support upgration from the following sources, in this order:
 
- traversing the objectStore file system
- archive export format
- migrate export format
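
As an illustration of the first item, traversing an objectStore directory might look like the sketch below. It assumes the store keeps one file per object, nested in subdirectories, with the file name being the URL-encoded form of `info:fedora/<pid>`; that naming, and the function name, are assumptions made for this example. This is not the migration-utils implementation.

```python
import os
from typing import Iterator, Tuple
from urllib.parse import unquote

# Assumed prefix of a decoded object file name.
PREFIX = "info:fedora/"


def iter_foxml(object_store: str) -> Iterator[Tuple[str, str]]:
    """Yield (pid, path) for every object file found under object_store."""
    for dirpath, _dirnames, filenames in os.walk(object_store):
        for name in sorted(filenames):
            decoded = unquote(name)
            # Skip anything that does not look like an object file.
            if decoded.startswith(PREFIX):
                yield decoded[len(PREFIX):], os.path.join(dirpath, name)
```

Each yielded path could then be handed to whatever parses the object and writes it to Fedora 4.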
 
Nick updated YUDL’s Fedora to 3.8.1-SNAPSHOT in order to provide a large set of test fixtures for use with migration-utils.
 
The second part was data modeling [7]: specifically, mapping fcrepo3 object properties to fcrepo4 container properties, mapping fcrepo3 datastream properties to fcrepo4 file properties, mapping RELS-EXT properties, and bringing Islandora into compliance with the Portland Common Data Model [8]. This work was shared with the community via a Large Image Solution Pack example object modeled in Fedora 4.
 
Related to the migration work was the Audit Service [9] design work. Nick participated in all of the Audit Service design meetings, and led a discussion around PROV and PREMIS ontology usage in the service. On March 30, a code sprint devoted to the Audit Service will begin, and it will be led by Esmé Cowles. That work is outlined here [10].
 
The migration work will most likely continue through April.
 
References