Migration Story – University of Maryland

Migration stories from institutions are one of the best ways to understand how our users are working with Fedora, what issues they’re experiencing, and what is working well for them. These stories are equally important for fellow community members looking to understand what is possible at their own institutions.

Since the release of Fedora 6 in June 2021, we have been actively following along with community members who are testing out the migration tools and documentation, and these early migrations have shown that our work has paid off. The tools are working, the documentation is solid, and with help from communication channels like Slack and the mailing lists, institutions are taking the leap and moving from their old Fedoras to Fedora 6.

But we can’t forget that migration success stories are not limited to those moving to Fedora 6. Migrations are costly and resource-intensive, and they take months or years of planning. Any time an institution is able to secure the preservation of its digital content by migrating to a supported version of Fedora, we consider this a success. So today I am excited to share with you a migration success story from the University of Maryland.

Introduce yourself:

Joshua Westgard (JW): My name is Joshua Westgard and I am the Systems Librarian, Digital Programs & Initiatives at the University of Maryland College Park.

Tell us about the repository you migrated:

JW: We currently have about 4,780 videos and 6,299 sound recordings.

These represent all of our time-based digital media collections, primarily consisting of digitized content from Special Collections and University Archives, Special Collections in Performing Arts, and the former Library Media Services department. Major holdings include content related to campus athletics, student life, broadcasting archives, and faculty and student recitals and performances. We also have some digitized versions of content in our physical collections on legacy media types, such as VHS tapes.

Most of the content (~ 9,000 items) was migrated from a legacy system built on Fedora 2 plus an external media streaming service. This instance also includes some newly digitized content that was deposited directly into the new system (~ 2,000 items).

What other types of integrations or software did you use?

JW: This is an instance of Avalon Media System, with authentication through our campus’ central authentication service. We have extended Avalon to allow IP-based access controls to be applied to individual items, because we needed to allow unauthenticated access by users who are physically present on campus. We also needed to limit access to certain collections, which are governed by strict donor agreements, to specific buildings. IP-based access controls allow us to do both of these things.
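As an illustration of the idea only (this is not the University of Maryland's Avalon extension), IP-based access control can be sketched in a few lines of Python using the standard `ipaddress` module. The network ranges, the function names, and the `require_ip` flag are all hypothetical; the sketch assumes each item carries a list of allowed networks and a flag saying whether being on one of those networks is mandatory (the donor-agreement case) or merely an alternative to logging in (the on-campus case).

```python
import ipaddress

# Hypothetical ranges for illustration; real campus and building ranges would differ.
CAMPUS_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
READING_ROOM_NETWORKS = [ipaddress.ip_network("10.20.30.0/24")]


def ip_in_networks(client_ip, networks):
    """Return True if the client address falls inside any of the given networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


def can_stream(client_ip, item_networks, authenticated=False, require_ip=False):
    """Decide whether a request may stream an item.

    require_ip=False: on-network users get in without logging in, and
                      authenticated users get in from anywhere.
    require_ip=True:  only requests from the item's networks are allowed,
                      regardless of authentication.
    """
    on_allowed_network = ip_in_networks(client_ip, item_networks)
    if require_ip:
        return on_allowed_network
    return authenticated or on_allowed_network
```

Under these assumptions, `can_stream("10.5.5.5", CAMPUS_NETWORKS)` allows an anonymous on-campus viewer, while `can_stream("10.5.5.5", READING_ROOM_NETWORKS, authenticated=True, require_ip=True)` refuses a logged-in user who is outside the designated building.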

In addition, we added a feature to Avalon that allows administrative users to download a copy of the master media file for each object, so they can fulfill patron requests directly without needing to ask IT to restore the file from preservation storage. 

Finally, we added a feature that allows administrative users to create access tokens that can be shared with researchers who need to access content that is otherwise restricted. These tokens can be configured to allow streaming or download (or both), and can be scheduled to expire.
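To make the token idea concrete, here is a minimal Python sketch of an expiring access token that can permit streaming, download, or both. It is not the code added to Avalon; the `AccessToken` class, the `issue_token` helper, and all field names are hypothetical and chosen only for illustration.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class AccessToken:
    """Hypothetical token granting time-limited access to one item."""
    item_id: str
    allow_streaming: bool
    allow_download: bool
    expires_at: datetime
    # Random, URL-safe value that would be shared with the researcher.
    value: str = field(default_factory=lambda: secrets.token_urlsafe(32))

    def permits(self, action, now=None):
        """Return True if the token is unexpired and allows the given action."""
        now = now or datetime.now(timezone.utc)
        if now >= self.expires_at:
            return False
        if action == "stream":
            return self.allow_streaming
        if action == "download":
            return self.allow_download
        return False


def issue_token(item_id, days_valid, allow_streaming=True, allow_download=False, now=None):
    """Create a token that expires days_valid days after now."""
    now = now or datetime.now(timezone.utc)
    return AccessToken(
        item_id=item_id,
        allow_streaming=allow_streaming,
        allow_download=allow_download,
        expires_at=now + timedelta(days=days_valid),
    )
```

In this sketch an administrator would call something like `issue_token("item-123", days_valid=14)` and share the resulting `value`; the application would then call `permits("stream")` or `permits("download")` on each request.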

Tell us about the challenges you encountered during the migration process:

JW: Probably the biggest challenge was locating original media for certain very old collections. The legacy system predated our current digital preservation workflow and because the Fedora objects used Fedora’s external relationship feature (rels-ext) to point to the streaming server, the files were not available in our legacy Fedora instance to be extracted for migration. Ultimately we did manage to locate most of the preservation files, and the ones we could not locate were manually extracted from our legacy streaming server.

Overall, our challenges really were not related to Fedora but rather to wrangling our data, much of which was very old.

Do you have any recommendations for the Fedora Team on ways we could have helped?

JW: The migration toolkit met our needs. We extended migration-utils with a Python script to convert the extracted data into the Avalon batch import CSV format.

You can see this fork of the migration-utils for our local changes: https://github.com/umd-lib/migration-utils/tree/feature/LIBAVALON-143

What would you like other institutions to know about your experience?

JW: It was very useful for us to think about the migration not as a straight move from the old system to the new system, but rather as following the “Extract/Transform/Load” (ETL) paradigm. Migration-utils provided the extract to disk (binaries plus CSV), and then library staff performed significant reorganization and cleaning of the data (the “transform” step). Our target was the existing Avalon batch import format (this was the “load” step). One challenge was that, because of the way the Avalon CSV import format handles multi-valued fields, we had to be very careful when moving data around in this extracted form.
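The care needed around multi-valued fields can be illustrated with a small Python sketch of a "transform" step. This is not the script in the fork linked above. It assumes two things that should be checked against the real data and the Avalon documentation: that the extracted records hold multiple values in one cell separated by a `|` character, and that the target layout expresses a multi-valued field by repeating its column header. The function names and sample field names are hypothetical.

```python
import csv
import io


def to_repeated_columns(records, multi_valued, separator="|"):
    """Expand multi-valued fields into repeated columns.

    records:      list of dicts mapping field name to raw string value
    multi_valued: set of field names whose values may contain the separator
    Returns (header, rows), where each multi-valued field appears as many
    times in the header as the largest number of values seen for it.
    """
    split_records = []
    widths = {}  # field name -> number of columns needed
    for record in records:
        split = {}
        for name, raw in record.items():
            if name in multi_valued:
                values = [v.strip() for v in raw.split(separator) if v.strip()]
            else:
                values = [raw]
            split[name] = values
            widths[name] = max(widths.get(name, 1), len(values))
        split_records.append(split)

    header = [name for name, width in widths.items() for _ in range(width)]
    rows = []
    for split in split_records:
        row = []
        for name, width in widths.items():
            values = split.get(name, [])
            # Pad with empty cells so every row lines up with the header.
            row.extend(values + [""] * (width - len(values)))
        rows.append(row)
    return header, rows


def write_csv(header, rows):
    """Serialize the header and rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Because every value's meaning depends on its column position in a layout like this, moving, sorting, or deleting columns by hand in a spreadsheet can silently shift a value under the wrong header, which is why padding each row to the full header width matters.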

You can find the University of Maryland’s AV repository here: https://av.lib.umd.edu

A big thank you to Josh Westgard and Kate Dohe, both for their contributions to this post and for sharing their stories at the Fedora Summer Open House in July 2022. If you have any questions about their migration or any of the information provided above, feel free to reach out to Josh at westgard@umd.edu.

Thank you again!