DSpace on Fedora Discussion at OR11
Submitted on Mon, 2011-07-11 11:24
Winchester, MA: At the Open Repositories 2010 (OR10) Conference, DuraSpace recommended that the DSpace and Fedora communities investigate strategies to allow DSpace software to run “on top of” the Fedora platform. The suggested goal of the initiative remains the same: to retain an out-of-the-box DSpace experience while also enabling extra features that Fedora provides, including versioning, relationships between objects, and a flexible architecture.
At OR11 last month a panel discussion entitled “Exploring the Value of DSpace with Fedora Inside” was led by Mark Diggory, @mire, to identify the value of various approaches towards making this happen. Panelists from the DSpace and Fedora communities were on hand to lead a discussion about the effects of such an integration. Panelists included Richard Rodgers, Head, Software Development at MIT Libraries; Brad McLean, CTO, DuraSpace; Mark Leggott, University Librarian, University of Prince Edward Island and CEO, DiscoveryGarden; Matt Zumwalt, Media Shelf and the Hydra Project; and Ryan Scherle, a researcher and programmer from Duke University. The session included a look back into the history of DSpace on Fedora, a review of survey results, and questions and answers from the audience, many of whom were involved in developing and working with DSpace and Fedora.
Each speaker offered introductory remarks. Mark Leggott understands the back end and the authorization features in Fedora. He suggested that integrating Fedora would give DSpace users more deployment options. Brad McLean has been advocating for the gradual alignment of repository platforms as CTO for DuraSpace. Richard Rodgers is a lead architect of DSpace@MIT and is working towards making it a universal platform at MIT. Ryan Scherle has helped to develop the Dryad Digital Repository at Indiana University with DSpace. Matt Zumwalt is founder of Media Shelf and tech lead for the Hydra Project based on Fedora. Mark Diggory is a long-time participant in the DSpace community and now works with @mire, a DuraSpace Registered Service Provider.
Diggory explained the four-year history (2007-2011) of community exploration into ideas around integrating DSpace and Fedora. A quick overview of community answers to questions from a recent poll indicated:
• A majority of community members think that DSpace should be modified to have Fedora inside.
• A large percentage of the DSpace community has made both slight and extensive modifications to the code base.
• Opinion was split regarding whether or not users would adopt DSpace as a front end if they were already Fedora users.
• Users want to see Fedora as more than a dark store for DSpace. They would like its services to be exposed.
• A majority of respondents would use Fedora if it were part of DSpace.
Panel members followed up with their views on moving a DSpace on Fedora initiative forward in statements and in discussions with the audience. McLean believes that “it’s about the journey, not the destination.” He sees several interesting outcomes that have emerged since last year, including METS AIP round-tripping work and steps towards implementing the DSpace content model in Fedora. Other panelists had specific views on moving forward. “We should see this as a useful opportunity to explore what repositories do and how they do it,” said Richard Rodgers. He believes that software systems will always be changing. “It’s the data that’s important,” he reminded the audience. He feels that the community and the resources are available to move the DSpace on Fedora initiative forward at this time.
Q and A
Q: “Will we need to be experts in unfamiliar platforms?”
Ryan Scherle answered by explaining his experience in transitioning from Fedora to DSpace. He did not make changes to the Fedora code but explained that lots of changes were required deep in DSpace code.
Q: “Won’t DSpace be even more complex if you add Fedora?”
Matt Zumwalt stressed that there is an opportunity to clean up past work as you make a big change. The transition could be an opportunity to tidy up DSpace and make it simpler.
“The DSpace content model is not complex. It is being asked to do things that it was not designed for. Atomistic content models are now far more complex than DSpace,” suggested Mark Leggott.
Q: “I am nervous about changes to DSpace and associated costs. Fedora does not have to change.”
“DSpace developers have multiple ideas about this,” answered McLean. “Will there be changes necessary in Fedora? Yes. Underlying component API designs will be discovered.”
Fit any workflow on top
Q: “At what level do we align DSpace and Fedora? Do we need to have a transport format for that?”
“We have a number of interfaces in the Fedora ecosystem,” replied Leggott. “We have to start by building something with easy backward migration, but eventually we will have to use other tools within that content model.”
Q: “How do you go about integrating a new API in Fedora?”
Zumwalt answered, “Hydra was built with Ruby on Rails. We built ActiveFedora first. Figuring out what information is at play that you are trying to communicate through an API is key. For example, right now we don’t have a format to put Hydra on top of microservices. We are focusing on an adaptor within the API in relatively generic ways. Bypass interoperability and go for an abstraction layer.”
“Islandora hooks Drupal in: Fedora collections and Drupal nodes are synchronized and provide full reflections of the assets in Fedora, in Drupal. All the metadata, policies and data streams are captured for long-term storage. Drupal flexibility in the UI makes it easy to change the user experience. WordPress and Liferay are also being explored as additional UIs that can be added on top of Fedora. It takes about 3 months to plug in another UI on top of Fedora,” added Leggott.
IR history and focus is important
Q: “How do you all feel about how DSpace fits into the array of options already built on top of Fedora?”
Rodgers suggested, “It provides the Fedora community with a rich and well-tested institutional repository content model. The embedded workflows and submission pathways are well-known.”
Scherle added, “DSpace was designed as an institutional repository and nothing else. Problems arise when you try to work outside of that vision. The DSpace front-end makes sense for people doing IR work.”
“Workflow in Fedora is up for grabs. There have not been institutional groups working on collaborative Fedora workflow. DSpace brings that history,” said Zumwalt.
“DSpace with Fedora inside is repackaging with a slightly different set of components,” added McLean. “The collection of components that makes up DSpace will have clearer boundaries. Manakin is an example of a component that may get pulled out into separate modules, but it will still exist as part of a package.”
Leggott offered a further opinion: “One of the main things that DSpace brings to Fedora is the workflow. I’ve never been a big fan of the DSpace workflow. That staunch requirement to fill out a lot of metadata is a mistake. Scientists want us to ingest large stores of images with no metadata. There is an equal amount of information that DSpace can learn from Fedora in non-IR situations.”
Scherle: “You are right. The requirements that are native to DSpace are tough when you get into other kinds of submission workflows.”
Unlock your data
Q: “I have long been a supporter of DSpace on Fedora. My motivation is that DSpace on Fedora would give you stable APIs to unlock your data. Am I understanding this correctly?”
McLean: “Yes. That is a dream end point to get to. We must also meet incremental needs for DSpace users today.”
“Unlocking the data at all levels is a total win for DSpace users; working from the file system is a great advantage. Your data is really still there; data is preserved in Fedora. The DSpace database is completely opaque,” suggested Scherle.
Looking to the future, Leggott said, “The DSpace+Hydra+Islandora out-of-the-box experience would provide many options. If you are an academic institution and want an institutional repository solution, then you could choose one of them and put it on top of Fedora. You could have a whole bunch of out-of-the-box experiences.”
Q: “There are many lower level systems in Fedora. Should we map DSpace only to Fedora or should we be looking at a pluggable storage layer that you could add DSpace to?”
“You would set up an abstracted API that expresses what you want to express. Then have something generic. Don’t kill yourself making it perfect,” answered Zumwalt.
Richard Rodgers sees “a high-fidelity data model for DSpace and Fedora with subsequent opportunities for future expressions, but step one would be to model it fairly closely.”
McLean feels that Fedora will want to see the model more as metadata, while DSpace will want to see it as it has always been.
Zumwalt closed by reminding the audience that with Hydra you can have many ideas for how content is expressed, disseminated, and characterized.
Work is continuing on this initiative. The next phases will involve Fedora data modeling experts and institutions that can run both systems to pilot data exchange between them. Interested parties are encouraged to make contact and express interest in participating via the DSpace or Fedora project mailing lists: firstname.lastname@example.org; email@example.com.