DSpace Futures

Thu, 2013-01-10 11:35 -- carol

Towards the end of 2012, DuraSpace organized a series of phone calls with the DSpace community. Sponsors, DCAT members, committers, and service providers participated in one of five meetings scheduled to survey current community perceptions of DSpace. Altogether, approximately 53 people representing 32 organizations joined the calls. Participants responded to questions such as whether DSpace was currently meeting the needs of their organization, in what ways it was being used, what the biggest challenges were, and what changes or enhancements were most requested. This paper presents the findings from those discussions.

Overall, many participants expressed satisfaction with DSpace's ability to meet the basic institutional repository needs of their organization. Many institutions had made a long-term investment in the software and were for the most part satisfied with it. In particular, a number of people were excited about new developments with DSpace. Recent releases have yielded a dramatic increase in developer contributions and new features, described by some as a "DSpace Renaissance". (One conjecture was that the recent move of the codebase to GitHub, a popular web-based hosting system for revision control, made it more convenient for developers to access and work on code.) Several people mentioned that DSpace has been a good fit for new initiatives in support of open access.

Speakers described a number of challenges that have emerged as their organizations' needs have grown over the years. As digital preservation has become a priority, several people wished that DSpace could provide more support in that area. Similarly, a few participants wanted better support for author identity and profiles. One area of criticism was DSpace's front-end tools and the limitations of the out-of-the-box user interface. Additionally, growing needs in the area of research data management led some people to question whether DSpace was "the right tool" for new requirements in this space. Another area where participants wished for improvements was scalability. Larger institutions sometimes experience performance issues related to large amounts of content or frequent access. Others expressed a wish for more and better statistical reporting in future releases. Still others agreed that improved metadata support would be highly valued.

A number of common areas of interest were noted. Several institutions are interested in the Hydra framework as a possible platform for DSpace. (From the Hydra website at http://projecthydra.org: Hydra is an ecosystem of components that lets institutions deploy robust and durable digital repositories (the body) supporting multiple “heads”: fully-featured digital asset management applications and tailored workflows. Its principal platforms are the Fedora Commons repository software, Solr, Ruby on Rails and Blacklight.) Islandora, the repository platform integrating Fedora with a Drupal front end (http://islandora.ca/), came up as a framework to explore, as well. Some were interested in DuraCloud integration (https://wiki.duraspace.org/x/SzD7AQ) and others in the new DSpaceCloud offering tendered by DuraSpace (http://dspacecloud.org/). There was notable interest in a variety of application integration scenarios, including DSpace with BibApp (http://bibapp.org/) and DSpace with VIVO (http://vivoweb.org). In general, the consensus was that DSpace should not try to "do it all." Instead, the platform should become more modular, supporting plug-ins rather than becoming bloated with new functions out of the box. To this end, there was considerable interest in a stable REST API for DSpace that would provide a "repository abstraction level" for external tools to integrate with and build from.

Finally, several people expressed the view that the community would strongly benefit from increased collaboration among institutions, rather than individuals acting alone on enhancements that were of shared interest. Along similar lines was an expression of interest in better integration of functional and technology staff. At the conclusion of each session, DuraSpace promised to summarize the overall findings and report back to the community. The discussions revealed a number of potential initiatives, some of which would require substantial development effort. DuraSpace will convene new meetings early in the year with institutions that would like to pursue next steps for meeting the needs expressed during the "DSpace Futures" discussions.

The following points represent specific comments that amplify the summary of observations:

Many participants expressed satisfaction with DSpace at its current level of development. Quite a few users in the community at large have not felt a need to keep up to date with new DSpace releases. Others would like to "push the envelope" when it comes to DSpace use cases and are interested in seeing more improvements that increase functionality. Those for whom DSpace does not satisfy a full range of needs are now running multiple applications (e.g., DSpace and Fedora), and some wish one repository application could satisfy all their use cases. On the other hand, the thought was expressed that using multiple "best of breed" applications for varied purposes might be an optimal approach: for example, video storage applications optimized for streaming, or art image applications optimized for sophisticated display.

Open Access
While some users felt DSpace to be a good fit for meeting open access mandates, others wished for more granular permissions. Their point of view is that while an open access policy is fine for faculty content, it would also be nice to provide more granular embargo options for materials that need it. (In fact, DSpace 3.0 now offers more advanced embargo options which may meet these needs.) One of the drivers requiring this additional flexibility is the mandate to accommodate copyright demands.

Digital Preservation
While the key DSpace use cases are discovery and open access, many users are now focused on digital preservation.  While DSpace does offer "hooks" for preservation, some think that this is still an underdeveloped area that could use more attention. Generally, they believe that, while much interest is directed towards digital preservation, media, data, and digital collections, DSpace is not always well tailored for this work.  Some institutions do intentionally maintain separate applications for open access and preservation management, with preservation copies kept on a separate server. (It was noted that the methods for backing up DSpace to DuraCloud do now offer preservation options: https://wiki.duraspace.org/x/SzD7AQ.)

User Interface
A common perception is that the DSpace user interface (UI) is still difficult to customize.  In addition, some feel that the UI looks dated, and that a more modern-looking interface would improve usability and would appeal more to researchers and other new users.  Some related observations:

• The awkwardness of the UI tools does not permit agile development
• Branding and minor customizations should be easier to accomplish
• Accommodating use cases for special collections, audio, and visual materials requires easy-to-use tools and user interfaces.  DSpace should be more competitive in its support for new UI development
• DSpace needs improved administrative tools to take the burden off of developers and to provide functions for managing things like multimedia streaming and downloading.

Data Management
The widespread strategic interest in data management is leading many to consider how DSpace can be better positioned to meet the related needs. Recent additions in version 3.0 (for example, contributions for item versioning and expanded use of Solr's capabilities to present data in new and different ways) have been driven by these issues. Some concerns:

• Research data management requires that the repository be less a catalog and more a central data store that can be shared, with more active input and output
• DSpace can be a bottleneck for ingest of large amounts of research data. Some organizations are using DSpace as a "proxy" for where the data is actually held, substituting services like Box for the storage
• Some see the need for a more distributed environment to store large amounts of data.  This suggests a need for multiple background repositories or a "distributed DSpace"
• There is a growing interest in geospatial data and a corresponding need for related metadata support in DSpace
• Better journal article metadata is also needed.

Integration and Related Platforms
A number of institutions were interested in exploring possible integration with other software platforms in order to get increased flexibility and ease of development for their DSpace repository.  In particular, some schools are investigating the Hydra framework. Over the past few years, a growing number of institutions have joined a DuraSpace-supported Hydra consortium whose members have been working on local Hydra heads to address a variety of common problems.

Thoughts about a "DSpace Hydra head" include a variety of approaches, one of which might be a newly written Ruby-on-Rails DSpace application layered on top of a Fedora repository.  Such an application, it is thought, could take advantage of Fedora's flexible object model and Hydra's rapid development front end tools to create future Hydra solutions for some of the new use cases mentioned, while minimizing the number of technology platforms deployed.  

Another approach might entail a front-end development effort leveraging all or some of the Hydra front end tools on top of a conventional DSpace back end that would provide a stable REST API.  This second approach was brainstormed as "Hydra on DSpace" rather than "DSpace on Hydra". This kind of effort would provide more flexibility in the front end while maintaining DSpace back-end assets.  

Yet another approach would combine the conventional DSpace user interface with a Fedora back end.  Of course, any of these approaches would require a significant, sustained development project.

Islandora was also mentioned as a possible framework for a future DSpace repository application.  Participants were made aware of an upcoming offering from Discovery Garden, a DuraSpace service provider, that provides an Islandora institutional repository application along with optional migration services for DSpace users.

Additionally, DuraSpace mentioned that we are currently assessing interest in DSpaceCloud, a hosted version of the DSpace repository aimed at organizations that prefer not to maintain a local implementation of DSpace (http://dspacecloud.org/).

Some participants cited other applications used at their institution or of particular interest related to some of the issues mentioned above:

• Luna Imaging for special collections and archival material (http://www.lunaimaging.com)
• ArtStor for digital images (http://www.artstor.org/)
• XTF for audio/video (http://xtf.cdlib.org/)
• Fedora for digital library material, audio/video needs, collections (http://www.fedora-commons.org/)
• Box for large datasets (http://box.com/)
• BibApp for faculty/researcher profiles (http://bibapp.org)
• VIVO for semantic web-compliant research discovery and sharing (http://vivoweb.org)

A couple of participants observed that the DSpace community could profit from increased sharing. It was evident that topics and issues explored by one school were often of interest to others. It was clear, as well, that research in these areas was sometimes duplicated, and in some cases independent projects aimed at similar goals were undertaken. There was no strong consensus as to how best to encourage increased collaboration, but there was agreement that we should collectively address the issue in the new year. Additionally, a suggestion was made that the developer community be as welcoming as possible to new people, both those who want to become more experienced developers and non-developers who want to communicate directly with them.

The DSpace community is large and represents a wide range of requirements. A large portion of the community seems satisfied with the ability of DSpace to meet its institutional repository needs. Many in this group have made a long-term investment in the software and remain fairly static in their use of the repository. Many users do not see a need to upgrade the software to the most current release. Still others may wish to do so but do not have the resources to accomplish this work. Those who do upgrade are appreciative of the recent surge in developer contributions of new features.

Another significant subset of the DSpace community would like to accelerate the rate of development.  Many in this group express common themes and wishes for improvements:

• Increased flexibility in satisfying new use cases, minimizing the need to run multiple software platforms
• Improved metadata support
• More granular permissions, including support for more granular embargo options (Note: Some embargo enhancements were recently released in DSpace 3.0)
• A focus on digital preservation and digital asset management
• More agile development for user interfaces, more administrative user functions, more varied UIs to support new use cases
• Support for research data management
• Improved performance for larger institutions
• Improved statistical reporting
• More integration with other open source software applications

Several in this group of users have expressed interest in more ambitious development efforts:

• A stable REST API for DSpace that would permit more flexible UI development (Note: Some third-party REST APIs are available)
• Exploration of the Hydra platform for possible reworking of the DSpace front- and/or back-ends
• Integration with a number of third party open source applications.

Some of these potential initiatives would require substantial development effort.  

DuraSpace will convene new meetings early in the year with institutions that would like to pursue next steps for meeting the needs expressed during the DSpace Futures discussions.

DuraSpace would like to thank the following organizations for participating in the DSpace Futures conversations:
BioMed Central/Open Repositories
Cambridge University Library
Cornell University
Enovation Solutions, Ltd
George Mason University
Georgetown University
Harvard University
Indiana University
Indiana University—Purdue University Indianapolis
Kansas State University
Ohio State University
Rice University
University of Tennessee
University of Auckland
University of Cambridge
University of Delaware
University of Edinburgh
University of Guelph
University of Illinois
University of Michigan
University of Missouri
University of Montreal
University of North Carolina at Chapel Hill
University of Toronto
Virginia Tech
York University