By Sandy Payette, Executive Director, Fedora Commons
As Executive Director, I thought it would be appropriate to reflect on the major accomplishments of the year at Fedora Commons and provide a mini “annual report” for the community. Indeed, it has been a very exciting time since the startup of our non-profit organization. With a fabulous core team and a great set of partners and community participants, I am proud to share with the whole community my view of some of the year’s notable developments.
DSpace Foundation Partnership
A major development in 2008 is the successful partnership between Fedora Commons and the DSpace Foundation. The Executive Director of the DSpace Foundation, Michele Kimpton, and I have collaborated closely over the past six months, and we consider our two organizations to be strategic partners. We have already made significant progress in the following areas: research on business strategies; bringing the Fedora and DSpace developer communities into a collaborative relationship; developing a joint community outreach/education strategy; and initiating projects for interoperability among DSpace, Fedora, Microsoft, EPrints, and others.
In November, Michele and I received a grant from the Mellon Foundation to support a planning phase for “DuraSpace,” our idea for a new joint offering. DuraSpace is conceived as a value-added service layer over multiple storage providers, notably commercial “cloud” storage providers and enterprise-oriented “intra-clouds.” The DuraSpace concept is influenced by research that Michele and I have been doing on the emerging trend of commercial cloud storage. It is also motivated by a clear interest within the DSpace and Fedora communities in developing new storage abstraction architectures that permit the plug-in of heterogeneous storage systems. With DuraSpace, we imagine a “Chinese menu” of service options that includes content replication to multiple clouds, content monitoring, content aggregation, and more. We will explore the requirements of organizations that are interested in leveraging cloud and storage virtualization services but that also need additional capabilities related to preservation, archiving, integrity, trust, and content sharing/exposure. The new Mellon grant will fund meetings, focus groups, and prototyping so that we can validate the market need and the business and technical assumptions behind the DuraSpace proposition.
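DuraSpace is still at the planning stage, so there is no DuraSpace API to show. As a hypothetical sketch of two items on the proposed service menu, replication of content to several providers and content monitoring, the code below writes one copy to each provider and later audits every copy against a SHA-256 checksum. `InMemoryProvider`, `replicate`, and `audit` are invented names, and in-memory dictionaries stand in for real cloud or intra-cloud storage.

```python
import hashlib


class InMemoryProvider:
    """Hypothetical stand-in for one storage provider (e.g. a cloud bucket)."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, content_id, data):
        self._objects[content_id] = data

    def get(self, content_id):
        # Returns None when this provider holds no copy.
        return self._objects.get(content_id)


def replicate(content_id, data, providers):
    """Write one copy to every provider and return the reference checksum."""
    for provider in providers:
        provider.put(content_id, data)
    return hashlib.sha256(data).hexdigest()


def audit(content_id, expected_checksum, providers):
    """Return the names of providers whose copy is missing or does not match."""
    failures = []
    for provider in providers:
        data = provider.get(content_id)
        if data is None or hashlib.sha256(data).hexdigest() != expected_checksum:
            failures.append(provider.name)
    return failures


if __name__ == "__main__":
    providers = [InMemoryProvider("cloud-a"), InMemoryProvider("cloud-b"),
                 InMemoryProvider("intra-cloud")]
    checksum = replicate("article-42", b"PDF bytes", providers)
    providers[1].put("article-42", b"corrupted")  # simulate silent damage to one copy
    print(audit("article-42", checksum, providers))  # ['cloud-b']
```

Keeping the checksum outside any single provider is what lets the monitor notice a bad copy; a real service would also need to repair the damaged replica from a good one.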
2008: Open Source Software Projects
The Fedora user community grew in 2008, a sign of a healthy open source project. We have seen over 26,000 downloads of the software so far this year, and we continue to work to identify our users and encourage them to come forward and participate in the community. In 2008, the number of registered Fedora installations in our community registry grew from 98 to 153. We encourage you to register your institution, with some information about how you use Fedora, in our community registry at: https://fedora-commons.org/confluence/display/FCCommReg/Fedora+Commons+Community+Registry.
In 2008 a major new release of Fedora was unveiled: Fedora 3.0. As a mature open source project, Fedora has shown itself to be robust and reliable. The new release streamlines Fedora and positions the code base to evolve more easily in the future. Important aspects of this release include a new lightweight REST API that makes it much easier for developers to build applications on top of the Fedora repository. There are also a number of new features that make the repository easier to integrate with lightweight Web approaches to interoperability, including the ability to ingest and export content using the Atom Syndication Format. This is key to Fedora’s growth, since there is a general movement in the open repositories community to make digital repositories more interoperable and a better fit with Web architecture and emerging cyberinfrastructure. In Fedora 3.0, we re-architected Fedora’s “disseminator” architecture to be more model-driven. In this new approach, known as the Content Model Architecture, repository managers can register models for different types of objects (articles, books, datasets), and Fedora provides a powerful way to plug in services that operate on those model types. Fedora 3.0 has gone through significant performance and scalability testing in collaboration with our partners at FIZ Karlsruhe in Germany. The results are very favorable, with data currently reported for a repository of 14 million digital objects (see: http://fedora.fiz-karlsruhe.de/docs/Wiki.jsp?page=Main ).
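The model-driven idea behind the Content Model Architecture can be illustrated with a small in-process sketch. Everything here is hypothetical: `DigitalObject`, `ModelRegistry`, and their methods are invented for illustration and are not Fedora’s API (Fedora 3.0 itself expresses these bindings through repository objects and their relationships). The point is only that a service is bound to a model once, and every object declaring that model gains it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DigitalObject:
    pid: str
    model: str  # name of the content model this object conforms to
    datastreams: Dict[str, str] = field(default_factory=dict)


class ModelRegistry:
    """Hypothetical registry binding content models to the services they support."""

    def __init__(self):
        self._services: Dict[str, Dict[str, Callable[[DigitalObject], str]]] = {}

    def register_model(self, model):
        self._services.setdefault(model, {})

    def bind_service(self, model, name, func):
        # Services can only be attached to models that were registered first.
        if model not in self._services:
            raise KeyError(f"unknown content model: {model}")
        self._services[model][name] = func

    def services_for(self, obj):
        # Every object of a given model sees the same set of services.
        return sorted(self._services.get(obj.model, {}))

    def invoke(self, obj, name):
        try:
            func = self._services[obj.model][name]
        except KeyError:
            raise LookupError(f"{name!r} is not available for model {obj.model!r}") from None
        return func(obj)


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register_model("article")
    registry.bind_service("article", "citation",
                          lambda o: f"{o.datastreams['TITLE']} ({o.pid})")
    paper = DigitalObject("demo:1", "article", {"TITLE": "On Repositories"})
    print(registry.services_for(paper))        # ['citation']
    print(registry.invoke(paper, "citation"))  # On Repositories (demo:1)
```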
In 2009, our major goals will be to continue making the repository easier to use, to improve its fit with the Web and cyberinfrastructure, to provide new end-user applications and tools that integrate with the repository, and to move toward the new storage abstraction architecture being developed in the Akubra Project. We will work with partners (especially DSpace and Microsoft) to improve repository interoperability using well-accepted lightweight protocols and formats, including the Atom Publishing Protocol, SWORD, and ORE. We will focus our attention on requirements generated by the solution communities in Data Curation, Open Access, Preservation and Archiving, and E-Research.
The Akubra Project was created by the Fedora and Topaz teams to build a shared storage abstraction layer that can underpin a variety of systems, including Fedora and Topaz. There is great interest in storage abstraction in both the digital repository community and the enterprise systems community. The Akubra team thoroughly analyzed similar efforts and designed a new interface that is a valuable contribution because it is generic and simple. Other similar efforts are either too tied to the needs of particular systems or very complex (e.g., XAM). Akubra elegantly hits a sweet spot. The architecture provides the storage interface and a mechanism for extending that interface. The intent is a plug-in architecture in which different adapters can be written to connect to heterogeneous storage systems. The initial plug-ins provided with Akubra will be a standard file system, a transactional file system, Sun advanced storage, and Amazon S3 cloud storage. The team expects to release the first version of Akubra by the end of 2008. The project has already generated significant interest in the Fedora and DSpace communities, as well as among our commercial partners (Sun and Microsoft).
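A minimal sketch of the plug-in pattern described above, written in Python for brevity. Akubra itself is a Java library; the names below (`BlobStore`, `MemoryBlobStore`, `FileSystemBlobStore`, `store_datastream`) are hypothetical and are not its actual API. One small abstract interface is implemented by interchangeable adapters, and repository-side code depends only on the interface.

```python
import hashlib
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class BlobStore(ABC):
    """Hypothetical minimal storage interface that every adapter implements."""

    @abstractmethod
    def put(self, blob_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, blob_id: str) -> bytes: ...

    @abstractmethod
    def delete(self, blob_id: str) -> None: ...


class MemoryBlobStore(BlobStore):
    def __init__(self):
        self._blobs = {}

    def put(self, blob_id, data):
        self._blobs[blob_id] = bytes(data)

    def get(self, blob_id):
        return self._blobs[blob_id]  # raises KeyError if absent

    def delete(self, blob_id):
        self._blobs.pop(blob_id, None)


class FileSystemBlobStore(BlobStore):
    def __init__(self, root):
        self._root = Path(root)
        self._root.mkdir(parents=True, exist_ok=True)

    def _path(self, blob_id):
        # Hash the id so arbitrary identifiers map to safe file names.
        return self._root / hashlib.sha256(blob_id.encode("utf-8")).hexdigest()

    def put(self, blob_id, data):
        self._path(blob_id).write_bytes(data)

    def get(self, blob_id):
        path = self._path(blob_id)
        if not path.exists():
            raise KeyError(blob_id)
        return path.read_bytes()

    def delete(self, blob_id):
        self._path(blob_id).unlink(missing_ok=True)  # requires Python 3.8+


def store_datastream(store: BlobStore, pid, ds_id, content):
    """Repository-side code sees only the interface, never the backend."""
    store.put(f"{pid}/{ds_id}", content)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for store in (MemoryBlobStore(), FileSystemBlobStore(tmp)):
            store_datastream(store, "demo:1", "DC", b"<dc/>")
            print(type(store).__name__, store.get("demo:1/DC"))
```

Because `store_datastream` depends only on `BlobStore`, supporting another backend (an S3 adapter, say) would mean writing one more subclass, with no change to the repository-side code.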
The Mulgara semantic triplestore made significant advances in 2008 after Fedora Commons hired Paul Gearon, Mulgara’s lead architect and developer. Mulgara is a full-featured, open source triplestore that is integrated with the Fedora repository to provide semantic capabilities (one of Fedora’s unique features). Mulgara is also a core component of the Topaz semantic platform that underpins the PLoS ONE publishing application. In 2008, Mulgara underwent significant re-engineering to improve performance and ensure reliability in production installations. Mulgara has also become standards-compliant: it now supports SPARQL, the W3C standard query language for RDF. Mulgara is proving to be highly scalable, a major goal for any semantic triplestore, and it is gaining significant recognition in the Semantic Web community. We believe it is positioned to become one of the premier open source semantic triplestores, especially with work in 2009 to improve scalability and performance further through the new Mulgara XA2 architecture and distributed query capabilities.
Building Solution Communities
Thorny Staples, Director of Community Strategy and Outreach, has pursued a new, bottom-up approach to community building that encourages communities to form around solution areas important to Fedora Commons. Initially, the goal is to enable these communities to develop and share information. The longer-term goal is to engage them in collaborative efforts to develop software “solution bundles” that combine the Fedora Commons open source software with other software applications or components.
With Thorny Staples as “community organizer,” we are fostering communities in five areas, each with a community vision leader:
- Data Curation, Sayeed Choudhury, Johns Hopkins University
- Open Access Publishing, Rich Cave, PLoS
- Preservation and Archiving, Ron Jantz, Rutgers
- The Scholar’s Repository, University of Virginia, Stanford, and University of Hull
- Solutions Integration, Matt Zumwalt, MediaShelf
Thorny has been using a methodology for community building that is grounded in theories of social emergence. Details about our community-building strategy and methodology can be found at:
In this issue of Hatcheck, you can read more about Thorny’s work on the emergence of the Scholar’s Repository community. Also, check out the article on the new DuraSpace grant. Plus, this issue provides a very nice lineup of short articles about community-based work focused on integrating the Fedora repository with user applications and services.