Fedora Futures Completes Second Beta Phase Sprint: Triplestore, Large Files, Policy-driven Storage

Tue, 2013-08-27 11:30 -- carol

Winchester, MA - The Fedora Futures team has completed the second sprint within the "Beta Phase" of Fedora4 development. The work of this and every sprint is planned and completed thanks to the contributions of Fedora stakeholder institutions that allocate developer time. If you would like to be involved with Fedora4 development, please send an email to Andrew Woods (awoods@duraspace.org) or to the Fedora Steering Committee (ff-steering@googlegroups.com). If you have comments on the work from this sprint, please send an email as well, or comment directly on the wiki. Contributing your institution's descriptive use case(s) to the wiki [1] is particularly valuable in informing ongoing Fedora development.

Read the Sprint B2 summary:

https://wiki.duraspace.org/x/Y_EQAg

About Sprint 2, Beta Phase

Development Team 

Osman Din - Yale University
Yuqing Jiang - University of Prince Edward Island
Esmé Cowles - University of California, San Diego
Nigel Banks - Discovery Garden

Sprint Themes

1) Triplestore

As a step towards enabling SPARQL-Query over Fedora4 resources, this sprint formalized the Fedora4 RDF ontology and added an initial framework for integrating an external triplestore.

The Fedora4 ontologies have been split and published into the following three namespaces:

• Predicates for internal repository usage (http://fedora.info/definitions/v4/repository#)
• Predicates for use within the REST-API (http://fedora.info/definitions/v4/rest-api#)
• Predicates for inter-object relationships (http://fedora.info/definitions/v4/rels-ext#)

Note: Community review of and feedback on the ontologies listed above would be greatly appreciated.

A new external component has been created that listens for content creation/modification/deletion events from the repository and then submits the updated objects to an external triplestore that supports SPARQL-Query.

There is currently support for the Fuseki [2] and Sesame [3] SPARQL servers.
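To illustrate the event-driven flow described above, here is a minimal Python sketch of how one repository event might be turned into a SPARQL 1.1 Update request. It is not the actual indexer component: the function name, the event type strings ("created", "modified", "deleted") and the example resource URL are assumptions made for illustration only. The sketch builds the update text and leaves out sending it to a Fuseki or Sesame endpoint.

```python
from typing import Iterable, Tuple

# (predicate IRI, object already serialized as an N-Triples term)
PredicateObject = Tuple[str, str]


def sparql_update_for_event(event_type: str, subject: str,
                            properties: Iterable[PredicateObject] = ()) -> str:
    """Build a SPARQL 1.1 Update string for one (hypothetical) repository event."""
    # Remove whatever the triplestore currently holds about this subject.
    delete = f"DELETE WHERE {{ <{subject}> ?p ?o }}"
    if event_type == "deleted":
        return delete
    if event_type not in ("created", "modified"):
        raise ValueError(f"unknown event type: {event_type}")
    # For created/modified resources, re-insert the current set of triples.
    triples = "\n".join(f"  <{subject}> <{p}> {o} ." for p, o in properties)
    return f"{delete} ;\nINSERT DATA {{\n{triples}\n}}"


if __name__ == "__main__":
    # Hypothetical resource URL; the predicate is the Dublin Core title element.
    print(sparql_update_for_event(
        "modified",
        "http://localhost:8080/rest/objects/demo1",
        [("http://purl.org/dc/elements/1.1/title", '"Demo object"')],
    ))
```

Replacing all triples for a subject on every change keeps the sketch simple; a real indexer could instead compute and send only the differences.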

2) Large Files

One of the shortcomings of Fedora3 has been the management of multi-gigabyte datastreams. This sprint focused on leveraging the underlying repository's ability to create a projection [4] over a local file-space that contains the large files. The experiments revealed that the default filesystem projection experiences performance bottlenecks when the content is in the multi-gigabyte range. More information about the large file experiments and follow-on work can be found on the wiki [5].

3) Policy-driven Storage

A goal of Fedora4 is to ease the deployment and configuration of the application. One way of accomplishing that goal is to allow Fedora4 to be deployed as a vanilla application, then provide the option of configuring various repository features via HTTP calls at runtime. Along those lines, this sprint incorporated the Policy-driven Storage prototyping work completed in Sprint B1 into the Fedora codebase. It is now possible at runtime to define or undefine a policy within the repository to route content at the time of ingest into one of multiple backend content stores.

The initial implementation supports routing based on the content's MIME type.
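The idea of runtime-defined, MIME-type-based routing can be sketched in a few lines of Python. This is an illustrative model only, not Fedora4's implementation or its HTTP interface: the class name, method names and store names below are all hypothetical.

```python
from typing import Dict


class MimeTypeStoragePolicies:
    """Toy model of policies that map a MIME type to a backend store name."""

    def __init__(self, default_store: str) -> None:
        self.default_store = default_store
        self._policies: Dict[str, str] = {}

    @staticmethod
    def _normalize(mime_type: str) -> str:
        # Ignore parameters such as "; charset=utf-8" and letter case.
        return mime_type.split(";")[0].strip().lower()

    def define(self, mime_type: str, store: str) -> None:
        """Add or overwrite the policy for one MIME type at runtime."""
        self._policies[self._normalize(mime_type)] = store

    def undefine(self, mime_type: str) -> None:
        """Remove the policy for one MIME type, if present."""
        self._policies.pop(self._normalize(mime_type), None)

    def route(self, mime_type: str) -> str:
        """Pick the store for content being ingested."""
        return self._policies.get(self._normalize(mime_type), self.default_store)


if __name__ == "__main__":
    policies = MimeTypeStoragePolicies(default_store="default-store")
    policies.define("image/tiff", "tiff-store")
    print(policies.route("image/tiff"))   # routed by policy
    policies.undefine("image/tiff")
    print(policies.route("image/tiff"))   # falls back to the default store
```

Content with no matching policy falls back to a default store in this sketch; how the repository itself handles unmatched content is described on the wiki rather than assumed here.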

More information about Fedora's Policy-driven Storage can be found on the wiki [6].

4) Islandora Integration

This sprint included ongoing development towards integrating Islandora with the Fedora4 REST-API.

References

[1] https://wiki.duraspace.org/display/FF/Use+Cases

[2] http://jena.apache.org/documentation/serving_data

[3] http://www.openrdf.org/documentation.jsp

[4] https://docs.jboss.org/author/display/MODE/Federation

[5] https://wiki.duraspace.org/display/FF/Design+-+Large+Files

[6] https://wiki.duraspace.org/display/FF/Design+-+Policy+Driven+Storage
