"Lucky 13" Sprint To Fedora4 Beta: Large Files, Clustering, Triplestore, Authorization, Content Modeling

Thu, 2014-03-27 13:43 -- carol

Winchester, MA - The Fedora team has concluded the thirteenth sprint of the "Beta Phase" of Fedora 4 development. The work of this and every sprint is planned and completed thanks to the contributions of Fedora stakeholder institutions that allocate developer time. If you would like to be involved with Fedora 4 development, please send an email to Andrew Woods (awoods@duraspace.org) or to fedora-community@googlegroups.com. If you have comments on the work from this sprint, please send an email or comment directly on the wiki.

Read the Sprint B13 summary:
https://wiki.duraspace.org/display/FF/Sprint+B13+Summary

About Sprint 13, Beta Phase

Development Team

Adam Soroka - University of Virginia
Eric James - Yale University
Esme Cowles - University of California, San Diego

Sprint Themes

1) Large files

In a continued effort to support a variety of use cases around large files, this sprint made a number of enhancements and discoveries related to the filesystem "federation" or "projection" feature [1].

Filesystem projection has the following characteristics:
- Fedora objects and datastreams in the projection can be added/removed/modified via the REST API
- Fedora objects and datastreams in the projection can be copied both to and from the internally managed repository
- Fedora objects and datastreams in the projection can be added/removed from the underlying filesystem and accurately reflected in calls to the REST API

Limited performance tests [2] that upload and retrieve content to and from a projection via the REST API indicate a 2x slowdown on writes, but uncompromised read performance. For use cases that prioritize transparent persistence over write performance, this is an attractive option. On disk, a projection represents each object as a directory with an associated JSON file holding the object's properties; nested objects and binaries are stored alongside their own accompanying JSON property files.
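To make the on-disk layout more concrete, here is a minimal sketch that walks a directory tree and sorts its entries into objects (directories), binaries (files), and property files (JSON sidecars). The ".json" suffix, the placement of the sidecar files, and the names classify_projection and demo are assumptions made for illustration; the actual naming used by the projection is not specified above.

```python
import os
import tempfile

# Assumed sidecar suffix, for illustration only; the real file naming is not given above.
SIDECAR_SUFFIX = ".json"


def classify_projection(root, sidecar_suffix=SIDECAR_SUFFIX):
    """Sort a projected directory tree into objects, binaries, and property files."""
    result = {"objects": [], "binaries": [], "properties": []}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir != os.curdir:
            # Every directory below the root is treated as an object.
            result["objects"].append(rel_dir.replace(os.sep, "/"))
        for name in sorted(filenames):
            rel = os.path.normpath(os.path.join(rel_dir, name)).replace(os.sep, "/")
            kind = "properties" if name.endswith(sidecar_suffix) else "binaries"
            result[kind].append(rel)
    return result


def demo():
    """Build a tiny throwaway tree (hypothetical layout) and classify it."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "collection", "item"))
        for rel in ("collection.json", "collection/item.json",
                    "collection/item/scan.tif", "collection/item/scan.tif.json"):
            with open(os.path.join(root, *rel.split("/")), "w") as handle:
                handle.write("{}")
        return classify_projection(root)
```

Note that under this simple suffix rule a binary that is itself a JSON document would be counted as a property file, so a real implementation would need a less ambiguous naming convention.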

One capability that has not yet been extended to projection is that of authorization [3].

2) Clustering

With the goal of a consistent, scriptable [4] process for deploying a Fedora cluster [5] at various institutions, this sprint moved the project a step forward: one more institution, running a four-node cluster, has joined the list of successful deployments.

3) Triplestore

In combing through the use cases under the triplestore [6] umbrella, the one major class of scenario that remained unmet until this sprint was the one relying on legacy Mulgara iTQL functionality, specifically the ability to recursively "walk" the children of a resource. The SPARQL query recipes [7] have been updated to detail exactly how this is now achieved in Fedora 4.
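To illustrate what recursively "walking" the children of a resource means, here is a minimal sketch over an in-memory list of triples. It is not the SPARQL recipe from the wiki; it is a plain breadth-first traversal, and the predicate name "hasChild" and the function walk_children are hypothetical.

```python
from collections import deque

# Hypothetical predicate name, used only for this illustration.
HAS_CHILD = "hasChild"


def walk_children(triples, root, predicate=HAS_CHILD):
    """Return every descendant of `root` reachable through `predicate`, breadth-first."""
    # Index the direct children of each subject.
    children = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            children.setdefault(subject, []).append(obj)

    seen = set()
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            # Skip nodes already visited so that cycles cannot loop forever.
            if child not in seen and child != root:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

For example, given triples linking "/a" to "/a/b" and "/a/c", and "/a/b" to "/a/b/d", walking from "/a" yields all three descendants, direct children first.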

4) Authorization

An additional example implementation of an extension point to the authorization framework was included in this sprint.

As mentioned before, Fedora 4's authorization pattern expects user requests to be pre-authenticated. Along with the user principal contained in the request, additional principal attributes, such as a user's groups, can also be included. A "principal provider" plug-in was created in this sprint that extracts such attributes from a configurable request header and provides those attributes/principals to the authorization enforcement module.
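The idea behind such a provider can be sketched in a few lines. This is not the plug-in's actual code or configuration; the header name "X-Groups", the comma separator, and the function names are assumptions chosen for illustration.

```python
def extract_principals(headers, header_name="X-Groups", separator=","):
    """Return the set of principal names found in one configurable request header."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    normalized = {name.lower(): value for name, value in headers.items()}
    raw = normalized.get(header_name.lower(), "")
    # Drop surrounding whitespace and ignore empty entries.
    return {part.strip() for part in raw.split(separator) if part.strip()}


def principals_for_request(user, headers, header_name="X-Groups", separator=","):
    """Combine the pre-authenticated user with the header-supplied principals."""
    return {user} | extract_principals(headers, header_name, separator)
```

In this sketch, a request from the pre-authenticated user "alice" carrying the header "X-Groups: staff, curators" would yield the principals alice, staff, and curators, which an enforcement module could then check against its access rules.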

Documentation on how to plug the "header principal provider" into the authorization framework can be found in the wiki [8].

5) Content Modeling

A special topics meeting on content modeling was convened this sprint.

Much of the meeting was dedicated to "going around the table" and hearing each person's perspective on what is important with respect to content modeling, and where Fedora 4 currently stands in relation to those priorities. The concept of a nominal type system, along with further model definition via "mixins", resonated with most of those present. The good news is that Fedora 4 already supports these use cases. As for the proposed validation approach [9], it was clear that follow-up discussions are needed to define more clearly the requirements to which such an extension will be built.

The full minutes can be found on the wiki [10].

6) Housekeeping

Several bugs were addressed during this sprint. Bug fixes and application polishing included:
- Enabling transactions to include property updates - community contribution (Kai Sternad)
- Migrating from use of Java Reflection to Generics
- Removing dead code

References

[1]  https://wiki.duraspace.org/display/FF/Federation
[2]  https://wiki.duraspace.org/display/FF/Large+File+Ingest+and+Retrieval#LargeFileIngestandRetrieval-DirectComparisonofDifferentTransferMethods
[3]  https://wiki.duraspace.org/display/FF/Federation+and+Authorization+ongoing+issues
[4]  https://github.com/futures/puppet-fcrepo
[5]  https://wiki.duraspace.org/display/FF/Deploying+a+Fedora+Cluster
[6]  https://wiki.duraspace.org/label/FF/uc-triplestore
[7]  https://wiki.duraspace.org/display/FF/SPARQL+Recipes#SPARQLRecipes-walk
[8]  https://wiki.duraspace.org/display/FF/How+to+Configure+Servlet+Container+Authentication#HowtoConfigureServletContainerAuthentication-Configureyourrepo.xmlfile
[9]  https://wiki.duraspace.org/display/FF/Design+-+New+Model+Validation+Workflow
[10] https://wiki.duraspace.org/display/FF/2014-03-20+-+Special+Topic+-+Content+Model+Validation

 
