Fedora4 Team Completes Sixth Beta Phase Sprint: Authorization, Clustering and Large Files

Mon, 2013-11-11 11:20 -- carol

Winchester, MA - The Fedora team has completed the sixth sprint within the "Beta Phase" of Fedora4 development. The work of this and each sprint is planned and completed thanks to the contributions of Fedora stakeholder institutions that allocate developer time. If you would like to be involved with Fedora4 development, please send an email to Andrew Woods (awoods@duraspace.org) or the Fedora Steering Committee (ff-steering@googlegroups.com). If you have comments on the work from this sprint, please also send an email or comment directly on the wiki.

Read the Sprint B6 summary:

https://wiki.duraspace.org/display/FF/Sprint+B6+Summary

About Sprint 6, Beta Phase

Development Team

Greg Jansen - University of North Carolina, Chapel Hill

Osman Din - Yale University

Eric James - Yale University

Frank Asseg - FIZ Karlsruhe

A. Soroka - University of Virginia

Chris Beer - Stanford University

Scott Prater - University of Wisconsin

Sprint themes

1) Authorization

The initial pattern for Fedora 4 authorization assumes that a user request has already been authenticated before it enters the Fedora 4 application. Authenticated requests are expected to carry an identity and zero or more additional attributes, such as groups. These user attributes (together with any other attributes that may be mapped to the requesting user) and the requested action are compared against configurable rules to determine whether the user has the privilege to perform the action on the resource.

Administrators can apply rules to arbitrary resources (e.g. object, datastream, or node hierarchy) within the repository using the restricted Access Roles REST API [1]. Once the access rules have been defined on repository resources, the Basic Role-based Policy Enforcement Point [2] (if enabled) will restrict requests as described above and in further detail on the wiki [3].
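The check described above can be sketched in a few lines. This is a minimal illustration, not the Fedora 4 implementation: the role names, the role-to-action table, the `Node` class, and the rule that a resource inherits assignments from its nearest ancestor that defines any are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set

# Hypothetical role-to-action table; in a real deployment the rules are configurable.
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "reader": {"read"},
    "writer": {"read", "write"},
    "admin": {"read", "write", "admin"},
}


@dataclass
class Node:
    """A repository resource (hypothetical model for this sketch)."""
    path: str
    parent: Optional["Node"] = None
    # Maps a principal (user or group name) to the roles assigned on this node.
    assignments: Dict[str, Set[str]] = field(default_factory=dict)


def effective_assignments(node: Optional[Node]) -> Dict[str, Set[str]]:
    """Return the assignments of the nearest node (self or ancestor) that has any."""
    current = node
    while current is not None:
        if current.assignments:
            return current.assignments
        current = current.parent
    return {}


def is_permitted(user: str, groups: Iterable[str], action: str, node: Node) -> bool:
    """Compare the user's identity and groups, plus the action, against the rules."""
    principals = {user, *groups}
    assignments = effective_assignments(node)
    roles: Set[str] = set()
    for principal in principals:
        roles |= assignments.get(principal, set())
    return any(action in ROLE_ACTIONS.get(role, set()) for role in roles)


# Example: roles are assigned on a collection and inherited by its child.
root = Node("/")
collection = Node("/collection", root, {"staff": {"writer"}, "alice": {"reader"}})
obj = Node("/collection/obj", collection)

print(is_permitted("alice", [], "write", obj))
print(is_permitted("bob", ["staff"], "write", obj))
```

In this sketch a request with no matching assignment is denied by default; whether the actual enforcement point behaves the same way should be confirmed against the wiki pages cited above.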

This feature is available and ready for community Acceptance Testing [4].

2) Clustering

Continuing with the theme of benchmarking and optimizing performance with Fedora 4 configured in a cluster, this sprint established clustering resources within the following institutions:

- FIZ Karlsruhe
- Yale University
- University of North Carolina, Chapel Hill
- University of Wisconsin
- Amazon Web Services

This sprint also produced tooling for the consistent benchmarking of different clusters, documentation of cluster configuration, and some preliminary benchmarking results. The work of profiling and analyzing cluster performance remains for a future sprint.

Details of the cluster resources and benchmarking results can be found on the wiki [5].

3) Large Files

The native "projection" or "federation" capability offered by Fedora 4's underlying JCR implementation (ModeShape [6]) allows for content on the filesystem, in a database, web-accessible, etc., to be connected to and exposed through the repository. The results of testing this capability over multi-gigabyte files in Sprints B2 [7] and B4 [8] showed performance bottlenecks.

One of the advantages of leveraging the open-source ModeShape under Fedora 4 is that we are able to push improvements upstream to that project. Modifications and enhancements to ModeShape's FileSystemConnector [9] from the Fedora 4 team were incorporated into ModeShape 3.6.0.

While these updates have yielded significant performance improvements, further testing and a description of recommended patterns of practice are still outstanding.

Additional details of the "large files" approach and documented performance improvements can be found on the wiki [10].
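The anchor of reference [10] points to an option for computing a file's hash from its path rather than from its full content. The trade-off behind that idea can be sketched as follows. This is only an illustration in Python, not ModeShape code: the function names are hypothetical, and SHA-1 is an assumption chosen for the example.

```python
import hashlib
import os


def content_sha1(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the full file content; cost grows with file size."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def path_sha1(path: str) -> str:
    """Hash only the absolute path; cost is independent of file size."""
    return hashlib.sha1(os.path.abspath(path).encode("utf-8")).hexdigest()
```

A path-derived key avoids reading every byte of a large file, but it no longer changes when the file's content changes, so it cannot serve as a content checksum.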

4) Repository Refinements

Several miscellaneous, yet important, improvements were made during this sprint to enhance the deployment and operations of a Fedora 4 repository.

- The REST API [11] was expanded to include endpoints for:
  - copying and moving resources or trees of resources within the repository;
  - retrieving and creating repository node types (i.e. content models).
- Default locations are now provided so that the Fedora 4 web application can be deployed to a servlet container with no required configuration.

Additionally, the codebase has been systematically refactored to allow iteration over repository resources. So far, this capability is exposed through the REST API for a subset of repository services.
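As an illustration of what a client call to the copy and move endpoints listed above might look like, the sketch below builds (but does not send) a request. It assumes WebDAV-style COPY and MOVE methods with a `Destination` header, as defined in RFC 4918; whether the Fedora 4 endpoints follow that form should be checked against the REST API documentation [11]. The helper name and the `localhost` URLs are hypothetical.

```python
import urllib.request


def build_transfer_request(method: str, source_url: str, destination_url: str) -> urllib.request.Request:
    """Build a COPY or MOVE request for a resource (assumed WebDAV-style semantics)."""
    if method not in ("COPY", "MOVE"):
        raise ValueError("method must be 'COPY' or 'MOVE'")
    return urllib.request.Request(
        source_url,
        method=method,
        headers={"Destination": destination_url},
    )


# Hypothetical repository URLs, for illustration only.
request = build_transfer_request(
    "COPY",
    "http://localhost:8080/rest/objects/a",
    "http://localhost:8080/rest/objects/a-copy",
)
print(request.get_method(), request.full_url, request.get_header("Destination"))
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so that the example runs without a repository.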

References

[1] https://wiki.duraspace.org/display/FF/Access+Roles+Module
[2] https://wiki.duraspace.org/display/FF/Basic+Role-based+PEP
[3] https://wiki.duraspace.org/display/FF/Design+Guide+-+Policy+Enforcement+Points
[4] https://wiki.duraspace.org/display/FF/Acceptance+Testing
[5] https://wiki.duraspace.org/display/FF/Test+Clusters
[6] https://docs.jboss.org/author/display/MODE/Federation
[7] https://wiki.duraspace.org/display/FF/Sprint+B2+Summary
[8] https://wiki.duraspace.org/display/FF/Sprint+B4+Summary
[9] https://docs.jboss.org/author/display/MODE/File+system+connector
[10] https://wiki.duraspace.org/display/FF/Design+-+Large+Files#Design-LargeFiles-Resolution1-ModeshapeFileSystemConnectoroptiontocomputehashfrompathratherthanfullcontent
[11] https://wiki.duraspace.org/display/FF/REST+API
