GIVE IT A TRY: Fedora 4 Alpha 3 with Extensive Performance Benchmarking, Improvements, Documentation

Thu, 2014-01-02 11:58 -- carol

Winchester, MA: The Fedora 4 team is proud to announce the third Alpha release of Fedora 4. In the continuing effort to provide early access to the quickly growing Fedora 4 feature set, this Alpha release is one of several leading up to the feature-complete Fedora 4 Beta release. The Fedora 4 development team and supporting institutions made a strong commitment and pushed to produce the Fedora 4 Alpha 3 Release, 'The Holiday Release', on Dec. 28.

The list of features, performance benchmarking and improvements, and associated documentation is extensive.

Release Notes: https://wiki.duraspace.org/display/FF/Fedora+4.0+Alpha+3+Release+Notes

The Holiday Release is a public-facing alpha release with a 'one-click run' download and an associated 'Quick Start Guide', intended to get as much of the Fedora community as possible putting eyes on the current Fedora 4.

Quick Start Guide:

https://wiki.duraspace.org/display/FF/Quick+Start

Give it a try, and Happy New Year.

Complete list of features:

Authorization

The initial pattern for Fedora 4 Authorization is that a given user request will have already been Authenticated before entering the Fedora 4 application. Authenticated user requests are expected to contain an identity and zero or more additional attributes, such as groups. These combined user attributes (in addition to other attributes which may be mapped to the requesting user) along with the requesting action are compared against configurable rules to determine if the user has the privilege to perform the action on the resource.

Administrators can associate "read", "write", and "admin" roles with user principals on repository object and datastream nodes, as well as on hierarchies of nodes using:
 

• The restricted Access Roles REST API [4] or
• The input form [5] of the Fedora 4 HTML UI

Once the access rules have been defined on repository resources, the Basic Roles-based Policy Enforcement Point [6] (if enabled) will restrict requests as described above and in further detail on the wiki [7].

The Fedora 4 authorization feature ensures that:
 

• Restricted child nodes of a requested node are not visible in API responses
• Deletion of a node will recursively delete its children nodes, unless the requesting user does not have sufficient privileges to delete one or more of the children, in which case the entire deletion operation will fail
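The two guarantees above can be illustrated with a small in-memory model. This is only a sketch, not Fedora 4's implementation: the `Node` class, the mapping from roles to permitted actions, and the rule that a node without its own role assignments inherits those of its parent are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Assumed mapping from role to permitted actions (not taken from Fedora 4).
ROLE_ACTIONS = {
    "read": {"read"},
    "write": {"read", "write", "delete"},
    "admin": {"read", "write", "delete", "admin"},
}


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    parent: Optional["Node"] = None
    # principal (user or group name) -> roles assigned directly on this node
    roles: Dict[str, Set[str]] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        child = Node(name, parent=self)
        self.children.append(child)
        return child


def effective_roles(node: Optional[Node]) -> Dict[str, Set[str]]:
    """Assumption: a node with no assignments inherits its parent's."""
    while node is not None and not node.roles:
        node = node.parent
    return node.roles if node is not None else {}


def is_allowed(principals: Set[str], action: str, node: Node) -> bool:
    """True if any of the caller's principals holds a role permitting action."""
    assigned = effective_roles(node)
    return any(
        action in ROLE_ACTIONS.get(role, set())
        for principal in principals
        for role in assigned.get(principal, set())
    )


def visible_children(principals: Set[str], node: Node) -> List[str]:
    """Restricted children are left out of the listing."""
    return [c.name for c in node.children if is_allowed(principals, "read", c)]


def delete(principals: Set[str], node: Node) -> bool:
    """All-or-nothing: check the whole subtree before removing anything."""
    def deletable(n: Node) -> bool:
        return is_allowed(principals, "delete", n) and all(
            deletable(c) for c in n.children
        )

    if not deletable(node):
        return False  # nothing was removed
    if node.parent is not None:
        node.parent.children.remove(node)
    return True
```

Here `principals` stands for the authenticated identity together with any groups attached to the request. Checking the entire subtree first is what lets the operation fail as a whole rather than leaving a partially deleted hierarchy.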

More details on the design and implementation of the authorization feature can be found on the following wiki page [8] and its sub-pages.

Batch Operations

This release enhances the previous batch operations capability to support a more standardized approach to performing the following actions batched as a single request:
 

• Retrieve multiple binary resources in a single request
• Create multiple resources in a single request
• Modify multiple resources in a single request
• Delete multiple resources in a single request

In addition to batching multiple actions of the same type, create/modify/delete actions can also be mixed in a single request.
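Conceptually, a mixed batch is an ordered list of operations carried in one request. The sketch below models that idea against an in-memory dictionary. The operation tuple format and the `apply_batch` function are hypothetical and do not reflect the actual REST payload.

```python
from typing import Any, Dict, List, Tuple

Operation = Tuple[str, str, Any]  # (action, resource path, payload or None)


def apply_batch(store: Dict[str, Any], operations: List[Operation]) -> Dict[str, Any]:
    """Apply create/modify/delete operations in order and return a new store.

    Working on a copy means a failing operation leaves the caller's store
    untouched. That is a choice made for this sketch, not a documented
    guarantee of the Fedora 4 batch API.
    """
    result = dict(store)
    for action, path, payload in operations:
        if action == "create":
            if path in result:
                raise ValueError(f"{path} already exists")
            result[path] = payload
        elif action == "modify":
            if path not in result:
                raise KeyError(path)
            result[path] = payload
        elif action == "delete":
            if path not in result:
                raise KeyError(path)
            del result[path]
        else:
            raise ValueError(f"unknown action: {action}")
    return result
```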

Examples and feature documentation can be found on the wiki [9], along with the REST API documentation [10].

Content Modeling

One aspect of Fedora 4 content modeling is the ability to define custom repository node-types including the node's composition (i.e. property types and multiplicity). In addition to the existing "compact node definition" [11] (CND) file, this release adds the ability to define node-types at runtime via the Fedora 4 REST-API [12]. This now allows repository managers to configure repository node-types programmatically after the application has been installed.

An example set of configurations [13] has been constructed that represents an initial set of Fedora 3 content models translated into Fedora 4 node-types.

Large Files

One of the long-standing requirements of Fedora is support for the management and serving of large files. The native "projection" or "federation" capability offered by Fedora 4's underlying JCR implementation (ModeShape [14]) allows for content on the filesystem, in a database, web-accessible, etc., to be connected to and exposed through the repository. The results of testing this capability over multi-gigabyte files showed performance bottlenecks.

One of the advantages of leveraging the open source ModeShape under Fedora 4 is that we are able to push improvements upstream to that project. Modifications and enhancements to ModeShape's FileSystemConnector [15] from the Fedora 4 team have been incorporated into ModeShape 3.6.0. The contributed updates to the ModeShape codebase provide two options: either postpone the most time-intensive "federation" action (i.e. unique internal identifier generation based on content checksum) until the content is requested, or use a faster, surrogate internal identifier in cases where performance would otherwise be unacceptable.
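The trade-off between the two options can be sketched as follows. This is not ModeShape's code: the `ProjectedFile` class and its `use_surrogate` flag are hypothetical, SHA-1 is assumed as the checksum, and deriving the surrogate identifier from the file path is an assumption made for illustration.

```python
import hashlib


class ProjectedFile:
    """A file exposed through the repository without being copied into it."""

    def __init__(self, path, use_surrogate=False):
        self.path = path
        self.use_surrogate = use_surrogate
        self._identifier = None   # nothing is computed at projection time
        self.checksum_reads = 0   # counts full passes over the content

    def _content_checksum(self):
        # The expensive step: read the entire file, one 1 MiB chunk at a time.
        self.checksum_reads += 1
        digest = hashlib.sha1()
        with open(self.path, "rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    @property
    def identifier(self):
        # Computed on first access, then cached.
        if self._identifier is None:
            if self.use_surrogate:
                # Cheap stand-in that never touches the content (assumed scheme).
                self._identifier = hashlib.sha1(self.path.encode("utf-8")).hexdigest()
            else:
                self._identifier = self._content_checksum()
        return self._identifier
```

In this model, merely listing a directory of multi-gigabyte files costs nothing, because the checksum is only computed when an identifier is first requested, and the surrogate mode avoids reading the content altogether.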

See the "Performance Benchmark - Large files" section below for details of the limits and performance of large file support.

Additional details of the "large files" approach can be found on the wiki [16].

Search

Fedora 4 is designed to support two search services:
 

• External search (i.e. standalone Solr populated by repository event listener)
• Administrative search (i.e. advanced legacy field-search)

External Search

External search went through a significant round of refactoring in this release, both to address performance issues discovered during the application profiling effort and to establish a flexible pattern for transforming resource properties into indexable fields. Following a pattern similar to that of the external triplestore feature, external search relies on repository event messages to trigger index updates. These messages have been refactored to contain only minimal, essential event and resource information, which eliminates the overhead the eventing machinery previously incurred by making additional lookups back into the repository.

Both the identification of resources to be indexed and the transformations that map resource properties to indexable fields are configurable. The basic approach is as follows:
 

1. Set the property on a resource that flags it for indexing
2. Optionally, set the property on a resource that references the properties mapping transformation
3. Optionally, create a new resource that contains the actual LDPath [17] transformation referenced in the previous step
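The steps above can be sketched as an event handler. Everything named here is a stand-in: the `indexable` and `transform` property names, the shape of the event dictionary, and the plain-dictionary mapping used in place of a real LDPath program are assumptions for illustration, and the Solr index is replaced by an in-memory dictionary.

```python
# Fallback mapping used when a resource names no transformation (assumed).
DEFAULT_TRANSFORM = {"title": "title_s"}


def handle_event(event, repository, transforms, index):
    """Update the index in response to one repository event message.

    event:      {"type": "updated" | "removed", "path": "/some/resource"}
    repository: path -> dict of resource properties
    transforms: transformation name -> {property name: index field name}
    index:      path -> dict of indexed fields (stand-in for Solr)
    """
    path = event["path"]
    if event["type"] == "removed":
        index.pop(path, None)
        return

    resource = repository[path]
    # Step 1: only resources flagged for indexing are processed.
    if not resource.get("indexable"):
        return

    # Steps 2 and 3: the resource may reference its own transformation.
    mapping = transforms.get(resource.get("transform"), DEFAULT_TRANSFORM)
    index[path] = {
        field_name: resource[prop]
        for prop, field_name in mapping.items()
        if prop in resource
    }
```

Keeping the event itself down to a type and a path mirrors the minimal messages described earlier: the indexer decides what to fetch and how to transform it.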

More details of the external search feature and its configuration can be found on the wiki [18].

Administrative Search

This release establishes the administrative search service. If a user-facing, full-featured search service is required of your repository, the external search is ideal. However, if a repository administrator-facing search is needed in support of queries over resource properties, then the new administrative search may suffice. Administrative search exposes both a text search over resource properties as well as a SPARQL endpoint over repository subjects.

For more details on the administrative search and its usage, see the wiki [19].

Simplified Deployment

One of the goals of Fedora 4 is to simplify the application deployment as well as the wiring of optional components and their subsequent configuration. Although there is a significant amount of work remaining towards this goal, one early step in this direction is the ability to deploy Fedora 4 by just dropping the web-application archive (WAR) file into a servlet container without the need for any additional configuration. Leveraging this simple deployment capability, this release produced a "One-Click Run" download [20] which literally enables the user to click on the download to start up a local Fedora 4 repository.

A brief introduction to navigating the Fedora 4 web interface is documented on the wiki [21].

Additionally, in support of DevOps users, ongoing effort is dedicated to making the deployment and configuration of Fedora 4 as straightforward and reproducible as possible. To eliminate confusion as to which system properties should be set for configuring Fedora 4 persistence locations, a single system property (fcrepo.home) allows a Fedora 4 installation to specify the base directory under which all other application data will be written.
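As an illustration, a JVM system property such as fcrepo.home is conventionally passed on the command line with the `-D` flag. The helper below only assembles such a command and does not run it. The `build_command` name and the `/var/lib/fcrepo4` directory are placeholders, and launching the one-click download through `java -jar` is an assumption based on its packaging as a runnable WAR.

```python
import shlex


def build_command(war_path, fcrepo_home):
    """Return the argument list for starting the WAR with fcrepo.home set."""
    return ["java", f"-Dfcrepo.home={fcrepo_home}", "-jar", war_path]


command = build_command(
    "fcrepo-webapp-4.0.0-alpha-3-jetty-console.war",  # one-click download
    "/var/lib/fcrepo4",                                # placeholder directory
)
print(shlex.join(command))
```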

Details of the deployment and configuration of Fedora 4 are described in the wiki [22].

Storage Durability

The fundamental principles of Fedora have always included a commitment to a non-proprietary, transparent, persistence format. Within the Fedora 4 architecture, there are several available approaches to defining the backend persistence store.

The two backend stores that have primarily been used so far in development are the filesystem and LevelDB [23] implementations. In both cases, Fedora 4 persists the binary content in a tree of directories and the resource properties as binary JSON.

Details of the format of the JSON and nested fields are described in the wiki [24].

Versioning

This release introduces the first implementation of the versioning [25] capability within Fedora 4.

Versions can be created for a specific repository resource via the REST API [26], with the option to associate a label with the version.

Additionally, auto-versioning can be enabled by setting a property on a resource that indicates the activation of auto-versioning [27]. This property can either be set at runtime by the repository user, or more globally as a default property defined in the Fedora 4 node-type definitions [28].

Resource versions are returned via the REST API and HTML interfaces.
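A conceptual model of the two modes, explicit and automatic, might look like the following. The `Resource` class, its `auto_version` flag, and the choice to take a snapshot on every update are assumptions of this sketch, not the actual property name or REST calls.

```python
import copy


class Resource:
    def __init__(self, properties=None, auto_version=False):
        self.properties = dict(properties or {})
        self.auto_version = auto_version
        self.versions = []  # list of (label, snapshot of properties)

    def create_version(self, label=None):
        """Explicitly record the current state, optionally with a label."""
        self.versions.append((label, copy.deepcopy(self.properties)))

    def update(self, **changes):
        """Modify properties; snapshot afterwards if auto-versioning is on."""
        self.properties.update(changes)
        if self.auto_version:
            self.create_version()


# Explicit, labeled version followed by an unversioned change.
manual = Resource({"title": "draft"})
manual.create_version(label="v1")
manual.update(title="final")

# With auto-versioning enabled, each update records a snapshot.
automatic = Resource(auto_version=True)
automatic.update(title="first")
automatic.update(title="second")
```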

Performance

It often goes without saying that Fedora 4 must be performant under a range of use cases and scenarios. A specific theme of this release was verifying that this assumption holds and, where it does not, surfacing and addressing the reasons. The following performance-related topics received attention in this release.

Profiling

Profiling was employed as an initial means of inspecting the hotspots within the codebase. In general, it was determined that the greatest sources of slowdown relate to:
 

• Extraneous creation of JCR sessions
• JCR node lookups
• Synchronous internal index updates

Benchmarking

In parallel to the profiling work, significant effort was put towards painting a clear picture of the current performance status of Fedora 4 across a variety of hardware, configurations, and scenarios.

Tests were performed with consistent and documented setups across test servers at the following institutions:
 

• FIZ Karlsruhe
• Stanford University
• University of California, San Diego
• University of North Carolina, Chapel Hill
• University of Wisconsin
• Yale University

Each test is defined by a combination of the following four variables [29]:
 

• Platform Profile - the hardware and networking used to conduct the tests
• Repository Profile - the Fedora-specific configuration options
• Setup Profile - the data loaded into the repository as a baseline before testing
• Workflow Profile - the specific tests performed, what tools were used, and what was measured

Of particular interest are the results [30] of ingest/read/update/delete workflows with a Fedora 4 single-node installation.

Performance Benchmark - Authorization

An additional set of benchmarks was collected to determine the effect of authorization on performance [31].

As expected, there is a performance penalty with authorization enabled; however, these tests tend to indicate the impact to be less than 10% across the ingest/read/update/delete functions.

Performance Benchmark - Fedora 3 vs. 4

Defining the goals for acceptable performance levels for a repository is an ambiguous task. There are many variables that come into play, and generating test cases that simulate production scenarios is not always effective. That said, one concrete measure of performance is the relative behavior of Fedora 4 in comparison to Fedora 3.

Significant work remains in this comparison [32], but some initial numbers are favorable for Fedora 4's ingest capability.

Performance Benchmark - Large files

In terms of performance related to large files, this release tested the limits and performance of:
 

• Ingest and retrieval via the Fedora 4 REST API
• Retrieval via Fedora 4 filesystem projection

In both cases, content as large as 1-TB was successfully tested with documented [33] throughput.

Documentation

As we move closer to a Beta release of Fedora 4, it is vital that there exist developer and administrator documentation for the application. An initial structuring of this documentation can be found on the wiki [34].

The following sections contain user-facing documentation:
 

• Administrator Guide [35]
• Developers Guide [36]
• Feature Tour [37]
• Features [38]
• Glossary [39]

Acknowledgements

This release is due to the commitment of the Fedora sponsors [40] and the effort of the following Fedora community developers:
 

• Benjamin Armintor - Columbia University
• Nigel Banks - Discovery Garden
• Frank Asseg - FIZ Karlsruhe
• Ye Cao - Max Planck Digital Library
• Chris Beer - Stanford University
• Esme Cowles - University of California, San Diego
• Greg Jansen - University of North Carolina, Chapel Hill
• Michael Durbin - University of Virginia
• Adam Soroka - University of Virginia
• Scott Prater - University of Wisconsin
• Osman Din - Yale University
• Eric James - Yale University
 

References

[1] https://github.com/futures/fcrepo4/releases/tag/fcrepo-4.0.0-alpha-3
[2] https://github.com/futures/fcrepo4/releases/download/fcrepo-4.0.0-alpha-3/fcrepo-webapp-4.0.0-alpha-3-jetty-console.war
[3] https://github.com/futures/fcrepo4/releases/download/fcrepo-4.0.0-alpha-3/fcrepo-webapp-4.0.0-alpha-3.war
[4] https://wiki.duraspace.org/display/FF/Access+Roles+Module
[5] https://wiki.duraspace.org/display/FF/Feature+Tour+-+Action+-+Access+Roles
[6] https://wiki.duraspace.org/display/FF/Basic+Role-based+PEP
[7] https://wiki.duraspace.org/display/FF/Design+Guide+-+Policy+Enforcement+Points
[8] https://wiki.duraspace.org/display/FF/Authorization
[9] https://wiki.duraspace.org/display/FF/Batch+Operations
[10] https://wiki.duraspace.org/display/FF/REST+API+-+Batch+Operations
[11] https://docs.jboss.org/author/display/MODE/Compact+Node+Type+(CND)+files
[12] https://wiki.duraspace.org/display/FF/REST+API+-+Node+Types
[13] https://github.com/futures/fcrepo-content-model-examples
[14] https://docs.jboss.org/author/display/MODE/Federation
[15] https://docs.jboss.org/author/display/MODE/File+system+connector
[16] https://wiki.duraspace.org/display/FF/Federation
[17] http://wiki.apache.org/marmotta/LDPath
[18] https://wiki.duraspace.org/display/FF/External+Search
[19] https://wiki.duraspace.org/display/FF/Admin+Search
[20] https://github.com/futures/fcrepo4/releases/download/fcrepo-4.0.0-alpha-3/fcrepo-webapp-4.0.0-alpha-3-jetty-console.war
[21] https://wiki.duraspace.org/display/FF/Feature+Tour
[22] https://wiki.duraspace.org/display/FF/Deploying+Fedora+4
[23] https://code.google.com/p/leveldb/
[24] https://wiki.duraspace.org/display/FF/ModeShape+Artifacts+Layout
[25] https://wiki.duraspace.org/display/FF/Versioning
[26] https://wiki.duraspace.org/display/FF/REST+API+-+Versioning
[27] https://wiki.duraspace.org/display/FF/How+to+set+repository-wide+auto-versioning
[28] https://github.com/futures/fcrepo4/blob/fcrepo-4.0.0-alpha-3/fcrepo-kernel/src/main/resources/fedora-node-types.cnd
[29] https://wiki.duraspace.org/display/FF/Performance+Testing
[30] https://wiki.duraspace.org/display/FF/Single-Node+Test+Results
[31] https://wiki.duraspace.org/display/FF/AuthZ+-+No+AuthZ+Fedora+4+Comparison+Performance+Testing
[32] https://wiki.duraspace.org/display/FF/Single-Node+Test+Results#Single-NodeTestResults-Fedora3/4Comparison
[33] https://wiki.duraspace.org/display/FF/Large+File+Ingest+and+Retrieval
[34] https://wiki.duraspace.org/display/FF/Documentation
[35] https://wiki.duraspace.org/display/FF/Administrator+Guide
[36] https://wiki.duraspace.org/display/FF/Developers+Guide
[37] https://wiki.duraspace.org/display/FF/Feature+Tour
[38] https://wiki.duraspace.org/display/FF/Features
[39] https://wiki.duraspace.org/display/FF/Glossary
[40] https://wiki.duraspace.org/display/FF/Fedora+Sponsors
