Fedora4 Beta Sprint 14: Enhanced Object and Datastream Versioning, Triplestore, Linked Data, Performance ++

Mon, 2014-04-28 12:43 -- carol

Winchester, MA: The Fedora team has concluded the fourteenth sprint within the "Beta Phase" of Fedora4 development. The work of this sprint, like every sprint, is planned and completed thanks to developer time contributed by Fedora stakeholder institutions. If you would like to be involved with Fedora4 development, please send an email to Andrew Woods (awoods@duraspace.org) or to fedora-community@googlegroups.com. If you have comments on the work from this sprint, please also send an email or comment directly on the wiki.

Read the Sprint B14 summary:
https://wiki.duraspace.org/display/FF/Sprint+B14+Summary

About Sprint 14, Beta Phase

Development Team

Kevin S. Clarke - University of California, Los Angeles
Michael Durbin - University of Virginia
Esme Cowles - University of California, San Diego
Longshou Situ - University of California, San Diego
Chris Beer - Stanford University

Sprint Themes

1) Versioning

This sprint enhanced the object and datastream versioning capability [1] in two fundamental ways. First, whereas creating new versions was already supported, this sprint added the complementary ability to roll back, or revert [2], to a previous version.

Second, previous versions can now be deleted [3].
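The two new operations can be sketched as plain HTTP requests. This is a minimal illustration under an assumption: that a named version is addressed as `<resource>/fcr:versions/<label>`, that PATCH on that URL reverts the resource to the version, and that DELETE removes the version. Confirm the exact contract on the RESTful HTTP API - Versioning wiki page [2][3]. The repository URL and the label `v1` are hypothetical, and the requests are only prepared, never sent.

```python
from urllib.parse import quote
from urllib.request import Request

# ASSUMPTION: versions live at <resource>/fcr:versions/<label>;
# PATCH reverts to that version and DELETE removes it (see [2], [3]).


def version_url(resource_url: str, label: str) -> str:
    """Build the (assumed) URL of a named version of a resource."""
    return f"{resource_url.rstrip('/')}/fcr:versions/{quote(label, safe='')}"


def revert_request(resource_url: str, label: str) -> Request:
    """Prepare, but do not send, a request reverting the resource to `label`."""
    return Request(version_url(resource_url, label), method="PATCH")


def delete_version_request(resource_url: str, label: str) -> Request:
    """Prepare, but do not send, a request deleting version `label`."""
    return Request(version_url(resource_url, label), method="DELETE")


if __name__ == "__main__":
    # Hypothetical repository location and version label.
    base = "http://localhost:8080/rest/objects/obj1"
    for req in (revert_request(base, "v1"), delete_version_request(base, "v1")):
        print(req.get_method(), req.full_url)
```

Passing a prepared request to `urllib.request.urlopen` would send it; keeping construction separate makes the URL scheme easy to check before touching a live repository.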

2) Triplestore

Although Fedora 4's internal search index natively supports reindexing on startup, before this sprint the recommended pattern [4] for providing a search experience to repository users offered no way to reindex the external Solr or triplestore indices. This sprint introduced an HTTP endpoint [5] that triggers reindexing of the external indices for:
• the entire repository
• a tree of resources within the repository, or
• a single repository resource
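The three reindexing scopes map naturally onto a start path plus a "recurse or not" flag. The sketch below illustrates that mapping only: the indexer base URL, the `/reindex` path segment and the `recurse` query parameter are hypothetical placeholders, not the endpoint documented on the Indexer Configuration wiki page [5]. Requests are prepared but not sent.

```python
from enum import Enum
from urllib.parse import quote, urlencode
from urllib.request import Request

# HYPOTHETICAL: base URL, "/reindex" segment and "recurse" parameter are
# placeholders for illustration; see the Indexer Configuration page [5].
INDEXER_BASE = "http://localhost:8080/indexer/reindex"


class Scope(Enum):
    REPOSITORY = "repository"  # every resource under the repository root
    TREE = "tree"              # a resource and all of its descendants
    RESOURCE = "resource"      # a single resource only


def reindex_request(scope: Scope, path: str = "/") -> Request:
    """Prepare (but do not send) a POST asking the indexer to reindex."""
    if scope is Scope.REPOSITORY:
        path = "/"  # whole repository: start from the root
    recurse = scope is not Scope.RESOURCE
    query = urlencode({"recurse": str(recurse).lower()})
    url = f"{INDEXER_BASE}/{quote(path.strip('/'))}?{query}"
    return Request(url, method="POST")


if __name__ == "__main__":
    for scope, path in [(Scope.REPOSITORY, "/"),
                        (Scope.TREE, "/collections/books"),
                        (Scope.RESOURCE, "/collections/books/b1")]:
        print(scope.value, reindex_request(scope, path).full_url)
```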

Beyond reindexing, this sprint also demonstrated a configuration [6] in which multiple Fedora 4 repositories all feed events into a single external triplestore.
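One reason several repositories can share a triplestore is that resource URIs include each repository's host, so identical paths in different repositories still yield distinct subjects. The toy model below illustrates that idea with an in-memory set of triples; the `Event` shape and the "replace all triples for the subject" update rule are assumptions made for illustration, not the indexer's actual message format or behaviour (see [6]).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """HYPOTHETICAL event shape: a full resource URI plus its triples."""
    subject: str          # absolute URI, including the repository host
    triples: tuple = ()   # (predicate, object) pairs; empty on delete
    deleted: bool = False


@dataclass
class SharedTriplestore:
    """Toy in-memory stand-in for one external triplestore."""
    triples: set = field(default_factory=set)

    def apply(self, event: Event) -> None:
        # Drop whatever was previously indexed for this subject...
        self.triples = {t for t in self.triples if t[0] != event.subject}
        # ...and, unless the resource was deleted, index its current triples.
        if not event.deleted:
            self.triples |= {(event.subject, p, o) for p, o in event.triples}


if __name__ == "__main__":
    store = SharedTriplestore()
    title = "http://purl.org/dc/elements/1.1/title"
    # Same path, two hypothetical repositories: subjects do not collide.
    store.apply(Event("http://repo-a.example.org/rest/obj1", ((title, "A"),)))
    store.apply(Event("http://repo-b.example.org/rest/obj1", ((title, "B"),)))
    print(sorted(store.triples))
```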

3) Linked data

As part of the ongoing effort to expose repository resources in a standardized, linked-data-friendly manner, Fedora 4 continues to keep in step with the maturing Linked Data Platform (LDP) draft specification [7]. This sprint added support for the HTTP request headers that let clients state a preference for how comprehensive the set of triples in a response should be. Likewise, HTTP response headers were added that convey paging [8] information and the relationships between parent and child resources in the manner LDP specifies.
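On the client side, both halves can be sketched with the standard library. LDP expresses triple preferences through the HTTP `Prefer` request header, and paging links travel in `Link` response headers. The preference URI `ldp#PreferContainment` follows the LDP vocabulary, but exactly which preferences Fedora 4 honoured at this point is an assumption; the container URL and the sample `Link` header are illustrative. The Link parser is deliberately simplified and does not handle commas or semicolons inside quoted parameter values.

```python
import re
from urllib.request import Request

LDP = "http://www.w3.org/ns/ldp#"


def container_request(url: str, omit_containment: bool) -> Request:
    """Prepare a GET whose Prefer header omits or includes containment triples."""
    action = "omit" if omit_containment else "include"
    prefer = f'return=representation; {action}="{LDP}PreferContainment"'
    return Request(url, headers={"Prefer": prefer}, method="GET")


# <target>; param; param, <target>; param ...   (simplified Link syntax)
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def parse_links(header: str) -> dict:
    """Map each rel value in a Link header to its target URI."""
    links = {}
    for target, params in _LINK.findall(header):
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.lower() == "rel":
                # A rel value may list several space-separated relation types.
                for rel in value.strip('"').split():
                    links[rel] = target
    return links


if __name__ == "__main__":
    req = container_request("http://localhost:8080/rest/c", omit_containment=True)
    print(req.get_header("Prefer"))
    sample = '<http://ex.org/c?page=2>; rel="next", <http://www.w3.org/ns/ldp#Page>; rel="type"'
    print(parse_links(sample).get("next"))
```

A paging client would follow the `next` target until a response no longer carries one.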

4) Performance

The performance tests this sprint focused on whether object creation speed changes as the number of child resources (objects or datastreams) under a single parent resource grows. The tests were run against multiple backend storage configurations to also assess what role, if any, the storage backend plays in performance trends.

Thirty thousand objects (first with 1 KB datastreams, then with 2 MB datastreams) were created at the top level of the repository, with each creation timed individually, using each of the following backends:
• LevelDB
• RAM
• File

Although the 2 MB tests indicated a slight increase in per-object creation time, the trend was not conclusive. The tests will be repeated with a greater number of objects.
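One simple way to ask whether per-object creation time drifts upward as siblings accumulate is to time each creation and fit a least-squares line to duration versus object index. This is a swapped-in analysis for illustration, not necessarily the method behind the wiki results [9]; the `create` callable is a hypothetical stand-in for whatever request creates object number `i`.

```python
import time
from typing import Callable, List


def time_creations(create: Callable[[int], None], count: int) -> List[float]:
    """Call create(i) `count` times and record each call's wall-clock duration."""
    durations = []
    for i in range(count):
        start = time.perf_counter()
        create(i)
        durations.append(time.perf_counter() - start)
    return durations


def trend_slope(durations: List[float]) -> float:
    """Ordinary least-squares slope of duration against object index.

    A positive slope means later objects (those created when the parent
    already had more children) took longer to create on average.
    """
    n = len(durations)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean_x = (n - 1) / 2
    mean_y = sum(durations) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(durations))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


if __name__ == "__main__":
    # Hypothetical no-op stand-in; a real harness would create an object here.
    durations = time_creations(lambda i: None, 1000)
    print(f"slope: {trend_slope(durations):.3e} s/object")
```

A slope that is small relative to the run-to-run noise is consistent with the "not conclusive" reading above, which is one reason larger runs help.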

The test results can be found on the wiki [9].

5) Test Coverage

Unit and integration test coverage [10] is a vital factor in maintaining a healthy code base. The following are the code coverage statistics at the end of this sprint, along with the change from the previous sprint:
• Unit tests: 73.1% (up 0.9%)
• Integration tests: 71.7% (up 0.2%)
• Overall coverage: 86.0% (up 0.7%)
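Overall coverage (86.0%) is higher than either the unit (73.1%) or the integration (71.7%) figure. Presumably, as with Sonar's combined overall-coverage metric, a line counts as covered when either suite executes it, so the overall figure reflects the union of the two sets of covered lines. The toy calculation uses made-up line numbers, not Fedora's data, to show how that union can exceed both individual percentages.

```python
def coverage(covered: set, total_lines: int) -> float:
    """Percentage of the code base's lines executed by at least one test."""
    return 100.0 * len(covered) / total_lines


# Toy code base of 10 lines (hypothetical, not Fedora's numbers).
TOTAL = 10
unit = {1, 2, 3, 4, 5, 6, 7}       # lines hit by unit tests
integration = {4, 5, 6, 7, 8, 9}   # lines hit by integration tests

print(coverage(unit, TOTAL))                # 70.0
print(coverage(integration, TOTAL))         # 60.0
print(coverage(unit | integration, TOTAL))  # 90.0
```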

6) Housekeeping

This sprint addressed several feature enhancements and bugs. Enhancements, bug fixes, and application polishing included:

• Added resource locking [11] for concurrent requests on the same resource(s)
** Locking is available at both the single-node and hierarchy-tree levels
• Enhanced pluggability of external/internal identifier mappings
• Created a utility [12] for uploading a sample dataset to a running Fedora 4 repository
• Updated jms-indexer-pluggable integration tests to deploy the Fedora webapp in addition to the triplestore and indexer webapps
• Enhanced the REST API to return a timestamp when creating objects and datastreams
• Improved HTTP caching headers
• Corrected multiple HTTP response codes
• Fixed authorization bug which prevented users with both reader and writer roles from reading a resource
• Fixed a benchtool bug that prevented the number of HTTP client threads specified by the "num threads" option from being created
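The difference between the two locking granularities can be illustrated with a toy lock table over slash-separated resource paths. This is a sketch under assumed semantics (non-blocking `try_lock`, and a node lock on a parent not conflicting with a lock on one of its children); Fedora's actual locking behaviour is described on the Locking wiki page [11], and `PathLockManager` is a hypothetical name.

```python
import threading


class PathLockManager:
    """Toy lock table: a "node" lock covers one resource, a "tree" lock
    covers a resource and all of its descendants (assumed semantics)."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the table below
        self._held = {}                 # path -> "node" or "tree"

    @staticmethod
    def _is_under(path: str, root: str) -> bool:
        return path == root or path.startswith(root.rstrip("/") + "/")

    def _conflicts(self, path: str, deep: bool) -> bool:
        for held, kind in self._held.items():
            if kind == "tree" and self._is_under(path, held):
                return True  # an existing tree lock covers the requested path
            if held == path:
                return True  # the same resource is already locked
            if deep and self._is_under(held, path):
                return True  # a new tree lock would cover a locked descendant
        return False

    def try_lock(self, path: str, deep: bool = False) -> bool:
        """Acquire a node (deep=False) or tree (deep=True) lock if free."""
        with self._guard:
            if self._conflicts(path, deep):
                return False
            self._held[path] = "tree" if deep else "node"
            return True

    def unlock(self, path: str) -> None:
        with self._guard:
            del self._held[path]


if __name__ == "__main__":
    locks = PathLockManager()
    print(locks.try_lock("/books", deep=True))  # True
    print(locks.try_lock("/books/b1"))          # False: inside a locked tree
    print(locks.try_lock("/music/m1"))          # True: unrelated subtree
```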

7) Developer Capacity

For the short- and long-term health of Fedora development, the committer base must continue to expand. This sprint saw the addition of two new developers, one from the University of California, Los Angeles and one from the University of California, San Diego.

References

[1] https://wiki.duraspace.org/display/FF/Versioning
[2] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Versioning#RESTfulHTTPAPI-Versioning-revert
[3] https://wiki.duraspace.org/display/FF/RESTful+HTTP+API+-+Versioning#RESTfulHTTPAPI-Versioning-delete
[4] https://wiki.duraspace.org/display/FF/External+Triplestore
[5] https://wiki.duraspace.org/display/FF/Indexer+Configuration#IndexerConfiguration-reindex
[6] https://wiki.duraspace.org/display/FF/Indexer+Configuration#IndexerConfiguration-multi
[7] http://www.w3.org/TR/2014/WD-ldp-20140311/
[8] https://dvcs.w3.org/hg/ldpwg/raw-file/default/ldp-paging.html
[9] https://wiki.duraspace.org/display/FF/Flat+Hierarchies+Testing
[10] http://sonar.fcrepo.org/dashboard/index/1
[11] https://wiki.duraspace.org/display/FF/Locking
[12] https://github.com/futures/fcrepo-sample-dataset
