A View From CNI Breakout Sessions: SHARE, Linked Data for Libraries, Fedora 4 and VIVO

Thu, 2014-12-18 09:40 -- carol

Winchester, MA  The recent CNI Fall Member Meeting, held in Washington, DC, featured a wide variety of topics in multiple breakout sessions of interest to practitioners in the fields of digital information technology and scholarly communication. The following are just a few session highlights.

SHARE (SHared Access to Research Ecosystem), “Update on SHARE Developments”

About 60 people came to learn more about SHARE (SHared Access to Research Ecosystem) Project progress during the “Update on SHARE Developments” breakout session on Dec. 8.

Project Director Tyler Walters began the project presentation with an overview of the emerging community, history, mission and objectives for how the project will share research widely and openly. Developing infrastructure, policies and workflow are the three cornerstones of the mission that are coming together to support US federal agencies, higher education institutions and international organizations in a collaborative effort to understand and share publishable “events” around publicly funded research.

Eric Celeste, SHARE Technical Director, pointed out that collecting community use cases is a significant activity that will guide development and deployment of the SHARE notification service. Documented examples or stories about information needs in a particular setting, or scenarios about how technology will be applied within an organization, are valuable guideposts for developers. How are institutions going to use SHARE outputs in workflows? How will this new kind of information impact the tactical ability to meet or exceed institutional strategic interests?

Collaboration among institutions and partners is key to the success of the project. The overall SHARE effort represents a shift from featuring institutional collection products to sharing components of research in a collaborative environment.

The SHARE notification system aims to expose research events that explain “who is producing what and under what auspices.” SHARE output can include data, articles, slides, administrative output, data management plans, requests, responses, archived data and more. All of this information can be characterized as a research-related “event” as content is added to an open repository.

Potential consumers of the SHARE notification service metadata “pipeline” of rapidly updating streams of research events include repositories, sponsored research offices, funders and others.

Jeffrey Spies, Co-Founder of the Center for Open Science, outlined the simple SHARE notification service technical architecture. The service will harvest metadata about research events from many sources in a variety of formats. The service will then normalize the metadata and relay it out to stakeholders or constituents.
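The harvest, normalize and relay steps described above can be pictured with a minimal Python sketch. It is an illustration only and not the SHARE implementation: the source names (repo_a, repo_b), the raw field names, the ResearchEvent fields and the subscriber callbacks are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResearchEvent:
    """One normalized research event. Field names are illustrative, not SHARE's schema."""
    source: str
    title: str
    contributors: tuple


def normalize_repo_a(raw):
    # Hypothetical source A already supplies a list of author names.
    return ResearchEvent("repo_a", raw["title"].strip(), tuple(raw["authors"]))


def normalize_repo_b(raw):
    # Hypothetical source B packs creators into one semicolon-separated string.
    names = tuple(name.strip() for name in raw["creator"].split(";"))
    return ResearchEvent("repo_b", raw["dc_title"].strip(), names)


# One normalizer per harvested format (assumed mapping).
NORMALIZERS = {"repo_a": normalize_repo_a, "repo_b": normalize_repo_b}


def harvest_and_relay(harvested, subscribers):
    """Normalize raw records from each source and pass every event to each subscriber."""
    events = []
    for source, records in harvested.items():
        normalize = NORMALIZERS[source]
        for raw in records:
            event = normalize(raw)
            events.append(event)
            for notify in subscribers:
                notify(event)
    return events
```

In this sketch a subscriber is simply any callable, so a repository or a sponsored research office could each register its own handler for incoming events.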

With the first release scheduled for spring of 2015, the project is looking for beta subscribers. The aim is to attach a CC license to all SHARE data to allow for extensive reuse and redistribution. More harvestable sources of research event notifications are needed to ensure that the technology scales to accommodate many frequently updated research events in the future.

Future plans include completion and deployment of the SHARE notification system, development of a SHARE registry to offer additional information about published works, and a SHARE discovery service that will aid in locating and identifying information about cross-institutional SHARE resources.

Linked Data for Libraries, “Linked Data for Libraries (LD4L): A Progress Report”

Cornell University Library, Harvard University and Stanford University share an interest and a desire to add value to collections and resources by being able to leverage the intellectual value that librarians and other domain experts and scholars add to information resources when they describe, annotate, organize, select and use those resources. This presentation highlighted progress towards establishing an ontology, architecture, and tools that will operate across all three institutions in an extensible network.

It is no surprise to find out that each of these libraries houses valuable resources that are inaccessible through the library catalog. Library sources of information such as databases contain potentially valuable discoverable content. RDF, structured information in a simple format that is open for broad consumption and reuse, can unlock value for digital resources. Using linked data as a standard means that it can be used as a service across the world and across all sorts of boundaries.

Dean Krafft, Chief Technology Strategist, Cornell University Library, suggested that we need to go from strings to things and have URIs for everything. “The more we can link the more we can discover, expose our own unique entities, and connect them to the rest of the world,” he said.
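The move from strings to things can be illustrated with a small Python sketch. It is a hedged illustration and not LD4L code: the records, the example.org URIs and the name-to-URI lookup table are invented for the example, and the plain predicate labels stand in for the URIs that real RDF would use for predicates as well.

```python
# Catalog records that identify an author only by a text string.
string_records = [
    {"title": "Book One", "author": "Smith, Jane"},
    {"title": "Book Two", "author": "Jane Smith"},
]

# Hypothetical lookup that resolves both name strings to one URI
# (example.org is a placeholder domain).
PERSON_URIS = {
    "Smith, Jane": "http://example.org/person/1",
    "Jane Smith": "http://example.org/person/1",
}


def to_triples(records):
    """Turn string-based records into (subject, predicate, object) triples."""
    triples = []
    for number, record in enumerate(records, start=1):
        work = f"http://example.org/work/{number}"
        triples.append((work, "title", record["title"]))
        # The author becomes a thing (a URI), not a string.
        triples.append((work, "creator", PERSON_URIS[record["author"]]))
    return triples


def works_by(triples, person_uri):
    """Find every work linked to the given person URI."""
    return [s for s, p, o in triples if p == "creator" and o == person_uri]
```

Because both spellings of the name resolve to the same URI, a query on that URI finds both works, which a plain string match on either spelling would not.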

Tom Cramer of Stanford University Libraries explained the six clusters of LD4L use cases:

  • Bibliographic and curation data

  • Bibliographic and person data

  • Leveraging external data including authorities

  • Leveraging the deeper graph (via queries or patterns)

  • Leveraging usage data (like a referral service, à la the Netflix recommender service)

  • Three-site services, like cross-site search

LD4L engineering work includes using Hydra, a robust repository fronted by feature-rich tailored applications and workflows known as “heads,” on top of a triple store. The project may be unique in that it is an ambitious undertaking to coordinate across three institutions. Project output will include a rich set of shared resources along with a great team to back it up.

Fedora 4, “Fedora 4 Early Adopters”

More than 60 people turned out to learn more about how new Fedora 4 functions are being used at three institutions. David Wilcox, Fedora Product Manager, opened the session by acknowledging the 34 developers from all over the world who contributed to the recent successful release of Fedora 4. Wilcox noted that Fedora 4 is a native citizen of the semantic web with built-in linked data capabilities.

Declan Fleming, Chief Technology Strategist and Director of IT at the University of California, San Diego, is interested in Fedora 4’s ability to manage large and multiple files and its native RDF capabilities. The UCSD beta pilot focused on mapping their existing metadata to an RDF-based schema and ingesting objects from their DAMS into Fedora 4 using a variety of storage options. Fleming is eager to continue refining metadata mapping as part of the broader Hydra community push towards interoperability and pluggable data models.

Prior to implementing a Fedora 4 repository, the Art Institute of Chicago was looking for an architecture of well-defined services that could communicate with each other. Stefano Cossu, Director of Applications Development and Collections, found what he was looking for in Fedora 4 as an institution-wide single source for shared data. Cossu has found that content modeling is very important for adding and removing functionality via mix-ins and would like to see extended content control in future releases.

Tom Cramer, Chief Technology Strategist and Associate Director of Digital Library Systems and Services at Stanford University Libraries, displayed Parker Library on the Web, a shared digital manuscript collection curated by Stanford and Cambridge libraries. Annotations, the references and notes that scholars and researchers add to resources over time, enrich the original resource. Stanford was looking for analytic and annotation tools that would fit into the research workflow. Cramer is enthusiastic about new possibilities for a linked data platform that can enhance discovery, connection and added value around complex digital resources in Fedora 4.

VIVO Project, “The Evolution of VIVO Community Source Software”

Many attendees were on hand to learn more about VIVO, the open source, semantic web-based application that enables the discovery of researchers, research data and scholarly output. VIVO began in 2003 as an NIH-funded project primarily based at Cornell University. The first public release in 2004 quickly demonstrated VIVO’s ability to facilitate communication and collaboration, measure impact, collect data, determine performance and operate across a wide variety of disciplines at multiple institutions.

Layne Johnson, VIVO Project Director, reviewed examples of VIVO in use at several institutions and explained how the USDA is using VIVO to get a better picture of who’s doing what and where funding is coming from. Other implementations include “Find an Expert” at the University of Melbourne; Scholars at Duke, which makes faculty members discoverable by name, by keyword or subject area, or by anything on their profile; the Mountain West Research Consortium; the Deep Carbon Observatory; EarthCube; and more.

Johnson reviewed community involvement in the VIVO Project. Active working groups are focused on development and implementation, applications and tools, ontologies, and community engagement. In the coming months VIVO strategic planning gets underway with leadership from a 14-member community strategy group. Earlier in the year the group reviewed results of a recent community survey to determine key strategic themes that include community, technology and sustainability. The VIVO project has evolved from being grant-supported to being supported through community membership, stewarded by the DuraSpace organization and tied to project governance. Next steps will include gathering community input to achieve deeper levels of engagement to advance the project and achieve long-term sustainability.

 