The Fall 2014 VIVO Hackathon Report

Mon, 2014-10-20 11:24 -- carol

Winchester, MA - The Fall 2014 VIVO Hackathon took place at Cornell University’s Mann Library in Ithaca, New York, from Oct. 13-15. More than 30 enthusiastic developers from around the US and Canada were in attendance. Fall colors, gorge hikes, trips to local eateries, and opportunities for making friends around shared interests in leveraging the VIVO interdisciplinary semantic web framework all contributed to a lively and successful event.

On Wednesday morning, project teams participated in a closing session where accomplishments were shared with the group. “Show and tell” reports included slides and pointers to code contributions, issues being tracked, and new documentation. Slides and notes are available at http://goo.gl/uWZU5c, including pointers to GitHub for projects that produced code.

Here are a few highlights.

A data visualization hack team set out to identify and gather external authoritative data to help VIVO institutions understand their sources of grant funding. The aim was to populate a “bubble map” visualization that would highlight the location and level of support from organizations providing grant funding to universities using VIVO. It was no surprise that in the initial visualization of the map of the US many large bubbles appeared over Washington, DC, leading the team to look for more specific address information than can be found in existing data sources such as FundRef. Using federal tax ID numbers, the team linked additional ZIP+4 and classification information to the FundRef data so that a registry accessible as linked data will make it easier for VIVO institutions to “follow the (grant) money”.
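
The linking step described above amounts to a join on a shared key. The following is a minimal sketch of that idea, not the hack team's code: the `Funder` and `AddressRecord` types, their field names, and all sample values are hypothetical, and the only assumption taken from the report is that both data sets carry a federal tax ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Funder:
    """Hypothetical stand-in for a funder record with only a coarse location."""
    name: str
    tax_id: str
    city: str


@dataclass
class AddressRecord:
    """Hypothetical stand-in for an external record keyed by federal tax ID."""
    tax_id: str
    zip_plus4: str
    classification: str


def normalize_tax_id(tax_id: str) -> str:
    """Keep digits only so '12-3456789' and '123456789' compare equal."""
    return "".join(ch for ch in tax_id if ch.isdigit())


def enrich_funders(
    funders: List[Funder], addresses: List[AddressRecord]
) -> List[Dict[str, Optional[str]]]:
    """Attach ZIP+4 and classification to each funder whose tax ID matches."""
    by_tax_id = {normalize_tax_id(a.tax_id): a for a in addresses}
    enriched = []
    for funder in funders:
        match = by_tax_id.get(normalize_tax_id(funder.tax_id))
        enriched.append(
            {
                "name": funder.name,
                "city": funder.city,
                # Funders with no match keep None so gaps stay visible.
                "zip_plus4": match.zip_plus4 if match else None,
                "classification": match.classification if match else None,
            }
        )
    return enriched
```

With finer-grained ZIP+4 values attached, funders that all resolve to the same city can be plotted at distinct points rather than stacking into one large bubble.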

The ontology team tackled the challenges of sharing local ontology extensions to benefit other institutions and evolve the VIVO ontology in concert. The team identified key requirements for a central ontology registry where VIVO installations could review available extensions and/or submit new proposed classes, properties, or terms for review and possible adoption in subsequent VIVO-ISF ontology releases. The same platform could potentially also manage a registry of URIs for people, educational and funding organizations, journals, events, and vocabulary terms as persistent identifiers. By associating ORCID and/or VIAF records with registry entries wherever available, VIVO URIs from the registry could be linked to and/or added to Library of Congress and/or OCLC authority records used in library catalogs all over the world.

The principles of open data and linked data are central to the VIVO framework. Where and how to find the right authoritative data was a theme running through several hacks. Where do data come from? Where are data first generated? One team was concerned with how to include links to other, related things when publishing data on the web. It proposed a simple version of a “VIVO requester” service that could use the vivoweb.org registry proposed above to determine who already has a persistent URI for a funding organization, or which vocabulary of equipment or research resource types is right for a particular purpose.

Anyone who has worked in a wiki knows that it can become unmanageable over time with many people contributing entries without a clear structure or editorial review. Making the VIVO wiki less wacky was the subject of a hack that sought to address a commonly perceived lack of documentation by improving the wiki's structure, tagging existing content, and making it clearer to potential contributors where new content belongs. The DSpace wiki space demonstrates one approach to wiki structure that separates technical documentation varying with each release from introductory and explanatory documentation. Highlighting fewer topical categories, renaming files for consistency, and moving technical documents into space-specific versions were among improvements made to the VIVO wiki.

Faceted browsing to support dynamic filters on browse pages or search results requires adding facets to existing page templates. The faceted browse hack added a faceted search of people by research area, and a start was made on building a more general capability that can be configured on a per-site basis without having to modify code.
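
The core of faceted browsing can be illustrated in a few lines. This is only a sketch of the general technique, not the hack's implementation: the record layout and the `research_areas` field name are hypothetical, and the facet field is passed as a parameter to mirror the goal of configuring facets per site without code changes.

```python
from collections import Counter
from typing import Dict, List


def facet_counts(people: List[dict], facet_field: str) -> Dict[str, int]:
    """Count how many people carry each value of the chosen facet field."""
    counts: Counter = Counter()
    for person in people:
        # set() ensures a person is counted once per value, even if repeated.
        for value in set(person.get(facet_field, [])):
            counts[value] += 1
    return dict(counts)


def filter_by_facet(people: List[dict], facet_field: str, value: str) -> List[dict]:
    """Return only the people whose facet field contains the selected value."""
    return [p for p in people if value in p.get(facet_field, [])]
```

The counts drive the filter links shown beside a browse page or result list, and selecting a link applies the corresponding filter.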

Embedding Schema.org tags for SEO (search engine optimization) in VIVO requires only modest changes to page templates and will be an out-of-the-box capability in the upcoming release of VIVO v1.8. Google uses Schema.org tags embedded in a page's HTML to recognize the content and structure of the page when building search results. This hack adds these tags to VIVO’s display pages, which should make VIVO data more visible in Google and other search engines and give users additional descriptive information in search results.
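
To show what such embedded tags look like, here is a minimal sketch that renders a person as HTML with Schema.org microdata. It is an illustration only and does not reproduce VIVO's templates: the `person_microdata` function and the sample values are hypothetical, while `Person`, `Organization`, `name`, `jobTitle`, and `affiliation` are terms from the Schema.org vocabulary.

```python
from html import escape


def person_microdata(name: str, job_title: str, organization: str) -> str:
    """Render a person as an HTML fragment annotated with Schema.org microdata."""
    return (
        '<div itemscope itemtype="https://schema.org/Person">'
        f'<span itemprop="name">{escape(name)}</span>, '
        f'<span itemprop="jobTitle">{escape(job_title)}</span>, '
        # The affiliation is a nested item with its own type.
        '<span itemprop="affiliation" itemscope '
        'itemtype="https://schema.org/Organization">'
        f'<span itemprop="name">{escape(organization)}</span>'
        "</span></div>"
    )


if __name__ == "__main__":
    print(person_microdata("Jane Doe", "Professor", "Example University"))
```

The visible text is unchanged for human readers; the `itemscope`, `itemtype`, and `itemprop` attributes are what a crawler reads to identify the page as describing a person and their affiliation.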

Max Hu from Memorial University in Newfoundland demonstrated a Drupal CMS + VIVO data site: YAFFLE (http://www.yaffle.ca), where data developed and stored in VIVO is presented to users through Drupal pages providing additional entry points and help support.

Josh Hanna from the University of Florida worked on testing VIVO with the Stardog enterprise graph database (http://stardog.com) to see whether a commercial triple store could provide better performance with increasing amounts of data. Leveraging the VIVO community to negotiate academic Stardog licenses was discussed. Weill Cornell is also testing VIVO on other triple store platforms.

Patrick West reviewed the Deep Carbon Observatory Data Portal for the Deep Carbon Observatory (https://deepcarbon.net), a project led by RPI and funded by the Sloan Foundation. A Drupal front end with VIVO integration provides a variety of visualizations; persistent URI handles for every dataset link users back to VIVO as the primary metadata store. This functionality required a code modification in VIVO core to return the persistent data handles from the data repository at the time of dataset upload.

Northwestern University has offered to host the next VIVO Hackathon in 2015. More information to come.
