Stargazing: Relating Information and Applications to the Data Workflow

Whether you are a professional or amateur astronomer, stargazing is a time-honored way to reflect on human endeavor against the vastness of space. The National Virtual Observatory (NVO) is a community-based effort that has produced an interoperable framework of data and services. It gives people all over the world easy access to data from many different projects at all wavelengths of the electromagnetic spectrum, from radio to gamma rays. If an astronomer in Europe analyzes a portion of the sky, an astronomer elsewhere can compare that analysis with other data sets covering the same region.
The Digital Research and Curation Center (DRCC) at the Sheridan Libraries of Johns Hopkins University is building Fedora-based infrastructure to curate astronomical data associated with publications. Currently, the high-level, refined data generated by analyzing data releases from projects such as the Sloan Digital Sky Survey (SDSS) are scattered across an ad hoc network of personal websites, hard drives and servers. Worse, many of these high-level data are simply being lost. Building the infrastructure on Fedora will make it possible to integrate data acquisition and curation into existing publishing workflows, so that scientists can submit their data at the same time they submit their papers to publishers. In this way, Fedora makes it possible to relate information and applications to the data workflow.
While the NVO work represents the initial set of development activities, the DRCC is now in negotiations to archive the entire set of SDSS data. SDSS, part of the NVO, began over a decade ago with the visionary goal of providing detailed optical images covering more than a quarter of the night sky and a three-dimensional map of about a million galaxies and quasars. Sayeed Choudhury, Associate Dean of University Libraries and Hodson Director of the DRCC, reflected on the lessons learned from SDSS and NVO. The complete SDSS data set and its associated metadata comprise about 100 terabytes. At this scale and complexity, new infrastructure will be necessary as scientists look to libraries for data curation support. Winston Tabb, Sheridan Dean of University Libraries at Johns Hopkins, recently remarked, “Data centers are the new library stacks.” Choudhury added, “Without data curation infrastructure, we are losing the fundamental primary research materials that support scholarship.”