Subject Repositories: A New arXiv Collaborative Business Model

Ithaca, NY: Subject repositories house and make accessible a large quantity of specialized information and research that drives innovation in all areas of human endeavor around the world. What are the differences between subject repositories focused on a particular discipline and institutional repositories that house assets from multiple subject areas? As deposits and data in subject repositories grow to contain ever larger tracts of specialized knowledge, the question of how to sustain and preserve the cultural and scientific value contained in these digital repositories is becoming more significant.
Simeon Warner, who since 2006 has managed arXiv, the largest subject repository in the world with over 580,000 articles, says, “Subject repositories such as arXiv are developed to meet the needs of a disciplinary community. As such, they often operate in somewhat idiosyncratic ways in order to serve the particular community’s needs. The technical architectures need not differ from institutional repository architectures, but in most cases they operate at a rather larger scale.”
Warner suggests that the architecture of subject repositories should be split: background operations designed using open standards and protocols, with user interfaces built to meet each subject community’s needs. This strategy ensures that subject repositories can share resources with institutional repositories and other subject repositories, while effectively serving discipline-specific audiences.
Examples of other large subject repositories include:
• SSRN, the Social Science Research Network, is devoted to the rapid worldwide dissemination of social science research and is composed of a number of specialized research networks in each of the social sciences.
• RePEc, Research Papers in Economics, is a collaborative effort of volunteers in 70 countries to enhance the dissemination of research in economics through a decentralized database of working papers, journal articles and software components.
• Economists Online combines content with RePEc archives to provide a new information service for economists.
• PMC, PubMed Central, is the U.S. National Institutes of Health (NIH) free digital archive of biomedical and life sciences journal literature.
• CiteSeer is a scientific literature digital library and search engine that focuses primarily on the literature in computer and information science, harvesting articles from academics’ homepages and other sources.
• ADS, the NASA Astrophysics Data System, has nearly complete information on publications in astronomy and astrophysics, with three bibliographic databases containing more than 8.1 million records.
• SPIRES, the High Energy Physics Literature Database, currently includes bibliographic information in some areas of physics, especially high-energy physics (HEP) and astrophysics and astronomy (ASTRO).
• PhilPapers, Online Research in Philosophy, is a repository of articles and bibliographic records in philosophy.
Cornell University Library Model for Sustaining arXiv
In a move to expand support for sustaining arXiv, an e-print service in the fields of physics, mathematics, non-linear science, computer science, quantitative biology and statistics, Cornell University Library is broadening the funding base for the online scientific repository. Nearly 600,000 e-prints (research articles published online in physics, mathematics, statistics, computer science and related disciplines) now reside in arXiv, which is an open information source for hundreds of thousands of scientific researchers.
arXiv will remain free for readers and submitters, but the Library has established a voluntary, collaborative business model to engage institutions that benefit most from arXiv.
“Keeping an open-access resource like arXiv sustainable means not only covering its costs, but also continuing to enhance its value, and that kind of financial commitment is beyond a single institution’s resources,” said Oya Rieger, Associate University Librarian for Information Technologies. “If a case can be made for any repository being community-supported, arXiv has to be at the top of the list.”
The 200 institutions that use arXiv most heavily account for more than 75 percent of institutional downloads. Cornell is asking these institutions for financial support in the form of annual contributions, and most of the top 25 have already committed to helping arXiv.
Institutions that have pledged support include:
• California Institute of Technology
• University of California, Berkeley
• University of Cambridge (UK)
• CERN – European Organization for Nuclear Research (Switzerland)
• CNRS – Centre National de la Recherche Scientifique (France)
• Columbia University
• DESY – Deutsches Elektronen-Synchrotron (Germany)
• Durham University (UK)
• ETH Zurich – Eidgenössische Technische Hochschule Zürich (Switzerland)
• Fermilab
• Harvard University
• University of Illinois at Urbana-Champaign
• Imperial College London (UK)
• Los Alamos National Laboratory
• Massachusetts Institute of Technology
• Max Planck Society (Germany)
• University of Michigan
• University of Oxford (UK)
• University of Pennsylvania
• Princeton University
• SLAC National Accelerator Laboratory
• Texas A&M University
“We are delighted that so many others have already stepped forward to share the cost of arXiv, and that even more are considering it,” said Anne R. Kenney, Carl A. Kroch University Librarian at Cornell. “It is heartening to see other institutions show their commitment to sustaining this eminent resource, which is used by scientists around the world.”
“arXiv is a vital resource for scholarly communication on a global scale for researchers and students across numerous disciplines. It is essential that the institutions whose users contribute to the database and consume its content provide an appropriate level of financial support,” said James G. Neal, Vice President for Information Services and University Librarian at Columbia University.
The proposed funding model is viewed as a short-term strategy, and the Library is actively seeking input on a long-term solution. Currently, Cornell University Library supports the operating costs of arXiv, which are comparable to the university’s collection budget for physics and astronomy. As one of the most influential innovations in scholarly communications since the advent of the Internet, arXiv’s original dissemination model represented the first significant means to provide expedited access to scientific research well ahead of formal publication.
Researchers upload their own articles to arXiv, and the articles are usually made available to the public the next day. arXiv, founded by physics professor Paul Ginsparg, has about 400,000 users and serves more than 2.5 million article downloads per month. Its 101,000 registered submitters live in nearly 200 countries.
arXiv is interconnected with many other scholarly information resources. These include the INSPIRE system being developed by supporting high-energy physics laboratories CERN, DESY, Fermilab and SLAC, as well as the Astrophysics Data System at Harvard University, another supporting institution.
Details about the operating principles of the new structure are available in the arXiv FAQ. For questions about supporting arXiv, contact consortia representatives or the arXiv office at Cornell University Library.
About Cornell University Library
Cornell University is an Ivy League institution and New York’s land-grant university. Among the top ten academic research libraries in the country, Cornell University Library reflects the university’s distinctive mix of eminent scholarship and democratic ideals. The Library offers cutting-edge programs and facilities, a full spectrum of services, extensive collections that represent the depth and breadth of the university, and a deep network of digital resources. Its impact reaches beyond campus boundaries with initiatives that extend the land-grant mission to a global focus. To learn more, visit the Cornell University Library website.