Towards a Knowledge Lifecycle: Populating Repositories “Upstream”
Submitted on Tue, 2008-12-16 17:16
By Sally Rumsey, ORA Services & Development Manager, Oxford University Library Services
It is no understatement to say that populating institutional repositories for research materials is proving something of a challenge. Some disciplines have their own popular subject-targeted repositories such as arXiv (http://arxiv.org/) or SSRN (Social Science Research Network) (http://www.ssrn.com/). Their attraction lies in the authors’ strong allegiance to their discipline. Repositories such as PubMed Central (http://www.pubmedcentral.nih.gov/) receive content as a result of funding agency mandates. One cannot expect a second deposit by those authors. How can institutional repositories (a term that should be avoided when speaking to non-librarians) become filled with content when competing against these topic-specific repositories (although one can argue that competition shouldn’t be an issue in the open access domain) and battling against other barriers to deposit?
It is a reasonable assumption that institutions want to retain copies of their own research assets for a number of purposes. Many potential users have no objection to the concept of an institutional repository. The problem lies in translating that latent interest into active contribution to populating a resource. It is common for repository managers to find that they can ‘lead a horse to water but can’t make it drink.’ Even with clear guidelines and easy-to-use online facilities, potential contributors have not yet incorporated regular repository deposit into their scholarly workflows.
Dorothea Salo raises many of the dangers facing repositories in her paper ‘Innkeeper at the Roach Hotel’ (http://minds.wisconsin.edu/handle/1793/22088). She describes the realities and challenges faced by many of those who are involved with repositories, challenges that could be the downfall of repositories. She throws down the gauntlet to repository software developers to adopt user-centred design as a way to further interest scholars in creating repository facilities that make sense to them. Repositories need to fit with the environment in which they operate in order to earn their accepted place in the academic toolkit.
What makes it attractive?
To encourage deposit in a repository, users, both active and potential, need to be aware of its benefits and to value them. Such benefits vary according to the audience: a contract researcher will value services that a senior manager will not. To address this, a repository needs to have a clearly stated purpose and provide features, functions and services to attract a variety of users. Value-added services such as automated creation of publication lists, links between related items, export of data in a specific format, usage statistics and reporting are all examples of useful scholarly functionality. Clifford Lynch’s definition of a repository as a ‘set of services’ rings more true today than ever (http://muse.jhu.edu/journals/portal_libraries_and_the_academy/v003/3.2lynch.html).
Barriers to obtaining material
The barriers to deposit have been extensively debated, but they still remain. The two main barriers in my experience are time (deposit is an additional task for busy academics) and the complexity of copyright. Copyright confusion can be deferred by initially encouraging deposit of materials other than journal articles (policy permitting), such as conference papers, where this problem is not so acute, in order to encourage interest in and use of the repository. Other difficulties include existing use of a subject or other repository, version confusion, perception of quality control and the difficulty of changing established personal practice. In order to encourage deposit, institutions, repository managers and developers need to tackle these obstacles, find practical and workable solutions and, wherever possible, shift the problem away from the author depositor.
IncReASe and EMBRACE
In the UK two projects are attempting to examine and provide practical solutions to obtaining more content for repositories. Both have been funded by the JISC (Joint Information Systems Committee, http://www.jisc.ac.uk).
IncReASe (Increasing Repository Content through Automation and Services, http://eprints.whiterose.ac.uk/increase/) is designed to build on the work of the existing White Rose Research Online repository, a shared repository for the Universities of Leeds, Sheffield and York. As well as reviewing upload and processes, and investigating automation of these areas, the project aims to offer ways to encourage researchers to use the repository. This includes finding methods to ‘capture significantly more new content’ and also exploring methods to streamline processes, such as improved workflows for deposit of research materials resulting from ESRC (Economic and Social Research Council, http://www.esrc.ac.uk/ESRCInfoCentre/index.aspx) funded research, automated metadata and full text harvesting from internal sources, integration with the academic workflow and simplified repository administration procedures. The project runs from July 2007 to 31 December 2008.
The recently completed EMBRACE project (Embedding Repositories And Consortial Enhancement, http://www.sherpa-leap.ac.uk/embrace.html) was also based within a repository consortium: the SHERPA-LEAP group of hosted repositories led by UCL (http://www.sherpa-leap.ac.uk/). This project incorporated work to enhance the services and functionality of the EPrints platform for the hosted LEAP repositories. These enhancements included implementation of the Eprints Application Profile and development of a ‘tool to embed citations and other information into the text of eprints’ at the point of deposit. Coupled with this, work focused on how better to embed repositories into institutional strategy. The resulting report, ‘Embracing the future’, provides ‘an assessment of current awareness and attitudes of stakeholders regarding digital repositories in three case study institutions. The report identifies drivers for, and barriers to, the embedding of digital repositories in institutional strategy.’ The draft of a separate paper which explores ways of addressing the challenges of sustaining institutional repositories can be accessed at http://eprints.ucl.ac.uk/13963. It will be published as a RAND Briefing Paper early in 2009.
Mandates – for better or for worse?
Institutions have the option of using the carrot or the stick approach to deposit. Carrots include benefits to users, both depositors and end users. Some institutions have opted for sticks by adopting a mandate requiring authors to deposit a copy of their articles in the repository. The word ‘mandate’ may send out the wrong messages and is probably best avoided. For example, the University of Glasgow has instead a ‘University Policy Statement on Publications’ (http://www.lib.gla.ac.uk/enlighten/publicationspolicy/index.html). Developments at Harvard are being watched closely: there the Faculties of Arts & Sciences and Law (as opposed to senior management) have taken an active role in proposing and adopting policies concerning copyright permissions (see http://www.news.harvard.edu/gazette/2008/02.14/99-fasvote.html).
Mandates like these usually focus on journal articles, which are the core item types of the open access movement and the raison d’être for some repositories. A mandate might be imposed for a number of reasons: to try to ensure high visibility of the institution’s research, to take action towards open access, to use the repository for internal management purposes, to increase visibility in the shark-infested water that is competition for research funding, or to support national research assessment as implemented in the UK and Australia.
As is often the case, he who pays the piper calls the tune. This is true of funding agency requirements for deposit and open access. Authors are more inclined to comply with a funding agency’s stipulation for deposit (of publications and, increasingly, of data) and add an extra step to their research workflow than to deposit materials because of a nice librarian’s recommendation.
One less radical solution is to work towards ‘patchwork mandates’ as proposed by Arthur Sale of the University of Tasmania (see D-Lib Magazine, January/February 2007, Volume 13, Number 1/2 at http://www.dlib.org/dlib/january07/sale/01sale.html). This model advocates the gradual adoption of mandates by departments or other groups until eventually a critical mass is reached and a full cross-institutional mandate is possible. This is likely to be a more realistic proposal for some institutions where a full mandate imposed by senior management might prove too adversarial. When considering a mandate for deposit, it is preferable to do it in collaboration and agreement with the academic community rather than in spite of them. This will require negotiation and advocacy to engender academic ‘buy-in’ – as opposed to academic ‘walk-out’.
Mediated v. self-deposit v. other methods
Mediated deposit has been adopted by many institutions as a means to encourage deposit, particularly of legacy items, to mitigate difficulties and kick-start the repository. However, this method of filling repositories is often not scalable, particularly in large institutions. The aim is often to encourage self-deposit, where the author submits his or her items on completion. Such a method needs to be much easier than it is currently: the expectation that every academic will dutifully complete a deposit form for every eligible item is not realistic. Alternatives such as bulk upload of both metadata and full text and other methods of deposit are therefore needed. OAI-PMH and OAI-ORE (http://www.openarchives.org/ore/) can play a role here. There need to be alternatives to re-keying details such as author name and affiliation: such data often exist in other locations and should be re-used. Automated metadata creation is another potential boon. Alternatives to the common method of deposit via completion of a web-based form and file attachment, which fit better with the academic workflow, are urgently needed. This might include services such as those described as ‘social deposit’ by Stuart Lewis (Aberystwyth University) (JISC Repositories mailing list archives, 18 Nov 2008, http://www.jiscmail.ac.uk/cgi-bin/webadmin?A0=jisc-repositories). They might take the form of deposit via Facebook by means of a SWORDAPP Facebook Repository Deposit Tool recently developed at Aberystwyth as part of a JISC-funded ‘SWORD2’ project (http://fb.swordapp.org/).
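As an illustration of how OAI-PMH could support bulk re-use of metadata that already exists elsewhere, the following is a minimal sketch in Python. It builds a standard ListRecords request and reads identifier, title and creators from a Dublin Core (oai_dc) response. The base URL, the sample record and the function names are hypothetical and are not taken from any of the projects mentioned; no network call is made, and a real harvester would also need to handle resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        creators = [c.text for c in record.iterfind(".//dc:creator", NS)]
        records.append(
            {"identifier": identifier, "title": title, "creators": creators}
        )
    return records


# A made-up response, shaped like an oai_dc ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Author, A.</dc:creator>
          <dc:creator>Author, B.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["identifier"], rec["title"], rec["creators"])
```

Harvested records of this kind could then be used to pre-populate deposit forms, so that an author confirms details rather than typing them.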
Fitting with the author/creator workflow
The optimum moment of deposit has, until recently, been accepted as the point of completion of the document (or equivalent). However, this might not be the best solution in all cases. It might be preferable to begin the metadata gathering process at the point of project proposal, in collaboration with research support services within the institution. Some may value an individual or shared workspace where the document can be created, and from which automatically generated metadata and files can be seamlessly deposited in the repository. It is worth investigating different methods of deposit to provide different options for contributors. Repository managers need to be creative and work in tandem with software developers to develop innovative solutions that fit with author practice. This may require collaboration with central IT services.
The repository environment is changing. Increased use of semantic web technologies and re-use of existing data are making some attractive services a reality. Repositories are becoming a part of the data workflow rather than focused on the software as an end in itself. Different methods of ingesting data into repositories are being developed. By obtaining data from existing multiple sources further up the research workflow chain, and associating that data with the requisite publications, the author’s need to engage directly with the repository at point of deposit is minimised. Copyright and version confusion still need tackling.
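The idea of gathering data ‘upstream’ and associating it with a publication can be sketched as follows. This is only an illustration under stated assumptions: the staff directory, the project register, their field names and the function prefill_deposit are all hypothetical stand-ins for whatever institutional systems actually hold such data. The point is that the author supplies little more than a title and identifiers, and anything the upstream sources cannot supply is flagged for follow-up rather than re-keyed.

```python
from dataclasses import dataclass, field


@dataclass
class DepositRecord:
    """A draft repository record assembled from upstream data (hypothetical)."""
    title: str
    authors: list = field(default_factory=list)
    project: str = ""
    funder: str = ""
    missing: list = field(default_factory=list)  # items needing manual follow-up


def prefill_deposit(title, staff_ids, project_id, staff_directory, project_register):
    """Build a draft record by looking up people and project details.

    staff_directory and project_register are assumed to be simple dicts
    keyed by identifier; real systems would sit behind their own interfaces.
    """
    record = DepositRecord(title=title)
    for staff_id in staff_ids:
        person = staff_directory.get(staff_id)
        if person is None:
            record.missing.append(f"author:{staff_id}")
            continue
        record.authors.append(
            {"name": person["name"], "affiliation": person["department"]}
        )
    project = project_register.get(project_id)
    if project is None:
        record.missing.append(f"project:{project_id}")
    else:
        record.project = project["title"]
        record.funder = project["funder"]
    return record


# Made-up upstream data for illustration only.
STAFF = {"s001": {"name": "A. Researcher", "department": "Department of Examples"}}
PROJECTS = {"p42": {"title": "Example Project", "funder": "Example Funder"}}

if __name__ == "__main__":
    draft = prefill_deposit("An example paper", ["s001", "s999"], "p42", STAFF, PROJECTS)
    print(draft)
```

Starting such a record at the project proposal stage would mean that funder and project details are already attached by the time the publication itself is ready.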
How can we improve matters?
Although the number of repositories is increasing, the corresponding amount of content is not keeping up. For repositories to survive they need to fit with systems and workflows and make life easier for their users. They should not be seen as a time-consuming extra. In essence they need to:
a) be integrated as part of the data flow rather than be a data store
b) offer valuable services relevant to and needed by their local community
A repository should have clear purposes that are obvious to its users. It should not necessarily be tied to the library: it is an institutional asset which can be embedded and integrated into institutional systems, both soft and hard. An example at Oxford is the new JISC-funded project, Building the Research Information Infrastructure, where semantic web technologies will be used to forge connections between, for example, people, projects and publications (http://brii.medsci.ox.ac.uk). Working within a new and rapidly changing environment with constantly moving goal-posts and well-embedded existing workflows presents many difficulties. If repositories can overcome the hurdle of being an interesting but largely misunderstood and confusing extra, offer seamless deposit and make their value known to local user communities, then deposit and re-use will be adopted as part of normal day-to-day workflows. There is still some way to go.