The Biolibrarian Proposal

The survey

Please take the Biolibrarian proposal survey by visiting http://www.surveymonkey.com/s.aspx?sm=LubRkhzbX9a7e8aPGKGevQ_3d_3d

It's only six questions and can be filled out in less than five minutes.

What is the Biolibrarian proposal?

The Biolibrarian proposal calls for the creation of new positions at university libraries around the world.

A Biolibrarian would be trained in the use of biological databases, initially pathway, complex and interaction databases. This scope is intentionally limited for the purposes of the proposal but could be expanded.


It is envisioned that molecular biologists could meet with a Biolibrarian in the same way that they meet with and use the services of a librarian. The Biolibrarian could help molecular biology researchers locate the pathways, complexes and interactions in which their molecules of interest are involved.

The Biolibrarian would help the biologist to access, use and interpret data from curated molecular databases (including pathway, complex, interaction, model organism and protein databases).

In addition, the Biolibrarian would be trained in the use of state-of-the-art text mining tools to help researchers locate data for their molecules of interest in abstracts and full-text research articles.

Finally, Biolibrarians could help researchers enter verified information from full-text articles into curated databases, where it would be available to researchers around the world who query for information on the same molecules.

Would you support such a service at your local university library? Do you have comments on this proposal? Follow the link above and take our survey.

You can read a synopsis of the proposal below.

This initiative is a proposed infrastructure project at the University of Oslo in Norway, where the position would be prototyped. The initiative is led by Ian Donaldson at the Biotechnology Centre of Oslo. The above survey is an attempt to assess support for the proposal at the University of Oslo and at universities around the world where this project could be replicated.

Ian Donaldson was a lead bioinformatics developer and research scientist for the Biomolecular Interaction Network Database (BIND) between 2002 and 2005, an effort that employed close to 30 curators. He had been involved in many aspects of the project (including curation and data standard development) since its inception in 1999.

Comments and suggestions are welcome. We are also interested in learning about similar proposals or projects that are already in place. Please email ian.donaldson@biotek.uio.no. If you would like to add your name to this wiki page in support of this application, please send a brief email.

The proposal in brief

Time plan: 5 years

Personnel: 8 – 10 biolibrarian curators

Deliverable: A prototype for biolibrarian positions at universities around the world.



We propose a team of 8 to 10 curators who will search the primary biomedical research literature and enter biomolecular interaction and pathway data into a machine-readable format. This will facilitate exchange and integration of data with other similar efforts, as well as human- and machine-based data mining. These personnel will be trained in the use of the latest pathway, complex, interaction, model organism and protein databases. They will act as a liaison between researchers and databases to facilitate both retrieval of information and entry of curated information by local researchers.

Biomolecular interaction data consist of the set of experimentally verified interactions that occur between proteins, DNAs, RNAs, small molecules or complexes involving any of these molecular types. These data, along with associated reactions and state changes, form the basis of biological pathways. As such, interaction and pathway data define the biological function of their participant molecules. The resulting network of interactions and pathways between molecules forms a map of living systems that may be searched and computed on. The resource is broadly applicable to all molecular and medical life sciences.
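As a rough illustration of what "searched and computed on" can mean, the sketch below stores a handful of interactions as an undirected network and asks which molecules lie within a given number of interaction steps of a starting molecule. The molecule names and the function names are hypothetical placeholders, not curated data or part of any existing system.

```python
from collections import defaultdict, deque

# Hypothetical curated interactions (placeholder names, not real data).
INTERACTIONS = [
    ("protein_A", "protein_B"),
    ("protein_B", "protein_C"),
    ("protein_C", "small_molecule_D"),
    ("protein_E", "rna_F"),
]


def build_network(interactions):
    """Return an undirected adjacency map: molecule -> set of partners."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def within_steps(network, start, max_steps):
    """Breadth-first search for molecules at most max_steps interactions away.

    Returns a dict mapping each reachable molecule to its distance from start.
    """
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_steps:
            continue  # do not expand beyond the requested radius
        for partner in network.get(current, ()):
            if partner not in distances:
                distances[partner] = distances[current] + 1
                queue.append(partner)
    del distances[start]  # the starting molecule is not its own neighbour
    return distances


network = build_network(INTERACTIONS)
print(sorted(network["protein_B"]))               # direct interaction partners
print(within_steps(network, "protein_A", 2))      # partners up to two steps away
```

The same adjacency map could serve as the starting point for the larger-scale analyses mentioned in this proposal, although a production system would more likely rely on an established graph library and on curated identifiers.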

Presently, these data are collected by several small groups around the world. It is a labour-intensive task that requires skill in reading research articles and knowledge of multiple standard data formats that use large controlled vocabularies. Traditionally, these databases have had difficulty securing long-term, stable funding because, although they are essentially infrastructure projects, they compete with proposals for experimental research. Despite the fact that these databases receive hundreds of citations per year, a survey of the major interaction databases indicates that they employ only a handful of full-time curators. This number is insufficient to keep up with the rate of research publications, let alone the backlog of uncurated research articles. This infrastructure call represents a unique opportunity for Norway to establish a prototype position in this area that could be replicated across universities. The cost is a fraction of the funds that universities spend on biomedical journal subscriptions. The payoff is a powerful dataset that may be mined by humans and by machine algorithms.

We will solicit letters of support for our efforts in this area from international interaction databases. We will also invite national and international research groups to propose biomedical areas requiring curation. Finally, we will survey universities worldwide to assess their support for such a service. Newly curated data will be used to give context to high- and low-throughput proteomics and sequencing projects, as well as to provide tools for genome-wide analysis studies related to human disease, cancer and personalized medicine. Graph algorithms acting on large-scale interaction and pathway maps have broad utility that includes (but is not limited to) identification of biological roles of proteins, identification of disease genes and selection of drug targets. The efficacy of these algorithms depends on the quality of the underlying data. Presently, high-quality, human-curated data is dwarfed by less reliable data from high-throughput interactomics studies. The interpretation of these high-throughput studies itself benefits from the presence of human-curated data.

The proposed project will have high visibility and high impact. Data will be made freely available in internationally recognized formats (such as the HUPO PSI-MI standard) under a Creative Commons license. Data will be available via bulk download, a web interface and at least one internationally recognized graphical viewer (http://cytoscape.org). Data will be integrated and exchanged with other similar database efforts to facilitate search and analysis. Integration will be accomplished using a system recently developed in the principal investigator’s research group (http://irefindex.uio.no). This same system will be used to monitor and ensure accepted curation practices. We will contribute to the maintenance and expansion of data exchange formats and controlled vocabularies. We will adhere to and help develop the curation practices set out by the International Molecular Exchange Consortium (http://imex.sourceforge.net/). Curation and database systems for handling data are already available from other IMEx groups, and these will be installed, used and built upon.
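To give a sense of how a bulk download might be consumed, the sketch below reads the first two columns of a tab-delimited PSI-MI file (the MITAB flavour of the standard), which hold the identifiers of the two interactors in `database:accession` form. It assumes the download is offered as MITAB, which this proposal does not specify. The sample rows, the `exampledb` prefix and the function names are hypothetical.

```python
import csv
import io

# Two hypothetical MITAB-style rows: interactor A, interactor B, then
# thirteen further columns left empty ("-") for brevity.
SAMPLE = (
    "exampledb:ID0001\texampledb:ID0002" + "\t-" * 13 + "\n"
    "exampledb:ID0002\texampledb:ID0003" + "\t-" * 13 + "\n"
)


def split_id(field):
    """Split the first 'database:accession' entry of a field into a tuple."""
    first = field.split("|")[0]  # a field may list several '|'-separated ids
    database, _, accession = first.partition(":")
    return database, accession


def read_interactor_pairs(handle):
    """Return a list of (interactor A, interactor B) pairs from a MITAB file."""
    pairs = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank lines and header or comment lines
        pairs.append((split_id(row[0]), split_id(row[1])))
    return pairs


pairs = read_interactor_pairs(io.StringIO(SAMPLE))
for interactor_a, interactor_b in pairs:
    print(interactor_a, interactor_b)
```

Reading only the identifier columns keeps the sketch small; a real consumer would also use the remaining columns (for example the detection method and publication identifier) to judge the evidence behind each interaction.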