Difference between revisions of "iRefIndex"

Revision as of 09:26, 20 January 2011

A reference index for protein interaction data

iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. This index includes multiple interaction types including physical and genetic (mapped to their corresponding protein products) as determined by a multitude of methods. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.

iRefIndex assigns a globally unique identifier (rigid), which looks like 'tjWXXjgPyHyT2J6EwED8zK2x18U', to identify interactions that are identical (according to the sequence and taxon ids of the interactors). iRefIndex also assigns similar-looking keys to protein interactors. These keys are global, meaning that anyone can generate them using the method described in the paper. This method allows users to integrate their own data with the iRefIndex in a way that ensures proteins with the exact same sequence will be represented only once.
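To illustrate how a key of this shape can be derived from sequence and taxon id alone, here is a minimal Python sketch. It assumes a SEGUID-style digest (SHA-1 of the upper-cased sequence, base64-encoded with the trailing padding removed), which yields a 27-character string like the example above. The function names, and the way the taxon id and the interactor keys are combined, are assumptions made for illustration; the paper remains the authoritative description of the method.

```python
import base64
import hashlib


def seguid_style_key(data: str) -> str:
    """Base64-encoded SHA-1 digest with the trailing '=' padding removed (27 characters)."""
    digest = hashlib.sha1(data.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def interactor_key(sequence: str, taxon_id: int) -> str:
    """Hypothetical interactor key: sequence digest followed by the taxon id."""
    return seguid_style_key(sequence.upper()) + str(taxon_id)


def interaction_key(interactor_keys: list[str]) -> str:
    """Hypothetical interaction key: digest of the sorted, concatenated interactor keys."""
    return seguid_style_key("".join(sorted(interactor_keys)))


# Made-up sequences, used only to show that the key does not depend on interactor order.
key_a = interactor_key("MKTAYIAKQR", 9606)
key_b = interactor_key("MSEQNNTEMT", 9606)
same_key = interaction_key([key_a, key_b]) == interaction_key([key_b, key_a])
```

Because the keys depend only on the input data, two groups that process the same sequence and taxon id independently arrive at the same key without consulting a central registry.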

The iRefIndex paper has been published and is available here.

The iRefWeb paper has been published and is available here.

iRefIndex is a PSIMex partner



Long term goals of the iRefIndex project

We believe that protein interaction data hold incredible potential for biomedical research. Presently, these data are collected and archived by multiple groups around the world and the number of groups taking part in this work is growing rather than diminishing.

As such, it is important that these databases have the means to effectively exchange and compare data and that they are curating and representing data using similar standards in order to make their data accessible and allow effective use.

To this end, the iRefIndex project has three long term objectives:

1) to facilitate exchange of interaction data between interaction databases.
The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.
2) to consolidate interaction data from multiple sources.
The method has been used to index interaction records from multiple sources. The resulting iRefIndex may be used to search for the existence of interaction data for any protein regardless of the original resource. Nine interaction databases have been incorporated so far; others will follow.
3) to provide feedback to source interaction databases.
During the process of data consolidation, iRefIndex uses a sophisticated method to keep track of potential problems with source records such as outdated or unfound protein identifiers or incorrectly assigned taxonomy identifiers. These data are provided as feedback files to source interaction databases for correction, clarification or improvements to our own system. This process will help to harmonize data representation and improve the overall quality of interaction records for all source databases. This process will also help source databases to exchange data with one another.

Data availability via download

A subset of iRefIndex data is provided as a tab-delimited text file in PSI-MITAB format.

Data is available via anonymous FTP at:


Username: ftp

Password: enter anonymous or your e-mail address

iRefIndex data is provided as a tab-delimited text file in PSI-MITAB 2.5 format. The format is described at



iRefIndex data is now provided as a tab-delimited text file in PSI-MITAB 2.6 format. The format is described at:


iRefIndex data is provided as a single file or in a number of data sets specific to the organism in which the interaction occurs. See the above link for details.
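As a rough guide to working with such a file, the following Python sketch reads tab-delimited rows and labels the first 15 columns, which is the column count of PSI-MITAB 2.5. The column labels, the function names and the sample row are assumptions made for illustration; any additional columns (for example those of PSI-MITAB 2.6) are kept unlabelled, and the format description linked above should be consulted for their meaning.

```python
import csv
import io

# Labels for the 15 PSI-MITAB 2.5 columns (names chosen here for illustration).
MITAB25_COLUMNS = [
    "id_a", "id_b", "alt_a", "alt_b", "alias_a", "alias_b",
    "method", "author", "pmids", "taxon_a", "taxon_b",
    "interaction_type", "source_db", "interaction_id", "confidence",
]


def read_mitab(handle):
    """Yield one dict per interaction row, skipping blank lines and '#' header lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue
        record = dict(zip(MITAB25_COLUMNS, row))
        # Columns beyond the first 15 are preserved as an unlabelled list.
        record["extra"] = row[len(MITAB25_COLUMNS):]
        yield record


def split_field(value):
    """Split a multi-valued field on '|'; a lone '-' marks an empty field."""
    return [] if value == "-" else value.split("|")


# Made-up example row; the identifiers do not refer to real records.
sample = (
    "#uidA\tuidB\n"
    "uniprotkb:P12345\tuniprotkb:Q67890\t-\t-\t-\t-\t-\t-\t"
    "pubmed:1|pubmed:2\ttaxid:9606\ttaxid:9606\t-\t-\t-\t-\n"
)
records = list(read_mitab(io.StringIO(sample)))
```

To process a downloaded file, pass an open file handle to `read_mitab` in place of the `io.StringIO` object. Splitting on `|` is a simplification that assumes the character does not occur inside a value.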

Source data for the current build is described at


If you need help, see the contact at the bottom of this page.

Data availability via web interface

iRefWeb provides a searchable web interface to the iRefIndex. iRefWeb also provides visualization of statistics related to iRefIndex and allows users to compare annotation of the same publication by multiple interaction databases.


Please see the upper right hand corner of this iRefWeb front page for the version of iRefIndex that iRefWeb is using.

The iRefWeb paper is available here.

Data availability via Cytoscape

Plugins for two versions of Cytoscape (http://cytoscape.org) have been released:

Cytoscape 2.7 users:

Install iRefScape 0.90 beta from the Plugins menu.

This version is not backwards compatible with Cytoscape 2.6.3.

Linux users with Java 1.6 (not 1.5) will not be able to install the plugin using the Plugin manager and should install it manually instead (see links below).

Details on plugin installation and use are available at: README_Cytoscape_plugin_0.9x

Cytoscape 2.6.3 users:

Install iRefScape 0.89 beta from the Plugins menu

This will be the last iRefScape that supports Cytoscape 2.6.3

Details on plugin installation and use are available at: README_Cytoscape_plugin_0.8x

Sign up for the Google Groups email list (below) to be informed of the official release and updates.

Data availability via Web services

A beta version of web-services is available for testing purposes. These services are based on a template from the IntAct [1] group at EBI. Please see the contact information and Google Groups info below if you plan to use these services or would like to know when they are officially released.

README PSICQUIC web services for iRefIndex
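For readers who want to try a PSICQUIC-style REST service, the Python sketch below builds a search URL following the general PSICQUIC REST pattern (a query path followed by `firstResult`, `maxResults` and `format` parameters). The base URL is a placeholder and the function name is hypothetical; whether the iRefIndex beta services follow exactly this layout is an assumption, so check the README above before relying on it.

```python
from urllib.parse import quote, urlencode


def psicquic_query_url(base_url, miql_query, first_result=0, max_results=50, fmt="tab25"):
    """Build a PSICQUIC-style REST search URL; nothing is fetched."""
    path = base_url.rstrip("/") + "/current/search/query/" + quote(miql_query, safe="")
    params = urlencode(
        {"firstResult": first_result, "maxResults": max_results, "format": fmt}
    )
    return path + "?" + params


# Placeholder base URL, for illustration only; it is not the iRefIndex service address.
url = psicquic_query_url("http://example.org/psicquic/webservices", "species:9606")
```

Paging through a large result set then amounts to calling the function repeatedly with an increasing `first_result`.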

Feedback files

We provide feedback to source databases about our data integration process. These files are described here:



iRefIndex data distributed on the FTP site includes only those data that may be freely distributed under the copyright license of the source database. This includes data from BIND, BioGRID, IntAct, MINT, MPPI and OPHID.

Data on the public FTP site is released under the Creative Commons Attribution License: http://creativecommons.org/licenses/by/2.5/

iRefIndex also integrates data from CORUM, DIP, HPRD and MPact. These data are not distributed publicly. These data may be made available to academic users under an academic collaborative agreement (below). CORUM, MPact and Imex records from DIP will be made available in the next public release of iRefIndex. HPRD data will be made available under a collaborative agreement (see below).

Access to full data set and academic collaborations

Researchers participating in an academic collaboration with our group are welcome to the full data set. This means that the collaborating group leader must agree that

1) the data is to be used for non-commercial, academic research purposes and

2) the MITAB data files will not be redistributed outside the immediate research group and

3) they will provide us with feedback on any problems encountered with using the index (or suggested changes).

This feedback (part 3) forms the academic collaboration part and is very important to our work. We require that you provide this feedback after you have received the latest version of the data and before we release any newer version to you.

We do not require authorship on resulting publications unless we are directly involved in the work or provide you with additional information related to the index necessary for the work. We are eager to work directly with other groups if resources permit.

Contact ian.donaldson at biotek.uio.no if you are interested in using the full iRefIndex database or would like your database included in the public release of the index.


Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


Statistics for the iRefIndex include a breakdown of interactors and interactions from each data source.

For statistics on the full public dataset (as distributed on the FTP site), please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_8.0


The iRefIndex was developed at the Biotechnology Centre of Oslo, University of Oslo in the Donaldson lab by Sabry Razick and Ian Donaldson. George Magklaras provided systems engineer support and EMBNet Norway provided hardware support.


iRefIndex has been published and is available here. If you use iRefIndex, please cite:

Razick S, Magklaras G, Donaldson IM: iRefIndex: A consolidated protein interaction database with provenance. BMC Bioinformatics. 2008. 9(1):405 PMID 18823568.

Please also cite the source databases described below. iRefIndex consolidates protein interaction data from...

iRefIndex uses SEGUID based identifiers to group proteins into redundant groups. The SEGUID algorithm and database are described in [13].


1. Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 2003, 31(1):248-250.
2. Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E et al: The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res 2005, 33(Database issue):D418-424.
3. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res 2006, 34(Database issue):D535-539.
4. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 2004, 32(Database issue):D449-451.
5. Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M et al: Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res 2003, 13(10):2363-2371.
6. Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM et al: Human protein reference database--2006 update. Nucleic Acids Res 2006, 34(Database issue):D411-414.
7. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A et al: IntAct: an open source molecular interaction database. Nucleic Acids Res 2004, 32(Database issue):D452-455.
8. Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R et al: IntAct--open source resource for molecular interaction data. Nucleic Acids Res 2007, 35(Database issue):D561-565.
9. Chatr-aryamontri A, Ceol A, Palazzi LM, Nardelli G, Schneider MV, Castagnoli L, Cesareni G: MINT: the Molecular INTeraction database. Nucleic Acids Res 2007, 35(Database issue):D572-574.
10. Guldener U, Munsterkotter M, Oesterheld M, Pagel P, Ruepp A, Mewes HW, Stumpflen V: MPact: the MIPS protein interaction resource on yeast. Nucleic Acids Res 2006, 34(Database issue):D436-441.
11. Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Mark P, Stumpflen V, Mewes HW et al: The MIPS mammalian protein-protein interaction database. Bioinformatics 2005, 21(6):832-834.
12. Brown KR, Jurisica I: Online predicted human interaction database. Bioinformatics 2005, 21(9):2076-2082.
13. Babnigg G, Giometti CS: A database of unique protein sequence identifiers for proteome studies. Proteomics 2006, 6(16):4514-4522.
14. Ruepp A, Brauner B, Dunger-Kaltenbach I, Frishman G, Montrone C, Stransky M, Waegele B, Schmidt T, Doudieu ON, Stümpflen V, Mewes HW: CORUM: the Comprehensive Resource of Mammalian Protein Complexes. Nucleic Acids Res 2008, 36(Database issue):D


A collection of resources related to finding and working with protein interaction data: Protein Interaction Resources.


Suggestions, requests and comments are welcome. Please email

ian.donaldson at biotek.uio.no.

Visiting and mail address info is here.

iRefIndex Google Group

Join the iRefIndex Google group to receive announcement emails and discuss issues related to iRefIndex. Sign up at irefindex google group.

All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).