http://irefindex.vib.be/wiki/api.php?action=feedcontributions&user=PaulBoddie&feedformat=atom
irefindex - User contributions [en]
2024-03-28T15:52:57Z
User contributions
MediaWiki 1.33.0
http://irefindex.vib.be/wiki/index.php?title=Main_Page&diff=4113
Main Page
2012-11-15T17:41:47Z
<p>PaulBoddie: Redirect to the real main page.</p>
<hr />
<div>#REDIRECT [[Donaldson Group]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefScape_1.0&diff=4112
iRefScape 1.0
2012-11-15T17:39:43Z
<p>PaulBoddie: Removed navigational icons from imagemaps using "desc none".</p>
<hr />
<div>__NOTOC__<br />
<br />
[[Image:NP_499166-NP_501526-iterations-1-400x278.png|right]]<br />
<br />
iRefScape is a plugin for Cytoscape that exposes iRefIndex data as a navigable graphical network.<br />
<br />
This page describes the iRefScape 1.0 plug-in for Cytoscape 2.8.x. See the [[#Compatibility_Information|compatibility information section]] for information on other versions.<br />
<br />
{|class="wikitable" style="text-align:left; clear:left; min-width:50%" border="0" cellpadding="10"<br />
| style="vertical-align: top" |<br />
== Installation ==<br />
<br />
See the [[#Installing_iRefScape|installation section]] for quick installation instructions and references to other documentation.<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[#Installing_iRefScape|installation section]]<br />
desc none<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Publication ==<br />
<br />
[http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex.]<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px|left<br />
default [http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf]<br />
desc none<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Contact information and mailing list ==<br />
Join the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group] to be informed of updates. See also the [[iRefScape|latest release of iRefScape]] which may differ from the release described here.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [http://groups.google.com/group/irefindex?hl=en]<br />
desc none<br />
</imagemap><br />
|}<br />
<br />
__TOC__<br />
<br />
== Compatibility Information ==<br />
<br />
See the following table for more detailed iRefScape compatibility information.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Cytoscape<br />
! align="center" style="background:#f0f0f0;"|iRefScape<br />
|-<br />
| 2.8.1, 2.8.2<br />
| iRefScape 1.0 (described on this page)<br />
|-<br />
| 2.7.0<br />
| [[iRefScape 0.9]]<br />
|-<br />
| 2.6.3<br />
| [[iRefScape 0.8]]<br />
|}<br />
<br />
== Installing iRefScape ==<br />
<br />
The plugin can be installed using Cytoscape's plugin menu. Select...<br />
<br />
# "Manage plugins"<br />
# "Available for Install"<br />
# "Network and Attribute I/O"<br />
# "iRefScape" (the entry names a specific version, such as "iRefScape 1.0")<br />
<br />
Then follow the on-screen instructions.<br />
<br />
{|class="wikitable" style="text-align:left; clear: left; border: 1px solid #cccccc" cellpadding="10"<br />
| style="vertical-align: top" |<br />
=== Installation Guide ===<br />
<br />
More detailed instructions, troubleshooting tips and alternative methods are available in the [[iRefScape 1.0 Installation|installation guide]].<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefScape 1.0 Installation|installation guide]]<br />
desc none<br />
</imagemap><br />
|}<br />
<br />
After installation, select the "iRefScape" entry from Cytoscape's plugin menu.<br />
<br />
When the plugin is started for the first time, it will download the publicly available data set.<br />
<br />
=== Tested systems ===<br />
This version of the iRefScape plugin has been tested with the following system configurations:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
! style="background:#f0f0f0;" | Operating System<br />
! style="background:#f0f0f0;" | Java Version<br />
|-<br />
| Red Hat Enterprise Linux 5 (32-bit) (kernel 2.6.18)<br />
| 1.6.0_01 (32-bit)<br />
|-<br />
| Microsoft Windows 7 (64-bit)<br />
| 1.6.0_25 (64-bit)<br />
|-<br />
| Microsoft Windows Vista (32-bit)<br />
| 1.6.0 (32-bit)<br />
|-<br />
| Ubuntu Linux 8.04 (32-bit)<br />
| 1.6 (32-bit)<br />
|-<br />
| Mac OS X 10.6 (64-bit)<br />
| 1.6.0_15 (32-bit)<br />
|}<br />
<br />
Please refer to the [[iRefScape 1.0 Installation|installation guide]] for more details on system configuration issues.<br />
<br />
=== Source Code ===<br />
<br />
Since iRefScape is made available under version 3 or later of the [http://www.gnu.org/licenses/gpl.html GNU General Public License], the source code is also made available:<br />
<br />
* iRefScape 1.18:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/e12f853c5951 Source browser]<br />
* iRefScape 1.17:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/0001288b7527 Source browser]<br />
* iRefScape 1.16:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/3ade99fc92b6 Source browser]<br />
* [http://irefindex.uio.no/hg/iRefScape/ iRefScape repository home]<br />
<br />
Please consult the <tt>README.txt</tt> file in the source distribution for information on building the software.<br />
<br />
== Using the Wizard - an example search ==<br />
<br />
Click the "Wizard" button - a pop-up window will appear. <br />
<br />
Follow the prompts. Here is an example search:<br />
<br />
# Select "Search protein-protein interactions for a protein".<br />
# Select "UniProt identifier".<br />
# For "Taxonomy identifier", select "9606 (Human)".<br />
# Type <tt>QCR2_HUMAN</tt> in the provided space. Click "Next".<br />
# Click "Search & load".<br />
<!-- commenting these out since they are outdated<br />
The images below show each of the steps in the wizard.<br />
<br />
<gallery perrow="5"><br />
Image:IRefIndex-Cytoscape-Wizard.png|The iRefIndex wizard<br />
Image:IRefIndex-Cytoscape-Wizard-step2.png|Choosing a result type<br />
Image:IRefIndex-Cytoscape-Wizard-step3.png|Choosing a taxonomy type<br />
Image:IRefIndex-Cytoscape-Wizard-step4.png|Specifying the search term<br />
Image:IRefIndex-Cytoscape-Wizard-step5.png|Additional options<br />
</gallery><br />
--><br />
<br />
== Using the Search Panel ==<br />
<br />
To perform a search, the following steps are involved:<br />
<br />
# Enter query term(s)<br />
# Select a search type<br />
# Select taxonomy/organism<br />
# Adjust search options (iterations, new view, canonical expansion) - this is optional<br />
# Start the search<br />
<br />
=== Enter query term(s) ===<br />
<br />
Queries may be loaded from a file or by pasting the query into the text box (one query per line). Multiple queries can also be separated by pipe characters (<tt>|</tt>) or by tab characters. Queries with spaces in them should be enclosed in double quotes.<br />
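The splitting rules above can be sketched in Python. This is a hypothetical helper: <tt>split_queries</tt> is not part of iRefScape. It assumes that unquoted spaces also separate terms, which would explain why terms containing spaces must be quoted. A pipe or tab inside double quotes is kept as part of the quoted term.

```python
import re

# Hypothetical tokenizer, not iRefScape's actual parser. A term is either a
# double-quoted string (which may contain spaces) or a run of characters that
# contains no whitespace (space, tab, newline), pipe or double quote.
_TERM = re.compile(r'"([^"]*)"|([^\s|"]+)')


def split_queries(text: str) -> list[str]:
    """Split pasted query text into individual query terms."""
    terms = []
    for quoted, bare in _TERM.findall(text):
        term = (quoted or bare).strip()
        if term:  # skip empty quoted strings such as ""
            terms.append(term)
    return terms


print(split_queries('NP_996224\nQ7KSF4|cher\t42066 "Fanconi anemia"'))
# ['NP_996224', 'Q7KSF4', 'cher', '42066', 'Fanconi anemia']
```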
<br />
=== Select a search type ===<br />
<br />
Example searches are listed below.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Search Type<br />
! align="center" style="background:#f0f0f0;"|Example<br />
! align="center" style="background:#f0f0f0;"|Notes<br />
|-<br />
| <tt>RefSeq_Ac</tt>||<tt>NP_996224</tt>||See http://www.ncbi.nlm.nih.gov/protein/221379660<br />
|-<br />
| <tt>UniProt_Ac</tt>||<tt>Q7KSF4</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>UniProt_ID</tt>||<tt>Q7KSF4_DROME</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>geneID</tt>||<tt>42066</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>geneSymbol</tt>||<tt>cher</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>mass</tt>||<tt>72854<-->72866</tt>||Search protein interactors for a range of molecular mass (in Da).<br />
|-<br />
| <tt>rog</tt>||<tt>10121899</tt>||Redundant object group: iRefIndex's internal identifier for a protein. See the <tt>i.rog</tt> node attribute.<br />
|-<br />
| <tt>PMID</tt>||<tt>14605208</tt>||PubMed identifier of a publication in which an interaction is described. See http://www.ncbi.nlm.nih.gov/pubmed. Iterations and "Use canonical expansion" have no effect on this search type. This search will return all protein interactors described in the given publication and will automatically draw all interactions known between these proteins (even if these interactions are supported by different PMIDs). Select edges in the resulting graph and see the <tt>i.PMID</tt> attribute in the Edge Attribute Browser.<br />
|-<br />
| <tt>src_intxn_id</tt>||<tt>EBI-212627</tt>||Source interaction database identifier. Iterations and "Use canonical expansion" have no effect on this search type. Caution: multiple databases may have overlapping interaction record identifiers (e.g. <tt>147805</tt> returns records from both BIND and BioGRID) and there is no way to limit this search to a specific database at this time.<br />
Equivalent interactions from other databases will be automatically retrieved using this search type (see provided example).<br />
|-<br />
| <tt>omim</tt>||<tt>227650</tt>||OMIM identifier. See http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=227650<br />
|-<br />
| <tt>digid</tt>||<tt>449</tt>||Internal identifier for a group of phenotypically related diseases. See [[DiG: Disease groups]]. A digid can be found by first performing a search for some omim identifier - the digid will then appear as the i.digid node attribute.<br />
|-<br />
|style="background:#f0f0f0;" colspan="3" align="center"| Additional search types: first select from Advanced features/Preferences.<br />
|-<br />
| <tt>dig_title</tt>||<tt>fanconi</tt>||Non-exact text search of OMIM titles. Select matching titles from the Query Helper and press return to copy titles to search box. Then hit "Search and load". See [[DiG: Disease groups]].<br />
|-<br />
| <tt>ROGID</tt>||<tt>5IrM14EfdlehbVJ0WAcAoQM3pFw9606</tt>||Exact search for the ROGID of a protein. This searches the <tt>i.rogid_TOP</tt> node feature. Users can also generate a ROGID for an amino acid sequence and taxon identifier pair using the Wizard/Create SEGUID/ROGID for sequence tool. See PMID 18823568.<br />
|-<br />
| <tt>RIGID</tt>||<tt>cXAoT7JjMde7J+CN/2tOR6gETyA</tt>||Exact search for the RIGID of an interaction. This searches the <tt>i.rigid</tt> edge feature. See PMID 18823568.<br />
|-<br />
|}<br />
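The <tt>mass</tt> search type in the table above takes a range written as <tt>low<-->high</tt>. The sketch below parses such a query and filters proteins by mass. It assumes both bounds are inclusive, which the page does not state. <tt>parse_mass_range</tt>, <tt>proteins_in_range</tt> and the example masses are hypothetical.

```python
def parse_mass_range(query: str) -> tuple[int, int]:
    """Parse a mass query such as '72854<-->72866' into (low, high) in Da."""
    low, sep, high = query.partition("<-->")
    if not sep:
        raise ValueError(f"not a mass range: {query!r}")
    low_da, high_da = int(low), int(high)
    # Accept the bounds in either order.
    return (low_da, high_da) if low_da <= high_da else (high_da, low_da)


def proteins_in_range(masses: dict[str, int], query: str) -> list[str]:
    """Return the (hypothetical) protein keys whose mass lies inside the range."""
    low, high = parse_mass_range(query)
    return sorted(p for p, mass in masses.items() if low <= mass <= high)


masses = {"P1": 72860, "P2": 259142, "P3": 72854}
print(proteins_in_range(masses, "72854<-->72866"))  # ['P1', 'P3']
```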
<br />
=== Select taxonomy/organism ===<br />
<br />
This will limit the search results to a particular organism. An organism can be selected from the list, or a taxonomy identifier can be entered into the field itself. See [http://www.ncbi.nlm.nih.gov/taxonomy Entrez Taxonomy] for more details on taxonomy identifiers. For most search types, it is acceptable to leave this field set to <tt>Any</tt>.<br />
<br />
=== Adjust search options ===<br />
<br />
The following optional adjustments can be made:<br />
<br />
==== Iterations ====<br />
<br />
A distance from the query list's members can be specified:<br />
<br />
* Selecting <tt>0</tt> will return only interactions between nodes found by the query list<br />
* Selecting <tt>1</tt> will return immediate neighbours of nodes in the query list<br />
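The iterations setting can be pictured as a breadth-first search that starts from the query nodes and stops after the chosen number of hops. This is a sketch only, not iRefScape's code. It assumes the network is an undirected adjacency map, and that the interactions drawn are all edges among the reached nodes; the page does not say exactly which edges are drawn for <tt>1</tt>. The distances it returns correspond to the <tt>i.order</tt> values of <tt>0</tt> (query node) and <tt>1</tt> (direct neighbour). The helper names and the example network are hypothetical.

```python
from collections import deque


def expand(graph: dict[str, set[str]], seeds: set[str], iterations: int) -> dict[str, int]:
    """Breadth-first search from the query nodes, stopping after `iterations` hops.

    Returns a mapping from each reached node to its distance from the nearest seed.
    """
    dist = {seed: 0 for seed in seeds}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == iterations:
            continue  # do not look beyond the requested distance
        for neighbour in graph.get(node, set()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def edges_among(graph: dict[str, set[str]], nodes) -> set[tuple[str, str]]:
    """All edges whose two endpoints are both in `nodes` (the induced subgraph)."""
    return {tuple(sorted((a, b))) for a in nodes for b in graph.get(a, set()) if b in nodes}


# Hypothetical network: A-B, B-C, C-D.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(expand(graph, {"A"}, 1))                           # {'A': 0, 'B': 1}
print(edges_among(graph, expand(graph, {"B", "C"}, 0)))  # {('B', 'C')}
```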
<br />
==== Create new view ====<br />
<br />
A new view will be opened for the search results if this option is selected. Otherwise, the results will be added to the current view.<br />
<br />
==== Use canonical expansion ====<br />
<br />
Selecting this option will expand the search to include all proteins that are related to the query protein (for example, splice isoforms). See [[Canonicalization]] for technical details.<br />
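One way to picture canonical expansion is to group nodes by their <tt>i.canonical_rog</tt> attribute (described in the Node Attributes table) and return every member of each matched group. The sketch below assumes that model. <tt>canonical_expand</tt> and the small integer rogs in the example are hypothetical. The actual rules are given in the [[Canonicalization]] document.

```python
from collections import defaultdict


def canonical_expand(hits: set[int], canonical_rog: dict[int, int]) -> set[int]:
    """Add every rog that shares an i.canonical_rog value with one of the hits."""
    members = defaultdict(set)  # canonical group -> member rogs
    for rog, group in canonical_rog.items():
        members[group].add(rog)
    expanded = set()
    for rog in hits:
        group = canonical_rog.get(rog, rog)  # ungrouped proteins form their own group
        expanded |= members.get(group, set()) | {rog}
    return expanded


canonical_rog = {1: 1, 2: 1, 3: 1, 4: 4}  # rogs 1-3 are, say, isoforms of one gene
print(sorted(canonical_expand({2}, canonical_rog)))  # [1, 2, 3]
```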
<br />
=== Start the search ===<br />
<br />
Press the "Search and load" button to perform the search.<br />
<br />
{{Note|<br />
See the [[iRefScape Batch Files]] document for information on using text files to describe searches, annotate result nodes and to define new search types using user-supplied data.<br />
}}<br />
<br />
== Viewing the Results ==<br />
<br />
=== Colours and Shapes ===<br />
<br />
* Blue nodes correspond to proteins found by your query<br />
* Green nodes are interacting partners for your query protein<br />
* Purple hexagons are complex-nodes (also called pseudo-nodes); they keep partners of a complex together (for example, QCR6_HUMAN is found in two complexes that also involve QCR2_HUMAN)<br />
* Orange-yellow edges indicate protein-protein interactions and pink edges represent membership of some protein in a complex<br />
<br />
=== Toggling Edges ===<br />
<br />
Multiple edges may appear between two nodes. These represent separate interaction records that support this link. Details on each original record can be viewed using the edge attribute browser (see the Edge Attributes section below). You can toggle this multi-edge view on and off by selecting "Toggle selected multi-edges" in the iRefScape/View Tools menu. In the collapsed view, only one of the edges is shown.<br />
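Collapsing multi-edges can be modelled as keeping one representative edge per pair of nodes. The Edge Attributes table describes an <tt>i.flag</tt> value of <tt>0</tt> for the representative edge and <tt>1</tt> for an edge that disappears when edges are toggled. The sketch below assumes that the first edge seen for a pair becomes the representative, which may not be how iRefScape chooses it. The helper names and edge IDs are hypothetical.

```python
def toggle_flags(edges: list[tuple[str, str, str]]) -> dict[str, int]:
    """Give the first edge seen between each pair of nodes flag 0 and the rest flag 1."""
    seen = set()
    flags = {}
    for edge_id, source, target in edges:
        pair = frozenset((source, target))  # direction does not matter
        flags[edge_id] = 1 if pair in seen else 0
        seen.add(pair)
    return flags


def visible_edges(edges, flags: dict[str, int], collapsed: bool) -> list[str]:
    """Edge IDs to draw: all of them, or only flag-0 representatives when collapsed."""
    return [e for e, _, _ in edges if not collapsed or flags[e] == 0]


edges = [("e1", "A", "B"), ("e2", "B", "A"), ("e3", "B", "C")]
flags = toggle_flags(edges)
print(flags)                              # {'e1': 0, 'e2': 1, 'e3': 0}
print(visible_edges(edges, flags, True))  # ['e1', 'e3']
```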
<br />
=== iRefScape Menu ===<br />
<br />
The iRefScape menu in the Cytoscape menu bar contains a number of other functions that may help with searching and viewing interaction data. These are described in more detail in the [[iRefScape plugin menu]] document.<br />
<br />
=== Expanding the Interaction Map ===<br />
<br />
You can search for additional interactions by right-clicking on a node and selecting "iRefIndex -- Retrieve interactions".<br />
<br />
Some example result displays are shown below.<br />
<br />
<gallery widths="500px" heights="300px"><br />
Image:QCR2_HUMAN_initial.png|Results<br />
Image:QCR2_HUMAN.png|Results (tidied)<br />
</gallery><br />
<br />
== Attributes ==<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-closed.png|right|The node attributes menu]]<br />
<br />
There are two types of attributes available from iRefIndex: node attributes and edge attributes. These may be used to view information about selected nodes or edges (like <tt>i.taxid</tt>). Some features may allow the user to link out to additional data sources through the "right-click" menu (like <tt>i.geneID</tt>). Features may also be used to sort and select nodes and edges with specific attributes (like <tt>i.order</tt>). The <tt>i.query</tt> feature shows the user's query that is responsible for returning the node or edge.<br />
<br />
Brief descriptions and examples of each attribute are provided below. <br />
<br />
The user must first select the attributes that are to be displayed. This can be done by clicking on the "attribute" icon at the top of the node or edge attribute browser, as shown in the illustrative images.<br />
<br />
<div style="clear: right"></div><br />
=== Node Attributes ===<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-open.png|right|The open node attributes menu]]<br />
<br />
Each node represents a distinct amino acid sequence (protein) from a distinct organism (taxonomy identifier). Each of the attributes below provides additional information about the node. Although each node is distinct, a graph produced by iRefIndex may contain multiple nodes that are related proteins (such as splice isoform products from the same gene). These nodes will all have the same <tt>i.canonical_rog</tt> and <tt>i.canonical_rogid</tt> feature values. See the notes below.<br />
<br />
Node attributes that can be lists of items (like <tt>i.UniProt_Ac</tt>) will have a corresponding attribute called <tt>i.''attribute name''_TOP</tt> (for example, <tt>i.UniProt_Ac_TOP</tt>) which provides the first item of the associated list.<br />
<br />
<div style="clear: right"></div><br />
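Deriving the <tt>_TOP</tt> attributes amounts to copying the first item of each list attribute into a new attribute. The sketch below assumes that node attributes are held in a plain dictionary and that empty lists get no <tt>_TOP</tt> entry. <tt>add_top_attributes</tt> is hypothetical.

```python
def add_top_attributes(attributes: dict[str, object]) -> dict[str, object]:
    """Return a copy of `attributes` with an extra NAME_TOP entry for every non-empty list."""
    result = dict(attributes)
    for name, value in attributes.items():
        if isinstance(value, list) and value:
            result[f"{name}_TOP"] = value[0]
    return result


node = {"i.UniProt_Ac": ["Q7KSF4"], "i.geneID": [42066], "i.taxid": 7227}
print(add_top_attributes(node)["i.UniProt_Ac_TOP"])  # Q7KSF4
```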
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence from a distinct taxonomy identifier. See also <tt>i.rog</tt> and <tt>i.rogid</tt>.<br />
|-<br />
| <tt>canonicalName</tt>||Integer||<tt>10121899</tt>||This is the same as <tt>ID</tt>. This attribute is set by Cytoscape and is unrelated to the <tt>i.canonical_rog</tt> and <tt>i.canonical_rogid</tt> attributes used by iRefIndex.<br />
|-<br />
| <tt>i.RefSeq_Ac</tt>||List||<tt>[NP_996224]</tt> ||All RefSeq accessions with an amino acid sequence and taxon identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[RefSeq_Ac]'' on the web -- Entrez -- Protein" for more information. See also <tt>i.RefSeq_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_Ac</tt>||List||<tt>[Q7KSF4]</tt>||All UniProt accessions with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_Ac]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_Ac_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_ID</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||All UniProt identifiers with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_ID]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_ID_TOP</tt> for the first entry in this list of IDs.<br />
|-<br />
| <tt>i.canonical_rog</tt>||Integer||<tt>10121899</tt>||Related proteins (for example, splice isoforms from the same gene) will all belong to the same canonical group. One member of this group is assigned as the canonical representative of the group. The <tt>i.canonical_rog</tt> attribute gives the identifier of the protein's canonical group. For example, all products of Entrez Gene 42066 have the same <tt>i.canonical_rog</tt> (<tt>10121899</tt>). Each of these gene products has its own identifier (because they each have a distinct amino acid sequence). One of the splice isoforms (<tt>NP_996224</tt>) was chosen as the canonical representative of this group. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen.<br />
|-<br />
| <tt>i.canonical_rogid</tt>||String||<tt>1ZFb1WlW0OgOlhiAPtkJTdb6oOg7227</tt>||This is a unique alphanumeric key for the canonical representative of the canonical group to which this node belongs. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27 character key and this is prepended to the taxonomy identifier for the protein's source organism in order to make the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.canonical_rog</tt> attribute. All <tt>i.canonical_rog</tt> instances (each being an integer) have one corresponding <tt>i.canonical_rogid</tt>. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen. Note that the rogid for the protein represented by this specific node is listed under <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.dataset</tt>||Integer||<tt>0</tt>||In batch query mode, this can be used to locate the query batch (that is, which group of queries was responsible for the node). In single query mode, when a sequence of queries is issued one after another, this attribute can be used to distinguish the results from each step. All nodes with an <tt>i.dataset</tt> value higher than 999 can be found using more than one batch of queries.<br />
|-<br />
| <tt>i.digid</tt>||List||<tt>449</tt>||This is an integer identifier that is shared by a group of disease entries in OMIM that are related by their titles. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.dig_title</tt>.<br />
|-<br />
| <tt>i.dig_title</tt>||List||<tt>[Fanconi anemia, complementation group B, 300514 (3), VACTERL association with hydrocephalus, X-linked, 314390 (3)]</tt>||These are entries from OMIM's Morbid Map that are all part of the same disease group. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.digid</tt>.<br />
|-<br />
| <tt>i.displayLabel</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||This is a list of short labels chosen by iRefIndex to label the node using the VizMapper. The UniProt identifier is preferentially chosen (if one is available) followed by the Entrez Gene Symbol. See also <tt>i.displayLabel_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneID</tt>||List||<tt>[42066]</tt>||All NCBI Entrez Gene identifiers that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneID]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneID_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneSymbol</tt>||List||<tt>[CHER]</tt>||All NCBI Entrez Gene official symbols that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneSymbol]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneSymbol_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.interactor_description</tt>||List||<tt>[Q7KSF4_DROME, CHER, DMEL_CG3937, SKO, DMEL CG3937, FLN, CG3937, CHER, DMEL\\CG3937, FLN, SKO, CHER, NAME=CHER, DMEL_CG3937]</tt>||A collection of all the names in their short form as given by the original interaction databases. See also <tt>i.interactor_description_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.mass</tt>||Integer|| <tt>259142</tt> ||Mass associated with the protein sequence for this node. From UniProt, if available. You can search for nodes inside a mass range using the <tt>mass</tt> search type in the iRefIndex plugin.<br />
|-<br />
| <tt>i.omim</tt>||List||<tt>[608053]</tt>||List of OMIM disease identifiers associated with this protein. Right click on the entry and select "Search for ''[omim]'' on the web -- Entrez -- OMIM" for more information. <br />
|-<br />
| <tt>i.order</tt>||Integer|| <tt>0</tt> || The distance of this node from the query node: the query node has a value of <tt>0</tt>, direct neighbours have a value of <tt>1</tt>, and nodes returned by a query because they are part of the same canonical group have a value of <tt>10</tt>. Pseudonodes have negative values (<tt>-1</tt> is a complex holder, <tt>-2</tt> is a collapsed instance).<br />
|-<br />
| <tt>i.overall_degree_TOP</tt>||Integer|| <tt>42</tt> ||The total number of interactions described for this node in the iRefIndex database, that is, the node degree in the full iRefIndex interactome. This value is calculated over all proteins in iRefIndex, not only those currently loaded, so not all of these edges will necessarily be shown in the current view.<br />
|-<br />
| <tt>i.popularity</tt>||List|| <tt>42</tt> || '''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.pseudonode</tt>||Boolean|| <tt>false</tt> || This is set to true if the node represents a "complex" or n-ary interaction record. Protein nodes with edges incident to a pseudonode are member interactors from an interaction record in which the specific interactions between pairs of interactors are unknown. Pseudonodes appear as hexagons when using the iRefIndex VizMapper style.<br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user query used to retrieve this specific node. Neighbours of "query" nodes will not have an <tt>i.query</tt> value. Nodes returned by queries are coloured blue when using the iRefIndex VizMapper style.<br />
|-<br />
| <tt>i.rog</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence associated with a distinct taxonomy identifier. <tt>i.rog</tt> also appears as the <tt>ID</tt> attribute. Each <tt>i.rog</tt> has a corresponding <tt>i.rogid</tt> - see below.<br />
|-<br />
| <tt>i.rogid</tt>||String||<tt>2mL9oLZ9g/SSPyK0nOz97RmOzPg3702</tt>||This is a unique alphanumeric key for the protein represented by this node. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27 character key and this is prepended to the taxonomy identifier for the protein's source organism in order to make the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.rog</tt> attribute. All <tt>i.rog</tt> instances (each being an integer) have one corresponding <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.taxid</tt>||Integer||<tt>7227</tt>||The NCBI taxonomy identifier for this protein's source organism. See http://www.ncbi.nlm.nih.gov/taxonomy?term=7227 for more details of this example value for <tt>i.taxid</tt>.<br />
|-<br />
| <tt>i.xref</tt>||List||<tt>[AAF70826.1,Q9M6R5]</tt> ||All the accessions as given by the original interaction database records to describe this protein. See also <tt>i.xref_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.alive</tt>||Boolean||<tt>true or false</tt> ||This is true for all nodes after a search operation. This variable is used by the iRefScape filter and after a filter is applied, all nodes matching the filter criteria will have a true value for this variable (all other nodes will have false).<br />
|-<br />
| <tt>i.alive_degree</tt>||Integer||<tt>0,1,2-...</tt> ||This gives the node degree after a search. When an iRefScape filter is applied, this gives the number of nodes with <tt>i.alive=true</tt> connected to a particular node (that is, how many nodes matching the filter criteria are connected to that node).<br />
|-<br />
|}<br />
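The <tt>i.rogid</tt> and <tt>i.canonical_rogid</tt> descriptions above and the <tt>i.rigid</tt> entry in the edge attribute table below can be sketched in Python, following PMID 18823568. The sketch assumes the SEGUID convention named by the Wizard's "Create SEGUID/ROGID for sequence" tool: a Base64-encoded SHA-1 digest of the upper-cased sequence with the trailing <tt>=</tt> removed, giving 27 characters. It also assumes that duplicate ROGIDs (as in a self interaction) are kept, not removed, before hashing. It has not been checked against iRefIndex output, so treat it as illustrative only. The sequences in the example are made up.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """SEGUID: Base64 SHA-1 digest of the upper-cased sequence without '=' padding."""
    cleaned = "".join(sequence.split()).upper()  # drop whitespace, ignore case
    digest = hashlib.sha1(cleaned.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")  # 27 characters


def rogid(sequence: str, taxid: int) -> str:
    """ROGID: the 27-character sequence key followed by the taxonomy identifier."""
    return seguid(sequence) + str(taxid)


def rigid(rogids: list[str]) -> str:
    """RIGID: digest of the participants' ROGIDs, sorted by ASCII value and concatenated."""
    joined = "".join(sorted(rogids))  # str ordering equals ASCII ordering for these keys
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# Hypothetical two-protein interaction (made-up sequences, human taxid 9606).
a = rogid("MKTAYIAKQR", 9606)
b = rogid("MSDNELKQAL", 9606)
print(len(a), a.endswith("9606"))      # 31 True
print(rigid([a, b]) == rigid([b, a]))  # True
```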
<br />
===Edge Attributes===<br />
<br />
Each edge represents a distinct primary database record that supports some relationship between the two incident nodes. So, if an interaction between two proteins has been annotated by two databases (or twice by the same database) then two edges will appear between those two protein nodes.<br />
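The scoring attributes in the edge table below (<tt>i.score_np</tt>, <tt>i.score_hpr</tt> and <tt>i.score_lpr</tt>) depend on how often each PMID is reused across interactions. The sketch below follows one reading of those definitions. For each interaction, np counts its supporting PMIDs, while hpr and lpr are the highest and lowest numbers of distinct interactions supported by any one of those PMIDs. The function name, interaction keys and PMIDs are hypothetical. See PMID 18823568 for the authoritative definitions.

```python
from collections import Counter


def interaction_scores(support: dict[str, set[int]]) -> dict[str, dict[str, int]]:
    """Compute np, hpr and lpr for each interaction from the PMIDs supporting it.

    `support` maps an interaction key (for example an i.rigid value) to the set of
    PMIDs whose records support that interaction.
    """
    # How many distinct interactions each PMID is used to support.
    reuse = Counter(pmid for pmids in support.values() for pmid in pmids)
    scores = {}
    for key, pmids in support.items():
        counts = [reuse[pmid] for pmid in pmids] or [0]  # guard against no PMIDs
        scores[key] = {"np": len(pmids), "hpr": max(counts), "lpr": min(counts)}
    return scores


support = {"I1": {100, 200}, "I2": {200}, "I3": {200, 300}}
print(interaction_scores(support)["I1"])  # {'np': 2, 'hpr': 3, 'lpr': 1}
```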
<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||String||<tt>10121899 (2771704(40952)) 13911416</tt>||This is a unique identifier for the edge assigned by Cytoscape (no two edges will have the same <tt>ID</tt>). See <tt>i.rig</tt> and <tt>i.rigid</tt> for the identifiers assigned to the interaction by iRefIndex.<br />
|-<br />
| <tt>i.PMID</tt>||Integer||<tt>14605208</tt>||PubMed identifier of the publication in which the interaction represented by the edge is described. Right click on this entry and select "Search ''[PMID]'' on the web -- Entrez -- Pubmed" for more details on the publication.<br />
|-<br />
| <tt>i.bait</tt>||Integer||<tt>13911416</tt>||Node ID for the protein that was used as a bait in this experiment. Only applicable where the experimental system (see <tt>i.method_name</tt>) used to support this relationship was a bait-prey system (for example, two hybrid).<br />
|-<br />
| <tt>i.canonical_rig</tt>||Integer||<tt>27799</tt>||See notes for the <tt>i.rig</tt> edge feature. This is the rig constructed for the interaction using its canonical rogs. Use a web browser to query http://wodaklab.org/iRefWeb/interaction/show/27799 (where <tt>27799</tt> is the <tt>i.canonical_rig</tt> value) to retrieve more information on this interaction and equivalent source interaction records.<br />
|-<br />
| <tt>i.experiment</tt>||String||<tt>Giot L [2003]</tt>||A short label for the experiment in which this interaction was found (usually contains the authors' names).<br />
|-<br />
| <tt>i.flag</tt>||Integer||<tt>1</tt>||Used by iRefIndex plugin to control display of edges (<tt>0</tt> being the representative edge, used in edge toggle; <tt>1</tt> being an edge which will disappear during edge toggle; <tt>2</tt> being a complex holder edge; <tt>6</tt> being a path; <tt>7</tt> being an edge from or to a collapsed node).<br />
|-<br />
| <tt>i.host_taxid</tt>||Integer||<tt>7227</tt>||Indicates the organism taxonomy identifier where the interaction was experimentally demonstrated.<br />
|-<br />
| <tt>i.isLoop</tt>||Integer||<tt>1</tt>||Indicates whether the interaction is a self interaction (such as a dimer or possibly multimer of the same protein type). See the source interaction record for details.<br />
|-<br />
| <tt>i.method_cv</tt>||String||<tt>MI:0018</tt>||PSI-MI controlled vocabulary term identifier for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The name of the method is also given in the <tt>i.method_name</tt> feature.<br />
|-<br />
| <tt>i.method_name</tt>||String||<tt>two hybrid</tt>||PSI-MI controlled vocabulary term name for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.method_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_identification</tt>||String||<tt>predetermined participant</tt>||PSI-MI controlled vocabulary term for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The identifier for the term is also given in the <tt>i.participant_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_cv</tt>||String||<tt>predetermined participant</tt>||PSI-MI controlled vocabulary term identifier for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.participant_identification</tt> feature.<br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user's query that is responsible for returning this edge.<br />
|-<br />
| <tt>i.rig</tt>||Integer||<tt>27799</tt>||Redundant interaction group identifier for the interaction. <br />
This is an integer equivalent of <tt>i.rigid</tt>. Every rig has one corresponding rigid.<br />
|-<br />
| <tt>i.rigid</tt>||String||<tt>TAabV6yJ1XzUvEhYwZLpu5reBU0</tt>||Redundant interaction group identifier for the interaction. This is a universal key generated for the interaction by ordering the rogids participating in the interaction according to ASCII value, concatenating them, and then generating a Base-64 representation of an SHA-1 digest of the resulting string. See PMID 18823568 for details on how this key can be generated.<br />
|-<br />
| <tt>i.score_hpr</tt>||Integer||<tt>15</tt>||The hpr score (highest pmid re-use) is the highest number of interactions that any one PMID (supporting this interaction) is used to support. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.score_lpr</tt>||Integer||<tt>11</tt>||The lpr score (lowest pmid re-use) is the lowest number of distinct interactions that any one PMID (supporting this interaction) is used to support. An lpr of greater than 20 is considered to be a high-throughput experiment. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_hpr</tt>.<br />
|-<br />
| <tt>i.score_np</tt>||Integer||<tt>2</tt>||Number of PubMed Identifiers (PMIDs) pointing to literature where this interaction is supported. See PMID 18823568 for details. See also <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.source_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.src_intxn_db</tt>||String||<tt>grid</tt>||Original interaction database where this interaction record was obtained.<br />
|-<br />
| <tt>i.src_intxn_id</tt>||String||<tt>38677</tt>||Identifier of this interaction record in the original interaction database (given in <tt>i.src_intxn_db</tt>). <br />
In some cases, it may be possible to right-click and choose "Search ''[src_intxn_id]'' on the web -- Interaction databases -- the database" to see the original record.<br />
|-<br />
| <tt>i.type_cv</tt>||String||<tt>MI:0407</tt>||PSI-MI controlled vocabulary term identifier for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.type_name</tt> feature.<br />
|-<br />
| <tt>i.type_name</tt>||String||<tt>direct interaction</tt>||PSI-MI controlled vocabulary term name for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.type_cv</tt> feature.<br />
|-<br />
| <tt>i.target_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
|}<br />
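<br />
The three score attributes can be illustrated with a short calculation. The Python sketch below is one reading of the descriptions in the table, not the code used to build iRefIndex: it assumes a hypothetical index mapping each PMID to the set of distinct interactions (rigids) that the PMID supports, and <tt>interaction_scores</tt> is a name introduced here for illustration.<br />
<br />
```python
from typing import Dict, Set, Tuple


def interaction_scores(rigid: str,
                       pmid_to_rigids: Dict[str, Set[str]]) -> Tuple[int, int, int]:
    """Return (np, hpr, lpr) for one interaction.

    pmid_to_rigids is a hypothetical index mapping each PubMed identifier
    to the set of distinct interactions (rigids) that it supports.
    """
    # PMIDs that support the interaction being scored.
    supporting = [pmid for pmid, rigids in pmid_to_rigids.items() if rigid in rigids]
    if not supporting:
        return 0, 0, 0
    # For each supporting PMID, count the distinct interactions it supports.
    reuse = [len(pmid_to_rigids[pmid]) for pmid in supporting]
    np_score = len(supporting)  # i.score_np: number of supporting PMIDs
    hpr_score = max(reuse)      # i.score_hpr: highest PMID re-use
    lpr_score = min(reuse)      # i.score_lpr: lowest PMID re-use
    return np_score, hpr_score, lpr_score


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    index = {
        "pmid_a": {"rigid_1", "rigid_2"},
        "pmid_b": {"rigid_1", "rigid_3", "rigid_4", "rigid_5"},
        "pmid_c": {"rigid_6"},
    }
    print(interaction_scores("rigid_1", index))  # (2, 4, 2)
```
<br />
Under this reading, an lpr greater than 20 means that even the least re-used supporting PMID reports more than 20 distinct interactions, which fits the description of such evidence as coming from high-throughput experiments.<br />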
<br />
=== User Attributes ===<br />
<br />
See [[iRefScape Batch Files]] for information on adding attributes to search results.<br />
<br />
== Obtaining Updates to the Data ==<br />
<br />
You can check for and download updates to the dataset used by your plugin using the Wizard (see "Check for iRefIndex updates").<br />
<br />
iRefIndex updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
==Obtaining Updates to the Plugin==<br />
<br />
If you already have the iRefScape plugin installed (shown as an "iRefScape" entry in the "Plugins" menu of Cytoscape) and want to make sure you have the latest version, use "Update plugins" from the "Plugins" menu. However, if you want to reinstall the plugin, you should first uninstall any previous version.<br />
<br />
Plugin updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
<!--<br />
<br />
==Integrating User Data into the Plugin==<br />
<br />
===How to create node and edge attributes ===<br />
<br />
Example: Attaching [[DiG: Disease groups]] identifiers to nodes<br />
<br />
==Updating==<br />
# From Cytoscape updater<br />
# Using plugins update feature<br />
<br />
== Log Files, Search Details and Errors ==<br />
# How to interpret log messages and save them for later reference. <br />
<br />
==Using the plugin as a search tool ==<br />
The plugin can also be used to search the current network. However, Cytoscape's own search option, with Google Suggest-style completion, may be more convenient to use. The search function was included because the Cytoscape search field remained inactive on some occasions for networks created using the plugin. The cause of this is still unknown; deleting a node from the network seems to reactivate the field. Once this bug is fixed, users are encouraged to use the Cytoscape search option.<br />
Currently, if a user searches for a term and the corresponding protein is already loaded, the corresponding node is highlighted with Cytoscape's default highlight colors. <br />
<br />
<br />
== Exit plugin and force terminate operations ==<br />
The exit button performs two functions. <br />
# The first is to exit the iRefIndex plugin, detaching the plugin from Cytoscape. <br />
# The second, "FORCE STOP" (only available during an active task), terminates the current operation. "FORCE STOP" is useful when the search query or a subsequent operation takes too long to finish or becomes unresponsive. The outcome of a forced stop is unpredictable and its behaviour undefined, so results obtained after such an operation should not be trusted. <br />
<br />
--><br />
<br />
==Advanced features==<br />
<br />
The advanced features panel holds a number of tabbed panels, most of which expose settings which can be adjusted to change the behaviour of the normal search operations. Many panels offer contextual help via the iRefScape help system, but a brief description of each panel is also given here.<br />
<br />
{| cellpadding="10" cellspacing="0" border="1"<br />
! Preferences<br />
| This panel configures the range of search types (such as <tt>UniProt_Ac</tt>) presented in the main query interface. More search types can be added, and existing search types can be removed.<br />
|-<br />
! Statistics<br />
| A selection of statistical measures for the current network can be calculated and displayed using this panel.<br />
|-<br />
! Compare<br />
| This panel configures the <tt>COMPARE</tt> search operation and the equivalent functionality in the "Grouping" submenu of the iRefScape menu.<br />
|-<br />
! Summary<br />
| This panel generates node-by-node summaries where the attributes of each selected node (or of all nodes in the current network, if no nodes are selected) are presented in a separate table in the help viewer.<br />
|-<br />
! Filter<br />
| As an alternative to the manual selection of nodes and edges using the graphical user interface, this panel permits the selection of nodes and edges according to certain criteria based on node and edge attributes.<br />
|-<br />
! Path parameters<br />
| This panel provides options that configure the path-finding functionality described below.<br />
|-<br />
! Loading options<br />
| The options presented here affect the retrieval of data in search operations, including or excluding certain kinds of data (such as lists of values for certain attributes) in order to either simplify the results or speed up each search operation.<br />
|-<br />
! Import<br />
| The import panel provides the ability to import a generic Cytoscape network into iRefScape by interpreting node attributes as iRefScape queries.<br />
|-<br />
! Export<br />
| The export panel provides the ability to export an iRefScape network in such a way that other Cytoscape plugins may be able to access and manipulate the network's essential information.<br />
|}<br />
<br />
=== Path-finding ===<br />
<br />
[[Image:NP_002515-NP_742031.png|thumb|187px|The path in the results, highlighted in green. Solid green lines indicate presence of evidence for this step of the path in the direction specified by the query ''or'' the presence of evidence that has no directionality. A dashed green line indicates there is evidence for this step of the path but only in the direction that is opposite to that specified in the query.]]<br />
<br />
iRefScape can be used to find interaction events connecting two proteins or a sequence of events involving several proteins. <br />
<br />
This process takes two terminal nodes as input and returns all reasonable paths connecting them. The results are pathway-independent: the sequences of interactions connecting the nodes are not constructed using currently published pathways. However, the paths returned may contain pathway-centric information.<br />
<br />
The query format is as follows:<br />
<br />
NP_203524 <==> NP_002871<br />
<br />
Additional search type and taxonomy parameters are also supplied as required:<br />
<br />
* '''Search type:''' <tt>RefSeq_Ac</tt><br />
* '''Taxonomy:''' <tt>9606 (Homo sapiens)</tt><br />
<br />
This query finds all reasonable paths between <tt>NP_203524</tt> and <tt>NP_002871</tt>; the returned paths also include the shortest path between them. The results of path-finding are sorted in ascending order of path length, and the maximum path length is restricted to a default value of 6; this value can be modified by changing "Maximum distance" in the "Path parameters" tab of the advanced features panel. The paths found in this way are "reasonable paths", a concept that differs from both finding the shortest path and finding all paths. A "reasonable path" from A to B is a path extending from A to B where none of the intermediate points can be reached from A with fewer steps by a path that extends from A via B (in other words, when evaluating a path from A to B, nodes beyond B are not considered).<br />
<br />
<div style="clear: right"></div><br />
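<br />
The definition of a "reasonable path" admits more than one reading. The Python sketch below encodes one interpretation, stated here as an assumption rather than a description of iRefScape's implementation: each intermediate node must lie at its shortest distance from A, with distances computed by a breadth-first search that never expands B, so nodes beyond B are not considered. The network is treated as undirected (directionality of evidence is ignored), the default limit of 6 mirrors the "Maximum distance" setting, and the function names are hypothetical.<br />
<br />
```python
from collections import deque
from typing import Dict, Iterable, List, Tuple

Graph = Dict[str, List[str]]


def bfs_levels(graph: Graph, source: str, target: str) -> Dict[str, int]:
    """Fewest steps from source to each node, never expanding target."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            continue  # nodes beyond B are not considered
        for neighbour in graph.get(node, []):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def reasonable_paths(graph: Graph, source: str, target: str,
                     max_distance: int = 6) -> List[List[str]]:
    """Paths from source to target, sorted by ascending length in steps."""
    dist = bfs_levels(graph, source, target)
    found: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if len(path) - 1 >= max_distance:
            return  # one more step would exceed the maximum distance
        for neighbour in graph.get(path[-1], []):
            if neighbour == target:
                found.append(path + [target])
            elif dist.get(neighbour) == len(path):
                # The next node must not be reachable from source in fewer
                # steps; strictly increasing levels also rule out cycles.
                extend(path + [neighbour])

    extend([source])
    return sorted(found, key=len)


def undirected(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an adjacency list from (a, b) pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph
```
<br />
Because distances are measured from the first protein in the query, swapping the two proteins can change which paths qualify under this interpretation, which may help explain why forward and reverse queries can return different networks.<br />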
<br />
=== Reversing the Path ===<br />
<br />
[[Image:NP_742031-NP_002515.png|thumb|187px|The path in the results, highlighted in green]]<br />
<br />
The query rewritten to find the reversed path is as follows:<br />
<br />
NP_002871 <==> NP_203524<br />
<br />
In this case, the same nodes and edges are retrieved and the path is merely reversed.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Differences in Forward and Reverse Directions ===<br />
<br />
[[Image:P62070-Q13322.png|thumb|198px|The path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
Consider the following path query (using <tt>UniProt_Ac</tt> as the search type):<br />
<br />
P62070 <==> Q13322<br />
<br />
This produces a network of 214 nodes and 253 edges, and the result is shown in the illustration.<br />
<br />
<div style="clear: right"></div><br />
<br />
[[Image:Q13322-P62070.png|thumb|270px|The reverse path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
However, when searching with the accessions reversed...<br />
<br />
Q13322 <==> P62070<br />
<br />
...a network of 46 nodes and 91 edges is produced, as illustrated.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Path Selection ===<br />
<br />
[[File:IRefScape-1.18-path-selector.png|thumb|500px|The path selector for the results]]<br />
<br />
After path-finding has completed, the "path selection" panel can be used to load paths selectively. To make selection easier, the paths found can be described using a particular attribute type: after selecting a value from the "Convert pop-up type to" list (such as <tt>UniProt_Ac</tt>) and pressing the "Convert" button, a tooltip appearing over each path description will show the requested attribute values for each component of the path. Thus, a path description such as...<br />
<br />
4664766 -> 2079075 -> 4770079<br />
<br />
...will provide a tooltip showing the following identifiers:<br />
<br />
Q13322 -> P06241 -> P62070<br />
<br />
A "query helper" panel will also show the converted identifiers.<br />
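<br />
The conversion amounts to looking up each identifier in a path description. The Python sketch below shows the idea using only the three identifier pairs from the example above; the mapping is assumed here to be available as a simple dictionary, and <tt>convert_path_description</tt> is a hypothetical name, not part of iRefScape.<br />
<br />
```python
from typing import Dict


def convert_path_description(description: str, mapping: Dict[str, str],
                             separator: str = " -> ") -> str:
    """Replace each identifier in a path description with its mapped value.

    Identifiers without a mapping are left unchanged.
    """
    parts = description.split(separator)
    return separator.join(mapping.get(part, part) for part in parts)


# Identifier pairs taken from the example path description above.
EXAMPLE_MAPPING = {"4664766": "Q13322", "2079075": "P06241", "4770079": "P62070"}

if __name__ == "__main__":
    print(convert_path_description("4664766 -> 2079075 -> 4770079", EXAMPLE_MAPPING))
    # Q13322 -> P06241 -> P62070
```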
<br />
=== List Comparison ===<br />
<br />
This feature is available with version 0.91 and later.<br />
<br />
This feature provides a way to compare two lists of proteins. When a <tt>COMPARE{<List1>|<List2>}</tt> format query is issued with default settings, an interaction network is loaded containing interactions involving only the proteins in the lists and proteins which are not in the lists but interact with at least two proteins from each list (intermediate components). At the end of the operation, in addition to the Cytoscape network, an adjacency cube (an adjacency matrix with colours as the third dimension) is also created. This adjacency cube is synchronized with the network and can be used to examine the results easily. A summary report function lists an overall summary for each protein in the lists, sorted so that the most connected proteins appear first. The identifiers used to display the proteins in the adjacency cube are either iROGIDs or the ROGIDs of complexes. These can be displayed as other popular identifier types using the convert feature.<br />
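<br />
The adjacency matrix behind the adjacency cube, and the "most connected first" ordering of the summary report, can be sketched as follows. This Python example is illustrative only: it assumes interactions are available as undirected pairs of identifiers, takes "most connected" to mean the number of distinct interaction partners, leaves out the colour dimension of the cube, and uses hypothetical function names.<br />
<br />
```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def adjacency_matrix(proteins: List[str], interactions: Iterable[Pair]) -> List[List[int]]:
    """Square 0/1 matrix whose rows and columns follow the order of `proteins`."""
    index = {protein: i for i, protein in enumerate(proteins)}
    matrix = [[0] * len(proteins) for _ in proteins]
    for a, b in interactions:
        if a in index and b in index:  # ignore proteins outside the matrix
            matrix[index[a]][index[b]] = 1
            matrix[index[b]][index[a]] = 1
    return matrix


def rank_by_connectivity(interactions: Iterable[Pair]) -> List[Tuple[str, int]]:
    """(protein, partner count) pairs, most connected first; ties sorted by name."""
    partners: Dict[str, Set[str]] = {}
    for a, b in interactions:
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return sorted(((p, len(s)) for p, s in partners.items()),
                  key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Made-up identifiers: X plays the role of an intermediate component.
    edges = [("P1", "X"), ("P2", "X"), ("P3", "X"), ("P1", "P3")]
    print(rank_by_connectivity(edges))  # [('X', 3), ('P1', 2), ('P3', 2), ('P2', 1)]
```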
<br />
An example query (from PMID:20670417):<br />
<br />
COMPARE{P08588,P16671|P07550,P13945}<br />
<br />
This query compares two groups:<br />
<br />
# P08588,P16671<br />
# P07550,P13945<br />
<br />
Members within the group are separated with a comma (<tt>,</tt>); groups are separated by a pipe (<tt>|</tt>).<br />
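<br />
The query syntax can be checked with a small parser. This Python sketch is an illustration under assumptions, not iRefScape's own parser: <tt>parse_compare_query</tt> is a hypothetical name, whitespace around members is tolerated here (the documentation does not say whether iRefScape allows it), and anything other than exactly two non-empty groups is rejected with a <tt>ValueError</tt>, reflecting the two-group limit of the current version.<br />
<br />
```python
import re
from typing import List, Tuple

_COMPARE = re.compile(r"^\s*COMPARE\{(?P<body>[^{}]*)\}\s*$")


def parse_compare_query(query: str) -> Tuple[List[str], List[str]]:
    """Split a COMPARE{...} query into its two groups of identifiers."""
    match = _COMPARE.match(query)
    if match is None:
        raise ValueError("not a COMPARE{...} query: %r" % query)
    groups = match.group("body").split("|")  # groups are separated by a pipe
    if len(groups) != 2:
        raise ValueError("expected exactly two groups, found %d" % len(groups))
    # Members within a group are separated by commas.
    parsed = [[m.strip() for m in group.split(",") if m.strip()] for group in groups]
    if not all(parsed):
        raise ValueError("each group needs at least one member")
    return parsed[0], parsed[1]


if __name__ == "__main__":
    print(parse_compare_query("COMPARE{P08588,P16671|P07550,P13945}"))
    # (['P08588', 'P16671'], ['P07550', 'P13945'])
```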
<br />
====Questions and answers about list comparison====<br />
<br />
''What is the maximum number of members a group can have?''<br />
<br />
A group can have any number of members. The more members there are, the more time the operation will take and the more memory it will need. For instance, the example search above completes comfortably within 1 minute with 256 MB of allocated memory. If you have more than 100 members, we recommend dedicating at least 1 GB of memory to Cytoscape. <br />
<br />
''Can I compare more than two groups?''<br />
<br />
No. Only two groups can be compared in the current version. If a protein appears in both groups being compared, it is treated as a member of a third group; this third group is defined after the query has been executed. <br />
<br />
''What if a protein, or a protein resulting from a query, appears in more than one group?''<br />
<br />
All proteins found in more than one group are treated as a new group (group 3).<br />
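<br />
The group-3 rule amounts to a set intersection. In this Python sketch, <tt>assign_groups</tt> is a hypothetical helper that works on identifiers exactly as given; it does not resolve queries into the proteins they return, which the question above also covers.<br />
<br />
```python
from typing import Dict, Iterable


def assign_groups(group1: Iterable[str], group2: Iterable[str]) -> Dict[str, int]:
    """Map each protein to group 1, group 2, or group 3 (present in both)."""
    first, second = set(group1), set(group2)
    assignment: Dict[str, int] = {}
    for protein in first | second:
        if protein in first and protein in second:
            assignment[protein] = 3  # found in both groups
        elif protein in first:
            assignment[protein] = 1
        else:
            assignment[protein] = 2
    return assignment


if __name__ == "__main__":
    # Made-up identifiers; P2 appears in both input groups.
    print(sorted(assign_groups(["P1", "P2"], ["P2", "P3"]).items()))
    # [('P1', 1), ('P2', 3), ('P3', 2)]
```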
<br />
==Troubleshooting==<br />
<br />
* See http://cytoscape.org/ for a manual and a set of tutorials which describe the installation and use of Cytoscape.<br />
* For problems with Cytoscape installation or use, try the [http://groups-beta.google.com/group/cytoscape-helpdesk Cytoscape Help Desk].<br />
* If you have problems with installation or use, please share your experience with us through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
* When updating data on Microsoft Windows XP and Vista, a "Failed to find resources" message may appear in the log message window. If this happens, please run the update again; the plugin will check for and correct the problem during the second attempt.<br />
* If you are working with large graphs, make sure Cytoscape has at least 128 MB of memory. See the [http://cytoscape.org/cgi-bin/moin.cgi/How_to_increase_memory_for_Cytoscape Cytoscape documentation] for more information on setting memory allowances.<br />
<br />
<br />
==Internal Testing==<br />
Our internal test results for this release of the plugin can be found on the [[iRefScape Test Cases 1.0]] page.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Donaldson_Group&diff=4111
Donaldson Group
2012-11-15T16:05:43Z
<p>PaulBoddie: Removed navigational icons from imagemaps using "desc none".</p>
<hr />
<div>__NOTOC__<br />
<br />
= The Donaldson Group at the Biotechnology Centre of Oslo =<br />
<br />
<div class="floatright"><br />
<imagemap><br />
Image:BiO-logo-liten-pms-border.png<br />
default [http://www.biotek.uio.no]<br />
desc none<br />
</imagemap><br />
<br />
<facebook-like /><br />
</div><br />
<br />
== Research Interests ==<br />
<br />
Our primary interests include protein interaction data consolidation, text mining and data mining, especially with respect to diseases. <br />
<br />
Our recent work on a consolidated protein interaction database can be found at http://irefindex.uio.no/ .<br />
<br />
Email: ian.oslo@gmail.com<br />
<br />
== Projects ==<br />
<br />
{|class="wikitable" style="text-align:left; clear:left" border="0" cellpadding="10"<br />
<br />
|-<br />
|<imagemap><br />
Image:iRefIndex_logo.png|100x100px<br />
default [[iRefIndex]]<br />
desc none<br />
</imagemap><br />
|<br />
=== [[iRefIndex | iRefIndex, iRefWeb, iRefScape, iRefR]] ===<br />
<br />
[[iRefIndex|http://irefindex.uio.no/]]<br/> iRefIndex (interaction Reference Index) provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.<br />
<br />
iRefIndex is available via a number of interfaces: MITAB tab-delimited text files (iRefIndex), a website (iRefWeb), a Cytoscape plugin (iRefScape) and an R package (iRefR). <br />
<br />
|-<br />
|<imagemap><br />
Image:Magrathea_logo.png|100x100px<br />
default [[Magrathea]]<br />
desc none<br />
</imagemap><br />
|<br />
=== [[Magrathea]] ===<br />
<br />
[[Magrathea|http://magrathea.uio.no/]]<br/> Magrathea is prototype software demonstrating how animations of molecular pathways can be driven automatically using the local context of the participant molecules. <br />
<br />
|-<br />
|<imagemap><br />
Image:ancientlibraryalex.jpg|100x100px<br />
default [[The Biolibrarian Proposal]]<br />
desc none<br />
</imagemap><br />
|<br />
=== [[The Biolibrarian Proposal]] ===<br />
<br />
The Biolibrarian proposal calls for the creation of new positions at university libraries around the world. <br />
These people would act as local biocurators who help local university researchers submit data to relevant biological databases.<br />
<br />
|-<br />
|<imagemap><br />
Image:Vitruvian_man.jpg|100x150px<br />
default [[DiG:_Disease_groups]]<br />
desc none<br />
</imagemap><br />
|<br />
=== [[DiG: Disease groups|DiG: Disease Groups]] ===<br />
<br />
[[DiG:_Disease_groups|http://donaldson.uio.no/wiki/DiG:_Disease_groups]]<br/> The Disease Groups project groups together phenotypically related disease-gene associations found in OMIM's Morbid Map. The resulting map of disease genes may be used to explore relationships between disease genes in the human protein-interactome.<br />
<br />
|-<br />
|<imagemap><br />
Image:Bioscape_logo.gif|140x140px<br />
default [[Bioscape]]<br />
desc none<br />
</imagemap><br />
|<br />
=== [[Bioscape]] ===<br />
<br />
http://bioscape.uio.no/<br/> Bioscape is our in-house text-mining system used to locate gene and protein mentions in PubMed abstracts.<br />
|}<br />
<br />
== Group Members ==<br />
<br />
* [http://www.biotek.uio.no/english/about/organization/donaldson-group/people/iand/ Ian Donaldson]<br />
* [http://www.biotek.uio.no/english/about/organization/donaldson-group/people/paulbodd/ Paul Boddie]<br />
<br />
== Past Group Members ==<br />
* Katerina Michalickova<br />
* Hanna Nemchenko<br />
* Sabry Razick: Now in Trondheim at [http://www.ntnu.edu/employees/sabry.razick NTNU].<br />
* [[Antonio Mora]]<br />
<br />
==Local Seminar Series==<br />
<br />
The Biotechnology Centre of Oslo holds a weekly [[Bioseminar|Tuesday seminar]] at Forskningsparken, Gaustadalléen 21, Oslo.<br />
<br />
The [http://www.ifi.uio.no/research/clsi/seminars.html Computational Life Science seminars] are held every Wednesday at Ole-Johan Dahls hus, located at Gaustadalléen 23D, Oslo (opposite the Forskningsparken main entrance).<br />
<br />
==Courses==<br />
<br />
{|class="wikitable" style="text-align:left" border="0" cellpadding="10"<br />
|-<br />
|<imagemap><br />
Image:Bioinfo_course_logo.jpg|100x100px<br />
default [[Bioinformatics course]]<br />
desc none<br />
</imagemap> ||<br />
=== [[Bioinformatics_course|Bioinformatics for molecular biology]] ===<br />
<br />
A new, two-week, intensive bioinformatics course that covers various aspects of bioinformatics analyses for molecular biology, including statistics, multiple hypothesis testing, microarray analysis, sequence alignments, working with protein structures, protein interaction networks and more. See the [[Bioinformatics course|course page]] for schedule information along with all material used in the course. The course is composed of lectures and practical tutorials. <br />
|}<br />
<br />
Introductory Perl is taught by Antonio Mora and Ian Donaldson as part of the [http://www.uio.no/studier/emner/matnat/molbio/MBV3070/ MBV3070] course. The slides for these lectures are available at [[MBV3070|Perl lectures for MBV3070]].<br />
<!--Antonio Mora and Ian Donaldson also hold the "Applied readings in mathematics, computer science and biology" course every second Autumn term. See [http://www.uio.no/studier/emner/matnat/molbio/MBV-INF4410/ MBV-INF4410].<br />
--><br />
<br />
Ian Donaldson is organizing this year's Molecular Biotechnology Course at the Biotechnology Centre of Oslo. You can find the MBV9100 course web page [https://www.biotek.uio.no/events/courses_workshops/2011/MBV9100BTS.html here] and the latest schedule [[MBV9100|here]].<br />
<br />
==Contact==<br />
<br />
ian.oslo at gmail.com</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex&diff=4110
iRefIndex
2012-11-15T16:04:27Z
<p>PaulBoddie: Removed navigational icons from imagemaps using "desc none".</p>
<hr />
<div>__NOTOC__<br />
<br />
<div style="float:right"><br />
<facebook-like /><br />
</div><br />
<br />
<div class="floatleft"><br />
<imagemap><br />
Image:iRefIndex_logo.png|120x120px<br />
default [[iRefIndex#A_reference_index_for_protein_interaction_data]]<br />
desc none<br />
</imagemap><br />
</div><br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID. <br />
<br />
The iRefScape plugin for Cytoscape has been published recently [http://www.biomedcentral.com/1471-2105/12/388 here].<br />
<br />
[[iRefIndex#A_reference_index_for_protein_interaction_data|Read more]]<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:Document-save-80x80.png<br />
default [[README_MITAB2.6_for_iRefIndex]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[README_MITAB2.6_for_iRefIndex|Download]] ===<br />
<br />
Download the current iRefIndex release in PSI-MITAB tab-delimited format via FTP.<br />
|<imagemap><br />
Image:Accessories-text-editor-80x80.png<br />
default [[iRefIndex_Release_Notes]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Release Notes]] ===<br />
<br />
View release notes and news for each release of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px<br />
default [[iRefIndex_Citations]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" colspan="3" |<br />
=== [[iRefIndex_Citations | Publications, citing, citations and further reading]] ===<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:CytoscapeLogo.png|80x80px<br />
default [[iRefScape]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefScape|iRefScape]] ===<br />
<br />
iRefScape is a plugin for [http://www.cytoscape.org/ Cytoscape] that exposes iRefIndex data as a navigable graphical network.<br />
|<imagemap><br />
Image:firefox.png|80x80px<br />
default [http://wodaklab.org/iRefWeb/]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [http://wodaklab.org/iRefWeb/ iRefWeb] ===<br />
<br />
iRefWeb provides a searchable web interface to the iRefIndex. This interface was developed as part of a collaboration with the Wodak group at the Hospital for Sick Children in Toronto, Canada.<br />
|-<br />
|<imagemap><br />
Image:R-logo.jpg|80x80px<br />
default [[iRefR]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefR]] ===<br />
<br />
An R package providing access to iRefIndex data.<br />
|<imagemap><br />
Image:Applications-internet-80x80.png<br />
default [[README_PSICQUIC_web_services_for_iRefIndex]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
<br />
=== [[PSICQUIC|Web services]] ===<br />
<br />
iRefIndex PSICQUIC and PSISCORE web services are now running on release 9.0 of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:Youtube-256.png|80x80px<br />
default [[iRefIndex_Videos]]<br />
desc none<br />
</imagemap> <br />
| style="vertical-align: top" |<br />
<br />
=== [[iRefIndex Videos|iRefIndex videos]] ===<br />
<br />
Video learning materials for iRefIndex, iRefScape and iRefWeb.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [[iRefIndex#Contact and mailing list]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex#Contact and mailing list|Contact information and mailing list]] ===<br />
<br />
How to get in touch with the developers.<br />
|-<br />
|<imagemap><br />
Image:Emblem-notice-80x80.png<br />
default [[Sources_iRefIndex]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Sources_iRefIndex|Source data information]] ===<br />
<br />
Details of all the different source databases that provide the foundation for iRefIndex.<br />
|<imagemap><br />
Image:X-office-spreadsheet-80x80.png<br />
default [[Statistics_iRefIndex]]<br />
desc none<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Statistics_iRefIndex|Statistical information]] ===<br />
<br />
Statistics for the current iRefIndex release.<br />
|}<br />
<br />
== Technical information on the iRefIndex database ==<br />
<br />
Build process: [[iRefIndex Manual]]<br />
<br />
Feedback files: [[README iRefIndex Feedback]]<br />
<br />
Mapping files: [[Protein identifier mapping]]<br />
<br />
Normalization of MI cv terms: [[Mapping of terms to MI term ids - iRefIndex]]<br />
<br />
Canonicalization: [[Canonicalization]]<br />
<br />
Disease Groups: [[DiG: Disease groups]]<br />
<br />
All iRefIndex pages and archived releases: [[iRefIndex#All_iRefIndex_Pages|see below]]<br />
<br />
License and disclaimer: [[iRefIndex#License_and_disclaimer|see below]]<br />
<br />
----<br />
<br />
== A reference index for protein interaction data ==<br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including [http://bond.unleashedinformatics.com/ BIND], [http://www.thebiogrid.org/ BioGRID], [http://mips.gsf.de/genre/proj/corum/index.html CORUM], [http://dip.doe-mbi.ucla.edu/ DIP], [http://www.hprd.org/ HPRD], [http://www.ebi.ac.uk/intact/site/index.jsf IntAct], [http://mint.bio.uniroma2.it/mint/Welcome.do MINT], [http://mips.gsf.de/genre/proj/mpact MPact], [http://mips.gsf.de/proj/ppi/ MPPI] and [http://ophid.utoronto.ca/ OPHID]. This index includes multiple interaction types including physical and genetic (mapped to their corresponding protein products) as determined by a multitude of methods. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.<br />
<br />
iRefIndex assigns a globally unique identifier (rigid), which looks like 'tjWXXjgPyHyT2J6EwED8zK2x18U', to identify interactions that are identical (according to the sequence and taxon ids of the interactors). iRefIndex also assigns similar-looking keys to protein interactors. These keys are global, meaning they can be generated by anyone using the method described in the paper. This method allows users to integrate their own data with the iRefIndex in a way that ensures proteins with exactly the same sequence will be represented only once.<br />
<br />
== Publications and further reading ==<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex are provided on the [[iRefIndex Citations]] page.<br />
<br />
----<br />
<br />
== Long term goals of the iRefIndex project ==<br />
<br />
We believe that protein interaction data hold incredible potential for biomedical research. Presently, these data are collected and archived by multiple groups around the world and the number of groups taking part in this work is growing rather than diminishing. <br />
<br />
As such, it is important that these databases have the means to effectively exchange and compare data and that they are curating and representing data using similar standards in order to make their data accessible and allow effective use.<br />
<br />
To this end, the iRefIndex project has three long term objectives:<br />
<br />
;1) to facilitate exchange of interaction data between interaction databases. <br />
<br />
:The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.<br />
<br />
;2) to consolidate interaction data from multiple sources. <br />
<br />
:The method has been used to index interaction records from multiple sources. The resulting iRefIndex may be used to search for the existence of interaction data for any protein regardless of the original resource. Nine interaction databases have been incorporated so far; others will follow.<br />
<br />
;3) to provide feedback to source interaction databases. <br />
<br />
:During the process of data consolidation, iRefIndex uses a sophisticated method to keep track of potential problems with source records such as outdated or unfound protein identifiers or incorrectly assigned taxonomy identifiers. These data are provided as feedback files to source interaction databases for correction, clarification or improvements to our own system. This process will help to harmonize data representation and improve the overall quality of interaction records for all source databases. This process will also help source databases to exchange data with one another.<br />
<br />
== iRefIndex availability ==<br />
<br />
iRefIndex is made available in a number of formats: MITAB tab-delimited text files, iRefWeb interface, iRefScape plugin for Cytoscape, PSICQUIC Web services, and an interface for the R programming language environment. See the links at the top of this page. For the license and disclaimer, [[iRefIndex#License_and_disclaimer | see below]].<br />
<br />
== Credits and collaborations ==<br />
<br />
'''Sabry Razick and Ian Donaldson''' developed iRefIndex at the Biotechnology Centre of Oslo, University of Oslo. <br />
<br />
'''Paul Boddie''' provides ongoing maintenance and development.<br />
<br />
'''George Magklaras''' provides systems engineering support and [http://www.no.embnet.org/ EMBNet Norway] provided hardware support.<br />
<br />
'''Antonio Mora''' developed [[iRefR|iRefR]].<br />
<br />
'''Katerina Michalickova''' developed [[DiG:_Disease_groups|Disease groups]].<br />
<br />
'''Brian Turner and Andrei Turinsky''' from the [http://wodaklab.org/ws/ Wodak group] at the Hospital for Sick Children in Toronto, Canada developed the [http://wodaklab.org/iRefWeb/ iRefWeb interface].<br />
<br />
<imagemap><br />
Image:IMEx_logo_webmedium.jpg|100x100px<br />
default [http://www.psimex.org]<br />
desc none<br />
</imagemap> <br />
<br />
iRefIndex is a PSIMex partner: http://www.psimex.org<br />
<br />
<!--<br />
=== iRefWeb in the NCBI LinkOut programme ===<br />
<br />
Many [http://www.ncbi.nlm.nih.gov/gene Entrez Gene] records provided by NCBI contain links to iRefWeb in the [http://www.ncbi.nlm.nih.gov/projects/linkout/index.html LinkOut] section, allowing users to consult iRefWeb for related protein interactions when browsing gene information. The software which exposes iRefIndex information to the LinkOut programme can be found on the [[iRefWeb LinkOut Generator]] page.<br />
<br />
--><br />
----<br />
<br />
== License and disclaimer ==<br />
<br />
Data published on the public FTP site is released under the [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution 2.5 Generic (CC BY 2.5) license].<br />
<br />
<imagemap><br />
Image:By-100x35.png<br />
default [http://creativecommons.org/licenses/by/2.5/]<br />
desc none<br />
</imagemap><br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
----<br />
<br />
== Contact and mailing list ==<br />
<br />
Suggestions, requests and comments are welcome.<br />
<br />
<pre>ian.oslo@gmail.com</pre><br />
<br />
Full contact details are available at the [http://www.biotek.uio.no/english/research/groups/donaldson-group/ group home page].<br />
<br />
<imagemap><br />
Image:google-groups-logo.gif<br />
default [http://groups.google.com/group/irefindex]<br />
desc none<br />
</imagemap><br />
<br />
See the [http://groups.google.com/group/irefindex iRefIndex Google Group] for announcements and discussion.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex_Maintenance&diff=4107
iRefIndex Maintenance
2012-10-26T16:05:49Z
<p>PaulBoddie: Added note.</p>
<hr />
<div>{{Note|<br />
This page describes maintenance issues related to the code supporting iRefIndex release 9 and earlier.<br />
}}<br />
<br />
When [[iRefIndex Build Process|building iRefIndex]], it may be necessary to manage the database system and to check whether enough disk space is available. Because MySQL's InnoDB engine stores tables in a single shared tablespace by default, this can produce one very large file in the filesystem that appears to use most of the available space (as reported by <tt>df -h</tt>):<br />
<br />
<pre><br />
Filesystem      Size  Used Avail Use% Mounted on<br />
/dev/sda1       996M  575M  370M  61% /<br />
/dev/sda3        24G  885M   22G   4% /biotek/cn1/programs<br />
/dev/sda8       475G  443G  8.0G  99% /biotek/cn1/storage<br />
tmpfs           7.9G     0  7.9G   0% /dev/shm<br />
/dev/sda5       9.5G  151M  8.9G   2% /tmp<br />
/dev/sda6       7.6G  3.2G  4.1G  44% /usr<br />
/dev/sda7       1.5G  437M  941M  32% /var<br />
</pre><br />
<br />
The file itself will look like this (as reported by <tt>ls -lh</tt>):<br />
<br />
<pre><br />
-rwxr-xr-x 1 nobody nobody 440G Feb 17 13:39 /biotek/cn1/storage/mysql/var/ibdata1<br />
</pre><br />
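The <tt>df</tt> and <tt>ls</tt> checks above can also be combined in a small script. The following is a minimal Python 3 sketch, not part of the iRefIndex tools; the function names are hypothetical, and the path passed in is assumed to be the shared tablespace file.<br />

```python
import os
import shutil


def tablespace_report(path):
    """Report how much of its filesystem the file at `path` occupies.

    Hypothetical helper: `path` is assumed to be the InnoDB shared
    tablespace file (ibdata1), but any regular file works.
    """
    size = os.path.getsize(path)
    # Query the filesystem holding the file's directory, like `df -h <dir>`.
    usage = shutil.disk_usage(os.path.dirname(os.path.abspath(path)))
    return {
        "file_bytes": size,
        "fs_total_bytes": usage.total,
        "fs_free_bytes": usage.free,
        "file_share": size / usage.total,  # fraction of the whole volume
    }


def gigabytes(n_bytes):
    """Convert bytes to GiB, the units used by `ls -lh` and `df -h`."""
    return n_bytes / 1024 ** 3
```

For the listing above, <tt>tablespace_report("/biotek/cn1/storage/mysql/var/ibdata1")</tt> should report a file share of roughly 0.93 (440G of a 475G volume).<br />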
<br />
The following resources describe the situation and potential solutions:<br />
<br />
* [http://forums.mysql.com/read.php?35,121880,121886 MySQL Forums :: Database Administration :: how to shrink a MySQL database]<br />
* [http://vdachev.net/2007/02/22/mysql-reducing-ibdata1/ MySQL: Reducing ibdata1]<br />
* [http://dev.mysql.com/doc/refman/5.0/en/innodb-configuration.html MySQL 5.0 Reference Manual :: 13 Storage Engines :: 13.2 The InnoDB Storage Engine :: 13.2.2 InnoDB Configuration]<br />
* [http://dev.mysql.com/doc/refman/5.0/en/multiple-tablespaces.html MySQL 5.0 Reference Manual :: 13 Storage Engines :: 13.2 The InnoDB Storage Engine :: 13.2.2 InnoDB Configuration :: 13.2.2.1 Using Per-Table Tablespaces]<br />
<br />
Unfortunately, the shared tablespace file does not shrink when tables or databases are deleted, so reducing the database footprint on disk in single tablespace mode requires substantial administrative work.<br />
<br />
== Useful Common Options ==<br />
<br />
When dealing with multiple instances of MySQL, it can be useful to remember the <tt>--defaults-file</tt> option to the MySQL tools. This option must be the first one given on the command line. For example:<br />
<br />
<pre><br />
mysql --defaults-file=/home/mysql/etc/my.cnf -h localhost -u root -p -A<br />
</pre><br />
<br />
This option is omitted from the examples given below.<br />
<br />
== Dumping and Restoring Databases ==<br />
<br />
MySQL supports SQL and delimited/tabular dumps. Although the latter is arguably more elegant, it does not offer much help with the task of restoring the tables in the correct order so that foreign key constraints are always satisfied. Thus, only the SQL-based dump format is discussed here.<br />
<br />
To dump a database:<br />
<br />
<pre><br />
mysqldump -h <host> -u <username> -p --databases <database>... > <dump file><br />
</pre><br />
<br />
To do so in the background:<br />
<br />
<pre><br />
nohup mysqldump -h <host> -u <username> --password=<password> --databases <database>... > <dump file> 2> <log file> &<br />
</pre><br />
<br />
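To restore a database, feed the dump file to the <tt>mysql</tt> client. No database name is needed, because a dump made with <tt>--databases</tt> includes the <tt>CREATE DATABASE</tt> and <tt>USE</tt> statements for each database:<br />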
<br />
<pre><br />
mysql -h <host> -u <username> -p < <dump file><br />
</pre><br />
<br />
To do so in the background:<br />
<br />
<pre><br />
nohup mysql -h <host> -u <username> --password=<password> < <dump file> > <log file> 2>&1 &<br />
</pre><br />
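When dumps and restores are run repeatedly, the background commands above can also be driven from a script. The following Python 3 sketch is an illustration only: the helper names are hypothetical, and it assumes that the password is supplied through a <tt>[client]</tt> section of an option file passed with <tt>--defaults-file</tt> (ideally a separate file readable only by its owner), rather than through <tt>--password</tt> on the command line, where other users may see it in the process list.<br />

```python
import subprocess


def mysql_command(program, defaults_file=None, host="localhost", user="root"):
    """Build an argument list for a MySQL client program.

    --defaults-file must be the first option, so it is placed directly
    after the program name.
    """
    args = [program]
    if defaults_file:
        args.append("--defaults-file=" + defaults_file)
    args += ["-h", host, "-u", user]
    return args


def dump_databases(databases, dump_path, log_path, **options):
    """Start mysqldump in the background; stdout goes to the dump file and
    stderr to the log file. Returns the Popen object (call .wait() on it)."""
    args = mysql_command("mysqldump", **options) + ["--databases"] + list(databases)
    with open(dump_path, "wb") as dump, open(log_path, "wb") as log:
        # The child process keeps its own copies of the file handles.
        return subprocess.Popen(args, stdout=dump, stderr=log)


def restore_dump(dump_path, log_path, **options):
    """Start the mysql client in the background, reading the dump on stdin
    and writing both stdout and stderr to the log file."""
    args = mysql_command("mysql", **options)
    with open(dump_path, "rb") as dump, open(log_path, "wb") as log:
        return subprocess.Popen(args, stdin=dump, stdout=log,
                                stderr=subprocess.STDOUT)
```

Building the argument list separately keeps <tt>--defaults-file</tt> in first position and makes the command easy to inspect before it is run.<br />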
<br />
== Promising but Ultimately Unusable Solutions ==<br />
<br />
Apart from dumping and dropping all InnoDB databases, deleting the shared tablespace, and then restoring the databases, no real alternative exists. Here are some approaches that almost offered one.<br />
<br />
=== Moving Tables into Separate Table Files ===<br />
<br />
To move existing tables into separate files, the server must be stopped:<br />
<br />
mysqladmin -u root --password=<password> shutdown<br />
<br />
Then, the <tt>my.cnf</tt> file needs to be changed to enable file-per-table mode. In the <tt>[mysqld]</tt> section, the following line can be added:<br />
<br />
innodb_file_per_table<br />
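Before the server is restarted, it may be worth confirming that the setting is present in the file. The Python 3 sketch below is a rough check, not a MySQL tool: the function name is hypothetical, and Python's <tt>configparser</tt> only approximates MySQL's option file syntax (it does not follow <tt>!include</tt> directives, for example, and sees only the text it is given). Once the server is running, <tt>SHOW VARIABLES LIKE 'innodb_file_per_table'</tt> reports the value actually in effect.<br />

```python
import configparser


def file_per_table_enabled(cnf_text):
    """Return True if the [mysqld] section of an option file's text
    enables innodb_file_per_table (hypothetical helper)."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # MySQL allows bare options with no "=" value
        strict=False,         # tolerate repeated sections and options
        interpolation=None,   # treat "%" literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return False
    # MySQL accepts dashes or underscores in option names.
    for name in ("innodb_file_per_table", "innodb-file-per-table"):
        if parser.has_option("mysqld", name):
            value = parser.get("mysqld", name)
            # A bare option line is read with the value None and enables it.
            return value is None or value.strip().upper() in ("1", "ON", "TRUE")
    return False
```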
<br />
The server can then be started:<br />
<br />
mysqld_safe &<br />
<br />
At this point, existing tables in the shared tablespace remain accessible, but only new tables will be created in separate files. Existing tables can be moved into separate files either by running an <tt>ALTER TABLE ... ENGINE=InnoDB</tt> statement on each table, which rebuilds it, or by running <tt>[http://dev.mysql.com/doc/refman/5.0/en/mysqlcheck.html mysqlcheck]</tt> as follows:<br />
<br />
mysqlcheck --optimize --databases <database>...<br />
<br />
This is discussed in the user comments of the [http://dev.mysql.com/doc/refman/5.0/en/multiple-tablespaces.html Using Per-Table Tablespaces] section of the MySQL documentation.<br />
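For the <tt>ALTER TABLE</tt> route, one statement is needed per table. The following Python 3 sketch is illustrative only (the names are hypothetical): it builds <tt>ALTER TABLE ... ENGINE=InnoDB</tt> statements for a list of tables. The table names are assumed to come from a query on <tt>information_schema.TABLES</tt> like the one shown, run through whichever MySQL database driver is available.<br />

```python
# Query listing the InnoDB tables of one database; %s is the DB-API
# placeholder style used by MySQL drivers such as MySQLdb (assumed here).
TABLES_QUERY = (
    "SELECT table_name FROM information_schema.TABLES "
    "WHERE table_schema = %s AND engine = 'InnoDB'"
)


def quote_identifier(name):
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def rebuild_statements(database, tables):
    """Return one table-rebuilding ALTER TABLE statement per table name."""
    db = quote_identifier(database)
    return [
        "ALTER TABLE %s.%s ENGINE=InnoDB;" % (db, quote_identifier(table))
        for table in tables
    ]
```

The resulting statements could be written to a file and run with the <tt>mysql</tt> client.<br />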
<br />
After this process has completed, a directory for the database will appear inside the directory specified by <tt>datadir</tt> in the defaults (<tt>my.cnf</tt>) file. However, the shared tablespace (<tt>ibdata1</tt>) will not be reduced or removed, and InnoDB will retain state information about the tables, regardless of their location, such that removing the shared tablespace will render the tables inaccessible.<br />
<br />
Unfortunately, the InnoDB "data dictionary" maintained in the <tt>ibdata1</tt> file cannot be easily replaced, and although various hacks exist (such as [http://www.chriscalender.com/?p=28 Recovering an InnoDB table from only an .ibd file.]) to attempt to register separate tables with a new, clean "data dictionary", the process seems somewhat speculative.<br />
<br />
=== XtraDB-related Tools ===<br />
<br />
[http://www.percona.com/docs/wiki/percona-xtrabackup:xtrabackup_manual XtraBackup] and [http://www.percona.com/docs/wiki/percona-xtradb:patch:innodb_expand_import innodb_expand_import] should together be able to handle the migration of .ibd files. Unfortunately, the software could not be built, despite attempts using several different interpretations of the build instructions.<br />
<br />
== Recommendations ==<br />
<br />
A full version of iRefIndex 7.0 requires approximately 150GB in the database. With only two full versions residing in MySQL, approximately two thirds of the 475GB volume will be used, leaving around 150GB of space. When importing data into the databases, the collections of source data in the filesystem require around 60GB per version. If these data are to be stored locally for improved performance, there must be enough local space for at least one of these collections at build time. When dumping data, over 100GB is required for the output files, which should also be stored locally if the dump is to run at maximum speed.<br />
<br />
Thus, the 475GB volume can be used as follows:<br />
<br />
* 150GB - 170GB (one iRefIndex version) for MySQL<br />
* or 300GB - 340GB (two iRefIndex versions) for MySQL<br />
* 100GB - 120GB for a single database dump<br />
* 60GB - 70GB for source data<br />
<br />
Clearly, avoiding two co-resident versions of iRefIndex in MySQL is desirable.<br />
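The figures above can be checked with some simple arithmetic. This Python sketch merely restates the estimates from the list (the names are illustrative) and computes the space left on the 475GB volume when one dump and one set of source data are also present.<br />

```python
VOLUME_GB = 475

# (low, high) estimates in GB, copied from the list above.
PER_VERSION_GB = (150, 170)  # one iRefIndex version in MySQL
DUMP_GB = (100, 120)         # one database dump
SOURCE_GB = (60, 70)         # one collection of source data


def remaining_space(versions):
    """Return (best case, worst case) free GB on the volume when `versions`
    iRefIndex versions, one dump and one set of source data are all present.
    A negative value means the volume is too small."""
    best = VOLUME_GB - (versions * PER_VERSION_GB[0] + DUMP_GB[0] + SOURCE_GB[0])
    worst = VOLUME_GB - (versions * PER_VERSION_GB[1] + DUMP_GB[1] + SOURCE_GB[1])
    return best, worst


for n in (1, 2):
    best, worst = remaining_space(n)
    print("%d version(s): %d GB to %d GB free" % (n, worst, best))
```

With one version, 115GB to 165GB remains free; with two versions, the best case leaves only 15GB and the worst case overshoots the volume by 55GB.<br />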
<br />
When a database is dropped from MySQL, the space it occupied can be reused by MySQL, but it is not returned to the filesystem for general use. A policy is therefore needed of removing databases that do not need to reside on cn1. Since the resources of cn1 are only really needed when building databases or generating products from them, any database that must remain available as a live instance could arguably reside on another machine.<br />
<br />
Removing source data and database dumps after use can also help to avoid disk space issues. Source files can be copied from other machines when needed, removed after use, and any output or dump files can be moved away after being generated.<br />
<br />
== Related Useful Resources ==<br />
<br />
* [http://www.debian-administration.org/articles/442 Resetting a forgotten MySQL root password]<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex_Development&diff=4106
iRefIndex Development
2012-10-26T16:04:27Z
<p>PaulBoddie: Added note.</p>
<hr />
<div>{{Note|<br />
This page describes development processes related to the code supporting iRefIndex release 9 and earlier.<br />
}}<br />
<br />
See [[iRefIndex Issues and Notes]] for details of ongoing work to improve the iRefIndex software.<br />
<br />
== Adding Sources to iRefIndex ==<br />
<br />
# Identify the location of the downloaded data.<br />
# Evaluate the form of the data:<br />
#* For PSI-MI XML (Proteomics Standards Initiative Molecular Interaction XML) documents, check which version of the format the data documents use.<br />
#* For the specific version, review the format's schema and how the data uses the schema. For example, PSI MI XML permits the specification of interactors within interaction descriptions as well as in a separate interactor list.<br />
#* For MITAB files, see [[iRefIndex MITAB Mapping]].<br />
# Review existing, similar mapper definition files.<br />
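The format version is recorded as attributes of the root <tt>entrySet</tt> element (<tt>level</tt>, <tt>version</tt> and, in some files, <tt>minorVersion</tt>). PSI-MI XML 2.5 documents, for instance, use level 2 and version 5. A minimal Python 3 sketch for reading these attributes without loading a whole file follows; the function name is hypothetical and not part of the iRefIndex tools.<br />

```python
import xml.etree.ElementTree as ET


def psi_mi_version(source):
    """Return (level, version, minorVersion) from the root entrySet element.

    `source` is a binary file object. Parsing stops once the root element's
    start tag has been seen, so large files are not read in full.
    """
    for _event, root in ET.iterparse(source, events=("start",)):
        # Tags carry a namespace prefix, e.g. "{net:sf:psidev:mi}entrySet".
        if root.tag.rsplit("}", 1)[-1] != "entrySet":
            raise ValueError("not a PSI-MI XML document: %r" % root.tag)
        return root.get("level"), root.get("version"), root.get("minorVersion")
```

For example, <tt>with open(path, "rb") as f: print(psi_mi_version(f))</tt>, where <tt>path</tt> names a downloaded data file.<br />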
<br />
=== Evaluating the Data ===<br />
<br />
The <tt>show_xml_paths.py</tt> script in the <tt>tools</tt> directory within the <tt>iRef_PSI_XML2RDBMS</tt> directory can be used to show the different element paths used in an XML data file to hold data items. For example:<br />
<br />
python tools/show_xml_paths.py --data /home/irefindex/data/MINT/2010-09-14/10023771.psi25.xml<br />
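The source of <tt>show_xml_paths.py</tt> is not reproduced here. As a rough illustration of the technique only, a script with similar behaviour could be written along the following lines in Python 3; the function names are hypothetical. Unlike the listing that follows, <tt>ElementTree</tt> consumes namespace declarations, so paths such as <tt>entrySet/@xmlns</tt> do not appear, and prefixed attributes are reported by their local names (for example <tt>entrySet/@schemaLocation</tt>).<br />

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, e.g. '{net:sf:psidev:mi}entry' -> 'entry'."""
    return tag.rsplit("}", 1)[-1]


def xml_paths(source):
    """Return the sorted element and attribute paths that hold data."""
    paths = set()
    stack = []
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(local_name(elem.tag))
            path = "/".join(stack)
            # Attributes are available as soon as the start tag is parsed.
            for attr in elem.attrib:
                paths.add(path + "/@" + local_name(attr))
        else:
            # Record an element only if it directly holds non-whitespace text.
            if elem.text and elem.text.strip():
                paths.add("/".join(stack))
            stack.pop()
            elem.clear()  # release memory when processing large files
    return sorted(paths)


if __name__ == "__main__" and len(sys.argv) > 1:
    for found in xml_paths(sys.argv[1]):
        print(found)
```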
<br />
The resulting list of paths indicates the places in the element hierarchy of a PSI-MI XML file where information is actually stored. For example:<br />
<br />
<pre><br />
entrySet/@level<br />
entrySet/@minorVersion<br />
entrySet/@version<br />
entrySet/@xmlns<br />
entrySet/@xmlns:xsi<br />
entrySet/@xsi:schemaLocation<br />
entrySet/entry/experimentList/experimentDescription/@id<br />
entrySet/entry/experimentList/experimentDescription/attributeList/attribute<br />
entrySet/entry/experimentList/experimentDescription/attributeList/attribute/@name<br />
entrySet/entry/experimentList/experimentDescription/attributeList/attribute/@nameAc<br />
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@db<br />
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@dbAc<br />
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@id<br />
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@refType<br />
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@refTypeAc<br />
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/@ncbiTaxId<br />
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/fullName<br />
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/shortLabel<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias/@type<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias/@typeAc<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/fullName<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@dbAc<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@id<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@refType<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@refTypeAc<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@db<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@dbAc<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@refType<br />
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/experimentList/experimentDescription/names/fullName<br />
entrySet/entry/experimentList/experimentDescription/names/shortLabel<br />
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@db<br />
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@dbAc<br />
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@id<br />
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@refType<br />
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@refTypeAc<br />
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@db<br />
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@dbAc<br />
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@id<br />
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@refType<br />
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/@id<br />
entrySet/entry/interactionList/interaction/attributeList/attribute<br />
entrySet/entry/interactionList/interaction/attributeList/attribute/@name<br />
entrySet/entry/interactionList/interaction/attributeList/attribute/@nameAc<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/fullName<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/shortLabel<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/confidenceList/confidence/value<br />
entrySet/entry/interactionList/interaction/experimentList/experimentRef<br />
entrySet/entry/interactionList/interaction/interactionType/names/fullName<br />
entrySet/entry/interactionList/interaction/interactionType/names/shortLabel<br />
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/intraMolecular<br />
entrySet/entry/interactionList/interaction/modelled<br />
entrySet/entry/interactionList/interaction/names/shortLabel<br />
entrySet/entry/interactionList/interaction/negative<br />
entrySet/entry/interactionList/interaction/participantList/participant/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/fullName<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/fullName<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/fullName<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/fullName<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/isLink<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/fullName<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/fullName<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/interactorRef<br />
entrySet/entry/interactionList/interaction/participantList/participant/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias/@type<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias/@typeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/fullName<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/shortLabel<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactionList/interaction/xref/primaryRef/@db<br />
entrySet/entry/interactionList/interaction/xref/primaryRef/@dbAc<br />
entrySet/entry/interactionList/interaction/xref/primaryRef/@id<br />
entrySet/entry/interactionList/interaction/xref/primaryRef/@refType<br />
entrySet/entry/interactionList/interaction/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactorList/interactor/@id<br />
entrySet/entry/interactorList/interactor/attributeList/attribute<br />
entrySet/entry/interactorList/interactor/attributeList/attribute/@name<br />
entrySet/entry/interactorList/interactor/attributeList/attribute/@nameAc<br />
entrySet/entry/interactorList/interactor/interactorType/names/fullName<br />
entrySet/entry/interactorList/interactor/interactorType/names/shortLabel<br />
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@db<br />
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@dbAc<br />
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@id<br />
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@refType<br />
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@db<br />
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@id<br />
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@refType<br />
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactorList/interactor/names/alias<br />
entrySet/entry/interactorList/interactor/names/alias/@type<br />
entrySet/entry/interactorList/interactor/names/alias/@typeAc<br />
entrySet/entry/interactorList/interactor/names/fullName<br />
entrySet/entry/interactorList/interactor/names/shortLabel<br />
entrySet/entry/interactorList/interactor/organism/@ncbiTaxId<br />
entrySet/entry/interactorList/interactor/organism/names/fullName<br />
entrySet/entry/interactorList/interactor/organism/names/shortLabel<br />
entrySet/entry/interactorList/interactor/sequence<br />
entrySet/entry/interactorList/interactor/xref/primaryRef/@db<br />
entrySet/entry/interactorList/interactor/xref/primaryRef/@dbAc<br />
entrySet/entry/interactorList/interactor/xref/primaryRef/@id<br />
entrySet/entry/interactorList/interactor/xref/primaryRef/@refType<br />
entrySet/entry/interactorList/interactor/xref/primaryRef/@refTypeAc<br />
entrySet/entry/interactorList/interactor/xref/primaryRef/@version<br />
entrySet/entry/interactorList/interactor/xref/secondaryRef/@db<br />
entrySet/entry/interactorList/interactor/xref/secondaryRef/@dbAc<br />
entrySet/entry/interactorList/interactor/xref/secondaryRef/@id<br />
entrySet/entry/interactorList/interactor/xref/secondaryRef/@refType<br />
entrySet/entry/interactorList/interactor/xref/secondaryRef/@refTypeAc<br />
entrySet/entry/interactorList/interactor/xref/secondaryRef/@secondary<br />
entrySet/entry/interactorList/interactor/xref/secondaryRef/@version<br />
entrySet/entry/source/@releaseDate<br />
entrySet/entry/source/attributeList/attribute<br />
entrySet/entry/source/attributeList/attribute/@name<br />
entrySet/entry/source/attributeList/attribute/@nameAc<br />
entrySet/entry/source/names/fullName<br />
entrySet/entry/source/names/shortLabel<br />
entrySet/entry/source/xref/primaryRef/@db<br />
entrySet/entry/source/xref/primaryRef/@dbAc<br />
entrySet/entry/source/xref/primaryRef/@id<br />
entrySet/entry/source/xref/primaryRef/@refType<br />
entrySet/entry/source/xref/primaryRef/@refTypeAc<br />
entrySet/entry/source/xref/primaryRef/@secondary<br />
entrySet/entry/source/xref/secondaryRef/@db<br />
entrySet/entry/source/xref/secondaryRef/@dbAc<br />
entrySet/entry/source/xref/secondaryRef/@id<br />
entrySet/entry/source/xref/secondaryRef/@refType<br />
entrySet/entry/source/xref/secondaryRef/@refTypeAc<br />
</pre><br />
<br />
With this information, a suitable mapper file can be identified for the conversion of the XML-encoded data into tabular data to be stored in a database. In the above example, it is apparent that the experiment, interaction and interactor details reside alongside each other within each <tt>entry</tt> element:<br />
<br />
<pre><br />
entrySet/entry/experimentList/experimentDescription<br />
entrySet/entry/interactionList/interaction<br />
entrySet/entry/interactionList/interaction/participantList/participant<br />
entrySet/entry/interactorList/interactor<br />
</pre><br />
<br />
In contrast, other PSI-MI XML files adopt a different structure which can be reduced to the following:<br />
<br />
<pre><br />
entrySet/entry/interactionList/interaction<br />
entrySet/entry/interactionList/interaction/experimentList/experimentDescription<br />
entrySet/entry/interactionList/interaction/participantList/participant<br />
</pre><br />
<br />
The different sources can be divided into a number of subformats as follows:<br />
<br />
{| border="1" cellspacing="0" cellpadding="5"<br />
! Subformat<br />
! Sources<br />
! Notes<br />
|-<br />
| Separate experiment, interaction, interactor lists<br />
| BioGRID, HPRD, IntAct, MINT, OPHID<br />
| BioGRID uses proteininteractor instead of interactor<br>OPHID uses proteinParticipant, proteinInteractor<br />
|-<br />
| Interaction contains experiment; separate interactor list<br />
| DIP<br />
|<br />
|-<br />
| Interaction contains experiment and interactor/participant<br />
| BIND Translation, CORUM, InnateDB, MatrixDB, MPACT, MPPI<br />
| InnateDB provides apparently redundant lists of experiments and interactors<br>MPPI uses proteinParticipant, proteinInteractor<br />
|}<br />
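<br />
The classification in the table above can also be approximated from a list of element paths, such as the listing produced for the data file earlier. The following Python function is a hypothetical helper and is not part of <tt>iRef_PSI_XML2RDBMS</tt>. Its rules are an assumption derived from the reduced path listings and the table, and it folds the <tt>proteinParticipant</tt> and <tt>proteinInteractor</tt> element names noted in the table into their shorter equivalents.<br />
<br />
```python
# Hypothetical helper: guess the PSI-MI XML "subformat" of a data file from
# its element paths. The rules are an assumption based on the table above.

SEPARATE = "Separate experiment, interaction, interactor lists"
NESTED_EXPERIMENT = "Interaction contains experiment; separate interactor list"
NESTED_ALL = "Interaction contains experiment and interactor/participant"


def normalise(path):
    """Lower-case a path and fold proteinParticipant/proteinInteractor names."""
    path = path.lower()
    path = path.replace("proteinparticipant", "participant")
    return path.replace("proteininteractor", "interactor")


def classify_subformat(paths):
    """Return the subformat (as named in the table) for some element paths."""
    normalised = {normalise(p) for p in paths}

    def has(fragment):
        # Accept the fragment as the end of a path or as a whole-element
        # prefix of a longer path, so that leaf listings such as
        # ".../interactor/@id" also count.
        return any(p.endswith(fragment) or (fragment + "/") in p for p in normalised)

    nested_experiment = has("interaction/experimentlist/experimentdescription")
    nested_interactor = has("participant/interactor")
    separate_experiments = has("entry/experimentlist/experimentdescription")
    separate_interactors = has("entry/interactorlist/interactor")

    # Check the most nested layout first: InnateDB, for example, also
    # provides (apparently redundant) top-level experiment/interactor lists.
    if nested_experiment and nested_interactor:
        return NESTED_ALL
    if nested_experiment and separate_interactors:
        return NESTED_EXPERIMENT
    if separate_experiments and separate_interactors:
        return SEPARATE
    return "unknown"
```
<br />
For example, the leaf paths listed at the start of this section (<tt>entrySet/entry/interactorList/interactor/@id</tt> and so on) would be classified under the first subformat, provided the listing also contains top-level <tt>experimentList</tt> paths.<br />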
<br />
=== Reviewing Mapper Files ===<br />
<br />
Existing mapper files, which reside in the <tt>mapper</tt> subdirectory of the <tt>iRef_PSI_XML2RDBMS</tt> directory, can be reviewed using the <tt>show_xml_paths.py</tt> script. For example:<br />
<br />
python tools/show_xml_paths.py --mapper mapper/Map25_CORUM.xml<br />
<br />
The resulting output describes the structure of the data and how the mapper will attempt to interpret that data. For example (for CORUM):<br />
<br />
<pre><br />
Element experimentDescription ...<br />
Table int_name ...<br />
_euid ...<br />
<incremental><br />
_idetlbl ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullname<br />
_idetncat ...<br />
24<br />
25<br />
25<br />
Table int_xref ...<br />
_euid ...<br />
<incremental><br />
_brefdb ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@db<br />
entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@db<br />
_brefid ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@id<br />
entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@id<br />
_brefct ...<br />
4<br />
5<br />
Table int_xref ...<br />
_euid ...<br />
<incremental><br />
_idetdb ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@db<br />
_idetid ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@id<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id<br />
_idetct ...<br />
6<br />
7<br />
Element experimentList ...<br />
Table int_experiment ...<br />
_euidr ...<br />
_euid<br />
_iuider ...<br />
_iuid<br />
Element interaction ...<br />
Table int_name ...<br />
_iuid ...<br />
<incremental><br />
_iuiflnm ...<br />
entry/interactionList/interaction/names/fullName<br />
_iuiflnmct ...<br />
12<br />
Table int_source ...<br />
_iuid ...<br />
<incremental><br />
_itp ...<br />
entry/interactionList/interaction/xref<br />
_isrc ...<br />
entry/interactionList/interaction/xref<br />
_ifle ...<br />
entry/interactionList/interaction/xref<br />
Table int_xref ...<br />
_iuid ...<br />
<incremental><br />
_idb ...<br />
entry/interactionList/interaction/xref/primaryRef/@db<br />
_iref ...<br />
entry/interactionList/interaction/xref/primaryRef/@id<br />
_irefcat ...<br />
0<br />
Element participant ...<br />
Table int_name ...<br />
_ouid ...<br />
<incremental><br />
_olb ...<br />
entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel<br />
entry/interactionList/interaction/participantList/participant/interactor/names/alias<br />
entry/interactionList/interaction/participantList/participant/interactor/names/fullName<br />
_olbct ...<br />
13<br />
14<br />
15<br />
Table int_object ...<br />
_ouid ...<br />
<incremental><br />
_oltyp ...<br />
entry/interactionList/interaction/participantList/participant/interactor/interactorType/names/shortLabel<br />
_osrc ...<br />
entry/interactionList/interaction/participantList/participant/interactor/names<br />
_ofil ...<br />
entry/interactionList/interaction/participantList/participant/interactor/names<br />
Table int_sequence ...<br />
_ouid ...<br />
<incremental><br />
_obsq ...<br />
entry/interactionList/interaction/participantList/participant/interactor/sequence<br />
Table int_xref ...<br />
_ouid ...<br />
<incremental><br />
_odb ...<br />
entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@db<br />
entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@db<br />
_orefid ...<br />
entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@id<br />
entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@id<br />
_oicat ...<br />
2<br />
3<br />
_otax ...<br />
entry/interactionList/interaction/participantList/participant/interactor/organism/@ncbiTaxId<br />
_otp ...<br />
entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@refType<br />
entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@refType<br />
Element participantList ...<br />
Table int_source2object ...<br />
_iuidr ...<br />
_iuid<br />
_what ...<br />
entry/interactionList/interaction/participantList/participant/interactor/names<br />
_isrcr ...<br />
entry/interactionList/interaction/participantList/participant/interactor/names<br />
_ifler ...<br />
entry/interactionList/interaction/participantList/participant/interactor/names<br />
_refob ...<br />
_ouid<br />
</pre><br />
<br />
=== Adapting an Existing Mapper File ===<br />
<br />
Once the data has been analysed and its "subformat" (explained above) identified, it should be possible to take an existing mapper file that supports the same subformat and modify it to understand the new data source. For example, the InnateDB data resembles that of several other sources (listed in the table above). To find out which of these sources is closest in structure to InnateDB, the structure of their data can be compared using a <tt>diff</tt>-like program, possibly a graphical one such as <tt>kompare</tt> or <tt>kdiff3</tt>.<br />
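<br />
As a rough complement to a visual <tt>diff</tt>, the path listings of two sources can be compared as sets. The sketch below uses a simple set-overlap (Jaccard) score. This is our own choice for illustration and is not something provided by the iRefIndex tools. The function names are hypothetical.<br />
<br />
```python
# Hypothetical helpers: rank candidate sources by how many element paths
# they share with a new source (Jaccard similarity of the path sets).


def jaccard(a, b):
    """Jaccard similarity of two collections of paths (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_sources(new_paths, candidates):
    """Return (similarity, name) pairs for candidates, most similar first.

    candidates maps a source name to the paths found in its data files.
    """
    scores = [(jaccard(new_paths, paths), name) for name, paths in candidates.items()]
    return sorted(scores, reverse=True)


def missing_paths(new_paths, candidate_paths):
    """Paths present in the new data but absent from a candidate's data."""
    return sorted(set(new_paths) - set(candidate_paths))
```
<br />
The paths reported by <tt>missing_paths</tt> for the closest candidate are a starting point for the details a copied mapper file may lack, such as the <tt>hostOrganism</tt> path discussed next.<br />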
<br />
Once a similar source has been identified, the corresponding mapper file can be copied and modified. For example:<br />
<br />
cp mapper/Map25_CORUM.xml mapper/Map25_InnateDB.xml<br />
<br />
Then, it is necessary to update the new mapper file with details that differ from those in the closest source. In some cases, it can also be useful to consult other mapper files. For example, the following path may not be present in the CORUM mapper file despite being provided by the data file:<br />
<br />
entry/experimentList/experimentDescription/hostOrganismList/hostOrganism<br />
<br />
However, iRefIndex can use such information, and other mapper files may support it. We may therefore decide to incorporate it into our new mapper file (and perhaps into CORUM's mapper file, too). To do so, we first inspect other mapper files for the presence of such information and then isolate the section that supports its retrieval. For example, the IntAct mapper file can be inspected using the <tt>show_xml_paths.py</tt> script:<br />
<br />
 python tools/show_xml_paths.py --mapper mapper/Map25_INTACT_MINT_BIOGRID.xml --verbose<br />
<br />
Here, the <tt>--verbose</tt> flag provides identifier information which makes finding the element, table and mapping definitions easier:<br />
<br />
<pre><br />
Element experimentDescription (grouper/id=3)...<br />
<br />
[...]<br />
<br />
Table int_name (sqlref/id=23)...<br />
_euid ...<br />
<incremental> (provides experimentDescription)<br />
_exorg ...<br />
entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/@ncbiTaxId<br />
_exorgc ...<br />
38<br />
</pre><br />
<br />
From the above, it becomes apparent that we need a <tt>group</tt> definition for <tt>experimentDescription</tt>, and that <tt>int_name</tt> must be populated according to the stated mappings for <tt>_euid</tt>, <tt>_exorg</tt> and <tt>_exorgc</tt>.<br />
<br />
Fortunately, a <tt>group</tt> definition already exists in our new mapper file:<br />
<br />
<pre><br />
<group id ="3" element ="experimentDescription" parpos="4" atrib="_AUTO_"><br />
<path></path><br />
<ref choice="no" /><br />
</group><br />
</pre><br />
<br />
Meanwhile, the following table-modifying section (with the identifier <tt>23</tt>) must be located in the IntAct mapper file:<br />
<br />
<pre><br />
<sql id="23" userefs="no" ><br />
<stmt>INSERT INTO int_name(uid,name,category) VALUES ('_euid','_exorg','_exorgc');</stmt><br />
<variablelist><br />
<variable name="_euid" ></variable><br />
<variable name="_exorg"></variable><br />
<variable name="_exorgc"></variable><br />
</variablelist><br />
</sql><br />
</pre><br />
<br />
Since no conflicting section (with the same identifier for an <tt>sql</tt> element) exists in the new mapper file, this can be copied without changes. Then, it is necessary to ensure that mappings for <tt>_euid</tt>, <tt>_exorg</tt> and <tt>_exorgc</tt> are present; the following mappings happen to be found in the IntAct mapper file:<br />
<br />
<pre><br />
<map id="42" sqlref="23" name="_euid" grouper="3"><br />
<instruct><br />
<param choice="yes" /><br />
</instruct><br />
</map><br />
<map id="43" sqlref="23" name="_exorg" grouper="3"><br />
<instruct><br />
<readfromfile choice="yes"><br />
<path variable="_exorg" groupTag="ex" usetext="no" attribute="ncbiTaxId">entry,experimentList,experimentDescription,hostOrganismList,hostOrganism</path><br />
</readfromfile><br />
</instruct><br />
</map><br />
<map id="43" sqlref="23" name="_exorgc" grouper="3"><br />
<instruct><br />
<readfromfile choice="yes"><br />
<path variable="_exorgc" groupTag="ex" usetext="no" prefix="yes" val="38">entry,experimentList,experimentDescription,hostOrganismList,hostOrganism</path><br />
</readfromfile><br />
</instruct><br />
</map><br />
</pre><br />
<br />
Although <tt>_euid</tt> is already a known name in the new mapper file, no mapping definition exists to connect it to the table modification section (with identifier <tt>23</tt>). Thus, it is necessary to copy the above mappings, adjust their identifiers to avoid conflicts with other mappings, and make sure that the <tt>grouper</tt> attributes refer to the correct group definition for <tt>experimentDescription</tt>. Fortunately, no such adjustments are required in this case and the definitions can be copied directly.<br />
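<br />
Locating these definitions and checking their identifiers can also be scripted. The following Python sketch makes two assumptions. First, mapper files are assumed to be well-formed XML. Second, it relies only on the element and attribute names visible in the excerpts above: <tt>sql/@id</tt>, <tt>map/@id</tt>, <tt>map/@sqlref</tt>, <tt>map/@grouper</tt> and <tt>group/@id</tt>. The function names are hypothetical.<br />
<br />
If this duplicate check were applied to the mapping excerpt above, it would report that two <tt>map</tt> elements share <tt>id="43"</tt>. It is worth confirming this against the IntAct mapper file itself before copying.<br />
<br />
```python
# Hypothetical helpers for copying <sql>/<map> definitions between mapper
# files. Elements are found with iter(), so no particular nesting of the
# mapper file is assumed.
import xml.etree.ElementTree as ET
from collections import Counter


def extract_definitions(mapper_root, sql_id):
    """Return the <sql> element with the given id and the <map> elements using it."""
    sql = next((e for e in mapper_root.iter("sql") if e.get("id") == sql_id), None)
    maps = [m for m in mapper_root.iter("map") if m.get("sqlref") == sql_id]
    return sql, maps


def id_conflicts(target_root, sql, maps):
    """List identifiers that would clash if sql and maps were copied into target_root."""
    target_sql_ids = {e.get("id") for e in target_root.iter("sql")}
    target_map_ids = {m.get("id") for m in target_root.iter("map")}
    copied = Counter(m.get("id") for m in maps)
    return {
        "sql": [sql.get("id")] if sql is not None and sql.get("id") in target_sql_ids else [],
        "map": sorted(i for i in copied if i in target_map_ids),
        # Identifiers repeated among the copied maps themselves.
        "duplicated": sorted(i for i, n in copied.items() if n > 1),
    }


def missing_groupers(target_root, maps):
    """Grouper identifiers used by the maps that have no <group> in target_root."""
    group_ids = {g.get("id") for g in target_root.iter("group")}
    return sorted({m.get("grouper") for m in maps} - group_ids)
```
<br />
For example, the roots could be obtained with <tt>ET.parse("mapper/Map25_INTACT_MINT_BIOGRID.xml").getroot()</tt> and <tt>ET.parse("mapper/Map25_InnateDB.xml").getroot()</tt>, and the definitions could then be extracted with <tt>extract_definitions(source, "23")</tt>.<br />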
<br />
=== Choosing Sources of Data ===<br />
<br />
In the above example, the <tt>hostOrganism</tt> information was extracted from the separate <tt>experimentList</tt>. However, to stay consistent with the other details taken from the data file, we may choose to extract it from the <tt>interactionList</tt> instead, particularly since the information appears to be duplicated there. Thus, a mapping is required that can extract data from the following path:<br />
<br />
entry/interactionList/interaction/experimentList/experimentDescription/hostOrganismList/hostOrganism<br />
<br />
Even if a mapping does not exist for the above path, there may be mappings involving similar paths:<br />
<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel<br />
<br />
In this case, a different mapper file should be chosen to provide a suitable definition:<br />
<br />
 python tools/show_xml_paths.py --mapper mapper/Map25_DIP.xml --verbose<br />
<br />
A suitable section of that file can be summarised as follows:<br />
<br />
<pre><br />
Element experimentDescription (grouper/id=3)...<br />
<br />
[...]<br />
<br />
Table int_name (sqlref/id=15)...<br />
_euid ...<br />
<incremental> (provides experimentDescription)<br />
_idetlbl ...<br />
(intdetectionshortLabel) ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel<br />
(intdetectionalias) ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias<br />
(intdtfull) ...<br />
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullName<br />
_idetncat ...<br />
(intdetectionshlacategory) ...<br />
24<br />
(intdetectionoblac) ...<br />
25<br />
(indtcatful) ...<br />
26<br />
</pre><br />
<br />
The definitions involved can then be incorporated into the new mapper file, adjusting identifiers appropriately. The <tt>hostOrganism</tt> data in the above example originates from a different place in the element hierarchy. Even so, simply altering its path to point to a location in the <tt>interactionList</tt> hierarchy is probably sufficient to make the initial mapping definitions consistent with the newly incorporated definitions for <tt>interactionDetectionMethod</tt>. This is because the <tt>grouper</tt> identifiers of all these mapping definitions refer to <tt>experimentDescription</tt>.<br />
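<br />
The path alteration described above can be made by hand, but the Python sketch below illustrates it. It assumes, as in the mapping excerpts above, that each <tt>path</tt> element holds a comma-separated list of element names. It also assumes that only the leading <tt>entry,experimentList</tt> part needs to change. The function name and default prefixes are hypothetical.<br />
<br />
```python
# Hypothetical helper: re-point <path> elements from the top-level
# experimentList to the experimentList nested inside each interaction.
import xml.etree.ElementTree as ET

OLD_PREFIX = ("entry", "experimentList")
NEW_PREFIX = ("entry", "interactionList", "interaction", "experimentList")


def repoint_paths(map_element, old_prefix=OLD_PREFIX, new_prefix=NEW_PREFIX):
    """Rewrite every <path> below map_element that starts with old_prefix.

    Returns the number of paths changed; paths with other prefixes are left alone.
    """
    changed = 0
    for path in map_element.iter("path"):
        parts = [p.strip() for p in (path.text or "").split(",")]
        if tuple(parts[: len(old_prefix)]) == tuple(old_prefix):
            path.text = ",".join([*new_prefix, *parts[len(old_prefix):]])
            changed += 1
    return changed
```
<br />
Note that writing a parsed tree back out with ElementTree does not preserve XML comments (they are discarded when parsing by default). Editing the mapper file by hand may therefore still be preferable.<br />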
<br />
=== Defining the Database Identifier ===<br />
<br />
A new database identifier is required for new data sources. Source identifiers are defined in each data source's configuration file in the <tt>source</tt> element. For example, for InnateDB:<br />
<br />
<pre><br />
<specs><br />
<source>InnateDB</source><br />
<filetype>.xml</filetype><br />
</specs><br />
</pre><br />
<br />
These identifiers are then mapped to database identifiers in the iRefIndex <tt>int_db</tt> table. Thus, an <tt>int_db</tt> record must be defined, assigning a database identifier (an integer) which corresponds to this source identifier.<br />
<br />
To define a new database identifier, the <tt>Create_iRefIndex.sql</tt> file (in the <tt>SQL</tt> directory of the <tt>BioPSI_Suplimenter</tt> software distribution) must be modified by adding a new statement as follows:<br />
<br />
INSERT INTO int_db(id,name) VALUES(<database identifier>,'<source identifier>');<br />
<br />
For example:<br />
<br />
INSERT INTO int_db(id,name) VALUES(178,'InnateDB');<br />
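<br />
Before adding such a statement, it may help to check whether the source identifier is already defined and which database identifiers are taken. The sketch below assumes that every <tt>int_db</tt> row in <tt>Create_iRefIndex.sql</tt> is added with a statement of exactly the form shown above. It proposes the highest existing identifier plus one. That numbering rule, the function names and the <tt>NewSource</tt> name in the usage note are hypothetical.<br />
<br />
```python
# Hypothetical helpers: read int_db definitions from Create_iRefIndex.sql and
# the <source> value from a data source configuration file.
import re
import xml.etree.ElementTree as ET

INSERT_INT_DB = re.compile(
    r"INSERT\s+INTO\s+int_db\s*\(\s*id\s*,\s*name\s*\)\s*"
    r"VALUES\s*\(\s*(\d+)\s*,\s*'([^']*)'\s*\)",
    re.IGNORECASE,
)


def read_int_db(sql_text):
    """Map source identifiers to database identifiers."""
    return {name: int(db_id) for db_id, name in INSERT_INT_DB.findall(sql_text)}


def source_identifier(config_text):
    """Return the <source> value from a configuration file's XML text."""
    return (ET.fromstring(config_text).findtext(".//source") or "").strip()


def insert_statement(sql_text, source):
    """Return an INSERT for an undefined source, or None if it is defined."""
    known = read_int_db(sql_text)
    if source in known:
        return None
    next_id = max(known.values(), default=0) + 1  # assumed numbering rule
    return f"INSERT INTO int_db(id,name) VALUES({next_id},'{source}');"
```
<br />
For instance, if the <tt>InnateDB</tt> statement above is already present in the file's text, <tt>insert_statement(text, "InnateDB")</tt> returns <tt>None</tt>. A hypothetical <tt>NewSource</tt> would instead receive the next free identifier.<br />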
<br />
=== Re-running the Parser ===<br />
<br />
If data files have already been parsed but must be parsed again during development, perhaps to populate a test database, some files must first be deleted. These are files that the parser writes to the filesystem to prevent repeated parsing of the same data. They are called <tt>lastUpdate.obj</tt> and contain information about previous parsing operations. They can be removed as follows:<br />
<br />
 find /home/irefindex/data -name lastUpdate.obj -print0 | xargs -0 rm -f<br />
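<br />
Where a more cautious approach is preferred, the following Python sketch lists the <tt>lastUpdate.obj</tt> files under a data directory and deletes them only when explicitly asked to. The function name is hypothetical. The directory would be the one used in the command above.<br />
<br />
```python
# Hypothetical helper: find (and optionally delete) the parser's
# lastUpdate.obj marker files anywhere below a data directory.
from pathlib import Path


def clear_parse_markers(data_dir, delete=False):
    """Return the lastUpdate.obj files below data_dir; remove them if delete is true."""
    markers = sorted(p for p in Path(data_dir).rglob("lastUpdate.obj") if p.is_file())
    if delete:
        for marker in markers:
            marker.unlink()
    return markers
```
<br />
Calling <tt>clear_parse_markers("/home/irefindex/data")</tt> first shows what would be removed. Calling it with <tt>delete=True</tt> then performs the removal.<br />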
<br />
=== Modifying Generated Data ===<br />
<br />
New sources require changes to some generated data tables and to the programs that populate them:<br />
<br />
* In the definition of the <tt>cy_edgeatrib_canonical</tt> table in the <tt>make_canonical_tables.sql</tt> file in the <tt>SQL_commands</tt> directory, a new column is required for each newly defined source.<br />
* The <tt>src/process/no/uio/biotek/Make_Cy_tables.java</tt> file in <tt>BioPSI_Suplimenter</tt> needs to be changed so that the <tt>popCombine_edge</tt> method has a set of statements populating the new column created in the <tt>cy_edgeatrib_canonical</tt> table, and the <tt>src/process/no/uio/biotek/PreProcess_process.java</tt> file also needs changing so that the definition of the <tt>cy_edgeatrib</tt> table in the <tt>commit</tt> method includes a new column for each newly defined source.<br />
* The <tt>make_iRefWeb.sql</tt> file in the <tt>SQL_commands</tt> directory requires a statement defining each new source in the <tt>source_db</tt> table.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex_Issues_and_Notes&diff=4105
iRefIndex Issues and Notes
2012-10-26T16:02:30Z
<p>PaulBoddie: Replaced the previous list of work items with a note about the new implementation.</p>
<hr />
<div>This document describes ongoing work to improve the iRefIndex software.<br />
<br />
== New Implementation for iRefIndex 10 ==<br />
<br />
A new implementation of the iRefIndex software is intended to allow the release data to be built using only command-line tools, in a way that encourages automation. The [http://sourceforge.net/p/irefindex/ iRefIndex SourceForge project] provides the source code of this implementation.<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=README_MITAB2.6_for_iRefIndex_9.0&diff=4092
README MITAB2.6 for iRefIndex 9.0
2012-06-07T13:32:36Z
<p>PaulBoddie: Changed the role documentation to reflect actual practice.</p>
<hr />
<div><div class="floatright" style="text-align: center"><br />
<br />
'''iRefIndex 9.0 Downloads'''<br />
<imagemap><br />
Image:Document-save-80x80.png<br />
default [ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/]<br />
</imagemap><br />
<br />
'''Parsing MITAB Format Data'''<br />
<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefIndex_MITAB2.6_Parser]]<br />
</imagemap><br />
</div><br />
<br />
Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Applies to iRefIndex release: 9.0<br />
<br />
Release date: 2011-11-07<br />
<br />
Download location: ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/<br>(use <tt>anonymous</tt> as the login and your email address as the password)<br />
<br />
Authors: Ian Donaldson, Sabry Razick, Paul Boddie<br />
<br />
Database: iRefIndex (http://irefindex.uio.no)<br />
<br />
Organization: Biotechnology Centre of Oslo, University of Oslo <br />
(http://www.biotek.uio.no/) <br />
<br />
[[#Description|License of the source database]].<br />
<br />
== <span style="color:#0f0086"> Description </span> ==<br />
<br />
This file describes the contents of the <br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br />
<br />
directory and the format of the tab-delimited text files it contains. Each index file follows the PSI-MITAB2.6 format, with additional columns for annotating edges and nodes. The assignment of source interaction records to redundant interaction groups is described at http://irefindex.uio.no. The PSI-MITAB2.6 format and the additional columns are described below.<br />
<br />
A supplementary file lists just database:accession pairs for proteins and their mapping to irog, icrog and Entrez Gene identifiers. See<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/Mappingfiles/<br />
<br />
and README at<br />
<br />
http://irefindex.uio.no/wiki/Protein_identifier_mapping<br />
<br />
This file is precalculated from the MITAB distribution as a convenience to users.<br />
<br />
Details on the build process are available from the publication PMID 18823568.<br />
<br />
This distribution includes data consolidated using the iRefIndex method for BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPPI, MPact and OPHID.<br />
<br />
<br />
{|<br />
|Sources || http://irefindex.uio.no/wiki/Sources_iRefIndex_8.0<br />
|-<br />
|Statistics || http://irefindex.uio.no/wiki/Statistics_iRefIndex_8.0<br />
|-<br />
|Download location || ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br><br />
|}<br />
<br />
== Directory contents ==<br />
<br />
{|<br />
|<tt>README</tt> ||pointer to this file<br />
|-<br />
|<tt>xxxx.mitab.mmddyyyy.txt.zip</tt> ||individual indices in PSI-MITAB2.6 format<br><br />
|}<br />
<br />
iRefIndex data is distributed as a set of tab-delimited text files with names of the form <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>mmddyyyy</tt> represents the file's creation date.<br />
<br />
The complete index is available as <tt>All.mitab.mmddyyyy.txt.zip</tt>.<br />
<br />
Taxon specific data sets are also available for:<br />
<br />
{|<br />
| ||'''Taxon Id'''<br />
|-<br />
|Homo sapiens ||9606 (human)<br />
|-<br />
|Mus musculus ||10090 (mouse)<br />
|-<br />
|Rattus norvegicus ||10116 (brown rat)<br />
|-<br />
|Caenorhabditis elegans ||6239 (nematode)<br />
|-<br />
|Drosophila melanogaster ||7227 (fruit fly)<br />
|-<br />
|Saccharomyces cerevisiae ||4932 (baker's yeast)<br />
|-<br />
|Saccharomyces cerevisiae S288c ||559292<br />
|-<br />
|Escherichia coli ||562 (E. coli)<br />
|-<br />
|Other ||other<br />
|-<br />
|All ||all<br />
|}<br />
<br />
Taxon specific subsets of the data are named <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>xxxx</tt> is the taxonomy identifier of at least one of the interactors according to either the source interaction database or the sequence database record. Each zip compressed file contains a single text file with the corresponding name <tt>xxxx.mitab.mmddyyyy.txt</tt>.<br />
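<br />
File names of this form can be split into their parts programmatically. In the Python sketch below, the regular expression and function name are our own. The date part is read according to the <tt>mmddyyyy</tt> pattern described above. The file names mentioned in the usage note are illustrative rather than actual release files.<br />
<br />
```python
# Hypothetical helper: split "xxxx.mitab.mmddyyyy.txt(.zip)" file names.
import re
from datetime import datetime

FILE_NAME = re.compile(r"^(?P<taxon>[^.]+)\.mitab\.(?P<date>\d{8})\.txt(?:\.zip)?$")


def parse_release_name(name):
    """Return (taxon part, creation date) for a release file name.

    The taxon part is a taxonomy identifier such as "9606", or "All".
    """
    match = FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    created = datetime.strptime(match.group("date"), "%m%d%Y").date()
    return match.group("taxon"), created
```
<br />
For example, a file named <tt>9606.mitab.11072011.txt.zip</tt> would yield the taxon part <tt>9606</tt> and the date 2011-11-07.<br />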
<br />
In some cases, other objects may belong to other taxons if a virus-host interaction is being represented or if a protein from another organism has been used to model a protein in the specified organism. <br />
<br />
Taxonomy identifiers are provided in the data sets allowing these exceptions to be identified. The taxonomy identifiers listed are derived from the source protein sequence record. In some cases, this taxonomy identifier will be a child of the taxon listed in the file's title; for example, Escherichia coli K12 (taxonomy identifier 83333) will appear in the Escherichia coli (taxonomy identifier 562) file.<br />
<br />
A description of the NCBI taxon identifiers is available at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy <br />
<br />
This division of the data by taxon leads to duplication. For instance, an interaction present in the mouse index could also appear in the human index if the interaction record lists protein sequence records from both human and mouse. The <tt>All.mitab.mmddyyyy</tt> file is a complete and non-redundant listing.<br />
<br />
The data format and divisions provided in this initial release were chosen in the hopes that they would be immediately useful to the largest possible set of users. Other formats and divisions are possible and we welcome your input on future releases.<br />
<br />
== Changes from last version ==<br />
<br />
This is the third release of iRefIndex in PSI-MITAB2.6 format.<br />
<br />
* RIGIDs in previous releases of iRefIndex were [[Bugzilla:242|incorrectly computed]]. Although the properties of such RIGIDs were not compromised (distinct RIGIDs should still have referred to distinct interactions), each RIGID made use of substantially less information from its components. RIGIDs in this release should now be computed correctly.<br />
* [[Bugzilla:245|Duplicate lines]] are no longer produced in the MITAB output. Previously, database records containing additional information not reproduced in the MITAB output were written to the files on a record-by-record basis. However, these individual records provide no useful additional information purely through their presence, and the result is merely a collection of redundant records. Duplicate lines are therefore now filtered out when writing the MITAB files.<br />
* Many proteins previously assigned the 4932 taxonomy identifier have been [[Bugzilla:247|recategorised]] as having taxonomy identifier 559292. Thus, for convenience, an additional 559292 file is produced alongside the existing (but substantially smaller) 4932 file to hold interactions involving proteins associated with both taxons.<br />
* [[Bugzilla:248|Interactions not involving proteins associated with a specific organism]] are now excluded from organism-specific files. Note that complexes may consist of a number of lines where interactors may have a different taxonomy identifier from that of the specific file being consulted, but in such cases there will always be a member of the complex labelled with the appropriate taxonomy identifier, and thus the complex describes a "mixed species" interaction which should be retained just as binary interactions are where one participant is native to the file and the other is "foreign".<br />
* Previously, PubMed identifiers were being given as interaction detection methods for CORUM-originating interactions. This has now been [[Bugzilla:249|resolved]].<br />
<br />
References:<br />
<br />
* http://code.google.com/p/psimi/issues/detail?id=2<br />
* http://code.google.com/p/psimi/wiki/PsimiTabFormat<br />
<br />
=== Mapping to Legacy RIGIDs ===<br />
<br />
A mapping from current to legacy RIGIDs is provided on the FTP site as <tt>legacy.txt</tt> at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/<br />
<br />
This file contains all non-canonical iRefIndex 9.0 RIGIDs mapped to legacy RIGIDs which have been computed for the interactions. Note that many of the legacy RIGIDs may not exist in previous releases of iRefIndex because other changes in the underlying data (such as taxonomy identifier changes) have occurred. Thus, even if the RIGID computation method had remained the same in iRefIndex 9.0 as in previous releases, many RIGIDs would have changed in iRefIndex 9.0 anyway.<br />
<br />
Other files are also provided to give a specific mapping from iRefIndex 9.0 to iRefIndex 8.0 and can be found at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/Mappingfiles/<br />
<br />
== Known Issues ==<br />
<br />
* We have replaced the pipe character (<tt>|</tt>) in PDB identifiers with an underscore character (<tt>_</tt>).<br />
<br />
This decision was taken to avoid unexpected parsing problems: the PSI-MITAB format uses pipes (<tt>|</tt>) as a separator character where multiple values occur in the same column.<br />
<br />
As a result, column number 37 (OriginalReferenceA) and column number 38 (OriginalReferenceB) may differ from the original reference in such cases.<br />
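<br />
The following Python sketch illustrates the parsing problem. A multi-valued column is split on pipes, so an identifier that itself contains a pipe is broken in two. The identifier values are hypothetical and serve only to show the effect of the substitution.<br />
<br />
```python
# Illustration with hypothetical identifiers: splitting a multi-valued
# PSI-MITAB column on "|" breaks any value that itself contains a pipe.


def split_values(column):
    """Split a multi-valued PSI-MITAB column on the pipe separator."""
    return column.split("|")


with_pipe = "uniprotkb:P12345|pdb:1ABC|A"        # chain written with a pipe
with_underscore = "uniprotkb:P12345|pdb:1ABC_A"  # pipe replaced by an underscore

print(split_values(with_pipe))        # three values, although two were intended
print(split_values(with_underscore))  # two values
```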
<br />
== Understanding the iRefIndex MITAB format ==<br />
<br />
iRefIndex is distributed in PSI-MITAB format. Version 2.5 of the format was originally described in PMID 17925023 ([http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2189715/?tool=pubmed full text]). This file describes the columns defined by version 2.6 of the PSI-MITAB format plus columns added by iRefIndex.<br />
<br />
Since the PSI-MITAB format allows only two interactors to be described on each line, it is best suited to binary interaction data, where the original experiment (a yeast two-hybrid assay, for example) gives a binary readout. However, other PSI-MI XML source records describe data of two further kinds. Some describe interactions involving only one interactor type (dimers or multimers). Others contain associative (also known as "n-ary") interaction data from, for example, immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown. These cases are problematic for the PSI-MITAB format. This document describes exactly how we use the MITAB format to describe these alternative (non-binary) interaction types.<br />
<br />
=== What each line represents ===<br />
<br />
Each line or row in the MITAB file represents a ''single'' interaction record from one primary data source describing an interaction involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers).<br />
<br />
{{Note|Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
|Important}}<br />
<br />
<br />
Each row in this table has a natural key pointing to an original interaction record in some source database that is listed under column 14 (interactionIdentifier). For example:<br />
<br />
intact:EBI-761694<br />
<br />
{{Note|<br />
Prior to release 7.0, each line represented a ''group'' of interaction records involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers). This ''collapsed'' or non-redundant format did not allow us to easily describe meta-data associated with each source record. Therefore, we have moved to this ''expanded'' or redundant version. Users can still collapse multiple rows that all provide evidence for an interaction between the same set of proteins using the keys provided (for example, RIGIDs).<br />
}}<br />
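<br />
Collapsing the expanded rows can be sketched in a few lines of Python, assuming each row has already been split into a list of columns. The helper name is hypothetical; it groups on the value of column 35 (Checksum_Interaction) exactly as it appears, so it does not matter whether that value carries a <tt>rigid:</tt> prefix.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, List

CHECKSUM_INTERACTION = 35  # column 35 as numbered in this document

# Hypothetical helper for illustration only.
def group_by_rigid(rows: Iterable[List[str]]) -> Dict[str, List[List[str]]]:
    """Group expanded rows so that each RIGID maps to all rows supporting it."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for row in rows:
        groups[row[CHECKSUM_INTERACTION - 1]].append(row)
    return dict(groups)
```
<br />
Each resulting group corresponds to one row of the older collapsed format, while the individual rows keep their per-record meta-data.<br />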
<br />
Rows in this table that all provide evidence for an interaction between the same set of proteins can be identified using the RIGID key (redundant interaction group identifier). The RIGID is a 27-character key that is derived from the ROGIDs of the interactors involved in the interaction record. The ROGID is the Base64-encoded SHA-1 digest of the protein interactor's primary amino acid sequence, followed by the NCBI taxonomy identifier (see the paper for details).<br />
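<br />
The following Python sketch shows the general idea. It assumes that the digest is Base64-encoded with its trailing <tt>=</tt> padding removed (which yields 27 characters for a 20-byte SHA-1 digest), that the sequence is upper-cased before hashing, and that a RIGID is computed from the distinct ROGIDs, sorted and concatenated. These canonicalization details are assumptions for illustration; the paper remains the reference for the exact procedure, and the function names are hypothetical.<br />
<br />
```python
import base64
import hashlib
from typing import Iterable

# Hypothetical helpers; the exact iRefIndex canonicalization may differ.

def _sha1_b64(text: str) -> str:
    """Base64-encoded SHA-1 digest without the trailing '=' padding (27 characters)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")

def make_rogid(sequence: str, taxid: int) -> str:
    """Assumed ROGID: digest of the upper-cased sequence followed by the NCBI taxid."""
    return _sha1_b64(sequence.upper()) + str(taxid)

def make_rigid(rogids: Iterable[str]) -> str:
    """Assumed RIGID: digest of the distinct ROGIDs, sorted and then concatenated."""
    return _sha1_b64("".join(sorted(set(rogids))))
```
<br />
Because duplicates are removed before hashing in this sketch, <tt>make_rigid([a, a])</tt> equals <tt>make_rigid([a])</tt>, consistent with the behaviour described for multimers in the section on intramolecular interactions.<br />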
<br />
{{Note|<br />
The RIGID key is now listed (by itself) in column 35 (Checksum_Interaction) as part of the new extended PSI-MITAB format. This is a universal key that can be generated by each and every interaction database and may be included in MITAB2.6 distributions from other source databases. The intention of this key is to aid third party integration of data collected from multiple databases (for example, from PSICQUIC web services). <br />
}}<br />
<br />
=== Representation of interactions ===<br />
<br />
==== Binary interaction data ====<br />
<br />
This is the most common data type.<br />
<br />
For binary interaction data, column 53 (edgetype) will contain an X. Interactors A and B will list the two proteins for which interaction evidence is provided in the row. Users should pay close attention to columns 12 (interactionType) and 7 (Method) when deciding what binary data they wish to accept as evidence of a direct physical interaction.<br />
<br />
==== Complexes (a.k.a. n-ary data) ====<br />
<br />
Certain experimental methods (like immunoprecipitations) provide evidence that a list of 3 or more proteins are associated but cannot provide evidence for a direct interaction between any given pair of proteins in that list. <br />
<br />
In these cases, interactor A (column 1) is used as a placeholder to represent the ''complex'' or ''list'' of proteins while interactor B is used to list one of the members of the list: therefore, the entire ''n-ary interaction record'' is described using one row for each interactor. Each of these rows will have the same ''interactor A''. This method of representation is referred to as a '''bi-partite model''' since there are two kinds of nodes corresponding to complexes and proteins. <br />
<br />
These interactions are marked by a C in column 53 (edgetype).<br />
<br />
As an example, let’s say that a source interaction record contained interactors A, B and C found by affinity purification and mass-spec where a tagged version of protein A was used as the bait protein to perform the immunoprecipitation. <br />
<br />
Then we would represent the complex in the MITAB file using three lines:<br />
<br />
<pre><br />
X-A<br />
X-B<br />
X-C<br />
</pre><br />
<br />
All three entries would have the same string in column 1 (the RIGID for the complex). All three entries would have a C in column 53 (edgetype).<br />
<br />
Other databases take an interaction record with multiple interactors (n-ary data) and make a list of binary interactions (based on the spoke or matrix model) and then list these binary interactions in the MITAB. For the example above, using a '''spoke model''' to transform the data into a set of binary interactions, these data would be represented using two lines in the MITAB file:<br />
<br />
<pre><br />
A-B<br />
A-C<br />
</pre><br />
<br />
Here A is chosen as the "hub" of the spoke model since it was the "bait" protein. For experimental systems that do not have "baits" and "preys" (such as X-ray crystallography), an arbitrary protein might be chosen as the bait.<br />
<br />
Alternatively, a '''matrix model''' might be used to transform the n-ary data into a list of binary interactions. Here all pairwise combinations of interactors in the original n-ary data are represented as binary interactions. So, in the above example, the immunoprecipitated complex would be represented using three lines of the MITAB file:<br />
<br />
<pre><br />
A-B<br />
B-C<br />
A-C<br />
</pre><br />
<br />
All three methods for representing n-ary data in a MITAB file (bi-partite, spoke, and matrix) are different representations of the same data. The model type that is chosen to describe n-ary data is listed in column 16 (expansion) of the MITAB2.6 format.<br />
<br />
We have chosen to use the bi-partite method of representation so that it is impossible to mistake spoke or matrix binary entries for true binary entries; the identifiers used for complexes will, of course, not appear in a protein database and any programme that tries to treat complex identifiers as though they were protein identifiers will fail. The method allows you to reconstruct the members of the original interaction record that describes a complex of proteins (say from an affinity purification experiment). From there, you can choose to make a spoke or matrix model by yourself if you want. <br />
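<br />
A minimal Python sketch of that reconstruction follows, assuming rows have been split into lists of columns. It collects the members (column 2) of each complex (column 1) from rows whose edgetype (column 53) is <tt>C</tt>, then expands a member list using either model. The function names are hypothetical, and for the spoke model the caller must choose the hub (for example, the bait protein, if the source record identifies one).<br />
<br />
```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers for illustration only.

def complex_members(rows: Iterable[List[str]]) -> Dict[str, List[str]]:
    """Map each complex identifier (column 1) to its members (column 2)."""
    members: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row[52] == "C":  # column 53 (edgetype): complex membership row
            members[row[0]].append(row[1])
    return dict(members)

def spoke(members: List[str], hub: str) -> List[Tuple[str, str]]:
    """Spoke model: pair the chosen hub with every other member."""
    others = list(members)
    others.remove(hub)  # remove one copy of the hub (ValueError if it is not a member)
    return [(hub, m) for m in others]

def matrix(members: List[str]) -> List[Tuple[str, str]]:
    """Matrix model: every pairwise combination of members."""
    return list(combinations(members, 2))
```
<br />
Applied to the X-A, X-B, X-C example above, <tt>spoke(["A", "B", "C"], "A")</tt> gives A-B and A-C, while <tt>matrix(["A", "B", "C"])</tt> gives A-B, A-C and B-C.<br />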
<br />
Users are advised that other databases may use spoke and matrix model representations of complexes in the MITAB format. <br />
<br />
==== Intramolecular interactions and multimers ====<br />
<br />
These row types form a small minority of the data and are rare in comparison to the types above.<br />
<br />
Sometimes source interaction records in PSI-MI format only list one interactor. These are cases where either<br />
<br />
<ol><br />
<li>an intra-molecular interaction is being represented or</li><br />
<li>a multimer (3 or more) of some protein is being represented.</li><br />
</ol> <br />
These records are difficult to represent in the PSI-MITAB format because PSI-MITAB requires that each row (interaction) list two interactors.<br />
We represent these interaction records as follows, so as to reflect the original records as closely as possible.<br />
<ol><br />
<li>Interactions involving only one interactor. The uidA and uidB would be the same and the edge type would be 'Y' (column number 53 (edgetype)). Therefore, whenever the edge type is 'Y', the interaction involves only one protein (although the interaction is given as being between two interactors), and thus column number 54 (numParticipants) would always be 1. For example:<br />
<pre>{A - A, edge type 'Y', numParticipants=1}</pre></li><br />
<li>When the interaction is described as involving two interactors but both of them refer to the same protein. This would be represented as a normal binary interaction and would have the edge type = 'X' (column number 53 (edgetype)), and thus column number 54 (numParticipants) would always be 2. For example:<br />
<pre>{A - A, edge type 'X', numParticipants=2}</pre></li><br />
<li>When the interaction is described as involving more than 2 interactors and all those interactors are referring to the same protein, a bi-partite representation will be used. The edge type would be 'C' (column number 53 (edgetype)). For example, with regard to complexes (a.k.a. n-ary data):<br />
<pre><br />
{C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3}<br />
</pre></li><br />
</ol><br />
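<br />
These three cases, together with ordinary binary rows, can be told apart using columns 1, 2, 53 and 54. The following Python sketch is one way to do so; the function name and the returned labels are hypothetical, and it assumes numParticipants (column 54) always holds an integer.<br />
<br />
```python
from typing import List

# Hypothetical helper; the labels are illustrative, not iRefIndex vocabulary.
def row_kind(row: List[str]) -> str:
    """Classify a row using uidA/uidB (columns 1-2), edgetype (53) and numParticipants (54)."""
    uid_a, uid_b = row[0], row[1]
    edgetype, participants = row[52], int(row[53])
    if edgetype == "Y":
        # Only one interactor was listed in the source record.
        return "single interactor"
    if edgetype == "X":
        # Two interactors; they may both refer to the same protein.
        return "self-interaction (two copies)" if uid_a == uid_b else "binary"
    if edgetype == "C":
        return f"complex membership ({participants} participants)"
    raise ValueError(f"unexpected edgetype: {edgetype!r}")
```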
<br />
We draw extra attention to the fact that the RIGID (column number 35 (Checksum_Interaction)) for these interactions will be the SHA-1 digest of the ROGIDs for each of the distinct subunit types (see columns 33 (Checksum_A) and 34 (Checksum_B)). Thus interactions involving 1, 2 or more subunits of the same protein would all have the same RIGID.<br />
<br />
=== Keys for grouping together redundant interactors and interactions ===<br />
<br />
A number of keys are provided in this file to help users group together rows that all provide evidence for some kind of interaction between the same set (or a related set) of proteins. See columns 33-35 (Checksum_A, Checksum_B and Checksum_Interaction) and 43-51 (integer identifier and canonical data columns).<br />
<br />
The process of creating keys that group proteins and interactions into canonical groups is described in the [[Canonicalization]] document, which was written after the original paper.<br />
<br />
=== Provenance data ===<br />
<br />
Provenance data (where we retrieved source records from and how we mapped interactors and interactions to ROGIDs) is described in columns 37-42 (original and final references plus mapping scores).<br />
<br />
== License ==<br />
<br />
Data released on this public ftp site are released under the Creative <br />
Commons Attribution License http://creativecommons.org/licenses/by/2.5/. <br />
This means that you are free to use, modify and redistribute these data <br />
for personal or commercial use so long as you provide appropriate <br />
credit. See next section.<br />
<br />
<br />
Copyright © 2008-2011 Ian Donaldson<br />
<br />
== Citation ==<br />
<br />
Credit should include citing the iRefIndex paper (PMID 18823568) and any of the <br />
source databases upon which this resource is based. See <br />
http://irefindex.uio.no for appropriate citations.<br />
<br />
== Disclaimer ==<br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY <br />
WARRANTY; without even the implied warranty of MERCHANTABILITY or <br />
FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
== Description of PSI-MITAB2.6 file ==<br />
<br />
Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
=== Column number: 1 (uidA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor A. <br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains an identifier, taken from a major database, for a protein representing the interactor A. A UniProt or a RefSeq accession is provided (in that order of preference) wherever possible. See column 3 for a list of prefixes that may be employed in this column in addition to the following:<br />
<br />
;<tt>complex</tt><br />
:If interactor A is being used to represent a complex, then the rogid for the complex will be listed here, such as the following:<br />
<br />
<pre>complex:xBr9cTXgzPLNxsaKiYyHcoEm/DM</pre><br />
<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
In rare cases, a rogid may appear here if a protein interactor has a sequence but no known, valid ''<tt>database:accession</tt>'' pair.<br />
<br />
=== Column number: 2 (uidB)===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor B.<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 1.<br />
<br />
=== Column number: 3 (altA)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691|rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|irogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
All ''<tt>database:accession</tt>'' pairs listed in Column 3 point to protein records that describe the exact same sequence from the same taxon.<br />
<br />
Each pipe-delimited entry is a database_name:accession pair delimited by a colon. Database names are taken from the MI controlled vocabulary at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database references listed in this column may include the following:<br />
<br />
;<tt>uniprotkb</tt><br />
:The accessions this protein is known by in UniProt (http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line. <br />
;<tt>refseq</tt><br />
:If a protein accession exists in the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/), that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession. <br />
;<tt>entrezgene/locuslink</tt><br />
:NCBI Gene identifiers for the gene encoding this protein. See the GeneID column of ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq for the protein's accession.version.<br />
;<tt><em>other</em></tt><br />
:If none of the three identifier types are available then other <tt><em>databasename</em>:<em>accession</em></tt> pairs will be listed. These database names may not follow the MI controlled vocabulary.<br />
<br />
Example:<br />
<br />
<pre>emb:CAA44868.1|gb:AAA23715.1|gb:AAB02995.1|emb:CAA56736.1|uniprot:P24991</pre><br />
<br />
;<tt>rogid</tt><br />
:Column 33 repeated here for convenience.<br />
<br />
;<tt>irogid</tt><br />
:Column 43 repeated here for convenience.<br />
<br />
{{Note|<br />
The rogid of a complex or an n-ary interaction is the rigid of that <br />
interaction. However, the irogid of the complex is not the irigid.<br />
The irogid for the complex is an integer and does not overlap <br />
with any protein irogids.<br />
}}<br />
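<br />
A minimal Python sketch for turning column 3 (or, with the same format, columns 4 to 6) into a dictionary follows. It splits each entry at its first colon only, so that names such as <tt>entrezgene/locuslink</tt> stay intact and any later colons remain part of the accession. The function name is hypothetical, and treating <tt>-</tt> and <tt>NA</tt> as empty is an assumption based on the notes for these columns.<br />
<br />
```python
from typing import Dict, List

# Hypothetical helper for illustration only.
def parse_xrefs(field: str) -> Dict[str, List[str]]:
    """Parse 'db:acc|db:acc|...' into {db: [acc, ...]}, preserving order."""
    result: Dict[str, List[str]] = {}
    if field in ("", "-", "NA"):
        return result
    for entry in field.split("|"):
        db, sep, acc = entry.partition(":")  # split at the first colon only
        if sep:  # entries without any colon are skipped
            result.setdefault(db, []).append(acc)
    return result
```
<br />
For instance, <tt>parse_xrefs(row[2]).get("rogid")</tt> would return the ROGID that is repeated from column 33.<br />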
<br />
=== Column number: 4 (altB)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722|refseq:NP_417308|entrezgene/locuslink:947299</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 3. (Columns 34 and 44 are related to this column.)<br />
<br />
=== Column number: 5 (aliasA) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL|crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|icrogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each pipe-delimited entry is a <tt><em>database name</em>:<em>alias</em></tt> pair delimited by a <br />
colon. Database names are taken from the PSI-MI controlled vocabulary <br />
at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database names and sources listed in this column may include the following:<br />
<br />
;<tt>uniprotkb:<em>entry name</em></tt><br />
:the entry name given by UniProt. See the description for "Entry name" in the section of http://au.expasy.org/sprot/userman.html#ID_line concerning the "ID (IDentification)" line of the flat file<br />
;<tt>entrezgene/locuslink:<em>symbol</em></tt><br />
:the NCBI gene symbol for the gene encoding this protein. See the section in ftp://ftp.ncbi.nlm.nih.gov/gene/README for <tt>gene_info</tt>, specifically details for the <tt>Symbol</tt> column<br />
;<tt>crogid</tt><br />
:Column 46 repeated here for convenience.<br />
;<tt>icrogid</tt><br />
:Column 49 repeated here for convenience.<br />
;<tt>other db:accession pairs</tt><br />
:Other db:accession pairs may be added (after icrogid) that all belong to the same canonical group. These are purely meant to facilitate look-up by PSICQUIC and other services; these sequences are related to (but not identical to) the sequence of interactor A.<br />
;<tt>NA</tt><br />
:<tt>NA</tt> may be listed here if aliases are <em>not available</em><br />
<br />
=== Column number: 6 (aliasB) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTH_ECOLI|entrezgene/locuslink:mutH</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 5. (Columns 47 and 50 are related to this column.)<br />
<br />
=== Column number: 7 (Method) ===<br />
<br />
{|<br />
|Column type: ||String <br />
|-<br />
|Description: ||Interaction detection method<br />
|-<br />
|Example: ||<pre>MI:0399(2h fragment pooling)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only a single method will appear in this column. Previously, multiple methods appeared.<br />
}}<br />
<br />
Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers.<br />
<br />
The interaction detection method is taken from the original record. Its path in PSI-MI 2.5 XML is:<br />
<br />
<pre>entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/</pre><br />
<br />
<br />
{{Note|<br />
If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then <tt>MI:0000</tt> will appear before the shortLabels.<br />
}}<br />
<br />
<tt>NA</tt> or <tt>-1</tt> may appear in place of a recognised shortLabel.<br />
<br />
For example:<br />
<br />
<pre><br />
MI:0000(-1)<br />
MI:0000(NA)<br />
</pre><br />
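<br />
The same term-identifier-plus-label layout appears in several columns (for example 7, 12, 13 and 17 to 22). A small Python sketch for splitting such values follows; the function names are hypothetical, and it assumes the label is everything between the first opening parenthesis and the final closing parenthesis.<br />
<br />
```python
import re
from typing import Optional, Tuple

# Hypothetical helpers. Pattern: term identifier up to the first '(' then the label.
_CV_TERM = re.compile(r"^([^(]+)\((.*)\)$")

def parse_cv_term(value: str) -> Tuple[Optional[str], str]:
    """Split 'MI:0399(2h fragment pooling)' into ('MI:0399', '2h fragment pooling')."""
    match = _CV_TERM.match(value)
    if match is None:
        return None, value  # e.g. a bare 'NA' with no term identifier
    return match.group(1), match.group(2)

def has_term_identifier(term_id: Optional[str]) -> bool:
    """MI:0000 marks a label that could not be mapped to a term identifier."""
    return term_id is not None and term_id != "MI:0000"
```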
<br />
=== Column number: 8 (author) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Authors of the publications in which the interaction has been shown<br />
|-<br />
|Example: ||<pre>hall-1999-1|hall-1999-2|mansour-2001-1|mansour-2001-2|hall-1999</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited list of the surnames of authors of the publications in which the interaction has been shown.<br />
<br />
{{Note|<br />
This column will usually include only one author name reference. However, some experimental evidence has secondary references, which may be included here.<br />
This field also includes references that are not author names, as in the following examples:<br />
* OPHID Predicted Protein Interaction<br />
* HPRD Text Mining Confirmation<br />
* MINT Text Mining Confirmation<br />
}}<br />
<br />
=== Column number: 9 (pmids) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||PubMed Identifiers<br />
|-<br />
|Example: ||<pre>pubmed:9880500|pubmed:11585365</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is a non-redundant list of PubMed identifiers pointing to literature that supports the interaction. <br />
According to MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>pubmed:12345</tt>.<br />
The source database name is always <tt>pubmed</tt>.<br />
<br />
{{Note|<br />
This column will usually include only one PubMed reference that describes where the experimental evidence is found. In some cases, secondary references are provided by the source database and will be included here.<br />
}}<br />
<br />
<br />
The special value <tt>-</tt> may appear in place of the identifiers.<br />
<br />
=== Column number: 10 (taxa) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor A<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
<br />
|}<br />
<br />
'''Notes'''<br />
<br />
The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be corrected from what was provided by the source database. See the methods section of the iRefIndex paper for more details. See also the NCBI taxonomy database at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>taxid:12345</tt>. The source database name has been listed as taxid since it is always NCBI's taxonomy database. The value in this column will be <tt>NA</tt> if the interactor is a complex.<br />
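<br />
When comparing organisms across rows (for example, selecting rows in which both interactors come from the same taxon), the numeric identifier can be extracted as in the following Python sketch. The function name is hypothetical; it returns <tt>None</tt> for complexes, and treating <tt>-</tt> the same way as <tt>NA</tt> is an assumption.<br />
<br />
```python
from typing import Optional

# Hypothetical helper for illustration only.
def parse_taxid(value: str) -> Optional[int]:
    """Return 83333 for 'taxid:83333(Escherichia coli K-12)'; None for complexes ('NA')."""
    if value in ("", "-", "NA"):
        return None
    prefix, _, rest = value.partition(":")
    if prefix != "taxid":
        raise ValueError(f"not a taxid value: {value!r}")
    # Drop the optional '(organism name)' suffix before converting.
    return int(rest.split("(", 1)[0])
```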
<br />
=== Column number: 11 (taxb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor B<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 10.<br />
<br />
=== Column number: 12 (interactionType) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Interaction Type from controlled vocabulary or short label<br />
|-<br />
|Example: ||<pre>MI:0218(physical interaction)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only one interaction type will be present in each line of the file.<br />
}}<br />
<br />
The interaction type is taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(interaction type)</pre><br />
<br />
...when available in the interaction record, or otherwise as the short label found at the following path in PSI-MI 2.5 XML:<br />
<br />
<pre>entrySet/entry/interactionList/interaction/interactionType/names/shortLabel</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for interaction types.<br />
<br />
{{Note|<br />
If the MI controlled vocabulary identifier was not provided by the source database, but a text description was provided, then an attempt was made to map the text to the correct controlled vocabulary term identifier.<br />
If this was not possible then <tt>MI:0000</tt> is listed.<br />
|Change}}<br />
<br />
<tt>NA</tt> may be listed here if the interaction type is not available (meaning that we could not find the interaction type in the record provided by the source database).<br />
<br />
=== Column number: 13 (sourcedb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source database for this interaction record <br />
|-<br />
|Example: ||<pre>MI:0469(intact)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(source name)</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for database sources.<br />
<br />
{{Note|<br />
Only one source database will be listed in each row.<br />
|Change}}<br />
<br />
=== Column number: 14 (interactionIdentifier) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source interaction database and accession<br />
|-<br />
|Example: ||<pre>intact:EBI-761694|rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA|irigid:1234|edgetype:X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt><em>database name</em>:<em>identifier</em></tt> pair. <br />
<br />
{{Note|<br />
The source database is listed first. Additional information is pipe-delimited and presented here for the convenience of PSICQUIC web-service users (these services presently truncate this file at column 15 as they only support MITAB2.5). See columns 35, 45 and 53. <br />
|Change}}<br />
<br />
The source database names that appear in this column are taken from the<br />
PSI-MI controlled vocabulary at the following location (where possible):<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
If an interaction record identifier is not provided by the source database, this entry will appear as <tt><em>database-name</em>:-</tt> with the identifier region replaced with a dash (<tt>-</tt>).<br />
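<br />
Because PSICQUIC users only receive the first 15 columns, it can be useful to recover the extra keys from this column alone. The following Python sketch, with a hypothetical function name, does this; it assumes that every entry after the first is a <tt>key:value</tt> pair as in the example above.<br />
<br />
```python
from typing import Dict, Optional

# Hypothetical helper for illustration only.
def parse_interaction_identifier(field: str) -> Dict[str, Optional[str]]:
    """Split column 14 into the source reference (listed first) and the extra keys."""
    entries = field.split("|")
    source_db, _, source_id = entries[0].partition(":")
    parsed: Dict[str, Optional[str]] = {
        "source_db": source_db,
        # A dash means the source database supplied no record identifier.
        "source_id": None if source_id == "-" else source_id,
    }
    for entry in entries[1:]:
        key, _, value = entry.partition(":")
        parsed[key] = value
    return parsed
```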
<br />
=== Column number: 15 (confidence) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Confidence scores<br />
|-<br />
|Example: ||<pre>lpr:1|hpr:12|np:1|PSICQUIC entries are truncated here. See irefindex.uio.no</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt>''scoreName'':''score''</tt> pair. Three confidence <br />
scores are provided: <tt>lpr</tt>, <tt>hpr</tt> and <tt>np</tt>.<br />
<br />
PubMed Identifiers (PMIDs) point to literature references that support <br />
an interaction. A PMID may be used to support more than one interaction. <br />
<br />
The lpr score (lowest PMID re-use) is the lowest number of distinct <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A value of one indicates <br />
that at least one of the PMIDs supporting this interaction has never <br />
been used to support any other interaction. This likely indicates that <br />
only one interaction was described by that reference and that the <br />
present interaction is not derived from high throughput methods.<br />
<br />
The hpr score (highest PMID re-use) is the highest number of <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A high value (e.g. greater <br />
than 50) indicates that one PMID describes at least 50 other <br />
interactions and it is more likely that high-throughput methods were <br />
used.<br />
<br />
The np score (number of PMIDs) is the total number of unique PMIDs used to <br />
support the interaction described in this row.<br />
<br />
<tt>-</tt> may appear in the score field, indicating the absence of a score value.<br />
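<br />
The three scores can be sketched in Python as follows. The sketch first indexes, over the whole file, the distinct RIGIDs that each PMID supports, and then scores the PMIDs of one row. The function names are hypothetical, and whether iRefIndex pools PMIDs across all rows sharing a RIGID before scoring is not stated here, so this illustrates the definitions rather than the exact pipeline.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical helpers for illustration only.

def build_pmid_index(pairs: Iterable[Tuple[str, List[str]]]) -> Dict[str, Set[str]]:
    """Map each PMID to the distinct RIGIDs it supports, from (rigid, pmids) pairs."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for rigid, pmids in pairs:
        for pmid in pmids:
            index[pmid].add(rigid)
    return dict(index)

def pmid_scores(pmids: List[str], index: Dict[str, Set[str]]) -> Optional[Dict[str, int]]:
    """lpr, hpr and np for one row's PMIDs; None when there are none (shown as '-')."""
    reuse = [len(index.get(pmid, ())) for pmid in set(pmids)]
    if not reuse:
        return None
    return {"lpr": min(reuse), "hpr": max(reuse), "np": len(reuse)}
```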
<br />
----<br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (16 - 31) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT<br />
|Note}}<br />
<br />
=== Column number: 16 (expansion) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Model used to convert n-ary data into binary data for the purpose of export in the MITAB file<br />
|-<br />
|Example: ||<pre>bipartite</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this column will always contain either <tt>bipartite</tt> or <tt>none</tt>.<br />
<br />
Other databases may use either <tt>spoke</tt> or <tt>matrix</tt> or <tt>none</tt> in this column.<br />
<br />
See <br />
[[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
=== Column number: 17 (biological_role_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor A<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When provided by the source database, this includes single entries such as <tt>MI:0501(enzyme)</tt>, <tt>MI:0502(enzyme target)</tt>, <tt>MI:0580(electron acceptor)</tt>, or <tt>MI:0499(unspecified role)</tt>.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role.<br />
<br />
For complexes and when no role is explicitly specified this column will contain the following:<br />
<br />
MI:0000(unspecified)<br />
<br />
=== Column number: 18 (biological_role_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor B<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 17.<br />
<br />
=== Column number: 19 (experimental_role_A) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any was provided by the source database) that was played by interactor A.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to see definitions of bait and prey, as well as to browse other possible values of experimental role that may appear in this column for other databases.<br />
<br />
For complexes and when no role is explicitly specified this column will contain the following:<br />
<br />
MI:0000(unspecified)<br />
<br />
=== Column number: 20 (experimental_role_B) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any) that was played by interactor B.<br />
<br />
See notes above for column 19.<br />
<br />
=== Column number: 21 (interactor_type_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||The type of molecule that interactor A represents<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this will always be one of...<br />
<br />
<pre><br />
MI:0326(protein)<br />
MI:0315(protein complex)<br />
</pre><br />
<br />
=== Column number: 22 (interactor_type_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||The type of molecule that interactor B represents<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See column 21.<br />
<br />
=== Column number: 23 (xrefs_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for molecule A. For example, Gene Ontology identifiers or OMIM identifiers.<br />
<pre>omim:152430(longevity)|go:"GO:0016233"(telomere capping)</pre><br />
<br />
=== Column number: 24 (xrefs_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 23.<br />
<br />
=== Column number: 25 (xrefs_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for the interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for the interaction, such as Gene Ontology or OMIM identifiers. For example:<br />
<pre>go:"GO:0048786"(presynaptic active zone)</pre><br />
<br />
=== Column number: 26 (Annotations_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for molecule A. For example:<br />
<pre>This protein has an apparent MW of 25 kDa|This protein binds 7 zinc molecules</pre><br />
<br />
Some databases may use <tt>dataset:<em>*</em></tt> or <tt>data-processing:<em>*</em></tt> (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 27 (Annotations_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 26.<br />
<br />
=== Column number: 28 (Annotations_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for the interaction. For example:<br />
<pre>figure-legend:F1A|prediction score:432|comment:prediction based on phage display consensus|author-confidence:8|comment:AD-ORFeome library used in the experiment.</pre><br />
The prefixes used before the <tt>:</tt> (like "comment") are database-specific and not controlled.<br />
<br />
Some databases may use ''<tt>dataset:*</tt>'' or ''<tt>data-processing:*</tt>'' (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 29 (Host_organism_taxid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||The taxonomy identifier of the host organism where the interaction was experimentally demonstrated<br />
|-<br />
|Example: || <pre>taxid:10090(Mus musculus)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This may differ from the taxonomy identifier associated with the interactors. Other possible entries are: <br />
<br />
* <tt>taxid:-1(in vitro)</tt><br />
* <tt>taxid:-4(in vivo)</tt><br />
<br />
A dash (<tt>-</tt>) will be used when no information about the host organism is available.<br />
<br />
<tt>taxid:32644(unidentified)</tt> will be used when the source specifies the host organism taxonomy identifier as 32644.<br />
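The values described above can be turned into a number and a name with a short Python sketch like the one below. It assumes the column holds a single <tt>taxid:</tt> value or a dash, as in the examples given here; <tt>parse_host_taxid</tt> is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# taxid:<integer>(<optional name>); negative values such as -1 (in vitro) are allowed
_HOST_TAXID = re.compile(r"^taxid:(-?\d+)(?:\((.*)\))?$")


def parse_host_taxid(field: str) -> Optional[Tuple[int, Optional[str]]]:
    """Return (taxonomy identifier, name) for column 29, or None for a dash."""
    field = field.strip()
    if field == "-":
        return None  # no information about the host organism
    match = _HOST_TAXID.match(field)
    if match is None:
        raise ValueError(f"unexpected host organism value: {field!r}")
    return int(match.group(1)), match.group(2)
```

With the identifier as an integer, the two negative values listed above (in vitro and in vivo) can be picked out with a simple comparison such as <tt>taxid < 0</tt>.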
<br />
=== Column number: 30 (parameters_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Parameters for the interaction<br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
Internal note: use of this column is not well defined or characterized.<br />
<br />
=== Column number: 31 (Creation_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||When the entry was created<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 32 (Update_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||When the record was last updated<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 33 (Checksum_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor A. <br />
|-<br />
|Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
This column contains a universal key for interactor A.<br />
|Note}}<br />
<br />
This column may be used to identify other interactors in this file that have the exact same amino acid sequence and taxon id. <br />
<br />
The universal key listed here is the ROGID (redundant object group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
Column 3 lists database names and accessions that all have this same key. <br />
<br />
The ROGID for a protein consists of the base-64 encoded SHA-1 digest of the protein sequence concatenated with the taxonomy identifier for the protein. For complex nodes, the ROGID is calculated as the SHA-1 digest of the ROGIDs of all the protein participants, after first ordering them by ASCII-based lexicographical sorting in ascending order and concatenating them. See the iRefIndex paper for details. The base-64 encoded SHA-1 digest is always 27 characters long, so the ROGID for a protein is composed of 27 characters followed by a taxonomy identifier.<br />
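The construction above can be sketched in Python as follows. A 20-byte SHA-1 digest encodes to 28 base-64 characters ending in a single <tt>=</tt>; dropping that padding gives the 27 characters mentioned above. This is a sketch under stated assumptions, not the iRefIndex implementation: it assumes the sequence is normalised to upper case with whitespace removed before hashing, uses the standard base-64 alphabet, and gives complex ROGIDs no taxonomy suffix. The helper names are hypothetical, so check results against real file values before relying on them.

```python
import base64
import hashlib
from typing import Iterable, Tuple


def _sha1_base64(text: str) -> str:
    """Base-64 SHA-1 digest with the trailing '=' padding removed (27 chars)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def protein_rogid(sequence: str, taxid: int) -> str:
    # Assumption: upper-case the sequence and drop whitespace before hashing
    normalised = "".join(sequence.split()).upper()
    return _sha1_base64(normalised) + str(taxid)


def complex_rogid(member_rogids: Iterable[str]) -> str:
    # Python sorts ASCII strings by code point, i.e. ASCII lexicographic order
    return _sha1_base64("".join(sorted(member_rogids)))


def split_rogid(rogid: str) -> Tuple[str, str]:
    """Split a protein ROGID (optionally 'rogid:'-prefixed) into digest and taxid."""
    body = rogid[len("rogid:"):] if rogid.startswith("rogid:") else rogid
    return body[:27], body[27:]
```

For example, <tt>split_rogid("rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333")</tt> separates the 27-character digest from the taxonomy identifier <tt>83333</tt>.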
<br />
=== Column number: 34 (Checksum_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor B. <br />
|-<br />
|Example: ||<pre>rogid:AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
See notes for column 33.<br />
<br />
=== Column number: 35 (Checksum_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for this interaction<br />
|-<br />
|Example: ||<pre>rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other rows (interaction records) in this file that describe interactions between the same set of proteins from the same taxon id.<br />
<br />
The universal key listed here is the RIGID (redundant interaction group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
The RIGID consists of the ROG identifiers for each of the protein participants (see notes above) ordered by ASCII-based lexicographic sorting in ascending order, concatenated and then digested with the SHA-1 algorithm. See the iRefIndex paper for details. This identifier points to a set of redundant protein-protein interactions that involve the same set of proteins with the exact same primary sequences.<br />
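A RIGID can therefore be recomputed from the ROGIDs in columns 33 and 34, for example to check a row. The Python sketch below follows the description above: strip the <tt>rogid:</tt> prefix, sort in ASCII order, concatenate, take the SHA-1 digest, and encode it in base 64 without padding. It assumes ASCII-only identifiers and uses the participant ROGIDs exactly as given. How iRefIndex treats a repeated participant (for example in a self-interaction) is not stated here. <tt>compute_rigid</tt> is a hypothetical name.

```python
import base64
import hashlib
from typing import Iterable


def compute_rigid(rogids: Iterable[str]) -> str:
    """Sketch of a RIGID: SHA-1 over the ASCII-sorted, concatenated participant ROGIDs."""
    # Accept values as they appear in columns 33/34 ("rogid:..." ) or bare ROGIDs
    bare = [r[len("rogid:"):] if r.startswith("rogid:") else r for r in rogids]
    joined = "".join(sorted(bare))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    # 20-byte digest -> 28 base-64 characters; drop the single '=' to get 27
    return "rigid:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Because the ROGIDs are sorted first, the result does not depend on which participant was listed as interactor A.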
<br />
=== Column number: 36 (Negative) ===<br />
<br />
{|<br />
|Column type: || Boolean (true or false)<br />
|-<br />
|Description: ||Whether the interaction record provides evidence that an interaction does NOT occur<br />
|-<br />
|Example: ||<pre>false</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This value will be false for all lines in this file since iRefIndex does not include "negative" interactions from any of the source databases.<br />
<br />
<hr><br />
<br />
{{Note|<br />
COLUMNS FROM THIS POINT ONWARDS (37 AND HIGHER) ARE NOT DEFINED BY THE PSI-MITAB2.6 STANDARD.<br />
THESE COLUMNS ARE SPECIFIC TO THIS IREFINDEX RELEASE AND MAY CHANGE FROM ONE RELEASE TO ANOTHER.<br />
|Important}}<br />
<br />
=== Column number: 37 (OriginalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the protein reference that was found in the original interaction record to describe interactor A. It is a colon-delimited pair of database name and accession. It may be either the primary or secondary reference for the protein provided by the source database.<br />
<br />
For complexes this will be the ROGID of the complex.<br />
<br />
=== Column number: 38 (OriginalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 37.<br />
<br />
=== Column number: 39 (FinalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Column 37 (OriginalReferenceA) was used by the iRefIndex consolidation process to arrive at this FinalReferenceA. <br />
This database name and accession pair will usually be the same as that listed in column 37, unless the provided reference was malformed, had to be updated or was ambiguous.<br />
<br />
Examples:<br />
<br />
# The original reference is malformed. For example: <tt>RefSeq:NP 036076</tt> instead of <tt>RefSeq:NP_036076</tt>.<br />
# The original reference is incomplete. For example: <tt>PDB:1KQ1|</tt> (missing chain information). <br />
# The original reference is deprecated. For example: <tt>UniProt:Q9H233</tt> (the value of FinalReferenceA will be the latest available accession in this case).<br />
# The original reference is ambiguous. For example: a gene identifier is provided (the value of FinalReferenceA will be a protein product selected in a systematic way in this case).<br />
<br />
=== Column number: 40 (FinalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 39.<br />
<br />
=== Column number: 41 (MappingScoreA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 37) to the final protein reference (column 39).<br />
|-<br />
|Example: ||<pre>PTUO+</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains a description of mapping operations as a condensed string of letters. See the original iRefIndex paper, PMID 18823568. <br />
For complexes, this column will contain a dash (<tt>-</tt>).<br />
<br />
=== Column number: 42 (MappingScoreB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 38) to the final protein reference (column 40).<br />
|-<br />
|Example: ||<pre>SU</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 41.<br />
<br />
=== Column number: 43 (irogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor A. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal, integer-equivalent of the alphanumeric identifier in column 33 for interactor A. All interactors with the same sequence and taxon origin will have the same irogid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 44 (irogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor B.<br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 43.<br />
<br />
=== Column number: 45 (irigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for this interaction.<br />
|-<br />
|Example: ||<pre>1234</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal, integer-equivalent of the alphanumeric identifier in column 35 for this interaction. All interactions involving the same interactors (same sequence and same taxon) will have the same irigid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 46 (crogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other interactors in this file that all belong to the same canonical group.<br />
<br />
Members of a canonical group may include splice isoform products from the same or related genes. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column.<br />
<br />
See http://irefindex.uio.no/wiki/Canonicalization for a description of canonicalization.<br />
<br />
=== Column number: 47 (crogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 46.<br />
<br />
=== Column number: 48 (crigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the RIGID for this interaction calculated using the canonical ROGIDs (preceding two columns).<br />
<br />
This column may be used to identify other interactions in this file that all belong to the same canonical group.<br />
<br />
<br />
=== Column number: 49 (icrogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal, integer-equivalent of the alphanumeric canonical ROGID in column 46 for interactor A. Interactors with the same icrogid may have different sequences but are related; e.g. different splice isoforms of the same gene.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 50 (icrogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 49.<br />
<br />
=== Column number: 51 (icrigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal, integer-equivalent of the canonical RIGID. See column 48.<br />
<br />
This integer may be used to query the iRefWeb interface for the interaction record. For example:<br />
<br />
http://wodaklab.org/iRefWeb/interaction/show/13653<br />
<br />
...where 13653 is the integer, canonical RIGID.<br />
<br />
This identifier serves to group together evidence for interactions that involve the same set (or a related set) of proteins.<br />
<br />
Starting with release 6.0, this canonical RIGID is stable from one release of iRefIndex to another.<br />
<br />
=== Column number: 52 (imex_id) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||IMEx identifier if available<br />
|-<br />
|Example: ||<pre>imex:IM-12202-3</pre><br />
|-<br />
|Example: ||<pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When no information is available, a dash (<tt>-</tt>) will be used.<br />
<br />
=== Column number: 53 (edgetype) ===<br />
<br />
{|<br />
|Column type: ||Character<br />
|-<br />
|Description: ||Does the edge represent a binary interaction (X), member of complex (C) data, or a multimer (Y)?<br />
|-<br />
|Example: ||<pre>X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Edges can be labelled as either <tt>X</tt>, <tt>C</tt> or <tt>Y</tt>:<br />
<br />
;<tt>X</tt><br />
:a binary interaction with two protein participants<br />
<br />
;<tt>C</tt><br />
:denotes that this edge is a binary expansion of an interaction record that had 3 or more interactors (so-called "complex" or "n-ary" data). The expansion type is described in column 16 (expansion). In the case of iRefIndex, the expansion is always "bipartite", meaning that interactor A of this row represents the complex itself and interactor B represents a protein that is a member of this group.<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for further explanation.<br />
<br />
;<tt>Y</tt><br />
:for dimers and polymers (self-interactions involving a single protein). Interactor A will be identical to interactor B, so the graphical representation will appear as a single node connected to itself (a loop). The edge is labelled with a <tt>Y</tt> even when the number of subunits is not described in the original interaction record; the actual number of self-interacting subunits may be 2 (dimer) or more (say 5 for a pentamer). Refer to the original interaction record for more details and see column 54.<br />
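Consumers often need to treat the three edge types differently. The Python sketch below sorts rows by column 53 and rebuilds each complex by collecting interactor B from its <tt>C</tt> rows. It also reads the subunit count for <tt>Y</tt> rows from column 54, where 1 means the count is unknown. It assumes each row is a list of strings from a tab-split line, and it keys complexes on both column 1 and column 14 (interactionIdentifier) so that two source records describing the same complex are kept apart. The function and constant names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# 0-based positions of the 1-based MITAB columns used here
COL_INTERACTOR_A = 0        # column 1
COL_INTERACTOR_B = 1        # column 2
COL_INTERACTION_ID = 13     # column 14 (interactionIdentifier)
COL_EDGETYPE = 52           # column 53 (edgetype)
COL_NUM_PARTICIPANTS = 53   # column 54 (numParticipants)

Row = Sequence[str]


def split_by_edgetype(
    rows: Iterable[Row],
) -> Tuple[List[Row], Dict[Tuple[str, str], List[str]], List[Row]]:
    """Return (binary rows, complex members keyed by (complex, record), multimer rows)."""
    binary: List[Row] = []
    multimers: List[Row] = []
    complexes: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for row in rows:
        edge = row[COL_EDGETYPE]
        if edge == "X":
            binary.append(row)
        elif edge == "C":
            # interactor A stands for the complex; interactor B is one member
            key = (row[COL_INTERACTOR_A], row[COL_INTERACTION_ID])
            complexes[key].append(row[COL_INTERACTOR_B])
        elif edge == "Y":
            multimers.append(row)
        else:
            raise ValueError(f"unexpected edgetype: {edge!r}")
    return binary, dict(complexes), multimers


def multimer_subunits(row: Row) -> Optional[int]:
    """Subunit count for a Y edge, or None when column 54 holds 1 (unknown)."""
    count = int(row[COL_NUM_PARTICIPANTS])
    return None if count == 1 else count
```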
<br />
=== Column number: 54 (numParticipants) ===<br />
<br />
{|<br />
|Column type: ||Integer<br />
|-<br />
|Description: ||Number of participants in the interaction<br />
|-<br />
|Example: ||<pre>2</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
* For edges labelled <tt>X</tt> (see column 53) this value will be two. <br />
* For edges labelled <tt>C</tt>, this value will be equivalent to the number of protein interactors in the original n-ary interaction record.<br />
* For interactions labelled <tt>Y</tt>, this value will either be the number of self-interacting subunits (if present in the original interaction record) or 1 where the exact number of subunits is unknown or unspecified.<br />
<br />
{{Note|<br />
The number of participants can be greater than the number of distinct proteins involved in an interaction because a single protein can participate more than once in an interaction. Such participation is enumerated and counted to produce the value in this column.<br />
|Important}}<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=README_MITAB2.6_for_iRefIndex_9.0&diff=4091
README MITAB2.6 for iRefIndex 9.0
2012-06-07T12:42:13Z
<p>PaulBoddie: /* Column number: 17 (biological_role_A) */ Added explicit unspecified value details.</p>
<hr />
<div><div class="floatright" style="text-align: center"><br />
<br />
'''iRefIndex 9.0 Downloads'''<br />
<imagemap><br />
Image:Document-save-80x80.png<br />
default [ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/]<br />
</imagemap><br />
<br />
'''Parsing MITAB Format Data'''<br />
<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefIndex_MITAB2.6_Parser]]<br />
</imagemap><br />
</div><br />
<br />
Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Applies to iRefIndex release: 9.0<br />
<br />
Release date: 2011-11-07<br />
<br />
Download location: ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/<br>(use <tt>anonymous</tt> as the login and your email address as the password)<br />
<br />
Authors: Ian Donaldson, Sabry Razick, Paul Boddie<br />
<br />
Database: iRefIndex (http://irefindex.uio.no)<br />
<br />
Organization: Biotechnology Centre of Oslo, University of Oslo <br />
(http://www.biotek.uio.no/) <br />
<br />
[[#Description|License of the source database]].<br />
<br />
== <span style="color:#0f0086"> Description </span> ==<br />
<br />
This file describes the contents of the <br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br />
<br />
directory and the format of the tab-delimited text files contained within. Each index file follows the PSI-MITAB2.6 format with additional columns for annotating edges and nodes. Assignment of source interaction records to redundant interaction groups is described at http://irefindex.uio.no. The PSI-MITAB2.6 format plus additional columns is described below.<br />
<br />
A supplementary file lists just database:accession pairs for proteins and their mapping to irog, icrog and Entrez Gene identifiers. See<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/Mappingfiles/<br />
<br />
and README at<br />
<br />
http://irefindex.uio.no/wiki/Protein_identifier_mapping<br />
<br />
This file is precalculated from the MITAB distribution as a convenience to users.<br />
<br />
Details on the build process are available from the publication PMID 18823568.<br />
<br />
This distribution includes data consolidated using the iRefIndex method for BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPPI, MPact and OPHID.<br />
<br />
<br />
{|<br />
|Sources || http://irefindex.uio.no/wiki/Sources_iRefIndex_8.0<br />
|-<br />
|Statistics || http://irefindex.uio.no/wiki/Statistics_iRefIndex_8.0<br />
|-<br />
|Download location || ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br><br />
|}<br />
<br />
== Directory contents ==<br />
<br />
{|<br />
|<tt>README</tt> ||pointer to this file<br />
|-<br />
|<tt>xxxx.mitab.mmddyyyy.txt.zip</tt> ||individual indices in PSI-MITAB2.6 format<br><br />
|}<br />
<br />
iRefIndex data is distributed as a set of tab-delimited text files with names of the form <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>mmddyyyy</tt> represents the file's creation date.<br />
<br />
The complete index is available as <tt>All.mitab.mmddyyyy.txt.zip</tt>.<br />
<br />
Taxon specific data sets are also available for:<br />
<br />
{|<br />
|'''Organism''' ||'''Taxon Id'''<br />
|-<br />
|Homo sapiens ||9606 (human)<br />
|-<br />
|Mus musculus ||10090 (mouse)<br />
|-<br />
|Rattus norvegicus ||10116 (brown rat)<br />
|-<br />
|Caenorhabditis elegans ||6239 (nematode)<br />
|-<br />
|Drosophila melanogaster ||7227 (fruit fly)<br />
|-<br />
|Saccharomyces cerevisiae ||4932 (baker's yeast)<br />
|-<br />
|Saccharomyces cerevisiae S288c ||559292<br />
|-<br />
|Escherichia coli ||562 (E. coli)<br />
|-<br />
|Other ||other<br />
|-<br />
|All ||all<br />
|}<br />
<br />
Taxon-specific subsets of the data are named <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>xxxx</tt> is the taxonomy identifier of at least one of the interactors according to either the source interaction database or the sequence database record. Each zip-compressed file contains a single text file with the corresponding name <tt>xxxx.mitab.mmddyyyy.txt</tt>.<br />
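Because each archive holds exactly one text file, the data can be read without unpacking it to disk. The Python sketch below pulls the taxon part and creation date out of a file name and yields each line of the enclosed file as a list of columns. It assumes the <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> naming described above, UTF-8 text, and a column header line starting with <tt>#</tt>. The function names are hypothetical.

```python
import io
import re
import zipfile
from typing import BinaryIO, Iterator, List, Tuple, Union

# Assumed file name layout: <taxon id, "All" or "other">.mitab.<mmddyyyy>.txt.zip
_FILE_NAME = re.compile(
    r"^(?P<taxon>[^.]+)\.mitab\.(?P<mm>\d{2})(?P<dd>\d{2})(?P<yyyy>\d{4})\.txt\.zip$"
)


def parse_file_name(name: str) -> Tuple[str, str]:
    """Return (taxon part, creation date as yyyy-mm-dd) for a distribution file name."""
    match = _FILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return match["taxon"], f"{match['yyyy']}-{match['mm']}-{match['dd']}"


def iter_mitab_rows(source: Union[str, BinaryIO]) -> Iterator[List[str]]:
    """Yield each data line of the single text file inside the archive as columns."""
    with zipfile.ZipFile(source) as archive:
        (member,) = archive.namelist()  # each archive should hold exactly one file
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if line.startswith("#"):
                    continue  # assumption: the column header line starts with '#'
                yield line.rstrip("\r\n").split("\t")
```

For example, <tt>parse_file_name("9606.mitab.11072011.txt.zip")</tt> gives the taxon <tt>9606</tt> and the date 2011-11-07.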
<br />
In some cases, interactors may belong to other taxa if a virus-host interaction is being represented or if a protein from another organism has been used to model a protein in the specified organism. <br />
<br />
Taxonomy identifiers are provided in the data sets allowing these exceptions to be identified. The taxonomy identifiers listed are derived from the source protein sequence record. In some cases, this taxonomy identifier will be a child of the taxon listed in the file's title; for example, Escherichia coli K12 (taxonomy identifier 83333) will appear in the Escherichia coli (taxonomy identifier 562) file.<br />
<br />
A description of the NCBI taxon identifiers is available at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy <br />
<br />
The above data taxon division scheme leads to duplications; for instance, an interaction present in the mouse index could also appear in the human index if the interaction record lists protein sequence records from both human and mouse. The <tt>All.mitab.mmddyyyy</tt> file is a complete and non-redundant listing. <br />
<br />
The data format and divisions provided in this initial release were chosen in the hopes that they would be immediately useful to the largest possible set of users. Other formats and divisions are possible and we welcome your input on future releases.<br />
<br />
== Changes from last version ==<br />
<br />
This is the third release of iRefIndex in PSI-MITAB2.6 format.<br />
<br />
* RIGIDs in previous releases of iRefIndex were [[Bugzilla:242|incorrectly computed]]. Although the properties of such RIGIDs were not compromised (distinct RIGIDs should still have referred to distinct interactions), each RIGID made use of substantially less information from its components. RIGIDs in this release should now be computed correctly.<br />
* [[Bugzilla:245|Duplicate lines]] are no longer produced in the MITAB output. Previously, database records containing additional information not reproduced in the MITAB output were written to the files on a record-by-record basis. However, since these individual records provide no useful additional information purely through their presence, and since the result is merely a collection of redundant records, lines which are the same as others are now filtered out when writing the MITAB files.<br />
* Many proteins previously assigned the 4932 taxonomy identifier have been [[Bugzilla:247|recategorised]] as having taxonomy identifier 559292. Thus, for convenience, an additional 559292 file is produced alongside the existing (but substantially smaller) 4932 file to hold interactions involving proteins associated with both taxons.<br />
* [[Bugzilla:248|Interactions not involving proteins associated with a specific organism]] are now excluded from organism-specific files. Note that complexes may consist of a number of lines where interactors may have a different taxonomy identifier from that of the specific file being consulted, but in such cases there will always be a member of the complex labelled with the appropriate taxonomy identifier, and thus the complex describes a "mixed species" interaction which should be retained just as binary interactions are where one participant is native to the file and the other is "foreign".<br />
* Previously, PubMed identifiers were being given as interaction detection methods for CORUM-originating interactions. This has now been [[Bugzilla:249|resolved]].<br />
<br />
References:<br />
<br />
* http://code.google.com/p/psimi/issues/detail?id=2<br />
* http://code.google.com/p/psimi/wiki/PsimiTabFormat<br />
<br />
=== Mapping to Legacy RIGIDs ===<br />
<br />
A mapping from current to legacy RIGIDs is provided on the FTP site as <tt>legacy.txt</tt> at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/<br />
<br />
This file contains all non-canonical iRefIndex 9.0 RIGIDs mapped to legacy RIGIDs which have been computed for the interactions. Note that many of the legacy RIGIDs may not exist in previous releases of iRefIndex because other changes in the underlying data (such as taxonomy identifier changes) have occurred. Thus, even if the RIGID computation method had remained the same in iRefIndex 9.0 as in previous releases, many RIGIDs would have changed in iRefIndex 9.0 anyway.<br />
<br />
Other files are also provided to give a specific mapping from iRefIndex 9.0 to iRefIndex 8.0 and can be found at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/Mappingfiles/<br />
<br />
== Known Issues ==<br />
<br />
* We have replaced the pipe character (<tt>|</tt>) of the PDB identifiers with an underscore character (<tt>_</tt>).<br />
<br />
This decision was taken to avoid unexpected parsing problems: the PSI-MITAB format uses pipes (<tt>|</tt>) as a separator character where multiple values occur in the same column.<br />
<br />
As a result, column number 37 (OriginalReferenceA) and column number 38 (OriginalReferenceB) may differ from the original reference in such cases.<br />
<br />
== Understanding the iRefIndex MITAB format ==<br />
<br />
iRefIndex is distributed in PSI-MITAB format. Version 2.5 of the format was originally described in PMID 17925023 ([http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2189715/?tool=pubmed full text]). This file describes the columns defined by version 2.6 of the PSI-MITAB format plus columns added by iRefIndex.<br />
<br />
Since the PSI-MITAB format allows for only two interactors to be described on each line, it is best suited for describing binary interaction data (the original experiment, say yeast two-hybrid, gives a binary readout). However, other PSI-MI XML source records describe interactions involving only one interactor type (dimers or multimers) or contain associative (also known as "n-ary") interaction data from, for example, immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown. These cases are problematic for the PSI-MITAB format. This document describes exactly how we use the MITAB format to describe these alternate (non-binary) interaction types.<br />
<br />
=== What each line represents ===<br />
<br />
Each line or row in the MITAB file represents a ''single'' interaction record from one primary data source describing an interaction involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers).<br />
<br />
{{Note|Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
|Important}}<br />
<br />
<br />
Each row in this table has a natural key pointing to an original interaction record in some source database that is listed under column 14 (interactionIdentifier). For example:<br />
<br />
intact:EBI-761694<br />
<br />
{{Note|<br />
Prior to release 7.0, each line represented a ''group'' of interaction records involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers). This ''collapsed'' or non-redundant format did not allow us to easily describe meta-data associated with each source record. Therefore, we have moved to this ''expanded'' or redundant version. Users can still collapse multiple rows that all provide evidence for an interaction between the same set of proteins using the keys provided (for example, RIGIDs).<br />
}}<br />
<br />
Rows in this table that all provide evidence for an interaction between the same set of proteins can be identified using the RIGID key (redundant interaction group identifier). The RIGID is a 27-character key that is derived from the ROGIDs of the interactors involved in the interaction record. The ROGID is the base-64 encoded SHA-1 digest of the protein interactor's primary amino acid sequence, concatenated with the NCBI taxonomy identifier (see the paper for details).<br />
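To collapse the expanded format back into one entry per interaction, rows can be grouped on the RIGID in column 35. The Python sketch below maps each RIGID to the distinct source record identifiers from column 14 that support it, so the several rows contributed by one complex record count only once. It assumes rows are tab-split lists of strings and treats column 14 as an opaque string, even though it may hold several pipe-separated values. <tt>collapse_by_rigid</tt> is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence

COL_INTERACTION_ID = 13        # column 14 (interactionIdentifier), 0-based
COL_CHECKSUM_INTERACTION = 34  # column 35 (Checksum_Interaction), 0-based


def collapse_by_rigid(rows: Iterable[Sequence[str]]) -> Dict[str, List[str]]:
    """Map each RIGID to the distinct source record identifiers that support it."""
    groups: Dict[str, Dict[str, None]] = defaultdict(dict)
    for row in rows:
        # A dict keeps first-seen order and drops repeats, such as the
        # several rows that a single complex record contributes.
        groups[row[COL_CHECKSUM_INTERACTION]][row[COL_INTERACTION_ID]] = None
    return {rigid: list(ids) for rigid, ids in groups.items()}
```

The length of each list then gives a simple count of supporting source records per interaction.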
<br />
{{Note|<br />
The RIGID key is now listed (by itself) in column 35 (Checksum_Interaction) as part of the new extended PSI-MITAB format. This is a universal key that can be generated by each and every interaction database and may be included in MITAB2.6 distributions from other source databases. The intention of this key is to aid third party integration of data collected from multiple databases (for example, from PSICQUIC web services). <br />
}}<br />
<br />
=== Representation of interactions ===<br />
<br />
==== Binary interaction data ====<br />
<br />
This is the most common data type.<br />
<br />
For binary interaction data, column 53 (edgetype) will contain an X. Interactors A and B will list the two proteins for which interaction evidence is provided in the row. Users should pay close attention to columns 12 (interactionType) and 7 (Method) when deciding which binary data they wish to accept as evidence of a direct physical interaction.<br />
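<br />
A minimal sketch of such a selection, assuming each row has already been split on tabs into a list of fields (column N of this document is index N-1). Which interaction type and method identifiers to accept is the reader's decision; the function and parameter names are hypothetical.<br />
<br />
```python
EDGETYPE = 53          # column 53 (edgetype)
INTERACTION_TYPE = 12  # column 12 (interactionType)
METHOD = 7             # column 7 (Method)


def term_id(value: str) -> str:
    """Return the identifier part of a value such as 'MI:0218(physical interaction)'."""
    return value.split("(", 1)[0]


def select_binary(rows, interaction_types=None, methods=None):
    """Keep binary rows (edgetype X), optionally restricted to given MI identifiers.

    `interaction_types` and `methods` are sets such as {"MI:0218"}; None means
    no restriction on that column.
    """
    selected = []
    for row in rows:
        if row[EDGETYPE - 1] != "X":
            continue  # skip complex membership (C) and single-interactor (Y) rows
        if interaction_types is not None and term_id(row[INTERACTION_TYPE - 1]) not in interaction_types:
            continue
        if methods is not None and term_id(row[METHOD - 1]) not in methods:
            continue
        selected.append(row)
    return selected
```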
<br />
==== Complexes (a.k.a. n-ary data) ====<br />
<br />
Certain experimental methods (like immunoprecipitation) provide evidence that a group of three or more proteins is associated, but cannot provide evidence for a direct interaction between any given pair of proteins in that group.<br />
<br />
In these cases, interactor A (column 1) is used as a placeholder representing the ''complex'' or ''list'' of proteins, while interactor B lists one member of that list. The entire ''n-ary interaction record'' is therefore described using one row per member, and all of these rows share the same ''interactor A''. This method of representation is referred to as a '''bi-partite model''' since there are two kinds of nodes, corresponding to complexes and proteins.<br />
<br />
These interactions are marked by a C in column 53 (edgetype).<br />
<br />
As an example, suppose that a source interaction record contained interactors A, B and C, found by affinity purification and mass spectrometry, where a tagged version of protein A was used as the bait protein to perform the immunoprecipitation.<br />
<br />
Then we would represent the complex in the MITAB file using three lines, where X is the identifier of the complex:<br />
<br />
<pre><br />
X-A<br />
X-B<br />
X-C<br />
</pre><br />
<br />
All three entries would have the same string in column 1 (the RIGID for the complex). All three entries would have a C in column 53 (edgetype).<br />
<br />
Other databases convert an interaction record with multiple interactors (n-ary data) into a list of binary interactions (based on the spoke or matrix model) and list these binary interactions in the MITAB file. For the example above, using a '''spoke model''' to transform the data into a set of binary interactions, these data would be represented using two lines in the MITAB file:<br />
<br />
<pre><br />
A-B<br />
A-C<br />
</pre><br />
<br />
Here A is chosen as the "hub" of the spoke model since it was the "bait" protein. For experimental systems that do not have "baits" and "preys" (such as X-ray crystallography), an arbitrary protein might be chosen as the hub.<br />
<br />
Alternatively, a '''matrix model''' might be used to transform the n-ary data into a list of binary interactions. Here all pairwise combinations of interactors in the original n-ary data are represented as binary interactions. So, in the above example, the immunoprecipitated complex would be represented using three lines of the MITAB file:<br />
<br />
<pre><br />
A-B<br />
B-C<br />
A-C<br />
</pre><br />
<br />
All three methods for representing n-ary data in a MITAB file (bi-partite, spoke, and matrix) are different representations of the same data. The model type that is chosen to describe n-ary data is listed in column 16 (expansion) of the MITAB2.6 format.<br />
<br />
We have chosen the bi-partite method of representation so that spoke or matrix binary entries cannot be mistaken for true binary entries. The identifiers used for complexes will not appear in any protein database, so any programme that tries to treat complex identifiers as though they were protein identifiers will fail. This method also allows you to reconstruct the members of the original interaction record that describes a complex of proteins (say, from an affinity purification experiment). From there, you can build a spoke or matrix model yourself if you want.<br />
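<br />
Reconstructing complexes and expanding them yourself can be sketched as follows, assuming rows are lists of tab-separated fields. Members are collected from column 2 of the rows with C in column 53, grouped by column 1. The choice of hub for the spoke model is left to the caller; looking for <tt>MI:0496(bait)</tt> in column 20 (experimental_role_B) is an assumption about how member roles are recorded in complex rows, not something this page states. The function names are hypothetical.<br />
<br />
```python
from collections import defaultdict
from itertools import combinations


def complexes_from_rows(rows):
    """Map each complex identifier (column 1) to its members (column 2), in row order."""
    members = defaultdict(list)
    for row in rows:
        if row[52] == "C":  # column 53 (edgetype): complex membership
            members[row[0]].append(row[1])
    return dict(members)


def find_bait(rows, complex_id):
    """Return the member whose experimental role (column 20) is bait, or None.

    Assumption: in complex rows the member's role is recorded in column 20.
    """
    for row in rows:
        if row[52] == "C" and row[0] == complex_id and row[19].startswith("MI:0496"):
            return row[1]
    return None


def spoke(members, hub):
    """Spoke model: pair the hub with every other member (raises ValueError if hub is absent)."""
    others = list(members)
    others.remove(hub)  # remove one occurrence, so homomultimers still yield pairs
    return [(hub, member) for member in others]


def matrix(members):
    """Matrix model: all pairwise combinations of members."""
    return list(combinations(members, 2))
```
<br />
For the A, B, C example above, <tt>spoke</tt> with A as hub gives A-B and A-C, and <tt>matrix</tt> gives A-B, A-C and B-C, matching the two and three lines shown earlier.<br />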
<br />
Users are advised that other databases may use spoke and matrix model representations of complexes in the MITAB format. <br />
<br />
==== Intramolecular interactions and multimers ====<br />
<br />
These row types are rare in comparison with the types above.<br />
<br />
Sometimes source interaction records in PSI-MI format only list one interactor. These are cases where either<br />
<br />
<ol><br />
<li>an intra-molecular interaction is being represented or</li><br />
<li>a multimer (3 or more) of some protein is being represented.</li><br />
</ol> <br />
These records are difficult to represent in the PSI-MITAB format because PSI-MITAB requires that each row (interaction) list two interactors. <br />
We represent these interaction records as follows, reflecting the original record as closely as possible.<br />
<ol><br />
<li>Interactions involving only one interactor. uidA and uidB are the same and the edge type is 'Y' (column number 53 (edgetype)). Whenever the edge type is 'Y', the interaction involves only one protein (although it is given as between two interactors), and column number 54 (numParticipants) is always 1. For example:<br />
<pre>{A - A, edge type 'Y', numParticipants=1}</pre></li><br />
<li>Interactions described as involving two interactors that both refer to the same protein. These are represented as normal binary interactions with edge type 'X' (column number 53 (edgetype)), and column number 54 (numParticipants) is always 2. For example:<br />
<pre>{A - A, edge type 'X', numParticipants=2}</pre></li><br />
<li>Interactions described as involving more than two interactors that all refer to the same protein. These use the bi-partite representation described for complexes (a.k.a. n-ary data) above, with edge type 'C' (column number 53 (edgetype)). For example:<br />
<pre><br />
{C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3}<br />
</pre></li><br />
</ol><br />
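<br />
The rules in this list can be expressed as a small classifier, sketched below in Python under the assumption that rows are lists of tab-separated fields. The labels it returns are hypothetical names chosen for illustration. A single C row cannot show whether a complex is a homomultimer, so that check works on the member list of a whole complex (column 2 of all of its C rows).<br />
<br />
```python
def describe_row(row):
    """Classify one row using columns 1, 2 and 53 (uidA, uidB, edgetype)."""
    uid_a, uid_b, edgetype = row[0], row[1], row[52]
    if edgetype == "Y":
        # only one interactor: intramolecular interaction or multimer (numParticipants is 1)
        return "single interactor"
    if edgetype == "X":
        # two interactors, which may both refer to the same protein (numParticipants is 2)
        return "same protein twice" if uid_a == uid_b else "binary"
    if edgetype == "C":
        return "complex membership"
    raise ValueError(f"unexpected edgetype: {edgetype!r}")


def is_homomultimer(members):
    """True for a complex of more than two members that all refer to the same protein."""
    return len(members) > 2 and len(set(members)) == 1
```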
<br />
We draw extra attention to the fact that the RIGID (column number 35 (Checksum_Interaction)) for these interactions will be the SHA-1 digest of the ROGIDs for each of the distinct subunit types (see columns 33 (Checksum_A) and 34 (Checksum_B)). Thus interactions involving 1, 2 or more subunits of the same protein would all have the same RIGID.<br />
<br />
=== Keys for grouping together redundant interactors and interactions ===<br />
<br />
A number of keys are provided in this file to help users group together rows that all provide evidence for some kind of interaction between the same set (or a related set) of proteins. See columns 33-35 (Checksum_A, Checksum_B and Checksum_Interaction) and 43-51 (integer identifier and canonical data columns).<br />
<br />
The process of creating keys that group proteins and interactions into canonical groups was developed after the original paper and is described in the [[Canonicalization]] document.<br />
<br />
=== Provenance data ===<br />
<br />
Provenance data (where we retrieved source records from and how we mapped interactors and interactions to ROGIDs) is described in columns 37-42 (original and final references plus mapping scores).<br />
<br />
== License ==<br />
<br />
Data on this public FTP site are released under the Creative <br />
Commons Attribution License http://creativecommons.org/licenses/by/2.5/. <br />
This means that you are free to use, modify and redistribute these data <br />
for personal or commercial use so long as you provide appropriate <br />
credit. See next section.<br />
<br />
<br />
Copyright © 2008-2011 Ian Donaldson<br />
<br />
== Citation ==<br />
<br />
Credit should include citing the iRefIndex paper (PMID 18823568) and any of the <br />
source databases upon which this resource is based. See <br />
http://irefindex.uio.no for appropriate citations.<br />
<br />
== Disclaimer ==<br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY <br />
WARRANTY; without even the implied warranty of MERCHANTABILITY or <br />
FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
== Description of PSI-MITAB2.6 file ==<br />
<br />
Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
=== Column number: 1 (uidA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor A. <br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains an identifier, taken from a major database, for the protein representing interactor A. A UniProt or a RefSeq accession is provided (in that order of preference) wherever possible. See column 3 for a list of prefixes that may be employed in this column, in addition to the following:<br />
<br />
;<tt>complex</tt><br />
:If interactor A is being used to represent a complex, then the rogid for the complex will be listed here, such as the following:<br />
<br />
<pre>complex:xBr9cTXgzPLNxsaKiYyHcoEm/DM</pre><br />
<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
In rare cases, a rogid may appear here if a protein interactor has a sequence but no known, valid ''<tt>database:accession</tt>'' pair.<br />
<br />
=== Column number: 2 (uidB)===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor B.<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 1.<br />
<br />
=== Column number: 3 (altA)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691|rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|irogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
All ''<tt>database:accession</tt>'' pairs listed in Column 3 point to protein records that describe the exact same sequence from the same taxon.<br />
<br />
Each pipe-delimited entry is a database_name:accession pair delimited by a colon. Database names are taken from the MI controlled vocabulary at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database references listed in this column may include the following:<br />
<br />
;<tt>uniprotkb</tt><br />
:The accessions this protein is known by in UniProt (http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line. <br />
;<tt>refseq</tt><br />
:If a protein accession exists in the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/), that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession.<br />
;<tt>entrezgene/locuslink</tt><br />
:NCBI Gene identifiers for the gene encoding this protein, as given in the GeneID column of ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq for the protein's accession.version.<br />
;<tt><em>other</em></tt><br />
:If none of the three identifier types are available then other <tt><em>databasename</em>:<em>accession</em></tt> pairs will be listed. These database names may not follow the MI controlled vocabulary.<br />
<br />
Example:<br />
<br />
<pre>emb:CAA44868.1|gb:AAA23715.1|gb:AAB02995.1|emb:CAA56736.1|uniprot:P24991</pre><br />
<br />
;<tt>rogid</tt><br />
:Column 33 repeated here for convenience.<br />
<br />
;<tt>irogid</tt><br />
:Column 43 repeated here for convenience.<br />
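<br />
A minimal sketch for splitting this column (and the other pipe-delimited identifier columns) into pairs is shown below. It splits each entry at the first colon only, so accessions that themselves contain colons are kept whole; it assumes that no entry contains a literal pipe, which is not guaranteed for quoted values in other databases' files. Entries without a colon (such as <tt>NA</tt>) are returned with a database name of <tt>None</tt>. The function names are hypothetical.<br />
<br />
```python
def parse_pairs(field):
    """Split 'db1:acc1|db2:acc2|...' into a list of (database, accession) tuples."""
    if field in ("", "-"):
        return []  # '-' marks an empty column
    pairs = []
    for entry in field.split("|"):
        database, sep, accession = entry.partition(":")
        pairs.append((database, accession) if sep else (None, entry))
    return pairs


def first_accession(field, preferred=("uniprotkb", "refseq")):
    """Return the first accession from the preferred databases, in order of preference."""
    pairs = parse_pairs(field)
    for database in preferred:
        for name, accession in pairs:
            if name == database:
                return accession
    return None
```
<br />
For the example in this column's table, <tt>first_accession</tt> returns <tt>P23367</tt>.<br />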
<br />
{{Note|<br />
The rogid of a complex or an n-ary interaction is the rigid of that <br />
interaction. However, the irogid of the complex is not the irigid.<br />
The irogid for the complex is an integer that does not overlap <br />
with any protein irogid.<br />
}}<br />
<br />
=== Column number: 4 (altB)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722|refseq:NP_417308|entrezgene/locuslink:947299</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 3. (Columns 34 and 44 are related to this column.)<br />
<br />
=== Column number: 5 (aliasA) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL|crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|icrogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each pipe-delimited entry is a <tt><em>database name</em>:<em>alias</em></tt> pair delimited by a <br />
colon. Database names are taken from the PSI-MI controlled vocabulary <br />
at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database names and sources listed in this column may include the following:<br />
<br />
;<tt>uniprotkb:<em>entry name</em></tt><br />
:the entry name given by UniProt. See the description for "Entry name" in the section of http://au.expasy.org/sprot/userman.html#ID_line concerning the "ID (IDentification)" line of the flat file<br />
;<tt>entrezgene/locuslink:<em>symbol</em></tt><br />
:the NCBI gene symbol for the gene encoding this protein. See the section in ftp://ftp.ncbi.nlm.nih.gov/gene/README for <tt>gene_info</tt>, specifically details for the <tt>Symbol</tt> column<br />
;<tt>crogid</tt><br />
:Column 46 repeated here for convenience.<br />
;<tt>icrogid</tt><br />
:Column 49 repeated here for convenience.<br />
;<tt>other db:accession pairs</tt><br />
:Other db:accession pairs that all belong to the same canonical group may be added (after icrogid). These are purely meant to facilitate look-up by PSICQUIC and other services; these sequences are related to (but not identical to) the sequence of interactor A.<br />
;<tt>NA</tt><br />
:<tt>NA</tt> may be listed here if aliases are <em>not available</em><br />
<br />
=== Column number: 6 (aliasB) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTH_ECOLI|entrezgene/locuslink:mutH</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 5. (Columns 47 and 50 are related to this column.)<br />
<br />
=== Column number: 7 (Method) ===<br />
<br />
{|<br />
|Column type: ||String <br />
|-<br />
|Description: ||Interaction detection method<br />
|-<br />
|Example: ||<pre>MI:0399(2h fragment pooling)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only a single method will appear in this column. Previously, multiple methods appeared.<br />
}}<br />
<br />
Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers.<br />
<br />
The interaction detection method is from the original record. Path for PSI-MI 2.5:<br />
<br />
<pre>entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/</pre><br />
<br />
<br />
{{Note|<br />
If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then <tt>MI:0000</tt> will appear before the shortLabels.<br />
}}<br />
<br />
<tt>NA</tt> or <tt>-1</tt> may appear in place of a recognised shortLabel.<br />
<br />
For example:<br />
<br />
<pre><br />
MI:0000(-1)<br />
MI:0000(NA)<br />
</pre><br />
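<br />
A sketch for separating the term identifier from the short label, assuming values follow the <tt>MI:nnnn(label)</tt> pattern shown above. It maps the placeholders described here (<tt>MI:0000</tt>, <tt>NA</tt> and <tt>-1</tt>) to <tt>None</tt>; the function name is hypothetical. The same pattern applies to other controlled-vocabulary columns such as 12 and 13.<br />
<br />
```python
import re

# identifier, then the short label in parentheses; the label itself may contain parentheses
MI_TERM = re.compile(r"^(MI:\d{4})\((.*)\)$")


def parse_mi_term(value):
    """Return (term_id, short_label); unknown or missing parts become None."""
    match = MI_TERM.match(value)
    if match is None:
        return None, None  # e.g. a bare 'NA' or '-'
    term_id, label = match.groups()
    if term_id == "MI:0000":
        term_id = None  # no controlled vocabulary identifier could be found
    if label in ("NA", "-1"):
        label = None
    return term_id, label
```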
<br />
=== Column number: 8 (author) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Author references for the publications supporting the interaction<br />
|-<br />
|Example: ||<pre>hall-1999-1|hall-1999-2|mansour-2001-1|mansour-2001-2|hall-1999</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited list of the surnames of authors of publications in which the interaction has been shown.<br />
<br />
{{Note|<br />
This column will usually include only one author name reference. However, some experimental evidence has secondary references, which may be included here.<br />
This field also includes references that are not author names, as in the following examples:<br />
* OPHID Predicted Protein Interaction<br />
* HPRD Text Mining Confirmation<br />
* MINT Text Mining Confirmation<br />
}}<br />
<br />
=== Column number: 9 (pmids) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||PubMed Identifiers<br />
|-<br />
|Example: ||<pre>pubmed:9880500|pubmed:11585365</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is a non-redundant list of PubMed identifiers pointing to literature that supports the interaction. <br />
According to MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>pubmed:12345</tt>.<br />
The source database name is always <tt>pubmed</tt>.<br />
<br />
{{Note|<br />
This column will usually include only one PubMed reference that describes where the experimental evidence is found. In some cases, secondary references are provided by the source database and will be included here.<br />
}}<br />
<br />
<br />
The special value <tt>-</tt> may appear in place of the identifiers.<br />
<br />
=== Column number: 10 (taxa) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor A<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
<br />
|}<br />
<br />
'''Notes'''<br />
<br />
The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be corrected from what was provided by the source database. See the methods section of the iRefIndex paper for more details. See also the NCBI taxonomy database at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>taxid:12345</tt>. The source database name is listed as taxid since it is always NCBI's taxonomy database. The value in this column will be <tt>NA</tt> if the interactor is a complex.<br />
<br />
=== Column number: 11 (taxb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor B<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 10.<br />
<br />
=== Column number: 12 (interactionType) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Interaction Type from controlled vocabulary or short label<br />
|-<br />
|Example: ||<pre>MI:0218(physical interaction)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only one interaction type will be present in each line of the file.<br />
}}<br />
<br />
The interaction type is taken from the PSI-MI controlled vocabulary (when available in the interaction record) and represented as follows:<br />
<br />
<pre>database:identifier(interaction type)</pre><br />
<br />
Path for PSI-MI 2.5:<br />
<br />
<pre>entrySet/entry/interactionList/interaction/interactionType/names/shortLabel</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for interaction types.<br />
<br />
{{Note|<br />
If the MI controlled vocabulary identifier was not provided by the source database, but a text description was provided, then an attempt was made to map the text to the correct controlled vocabulary term identifier.<br />
If this was not possible then <tt>MI:0000</tt> is listed.<br />
|Change}}<br />
<br />
<tt>NA</tt> may be listed here if the interaction type is not available (meaning that we could not find the interaction type in the record provided by the source database).<br />
<br />
=== Column number: 13 (sourcedb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source database for this interaction record <br />
|-<br />
|Example: ||<pre>MI:0469(intact)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(source name)</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for database sources.<br />
<br />
{{Note|<br />
Only one source database will be listed in each row.<br />
|Change}}<br />
<br />
=== Column number: 14 (interactionIdentifier) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||source interaction-database and accession<br />
|-<br />
|Example: ||<pre>intact:EBI-761694|rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA|irigid:1234|edgetype:X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt><em>database name</em>:<em>identifier</em></tt> pair. <br />
<br />
{{Note|<br />
The source database is listed first. Additional information is pipe-delimited and presented here for the convenience of PSICQUIC web-service users (these services presently truncate this file at column 15 as they only support MITAB2.5). See columns 35, 45 and 53.<br />
|Change}}<br />
<br />
The source database names that appear in this column are taken from the<br />
PSI-MI controlled vocabulary at the following location (where possible):<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
If an interaction record identifier is not provided by the source database, this entry will appear as <tt><em>database-name</em>:-</tt> with the identifier region replaced with a dash (<tt>-</tt>).<br />
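<br />
A minimal sketch for splitting this column into the source record and the additional keys, assuming the layout shown in the example (source reference first, then <tt>key:value</tt> entries). A dash as the source identifier is returned as <tt>None</tt>; the function and dictionary key names are hypothetical.<br />
<br />
```python
def parse_interaction_identifier(field):
    """Split column 14 into the source database, source identifier and extra keys."""
    first, *rest = field.split("|")
    source_db, _, source_id = first.partition(":")
    result = {
        "source_db": source_db,
        "source_id": None if source_id == "-" else source_id,  # '-' means no identifier
    }
    for entry in rest:
        key, _, value = entry.partition(":")
        result[key] = value  # e.g. rigid, irigid, edgetype
    return result
```
<br />
This lets PSICQUIC users, who receive only the first 15 columns, recover the RIGID and edge type of each row.<br />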
<br />
=== Column number: 15 (confidence) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Confidence scores<br />
|-<br />
|Example: ||<pre>lpr:1|hpr:12|np:1|PSICQUIC entries are truncated here. See irefindex.uio.no</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt>''scoreName'':''score''</tt> pair. Three confidence <br />
scores are provided: <tt>lpr</tt>, <tt>hpr</tt> and <tt>np</tt>.<br />
<br />
PubMed Identifiers (PMIDs) point to literature references that support <br />
an interaction. A PMID may be used to support more than one interaction. <br />
<br />
The lpr score (lowest PMID re-use) is the lowest number of distinct <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A value of one indicates <br />
that at least one of the PMIDs supporting this interaction has never <br />
been used to support any other interaction. This likely indicates that <br />
only one interaction was described by that reference and that the <br />
present interaction is not derived from high throughput methods.<br />
<br />
The hpr score (highest PMID re-use) is the highest number of <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A high value (e.g. greater <br />
than 50) indicates that one PMID describes at least 50 other <br />
interactions and it is more likely that high-throughput methods were <br />
used.<br />
<br />
The np score (number of PMIDs) is the total number of unique PMIDs used to <br />
support the interaction described in this row.<br />
<br />
<tt>-</tt> may appear in the score field, indicating the absence of a score value.<br />
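<br />
The three scores can be computed from the definitions above. This is a minimal sketch, assuming the input is a list of (RIGID, PMID) pairs collected from all rows of the file; whether the iRefIndex pipeline counts interactions in exactly this way (for example, how it treats placeholder PMIDs such as <tt>-</tt>) is not stated here, so the output is illustrative only. The function name is hypothetical.<br />
<br />
```python
from collections import defaultdict


def pmid_scores(evidence):
    """Compute (lpr, hpr, np) for each RIGID from (rigid, pmid) pairs."""
    rigids_per_pmid = defaultdict(set)  # PMID -> distinct interactions it supports
    pmids_per_rigid = defaultdict(set)  # RIGID -> distinct PMIDs supporting it
    for rigid, pmid in evidence:
        rigids_per_pmid[pmid].add(rigid)
        pmids_per_rigid[rigid].add(pmid)

    scores = {}
    for rigid, pmids in pmids_per_rigid.items():
        reuse = [len(rigids_per_pmid[pmid]) for pmid in pmids]
        scores[rigid] = (min(reuse), max(reuse), len(pmids))  # lpr, hpr, np
    return scores
```
<br />
For example, if a hypothetical interaction r1 is supported by one PMID shared with two other interactions and by a second PMID used for nothing else, it gets lpr 1, hpr 3 and np 2.<br />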
<br />
----<br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (16 - 31) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT<br />
|Note}}<br />
<br />
=== Column number: 16 (expansion) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Model used to convert n-ary data into binary data for the purpose of export in a MITAB file<br />
|-<br />
|Example: ||<pre>bipartite</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this column will always contain either <tt>bipartite</tt> or <tt>none</tt>.<br />
<br />
Other databases may use either <tt>spoke</tt> or <tt>matrix</tt> or <tt>none</tt> in this column.<br />
<br />
See <br />
[[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
=== Column number: 17 (biological_role_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor A<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When provided by the source database, this includes single entries such as <tt>MI:0501(enzyme)</tt>, <tt>MI:0502(enzyme target)</tt>, <tt>MI:0580(electron acceptor)</tt>, or <tt>MI:0499(unspecified role)</tt>.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role.<br />
<br />
For complexes, and when no role is specified, this column will contain <tt>MI:0000(unspecified)</tt>.<br />
<br />
=== Column number: 18 (biological_role_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor B<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 17.<br />
<br />
=== Column number: 19 (experimental_role_A) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any was provided by the source database) that was played by interactor A.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI for definitions of bait and prey, as well as other possible values of experimental role that may appear in this column for other databases.<br />
<br />
For complexes, and when no role is specified, this column will contain the following:<br />
<br />
<pre>MI:0499(unspecified role)</pre><br />
<br />
=== Column number: 20 (experimental_role_B) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any) that was played by interactor B.<br />
<br />
See notes above for column 19.<br />
<br />
=== Column number: 21 (interactor_type_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Type of molecule of interactor A<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this will always be one of...<br />
<br />
<pre><br />
MI:0326(protein)<br />
MI:0315(protein complex)<br />
</pre><br />
<br />
=== Column number: 22 (interactor_type_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Type of molecule of interactor B<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See column 21.<br />
<br />
=== Column number: 23 (xrefs_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for molecule A. For example, Gene Ontology identifiers or OMIM identifiers.<br />
<pre>omim:152430(longevity)|go:"GO:0016233"(telomere capping)</pre><br />
<br />
=== Column number: 24 (xrefs_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 23.<br />
<br />
=== Column number: 25 (xrefs_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for the interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for the interaction. For example, Gene Ontology identifiers or OMIM identifiers.<br />
<pre>go:"GO:0048786"(presynaptic active zone)</pre><br />
<br />
=== Column number: 26 (Annotations_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for molecule A. For example:<br />
<pre>This protein has an apparent MW of 25 kDa|This protein binds 7 zinc molecules</pre><br />
<br />
Some databases may use <tt>dataset:<em>*</em></tt> or <tt>data-processing:<em>*</em></tt> (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 27 (Annotations_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 26.<br />
<br />
=== Column number: 28 (Annotations_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for the interaction. For example:<br />
<pre>figure-legend:F1A|prediction score:432|comment:prediction based on phage display consensus|author-confidence:8|comment:AD-ORFeome library used in the experiment.</pre><br />
The prefixes used before the <tt>:</tt> (like "comment") are database specific and not controlled.<br />
<br />
Some databases may use ''<tt>dataset:*</tt>'' or ''<tt>data-processing:*</tt>'' (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 29 (Host_organism_taxid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||The taxonomy identifier of the host organism where the interaction was experimentally demonstrated<br />
|-<br />
|Example: || <pre>taxid:10090(Mus musculus)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This may differ from the taxonomy identifier associated with the interactors. Other possible entries are: <br />
<br />
* <tt>taxid:-1(in vitro)</tt><br />
* <tt>taxid:-4(in vivo)</tt><br />
<br />
A dash (<tt>-</tt>) will be used when no information about the host organism is available.<br />
<br />
<tt>taxid:32644(unidentified)</tt> will be used when the source specifies the host organism taxonomy identifier as 32644.<br />
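<br />
A small Python sketch for reading this column follows. The function name is hypothetical, and it assumes the column holds a single <tt>taxid:</tt> value with an optional parenthesised name, as in the examples above; negative identifiers such as <tt>-1</tt> and <tt>-4</tt> are kept as integers.<br />
<br />
```python
import re
from typing import Optional, Tuple

# taxid:<integer>(<optional name>); the integer may be negative (-1, -4).
_TAXID = re.compile(r"^taxid:(-?\d+)(?:\((.*)\))?$")


def parse_host_organism(field: str) -> Optional[Tuple[int, Optional[str]]]:
    """Return (taxid, name) for a column 29 value, or None for a dash."""
    field = field.strip()
    if field == "-":
        # No information about the host organism is available.
        return None
    match = _TAXID.match(field)
    if match is None:
        raise ValueError(f"unexpected host organism value: {field!r}")
    return int(match.group(1)), match.group(2)


print(parse_host_organism("taxid:10090(Mus musculus)"))  # (10090, 'Mus musculus')
print(parse_host_organism("taxid:-1(in vitro)"))         # (-1, 'in vitro')
```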
<br />
=== Column number: 30 (parameters_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Parameters for the interaction<br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
Internal note: the use of this column is not well defined or characterized.<br />
<br />
=== Column number: 31 (Creation_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||When was this entry created?<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 32 (Update_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||When was this record last updated?<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 33 (Checksum_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor A. <br />
|-<br />
|Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
This column contains a universal key for interactor A.<br />
|Note}}<br />
<br />
This column may be used to identify other interactors in this file that have the exact same amino acid sequence and taxon id. <br />
<br />
The universal key listed here is the ROGID (redundant object group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
Column 3 lists database names and accessions that all have this same key. <br />
<br />
The ROGID for a protein consists of the base-64 version of the SHA-1 key for the protein sequence, concatenated with the taxonomy identifier for the protein. For complex nodes, the ROGID is calculated as the SHA-1 digest of the ROGIDs of all the protein participants, after first ordering them by ASCII-based lexicographical sorting in ascending order and concatenating them. See the iRefIndex paper for details. The SHA-1 key is always 27 characters long, so a protein ROGID is composed of these 27 characters followed by the taxonomy identifier.<br />
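<br />
The recipe described above can be sketched in Python as follows. This is an illustration, not the iRefIndex implementation: the function names are hypothetical, and it assumes (following the SEGUID convention, SEGUID being one of the sequence resources iRefIndex uses) that the sequence is upper-cased with whitespace removed before hashing, and that the trailing <tt>=</tt> padding of the base-64 encoding is dropped, which is what turns a 20-byte SHA-1 digest into 27 characters. Check the iRefIndex paper before relying on these values matching published ROGIDs.<br />
<br />
```python
import base64
import hashlib
from typing import Iterable


def seguid(text: str) -> str:
    """Base-64 SHA-1 digest of text with the trailing '=' padding removed.

    A SHA-1 digest is 20 bytes; base-64 encodes it as 28 characters ending in
    a single '=', so dropping the padding leaves 27 characters.
    """
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def protein_rogid(sequence: str, taxid: int) -> str:
    """Sketch of a protein ROGID: 27-character sequence key + taxonomy identifier."""
    # Assumption (SEGUID convention): whitespace removed, residues upper-cased.
    cleaned = "".join(sequence.split()).upper()
    return seguid(cleaned) + str(taxid)


def complex_rogid(participant_rogids: Iterable[str]) -> str:
    """Sketch of a complex ROGID: digest of the sorted, concatenated participant ROGIDs."""
    # Python orders ASCII strings code point by code point, i.e. ASCII-lexicographically.
    # Following the description above, no taxonomy identifier is appended for complexes.
    return seguid("".join(sorted(participant_rogids)))


p1 = protein_rogid("MKV", 83333)
p2 = protein_rogid("mag\n", 83333)
print(len(p1))                                             # 32
print(p1.endswith("83333"))                                # True
print(complex_rogid([p1, p2]) == complex_rogid([p2, p1]))  # True
```
<br />
Sorting before concatenation is what makes the complex key independent of the order in which a source database lists the participants.<br />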
<br />
=== Column number: 34 (Checksum_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor B. <br />
|-<br />
|Example: ||<pre>rogid:AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
See notes for column 33.<br />
<br />
=== Column number: 35 (Checksum_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for this interaction<br />
|-<br />
|Example: ||<pre>rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other rows (interaction records) in this file that describe interactions between the same set of proteins from the same taxon id.<br />
<br />
The universal key listed here is the RIGID (redundant interaction group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
The RIGID consists of the ROG identifiers for each of the protein participants (see notes above) ordered by ASCII-based lexicographic sorting in ascending order, concatenated and then digested with the SHA-1 algorithm. See the iRefIndex paper for details. This identifier points to a set of redundant protein-protein interactions that involve the same set of proteins with the exact same primary sequences.<br />
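<br />
Following the description above, a RIGID can be sketched in Python. The helper name is hypothetical, the stripping of the base-64 <tt>=</tt> padding is an assumption inferred from the 27-character example, and the <tt>rogid:</tt> prefix shown in columns 33 and 34 is assumed not to be part of the hashed input. The two ROGIDs used below are simply the column 33 and 34 examples; no claim is made about the exact RIGID they produce.<br />
<br />
```python
import base64
import hashlib
from typing import Iterable


def rigid(participant_rogids: Iterable[str]) -> str:
    """Sketch of a RIGID: sort the ROGIDs, concatenate them, digest with SHA-1.

    Assumptions: ROGIDs are given without the 'rogid:' prefix, and the base-64
    '=' padding is stripped (leaving 27 characters).
    """
    # Python compares ASCII strings code point by code point, i.e. ASCII order.
    joined = "".join(sorted(participant_rogids))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


a = "hhZYhMtr5JC1lGIKtR1wxHAd3JY83333"
b = "AhmYiMtz8lR12Gixt91txbAd3JY83333"
# The order in which the participants are listed does not matter.
print(rigid([a, b]) == rigid([b, a]))  # True
print(len(rigid([a, b])))              # 27
```
<br />
Because the participants are sorted first, interaction records from different source databases that involve the same set of sequences end up with the same key.<br />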
<br />
=== Column number: 36 (Negative) ===<br />
<br />
{|<br />
|Column type: || Boolean (true or false)<br />
|-<br />
|Description: ||Does the interaction record provide evidence that some interaction does NOT occur?<br />
|-<br />
|Example: ||<pre>false</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This value will be false for all lines in this file since iRefIndex does not include "negative" interactions from any of the source databases.<br />
<br />
<hr><br />
<br />
{{Note|<br />
COLUMNS FROM THIS POINT ON (37 AND HIGHER) ARE NOT DEFINED BY THE PSI-MITAB 2.6 STANDARD.<br />
THESE COLUMNS ARE SPECIFIC TO THIS IREFINDEX RELEASE AND MAY CHANGE FROM ONE RELEASE TO ANOTHER.<br />
|Important}}<br />
<br />
=== Column number: 37 (OriginalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the protein reference that was found in the original interaction record to describe interactor A. It is a colon-delimited pair of database name and accession. It may be either the primary or secondary reference for the protein provided by the source database.<br />
<br />
For complexes this will be the ROGID of the complex.<br />
<br />
=== Column number: 38 (OriginalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 37.<br />
<br />
=== Column number: 39 (FinalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Column 37 (OriginalReferenceA) was used by the iRefIndex consolidation process to arrive at this FinalReferenceA. <br />
This database name and accession pair will usually be the same as that listed in column 37, unless the provided reference was malformed, had to be updated or was ambiguous.<br />
<br />
Examples:<br />
<br />
# The original reference is malformed. For example: <tt>RefSeq:NP 036076</tt> instead of <tt>RefSeq:NP_036076</tt>.<br />
# The original reference is incomplete. For example: <tt>PDB:1KQ1|</tt> (missing chain information). <br />
# The original reference is deprecated. For example: <tt>UniProt:Q9H233</tt> (the value of FinalReferenceA will be the latest available accession in this case).<br />
# The original reference is ambiguous. For example: a gene identifier is provided (the value of FinalReferenceA will be a protein product selected in a systematic way in this case).<br />
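<br />
To see which interactors were changed by this process, columns 37 and 38 can be compared with columns 39 and 40. The Python sketch below is hypothetical: it assumes a row has already been split on tabs into a list, so that column N is at index N-1, it treats any textual difference as a remapping, and its example row is made up apart from the reference values quoted above.<br />
<br />
```python
from typing import List, Tuple

# 0-based indices of the 1-based columns 37-40.
ORIGINAL_A, ORIGINAL_B, FINAL_A, FINAL_B = 36, 37, 38, 39


def split_reference(reference: str) -> Tuple[str, str]:
    """Split a colon-delimited 'database:accession' pair on its first colon."""
    database, sep, accession = reference.partition(":")
    if not sep:
        raise ValueError(f"not a database:accession pair: {reference!r}")
    return database, accession


def remapped_interactors(row: List[str]) -> List[str]:
    """Return 'A' and/or 'B' for interactors whose final reference differs from the original."""
    changed = []
    if row[ORIGINAL_A] != row[FINAL_A]:
        changed.append("A")
    if row[ORIGINAL_B] != row[FINAL_B]:
        changed.append("B")
    return changed


# Made-up row: only the four reference columns are filled in.
row = ["-"] * 54
row[ORIGINAL_A], row[FINAL_A] = "RefSeq:NP 036076", "RefSeq:NP_036076"
row[ORIGINAL_B] = row[FINAL_B] = "uniprotkb:P23367"
print(remapped_interactors(row))      # ['A']
print(split_reference(row[FINAL_A]))  # ('RefSeq', 'NP_036076')
```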
<br />
=== Column number: 40 (FinalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 39.<br />
<br />
=== Column number: 41 (MappingScoreA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 37) to the final protein reference (column 39). <br />
|-<br />
|Example: ||<pre>PTUO+</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains a description of mapping operations as a condensed string of letters. See the original iRefIndex paper, PMID 18823568. <br />
For complexes, this column will contain a dash (<tt>-</tt>).<br />
<br />
=== Column number: 42 (MappingScoreB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 38) to the final protein reference (column 40). <br />
|-<br />
|Example: ||<pre>SU</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 41.<br />
<br />
=== Column number: 43 (irogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor A. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal integer equivalent of the alphanumeric identifier in column 33 for interactor A. All interactors with the same sequence and taxon origin will have the same irogid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 44 (irogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor B.<br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 43.<br />
<br />
=== Column number: 45 (irigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for this interaction.<br />
|-<br />
|Example: ||<pre>1234</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal integer equivalent of the alphanumeric identifier in column 35 for this interaction. All interactions involving the same interactors (same sequence and same taxon) will have the same irigid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 46 (crogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other interactors in this file that all belong to the same canonical group.<br />
<br />
Members of a canonical group may include splice isoform products from the same or related genes. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column.<br />
<br />
See http://irefindex.uio.no/wiki/Canonicalization for a description of canonicalization.<br />
<br />
=== Column number: 47 (crogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 46.<br />
<br />
=== Column number: 48 (crigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the RIGID for this interaction calculated using the canonical ROGIDs (preceding two columns).<br />
<br />
This column may be used to identify other interactions in this file that all belong to the same canonical group.<br />
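<br />
Grouping rows on this column is straightforward once the file is split into tab-delimited fields. In the Python sketch below, the function names are hypothetical, the sample lines are made up, and skipping lines that start with <tt>#</tt> assumes that the file's header line is marked that way.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

CRIGID = 47  # column 48 (1-based) is at index 47 (0-based)


def read_mitab_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield tab-split rows, skipping blank lines and (assumed) '#' header lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line and not line.startswith("#"):
            yield line.split("\t")


def group_by_crigid(rows: Iterable[List[str]]) -> Dict[str, List[List[str]]]:
    """Collect every interaction row under its canonical RIGID."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for row in rows:
        groups[row[CRIGID]].append(row)
    return dict(groups)


def sample_line(crigid: str) -> str:
    # Made-up 54-column line with only the crigid column filled in.
    fields = ["-"] * 54
    fields[CRIGID] = crigid
    return "\t".join(fields) + "\n"


lines = ["#header\n", sample_line("c1"), sample_line("c1"), sample_line("c2")]
groups = group_by_crigid(read_mitab_rows(lines))
print({key: len(rows) for key, rows in groups.items()})  # {'c1': 2, 'c2': 1}
```
<br />
With a real file, an open file handle (for example from <tt>open(path, encoding="utf-8")</tt>) can be passed to <tt>read_mitab_rows</tt> in place of the list of sample lines.<br />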
<br />
<br />
=== Column number: 49 (icrogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal integer equivalent of the alphanumeric canonical ROGID in column 46 for interactor A. Interactors with the same icrogid may have different sequences but are related, e.g. different splice isoforms of the same gene.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 50 (icrogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 49.<br />
<br />
=== Column number: 51 (icrigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal integer equivalent of the canonical RIGID. See column 48.<br />
<br />
This integer may be used to query the iRefWeb interface for the interaction record. For example:<br />
<br />
http://wodaklab.org/iRefWeb/interaction/show/13653<br />
<br />
...where 13653 is the integer, canonical RIGID.<br />
<br />
This identifier serves to group together evidence for interactions that involve the same set (or a related set) of proteins.<br />
<br />
Starting with release 6.0, this canonical RIGID is stable from one release of iRefIndex to another.<br />
<br />
=== Column number: 52 (imex_id) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||IMEx identifier if available<br />
|-<br />
|Example: ||<pre>imex:IM-12202-3</pre><br />
|-<br />
|Example: ||<pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When no information is available, a dash (<tt>-</tt>) will be used.<br />
<br />
=== Column number: 53 (edgetype) ===<br />
<br />
{|<br />
|Column type: ||Character<br />
|-<br />
|Description: ||Does the edge represent a binary interaction (X), membership of a complex (C), or a multimer (Y)?<br />
|-<br />
|Example: ||<pre>X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Edges can be labelled as either <tt>X</tt>, <tt>C</tt> or <tt>Y</tt>:<br />
<br />
;<tt>X</tt><br />
:a binary interaction with two protein participants<br />
<br />
;<tt>C</tt><br />
:denotes that this edge is a binary expansion of an interaction record that had 3 or more interactors (so-called "complex" or "n-ary" data). The expansion type is described in column 16 (expansion). In the case of iRefIndex, the expansion is always "bipartite", meaning that interactor A of this row represents the complex itself and interactor B represents a protein that is a member of this complex.<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for further explanation.<br />
<br />
;<tt>Y</tt><br />
:for dimers and polymers. In the case of dimers and polymers where the number of subunits is not described in the original interaction record, the edge is labelled with a <tt>Y</tt>. Interactor A will be identical to interactor B, so the graphical representation will appear as a single node connected to itself (a loop). The actual number of self-interacting subunits may be 2 (dimer) or more (for example, 5 for a pentamer). Refer to the original interaction record for more details and see column 54.<br />
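<br />
Because of this bipartite layout, the members of each complex can be recovered by collecting interactor B across the <tt>C</tt> rows that share the same interactor A. The Python sketch below is hypothetical: it keys complexes on the ROGID in column 33, collects member ROGIDs from column 34, assumes rows have already been split on tabs so that column N is at index N-1, and uses made-up example rows. It gathers distinct members only; the participant count, including repeated participants, is given in column 54.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

CHECKSUM_A, CHECKSUM_B = 32, 33  # columns 33 and 34 (1-based)
EDGETYPE = 52                    # column 53 (1-based)


def complex_members(rows: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Map each complex ROGID (interactor A of a 'C' row) to its distinct member ROGIDs."""
    members: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        if row[EDGETYPE] == "C":
            members[row[CHECKSUM_A]].add(row[CHECKSUM_B])
    return dict(members)


def make_row(edgetype: str, a: str, b: str) -> List[str]:
    # Made-up example row: only the columns used above are filled in.
    row = ["-"] * 54
    row[CHECKSUM_A], row[CHECKSUM_B], row[EDGETYPE] = a, b, edgetype
    return row


rows = [
    make_row("C", "rogid:COMPLEX1", "rogid:P1"),
    make_row("C", "rogid:COMPLEX1", "rogid:P2"),
    make_row("X", "rogid:P1", "rogid:P3"),
]
print({key: sorted(value) for key, value in complex_members(rows).items()})
# {'rogid:COMPLEX1': ['rogid:P1', 'rogid:P2']}
```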
<br />
=== Column number: 54 (numParticipants) ===<br />
<br />
{|<br />
|Column type: ||Integer<br />
|-<br />
|Description: ||Number of participants in the interaction<br />
|-<br />
|Example: ||<pre>2</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
* For edges labelled <tt>X</tt> (see column 53) this value will be two. <br />
* For edges labelled <tt>C</tt>, this value will be equivalent to the number of protein interactors in the original n-ary interaction record.<br />
* For interactions labelled <tt>Y</tt>, this value will either be the number of self-interacting subunits (if present in the original interaction record) or 1 where the exact number of subunits is unknown or unspecified.<br />
<br />
{{Note|<br />
The number of participants can be greater than the number of distinct proteins involved in an interaction because a single protein can participate more than once in an interaction. Such participation is enumerated and counted to produce the value in this column.<br />
|Important}}<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Introductory_Perl&diff=4090
Introductory Perl
2012-05-09T11:20:56Z
<p>PaulBoddie: Added some command prompt and notepad notes.</p>
<hr />
<div>This introduction to Perl is taught as part of [[Bioinformatics_course|MBV-INF4410/9410]] and [http://www.uio.no/studier/emner/matnat/molbio/MBV3070/ MBV3070].<br />
<br />
Slides for the "Introduction to Perl" lectures:<br />
<br />
* [[Image:Lecture_1_-_What_Perl_can_do.ppt]]<br />
* [[Image:Lecture_2_-_More_What_(Bio)Perl_can_do.ppt]]<br />
* [[Perl example code]]<br />
<br />
== Command Prompt and Notepad ==<br />
<br />
The command window or "Command Prompt" is found by opening the "Start" menu, choosing "Programs", then "Accessories", and then selecting "Command Prompt".<br />
<br />
When you start the Command Prompt, you will be able to look at the contents of a particular folder or directory, and the commands you use will do their work within that folder. The folder you start in will probably be your home folder.<br />
<br />
Notepad is found in the same menu, but it can be easier to open Notepad by typing the <tt>notepad</tt> command in the Command Prompt window like this:<br />
<br />
notepad example1.plx<br />
<br />
This makes sure that the new file is created in the right place. In this case, it is called <tt>example1.plx</tt>. If you started Notepad from the menu, when you save the file you have to find your way to where your programs need to be stored, and that can take some time.<br />
<br />
== DOS Commands ==<br />
<br />
The Command Prompt uses a different set of commands to those you find in Perl: these commands are known as DOS commands.<br />
<br />
You will need to know how to make your way around in a DOS command environment.<br />
Sites that may be useful to read before or during the exercises include:<br />
<br />
* [http://en.wikipedia.org/wiki/List_of_DOS_commands List of DOS commands]<br />
<br />
Especially these:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
| dir<br />
| Shows the contents of the current directory (folder)<br />
|-<br />
| mkdir ''name''<br />
| Makes a new directory called ''name''<br />
|-<br />
| cd ''name''<br />
| Enters the directory called ''name''<br />
|-<br />
| cd ..<br />
| Goes up and out of the current directory<br />
|-<br />
| copy ''source'' ''target''<br />
| Makes a copy of ''source'' with the name ''target'', or makes a copy of ''source'' with the same name inside a directory called ''target'' (if present)<br />
|-<br />
| move ''source'' ''target''<br />
| Renames ''source'' to ''target'', or moves ''source'' inside a directory called ''target'' (if present)<br />
|-<br />
| rename ''oldname'' ''newname''<br />
| Renames ''oldname'' to ''newname''<br />
|-<br />
| del ''name''<br />
| Deletes the file called ''name'' (be careful!)<br />
|-<br />
| cls<br />
| Clears the console/terminal<br />
|-<br />
| echo %PATH%<br />
| Shows the list of places where the system looks for programs<br />
|-<br />
| Ctrl+C (holding down Ctrl and C)<br />
| Terminates a program - useful if you have accidentally programmed an infinite loop that you want to stop<br />
|}<br />
<br />
== Unix Commands ==<br />
<br />
If you are using GNU/Linux or Mac OS X, you will be using different commands:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
| ls<br />
| Shows the contents of the current directory (folder)<br />
|-<br />
| mkdir ''name''<br />
| Makes a new directory called ''name''<br />
|-<br />
| cd ''name''<br />
| Enters the directory called ''name''<br />
|-<br />
| cd ..<br />
| Goes up and out of the current directory<br />
|-<br />
| cd<br />
| Goes to your home directory<br />
|-<br />
| cp ''source'' ''target''<br />
| Makes a copy of ''source'' with the name ''target'', or makes a copy of ''source'' with the same name inside a directory called ''target'' (if present)<br />
|-<br />
| mv ''source'' ''target''<br />
| Renames ''source'' to ''target'', or moves ''source'' inside a directory called ''target'' (if present)<br />
|-<br />
| rm ''name''<br />
| Deletes the file called ''name'' (be careful!)<br />
|-<br />
| clear<br />
| Clears the console/terminal<br />
|-<br />
| echo $PATH<br />
| Shows the list of places where the system looks for programs<br />
|-<br />
| Ctrl+C (holding down Ctrl and C)<br />
| Terminates a program - useful if you have accidentally programmed an infinite loop that you want to stop<br />
|}</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Introductory_Perl&diff=4089
Introductory Perl
2012-05-09T07:41:48Z
<p>PaulBoddie: Added MBV3070.</p>
<hr />
<div>This introduction to Perl is taught as part of [[Bioinformatics_course|MBV-INF4410/9410]] and [http://www.uio.no/studier/emner/matnat/molbio/MBV3070/ MBV3070].<br />
<br />
Slides for the "Introduction to Perl" lectures:<br />
<br />
* [[Image:Lecture_1_-_What_Perl_can_do.ppt]]<br />
* [[Image:Lecture_2_-_More_What_(Bio)Perl_can_do.ppt]]<br />
* [[Perl example code]]<br />
<br />
== DOS Commands ==<br />
<br />
You will need to know how to make your way around in a DOS command environment.<br />
Sites that may be useful to read before or during the exercises include:<br />
<br />
* [http://en.wikipedia.org/wiki/List_of_DOS_commands List of DOS commands]<br />
<br />
Especially these:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
| dir<br />
| Shows the contents of the current directory (folder)<br />
|-<br />
| mkdir ''name''<br />
| Makes a new directory called ''name''<br />
|-<br />
| cd ''name''<br />
| Enters the directory called ''name''<br />
|-<br />
| cd ..<br />
| Goes up and out of the current directory<br />
|-<br />
| copy ''source'' ''target''<br />
| Makes a copy of ''source'' with the name ''target'', or makes a copy of ''source'' with the same name inside a directory called ''target'' (if present)<br />
|-<br />
| move ''source'' ''target''<br />
| Renames ''source'' to ''target'', or moves ''source'' inside a directory called ''target'' (if present)<br />
|-<br />
| rename ''oldname'' ''newname''<br />
| Renames ''oldname'' to ''newname''<br />
|-<br />
| del ''name''<br />
| Deletes the file called ''name'' (be careful!)<br />
|-<br />
| cls<br />
| Clears the console/terminal<br />
|-<br />
| echo %PATH%<br />
| Shows the list of places where the system looks for programs<br />
|-<br />
| Ctrl+C (holding down Ctrl and C)<br />
| Terminates a program - useful if you have accidentally programmed an infinite loop that you want to stop<br />
|}<br />
<br />
== Unix Commands ==<br />
<br />
If you are using GNU/Linux or Mac OS X, you will be using different commands:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
| ls<br />
| Shows the contents of the current directory (folder)<br />
|-<br />
| mkdir ''name''<br />
| Makes a new directory called ''name''<br />
|-<br />
| cd ''name''<br />
| Enters the directory called ''name''<br />
|-<br />
| cd ..<br />
| Goes up and out of the current directory<br />
|-<br />
| cd<br />
| Goes to your home directory<br />
|-<br />
| cp ''source'' ''target''<br />
| Makes a copy of ''source'' with the name ''target'', or makes a copy of ''source'' with the same name inside a directory called ''target'' (if present)<br />
|-<br />
| mv ''source'' ''target''<br />
| Renames ''source'' to ''target'', or moves ''source'' inside a directory called ''target'' (if present)<br />
|-<br />
| rm ''name''<br />
| Deletes the file called ''name'' (be careful!)<br />
|-<br />
| clear<br />
| Clears the console/terminal<br />
|-<br />
| echo $PATH<br />
| Shows the list of places where the system looks for programs<br />
|-<br />
| Ctrl+C (holding down Ctrl and C)<br />
| Terminates a program - useful if you have accidentally programmed an infinite loop that you want to stop<br />
|}</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=MBV3070&diff=4088
MBV3070
2012-05-09T07:41:35Z
<p>PaulBoddie: Changed the page to a redirect, since there isn't any other content.</p>
<hr />
<div>#REDIRECT [[Introductory Perl]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Sources_iRefIndex_9.0&diff=4086
Sources iRefIndex 9.0
2012-04-20T11:13:41Z
<p>PaulBoddie: /* Interaction related resources */ Fixed InnateDB link.</p>
<hr />
<div>Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Applies to iRefIndex release: 9.0<br />
<br />
Release date: 2011-11-07<br />
<br />
Authors: Ian Donaldson, Sabry Razick and Paul Boddie<br />
<br />
Database: iRefIndex (http://irefindex.uio.no)<br />
<br />
Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)<br />
<br />
Description: This file lists interaction and protein sequence related resources used for the current build of the iRefIndex.<br />
Statistics for the iRefIndex are available and include a breakdown of interactors and interactions from each data source.<br />
*For statistics on the full dataset please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_9.0<br />
<br />
== Interaction related resources ==<br />
<br />
{| {{table}} cellpadding="10" cellspacing="0" border="1"<br />
| align="center" style="background:#f0f0f0;"|'''Source'''<br />
| align="center" style="background:#f0f0f0;"|'''Format'''<br />
| align="center" style="background:#f0f0f0;"|'''Location'''<br />
| align="center" style="background:#f0f0f0;"|'''Version (date)'''<br />
|-<br />
| BIND ||Tab-delimited text file.||ftp://ftp.bind.ca/pub/BIND/data/bindflatfiles/bindindex/ (no longer available - see below). <br />
<br />
20050525.complex2refs.txt <br />
<br />
20050525.ints.txt <br />
<br />
20050525.refs.txt <br />
<br />
20050525.complexes.txt <br />
<br />
20050525.labels.txt <br />
<br />
20050525.complex2subunits.txt <br />
<br />
These files are no longer available via FTP but are available from the authors. BIND archival content is now managed by Thomson Scientific. See http://bond.unleashedinformatics.com/ and http://bond.unleashedinformatics.com/downloads/data/BIND/ <br />
<br />
For historical purposes, a snapshot of the Blueprint website may be viewed at...<br />
<br />
http://web.archive.org/web/20050204013426/www.blueprint.org/index.html<br />
<br />
...via the internet archive at...<br />
<br />
http://web.archive.org/web/*/http://www.blueprint.org<br />
<br />
| 2005-05-25<br />
|-<br />
| BIND Translation ||PSI-MI 2.5||http://download.baderlab.org/BINDTranslation/release1_0/BINDTranslation_v1_xml_AllSpecies.tar.gz ||Version 1.0 (2010-12-15)<br />
|-<br />
| BioGRID||PSI-MI 2.5||http://thebiogrid.org/downloads/archives/Release%20Archive/BIOGRID-3.1.81/BIOGRID-ALL-3.1.81.psi25.zip ||Version 3.1.81 (2011-10-01)<br />
|-<br />
| CORUM||PSI-MI 2.5||http://mips.gsf.de/genre/proj/corum/index.html<br>http://mips.gsf.de/genre/export/sites/default/corum/allComplexes.psimi.zip || 2009-12-02<br />
|-<br />
| DIP||PSI-MI 2.5||http://dip.doe-mbi.ucla.edu/dip/Download.cgi<br />
<br>dip20101010.mif25<br />
<br>Note: date on last IMEx release file is from 2008<br />
| 2010-10-10<br />
|-<br />
| HPRD ||PSI-MI 2.5||http://www.hprd.org/download<br>HPRD_PSIMI_041310.tar.gz||Release 9 (2010-04-13)<br />
|-<br />
| IntAct ||PSI-MI 2.5||ftp://ftp.ebi.ac.uk/pub/databases/intact/2011-09-29/psi25/pmidMIF25.zip|| 2011-09-29<br />
|-<br />
| MINT||PSI-MI 2.5|| ftp://mint.bio.uniroma2.it/pub/release/psi/current/psi25/pmid/|| 2010-12-21<br />
|-<br />
| MPACT||PSI-MI 2.5||ftp://ftpmips.gsf.de/yeast/PPI/mpact-complete.psi25.xml.gz || 2008-01-10<br />
|-<br />
| MPPI||PSI-MI 1.0||http://mips.gsf.de/proj/ppi/data/mppi.gz|| 2004-06-01 (from archive)<br />
|-<br />
| OPHID||PSI-MI 1.0||http://ophid.utoronto.ca/ophid/downloads.html (This service no longer available, please refer to http://ophid.utoronto.ca/ophidv2.201/)|| 2006-07-07<br />
|-<br />
| colspan="4" align="center" style="background:#f0f0f0;" | New for this release<br />
|-<br />
| InnateDB ||PSI-MI 2.5|| http://www.innatedb.org/download.jsp<br>Curated InnateDB Data ||2011-03-06<br />
|-<br />
| MPIDB||MITAB format file|| http://www.jcvi.org/mpidb (information)<br><br />
http://www.jcvi.org/mpidb/download.php (general downloads)<br><br />
http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-LIT (specific download for MPI-LIT)<br><br />
http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-IMEX (specific download for MPI-IMEX)<br />
|| Downloaded on 2011-10-03<br />
|-<br />
| MatrixDB||PSI-MI 2.5|| http://matrixdb.ibcp.fr/<br>MatrixDB_20100826.xml.zip || 2010-08-26 (timestamp)<br />
|}<br />
<br />
== Sequence related resources ==<br />
<br />
{| {{table}} cellpadding="10" cellspacing="0" border="1"<br />
| align="center" style="background:#f0f0f0;"|'''Source'''<br />
| align="center" style="background:#f0f0f0;"|'''Format'''<br />
| align="center" style="background:#f0f0f0;"|'''Location'''<br />
| align="center" style="background:#f0f0f0;"|'''Version (date)'''<br />
|-<br />
| SEGUID||Tab-delimited text ||ftp://bioinformatics.anl.gov/seguid/<br>seguidannotation||2007-07-24 (timestamp)<br />
|-<br />
| UniProt||Text||http://www.uniprot.org/downloads<br>UniProtKB/Swiss-Prot (uniprot_sprot.dat.gz)<br />
| rowspan="5" | UniProt Knowledgebase Release 2011_09 (2011-09-21) (Downloaded on 2011-10-04):<br>UniProtKB/Swiss-Prot <br>UniProtKB/TrEMBL <br>(from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt)<br />
|-<br />
| UniProt||Text|| http://www.uniprot.org/downloads<br>UniProtKB/TrEMBL (uniprot_trembl.dat.gz)<br />
|-<br />
| UniProt, IsoForms||FASTA||http://www.uniprot.org/downloads uniprot_sprot_varsplic.fasta.gz<br />
|-<br />
| UniProt, SGD||Tab-delimited text file.||http://www.expasy.org/cgi-bin/lists?yeast.txt<br>Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD<br />
|-<br />
| UniProt, FLY||Tab-delimited text file.||http://www.expasy.org/cgi-bin/lists?fly.txt<br> Drosophila: entries, gene names and cross-references to FlyBase.<br />
|-<br />
| NCBI, RefSeq||GenPept||ftp://ftp.ncbi.nih.gov/refseq/release/complete<br>see *.protein.gpff.gz files||Release 49 (2011-09-09) (Downloaded on 2011-10-04)<br>(from http://www.ncbi.nlm.nih.gov/refseq/)<br />
|-<br />
| NCBI, MMDB/PDB||Tab-delimited text ||ftp://ftp.ncbi.nih.gov/mmdb/pdbeast/table|| (Downloaded on 2011-10-04)<br />
|-<br />
| NCBI, PDB sequences||FASTA||ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz||(Downloaded on 2011-10-03)<br />
|-<br />
| NCBI Gene2Refseq||Tab-delimited text||ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/<br>gene2refseq.gz||(Downloaded on 2011-10-04)<br />
|}<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Gene_Ontology_similarity_measurement&diff=4080
Gene Ontology similarity measurement
2012-02-29T18:01:09Z
<p>PaulBoddie: Attempted to make the descriptions more coherent, adding a discussion of observed term frequencies as the basis for the probability or specificity of a concept in an ontology.</p>
<hr />
<div>Some notes about measuring similarity of Gene Ontology terms and thus genes (and perhaps even proteins) on this basis.<br />
<br />
The starting point for this investigation is the paper [http://www.biomedcentral.com/1471-2105/7/302/ Schlicker et al., "A new measure for functional similarity of gene products based on Gene Ontology"] which in turn leads to the following papers:<br />
<br />
* [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.5277 Resnik (1995), "Using Information Content to Evaluate Semantic Similarity in a Taxonomy"]<br />
* [http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.38.6442 Resnik (1999), "Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language"]<br />
* [http://www.ncbi.nlm.nih.gov/pubmed/12835272?dopt=AbstractPlus&holding=f1000,f1000m,isrctn Lord et al., "Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation."]<br />
* [http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2655092/?tool=pubmed Sheehan et al., "A relation based measure of semantic similarity for Gene Ontology annotations"]<br />
<br />
Resnik defines the information associated with each concept (or term) in a taxonomy (or ontology) using the notion of [http://en.wikipedia.org/wiki/Self-information information content], as follows:<br />
<br />
# Each concept has a probability associated with it (defining the probability of "encountering an instance" of that concept).<br />
# Where a concept ''c<sub>specific</sub>'' is subsumed by ''c<sub>general</sub>'' (as in ''c<sub>specific</sub> is-a c<sub>general</sub>''), the probability of encountering an instance of ''c<sub>specific</sub>'' is no greater than that of encountering an instance of ''c<sub>general</sub>''.<br />
# Where a single root concept exists, since it subsumes all possible concepts, the probability of encountering an instance of it is 1.<br />
# Since information content is defined as ''-log p(c)'' for a concept ''c'', less probable concepts have higher information content.<br />
<br />
In Resnik's work, the probability of each concept was estimated as the cumulative frequency of all nouns subsumed by that concept divided by the total noun frequency of the corpus.<br />
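A minimal Python sketch of this estimate follows. The taxonomy, concept names and noun counts are hypothetical (not Resnik's data). Each concept is mapped to the set of nouns it subsumes, so an occurrence of a noun counts towards every concept above it, and information content is computed in bits as ''log<sub>2</sub> 1/p(c)''.

```python
import math

# Hypothetical toy taxonomy: each concept maps to the nouns it subsumes,
# including the nouns attached to its descendants.
WORDS = {
    "entity": {"dog", "cat", "oak", "rose"},
    "animal": {"dog", "cat"},
    "plant": {"oak", "rose"},
    "dog": {"dog"},
}

# Hypothetical noun frequencies observed in a corpus.
COUNTS = {"dog": 30, "cat": 20, "oak": 40, "rose": 10}


def probability(concept):
    """Cumulative frequency of the nouns subsumed by a concept, over the total."""
    total = sum(COUNTS.values())
    return sum(COUNTS[w] for w in WORDS[concept]) / total


def information_content(concept):
    """log2(1 / p(c)), which equals -log2 p(c), measured in bits."""
    return math.log2(1 / probability(concept))


for c in WORDS:
    print(f"{c}: p = {probability(c):.2f}, IC = {information_content(c):.3f} bits")
```

Because the root subsumes every noun, its probability is 1 and its information content is 0, while more specific concepts such as "dog" receive higher values.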
<br />
== Applying Information Content to Communications ==<br />
<br />
Information content is often used to analyse or illustrate properties of communications representations, as described in [http://www.cmh.edu/stats/model/InfoModel.htm these notes about information theory and data compression]. When deriving the information content, one first divides 1 by ''p(c)'', which appears to define the "granularity of the state space", or the number of distinct states required to represent the communication of an occurrence of ''c''. Taking the logarithm of this result (''log 1/p(c)'' and thus ''-log p(c)'') then gives the number of digits, or bits if a base-2 logarithm is used, required to encode such an outcome.<br />
<br />
Thus, if ''c'' is highly probable, occurring with ''p(c) = 0.5'', then ''-log<sub>2</sub> p(c) = -(-1) = 1'', indicating that a single bit is enough to signal the presence of ''c'' in a signal (with a value of 1, say). All other values would then be encoded with an initial bit distinguishing them from ''c'' (therefore with a value of 0), followed by additional bits where necessary. This can be visualised using a tree:<br />
<br />
* ''c'' (''p(c) = 0.5'')<br />
* (not ''c'')<br />
** ''d'' (''p(d) = 0.25'')<br />
** (not ''d'')<br />
*** ''e'' (''p(e) = 0.15'')<br />
*** ''f'' (''p(f) = 0.1'')<br />
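The tree above corresponds to a prefix code. The sketch below assigns the codewords implied by the tree (reading each "not" branch as 0, which gives c = 1, d = 01, and three-bit codes for e and f; which of e and f receives 001 is arbitrary). It then compares each codeword length with the information content ''-log<sub>2</sub> p(x)'' of the value it encodes.

```python
import math

# Probabilities taken from the tree above.
PROBS = {"c": 0.5, "d": 0.25, "e": 0.15, "f": 0.1}

# Codewords read off the tree: 1 selects the named value, 0 takes the "not"
# branch. The e/f split at the last level is arbitrary.
CODE = {"c": "1", "d": "01", "e": "001", "f": "000"}

for value, p in PROBS.items():
    ic = math.log2(1 / p)
    print(f"{value}: -log2 p = {ic:.2f} bits, codeword {CODE[value]} ({len(CODE[value])} bits)")

# Expected codeword length versus the entropy (the average information content).
expected_length = sum(p * len(CODE[v]) for v, p in PROBS.items())
entropy = sum(p * math.log2(1 / p) for p in PROBS.values())
print(f"expected length = {expected_length:.2f} bits, entropy = {entropy:.2f} bits")
```

The expected length of 1.75 bits per value is slightly above the entropy of about 1.74 bits: e and f carry roughly 2.74 and 3.32 bits of information respectively but must each use a whole 3-bit codeword.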
<br />
Clearly, in a communications context, the aim is to minimise the size of the message by giving the shortest encodings to the most frequent values.<br />
<br />
== Returning to Concept Similarity ==<br />
<br />
An initial attempt to translate the notion of information content to concept similarity is to consider the specificity of ontology terms. First, the terms subsumed by a particular term (including the term itself) are counted to give ''n<sub>subtree</sub>'', and this count is divided by the total number of terms ''n<sub>total</sub>'' to give the "coverage" of that term; subtracting the coverage from 1 then gives the specificity of the term. This only considers features of the ontology itself and not external information such as the word frequencies used by Resnik, but "how specific a term is" is at least a familiar concept, and one which upholds the concept hierarchy within the information content framework.<br />
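The coverage and specificity described above can be sketched as follows, using a small hypothetical ontology stored as a mapping from each term to its direct children. Collecting the subtree as a set means that a term reachable by more than one path (as can happen in the Gene Ontology, where a term may have several parents) is only counted once; that treatment is an assumption of this sketch.

```python
# Hypothetical toy ontology: term -> direct children (the reverse of the
# is-a edges). "shared" has two parents, "a" and "b".
CHILDREN = {
    "root": ["a", "b"],
    "a": ["a1", "shared"],
    "b": ["shared"],
    "a1": [],
    "shared": [],
}


def subtree(term):
    """Return the set of terms subsumed by term, including term itself."""
    terms = {term}
    for child in CHILDREN[term]:
        terms |= subtree(child)
    return terms


def specificity(term):
    """1 - coverage, where coverage = n_subtree / n_total."""
    n_subtree = len(subtree(term))
    n_total = len(CHILDREN)
    coverage = n_subtree / n_total
    return 1 - coverage


for term in CHILDREN:
    print(term, round(specificity(term), 2))
```

The root covers every term and so has a specificity of 0, while leaf terms such as "a1" have the highest specificity.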
<br />
To measure specificity more accurately, one might introduce frequency observations to the ontology terms, maintaining the general property that more general terms (such as ''c<sub>general</sub>'') subsume more specific terms (such as ''c<sub>1</sub>'', ''c<sub>2</sub>'', ...) such that each term's resultant frequency ''r'' is defined as...<br />
<br />
''r(c<sub>general</sub>) = r(c<sub>1</sub>) + r(c<sub>2</sub>) + ... + r(c<sub>n</sub>) + f(c<sub>general</sub>)''<br />
<br />
...where ''f'' is the observed frequency of the term itself. Since the resultant frequency of any given term includes contributions from the entire subtree of the ontology of which it is the root node, the hierarchical information encoded in the more naive approach is preserved.<br />
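A sketch of the resultant frequency calculation follows, with hypothetical terms and observed frequencies. Turning ''r'' into a probability by dividing by the root's resultant frequency is an assumption made by analogy with Resnik's division by the total noun frequency; it is not specified above. The recursion follows the formula literally and therefore assumes a tree: in a directed acyclic graph such as the Gene Ontology, a term reachable by several paths would be counted once per path.

```python
# Hypothetical toy ontology (a tree): term -> direct children.
CHILDREN = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": [],
    "a1": [],
    "a2": [],
}

# Hypothetical observed frequencies f(c) of each term itself.
OBSERVED = {"root": 0, "a": 2, "b": 5, "a1": 10, "a2": 3}


def resultant_frequency(term):
    """r(c) = r(c_1) + ... + r(c_n) + f(c), over the direct children of c."""
    return sum(resultant_frequency(child) for child in CHILDREN[term]) + OBSERVED[term]


def probability(term):
    """Assumed normalisation: r(c) relative to the root's resultant frequency."""
    return resultant_frequency(term) / resultant_frequency("root")


for term in CHILDREN:
    print(term, resultant_frequency(term), probability(term))
```

Each term's resultant frequency is at least the sum of its children's, so probabilities never decrease towards the root, preserving the hierarchy in the same way as the coverage-based approach.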
<br />
The remaining difficulty lies in defining what the "observed frequency" of a term is.<br />
<br />
== Concept Comparison ==<br />
<br />
When comparing two concepts, ''c1'' and ''c2'', Resnik refers to the set of concepts subsuming both ''c1'' and ''c2'', which in a hierarchy will be the common ancestors of ''c1'' and ''c2''. Given a measure for each concept which assigns higher values to concepts further from the root of the hierarchy (more specific terms in an ontology consisting of ''is-a'' relationships directed towards the root), the common ancestor of ''c1'' and ''c2'' furthest from the root (the most specific common ancestor, or "lowest common ancestor (LCA)" according to Schlicker et al.) is likely to provide the highest scoring concept subsuming ''c1'' and ''c2''.<br />
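Resnik's measure takes the information content of the most informative common subsumer: ''sim(c1, c2)'' is the maximum of ''-log p(c)'' over all concepts ''c'' subsuming both ''c1'' and ''c2''. The sketch below uses a hypothetical ontology stored as a term-to-parents mapping, so that a term may have more than one parent, together with hypothetical probabilities that do not decrease towards the root.

```python
import math

# Hypothetical toy ontology: term -> direct parents (is-a edges directed
# towards the root). "shared" has two parents, as Gene Ontology terms may.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "a1": ["a"],
    "shared": ["a", "b"],
}

# Hypothetical term probabilities, non-decreasing towards the root.
PROB = {"root": 1.0, "a": 0.5, "b": 0.4, "a1": 0.2, "shared": 0.1}


def ancestors(term):
    """Return the set of terms subsuming term, including term itself."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return result


def information_content(term):
    return math.log2(1 / PROB[term])


def resnik_similarity(c1, c2):
    """Most informative common ancestor of c1 and c2 and its information content."""
    common = ancestors(c1) & ancestors(c2)
    best = max(common, key=information_content)
    return best, information_content(best)


print(resnik_similarity("a1", "shared"))
print(resnik_similarity("a1", "b"))
```

Here "a1" and "shared" share the ancestors "a" and "root", and the more specific "a" is chosen, whereas "a1" and "b" share only the root and so score 0.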
<br />
[[Category:Bioscape]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Statistics_iRefIndex_9.0&diff=4079
Statistics iRefIndex 9.0
2012-02-16T13:45:16Z
<p>PaulBoddie: Used proper links.</p>
<hr />
<div>== Interactions available from major taxonomies ==<br />
<br />
=== Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in the original source) ===<br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|''' NCBI taxonomy identifier'''<br />
| align="center" style="background:#f0f0f0;"|'''Scientific_name'''<br />
| align="center" style="background:#f0f0f0;"|'''Number_of_interactions'''<br />
|-<br />
| 559292 || Saccharomyces cerevisiae S288c || 216326<br />
|-<br />
| 9606 || Homo sapiens || 145971<br />
|-<br />
| 7227 || Drosophila melanogaster || 47389<br />
|-<br />
| 4932 || Saccharomyces cerevisiae || 46966<br />
|-<br />
| 40674 || Mammalia || 36307<br />
|-<br />
| 10090 || Mus musculus || 21487<br />
|-<br />
| 83333 || Escherichia coli K-12 || 17673<br />
|-<br />
| 4896 || Schizosaccharomyces pombe || 15493<br />
|-<br />
| 6239 || Caenorhabditis elegans || 14020<br />
|-<br />
| 197 || Campylobacter jejuni || 12028<br />
|-<br />
| 3702 || Arabidopsis thaliana || 9911<br />
|-<br />
| 10116 || Rattus norvegicus || 6917<br />
|-<br />
| 562 || Escherichia coli || 5366<br />
|-<br />
| 632 || Yersinia pestis || 3823<br />
|-<br />
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3642<br />
|}<br />
<br />
Full list: [[File:iRefIndex9_taxonomy_summary.txt]]<br />
<br />
===Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)=== <br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|''' NCBI taxonomy identifier'''<br />
| align="center" style="background:#f0f0f0;"|'''Scientific_name'''<br />
| align="center" style="background:#f0f0f0;"|'''Number_of_interactions'''<br />
|-<br />
| 559292 || Saccharomyces cerevisiae S288c || 225644<br />
|-<br />
| 9606 || Homo sapiens || 156006<br />
|-<br />
| 7227 || Drosophila melanogaster || 47388<br />
|-<br />
| 10090 || Mus musculus || 18312<br />
|-<br />
| 83333 || Escherichia coli K-12 || 17408<br />
|-<br />
| 284812 || Schizosaccharomyces pombe 972h- || 15693<br />
|-<br />
| 6239 || Caenorhabditis elegans || 14020<br />
|-<br />
| 197 || Campylobacter jejuni || 12028<br />
|-<br />
| 3702 || Arabidopsis thaliana || 9911<br />
|-<br />
| 10116 || Rattus norvegicus || 5383<br />
|-<br />
| 155864 || Escherichia coli O157 H7 str. EDL933 || 4953<br />
|-<br />
| 632 || Yersinia pestis || 3823<br />
|-<br />
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3643<br />
|-<br />
| 1148 || Synechocystis sp. PCC 6803 || 3240<br />
|-<br />
| 1392 || Bacillus anthracis || 3090<br />
|}<br />
<br />
Full list: [[File:iRefIndex9_taxonomy_summary_corrected.txt]]<br />
<br />
== Interactions ==<br />
<br />
{|<br />
| BIND||62923<br />
|-<br />
| GRID||24380 || 277531<br />
|-<br />
| DIP||25785 || 39323 || 89715<br />
|-<br />
| INTACT||25260 || 38185 || 39031 || 156451<br />
|-<br />
| MINT||21992 || 41551 || 36676 || 47296 || 85755<br />
|-<br />
| HPRD||1949 || 8718 || 1124 || 5716 || 4482 || 40488<br />
|-<br />
| OPHID||2414 || 9264 || 1463 || 7578 || 6912 || 10286 || 47479<br />
|-<br />
| MPACT||6480 || 8513 || 7019 || 6231 || 6480 || 0 || 0 || 13331<br />
|-<br />
| MPPI||420 || 153 || 65 || 97 || 93 || 158 || 187 || 0 || 830<br />
|-<br />
| CORUM||263 || 199 || 116 || 248 || 119 || 246 || 237 || 0 || 15 || 2607<br />
|-<br />
| BIND_TRANSLATION||56109 || 24375 || 24813 || 24559 || 21875 || 2200 || 2722 || 6282 || 391 || 196 || 60227<br />
|-<br />
| INNATEDB||357 || 1310 || 409 || 811 || 654 || 1036 || 1222 || 0 || 52 || 82 || 419 || 7000<br />
|-<br />
| MATRIXDB||5 || 11 || 2 || 15 || 2 || 14 || 24 || 0 || 2 || 0 || 5 || 5 || 201<br />
|-<br />
| MPILIT||24 || 0 || 85 || 114 || 32 || 0 || 0 || 0 || 0 || 0 || 24 || 0 || 0 || 745<br />
|-<br />
| MPIIMEX||6 || 0 || 25 || 34 || 14 || 0 || 0 || 0 || 0 || 0 || 6 || 0 || 0 || 30 || 473<br />
|-<br />
| ||BIND||GRID||DIP||INTACT||MINT||HPRD||OPHID||MPACT||MPPI||CORUM||BIND_TRANSLATION||INNATEDB||MATRIXDB||MPILIT||MPIIMEX<br />
|-<br />
| ||(5070)||(200591)||(27502)||(83929)||(18731)||(24349)||(28544)||(1120)||(221)||(1915)||(3055)||(4240)||(156)||(536)||(399)<br />
|}<br />
<br />
== Interactors ==<br />
<br />
{|<br />
| BIND||40897<br />
|-<br />
| GRID||18036 || 34410<br />
|-<br />
| DIP||17437 || 18640 || 29961<br />
|-<br />
| INTACT||19107 || 23866 || 24585 || 53546<br />
|-<br />
| MINT||16751 || 18749 || 19671 || 25563 || 31615<br />
|-<br />
| HPRD||2920 || 5853 || 3489 || 6050 || 4539 || 9825<br />
|-<br />
| OPHID||3397 || 6004 || 4220 || 7030 || 5398 || 6228 || 9574<br />
|-<br />
| MPACT||4419 || 4625 || 4734 || 4936 || 4800 || 0 || 1 || 4979<br />
|-<br />
| MPPI||705 || 498 || 479 || 638 || 565 || 316 || 425 || 0 || 865<br />
|-<br />
| CORUM||2140 || 2449 || 2230 || 3239 || 2535 || 1859 || 2248 || 0 || 418 || 4365<br />
|-<br />
| BIND_TRANSLATION||35150 || 17491 || 16835 || 18553 || 16180 || 2957 || 3331 || 4014 || 687 || 2001 || 37247<br />
|-<br />
| INNATEDB||1685 || 2178 || 1813 || 2614 || 2137 || 1709 || 2112 || 0 || 359 || 1148 || 1687 || 3403<br />
|-<br />
| MATRIXDB||115 || 111 || 88 || 138 || 116 || 111 || 144 || 0 || 18 || 52 || 114 || 89 || 221<br />
|-<br />
| MPILIT||89 || 0 || 332 || 442 || 227 || 0 || 0 || 0 || 0 || 0 || 90 || 0 || 0 || 937<br />
|-<br />
| MPIIMEX||32 || 0 || 111 || 129 || 65 || 0 || 0 || 0 || 0 || 0 || 30 || 0 || 0 || 92 || 473<br />
|-<br />
| ||BIND||GRID||DIP||INTACT||MINT||HPRD||OPHID||MPACT||MPPI||CORUM||BIND_TRANSLATION||INNATEDB||MATRIXDB||MPILIT||MPIIMEX<br />
|-<br />
| ||(4387)||(6793)||(2208)||(16016)||(3969)||(1708)||(921)||(10)||(33)||(494)||(1401)||(244)||(34)||(366)||(282)<br />
|}<br />
<br />
== Summary of mapping interaction records to RIGs (Table 5) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Source'''||align="center" style="background:#f0f0f0;"|'''Total records'''||align="center" style="background:#f0f0f0;"|'''Protein-only interactors'''||align="center" style="background:#f0f0f0;"|'''PPI Assigned to RIGID'''||align="center" style="background:#f0f0f0;"|'''Unique RIGIDs'''<br />
|-<br />
| bind||193648||93957||91245(97.1136%)||62923(68.9605%)<br />
|-<br />
| grid||416648||411219||410641(99.8594%)||277531(67.5848%)<br />
|-<br />
| dip||90994||90994||89910(98.8087%)||89715(99.7831%)<br />
|-<br />
| intact||184959||183032||182359(99.6323%)||156451(85.7929%)<br />
|-<br />
| mint||122775||122775||122269(99.5879%)||85755(70.1363%)<br />
|-<br />
| HPRD||83022||83022||83022(100.0000%)||40488(48.7678%)<br />
|-<br />
| ophid||73257||73257||73160(99.8676%)||47479(64.8975%)<br />
|-<br />
| MPACT||16504||16504||16296(98.7397%)||13331(81.8054%)<br />
|-<br />
| MPPI||1814||1814||1699(93.6604%)||830(48.8523%)<br />
|-<br />
| CORUM||2844||2844||2844(100.0000%)||2607(91.6667%)<br />
|-<br />
| BIND_Translation||192923||87081||83347(95.7120%)||60227(72.2605%)<br />
|-<br />
| InnateDB||14729||11476||11248(98.0132%)||7000(62.2333%)<br />
|-<br />
| MatrixDB||846||349||321(91.9771%)||201(62.6168%)<br />
|-<br />
| mpilit||745||745||745(100.0000%)||745(100.0000%)<br />
|-<br />
| mpiimex||473||473||473(100.0000%)||473(100.0000%)<br />
|-<br />
| ALL||1396181||1179542||1169579(99.1554%)||545743(46.6615%)<br />
|}<br />
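The percentages in this table appear to be derived ratios. Reproducing them from the counts suggests that the first is ''PPI Assigned to RIGID'' divided by ''Protein-only interactors'' and the second is ''Unique RIGIDs'' divided by ''PPI Assigned to RIGID''; this reading is inferred from the numbers rather than stated on the page. A quick check for three rows:

```python
# Counts copied from the table above:
# (protein-only interactors, PPI assigned to RIGID, unique RIGIDs)
ROWS = {
    "bind": (93957, 91245, 62923),
    "grid": (411219, 410641, 277531),
    "ALL": (1179542, 1169579, 545743),
}


def percentages(protein_only, assigned, unique):
    """Return (assigned as % of protein-only, unique as % of assigned)."""
    return 100 * assigned / protein_only, 100 * unique / assigned


for source, counts in ROWS.items():
    assigned_pct, unique_pct = percentages(*counts)
    print(f"{source}: {assigned_pct:.4f}% assigned, {unique_pct:.4f}% unique")
```

The printed values match the percentages shown in the table for these rows.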
<br />
== Assignment of protein interactors to ROGs (Table 3) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Source'''||align="center" style="background:#f0f0f0;"|'''Protein_Interactors'''||align="center" style="background:#f0f0f0;"|'''Assigned'''||align="center" style="background:#f0f0f0;"|'''%'''||align="center" style="background:#f0f0f0;"|'''Arbitrary'''||align="center" style="background:#f0f0f0;"|'''N_and_Y'''||align="center" style="background:#f0f0f0;"|'''Unassigned'''||align="center" style="background:#f0f0f0;"|'''Unique proteins'''<br />
|-<br />
| bind||285482||272457||95.4375||0||9077||3930||40897<br />
|-<br />
| BIND_Translation||264346||239976||90.7810||74||15390||8902||37247<br />
|-<br />
| CORUM||12916||12909||99.9458||7||0||0||4365<br />
|-<br />
| dip||30978||29436||95.0223||609||450||483||29961<br />
|-<br />
| grid||45569||37348||81.9592||7948||15||258||34410<br />
|-<br />
| HPRD||123812||103344||83.4685||20255||213||0||9825<br />
|-<br />
| InnateDB||27209||26914||98.9158||0||0||295||3403<br />
|-<br />
| intact||154359||151337||98.0422||36||2581||405||53546<br />
|-<br />
| MatrixDB||1123||1077||95.9038||0||0||46||221<br />
|-<br />
| mint||87509||83380||95.2816||51||3933||145||31615<br />
|-<br />
| MPACT||40349||40121||99.4349||0||1||227||4979<br />
|-<br />
| mpiimex||946||946||100.0000||0||0||0||473<br />
|-<br />
| mpilit||1490||1487||99.7987||3||0||0||937<br />
|-<br />
| MPPI||3628||3456||95.2591||0||42||130||865<br />
|-<br />
| ophid||146423||145149||99.1299||265||1003||6||9574<br />
|-<br />
| All||1226139||1149359||93.7381||29248||32705||14827||97139<br />
|}<br />
<br />
== ROG summary ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Decimal_score'''||align="center" style="background:#f0f0f0;"|'''Binary_flag'''||align="center" style="background:#f0f0f0;"|'''String_score'''||align="center" style="background:#f0f0f0;"|'''Score_class'''||align="center" style="background:#f0f0f0;"|'''Proteins'''||align="center" style="background:#f0f0f0;"|'''Percentage'''||align="center" style="background:#f0f0f0;"|'''bind'''||align="center" style="background:#f0f0f0;"|'''grid'''||align="center" style="background:#f0f0f0;"|'''dip'''||align="center" style="background:#f0f0f0;"|'''intact'''||align="center" style="background:#f0f0f0;"|'''mint'''||align="center" style="background:#f0f0f0;"|'''mpiimex'''||align="center" style="background:#f0f0f0;"|'''mpilit'''||align="center" style="background:#f0f0f0;"|'''HPRD'''||align="center" style="background:#f0f0f0;"|'''ophid'''||align="center" style="background:#f0f0f0;"|'''InnateDB'''||align="center" style="background:#f0f0f0;"|'''MatrixDB'''||align="center" style="background:#f0f0f0;"|'''MPACT'''||align="center" style="background:#f0f0f0;"|'''BIND_Translation'''||align="center" style="background:#f0f0f0;"|'''MPPI'''||align="center" style="background:#f0f0f0;"|'''CORUM'''<br />
|-<br />
| 786||000000001100010010||STO+||-1||8850||0.7218%||0||0||0||0||0||0||0||8850||0||0||0||0||0||0||0<br />
|-<br />
| 1938||000000011110010010||STMOX+||-1||29||0.0024%||0||0||0||0||0||0||0||29||0||0||0||0||0||0||0<br />
|-<br />
| 898||000000001110000010||SMO+||-1||21||0.0017%||0||0||0||17||0||0||0||4||0||0||0||0||0||0||0<br />
|-<br />
| 131093||100000000000010101||PUTQ||-1||5||0.0004%||0||0||0||5||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 1922||000000011110000010||SMOX+||-1||2||0.0002%||0||0||0||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 914||000000001110010010||STMO+||-1||2||0.0002%||0||0||0||0||0||0||0||2||0||0||0||0||0||0||0<br />
|-<br />
| 163905||101000000001000001||PDYQ||-1||2||0.0002%||0||0||0||0||0||0||0||0||0||0||0||0||2||0||0<br />
|-<br />
| 163921||101000000001010001||PTDYQ||-1||1||0.0001%||0||0||0||0||0||0||0||0||0||0||0||0||1||0||0<br />
|-<br />
| 218370||110101010100000010||SXLENQ+||-1||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 1||000000000000000001||P||1||745949||60.8372%||155411||31550||0||150559||49666||932||1400||0||124701||26914||828||0||188079||3021||12888<br />
|-<br />
| 2||000000000000000010||S||1||36416||2.9700%||0||65||22378||13||267||0||0||13128||0||0||0||0||565||0||0<br />
|-<br />
| 131201||100000000010000001||PMQ||1||24630||2.0087%||0||0||0||0||0||0||0||0||0||0||0||0||24630||0||0<br />
|-<br />
| 554||000000001000101010||SVGO||1||17303||1.4112%||0||0||0||0||0||0||0||17303||0||0||0||0||0||0||0<br />
|-<br />
| 8194||000010000000000010||SI||1||12319||1.0047%||12319||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 65||000000000001000001||PD||1||7080||0.5774%||7079||0||0||0||1||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 130||000000000010000010||SM||1||6593||0.5377%||0||0||0||0||0||0||0||6593||0||0||0||0||0||0||0<br />
|-<br />
| 41||000000000000101001||PVG||1||2223||0.1813%||0||2223||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 42||000000000000101010||SVG||1||1108||0.0904%||0||0||122||0||0||0||0||986||0||0||0||0||0||0||0<br />
|-<br />
| 129||000000000010000001||PM||1||714||0.0582%||468||0||0||77||0||0||0||0||0||0||137||0||0||32||0<br />
|-<br />
| 139265||100010000000000001||PIQ||1||372||0.0303%||0||0||0||0||0||0||0||0||0||0||0||0||372||0||0<br />
|-<br />
| 10||000000000000001010||SV||1||43||0.0035%||0||0||5||3||35||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 8193||000010000000000001||PI||1||35||0.0029%||0||0||0||27||8||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 66||000000000001000010||SD||1||22||0.0018%||0||4||0||0||0||0||0||0||0||0||18||0||0||0||0<br />
|-<br />
| 9||000000000000001001||PV||1||5||0.0004%||0||0||0||0||5||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5||000000000000000101||PU||2||21909||1.7868%||0||0||0||289||253||9||7||0||20314||0||0||10||684||322||21<br />
|-<br />
| 16386||000100000000000010||SE||2||4888||0.3986%||4888||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 770||000000001100000010||SO+||2||3478||0.2837%||0||0||0||0||0||0||0||3478||0||0||0||0||0||0||0<br />
|-<br />
| 147458||100100000000000010||SEQ||2||2242||0.1829%||0||0||0||4||0||0||0||0||0||0||0||0||2238||0||0<br />
|-<br />
| 6||000000000000000110||SU||2||194||0.0158%||0||1||147||27||5||0||0||13||0||0||0||0||1||0||0<br />
|-<br />
| 16385||000100000000000001||PE||2||156||0.0127%||0||0||0||147||9||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 147457||100100000000000001||PEQ||2||55||0.0045%||0||0||0||0||0||0||0||0||0||0||0||0||55||0||0<br />
|-<br />
| 773||000000001100000101||PUO+||2||21||0.0017%||0||0||0||8||2||0||0||0||11||0||0||0||0||0||0<br />
|-<br />
| 1797||000000011100000101||PUOX+||2||4||0.0003%||0||0||0||4||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16514||000100000010000010||SME||2||3||0.0002%||3||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 774||000000001100000110||SUO+||2||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 17||000000000000010001||PT||3||156757||12.7846%||87968||3505||0||19||32901||4||79||0||118||0||2||30590||1524||47||0<br />
|-<br />
| 18||000000000000010010||ST||3||46525||3.7944%||0||0||6773||1||18||0||0||32718||0||0||0||6994||21||0||0<br />
|-<br />
| 146||000000000010010010||STM||3||16664||1.3591%||0||0||0||0||0||0||0||16664||0||0||0||0||0||0||0<br />
|-<br />
| 131217||100000000010010001||PTMQ||3||4257||0.3472%||0||0||0||0||0||0||0||0||0||0||0||0||4257||0||0<br />
|-<br />
| 81||000000000001010001||PTD||3||2567||0.2094%||2472||0||0||3||1||0||0||0||0||0||91||0||0||0||0<br />
|-<br />
| 8210||000010000000010010||STI||3||872||0.0711%||872||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 145||000000000010010001||PTM||3||171||0.0139%||137||0||0||0||0||0||0||0||0||0||0||0||0||34||0<br />
|-<br />
| 163985||101000000010010001||PTMYQ||3||52||0.0042%||0||0||0||0||0||0||0||0||0||0||0||0||52||0||0<br />
|-<br />
| 16530||000100000010010010||STME||3||13||0.0011%||13||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 8209||000010000000010001||PTI||3||13||0.0011%||0||0||0||13||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 82||000000000001010010||STD||3||10||0.0008%||0||0||0||0||9||0||0||0||0||0||1||0||0||0||0<br />
|-<br />
| 139281||100010000000010001||PTIQ||3||7||0.0006%||0||0||0||0||0||0||0||0||0||0||0||0||7||0||0<br />
|-<br />
| 26||000000000000011010||SVT||3||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16402||000100000000010010||STE||4||828||0.0675%||827||0||1||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 147474||100100000000010010||STEQ||4||411||0.0335%||0||0||0||2||0||0||0||0||0||0||0||0||409||0||0<br />
|-<br />
| 22||000000000000010110||SUT||4||144||0.0117%||0||0||10||0||0||0||0||134||0||0||0||0||0||0||0<br />
|-<br />
| 790||000000001100010110||SUTO+||4||47||0.0038%||0||0||0||18||27||0||0||2||0||0||0||0||0||0||0<br />
|-<br />
| 789||000000001100010101||PUTO+||4||32||0.0026%||0||0||0||27||5||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16401||000100000000010001||PTE||4||2||0.0002%||0||0||0||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5378||000001010100000010||SXL+||5||18721||1.5268%||0||0||0||14||1||0||0||18706||0||0||0||0||0||0||0<br />
|-<br />
| 131073||100000000000000001||PQ||5||16324||1.3313%||0||0||0||6||0||0||0||0||0||0||0||0||16318||0||0<br />
|-<br />
| 4393||000001000100101001||PVGL+||5||7931||0.6468%||0||7931||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 810||000000001100101010||SVGO+||5||3440||0.2806%||0||0||0||0||0||0||0||3440||0||0||0||0||0||0||0<br />
|-<br />
| 21||000000000000010101||PUT||5||2721||0.2219%||0||0||0||15||168||1||1||0||5||0||0||2527||4||0||0<br />
|-<br />
| 4394||000001000100101010||SVGL+||5||1650||0.1346%||0||0||112||0||0||0||0||1538||0||0||0||0||0||0||0<br />
|-<br />
| 131089||100000000000010001||PTQ||5||859||0.0701%||0||0||0||47||0||0||0||0||0||0||0||0||812||0||0<br />
|-<br />
| 4354||000001000100000010||SL+||5||493||0.0402%||0||17||474||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 4357||000001000100000101||PUL+||5||241||0.0197%||0||0||0||0||0||0||3||0||222||0||0||0||9||0||7<br />
|-<br />
| 4373||000001000100010101||PUTL+||5||74||0.0060%||0||0||0||8||3||0||0||0||4||0||0||0||59||0||0<br />
|-<br />
| 5381||000001010100000101||PUXL+||5||55||0.0045%||0||0||0||11||5||0||0||0||39||0||0||0||0||0||0<br />
|-<br />
| 5386||000001010100001010||SVXL+||5||43||0.0035%||0||0||0||1||42||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 4374||000001000100010110||SUTL+||5||30||0.0024%||0||0||17||0||0||0||0||7||0||0||0||0||6||0||0<br />
|-<br />
| 4358||000001000100000110||SUL+||5||6||0.0005%||0||0||6||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5382||000001010100000110||SUXL+||5||4||0.0003%||0||0||0||0||0||0||0||4||0||0||0||0||0||0||0<br />
|-<br />
| 32769||001000000000000001||PY||6||16102||1.3132%||3687||12||0||1963||3392||0||0||0||750||0||0||0||6293||5||0<br />
|-<br />
| 65601||010000000001000001||PDN||6||8727||0.7117%||52||0||0||2||247||0||0||0||253||0||0||0||8168||5||0<br />
|-<br />
| 81922||010100000000000010||SEN||6||4421||0.3606%||4421||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 65537||010000000000000001||PN||6||970||0.0791%||35||0||190||299||256||0||0||179||0||0||0||0||0||11||0<br />
|-<br />
| 32833||001000000001000001||PDY||6||773||0.0630%||773||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32770||001000000000000010||SY||6||427||0.0348%||0||3||258||92||28||0||0||0||0||0||0||0||46||0||0<br />
|-<br />
| 163969||101000000010000001||PMYQ||6||402||0.0328%||0||0||0||0||0||0||0||0||0||0||0||0||402||0||0<br />
|-<br />
| 212993||110100000000000001||PENQ||6||293||0.0239%||0||0||0||0||0||0||0||0||0||0||0||0||293||0||0<br />
|-<br />
| 73729||010010000000000001||PIN||6||204||0.0166%||0||0||0||204||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32785||001000000000010001||PTY||6||164||0.0134%||93||0||0||14||0||0||0||0||0||0||0||0||57||0||0<br />
|-<br />
| 65553||010000000000010001||PTN||6||38||0.0031%||0||0||0||4||0||0||0||34||0||0||0||0||0||0||0<br />
|-<br />
| 196609||110000000000000001||PNQ||6||31||0.0025%||0||0||0||0||0||0||0||0||0||0||0||0||31||0||0<br />
|-<br />
| 81921||010100000000000001||PEN||6||29||0.0024%||0||0||0||1||10||0||0||0||0||0||0||0||0||18||0<br />
|-<br />
| 65617||010000000001010001||PTDN||6||23||0.0019%||0||0||0||0||0||0||0||0||0||0||0||0||23||0||0<br />
|-<br />
| 196625||110000000000010001||PTNQ||6||22||0.0018%||0||0||0||0||0||0||0||0||0||0||0||0||22||0||0<br />
|-<br />
| 81938||010100000000010010||STEN||6||14||0.0011%||14||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32786||001000000000010010||STY||6||3||0.0002%||0||0||2||0||0||0||0||0||0||0||0||1||0||0||0<br />
|-<br />
| 32897||001000000010000001||PMY||6||2||0.0002%||0||0||0||0||0||0||0||0||0||0||0||0||0||2||0<br />
|-<br />
| 81986||010100000001000010||SDEN||6||1||0.0001%||1||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32913||001000000010010001||PTMY||6||1||0.0001%||0||0||0||0||0||0||0||0||0||0||0||0||0||1||0<br />
|-<br />
| 163857||101000000000010001||PTYQ||6||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 40978||001010000000010010||STIY||6||1||0.0001%||1||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|}<br />
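The ''Decimal_score'', ''Binary_flag'' and ''String_score'' columns appear to be three views of the same value: the binary flag is the decimal score written as 18 binary digits, and each set bit corresponds to one of the letters described in the Scores table below. The bit-to-letter mapping in this sketch is inferred from the rows above rather than taken from a published definition, as is the ordering rule (letters by ascending bit, with ''+'' placed last).

```python
# Bit positions inferred from the rows of the ROG summary table (bit 0 is the
# least significant bit). Bit 11 is not set in any row, so it is left unnamed.
LETTERS = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 15: "Y", 16: "N", 17: "Q",
}


def binary_flag(decimal_score):
    """18-character binary representation, as in the Binary_flag column."""
    return format(decimal_score, "018b")


def string_score(decimal_score):
    """Letters for the set bits in ascending bit order, with '+' moved to the end."""
    letters = [LETTERS[bit] for bit in sorted(LETTERS) if decimal_score & (1 << bit)]
    if "+" in letters:
        letters.remove("+")
        letters.append("+")
    return "".join(letters)


for score in (786, 4393, 218370):
    print(score, binary_flag(score), string_score(score))
```

For example, 786 is 512 + 256 + 16 + 2, giving the flag 000000001100010010 and the string STO+ shown in the first row.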
<br />
== Scores (Table 2) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Character'''||align="center" style="background:#f0f0f0;"|'''Description of feature (when the value is 1)'''||align="center" style="background:#f0f0f0;"|'''Frequency'''<br />
|-<br />
| D||The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.||19206(1.5856%)<br />
|-<br />
| E||The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers still lacking a sequence after going through eUtils, sequence information was obtained from UniProt.||13357(1.1027%)<br />
|-<br />
| G||The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.||33655(2.7784%)<br />
|-<br />
| L||More than one assignment is possible (see + below), for example isoforms for a geneid. In such a situation, references are picked using a ranking system (first looking for RefSeq, then UniProt). If ambiguity remains even after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition is different from the originally published one.)||29249(2.4147%)<br />
|-<br />
| M||The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.||53556(4.4214%)<br />
|-<br />
| +||More than one assignment is possible (+). This case may arise in one of three ways: 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L.||45176(3.7296%)<br />
|-<br />
| N||The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.||14774(1.2197%)<br />
|-<br />
| O||More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.||33230(2.7434%)<br />
|-<br />
| I||The protein reference used was an NCBI GenInfo Identifier (I).||13823(1.1412%)<br />
|-<br />
| U||The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.||25488(2.1042%)<br />
|-<br />
| T||The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.||242211(19.9961%)<br />
|-<br />
| V||The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.||33747(2.786%)<br />
|-<br />
| Q||The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.||49967(4.1251%)<br />
|-<br />
| P||The interaction record's primary (P) reference for the protein was used to make the assignment.||1023006(84.4559%)<br />
|-<br />
| S||One of the interaction record's secondary (S) references for the protein was used to make the assignment.||188284(15.5441%)<br />
|-<br />
| Y||The accession referred to an entry which was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).||17931(1.4803%)<br />
|-<br />
| X||More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.||18859(1.5569%)<br />
|}<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Statistics_iRefIndex_9.0&diff=4078
Statistics iRefIndex 9.0
2012-02-16T13:43:19Z
<p>PaulBoddie: Added complete file links.</p>
<hr />
<div>== Interactions available from major taxonomies ==<br />
<br />
=== Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in original source)===<br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|''' NCBI taxonomy identifier'''<br />
| align="center" style="background:#f0f0f0;"|'''Scientific_name'''<br />
| align="center" style="background:#f0f0f0;"|'''Number_of_interactions'''<br />
|-<br />
| 559292 || Saccharomyces cerevisiae S288c || 216326<br />
|-<br />
| 9606 || Homo sapiens || 145971<br />
|-<br />
| 7227 || Drosophila melanogaster || 47389<br />
|-<br />
| 4932 || Saccharomyces cerevisiae || 46966<br />
|-<br />
| 40674 || Mammalia || 36307<br />
|-<br />
| 10090 || Mus musculus || 21487<br />
|-<br />
| 83333 || Escherichia coli K-12 || 17673<br />
|-<br />
| 4896 || Schizosaccharomyces pombe || 15493<br />
|-<br />
| 6239 || Caenorhabditis elegans || 14020<br />
|-<br />
| 197 || Campylobacter jejuni || 12028<br />
|-<br />
| 3702 || Arabidopsis thaliana || 9911<br />
|-<br />
| 10116 || Rattus norvegicus || 6917<br />
|-<br />
| 562 || Escherichia coli || 5366<br />
|-<br />
| 632 || Yersinia pestis || 3823<br />
|-<br />
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3642<br />
|}<br />
<br />
* [http://irefindex.uio.no/wikifiles//images/f/fa/iRefIndex9_taxonomy_summary.txt Full list]<br />
<br />
===Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)=== <br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|''' NCBI taxonomy identifier'''<br />
| align="center" style="background:#f0f0f0;"|'''Scientific_name'''<br />
| align="center" style="background:#f0f0f0;"|'''Number_of_interactions'''<br />
|-<br />
| 559292 || Saccharomyces cerevisiae S288c || 225644<br />
|-<br />
| 9606 || Homo sapiens || 156006<br />
|-<br />
| 7227 || Drosophila melanogaster || 47388<br />
|-<br />
| 10090 || Mus musculus || 18312<br />
|-<br />
| 83333 || Escherichia coli K-12 || 17408<br />
|-<br />
| 284812 || Schizosaccharomyces pombe 972h- || 15693<br />
|-<br />
| 6239 || Caenorhabditis elegans || 14020<br />
|-<br />
| 197 || Campylobacter jejuni || 12028<br />
|-<br />
| 3702 || Arabidopsis thaliana || 9911<br />
|-<br />
| 10116 || Rattus norvegicus || 5383<br />
|-<br />
| 155864 || Escherichia coli O157 H7 str. EDL933 || 4953<br />
|-<br />
| 632 || Yersinia pestis || 3823<br />
|-<br />
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3643<br />
|-<br />
| 1148 || Synechocystis sp. PCC 6803 || 3240<br />
|-<br />
| 1392 || Bacillus anthracis || 3090<br />
|}<br />
* [http://irefindex.uio.no/wikifiles//images/f/fa/iRefIndex9_taxonomy_summary_corrected.txt Full list]<br />
<br />
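The two top-15 lists above can be compared directly for the taxonomy identifiers that appear in both. The following Python sketch is only an illustration and is not part of iRefIndex. The dictionaries copy the shared rows from the two tables, and `count_changes` is a hypothetical helper. Taxa that appear in only one list (for example 4932, 40674, 284812 or 155864) are left out because their count in the other list is not shown here.<br />
<br />
```python
# Interaction counts copied from the two "Top 15" tables (iRefIndex 9.0).
# Only taxonomy identifiers present in both lists are included.
uncorrected = {
    559292: 216326, 9606: 145971, 7227: 47389, 10090: 21487,
    83333: 17673, 6239: 14020, 197: 12028, 3702: 9911,
    10116: 6917, 632: 3823, 243276: 3642,
}
corrected = {
    559292: 225644, 9606: 156006, 7227: 47388, 10090: 18312,
    83333: 17408, 6239: 14020, 197: 12028, 3702: 9911,
    10116: 5383, 632: 3823, 243276: 3643,
}


def count_changes(before, after):
    """Return {taxid: after - before} for taxids present in both mappings."""
    shared = before.keys() & after.keys()
    return {taxid: after[taxid] - before[taxid] for taxid in sorted(shared)}


if __name__ == "__main__":
    for taxid, delta in count_changes(uncorrected, corrected).items():
        print(f"{taxid}\t{delta:+d}")
```
<br />
With these values the sketch reports +9318 for 559292, +10035 for 9606 and -3175 for 10090, among others.<br />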
== Interactions ==<br />
<br />
{|<br />
| BIND||62923<br />
|-<br />
| GRID||24380 || 277531<br />
|-<br />
| DIP||25785 || 39323 || 89715<br />
|-<br />
| INTACT||25260 || 38185 || 39031 || 156451<br />
|-<br />
| MINT||21992 || 41551 || 36676 || 47296 || 85755<br />
|-<br />
| HPRD||1949 || 8718 || 1124 || 5716 || 4482 || 40488<br />
|-<br />
| OPHID||2414 || 9264 || 1463 || 7578 || 6912 || 10286 || 47479<br />
|-<br />
| MPACT||6480 || 8513 || 7019 || 6231 || 6480 || 0 || 0 || 13331<br />
|-<br />
| MPPI||420 || 153 || 65 || 97 || 93 || 158 || 187 || 0 || 830<br />
|-<br />
| CORUM||263 || 199 || 116 || 248 || 119 || 246 || 237 || 0 || 15 || 2607<br />
|-<br />
| BIND_TRANSLATION||56109 || 24375 || 24813 || 24559 || 21875 || 2200 || 2722 || 6282 || 391 || 196 || 60227<br />
|-<br />
| INNATEDB||357 || 1310 || 409 || 811 || 654 || 1036 || 1222 || 0 || 52 || 82 || 419 || 7000<br />
|-<br />
| MATRIXDB||5 || 11 || 2 || 15 || 2 || 14 || 24 || 0 || 2 || 0 || 5 || 5 || 201<br />
|-<br />
| MPILIT||24 || 0 || 85 || 114 || 32 || 0 || 0 || 0 || 0 || 0 || 24 || 0 || 0 || 745<br />
|-<br />
| MPIIMEX||6 || 0 || 25 || 34 || 14 || 0 || 0 || 0 || 0 || 0 || 6 || 0 || 0 || 30 || 473<br />
|-<br />
| ||BIND||GRID||DIP||INTACT||MINT||HPRD||OPHID||MPACT||MPPI||CORUM||BIND_TRANSLATION||INNATEDB||MATRIXDB||MPILIT||MPIIMEX<br />
|-<br />
| ||(5070)||(200591)||(27502)||(83929)||(18731)||(24349)||(28544)||(1120)||(221)||(1915)||(3055)||(4240)||(156)||(536)||(399)<br />
|}<br />
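The table above looks like a lower-triangular overlap matrix. Each diagonal entry matches that source's Unique RIGIDs count in Table 5 (62923 for BIND, 277531 for GRID, and so on). This sketch therefore assumes that each off-diagonal entry counts the RIGs shared by the row source and the column source. The page does not explain the parenthesised bottom row, so the sketch does not model it. The Interactors table below uses the same layout, and its diagonal matches the Unique proteins column of Table 3. The following minimal Python sketch builds such a matrix from per-source sets of identifiers. The function name and the toy RIG identifiers are hypothetical.<br />
<br />
```python
def overlap_matrix(sources):
    """Build a lower-triangular overlap matrix.

    `sources` is an ordered list of (name, set_of_ids) pairs. Row i holds
    |S_i & S_j| for every earlier source j, followed by |S_i| on the diagonal.
    """
    rows = []
    for i, (name, ids) in enumerate(sources):
        counts = [len(ids & other) for _, other in sources[:i]]
        counts.append(len(ids))  # diagonal: all identifiers in this source
        rows.append((name, counts))
    return rows


# Hypothetical toy data: the RIG identifiers are invented placeholders.
example = [
    ("BIND", {"rig1", "rig2", "rig3"}),
    ("GRID", {"rig2", "rig3", "rig4", "rig5"}),
    ("DIP", {"rig3", "rig5"}),
]

if __name__ == "__main__":
    for name, counts in overlap_matrix(example):
        print(name, *counts, sep="\t")
```
<br />
For the toy data the rows are BIND 3, GRID 2 4 and DIP 1 2 2.<br />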
<br />
== Interactors ==<br />
<br />
{|<br />
| BIND||40897<br />
|-<br />
| GRID||18036 || 34410<br />
|-<br />
| DIP||17437 || 18640 || 29961<br />
|-<br />
| INTACT||19107 || 23866 || 24585 || 53546<br />
|-<br />
| MINT||16751 || 18749 || 19671 || 25563 || 31615<br />
|-<br />
| HPRD||2920 || 5853 || 3489 || 6050 || 4539 || 9825<br />
|-<br />
| OPHID||3397 || 6004 || 4220 || 7030 || 5398 || 6228 || 9574<br />
|-<br />
| MPACT||4419 || 4625 || 4734 || 4936 || 4800 || 0 || 1 || 4979<br />
|-<br />
| MPPI||705 || 498 || 479 || 638 || 565 || 316 || 425 || 0 || 865<br />
|-<br />
| CORUM||2140 || 2449 || 2230 || 3239 || 2535 || 1859 || 2248 || 0 || 418 || 4365<br />
|-<br />
| BIND_TRANSLATION||35150 || 17491 || 16835 || 18553 || 16180 || 2957 || 3331 || 4014 || 687 || 2001 || 37247<br />
|-<br />
| INNATEDB||1685 || 2178 || 1813 || 2614 || 2137 || 1709 || 2112 || 0 || 359 || 1148 || 1687 || 3403<br />
|-<br />
| MATRIXDB||115 || 111 || 88 || 138 || 116 || 111 || 144 || 0 || 18 || 52 || 114 || 89 || 221<br />
|-<br />
| MPILIT||89 || 0 || 332 || 442 || 227 || 0 || 0 || 0 || 0 || 0 || 90 || 0 || 0 || 937<br />
|-<br />
| MPIIMEX||32 || 0 || 111 || 129 || 65 || 0 || 0 || 0 || 0 || 0 || 30 || 0 || 0 || 92 || 473<br />
|-<br />
| ||BIND||GRID||DIP||INTACT||MINT||HPRD||OPHID||MPACT||MPPI||CORUM||BIND_TRANSLATION||INNATEDB||MATRIXDB||MPILIT||MPIIMEX<br />
|-<br />
| ||(4387)||(6793)||(2208)||(16016)||(3969)||(1708)||(921)||(10)||(33)||(494)||(1401)||(244)||(34)||(366)||(282)<br />
|}<br />
<br />
== Summary of mapping interaction records to RIGs (Table 5) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Source'''||align="center" style="background:#f0f0f0;"|'''Total records'''||align="center" style="background:#f0f0f0;"|'''Protein-only interactors'''||align="center" style="background:#f0f0f0;"|'''PPI Assigned to RIGID'''||align="center" style="background:#f0f0f0;"|'''Unique RIGIDs'''<br />
|-<br />
| bind||193648||93957||91245(97.1136%)||62923(68.9605%)<br />
|-<br />
| grid||416648||411219||410641(99.8594%)||277531(67.5848%)<br />
|-<br />
| dip||90994||90994||89910(98.8087%)||89715(99.7831%)<br />
|-<br />
| intact||184959||183032||182359(99.6323%)||156451(85.7929%)<br />
|-<br />
| mint||122775||122775||122269(99.5879%)||85755(70.1363%)<br />
|-<br />
| HPRD||83022||83022||83022(100.0000%)||40488(48.7678%)<br />
|-<br />
| ophid||73257||73257||73160(99.8676%)||47479(64.8975%)<br />
|-<br />
| MPACT||16504||16504||16296(98.7397%)||13331(81.8054%)<br />
|-<br />
| MPPI||1814||1814||1699(93.6604%)||830(48.8523%)<br />
|-<br />
| CORUM||2844||2844||2844(100.0000%)||2607(91.6667%)<br />
|-<br />
| BIND_Translation||192923||87081||83347(95.7120%)||60227(72.2605%)<br />
|-<br />
| InnateDB||14729||11476||11248(98.0132%)||7000(62.2333%)<br />
|-<br />
| MatrixDB||846||349||321(91.9771%)||201(62.6168%)<br />
|-<br />
| mpilit||745||745||745(100.0000%)||745(100.0000%)<br />
|-<br />
| mpiimex||473||473||473(100.0000%)||473(100.0000%)<br />
|-<br />
| ALL||1396181||1179542||1169579(99.1554%)||545743(46.6615%)<br />
|}<br />
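The two percentage columns in Table 5 can be reproduced from the counts. Judging from the rows, the PPI Assigned to RIGID percentage is relative to the Protein-only interactors column, not to Total records. The Unique RIGIDs percentage is relative to the number of assigned PPIs. This reading is inferred from the numbers. The Python sketch below checks it against a few rows copied from the table. The helper names are hypothetical.<br />
<br />
```python
import math

# (source, protein-only interactors, PPIs assigned to a RIGID, unique RIGIDs,
#  reported assigned %, reported unique %), copied from Table 5.
ROWS = [
    ("bind", 93957, 91245, 62923, 97.1136, 68.9605),
    ("grid", 411219, 410641, 277531, 99.8594, 67.5848),
    ("CORUM", 2844, 2844, 2607, 100.0000, 91.6667),
    ("ALL", 1179542, 1169579, 545743, 99.1554, 46.6615),
]


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def check_row(row, tolerance=1e-4):
    """Check one Table 5 row under the assumed denominators."""
    _, protein_only, assigned, unique, assigned_pct, unique_pct = row
    return (
        math.isclose(percentage(assigned, protein_only), assigned_pct, abs_tol=tolerance)
        and math.isclose(percentage(unique, assigned), unique_pct, abs_tol=tolerance)
    )


if __name__ == "__main__":
    for row in ROWS:
        print(row[0], check_row(row))
```
<br />
Every listed row passes. Dividing Unique RIGIDs by Protein-only interactors (for example 62923 / 93957 for bind) does not reproduce the reported values.<br />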
<br />
== Assignment of protein interactors to ROGs (Table 3) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Source'''||align="center" style="background:#f0f0f0;"|'''Protein_Interactors'''||align="center" style="background:#f0f0f0;"|'''Assigned'''||align="center" style="background:#f0f0f0;"|'''%'''||align="center" style="background:#f0f0f0;"|'''Arbitrary'''||align="center" style="background:#f0f0f0;"|'''N_and_Y'''||align="center" style="background:#f0f0f0;"|'''Unassigned'''||align="center" style="background:#f0f0f0;"|'''Unique proteins'''<br />
|-<br />
| bind||285482||272457||95.4375||0||9077||3930||40897<br />
|-<br />
| BIND_Translation||264346||239976||90.7810||74||15390||8902||37247<br />
|-<br />
| CORUM||12916||12909||99.9458||7||0||0||4365<br />
|-<br />
| dip||30978||29436||95.0223||609||450||483||29961<br />
|-<br />
| grid||45569||37348||81.9592||7948||15||258||34410<br />
|-<br />
| HPRD||123812||103344||83.4685||20255||213||0||9825<br />
|-<br />
| InnateDB||27209||26914||98.9158||0||0||295||3403<br />
|-<br />
| intact||154359||151337||98.0422||36||2581||405||53546<br />
|-<br />
| MatrixDB||1123||1077||95.9038||0||0||46||221<br />
|-<br />
| mint||87509||83380||95.2816||51||3933||145||31615<br />
|-<br />
| MPACT||40349||40121||99.4349||0||1||227||4979<br />
|-<br />
| mpiimex||946||946||100.0000||0||0||0||473<br />
|-<br />
| mpilit||1490||1487||99.7987||3||0||0||937<br />
|-<br />
| MPPI||3628||3456||95.2591||0||42||130||865<br />
|-<br />
| ophid||146423||145149||99.1299||265||1003||6||9574<br />
|-<br />
| All||1226139||1149359||93.7381||29248||32705||14827||97139<br />
|}<br />
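The Scores table on this page refers to SEGUIDs (features N and O), and each ROG groups identical protein sequences within a taxon. The following illustration is hedged. A SEGUID is conventionally the SHA-1 digest of the upper-case sequence, Base64-encoded with the trailing "=" padding removed. An iRefIndex ROG identifier is commonly described as that SEGUID followed by the NCBI taxonomy identifier. The Python sketch below assumes both conventions. It is not the iRefIndex implementation, the function names are hypothetical, and the example sequence is arbitrary.<br />
<br />
```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """SEGUID: Base64-encoded SHA-1 of the upper-case sequence, '=' padding stripped."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rog_id(sequence: str, taxid: int) -> str:
    """Assumed ROG identifier format: SEGUID followed by the taxonomy identifier."""
    return seguid(sequence) + str(taxid)


if __name__ == "__main__":
    # Identical sequences from the same taxon map to the same ROG identifier.
    print(rog_id("MKTAYIAKQRQISFVKSHFSRQ", 9606))
```
<br />
A 20-byte SHA-1 digest always gives a 27-character SEGUID once the padding is removed. Under this assumed format, the taxonomy identifier is the only part of a ROG that distinguishes the same sequence in two organisms.<br />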
<br />
== ROG summary ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Decimal_score'''||align="center" style="background:#f0f0f0;"|'''Binary_flag'''||align="center" style="background:#f0f0f0;"|'''String_score'''||align="center" style="background:#f0f0f0;"|'''Score_class'''||align="center" style="background:#f0f0f0;"|'''Proteins'''||align="center" style="background:#f0f0f0;"|'''Percentage'''||align="center" style="background:#f0f0f0;"|'''bind'''||align="center" style="background:#f0f0f0;"|'''grid'''||align="center" style="background:#f0f0f0;"|'''dip'''||align="center" style="background:#f0f0f0;"|'''intact'''||align="center" style="background:#f0f0f0;"|'''mint'''||align="center" style="background:#f0f0f0;"|'''mpiimex'''||align="center" style="background:#f0f0f0;"|'''mpilit'''||align="center" style="background:#f0f0f0;"|'''HPRD'''||align="center" style="background:#f0f0f0;"|'''ophid'''||align="center" style="background:#f0f0f0;"|'''InnateDB'''||align="center" style="background:#f0f0f0;"|'''MatrixDB'''||align="center" style="background:#f0f0f0;"|'''MPACT'''||align="center" style="background:#f0f0f0;"|'''BIND_Translation'''||align="center" style="background:#f0f0f0;"|'''MPPI'''||align="center" style="background:#f0f0f0;"|'''CORUM'''<br />
|-<br />
| 786||000000001100010010||STO+||-1||8850||0.7218%||0||0||0||0||0||0||0||8850||0||0||0||0||0||0||0<br />
|-<br />
| 1938||000000011110010010||STMOX+||-1||29||0.0024%||0||0||0||0||0||0||0||29||0||0||0||0||0||0||0<br />
|-<br />
| 898||000000001110000010||SMO+||-1||21||0.0017%||0||0||0||17||0||0||0||4||0||0||0||0||0||0||0<br />
|-<br />
| 131093||100000000000010101||PUTQ||-1||5||0.0004%||0||0||0||5||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 1922||000000011110000010||SMOX+||-1||2||0.0002%||0||0||0||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 914||000000001110010010||STMO+||-1||2||0.0002%||0||0||0||0||0||0||0||2||0||0||0||0||0||0||0<br />
|-<br />
| 163905||101000000001000001||PDYQ||-1||2||0.0002%||0||0||0||0||0||0||0||0||0||0||0||0||2||0||0<br />
|-<br />
| 163921||101000000001010001||PTDYQ||-1||1||0.0001%||0||0||0||0||0||0||0||0||0||0||0||0||1||0||0<br />
|-<br />
| 218370||110101010100000010||SXLENQ+||-1||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 1||000000000000000001||P||1||745949||60.8372%||155411||31550||0||150559||49666||932||1400||0||124701||26914||828||0||188079||3021||12888<br />
|-<br />
| 2||000000000000000010||S||1||36416||2.9700%||0||65||22378||13||267||0||0||13128||0||0||0||0||565||0||0<br />
|-<br />
| 131201||100000000010000001||PMQ||1||24630||2.0087%||0||0||0||0||0||0||0||0||0||0||0||0||24630||0||0<br />
|-<br />
| 554||000000001000101010||SVGO||1||17303||1.4112%||0||0||0||0||0||0||0||17303||0||0||0||0||0||0||0<br />
|-<br />
| 8194||000010000000000010||SI||1||12319||1.0047%||12319||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 65||000000000001000001||PD||1||7080||0.5774%||7079||0||0||0||1||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 130||000000000010000010||SM||1||6593||0.5377%||0||0||0||0||0||0||0||6593||0||0||0||0||0||0||0<br />
|-<br />
| 41||000000000000101001||PVG||1||2223||0.1813%||0||2223||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 42||000000000000101010||SVG||1||1108||0.0904%||0||0||122||0||0||0||0||986||0||0||0||0||0||0||0<br />
|-<br />
| 129||000000000010000001||PM||1||714||0.0582%||468||0||0||77||0||0||0||0||0||0||137||0||0||32||0<br />
|-<br />
| 139265||100010000000000001||PIQ||1||372||0.0303%||0||0||0||0||0||0||0||0||0||0||0||0||372||0||0<br />
|-<br />
| 10||000000000000001010||SV||1||43||0.0035%||0||0||5||3||35||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 8193||000010000000000001||PI||1||35||0.0029%||0||0||0||27||8||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 66||000000000001000010||SD||1||22||0.0018%||0||4||0||0||0||0||0||0||0||0||18||0||0||0||0<br />
|-<br />
| 9||000000000000001001||PV||1||5||0.0004%||0||0||0||0||5||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5||000000000000000101||PU||2||21909||1.7868%||0||0||0||289||253||9||7||0||20314||0||0||10||684||322||21<br />
|-<br />
| 16386||000100000000000010||SE||2||4888||0.3986%||4888||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 770||000000001100000010||SO+||2||3478||0.2837%||0||0||0||0||0||0||0||3478||0||0||0||0||0||0||0<br />
|-<br />
| 147458||100100000000000010||SEQ||2||2242||0.1829%||0||0||0||4||0||0||0||0||0||0||0||0||2238||0||0<br />
|-<br />
| 6||000000000000000110||SU||2||194||0.0158%||0||1||147||27||5||0||0||13||0||0||0||0||1||0||0<br />
|-<br />
| 16385||000100000000000001||PE||2||156||0.0127%||0||0||0||147||9||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 147457||100100000000000001||PEQ||2||55||0.0045%||0||0||0||0||0||0||0||0||0||0||0||0||55||0||0<br />
|-<br />
| 773||000000001100000101||PUO+||2||21||0.0017%||0||0||0||8||2||0||0||0||11||0||0||0||0||0||0<br />
|-<br />
| 1797||000000011100000101||PUOX+||2||4||0.0003%||0||0||0||4||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16514||000100000010000010||SME||2||3||0.0002%||3||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 774||000000001100000110||SUO+||2||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 17||000000000000010001||PT||3||156757||12.7846%||87968||3505||0||19||32901||4||79||0||118||0||2||30590||1524||47||0<br />
|-<br />
| 18||000000000000010010||ST||3||46525||3.7944%||0||0||6773||1||18||0||0||32718||0||0||0||6994||21||0||0<br />
|-<br />
| 146||000000000010010010||STM||3||16664||1.3591%||0||0||0||0||0||0||0||16664||0||0||0||0||0||0||0<br />
|-<br />
| 131217||100000000010010001||PTMQ||3||4257||0.3472%||0||0||0||0||0||0||0||0||0||0||0||0||4257||0||0<br />
|-<br />
| 81||000000000001010001||PTD||3||2567||0.2094%||2472||0||0||3||1||0||0||0||0||0||91||0||0||0||0<br />
|-<br />
| 8210||000010000000010010||STI||3||872||0.0711%||872||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 145||000000000010010001||PTM||3||171||0.0139%||137||0||0||0||0||0||0||0||0||0||0||0||0||34||0<br />
|-<br />
| 163985||101000000010010001||PTMYQ||3||52||0.0042%||0||0||0||0||0||0||0||0||0||0||0||0||52||0||0<br />
|-<br />
| 16530||000100000010010010||STME||3||13||0.0011%||13||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 8209||000010000000010001||PTI||3||13||0.0011%||0||0||0||13||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 82||000000000001010010||STD||3||10||0.0008%||0||0||0||0||9||0||0||0||0||0||1||0||0||0||0<br />
|-<br />
| 139281||100010000000010001||PTIQ||3||7||0.0006%||0||0||0||0||0||0||0||0||0||0||0||0||7||0||0<br />
|-<br />
| 26||000000000000011010||SVT||3||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16402||000100000000010010||STE||4||828||0.0675%||827||0||1||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 147474||100100000000010010||STEQ||4||411||0.0335%||0||0||0||2||0||0||0||0||0||0||0||0||409||0||0<br />
|-<br />
| 22||000000000000010110||SUT||4||144||0.0117%||0||0||10||0||0||0||0||134||0||0||0||0||0||0||0<br />
|-<br />
| 790||000000001100010110||SUTO+||4||47||0.0038%||0||0||0||18||27||0||0||2||0||0||0||0||0||0||0<br />
|-<br />
| 789||000000001100010101||PUTO+||4||32||0.0026%||0||0||0||27||5||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16401||000100000000010001||PTE||4||2||0.0002%||0||0||0||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5378||000001010100000010||SXL+||5||18721||1.5268%||0||0||0||14||1||0||0||18706||0||0||0||0||0||0||0<br />
|-<br />
| 131073||100000000000000001||PQ||5||16324||1.3313%||0||0||0||6||0||0||0||0||0||0||0||0||16318||0||0<br />
|-<br />
| 4393||000001000100101001||PVGL+||5||7931||0.6468%||0||7931||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 810||000000001100101010||SVGO+||5||3440||0.2806%||0||0||0||0||0||0||0||3440||0||0||0||0||0||0||0<br />
|-<br />
| 21||000000000000010101||PUT||5||2721||0.2219%||0||0||0||15||168||1||1||0||5||0||0||2527||4||0||0<br />
|-<br />
| 4394||000001000100101010||SVGL+||5||1650||0.1346%||0||0||112||0||0||0||0||1538||0||0||0||0||0||0||0<br />
|-<br />
| 131089||100000000000010001||PTQ||5||859||0.0701%||0||0||0||47||0||0||0||0||0||0||0||0||812||0||0<br />
|-<br />
| 4354||000001000100000010||SL+||5||493||0.0402%||0||17||474||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 4357||000001000100000101||PUL+||5||241||0.0197%||0||0||0||0||0||0||3||0||222||0||0||0||9||0||7<br />
|-<br />
| 4373||000001000100010101||PUTL+||5||74||0.0060%||0||0||0||8||3||0||0||0||4||0||0||0||59||0||0<br />
|-<br />
| 5381||000001010100000101||PUXL+||5||55||0.0045%||0||0||0||11||5||0||0||0||39||0||0||0||0||0||0<br />
|-<br />
| 5386||000001010100001010||SVXL+||5||43||0.0035%||0||0||0||1||42||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 4374||000001000100010110||SUTL+||5||30||0.0024%||0||0||17||0||0||0||0||7||0||0||0||0||6||0||0<br />
|-<br />
| 4358||000001000100000110||SUL+||5||6||0.0005%||0||0||6||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5382||000001010100000110||SUXL+||5||4||0.0003%||0||0||0||0||0||0||0||4||0||0||0||0||0||0||0<br />
|-<br />
| 32769||001000000000000001||PY||6||16102||1.3132%||3687||12||0||1963||3392||0||0||0||750||0||0||0||6293||5||0<br />
|-<br />
| 65601||010000000001000001||PDN||6||8727||0.7117%||52||0||0||2||247||0||0||0||253||0||0||0||8168||5||0<br />
|-<br />
| 81922||010100000000000010||SEN||6||4421||0.3606%||4421||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 65537||010000000000000001||PN||6||970||0.0791%||35||0||190||299||256||0||0||179||0||0||0||0||0||11||0<br />
|-<br />
| 32833||001000000001000001||PDY||6||773||0.0630%||773||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32770||001000000000000010||SY||6||427||0.0348%||0||3||258||92||28||0||0||0||0||0||0||0||46||0||0<br />
|-<br />
| 163969||101000000010000001||PMYQ||6||402||0.0328%||0||0||0||0||0||0||0||0||0||0||0||0||402||0||0<br />
|-<br />
| 212993||110100000000000001||PENQ||6||293||0.0239%||0||0||0||0||0||0||0||0||0||0||0||0||293||0||0<br />
|-<br />
| 73729||010010000000000001||PIN||6||204||0.0166%||0||0||0||204||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32785||001000000000010001||PTY||6||164||0.0134%||93||0||0||14||0||0||0||0||0||0||0||0||57||0||0<br />
|-<br />
| 65553||010000000000010001||PTN||6||38||0.0031%||0||0||0||4||0||0||0||34||0||0||0||0||0||0||0<br />
|-<br />
| 196609||110000000000000001||PNQ||6||31||0.0025%||0||0||0||0||0||0||0||0||0||0||0||0||31||0||0<br />
|-<br />
| 81921||010100000000000001||PEN||6||29||0.0024%||0||0||0||1||10||0||0||0||0||0||0||0||0||18||0<br />
|-<br />
| 65617||010000000001010001||PTDN||6||23||0.0019%||0||0||0||0||0||0||0||0||0||0||0||0||23||0||0<br />
|-<br />
| 196625||110000000000010001||PTNQ||6||22||0.0018%||0||0||0||0||0||0||0||0||0||0||0||0||22||0||0<br />
|-<br />
| 81938||010100000000010010||STEN||6||14||0.0011%||14||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32786||001000000000010010||STY||6||3||0.0002%||0||0||2||0||0||0||0||0||0||0||0||1||0||0||0<br />
|-<br />
| 32897||001000000010000001||PMY||6||2||0.0002%||0||0||0||0||0||0||0||0||0||0||0||0||0||2||0<br />
|-<br />
| 81986||010100000001000010||SDEN||6||1||0.0001%||1||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32913||001000000010010001||PTMY||6||1||0.0001%||0||0||0||0||0||0||0||0||0||0||0||0||0||1||0<br />
|-<br />
| 163857||101000000000010001||PTYQ||6||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 40978||001010000000010010||STIY||6||1||0.0001%||1||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|}<br />
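In the ROG summary, Binary_flag is Decimal_score written as an 18-digit binary number. String_score lists one letter per set bit, using the feature letters of the Scores table. The bit-to-letter mapping below is inferred from the rows of this table, not taken from iRefIndex documentation. For example, 786 = 512 + 256 + 16 + 2 gives STO+. Bit value 2048 does not occur in any row and is left unmapped. Letters appear in ascending bit order with "+" moved to the end, which is also an inference. The Python helper names are hypothetical.<br />
<br />
```python
# Bit value -> score letter, inferred from the ROG summary rows (an assumption).
# Bit 2048 never appears in the table and is therefore not mapped.
SCORE_BITS = {
    1: "P", 2: "S", 4: "U", 8: "V", 16: "T", 32: "G", 64: "D", 128: "M",
    256: "+", 512: "O", 1024: "X", 4096: "L", 8192: "I", 16384: "E",
    32768: "Y", 65536: "N", 131072: "Q",
}
KNOWN_MASK = sum(SCORE_BITS)  # sum of all mapped bit values


def binary_flag(decimal_score: int) -> str:
    """Render a decimal score as the 18-digit Binary_flag string."""
    return format(decimal_score, "018b")


def string_score(decimal_score: int) -> str:
    """Decode a decimal score into its letter string."""
    unknown = decimal_score & ~KNOWN_MASK
    if unknown:
        raise ValueError(f"unmapped bits in score {decimal_score}: {unknown}")
    letters = [SCORE_BITS[bit] for bit in sorted(SCORE_BITS) if decimal_score & bit]
    # The String_score column always lists '+' last, so move it to the end.
    core = "".join(letter for letter in letters if letter != "+")
    return core + ("+" if decimal_score & 256 else "")


if __name__ == "__main__":
    for score in (786, 218370, 163921, 4393):
        print(score, binary_flag(score), string_score(score))
```
<br />
Run on rows of the table, the sketch reproduces the listed strings. For example, 218370 decodes to 110101010100000010 and SXLENQ+, and 163921 decodes to PTDYQ.<br />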
<br />
== Scores (Table 2) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Character'''||align="center" style="background:#f0f0f0;"|'''Description of feature (when the value is 1)'''||align="center" style="background:#f0f0f0;"|'''Frequency'''<br />
|-<br />
| D||The source database (D) listed in the interaction record differs from what is expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made.||19206(1.5856%)<br />
|-<br />
| E||The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after this step, sequence information was obtained from UniProt.||13357(1.1027%)<br />
|-<br />
| G||The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.||33655(2.7784%)<br />
|-<br />
| L||More than one assignment is possible (see the + entry), for example several isoforms for one gene identifier. In such a situation, references are picked using a ranking system: RefSeq first, then UniProt. If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Note that this score class definition differs from the originally published one.)||29249(2.4147%)<br />
|-<br />
| M||The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.||53556(4.4214%)<br />
|-<br />
| +||More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating, but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity, as indicated by the ROG score features O, X or L (see their entries in this table).||45176(3.7296%)<br />
|-<br />
| N||The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.||14774(1.2197%)<br />
|-<br />
| O||More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.||33230(2.7434%)<br />
|-<br />
| I||The protein reference used was an NCBI GenInfo Identifier (I).||13823(1.1412%)<br />
|-<br />
| U||The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.||25488(2.1042%)<br />
|-<br />
| T||The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.||242211(19.9961%)<br />
|-<br />
| V||The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.||33747(2.786%)<br />
|-<br />
| Q||The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.||49967(4.1251%)<br />
|-<br />
| P||The interaction record's primary (P) reference for the protein was used to make the assignment.||1023006(84.4559%)<br />
|-<br />
| S||One of the interaction record's secondary (S) references for the protein was used to make the assignment.||188284(15.5441%)<br />
|-<br />
| Y||The protein reference pointed to an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9, 2009).||17931(1.4803%)<br />
|-<br />
| X||More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.||18859(1.5569%)<br />
|}<br />
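The Frequency column above gives a count and a percentage for each feature. The percentages are consistent with a single denominator: the P count plus the S count (1023006 + 188284 = 1211290). This fits every row of the ROG summary carrying exactly one of P or S. The denominator is inferred from the numbers and is not stated on the page. The other features do not sum to 100%, because one assignment can carry several features. The Python sketch below recomputes a few of the reported percentages under that assumption. The helper name is hypothetical.<br />
<br />
```python
import math

# (count, reported percentage) for a few features, copied from the Scores table.
REPORTED = {
    "P": (1023006, 84.4559),
    "S": (188284, 15.5441),
    "T": (242211, 19.9961),
    "D": (19206, 1.5856),
    "Y": (17931, 1.4803),
}


def feature_percentages(reported):
    """Recompute percentages, assuming the denominator is count(P) + count(S)."""
    total = reported["P"][0] + reported["S"][0]
    return {letter: 100.0 * count / total for letter, (count, _) in reported.items()}


if __name__ == "__main__":
    for letter, pct in feature_percentages(REPORTED).items():
        print(f"{letter}\t{pct:.4f}\t(reported {REPORTED[letter][1]})")
```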
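Feature L describes a tie-breaking rule for ambiguous references, such as several isoforms for one gene identifier. The rule prefers RefSeq, then UniProt, and if candidates remain tied, takes the one with the longest sequence. The following minimal Python sketch implements that rule. The `Candidate` type, its fields, the database labels and the accessions are hypothetical. Two details are assumptions: databases other than RefSeq and UniProt rank last, and a full tie (same database, same length) returns the first candidate in input order, which the page does not specify.<br />
<br />
```python
from dataclasses import dataclass

# Lower rank is preferred; any other database ranks after UniProt (assumption).
DB_RANK = {"refseq": 0, "uniprotkb": 1}


@dataclass
class Candidate:
    accession: str  # hypothetical field names
    database: str
    sequence: str


def pick_reference(candidates):
    """Prefer RefSeq, then UniProt, then the longest sequence."""
    if not candidates:
        raise ValueError("no candidates to choose from")

    def rank(candidate):
        db_rank = DB_RANK.get(candidate.database.lower(), len(DB_RANK))
        return (db_rank, -len(candidate.sequence))  # longer sequence sorts first

    # min() returns the first candidate among equally ranked ones.
    return min(candidates, key=rank)


if __name__ == "__main__":
    candidates = [
        Candidate("uniprot-a", "UniProtKB", "M" * 150),
        Candidate("refseq-a", "RefSeq", "M" * 30),
        Candidate("refseq-b", "RefSeq", "M" * 60),
    ]
    print(pick_reference(candidates).accession)
```
<br />
Here the longer UniProt sequence loses to both RefSeq candidates. Of those, the longer one (refseq-b) is chosen.<br />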
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=File:iRefIndex9_taxonomy_summary_corrected.txt&diff=4077
File:iRefIndex9 taxonomy summary corrected.txt
2012-02-16T13:42:36Z
<p>PaulBoddie: Summary of interactions by species (corrected) in iRefIndex 9.</p>
<hr />
<div>Summary of interactions by species (corrected) in iRefIndex 9.</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=File:iRefIndex9_taxonomy_summary.txt&diff=4076
File:iRefIndex9 taxonomy summary.txt
2012-02-16T13:42:09Z
<p>PaulBoddie: Summary of interactions by species in iRefIndex 9.</p>
<hr />
<div>Summary of interactions by species in iRefIndex 9.</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Statistics_iRefIndex_9.0&diff=4075
Statistics iRefIndex 9.0
2012-02-16T13:40:39Z
<p>PaulBoddie: Added taxonomy tables.</p>
<hr />
<div>== Interactions available from major taxonomies ==<br />
<br />
===Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in the original source)===<br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|''' NCBI taxonomy identifier'''<br />
| align="center" style="background:#f0f0f0;"|'''Scientific_name'''<br />
| align="center" style="background:#f0f0f0;"|'''Number_of_interactions'''<br />
|-<br />
| 559292 || Saccharomyces cerevisiae S288c || 216326<br />
|-<br />
| 9606 || Homo sapiens || 145971<br />
|-<br />
| 7227 || Drosophila melanogaster || 47389<br />
|-<br />
| 4932 || Saccharomyces cerevisiae || 46966<br />
|-<br />
| 40674 || Mammalia || 36307<br />
|-<br />
| 10090 || Mus musculus || 21487<br />
|-<br />
| 83333 || Escherichia coli K-12 || 17673<br />
|-<br />
| 4896 || Schizosaccharomyces pombe || 15493<br />
|-<br />
| 6239 || Caenorhabditis elegans || 14020<br />
|-<br />
| 197 || Campylobacter jejuni || 12028<br />
|-<br />
| 3702 || Arabidopsis thaliana || 9911<br />
|-<br />
| 10116 || Rattus norvegicus || 6917<br />
|-<br />
| 562 || Escherichia coli || 5366<br />
|-<br />
| 632 || Yersinia pestis || 3823<br />
|-<br />
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3642<br />
|}<br />
<br />
* [http://irefindex.uio.no/wikifiles//images/1/15/Interactions_by_taxonomy_beta8_original.pdf Full list]<br />
<br />
===Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)=== <br />
{| {{table}}<br />
| align="center" style="background:#f0f0f0;"|''' NCBI taxonomy identifier'''<br />
| align="center" style="background:#f0f0f0;"|'''Scientific_name'''<br />
| align="center" style="background:#f0f0f0;"|'''Number_of_interactions'''<br />
|-<br />
| 559292 || Saccharomyces cerevisiae S288c || 225644<br />
|-<br />
| 9606 || Homo sapiens || 156006<br />
|-<br />
| 7227 || Drosophila melanogaster || 47388<br />
|-<br />
| 10090 || Mus musculus || 18312<br />
|-<br />
| 83333 || Escherichia coli K-12 || 17408<br />
|-<br />
| 284812 || Schizosaccharomyces pombe 972h- || 15693<br />
|-<br />
| 6239 || Caenorhabditis elegans || 14020<br />
|-<br />
| 197 || Campylobacter jejuni || 12028<br />
|-<br />
| 3702 || Arabidopsis thaliana || 9911<br />
|-<br />
| 10116 || Rattus norvegicus || 5383<br />
|-<br />
| 155864 || Escherichia coli O157 H7 str. EDL933 || 4953<br />
|-<br />
| 632 || Yersinia pestis || 3823<br />
|-<br />
| 243276 || Treponema pallidum subsp. pallidum str. Nichols || 3643<br />
|-<br />
| 1148 || Synechocystis sp. PCC 6803 || 3240<br />
|-<br />
| 1392 || Bacillus anthracis || 3090<br />
|}<br />
* [http://irefindex.uio.no/wikifiles//images/c/c3/Interactions_by_taxonomy_beta8_corected.pdf Full list]<br />
<br />
== Interactions ==<br />
<br />
{|<br />
| BIND||62923<br />
|-<br />
| GRID||24380 || 277531<br />
|-<br />
| DIP||25785 || 39323 || 89715<br />
|-<br />
| INTACT||25260 || 38185 || 39031 || 156451<br />
|-<br />
| MINT||21992 || 41551 || 36676 || 47296 || 85755<br />
|-<br />
| HPRD||1949 || 8718 || 1124 || 5716 || 4482 || 40488<br />
|-<br />
| OPHID||2414 || 9264 || 1463 || 7578 || 6912 || 10286 || 47479<br />
|-<br />
| MPACT||6480 || 8513 || 7019 || 6231 || 6480 || 0 || 0 || 13331<br />
|-<br />
| MPPI||420 || 153 || 65 || 97 || 93 || 158 || 187 || 0 || 830<br />
|-<br />
| CORUM||263 || 199 || 116 || 248 || 119 || 246 || 237 || 0 || 15 || 2607<br />
|-<br />
| BIND_TRANSLATION||56109 || 24375 || 24813 || 24559 || 21875 || 2200 || 2722 || 6282 || 391 || 196 || 60227<br />
|-<br />
| INNATEDB||357 || 1310 || 409 || 811 || 654 || 1036 || 1222 || 0 || 52 || 82 || 419 || 7000<br />
|-<br />
| MATRIXDB||5 || 11 || 2 || 15 || 2 || 14 || 24 || 0 || 2 || 0 || 5 || 5 || 201<br />
|-<br />
| MPILIT||24 || 0 || 85 || 114 || 32 || 0 || 0 || 0 || 0 || 0 || 24 || 0 || 0 || 745<br />
|-<br />
| MPIIMEX||6 || 0 || 25 || 34 || 14 || 0 || 0 || 0 || 0 || 0 || 6 || 0 || 0 || 30 || 473<br />
|-<br />
| ||BIND||GRID||DIP||INTACT||MINT||HPRD||OPHID||MPACT||MPPI||CORUM||BIND_TRANSLATION||INNATEDB||MATRIXDB||MPILIT||MPIIMEX<br />
|-<br />
| ||(5070)||(200591)||(27502)||(83929)||(18731)||(24349)||(28544)||(1120)||(221)||(1915)||(3055)||(4240)||(156)||(536)||(399)<br />
|}<br />
<br />
== Interactors ==<br />
<br />
{|<br />
| BIND||40897<br />
|-<br />
| GRID||18036 || 34410<br />
|-<br />
| DIP||17437 || 18640 || 29961<br />
|-<br />
| INTACT||19107 || 23866 || 24585 || 53546<br />
|-<br />
| MINT||16751 || 18749 || 19671 || 25563 || 31615<br />
|-<br />
| HPRD||2920 || 5853 || 3489 || 6050 || 4539 || 9825<br />
|-<br />
| OPHID||3397 || 6004 || 4220 || 7030 || 5398 || 6228 || 9574<br />
|-<br />
| MPACT||4419 || 4625 || 4734 || 4936 || 4800 || 0 || 1 || 4979<br />
|-<br />
| MPPI||705 || 498 || 479 || 638 || 565 || 316 || 425 || 0 || 865<br />
|-<br />
| CORUM||2140 || 2449 || 2230 || 3239 || 2535 || 1859 || 2248 || 0 || 418 || 4365<br />
|-<br />
| BIND_TRANSLATION||35150 || 17491 || 16835 || 18553 || 16180 || 2957 || 3331 || 4014 || 687 || 2001 || 37247<br />
|-<br />
| INNATEDB||1685 || 2178 || 1813 || 2614 || 2137 || 1709 || 2112 || 0 || 359 || 1148 || 1687 || 3403<br />
|-<br />
| MATRIXDB||115 || 111 || 88 || 138 || 116 || 111 || 144 || 0 || 18 || 52 || 114 || 89 || 221<br />
|-<br />
| MPILIT||89 || 0 || 332 || 442 || 227 || 0 || 0 || 0 || 0 || 0 || 90 || 0 || 0 || 937<br />
|-<br />
| MPIIMEX||32 || 0 || 111 || 129 || 65 || 0 || 0 || 0 || 0 || 0 || 30 || 0 || 0 || 92 || 473<br />
|-<br />
| ||BIND||GRID||DIP||INTACT||MINT||HPRD||OPHID||MPACT||MPPI||CORUM||BIND_TRANSLATION||INNATEDB||MATRIXDB||MPILIT||MPIIMEX<br />
|-<br />
| ||(4387)||(6793)||(2208)||(16016)||(3969)||(1708)||(921)||(10)||(33)||(494)||(1401)||(244)||(34)||(366)||(282)<br />
|}<br />
<br />
== Summary of mapping interaction records to RIGs (Table 5) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Source'''||align="center" style="background:#f0f0f0;"|'''Total records'''||align="center" style="background:#f0f0f0;"|'''Protein-only interactors'''||align="center" style="background:#f0f0f0;"|'''PPI Assigned to RIGID'''||align="center" style="background:#f0f0f0;"|'''Unique RIGIDs'''<br />
|-<br />
| bind||193648||93957||91245(97.1136%)||62923(68.9605%)<br />
|-<br />
| grid||416648||411219||410641(99.8594%)||277531(67.5848%)<br />
|-<br />
| dip||90994||90994||89910(98.8087%)||89715(99.7831%)<br />
|-<br />
| intact||184959||183032||182359(99.6323%)||156451(85.7929%)<br />
|-<br />
| mint||122775||122775||122269(99.5879%)||85755(70.1363%)<br />
|-<br />
| HPRD||83022||83022||83022(100.0000%)||40488(48.7678%)<br />
|-<br />
| ophid||73257||73257||73160(99.8676%)||47479(64.8975%)<br />
|-<br />
| MPACT||16504||16504||16296(98.7397%)||13331(81.8054%)<br />
|-<br />
| MPPI||1814||1814||1699(93.6604%)||830(48.8523%)<br />
|-<br />
| CORUM||2844||2844||2844(100.0000%)||2607(91.6667%)<br />
|-<br />
| BIND_Translation||192923||87081||83347(95.7120%)||60227(72.2605%)<br />
|-<br />
| InnateDB||14729||11476||11248(98.0132%)||7000(62.2333%)<br />
|-<br />
| MatrixDB||846||349||321(91.9771%)||201(62.6168%)<br />
|-<br />
| mpilit||745||745||745(100.0000%)||745(100.0000%)<br />
|-<br />
| mpiimex||473||473||473(100.0000%)||473(100.0000%)<br />
|-<br />
| ALL||1396181||1179542||1169579(99.1554%)||545743(46.6615%)<br />
|}<br />
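<br />
The percentages in Table 5 appear to be derived from neighbouring columns. PPI Assigned to RIGID seems to be a share of Protein-only interactors, and Unique RIGIDs a share of PPI Assigned to RIGID. This reading is inferred from the numbers rather than documented. The following Python sketch, with a hypothetical helper name, reproduces the bind row under that assumption:<br />
<br />
```python
def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return 100.0 * part / whole


# Counts from the "bind" row of Table 5.
protein_only, assigned, unique_rigids = 93957, 91245, 62923

print(f"{percentage(assigned, protein_only):.4f}%")   # 97.1136%
print(f"{percentage(unique_rigids, assigned):.4f}%")  # 68.9605%
```
<br />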
<br />
== Assignment of protein interactors to ROGs (Table 3) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Source'''||align="center" style="background:#f0f0f0;"|'''Protein_Interactors'''||align="center" style="background:#f0f0f0;"|'''Assigned'''||align="center" style="background:#f0f0f0;"|'''%'''||align="center" style="background:#f0f0f0;"|'''Arbitrary'''||align="center" style="background:#f0f0f0;"|'''N_and_Y'''||align="center" style="background:#f0f0f0;"|'''Unassigned'''||align="center" style="background:#f0f0f0;"|'''Unique proteins'''<br />
|-<br />
| bind||285482||272457||95.4375||0||9077||3930||40897<br />
|-<br />
| BIND_Translation||264346||239976||90.7810||74||15390||8902||37247<br />
|-<br />
| CORUM||12916||12909||99.9458||7||0||0||4365<br />
|-<br />
| dip||30978||29436||95.0223||609||450||483||29961<br />
|-<br />
| grid||45569||37348||81.9592||7948||15||258||34410<br />
|-<br />
| HPRD||123812||103344||83.4685||20255||213||0||9825<br />
|-<br />
| InnateDB||27209||26914||98.9158||0||0||295||3403<br />
|-<br />
| intact||154359||151337||98.0422||36||2581||405||53546<br />
|-<br />
| MatrixDB||1123||1077||95.9038||0||0||46||221<br />
|-<br />
| mint||87509||83380||95.2816||51||3933||145||31615<br />
|-<br />
| MPACT||40349||40121||99.4349||0||1||227||4979<br />
|-<br />
| mpiimex||946||946||100.0000||0||0||0||473<br />
|-<br />
| mpilit||1490||1487||99.7987||3||0||0||937<br />
|-<br />
| MPPI||3628||3456||95.2591||0||42||130||865<br />
|-<br />
| ophid||146423||145149||99.1299||265||1003||6||9574<br />
|-<br />
| All||1226139||1149359||93.7381||29248||32705||14827||97139<br />
|}<br />
<br />
== ROG summary ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Decimal_score'''||align="center" style="background:#f0f0f0;"|'''Binary_flag'''||align="center" style="background:#f0f0f0;"|'''String_score'''||align="center" style="background:#f0f0f0;"|'''Score_class'''||align="center" style="background:#f0f0f0;"|'''Proteins'''||align="center" style="background:#f0f0f0;"|'''Percentage'''||align="center" style="background:#f0f0f0;"|'''bind'''||align="center" style="background:#f0f0f0;"|'''grid'''||align="center" style="background:#f0f0f0;"|'''dip'''||align="center" style="background:#f0f0f0;"|'''intact'''||align="center" style="background:#f0f0f0;"|'''mint'''||align="center" style="background:#f0f0f0;"|'''mpiimex'''||align="center" style="background:#f0f0f0;"|'''mpilit'''||align="center" style="background:#f0f0f0;"|'''HPRD'''||align="center" style="background:#f0f0f0;"|'''ophid'''||align="center" style="background:#f0f0f0;"|'''InnateDB'''||align="center" style="background:#f0f0f0;"|'''MatrixDB'''||align="center" style="background:#f0f0f0;"|'''MPACT'''||align="center" style="background:#f0f0f0;"|'''BIND_Translation'''||align="center" style="background:#f0f0f0;"|'''MPPI'''||align="center" style="background:#f0f0f0;"|'''CORUM'''<br />
|-<br />
| 786||000000001100010010||STO+||-1||8850||0.7218%||0||0||0||0||0||0||0||8850||0||0||0||0||0||0||0<br />
|-<br />
| 1938||000000011110010010||STMOX+||-1||29||0.0024%||0||0||0||0||0||0||0||29||0||0||0||0||0||0||0<br />
|-<br />
| 898||000000001110000010||SMO+||-1||21||0.0017%||0||0||0||17||0||0||0||4||0||0||0||0||0||0||0<br />
|-<br />
| 131093||100000000000010101||PUTQ||-1||5||0.0004%||0||0||0||5||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 1922||000000011110000010||SMOX+||-1||2||0.0002%||0||0||0||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 914||000000001110010010||STMO+||-1||2||0.0002%||0||0||0||0||0||0||0||2||0||0||0||0||0||0||0<br />
|-<br />
| 163905||101000000001000001||PDYQ||-1||2||0.0002%||0||0||0||0||0||0||0||0||0||0||0||0||2||0||0<br />
|-<br />
| 163921||101000000001010001||PTDYQ||-1||1||0.0001%||0||0||0||0||0||0||0||0||0||0||0||0||1||0||0<br />
|-<br />
| 218370||110101010100000010||SXLENQ+||-1||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 1||000000000000000001||P||1||745949||60.8372%||155411||31550||0||150559||49666||932||1400||0||124701||26914||828||0||188079||3021||12888<br />
|-<br />
| 2||000000000000000010||S||1||36416||2.9700%||0||65||22378||13||267||0||0||13128||0||0||0||0||565||0||0<br />
|-<br />
| 131201||100000000010000001||PMQ||1||24630||2.0087%||0||0||0||0||0||0||0||0||0||0||0||0||24630||0||0<br />
|-<br />
| 554||000000001000101010||SVGO||1||17303||1.4112%||0||0||0||0||0||0||0||17303||0||0||0||0||0||0||0<br />
|-<br />
| 8194||000010000000000010||SI||1||12319||1.0047%||12319||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 65||000000000001000001||PD||1||7080||0.5774%||7079||0||0||0||1||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 130||000000000010000010||SM||1||6593||0.5377%||0||0||0||0||0||0||0||6593||0||0||0||0||0||0||0<br />
|-<br />
| 41||000000000000101001||PVG||1||2223||0.1813%||0||2223||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 42||000000000000101010||SVG||1||1108||0.0904%||0||0||122||0||0||0||0||986||0||0||0||0||0||0||0<br />
|-<br />
| 129||000000000010000001||PM||1||714||0.0582%||468||0||0||77||0||0||0||0||0||0||137||0||0||32||0<br />
|-<br />
| 139265||100010000000000001||PIQ||1||372||0.0303%||0||0||0||0||0||0||0||0||0||0||0||0||372||0||0<br />
|-<br />
| 10||000000000000001010||SV||1||43||0.0035%||0||0||5||3||35||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 8193||000010000000000001||PI||1||35||0.0029%||0||0||0||27||8||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 66||000000000001000010||SD||1||22||0.0018%||0||4||0||0||0||0||0||0||0||0||18||0||0||0||0<br />
|-<br />
| 9||000000000000001001||PV||1||5||0.0004%||0||0||0||0||5||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5||000000000000000101||PU||2||21909||1.7868%||0||0||0||289||253||9||7||0||20314||0||0||10||684||322||21<br />
|-<br />
| 16386||000100000000000010||SE||2||4888||0.3986%||4888||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 770||000000001100000010||SO+||2||3478||0.2837%||0||0||0||0||0||0||0||3478||0||0||0||0||0||0||0<br />
|-<br />
| 147458||100100000000000010||SEQ||2||2242||0.1829%||0||0||0||4||0||0||0||0||0||0||0||0||2238||0||0<br />
|-<br />
| 6||000000000000000110||SU||2||194||0.0158%||0||1||147||27||5||0||0||13||0||0||0||0||1||0||0<br />
|-<br />
| 16385||000100000000000001||PE||2||156||0.0127%||0||0||0||147||9||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 147457||100100000000000001||PEQ||2||55||0.0045%||0||0||0||0||0||0||0||0||0||0||0||0||55||0||0<br />
|-<br />
| 773||000000001100000101||PUO+||2||21||0.0017%||0||0||0||8||2||0||0||0||11||0||0||0||0||0||0<br />
|-<br />
| 1797||000000011100000101||PUOX+||2||4||0.0003%||0||0||0||4||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16514||000100000010000010||SME||2||3||0.0002%||3||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 774||000000001100000110||SUO+||2||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 17||000000000000010001||PT||3||156757||12.7846%||87968||3505||0||19||32901||4||79||0||118||0||2||30590||1524||47||0<br />
|-<br />
| 18||000000000000010010||ST||3||46525||3.7944%||0||0||6773||1||18||0||0||32718||0||0||0||6994||21||0||0<br />
|-<br />
| 146||000000000010010010||STM||3||16664||1.3591%||0||0||0||0||0||0||0||16664||0||0||0||0||0||0||0<br />
|-<br />
| 131217||100000000010010001||PTMQ||3||4257||0.3472%||0||0||0||0||0||0||0||0||0||0||0||0||4257||0||0<br />
|-<br />
| 81||000000000001010001||PTD||3||2567||0.2094%||2472||0||0||3||1||0||0||0||0||0||91||0||0||0||0<br />
|-<br />
| 8210||000010000000010010||STI||3||872||0.0711%||872||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 145||000000000010010001||PTM||3||171||0.0139%||137||0||0||0||0||0||0||0||0||0||0||0||0||34||0<br />
|-<br />
| 163985||101000000010010001||PTMYQ||3||52||0.0042%||0||0||0||0||0||0||0||0||0||0||0||0||52||0||0<br />
|-<br />
| 16530||000100000010010010||STME||3||13||0.0011%||13||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 8209||000010000000010001||PTI||3||13||0.0011%||0||0||0||13||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 82||000000000001010010||STD||3||10||0.0008%||0||0||0||0||9||0||0||0||0||0||1||0||0||0||0<br />
|-<br />
| 139281||100010000000010001||PTIQ||3||7||0.0006%||0||0||0||0||0||0||0||0||0||0||0||0||7||0||0<br />
|-<br />
| 26||000000000000011010||SVT||3||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16402||000100000000010010||STE||4||828||0.0675%||827||0||1||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 147474||100100000000010010||STEQ||4||411||0.0335%||0||0||0||2||0||0||0||0||0||0||0||0||409||0||0<br />
|-<br />
| 22||000000000000010110||SUT||4||144||0.0117%||0||0||10||0||0||0||0||134||0||0||0||0||0||0||0<br />
|-<br />
| 790||000000001100010110||SUTO+||4||47||0.0038%||0||0||0||18||27||0||0||2||0||0||0||0||0||0||0<br />
|-<br />
| 789||000000001100010101||PUTO+||4||32||0.0026%||0||0||0||27||5||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 16401||000100000000010001||PTE||4||2||0.0002%||0||0||0||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5378||000001010100000010||SXL+||5||18721||1.5268%||0||0||0||14||1||0||0||18706||0||0||0||0||0||0||0<br />
|-<br />
| 131073||100000000000000001||PQ||5||16324||1.3313%||0||0||0||6||0||0||0||0||0||0||0||0||16318||0||0<br />
|-<br />
| 4393||000001000100101001||PVGL+||5||7931||0.6468%||0||7931||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 810||000000001100101010||SVGO+||5||3440||0.2806%||0||0||0||0||0||0||0||3440||0||0||0||0||0||0||0<br />
|-<br />
| 21||000000000000010101||PUT||5||2721||0.2219%||0||0||0||15||168||1||1||0||5||0||0||2527||4||0||0<br />
|-<br />
| 4394||000001000100101010||SVGL+||5||1650||0.1346%||0||0||112||0||0||0||0||1538||0||0||0||0||0||0||0<br />
|-<br />
| 131089||100000000000010001||PTQ||5||859||0.0701%||0||0||0||47||0||0||0||0||0||0||0||0||812||0||0<br />
|-<br />
| 4354||000001000100000010||SL+||5||493||0.0402%||0||17||474||2||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 4357||000001000100000101||PUL+||5||241||0.0197%||0||0||0||0||0||0||3||0||222||0||0||0||9||0||7<br />
|-<br />
| 4373||000001000100010101||PUTL+||5||74||0.0060%||0||0||0||8||3||0||0||0||4||0||0||0||59||0||0<br />
|-<br />
| 5381||000001010100000101||PUXL+||5||55||0.0045%||0||0||0||11||5||0||0||0||39||0||0||0||0||0||0<br />
|-<br />
| 5386||000001010100001010||SVXL+||5||43||0.0035%||0||0||0||1||42||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 4374||000001000100010110||SUTL+||5||30||0.0024%||0||0||17||0||0||0||0||7||0||0||0||0||6||0||0<br />
|-<br />
| 4358||000001000100000110||SUL+||5||6||0.0005%||0||0||6||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 5382||000001010100000110||SUXL+||5||4||0.0003%||0||0||0||0||0||0||0||4||0||0||0||0||0||0||0<br />
|-<br />
| 32769||001000000000000001||PY||6||16102||1.3132%||3687||12||0||1963||3392||0||0||0||750||0||0||0||6293||5||0<br />
|-<br />
| 65601||010000000001000001||PDN||6||8727||0.7117%||52||0||0||2||247||0||0||0||253||0||0||0||8168||5||0<br />
|-<br />
| 81922||010100000000000010||SEN||6||4421||0.3606%||4421||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 65537||010000000000000001||PN||6||970||0.0791%||35||0||190||299||256||0||0||179||0||0||0||0||0||11||0<br />
|-<br />
| 32833||001000000001000001||PDY||6||773||0.0630%||773||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32770||001000000000000010||SY||6||427||0.0348%||0||3||258||92||28||0||0||0||0||0||0||0||46||0||0<br />
|-<br />
| 163969||101000000010000001||PMYQ||6||402||0.0328%||0||0||0||0||0||0||0||0||0||0||0||0||402||0||0<br />
|-<br />
| 212993||110100000000000001||PENQ||6||293||0.0239%||0||0||0||0||0||0||0||0||0||0||0||0||293||0||0<br />
|-<br />
| 73729||010010000000000001||PIN||6||204||0.0166%||0||0||0||204||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32785||001000000000010001||PTY||6||164||0.0134%||93||0||0||14||0||0||0||0||0||0||0||0||57||0||0<br />
|-<br />
| 65553||010000000000010001||PTN||6||38||0.0031%||0||0||0||4||0||0||0||34||0||0||0||0||0||0||0<br />
|-<br />
| 196609||110000000000000001||PNQ||6||31||0.0025%||0||0||0||0||0||0||0||0||0||0||0||0||31||0||0<br />
|-<br />
| 81921||010100000000000001||PEN||6||29||0.0024%||0||0||0||1||10||0||0||0||0||0||0||0||0||18||0<br />
|-<br />
| 65617||010000000001010001||PTDN||6||23||0.0019%||0||0||0||0||0||0||0||0||0||0||0||0||23||0||0<br />
|-<br />
| 196625||110000000000010001||PTNQ||6||22||0.0018%||0||0||0||0||0||0||0||0||0||0||0||0||22||0||0<br />
|-<br />
| 81938||010100000000010010||STEN||6||14||0.0011%||14||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32786||001000000000010010||STY||6||3||0.0002%||0||0||2||0||0||0||0||0||0||0||0||1||0||0||0<br />
|-<br />
| 32897||001000000010000001||PMY||6||2||0.0002%||0||0||0||0||0||0||0||0||0||0||0||0||0||2||0<br />
|-<br />
| 81986||010100000001000010||SDEN||6||1||0.0001%||1||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 32913||001000000010010001||PTMY||6||1||0.0001%||0||0||0||0||0||0||0||0||0||0||0||0||0||1||0<br />
|-<br />
| 163857||101000000000010001||PTYQ||6||1||0.0001%||0||0||0||1||0||0||0||0||0||0||0||0||0||0||0<br />
|-<br />
| 40978||001010000000010010||STIY||6||1||0.0001%||1||0||0||0||0||0||0||0||0||0||0||0||0||0||0<br />
|}<br />
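<br />
The Binary_flag column appears to be the 18-digit binary form of Decimal_score, and each set bit appears to correspond to one of the feature letters described in the Scores (Table 2) section. The bit-to-letter mapping in the following Python sketch was inferred by comparing the rows of this table. It is an assumption rather than a documented specification, and the function names are hypothetical:<br />
<br />
```python
# Bit-to-letter mapping inferred from the ROG summary rows above. It is an
# assumption, not taken from the iRefIndex source code. Bit 11 (value 2048)
# does not occur in any row shown, so it is deliberately left unmapped.
FEATURE_BITS = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 15: "Y",
    16: "N", 17: "Q",
}
FLAG_WIDTH = 18  # number of digits in the Binary_flag column


def binary_flag(decimal_score: int) -> str:
    """Render a decimal score as the zero-padded Binary_flag string."""
    return format(decimal_score, f"0{FLAG_WIDTH}b")


def string_score(decimal_score: int) -> str:
    """Decode a decimal score into String_score letters.

    Letters follow ascending bit order, except that '+' is always written
    last, which is how the rows above appear to be formatted.
    """
    if not 0 <= decimal_score < (1 << FLAG_WIDTH):
        raise ValueError(f"score out of range: {decimal_score}")
    letters = []
    has_plus = False
    for bit in range(FLAG_WIDTH):
        if not decimal_score & (1 << bit):
            continue
        if bit not in FEATURE_BITS:
            raise ValueError(f"unmapped bit {bit} in score {decimal_score}")
        if FEATURE_BITS[bit] == "+":
            has_plus = True
        else:
            letters.append(FEATURE_BITS[bit])
    return "".join(letters) + ("+" if has_plus else "")


print(binary_flag(786), string_score(786))  # 000000001100010010 STO+
```
<br />
Placing '+' last reproduces rows such as SXLENQ+ (218370), where the bit for '+' lies below those for X, L, E, N and Q.<br />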
<br />
== Scores (Table 2) ==<br />
<br />
{|<br />
| align="center" style="background:#f0f0f0;"|'''Character'''||align="center" style="background:#f0f0f0;"|'''Description of feature (when the value is 1)'''||align="center" style="background:#f0f0f0;"|'''Frequency'''<br />
|-<br />
| D||The source database (D) listed in the interaction record differs from the one expected for the protein's accession. In specific cases, this difference is tolerated and the assignment is made.||19206(1.5856%)<br />
|-<br />
| E||The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still lacked a sequence after the eUtils lookup, sequence information was obtained from UniProt.||13357(1.1027%)<br />
|-<br />
| G||The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.||33655(2.7784%)<br />
|-<br />
| L||More than one assignment is possible (see + below), for example several isoforms for a gene ID. In such a situation, references are chosen using a ranking system (RefSeq first, then UniProt). If ambiguity remains after ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)||29249(2.4147%)<br />
|-<br />
| M||The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.||53556(4.4214%)<br />
|-<br />
| +||More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating, but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by the ROG score features O, X or L (described in their own rows of this table).||45176(3.7296%)<br />
|-<br />
| N||The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.||14774(1.2197%)<br />
|-<br />
| O||More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.||33230(2.7434%)<br />
|-<br />
| I||The protein reference used was an NCBI GenInfo Identifier (I).||13823(1.1412%)<br />
|-<br />
| U||The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.||25488(2.1042%)<br />
|-<br />
| T||The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from the one found in the protein sequence record. This discrepancy was tolerated and the assignment was made.||242211(19.9961%)<br />
|-<br />
| V||The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.||33747(2.786%)<br />
|-<br />
| Q||The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.||49967(4.1251%)<br />
|-<br />
| P||The interaction record's primary (P) reference for the protein was used to make the assignment.||1023006(84.4559%)<br />
|-<br />
| S||One of the interaction record's secondary (S) references for the protein was used to make the assignment.||188284(15.5441%)<br />
|-<br />
| Y||The accession referred to an entry that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).||17931(1.4803%)<br />
|-<br />
| X||More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.||18859(1.5569%)<br />
|}<br />
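<br />
The frequencies in Table 2 appear to be percentages of the combined P and S counts (1023006 + 188284 = 1211290). That would be consistent with every assignment being made from either the primary or a secondary reference. The denominator is inferred from the published numbers, not documented. The sketch below recomputes some rows under that assumption:<br />
<br />
```python
# Counts copied from Table 2. Using P + S as the denominator is an
# assumption inferred from the published percentages.
counts = {"P": 1023006, "S": 188284, "D": 19206, "T": 242211, "Y": 17931}
total = counts["P"] + counts["S"]


def frequency(feature: str) -> float:
    """Return the count for a feature as a percentage of P + S."""
    return 100.0 * counts[feature] / total


for feature in counts:
    print(f"{feature}: {counts[feature]} ({frequency(feature):.4f}%)")
```
<br />
If the assumption holds, each printed percentage agrees with the corresponding Frequency entry to four decimal places.<br />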
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex_Output_and_Statistics&diff=4074
iRefIndex Output and Statistics
2012-02-16T13:34:04Z
<p>PaulBoddie: /* Creating Additional Statistics */ Changed a parameter name.</p>
<hr />
<div>The production of output and statistics involves two separate programs: PSI_MI_TAB_Maker and BioPSI_Suplimenter.<br />
<br />
== PSI-MI Controlled Vocabulary Mapping ==<br />
<br />
{{Note|<br />
'''Sabry to help document this part.'''<br />
}}<br />
<br />
In the data maintained by iRefIndex, various controlled vocabulary terms are used that do not match the terms defined in the PSI-MI molecular interaction ontology. The process for dealing with them therefore involves extracting these unrecognised terms, curating a mapping to replacement terms, and processing the maintained data to use the replacement terms.<br />
<br />
Currently, the curation is performed by collecting the unrecognised terms in a spreadsheet, which is then edited to add suggested replacements alongside the existing terms.<br />
<br />
=== Creating a Mapping Wiki Page ===<br />
<br />
A page summarising the mapping of unrecognised terms to known terms should be prepared, similar to the [[Mapping of terms to MI term ids - iRefIndex 8.0]] page.<br />
<br />
The <tt>cv2wiki.py</tt> script needs to be obtained. Get the program's source code from this location:<br />
<br />
* https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/cv2wiki.py<br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
<pre>cvs co bioscape/bioscape/modules/interaction/Sabry/cv2wiki.py</pre><br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
<pre>export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot</pre><br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)<br />
<br />
==== Running the Program ====<br />
<br />
The program must be run on comma-separated value files exported from the curation spreadsheet. First, the "interaction type" and "interaction detection method" sheets must be individually exported using the following settings:<br />
<br />
* The field delimiter is defined to be the comma (<tt>,</tt>) character<br />
* Field quoting is done using the double-quote (<tt>"</tt>) character<br />
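<br />
These export settings correspond to the default dialect of Python's <tt>csv</tt> module, in which a term containing a comma is written as a single double-quoted field. The following sketch reads such an export and renders it as MediaWiki table rows. The column names and the placeholder identifier are hypothetical, since the layout of the curation spreadsheet is not described here. The table layout is not necessarily the one produced by <tt>cv2wiki.py</tt>.<br />
<br />
```python
import csv
import io

# Hypothetical export: the column names and the placeholder identifier
# "MI:0000" are for illustration only.
exported = io.StringIO(
    "term,replacement\n"
    '"example term, with a comma",MI:0000\n'
)

# delimiter="," and quotechar='"' are the csv module defaults; they are
# spelled out here to mirror the export settings listed above.
reader = csv.reader(exported, delimiter=",", quotechar='"')
header = next(reader)
rows = list(reader)

# Render the rows as MediaWiki table markup (an illustrative layout only).
lines = ["{|"]
for row in rows:
    lines.append("|-")
    lines.append("| " + "||".join(row))
lines.append("|}")
print("\n".join(lines))
```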
<br />
If, for example, the exported files are named <tt>cv_int_type.csv</tt> and <tt>cv_int_det_method.csv</tt> for the "interaction type" and "interaction detection method" sheets respectively, the following command can then be run:<br />
<br />
<pre>python cv2wiki.py cv_int_type.csv cv_int_det_method.csv MediaWiki</pre><br />
<br />
The output in the above example will be written to standard output (the terminal/console). To write to a file, add a filename as an argument to the program. For example:<br />
<br />
<pre>python cv2wiki.py cv_int_type.csv cv_int_det_method.csv MediaWiki CVMapping</pre><br />
<br />
The resulting file can then be uploaded to the wiki using a suitable tool. For a MoinMoin wiki (also supported by the program), the file can instead be copied into place, with some care.<br />
<br />
== Building PSI_MI_TAB_Maker ==<br />
<br />
The <tt>PSI_MI_TAB_Maker.jar</tt> file needs to be obtained or built.<br />
<br />
<ol><br />
<li>Get the program's source code from this location:<br />
<br />
<ul><br />
<li>https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/PSI_MI_TAB_Maker/</li><br />
</ul><br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
<pre>cvs co bioscape/bioscape/modules/interaction/Sabry/PSI_MI_TAB_Maker</pre><br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
<pre>export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot</pre><br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)</li><br />
<br />
<li>Obtain the program's dependencies. This program uses the MySQL Connector/J library which can be found at the following location:<br />
<br />
<ul><br />
<li>http://www.mysql.com/products/connector/j/</li><br />
</ul></li><br />
<br />
<li>Extract the dependencies:<br />
<br />
<pre>tar zxf mysql-connector-java-5.1.6.tar.gz</pre><br />
<br />
This will produce a directory called <tt>mysql-connector-java-5.1.6</tt> containing<br />
a file called <tt>mysql-connector-java-5.1.6-bin.jar</tt>, which should be placed in<br />
the <tt>lib</tt> directory inside the <tt>PSI_MI_TAB_Maker</tt> directory:<br />
<br />
<pre><br />
mkdir lib<br />
cp mysql-connector-java-5.1.6/mysql-connector-java-5.1.6-bin.jar lib/</pre><br />
<br />
You may instead choose to copy the library from the <tt>BioPSI_Suplimenter/lib</tt> directory:<br />
<br />
<pre><br />
mkdir lib<br />
cp ../BioPSI_Suplimenter/lib/mysql-connector-java-5.1.6-bin.jar lib/</pre><br />
<br />
The filenames in the above example will need adjusting, depending on the exact version of the library downloaded.<br />
<br />
The <tt>SHA.jar</tt> file needs copying from its build location:<br />
<br />
<pre>cp ../SHA/dist/SHA.jar lib/</pre><br />
<br />
Alternatively, the external libraries can be found in the following location:<br />
<br />
<pre>/biotek/dias/donaldson3/iRefIndex/External_libraries</pre></li><br />
<br />
<li>Compile the source code. In order to build the software on a computer which does not have the NetBeans IDE installed, copy the generic build file into the <tt>PSI_MI_TAB_Maker</tt> directory:<br />
<br />
<pre>cp Build_files/build.xml .</pre><br />
<br />
Compile and create the <tt>.jar</tt> file as follows:<br />
<br />
<pre>ant jar</pre></li><br />
</ol><br />
<br />
== Preparing the MITAB Tables ==<br />
<br />
Before the PSI_MI_TAB_Maker program can be run, mapping and MITAB-related tables must be created in the database.<br />
<br />
{{Note|<br />
The tables described here are prepared by the <tt>cv.no.uio.biotek.Preprocess</tt> class in a step which will be documented later.<br />
}}<br />
<br />
=== Obtaining the SQL Scripts ===<br />
<br />
Get the scripts from this location:<br />
<br />
<ul><br />
<li>https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/SQL_commands/</li><br />
</ul><br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
<pre>cvs co bioscape/bioscape/modules/interaction/Sabry/SQL_commands</pre><br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
<pre>export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot</pre><br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)<br />
<br />
=== Preparing the SQL Scripts ===<br />
<br />
The mapping tables script needs to be parameterised before it is used, replacing <tt>&lt;actual_irefindex_db&gt;</tt> with the name of the actual iRefIndex database:<br />
<br />
<pre>sed -e 's/<old_db>/<actual_irefindex_db>/g' make_mapping_tables_for_output.sql > make_mapping_tables_for_output_specific.sql</pre><br />
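<br />
The <tt>sed</tt> command above performs a literal text substitution. The same step can be sketched in Python, where <tt>str.replace</tt> needs no escaping of special characters. This is an illustrative alternative rather than part of the documented procedure, and the function names and database name are hypothetical.<br />
<br />
```python
from pathlib import Path


def parameterise(template: str, placeholder: str, value: str) -> str:
    """Replace every literal occurrence of placeholder with value.

    str.replace treats both arguments as plain text, so characters that are
    special to sed (such as '/' or '&') need no escaping here.
    """
    return template.replace(placeholder, value)


def parameterise_file(source: Path, target: Path, placeholder: str, value: str) -> None:
    """Write a copy of source to target with the placeholder substituted."""
    target.write_text(parameterise(source.read_text(), placeholder, value))


# Usage (irefindex_db is a hypothetical database name):
# parameterise_file(Path("make_mapping_tables_for_output.sql"),
#                   Path("make_mapping_tables_for_output_specific.sql"),
#                   "<old_db>", "irefindex_db")
```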
<br />
=== Running the SQL Scripts ===<br />
<br />
In the <tt>SQL_commands</tt> directory, one script (<tt>preprocess_for_output.sql</tt>) provides the basis for all data output, whereas two other scripts (<tt>preprocess_for_mitab.sql</tt> and <tt>TAB_MAKE.sql</tt>) together provide a large number of SQL statements for creating the MITAB-related tables. The parameterised mapping tables script should be run first, followed by <tt>preprocess_for_output.sql</tt> and <tt>preprocess_for_mitab.sql</tt>, noting any errors that occur:<br />
<br />
<pre><br />
mysql -h <hostname> -u <username> -p -A -D <database> < make_mapping_tables_for_output_specific.sql<br />
mysql -h <hostname> -u <username> -p -A -D <database> < preprocess_for_output.sql<br />
mysql -h <hostname> -u <username> -p -A -D <database> < preprocess_for_mitab.sql</pre><br />
<br />
If no errors occurred, the final script (<tt>TAB_MAKE.sql</tt>) can then be run:<br />
<br />
<pre>mysql -h <hostname> -u <username> -p -A -D <database> < TAB_MAKE.sql</pre><br />
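<br />
The order and stop-on-error rule above can be captured in a small Python sketch using <tt>subprocess</tt>. It assumes that <tt>mysql</tt> exits with a non-zero status when a statement fails, which is its usual batch-mode behaviour unless <tt>--force</tt> is given. Because of <tt>-p</tt>, the password is prompted for once per script. The helper names and connection details are hypothetical.<br />
<br />
```python
import subprocess
from typing import Callable, Sequence

# Order taken from the instructions above: the mapping tables script and the
# two preprocessing scripts first, then TAB_MAKE.sql only if they succeed.
SCRIPTS = [
    "make_mapping_tables_for_output_specific.sql",
    "preprocess_for_output.sql",
    "preprocess_for_mitab.sql",
    "TAB_MAKE.sql",
]


def mysql_command(hostname: str, username: str, database: str) -> list[str]:
    """Build the mysql invocation shown above; -p makes mysql prompt for a password."""
    return ["mysql", "-h", hostname, "-u", username, "-p", "-A", "-D", database]


def run_scripts(command: Sequence[str], scripts: Sequence[str],
                run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> None:
    """Feed each script to mysql on standard input, stopping at the first failure.

    The run parameter exists only so that the function can be tried without a
    database; by default it is subprocess.run.
    """
    for script in scripts:
        with open(script, "rb") as handle:
            result = run(list(command), stdin=handle)
        if result.returncode != 0:
            raise RuntimeError(f"{script} failed with exit status {result.returncode}")


# Usage (host, user and database names are placeholders):
# run_scripts(mysql_command("dbhost", "irefindex", "irefindex_db"), SCRIPTS)
```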
<br />
== Running PSI_MI_TAB_Maker ==<br />
<br />
Run the program as follows:<br />
<br />
<pre>java -Xms256m -Xmx8192m -jar build/jar/PSI_MI_TAB_Maker.jar <config filename></pre><br />
<br />
A sample configuration file is located in the <tt>config</tt> directory. It can be copied, modified and supplied to the program.<br />
<br />
== Running BioPSI_Suplimenter to Produce Statistics ==<br />
<br />
This program was already built and run in the [[iRefIndex Build Process]]. For a completed build process it '''should not''' need to be run again, but an option does exist to explicitly produce statistics for the system, should this be required. The basic details of running the program are described in [[iRefIndex Build Process#Running_BioPSI_Suplimenter|Running BioPSI_Suplimenter]].<br />
<br />
=== Create reports ===<br />
<br />
Upon selecting this option in the running program, complete the fields as previously described. The reports will be written to the designated log file directory.<br />
<br />
== Creating a Statistics Wiki Page ==<br />
<br />
The <tt>reports2wiki.py</tt> file needs to be obtained. Get the program's source code from this location:<br />
<br />
<ul><br />
<li>https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/reports2wiki.py</li><br />
</ul><br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
<pre>cvs co bioscape/bioscape/modules/interaction/Sabry/reports2wiki.py</pre><br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
<pre>export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot</pre><br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)<br />
<br />
=== Running the Program ===<br />
<br />
The program can be run on the report files in the log file directory as follows:<br />
<br />
<pre><br />
python reports2wiki.py /home/irefindex/output/Suplimenter03052009 MediaWiki<br />
</pre><br />
<br />
The first argument is the "prefix" of the report files: the part of the path common to all such files, so that the following command...<br />
<br />
<pre><br />
ls /home/irefindex/output/Suplimenter03052009*<br />
</pre><br />
<br />
...lists the report files (and log files) produced by iRefIndex.<br />
<br />
The output in the above example will be written to standard output (the terminal/console). To write to a file, add a filename as an argument to the program. For example:<br />
<br />
<pre><br />
python reports2wiki.py /home/irefindex/output/Suplimenter03052009 MediaWiki Statistics_iRefIndex_3.0<br />
</pre><br />
<br />
The resulting file can then be uploaded to the wiki using a suitable tool. For a MoinMoin wiki (also supported by the program), the file can instead be copied into place, with some care.<br />
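<br />
The report prefix behaves like the shell pattern used with <tt>ls</tt> in this section. The same lookup can be sketched in Python to check which files a given prefix will pick up. The function name is hypothetical, and the code is not taken from <tt>reports2wiki.py</tt>.<br />
<br />
```python
import glob


def report_files(prefix: str) -> list[str]:
    """Return the paths that start with prefix, like the shell pattern "prefix*".

    glob.escape() stops any wildcard characters inside the prefix itself
    from being treated as patterns.
    """
    return sorted(glob.glob(glob.escape(prefix) + "*"))


# Usage, with the example prefix from this section:
# for path in report_files("/home/irefindex/output/Suplimenter03052009"):
#     print(path)
```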
<br />
=== Creating Additional Statistics ===<br />
<br />
The <tt>make_taxonomy_summary.sql</tt> script in the <tt>SQL_commands</tt> directory can be used to generate a table of interactions by species:<br />
<br />
<pre>mysql -h <hostname> -u <username> -p -A -D <database> < make_taxonomy_summary.sql > taxonomy_summary.txt</pre><br />
<br />
This file can be incorporated, at least in part, into the statistics page, and can also be published in full.<br />
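<br />
Because its standard input is redirected from a file, <tt>mysql</tt> runs in batch mode and writes <tt>taxonomy_summary.txt</tt> as tab-separated text with a header row. The sketch below converts such output into a table in the style used on the statistics pages. The actual columns produced by <tt>make_taxonomy_summary.sql</tt> are not described here, so the function is generic, and its name is hypothetical.<br />
<br />
```python
HEADER_STYLE = 'align="center" style="background:#f0f0f0;"'


def tsv_to_wiki_table(text: str) -> str:
    """Convert tab-separated mysql output (header row first) to a MediaWiki table.

    The header cells copy the style used by the statistics tables on this wiki.
    Assumes that no field contains a tab or a newline.
    """
    rows = [line.split("\t") for line in text.splitlines() if line]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = ["{|", "| " + "||".join(f"{HEADER_STYLE}|'''{name}'''" for name in header)]
    for row in body:
        lines.append("|-")
        lines.append("| " + "||".join(row))
    lines.append("|}")
    return "\n".join(lines)


# Usage:
# with open("taxonomy_summary.txt") as handle:
#     print(tsv_to_wiki_table(handle.read()))
```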
<br />
== Creating Other Mapping Files ==<br />
<br />
In addition to the MITAB data, a mapping file should be generated using a script in the <tt>SQL_commands</tt> directory. First prepare the script, substituting a real filesystem path for <tt>&lt;actual_mapping_file&gt;</tt>:<br />
<br />
<pre>sed -e 's/<mapping_file>/<actual_mapping_file>/g' mapper_tables.sql > mapper_tables_specific.sql</pre><br />
<br />
Characters that are special to <tt>sed</tt>, notably the <tt>/</tt> delimiter found in every path, must be "escaped" with a backslash. For example:<br />
<br />
<pre>sed -e 's/<mapping_file>/\/home\/irefindex\/output\/mappings.txt/g' mapper_tables.sql > mapper_tables_specific.sql</pre><br />
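<br />
The escaping rule can be expressed as a small Python helper that builds the <tt>sed</tt> expression from a path. In the replacement part of an <tt>s</tt> command, the backslash, the <tt>/</tt> delimiter and <tt>&</tt> (which stands for the matched text) are special. The helper names are hypothetical.<br />
<br />
```python
def sed_replacement(text: str) -> str:
    """Escape text for the replacement part of a sed s/.../.../ command.

    The backslash is escaped first so that the backslashes added for '/'
    (the delimiter) and '&' (the matched text) are not doubled.
    Assumes text contains no newline.
    """
    return text.replace("\\", "\\\\").replace("/", "\\/").replace("&", "\\&")


def sed_expression(placeholder: str, value: str) -> str:
    """Build an s///g expression; placeholder is assumed to need no escaping."""
    return f"s/{placeholder}/{sed_replacement(value)}/g"


print(sed_expression("<mapping_file>", "/home/irefindex/output/mappings.txt"))
# s/<mapping_file>/\/home\/irefindex\/output\/mappings.txt/g
```
<br />
Alternatively, <tt>sed</tt> accepts other characters as the delimiter of the <tt>s</tt> command, so <tt>s|&lt;mapping_file&gt;|/home/irefindex/output/mappings.txt|g</tt> avoids escaping the slashes, although ampersands and backslashes in the path would still need escaping.<br />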
<br />
Then execute the script as follows:<br />
<br />
<pre>mysql -h <hostname> -u <username> -p -A -D <database> < mapper_tables_specific.sql</pre><br />
<br />
This will write a file to the specified location.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex_Output_and_Statistics&diff=4073
iRefIndex Output and Statistics
2012-02-16T13:33:42Z
<p>PaulBoddie: /* Creating a Statistics Wiki Page */ Added taxonomy summary note.</p>
<hr />
<div>The production of output and statistics involves two separate programs: PSI_MI_TAB_Maker and BioPSI_Suplimenter.<br />
<br />
== PSI-MI Controlled Vocabulary Mapping ==<br />
<br />
{{Note|<br />
'''Sabry to help document this part.'''<br />
}}<br />
<br />
In the data maintained by iRefIndex, various controlled vocabulary terms are used that do not match the terms defined in the PSI-MI molecular interaction ontology. The process for dealing with them therefore involves extracting these unrecognised terms, curating a mapping to replacement terms, and processing the maintained data to use the replacement terms.<br />
<br />
Currently, the curation is performed by collecting the unrecognised terms in a spreadsheet, which is then edited to add suggested replacements alongside the existing terms.<br />
<br />
=== Creating a Mapping Wiki Page ===<br />
<br />
A page summarising the mapping of unrecognised terms to known terms should be prepared, following the example of the [[Mapping of terms to MI term ids - iRefIndex 8.0]] page.<br />
<br />
The <tt>cv2wiki.py</tt> script needs to be obtained. Get the program's source code from this location:<br />
<br />
* https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/cv2wiki.py<br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
cvs co bioscape/bioscape/modules/interaction/Sabry/cv2wiki.py<br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot<br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)<br />
<br />
==== Running the Program ====<br />
<br />
The program must be run on comma-separated value files exported from the curation spreadsheet. First, the "interaction type" and "interaction detection method" sheets must be individually exported using the following settings:<br />
<br />
* The field delimiter is defined to be the comma (<tt>,</tt>) character<br />
* Field quoting is done using the double-quote (<tt>"</tt>) character<br />
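<br />
These settings correspond to the defaults of Python's <tt>csv</tt> module, so a quoted field containing a comma is read as a single field. An export can be checked before running <tt>cv2wiki.py</tt> with something like the following sketch. The <tt>check_export</tt> function, and its expectation that every row has the same number of fields, are assumptions rather than requirements of the program:<br />
<br />
```python
import csv
from typing import List


def check_export(path: str) -> List[List[str]]:
    """Parse a sheet exported with comma delimiters and double-quote quoting.

    Raises ValueError if the non-empty rows do not all have the same number
    of fields, which may indicate that different export settings were used.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields;
    # UTF-8 is an assumption about how the spreadsheet was exported.
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=",", quotechar='"'))
    widths = {len(row) for row in rows if row}
    if len(widths) > 1:
        raise ValueError("{}: inconsistent field counts {}".format(path, sorted(widths)))
    return rows
```
<br />
For example, <tt>check_export("cv_int_type.csv")</tt> returns the parsed rows of the "interaction type" sheet or raises an error describing the mismatch.<br />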
<br />
If, for example, the exported files are named <tt>cv_int_type.csv</tt> and <tt>cv_int_det_method.csv</tt> for the "interaction type" and "interaction detection method" sheets respectively, the following command can then be run:<br />
<br />
python cv2wiki.py cv_int_type.csv cv_int_det_method.csv MediaWiki<br />
<br />
The output in the above example will be written to standard output (the terminal/console). To write to a file, add a filename as an additional argument to the program. For example:<br />
<br />
python cv2wiki.py cv_int_type.csv cv_int_det_method.csv MediaWiki CVMapping<br />
<br />
This file could then be uploaded to the Wiki using a suitable tool. For a MoinMoin Wiki (also supported by the program), the file could instead be copied into place, with some care.<br />
<br />
== Building PSI_MI_TAB_Maker ==<br />
<br />
The <tt>PSI_MI_TAB_Maker.jar</tt> file needs to be obtained or built.<br />
<br />
<ol><br />
<li>Get the program's source code from this location:<br />
<br />
<ul><br />
<li>https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/PSI_MI_TAB_Maker/</li><br />
</ul><br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
<pre>cvs co bioscape/bioscape/modules/interaction/Sabry/PSI_MI_TAB_Maker</pre><br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
<pre>export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot</pre><br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)</li><br />
<br />
<li>Obtain the program's dependencies. This program uses the MySQL Connector/J library which can be found at the following location:<br />
<br />
<ul><br />
<li>http://www.mysql.com/products/connector/j/</li><br />
</ul></li><br />
<br />
<li>Extract the dependencies:<br />
<br />
<pre>tar zxf mysql-connector-java-5.1.6.tar.gz</pre><br />
<br />
This will produce a directory called <tt>mysql-connector-java-5.1.6</tt> containing a file called <tt>mysql-connector-java-5.1.6-bin.jar</tt>, which should be placed in the <tt>lib</tt> directory inside the <tt>PSI_MI_TAB_Maker</tt> directory:<br />
<br />
<pre><br />
mkdir lib<br />
cp mysql-connector-java-5.1.6/mysql-connector-java-5.1.6-bin.jar lib/</pre><br />
<br />
You may instead choose to copy the library from the <tt>BioPSI_Suplimenter/lib</tt> directory:<br />
<br />
<pre><br />
mkdir lib<br />
cp ../BioPSI_Suplimenter/lib/mysql-connector-java-5.1.6-bin.jar lib/</pre><br />
<br />
The filenames in the above example will need adjusting, depending on the exact version of the library downloaded.<br />
<br />
The <tt>SHA.jar</tt> file needs copying from its build location:<br />
<br />
<pre>cp ../SHA/dist/SHA.jar lib/</pre><br />
<br />
The external libraries can also be found in the following location:<br />
<br />
<pre>/biotek/dias/donaldson3/iRefIndex/External_libraries</pre></li><br />
<br />
<li>Compile the source code. In order to build the software on a computer which does not have the NetBeans IDE installed, copy the generic build file into the <tt>PSI_MI_TAB_Maker</tt> directory:<br />
<br />
<pre>cp Build_files/build.xml .</pre><br />
<br />
Compile and create the <tt>.jar</tt> file as follows:<br />
<br />
<pre>ant jar</pre></li><br />
</ol><br />
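<br />
Before running <tt>ant jar</tt>, it may help to confirm that the libraries copied in the steps above are in place. The following Python sketch assumes it is run from the <tt>PSI_MI_TAB_Maker</tt> directory and that the two libraries named above are the only ones required. The <tt>missing_libraries</tt> function is hypothetical:<br />
<br />
```python
import glob
import os
from typing import List

# Patterns for the libraries named in the build instructions. The Connector/J
# version is left open because it depends on the file actually downloaded.
REQUIRED = ["mysql-connector-java-*-bin.jar", "SHA.jar"]


def missing_libraries(lib_dir: str = "lib") -> List[str]:
    """Return the patterns from REQUIRED that match no file in lib_dir."""
    return [p for p in REQUIRED if not glob.glob(os.path.join(lib_dir, p))]
```
<br />
Calling <tt>missing_libraries()</tt> should return an empty list once both files have been copied into <tt>lib</tt>.<br />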
<br />
== Preparing the MITAB Tables ==<br />
<br />
Before we can run the PSI_MI_TAB_Maker program, mapping and MITAB-related tables must be created in the database.<br />
<br />
{{Note|<br />
The tables described here are prepared by the <tt>cv.no.uio.biotek.Preprocess</tt> class in a step which will be documented later.<br />
}}<br />
<br />
=== Obtaining the SQL Scripts ===<br />
<br />
Get the scripts from this location:<br />
<br />
<ul><br />
<li>https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/SQL_commands/</li><br />
</ul><br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
<pre>cvs co bioscape/bioscape/modules/interaction/Sabry/SQL_commands</pre><br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
<pre>export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot</pre><br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)<br />
<br />
=== Preparing the SQL Scripts ===<br />
<br />
The mapping tables script must first be parameterised, substituting the name of the actual iRefIndex database for <tt>&lt;actual_irefindex_db&gt;</tt>:<br />
<br />
sed -e 's/<old_db>/<actual_irefindex_db>/g' make_mapping_tables_for_output.sql > make_mapping_tables_for_output_specific.sql<br />
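<br />
The same literal substitution can also be made without sed. In this minimal Python sketch, the <tt>specialise</tt> function is hypothetical. Because <tt>str.replace</tt> treats its arguments literally, the replacement value needs no escaping:<br />
<br />
```python
from pathlib import Path


def specialise(template_path: str, output_path: str, placeholder: str, value: str) -> int:
    """Copy template_path to output_path, replacing every placeholder with value.

    Unlike sed, str.replace treats both strings literally, so no characters
    in value need escaping. Returns the number of substitutions made.
    """
    text = Path(template_path).read_text()
    count = text.count(placeholder)
    Path(output_path).write_text(text.replace(placeholder, value))
    return count
```
<br />
For example, <tt>specialise("make_mapping_tables_for_output.sql", "make_mapping_tables_for_output_specific.sql", "&lt;old_db&gt;", "irefindex_9")</tt> would substitute a database called <tt>irefindex_9</tt> (a made-up name). A count of zero suggests that the placeholder in the script has changed.<br />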
<br />
=== Running the SQL Scripts ===<br />
<br />
In the <tt>SQL_commands</tt> directory, one script (<tt>preprocess_for_output.sql</tt>) provides the basis for all data output, whereas two other scripts (<tt>preprocess_for_mitab.sql</tt> and <tt>TAB_MAKE.sql</tt>) together provide a large number of SQL statements for creating the MITAB-related tables. The parameterised mapping tables script and the first two of these scripts should be run as follows, noting any errors:<br />
<br />
mysql -h <hostname> -u <username> -p -A -D <database> < make_mapping_tables_for_output_specific.sql<br />
mysql -h <hostname> -u <username> -p -A -D <database> < preprocess_for_output.sql<br />
mysql -h <hostname> -u <username> -p -A -D <database> < preprocess_for_mitab.sql<br />
<br />
If no errors were reported, the final script can then be run:<br />
<br />
mysql -h <hostname> -u <username> -p -A -D <database> < TAB_MAKE.sql<br />
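<br />
The sequence above, which stops before <tt>TAB_MAKE.sql</tt> if an earlier script fails, can be automated. The sketch below relies on the <tt>mysql</tt> client exiting with a non-zero status when a statement in a batch fails, which it does unless <tt>--force</tt> is given. The connection values in <tt>MYSQL</tt> are placeholders to be replaced, and <tt>run_scripts</tt> is a hypothetical helper. With <tt>-p</tt>, the client will prompt for the password once per script:<br />
<br />
```python
import subprocess
import sys
from typing import Sequence

# Placeholder connection details: substitute real values before use.
MYSQL = ["mysql", "-h", "<hostname>", "-u", "<username>", "-p", "-A", "-D", "<database>"]

PREPARATION = [
    "make_mapping_tables_for_output_specific.sql",
    "preprocess_for_output.sql",
    "preprocess_for_mitab.sql",
]


def run_scripts(command: Sequence[str], scripts: Sequence[str]) -> bool:
    """Feed each script to command on standard input, stopping at the first failure."""
    for script in scripts:
        with open(script, "rb") as stdin:
            status = subprocess.run(list(command), stdin=stdin).returncode
        if status != 0:
            print("{} failed with exit status {}".format(script, status), file=sys.stderr)
            return False
    return True
```
<br />
For example, <tt>run_scripts(MYSQL, PREPARATION) and run_scripts(MYSQL, ["TAB_MAKE.sql"])</tt> runs the final script only if all of the preparation scripts succeeded.<br />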
<br />
== Running PSI_MI_TAB_Maker ==<br />
<br />
Run the program as follows:<br />
<br />
java -Xms256m -Xmx8192m -jar build/jar/PSI_MI_TAB_Maker.jar <config filename><br />
<br />
A sample configuration file is located in the <tt>config</tt> directory. It can be copied, modified and supplied to the program.<br />
<br />
== Running BioPSI_Suplimenter to Produce Statistics ==<br />
<br />
This program was already built and run in the [[iRefIndex Build Process]]. For a completed build process it '''should not''' need to be run again, but an option does exist to explicitly produce statistics for the system, should this be required. The basic details of running the program are described in [[iRefIndex Build Process#Running_BioPSI_Suplimenter|Running BioPSI_Suplimenter]].<br />
<br />
=== Create reports ===<br />
<br />
Upon selecting this option in the running program, complete the fields as previously described. The reports will be written to the designated log file directory.<br />
<br />
== Creating a Statistics Wiki Page ==<br />
<br />
The <tt>reports2wiki.py</tt> file needs to be obtained. Get the program's source code from this location:<br />
<br />
<ul><br />
<li>https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/reports2wiki.py</li><br />
</ul><br />
<br />
Using CVS with the appropriate <tt>CVSROOT</tt> setting, run the following command:<br />
<br />
<pre>cvs co bioscape/bioscape/modules/interaction/Sabry/reports2wiki.py</pre><br />
<br />
The <tt>CVSROOT</tt> environment variable should be set to the following for this to work:<br />
<br />
<pre>export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot</pre><br />
<br />
(The <tt><username></tt> should be replaced with your actual username.)<br />
<br />
=== Running the Program ===<br />
<br />
The program can be run on the report files in the log file directory as follows:<br />
<br />
<pre><br />
python reports2wiki.py /home/irefindex/output/Suplimenter03052009 MediaWiki<br />
</pre><br />
<br />
The first argument is the "prefix" of the report files: the part of the path common to all such files, such that...<br />
<br />
<pre><br />
ls /home/irefindex/output/Suplimenter03052009*<br />
</pre><br />
<br />
...should list the report files (and log files) produced by iRefIndex.<br />
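<br />
The same check can be made in Python before running <tt>reports2wiki.py</tt>, for example to confirm that a prefix matches the expected files. This is a sketch with a hypothetical <tt>report_files</tt> function, and it does not reproduce how the program itself selects files:<br />
<br />
```python
import glob
from typing import List


def report_files(prefix: str) -> List[str]:
    """List the paths starting with prefix, like `ls prefix*` in the shell."""
    # glob.escape stops wildcard characters in the prefix itself from being expanded.
    return sorted(glob.glob(glob.escape(prefix) + "*"))
```
<br />
For example, <tt>report_files("/home/irefindex/output/Suplimenter03052009")</tt> should list the same files as the <tt>ls</tt> command above.<br />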
<br />
The output in the above example will be written to standard output (the terminal/console). To write to a file, add a filename as an additional argument to the program. For example:<br />
<br />
<pre><br />
python reports2wiki.py /home/irefindex/output/Suplimenter03052009 MediaWiki Statistics_iRefIndex_3.0<br />
</pre><br />
<br />
This file could then be uploaded to the Wiki using a suitable tool. For a MoinMoin Wiki (also supported by the program), the file could instead be copied into place, with some care.<br />
<br />
=== Creating Additional Statistics ===<br />
<br />
The <tt>make_taxonomy_summary.sql</tt> script in the <tt>SQL_commands</tt> directory can be used to generate a table of interactions by species:<br />
<br />
mysql -h <hostname> -u <username> -p -A -D <database> < make_taxonomy_summary.sql > taxonomy_summary.txt<br />
<br />
This file can be incorporated, at least in part, into the statistics page, and otherwise published in full.<br />
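<br />
When its output is redirected to a file, <tt>mysql</tt> uses its batch format: one line per row, with columns separated by tabs and the column names on the first line. The columns produced by <tt>make_taxonomy_summary.sql</tt> are not described here, so the following Python sketch assumes only that format. It uses a hypothetical <tt>to_wikitable</tt> function to convert the summary into a MediaWiki table like those used elsewhere on this Wiki:<br />
<br />
```python
from typing import Iterable, List


def to_wikitable(lines: Iterable[str]) -> str:
    """Convert tab-separated lines (header line first) into MediaWiki table markup."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    out: List[str] = ['{| class="wikitable"', "! " + " !! ".join(header)]
    for row in body:
        out.append("|-")
        out.append("| " + " || ".join(row))
    out.append("|}")
    return "\n".join(out)
```
<br />
For example, <tt>print(to_wikitable(open("taxonomy_summary.txt")))</tt> prints markup that can be pasted into the statistics page. Cells containing a <tt>|</tt> character would need escaping, which this sketch does not attempt.<br />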
<br />
== Creating Other Mapping Files ==<br />
<br />
In addition to the MITAB data, a mapping file should be generated using a script in the <tt>SQL_commands</tt> directory. First prepare the script, substituting a real filesystem path for <tt>&lt;actual_mapping_file&gt;</tt>:<br />
<br />
sed -e 's/<mapping_file>/<actual_mapping_file>/g' mapper_tables.sql > mapper_tables_specific.sql<br />
<br />
Characters with a special meaning to sed, such as the <tt>/</tt> delimiter of the substitution command, must be "escaped" with a backslash. For example:<br />
<br />
sed -e 's/<mapping_file>/\/home\/irefindex\/output\/mappings.txt/g' mapper_tables.sql > mapper_tables_specific.sql<br />
<br />
Then execute the script as follows:<br />
<br />
mysql -h <hostname> -u <username> -p -A -D <database> < mapper_tables_specific.sql<br />
<br />
This will write a file to the specified location.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex-related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Sources_and_Issues_Next_Release&diff=4072
Sources and Issues Next Release
2012-02-13T13:08:35Z
<p>PaulBoddie: /* Issues */ Tidying.</p>
<hr />
<div>{{Note|<br />
This is a planning template for the next release. It does not correspond to a released product.<br />
See http://irefindex.uio.no/ for the most recent release and related documentation.<br />
This page can be used to create the sources page. <br />
Check for each occurrence of xxx when copying and pasting into the appropriate sources page for the new release, and fill in the correct values there. <br />
Do not edit xxx in this page. Leave this page as a template.<br />
After making a new release page, update the general [[Sources_iRefIndex|Sources for iRefIndex]] redirect page.<br />
}}<br />
<br />
Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Applies to iRefIndex release: xxx<br />
<br />
Release date: xxx<br />
<br />
Authors: Ian Donaldson, Sabry Razick and Paul Boddie<br />
<br />
Database: iRefIndex (http://irefindex.uio.no)<br />
<br />
Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)<br />
<br />
Description: This file lists interaction and protein sequence related resources used for the current build of the iRefIndex.<br />
Statistics for the iRefIndex are available and include a breakdown of interactors and interactions from each data source.<br />
*For statistics on the full public dataset, please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx<br />
*For statistics on the public dataset (as distributed on the FTP site), please refer to: http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_xxx <br />
<br />
== Issues ==<br />
<br />
=== DIP interaction types ===<br />
<br />
Not all interaction types are properly extracted from DIP XML records because some are stored in an unexpected field.<br />
<br />
See [[Bugzilla:251]].<br />
<br />
=== Review interaction detection type mappings ===<br />
<br />
Some InnateDB detection types have reportedly been mapped to a parent term.<br />
<br />
See [[Bugzilla:252]].<br />
<br />
=== Check that BIND small molecules are not mis-mapped to proteins ===<br />
<br />
See [[Bugzilla:253]].<br />
<br />
=== Estimate accurate time to build iRefIndex ===<br />
<br />
See [[Bugzilla:254]].<br />
<br />
=== BioGRID interaction record ids (pre-build issue) ===<br />
<br />
Capture BioGRID interaction record ids so iRefWeb can link out to BioGRID.<br />
<br />
The only interaction ids available from the BioGRID files are already being used and are also shown in iRefWeb, such as...<br />
<br />
<primaryRef db="grid" id="103" refType="identity" refTypeAc="MI:0356" dbAc="MI:0463" /><br />
<br />
See [[Bugzilla:250]].<br />
<br />
=== MITAB/iRefScape canonicalization ===<br />
<br />
Change this to choose the canonical sequence rather than the longest sequence (mapping score L).<br />
For example, for GeneIDs 84148 and 512564, the current choice unnecessarily separates Grid interaction data from the interaction data of other databases.<br />
<br />
See [[Bugzilla:255]].<br />
<br />
== Build issues ==<br />
<br />
== Interaction related resources ==<br />
<br />
{| {{table}} cellpadding="10" cellspacing="0" border="1"<br />
| align="center" style="background:#f0f0f0;"|'''Source'''<br />
| align="center" style="background:#f0f0f0;"|'''Format'''<br />
| align="center" style="background:#f0f0f0;"|'''Location'''<br />
| align="center" style="background:#f0f0f0;"|'''Version (date)'''<br />
|-<br />
| BIND ||Tab-delimited text file.||ftp://ftp.bind.ca/pub/BIND/data/bindflatfiles/bindindex/ (no longer available - see below). <br />
<br />
20050525.complex2refs.txt <br />
<br />
20050525.ints.txt <br />
<br />
20050525.refs.txt <br />
<br />
20050525.complexes.txt <br />
<br />
20050525.labels.txt <br />
<br />
20050525.complex2subunits.txt <br />
<br />
These files are no longer available via FTP but are available from the authors. BIND archival content is now managed by Thomson Scientific. See http://bond.unleashedinformatics.com/ and http://bond.unleashedinformatics.com/downloads/data/BIND/ <br />
<br />
For historical purposes, a snapshot of the Blueprint web-site may be viewed at...<br />
<br />
http://web.archive.org/web/20050204013426/www.blueprint.org/index.html<br />
<br />
...via the internet archive at...<br />
<br />
http://web.archive.org/web/*/http://www.blueprint.org<br />
<br />
| 2005-05-25<br />
|-<br />
| BIND Translation ||PSI-MI 2.5||http://download.baderlab.org/BINDTranslation/release1_0/BINDTranslation_v1_xml_AllSpecies.tar.gz ||Version 1.0 (2010-12-15)<br />
|-<br />
| BioGRID||PSI-MI 2.5||http://thebiogrid.org/downloads/archives/Release%20Archive/BIOGRID-3.1.81/BIOGRID-ALL-3.1.81.psi25.zip ||Version 3.1.81 (2011-10-01)<br />
|-<br />
| CORUM||PSI-MI 2.5||http://mips.gsf.de/genre/proj/corum/index.html<br>http://mips.gsf.de/genre/export/sites/default/corum/allComplexes.psimi.zip || 2009-12-02<br />
|-<br />
| DIP||PSI-MI 2.5||http://dip.doe-mbi.ucla.edu/dip/Download.cgi<br />
<br>dip20101010.mif25<br />
<br>Note: date on last IMEx release file is from 2008<br />
| 2010-10-10<br />
|-<br />
| HPRD ||PSI-MI 2.5||http://www.hprd.org/download<br>HPRD_PSIMI_041310.tar.gz||Release 9 (2010-04-13)<br />
|-<br />
| IntAct ||PSI-MI 2.5||ftp://ftp.ebi.ac.uk/pub/databases/intact/2011-09-29/psi25/pmidMIF25.zip|| 2011-09-29<br />
|-<br />
| MINT||PSI-MI 2.5|| ftp://mint.bio.uniroma2.it/pub/release/psi/current/psi25/pmid/|| 2010-12-21<br />
|-<br />
| MPACT||PSI-MI 2.5||ftp://ftpmips.gsf.de/yeast/PPI/mpact-complete.psi25.xml.gz || 2008-01-10<br />
|-<br />
| MPPI||PSI-MI 1.0||http://mips.gsf.de/proj/ppi/data/mppi.gz|| 2004-06-01 (from archive)<br />
|-<br />
| OPHID||PSI-MI 1.0||http://ophid.utoronto.ca/ophid/downloads.html (This service is no longer available; please refer to http://ophid.utoronto.ca/ophidv2.201/)|| 2006-07-07<br />
|-<br />
| colspan="4" align="center" style="background:#f0f0f0;" | New for this release<br />
|-<br />
| InnateDB ||PSI-MI 2.5|| http://www.innatedb.com/download.jsp<br>Curated InnateDB Data ||2011-03-06<br />
|-<br />
| MPIDB||MITAB format file|| http://www.jcvi.org/mpidb (information)<br><br />
http://www.jcvi.org/mpidb/download.php (general downloads)<br><br />
http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-LIT (specific download for MPI-LIT)<br><br />
http://www.jcvi.org/mpidb/interaction.php?dbsource=MPI-IMEX (specific download for MPI-IMEX)<br />
|| Downloaded on 2011-10-03<br />
|-<br />
| MatrixDB||PSI-MI 2.5|| http://matrixdb.ibcp.fr/<br>MatrixDB_20100826.xml.zip || 2010-08-26 (timestamp)<br />
|}<br />
<br />
== Sequence related resources (not updated yet) ==<br />
<br />
{| {{table}} cellpadding="10" cellspacing="0" border="1"<br />
| align="center" style="background:#f0f0f0;"|'''Source'''<br />
| align="center" style="background:#f0f0f0;"|'''Format'''<br />
| align="center" style="background:#f0f0f0;"|'''Location'''<br />
| align="center" style="background:#f0f0f0;"|'''Version (date)'''<br />
|-<br />
| SEGUID||Tab-delimited text ||ftp://bioinformatics.anl.gov/seguid/<br>seguidannotation||2007-07-24 (timestamp)<br />
|-<br />
| UniProt||Text||http://www.uniprot.org/downloads<br>UniProtKB/Swiss-Prot (uniprot_sprot.dat.gz)<br />
| rowspan="5" | UniProt Knowledgebase Release 2011_09 (2011-09-21) (Downloaded on 2011-10-04):<br>UniProtKB/Swiss-Prot <br>UniProtKB/TrEMBL <br>(from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/reldate.txt)<br />
|-<br />
| UniProt||Text|| http://www.uniprot.org/downloads<br>UniProtKB/TrEMBL (uniprot_trembl.dat.gz)<br />
|-<br />
| UniProt, IsoForms||FASTA||http://www.uniprot.org/downloads uniprot_sprot_varsplic.fasta.gz<br />
|-<br />
| UniProt, SGD||Tab-delimited text file.||http://www.expasy.org/cgi-bin/lists?yeast.txt<br>Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD<br />
|-<br />
| UniProt, FLY||Tab-delimited text file.||http://www.expasy.org/cgi-bin/lists?fly.txt<br> Drosophila: entries, gene names and cross-references to FlyBase.<br />
|-<br />
| NCBI, RefSeq||GenPept||ftp://ftp.ncbi.nih.gov/refseq/release/complete<br>see *.protein.gpff.gz files||Release 49 (2011-09-09) (Downloaded on 2011-10-04)<br>(from http://www.ncbi.nlm.nih.gov/refseq/)<br />
|-<br />
| NCBI, MMDB/PDB||Tab-delimited text ||ftp://ftp.ncbi.nih.gov/mmdb/pdbeast/table|| (Downloaded on 2011-10-04)<br />
|-<br />
| NCBI, PDB sequences||FASTA||ftp://ftp.ncbi.nih.gov/blast/db/FASTA/pdbaa.gz||(Downloaded on 2011-10-03)<br />
|-<br />
| NCBI Gene2Refseq||Tab-delimited text||ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/<br>gene2refseq.gz||(Downloaded on 2011-10-04)<br />
|}<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex-related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Donaldson_Group&diff=4052
Donaldson Group
2011-12-16T11:50:17Z
<p>PaulBoddie: </p>
<hr />
<div>__NOTOC__<br />
<br />
= The Donaldson Group at the Biotechnology Centre of Oslo =<br />
<br />
<div class="floatright"><br />
<imagemap><br />
Image:BiO-logo-liten-pms-border.png<br />
default [http://www.biotek.uio.no]<br />
</imagemap><br />
<br />
<facebook-like /><br />
</div><br />
<br />
== Research Interests ==<br />
<br />
Our primary interests include protein interaction data consolidation, text mining and data mining, especially with respect to diseases. <br />
<br />
Our recent work on a consolidated protein interaction database can be found at http://irefindex.uio.no/ .<br />
<br />
== Projects ==<br />
<br />
{|class="wikitable" style="text-align:left; clear:left" border="0" cellpadding="10"<br />
<br />
|-<br />
|<imagemap><br />
Image:iRefIndex_logo.png|100x100px<br />
default [[iRefIndex]]<br />
</imagemap><br />
|<br />
=== [[iRefIndex | iRefIndex, iRefWeb, iRefScape, iRefR]] ===<br />
<br />
[[iRefIndex|http://irefindex.uio.no/]]<br/> iRefIndex (interaction Reference Index) provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.<br />
<br />
iRefIndex is available via a number of interfaces: as MITAB tab-delimited text (iRefIndex), a web site (iRefWeb), a Cytoscape plugin (iRefScape) and an R package (iRefR). <br />
<br />
|-<br />
|<imagemap><br />
Image:Magrathea_logo.png|100x100px<br />
default [[Magrathea]]<br />
</imagemap><br />
|<br />
=== [[Magrathea]] ===<br />
<br />
[[Magrathea|http://magrathea.uio.no/]]<br/> Magrathea is prototype software demonstrating how animations of molecular pathways can be driven automatically using the local context of the participant molecules. <br />
<br />
|-<br />
|<imagemap><br />
Image:ancientlibraryalex.jpg|100x100px<br />
default [[The Biolibrarian Proposal]]<br />
</imagemap><br />
|<br />
=== [[The Biolibrarian Proposal]] ===<br />
<br />
The Biolibrarian proposal calls for the creation of new positions at university libraries around the world. <br />
These people would act as local biocurators who help researchers at their university submit data to relevant biological databases.<br />
<br />
|-<br />
|<imagemap><br />
Image:Vitruvian_man.jpg|100x150px<br />
default [[DiG:_Disease_groups]]<br />
</imagemap><br />
|<br />
=== [[DiG: Disease groups|DiG: Disease Groups]] ===<br />
<br />
[[DiG:_Disease_groups|http://donaldson.uio.no/wiki/DiG:_Disease_groups]]<br/> The Disease Groups project groups together phenotypically related disease-gene associations found in OMIM's Morbid Map. The resulting map of disease genes may be used to explore relationships between disease genes in the human protein-interactome.<br />
<br />
|-<br />
|<imagemap><br />
Image:Bioscape_logo.gif|140x140px<br />
default [[Bioscape]]<br />
</imagemap><br />
|<br />
=== [[Bioscape]] ===<br />
<br />
http://bioscape.uio.no/<br/> Bioscape is our in-house text-mining system used to locate gene and protein mentions in PubMed abstracts.<br />
|}<br />
<br />
== Group Members ==<br />
<br />
* [http://www.biotek.uio.no/english/about/organization/donaldson-group/people/iand/ Ian Donaldson]<br />
* [http://www.biotek.uio.no/english/about/organization/donaldson-group/people/paulbodd/ Paul Boddie]<br />
* [[Antonio Mora]]<br />
<br />
== Past Group Members ==<br />
* Katerina Michalickova<br />
* Hanna Nemchenko<br />
* Sabry Razick: Now in Trondheim at [http://www.ntnu.edu/employees/sabry.razick NTNU].<br />
<br />
==Local Seminar Series==<br />
<br />
The Biotechnology Centre of Oslo holds a weekly [[Bioseminar|Tuesday seminar]] at Forskningsparken, Gaustadalléen 21, Oslo.<br />
<br />
The [http://www.ifi.uio.no/research/clsi/seminars.html Computational Life Science seminars] are held every Wednesday at Ole-Johan Dahls hus, located at Gaustadalléen 23D, Oslo (opposite the Forskningsparken main entrance).<br />
<br />
==Courses==<br />
<br />
{|class="wikitable" style="text-align:left" border="0" cellpadding="10"<br />
|-<br />
|<imagemap><br />
Image:Bioinfo_course_logo.jpg|100x100px<br />
default [[Bioinformatics course]]<br />
</imagemap> ||<br />
=== [[Bioinformatics_course|Bioinformatics for molecular biology]] ===<br />
<br />
A new, two-week, intensive bioinformatics course that covers various aspects of bioinformatics analyses for molecular biology. Statistics, multiple hypothesis testing, microarray analysis, sequence alignments, working with protein structures, protein interaction networks and more. See the [[Bioinformatics course|course page]] for schedule information along with all material used in the course. The course is composed of lectures and practical tutorials. <br />
|}<br />
<br />
Introductory Perl is taught by Antonio Mora and Ian Donaldson as part of the [http://www.uio.no/studier/emner/matnat/molbio/MBV3070/ MBV3070] course. The slides for these lectures are available at [[MBV3070|Perl lectures for MBV3070]].<br />
<!--Antonio Mora and Ian Donaldson also hold the "Applied readings in mathematics, computer science and biology" course every second Autumn term. See [http://www.uio.no/studier/emner/matnat/molbio/MBV-INF4410/ MBV-INF4410].<br />
--><br />
<br />
Ian Donaldson is organizing this year's Molecular Biotechnology Course at the Biotechnology Centre of Oslo. You can find the MBV9100 course web page [https://www.biotek.uio.no/events/courses_workshops/2011/MBV9100BTS.html here] and the latest schedule [[MBV9100|here]].<br />
<br />
==Contact==<br />
<br />
ian.donaldson at biotek.uio.no</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=README_MITAB2.6_for_iRefIndex_9.0&diff=4051
README MITAB2.6 for iRefIndex 9.0
2011-12-16T11:44:31Z
<p>PaulBoddie: /* Mapping to Legacy RIGIDs */ Added a note about the nature of the legacy mapping.</p>
<hr />
<div><div class="floatright" style="text-align: center"><br />
<br />
'''iRefIndex 9.0 Downloads'''<br />
<imagemap><br />
Image:Document-save-80x80.png<br />
default [ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/]<br />
</imagemap><br />
<br />
'''Parsing MITAB Format Data'''<br />
<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefIndex_MITAB2.6_Parser]]<br />
</imagemap><br />
</div><br />
<br />
Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Applies to iRefIndex release: 9.0<br />
<br />
Release date: 2011-11-07<br />
<br />
Download location: ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/<br>(use <tt>anonymous</tt> as the login and your email address as the password)<br />
<br />
Authors: Ian Donaldson, Sabry Razick, Paul Boddie<br />
<br />
Database: iRefIndex (http://irefindex.uio.no)<br />
<br />
Organization: Biotechnology Centre of Oslo, University of Oslo <br />
(http://www.biotek.uio.no/) <br />
<br />
[[#Description|License of the source database]].<br />
<br />
== <span style="color:#0f0086"> Description </span> ==<br />
<br />
This file describes the contents of the <br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br />
<br />
directory and the format of the tab-delimited text files contained within. Each index file follows the PSI-MITAB2.6 format with additional columns for annotating edges and nodes. Assignment of source interaction records to these redundant groups is described at http://irefindex.uio.no. The PSI-MITAB2.6 format and the additional columns are described below.<br />
<br />
A supplementary file lists just database:accession pairs for proteins and their mapping to irog, icrog and Entrez Gene identifiers. See<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/Mappingfiles/<br />
<br />
and README at<br />
<br />
http://irefindex.uio.no/wiki/Protein_identifier_mapping<br />
<br />
This file is precalculated from the MITAB distribution as a convenience to users.<br />
<br />
Details on the build process are available from the publication PMID 18823568.<br />
<br />
This distribution includes data consolidated using the iRefIndex method for BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPPI, MPact and OPHID.<br />
<br />
<br />
{|<br />
|Sources || http://irefindex.uio.no/wiki/Sources_iRefIndex_8.0<br />
|-<br />
|Statistics || http://irefindex.uio.no/wiki/Statistics_iRefIndex_8.0<br />
|-<br />
|Download location || ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br><br />
|}<br />
<br />
== Directory contents ==<br />
<br />
{|<br />
|<tt>README</tt> ||pointer to this file<br />
|-<br />
|<tt>xxxx.mitab.mmddyyyy.txt.zip</tt> ||individual indices in PSI-MITAB2.6 format<br><br />
|}<br />
<br />
iRefIndex data is distributed as a set of tab-delimited text files with names of the form <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>mmddyyyy</tt> represents the file's creation date.<br />
<br />
The complete index is available as <tt>All.mitab.mmddyyyy.txt.zip</tt> .<br />
<br />
Taxon specific data sets are also available for:<br />
<br />
{|<br />
| ||'''Taxon Id'''<br />
|-<br />
|Homo sapiens ||9606 (human)<br />
|-<br />
|Mus musculus ||10090 (mouse)<br />
|-<br />
|Rattus norvegicus ||10116 (brown rat)<br />
|-<br />
|Caenorhabditis elegans ||6239 (nematode)<br />
|-<br />
|Drosophila melanogaster ||7227 (fruit fly)<br />
|-<br />
|Saccharomyces cerevisiae ||4932 (baker's yeast)<br />
|-<br />
|Saccharomyces cerevisiae S288c ||559292<br />
|-<br />
|Escherichia coli ||562 (E. coli)<br />
|-<br />
|Other ||other<br />
|-<br />
|All ||all<br />
|}<br />
<br />
Taxon specific subsets of the data are named <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>xxxx</tt> is the taxonomy identifier of at least one of the interactors according to either the source interaction database or the sequence database record. Each zip compressed file contains a single text file with the corresponding name <tt>xxxx.mitab.mmddyyyy.txt</tt>.<br />
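<br />
The naming scheme above can be used to pick out the subset and creation date of each downloaded file, and to read the single text file it contains. The following Python sketch relies only on the <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> pattern described here. The function names are hypothetical, UTF-8 encoding is assumed, and each line is simply split on tabs without interpreting the MITAB columns:<br />
<br />
```python
import io
import os
import re
import zipfile
from datetime import date, datetime
from typing import Iterator, List, Tuple

# xxxx.mitab.mmddyyyy.txt.zip, where xxxx is a taxonomy identifier or a name such as "All".
NAME = re.compile(r"^(?P<subset>[^.]+)\.mitab\.(?P<date>\d{8})\.txt\.zip$")


def parse_name(filename: str) -> Tuple[str, date]:
    """Return the subset (taxonomy identifier or name) and creation date of a file."""
    match = NAME.match(os.path.basename(filename))
    if match is None:
        raise ValueError("not an iRefIndex MITAB file name: " + filename)
    created = datetime.strptime(match.group("date"), "%m%d%Y").date()
    return match.group("subset"), created


def iter_rows(path: str) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each line of the zipped text file."""
    inner = os.path.basename(path)[: -len(".zip")]  # xxxx.mitab.mmddyyyy.txt
    with zipfile.ZipFile(path) as archive:
        with archive.open(inner) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                yield line.rstrip("\r\n").split("\t")
```
<br />
For example, a (hypothetical) file named <tt>9606.mitab.11072011.txt.zip</tt> would give <tt>("9606", date(2011, 11, 7))</tt> from <tt>parse_name</tt>.<br />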
<br />
In some cases, interactors in a file may belong to other taxa, for example if a virus-host interaction is being represented or if a protein from another organism has been used to model a protein in the specified organism. <br />
<br />
Taxonomy identifiers are provided in the data sets allowing these exceptions to be identified. The taxonomy identifiers listed are derived from the source protein sequence record. In some cases, this taxonomy identifier will be a child of the taxon listed in the file's title; for example, Escherichia coli K12 (taxonomy identifier 83333) will appear in the Escherichia coli (taxonomy identifier 562) file.<br />
<br />
A description of the NCBI taxon identifiers is available at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy <br />
<br />
Dividing the data by taxon in this way leads to duplication. For instance, an interaction present in the mouse index could also appear in the human index if the interaction record lists protein sequence records from both human and mouse. The <tt>All.mitab.mmddyyyy</tt> file is a complete and non-redundant listing. <br />
<br />
The data format and divisions provided in this initial release were chosen in the hopes that they would be immediately useful to the largest possible set of users. Other formats and divisions are possible and we welcome your input on future releases.<br />
<br />
== Changes from last version ==<br />
<br />
This is the third release of iRefIndex in PSI-MITAB2.6 format.<br />
<br />
* RIGIDs in previous releases of iRefIndex were [[Bugzilla:242|incorrectly computed]]. Although the properties of such RIGIDs were not compromised - distinct RIGIDs should still have referred to distinct interactions - each RIGID made use of substantially less information from its components. RIGIDs in this release should now be computed correctly.<br />
* [[Bugzilla:245|Duplicate lines]] are no longer produced in the MITAB output. Previously, database records containing additional information not reproduced in the MITAB output were written to the files on a record-by-record basis. Since these individual records provide no useful additional information purely through their presence, and merely yield a collection of redundant lines, lines identical to others are now filtered out when writing the MITAB files.<br />
* Many proteins previously assigned the 4932 taxonomy identifier have been [[Bugzilla:247|recategorised]] as having taxonomy identifier 559292. Thus, for convenience, an additional 559292 file is produced alongside the existing (but substantially smaller) 4932 file to hold interactions involving proteins associated with both taxons.<br />
* [[Bugzilla:248|Interactions not involving proteins associated with a specific organism]] are now excluded from organism-specific files. Note that a complex may span several lines, some of whose interactors carry a taxonomy identifier different from that of the file being consulted. In such cases, at least one member of the complex will always be labelled with the file's taxonomy identifier, so the complex describes a "mixed species" interaction. Such complexes are retained, just as binary interactions are retained when one participant is native to the file and the other is "foreign".<br />
* Previously, PubMed identifiers were being given as interaction detection methods for CORUM-originating interactions. This has now been [[Bugzilla:249|resolved]].<br />
<br />
References:<br />
<br />
* http://code.google.com/p/psimi/issues/detail?id=2<br />
* http://code.google.com/p/psimi/wiki/PsimiTabFormat<br />
<br />
=== Mapping to Legacy RIGIDs ===<br />
<br />
A mapping from current to legacy RIGIDs is provided on the FTP site as <tt>legacy.txt</tt> at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/<br />
<br />
This file contains all non-canonical iRefIndex 9.0 RIGIDs mapped to legacy RIGIDs which have been computed for the interactions. Note that many of the legacy RIGIDs may not exist in previous releases of iRefIndex because other changes in the underlying data (such as taxonomy identifier changes) have occurred. Thus, even if the RIGID computation method had remained the same in iRefIndex 9.0 as in previous releases, many RIGIDs would have changed in iRefIndex 9.0 anyway.<br />
<br />
Other files are also provided to give a specific mapping from iRefIndex 9.0 to iRefIndex 8.0 and can be found at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/Mappingfiles/<br />
<br />
== Known Issues ==<br />
<br />
* We have replaced the pipe character (<tt>|</tt>) in PDB identifiers with an underscore character (<tt>_</tt>).<br />
<br />
This decision was taken to avoid unexpected parsing problems: the PSI-MITAB format uses pipes (<tt>|</tt>) as a separator character where multiple values occur in the same column.<br />
<br />
As a result, column number 37 (OriginalReferenceA) and column number 38 (OriginalReferenceB) may differ from the original reference in such cases.<br />
<br />
== Understanding the iRefIndex MITAB format ==<br />
<br />
iRefIndex is distributed in PSI-MITAB format. Version 2.5 of the format was originally described in PMID 17925023 ([http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2189715/?tool=pubmed full text]). This document describes the columns defined by version 2.6 of the PSI-MITAB format plus the columns added by iRefIndex.<br />
<br />
Since the PSI-MITAB format allows only two interactors to be described on each line, it is best suited to binary interaction data (where the original experiment, such as a yeast two-hybrid assay, gives a binary readout). However, other PSI-MI XML source records describe interactions involving only one interactor type (dimers or multimers), or contain associative (also known as "n-ary") interaction data from, for example, immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown. These cases are problematic for the PSI-MITAB format. This document describes exactly how we use the MITAB format to describe these alternative (non-binary) interaction types.<br />
<br />
=== What each line represents ===<br />
<br />
Each line or row in the MITAB file represents a ''single'' interaction record from one primary data source describing an interaction involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers).<br />
<br />
{{Note|Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
|Important}}<br />
<br />
<br />
Each row in this table has a natural key pointing to an original interaction record in some source database that is listed under column 14 (interactionIdentifier). For example:<br />
<br />
<pre>intact:EBI-761694</pre><br />
<br />
{{Note|<br />
Prior to release 7.0, each line represented a ''group'' of interaction records involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers). This ''collapsed'' or non-redundant format did not allow us to easily describe meta-data associated with each source record. Therefore, we have moved to this ''expanded'' or redundant version. Users can still collapse multiple rows that all provide evidence for an interaction between the same set of proteins using the keys provided (for example, RIGIDs).<br />
}}<br />
<br />
Rows in this table that all provide evidence for an interaction between the same set of proteins can be identified using the RIGID key (redundant interaction group identifier). The RIGID is a 27-character key that is derived from the ROGIDs of the interactors involved in the interaction record. The ROGID consists of a SHA-1 digest of the protein interactor's primary amino acid sequence, followed by the NCBI taxonomy identifier (see the paper for details).<br />
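<br />
The following Python sketch shows one way such keys could be computed; it is not the iRefIndex implementation. It assumes that the digest is the standard Base64 encoding of the SHA-1 hash with the trailing <tt>=</tt> padding removed (which yields 27 characters), that sequences are upper-cased before hashing, and that a RIGID is the digest of the distinct interactor ROGIDs, sorted and concatenated. The function names are hypothetical; consult the paper for the exact procedure.<br />
<br />
```python
import base64
import hashlib


def digest27(text):
    """Base64-encoded SHA-1 digest of text with the '=' padding stripped (27 chars)."""
    sha1 = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(sha1).decode("ascii").rstrip("=")


def rogid(sequence, taxid):
    """Assumed ROGID: digest of the upper-cased sequence followed by the taxid."""
    return digest27(sequence.upper()) + str(taxid)


def rigid(rogids):
    """Assumed RIGID: digest of the distinct ROGIDs, sorted and concatenated."""
    return digest27("".join(sorted(set(rogids))))
```
<br />
Because duplicate ROGIDs are discarded before digesting, this sketch gives a homodimer and a homotrimer of the same protein the same RIGID, consistent with the behaviour described for multimers further below.<br />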
<br />
{{Note|<br />
The RIGID key is now listed (by itself) in column 35 (Checksum_Interaction) as part of the new extended PSI-MITAB format. This is a universal key that can be generated by each and every interaction database and may be included in MITAB2.6 distributions from other source databases. The intention of this key is to aid third party integration of data collected from multiple databases (for example, from PSICQUIC web services). <br />
}}<br />
<br />
=== Representation of interactions ===<br />
<br />
==== Binary interaction data ====<br />
<br />
This is the most common data type.<br />
<br />
For binary interaction data, column 53 (edgetype) will contain an X. Interactors A and B will list the two proteins for which interaction evidence is provided in the row. Users should pay close attention to columns 12 (interactionType) and 7 (Method) when deciding what binary data they wish to accept as evidence of a direct physical interaction.<br />
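<br />
A minimal Python sketch of such a filter follows, assuming rows have already been split into lists of column strings (so column 53 is <tt>row[52]</tt>). Which MI terms count as evidence of a direct physical interaction is left to the caller; the function name <tt>binary_rows</tt> is hypothetical.<br />
<br />
```python
EDGETYPE = 52          # column 53 (edgetype), zero-based
METHOD = 6             # column 7 (Method)
INTERACTION_TYPE = 11  # column 12 (interactionType)


def binary_rows(rows, accepted_types=None, accepted_methods=None):
    """Keep rows with edge type X, optionally restricted by MI term identifier.

    accepted_types and accepted_methods are sets of identifiers such as
    "MI:0218"; None means that column is not used to filter.
    """
    for row in rows:
        if row[EDGETYPE] != "X":
            continue
        # "MI:0218(physical interaction)" -> "MI:0218"
        type_id = row[INTERACTION_TYPE].split("(", 1)[0]
        method_id = row[METHOD].split("(", 1)[0]
        if accepted_types is not None and type_id not in accepted_types:
            continue
        if accepted_methods is not None and method_id not in accepted_methods:
            continue
        yield row
```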
<br />
==== Complexes (a.k.a. n-ary data) ====<br />
<br />
Certain experimental methods (like immunoprecipitation) provide evidence that three or more proteins are associated but cannot provide evidence for a direct interaction between any given pair of those proteins.<br />
<br />
In these cases, interactor A (column 1) is used as a placeholder to represent the ''complex'' or ''list'' of proteins while interactor B is used to list one of the members of the list: therefore, the entire ''n-ary interaction record'' is described using one row for each interactor. Each of these rows will have the same ''interactor A''. This method of representation is referred to as a '''bi-partite model''' since there are two kinds of nodes corresponding to complexes and proteins. <br />
<br />
These interactions are marked by a C in column 53 (edgetype).<br />
<br />
As an example, suppose that a source interaction record contained interactors A, B and C, found by affinity purification and mass spectrometry, where a tagged version of protein A was used as the bait protein to perform the immunoprecipitation.<br />
<br />
Then we would represent the complex in the MITAB file using three lines:<br />
<br />
<pre><br />
X-A<br />
X-B<br />
X-C<br />
</pre><br />
<br />
Here X stands for the complex. All three entries would have the same string in column 1 (the RIGID for the complex). All three entries would have a C in column 53 (edgetype).<br />
<br />
Other databases take an interaction record with multiple interactors (n-ary data) and make a list of binary interactions (based on the spoke or matrix model) and then list these binary interactions in the MITAB. For the example above, using a '''spoke model''' to transform the data into a set of binary interactions, these data would be represented using two lines in the MITAB file:<br />
<br />
<pre><br />
A-B<br />
A-C<br />
</pre><br />
<br />
Here A is chosen as the "hub" of the spoke model since it was the "bait" protein. For experimental systems that do not have "baits" and "preys" (such as X-ray crystallography), an arbitrary protein might be chosen as the hub.<br />
<br />
Alternatively, a '''matrix model''' might be used to transform the n-ary data into a list of binary interactions. Here all pairwise combinations of interactors in the original n-ary data are represented as binary interactions. So, in the above example, the immunoprecipitated complex would be represented using three lines of the MITAB file:<br />
<br />
<pre><br />
A-B<br />
B-C<br />
A-C<br />
</pre><br />
<br />
All three methods for representing n-ary data in a MITAB file (bi-partite, spoke, and matrix) are different representations of the same data. The model type that is chosen to describe n-ary data is listed in column 16 (expansion) of the MITAB2.6 format.<br />
<br />
We have chosen to use the bi-partite method of representation so that it is impossible to mistake spoke or matrix binary entries for true binary entries; the identifiers used for complexes will, of course, not appear in a protein database and any programme that tries to treat complex identifiers as though they were protein identifiers will fail. The method allows you to reconstruct the members of the original interaction record that describes a complex of proteins (say from an affinity purification experiment). From there, you can choose to make a spoke or matrix model by yourself if you want. <br />
<br />
Users are advised that other databases may use spoke and matrix model representations of complexes in the MITAB format. <br />
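<br />
Starting from the bi-partite rows, a spoke or matrix expansion can be rebuilt with a few lines of Python. The sketch below assumes rows are lists of column strings; it groups edge type C rows by their column 1 identifier and then expands each member list. Choosing the hub for a spoke model is left to the caller (for instance, the member whose experimental role is <tt>MI:0496(bait)</tt>, assuming member roles appear in column 20). All names here are hypothetical.<br />
<br />
```python
from collections import defaultdict
from itertools import combinations

UID_A, UID_B, EDGETYPE = 0, 1, 52  # columns 1, 2 and 53 (zero-based)


def complexes(rows):
    """Group edge type C rows by their complex identifier (column 1)."""
    members = defaultdict(list)
    for row in rows:
        if row[EDGETYPE] == "C":
            members[row[UID_A]].append(row[UID_B])
    return dict(members)


def matrix_model(members):
    """Every pairwise combination of the complex members."""
    return list(combinations(members, 2))


def spoke_model(members, hub):
    """Pair the chosen hub (e.g. the bait) with each remaining member."""
    others = list(members)
    others.remove(hub)  # raises ValueError if hub is not a member
    return [(hub, other) for other in others]
```
<br />
For the A, B and C example above, <tt>spoke_model(["A", "B", "C"], "A")</tt> gives the pairs A-B and A-C, while <tt>matrix_model</tt> gives all three pairs.<br />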
<br />
==== Intramolecular interactions and multimers ====<br />
<br />
These row types are rare in comparison to the types described above.<br />
<br />
Sometimes source interaction records in PSI-MI format only list one interactor. These are cases where either<br />
<br />
<ol><br />
<li>an intra-molecular interaction is being represented or</li><br />
<li>a multimer (3 or more) of some protein is being represented.</li><br />
</ol> <br />
These records are difficult to represent in the PSI-MITAB format because PSI-MITAB requires that each row (interaction) list two interactors. <br />
We are representing these interaction records using the following format to reflect the original format provided as closely as possible.<br />
<ol><br />
<li>Interactions involving only one interactor. Here uidA and uidB are the same and the edge type is 'Y' (column number 53 (edgetype)). Therefore, whenever the edge type is 'Y', the interaction involves only one protein (although it is given as being between two interactors), and column number 54 (numParticipants) will always be 1. For example:<br />
<pre>{A - A, edge type 'Y', numParticipants=1}</pre></li><br />
<li>Interactions described as involving two interactors that both refer to the same protein. These are represented as normal binary interactions with edge type 'X' (column number 53 (edgetype)), and column number 54 (numParticipants) will always be 2. For example:<br />
<pre>{A - A, edge type 'X', numParticipants=2}</pre></li><br />
<li>Interactions described as involving more than two interactors that all refer to the same protein. These use the bi-partite representation with edge type 'C' (column number 53 (edgetype)), as for complexes (a.k.a. n-ary data). For example:<br />
<pre><br />
{C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3}<br />
</pre></li><br />
</ol><br />
<br />
We draw extra attention to the fact that the RIGID (column number 35 (Checksum_Interaction)) for these interactions will be the SHA-1 digest of the ROGIDs for each of the distinct subunit types (see columns 33 (Checksum_A) and 34 (Checksum_B)). Thus interactions involving 1, 2 or more subunits of the same protein would all have the same RIGID.<br />
<br />
=== Keys for grouping together redundant interactors and interactions ===<br />
<br />
A number of keys are provided in this file to help users group together rows that all provide evidence for some kind of interaction between the same set (or a related set) of proteins. See columns 33-35 (Checksum_A, Checksum_B and Checksum_Interaction) and 43-51 (integer identifier and canonical data columns).<br />
<br />
The process of creating keys that group proteins and interactions into canonical groups was described after the original paper in the [[Canonicalization]] document. <br />
<br />
=== Provenance data ===<br />
<br />
Provenance data (where we retrieved source records from and how we mapped interactors and interactions to ROGIDs) is described in columns 37-42 (original and final references plus mapping scores).<br />
<br />
== License ==<br />
<br />
Data released on this public FTP site are released under the Creative <br />
Commons Attribution License http://creativecommons.org/licenses/by/2.5/. <br />
This means that you are free to use, modify and redistribute these data <br />
for personal or commercial use so long as you provide appropriate <br />
credit. See the next section.<br />
<br />
<br />
Copyright © 2008-2011 Ian Donaldson<br />
<br />
== Citation ==<br />
<br />
Credit should include citing the iRefIndex paper (PMID 18823568) and any of the <br />
source databases upon which this resource is based. See <br />
http://irefindex.uio.no for appropriate citations.<br />
<br />
== Disclaimer ==<br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY <br />
WARRANTY; without even the implied warranty of MERCHANTABILITY or <br />
FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
== Description of PSI-MITAB2.6 file ==<br />
<br />
Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
=== Column number: 1 (uidA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor A. <br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains an identifier, taken from a major database, for the protein representing interactor A. A UniProt or a RefSeq accession is provided (in that order of preference) wherever possible. See column 3 for a list of prefixes that may be employed in this column in addition to the following:<br />
<br />
;<tt>complex</tt><br />
:If interactor A is being used to represent a complex, then the rogid for the complex will be listed here, such as the following:<br />
<br />
<pre>complex:xBr9cTXgzPLNxsaKiYyHcoEm/DM</pre><br />
<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
In rare cases, a rogid may appear here if a protein interactor has a sequence but no known, valid ''<tt>database:accession</tt>'' pair.<br />
<br />
=== Column number: 2 (uidB)===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor B.<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 1.<br />
<br />
=== Column number: 3 (altA)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691|rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|irogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
All ''<tt>database:accession</tt>'' pairs listed in Column 3 point to protein records that describe the exact same sequence from the same taxon.<br />
<br />
Each pipe-delimited entry is a database_name:accession pair delimited by a colon. Database names are taken from the MI controlled vocabulary at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database references listed in this column may include the following:<br />
<br />
;<tt>uniprotkb</tt><br />
:The accessions this protein is known by in UniProt (http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line. <br />
;<tt>refseq</tt><br />
:If a protein accession exists in the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/), that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession.<br />
;<tt>entrezgene/locuslink</tt><br />
:NCBI Gene identifiers for the gene encoding this protein. See the GeneID column of ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq for the protein's accession.version.<br />
;<tt><em>other</em></tt><br />
:If none of the three identifier types are available then other <tt><em>databasename</em>:<em>accession</em></tt> pairs will be listed. These database names may not follow the MI controlled vocabulary.<br />
<br />
Example:<br />
<br />
<pre>emb:CAA44868.1|gb:AAA23715.1|gb:AAB02995.1|emb:CAA56736.1|uniprot:P24991</pre><br />
<br />
;<tt>rogid</tt><br />
:Column 33 repeated here for convenience.<br />
<br />
;<tt>irogid</tt><br />
:Column 43 repeated here for convenience.<br />
<br />
{{Note|<br />
The rogid of a complex or an n-ary interaction is the rigid of that <br />
interaction. However, the irogid of the complex is not the irigid.<br />
The irogid for the complex is an integer that does not overlap <br />
with any protein irogids.<br />
}}<br />
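<br />
Columns 3 to 6 (and column 9) share this pipe-delimited <tt>database:value</tt> layout, so a single helper can split them. This Python sketch splits each entry at its first colon only, so that names such as <tt>entrezgene/locuslink</tt> survive intact. Treating <tt>-</tt> and <tt>NA</tt> as empty values is an assumption based on the special values mentioned in this document, and the function name is hypothetical.<br />
<br />
```python
def parse_pairs(column):
    """Split a pipe-delimited column into {database name: [values]}.

    Each entry is split at its first colon only. "-" and "NA" give {}.
    """
    pairs = {}
    if column in ("", "-", "NA"):
        return pairs
    for entry in column.split("|"):
        name, _, value = entry.partition(":")
        pairs.setdefault(name, []).append(value)
    return pairs
```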
<br />
=== Column number: 4 (altB)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722|refseq:NP_417308|entrezgene/locuslink:947299</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 3. (Columns 34 and 44 are related to this column.)<br />
<br />
=== Column number: 5 (aliasA) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL|crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|icrogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each pipe-delimited entry is a <tt><em>database name</em>:<em>alias</em></tt> pair delimited by a <br />
colon. Database names are taken from the PSI-MI controlled vocabulary <br />
at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database names and sources listed in this column may include the following:<br />
<br />
;<tt>uniprotkb:<em>entry name</em></tt><br />
:the entry name given by UniProt. See the description for "Entry name" in the section of http://au.expasy.org/sprot/userman.html#ID_line concerning the "ID (IDentification)" line of the flat file<br />
;<tt>entrezgene/locuslink:<em>symbol</em></tt><br />
:the NCBI gene symbol for the gene encoding this protein. See the section in ftp://ftp.ncbi.nlm.nih.gov/gene/README for <tt>gene_info</tt>, specifically details for the <tt>Symbol</tt> column<br />
;<tt>crogid</tt><br />
:Column 46 repeated here for convenience.<br />
;<tt>icrogid</tt><br />
:Column 49 repeated here for convenience.<br />
;<tt>other db:accession pairs</tt><br />
:Other db:accession pairs that all belong to the same canonical group may be added (after icrogid). These are purely meant to facilitate look-up by PSICQUIC and other services: these sequences are related to (but not identical with) the interactor A sequence.<br />
;<tt>NA</tt><br />
:<tt>NA</tt> may be listed here if aliases are <em>not available</em><br />
<br />
=== Column number: 6 (aliasB) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTH_ECOLI|entrezgene/locuslink:mutH</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 5. (Columns 47 and 50 are related to this column.)<br />
<br />
=== Column number: 7 (Method) ===<br />
<br />
{|<br />
|Column type: ||String <br />
|-<br />
|Description: ||Interaction detection method<br />
|-<br />
|Example: ||<pre>MI:0399(2h fragment pooling)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only a single method will appear in this column. Previously, multiple methods appeared.<br />
}}<br />
<br />
Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers.<br />
<br />
The interaction detection method is taken from the original record, at the following path in PSI-MI 2.5:<br />
<br />
<pre>entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/</pre><br />
<br />
<br />
{{Note|<br />
If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then <tt>MI:0000</tt> will appear before the shortLabels.<br />
}}<br />
<br />
<tt>NA</tt> or <tt>-1</tt> may appear in place of a recognised shortLabel.<br />
<br />
For example:<br />
<br />
<pre><br />
MI:0000(-1)<br />
MI:0000(NA)<br />
</pre><br />
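<br />
Values in this column (and in columns 12, 13 and 17 to 20, which use the same notation) can be split into a term identifier and a short label. The Python sketch below maps the placeholder identifier <tt>MI:0000</tt> and the labels <tt>NA</tt> and <tt>-1</tt> to <tt>None</tt>; the regular expression and the function name are assumptions for illustration only.<br />
<br />
```python
import re

MI_TERM = re.compile(r"^(MI:\d{4})\((.*)\)$")


def parse_mi_term(value):
    """Return (term identifier, short label), using None for placeholders."""
    if value == "NA":
        return None, None  # no value available from the source database
    match = MI_TERM.match(value)
    if match is None:
        raise ValueError(f"unexpected value: {value!r}")
    term_id, label = match.groups()
    if term_id == "MI:0000":
        term_id = None  # no controlled vocabulary identifier could be found
    if label in ("NA", "-1"):
        label = None  # no recognised short label
    return term_id, label
```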
<br />
=== Column number: 8 (author) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Authors of the publications supporting the interaction<br />
|-<br />
|Example: ||<pre>hall-1999-1|hall-1999-2|mansour-2001-1|mansour-2001-2|hall-1999</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited list of the surnames of authors of publications in which the interaction has been shown.<br />
<br />
{{Note|<br />
This column will usually include only one author name reference. However, some experimental evidence has secondary references, which may be included here.<br />
This field also includes references which are not author names, as in the following examples:<br />
* OPHID Predicted Protein Interaction<br />
* HPRD Text Mining Confirmation<br />
* MINT Text Mining Confirmation<br />
}}<br />
<br />
=== Column number: 9 (pmids) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||PubMed Identifiers<br />
|-<br />
|Example: ||<pre>pubmed:9880500|pubmed:11585365</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is a non-redundant list of PubMed identifiers pointing to literature that supports the interaction. <br />
According to MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>pubmed:12345</tt>.<br />
The source database name is always <tt>pubmed</tt>.<br />
<br />
{{Note|<br />
This column will usually include only one PubMed reference that describes where the experimental evidence is found. In some cases, secondary references are provided by the source database and will be included here.<br />
}}<br />
<br />
<br />
The special value <tt>-</tt> may appear in place of the identifiers.<br />
<br />
=== Column number: 10 (taxa) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor A<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
<br />
|}<br />
<br />
'''Notes'''<br />
<br />
The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be corrected from what was provided by the source database. See the methods section of the iRefIndex paper for more details. See also the NCBI taxonomy database at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>taxid:12345</tt>. The source database name is listed as <tt>taxid</tt> since it is always NCBI's taxonomy database. The value in this column will be <tt>NA</tt> if the interactor is a complex.<br />
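<br />
A possible Python sketch for reading this column, assuming each value holds a single <tt>taxid:</tt> entry with an optional organism name in parentheses, and that complexes carry <tt>NA</tt> as stated above. The names are hypothetical. Matching a child taxon (such as 83333) to its parent (such as 562) would additionally require the NCBI taxonomy tree, which this sketch does not use.<br />
<br />
```python
import re

TAXID = re.compile(r"^taxid:(-?\d+)(?:\((.*)\))?$")


def parse_taxid(value):
    """Return (taxonomy identifier, organism name or None), or None for NA."""
    if value == "NA":
        return None  # the interactor is a complex
    match = TAXID.match(value)
    if match is None:
        raise ValueError(f"unexpected taxonomy value: {value!r}")
    return int(match.group(1)), match.group(2)
```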
<br />
=== Column number: 11 (taxb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor B<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 10.<br />
<br />
=== Column number: 12 (interactionType) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Interaction Type from controlled vocabulary or short label<br />
|-<br />
|Example: ||<pre>MI:0218(physical interaction)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only one interaction type will be present in each line of the file.<br />
}}<br />
<br />
The interaction type is taken from the PSI-MI controlled vocabulary (when available in the interaction record) and represented as follows:<br />
<br />
<pre>database:identifier(interaction type)</pre><br />
<br />
Otherwise, the short label is taken from the following path in PSI-MI 2.5:<br />
<br />
<pre>entrySet/entry/interactionList/interaction/interactionType/names/shortLabel</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for interaction types.<br />
<br />
{{Note|<br />
If the MI controlled vocabulary identifier was not provided by the source database, but a text description was provided, then an attempt was made to map the text to the correct controlled vocabulary term identifier.<br />
If this was not possible then <tt>MI:0000</tt> is listed.<br />
|Change}}<br />
<br />
<tt>NA</tt> may be listed here if the interaction type is not available (meaning that we could not find the interaction type in the record provided by the source database).<br />
<br />
=== Column number: 13 (sourcedb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source database for this interaction record <br />
|-<br />
|Example: ||<pre>MI:0469(intact)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(source name)</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for database sources.<br />
<br />
{{Note|<br />
Only one source database will be listed in each row.<br />
|Change}}<br />
<br />
=== Column number: 14 (interactionIdentifier) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source interaction database and accession<br />
|-<br />
|Example: ||<pre>intact:EBI-761694|rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA|irigid:1234|edgetype:X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt><em>database name</em>:<em>identifier</em></tt> pair. <br />
<br />
{{Note|<br />
The source database is listed first. Additional information is pipe-delimited and presented here for the convenience of PSICQUIC web-service users (these services presently truncate this file at column 15 as they only support MITAB2.5). See columns 35, 45 and 53.<br />
|Change}}<br />
<br />
The source database names that appear in this column are taken from the<br />
PSI-MI controlled vocabulary at the following location (where possible):<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
If an interaction record identifier is not provided by the source database, this entry will appear as <tt><em>database-name</em>:-</tt> with the identifier region replaced with a dash (<tt>-</tt>).<br />
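<br />
For users working from truncated PSICQUIC output, the extra entries can be recovered from this column alone. The Python sketch below, with a hypothetical function name, returns the source database and identifier (using <tt>None</tt> for the <tt>-</tt> placeholder) together with any additional <tt>name:value</tt> entries such as <tt>rigid</tt>, <tt>irigid</tt> and <tt>edgetype</tt>.<br />
<br />
```python
def parse_interaction_identifier(value):
    """Split column 14 into the source record and the extra name:value entries."""
    source, *extras = value.split("|")
    database, _, identifier = source.partition(":")
    parsed = {
        "source_db": database,
        # a dash means the source database supplied no record identifier
        "source_id": None if identifier == "-" else identifier,
    }
    for entry in extras:
        name, _, extra = entry.partition(":")
        parsed[name] = extra
    return parsed
```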
<br />
=== Column number: 15 (confidence) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Confidence scores<br />
|-<br />
|Example: ||<pre>lpr:1|hpr:12|np:1|PSICQUIC entries are truncated here. See irefindex.uio.no</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt>''scoreName'':''score''</tt> pair. Three confidence <br />
scores are provided: <tt>lpr</tt>, <tt>hpr</tt> and <tt>np</tt>.<br />
<br />
PubMed Identifiers (PMIDs) point to literature references that support <br />
an interaction. A PMID may be used to support more than one interaction. <br />
<br />
The lpr score (lowest PMID re-use) is the lowest number of distinct <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A value of one indicates <br />
that at least one of the PMIDs supporting this interaction has never <br />
been used to support any other interaction. This likely indicates that <br />
only one interaction was described by that reference and that the <br />
present interaction is not derived from high throughput methods.<br />
<br />
The hpr score (highest PMID re-use) is the highest number of <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A high value (e.g. greater <br />
than 50) indicates that one PMID describes at least 50 other <br />
interactions and it is more likely that high-throughput methods were <br />
used.<br />
<br />
The np score (number of PMIDs) is the total number of unique PMIDs used to <br />
support the interaction described in this row.<br />
<br />
<tt>-</tt> may appear in the score field, indicating the absence of a score value.<br />
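<br />
The three scores can be reproduced from a mapping of interactions to their supporting PMIDs. The Python sketch below assumes that every interaction is keyed by its RIGID and that the mapping covers the whole data set (the counts would differ if computed over a single taxon file). It illustrates the definitions above rather than the code used to produce the release, and the function name is hypothetical.<br />
<br />
```python
from collections import defaultdict


def pmid_scores(support):
    """Compute lpr, hpr and np for each interaction.

    support maps each RIGID to the PMIDs supporting that interaction.
    Interactions without any PMID get None (shown as "-" in the file).
    """
    # For every PMID, collect the distinct interactions it supports.
    rigids_per_pmid = defaultdict(set)
    for rigid, pmids in support.items():
        for pmid in pmids:
            rigids_per_pmid[pmid].add(rigid)

    scores = {}
    for rigid, pmids in support.items():
        unique = set(pmids)
        if not unique:
            scores[rigid] = None
            continue
        reuse = [len(rigids_per_pmid[pmid]) for pmid in unique]
        scores[rigid] = {"lpr": min(reuse), "hpr": max(reuse), "np": len(unique)}
    return scores
```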
<br />
----<br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (16 - 31) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT<br />
|Note}}<br />
<br />
=== Column number: 16 (expansion) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Model used to convert n-ary data into binary data for the purpose of export in the MITAB file<br />
|-<br />
|Example: ||<pre>bipartite</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this column will always contain either <tt>bipartite</tt> or <tt>none</tt>.<br />
<br />
Other databases may use either <tt>spoke</tt> or <tt>matrix</tt> or <tt>none</tt> in this column.<br />
<br />
See <br />
[[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
=== Column number: 17 (biological_role_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor A<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When provided by the source database, this includes single entries such as <tt>MI:0501(enzyme)</tt>, <tt>MI:0502(enzyme target)</tt>, <tt>MI:0580(electron acceptor)</tt>, or <tt>MI:0499(unspecified role)</tt>.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role.<br />
<br />
For complexes, and when no role is specified, this column will indicate an unspecified role (<tt>MI:0499(unspecified role)</tt>).<br />
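<br />
Values in columns 17 to 22 share the <tt>MI:nnnn(label)</tt> form shown in the examples. The following minimal Python sketch splits such a value into its identifier and label. It assumes exactly one term per field and a four-digit MI accession, as in every example on this page.<br />
<br />
```python
import re

# One PSI-MI term: a four-digit MI accession followed by a label in parentheses.
MI_TERM = re.compile(r"^(MI:\d{4})\((.*)\)$")


def parse_mi_term(field):
    """Split e.g. "MI:0501(enzyme)" into ("MI:0501", "enzyme")."""
    match = MI_TERM.match(field)
    if match is None:
        raise ValueError(f"not a single PSI-MI term: {field!r}")
    return match.group(1), match.group(2)


print(parse_mi_term("MI:0502(enzyme target)"))  # ('MI:0502', 'enzyme target')
print(parse_mi_term("MI:0499(unspecified role)")[0])  # MI:0499
```
<br />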
<br />
=== Column number: 18 (biological_role_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor B<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 17.<br />
<br />
=== Column number: 19 (experimental_role_A) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of interactor A (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any was provided by the source database) that was played by interactor A.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI for definitions of bait and prey, as well as other possible values of experimental role that may appear in this column for other databases.<br />
<br />
For complexes and when no role is specified this column will contain the following:<br />
<br />
<pre>MI:0499(unspecified role)</pre><br />
<br />
=== Column number: 20 (experimental_role_B) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of interactor B (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any) that was played by interactor B.<br />
<br />
See notes above for column 19.<br />
<br />
=== Column number: 21 (interactor_type_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Type of molecule represented by interactor A<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this will always be one of the following:<br />
<br />
<pre><br />
MI:0326(protein)<br />
MI:0315(protein complex)<br />
</pre><br />
<br />
=== Column number: 22 (interactor_type_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Type of molecule represented by interactor B<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See column 21.<br />
<br />
=== Column number: 23 (xrefs_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for molecule A, for example Gene Ontology or OMIM identifiers:<br />
<pre>omim:152430(longevity)|go:"GO:0016233"(telomere capping)</pre><br />
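<br />
Cross-reference values from other databases combine pipe-separated entries of the form <tt>database:identifier(description)</tt>. The identifier may be quoted because it can itself contain a colon. The Python sketch below is rough and relies on two assumptions that this page does not state: quotes are never escaped, and descriptions contain no pipe characters.<br />
<br />
```python
import re

# database:identifier(description); the identifier may be double-quoted.
XREF = re.compile(r'^([^:]+):("[^"]*"|[^(]*)(?:\((.*)\))?$')


def split_outside_quotes(text, sep="|"):
    """Split on sep, ignoring separators inside double quotes."""
    parts, current, quoted = [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        if ch == sep and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_xrefs(field):
    """Return a list of (database, identifier, description) tuples."""
    if field == "-":
        return []  # iRefIndex always uses "-" in these columns
    entries = []
    for entry in split_outside_quotes(field):
        match = XREF.match(entry)
        if match is None:
            raise ValueError(f"unrecognised xref: {entry!r}")
        database, identifier, description = match.groups()
        entries.append((database, identifier.strip('"'), description))
    return entries


print(parse_xrefs('omim:152430(longevity)|go:"GO:0016233"(telomere capping)'))
# [('omim', '152430', 'longevity'), ('go', 'GO:0016233', 'telomere capping')]
```
<br />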
<br />
=== Column number: 24 (xrefs_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 23.<br />
<br />
=== Column number: 25 (xrefs_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for the interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for the interaction, for example Gene Ontology or OMIM identifiers:<br />
<pre>go:"GO:0048786"(presynaptic active zone)</pre><br />
<br />
=== Column number: 26 (Annotations_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for molecule A. For example:<br />
<pre>This protein has an apparent MW of 25 kDa|This protein binds 7 zinc molecules</pre><br />
<br />
Some databases may use <tt>dataset:<em>*</em></tt> or <tt>data-processing:<em>*</em></tt> (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 27 (Annotations_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 26.<br />
<br />
=== Column number: 28 (Annotations_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for the interaction. For example:<br />
<pre>figure-legend:F1A|prediction score:432|comment:prediction based on phage display consensus|author-confidence:8|comment:AD-ORFeome library used in the experiment.</pre><br />
The prefixes used before the <tt>:</tt> (like "comment") are database-specific and not controlled.<br />
<br />
Some databases may use ''<tt>dataset:*</tt>'' or ''<tt>data-processing:*</tt>'' (where <tt>*</tt> is non-controlled free-text) in this column.<br />
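<br />
Because the prefixes are uncontrolled, a reader can only split each entry on its first colon. The following minimal Python sketch does this, assuming that entries are separated by <tt>|</tt> and that any text before the first <tt>:</tt> is a prefix. Free text that contains a colon but has no prefix would be misread under this assumption.<br />
<br />
```python
def parse_annotations(field):
    """Split an annotation column into (prefix, text) pairs.

    Entries without a ":" are returned with a prefix of None.
    """
    if field == "-":
        return []
    pairs = []
    for entry in field.split("|"):
        prefix, sep, text = entry.partition(":")  # split on the first colon only
        pairs.append((prefix, text) if sep else (None, entry))
    return pairs


example = "figure-legend:F1A|prediction score:432|author-confidence:8"
print(parse_annotations(example))
# [('figure-legend', 'F1A'), ('prediction score', '432'), ('author-confidence', '8')]
```
<br />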
<br />
=== Column number: 29 (Host_organism_taxid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||The taxonomy identifier of the host organism where the interaction was experimentally demonstrated<br />
|-<br />
|Example: || <pre>taxid:10090(Mus musculus)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This may differ from the taxonomy identifier associated with the interactors. Other possible entries are: <br />
<br />
* <tt>taxid:-1(in vitro)</tt><br />
* <tt>taxid:-4(in vivo)</tt><br />
<br />
A dash (<tt>-</tt>) will be used when no information about the host organism is available.<br />
<br />
<tt>taxid:32644(unidentified)</tt> will be used when the source specifies the host organism taxonomy identifier as 32644.<br />
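<br />
The following Python sketch reads this column. It is an illustration that assumes a single <tt>taxid:</tt> value per field, as in the examples above. It also keeps negative codes such as <tt>-1</tt> (in vitro) apart from real organisms, because NCBI taxonomy identifiers are positive.<br />
<br />
```python
import re

# "taxid:<integer>(<name>)"; negative values such as -1 (in vitro) are special codes.
HOST = re.compile(r"^taxid:(-?\d+)(?:\((.*)\))?$")


def parse_host_organism(field):
    """Return (taxid, name), or None when the column holds "-"."""
    if field == "-":
        return None
    match = HOST.match(field)
    if match is None:
        raise ValueError(f"unexpected host organism value: {field!r}")
    return int(match.group(1)), match.group(2)


for value in ["taxid:10090(Mus musculus)", "taxid:-1(in vitro)", "-"]:
    parsed = parse_host_organism(value)
    # Negative identifiers are not NCBI taxa, so report them separately.
    kind = "none" if parsed is None else ("special" if parsed[0] < 0 else "organism")
    print(value, "->", parsed, kind)
```
<br />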
<br />
=== Column number: 30 (parameters_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Parameters for the interaction<br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
Internal note: the use of this column is not well defined or characterized.<br />
<br />
=== Column number: 31 (Creation_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||Date on which the entry was created<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 32 (Update_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||Date on which this record was last updated<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
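<br />
Both date columns use the <tt>yyyy/mm/dd</tt> form. The following minimal Python sketch converts them using the standard library. The function name is hypothetical. Since both columns hold the iRefIndex release date, the two values in a row are expected to agree.<br />
<br />
```python
from datetime import datetime


def parse_mitab_date(field):
    """Parse a yyyy/mm/dd value from column 31 or 32 into a datetime.date."""
    return datetime.strptime(field, "%Y/%m/%d").date()


created = parse_mitab_date("2010/05/06")
updated = parse_mitab_date("2010/05/06")
print(created.isoformat())  # 2010-05-06
# Both columns carry the iRefIndex release date, so they should match.
print(created == updated)   # True
```
<br />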
<br />
=== Column number: 33 (Checksum_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor A. <br />
|-<br />
|Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
This column contains a universal key for interactor A.<br />
|Note}}<br />
<br />
This column may be used to identify other interactors in this file that have the exact same amino acid sequence and taxon id. <br />
<br />
The universal key listed here is the ROGID (redundant object group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
Column 3 lists database names and accessions that all have this same key. <br />
<br />
The ROGID for proteins consists of the base-64 version of the SHA-1 key for the protein sequence concatenated with the taxonomy identifier for the protein. For complex nodes, the ROGID is calculated as the SHA-1 digest of the ROGIDs of all the protein participants, after first ordering them by ASCII-based lexicographical sorting in ascending order and concatenating them. See the iRefIndex paper for details. The SHA-1 key is always 27 characters long, so the ROGID for a protein is composed of 27 characters concatenated with a taxonomy identifier.<br />
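<br />
The construction above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation. The sequence is assumed to be upper-cased and ASCII-encoded before hashing. The standard base-64 alphabet is assumed. The <tt>rogid:</tt> prefix seen in the column examples is omitted. The sequences are made up. Consult PMID 18823568 for the exact procedure. A 20-byte SHA-1 digest encodes to 28 base-64 characters ending in one <tt>=</tt>, and dropping that padding gives the 27 characters mentioned above.<br />
<br />
```python
import base64
import hashlib


def b64_sha1(text):
    """SHA-1 digest of text, base-64 encoded, with '=' padding removed.

    A 20-byte digest encodes to 28 characters ending in one '=', leaving 27.
    """
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def protein_rogid(sequence, taxid):
    """Hash of the (upper-cased, by assumption) sequence followed by the taxon id."""
    return b64_sha1(sequence.upper()) + str(taxid)


def complex_rogid(participant_rogids):
    """Hash of the participants' ROGIDs after ASCII sorting and concatenation."""
    return b64_sha1("".join(sorted(participant_rogids)))


p1 = protein_rogid("MKTAYIAKQRQISFVKSHFSRQ", 9606)  # made-up sequence
p2 = protein_rogid("MSDNGPQSNQRSAPRITFGG", 9606)    # made-up sequence
print(p1, len(p1))  # 27 hash characters + "9606" = 31 characters
print(complex_rogid([p1, p2]) == complex_rogid([p2, p1]))  # True
```
<br />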
<br />
=== Column number: 34 (Checksum_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor B. <br />
|-<br />
|Example: ||<pre>rogid:AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
See notes for column 33.<br />
<br />
=== Column number: 35 (Checksum_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for this interaction<br />
|-<br />
|Example: ||<pre>rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other rows (interaction records) in this file that describe interactions between the same set of proteins from the same taxon id.<br />
<br />
The universal key listed here is the RIGID (redundant interaction group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
The RIGID consists of the ROGIDs for each of the protein participants (see notes above) ordered by ASCII-based lexicographic sorting in ascending order, concatenated and then digested with the SHA-1 algorithm. See the iRefIndex paper for details. This identifier points to a set of redundant protein-protein interactions that involve the same set of proteins with the exact same primary sequences.<br />
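<br />
The following minimal Python sketch computes a RIGID as described. It assumes the same base-64 encoding without padding as the 27-character keys above, and it assumes that the ROGIDs are used without their <tt>rogid:</tt> prefix. This page does not say whether a participant that occurs more than once contributes its ROGID once or repeatedly, so the sketch hashes the list exactly as given. Sorting makes the result independent of participant order.<br />
<br />
```python
import base64
import hashlib


def rigid(participant_rogids):
    """SHA-1 of the ASCII-sorted, concatenated ROGIDs, base-64 encoded without padding."""
    joined = "".join(sorted(participant_rogids))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


# ROGIDs copied from the examples for columns 33 and 34 (without the "rogid:" prefix).
a = "hhZYhMtr5JC1lGIKtR1wxHAd3JY83333"
b = "AhmYiMtz8lR12Gixt91txbAd3JY83333"
print(rigid([a, b]) == rigid([b, a]))  # True: participant order does not matter
print(len(rigid([a, b])))              # 27
```
<br />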
<br />
=== Column number: 36 (Negative) ===<br />
<br />
{|<br />
|Column type: || Boolean (true or false)<br />
|-<br />
|Description: ||Does the interaction record provide evidence that some interaction does NOT occur?<br />
|-<br />
|Example: ||<pre>false</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This value will be false for all lines in this file since iRefIndex does not include "negative" interactions from any of the source databases.<br />
<br />
<hr><br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (37 -) ARE NOT DEFINED BY THE PSI-MITAB2.6 STANDARD.<br />
THESE COLUMNS ARE SPECIFIC TO THIS IREFINDEX RELEASE AND MAY CHANGE FROM ONE RELEASE TO ANOTHER<br />
|Important}}<br />
<br />
=== Column number: 37 (OriginalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the protein reference that was found in the original interaction record to describe interactor A. It is a colon-delimited pair of database name and accession. It may be either the primary or secondary reference for the protein provided by the source database.<br />
<br />
For complexes this will be the ROGID of the complex.<br />
<br />
=== Column number: 38 (OriginalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 37.<br />
<br />
=== Column number: 39 (FinalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Column 37 (OriginalReferenceA) was used by the iRefIndex consolidation process to arrive at this FinalReferenceA. <br />
This database name and accession pair will usually be the same as that listed in column 37, unless the provided reference was malformed, had to be updated or was ambiguous.<br />
<br />
Examples:<br />
<br />
# The original reference is malformed. For example: <tt>RefSeq:NP 036076</tt> instead of <tt>RefSeq:NP_036076</tt>.<br />
# The original reference is incomplete. For example: <tt>PDB:1KQ1|</tt> (missing chain information). <br />
# The original reference is deprecated. For example: <tt>UniProt:Q9H233</tt> (the value of FinalReferenceA will be the latest available accession in this case).<br />
# The original reference is ambiguous. For example: a gene identifier is provided (the value of FinalReferenceA will be a protein product selected in a systematic way in this case).<br />
<br />
=== Column number: 40 (FinalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 39.<br />
<br />
=== Column number: 41 (MappingScoreA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 37) to the final protein reference (column 39). <br />
|-<br />
|Example: ||<pre>PTUO+</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains a description of mapping operations as a condensed string of letters. See the original iRefIndex paper, PMID 18823568. <br />
For complexes, this column will contain a dash (<tt>-</tt>).<br />
<br />
=== Column number: 42 (MappingScoreB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by iRefIndex procedure during mapping from original protein reference (column 38) to the final protein reference (column 40). <br />
|-<br />
|Example: ||<pre>SU</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 41.<br />
<br />
=== Column number: 43 (irogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor A. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal integer equivalent of the alphanumeric identifier in column 33 for interactor A. All interactors with the same sequence and taxon origin will have the same irogid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 44 (irogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor B.<br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 43.<br />
<br />
=== Column number: 45 (irigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for this interaction.<br />
|-<br />
|Example: ||<pre>1234</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal integer equivalent of the alphanumeric identifier in column 35 for this interaction. All interactions involving the same interactors (same sequence and same taxon) will have the same irigid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 46 (crogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other interactors in this file that all belong to the same canonical group.<br />
<br />
Members of a canonical group may include splice isoform products from the same or related genes. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column.<br />
<br />
See http://irefindex.uio.no/wiki/Canonicalization for a description of canonicalization.<br />
<br />
=== Column number: 47 (crogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 46.<br />
<br />
=== Column number: 48 (crigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the RIGID for this interaction calculated using the canonical ROGIDs (preceding two columns).<br />
<br />
This column may be used to identify other interactions in this file that all belong to the same canonical group.<br />
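<br />
The following Python sketch groups the rows of a MITAB file by canonical RIGID. It is an illustration that assumes three things: columns are tab-separated, any header line starts with <tt>#</tt>, and columns are numbered from 1 on this page, so column 48 has index 47. Lines are split on tabs directly instead of being passed to a CSV reader, because fields such as cross-references may contain double quotes.<br />
<br />
```python
from collections import defaultdict

CRIGID = 47  # 0-based index of column 48 (crigid)


def group_by_crigid(lines):
    """Group tab-separated MITAB rows by their canonical RIGID (column 48)."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and (assumed) '#'-prefixed header lines
        fields = line.rstrip("\n").split("\t")
        groups[fields[CRIGID]].append(fields)
    return groups


# Usage with a hypothetical file name:
# with open("irefindex_mitab.txt", encoding="utf-8") as handle:
#     for crigid, rows in group_by_crigid(handle).items():
#         print(crigid, len(rows))
```
<br />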
<br />
<br />
=== Column number: 49 (icrogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal integer equivalent of the alphanumeric canonical ROGID in column 46 for interactor A. Interactors with the same icrogid may have different sequences but are related; e.g. different splice isoforms of the same gene.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 50 (icrogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 49.<br />
<br />
=== Column number: 51 (icrigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal integer equivalent of the canonical RIGID. See column 48.<br />
<br />
This integer may be used to query the iRefWeb interface for the interaction record. For example:<br />
<br />
http://wodaklab.org/iRefWeb/interaction/show/13653<br />
<br />
...where 13653 is the integer, canonical RIGID.<br />
<br />
This identifier serves to group together evidence for interactions that involve the same set (or a related set) of proteins.<br />
<br />
Starting with release 6.0, this canonical RIGID is stable from one release of iRefIndex to another.<br />
<br />
=== Column number: 52 (imex_id) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||IMEx identifier if available<br />
|-<br />
|Example: ||<pre>imex:IM-12202-3</pre><br />
|-<br />
|Example: ||<pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When no information is available, a dash (<tt>-</tt>) will be used.<br />
<br />
=== Column number: 53 (edgetype) ===<br />
<br />
{|<br />
|Column type: ||Character<br />
|-<br />
|Description: ||Does the edge represent a binary interaction (X), membership of a complex (C), or a multimer (Y)?<br />
|-<br />
|Example: ||<pre>X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Edges can be labelled as either <tt>X</tt>, <tt>C</tt> or <tt>Y</tt>:<br />
<br />
;<tt>X</tt><br />
:a binary interaction with two protein participants<br />
<br />
;<tt>C</tt><br />
:denotes that this edge is a binary expansion of an interaction record that had three or more interactors (so-called "complex" or "n-ary" data). The expansion type is described in column 16 (expansion). In the case of iRefIndex, the expansion is always "bipartite", meaning that interactor A of this row represents the complex itself and interactor B represents a protein that is a member of this complex.<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for further explanation.<br />
<br />
;<tt>Y</tt><br />
:denotes a dimer or polymer (self-interaction). Such edges are labelled <tt>Y</tt> even when the number of subunits is not described in the original interaction record. Interactor A will be identical to interactor B, so the graphical representation will appear as a single node connected to itself (a loop). The actual number of self-interacting subunits may be 2 (dimer) or more (e.g. 5 for a pentamer). Refer to the original interaction record for more details and see column 54.<br />
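<br />
The three edge types can be separated when reading a file, and bipartite <tt>C</tt> rows can be regrouped under their complex node. The following Python sketch does this and is illustrative only. It assumes tab-separated rows, <tt>#</tt>-prefixed header lines and 1-based column numbering as on this page. It uses the checksum columns 33 and 34 exactly as they appear, including the <tt>rogid:</tt> prefix.<br />
<br />
```python
from collections import defaultdict

# 0-based indices of the columns used (columns are numbered from 1 on this page).
CHECKSUM_A, CHECKSUM_B = 32, 33      # columns 33 and 34
EDGETYPE, NUM_PARTICIPANTS = 52, 53  # columns 53 and 54


def split_by_edgetype(lines):
    """Separate binary edges, complex memberships and self-interactions."""
    binary, complexes, self_interactions = [], defaultdict(set), []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and (assumed) '#'-prefixed header lines
        fields = line.rstrip("\n").split("\t")
        a, b, kind = fields[CHECKSUM_A], fields[CHECKSUM_B], fields[EDGETYPE]
        if kind == "X":
            binary.append((a, b))
        elif kind == "C":
            complexes[a].add(b)  # A is the complex node, B one of its members
        elif kind == "Y":
            # A equals B; column 54 gives the subunit count, or 1 if unknown.
            self_interactions.append((a, int(fields[NUM_PARTICIPANTS])))
    return binary, dict(complexes), self_interactions
```
<br />
Members are collected as a set because the same complex may be reported by several source records. For that reason, and because a protein can participate more than once, the number of distinct members found need not equal the value in column 54.<br />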
<br />
=== Column number: 54 (numParticipants) ===<br />
<br />
{|<br />
|Column type: ||Integer<br />
|-<br />
|Description: ||Number of participants in the interaction<br />
|-<br />
|Example: ||<pre>2</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
* For edges labelled <tt>X</tt> (see column 53) this value will be two. <br />
* For edges labelled <tt>C</tt>, this value will be equivalent to the number of protein interactors in the original n-ary interaction record.<br />
* For interactions labelled <tt>Y</tt>, this value will either be the number of self-interacting subunits (if present in the original interaction record) or 1 where the exact number of subunits is unknown or unspecified.<br />
<br />
{{Note|<br />
The number of participants can be greater than the number of distinct proteins involved in an interaction because a single protein can participate more than once in an interaction. Such participation is enumerated and counted to produce the value in this column.<br />
|Important}}<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefR&diff=4046
iRefR
2011-12-02T16:08:06Z
<p>PaulBoddie: Removed notice.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div style="float:right"><br />
<facebook-like /><br />
</div><br />
<br />
iRefR is an R package that provides access to [[iRefIndex]]. It allows the user to load any version of the consolidated protein interaction database "iRefIndex" and perform tasks such as selecting databases, PMIDs and experimental methods; searching for specific proteins; separating binary interactions from complexes and polymers; generating complexes according to an algorithm that looks for possible binary-represented complexes; computing general database statistics; and creating network graphs, among others.<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
| style="vertical-align: top" |<br />
== Installation ==<br />
<br />
iRefR is available from CRAN as the [http://cran.r-project.org/web/packages/iRefR/index.html iRefR package] and can also be downloaded from...<br />
<br />
ftp://ftp.no.embnet.org/irefindex/iRefR/current/<br />
<br />
Documentation and tutorial material is included. First time users should refer to <tt>iRefR_tutorial.pdf</tt> in the <tt>doc</tt> directory of the source distribution.<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [ftp://ftp.no.embnet.org/irefindex/iRefR/current/]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Publication ==<br />
<br />
[http://www.biomedcentral.com/1471-2105/12/455 iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database.]<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px|left<br />
default [http://www.biomedcentral.com/1471-2105/12/455]<br />
</imagemap><br />
|}</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex&diff=4045
iRefIndex
2011-12-02T14:25:16Z
<p>PaulBoddie: Removed notice.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div style="float:right"><br />
<facebook-like /><br />
</div><br />
<br />
<div class="floatleft"><br />
<imagemap><br />
Image:iRefIndex_logo.png|120x120px<br />
default [[iRefIndex#A_reference_index_for_protein_interaction_data]]<br />
</imagemap><br />
</div><br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID. <br />
<br />
The iRefScape plugin for Cytoscape has been published recently [http://www.biomedcentral.com/1471-2105/12/388 here].<br />
<br />
[[iRefIndex#A_reference_index_for_protein_interaction_data|Read more]]<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:Document-save-80x80.png<br />
default [[README_MITAB2.6_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[README_MITAB2.6_for_iRefIndex|Download]] ===<br />
<br />
Download the current iRefIndex release in PSI-MITAB tab-delimited format via FTP.<br />
|<imagemap><br />
Image:Accessories-text-editor-80x80.png<br />
default [[iRefIndex_Release_Notes]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Release Notes]] ===<br />
<br />
View release notes and news for each release of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px<br />
default [[iRefIndex_Citations]]<br />
</imagemap><br />
| style="vertical-align: top" colspan="3" |<br />
=== [[iRefIndex_Citations | Publications, citing, citations and further reading]] ===<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:CytoscapeLogo.png|80x80px<br />
default [[iRefScape]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefScape|iRefScape]] ===<br />
<br />
iRefScape is a plugin for [http://www.cytoscape.org/ Cytoscape] that exposes iRefIndex data as a navigable graphical network.<br />
|<imagemap><br />
Image:firefox.png|80x80px<br />
default [http://wodaklab.org/iRefWeb/]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [http://wodaklab.org/iRefWeb/ iRefWeb] ===<br />
<br />
iRefWeb provides a searchable web interface to the iRefIndex. This interface was developed as part of a collaboration with the Wodak group at the Hospital for Sick Children in Toronto, Canada.<br />
|-<br />
|<imagemap><br />
Image:R-logo.jpg|80x80px<br />
default [[iRefR]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefR]] ===<br />
<br />
An R package providing access to iRefIndex data.<br />
|<imagemap><br />
Image:Applications-internet-80x80.png<br />
default [[README_PSICQUIC_web_services_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
<br />
=== [[PSICQUIC|Web services]] ===<br />
<br />
iRefIndex PSICQUIC and PSISCORE web services are now running on release 9.0 of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:Youtube-256.png|80x80px<br />
default [[iRefIndex_Videos]]<br />
</imagemap> <br />
| style="vertical-align: top" |<br />
<br />
=== [[iRefIndex Videos|iRefIndex videos]] ===<br />
<br />
Video learning materials for iRefIndex, iRefScape and iRefWeb.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [[iRefIndex#Contact and mailing list]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex#Contact and mailing list|Contact information and mailing list]] ===<br />
<br />
How to get in touch with the developers.<br />
|-<br />
|<imagemap><br />
Image:Emblem-notice-80x80.png<br />
default [[Sources_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Sources_iRefIndex|Source data information]] ===<br />
<br />
Details of all the different source databases that provide the foundation for iRefIndex.<br />
|<imagemap><br />
Image:X-office-spreadsheet-80x80.png<br />
default [[Statistics_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Statistics_iRefIndex|Statistical information]] ===<br />
<br />
Statistics for the current iRefIndex release.<br />
|}<br />
<br />
== Technical information on the iRefIndex database ==<br />
<br />
Build process: [[iRefIndex Manual]]<br />
<br />
Feedback files: [[README iRefIndex Feedback]]<br />
<br />
Mapping files: [[Protein identifier mapping]]<br />
<br />
Normalization of MI cv terms: [[Mapping of terms to MI term ids - iRefIndex]]<br />
<br />
Canonicalization: [[Canonicalization]]<br />
<br />
Disease Groups: [[DiG: Disease groups]]<br />
<br />
All iRefIndex pages and archived releases: [[iRefIndex#All_iRefIndex_Pages|see below]]<br />
<br />
License and disclaimer: [[iRefIndex#License_and_disclaimer|see below]]<br />
<br />
----<br />
<br />
== A reference index for protein interaction data ==<br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including [http://bond.unleashedinformatics.com/ BIND], [http://www.thebiogrid.org/ BioGRID], [http://mips.gsf.de/genre/proj/corum/index.html CORUM], [http://dip.doe-mbi.ucla.edu/ DIP], [http://www.hprd.org/ HPRD], [http://www.ebi.ac.uk/intact/site/index.jsf IntAct], [http://mint.bio.uniroma2.it/mint/Welcome.do MINT], [http://mips.gsf.de/genre/proj/mpact MPact], [http://mips.gsf.de/proj/ppi/ MPPI] and [http://ophid.utoronto.ca/ OPHID]. This index includes multiple interaction types including physical and genetic (mapped to their corresponding protein products) as determined by a multitude of methods. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.<br />
<br />
iRefIndex assigns a globally unique identifier (RIGID), which looks like 'tjWXXjgPyHyT2J6EwED8zK2x18U', to identify interactions that are identical (according to the sequence and taxon identifiers of the interactors). iRefIndex also assigns similar-looking keys to protein interactors. These keys are global, meaning they can be generated by anyone using the method described in the paper. This method allows users to integrate their own data with the iRefIndex in a way that ensures proteins with the exact same sequence will be represented only once.<br />
<br />
== Publications and further reading ==<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex are provided on the [[iRefIndex Citations]] page.<br />
<br />
----<br />
<br />
== Long term goals of the iRefIndex project ==<br />
<br />
We believe that protein interaction data hold incredible potential for biomedical research. Presently, these data are collected and archived by multiple groups around the world and the number of groups taking part in this work is growing rather than diminishing. <br />
<br />
It is therefore important that these databases have the means to exchange and compare data effectively, and that they curate and represent data using similar standards, so that their data are accessible and can be used effectively.<br />
<br />
To this end, the iRefIndex project has three long term objectives:<br />
<br />
;1) to facilitate exchange of interaction data between interaction databases. <br />
<br />
:The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.<br />
<br />
;2) to consolidate interaction data from multiple sources. <br />
<br />
:The method has been used to index interaction records from multiple sources. The resulting iRefIndex may be used to search for the existence of interaction data for any protein regardless of the original resource. The interaction databases listed above have been incorporated so far; others will follow.<br />
<br />
;3) to provide feedback to source interaction databases. <br />
<br />
:During the process of data consolidation, iRefIndex keeps track of potential problems with source records, such as outdated or unresolvable protein identifiers or incorrectly assigned taxonomy identifiers. These data are provided as feedback files to source interaction databases for correction, clarification or improvements to our own system. This process will help to harmonize data representation, improve the overall quality of interaction records for all source databases, and help source databases to exchange data with one another.<br />
<br />
== iRefIndex availability ==<br />
<br />
iRefIndex is made available in a number of formats: MITAB tab-delimited text files, iRefWeb interface, iRefScape plugin for Cytoscape, PSICQUIC Web services, and an interface for the R programming language environment. See the links at the top of this page. For the license and disclaimer, [[iRefIndex#License_and_disclaimer | see below]].<br />
<br />
== Credits and collaborations ==<br />
<br />
'''Sabry Razick and Ian Donaldson''' developed iRefIndex at the Biotechnology Centre of Oslo, University of Oslo. <br />
<br />
'''Paul Boddie''' provides ongoing maintenance and development.<br />
<br />
'''George Magklaras''' provides systems engineering support and [http://www.no.embnet.org/ EMBNet Norway] provided hardware support.<br />
<br />
'''Antonio Mora''' developed [[iRefR|iRefR]].<br />
<br />
'''Katerina Michalickova''' developed [[DiG:_Disease_groups|Disease groups]].<br />
<br />
'''Brian Turner and Andrei Turinsky''' from the [http://wodaklab.org/ws/ Wodak group] at the Hospital for Sick Children in Toronto, Canada developed the [http://wodaklab.org/iRefWeb/ iRefWeb interface].<br />
<br />
<imagemap><br />
Image:IMEx_logo_webmedium.jpg|100x100px<br />
default [http://www.psimex.org]<br />
</imagemap> <br />
<br />
iRefIndex is a PSIMex partner: http://www.psimex.org<br />
<br />
<!--<br />
=== iRefWeb in the NCBI LinkOut programme ===<br />
<br />
Many [http://www.ncbi.nlm.nih.gov/gene Entrez Gene] records provided by NCBI contain links to iRefWeb in the [http://www.ncbi.nlm.nih.gov/projects/linkout/index.html LinkOut] section, allowing users to consult iRefWeb for related protein interactions when browsing gene information. The software which exposes iRefIndex information to the LinkOut programme can be found on the [[iRefWeb LinkOut Generator]] page.<br />
<br />
--><br />
----<br />
<br />
== License and disclaimer ==<br />
<br />
Data on the public FTP site are released under the [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution 2.5 Generic (CC BY 2.5) license].<br />
<br />
<imagemap><br />
Image:By-100x35.png<br />
default [http://creativecommons.org/licenses/by/2.5/]<br />
</imagemap><br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
----<br />
<br />
== Contact and mailing list ==<br />
<br />
Suggestions, requests and comments are welcome.<br />
<br />
<pre>ian.donaldson@biotek.uio.no</pre><br />
<br />
Full contact details are available at the [http://www.biotek.uio.no/english/research/groups/donaldson-group/ group home page].<br />
<br />
<imagemap><br />
Image:google-groups-logo.gif<br />
default [http://groups.google.com/group/irefindex]<br />
</imagemap><br />
<br />
See the [http://groups.google.com/group/irefindex iRefIndex Google Group] for announcements and discussion.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=README_MITAB2.6_for_iRefIndex_9.0&diff=4044
README MITAB2.6 for iRefIndex 9.0
2011-12-02T14:24:48Z
<p>PaulBoddie: Removed notice.</p>
<hr />
<div><div class="floatright" style="text-align: center"><br />
<br />
'''iRefIndex 9.0 Downloads'''<br />
<imagemap><br />
Image:Document-save-80x80.png<br />
default [ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/]<br />
</imagemap><br />
<br />
'''Parsing MITAB Format Data'''<br />
<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefIndex_MITAB2.6_Parser]]<br />
</imagemap><br />
</div><br />
<br />
Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Applies to iRefIndex release: 9.0<br />
<br />
Release date: 2011-11-07<br />
<br />
Download location: ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/<br>(use <tt>anonymous</tt> as the login and your email address as the password)<br />
<br />
Authors: Ian Donaldson, Sabry Razick, Paul Boddie<br />
<br />
Database: iRefIndex (http://irefindex.uio.no)<br />
<br />
Organization: Biotechnology Centre of Oslo, University of Oslo <br />
(http://www.biotek.uio.no/) <br />
<br />
[[#Description|License of the source database]].<br />
<br />
== <span style="color:#0f0086"> Description </span> ==<br />
<br />
This file describes the contents of the <br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br />
<br />
directory and the format of the tab-delimited text files contained within. Each index file follows the PSI-MITAB2.6 format with additional columns for annotating edges and nodes. Assignment of source interaction records to redundant interaction groups (identified by RIGIDs) is described at http://irefindex.uio.no. The PSI-MITAB2.6 format plus additional columns is described below.<br />
<br />
A supplementary file lists just the database:accession pairs for proteins and their mapping to irogid, icrogid and Entrez Gene identifiers. See<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/Mappingfiles/<br />
<br />
and README at<br />
<br />
http://irefindex.uio.no/wiki/Protein_identifier_mapping<br />
<br />
The mapping file is precalculated from the MITAB distribution as a convenience to users.<br />
<br />
Details on the build process are available from the publication PMID 18823568.<br />
<br />
This distribution includes data consolidated using the iRefIndex method for BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPPI, MPact and OPHID.<br />
<br />
<br />
{|<br />
|Sources || http://irefindex.uio.no/wiki/Sources_iRefIndex_8.0<br />
|-<br />
|Statistics || http://irefindex.uio.no/wiki/Statistics_iRefIndex_8.0<br />
|-<br />
|Download location || ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br><br />
|}<br />
<br />
== Directory contents ==<br />
<br />
{|<br />
|<tt>README</tt> ||pointer to this file<br />
|-<br />
|<tt>xxxx.mitab.mmddyyyy.txt.zip</tt> ||individual indices in PSI-MITAB2.6 format<br><br />
|}<br />
<br />
iRefIndex data is distributed as a set of tab-delimited text files with names of the form <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>mmddyyyy</tt> represents the file's creation date.<br />
<br />
The complete index is available as <tt>All.mitab.mmddyyyy.txt.zip</tt>.<br />
<br />
Taxon specific data sets are also available for:<br />
<br />
{|<br />
| ||'''Taxon Id'''<br />
|-<br />
|Homo sapiens ||9606 (human)<br />
|-<br />
|Mus musculus ||10090 (mouse)<br />
|-<br />
|Rattus norvegicus ||10116 (brown rat)<br />
|-<br />
|Caenorhabditis elegans ||6239 (nematode)<br />
|-<br />
|Drosophila melanogaster ||7227 (fruit fly)<br />
|-<br />
|Saccharomyces cerevisiae ||4932 (baker's yeast)<br />
|-<br />
|Saccharomyces cerevisiae S288c ||559292<br />
|-<br />
|Escherichia coli ||562 (E. coli)<br />
|-<br />
|Other ||other<br />
|-<br />
|All ||all<br />
|}<br />
<br />
Taxon specific subsets of the data are named <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>xxxx</tt> is the taxonomy identifier of at least one of the interactors according to either the source interaction database or the sequence database record. Each zip compressed file contains a single text file with the corresponding name <tt>xxxx.mitab.mmddyyyy.txt</tt>.<br />
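<br />
A minimal Python sketch of locating and reading these files follows. The helper names (<tt>parse_filename</tt>, <tt>read_rows</tt>) are hypothetical, and the sketch assumes, beyond what this README states, that the text is UTF-8 and that any line beginning with <tt>#</tt> (such as a column header) should be skipped. It uses only the standard <tt>zipfile</tt> and <tt>csv</tt> modules.<br />
<br />
```python
import csv
import io
import re
import zipfile

# Hypothetical pattern for names of the form xxxx.mitab.mmddyyyy.txt.zip
FILENAME_RE = re.compile(
    r"^(?P<subset>[^.]+)\.mitab\.(?P<mm>\d{2})(?P<dd>\d{2})(?P<yyyy>\d{4})\.txt\.zip$"
)


def parse_filename(name):
    """Return (subset, ISO date) for a name such as 9606.mitab.11072011.txt.zip."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    date = f"{match['yyyy']}-{match['mm']}-{match['dd']}"
    return match["subset"], date


def read_rows(source):
    """Yield each MITAB line as a list of column values.

    `source` may be a path or a binary file-like object holding the zip archive.
    Assumptions: the text is UTF-8 and lines starting with '#' are headers.
    """
    with zipfile.ZipFile(source) as archive:
        # Each archive is documented to contain a single text file.
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                if row and not row[0].startswith("#"):
                    yield row
```
<br />
Column ''n'' in this README corresponds to list index ''n'' - 1 in each yielded row.<br />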
<br />
In some cases, other interactors may belong to other taxa, for example if a virus-host interaction is being represented or if a protein from another organism has been used to model a protein in the specified organism.<br />
<br />
Taxonomy identifiers are provided in the data sets allowing these exceptions to be identified. The taxonomy identifiers listed are derived from the source protein sequence record. In some cases, this taxonomy identifier will be a child of the taxon listed in the file's title; for example, Escherichia coli K12 (taxonomy identifier 83333) will appear in the Escherichia coli (taxonomy identifier 562) file.<br />
<br />
A description of the NCBI taxon identifiers is available at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy <br />
<br />
This division of the data by taxon leads to duplication; for instance, an interaction present in the mouse index could also appear in the human index if the interaction record lists protein sequence records from both human and mouse. The <tt>All.mitab.mmddyyyy</tt> file is a complete and non-redundant listing.<br />
<br />
The data format and divisions provided in this initial release were chosen in the hopes that they would be immediately useful to the largest possible set of users. Other formats and divisions are possible and we welcome your input on future releases.<br />
<br />
== Changes from last version ==<br />
<br />
This is the third release of iRefIndex in PSI-MITAB2.6 format.<br />
<br />
* RIGIDs in previous releases of iRefIndex were [[Bugzilla:242|incorrectly computed]]. Although the properties of such RIGIDs were not compromised - distinct RIGIDs should still have referred to distinct interactions - each RIGID made use of substantially less information from its components. RIGIDs in this release should now be computed correctly.<br />
* [[Bugzilla:245|Duplicate lines]] are now no longer produced in the MITAB output. Previously, database records containing additional information not reproduced in the MITAB output were written to the files on a record-by-record basis. However, since these individual records provide no useful additional information purely through their presence, and since the result is merely a collection of redundant records, lines which are the same as others are now filtered out when writing the MITAB files.<br />
* Many proteins previously assigned the 4932 taxonomy identifier have been [[Bugzilla:247|recategorised]] as having taxonomy identifier 559292. Thus, for convenience, an additional 559292 file is produced alongside the existing (but substantially smaller) 4932 file to hold interactions involving proteins associated with both taxons.<br />
* [[Bugzilla:248|Interactions not involving proteins associated with a specific organism]] are now excluded from organism-specific files. Note that complexes may consist of a number of lines where interactors may have a different taxonomy identifier from that of the specific file being consulted, but in such cases there will always be a member of the complex labelled with the appropriate taxonomy identifier, and thus the complex describes a "mixed species" interaction which should be retained just as binary interactions are where one participant is native to the file and the other is "foreign".<br />
* Previously, PubMed identifiers were being given as interaction detection methods for CORUM-originating interactions. This has now been [[Bugzilla:249|resolved]].<br />
<br />
References:<br />
<br />
* http://code.google.com/p/psimi/issues/detail?id=2<br />
* http://code.google.com/p/psimi/wiki/PsimiTabFormat<br />
<br />
=== Mapping to Legacy RIGIDs ===<br />
<br />
A mapping from current to legacy RIGIDs is provided on the FTP site as <tt>legacy.txt</tt> at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/<br />
<br />
== Known Issues ==<br />
<br />
* We have replaced the pipe character (<tt>|</tt>) in PDB identifiers with an underscore character (<tt>_</tt>).<br />
<br />
This decision was taken to avoid unexpected parsing problems: the PSI-MITAB format uses pipes (<tt>|</tt>) as a separator character where multiple values occur in the same column.<br />
<br />
As a result, column number 37 (OriginalReferenceA) and column number 38 (OriginalReferenceB) may differ from the original reference in such cases.<br />
<br />
== Understanding the iRefIndex MITAB format ==<br />
<br />
iRefIndex is distributed in PSI-MITAB format. Version 2.5 of the format was originally described in PMID 17925023 ([http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2189715/?tool=pubmed full text]). This file describes the columns defined by version 2.6 of the PSI-MITAB format plus columns added by iRefIndex.<br />
<br />
Since the PSI-MITAB format allows only two interactors to be described on each line, it is best suited to binary interaction data (where the original experiment, say a yeast two-hybrid screen, gives a binary readout). However, some source records in PSI-MI XML format describe interactions involving only one interactor type (dimers or multimers), while others contain associative (also known as "n-ary") interaction data from, for example, immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown. These cases are problematic for the PSI-MITAB format. This document describes exactly how we use the MITAB format to describe these alternative (non-binary) interaction types.<br />
<br />
=== What each line represents ===<br />
<br />
Each line or row in the MITAB file represents a ''single'' interaction record from one primary data source describing an interaction involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers).<br />
<br />
{{Note|Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
|Important}}<br />
<br />
<br />
Each row in this table has a natural key pointing to an original interaction record in some source database that is listed under column 14 (interactionIdentifier). For example:<br />
<br />
<pre>intact:EBI-761694</pre><br />
<br />
{{Note|<br />
Prior to release 7.0, each line represented a ''group'' of interaction records involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers). This ''collapsed'' or non-redundant format did not allow us to easily describe meta-data associated with each source record. Therefore, we have moved to this ''expanded'' or redundant version. Users can still collapse multiple rows that all provide evidence for an interaction between the same set of proteins using the keys provided (for example, RIGIDs).<br />
}}<br />
<br />
Rows in this table that all provide evidence for an interaction between the same set of proteins can be identified using the RIGID key (redundant interaction group identifier). The RIGID is a 27 character key that is derived from the ROGIDs of the interactors involved in the interaction record. The ROGID is a SHA-1 digest of the protein interactor's primary amino acid sequence concatenated with the NCBI taxonomy identifier (see the paper for details).<br />
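<br />
The Python sketch below illustrates the digest scheme; it is not the reference implementation. Beyond what is stated here, it assumes that the sequence is upper-cased with whitespace removed, that the SHA-1 digest is base64 encoded with its trailing <tt>=</tt> padding stripped (which is consistent with the 27-character keys in this README, such as the first 27 characters of <tt>hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</tt>), and that a RIGID is computed over the distinct ROGIDs in sorted order. The function names are hypothetical; consult the paper before relying on these values to match released data.<br />
<br />
```python
import base64
import hashlib


def digest27(text):
    """Base64-encoded SHA-1 digest of `text`, with '=' padding removed.

    A 20-byte SHA-1 digest encodes to 28 base64 characters ending in a single
    '=', so stripping the padding leaves a 27-character key.
    """
    raw = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(raw).decode("ascii").rstrip("=")


def rogid(sequence, taxid):
    """ROGID sketch: sequence digest followed by the NCBI taxonomy identifier.

    Assumption: the sequence is normalised to upper case without whitespace.
    """
    normalised = "".join(sequence.split()).upper()
    return digest27(normalised) + str(taxid)


def rigid(rogids):
    """RIGID sketch: digest over the distinct interactor ROGIDs.

    Assumption: distinct ROGIDs are sorted and concatenated before hashing,
    so neither interactor order nor the number of copies changes the key.
    """
    return digest27("".join(sorted(set(rogids))))
```
<br />
Because duplicates are removed before hashing in this sketch, a dimer and a trimer of the same protein receive the same RIGID, which matches the behaviour this README describes for multimers.<br />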
<br />
{{Note|<br />
The RIGID key is now listed (by itself) in column 35 (Checksum_Interaction) as part of the new extended PSI-MITAB format. This is a universal key that can be generated by each and every interaction database and may be included in MITAB2.6 distributions from other source databases. The intention of this key is to aid third party integration of data collected from multiple databases (for example, from PSICQUIC web services). <br />
}}<br />
<br />
=== Representation of interactions ===<br />
<br />
==== Binary interaction data ====<br />
<br />
This is the most common data type.<br />
<br />
For binary interaction data, column 53 (edgetype) will contain an X. Interactors A and B will list the two proteins for which interaction evidence is provided in the row. Users should pay close attention to columns 12 (interactionType) and 7 (Method) when deciding what binary data they wish to accept as evidence of a direct physical interaction.<br />
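<br />
One way to apply this advice is sketched below in Python. The function names are hypothetical, the 1-based column numbers used in this README are converted to 0-based list indexes, and it is assumed (this README documents the form only for column 7) that column 12 uses the same <tt>MI:nnnn(label)</tt> form. Which term identifiers to accept is left to the user; none are recommended here.<br />
<br />
```python
# Zero-based indexes for the 1-based column numbers used in this README.
METHOD = 6              # column 7 (Method)
INTERACTION_TYPE = 11   # column 12 (interactionType)
EDGETYPE = 52           # column 53 (edgetype)


def term_id(value):
    """Return the MI term identifier from a value such as 'MI:0399(2h fragment pooling)'."""
    return value.split("(", 1)[0]


def binary_rows(rows, accepted_methods=None, accepted_types=None):
    """Yield binary (edgetype X) rows, optionally restricted to given MI term ids."""
    for row in rows:
        if row[EDGETYPE] != "X":
            continue
        if accepted_methods is not None and term_id(row[METHOD]) not in accepted_methods:
            continue
        if accepted_types is not None and term_id(row[INTERACTION_TYPE]) not in accepted_types:
            continue
        yield row
```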
<br />
==== Complexes (a.k.a. n-ary data) ====<br />
<br />
Certain experimental methods (like immunoprecipitations) provide evidence that a list of 3 or more proteins are associated but cannot provide evidence for a direct interaction between any given pair of proteins in that list. <br />
<br />
In these cases, interactor A (column 1) is used as a placeholder to represent the ''complex'' or ''list'' of proteins while interactor B is used to list one of the members of the list: therefore, the entire ''n-ary interaction record'' is described using one row for each interactor. Each of these rows will have the same ''interactor A''. This method of representation is referred to as a '''bi-partite model''' since there are two kinds of nodes corresponding to complexes and proteins. <br />
<br />
These interactions are marked by a C in column 53 (edgetype).<br />
<br />
As an example, suppose that a source interaction record contained interactors A, B and C found by affinity purification and mass spectrometry, where a tagged version of protein A was used as the bait protein to perform the immunoprecipitation.<br />
<br />
Then we would represent the complex in the MITAB file using three lines, where X stands for the complex identifier:<br />
<br />
<pre>X-A<br />
X-B<br />
X-C</pre><br />
<br />
All three entries would have the same string in column 1 (the RIGID for the complex). All three entries would have a C in column 53 (edgetype).<br />
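<br />
Because every row of a complex shares the same interactor A, the original member lists can be rebuilt by grouping. The Python sketch below is illustrative and its function name is hypothetical. It groups on column 1 together with column 14 (interactionIdentifier): since the complex identifier is derived from the RIGID, two source records describing the same set of proteins would share it, and grouping on column 1 alone would merge their members.<br />
<br />
```python
from collections import defaultdict

UID_A, UID_B = 0, 1    # columns 1 and 2
INTERACTION_ID = 13    # column 14 (interactionIdentifier)
EDGETYPE = 52          # column 53 (edgetype)


def complexes(rows):
    """Rebuild n-ary records from bi-partite (edgetype C) rows.

    Returns {(complex identifier, source record identifier): [member, ...]}.
    Members keep their multiplicity, so a homotrimer lists one protein 3 times.
    """
    members = defaultdict(list)
    for row in rows:
        if row[EDGETYPE] == "C":
            members[(row[UID_A], row[INTERACTION_ID])].append(row[UID_B])
    return dict(members)
```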
<br />
Other databases take an interaction record with multiple interactors (n-ary data) and make a list of binary interactions (based on the spoke or matrix model) and then list these binary interactions in the MITAB. For the example above, using a '''spoke model''' to transform the data into a set of binary interactions, these data would be represented using two lines in the MITAB file:<br />
<br />
<pre>A-B<br />
A-C</pre><br />
<br />
Here A is chosen as the "hub" of the spoke model since it was the "bait" protein. For experimental systems that do not have "baits" and "preys" (such as X-ray crystallography), an arbitrary protein might be chosen as the bait.<br />
<br />
Alternatively, a '''matrix model''' might be used to transform the n-ary data into a list of binary interactions. Here all pairwise combinations of interactors in the original n-ary data are represented as binary interactions. So, in the above example, the immunoprecipitated complex would be represented using three lines of the MITAB file:<br />
<br />
<pre>A-B<br />
B-C<br />
A-C</pre><br />
<br />
All three methods for representing n-ary data in a MITAB file (bi-partite, spoke, and matrix) are different representations of the same data. The model type that is chosen to describe n-ary data is listed in column 16 (expansion) of the MITAB2.6 format.<br />
<br />
We have chosen to use the bi-partite method of representation so that it is impossible to mistake spoke or matrix binary entries for true binary entries; the identifiers used for complexes will, of course, not appear in a protein database and any programme that tries to treat complex identifiers as though they were protein identifiers will fail. The method allows you to reconstruct the members of the original interaction record that describes a complex of proteins (say from an affinity purification experiment). From there, you can choose to make a spoke or matrix model by yourself if you want. <br />
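<br />
Once a member list has been reconstructed, the spoke and matrix expansions described above take only a few lines. This Python sketch uses hypothetical function names; the choice of hub for the spoke model is left to the caller, since, as noted above, it may be arbitrary when the experiment has no bait.<br />
<br />
```python
from itertools import combinations


def spoke(members, bait):
    """Spoke model: pair the chosen hub (bait) with every other member."""
    others = list(members)
    others.remove(bait)  # drop one occurrence of the hub; ValueError if absent
    return [(bait, member) for member in others]


def matrix(members):
    """Matrix model: every unordered pair of members."""
    return list(combinations(members, 2))
```
<br />
For the members A, B and C with A as the bait, <tt>spoke</tt> gives A-B and A-C, and <tt>matrix</tt> gives A-B, A-C and B-C, matching the examples above. A list of ''n'' members yields ''n'' - 1 spoke pairs and ''n''(''n'' - 1)/2 matrix pairs.<br />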
<br />
Users are advised that other databases may use spoke and matrix model representations of complexes in the MITAB format. <br />
<br />
==== Intramolecular interactions and multimers ====<br />
<br />
These row types are rare in comparison with the types above.<br />
<br />
Sometimes source interaction records in PSI-MI format only list one interactor. These are cases where either<br />
<br />
<ol><br />
<li>an intra-molecular interaction is being represented or</li><br />
<li>a multimer (3 or more) of some protein is being represented.</li><br />
</ol> <br />
These records are difficult to represent in the PSI-MITAB format because PSI-MITAB requires that each row (interaction) list two interactors. <br />
We are representing these interaction records using the following format to reflect the original format provided as closely as possible.<br />
<ol><br />
<li>Interactions involving only one interactor. The uidA and uidB would be the same and the edge type would be 'Y' (column number 53 (edgetype)). Therefore, whenever the edge type is 'Y', the interaction involves only one protein (although the interaction is given as between two interactors), and column number 54 (numParticipants) will always be 1. For example:<br />
<pre>{A - A, edge type 'Y', numParticipants=1}</pre></li><br />
<li>When the interaction is described as involving two interactors but both of them refer to the same protein. This would be represented as a normal binary interaction and would have the edge type = 'X' (column number 53 (edgetype)), and thus column number 54 (numParticipants) would always be 2. For example:<br />
<pre>{A - A, edge type 'X', numParticipants=2}</pre></li><br />
<li>When the interaction is described as involving more than 2 interactors and all those interactors are referring to the same protein, a bi-partite representation will be used. The edge type would be 'C' (column number 53 (edgetype)). For example, with regard to complexes (a.k.a. n-ary data):<br />
<pre><br />
{C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3}<br />
</pre></li><br />
</ol><br />
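<br />
The three cases above can be told apart row by row, as in the following Python sketch. The function name and labels are hypothetical. It compares the ROGIDs in columns 33 and 34 rather than the identifiers in columns 1 and 2, on the assumption that the sequence-based keys are the more reliable test for "the same protein".<br />
<br />
```python
CHECKSUM_A, CHECKSUM_B = 32, 33   # columns 33 and 34 (interactor ROGIDs)
EDGETYPE = 52                     # column 53 (edgetype)


def classify(row):
    """Label a row using the edgetype conventions described above."""
    edge = row[EDGETYPE]
    if edge == "Y":
        # Only one interactor in the source record (numParticipants is 1).
        return "single interactor"
    if edge == "X":
        # Two interactors; identical ROGIDs mean two copies of one protein.
        same = row[CHECKSUM_A] == row[CHECKSUM_B]
        return "self pair" if same else "binary"
    if edge == "C":
        # One member of a bi-partite complex record.
        return "complex member"
    raise ValueError(f"unexpected edgetype: {edge!r}")
```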
<br />
We draw extra attention to the fact that the RIGID (column number 35 (Checksum_Interaction)) for these interactions will be the SHA-1 digest of the ROGIDs for each of the distinct subunit types (see columns 33 (Checksum_A) and 34 (Checksum_B)). Thus interactions involving 1, 2 or more subunits of the same protein would all have the same RIGID.<br />
<br />
=== Keys for grouping together redundant interactors and interactions ===<br />
<br />
A number of keys are provided in this file to help users group together rows that all provide evidence for some kind of interaction between the same set (or a related set) of proteins. See columns 33-35 (Checksum_A, Checksum_B and Checksum_Interaction) and 43-51 (integer identifier and canonical data columns).<br />
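<br />
As an illustration of grouping with these keys, the Python sketch below collapses rows that share a RIGID (column 35) and collects the source records (column 14) supporting each one. The function name is hypothetical. A set is used because a complex contributes several rows from one source record. Substituting one of the canonical columns 43-51 would give a looser grouping; which of those columns suits a given analysis is not decided here.<br />
<br />
```python
from collections import defaultdict

CHECKSUM_INTERACTION = 34   # column 35 (Checksum_Interaction), holds the RIGID
INTERACTION_ID = 13         # column 14 (interactionIdentifier)


def evidence_by_rigid(rows):
    """Collapse rows into {RIGID: set of source interaction identifiers}."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[CHECKSUM_INTERACTION]].add(row[INTERACTION_ID])
    return dict(groups)
```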
<br />
The process of creating keys that group proteins and interactions into canonical groups was described after the original paper in the [[Canonicalization]] document. <br />
<br />
=== Provenance data ===<br />
<br />
Provenance data (where we retrieved source records from and how we mapped interactors and interactions to ROGIDs) is described in columns 37-42 (original and final references plus mapping scores).<br />
<br />
== License ==<br />
<br />
Data released on this public ftp site are released under the Creative <br />
Commons Attribution License http://creativecommons.org/licenses/by/2.5/. <br />
This means that you are free to use, modify and redistribute these data <br />
for personal or commercial use so long as you provide appropriate <br />
credit. See next section.<br />
<br />
<br />
Copyright © 2008-2011 Ian Donaldson<br />
<br />
== Citation ==<br />
<br />
Credit should include citing the iRefIndex paper (PMID 18823568) and any of the <br />
source databases upon which this resource is based. See <br />
http://irefindex.uio.no for appropriate citations.<br />
<br />
== Disclaimer ==<br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY <br />
WARRANTY; without even the implied warranty of MERCHANTABILITY or <br />
FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
== Description of PSI-MITAB2.6 file ==<br />
<br />
Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
=== Column number: 1 (uidA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor A. <br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains an identifier, taken from a major database, for a protein representing the interactor A. A UniProt or a RefSeq accession is provided (in that order of preference) wherever possible. See column 3 for a list of prefixes that may be employed in this column in addition to the following:<br />
<br />
;<tt>complex</tt><br />
:If interactor A is being used to represent a complex, then the rogid for the complex will be listed here, such as the following:<br />
<br />
<pre>complex:xBr9cTXgzPLNxsaKiYyHcoEm/DM</pre><br />
<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
In rare cases, a rogid may appear here if a protein interactor has a sequence but no known, valid ''<tt>database:accession</tt>'' pair.<br />
<br />
=== Column number: 2 (uidB)===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor B.<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 1.<br />
<br />
=== Column number: 3 (altA)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691|rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|irogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
All ''<tt>database:accession</tt>'' pairs listed in Column 3 point to protein records that describe the exact same sequence from the same taxon.<br />
<br />
Each pipe-delimited entry is a database_name:accession pair delimited by a colon. Database names are taken from the MI controlled vocabulary at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database references listed in this column may include the following:<br />
<br />
;<tt>uniprotkb</tt><br />
:The accessions this protein is known by in UniProt (http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line. <br />
;<tt>refseq</tt><br />
:If a protein accession exists in the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/), that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession.<br />
;<tt>entrezgene/locuslink</tt><br />
:NCBI Gene identifiers for the gene encoding this protein. See the GeneID column of ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq for the protein's accession.version.<br />
;<tt><em>other</em></tt><br />
:If none of the three identifier types are available then other <tt><em>databasename</em>:<em>accession</em></tt> pairs will be listed. These database names may not follow the MI controlled vocabulary.<br />
<br />
Example:<br />
<br />
<pre>emb:CAA44868.1|gb:AAA23715.1|gb:AAB02995.1|emb:CAA56736.1|uniprot:P24991</pre><br />
<br />
;<tt>rogid</tt><br />
:Column 33 repeated here for convenience.<br />
<br />
;<tt>irogid</tt><br />
:Column 43 repeated here for convenience.<br />
<br />
{{Note|<br />
The rogid of a complex or an n-ary interaction is the rigid of that <br />
interaction. However, the irogid of the complex is not the irigid.<br />
The irogid for the complex is an integer that does not overlap <br />
with any protein irogids.<br />
}}<br />
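<br />
The pipe-delimited columns can be split into (database, accession) pairs as in the Python sketch below. The function name is hypothetical. Splitting each entry on its first colon only preserves database names such as <tt>entrezgene/locuslink</tt>, and splitting on <tt>|</tt> is safe because, as noted under Known Issues, pipes inside PDB identifiers are replaced with underscores. Treating <tt>-</tt> and <tt>NA</tt> as empty is an assumption (<tt>NA</tt> is documented for the alias columns).<br />
<br />
```python
def parse_pairs(value):
    """Split a pipe-delimited column into (database, accession) tuples.

    Each entry is split on its first colon only; '', '-' and 'NA' yield [].
    """
    if value in ("", "-", "NA"):
        return []
    pairs = []
    for entry in value.split("|"):
        database, sep, accession = entry.partition(":")
        if not sep:
            raise ValueError(f"entry without a database prefix: {entry!r}")
        pairs.append((database, accession))
    return pairs
```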
<br />
=== Column number: 4 (altB)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722|refseq:NP_417308|entrezgene/locuslink:947299</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 3. (Columns 34 and 44 are related to this column.)<br />
<br />
=== Column number: 5 (aliasA) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL|crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|icrogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each pipe-delimited entry is a <tt><em>database name</em>:<em>alias</em></tt> pair delimited by a <br />
colon. Database names are taken from the PSI-MI controlled vocabulary <br />
at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database names and sources listed in this column may include the following:<br />
<br />
;<tt>uniprotkb:<em>entry name</em></tt><br />
:the entry name given by UniProt. See the description for "Entry name" in the section of http://au.expasy.org/sprot/userman.html#ID_line concerning the "ID (IDentification)" line of the flat file<br />
;<tt>entrezgene/locuslink:<em>symbol</em></tt><br />
:the NCBI gene symbol for the gene encoding this protein. See the section in ftp://ftp.ncbi.nlm.nih.gov/gene/README for <tt>gene_info</tt>, specifically details for the <tt>Symbol</tt> column<br />
;<tt>crogid</tt><br />
:Column 46 repeated here for convenience.<br />
;<tt>icrogid</tt><br />
:Column 49 repeated here for convenience.<br />
;<tt>other db:accession pairs</tt><br />
:Other db:accession pairs that all belong to the same canonical group may be added (after icrogid). These are purely meant to facilitate look-up by PSICQUIC and other services: these sequences are related to (but not identical with) the interactor A sequence.<br />
;<tt>NA</tt><br />
:<tt>NA</tt> may be listed here if aliases are <em>not available</em><br />
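<br />
The Python sketch below shows one way to split an alias (or alternative identifier) field into <tt>(database, value)</tt> pairs. It assumes that entries are separated by <tt>|</tt>, that only the first colon in each entry separates the database name from the value, and that no value contains a literal <tt>|</tt>. The function name <tt>parse_pairs</tt> is hypothetical and not part of any iRefIndex tool.<br />
<br />
```python
def parse_pairs(field):
    """Split a pipe-delimited MITAB field into (database, value) tuples.

    Returns an empty list for the placeholder values "NA" and "-".
    Only the first colon separates the database name from the value,
    so values that themselves contain colons are kept intact.
    """
    if field in ("NA", "-", ""):
        return []
    pairs = []
    for entry in field.split("|"):
        database, sep, value = entry.partition(":")
        if sep:
            pairs.append((database, value))
        else:
            # Entry without a colon: keep it, with an empty database name.
            pairs.append(("", entry))
    return pairs


aliases = parse_pairs(
    "uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL"
    "|crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|icrogid:12345"
)
print(dict(aliases)["crogid"])  # hhZYhMtr5JC1lGIKtR1wxHAd3JY83333
```
<br />
The same function can be applied to columns 3 and 4 (alternative identifiers), which use the same <tt>database:value</tt> layout.<br />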
<br />
=== Column number: 6 (aliasB) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTH_ECOLI|entrezgene/locuslink:mutH</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 5. (Columns 47 and 50 are related to this column.)<br />
<br />
=== Column number: 7 (Method) ===<br />
<br />
{|<br />
|Column type: ||String <br />
|-<br />
|Description: ||Interaction detection method<br />
|-<br />
|Example: ||<pre>MI:0399(2h fragment pooling)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only a single method will appear in this column. Previously, multiple methods appeared.<br />
}}<br />
<br />
Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers.<br />
<br />
The interaction detection method is taken from the original record. Its path in PSI-MI 2.5 is:<br />
<br />
<pre>entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel</pre><br />
<br />
<br />
{{Note|<br />
If a controlled vocabulary term identifier was not provided by the source database, an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, <tt>MI:0000</tt> will appear before the short label.<br />
}}<br />
<br />
<tt>NA</tt> or <tt>-1</tt> may appear in place of a recognised shortLabel.<br />
<br />
For example:<br />
<br />
<pre><br />
MI:0000(-1)<br />
MI:0000(NA)<br />
</pre><br />
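<br />
Values in this column (and in other columns that use PSI-MI terms, such as 12 and 13) can be split into a term identifier and a short label. The Python sketch below assumes the value always has the form <tt>MI:</tt> plus four digits followed by a label in parentheses, or is a bare <tt>NA</tt> or <tt>-</tt>; the name <tt>parse_mi_term</tt> is hypothetical. The third element of the result is <tt>False</tt> when the identifier is the placeholder <tt>MI:0000</tt>.<br />
<br />
```python
import re

# "MI:" followed by four digits, then a short label in parentheses.
# The greedy (.*) keeps any parentheses inside the label itself.
MI_TERM = re.compile(r"^(MI:\d{4})\((.*)\)$")


def parse_mi_term(value):
    """Return (term_id, short_label, known_id) or None for 'NA' / '-'.

    known_id is False when the identifier is the placeholder MI:0000,
    i.e. no controlled vocabulary term could be found for the label.
    """
    if value in ("NA", "-"):
        return None
    match = MI_TERM.match(value)
    if match is None:
        raise ValueError(f"not a PSI-MI term: {value!r}")
    term_id, label = match.groups()
    return term_id, label, term_id != "MI:0000"


print(parse_mi_term("MI:0399(2h fragment pooling)"))  # ('MI:0399', '2h fragment pooling', True)
print(parse_mi_term("MI:0000(NA)"))                   # ('MI:0000', 'NA', False)
```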
<br />
=== Column number: 8 (author) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||First author surname(s) of the publication(s)<br />
|-<br />
|Example: ||<pre>hall-1999-1|hall-1999-2|mansour-2001-1|mansour-2001-2|hall-1999</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited list of the surnames of the authors of the publications in which the interaction has been shown.<br />
<br />
{{Note|<br />
This column will usually include only one author reference. However, some experimental evidence has secondary references, which may be included here.<br />
This field also includes references that are not author names, as in the following examples:<br />
* OPHID Predicted Protein Interaction<br />
* HPRD Text Mining Confirmation<br />
* MINT Text Mining Confirmation<br />
}}<br />
<br />
=== Column number: 9 (pmids) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||PubMed Identifiers<br />
|-<br />
|Example: ||<pre>pubmed:9880500|pubmed:11585365</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is a non-redundant list of PubMed identifiers pointing to literature that supports the interaction. <br />
According to MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>pubmed:12345</tt>.<br />
The source database name is always <tt>pubmed</tt>.<br />
<br />
{{Note|<br />
This column will usually include only one PubMed reference that describes where the experimental evidence is found. In some cases, secondary references are provided by the source database and will be included here.<br />
}}<br />
<br />
<br />
The special value <tt>-</tt> may appear in place of the identifiers.<br />
<br />
=== Column number: 10 (taxa) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor A<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be corrected from what was provided by the source database. See the methods section of the iRefIndex paper for more details. See also the NCBI taxonomy database at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>taxid:12345</tt>. The source database name is always listed as <tt>taxid</tt> since the identifier always refers to NCBI's taxonomy database. The value in this column will be <tt>NA</tt> if the interactor is a complex.<br />
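<br />
A minimal Python sketch for reading a single taxonomy entry follows. It assumes the column holds one <tt>taxid:</tt> entry (as in the examples on this page) rather than a pipe-delimited set; the name <tt>parse_taxid</tt> is hypothetical. Because it accepts negative identifiers, the same pattern also handles host organism values such as <tt>taxid:-1(in vitro)</tt> in column 29.<br />
<br />
```python
import re

# "taxid:" followed by a (possibly negative) integer and an optional name in parentheses.
TAXID = re.compile(r"^taxid:(-?\d+)(?:\((.*)\))?$")


def parse_taxid(value):
    """Return (taxon_id, name) for values like 'taxid:83333(Escherichia coli K-12)'.

    Returns None for 'NA' (complex interactors) and '-' (no information).
    The name is None when no parenthesised name is present.
    """
    if value in ("NA", "-"):
        return None
    match = TAXID.match(value)
    if match is None:
        raise ValueError(f"unexpected taxonomy value: {value!r}")
    taxon_id, name = match.groups()
    return int(taxon_id), name


print(parse_taxid("taxid:83333(Escherichia coli K-12)"))  # (83333, 'Escherichia coli K-12')
```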
<br />
=== Column number: 11 (taxb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor B<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 10.<br />
<br />
=== Column number: 12 (interactionType) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Interaction Type from controlled vocabulary or short label<br />
|-<br />
|Example: ||<pre>MI:0218(physical interaction)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only one interaction type will be present in each line of the file.<br />
}}<br />
<br />
The interaction type is taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(interaction type)</pre><br />
<br />
...when a controlled vocabulary term is available in the interaction record. Otherwise, the short label found at the following PSI-MI 2.5 path is used:<br />
<br />
<pre>entrySet/entry/interactionList/interaction/interactionType/names/shortLabel</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for interaction types.<br />
<br />
{{Note|<br />
If the MI controlled vocabulary identifier was not provided by the source database, but a text description was provided, then an attempt was made to map the text to the correct controlled vocabulary term identifier.<br />
If this was not possible then <tt>MI:0000</tt> is listed.<br />
|Change}}<br />
<br />
<tt>NA</tt> may be listed here if the interaction type is not available (meaning that we could not find the interaction type in the record provided by the source database).<br />
<br />
=== Column number: 13 (sourcedb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source database for this interaction record <br />
|-<br />
|Example: ||<pre>MI:0469(intact)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(source name)</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for database sources.<br />
<br />
{{Note|<br />
Only one source database will be listed in each row.<br />
|Change}}<br />
<br />
=== Column number: 14 (interactionIdentifier) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source interaction database and accession<br />
|-<br />
|Example: ||<pre>intact:EBI-761694|rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA|irigid:1234|edgetype:X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt><em>database name</em>:<em>identifier</em></tt> pair. <br />
<br />
{{Note|<br />
The source database is listed first. Additional information is pipe-delimited and presented here for the convenience of PSICQUIC web-service users (these services presently truncate this file at column 15 as they only support MITAB2.5). See columns 35, 45 and 53. <br />
|Change}}<br />
<br />
The source database names that appear in this column are taken from the<br />
PSI-MI controlled vocabulary at the following location (where possible):<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
If an interaction record identifier is not provided by the source database, this entry will appear as <tt><em>database-name</em>:-</tt> with the identifier region replaced with a dash (<tt>-</tt>).<br />
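<br />
For PSICQUIC users who only receive the first 15 columns, the extra entries in this column can be recovered as sketched below in Python. The sketch assumes the first pipe-delimited entry is always the source record and that every later entry is a <tt>key:value</tt> pair such as <tt>rigid:</tt>, <tt>irigid:</tt> or <tt>edgetype:</tt>; the function name is hypothetical.<br />
<br />
```python
def parse_interaction_identifier(field):
    """Split column 14 into the source record reference and iRefIndex extras.

    Returns ((source_db, source_id), extras). source_id is None when the
    source database gave no identifier (shown as a dash).
    """
    entries = field.split("|")
    source_db, _, source_id = entries[0].partition(":")
    extras = {}
    for entry in entries[1:]:
        key, _, value = entry.partition(":")
        extras[key] = value
    return (source_db, None if source_id == "-" else source_id), extras


source, extras = parse_interaction_identifier(
    "intact:EBI-761694|rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA|irigid:1234|edgetype:X"
)
print(source)            # ('intact', 'EBI-761694')
print(extras["irigid"])  # 1234
```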
<br />
=== Column number: 15 (confidence) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Confidence scores<br />
|-<br />
|Example: ||<pre>lpr:1|hpr:12|np:1|PSICQUIC entries are truncated here. See irefindex.uio.no</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each score is presented as a <tt>''scoreName'':''score''</tt> pair. Three confidence <br />
scores are provided: <tt>lpr</tt>, <tt>hpr</tt> and <tt>np</tt>.<br />
<br />
PubMed Identifiers (PMIDs) point to literature references that support <br />
an interaction. A PMID may be used to support more than one interaction. <br />
<br />
The lpr score (lowest PMID re-use) is the lowest number of distinct <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A value of one indicates <br />
that at least one of the PMIDs supporting this interaction has never <br />
been used to support any other interaction. This likely indicates that <br />
only one interaction was described by that reference and that the <br />
present interaction is not derived from high throughput methods.<br />
<br />
The hpr score (highest PMID re-use) is the highest number of <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A high value (e.g. greater <br />
than 50) indicates that one PMID describes at least 50 other <br />
interactions and it is more likely that high-throughput methods were <br />
used.<br />
<br />
The np score (number of PMIDs) is the total number of unique PMIDs used to <br />
support the interaction described in this row.<br />
<br />
<tt>-</tt> may appear in the score field, indicating the absence of a score value.<br />
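<br />
The three definitions above can be illustrated with a short Python sketch. It assumes you already have, for each PMID, the set of distinct RIGIDs that the PMID supports; the <tt>confidence_scores</tt> function and the <tt>pmid_to_rigids</tt> mapping are hypothetical, and this is not the code used to build iRefIndex releases.<br />
<br />
```python
def confidence_scores(pmids, pmid_to_rigids):
    """Compute lpr, hpr and np for one interaction.

    pmids          -- PMIDs supporting the interaction in this row
    pmid_to_rigids -- mapping from each PMID to the set of distinct RIGIDs it supports
    Returns None when there are no supporting PMIDs (no score available).
    """
    unique_pmids = set(pmids)
    if not unique_pmids:
        return None
    # Number of distinct interactions each supporting PMID is used to support.
    reuse = [len(pmid_to_rigids[pmid]) for pmid in unique_pmids]
    return {"lpr": min(reuse), "hpr": max(reuse), "np": len(unique_pmids)}


pmid_to_rigids = {
    "9880500": {"rigidA"},                        # used for this interaction only
    "11585365": {"rigidA", "rigidB", "rigidC"},   # re-used for two other interactions
}
print(confidence_scores(["9880500", "11585365"], pmid_to_rigids))
# {'lpr': 1, 'hpr': 3, 'np': 2}
```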
<br />
----<br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (16 - 36) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT<br />
|Note}}<br />
<br />
=== Column number: 16 (expansion) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Model used to convert n-ary data into binary data for the purpose of export in the MITAB file<br />
|-<br />
|Example: ||<pre>bipartite</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this column will always contain either <tt>bipartite</tt> or <tt>none</tt>.<br />
<br />
Other databases may use either <tt>spoke</tt> or <tt>matrix</tt> or <tt>none</tt> in this column.<br />
<br />
See <br />
[[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
=== Column number: 17 (biological_role_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor A<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When provided by the source database, this includes single entries such as <tt>MI:0501(enzyme)</tt>, <tt>MI:0502(enzyme target)</tt>, <tt>MI:0580(electron acceptor)</tt>, or <tt>MI:0499(unspecified role)</tt>.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role.<br />
<br />
For complexes, and when no role is specified, this column will indicate an unspecified role (<tt>MI:0499(unspecified role)</tt>).<br />
<br />
=== Column number: 18 (biological_role_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor B<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 17.<br />
<br />
=== Column number: 19 (experimental_role_A) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any was provided by the source database) that was played by interactor A.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI for definitions of bait and prey, and to browse other possible values of experimental role that may appear in this column for other databases.<br />
<br />
For complexes, and when no role is specified, this column will contain the following:<br />
<br />
<pre>MI:0499(unspecified role)</pre><br />
<br />
=== Column number: 20 (experimental_role_B) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any) that was played by interactor B.<br />
<br />
See notes above for column 19.<br />
<br />
=== Column number: 21 (interactor_type_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Type of molecule for interactor A<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this will always be one of...<br />
<br />
<pre><br />
MI:0326(protein)<br />
MI:0315(protein complex)<br />
</pre><br />
<br />
=== Column number: 22 (interactor_type_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Type of molecule for interactor B<br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 21.<br />
<br />
=== Column number: 23 (xrefs_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for molecule A. For example, Gene Ontology identifiers or OMIM identifiers.<br />
<pre>omim:152430(longevity)|go:"GO:0016233"(telomere capping)</pre><br />
<br />
=== Column number: 24 (xrefs_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 23.<br />
<br />
=== Column number: 25 (xrefs_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for the interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for the interaction. For example, Gene Ontology identifiers or OMIM identifiers.<br />
<pre>go:"GO:0048786"(presynaptic active zone)</pre><br />
<br />
=== Column number: 26 (Annotations_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for molecule A. For example:<br />
<pre>This protein has an apparent MW of 25 kDa|This protein binds 7 zinc molecules</pre><br />
<br />
Some databases may use <tt>dataset:<em>*</em></tt> or <tt>data-processing:<em>*</em></tt> (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 27 (Annotations_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 26.<br />
<br />
=== Column number: 28 (Annotations_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for the interaction. For example:<br />
<pre>figure-legend:F1A|prediction score:432|comment:prediction based on phage display consensus|author-confidence:8|comment:AD-ORFeome library used in the experiment.</pre><br />
The prefixes used before the <tt>:</tt> (like "comment") are database specific and not controlled.<br />
<br />
Some databases may use ''<tt>dataset:*</tt>'' or ''<tt>data-processing:*</tt>'' (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 29 (Host_organism_taxid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||The taxonomy identifier of the host organism where the interaction was experimentally demonstrated<br />
|-<br />
|Example: || <pre>taxid:10090(Mus musculus)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This may differ from the taxonomy identifier associated with the interactors. Other possible entries are: <br />
<br />
* <tt>taxid:-1(in vitro)</tt><br />
* <tt>taxid:-4(in vivo)</tt><br />
<br />
A dash (<tt>-</tt>) will be used when no information about the host organism is available.<br />
<br />
<tt>taxid:32644(unidentified)</tt> will be used when the source specifies the host organism taxonomy identifier as 32644.<br />
<br />
=== Column number: 30 (parameters_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Parameters for the interaction<br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
Internal note: use of this column is not well defined or characterized.<br />
<br />
=== Column number: 31 (Creation_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||Date when the entry was created<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 32 (Update_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||Date when the entry was last updated<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 33 (Checksum_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor A. <br />
|-<br />
|Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
This column contains a universal key for interactor A.<br />
|Note}}<br />
<br />
This column may be used to identify other interactors in this file that have the exact same amino acid sequence and taxon id. <br />
<br />
The universal key listed here is the ROGID (redundant object group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
Column 3 lists database names and accessions that all have this same key. <br />
<br />
The ROGID for a protein consists of the base-64 version of the SHA-1 key for the protein sequence concatenated with the taxonomy identifier for the protein. For complex nodes, the ROGID is calculated as the SHA-1 digest of the ROGIDs of all the protein participants (after first ordering them by ASCII-based lexicographical sorting in ascending order and concatenating them). See the iRefIndex paper for details. The SHA-1 key is always 27 characters long, so the ROGID for a protein will be composed of 27 characters concatenated with a taxonomy identifier.<br />
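<br />
A minimal Python sketch of the protein ROGID calculation follows. It assumes that the sequence is upper-cased before hashing and that the trailing <tt>=</tt> padding of the 28-character base-64 encoding is dropped to give the 27 characters mentioned above (the SEGUID convention); neither detail is stated on this page, and the function names are hypothetical. Complex ROGIDs are not covered here.<br />
<br />
```python
import base64
import hashlib


def seguid(sequence):
    """Base-64 SHA-1 digest of the upper-cased sequence, without '=' padding.

    A SHA-1 digest is 20 bytes, which base-64 encodes to 28 characters
    ending in one '='; stripping it leaves 27 characters.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def protein_rogid(sequence, taxon_id):
    """ROGID for a protein: 27-character digest followed by the taxonomy identifier."""
    return seguid(sequence) + str(taxon_id)


rogid = protein_rogid("MKTAYIAKQR", 9606)  # made-up sequence, for illustration only
print(len(rogid) - len("9606"))  # 27
```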
<br />
=== Column number: 34 (Checksum_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor B. <br />
|-<br />
|Example: ||<pre>rogid:AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
See notes for column 33.<br />
<br />
=== Column number: 35 (Checksum_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for this interaction<br />
|-<br />
|Example: ||<pre>rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other rows (interaction records) in this file that describe interactions between the same set of proteins from the same taxon id.<br />
<br />
The universal key listed here is the RIGID (redundant interaction group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
The RIGID consists of the ROG identifiers for each of the protein participants (see notes above) ordered by ASCII-based lexicographic sorting in ascending order, concatenated and then digested with the SHA-1 algorithm. See the iRefIndex paper for details. This identifier points to a set of redundant protein-protein interactions that involve the same set of proteins with the exact same primary sequences.<br />
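<br />
The RIGID recipe above can be sketched in Python as follows. The base-64 encoding with the trailing <tt>=</tt> removed is an assumption inferred from the 27-character example for this column, and the inputs are assumed to be bare ROGIDs without the <tt>rogid:</tt> prefix shown in columns 33 and 34; the function name is hypothetical.<br />
<br />
```python
import base64
import hashlib


def rigid(rogids):
    """Sort participant ROGIDs, concatenate them, SHA-1 digest, base-64 without padding."""
    # Python compares str values by code point, which is ASCII order for these identifiers.
    joined = "".join(sorted(rogids))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


a = "hhZYhMtr5JC1lGIKtR1wxHAd3JY83333"
b = "AhmYiMtz8lR12Gixt91txbAd3JY83333"
print(rigid([a, b]) == rigid([b, a]))  # True: participant order does not matter
print(len(rigid([a, b])))              # 27
```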
<br />
=== Column number: 36 (Negative) ===<br />
<br />
{|<br />
|Column type: || Boolean (true or false)<br />
|-<br />
|Description: ||Does the interaction record provide evidence that some interaction does NOT occur?<br />
|-<br />
|Example: ||<pre>false</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This value will be false for all lines in this file since iRefIndex does not include "negative" interactions from any of the source databases.<br />
<br />
<hr><br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (37 -) ARE NOT DEFINED BY THE PSI-MITAB2.6 STANDARD.<br />
THESE COLUMNS ARE SPECIFIC TO THIS IREFINDEX RELEASE AND MAY CHANGE FROM ONE RELEASE TO ANOTHER<br />
|Important}}<br />
<br />
=== Column number: 37 (OriginalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the protein reference that was found in the original interaction record to describe interactor A. It is a colon-delimited pair of database name and accession. It may be either the primary or secondary reference for the protein provided by the source database.<br />
<br />
For complexes this will be the ROGID of the complex.<br />
<br />
=== Column number: 38 (OriginalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 37.<br />
<br />
=== Column number: 39 (FinalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Column 37 (OriginalReferenceA) was used by the iRefIndex consolidation process to arrive at this FinalReferenceA. <br />
This database name and accession pair will usually be the same as that listed in column 37, unless the provided reference was malformed, had to be updated or was ambiguous.<br />
<br />
Examples:<br />
<br />
# The original reference is malformed. For example: <tt>RefSeq:NP 036076</tt> instead of <tt>RefSeq:NP_036076</tt>.<br />
# The original reference is incomplete. For example: <tt>PDB:1KQ1|</tt> (missing chain information). <br />
# The original reference is deprecated. For example: <tt>UniProt:Q9H233</tt> (the value of FinalReferenceA will be the latest available accession in this case).<br />
# The original reference is ambiguous. For example: a gene identifier is provided (the value of FinalReferenceA will be a protein product selected in a systematic way in this case).<br />
<br />
=== Column number: 40 (FinalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 39.<br />
<br />
=== Column number: 41 (MappingScoreA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by iRefIndex procedure during mapping from original protein reference (column 37) to the final protein reference (column 39). <br />
|-<br />
|Example: ||<pre>PTUO+</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains a description of mapping operations as a condensed string of letters. See the original iRefIndex paper, PMID 18823568. <br />
For complexes, this column will contain <tt>-</tt>.<br />
<br />
=== Column number: 42 (MappingScoreB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by iRefIndex procedure during mapping from original protein reference (column 38) to the final protein reference (column 40). <br />
|-<br />
|Example: ||<pre>SU</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 41.<br />
<br />
=== Column number: 43 (irogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor A. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal, integer-equivalent of the alphanumeric identifier in column 33 for interactor A. All interactors with the same sequence and taxon origin will have the same irogid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 44 (irogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor B.<br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 43.<br />
<br />
=== Column number: 45 (irigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for this interaction.<br />
|-<br />
|Example: ||<pre>1234</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal, integer-equivalent of the alphanumeric identifier in column 35 for this interaction. All interactions involving the same interactors (same sequence and same taxon) will have the same irigid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 46 (crogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other interactors in this file that all belong to the same canonical group.<br />
<br />
Members of a canonical group may include splice isoform products from the same or related genes. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column.<br />
<br />
See http://irefindex.uio.no/wiki/Canonicalization for a description of canonicalization.<br />
<br />
=== Column number: 47 (crogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 46.<br />
<br />
=== Column number: 48 (crigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the RIGID for this interaction calculated using the canonical ROGIDs (preceding two columns).<br />
<br />
This column may be used to identify other interactions in this file that all belong to the same canonical group.<br />
<br />
<br />
=== Column number: 49 (icrogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal, integer-equivalent of the alphanumeric canonical ROGID in column 46 for interactor A. Interactors with the same icrogid may have different sequences but are related; e.g. different splice isoforms of the same gene.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 50 (icrogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 49.<br />
<br />
=== Column number: 51 (icrigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal, integer-equivalent of the canonical RIGID. See column 48.<br />
<br />
This integer may be used to query the iRefWeb interface for the interaction record. For example:<br />
<br />
http://wodaklab.org/iRefWeb/interaction/show/13653<br />
<br />
...where 13653 is the integer, canonical RIGID.<br />
<br />
This identifier serves to group together evidence for interactions that involve the same set (or a related set) of proteins.<br />
<br />
Starting with release 6.0, this canonical RIGID is stable from one release of iRefIndex to another.<br />
<br />
=== Column number: 52 (imex_id) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||IMEx identifier if available<br />
|-<br />
|Example: ||<pre>imex:IM-12202-3</pre><br />
|-<br />
|Example: ||<pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When no information is available, a dash (<tt>-</tt>) will be used.<br />
<br />
=== Column number: 53 (edgetype) ===<br />
<br />
{|<br />
|Column type: ||Character<br />
|-<br />
|Description: ||Whether the edge represents a binary interaction (X), complex membership (C) or a multimer (Y)<br />
|-<br />
|Example: ||<pre>X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Edges can be labelled as either <tt>X</tt>, <tt>C</tt> or <tt>Y</tt>:<br />
<br />
;<tt>X</tt><br />
:a binary interaction with two protein participants<br />
<br />
;<tt>C</tt><br />
:denotes that this edge is a binary expansion of an interaction record that had three or more interactors (so-called "complex" or "n-ary" data). The expansion type is described in column 16 (expansion). In iRefIndex, the expansion is always "bipartite", meaning that Interactor A of this row represents the complex itself and Interactor B represents a protein that is a member of this complex.<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for further explanation.<br />
<br />
;<tt>Y</tt><br />
:for dimers and polymers. For dimers and polymers where the number of subunits is not described in the original interaction record, the edge is labelled with a <tt>Y</tt>. Interactor A will be identical to Interactor B, so the graphical representation will appear as a single node connected to itself (a loop). The actual number of self-interacting subunits may be 2 (dimer) or more (for example, 5 for a pentamer). Refer to the original interaction record for more details and see column 54.<br />
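The three edge types can be separated with a short Python sketch like the one below. It assumes that rows are lists of tab-separated fields and takes interactor A and B from MITAB columns 1 and 2. It also assumes that all rows expanded from one complex record share both the column 14 value (interactionIdentifier) and interactor A. The function name is hypothetical.<br />
<br />
```python
from collections import defaultdict

UIDA_COL, UIDB_COL = 0, 1      # interactor A and B (MITAB columns 1 and 2)
SOURCE_ID_COL = 14 - 1         # interactionIdentifier (source record key)
EDGETYPE_COL = 53 - 1          # edgetype: X, C or Y


def split_by_edgetype(rows):
    """Separate binary edges, bipartite complex expansions and self-loops."""
    binary = []
    complexes = defaultdict(list)  # (source record, complex) -> member proteins
    multimers = []
    for row in rows:
        kind = row[EDGETYPE_COL]
        if kind == "X":
            binary.append((row[UIDA_COL], row[UIDB_COL]))
        elif kind == "C":
            # Interactor A stands for the complex, interactor B for one member.
            complexes[(row[SOURCE_ID_COL], row[UIDA_COL])].append(row[UIDB_COL])
        elif kind == "Y":
            # Interactor A and B are identical: a single node with a loop.
            multimers.append(row[UIDA_COL])
        else:
            raise ValueError("unexpected edgetype: %r" % kind)
    return binary, dict(complexes), multimers
```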
<br />
=== Column number: 54 (numParticipants) ===<br />
<br />
{|<br />
|Column type: ||Integer<br />
|-<br />
|Description: ||Number of participants in the interaction<br />
|-<br />
|Example: ||<pre>2</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
* For edges labelled <tt>X</tt> (see column 53) this value will be two. <br />
* For edges labelled <tt>C</tt>, this value will be equivalent to the number of protein interactors in the original n-ary interaction record.<br />
* For interactions labelled <tt>Y</tt>, this value will either be the number of self-interacting subunits (if present in the original interaction record) or 1 where the exact number of subunits is unknown or unspecified.<br />
<br />
{{Note|<br />
The number of participants can be greater than the number of distinct proteins involved in an interaction because a single protein can participate more than once in an interaction. Such participation is enumerated and counted to produce the value in this column.<br />
|Important}}<br />
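The rules above can be turned into a simple consistency check. The helpers below are hypothetical and written only for illustration; they are not part of the iRefIndex distribution. The <tt>C</tt> threshold comes from column 53, where complex edges are described as expansions of records with three or more interactors. As the note above explains, a complex can have fewer distinct member proteins than the value in column 54.<br />
<br />
```python
EDGETYPE_COL = 53 - 1           # X, C or Y
NUM_PARTICIPANTS_COL = 54 - 1   # numParticipants


def participants_consistent(row):
    """Check column 54 against the rules stated for each edge type."""
    kind = row[EDGETYPE_COL]
    count = int(row[NUM_PARTICIPANTS_COL])
    if kind == "X":
        return count == 2   # binary interaction: always two participants
    if kind == "C":
        return count >= 3   # expanded from a record with three or more interactors
    if kind == "Y":
        return count >= 1   # subunit count, or 1 when it is unknown
    return False


def members_fit_count(member_ids, num_participants):
    """A protein may participate more than once, so distinct members <= count."""
    return len(set(member_ids)) <= num_participants
```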
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex&diff=4039
iRefIndex
2011-12-02T11:45:54Z
<p>PaulBoddie: Used a template for the notice.</p>
<hr />
<div>__NOTOC__<br />
<br />
{{Note|<br />
The FTP server used by iRefIndex is currently down and will not be restored<br />
until Monday, December 5th or later. This will affect downloads of MITAB files,<br />
Cytoscape plugin data and iRefR package data. Apologies for the inconvenience.<br />
This message will be removed once the server returns.<br />
|Important}}<br />
<br />
<div style="float:right"><br />
<facebook-like /><br />
</div><br />
<br />
<div class="floatleft"><br />
<imagemap><br />
Image:iRefIndex_logo.png|120x120px<br />
default [[iRefIndex#A_reference_index_for_protein_interaction_data]]<br />
</imagemap><br />
</div><br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID. <br />
<br />
The iRefScape plugin for Cytoscape has recently been [http://www.biomedcentral.com/1471-2105/12/388 published].<br />
<br />
[[iRefIndex#A_reference_index_for_protein_interaction_data|Read more]]<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:Document-save-80x80.png<br />
default [[README_MITAB2.6_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[README_MITAB2.6_for_iRefIndex|Download]] ===<br />
<br />
Download the current iRefIndex release in PSI-MITAB tab-delimited format via FTP.<br />
|<imagemap><br />
Image:Accessories-text-editor-80x80.png<br />
default [[iRefIndex_Release_Notes]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Release Notes]] ===<br />
<br />
View release notes and news for each release of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px<br />
default [[iRefIndex_Citations]]<br />
</imagemap><br />
| style="vertical-align: top" colspan="3" |<br />
=== [[iRefIndex_Citations | Publications, citing, citations and further reading]] ===<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:CytoscapeLogo.png|80x80px<br />
default [[iRefScape]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefScape|iRefScape]] ===<br />
<br />
iRefScape is a plugin for [http://www.cytoscape.org/ Cytoscape] that exposes iRefIndex data as a navigable graphical network.<br />
|<imagemap><br />
Image:firefox.png|80x80px<br />
default [http://wodaklab.org/iRefWeb/]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [http://wodaklab.org/iRefWeb/ iRefWeb] ===<br />
<br />
iRefWeb provides a searchable web interface to the iRefIndex. This interface was developed as part of a collaboration with the Wodak group at the Hospital for Sick Children in Toronto, Canada.<br />
|-<br />
|<imagemap><br />
Image:R-logo.jpg|80x80px<br />
default [[iRefR]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefR]] ===<br />
<br />
An R package providing access to iRefIndex data.<br />
|<imagemap><br />
Image:Applications-internet-80x80.png<br />
default [[README_PSICQUIC_web_services_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
<br />
=== [[PSICQUIC|Web services]] ===<br />
<br />
The iRefIndex PSICQUIC and PSISCORE web services are now running on release 9.0 of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:Youtube-256.png|80x80px<br />
default [[iRefIndex_Videos]]<br />
</imagemap> <br />
| style="vertical-align: top" |<br />
<br />
=== [[iRefIndex Videos|iRefIndex videos]] ===<br />
<br />
Video learning materials for iRefIndex, iRefScape and iRefWeb.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [[iRefIndex#Contact and mailing list]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex#Contact and mailing list|Contact information and mailing list]] ===<br />
<br />
How to get in touch with the developers.<br />
|-<br />
|<imagemap><br />
Image:Emblem-notice-80x80.png<br />
default [[Sources_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Sources_iRefIndex|Source data information]] ===<br />
<br />
Details of all the different source databases that provide the foundation for iRefIndex.<br />
|<imagemap><br />
Image:X-office-spreadsheet-80x80.png<br />
default [[Statistics_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Statistics_iRefIndex|Statistical information]] ===<br />
<br />
Statistics for the current iRefIndex release.<br />
|}<br />
<br />
== Technical information on the iRefIndex database ==<br />
<br />
Build process: [[iRefIndex Manual]]<br />
<br />
Feedback files: [[README iRefIndex Feedback]]<br />
<br />
Mapping files: [[Protein identifier mapping]]<br />
<br />
Normalization of MI cv terms: [[Mapping of terms to MI term ids - iRefIndex]]<br />
<br />
Canonicalization: [[Canonicalization]]<br />
<br />
Disease Groups: [[DiG: Disease groups]]<br />
<br />
All iRefIndex pages and archived releases: [[iRefIndex#All_iRefIndex_Pages|see below]]<br />
<br />
License and disclaimer: [[iRefIndex#License_and_disclaimer|see below]]<br />
<br />
----<br />
<br />
== A reference index for protein interaction data ==<br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including [http://bond.unleashedinformatics.com/ BIND], [http://www.thebiogrid.org/ BioGRID], [http://mips.gsf.de/genre/proj/corum/index.html CORUM], [http://dip.doe-mbi.ucla.edu/ DIP], [http://www.hprd.org/ HPRD], [http://www.ebi.ac.uk/intact/site/index.jsf IntAct], [http://mint.bio.uniroma2.it/mint/Welcome.do MINT], [http://mips.gsf.de/genre/proj/mpact MPact], [http://mips.gsf.de/proj/ppi/ MPPI] and [http://ophid.utoronto.ca/ OPHID]. The index covers several interaction types, including physical and genetic interactions (the latter mapped to their corresponding protein products), determined by a wide range of methods. Users can search for a protein and retrieve a non-redundant list of its interactors.<br />
<br />
iRefIndex assigns a globally unique identifier (RIGID), which looks like 'tjWXXjgPyHyT2J6EwED8zK2x18U', to identify interactions that are identical (according to the sequences and taxonomy identifiers of the interactors). iRefIndex also assigns similar-looking keys (ROGIDs) to protein interactors. These keys are global, meaning that anyone can generate them using the method described in the paper. This method lets users integrate their own data with iRefIndex so that proteins with identical sequences are represented only once.<br />
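The key construction can be sketched in Python. This sketch assumes the following encoding, based on our reading of the paper. A ROGID is a base64-encoded SHA-1 digest of the upper-case sequence, with the trailing padding removed (27 characters), followed by the taxonomy identifier. A RIGID is the same digest applied to the sorted, concatenated ROGIDs. Sequence normalisation and the handling of repeated interactors are also assumptions here. Compare the output against published identifiers before relying on it.<br />
<br />
```python
import base64
import hashlib


def seguid(text):
    """Base64-encoded SHA-1 digest with the '=' padding removed (27 characters)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rogid(sequence, taxid):
    """ROGID sketch: digest of the upper-case sequence plus the taxonomy id."""
    return seguid(sequence.upper()) + str(taxid)


def rigid(rogids):
    """RIGID sketch: digest of the ROGIDs, sorted so interactor order is irrelevant."""
    return seguid("".join(sorted(rogids)))
```
<br />
Sorting the ROGIDs before hashing gives the same RIGID regardless of which protein is listed as interactor A and which as interactor B.<br />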
<br />
== Publications and further reading ==<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex are provided on the [[iRefIndex Citations]] page.<br />
<br />
----<br />
<br />
== Long term goals of the iRefIndex project ==<br />
<br />
We believe that protein interaction data hold incredible potential for biomedical research. Presently, these data are collected and archived by multiple groups around the world and the number of groups taking part in this work is growing rather than diminishing. <br />
<br />
It is therefore important that these databases can exchange and compare data effectively, and that they curate and represent data using similar standards, so that their data remain accessible and can be used effectively.<br />
<br />
To this end, the iRefIndex project has three long term objectives:<br />
<br />
;1) to facilitate exchange of interaction data between interaction databases. <br />
<br />
:The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.<br />
<br />
;2) to consolidate interaction data from multiple sources. <br />
<br />
:The method has been used to index interaction records from multiple sources. The resulting iRefIndex may be used to search for the existence of interaction data for any protein regardless of the original resource. Nine interaction databases have been incorporated so far; others will follow.<br />
<br />
;3) to provide feedback to source interaction databases. <br />
<br />
:During the process of data consolidation, iRefIndex uses a sophisticated method to keep track of potential problems with source records such as outdated or unfound protein identifiers or incorrectly assigned taxonomy identifiers. These data are provided as feedback files to source interaction databases for correction, clarification or improvements to our own system. This process will help to harmonize data representation and improve the overall quality of interaction records for all source databases. This process will also help source databases to exchange data with one another.<br />
<br />
== iRefIndex availability ==<br />
<br />
iRefIndex is made available in a number of formats: MITAB tab-delimited text files, iRefWeb interface, iRefScape plugin for Cytoscape, PSICQUIC Web services, and an interface for the R programming language environment. See the links at the top of this page. For the license and disclaimer, [[iRefIndex#License_and_disclaimer | see below]].<br />
<br />
== Credits and collaborations ==<br />
<br />
'''Sabry Razick and Ian Donaldson''' developed iRefIndex at the Biotechnology Centre of Oslo, University of Oslo. <br />
<br />
'''Paul Boddie''' provides ongoing maintenance and development.<br />
<br />
'''George Magklaras''' provides systems engineering support, and [http://www.no.embnet.org/ EMBNet Norway] provided hardware support.<br />
<br />
'''Antonio Mora''' developed [[iRefR|iRefR]].<br />
<br />
'''Katerina Michalickova''' developed [[DiG:_Disease_groups|Disease groups]].<br />
<br />
'''Brian Turner and Andrei Turinsky''' from the [http://wodaklab.org/ws/ Wodak group] at the Hospital for Sick Children in Toronto, Canada developed the [http://wodaklab.org/iRefWeb/ iRefWeb interface].<br />
<br />
<imagemap><br />
Image:IMEx_logo_webmedium.jpg|100x100px<br />
default [http://www.psimex.org]<br />
</imagemap> <br />
<br />
iRefIndex is a PSIMex partner: http://www.psimex.org<br />
<br />
<!--<br />
=== iRefWeb in the NCBI LinkOut programme ===<br />
<br />
Many [http://www.ncbi.nlm.nih.gov/gene Entrez Gene] records provided by NCBI contain links to iRefWeb in the [http://www.ncbi.nlm.nih.gov/projects/linkout/index.html LinkOut] section, allowing users to consult iRefWeb for related protein interactions when browsing gene information. The software which exposes iRefIndex information to the LinkOut programme can be found on the [[iRefWeb LinkOut Generator]] page.<br />
<br />
--><br />
----<br />
<br />
== License and disclaimer==<br />
<br />
Data released on the public FTP site is released under the [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution 2.5 Generic (CC BY 2.5) license].<br />
<br />
<imagemap><br />
Image:By-100x35.png<br />
default [http://creativecommons.org/licenses/by/2.5/]<br />
</imagemap><br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
----<br />
<br />
== Contact and mailing list ==<br />
<br />
Suggestions, requests and comments are welcome.<br />
<br />
<pre>ian.donaldson@biotek.uio.no</pre><br />
<br />
Full contact details are available at the [http://www.biotek.uio.no/english/research/groups/donaldson-group/ group home page].<br />
<br />
<imagemap><br />
Image:google-groups-logo.gif<br />
default [http://groups.google.com/group/irefindex]<br />
</imagemap><br />
<br />
See the [http://groups.google.com/group/irefindex iRefIndex Google Group] for announcements and discussion.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow [[:Category:iRefIndex|this link]] for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=File:Lyle_HTS_Part3.pdf&diff=4038
File:Lyle HTS Part3.pdf
2011-12-02T11:14:05Z
<p>PaulBoddie: </p>
<hr />
<div></div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=File:Lyle_HTS_Part2.pdf&diff=4037
File:Lyle HTS Part2.pdf
2011-12-02T11:13:33Z
<p>PaulBoddie: </p>
<hr />
<div></div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=File:Lyle_HTS_Part1.pdf&diff=4036
File:Lyle HTS Part1.pdf
2011-12-02T11:12:55Z
<p>PaulBoddie: </p>
<hr />
<div></div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex_MITAB2.6_Parser&diff=4031
iRefIndex MITAB2.6 Parser
2011-11-29T16:52:33Z
<p>PaulBoddie: Fixed the snapshot label.</p>
<hr />
<div>A tool has been developed to parse the MITAB files produced in the [[iRefIndex Build Process]]. Currently, the tool is capable of parsing the MITAB format described on the page [[README MITAB2.6 for iRefIndex]].<br />
<br />
== Obtaining the MITAB Parser ==<br />
<br />
The parser and associated resources are available for download here:<br />
<br />
* Snapshot 2011-11-29: [http://irefindex.uio.no/hg/mitab/archive/8b936902f616.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/mitab/archive/8b936902f616.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/mitab/archive/8b936902f616.zip zip archive]<br />
* [http://irefindex.uio.no/hg/mitab/ mitab repository home]<br />
<br />
== Prerequisites ==<br />
<br />
The following programs are required to use the parser:<br />
<br />
* [http://www.python.org/ Python] (tested with 2.5.4)<br />
* [http://www.postgresql.org/ PostgreSQL] (tested with 8.1.17, 9.0.4)<br />
<br />
== Running the Parser ==<br />
<br />
Given a directory for the iRefIndex output files such as...<br />
<br />
<pre>/home/irefindex/output</pre><br />
<br />
...run the parser as follows:<br />
<br />
<pre>python parse_mitab.py /home/irefindex/output/All.mitab.03042009.txt</pre><br />
<br />
It will be necessary to change the date details included in the above filename to match the actual name of the appropriate file found in your own output directory.<br />
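If the output directory contains several releases, a small Python helper can pick the most recent file using the <tt>mmddyyyy</tt> date in its name. The function name and the exact filename pattern are assumptions based on the example above.<br />
<br />
```python
import re
from datetime import datetime
from pathlib import Path

# Matches names such as All.mitab.03042009.txt, with the date as mmddyyyy.
ALL_MITAB = re.compile(r"^All\.mitab\.(\d{8})\.txt$")


def latest_mitab(directory):
    """Return the All.mitab.mmddyyyy.txt file in directory with the latest date."""
    candidates = []
    for path in Path(directory).iterdir():
        match = ALL_MITAB.match(path.name)
        if match:
            created = datetime.strptime(match.group(1), "%m%d%Y")
            candidates.append((created, path))
    if not candidates:
        raise FileNotFoundError("no All.mitab.mmddyyyy.txt file in %s" % directory)
    return max(candidates)[1]
```
<br />
For example, <tt>latest_mitab("/home/irefindex/output")</tt> returns the path to pass to <tt>parse_mitab.py</tt>.<br />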
<br />
== Creating the Database ==<br />
<br />
A database can be created using the usual PostgreSQL tools:<br />
<br />
<pre>createdb -E unicode mitab_irefindex</pre><br />
<br />
This database is initialised as follows:<br />
<br />
<pre>psql -f init_mitab.sql mitab_irefindex</pre><br />
<br />
Should the database tables need to be dropped (perhaps in case of problems with the import), the following command can be used:<br />
<br />
<pre>psql -f drop_mitab.sql mitab_irefindex</pre><br />
<br />
== Populating the Database ==<br />
<br />
The database is populated as follows:<br />
<br />
<pre>python database_action.py mitab_irefindex import_mitab.sql</pre><br />
<br />
As a result, a number of tables representing the structure of the data should be available in the database. Applications built on this data may need indexes to be created to make querying more efficient.<br />
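As an illustration, the following Python sketch generates PostgreSQL <tt>CREATE INDEX</tt> statements for the <tt>rigid</tt> column of the tables listed in the next section. Which columns to index depends on the queries your application runs, so the table list and index names here are only a hypothetical starting point.<br />
<br />
```python
# Tables that the schema overview lists with a rigid column.
RIGID_TABLES = [
    "mitab_interactions",
    "mitab_sources",
    "mitab_interaction_type_names",
    "mitab_interaction_identifiers",
    "mitab_confidence",
    "mitab_method_names",
    "mitab_authors",
    "mitab_pubmed",
]


def index_statements(tables, column="rigid"):
    """Return one CREATE INDEX statement per table (hypothetical naming scheme)."""
    return [
        "CREATE INDEX %s_%s_idx ON %s (%s);" % (table, column, table, column)
        for table in tables
    ]


if __name__ == "__main__":
    print("\n".join(index_statements(RIGID_TABLES)))
```
<br />
The printed statements could be saved to a file (for example, a hypothetical <tt>indexes.sql</tt>) and applied with <tt>psql -f indexes.sql mitab_irefindex</tt>.<br />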
<br />
== Notes on the Populated Database ==<br />
<br />
The schema used by the populated database attempts to model the data as effectively as possible using a number of tables:<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! Entity type<br />
! Tables<br />
! Table purpose<br />
! Notable columns<br />
! Source columns (if different or converted)<br />
|-<br />
| rowspan="6" | Interaction<br />
| mitab_interactions<br />
| Model each interaction referencing interactors<br />
| rigid, edgetype, numParticipants, crigid<br />
|<br />
|-<br />
| mitab_sources<br />
| Represent sources for each interaction<br />
| rigid, ''sourcedb'', ''name''<br />
| sourcedb<br />
|-<br />
| mitab_interaction_type_names<br />
| Represent interaction types for each interaction<br />
| rigid, ''code'', ''name''<br />
| interactionType<br />
|-<br />
| mitab_interaction_identifiers<br />
| Represent interaction identifiers for each interaction<br />
| rigid, ''dbname'', ''uid''<br />
| interactionIdentifiers<br />
|-<br />
| mitab_confidence<br />
| Represent confidence scores for each interaction<br />
| rigid, ''type'', ''confidence''<br />
| confidence<br />
|-<br />
| mitab_interaction_rigs<br />
| Represent alternative integer identifiers for each interaction<br />
| ''uid'', ''rig''<br />
| rigid, irigid<br />
|-<br />
| Canonical interaction<br />
| mitab_canonical_interaction_rigs<br />
| Represent alternative integer identifiers for each canonical interaction<br />
| ''uid'', ''rig''<br />
| crigid, icrigid<br />
|-<br />
| rowspan="3" | Experiment<br />
| mitab_method_names<br />
| Represent detection methods for each interaction<br />
| rigid, ''code'', ''name''<br />
| method<br />
|-<br />
| mitab_authors<br />
| Represent publication authors for each interaction<br />
| rigid, ''author''<br />
| author<br />
|-<br />
| mitab_pubmed<br />
| Represent publication identifiers for each interaction<br />
| rigid, ''pmid''<br />
| pmids<br />
|-<br />
| rowspan="4" | Interactor<br />
| mitab_interactions<br />
| Model each interaction referencing interactors<br />
| uidA, uidB, taxA, taxB, atype, btype, crogidA, crogidB<br />
|<br />
|-<br />
| mitab_aliases<br />
| Represent aliases for each interactor<br />
| ''uid'', ''dbname'', ''alias''<br />
| uidA or uidB, aliasA or aliasB<br />
|-<br />
| mitab_alternatives<br />
| Represent alternative identifiers for each interactor<br />
| ''uid'', ''dbname'', ''alt''<br />
| uidA or uidB, altA or altB<br />
|-<br />
| mitab_interactor_rogs<br />
| Represent alternative integer identifiers for each interactor<br />
| ''uid'', ''rog''<br />
| uidA or uidB, irogA or irogB<br />
|-<br />
| Canonical interactor<br />
| mitab_canonical_interactor_rogs<br />
| Represent alternative integer identifiers for each canonical interactor<br />
| ''uid'', ''rog''<br />
| crogidA or crogidB, icrogA or icrogB<br />
|}<br />
<br />
Some changes in representation occur when creating the database:<br />
<br />
* Prefixed values are generally split to expose the prefix and identifier, name or value following it.<br />
** The various interaction and interactor prefixes (such as <tt>irefindex:</tt>, <tt>rigid:</tt>, <tt>rogid:</tt>, <tt>crigid:</tt> and <tt>crogid:</tt>) are omitted from interaction and interactor columns. '''Note''' that for non-iRefIndex data, any prefixes other than these will be retained, although this approach may be revised in future.<br />
** Source identifiers are split with the prefix (such as <tt>intact:</tt>) used to make a dbname column with the actual identifier stored in its own column (such as alias or alt).<br />
* The "empty value" (<tt>-</tt>) should never appear as an identifier, and where such a value is used in a list, that element should be excluded. This is pertinent in the case of vocabulary terms where <tt>MI:0000</tt> might be used together with an empty list of identifiers or names as an "empty collection" indicator.<br />
* Duplicate values in lists are generally discarded.<br />
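These rules can be sketched in Python as follows. The helpers are hypothetical and only approximate the parser's behaviour. In particular, splitting at the first colon is an assumption; it suits values such as <tt>intact:EBI-761694</tt>.<br />
<br />
```python
EMPTY = "-"
IREFINDEX_PREFIXES = ("irefindex", "rigid", "rogid", "crigid", "crogid")


def split_prefixed(value):
    """Split 'intact:EBI-761694' into ('intact', 'EBI-761694') at the first colon."""
    dbname, sep, ident = value.partition(":")
    return (dbname, ident) if sep else (None, value)


def parse_list(column):
    """Split a pipe-separated column, dropping '-' and duplicates (order kept)."""
    seen = set()
    results = []
    for item in column.split("|"):
        if item == EMPTY or item in seen:
            continue
        seen.add(item)
        results.append(split_prefixed(item))
    return results


def interactor_id(value):
    """Drop the iRefIndex-specific prefixes but keep any other prefix."""
    dbname, ident = split_prefixed(value)
    return ident if dbname in IREFINDEX_PREFIXES else value
```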
<br />
Further work may include the introduction of a separate interactor table, collecting related information for each interactor. Support for interactor identifiers other than ROG identifiers may be improved, with a new column potentially being introduced to indicate the type of each identifier.<br />
<br />
=== Canonical interactors and interactions ===<br />
<br />
The <tt>mitab_interactions</tt> table incorporates the canonical interaction and interactors alongside the specific interaction and interactors. A separate <tt>mitab_canonical_interactor_rogs</tt> table maps canonical interactors to integer identifiers, just as <tt>mitab_interactor_rogs</tt> does for specific interactors.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow [[:Category:iRefIndex|this link]] for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=README_MITAB2.6_for_iRefIndex_9.0&diff=4030
README MITAB2.6 for iRefIndex 9.0
2011-11-29T16:51:24Z
<p>PaulBoddie: Added parser icon/link.</p>
<hr />
<div><div class="floatright" style="text-align: center"><br />
'''iRefIndex 9.0 Downloads'''<br />
<imagemap><br />
Image:Document-save-80x80.png<br />
default [ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/]<br />
</imagemap><br />
<br />
'''Parsing MITAB Format Data'''<br />
<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefIndex_MITAB2.6_Parser]]<br />
</imagemap><br />
</div><br />
<br />
Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Applies to iRefIndex release: 9.0<br />
<br />
Release date: 2011-11-07<br />
<br />
Download location: ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/psimi_tab/MITAB2.6/<br>(use <tt>anonymous</tt> as the login and your email address as the password)<br />
<br />
Authors: Ian Donaldson, Sabry Razick, Paul Boddie<br />
<br />
Database: iRefIndex (http://irefindex.uio.no)<br />
<br />
Organization: Biotechnology Centre of Oslo, University of Oslo <br />
(http://www.biotek.uio.no/) <br />
<br />
[[#Description|License of the source database]].<br />
<br />
== <span style="color:#0f0086"> Description </span> ==<br />
<br />
This file describes the contents of the <br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br />
<br />
directory and the format of the tab-delimited text files contained within. Each index file follows the PSI-MITAB2.6 format with additional columns for annotating edges and nodes. Assignment of source interaction records to redundant groups is described at http://irefindex.uio.no. The PSI-MITAB2.6 format and the additional columns are described below.<br />
<br />
A supplementary file lists just database:accession pairs for proteins and their mapping to irog, icrog and Entrez Gene identifiers. See<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/current/Mappingfiles/<br />
<br />
and README at<br />
<br />
http://irefindex.uio.no/wiki/Protein_identifier_mapping<br />
<br />
This file is precalculated from the MITAB distribution as a convenience to users.<br />
<br />
Details on the build process are available from the publication PMID 18823568.<br />
<br />
This distribution includes data consolidated using the iRefIndex method for BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPPI, MPact and OPHID.<br />
<br />
<br />
{|<br />
|Sources || http://irefindex.uio.no/wiki/Sources_iRefIndex_8.0<br />
|-<br />
|Statistics || http://irefindex.uio.no/wiki/Statistics_iRefIndex_8.0<br />
|-<br />
|Download location || ftp://ftp.no.embnet.org/irefindex/data/current/psimi_tab/<br><br />
|}<br />
<br />
== Directory contents ==<br />
<br />
{|<br />
|<tt>README</tt> ||pointer to this file<br />
|-<br />
|<tt>xxxx.mitab.mmddyyyy.txt.zip</tt> ||individual indices in PSI-MITAB2.6 format<br><br />
|}<br />
<br />
iRefIndex data is distributed as a set of tab-delimited text files with names of the form <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>mmddyyyy</tt> represents the file's creation date.<br />
<br />
The complete index is available as <tt>All.mitab.mmddyyyy.txt.zip</tt>.<br />
<br />
Taxon specific data sets are also available for:<br />
<br />
{|<br />
| ||'''Taxon Id'''<br />
|-<br />
|Homo sapiens ||9606 (human)<br />
|-<br />
|Mus musculus ||10090 (mouse)<br />
|-<br />
|Rattus norvegicus ||10116 (brown rat)<br />
|-<br />
|Caenorhabditis elegans ||6239 (nematode)<br />
|-<br />
|Drosophila melanogaster ||7227 (fruit fly)<br />
|-<br />
|Saccharomyces cerevisiae ||4932 (baker's yeast)<br />
|-<br />
|Saccharomyces cerevisiae S288c ||559292<br />
|-<br />
|Escherichia coli ||562 (E. coli)<br />
|-<br />
|Other ||other<br />
|-<br />
|All ||all<br />
|}<br />
<br />
Taxon specific subsets of the data are named <tt>xxxx.mitab.mmddyyyy.txt.zip</tt> where <tt>xxxx</tt> is the taxonomy identifier of at least one of the interactors according to either the source interaction database or the sequence database record. Each zip compressed file contains a single text file with the corresponding name <tt>xxxx.mitab.mmddyyyy.txt</tt>.<br />
<br />
In some cases, interactors may belong to other taxa, for example when a virus-host interaction is represented or when a protein from another organism has been used to model a protein in the specified organism.<br />
<br />
Taxonomy identifiers are provided in the data sets allowing these exceptions to be identified. The taxonomy identifiers listed are derived from the source protein sequence record. In some cases, this taxonomy identifier will be a child of the taxon listed in the file's title; for example, Escherichia coli K12 (taxonomy identifier 83333) will appear in the Escherichia coli (taxonomy identifier 562) file.<br />
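Below is a minimal Python sketch that reads one of these archives and lists the taxonomy identifiers on each line. It makes three assumptions: the standard PSI-MITAB layout, in which columns 10 and 11 hold values such as <tt>taxid:9606(Homo sapiens)</tt>; UTF-8 text; and a header line starting with <tt>#</tt>. The function names are hypothetical.<br />
<br />
```python
import io
import re
import zipfile

TAXID = re.compile(r"taxid:(-?\d+)")
TAXA_COL, TAXB_COL = 10 - 1, 11 - 1   # standard MITAB taxid columns


def read_zipped_mitab(path):
    """Yield rows from the single text file inside an xxxx.mitab.mmddyyyy.txt.zip."""
    with zipfile.ZipFile(path) as archive:
        (name,) = archive.namelist()   # each archive holds exactly one file
        with archive.open(name) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if not line.strip() or line.startswith("#"):
                    continue           # skip blank lines and the assumed header
                yield line.rstrip("\r\n").split("\t")


def taxids(row):
    """Return the taxonomy ids mentioned in the taxA and taxB columns."""
    found = set()
    for col in (TAXA_COL, TAXB_COL):
        found.update(int(t) for t in TAXID.findall(row[col]))
    return found
```
<br />
Deciding whether a child taxon such as 83333 belongs under 562 requires the NCBI taxonomy itself, which this sketch does not consult.<br />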
<br />
A description of the NCBI taxon identifiers is available at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy <br />
<br />
This division by taxon leads to duplication; for instance, an interaction present in the mouse index could also appear in the human index if the interaction record lists protein sequence records from both human and mouse. The <tt>All.mitab.mmddyyyy</tt> file is a complete and non-redundant listing.<br />
<br />
The data format and divisions provided in this initial release were chosen in the hopes that they would be immediately useful to the largest possible set of users. Other formats and divisions are possible and we welcome your input on future releases.<br />
<br />
== Changes from last version ==<br />
<br />
This is the third release of iRefIndex in PSI-MITAB2.6 format.<br />
<br />
* RIGIDs in previous releases of iRefIndex were [[Bugzilla:242|incorrectly computed]]. Although the properties of such RIGIDs were not compromised - distinct RIGIDs should still have referred to distinct interactions - each RIGID made use of substantially less information from its components. RIGIDs in this release should now be computed correctly.<br />
* [[Bugzilla:245|Duplicate lines]] are no longer produced in the MITAB output. Previously, database records containing additional information not reproduced in the MITAB output were written to the files on a record-by-record basis. However, these individual records provide no useful additional information merely by being present, and the result was a collection of redundant records. Lines identical to others are therefore now filtered out when writing the MITAB files.<br />
* Many proteins previously assigned the 4932 taxonomy identifier have been [[Bugzilla:247|recategorised]] as having taxonomy identifier 559292. Thus, for convenience, an additional 559292 file is produced alongside the existing (but substantially smaller) 4932 file to hold interactions involving proteins associated with both taxons.<br />
* [[Bugzilla:248|Interactions not involving proteins associated with a specific organism]] are now excluded from organism-specific files. Note that complexes may consist of a number of lines where interactors may have a different taxonomy identifier from that of the specific file being consulted, but in such cases there will always be a member of the complex labelled with the appropriate taxonomy identifier, and thus the complex describes a "mixed species" interaction which should be retained just as binary interactions are where one participant is native to the file and the other is "foreign".<br />
* Previously, PubMed identifiers were being given as interaction detection methods for CORUM-originating interactions. This has now been [[Bugzilla:249|resolved]].<br />
<br />
References:<br />
<br />
* http://code.google.com/p/psimi/issues/detail?id=2<br />
* http://code.google.com/p/psimi/wiki/PsimiTabFormat<br />
<br />
=== Mapping to Legacy RIGIDs ===<br />
<br />
A mapping from current to legacy RIGIDs is provided on the FTP site as <tt>legacy.txt</tt> at the following location:<br />
<br />
ftp://ftp.no.embnet.org/irefindex/data/archive/release_9.0/<br />
<br />
== Known Issues ==<br />
<br />
* We have replaced the pipe character (<tt>|</tt>) of the PDB identifiers with an underscore character (<tt>_</tt>) <br />
<br />
This decision was taken to avoid unexpected parsing problems: the PSI-MITAB format uses pipes (<tt>|</tt>) as a separator character where multiple values occur in the same column.<br />
<br />
As a result, column number 37 (OriginalReferenceA) and column number 38 (OriginalReferenceB) may differ from the original reference in such cases.<br />
<br />
== Understanding the iRefIndex MITAB format ==<br />
<br />
iRefIndex is distributed in PSI-MITAB format. Version 2.5 of the format was originally described in PMID 17925023 ([http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2189715/?tool=pubmed full text]). This file describes the columns defined by version 2.6 of the PSI-MITAB format plus columns added by iRefIndex.<br />
<br />
Since the PSI-MITAB format allows only two interactors to be described on each line, it is best suited to binary interaction data (where the original experiment, such as a yeast two-hybrid assay, gives a binary readout). However, other PSI-MI XML source records describe interactions involving only one interactor type (dimers or multimers). Others contain associative (also known as "n-ary") interaction data, for example from immunoprecipitation experiments, where the exact interactions between any pair of interactors are unknown. These cases are problematic for the PSI-MITAB format. This document describes exactly how we use the MITAB format to describe these alternative (non-binary) interaction types.<br />
<br />
=== What each line represents ===<br />
<br />
Each line or row in the MITAB file represents a ''single'' interaction record from one primary data source describing an interaction involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers).<br />
<br />
{{Note|Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
|Important}}<br />
<br />
<br />
Each row in this table has a natural key, listed in column 14 (interactionIdentifier), that points to the original interaction record in the source database. For example:<br />
<br />
<pre>intact:EBI-761694</pre><br />
<br />
{{Note|<br />
Prior to release 7.0, each line represented a ''group'' of interaction records involving the exact same set of proteins (as defined by their primary sequence and taxonomy identifiers). This ''collapsed'' or non-redundant format did not allow us to easily describe meta-data associated with each source record. Therefore, we have moved to this ''expanded'' or redundant version. Users can still collapse multiple rows that all provide evidence for an interaction between the same set of proteins using the keys provided (for example, RIGIDs).<br />
}}<br />
<br />
Rows in this table that all provide evidence for an interaction between the same set of proteins can be identified using the RIGID key (redundant interaction group identifier). The RIGID is a 27-character key derived from the ROGIDs of the interactors involved in the interaction record. The ROGID consists of the SHA-1 digest of the protein interactor's primary amino acid sequence, encoded as a 27-character string, followed by the NCBI taxonomy identifier (see the paper for details).<br />
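<br />
The following Python sketch shows how these keys fit together. It assumes that a digest is the Base64-encoded SHA-1 of the input with the trailing <tt>=</tt> padding removed (the SEGUID convention, which yields 27 characters). It also assumes that the sequence is upper-cased before hashing and that a RIGID is the digest of the distinct ROGIDs, sorted and concatenated. These details come from the sketch, not from this page, so check computed values against the published files before relying on them.<br />
<br />
```python
import base64
import hashlib
from typing import Iterable


def digest(text: str) -> str:
    """Base64-encoded SHA-1 digest with the '=' padding removed (27 characters)."""
    sha1 = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(sha1).decode("ascii").rstrip("=")


def make_rogid(sequence: str, taxid: int) -> str:
    """ROGID: digest of the (upper-cased, assumed) sequence followed by the taxonomy identifier."""
    return digest(sequence.upper()) + str(taxid)


def make_rigid(rogids: Iterable[str]) -> str:
    """RIGID: digest of the distinct ROGIDs, sorted and concatenated (assumed canonical order)."""
    return digest("".join(sorted(set(rogids))))


if __name__ == "__main__":
    a = make_rogid("MSDNQVK", 83333)  # hypothetical peptide sequences
    b = make_rogid("MKTAYIAK", 83333)
    print(a, b)
    print(make_rigid([a, b]) == make_rigid([b, a]))  # True: input order does not matter
```
<br />
Because the ROGIDs are de-duplicated before hashing, <tt>make_rigid([a])</tt>, <tt>make_rigid([a, a])</tt> and <tt>make_rigid([a, a, a])</tt> all give the same value. This matches the behaviour this page describes for multimers and self-interactions.<br />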
<br />
{{Note|<br />
The RIGID key is now listed (by itself) in column 35 (Checksum_Interaction) as part of the new extended PSI-MITAB format. This is a universal key that can be generated by each and every interaction database and may be included in MITAB2.6 distributions from other source databases. The intention of this key is to aid third party integration of data collected from multiple databases (for example, from PSICQUIC web services). <br />
}}<br />
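<br />
The following is a minimal Python sketch of collapsing the expanded rows back into non-redundant groups keyed by the RIGID in column 35. It assumes that each data line is tab-separated and that any header line starts with <tt>#</tt>. The function name is hypothetical.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, List

RIGID_COLUMN = 35  # Checksum_Interaction, numbered from 1 as in this document


def group_by_rigid(lines: Iterable[str]) -> Dict[str, List[List[str]]]:
    """Map each RIGID to the rows (lists of column values) that share it."""
    groups: Dict[str, List[List[str]]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # skip blank lines and a header line
            continue
        columns = line.split("\t")
        groups[columns[RIGID_COLUMN - 1]].append(columns)
    return dict(groups)
```
<br />
Each group then corresponds to one line of the collapsed format used before release 7.0. <tt>len(groups[rigid])</tt> counts the source records that support that interaction.<br />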
<br />
=== Representation of interactions ===<br />
<br />
==== Binary interaction data ====<br />
<br />
This is the most common data type.<br />
<br />
For binary interaction data, column 53 (edgetype) will contain an X. Interactors A and B list the two proteins for which interaction evidence is provided in the row. Users should pay close attention to columns 12 (interactionType) and 7 (Method) when deciding which binary data they wish to accept as evidence of a direct physical interaction.<br />
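<br />
The following Python function sketches that decision. It keeps only binary rows (edgetype <tt>X</tt>) whose interaction type, and optionally detection method, has an MI term identifier in a set the user chooses. Which terms to accept is your decision; <tt>MI:0407</tt> (direct interaction) in the usage line is only an illustration. Rows are assumed to be already split into lists of column values.<br />
<br />
```python
from typing import Iterable, Iterator, List, Optional, Set

METHOD, INTERACTION_TYPE, EDGETYPE = 7, 12, 53  # column numbers, counting from 1


def term_id(value: str) -> str:
    """Return the MI identifier part of a value such as 'MI:0407(direct interaction)'."""
    return value.split("(", 1)[0]


def accepted_binary_rows(
    rows: Iterable[List[str]],
    interaction_types: Set[str],
    methods: Optional[Set[str]] = None,
) -> Iterator[List[str]]:
    """Yield binary rows whose interaction type (and method, if given) is accepted."""
    for cols in rows:
        if cols[EDGETYPE - 1] != "X":
            continue  # skip complex membership (C) and single-interactor (Y) rows
        if term_id(cols[INTERACTION_TYPE - 1]) not in interaction_types:
            continue
        if methods is not None and term_id(cols[METHOD - 1]) not in methods:
            continue
        yield cols
```
<br />
For example, <tt>direct = list(accepted_binary_rows(rows, {"MI:0407"}))</tt>.<br />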
<br />
==== Complexes (a.k.a. n-ary data) ====<br />
<br />
Certain experimental methods (like immunoprecipitation) provide evidence that three or more proteins are associated, but cannot provide evidence for a direct interaction between any given pair of those proteins.<br />
<br />
In these cases, interactor A (column 1) is used as a placeholder to represent the ''complex'' or ''list'' of proteins while interactor B is used to list one of the members of the list: therefore, the entire ''n-ary interaction record'' is described using one row for each interactor. Each of these rows will have the same ''interactor A''. This method of representation is referred to as a '''bi-partite model''' since there are two kinds of nodes corresponding to complexes and proteins. <br />
<br />
These interactions are marked by a C in column 53 (edgetype).<br />
<br />
As an example, let’s say that a source interaction record contained interactors A, B and C found by affinity purification and mass-spec where a tagged version of protein A was used as the bait protein to perform the immunoprecipitation. <br />
<br />
Then we would represent the complex in the MITAB file using three lines:<br />
<br />
<pre><br />
X-A<br />
X-B<br />
X-C<br />
</pre><br />
<br />
All three entries would have the same string in column 1 (the RIGID for the complex). All three entries would have a C in column 53 (edgetype).<br />
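<br />
Reconstructing the members of each complex from these rows needs only a grouping step. In the Python sketch below, rows are assumed to be lists of column values. Members are grouped by column 1 together with column 14, so that separate source records describing the same complex (and therefore sharing its RIGID) stay apart.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

UID_A, UID_B, INTERACTION_ID, EDGETYPE = 1, 2, 14, 53  # column numbers, from 1


def complex_members(rows: Iterable[List[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Map (complex identifier, source record) to the list of member interactors."""
    members: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for cols in rows:
        if cols[EDGETYPE - 1] == "C":
            key = (cols[UID_A - 1], cols[INTERACTION_ID - 1])
            members[key].append(cols[UID_B - 1])
    return dict(members)
```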
<br />
Other databases take an interaction record with multiple interactors (n-ary data) and make a list of binary interactions (based on the spoke or matrix model) and then list these binary interactions in the MITAB. For the example above, using a '''spoke model''' to transform the data into a set of binary interactions, these data would be represented using two lines in the MITAB file:<br />
<br />
<pre><br />
A-B<br />
A-C<br />
</pre><br />
<br />
Here A is chosen as the "hub" of the spoke model since it was the "bait" protein. For experimental systems that do not have "baits" and "preys" (such as X-ray crystallography), an arbitrary protein might be chosen as the bait.<br />
<br />
Alternatively, a '''matrix model''' might be used to transform the n-ary data into a list of binary interactions. Here all pairwise combinations of interactors in the original n-ary data are represented as binary interactions. So, in the above example, the immunoprecipitated complex would be represented using three lines of the MITAB file:<br />
<br />
<pre><br />
A-B<br />
B-C<br />
A-C<br />
</pre><br />
<br />
All three methods for representing n-ary data in a MITAB file (bi-partite, spoke, and matrix) are different representations of the same data. The model type that is chosen to describe n-ary data is listed in column 16 (expansion) of the MITAB2.6 format.<br />
<br />
We have chosen to use the bi-partite method of representation so that it is impossible to mistake spoke or matrix binary entries for true binary entries; the identifiers used for complexes will, of course, not appear in a protein database and any programme that tries to treat complex identifiers as though they were protein identifiers will fail. The method allows you to reconstruct the members of the original interaction record that describes a complex of proteins (say from an affinity purification experiment). From there, you can choose to make a spoke or matrix model by yourself if you want. <br />
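<br />
If you do want binary pairs, both expansions are short to write. The Python sketch below takes the member list of one complex and produces either spoke pairs around a chosen hub (such as the bait) or all matrix pairs. Choosing the hub is left to the caller.<br />
<br />
```python
from itertools import combinations
from typing import List, Sequence, Tuple


def spoke(members: Sequence[str], hub: str) -> List[Tuple[str, str]]:
    """Spoke model: pair the hub with each remaining member (n - 1 pairs).

    Raises ValueError if the hub is not one of the members.
    """
    others = list(members)
    others.remove(hub)  # drop one copy, so homomultimers still yield hub-hub pairs
    return [(hub, member) for member in others]


def matrix(members: Sequence[str]) -> List[Tuple[str, str]]:
    """Matrix model: every unordered pair of members (n(n - 1)/2 pairs)."""
    return list(combinations(members, 2))


print(spoke(["A", "B", "C"], "A"))  # [('A', 'B'), ('A', 'C')]
print(matrix(["A", "B", "C"]))      # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```
<br />
These calls reproduce the spoke (A-B, A-C) and matrix (A-B, B-C, A-C) examples above, up to the order of the pairs.<br />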
<br />
Users are advised that other databases may use spoke and matrix model representations of complexes in the MITAB format. <br />
<br />
==== Intramolecular interactions and multimers ====<br />
<br />
These row types form a minority of the data and are rare in comparison to the types above.<br />
<br />
Sometimes source interaction records in PSI-MI format only list one interactor. These are cases where either<br />
<br />
<ol><br />
<li>an intra-molecular interaction is being represented or</li><br />
<li>a multimer (3 or more) of some protein is being represented.</li><br />
</ol> <br />
These records are difficult to represent in the PSI-MITAB format because PSI-MITAB requires that each row (interaction) list two interactors. <br />
We are representing these interaction records using the following format to reflect the original format provided as closely as possible.<br />
<ol><br />
<li>Interactions involving only one interactor. The uidA and uidB will be the same and the edge type will be 'Y' (column number 53 (edgetype)). Therefore, whenever the edge type is 'Y', the interaction involves only one protein (although the interaction is given as between two interactors), and column number 54 (numParticipants) will always be 1. For example:<br />
<pre>{A - A, edge type 'Y', numParticipants=1}</pre></li><br />
<li>When the interaction is described as involving two interactors but both of them refer to the same protein. This would be represented as a normal binary interaction and would have the edge type = 'X' (column number 53 (edgetype)), and thus column number 54 (numParticipants) would always be 2. For example:<br />
<pre>{A - A, edge type 'X', numParticipants=2}</pre></li><br />
<li>When the interaction is described as involving more than 2 interactors and all those interactors are referring to the same protein, a bi-partite representation will be used. The edge type would be 'C' (column number 53 (edgetype)). For example, with regard to complexes (a.k.a. n-ary data):<br />
<pre><br />
{C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3<br />
C - A, edge type 'C', numParticipants=3}<br />
</pre></li><br />
</ol><br />
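<br />
The three cases can be told apart using columns 1, 2, 53 and 54 alone. The following Python helper is a hypothetical illustration of that logic and is not part of any iRefIndex software.<br />
<br />
```python
def describe_row(uid_a: str, uid_b: str, edgetype: str, num_participants: int) -> str:
    """Classify a row using uidA, uidB, edgetype (column 53) and numParticipants (column 54)."""
    if edgetype == "Y":
        # Only one interactor in the source record; uidA and uidB are identical.
        return "single interactor"
    if edgetype == "X":
        return "two copies of one protein" if uid_a == uid_b else "binary interaction"
    if edgetype == "C":
        # uidA is the complex; uidB is one of num_participants members.
        return f"complex member (of {num_participants})"
    raise ValueError(f"unexpected edgetype {edgetype!r}")
```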
<br />
We draw extra attention to the fact that the RIGID (column number 35 (Checksum_Interaction)) for these interactions will be the SHA-1 digest of the ROGIDs for each of the distinct subunit types (see columns 33 (Checksum_A) and 34 (Checksum_B)). Thus interactions involving 1, 2 or more subunits of the same protein would all have the same RIGID.<br />
<br />
=== Keys for grouping together redundant interactors and interactions ===<br />
<br />
A number of keys are provided in this file to help users group together rows that all provide evidence for some kind of interaction between the same set (or a related set) of proteins. See columns 33-35 (Checksum_A, Checksum_B and Checksum_Interaction) and 43-51 (integer identifier and canonical data columns).<br />
<br />
The process of creating keys that group proteins and interactions into canonical groups was described after the original paper in the [[Canonicalization]] document. <br />
<br />
=== Provenance data ===<br />
<br />
Provenance data (where we retrieved source records from and how we mapped interactors and interactions to ROGIDs) is described in columns 37-42 (original and final references plus mapping scores).<br />
<br />
== License ==<br />
<br />
Data released on this public ftp site are released under the Creative <br />
Commons Attribution License http://creativecommons.org/licenses/by/2.5/. <br />
This means that you are free to use, modify and redistribute these data <br />
for personal or commercial use so long as you provide appropriate <br />
credit. See next section.<br />
<br />
<br />
Copyright © 2008-2011 Ian Donaldson<br />
<br />
== Citation ==<br />
<br />
Credit should include citing the iRefIndex paper (PMID 18823568) and any of the <br />
source databases upon which this resource is based. See <br />
http://irefindex.uio.no for appropriate citations.<br />
<br />
== Disclaimer ==<br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY <br />
WARRANTY; without even the implied warranty of MERCHANTABILITY or <br />
FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
== Description of PSI-MITAB2.6 file ==<br />
<br />
Each line in this file represents a single source database record that supports either:<br />
<br />
# an interaction between two proteins (binary interaction) or<br />
# the membership of a protein in some complex (complex membership) or<br />
# an interaction that involves only one protein type (multimer or self-interaction).<br />
<br />
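The column descriptions below number the columns from 1. A minimal Python reader can return each row as a list, so that column <em>n</em> is element <tt>n - 1</tt>. It assumes a tab-separated file whose optional header line begins with <tt>#</tt>. Quoting is disabled because values such as <tt>go:"GO:0016233"</tt> contain literal double quotes.<br />
<br />
```python
import csv
from typing import Iterable, Iterator, List


def read_mitab(handle: Iterable[str]) -> Iterator[List[str]]:
    """Yield each data row as a list of column values (column n is index n - 1)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for columns in reader:
        if not columns or columns[0].startswith("#"):
            continue  # skip blank lines and the header line
        yield columns
```
<br />
For example, <tt>with open(path, newline="") as f: rows = list(read_mitab(f))</tt>, where <tt>path</tt> is the downloaded, uncompressed file.<br />
<br />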
=== Column number: 1 (uidA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor A. <br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains an identifier, taken from a major database, for a protein representing the interactor A. A UniProt or a RefSeq accession is provided (in that order of preference) wherever possible. See column 3 for a list of prefixes that may be employed in this column in addition to the following:<br />
<br />
;<tt>complex</tt><br />
:If interactor A is being used to represent a complex, then the rogid for the complex will be listed here, such as the following:<br />
<br />
<pre>complex:xBr9cTXgzPLNxsaKiYyHcoEm/DM</pre><br />
<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
In rare cases, a rogid may appear here if a protein interactor has a sequence but no known, valid ''<tt>database:accession</tt>'' pair.<br />
<br />
=== Column number: 2 (uidB)===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Unique identifier for interactor B.<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 1.<br />
<br />
=== Column number: 3 (altA)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691|rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|irogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
All ''<tt>database:accession</tt>'' pairs listed in Column 3 point to protein records that describe the exact same sequence from the same taxon.<br />
<br />
Each pipe-delimited entry is a database_name:accession pair delimited by a colon. Database names are taken from the MI controlled vocabulary at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database references listed in this column may include the following:<br />
<br />
;<tt>uniprotkb</tt><br />
:The accessions this protein is known by in UniProt (http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line. <br />
;<tt>refseq</tt><br />
:If a protein accession exists in the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/), that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession. <br />
;<tt>entrezgene/locuslink</tt><br />
:NCBI Gene identifiers for the gene encoding this protein. See the GeneID column of ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq, looked up using the protein's accession.version.<br />
;<tt><em>other</em></tt><br />
:If none of the three identifier types are available then other <tt><em>databasename</em>:<em>accession</em></tt> pairs will be listed. These database names may not follow the MI controlled vocabulary.<br />
<br />
Example:<br />
<br />
<pre>emb:CAA44868.1|gb:AAA23715.1|gb:AAB02995.1|emb:CAA56736.1|uniprot:P24991</pre><br />
<br />
;<tt>rogid</tt><br />
:Column 33 repeated here for convenience.<br />
<br />
;<tt>irogid</tt><br />
:Column 43 repeated here for convenience.<br />
<br />
{{Note|<br />
The rogid of a complex or an n-ary interaction is the rigid of that <br />
interaction. However, the irogid of the complex is not the irigid.<br />
The irogid for the complex is an integer that does not overlap <br />
with any protein irogid.<br />
}}<br />
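<br />
The following small Python sketch splits this column (and, similarly, columns 4 to 6) into <tt>(database, accession)</tt> pairs. It splits each entry at the first colon only. It assumes that the placeholders <tt>-</tt> and <tt>NA</tt> mean that no values are present.<br />
<br />
```python
from typing import List, Tuple


def parse_pairs(value: str) -> List[Tuple[str, str]]:
    """Split 'db1:acc1|db2:acc2|...' into [(db1, acc1), (db2, acc2), ...]."""
    if value in ("", "-", "NA"):
        return []
    pairs = []
    for entry in value.split("|"):
        database, _, accession = entry.partition(":")  # first colon only
        pairs.append((database, accession))
    return pairs


alt_a = ("uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691"
         "|rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|irogid:12345")
print(dict(parse_pairs(alt_a))["rogid"])  # hhZYhMtr5JC1lGIKtR1wxHAd3JY83333
```
<br />
Converting the pairs to a dict keeps only the last accession for each database. This is fine for <tt>rogid</tt> and <tt>irogid</tt>, but not for databases that may appear several times.<br />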
<br />
=== Column number: 4 (altB)===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Alternative identifiers for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P06722|refseq:NP_417308|entrezgene/locuslink:947299</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 3. (Columns 34 and 44 are related to this column.)<br />
<br />
=== Column number: 5 (aliasA) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL|crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333|icrogid:12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each pipe-delimited entry is a <tt><em>database name</em>:<em>alias</em></tt> pair delimited by a <br />
colon. Database names are taken from the PSI-MI controlled vocabulary <br />
at the following location:<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
Database names and sources listed in this column may include the following:<br />
<br />
;<tt>uniprotkb:<em>entry name</em></tt><br />
:the entry name given by UniProt. See the description for "Entry name" in the section of http://au.expasy.org/sprot/userman.html#ID_line concerning the "ID (IDentification)" line of the flat file<br />
;<tt>entrezgene/locuslink:<em>symbol</em></tt><br />
:the NCBI gene symbol for the gene encoding this protein. See the section in ftp://ftp.ncbi.nlm.nih.gov/gene/README for <tt>gene_info</tt>, specifically details for the <tt>Symbol</tt> column<br />
;<tt>crogid</tt><br />
:Column 46 repeated here for convenience.<br />
;<tt>icrogid</tt><br />
:Column 49 repeated here for convenience.<br />
;<tt>other db:accession pairs</tt><br />
:Other db:accession pairs that belong to the same canonical group may be added after icrogid. These are purely meant to facilitate look-up by PSICQUIC and other services: their sequences are related to (but not identical to) the sequence of interactor A.<br />
;<tt>NA</tt><br />
:<tt>NA</tt> may be listed here if aliases are <em>not available</em><br />
<br />
=== Column number: 6 (aliasB) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Aliases for interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:MUTH_ECOLI|entrezgene/locuslink:mutH</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 5. (Columns 47 and 50 are related to this column.)<br />
<br />
=== Column number: 7 (Method) ===<br />
<br />
{|<br />
|Column type: ||String <br />
|-<br />
|Description: ||Interaction detection method<br />
|-<br />
|Example: ||<pre>MI:0399(2h fragment pooling)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only a single method will appear in this column. Previously, multiple methods appeared.<br />
}}<br />
<br />
Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers.<br />
<br />
The interaction detection method is taken from the original record. Its path in PSI-MI 2.5 is:<br />
<br />
<pre>entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/</pre><br />
<br />
<br />
{{Note|<br />
If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then <tt>MI:0000</tt> will appear before the shortLabels.<br />
}}<br />
<br />
<tt>NA</tt> or <tt>-1</tt> may appear in place of a recognised shortLabel.<br />
<br />
For example:<br />
<br />
<pre><br />
MI:0000(-1)<br />
MI:0000(NA)<br />
</pre><br />
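<br />
Values of this form, which are also used in columns 12, 13 and 17 to 22, can be split with a regular expression. The Python sketch below returns the term identifier and the short label; callers can test for <tt>MI:0000</tt> to find terms that could not be mapped. It assumes that the label is everything between the parenthesis following the identifier and the final closing parenthesis.<br />
<br />
```python
import re
from typing import Tuple

CV_TERM = re.compile(r"^(MI:\d{4})\((.*)\)$")


def parse_cv_term(value: str) -> Tuple[str, str]:
    """Split 'MI:0399(2h fragment pooling)' into ('MI:0399', '2h fragment pooling')."""
    match = CV_TERM.match(value)
    if match is None:
        raise ValueError(f"not a controlled vocabulary value: {value!r}")
    return match.group(1), match.group(2)


term, label = parse_cv_term("MI:0000(-1)")
print(term == "MI:0000", label)  # True -1
```
<br />
Column 12 may contain a bare <tt>NA</tt>, which this function rejects with <tt>ValueError</tt>, so handle that case before calling it.<br />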
<br />
=== Column number: 8 (author) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Authors of the publications supporting the interaction<br />
|-<br />
|Example: ||<pre>hall-1999-1|hall-1999-2|mansour-2001-1|mansour-2001-2|hall-1999</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited list of author surnames for the publications in which the interaction has been shown.<br />
<br />
{{Note|<br />
This column will usually include only one author name reference. However, some experimental evidence has secondary references, which could be included here.<br />
This field also includes references that are not author names, as in the following examples:<br />
* OPHID Predicted Protein Interaction<br />
* HPRD Text Mining Confirmation<br />
* MINT Text Mining Confirmation<br />
}}<br />
<br />
=== Column number: 9 (pmids) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||PubMed Identifiers<br />
|-<br />
|Example: ||<pre>pubmed:9880500|pubmed:11585365</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is a non-redundant list of PubMed identifiers pointing to literature that supports the interaction. <br />
According to MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>pubmed:12345</tt>.<br />
The source database name is always <tt>pubmed</tt>.<br />
<br />
{{Note|<br />
This column will usually include only one PubMed reference that describes where the experimental evidence is found. In some cases, secondary references are provided by the source database and will be included here.<br />
}}<br />
<br />
<br />
The special value <tt>-</tt> may appear in place of the identifiers.<br />
<br />
=== Column number: 10 (taxa) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor A<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
<br />
|}<br />
<br />
'''Notes'''<br />
<br />
The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be corrected from what was provided by the source database. See the methods section of the iRefIndex paper for more details. See also the NCBI taxonomy database at the following location:<br />
<br />
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy<br />
<br />
According to the MITAB2.6 format, this column should contain a pipe-delimited set of <tt><em>databaseName</em>:<em>identifier</em></tt> pairs such as <tt>taxid:12345</tt>. The source database name is listed as taxid because it is always NCBI's taxonomy database. The value in this column will be <tt>NA</tt> if the interactor is a complex.<br />
<br />
=== Column number: 11 (taxb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Taxonomy identifier for canonical interactor B<br />
|-<br />
|Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 10.<br />
<br />
=== Column number: 12 (interactionType) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Interaction Type from controlled vocabulary or short label<br />
|-<br />
|Example: ||<pre>MI:0218(physical interaction)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
Only one interaction type will be present in each line of the file.<br />
}}<br />
<br />
The interaction type is taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(interaction type)</pre><br />
<br />
...(when available in the interaction record), or it is taken from the following path in PSI-MI 2.5:<br />
<br />
<pre>entrySet/entry/interactionList/interaction/interactionType/names/shortLabel</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for interaction types.<br />
<br />
{{Note|<br />
If the MI controlled vocabulary identifier was not provided by the source database, but a text description was provided, then an attempt was made to map the text to the correct controlled vocabulary term identifier.<br />
If this was not possible then <tt>MI:0000</tt> is listed.<br />
|Change}}<br />
<br />
<tt>NA</tt> may be listed here if the interaction type is not available (meaning that we could not find the interaction type in the record provided by the source database).<br />
<br />
=== Column number: 13 (sourcedb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Source database for this interaction record <br />
|-<br />
|Example: ||<pre>MI:0469(intact)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Taken from the PSI-MI controlled vocabulary and represented as...<br />
<br />
<pre>database:identifier(source name)</pre><br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for database sources.<br />
<br />
{{Note|<br />
Only one source database will be listed in each row.<br />
|Change}}<br />
<br />
=== Column number: 14 (interactionIdentifier) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||source interaction-database and accession<br />
|-<br />
|Example: ||<pre>intact:EBI-761694|rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA|irigid:1234|edgetype:X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt><em>database name</em>:<em>identifier</em></tt> pair. <br />
<br />
{{Note|<br />
The source database is listed first. Additional information is pipe-delimited and presented here for the convenience of PSICQUIC web-service users (these services presently truncate this file at column 15 because they only support MITAB2.5). See columns 35, 45 and 53. <br />
|Change}}<br />
<br />
The source database names that appear in this column are taken from the<br />
PSI-MI controlled vocabulary at the following location (where possible):<br />
<br />
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI<br />
<br />
If an interaction record identifier is not provided by the source database, this entry will appear as <tt><em>database-name</em>:-</tt> with the identifier region replaced with a dash (<tt>-</tt>).<br />
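<br />
PSICQUIC users only see the first 15 columns, so the appended keys can be recovered from this column alone. The following Python sketch assumes that every entry is a <tt>name:value</tt> pair split at its first colon, and it maps the dash placeholder to <tt>None</tt>.<br />
<br />
```python
from typing import Dict, Optional, Tuple


def parse_interaction_identifier(value: str) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Return (source database, source identifier or None, appended keys)."""
    first, *rest = value.split("|")
    database, _, identifier = first.partition(":")
    extras = {}
    for entry in rest:
        name, _, key = entry.partition(":")  # e.g. rigid, irigid, edgetype
        extras[name] = key
    return database, (None if identifier == "-" else identifier), extras


print(parse_interaction_identifier(
    "intact:EBI-761694|rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA|irigid:1234|edgetype:X"))
```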
<br />
=== Column number: 15 (confidence) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Confidence scores<br />
|-<br />
|Example: ||<pre>lpr:1|hpr:12|np:1|PSICQUIC entries are truncated here. See irefindex.uio.no</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Each reference is presented as a <tt>''scoreName'':''score''</tt> pair. Three confidence <br />
scores are provided: <tt>lpr</tt>, <tt>hpr</tt> and <tt>np</tt>.<br />
<br />
PubMed Identifiers (PMIDs) point to literature references that support <br />
an interaction. A PMID may be used to support more than one interaction. <br />
<br />
The lpr score (lowest PMID re-use) is the lowest number of distinct <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A value of one indicates <br />
that at least one of the PMIDs supporting this interaction has never <br />
been used to support any other interaction. This likely indicates that <br />
only one interaction was described by that reference and that the <br />
present interaction is not derived from high throughput methods.<br />
<br />
The hpr score (highest PMID re-use) is the highest number of <br />
interactions (RIGIDs: see column 35) that any one PMID (supporting the <br />
interaction in this row) is used to support. A high value (e.g. greater <br />
than 50) indicates that one PMID describes at least 50 other <br />
interactions and it is more likely that high-throughput methods were <br />
used.<br />
<br />
The np score (number of PMIDs) is the total number of unique PMIDs used to <br />
support the interaction described in this row.<br />
<br />
<tt>-</tt> may appear in the score field, indicating the absence of a score value.<br />
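<br />
The three scores can be recomputed from (RIGID, PMID) pairs collected across all rows of the file, for example from column 35 and column 9 after splitting the latter on pipes. The Python sketch below is our reading of the definitions above, not the iRefIndex pipeline itself. It ignores <tt>-</tt> placeholders, so interactions with no PMID get no scores.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def pmid_scores(evidence: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int, int]]:
    """Map each RIGID to (lpr, hpr, np), given (rigid, pmid) pairs."""
    rigids_per_pmid: Dict[str, Set[str]] = defaultdict(set)
    pmids_per_rigid: Dict[str, Set[str]] = defaultdict(set)
    for rigid, pmid in evidence:
        if pmid in ("", "-"):
            continue  # no literature reference for this row
        rigids_per_pmid[pmid].add(rigid)
        pmids_per_rigid[rigid].add(pmid)
    scores = {}
    for rigid, pmids in pmids_per_rigid.items():
        reuse = [len(rigids_per_pmid[pmid]) for pmid in pmids]
        scores[rigid] = (min(reuse), max(reuse), len(pmids))  # lpr, hpr, np
    return scores


print(pmid_scores([("R1", "pubmed:1"), ("R2", "pubmed:1"), ("R1", "pubmed:2")]))
# {'R1': (1, 2, 2), 'R2': (2, 2, 1)}
```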
<br />
----<br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (16 - 31) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT<br />
|Note}}<br />
<br />
=== Column number: 16 (expansion) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Model used to convert n-ary data into binary data for purpose of export in MITAB file<br />
|-<br />
|Example: ||<pre>bipartite</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this column will always contain either <tt>bipartite</tt> or <tt>none</tt>.<br />
<br />
Other databases may use either <tt>spoke</tt> or <tt>matrix</tt> or <tt>none</tt> in this column.<br />
<br />
See <br />
[[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for an explanation.<br />
<br />
=== Column number: 17 (biological_role_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor A<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When provided by the source database, this includes single entries such as <tt>MI:0501(enzyme)</tt>, <tt>MI:0502(enzyme target)</tt>, <tt>MI:0580(electron acceptor)</tt>, or <tt>MI:0499(unspecified role)</tt>.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role.<br />
<br />
For complexes and when no role is specified this column will indicate an unspecified role.<br />
<br />
=== Column number: 18 (biological_role_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Biological role of interactor B<br />
|-<br />
|Example: ||<pre>MI:0501(enzyme)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 17.<br />
<br />
=== Column number: 19 (experimental_role_A) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any was provided by the source database) that was played by interactor A.<br />
<br />
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI for definitions of bait and prey, as well as other possible values of experimental role that may appear in this column for other databases.<br />
<br />
For complexes and when no role is specified this column will contain the following:<br />
<br />
<pre>MI:0499(unspecified role)</pre><br />
<br />
=== Column number: 20 (experimental_role_B) ===<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Indicates the experimental role of the interactor (such as bait or prey). <br />
|-<br />
|Example: ||<pre>MI:0496(bait)</pre><br />
|-<br />
|Example: ||<pre>MI:0498(prey)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column indicates the experimental role (if any) that was played by interactor B.<br />
<br />
See notes above for column 19.<br />
<br />
=== Column number: 21 (interactor_type_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||describes the type of molecule that A is <br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
For iRefIndex, this will always be one of...<br />
<br />
<pre><br />
MI:0326(protein)<br />
MI:0315(protein complex)<br />
</pre><br />
<br />
=== Column number: 22 (interactor_type_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||describes the type of molecule that B is <br />
|-<br />
|Example: ||<pre>MI:0326(protein)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See column 21.<br />
<br />
=== Column number: 23 (xrefs_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for molecule A. For example, Gene Ontology identifiers or OMIM identifiers.<br />
<pre>omim:152430(longevity)|go:"GO:0016233"(telomere capping)</pre><br />
<br />
=== Column number: 24 (xrefs_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 23.<br />
<br />
=== Column number: 25 (xrefs_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||xrefs for the interaction <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list cross-references to annotation information for the interaction. For example, Gene Ontology identifiers or OMIM identifiers.<br />
<pre>go:"GO:0048786"(presynaptic active zone)</pre><br />
<br />
=== Column number: 26 (Annotations_A) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule A <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for molecule A. For example:<br />
<pre>This protein has an apparent MW of 25 kDa|This protein binds 7 zinc molecules</pre><br />
<br />
Some databases may use <tt>dataset:<em>*</em></tt> or <tt>data-processing:<em>*</em></tt> (where <tt>*</tt> is non-controlled free-text) in this column.<br />
<br />
=== Column number: 27 (Annotations_B) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for molecule B <br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
See notes to column 26.<br />
<br />
=== Column number: 28 (Annotations_Interaction) ===<br />
<br />
{|<br />
|Column type: ||Pipe-delimited set of strings<br />
|-<br />
|Description: ||Annotations for the interaction<br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
This column may be used by other databases to list free-text annotation information for the interaction. For example:<br />
<pre>figure-legend:F1A|prediction score:432|comment:prediction based on phage display consensus|author-confidence:8|comment:AD-ORFeome library used in the experiment.</pre><br />
The prefixes used before the <tt>:</tt> (like "comment") are database-specific and not controlled.<br />
<br />
Some databases may use ''<tt>dataset:*</tt>'' or ''<tt>data-processing:*</tt>'' (where <tt>*</tt> is non-controlled free-text) in this column.<br />
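<br />
The annotation items above follow a loose <tt>prefix:text</tt> pattern. Below is a minimal Python sketch for splitting such a column. It assumes items are separated by <tt>|</tt> and that the free text itself contains no pipe characters; the page does not guarantee this. <tt>parse_annotations</tt> is a hypothetical name.<br />
<br />
```python
def parse_annotations(value):
    """Split an annotation column into (prefix, text) pairs.

    Items are split on '|' (assumes the free text has no pipes) and each
    item is split at its first colon; items without a colon get prefix None.
    """
    if value == "-":
        return []
    pairs = []
    for item in value.split("|"):
        prefix, sep, text = item.partition(":")
        pairs.append((prefix, text) if sep else (None, item))
    return pairs


example = ("figure-legend:F1A|prediction score:432|"
           "comment:prediction based on phage display consensus")
for prefix, text in parse_annotations(example):
    print(prefix, "->", text)
```
<br />
Items without a colon, such as the free-text examples given for column 26, come back with a prefix of <tt>None</tt>.<br />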
<br />
=== Column number: 29 (Host_organism_taxid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||The taxonomy identifier of the host organism where the interaction was experimentally demonstrated<br />
|-<br />
|Example: || <pre>taxid:10090(Mus musculus)</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This may differ from the taxonomy identifier associated with the interactors. Other possible entries are: <br />
<br />
* <tt>taxid:-1(in vitro)</tt><br />
* <tt>taxid:-4(in vivo)</tt><br />
<br />
A dash (<tt>-</tt>) will be used when no information about the host organism is available.<br />
<br />
<tt>taxid:32644(unidentified)</tt> will be used when the source specifies the host organism taxonomy identifier as 32644.<br />
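<br />
Below is a minimal Python sketch for reading this column. It assumes every non-empty value has the <tt>taxid:id(name)</tt> shape shown above. <tt>parse_host_taxid</tt> is a hypothetical helper.<br />
<br />
```python
import re

# "taxid:" followed by a possibly negative integer and an optional name.
TAXID = re.compile(r'^taxid:(-?\d+)(?:\((.*)\))?$')


def parse_host_taxid(value):
    """Return (taxon_id, name) for a host organism entry, or None for '-'."""
    if value == "-":
        return None
    match = TAXID.match(value)
    if match is None:
        raise ValueError("unrecognised host organism: %r" % value)
    return int(match.group(1)), match.group(2)


print(parse_host_taxid("taxid:10090(Mus musculus)"))
print(parse_host_taxid("taxid:-1(in vitro)"))
```
<br />
Negative identifiers such as -1 (in vitro) and -4 (in vivo) are returned as-is. Any code that looks these identifiers up in a taxonomy database may therefore need to handle them separately.<br />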
<br />
=== Column number: 30 (parameters_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Parameters for the interaction<br />
|-<br />
|Example: || <pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is not used by iRefIndex. A dash (<tt>-</tt>) will always appear in this column.<br />
<br />
Internal note: the use of this column is not well defined or characterized.<br />
<br />
=== Column number: 31 (Creation_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||When the entry was created.<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
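<br />
Both date columns use the <tt>yyyy/mm/dd</tt> form. Python's standard <tt>datetime.strptime</tt> can read this with the format <tt>%Y/%m/%d</tt>. Below is a minimal sketch; <tt>parse_mitab_date</tt> is a hypothetical name.<br />
<br />
```python
from datetime import date, datetime


def parse_mitab_date(value):
    """Parse a yyyy/mm/dd string such as '2010/05/06' into a date."""
    return datetime.strptime(value, "%Y/%m/%d").date()


print(parse_mitab_date("2010/05/06"))
```
<br />
Every entry carries the release date. Reading this column from any single row is therefore enough to tell which iRefIndex release a file belongs to.<br />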
<br />
=== Column number: 32 (Update_date) ===<br />
<br />
{|<br />
|Column type: ||String (yyyy/mm/dd)<br />
|-<br />
|Description: ||When the entry was last updated.<br />
|-<br />
|Example: || <pre>2010/05/06</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This will be the release date of iRefIndex for all entries in this file. <br />
<br />
This date will not match the date for the corresponding record in the source database.<br />
<br />
=== Column number: 33 (Checksum_A) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor A. <br />
|-<br />
|Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
{{Note|<br />
This column contains a universal key for interactor A.<br />
|Note}}<br />
<br />
This column may be used to identify other interactors in this file that have the exact same amino acid sequence and taxon id. <br />
<br />
The universal key listed here is the ROGID (redundant object group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
Column 3 lists database names and accessions that all have this same key. <br />
<br />
The ROGID for a protein consists of the base-64 version of the SHA-1 key for the protein sequence, concatenated with the taxonomy identifier for the protein. For complex nodes, the ROGID is calculated as the SHA-1 digest of the ROGIDs of all the protein participants (after first ordering them by ASCII-based lexicographical sorting in ascending order and concatenating them). See the iRefIndex paper for details. The SHA-1 key is always 27 characters long, so the ROGID for a protein will be composed of 27 characters concatenated with a taxonomy identifier.<br />
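<br />
The ROGID construction described above can be sketched in Python with the standard <tt>hashlib</tt> and <tt>base64</tt> modules. This is an illustration under several assumptions, not the iRefIndex build code:<br />
* The sequence is already normalised (for example upper case, no whitespace).<br />
* The 27-character key is the standard base-64 encoding of the 20-byte digest with its trailing <tt>=</tt> removed.<br />
* No taxonomy identifier is appended to complex ROGIDs, since the text does not say whether one is.<br />
The function names are hypothetical.<br />
<br />
```python
import base64
import hashlib


def sha1_base64(text):
    """SHA-1 digest of text, base-64 encoded without '=' padding (27 chars)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def protein_rogid(sequence, taxon_id):
    """Protein ROGID: sequence digest followed by the taxonomy identifier.

    Assumes the sequence is already normalised; iRefIndex's own
    normalisation rules are not reproduced here.
    """
    return sha1_base64(sequence) + str(taxon_id)


def complex_rogid(participant_rogids):
    """Complex ROGID: digest of the sorted, concatenated participant ROGIDs.

    Python sorts strings by code point, which matches ASCII ordering here.
    """
    return sha1_base64("".join(sorted(participant_rogids)))


print(protein_rogid("MKTAYIAKQR", 83333))
```
<br />
A 20-byte SHA-1 digest encodes to 28 base-64 characters, the last of which is a single <tt>=</tt> pad. Dropping it gives the 27 characters mentioned above.<br />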
<br />
=== Column number: 34 (Checksum_B) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for interactor B. <br />
|-<br />
|Example: ||<pre>rogid:AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
See notes for column 33.<br />
<br />
=== Column number: 35 (Checksum_Interaction) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Hash key for this interaction<br />
|-<br />
|Example: ||<pre>rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other rows (interaction records) in this file that describe interactions between the same set of proteins from the same taxon id.<br />
<br />
The universal key listed here is the RIGID (redundant interaction group identifier) described in the original iRefIndex paper, PMID 18823568. <br />
<br />
The RIGID consists of the ROG identifiers for each of the protein participants (see notes above) ordered by ASCII-based lexicographic sorting in ascending order, concatenated and then digested with the SHA-1 algorithm. See the iRefIndex paper for details. This identifier points to a set of redundant protein-protein interactions that involve the same set of proteins with the exact same primary sequences.<br />
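<br />
The RIGID recipe can be sketched in Python with the standard <tt>hashlib</tt> and <tt>base64</tt> modules. The sketch makes three assumptions:<br />
* As for ROGIDs, the SHA-1 digest is base-64 encoded with its trailing <tt>=</tt> removed, which matches the 27-character example above.<br />
* Participant ROGIDs are passed without any <tt>rogid:</tt> prefix.<br />
* Duplicate ROGIDs are kept as given, since the page does not say how repeated participants are treated.<br />
<tt>rigid</tt> is a hypothetical function name.<br />
<br />
```python
import base64
import hashlib


def rigid(participant_rogids):
    """Sort ROGIDs (ASCII order), concatenate, SHA-1, then base-64 encode.

    Duplicates are kept as given; the trailing '=' padding is dropped so
    the result is 27 characters long.
    """
    joined = "".join(sorted(participant_rogids))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


pair = ["hhZYhMtr5JC1lGIKtR1wxHAd3JY83333", "AhmYiMtz8lR12Gixt91txbAd3JY83333"]
print(rigid(pair))
```
<br />
The participants are sorted before hashing. Swapping interactors A and B therefore gives the same RIGID.<br />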
<br />
=== Column number: 36 (Negative) ===<br />
<br />
{|<br />
|Column type: || Boolean (true or false)<br />
|-<br />
|Description: ||Does the interaction record provide evidence that some interaction does NOT occur?<br />
|-<br />
|Example: ||<pre>false</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This value will be false for all lines in this file since iRefIndex does not include "negative" interactions from any of the source databases.<br />
<br />
<hr><br />
<br />
{{Note|<br />
COLUMNS PAST THIS POINT (37 ONWARDS) ARE NOT DEFINED BY THE PSI-MITAB2.6 STANDARD.<br />
THESE COLUMNS ARE SPECIFIC TO THIS IREFINDEX RELEASE AND MAY CHANGE FROM ONE RELEASE TO ANOTHER.<br />
|Important}}<br />
<br />
=== Column number: 37 (OriginalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the protein reference that was found in the original interaction record to describe interactor A. It is a colon-delimited pair of database name and accession. It may be either the primary or secondary reference for the protein provided by the source database.<br />
<br />
For complexes this will be the ROGID of the complex.<br />
<br />
=== Column number: 38 (OriginalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used in the original interaction record to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 37.<br />
<br />
=== Column number: 39 (FinalReferenceA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor A<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Column 37 (OriginalReferenceA) was used by the iRefIndex consolidation process to arrive at this FinalReferenceA. <br />
This database name and accession pair will usually be the same as that listed in column 37, unless the provided reference was malformed, had to be updated or was ambiguous.<br />
<br />
Examples:<br />
<br />
# The original reference is malformed. For example: <tt>RefSeq:NP 036076</tt> instead of <tt>RefSeq:NP_036076</tt>.<br />
# The original reference is incomplete. For example: <tt>PDB:1KQ1|</tt> (missing chain information). <br />
# The original reference is deprecated. For example: <tt>UniProt:Q9H233</tt> (the value of FinalReferenceA will be the latest available accession in this case).<br />
# The original reference is ambiguous. For example: a gene identifier is provided (the value of FinalReferenceA will be a protein product selected in a systematic way in this case).<br />
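<br />
Columns 37 to 40 sit side by side, so rows where iRefIndex changed a reference can be found by comparing them. Below is a minimal Python sketch. It assumes a row has already been split on tabs into a list of fields, and it uses 0-based indexes for the 1-based column numbers on this page. <tt>changed_references</tt> is a hypothetical helper.<br />
<br />
```python
# 0-based indexes for the 1-based column numbers used on this page.
ORIGINAL_A, ORIGINAL_B = 36, 37   # columns 37 and 38
FINAL_A, FINAL_B = 38, 39         # columns 39 and 40


def changed_references(row):
    """Return (original, final) pairs whose final reference differs."""
    pairs = [(row[ORIGINAL_A], row[FINAL_A]), (row[ORIGINAL_B], row[FINAL_B])]
    return [(original, final) for original, final in pairs if original != final]


line = "\t".join(["-"] * 36 + ["RefSeq:NP 036076", "uniprotkb:P23367",
                              "RefSeq:NP_036076", "uniprotkb:P23367"] + ["-"] * 14)
print(changed_references(line.split("\t")))
```
<br />
This is a plain string comparison. A change of database prefix alone (for example <tt>UniProt:</tt> versus <tt>uniprotkb:</tt>) is also reported.<br />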
<br />
=== Column number: 40 (FinalReferenceB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Database name and reference used by iRefIndex to describe interactor B<br />
|-<br />
|Example: ||<pre>uniprotkb:P23367</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 39.<br />
<br />
=== Column number: 41 (MappingScoreA) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 37) to the final protein reference (column 39).<br />
|-<br />
|Example: ||<pre>PTUO+</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column contains a description of mapping operations as a condensed string of letters. See the original iRefIndex paper, PMID 18823568. <br />
For complexes, this column will contain a dash (<tt>-</tt>).<br />
<br />
=== Column number: 42 (MappingScoreB) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 38) to the final protein reference (column 40).<br />
|-<br />
|Example: ||<pre>SU</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 41.<br />
<br />
=== Column number: 43 (irogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor A. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal, integer-equivalent of the alphanumeric identifier in column 33 for interactor A. All interactors with the same sequence and taxon origin will have the same irogid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 44 (irogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for interactor B.<br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 43.<br />
<br />
=== Column number: 45 (irigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for this interaction.<br />
|-<br />
|Example: ||<pre>1234</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal, integer-equivalent of the alphanumeric identifier in column 35 for this interaction. All interactions involving the same interactors (same sequence and same taxon) will have the same irigid.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 46 (crogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This column may be used to identify other interactors in this file that all belong to the same canonical group.<br />
<br />
Members of a canonical group may include splice isoform products from the same or related genes. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column.<br />
<br />
See http://irefindex.uio.no/wiki/Canonicalization for a description of canonicalization.<br />
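<br />
Grouping rows by canonical ROGID shows which specific interactors (columns 33 and 34) have been placed in the same canonical group. Below is a minimal Python sketch. It assumes rows are lists of tab-separated fields indexed from 0. The <tt>rogid:</tt> prefix seen in the column 33 example is stripped so that values can be compared with those in column 46. The function names are hypothetical.<br />
<br />
```python
from collections import defaultdict

CHECKSUM_A, CHECKSUM_B = 32, 33   # columns 33 and 34 (0-based indexes)
CROGID_A, CROGID_B = 45, 46       # columns 46 and 47


def strip_prefix(value):
    """Drop a leading 'rogid:'-style prefix, if present."""
    return value.split(":", 1)[-1]


def canonical_groups(rows):
    """Map each canonical ROGID to the set of specific ROGIDs seen in it."""
    groups = defaultdict(set)
    for row in rows:
        groups[strip_prefix(row[CROGID_A])].add(strip_prefix(row[CHECKSUM_A]))
        groups[strip_prefix(row[CROGID_B])].add(strip_prefix(row[CHECKSUM_B]))
    return groups
```
<br />
A group containing more than one specific ROGID brings together interactors with different sequences from the same taxon, such as splice isoforms.<br />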
<br />
=== Column number: 47 (crogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>AhmYiMtz8lR12Gixt91txbAd3JY83333</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 46.<br />
<br />
=== Column number: 48 (crigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Alphanumeric RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is the RIGID for this interaction calculated using the canonical ROGIDs (preceding two columns).<br />
<br />
This column may be used to identify other interactions in this file that all belong to the same canonical group.<br />
<br />
<br />
=== Column number: 49 (icrogida) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor A belongs. <br />
|-<br />
|Example: ||<pre>2345</pre><br />
|}<br />
<br />
'''Notes''' <br />
<br />
This is an internal, integer-equivalent of the alphanumeric canonical ROGID in column 46 for interactor A. Interactors with the same icrogid may have different sequences but are related; e.g. different splice isoforms of the same gene.<br />
<br />
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.<br />
<br />
=== Column number: 50 (icrogidb) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer ROGID for the canonical group to which interactor B belongs. <br />
|-<br />
|Example: ||<pre>456543</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
See notes for column 49.<br />
<br />
=== Column number: 51 (icrigid) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||Integer RIGID for the canonical group to which this interaction belongs. <br />
|-<br />
|Example: ||<pre>12345</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
This is an internal, integer-equivalent of the canonical RIGID. See column 48.<br />
<br />
This integer may be used to query the iRefWeb interface for the interaction record. For example:<br />
<br />
http://wodaklab.org/iRefWeb/interaction/show/13653<br />
<br />
...where 13653 is the integer, canonical RIGID.<br />
<br />
This identifier serves to group together evidence for interactions that involve the same set (or a related set) of proteins.<br />
<br />
Starting with release 6.0, this canonical RIGID is stable from one release of iRefIndex to another.<br />
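<br />
The URL pattern in the example above can be filled in from a MITAB line. Below is a minimal Python sketch. It assumes a tab-separated line in which column 51 (index 50 when counting from 0) holds the integer canonical RIGID. <tt>irefweb_url</tt> is a hypothetical helper, and the sketch does not check whether the iRefWeb address still resolves.<br />
<br />
```python
# URL pattern taken from the example above.
IREFWEB_URL = "http://wodaklab.org/iRefWeb/interaction/show/%d"
ICRIGID = 50   # column 51 (0-based index)


def irefweb_url(line):
    """Build the iRefWeb interaction URL for one tab-separated MITAB line."""
    fields = line.rstrip("\n").split("\t")
    return IREFWEB_URL % int(fields[ICRIGID])


fields = ["-"] * 54
fields[ICRIGID] = "13653"
print(irefweb_url("\t".join(fields) + "\n"))
```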
<br />
=== Column number: 52 (imex_id) ===<br />
<br />
{|<br />
|Column type: ||String<br />
|-<br />
|Description: ||IMEx identifier if available<br />
|-<br />
|Example: ||<pre>imex:IM-12202-3</pre><br />
|-<br />
|Example: ||<pre>-</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
When no information is available, a dash (<tt>-</tt>) will be used.<br />
<br />
=== Column number: 53 (edgetype) ===<br />
<br />
{|<br />
|Column type: ||Character<br />
|-<br />
|Description: ||Does the edge represent a binary interaction (X), member of complex (C) data, or a multimer (Y)?<br />
|-<br />
|Example: ||<pre>X</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
Edges can be labelled as either <tt>X</tt>, <tt>C</tt> or <tt>Y</tt>:<br />
<br />
;<tt>X</tt><br />
:a binary interaction with two protein participants<br />
<br />
;<tt>C</tt><br />
:denotes that this edge is a binary expansion of an interaction record that had 3 or more interactors (so-called "complex" or "n-ary" data). The expansion type is described in column 16 (expansion). In the case of iRefIndex, the expansion is always "bipartite", meaning that interactor A of this row represents the complex itself and interactor B represents a protein that is a member of this complex.<br />
See [[#Understanding_the_iRefIndex_MITAB_format|Understanding the iRefIndex MITAB format]] for further explanation.<br />
<br />
;<tt>Y</tt><br />
:used for dimers and polymers. Where the number of subunits is not described in the original interaction record, the edge is labelled with a <tt>Y</tt>. Interactor A will be identical to interactor B. The graphical representation of this will appear as a single node connected to itself (a loop). The actual number of self-interacting subunits may be 2 (dimer) or more (say 5 for a pentamer). Refer to the original interaction record for more details and see column 54.<br />
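<br />
In the bipartite expansion, interactor A is the complex and interactor B is one of its members. The members of each complex can therefore be reassembled by grouping <tt>C</tt> rows on the complex identifier. Below is a minimal Python sketch. It assumes rows are lists of tab-separated fields indexed from 0 and uses the ROGIDs in columns 33 and 34. <tt>complexes</tt> is a hypothetical helper.<br />
<br />
```python
from collections import defaultdict

CHECKSUM_A, CHECKSUM_B = 32, 33   # columns 33 and 34 (0-based indexes)
EDGETYPE = 52                     # column 53


def complexes(rows):
    """Collect the distinct member ROGIDs of each complex from its C edges.

    Interactor A of a C row is the complex and interactor B a member,
    so rows are grouped on the complex's ROGID.
    """
    members = defaultdict(set)
    for row in rows:
        if row[EDGETYPE] == "C":
            members[row[CHECKSUM_A]].add(row[CHECKSUM_B])
    return members
```
<br />
A set is used, so a protein that participates more than once in a complex is listed only once. Column 54 still records the full participant count. Rows from different records describing the same complex share its ROGID and are merged.<br />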
<br />
=== Column number: 54 (numParticipants) ===<br />
<br />
{|<br />
|Column type: ||Integer<br />
|-<br />
|Description: ||Number of participants in the interaction<br />
|-<br />
|Example: ||<pre>2</pre><br />
|}<br />
<br />
'''Notes'''<br />
<br />
* For edges labelled <tt>X</tt> (see column 53) this value will be two. <br />
* For edges labelled <tt>C</tt>, this value will be equivalent to the number of protein interactors in the original n-ary interaction record.<br />
* For interactions labelled <tt>Y</tt>, this value will either be the number of self-interacting subunits (if present in the original interaction record) or 1 where the exact number of subunits is unknown or unspecified.<br />
<br />
{{Note|<br />
The number of participants can be greater than the number of distinct proteins involved in an interaction because a single protein can participate more than once in an interaction. Such participation is enumerated and counted to produce the value in this column.<br />
|Important}}<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex_MITAB2.6_Parser&diff=4029
iRefIndex MITAB2.6 Parser
2011-11-29T16:48:14Z
<p>PaulBoddie: Updated the release information.</p>
<hr />
<div>A tool has been developed to parse the MITAB files produced in the [[iRefIndex Build Process]]. Currently, the tool is capable of parsing the MITAB format described on the page [[README MITAB2.6 for iRefIndex]].<br />
<br />
== Obtaining the MITAB Parser ==<br />
<br />
The parser and associated resources are available for download here:<br />
<br />
* Snapshot 2011-11-19: [http://irefindex.uio.no/hg/mitab/archive/8b936902f616.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/mitab/archive/8b936902f616.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/mitab/archive/8b936902f616.zip zip archive]<br />
* [http://irefindex.uio.no/hg/mitab/ mitab repository home]<br />
<br />
== Prerequisites ==<br />
<br />
The following programs are required to use the parser:<br />
<br />
* [http://www.python.org/ Python] (tested with 2.5.4)<br />
* [http://www.postgresql.org/ PostgreSQL] (tested with 8.1.17, 9.0.4)<br />
<br />
== Running the Parser ==<br />
<br />
Given a directory for the iRefIndex output files such as...<br />
<br />
<pre>/home/irefindex/output</pre><br />
<br />
...run the parser as follows:<br />
<br />
<pre>python parse_mitab.py /home/irefindex/output/All.mitab.03042009.txt</pre><br />
<br />
It will be necessary to change the date details included in the above filename to match the actual name of the appropriate file found in your own output directory.<br />
<br />
== Creating the Database ==<br />
<br />
A database can be created using the usual PostgreSQL tools:<br />
<br />
<pre>createdb -E unicode mitab_irefindex</pre><br />
<br />
This database is initialised as follows:<br />
<br />
<pre>psql -f init_mitab.sql mitab_irefindex</pre><br />
<br />
Should the database tables need to be dropped (perhaps in case of problems with the import), the following command can be used:<br />
<br />
<pre>psql -f drop_mitab.sql mitab_irefindex</pre><br />
<br />
== Populating the Database ==<br />
<br />
The database is populated as follows:<br />
<br />
<pre>python database_action.py mitab_irefindex import_mitab.sql</pre><br />
<br />
As a result, a number of tables representing the structure of the data should be available in the database. For applications built to use this data, indexes may need to be created in order to make querying more efficient.<br />
<br />
== Notes on the Populated Database ==<br />
<br />
The schema used by the populated database attempts to model the data as effectively as possible using a number of tables:<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! Entity type<br />
! Tables<br />
! Table purpose<br />
! Notable columns<br />
! Source columns (if different or converted)<br />
|-<br />
| rowspan="6" | Interaction<br />
| mitab_interactions<br />
| Model each interaction referencing interactors<br />
| rigid, edgetype, numParticipants, crigid<br />
|<br />
|-<br />
| mitab_sources<br />
| Represent sources for each interaction<br />
| rigid, ''sourcedb'', ''name''<br />
| sourcedb<br />
|-<br />
| mitab_interaction_type_names<br />
| Represent interaction types for each interaction<br />
| rigid, ''code'', ''name''<br />
| interactionType<br />
|-<br />
| mitab_interaction_identifiers<br />
| Represent interaction identifiers for each interaction<br />
| rigid, ''dbname'', ''uid''<br />
| interactionIdentifiers<br />
|-<br />
| mitab_confidence<br />
| Represent confidence scores for each interaction<br />
| rigid, ''type'', ''confidence''<br />
| confidence<br />
|-<br />
| mitab_interaction_rigs<br />
| Represent alternative integer identifiers for each interaction<br />
| ''uid'', ''rig''<br />
| rigid, irigid<br />
|-<br />
| Canonical interaction<br />
| mitab_canonical_interaction_rigs<br />
| Represent alternative integer identifiers for each canonical interaction<br />
| ''uid'', ''rig''<br />
| crigid, icrigid<br />
|-<br />
| rowspan="3" | Experiment<br />
| mitab_method_names<br />
| Represent detection methods for each interaction<br />
| rigid, ''code'', ''name''<br />
| method<br />
|-<br />
| mitab_authors<br />
| Represent publication authors for each interaction<br />
| rigid, ''author''<br />
| author<br />
|-<br />
| mitab_pubmed<br />
| Represent publication identifiers for each interaction<br />
| rigid, ''pmid''<br />
| pmids<br />
|-<br />
| rowspan="4" | Interactor<br />
| mitab_interactions<br />
| Model each interaction referencing interactors<br />
| uidA, uidB, taxA, taxB, atype, btype, crogidA, crogidB<br />
|<br />
|-<br />
| mitab_aliases<br />
| Represent aliases for each interactor<br />
| ''uid'', ''dbname'', ''alias''<br />
| uidA or uidB, aliasA or aliasB<br />
|-<br />
| mitab_alternatives<br />
| Represent alternative identifiers for each interactor<br />
| ''uid'', ''dbname'', ''alt''<br />
| uidA or uidB, altA or altB<br />
|-<br />
| mitab_interactor_rogs<br />
| Represent alternative integer identifiers for each interactor<br />
| ''uid'', ''rog''<br />
| uidA or uidB, irogA or irogB<br />
|-<br />
| Canonical interactor<br />
| mitab_canonical_interactor_rogs<br />
| Represent alternative integer identifiers for each canonical interactor<br />
| ''uid'', ''rog''<br />
| crogidA or crogidB, icrogA or icrogB<br />
|}<br />
<br />
Some changes in representation occur when creating the database:<br />
<br />
* Prefixed values are generally split to expose the prefix and identifier, name or value following it.<br />
** The various interaction and interactor prefixes (such as <tt>irefindex:</tt>, <tt>rigid:</tt>, <tt>rogid:</tt>, <tt>crigid:</tt> and <tt>crogid:</tt>) are omitted from interaction and interactor columns. '''Note''' that for non-iRefIndex data, any prefixes other than these will be retained, although this approach may be revised in future.<br />
** Source identifiers are split with the prefix (such as <tt>intact:</tt>) used to make a dbname column with the actual identifier stored in its own column (such as alias or alt).<br />
* The "empty value" (<tt>-</tt>) should never appear as an identifier, and where such a value is used in a list, that element should be excluded. This is pertinent in the case of vocabulary terms where <tt>MI:0000</tt> might be used together with an empty list of identifiers or names as an "empty collection" indicator.<br />
* Duplicate values in lists are generally discarded.<br />
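<br />
The conversions listed above can be illustrated with a small Python sketch. This is not the parser's own code, and the function names are hypothetical. It assumes list items are separated by <tt>|</tt> and that a database prefix ends at the first colon.<br />
<br />
```python
# Prefixes the page says are omitted from interaction and interactor columns.
OMITTED_PREFIXES = ("irefindex:", "rigid:", "rogid:", "crigid:", "crogid:")


def strip_known_prefix(value):
    """Remove one of the omitted prefixes, keeping any other prefix."""
    for prefix in OMITTED_PREFIXES:
        if value.startswith(prefix):
            return value[len(prefix):]
    return value


def split_identifiers(value):
    """Turn 'db:id|db:id|-' into unique (dbname, identifier) pairs, in order."""
    seen, result = set(), []
    for item in value.split("|"):
        if item == "-":
            continue                      # the empty value is never an identifier
        dbname, _, identifier = item.partition(":")
        pair = (dbname, identifier)
        if pair not in seen:              # duplicate values are discarded
            seen.add(pair)
            result.append(pair)
    return result
```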
<br />
Further work may include the introduction of a separate interactor table, collecting related information for each interactor. Support for interactor identifiers other than ROG identifiers may be improved, with a new column potentially being introduced to indicate the type of each identifier.<br />
<br />
=== Canonical interactors and interactions ===<br />
<br />
The <tt>mitab_interactions</tt> table incorporates the canonical interaction and interactors alongside the specific interaction and interactors. A separate <tt>mitab_canonical_interactor_rogs</tt> table is used to map canonical interactors to integer identifiers, just as <tt>mitab_interactor_rogs</tt> does for specific interactors.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Bioinformatics_course&diff=4025
Bioinformatics course
2011-11-28T13:39:38Z
<p>PaulBoddie: /* Bioinformatics links relevant to the course */ Updated iRefScape link.</p>
<hr />
<div><div class="floatright"><br />
<imagemap><br />
Image:Bioinfo_course_logo.jpg|400x400px<br />
default [[Bioinformatics course]]<br />
</imagemap><br />
</div><br />
<br />
=Bioinformatics for molecular biology - Fall 2011=<br />
<br />
'''November 21st to December 2nd'''<br />
<br />
==Description==<br />
The aim of the course is to introduce students to bioinformatics resources and tools for molecular biology research by having some of the best researchers in Norway talk about their field in general and then present their own work. Students are encouraged to bring a laptop; we will be set up for in-course demonstrations as well as practical lab exercises. The course is intended for biology students or computer science/math students. No prior background in bioinformatics or computer science is required.<br />
<br />
The course is jointly delivered by the Biotechnology Centre of Oslo, the Department of Molecular Biosciences (IMBV), the Department of Informatics (IFI) and the Norwegian University of Life Sciences. This course is one of the Ph.D. School courses offered by the Biotechnology Centre of Oslo (http://www.biotek.uio.no/ny-web/events/).<br />
The UiO page for this course is http://www.uio.no/studier/emner/matnat/molbio/MBV-INF4410/.<br />
<br />
'''Registration is open now (June of 2011). '''<br />
<br />
MBV-INF 4410 (M.Sc. level course code) 10.0 study points<br />
<br />
MBV-INF 9410 (Ph.D. level course code) 10.0 study points<br />
<br />
MBV-INF 9410A (Ph.D. level course code) 8.0 study points<br />
<br />
The course consists of two weeks of lectures, a final take-home exam (one week) and an essay (10 to 20 pages) to be completed by the middle of December. <br />
<br />
Ph.D. level students may opt to take the course without the essay for only 8 study points. <br />
<br />
Please bookmark this page. All future changes or announcements for the 2011 course will be posted to this page.<br />
<br />
'''Information:''' <br />
ragni.indahl@biotek.uio.no (about course administration) <br />
ian.donaldson@biotek.uio.no (about course content)<br />
'''Registration:''' <br />
torill.rortveit@imbv.uio.no<br />
<br />
==Dates and times==<br />
'''The course will occur November 21st to December 2nd.'''<br />
<br />
Each day will consist of three time slots for lectures and/or practical labs between 9 AM and 4 PM.<br />
<br />
==New Place!== <br />
'''Mondays and Tuesdays:'''<br />
<br />
Prolog Seminar room, Ole Johan Dahls Hus, IFI2<br />
<br />
Gaustadalleen 23c<br />
<br />
Same floor as entrance level. Use entrance nearest to Problemveien.<br />
<br />
'''Wednesdays, Thursdays and Fridays:'''<br />
<br />
Seminar room 510 in Veglaboratoriet <br />
Gaustadalleen 25<br />
<br />
Fifth floor.<br />
<br />
<br />
<br />
'''Map'''<br />
<br />
This [http://maps.google.com/maps/ms?ie=UTF&msa=0&msid=213421894609917298556.0004ae60c37d5c2baba4b map] shows the closest entrances to use for both buildings.<br />
<br />
'''Closest T-bane'''<br />
<br />
Forskningsparken.<br />
<br />
==Contacts during the course==<br />
<br />
Ian Donaldson (course coordinator) ian.donaldson@biotek.uio.no +47 99115149<br />
<br />
Problems with room access or audiovisual<br />
<br />
Sigrun Lien: 22852953<br />
<br />
Line Valbø: 22852415<br />
<br />
==Programme==<br />
<br />
<br />
'''Note:<br />
The schedule displayed below is tentative.<br />
Ongoing changes will be made to this page as we organize <br />
speakers before and during the course. <br />
Requests and suggestions are welcome.<br />
For examples of material presented last year, see [http://donaldson.uio.no/wiki/Bioinformatics_for_molecular_biology_2010 Bioinformatics for molecular biology 2010].'''<br />
<br />
{| class="wikitable" style="text-align:left" cellpadding="5"<br />
!width="20%"|<br />
!width="27%"|<br />
!width="27%"|<br />
!width="27%"|<br />
|-<br />
| align="center" style="background:grey; color: white" colspan="4"|'''Week 1: Monday, November 21st - Friday, November 25th'''<br />
|-style="background: steelblue; color: black"<br />
| ||Session 1||Session 2||Session 3 <br />
|-<br />
| ||09:00 – 10:45 ||11:00 – 12:45 ||14:00 – 15:45 <br />
|-style="background: lightgrey; color: black"<br />
| Mon. 21st<br />
||[[Media:Databaselecturenotes.pdf|Database lecture notes]]<br />
||<br />
[[Media:Working_with_common_db_identifiers.pdf |Working with identifiers]]<br />
<br />
[[Media:Identifier_conversion_excercise.pdf |Exercise]]<br />
<br />
||[http://www.perl.org/get.html Install Perl]<br />
|-style="background: lightgrey; color: black"<br />
| ||Ian Donaldson||Ian Donaldson, Antonio Mora, Paul Boddie||Ian Donaldson, Antonio Mora, Paul Boddie<br />
|-<br />
| Tue. 22nd<br />
||[[Introductory Perl | Perl]]<br />
||[[Introductory Perl | More Perl]]<br />
||[[Introductory Perl | Perl lab]]<br />
<br />
|-<br />
<br />
| ||Antonio Mora||Antonio Mora||Antonio Mora, Paul Boddie<br />
<br />
|-style="background: lightgrey; color: black"<br />
<br />
| Wed. 23rd<br />
||[[Media:R_lecture.pdf | Introduction to R]]<br />
||[[Media:Intro_R_lab.pdf | R lab]]<br />
||R lab<br />
<br />
|-style="background: lightgrey; color: black"<br />
| ||Bjørn-Helge Mevik||Bjørn-Helge Mevik, Katerina Michalickova, Antonio Mora ||Bjørn-Helge Mevik, Katerina Michalickova, Antonio Mora <br />
|-<br />
| Thur. 24th<br />
||[[Media:Exploratory_Data_Analysis.pdf | Exploratory data analysis]] [http://bioinformatics.uio.no/wiki/Image:Exploratory_data_analysis_extra_materials.zip Extra material]<br />
<br />
[[analyse.r | R script]]<br />
||An introduction to statistical inference<br />
||[[Media:Multiple_hypothesis_testing.pdf | Multiple hypothesis testing]]<br />
|-<br />
| ||Anja Bråthen Kristoffersen||TBA||Clara-Cecilie Günter<br />
|-style="background: lightgrey; color: black"<br />
| Fri. 25th<br />
||[[Media:R-lab-sn.pdf | Microarray data analysis]]<br />
||Microarray data lab<br />
||[[Media:Working_with_Gene_Lists_and_Over-representation_analysis.pdf | Gene lists and ORA]]<br />
<br />
[[Media:GO_DAVID_and_ORA_lab.pdf | Lab]]<br />
<br />
|-style="background: lightgrey; color: black"<br />
| ||Ståle Nygård||Ståle Nygård||Ian Donaldson<br />
|-<br />
| ||||||<br />
|-<br />
| ||||||<br />
|-<br />
| align="center" style="background:grey; color: white" colspan="4"|'''Week 2: Monday, November 28th - Friday, December 2nd'''<br />
|-style="background: steelblue; color: black"<br />
| ||Session 1 ||Session 2||Session 3<br />
|-<br />
| ||09:00 – 10:45 ||11:00 – 12:45 ||14:00 – 15:45 <br />
|-style="background: lightgrey; color: black"<br />
| Mon. 28th<br />
||[[Media:Interaction_data_resources.pdf | Interaction data resources]]<br />
||[http://wiki.cytoscape.org/Presentations/Basic Cytoscape lab]<br />
||[[iRefScape | Cytoscape plugin lab]]<br />
<br />
|-style="background: lightgrey; color: black"<br />
| ||Ian Donaldson||Ian Donaldson, Antonio Mora, Paul Boddie||Ian Donaldson, Antonio Mora, Paul Boddie<br />
|-<br />
| Tue. 29th ||High-throughput sequencing||High-throughput sequencing lab||<br />
|-<br />
| ||Robert Lyle||Robert Lyle||<br />
|-style="background: lightgrey; color: black"<br />
| Wed. 30th||Searching sequence databases and multiple sequence alignments||Motif scanning and discovery in DNA||Sequence lab<br />
|-style="background: lightgrey; color: black"<br />
| ||Torbjørn Rognes||Geir Sandve||Geir Sandve<br />
|-<br />
| Thur. 1st||Structural biology review, PyMOL and installing PyMOL||Structural biology tools, predictors and 3D modelling||PyMOL and structural biology tutorial<br />
|-<br />
| ||Jon K. Laerdahl||Jon K. Laerdahl||Jon K. Laerdahl<br />
|-style="background: lightgrey; color: black"<br />
| Fri. 2nd||Modeling guide||Modeling exercises||Homology modeling exercise<br />
|-style="background: lightgrey; color: black"<br />
| ||Jon K. Laerdahl||Jon K. Laerdahl||Jon K. Laerdahl<br />
|-<br />
| <br />
|}<br />
<br />
== Written assignment ==<br />
<br />
<br />
Students enrolled in MBV-INF 4410 or MBV-INF 9410 must complete a written assignment as part of the course requirements.<br />
<br />
The assignment is due by Friday, December 16th. It should be emailed to ian.donaldson at biotek.uio.no, preferably as a PDF document (Microsoft Word or OpenOffice documents are also acceptable). The assignment should be between 10 and 20 pages (2000 to 4000 words). This is a rough guide: I won't be counting pages and words, and quality and conciseness count more than quantity.<br />
<br />
Topics include:<br />
<br />
<br />
1) Write an explanation of three or more methods that were covered in the course. These should be simple explanations aimed at someone approaching the topic for the first time. Your explanation may include derivations of equations (if they are clearly explained), figures or tables. Use examples. Describe how the concept can be applied to a problem in biological research and what limitations the method has. List any resources you use, as well as references to additional material that a student might use to follow up on the topic. Please indicate whether your material may be used on the course's wiki page.<br />
<br />
<br />
2) Describe how you would use two or more of the methods covered in the course in your own research. Your proposal may include figures or tables. Give a short introduction to your problem area, clearly state your hypothesis and explain how you think it might be addressed by each of the methods. Provide justifications for your proposal as well as the expected outcome. Describe potential risks (say, the method provides no meaningful results) and what you would do to mitigate them. List any resources you use.<br />
<br />
<br />
3) You may define your own alternative topic. Please send an email to ian.donaldson at biotek.uio.no to have your topic approved first.<br />
<br />
== Exam ==<br />
<br />
<pre><br />
Please note:<br />
<br />
The exam for this course will be a one-week take-home exam.<br />
<br />
The exam will be emailed to candidates on Monday, December 5th before 5 PM.<br />
<br />
The exam must be emailed back by Monday, December 12th at 5 PM to<br />
Torill Rortveit (torill.rortveit@imbv.uio.no) as a single PDF document<br />
(Microsoft Word or OpenOffice documents are also acceptable). The document<br />
should be named with the course code and your candidate number only<br />
(e.g. MBV-INF 4410-1.pdf). Do not put your name in the document.<br />
<br />
</pre><br />
<br />
==Bioinformatics links relevant to the course==<br />
<br />
{|class="wikitable" style="text-align:left" border="1" cellpadding="5"<br />
|+ '''Bioinformatics Links 2011'''<br />
!width="20%"|Name<br />
!width="40%"|URL<br />
!width="40%"|Description<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"|Statistics<br />
|-<br />
|StatSoft textbook || http://statsoft.com/textbook/stathome.html || Good overview of methods and concepts<br />
|-<br />
|SAS manuals || http://support.sas.com/onlinedoc/913/ || Thorough overview of analysis procedures found in SAS<br />
|-<br />
|GraphPad || http://graphpad.com/help/prism5/prism5help.html?usingstatistical_analyses_step_by_s.htm || See the GraphPad statistical guide for easy introductions to many concepts in statistics<br />
|-<br />
| R || http://cran.r-project.org/ || The Comprehensive R Archive Network <br />
|-<br />
| <br />
Introduction to R<br />
<br />
Exploratory data analysis<br />
<br />
Hypothesis testing<br />
<br />
|| http://bioinformatics.ca/workshops/2009/course-content <br />
|| See the CBW course on "Exploratory Data Analysis Essential Statistics using R" at the bottom of this page. Slides and lecture recordings from Modules 1-3 cover much of the same material covered in the first two days of this course.<br />
|-<br />
|Learning R || http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.1000482 || A Quick Guide to Teaching R Programming to Computational Biology Students<br />
|-<br />
|R reference card || http://cran.r-project.org/doc/contrib/Short-refcard.pdf || A reference card for R syntax<br />
|-<br />
|UCLA tutorials || http://www.ats.ucla.edu/stat/r/ || Useful example code while learning R<br />
|-<br />
|R vs. other languages || http://www.johndcook.com/R_language_for_programmers.html || A brief description of how R differs from other programming languages. Useful if you already know a programming language.<br />
|-<br />
|Which test do I use? || http://www.practicalstats.com/which/index.html <br />
||An interactive guide to choosing which statistical test to use.<br />
|-<br />
|EMBNet Microarray course || http://vit-embnet.unil.ch/CoursEMBnet/Arrays06/Material.html || An online course with R/Bioconductor example tutorials<br />
|-<br />
|EMBNet tutorials || http://www.ch.embnet.org/pages/courses2.html || Other helpful tutorials with R examples related to biostatistics<br />
|-<br />
|SIB tutorials || http://edu.isb-sib.ch/ || Portal to the SIB bioinformatics tutorials. Includes R, Unix, Perl, statistics and lots more.<br />
|-<br />
|CBW tutorials || http://bioinformatics.ca || Portal to the Canadian Bioinformatics Workshop material. Includes R, statistics and lots more.<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"|Microarrays<br />
|-<br />
|NMC || http://www.mikromatrise.no/ || Norwegian Microarray Consortium<br />
|-<br />
| MACF || http://core.rr-research.no/ || UiO MicroArray Facility<br />
|-<br />
|Bioinformatics core facility || http://core.rr-research.no/index.php?section=3 || The Bioinformatics Core Facility established at Rikshospitalet-Radiumhospitalet (RR) will provide its users at RR and the University of Oslo with a range of services within bioinformatics, including analysis of DNA and protein sequences, analysis of microarray data, protein structure analysis and access to useful databases and web services<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"|Sequence<br />
|-<br />
|RSAT || [http://rsat.ulb.ac.be/rsat/ RSAT] || A collection of several motif-related tools<br />
|-<br />
|ConTra|| [http://bioit.dmbr.ugent.be/ConTra/index.php ConTra] || A more user-friendly tool for matching a collection of Position Weighted Matrices against promoter sequences across species<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"| Structure <br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"| Networks <br />
|-<br />
|Required files || [http://bioinformatics.uio.no/wikifiles//images/8/8c/SAMPLE_1_network.xls.zip Sample network] [http://bioinformatics.uio.no/wikifiles//images/c/c2/SAMPLE_1_nodeatribute.xls.zip Node attributes]|| Please download these files and uncompress them before use.<br />
|-<br />
|Cytoscape || http://www.cytoscape.org/ || Cytoscape home page <br />
|-<br />
|Cytoscape||http://cytoscape.org/cgi-bin/moin.cgi/Presentations|| Tutorial on Cytoscape <br />
|-<br />
|Cytoscape||http://cytoscape.wodaklab.org/wiki/How_to_increase_memory_for_Cytoscape||Solution to a common problem when working with large networks in Cytoscape.<br />
|-<br />
|Alternative source for the RUAL.sif data||http://chianti.ucsd.edu/cytoscape-data/<br />
|-<br />
|Alternative source for the node attribute data||http://chianti.ucsd.edu/svn/cyto_web/branches/initial/tut/filters.editor/<br />
|-<br />
|iRefIndex Cytoscape plugin || http://irefindex.uio.no/wiki/iRefScape || iRefIndex installation <br />
|-<br />
|iRefIndex || http://irefindex.uio.no/ || iRefIndex wiki<br />
|-<br />
|iRefIndex publication || http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18823568 || Full details of iRefIndex<br />
|-<br />
|DAVID || http://www.nature.com/nprot/journal/v4/n1/pdf/nprot.2008.211.pdf || Nature Protocols paper<br />
|-<br />
|GSEA || http://www.broadinstitute.org/gsea/doc/subramanian_tamayo_gsea_pnas.pdf || PNAS paper<br />
|-<br />
|GSEA || http://www.broadinstitute.org/gsea || Gene Set Enrichment Analysis application<br />
|-<br />
|}<br />
<br />
==Bioinformatics Mailing Lists==<br />
<br />
If you are interested in being informed of future courses, talks and news related to bioinformatics in the Oslo region, consider signing up for the cbo mailing list.<br />
<br />
[https://sympa.uio.no/usit.uio.no/info/cbo-all https://sympa.uio.no/usit.uio.no/info/cbo-all]<br />
<br />
You might also consider the nationwide Norwegian bioinformatics email list. You can sign up at<br />
<br />
[http://mailman.uib.no/listinfo/bioinfo.users http://mailman.uib.no/listinfo/bioinfo.users].<br />
<br />
Both lists are run by members of the [http://www.bioinfo.no/about Norwegian Bioinformatics Platform].<br />
<br />
<br />
==Archived courses==<br />
<br />
http://donaldson.uio.no/wiki/Bioinformatics_for_molecular_biology_2010<br />
<br />
http://donaldson.uio.no/wiki/Bioinformatics_for_molecular_biology_2009<br />
<br />
==Wiki help==<br />
<br />
http://excel2wiki.net/index.php <br />
<br />
http://en.wikipedia.org/wiki/Help:Table</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Bioinformatics_course&diff=4024
Bioinformatics course
2011-11-28T13:13:33Z
<p>PaulBoddie: Added alternative data sources.</p>
<hr />
<div><div class="floatright"><br />
<imagemap><br />
Image:Bioinfo_course_logo.jpg|400x400px<br />
default [[Bioinformatics course]]<br />
</imagemap><br />
</div><br />
<br />
=Bioinformatics for molecular biology - Fall 2011=<br />
<br />
'''November 21st to December 2nd'''<br />
<br />
==Description==<br />
The aim of the course is to introduce students to bioinformatics resources and tools for molecular biology research by having some of the best researchers in Norway talk about their field in general and then present their own work. Students are encouraged to bring a laptop; we will be set up for in-course demonstrations as well as practical lab exercises. The course is intended for biology students or computer science/math students. No prior background in bioinformatics or computer science is required.<br />
<br />
The course is jointly delivered by the Biotechnology Centre of Oslo, the Department of Molecular Biosciences (IMBV), the Department of Informatics (IFI) and the Norwegian University of Life Sciences. This course is one of the Ph.D. School courses offered by the Biotechnology Centre of Oslo (http://www.biotek.uio.no/ny-web/events/).<br />
The UiO page for this course is http://www.uio.no/studier/emner/matnat/molbio/MBV-INF4410/.<br />
<br />
'''Registration is now open (as of June 2011).'''<br />
<br />
MBV-INF 4410 (M.Sc. level course code) 10.0 study points<br />
<br />
MBV-INF 9410 (Ph.D. level course code) 10.0 study points<br />
<br />
MBV-INF 9410A (Ph.D. level course code) 8.0 study points<br />
<br />
The course consists of two weeks of lectures, a final take-home exam (one week) and an essay (10 to 20 pages) to be completed by the middle of December. <br />
<br />
Ph.D. level students may opt to take the course without the essay for only 8 study points. <br />
<br />
Please bookmark this page. All future changes or announcements for the 2011 course will be posted to this page.<br />
<br />
'''Information:''' <br />
ragni.indahl@biotek.uio.no (about course administration) <br />
ian.donaldson@biotek.uio.no (about course content)<br />
'''Registration:''' <br />
torill.rortveit@imbv.uio.no<br />
<br />
==Dates and times==<br />
'''The course will occur November 21st to December 2nd.'''<br />
<br />
Each day will consist of three time slots for lectures and/or practical labs between 9 AM and 4 PM.<br />
<br />
==New Place!== <br />
'''Mondays and Tuesdays:'''<br />
<br />
Prolog Seminar room, Ole Johan Dahls Hus, IFI2<br />
<br />
Gaustadalleen 23c<br />
<br />
Same floor as entrance level. Use entrance nearest to Problemveien.<br />
<br />
'''Wednesdays, Thursdays and Fridays:'''<br />
<br />
Seminar room 510 in Veglaboratoriet <br />
Gaustadalleen 25<br />
<br />
Fifth floor.<br />
<br />
<br />
<br />
'''Map'''<br />
<br />
This [http://maps.google.com/maps/ms?ie=UTF&msa=0&msid=213421894609917298556.0004ae60c37d5c2baba4b map] shows the closest entrances to use for both buildings.<br />
<br />
'''Closest T-bane'''<br />
<br />
Forskningsparken.<br />
<br />
==Contacts during the course==<br />
<br />
Ian Donaldson (course coordinator) ian.donaldson@biotek.uio.no +47 99115149<br />
<br />
Problems with room access or audiovisual<br />
<br />
Sigrun Lien: 22852953<br />
<br />
Line Valbø: 22852415<br />
<br />
==Programme==<br />
<br />
<br />
'''Note:<br />
The schedule displayed below is tentative.<br />
Ongoing changes will be made to this page as we organize <br />
speakers before and during the course. <br />
Requests and suggestions are welcome.<br />
For examples of material presented last year, see [http://donaldson.uio.no/wiki/Bioinformatics_for_molecular_biology_2010 Bioinformatics for molecular biology 2010].'''<br />
<br />
{| class="wikitable" style="text-align:left" cellpadding="5"<br />
!width="20%"|<br />
!width="27%"|<br />
!width="27%"|<br />
!width="27%"|<br />
|-<br />
| align="center" style="background:grey; color: white" colspan="4"|'''Week 1: Monday, November 21st - Friday, November 25th'''<br />
|-style="background: steelblue; color: black"<br />
| ||Session 1||Session 2||Session 3 <br />
|-<br />
| ||09:00 – 10:45 ||11:00 – 12:45 ||14:00 – 15:45 <br />
|-style="background: lightgrey; color: black"<br />
| Mon. 21st<br />
||[[Media:Databaselecturenotes.pdf|Database lecture notes]]<br />
||<br />
[[Media:Working_with_common_db_identifiers.pdf |Working with identifiers]]<br />
<br />
[[Media:Identifier_conversion_excercise.pdf |Exercise]]<br />
<br />
||[http://www.perl.org/get.html Install Perl]<br />
|-style="background: lightgrey; color: black"<br />
| ||Ian Donaldson||Ian Donaldson, Antonio Mora, Paul Boddie||Ian Donaldson, Antonio Mora, Paul Boddie<br />
|-<br />
| Tue. 22nd<br />
||[[Introductory Perl | Perl]]<br />
||[[Introductory Perl | More Perl]]<br />
||[[Introductory Perl | Perl lab]]<br />
<br />
|-<br />
<br />
| ||Antonio Mora||Antonio Mora||Antonio Mora, Paul Boddie<br />
<br />
|-style="background: lightgrey; color: black"<br />
<br />
| Wed. 23rd<br />
||[[Media:R_lecture.pdf | Introduction to R]]<br />
||[[Media:Intro_R_lab.pdf | R lab]]<br />
||R lab<br />
<br />
|-style="background: lightgrey; color: black"<br />
| ||Bjørn-Helge Mevik||Bjørn-Helge Mevik, Katerina Michalickova, Antonio Mora ||Bjørn-Helge Mevik, Katerina Michalickova, Antonio Mora <br />
|-<br />
| Thur. 24th<br />
||[[Media:Exploratory_Data_Analysis.pdf | Exploratory data analysis]] [http://bioinformatics.uio.no/wiki/Image:Exploratory_data_analysis_extra_materials.zip Extra material]<br />
<br />
[[analyse.r | R script]]<br />
||An introduction to statistical inference<br />
||[[Media:Multiple_hypothesis_testing.pdf | Multiple hypothesis testing]]<br />
|-<br />
| ||Anja Bråthen Kristoffersen||TBA||Clara-Cecilie Günter<br />
|-style="background: lightgrey; color: black"<br />
| Fri. 25th<br />
||[[Media:R-lab-sn.pdf | Microarray data analysis]]<br />
||Microarray data lab<br />
||[[Media:Working_with_Gene_Lists_and_Over-representation_analysis.pdf | Gene lists and ORA]]<br />
<br />
[[Media:GO_DAVID_and_ORA_lab.pdf | Lab]]<br />
<br />
|-style="background: lightgrey; color: black"<br />
| ||Ståle Nygård||Ståle Nygård||Ian Donaldson<br />
|-<br />
| ||||||<br />
|-<br />
| ||||||<br />
|-<br />
| align="center" style="background:grey; color: white" colspan="4"|'''Week 2: Monday, November 28th - Friday, December 2nd'''<br />
|-style="background: steelblue; color: black"<br />
| ||Session 1 ||Session 2||Session 3<br />
|-<br />
| ||09:00 – 10:45 ||11:00 – 12:45 ||14:00 – 15:45 <br />
|-style="background: lightgrey; color: black"<br />
| Mon. 28th<br />
||[[Media:Interaction_data_resources.pdf | Interaction data resources]]<br />
||[http://wiki.cytoscape.org/Presentations/Basic Cytoscape lab]<br />
||[[iRefScape | Cytoscape plugin lab]]<br />
<br />
|-style="background: lightgrey; color: black"<br />
| ||Ian Donaldson||Ian Donaldson, Antonio Mora, Paul Boddie||Ian Donaldson, Antonio Mora, Paul Boddie<br />
|-<br />
| Tue. 29th ||High-throughput sequencing||High-throughput sequencing lab||<br />
|-<br />
| ||Robert Lyle||Robert Lyle||<br />
|-style="background: lightgrey; color: black"<br />
| Wed. 30th||Searching sequence databases and multiple sequence alignments||Motif scanning and discovery in DNA||Sequence lab<br />
|-style="background: lightgrey; color: black"<br />
| ||Torbjørn Rognes||Geir Sandve||Geir Sandve<br />
|-<br />
| Thur. 1st||Structural biology review, PyMOL and installing PyMOL||Structural biology tools, predictors and 3D modelling||PyMOL and structural biology tutorial<br />
|-<br />
| ||Jon K. Laerdahl||Jon K. Laerdahl||Jon K. Laerdahl<br />
|-style="background: lightgrey; color: black"<br />
| Fri. 2nd||Modeling guide||Modeling exercises||Homology modeling exercise<br />
|-style="background: lightgrey; color: black"<br />
| ||Jon K. Laerdahl||Jon K. Laerdahl||Jon K. Laerdahl<br />
|-<br />
| <br />
|}<br />
<br />
== Written assignment ==<br />
<br />
<br />
Students enrolled in MBV-INF 4410 or MBV-INF 9410 must complete a written assignment as part of the course requirements.<br />
<br />
The assignment is due by Friday, December 16th. It should be emailed to ian.donaldson at biotek.uio.no, preferably as a PDF document (Microsoft Word or OpenOffice documents are also acceptable). The assignment should be between 10 and 20 pages (2000 to 4000 words). This is a rough guide: I won't be counting pages and words, and quality and conciseness count more than quantity.<br />
<br />
Topics include:<br />
<br />
<br />
1) Write an explanation of three or more methods that were covered in the course. These should be simple explanations aimed at someone approaching the topic for the first time. Your explanation may include derivations of equations (if they are clearly explained), figures or tables. Use examples. Describe how the concept can be applied to a problem in biological research and what limitations the method has. List any resources you use, as well as references to additional material that a student might use to follow up on the topic. Please indicate whether your material may be used on the course's wiki page.<br />
<br />
<br />
2) Describe how you would use two or more of the methods covered in the course in your own research. Your proposal may include figures or tables. Give a short introduction to your problem area, clearly state your hypothesis and explain how you think it might be addressed by each of the methods. Provide justifications for your proposal as well as the expected outcome. Describe potential risks (say, the method provides no meaningful results) and what you would do to mitigate them. List any resources you use.<br />
<br />
<br />
3) You may define your own alternative topic. Please send an email to ian.donaldson at biotek.uio.no to have your topic approved first.<br />
<br />
== Exam ==<br />
<br />
<pre><br />
Please note:<br />
<br />
The exam for this course will be a one-week take-home exam.<br />
<br />
The exam will be emailed to candidates on Monday, December 5th before 5 PM.<br />
<br />
The exam must be emailed back by Monday, December 12th at 5 PM to<br />
Torill Rortveit (torill.rortveit@imbv.uio.no) as a single PDF document<br />
(Microsoft Word or OpenOffice documents are also acceptable). The document<br />
should be named with the course code and your candidate number only<br />
(e.g. MBV-INF 4410-1.pdf). Do not put your name in the document.<br />
<br />
</pre><br />
<br />
==Bioinformatics links relevant to the course==<br />
<br />
{|class="wikitable" style="text-align:left" border="1" cellpadding="5"<br />
|+ '''Bioinformatics Links 2011'''<br />
!width="20%"|Name<br />
!width="40%"|URL<br />
!width="40%"|Description<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"|Statistics<br />
|-<br />
|StatSoft textbook || http://statsoft.com/textbook/stathome.html || Good overview of methods and concepts<br />
|-<br />
|SAS manuals || http://support.sas.com/onlinedoc/913/ || Thorough overview of analysis procedures found in SAS<br />
|-<br />
|GraphPad || http://graphpad.com/help/prism5/prism5help.html?usingstatistical_analyses_step_by_s.htm || See the GraphPad statistical guide for easy introductions to many concepts in statistics<br />
|-<br />
| R || http://cran.r-project.org/ || The Comprehensive R Archive Network <br />
|-<br />
| <br />
Introduction to R<br />
<br />
Exploratory data analysis<br />
<br />
Hypothesis testing<br />
<br />
|| http://bioinformatics.ca/workshops/2009/course-content <br />
|| See the CBW course on "Exploratory Data Analysis Essential Statistics using R" at the bottom of this page. Slides and lecture recordings from Modules 1-3 cover much of the same material covered in the first two days of this course.<br />
|-<br />
|Learning R || http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.1000482 || A Quick Guide to Teaching R Programming to Computational Biology Students<br />
|-<br />
|R reference card || http://cran.r-project.org/doc/contrib/Short-refcard.pdf || A reference card for R syntax<br />
|-<br />
|UCLA tutorials || http://www.ats.ucla.edu/stat/r/ || Useful example code while learning R<br />
|-<br />
|R vs. other languages || http://www.johndcook.com/R_language_for_programmers.html || A brief description of how R differs from other programming languages. Useful if you already know a programming language.<br />
|-<br />
|Which test do I use? || http://www.practicalstats.com/which/index.html <br />
||An interactive guide to choosing which statistical test to use.<br />
|-<br />
|EMBNet Microarray course || http://vit-embnet.unil.ch/CoursEMBnet/Arrays06/Material.html || An online course with R/Bioconductor example tutorials<br />
|-<br />
|EMBNet tutorials || http://www.ch.embnet.org/pages/courses2.html || Other helpful tutorials with R examples related to biostatistics<br />
|-<br />
|SIB tutorials || http://edu.isb-sib.ch/ || Portal to the SIB bioinformatics tutorials. Includes R, Unix, Perl, statistics and lots more.<br />
|-<br />
|CBW tutorials || http://bioinformatics.ca || Portal to the Canadian Bioinformatics Workshop material. Includes R, statistics and lots more.<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"|Microarrays<br />
|-<br />
|NMC || http://www.mikromatrise.no/ || Norwegian Microarray Consortium<br />
|-<br />
| MACF || http://core.rr-research.no/ || UiO MicroArray Facility<br />
|-<br />
|Bioinformatics core facility || http://core.rr-research.no/index.php?section=3 || The Bioinformatics Core Facility established at Rikshospitalet-Radiumhospitalet (RR) will provide its users at RR and the University of Oslo with a range of services within bioinformatics, including analysis of DNA and protein sequences, analysis of microarray data, protein structure analysis and access to useful databases and web services<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"|Sequence<br />
|-<br />
|RSAT || [http://rsat.ulb.ac.be/rsat/ RSAT] || A collection of several motif-related tools<br />
|-<br />
|ConTra|| [http://bioit.dmbr.ugent.be/ConTra/index.php ConTra] || A more user-friendly tool for matching a collection of Position Weighted Matrices against promoter sequences across species<br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"| Structure <br />
|-<br />
|style="background:SteelBlue; color:white" colspan="3" align="center"| Networks <br />
|-<br />
|Required files || [http://bioinformatics.uio.no/wikifiles//images/8/8c/SAMPLE_1_network.xls.zip Sample network] [http://bioinformatics.uio.no/wikifiles//images/c/c2/SAMPLE_1_nodeatribute.xls.zip Node attributes]|| Please download these files and uncompress them before use.<br />
|-<br />
|Cytoscape || http://www.cytoscape.org/ || Cytoscape home page <br />
|-<br />
|Cytoscape||http://cytoscape.org/cgi-bin/moin.cgi/Presentations|| Tutorial on Cytoscape <br />
|-<br />
|Cytoscape||http://cytoscape.wodaklab.org/wiki/How_to_increase_memory_for_Cytoscape||Solution to a common problem when working with large networks in Cytoscape.<br />
|-<br />
|Alternative source for the RUAL.sif data||http://chianti.ucsd.edu/cytoscape-data/<br />
|-<br />
|Alternative source for the node attribute data||http://chianti.ucsd.edu/svn/cyto_web/branches/initial/tut/filters.editor/<br />
|-<br />
|iRefIndex Cytoscape plugin || http://irefindex.uio.no/wiki/README_Cytoscape_plugin_0.9x || iRefIndex installation <br />
|-<br />
|iRefIndex || http://irefindex.uio.no || iRefIndex wiki<br />
|-<br />
|iRefIndex publication || http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=18823568 || Full details of iRefIndex<br />
|-<br />
|DAVID || http://www.nature.com/nprot/journal/v4/n1/pdf/nprot.2008.211.pdf || Nature Protocols paper<br />
|-<br />
|GSEA || http://www.broadinstitute.org/gsea/doc/subramanian_tamayo_gsea_pnas.pdf || PNAS paper<br />
|-<br />
|GSEA || http://www.broadinstitute.org/gsea || Gene Set Enrichment Analysis application<br />
|-<br />
|}<br />
<br />
==Bioinformatics Mailing Lists==<br />
<br />
If you are interested in being informed of future courses, talks and news related to bioinformatics in the Oslo region, consider signing up for the cbo mailing list.<br />
<br />
[https://sympa.uio.no/usit.uio.no/info/cbo-all https://sympa.uio.no/usit.uio.no/info/cbo-all]<br />
<br />
You might also consider the nationwide Norwegian bioinformatics email list. You can sign up at<br />
<br />
[http://mailman.uib.no/listinfo/bioinfo.users http://mailman.uib.no/listinfo/bioinfo.users].<br />
<br />
Both lists are run by members of the [http://www.bioinfo.no/about Norwegian Bioinformatics Platform].<br />
<br />
<br />
==Archived courses==<br />
<br />
http://donaldson.uio.no/wiki/Bioinformatics_for_molecular_biology_2010<br />
<br />
http://donaldson.uio.no/wiki/Bioinformatics_for_molecular_biology_2009<br />
<br />
==Wiki help==<br />
<br />
http://excel2wiki.net/index.php <br />
<br />
http://en.wikipedia.org/wiki/Help:Table</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex&diff=4017
iRefIndex
2011-11-25T12:52:52Z
<p>PaulBoddie: Added new sources.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div style="float:right"><br />
<facebook-like /><br />
</div><br />
<br />
<div class="floatleft"><br />
<imagemap><br />
Image:iRefIndex_logo.png|120x120px<br />
default [[iRefIndex#A_reference_index_for_protein_interaction_data]]<br />
</imagemap><br />
</div><br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, InnateDB, IntAct, MatrixDB, MINT, MPact, MPIDB, MPPI and OPHID. <br />
<br />
The iRefScape plugin for Cytoscape has been published recently [http://www.biomedcentral.com/1471-2105/12/388 here].<br />
<br />
[[iRefIndex#A_reference_index_for_protein_interaction_data|Read more]]<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:Document-save-80x80.png<br />
default [[README_MITAB2.6_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[README_MITAB2.6_for_iRefIndex|Download]] ===<br />
<br />
Download the current iRefIndex release in PSI-MITAB tab-delimited format via FTP.<br />
|<imagemap><br />
Image:Accessories-text-editor-80x80.png<br />
default [[iRefIndex_Release_Notes]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Release Notes]] ===<br />
<br />
View release notes and news for each release of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px<br />
default [[iRefIndex_Citations]]<br />
</imagemap><br />
| style="vertical-align: top" colspan="3" |<br />
=== [[iRefIndex_Citations | Publications, citing, citations and further reading]] ===<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:CytoscapeLogo.png|80x80px<br />
default [[iRefScape]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefScape|iRefScape]] ===<br />
<br />
iRefScape is a plugin for [http://www.cytoscape.org/ Cytoscape] that exposes iRefIndex data as a navigable graphical network.<br />
|<imagemap><br />
Image:firefox.png|80x80px<br />
default [http://wodaklab.org/iRefWeb/]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [http://wodaklab.org/iRefWeb/ iRefWeb] ===<br />
<br />
iRefWeb provides a searchable web interface to the iRefIndex. This interface was developed as part of a collaboration with the Wodak group at the Hospital for Sick Children in Toronto, Canada.<br />
|-<br />
|<imagemap><br />
Image:R-logo.jpg|80x80px<br />
default [[iRefR]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefR]] ===<br />
<br />
An R package providing access to iRefIndex data.<br />
|<imagemap><br />
Image:Applications-internet-80x80.png<br />
default [[README_PSICQUIC_web_services_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
<br />
=== [[PSICQUIC|Web services]] ===<br />
<br />
iRefIndex PSICQUIC and PSISCORE web services are now running on release 9.0 of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:Youtube-256.png|80x80px<br />
default [[iRefIndex_Videos]]<br />
</imagemap> <br />
| style="vertical-align: top" |<br />
<br />
=== [[iRefIndex Videos|iRefIndex videos]] ===<br />
<br />
Video learning materials for iRefIndex, iRefScape and iRefWeb.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [[iRefIndex#Contact and mailing list]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex#Contact and mailing list|Contact information and mailing list]] ===<br />
<br />
How to get in touch with the developers.<br />
|-<br />
|<imagemap><br />
Image:Emblem-notice-80x80.png<br />
default [[Sources_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Sources_iRefIndex|Source data information]] ===<br />
<br />
Details of all the different source databases that provide the foundation for iRefIndex.<br />
|<imagemap><br />
Image:X-office-spreadsheet-80x80.png<br />
default [[Statistics_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Statistics_iRefIndex|Statistical information]] ===<br />
<br />
Statistics for the current iRefIndex release.<br />
|}<br />
<br />
== Technical information on the iRefIndex database ==<br />
<br />
Build process: [[iRefIndex Manual]]<br />
<br />
Feedback files: [[README iRefIndex Feedback]]<br />
<br />
Mapping files: [[Protein identifier mapping]]<br />
<br />
Normalization of MI CV terms: [[Mapping of terms to MI term ids - iRefIndex]]<br />
<br />
Canonicalization: [[Canonicalization]]<br />
<br />
Disease Groups: [[DiG: Disease groups]]<br />
<br />
All iRefIndex pages and archived releases: [[iRefIndex#All_iRefIndex_Pages|see below]]<br />
<br />
License and disclaimer: [[iRefIndex#License_and_disclaimer|see below]]<br />
<br />
----<br />
<br />
== A reference index for protein interaction data ==<br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including [http://bond.unleashedinformatics.com/ BIND], [http://www.thebiogrid.org/ BioGRID], [http://mips.gsf.de/genre/proj/corum/index.html CORUM], [http://dip.doe-mbi.ucla.edu/ DIP], [http://www.hprd.org/ HPRD], [http://www.ebi.ac.uk/intact/site/index.jsf IntAct], [http://mint.bio.uniroma2.it/mint/Welcome.do MINT], [http://mips.gsf.de/genre/proj/mpact MPact], [http://mips.gsf.de/proj/ppi/ MPPI] and [http://ophid.utoronto.ca/ OPHID]. The index covers several interaction types, including physical and genetic interactions (genetic interactions being mapped to their corresponding protein products), determined by many different methods. Users can search for a protein and retrieve a non-redundant list of its interactors.<br />
<br />
iRefIndex assigns a globally unique identifier, the RIGID (which looks like <tt>tjWXXjgPyHyT2J6EwED8zK2x18U</tt>), to interactions that are identical according to the sequences and taxonomy identifiers of their interactors. It also assigns similar-looking keys (ROGIDs) to protein interactors. These keys are global: anyone can generate them using the method described in the iRefIndex paper (PMID 18823568). Users can therefore integrate their own data with iRefIndex in a way that ensures proteins with exactly the same sequence are represented only once.<br />
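Because the keys are computed rather than assigned, they can be reproduced outside iRefIndex. The Python below is a minimal, unofficial sketch that assumes the SEGUID convention (the SHA-1 digest of the upper-case sequence, Base64-encoded with the trailing <tt>=</tt> padding removed, giving 27 characters), a ROGID formed by appending the taxonomy identifier to that key, and a RIGID formed from the interactors' ROGIDs sorted and concatenated. The function names are hypothetical, and whether the concatenated ROGIDs are upper-cased before hashing is not stated here, so check PMID 18823568 before relying on it.

```python
import base64
import hashlib

def sha1_key(text):
    """Base64-encoded SHA-1 digest with '=' padding stripped.
    A 20-byte digest always yields 27 characters."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")

def make_rogid(sequence, taxid):
    """Assumed ROGID layout: key of the upper-cased sequence, then the taxon id."""
    return sha1_key(sequence.upper()) + str(taxid)

def make_rigid(rogids):
    """Assumed RIGID layout: key of the interactors' ROGIDs, sorted and
    concatenated (as-is), so the result does not depend on interactor order."""
    return sha1_key("".join(sorted(rogids)))
```

Sorting before hashing is what makes the interaction key independent of the order in which a source database lists the interactors.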
<br />
== Publications and further reading ==<br />
<br />
iRefIndex-related publications, references for source databases and works citing and using the iRefIndex are provided on the [[iRefIndex Citations]] page.<br />
<br />
----<br />
<br />
== Long term goals of the iRefIndex project ==<br />
<br />
We believe that protein interaction data hold incredible potential for biomedical research. Presently, these data are collected and archived by multiple groups around the world and the number of groups taking part in this work is growing rather than diminishing. <br />
<br />
It is therefore important that these databases can exchange and compare data effectively, and that they curate and represent data using similar standards, so that their data are accessible and can be used effectively.<br />
<br />
To this end, the iRefIndex project has three long term objectives:<br />
<br />
;1) to facilitate exchange of interaction data between interaction databases. <br />
<br />
:The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.<br />
<br />
;2) to consolidate interaction data from multiple sources. <br />
<br />
:The method has been used to index interaction records from multiple sources. The resulting iRefIndex may be used to search for the existence of interaction data for any protein regardless of the original resource. Nine interaction databases have been incorporated so far, and others will follow.<br />
<br />
;3) to provide feedback to source interaction databases. <br />
<br />
:During data consolidation, iRefIndex uses a sophisticated method to keep track of potential problems with source records, such as outdated protein identifiers, protein identifiers that could not be found, or incorrectly assigned taxonomy identifiers. These problems are reported to the source interaction databases in feedback files, which can lead to corrections or clarifications in the source records or to improvements in our own system. This process will help to harmonize data representation, improve the overall quality of interaction records in all source databases, and help source databases to exchange data with one another.<br />
<br />
== iRefIndex availability ==<br />
<br />
iRefIndex is made available in a number of formats: MITAB tab-delimited text files, iRefWeb interface, iRefScape plugin for Cytoscape, PSICQUIC Web services, and an interface for the R programming language environment. See the links at the top of this page. For the license and disclaimer, [[iRefIndex#License_and_disclaimer | see below]].<br />
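For the MITAB text files, a plain tab-delimited reader is enough to get started. The sketch below is an illustration rather than a documented parser: it assumes the file begins with a single header line whose first column name carries a leading <tt>#</tt> (for example <tt>#uidA</tt>), and the column names in the usage lines are assumptions. See [[README_MITAB2.6_for_iRefIndex]] for the actual column list.

```python
import csv
import io

def read_mitab(stream):
    """Yield one dict per interaction row from a tab-delimited MITAB stream.
    Assumes a single header line; a leading '#' on it is stripped."""
    header = stream.readline().rstrip("\r\n").split("\t")
    if header and header[0].startswith("#"):
        header[0] = header[0][1:]
    # QUOTE_NONE keeps values such as psi-mi:"MI:0018"(two hybrid) intact.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield dict(zip(header, row))

if __name__ == "__main__":
    sample = io.StringIO("#uidA\tuidB\nuniprotkb:P1\tuniprotkb:P2\n")
    for record in read_mitab(sample):
        print(record["uidA"], record["uidB"])
```

Disabling CSV quoting is deliberate: MITAB fields contain literal double quotes, which a quote-aware reader would try to interpret.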
<br />
== Credits and collaborations ==<br />
<br />
'''Sabry Razick and Ian Donaldson''' developed iRefIndex at the Biotechnology Centre of Oslo, University of Oslo. <br />
<br />
'''Paul Boddie''' provides ongoing maintenance and development.<br />
<br />
'''George Magklaras''' provides systems engineering support and [http://www.no.embnet.org/ EMBNet Norway] provided hardware support.<br />
<br />
'''Antonio Mora''' developed [[iRefR|iRefR]].<br />
<br />
'''Katerina Michalickova''' developed [[DiG:_Disease_groups|Disease groups]].<br />
<br />
'''Brian Turner and Andrei Turinsky''' from the [http://wodaklab.org/ws/ Wodak group] at the Hospital for Sick Children in Toronto, Canada developed the [http://wodaklab.org/iRefWeb/ iRefWeb interface].<br />
<br />
<imagemap><br />
Image:IMEx_logo_webmedium.jpg|100x100px<br />
default [http://www.psimex.org]<br />
</imagemap> <br />
<br />
iRefIndex is a PSIMEx partner: http://www.psimex.org<br />
<br />
<!--<br />
=== iRefWeb in the NCBI LinkOut programme ===<br />
<br />
Many [http://www.ncbi.nlm.nih.gov/gene Entrez Gene] records provided by NCBI contain links to iRefWeb in the [http://www.ncbi.nlm.nih.gov/projects/linkout/index.html LinkOut] section, allowing users to consult iRefWeb for related protein interactions when browsing gene information. The software which exposes iRefIndex information to the LinkOut programme can be found on the [[iRefWeb LinkOut Generator]] page.<br />
<br />
--><br />
----<br />
<br />
== License and disclaimer ==<br />
<br />
Data released on the public FTP site is released under the [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution 2.5 Generic (CC BY 2.5) license].<br />
<br />
<imagemap><br />
Image:By-100x35.png<br />
default [http://creativecommons.org/licenses/by/2.5/]<br />
</imagemap><br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
----<br />
<br />
== Contact and mailing list ==<br />
<br />
Suggestions, requests and comments are welcome.<br />
<br />
<pre>ian.donaldson@biotek.uio.no</pre><br />
<br />
Full contact details are available at the [http://www.biotek.uio.no/english/research/groups/donaldson-group/ group home page].<br />
<br />
<imagemap><br />
Image:google-groups-logo.gif<br />
default [http://groups.google.com/group/irefindex]<br />
</imagemap><br />
<br />
See the [http://groups.google.com/group/irefindex iRefIndex Google Group] for announcements and discussion.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
See the [[:Category:iRefIndex|iRefIndex category]] for a listing of all iRefIndex-related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefScape_1.0&diff=4014
iRefScape 1.0
2011-11-24T23:24:48Z
<p>PaulBoddie: Moved the "like" button.</p>
<hr />
<div>__NOTOC__<br />
<br />
[[Image:NP_499166-NP_501526-iterations-1-400x278.png|right]]<br />
<br />
iRefScape is a plugin for Cytoscape that exposes iRefIndex data as a navigable graphical network.<br />
<br />
This page describes the iRefScape 1.0 plug-in for Cytoscape 2.8.x. See the [[#Compatibility_Information|compatibility information section]] for information on other versions.<br />
<br />
<br />
{|class="wikitable" style="text-align:left; clear:left; min-width:50%" border="0" cellpadding="10"<br />
| style="vertical-align: top" |<br />
== Installation ==<br />
<br />
See the [[#Installing_iRefScape|installation section]] for quick installation instructions and references to other documentation.<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[#Installing_iRefScape|installation section]]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Publication ==<br />
<br />
[http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex.]<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px|left<br />
default [http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Contact information and mailing list ==<br />
Join the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group] to be notified of updates. See also the [[iRefScape|latest release of iRefScape]], which may differ from the release described here.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [http://groups.google.com/group/irefindex?hl=en]<br />
</imagemap><br />
|}<br />
<br />
__TOC__<br />
<br />
== Compatibility Information ==<br />
<br />
See the following table for more detailed iRefScape compatibility information.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Cytoscape<br />
! align="center" style="background:#f0f0f0;"|iRefScape<br />
|-<br />
| 2.8.1, 2.8.2<br />
| iRefScape 1.0 (described on this page)<br />
|-<br />
| 2.7.0<br />
| [[iRefScape 0.9]]<br />
|-<br />
| 2.6.3<br />
| [[iRefScape 0.8]]<br />
|}<br />
<br />
== Installing iRefScape ==<br />
<br />
The plugin can be installed using Cytoscape's plugin menu. Select...<br />
<br />
# "Manage plugins"<br />
# "Available for Install"<br />
# "Network and Attribute I/O"<br />
# "iRefScape" (the entry names a specific version, such as "iRefScape 1.0")<br />
<br />
Then follow the on-screen instructions.<br />
<br />
{|class="wikitable" style="text-align:left; clear: left; border: 1px solid #cccccc" cellpadding="10"<br />
| style="vertical-align: top" |<br />
=== Installation Guide ===<br />
<br />
More detailed instructions, troubleshooting tips and alternative methods are available in the [[iRefScape 1.0 Installation|installation guide]].<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefScape 1.0 Installation|installation guide]]<br />
</imagemap><br />
|}<br />
<br />
After installation, select the "iRefScape" entry from Cytoscape's plugin menu.<br />
<br />
When the plugin is started for the first time, it will download the publicly available data set.<br />
<br />
=== Tested systems ===<br />
This version of the iRefScape plugin has been tested with the following system configurations:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
! style="background:#f0f0f0;" | Operating System<br />
! style="background:#f0f0f0;" | Java Version<br />
|-<br />
| Red Hat Enterprise Linux 5 (32-bit) (kernel 2.6.18)<br />
| 1.6.0_01 (32-bit)<br />
|-<br />
| Microsoft Windows 7 (64-bit)<br />
| 1.6.0_25 (64-bit)<br />
|-<br />
| Microsoft Windows Vista (32-bit)<br />
| 1.6.0 (32-bit)<br />
|-<br />
| Ubuntu Linux 8.04 (32-bit)<br />
| 1.6 (32-bit)<br />
|-<br />
| Mac OS X 10.6 (64-bit)<br />
| 1.6.0_15 (32-bit)<br />
|}<br />
<br />
Please refer to the [[iRefScape 1.0 Installation|installation guide]] for more details on system configuration issues.<br />
<br />
=== Source Code ===<br />
<br />
Since iRefScape is made available under version 3 or later of the [http://www.gnu.org/licenses/gpl.html GNU General Public License], the source code is also made available:<br />
<br />
* iRefScape 1.18:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/e12f853c5951 Source browser]<br />
* iRefScape 1.17:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/0001288b7527 Source browser]<br />
* iRefScape 1.16:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/3ade99fc92b6 Source browser]<br />
* [http://irefindex.uio.no/hg/iRefScape/ iRefScape repository home]<br />
<br />
Please consult the <tt>README.txt</tt> file in the source distribution for information on building the software.<br />
<br />
== Using the Wizard - an example search ==<br />
<br />
Click the "Wizard" button; a pop-up window will appear.<br />
<br />
Follow the prompts. Here is an example search:<br />
<br />
# Select "Search protein-protein interactions for a protein".<br />
# Select "UniProt identifier".<br />
# For "Taxonomy identifier", select "9606 (Human)".<br />
# Type <tt>QCR2_HUMAN</tt> in the provided space. Click "Next".<br />
# Click "Search & load".<br />
<!-- commenting these out since they are outdated<br />
The images below show each of the steps in the wizard.<br />
<br />
<gallery perrow="5"><br />
Image:IRefIndex-Cytoscape-Wizard.png|The iRefIndex wizard<br />
Image:IRefIndex-Cytoscape-Wizard-step2.png|Choosing a result type<br />
Image:IRefIndex-Cytoscape-Wizard-step3.png|Choosing a taxonomy type<br />
Image:IRefIndex-Cytoscape-Wizard-step4.png|Specifying the search term<br />
Image:IRefIndex-Cytoscape-Wizard-step5.png|Additional options<br />
</gallery><br />
--><br />
<br />
== Using the Search Panel ==<br />
<br />
A search involves the following steps:<br />
<br />
# Enter query term(s)<br />
# Select a search type<br />
# Select taxonomy/organism<br />
# Optionally, adjust search options (iterations, new view, canonical expansion)<br />
# Start the search<br />
<br />
=== Enter query term(s) ===<br />
<br />
Queries may be loaded from a file or pasted into the text box (one query per line). Multiple queries can also be separated by pipe characters (<tt>|</tt>) or tab characters. Queries containing spaces should be enclosed in double quotes.<br />
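The plugin's own query parser is not documented here. The hypothetical <tt>split_queries</tt> function below only illustrates the input rules just described (newlines, pipes and tabs separate queries; double quotes protect a query containing spaces), which can be handy for checking a query file before loading it.

```python
import re

# Hypothetical tokenizer: either a double-quoted phrase, or a run of
# characters that are neither separators (| tab newline) nor quotes.
_TOKEN = re.compile(r'"([^"]*)"|([^|\t\r\n"]+)')

def split_queries(text):
    """Split pasted query text into individual query terms."""
    queries = []
    for quoted, bare in _TOKEN.findall(text):
        term = quoted if quoted else bare.strip()
        if term:  # skip blank fragments such as stray spaces
            queries.append(term)
    return queries
```

For example, <tt>split_queries('QCR2_HUMAN|Q7KSF4\n"fanconi anemia"')</tt> yields three queries, the last one keeping its internal space.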
<br />
=== Select a search type ===<br />
<br />
Example searches are listed below.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Search Type<br />
! align="center" style="background:#f0f0f0;"|Example<br />
! align="center" style="background:#f0f0f0;"|Notes<br />
|-<br />
| <tt>RefSeq_Ac</tt>||<tt>NP_996224</tt>||See http://www.ncbi.nlm.nih.gov/protein/221379660<br />
|-<br />
| <tt>UniProt_Ac</tt>||<tt>Q7KSF4</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>UniProt_ID</tt>||<tt>Q7KSF4_DROME</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>geneID</tt>||<tt>42066</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>geneSymbol</tt>||<tt>cher</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>mass</tt>||<tt>72854<-->72866</tt>||Search protein interactors for a range of molecular mass (in Da).<br />
|-<br />
| <tt>rog</tt>||<tt>10121899</tt>||Redundant object group: iRefIndex's internal identifier for a protein. See the node attribute <tt>i.rog</tt>.<br />
|-<br />
| <tt>PMID</tt>||<tt>14605208</tt>||PubMed Identifier where an interaction is described. See http://www.ncbi.nlm.nih.gov/pubmed. Iterations and "Use canonical expansion" have no effect on this search type. This search will return all protein interactors in the given PMID and will automatically draw all interactions known between these proteins (even if these interactions are supported by different PMIDs). Select edges in the resulting graph, and see the i.PMID attribute in the Edge Attribute Browser.<br />
|-<br />
| <tt>src_intxn_id</tt>||<tt>EBI-212627</tt>||Source interaction database identifier. Iterations and "Use canonical expansion" have no effect on this search type. Caution: multiple databases may have overlapping interaction record identifiers (e.g. <tt>147805</tt> returns records from both BIND and BioGRID), and there is no way to limit this search to a specific database at this time.<br />
Equivalent interactions from other databases will be automatically retrieved using this search type (see provided example).<br />
|-<br />
| <tt>omim</tt>||<tt>227650</tt>||OMIM identifier. See http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=227650<br />
|-<br />
| <tt>digid</tt>||<tt>449</tt>||Internal identifier for a group of phenotypically related diseases. See [[DiG: Disease groups]]. A digid can be found by first searching for an OMIM identifier; the digid will then appear as the <tt>i.digid</tt> node attribute.<br />
|-<br />
|style="background:#f0f0f0;" colspan="3" align="center"| Additional search types: enable these first under Advanced features/Preferences.<br />
|-<br />
| <tt>dig_title</tt>||<tt>fanconi</tt>||Non-exact text search of OMIM titles. Select matching titles from the Query Helper and press return to copy titles to search box. Then hit "Search and load". See [[DiG: Disease groups]].<br />
|-<br />
| <tt>ROGID</tt>||<tt>5IrM14EfdlehbVJ0WAcAoQM3pFw9606</tt>||Exact search for the ROGID of a protein. This searches the <tt>i.rogid_TOP</tt> node attribute. Users can also generate a ROGID for an amino acid sequence and taxonomy identifier pair using the Wizard/Create SEGUID/ROGID for sequence tool. See PMID 18823568.<br />
|-<br />
| <tt>RIGID</tt>||<tt>cXAoT7JjMde7J+CN/2tOR6gETyA</tt>||Exact search for the RIGID of an interaction. This searches the <tt>i.rigid</tt> edge attribute. See PMID 18823568.<br />
|-<br />
|}<br />
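The <tt>mass</tt> search type takes a range written as <tt>low<-->high</tt> (in Da), as in the table's example. A small helper for building such queries, and for checking a node's <tt>i.mass</tt> against one, might look like the sketch below; the helper names are hypothetical, and treating both bounds as inclusive is an assumption rather than documented plugin behaviour.

```python
def mass_query(low, high):
    """Build a mass-range query string in the low<-->high form (values in Da)."""
    if low > high:
        low, high = high, low  # tolerate reversed bounds
    return f"{int(low)}<-->{int(high)}"

def in_mass_range(query, mass):
    """Check a mass against a low<-->high query (both bounds assumed inclusive)."""
    low, high = (int(part) for part in query.split("<-->"))
    return low <= mass <= high
```

<tt>mass_query(72854, 72866)</tt> reproduces the example query from the table.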
<br />
=== Select taxonomy/organism ===<br />
<br />
This will limit the search results to a particular organism. An organism can be selected from the list, or a taxonomy identifier can be entered into the field itself. See [http://www.ncbi.nlm.nih.gov/taxonomy Entrez Taxonomy] for more details on taxonomy identifiers. For most search types, it is acceptable to leave this field set to <tt>Any</tt>.<br />
<br />
=== Adjust search options ===<br />
<br />
The following optional adjustments can be made:<br />
<br />
==== Iterations ====<br />
<br />
The search can be extended to a given distance from the members of the query list:<br />
<br />
* Selecting <tt>0</tt> will return only interactions between nodes found by the query list<br />
* Selecting <tt>1</tt> will return immediate neighbours of nodes in the query list<br />
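The iterations setting behaves like a bounded breadth-first search. The sketch below assumes an undirected graph stored as a hypothetical adjacency dictionary (every node is a key) and keeps every edge whose two endpoints fall within the requested distance; iRefScape's actual edge selection for neighbour-to-neighbour links may differ. The returned distances play a role similar to the <tt>i.order</tt> node attribute.

```python
from collections import deque

def expand_network(adjacency, query, iterations):
    """Collect nodes within `iterations` hops of the query nodes (with their
    distance), then keep every edge whose endpoints are both collected."""
    seen = {node: 0 for node in query if node in adjacency}
    frontier = deque(seen)
    while frontier:
        node = frontier.popleft()
        if seen[node] == iterations:
            continue  # do not expand beyond the requested distance
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                frontier.append(neighbour)
    edges = {tuple(sorted((a, b))) for a in seen for b in adjacency[a] if b in seen}
    return seen, edges
```

With <tt>iterations=0</tt> only links among the query nodes survive; with <tt>iterations=1</tt> their immediate neighbours are added.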
<br />
==== Create new view ====<br />
<br />
A new view will be opened for the search results if this option is selected. Otherwise, the results will be added to the current view.<br />
<br />
==== Use canonical expansion ====<br />
<br />
Selecting this option will expand the search to include all proteins that are related to the query protein (for example, splice isoforms). See [[Canonicalization]] for technical details.<br />
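Conceptually, canonical expansion replaces each query protein with every protein sharing its canonical group. The sketch below assumes a hypothetical mapping from each protein's <tt>i.rog</tt> to its <tt>i.canonical_rog</tt>; proteins missing from the mapping are simply kept as they are.

```python
from collections import defaultdict

def canonical_expansion(canonical_rog, query):
    """Expand query rogs to all rogs sharing the same canonical group.
    `canonical_rog` maps rog -> canonical rog (hypothetical input)."""
    groups = defaultdict(set)
    for rog, canon in canonical_rog.items():
        groups[canon].add(rog)
    expanded = set()
    for rog in query:
        # Unknown proteins fall back to themselves.
        expanded |= groups.get(canonical_rog.get(rog), {rog})
    return expanded
```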
<br />
=== Start the search ===<br />
<br />
Press the "Search and load" button to perform the search.<br />
<br />
{{Note|<br />
See the [[iRefScape Batch Files]] document for information on using text files to describe searches, annotate result nodes and to define new search types using user-supplied data.<br />
}}<br />
<br />
== Viewing the Results ==<br />
<br />
=== Colours and Shapes ===<br />
<br />
* Blue nodes correspond to proteins found by your query<br />
* Green nodes are interacting partners for your query protein<br />
* Purple hexagons are complex-nodes (also called pseudo-nodes); they keep the members of a complex together (e.g. QCR6_HUMAN is found in two complexes that also involve QCR2_HUMAN)<br />
* Orange-yellow edges indicate protein-protein interactions and pink edges represent membership of some protein in a complex<br />
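One way to picture the pseudo-node convention: a record with exactly two participants becomes a direct interaction edge, while a record with more participants becomes a pseudo-node joined to each member by a membership edge. The sketch below is a hypothetical data model, not iRefScape's internal one; the <tt>complex:</tt> node naming and the edge labels are invented for illustration.

```python
def record_to_graph(record_id, members):
    """Return (nodes, edges) for one interaction record.
    Binary records become one 'interaction' edge; other records get a
    pseudo-node with one 'membership' edge per member."""
    members = list(members)
    if len(members) == 2:
        a, b = members
        return set(members), [(a, b, "interaction")]
    hub = f"complex:{record_id}"  # hypothetical pseudo-node name
    edges = [(hub, member, "membership") for member in members]
    return set(members) | {hub}, edges
```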
<br />
=== Toggling Edges ===<br />
<br />
Multiple edges may appear between two nodes. Each represents a separate interaction record that supports the link. Details of each original record can be viewed using the edge attribute browser (described below). You can switch between this multi-edge view and a collapsed view by selecting "Toggle selected multi-edges" in the iRefScape/View Tools menu; only one of the edges is shown in the collapsed view.<br />
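The collapsed view amounts to grouping parallel edges by node pair. The sketch below assumes edges are given as hypothetical <tt>(source, target, record_id)</tt> tuples and treats links as undirected; a collapsed display would draw one edge per key while the grouped record identifiers remain available for inspection.

```python
from collections import defaultdict

def collapse_multi_edges(edges):
    """Group parallel edges by unordered node pair.
    Returns {pair: [record ids]}; a collapsed view draws one edge per pair."""
    grouped = defaultdict(list)
    for source, target, record_id in edges:
        pair = tuple(sorted((source, target)))
        grouped[pair].append(record_id)
    return dict(grouped)
```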
<br />
=== iRefScape Menu ===<br />
<br />
The iRefScape menu in the Cytoscape menu bar contains a number of other functions that may help with searching and viewing interaction data. These are described in more detail in the [[iRefScape plugin menu]] document.<br />
<br />
=== Expanding the Interaction Map ===<br />
<br />
You can search for additional interactions by right-clicking on a node and selecting "iRefIndex -- Retrieve interactions".<br />
<br />
Some example result displays are shown below.<br />
<br />
<gallery widths="500px" heights="300px"><br />
Image:QCR2_HUMAN_initial.png|Results<br />
Image:QCR2_HUMAN.png|Results (tidied)<br />
</gallery><br />
<br />
== Attributes ==<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-closed.png|right|The node attributes menu]]<br />
<br />
iRefIndex provides two types of attributes: node attributes and edge attributes. These can be used to view information about selected nodes or edges (such as <tt>i.taxid</tt>). Some attributes let the user link out to additional data sources through the right-click menu (such as <tt>i.geneID</tt>). Attributes can also be used to sort and select nodes and edges with specific values (such as <tt>i.order</tt>). The <tt>i.query</tt> attribute shows the user query responsible for returning the node or edge.<br />
<br />
Brief descriptions and examples of each attribute are provided below. <br />
<br />
The user must first select the attributes that are to be displayed. This can be done by clicking on the "attribute" icon at the top of the node or edge attribute browser, as shown in the illustrative images.<br />
<br />
<div style="clear: right"></div><br />
=== Node Attributes ===<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-open.png|right|The open node attributes menu]]<br />
<br />
Each node represents a distinct amino acid sequence (protein) from a distinct organism (taxonomy identifier). Each of the attributes below provides additional information about the node. Although each node is distinct, a graph produced by iRefIndex may contain multiple nodes that are related proteins (such as splice isoform products from the same gene). These nodes will all have the same <tt>i.canonical_rog</tt> and <tt>i.canonical_rogid</tt> values. See the notes below.<br />
<br />
Node attributes that can be lists of items (like <tt>i.UniProt</tt>) will have a corresponding attribute called <tt>i.''attribute name''_TOP</tt> (for example, <tt>i.UniProt_TOP</tt>) which provides the first item of the associated list.<br />
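The <tt>_TOP</tt> convention can be expressed as a simple derivation over a node's attributes. The sketch below assumes a node is a hypothetical dictionary of attribute names to values; how the plugin handles empty lists is not stated, so this version simply omits <tt>_TOP</tt> for them.

```python
def add_top_attributes(attributes):
    """Return a copy of a node's attributes with NAME_TOP set to the first
    item of each non-empty list-valued attribute NAME."""
    result = dict(attributes)
    for name, value in attributes.items():
        if isinstance(value, list) and value:
            result[f"{name}_TOP"] = value[0]
    return result
```

For example, a node with <tt>i.UniProt_Ac</tt> set to <tt>[Q7KSF4]</tt> gains <tt>i.UniProt_Ac_TOP</tt> set to <tt>Q7KSF4</tt>.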
<br />
<div style="clear: right"></div><br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence from a distinct taxonomy identifier. See also <tt>i.rog</tt> and <tt>i.rogid</tt>.<br />
|-<br />
| <tt>canonicalName</tt>||Integer||<tt>10121899</tt>||This is the same as <tt>ID</tt>. This attribute is set by Cytoscape and is unrelated to the <tt>i.canonical_rog</tt> or <tt>i.canonical_rogid</tt> used by iRefIndex.<br />
|-<br />
| <tt>i.RefSeq_Ac</tt>||List||<tt>[NP_996224]</tt> ||All RefSeq accessions with an amino acid sequence and taxon identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[RefSeq_Ac]'' on the web -- Entrez -- Protein" for more information. See also <tt>i.RefSeq_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_Ac</tt>||List||<tt>[Q7KSF4]</tt>||All UniProt accessions with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_Ac]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_Ac_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_ID</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||All UniProt identifiers with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_ID]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_ID_TOP</tt> for the first entry in this list of IDs.<br />
|-<br />
| <tt>i.canonical_rog</tt>||Integer||<tt>10121899</tt>||Related proteins (for example, splice isoforms from the same gene) all belong to the same canonical group. One member of this group is assigned as its canonical representative. The <tt>i.canonical_rog</tt> attribute gives the identifier of the protein's canonical group. For example, all products of Entrez Gene 42066 have the same <tt>i.canonical_rog</tt> (<tt>10121899</tt>). Each of these gene products has its own identifier (because they each have a distinct amino acid sequence). One of the splice isoforms (<tt>NP_996224</tt>) was chosen as the canonical representative of this group. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen.<br />
|-<br />
| <tt>i.canonical_rogid</tt>||String||<tt>1ZFb1WlW0OgOlhiAPtkJTdb6oOg7227</tt>||This is a unique key for the canonical representative of the canonical group to which this node belongs. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27-character key, and this is prepended to the taxonomy identifier for the protein's source organism to form the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.canonical_rog</tt> attribute. All <tt>i.canonical_rog</tt> instances (each being an integer) have one corresponding <tt>i.canonical_rogid</tt>. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen. Note that the rogid for the protein represented by this specific node is listed under <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.dataset</tt>||Integer||<tt>0</tt>||In batch query mode, this can be used to locate the query batch (i.e. which group of queries was responsible for the node). In single query mode, when a sequence of queries is issued one after another, this value can be used to distinguish the results of each step. Nodes with an <tt>i.dataset</tt> value higher than 999 were found by more than one batch of queries.<br />
|-<br />
| <tt>i.digid</tt>||List||<tt>449</tt>||This is an integer identifier that is shared by a group of disease entries in OMIM that are related by their titles. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.dig_title</tt>.<br />
|-<br />
| <tt>i.dig_title</tt>||List||<tt>[Fanconi anemia, complementation group B, 300514 (3), VACTERL association with hydrocephalus, X-linked, 314390 (3)]</tt>||These are entries from OMIM's Morbid Map that are all part of the same disease group. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.digid</tt>.<br />
|-<br />
| <tt>i.displayLabel</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||This is a list of short labels chosen by iRefIndex to label the node using the VizMapper. The UniProt identifier is preferentially chosen (if one is available) followed by the Entrez Gene Symbol. See also <tt>i.displayLabel_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneID</tt>||List||<tt>[42066]</tt>||All NCBI Entrez Gene identifiers that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneID]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneID_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneSymbol</tt>||List||<tt>[CHER]</tt>||All NCBI Entrez Gene official symbols that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneSymbol]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneSymbol_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.interactor_description</tt>||List||<tt>[Q7KSF4_DROME, CHER, DMEL_CG3937, SKO, DMEL CG3937, FLN, CG3937, CHER, DMEL\\CG3937, FLN, SKO, CHER, NAME=CHER, DMEL_CG3937]</tt>||A collection of all the names in their short form as given by the original interaction databases. See also <tt>i.interactor_description_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.mass</tt>||Integer|| <tt>259142</tt> ||Mass (in Da) associated with the protein sequence for this node, taken from UniProt if available. You can search for nodes within a mass range using the <tt>mass</tt> search type in the iRefIndex plugin.<br />
|-<br />
| <tt>i.omim</tt>||List||<tt>[608053]</tt>||List of OMIM disease identifiers associated with this protein. Right click on the entry and select "Search for ''[omim]'' on the web -- Entrez -- OMIM" for more information. <br />
|-<br />
| <tt>i.order</tt>||Integer|| <tt>0</tt> || The distance of this node from the query node (the query node has distance <tt>0</tt>, direct neighbours have a value of <tt>1</tt>, and nodes that are returned by a query because they are part of the same canonical group have a value of <tt>10</tt>). Pseudonodes have negative values (<tt>-1</tt> is a complex holder, <tt>-2</tt> is a collapsed instance).<br />
|-<br />
| <tt>i.overall_degree_TOP</tt>||Integer|| <tt>42</tt> ||The total number of interactions described for this node in the iRefIndex database; not all of these edges will necessarily be shown in the current view. This is the node degree in the full iRefIndex interactome: all proteins in iRefIndex (not only the ones currently loaded) are used when calculating this value.<br />
|-<br />
| <tt>i.popularity</tt>||List|| <tt>42</tt> || '''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.pseudonode</tt>||Boolean|| <tt>false</tt> || This is set to true if the node represents a "complex" or n-ary interaction record. Protein nodes with edges incident to a pseudonode are member interactors from an interaction record in which the specific interactions between pairs of interactors are unknown. Pseudonodes appear as hexagons when using the iRefIndex VizMapper style. <br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user query used to retrieve this specific node. Neighbours of "query" nodes will not have an <tt>i.query</tt> value. Nodes returned by queries are coloured blue when using the iRefIndex VizMapper style.<br />
|-<br />
| <tt>i.rog</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence associated with a distinct taxonomy identifier. <tt>i.rog</tt> also appears as the <tt>ID</tt> attribute. Each <tt>i.rog</tt> has a corresponding <tt>i.rogid</tt> - see below.<br />
|-<br />
| <tt>i.rogid</tt>||String||<tt>2mL9oLZ9g/SSPyK0nOz97RmOzPg3702</tt>||This is a unique alphanumeric key for the protein represented by this node. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27 character key and this is prepended to the taxonomy identifier for the protein's source organism in order to make the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.rog</tt> attribute. All <tt>i.rog</tt> instances (each being an integer) have one corresponding <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.taxid</tt>||Integer||<tt>7227</tt>||The NCBI taxonomy identifier for this protein's source organism. See http://www.ncbi.nlm.nih.gov/taxonomy?term=7227 for more details of this example value for <tt>i.taxid</tt>.<br />
|-<br />
| <tt>i.xref</tt>||List||<tt>[AAF70826.1,Q9M6R5]</tt> ||All the accessions as given by the original interaction database records to describe this protein. See also <tt>i.xref_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.alive</tt>||Boolean||<tt>true or false</tt> ||This is true for all nodes after a search operation. This variable is used by the iRefScape filter and after a filter is applied, all nodes matching the filter criteria will have a true value for this variable (all other nodes will have false).<br />
|-<br />
| <tt>i.alive_degree</tt>||Integer||<tt>0, 1, 2, ...</tt> ||This gives the node degree after a search. When an iRefScape filter is applied, this gives the number of nodes with <tt>i.alive=true</tt> connected to a particular node (that is, how many nodes matching the filter criteria are connected to that node). <br />
|-<br />
|}<br />
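The <tt>i.rogid</tt> description above can be illustrated with a short Python sketch. It assumes the SEGUID conventions (the sequence is upper-cased and stripped of whitespace before hashing, and the trailing Base-64 padding is removed to leave 27 characters); the function names <tt>seguid</tt> and <tt>rogid</tt> are illustrative only, and PMID 18823568 remains the authoritative description of how iRefIndex generates these keys.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Base-64 encoded SHA-1 digest of a protein sequence, without padding.

    Assumption: SEGUID conventions apply (upper-case, whitespace removed).
    """
    cleaned = "".join(sequence.split()).upper()
    digest = hashlib.sha1(cleaned.encode("ascii")).digest()
    # 20 digest bytes encode to 28 Base-64 characters ending in one "=";
    # stripping the padding leaves the 27-character key.
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rogid(sequence: str, taxid: int) -> str:
    """The 27-character sequence key followed by the taxonomy identifier."""
    return seguid(sequence) + str(taxid)


# Hypothetical sequence fragment, used only for illustration.
example = rogid("MKTAYIAKQR", 7227)
print(example, len(example))  # the key is 27 characters; "7227" adds 4 more
```

Because the taxonomy identifier is appended to the sequence key, identical sequences from different organisms receive different rogids, matching the statement that each node corresponds to a distinct sequence and taxonomy identifier pair.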
<br />
===Edge Attributes===<br />
<br />
Each edge represents a distinct primary database record that supports some relationship between the two incident nodes. So, if an interaction between two proteins has been annotated by two databases (or twice by the same database) then two edges will appear between those two protein nodes.<br />
<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||String||<tt>10121899 (2771704(40952)) 13911416</tt>||This is a unique identifier for the edge assigned by Cytoscape (no two edges will have the same <tt>ID</tt>). See <tt>i.rig</tt> and <tt>i.rigid</tt> for unique identifiers for the edge assigned by iRefIndex.<br />
|-<br />
| <tt>i.PMID</tt>||Integer||<tt>14605208</tt>||PubMed identifier of the publication in which the interaction represented by the edge is described. Right click on this entry and select "Search ''[PMID]'' on the web -- Entrez -- Pubmed" for more details on the publication.<br />
|-<br />
| <tt>i.bait</tt>||Integer||<tt>13911416</tt>||Node ID for the protein that was used as a bait in this experiment. Only applicable where the experimental system (see <tt>i.method_name</tt>) used to support this relationship was a bait-prey system (for example, two hybrid).<br />
|-<br />
| <tt>i.canonical_rig</tt>||Integer||<tt>27799</tt>||See notes for the <tt>i.rig</tt> edge feature. This is the rig constructed for the interaction using its canonical rogs. Use a web browser to query http://wodaklab.org/iRefWeb/interaction/show/27799 (where <tt>27799</tt> is the <tt>i.canonical_rig</tt> value) to retrieve more information on this interaction and equivalent source interaction records.<br />
|-<br />
| <tt>i.experiment</tt>||String||<tt>Giot L [2003]</tt>||A short label for the experiment in which this interaction was found (usually containing the authors' names).<br />
|-<br />
| <tt>i.flag</tt>||Integer||<tt>1</tt>||Used by iRefIndex plugin to control display of edges (<tt>0</tt> being the representative edge, used in edge toggle; <tt>1</tt> being an edge which will disappear during edge toggle; <tt>2</tt> being a complex holder edge; <tt>6</tt> being a path; <tt>7</tt> being an edge from or to a collapsed node).<br />
|-<br />
| <tt>i.host_taxid</tt>||Integer||<tt>7227</tt>||Indicates the organism taxonomy identifier where the interaction was experimentally demonstrated.<br />
|-<br />
| <tt>i.isLoop</tt>||Integer||<tt>1</tt>||Indicates whether the interaction is a self interaction (such as a dimer or possibly multimer of the same protein type). See the source interaction record for details.<br />
|-<br />
| <tt>i.method_cv</tt>||String||<tt>MI:0018</tt>||PSI-MI controlled vocabulary term identifier for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The name of the method is also given in the <tt>i.method_name</tt> feature.<br />
|-<br />
| <tt>i.method_name</tt>||String||<tt>two hybrid</tt>||PSI-MI controlled vocabulary term name for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.method_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_identification</tt>||String||<tt>predetermined participant</tt>||PSI-MI controlled vocabulary term for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The identifier for the term is also given in the <tt>i.participant_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_cv</tt>||String||<tt>predetermined participant</tt>||PSI-MI controlled vocabulary term identifier for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.participant_identification</tt> feature.<br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user's query that is responsible for returning this edge.<br />
|-<br />
| <tt>i.rig</tt>||Integer||<tt>27799</tt>||Redundant interaction group identifier for the interaction. <br />
This is an integer equivalent of <tt>i.rigid</tt>. Every rig has one corresponding rigid.<br />
|-<br />
| <tt>i.rigid</tt>||String||<tt>TAabV6yJ1XzUvEhYwZLpu5reBU0</tt>||Redundant interaction group identifier for the interaction. This is a universal key generated for the interaction by sorting the rogids participating in the interaction according to ASCII value, concatenating them, and then generating a Base-64 representation of an SHA-1 digest of the resulting string. See PMID 18823568 for details on how this key can be generated.<br />
|-<br />
| <tt>i.score_hpr</tt>||Integer||<tt>15</tt>||The hpr score (highest pmid re-use) is the highest number of interactions that any one PMID (supporting this interaction) is used to support. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.score_lpr</tt>||Integer||<tt>11</tt>||The lpr score (lowest pmid re-use) is the lowest number of distinct interactions that any one PMID (supporting this interaction) is used to support. An lpr of greater than 20 is considered to be a high-throughput experiment. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_hpr</tt>.<br />
|-<br />
| <tt>i.score_np</tt>||Integer||<tt>2</tt>||Number of PubMed Identifiers (PMIDs) pointing to literature where this interaction is supported. See PMID 18823568 for details. See also <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.source_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.src_intxn_db</tt>||String||<tt>grid</tt>||Original interaction database where this interaction record was obtained.<br />
|-<br />
| <tt>i.src_intxn_id</tt>||String||<tt>38677</tt>||Identifier of this interaction record in the original interaction database (see <tt>i.src_intxn_db</tt>). <br />
In some cases, it may be possible to right click and "Search ''[src_intxn_id]'' on the web -- Interaction databases -- the database" to see the original record.<br />
|-<br />
| <tt>i.type_cv</tt>||String||<tt>MI:0407</tt>||PSI-MI controlled vocabulary term identifier for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.type_name</tt> feature.<br />
|-<br />
| <tt>i.type_name</tt>||String||<tt>direct interaction</tt>||PSI-MI controlled vocabulary term name for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.type_cv</tt> feature.<br />
|-<br />
| <tt>i.target_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
|}<br />
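A similar sketch can be given for <tt>i.rigid</tt> and for the <tt>i.score_np</tt>, <tt>i.score_lpr</tt> and <tt>i.score_hpr</tt> values, following the descriptions in the table above. The function names and the input format (a mapping from an interaction key to its supporting PMIDs) are hypothetical; removing the Base-64 padding is an assumption based on the 27-character example <tt>i.rigid</tt> value, and both lpr and hpr are computed here from counts of distinct interactions per PMID.

```python
import base64
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def rigid(rogids: Iterable[str]) -> str:
    """Sort rogids by ASCII value, concatenate them, Base-64 encode an SHA-1 digest.

    Assumption: trailing "=" padding is removed, giving a 27-character key.
    """
    joined = "".join(sorted(rogids))  # str ordering matches ASCII order for ASCII text
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def pmid_scores(support: Dict[str, Iterable[str]]) -> Dict[str, Tuple[int, int, int]]:
    """Return (np, lpr, hpr) for each interaction key.

    support: hypothetical mapping from an interaction key (such as a rigid)
    to the PMIDs supporting that interaction.
    """
    support = {key: set(pmids) for key, pmids in support.items()}

    # Count the distinct interactions that each PMID is used to support.
    interactions_per_pmid: Dict[str, Set[str]] = defaultdict(set)
    for interaction, pmids in support.items():
        for pmid in pmids:
            interactions_per_pmid[pmid].add(interaction)

    scores = {}
    for interaction, pmids in support.items():
        if not pmids:
            continue  # no supporting publication, so no scores
        reuse = [len(interactions_per_pmid[pmid]) for pmid in pmids]
        scores[interaction] = (len(pmids), min(reuse), max(reuse))
    return scores
```

Sorting before concatenation makes the key independent of the order in which the interactors are listed, so the same set of proteins always yields the same <tt>i.rigid</tt>.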
<br />
=== User Attributes ===<br />
<br />
See [[iRefScape Batch Files]] for information on adding attributes to search results.<br />
<br />
== Obtaining Updates to the Data ==<br />
<br />
You can check for and download updates to the dataset used by your plugin using the Wizard (see "Check for iRefIndex updates").<br />
<br />
iRefIndex updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
==Obtaining Updates to the Plugin==<br />
<br />
If you already have a plugin called iRefScape (a menu entry "iRefScape" under the plugin menu of Cytoscape) and you want to make sure you have the latest version, use "Update plugins" from the "Plugins" menu. However, if you want to reinstall the plugin, you should uninstall any previous version of the plugin first.<br />
<br />
Plugin updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
<!--<br />
<br />
==Integrating User Data into the Plugin==<br />
<br />
===How to create node and edge attributes ===<br />
<br />
Example: Attaching [[DiG: Disease groups]] identifiers to nodes<br />
<br />
==Updating==<br />
# From Cytoscape updater<br />
# Using plugins update feature<br />
<br />
== Log Files, Search Details and Errors ==<br />
# How to interpret log messages and save them for later reference. <br />
<br />
==Using the plugin as a search tool ==<br />
The plugin could also be used to search the current network. However, Cytoscape's own search option, with Google suggest, may be more convenient to use. The search function was included because the Cytoscape search field remained inactive on some occasions for networks created using the plugin. The reason for this is still unknown, and deleting a node on the network seems to activate it; once this bug is fixed, users are encouraged to use the Cytoscape search option.<br />
Currently, if a user performs a search with a term and if the corresponding protein is already loaded, the loaded protein (corresponding node) would be highlighted with Cytoscape default highlight colors. <br />
<br />
<br />
== Exit plugin and force terminate operations ==<br />
The exit button performs two functions. <br />
# The first is to exit the iRefIndex plugin, which detaches the plugin from Cytoscape. <br />
# The second, "FORCE STOP" (only available during an active task), terminates the current operation. "FORCE STOP" is useful when the search query or a subsequent operation takes too long to finish or becomes unresponsive. When a force stop is performed, the outcome is unpredictable and the behaviour is undefined; therefore, results after such an operation cannot be trusted. <br />
<br />
--><br />
<br />
==Advanced features==<br />
<br />
The advanced features panel holds a number of tabbed panels, most of which expose settings which can be adjusted to change the behaviour of the normal search operations. Many panels offer contextual help via the iRefScape help system, but a brief description of each panel is also given here.<br />
<br />
{| cellpadding="10" cellspacing="0" border="1"<br />
! Preferences<br />
| This panel configures the range of search types (such as <tt>UniProt_Ac</tt>) presented in the main query interface. More search types can be added, and existing search types can be removed.<br />
|-<br />
! Statistics<br />
| A selection of statistical measures for the current network can be calculated and displayed using this panel.<br />
|-<br />
! Compare<br />
| This panel configures the <tt>COMPARE</tt> search operation and the equivalent functionality in the "Grouping" submenu of the iRefScape menu.<br />
|-<br />
! Summary<br />
| This panel generates node-by-node summaries where the attributes of each selected node (or of all nodes in the current network, if no nodes are selected) are presented in a separate table in the help viewer.<br />
|-<br />
! Filter<br />
| As an alternative to the manual selection of nodes and edges using the graphical user interface, this panel permits the selection of nodes and edges according to certain criteria based on node and edge attributes.<br />
|-<br />
! Path parameters<br />
| This panel provides options that configure the path-finding functionality described below.<br />
|-<br />
! Loading options<br />
| The options presented here affect the retrieval of data in search operations, including or excluding certain kinds of data (such as lists of values for certain attributes) in order to either simplify the results or speed up each search operation.<br />
|-<br />
! Import<br />
| The import panel provides the ability to import a generic Cytoscape network into iRefScape by interpreting node attributes as iRefScape queries.<br />
|-<br />
! Export<br />
| The export panel provides the ability to export an iRefScape network in such a way that other Cytoscape plugins may be able to access and manipulate the network's essential information.<br />
|}<br />
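The effect of the Filter panel on the <tt>i.alive</tt> and <tt>i.alive_degree</tt> node attributes might be modelled as follows. This is a hypothetical sketch rather than the plugin's code: nodes are held in a plain dictionary of attributes, the filter criteria are expressed as a Python predicate, each neighbour is counted once even when several edges (records) connect the same pair of nodes, and self-interactions are ignored. All of these choices are assumptions.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Hypothetical data model: node ID -> attribute dictionary.
Nodes = Dict[Hashable, Dict[str, object]]


def apply_filter(nodes: Nodes,
                 edges: Iterable[Tuple[Hashable, Hashable]],
                 predicate: Callable[[Dict[str, object]], bool]) -> None:
    """Set i.alive and i.alive_degree on every node, in place."""
    alive = {node: bool(predicate(attrs)) for node, attrs in nodes.items()}

    # Collect distinct neighbours, so parallel edges (one per source record)
    # count once; self-interactions are ignored (an assumption).
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)

    for node, attrs in nodes.items():
        attrs["i.alive"] = alive[node]
        attrs["i.alive_degree"] = sum(1 for other in neighbours[node] if alive[other])
```

For example, a predicate such as <tt>lambda attrs: attrs["i.taxid"] == 7227</tt> would keep only Drosophila proteins alive, and each node's <tt>i.alive_degree</tt> would then count only its Drosophila neighbours.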
<br />
=== Path-finding ===<br />
<br />
[[Image:NP_002515-NP_742031.png|thumb|187px|The path in the results, highlighted in green. Solid green lines indicate presence of evidence for this step of the path in the direction specified by the query ''or'' the presence of evidence that has no directionality. A dashed green line indicates there is evidence for this step of the path but only in the direction that is opposite to that specified in the query.]]<br />
<br />
iRefScape can be used to find interaction events connecting two proteins or a sequence of events involving several proteins. <br />
<br />
This process takes two terminal nodes as input and returns all reasonable paths connecting them. The results returned here are pathway-independent; in other words, the sequences of interactions connecting the nodes are not constructed using currently published pathways. However, the paths returned may contain pathway-centric information.<br />
<br />
The query format is as follows:<br />
<br />
NP_203524 <==> NP_002871<br />
<br />
Additional type and taxonomy parameters were also supplied as required:<br />
<br />
* '''Search type:''' <tt>RefSeq_Ac</tt><br />
* '''Taxonomy:''' <tt>9606 (Homo sapiens)</tt><br />
<br />
This query located all reasonable paths between <tt>NP_203524</tt> and <tt>NP_002871</tt>, and the returned paths also include the shortest path between them. The results of the path-finding were sorted in ascending order of path length, and the maximum path length was restricted to a default value of 6; this value can be modified by changing the value of "Maximum distance" in the "Path parameters" tab of the advanced features panel. The paths found in this way were "reasonable paths", a concept that differs from finding only the shortest path or finding all paths. A "reasonable path" from A to B is a path extending from A to B where none of the intermediate points can be reached from A with fewer steps by a path that extends from A via B (in other words, when evaluating a path from A to B, nodes beyond B are not considered).<br />
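One possible reading of the "reasonable path" definition is sketched below: breadth-first distances from A are computed while never entering B (so that nodes beyond B are not considered), and a path is accepted only if each intermediate node sits at its shortest distance from A. This interpretation, the undirected adjacency representation and the function name are assumptions; the plugin's own implementation may differ, for example in how edge direction is handled. The <tt>max_distance</tt> parameter plays the role of the "Maximum distance" setting.

```python
from collections import deque
from typing import Dict, Hashable, List, Set


def reasonable_paths(adjacency: Dict[Hashable, Set[Hashable]],
                     start: Hashable, end: Hashable,
                     max_distance: int = 6) -> List[List[Hashable]]:
    """Enumerate paths from start to end under one reading of "reasonable path"."""
    # Breadth-first distances from start, never entering end, so that
    # nodes reachable only via end are not considered.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in adjacency.get(node, ()):
            if other != end and other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)

    paths: List[List[Hashable]] = []

    def extend(path: List[Hashable]) -> None:
        node = path[-1]
        steps = len(path) - 1  # edges taken so far
        if steps + 1 > max_distance:
            return
        if end in adjacency.get(node, ()):
            paths.append(path + [end])
        # Only step to nodes that are reached here by a shortest route.
        for other in adjacency.get(node, ()):
            if other != end and dist.get(other) == steps + 1:
                extend(path + [other])

    extend([start])
    # Shortest paths first; ties broken by content for a stable order.
    return sorted(paths, key=lambda p: (len(p), [str(n) for n in p]))
```

Under this reading, a detour such as A, z, y, x, B is rejected because x can be reached from A in one step, whereas the path places it fourth.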
<br />
<div style="clear: right"></div><br />
<br />
=== Reversing the Path ===<br />
<br />
[[Image:NP_742031-NP_002515.png|thumb|187px|The path in the results, highlighted in green]]<br />
<br />
The query rewritten to find the reversed path is as follows:<br />
<br />
NP_002871 <==> NP_203524<br />
<br />
In this case, the same nodes and edges are retrieved and the path is merely reversed.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Differences in Forward and Reverse Directions ===<br />
<br />
[[Image:P62070-Q13322.png|thumb|198px|The path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
Consider the following path query (using <tt>UniProt_Ac</tt> as the search type):<br />
<br />
P62070 <==> Q13322<br />
<br />
This produces a network of 214 nodes and 253 edges, and the result is shown in the illustration.<br />
<br />
<div style="clear: right"></div><br />
<br />
[[Image:Q13322-P62070.png|thumb|270px|The reverse path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
However, when searching with the accessions reversed...<br />
<br />
Q13322 <==> P62070<br />
<br />
...a network of 46 nodes and 91 edges was produced, as illustrated.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Path Selection ===<br />
<br />
[[File:IRefScape-1.18-path-selector.png|thumb|500px|The path selector for the results]]<br />
<br />
After the path-finding is completed the "path selection" panel can be used to selectively load the paths. In order to make the selection easier, the paths found can be described by a particular attribute type: by selecting a value from the list for "Convert pop-up type to" (such as <tt>UniProt_Ac</tt>) and pressing the "Convert" button, a tooltip appearing over each path description will show the requested attribute values for each component of the path. Thus, a path description such as...<br />
<br />
4664766 -> 2079075 -> 4770079<br />
<br />
...will provide a tooltip showing the following identifiers:<br />
<br />
Q13322 -> P06241 -> P62070<br />
<br />
A "query helper" panel will also show the converted identifiers.<br />
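The conversion performed by the "Convert" button amounts to a lookup from each identifier in the path description to the requested attribute value. The sketch below assumes such a mapping is already available (here taken from the example above) and leaves unmapped identifiers unchanged; the function name is hypothetical.

```python
from typing import Dict


def convert_path(description: str, mapping: Dict[str, str]) -> str:
    """Replace each identifier in an "a -> b -> c" path description.

    Identifiers missing from the mapping are left unchanged (an assumption).
    """
    parts = [part.strip() for part in description.split("->")]
    return " -> ".join(mapping.get(part, part) for part in parts)


# Mapping taken from the example above: node IDs to UniProt_Ac values.
uniprot = {"4664766": "Q13322", "2079075": "P06241", "4770079": "P62070"}
print(convert_path("4664766 -> 2079075 -> 4770079", uniprot))
# prints: Q13322 -> P06241 -> P62070
```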
<br />
=== List Comparison ===<br />
<br />
This feature is available with version 0.91 and later.<br />
<br />
This feature provides a way to compare two lists of proteins. When a <tt>COMPARE{<List1>|<List2>}</tt> query is issued with default settings, an interaction network is loaded containing interactions involving only the proteins in the lists and proteins which are not in the lists but interact with at least two proteins from each list (intermediate components). At the end of the operation, in addition to the Cytoscape network, an adjacency cube (an adjacency matrix with colours as the third dimension) is also created. This adjacency cube is synchronized with the network and can be used to examine the results easily. A summary report function lists an overall summary of each protein in the lists in sorted order, so that the most connected proteins appear first. The identifiers used to display the proteins in the adjacency cube are either iROGIDs or the ROGIDs of complexes; the user can display these as other popular identifier types using the convert feature.<br />
<br />
An example query (from PMID:20670417):<br />
<br />
COMPARE{P08588,P16671|P07550,P13945}<br />
<br />
This query compares two groups:<br />
<br />
# P08588,P16671<br />
# P07550,P13945<br />
<br />
Members within the group are separated with a comma (<tt>,</tt>); groups are separated by a pipe (<tt>|</tt>).<br />
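The query syntax can be illustrated with a small parser. This is a hypothetical sketch of the format described above, not the plugin's parser; it rejects queries with anything other than two groups, in line with the answer below that only two groups can be compared.

```python
import re
from typing import List, Tuple


def parse_compare(query: str) -> Tuple[List[str], List[str]]:
    """Split a COMPARE{...} query into its two groups of members."""
    match = re.fullmatch(r"\s*COMPARE\{(.*)\}\s*", query)
    if match is None:
        raise ValueError("not a COMPARE{...} query")
    # Groups are separated by "|", members within a group by ",".
    groups = [[member.strip() for member in group.split(",") if member.strip()]
              for group in match.group(1).split("|")]
    if len(groups) != 2:
        raise ValueError("exactly two groups can be compared")
    return groups[0], groups[1]


print(parse_compare("COMPARE{P08588,P16671|P07550,P13945}"))
# prints: (['P08588', 'P16671'], ['P07550', 'P13945'])
```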
<br />
====Questions and answers about list comparison====<br />
<br />
''What is the maximum number of members a group can have?''<br />
<br />
There is no fixed limit on the number of members. The more members there are, the longer the operation will take and the more memory it will need. For instance, the above example search will complete comfortably in 1 minute with 256MB of allocated memory. If you have more than 100 members, we recommend dedicating at least 1GB of memory to Cytoscape. <br />
<br />
''Can I compare more than two groups?''<br />
<br />
No. Only two groups can be compared in the current version. If a protein appears in both groups being compared, it will be treated as part of a third group. However, this third group is only defined after the comparison has been executed. <br />
<br />
''What if a protein, or a protein resulting from a query, appears in more than one group?''<br />
<br />
All proteins found in more than one group are treated as a new group (group 3).<br />
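The handling of proteins that appear in both groups can be sketched as a set operation. The sketch assumes that group members have already been resolved to the same kind of identifier (for example, rogs), since two different accessions for the same protein would otherwise not be recognised as shared; the function name is hypothetical.

```python
from typing import Dict, Iterable, Set


def assign_groups(group1: Iterable[str], group2: Iterable[str]) -> Dict[int, Set[str]]:
    """Move proteins found in both groups into a third group."""
    first, second = set(group1), set(group2)
    shared = first & second
    # Groups 1 and 2 keep only their exclusive members; group 3 holds the overlap.
    return {1: first - shared, 2: second - shared, 3: shared}
```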
<br />
==Troubleshooting==<br />
<br />
* See http://cytoscape.org/ for a manual and a set of tutorials which describe the installation and use of Cytoscape.<br />
* For problems with Cytoscape installation or use, try the [http://groups-beta.google.com/group/cytoscape-helpdesk Cytoscape Help Desk].<br />
* If you have problems with installation or use, please share your experience with us through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
* When updating data on Microsoft Windows XP and Vista, a "Failed to find resources" message may appear in the log message window. If this happens, please run the update again; the plugin will check and correct the problem during the second attempt.<br />
* If you are working with large graphs, make sure Cytoscape has at least 128MB memory. See the [http://cytoscape.org/cgi-bin/moin.cgi/How_to_increase_memory_for_Cytoscape Cytoscape documentation] for more information on setting up memory allowances.<br />
<br />
<br />
==Internal Testing==<br />
Our internal test results for this release of the plugin can be found on the [[iRefScape Test Cases 1.0]] page.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex-related pages (archived and current).<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefScape_1.0&diff=4013
iRefScape 1.0
2011-11-24T23:23:53Z
<p>PaulBoddie: Removed uninformative last update and release date details. Added "like" button.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div class="floatright"><br />
[[Image:NP_499166-NP_501526-iterations-1-400x278.png]]<br />
<br />
<facebook-like /><br />
</div><br />
<br />
iRefScape is a plugin for Cytoscape that exposes iRefIndex data as a navigable graphical network.<br />
<br />
This page describes the iRefScape 1.0 plug-in for Cytoscape 2.8.x. See the [[#Compatibility_Information|compatibility information section]] for information on other versions.<br />
<br />
{|class="wikitable" style="text-align:left; clear:left; min-width:50%" border="0" cellpadding="10"<br />
| style="vertical-align: top" |<br />
== Installation ==<br />
<br />
See the [[#Installing_iRefScape|installation section]] for quick installation instructions and references to other documentation.<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[#Installing_iRefScape|installation section]]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Publication ==<br />
<br />
[http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex.]<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px|left<br />
default [http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Contact information and mailing list ==<br />
Join the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group] to be informed of updates. See also the [[iRefScape|latest release of iRefScape]] which may differ from the release described here.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [http://groups.google.com/group/irefindex?hl=en]<br />
</imagemap><br />
|}<br />
<br />
__TOC__<br />
<br />
== Compatibility Information ==<br />
<br />
See the following table for more detailed iRefScape compatibility information.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Cytoscape<br />
! align="center" style="background:#f0f0f0;"|iRefScape<br />
|-<br />
| 2.8.1, 2.8.2<br />
| iRefScape 1.0 (described on this page)<br />
|-<br />
| 2.7.0<br />
| [[iRefScape 0.9]]<br />
|-<br />
| 2.6.3<br />
| [[iRefScape 0.8]]<br />
|}<br />
<br />
== Installing iRefScape ==<br />
<br />
The plugin can be installed using Cytoscape's plugin menu. Select...<br />
<br />
# "Manage plugins"<br />
# "Available for Install"<br />
# "Network and Attribute I/O"<br />
# "iRefScape" (the entry will show a specific version, such as "iRefScape 1.0")<br />
<br />
Then follow the on-screen instructions.<br />
<br />
{|class="wikitable" style="text-align:left; clear: left; border: 1px solid #cccccc" cellpadding="10"<br />
| style="vertical-align: top" |<br />
=== Installation Guide ===<br />
<br />
More detailed instructions, troubleshooting tips and alternative methods are available in the [[iRefScape 1.0 Installation|installation guide]].<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefScape 1.0 Installation|installation guide]]<br />
</imagemap><br />
|}<br />
<br />
After installation, select the "iRefScape" entry from Cytoscape's plugin menu.<br />
<br />
When the plugin is started for the first time, it will download the publicly available data set.<br />
<br />
=== Tested systems ===<br />
This version of the iRefScape plugin has been tested with the following system configurations:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
! style="background:#f0f0f0;" | Operating System<br />
! style="background:#f0f0f0;" | Java Version<br />
|-<br />
| Red Hat Enterprise Linux 5 (32-bit) (kernel 2.6.18)<br />
| 1.6.0_01 (32-bit)<br />
|-<br />
| Microsoft Windows 7 (64-bit)<br />
| 1.6.0_25 (64-bit)<br />
|-<br />
| Microsoft Windows Vista (32-bit)<br />
| 1.6.0 (32-bit)<br />
|-<br />
| Ubuntu Linux 8.04 (32-bit)<br />
| 1.6 (32-bit)<br />
|-<br />
| Mac OS X 10.6 (64-bit)<br />
| 1.6.0_15 (32-bit)<br />
|}<br />
<br />
Please refer to the [[iRefScape 1.0 Installation|installation guide]] for more details on system configuration issues.<br />
<br />
=== Source Code ===<br />
<br />
Since iRefScape is made available under version 3 or later of the [http://www.gnu.org/licenses/gpl.html GNU General Public License], the source code is also made available:<br />
<br />
* iRefScape 1.18:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/e12f853c5951 Source browser]<br />
* iRefScape 1.17:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/0001288b7527 Source browser]<br />
* iRefScape 1.16:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/3ade99fc92b6 Source browser]<br />
* [http://irefindex.uio.no/hg/iRefScape/ iRefScape repository home]<br />
<br />
Please consult the <tt>README.txt</tt> file in the source distribution for information on building the software.<br />
<br />
== Using the Wizard - an example search ==<br />
<br />
Click the "Wizard" button; a pop-up window will appear. <br />
<br />
Follow the prompts. Here is an example search:<br />
<br />
# Select "Search protein-protein interactions for a protein".<br />
# Select "UniProt identifier".<br />
# For "Taxonomy identifier", select "9606 (Human)".<br />
# Type <tt>QCR2_HUMAN</tt> in the provided space. Click "Next".<br />
# Click "Search & load".<br />
<!-- commenting these out since they are outdated<br />
The images below show each of the steps in the wizard.<br />
<br />
<gallery perrow="5"><br />
Image:IRefIndex-Cytoscape-Wizard.png|The iRefIndex wizard<br />
Image:IRefIndex-Cytoscape-Wizard-step2.png|Choosing a result type<br />
Image:IRefIndex-Cytoscape-Wizard-step3.png|Choosing a taxonomy type<br />
Image:IRefIndex-Cytoscape-Wizard-step4.png|Specifying the search term<br />
Image:IRefIndex-Cytoscape-Wizard-step5.png|Additional options<br />
</gallery><br />
--><br />
<br />
== Using the Search Panel ==<br />
<br />
To perform a search, the following steps are involved:<br />
<br />
# Enter query term(s)<br />
# Select a search type<br />
# Select taxonomy/organism<br />
# Adjust search options (iterations, new view, canonical expansion) - this is optional<br />
# Start the search<br />
<br />
=== Enter query term(s) ===<br />
<br />
Queries may be loaded from a file or by pasting the query into the text box (one query per line). Multiple queries can also be separated by pipe characters (<tt>|</tt>) or by tab characters. Queries with spaces in them should be enclosed in double quotes.<br />
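The splitting rules above can be sketched as a small tokenizer. Separators are newlines, pipe characters and tabs, as stated; it is an assumption that double quotes only group text (so that separators inside quotes are kept) and are removed from the resulting query, and spaces are not treated as separators because the text does not list them as such. The function name is hypothetical and the plugin's own parsing may differ.

```python
from typing import List


def split_queries(text: str) -> List[str]:
    """Split pasted text into queries on newlines, "|" and tabs.

    Assumption: double quotes only group text; they are removed and any
    separators inside them are kept as part of the query.
    """
    queries: List[str] = []
    current: List[str] = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        elif ch in "\n|\t" and not in_quotes:
            queries.append("".join(current))
            current = []
        else:
            current.append(ch)
    queries.append("".join(current))
    # Drop surrounding whitespace and empty entries (for example, blank lines).
    return [query.strip() for query in queries if query.strip()]
```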
<br />
=== Select a search type ===<br />
<br />
Example searches are listed below.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Search Type<br />
! align="center" style="background:#f0f0f0;"|Example<br />
! align="center" style="background:#f0f0f0;"|Notes<br />
|-<br />
| <tt>RefSeq_Ac</tt>||<tt>NP_996224</tt>||See http://www.ncbi.nlm.nih.gov/protein/221379660<br />
|-<br />
| <tt>UniProt_Ac</tt>||<tt>Q7KSF4</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>UniProt_ID</tt>||<tt>Q7KSF4_DROME</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>geneID</tt>||<tt>42066</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>geneSymbol</tt>||<tt>cher</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>mass</tt>||<tt>72854<-->72866</tt>||Search protein interactors for a range of molecular mass (in Da).<br />
|-<br />
| <tt>rog</tt>||<tt>10121899</tt>||Redundant object group: iRefIndex's internal identifier for a protein. See the <tt>i.rog</tt> node attribute.<br />
|-<br />
| <tt>PMID</tt>||<tt>14605208</tt>||PubMed identifier of a publication describing an interaction. See http://www.ncbi.nlm.nih.gov/pubmed. Iterations and "Use canonical expansion" have no effect on this search type. This search returns all protein interactors described in the given publication and automatically draws all interactions known between these proteins (even if these interactions are supported by other PMIDs). Select edges in the resulting graph and see the i.PMID attribute in the Edge Attribute Browser.<br />
|-<br />
| <tt>src_intxn_id</tt>||<tt>EBI-212627</tt>||Source interaction database identifier. Iterations and "Use canonical expansion" have no effect on this search type. Caution: multiple databases may have overlapping interaction record identifiers (e.g. <tt>147805</tt> returns records from both BIND and BioGrid) and there is no way to limit this search to a specific database at this time.<br />
Equivalent interactions from other databases will be automatically retrieved using this search type (see provided example).<br />
|-<br />
| <tt>omim</tt>||<tt>227650</tt>||OMIM identifier. See http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=227650<br />
|-<br />
| <tt>digid</tt>||<tt>449</tt>||Internal identifier for a group of phenotypically related diseases. See [[DiG: Disease groups]]. A digid can be found by first searching for an OMIM identifier; the digid then appears as the i.digid node attribute.<br />
|-<br />
|style="background:#f0f0f0;" colspan="3" align="center"| Additional search types (these must first be enabled in Advanced features/Preferences)<br />
|-<br />
| <tt>dig_title</tt>||<tt>fanconi</tt>||Non-exact text search of OMIM titles. Select matching titles from the Query Helper and press Return to copy them to the search box, then press "Search and load". See [[DiG: Disease groups]].<br />
|-<br />
| <tt>ROGID</tt>||<tt>5IrM14EfdlehbVJ0WAcAoQM3pFw9606</tt>||Exact search for the ROGID of a protein. This searches the i.rogid_TOP node feature. Users can also generate a ROGID for an amino acid sequence and taxonomy identifier pair using the Wizard/Create SEGUID/ROGID for sequence tool. See PMID 18823568.<br />
|-<br />
| <tt>RIGID</tt>||<tt>cXAoT7JjMde7J+CN/2tOR6gETyA</tt>||Exact search for the RIGID of an interaction. This searches the i.rigid edge feature. See PMID 18823568.<br />
|-<br />
|}<br />
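<br />
The <tt>mass</tt> search type in the table above takes a range written as <tt>low<-->high</tt> (in Da). The sketch below parses such a query and filters a hypothetical mapping of node identifiers to <tt>i.mass</tt> values. Treating the bounds as inclusive and tolerating reversed bounds are assumptions, not documented iRefScape behaviour.<br />

```python
from typing import Dict, List, Tuple


def parse_mass_range(query: str) -> Tuple[int, int]:
    """Parse a 'low<-->high' mass query (in Da) into an ordered (low, high) pair."""
    low_text, sep, high_text = query.partition("<-->")
    if not sep:
        raise ValueError("expected a query of the form low<-->high")
    low, high = int(low_text.strip()), int(high_text.strip())
    if low > high:                  # assumption: reversed bounds are swapped
        low, high = high, low
    return low, high


def nodes_in_mass_range(masses: Dict[str, int], query: str) -> List[str]:
    """Return (sorted) node identifiers whose mass lies in the inclusive range."""
    low, high = parse_mass_range(query)
    return sorted(node for node, mass in masses.items() if low <= mass <= high)
```
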
<br />
=== Select taxonomy/organism ===<br />
<br />
This will limit the search results to a particular organism. An organism can be selected from the list, or a taxonomy identifier can be entered into the field itself. See [http://www.ncbi.nlm.nih.gov/taxonomy Entrez Taxonomy] for more details on taxonomy identifiers. For most search types, it is acceptable to leave this field set to <tt>Any</tt>.<br />
<br />
=== Adjust search options ===<br />
<br />
The following optional adjustments can be made:<br />
<br />
==== Iterations ====<br />
<br />
A distance from the query list's members can be specified:<br />
<br />
* Selecting <tt>0</tt> will return only interactions between nodes found by the query list<br />
* Selecting <tt>1</tt> will return immediate neighbours of nodes in the query list<br />
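<br />
The effect of the iterations setting can be pictured as a breadth-first expansion from the query nodes. This is a minimal, hypothetical sketch (<tt>expand</tt> and the edge-list input are illustrative). It assumes that the result contains every known interaction among the retained nodes, which is not stated explicitly above.<br />

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def expand(edges: Iterable[Edge], queries: Set[str],
           iterations: int) -> Tuple[Set[str], List[Edge]]:
    """Hypothetical sketch of the 'Iterations' option as a breadth-first expansion."""
    edge_list = list(edges)
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    nodes = set(queries)
    frontier = set(queries)
    for _ in range(iterations):
        # Step one hop outwards, ignoring nodes already collected.
        frontier = {n for f in frontier for n in adjacency.get(f, ())} - nodes
        nodes |= frontier

    # Assumption: keep every interaction whose two endpoints are both retained.
    kept = [(a, b) for a, b in edge_list if a in nodes and b in nodes]
    return nodes, kept
```

With <tt>iterations=0</tt> only edges between query nodes survive; with <tt>iterations=1</tt> the immediate neighbours are added as well.<br />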
<br />
==== Create new view ====<br />
<br />
A new view will be opened for the search results if this option is selected. Otherwise, the results will be added to the current view.<br />
<br />
==== Use canonical expansion ====<br />
<br />
Selecting this option will expand the search to include all proteins that are related to the query protein (for example, splice isoforms). See [[Canonicalization]] for technical details.<br />
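<br />
Assuming each node's <tt>i.rog</tt> is mapped to its <tt>i.canonical_rog</tt> (as described in the node attributes table), canonical expansion can be sketched as collecting every rog that shares a canonical group with a query rog. The helper name and the rogs <tt>111</tt>, <tt>222</tt> and <tt>333</tt> in the test are made up for illustration; see [[Canonicalization]] for how groups are really built.<br />

```python
from typing import Dict, Set


def canonical_expansion(query_rogs: Set[int], canonical_of: Dict[int, int]) -> Set[int]:
    """Hypothetical sketch: expand query rogs to all rogs in their canonical groups."""
    # A rog missing from the mapping is treated as its own canonical group.
    groups = {canonical_of.get(rog, rog) for rog in query_rogs}
    members = {rog for rog, canon in canonical_of.items() if canon in groups}
    return set(query_rogs) | members
```
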
<br />
=== Start the search ===<br />
<br />
Press the "Search and load" button to perform the search.<br />
<br />
{{Note|<br />
See the [[iRefScape Batch Files]] document for information on using text files to describe searches, annotate result nodes and to define new search types using user-supplied data.<br />
}}<br />
<br />
== Viewing the Results ==<br />
<br />
=== Colours and Shapes ===<br />
<br />
* Blue nodes correspond to proteins found by your query<br />
* Green nodes are interacting partners for your query protein<br />
* Purple hexagons are complex nodes (also called pseudonodes); they keep the partners of a complex together (for example, QCR6_HUMAN is found in two complexes that also involve QCR2_HUMAN)<br />
* Orange-yellow edges indicate protein-protein interactions and pink edges represent membership of a protein in a complex<br />
<br />
=== Toggling Edges ===<br />
<br />
Multiple edges may appear between two nodes. These represent separate interaction records that support this link. Details on each original record can be viewed using the edge attribute browser (described below). You can toggle this multi-edge view on and off by selecting "Toggle selected multi-edges" in the iRefScape/View Tools menu. Only one of the edges will be shown in the collapsed view.<br />
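<br />
The collapsed view can be sketched by grouping edges by their unordered pair of incident nodes and keeping one representative per pair, mirroring the <tt>0</tt> (representative) and <tt>1</tt> (hidden on toggle) values of the <tt>i.flag</tt> edge attribute. Which edge iRefScape actually chooses as the representative is not documented here; this hypothetical sketch simply keeps the first edge seen.<br />

```python
from typing import Dict, List, Tuple

# (node_a, node_b, source record): a hypothetical edge representation
Edge = Tuple[str, str, str]


def assign_flags(edges: List[Edge]) -> Dict[Edge, int]:
    """Flag the first edge seen for each node pair 0 (representative), others 1."""
    seen = set()
    flags: Dict[Edge, int] = {}
    for edge in edges:
        pair = frozenset(edge[:2])      # unordered: A-B and B-A are the same link
        flags[edge] = 1 if pair in seen else 0
        seen.add(pair)
    return flags


def collapsed_view(edges: List[Edge]) -> List[Edge]:
    """Return only the representative edges, as in the collapsed multi-edge view."""
    flags = assign_flags(edges)
    return [edge for edge in edges if flags[edge] == 0]
```
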
<br />
=== iRefScape Menu ===<br />
<br />
The iRefScape menu in the Cytoscape menu bar contains a number of other functions that may help with searching and viewing interaction data. These are described in more detail in the [[iRefScape plugin menu]] document.<br />
<br />
=== Expanding the Interaction Map ===<br />
<br />
You can search for additional interactions by right-clicking on a node and selecting "iRefIndex -- Retrieve interactions".<br />
<br />
Some example result displays are shown below.<br />
<br />
<gallery widths="500px" heights="300px"><br />
Image:QCR2_HUMAN_initial.png|Results<br />
Image:QCR2_HUMAN.png|Results (tidied)<br />
</gallery><br />
<br />
== Attributes ==<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-closed.png|right|The node attributes menu]]<br />
<br />
There are two types of attributes available from iRefIndex: node attributes and edge attributes. These may be used to view information about selected nodes or edges (like <tt>i.taxid</tt>). Some features may allow the user to link out to additional data sources through the "right-click" menu (like <tt>i.geneID</tt>). Features may also be used to sort and select nodes and edges with specific attributes (like <tt>i.order</tt>). The <tt>i.query</tt> feature shows the user's query that is responsible for returning the node or edge.<br />
<br />
Brief descriptions and examples of each attribute are provided below. <br />
<br />
The user must first select the attributes that are to be displayed. This can be done by clicking on the "attribute" icon at the top of the node or edge attribute browser, as shown in the illustrative images.<br />
<br />
<div style="clear: right"></div><br />
=== Node Attributes ===<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-open.png|right|The open node attributes menu]]<br />
<br />
Each node represents a distinct amino acid sequence (protein) from a distinct organism (taxonomy identifier). Each of the attributes below provides additional information about the node. Although each node is distinct, a graph produced by iRefIndex may contain multiple nodes that are related proteins (such as splice isoform products from the same gene). These nodes will all have the same <tt>i.canonical_rog</tt> and <tt>i.canonical_rogid</tt> feature values. See the notes below.<br />
<br />
Node attributes that can be lists of items (like <tt>i.UniProt</tt>) will have a corresponding attribute called <tt>i.''attribute name''_TOP</tt> (for example, <tt>i.UniProt_TOP</tt>) which provides the first item of the associated list.<br />
<br />
<div style="clear: right"></div><br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence from a distinct taxonomy identifier. See also <tt>i.rog</tt> and <tt>i.rogid</tt>.<br />
|-<br />
| <tt>canonicalName</tt>||Integer||<tt>10121899</tt>||This is the same as <tt>ID</tt>. This attribute is set by Cytoscape and is unrelated to the <tt>i.canonical_rog</tt> and <tt>i.canonical_rogid</tt> attributes used by iRefIndex.<br />
|-<br />
| <tt>i.RefSeq_Ac</tt>||List||<tt>[NP_996224]</tt> ||All RefSeq accessions with an amino acid sequence and taxon identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[RefSeq_Ac]'' on the web -- Entrez -- Protein" for more information. See also <tt>i.RefSeq_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_Ac</tt>||List||<tt>[Q7KSF4]</tt>||All UniProt accessions with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_Ac]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_Ac_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_ID</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||All UniProt identifiers with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_ID]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_ID_TOP</tt> for the first entry in this list of IDs.<br />
|-<br />
| <tt>i.canonical_rog</tt>||Integer||<tt>10121899</tt>||Related proteins (for example, splice isoforms from the same gene) all belong to the same canonical group. One member of this group is assigned as the canonical representative of the group. The <tt>i.canonical_rog</tt> attribute gives the identifier of the protein's canonical group. For example, all products of Entrez Gene 42066 have the same <tt>i.canonical_rog</tt> (<tt>10121899</tt>). Each of these gene products has its own identifier (because each has a distinct amino acid sequence). One of the splice isoforms (<tt>NP_996224</tt>) was chosen as the canonical representative of this group. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen.<br />
|-<br />
| <tt>i.canonical_rogid</tt>||String||<tt>1ZFb1WlW0OgOlhiAPtkJTdb6oOg7227</tt>||This is a unique alphanumeric key for the canonical representative of the canonical group to which this node belongs. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27 character key and this is prepended to the taxonomy identifier for the protein's source organism in order to make the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.canonical_rog</tt> attribute. All <tt>i.canonical_rog</tt> instances (each being an integer) have one corresponding <tt>i.canonical_rogid</tt>. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen. Note that the rogid for the protein represented by this specific node is listed under <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.dataset</tt>||Integer||<tt>0</tt>||In batch query mode, this can be used to locate the query batch (i.e. which group of queries was responsible for the node). In single query mode, when a sequence of queries is issued one after another, this attribute can be used to distinguish the results from each step. Nodes with an <tt>i.dataset</tt> value higher than 999 were found by more than one batch of queries.<br />
|-<br />
| <tt>i.digid</tt>||List||<tt>449</tt>||This is an integer identifier that is shared by a group of disease entries in OMIM that are related by their titles. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.dig_title</tt>.<br />
|-<br />
| <tt>i.dig_title</tt>||List||<tt>[Fanconi anemia, complementation group B, 300514 (3), VACTERL association with hydrocephalus, X-linked, 314390 (3)]</tt>||These are entries from OMIM's Morbid Map that are all part of the same disease group. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.digid</tt>.<br />
|-<br />
| <tt>i.displayLabel</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||This is a list of short labels chosen by iRefIndex to label the node using the VizMapper. The UniProt identifier is preferentially chosen (if one is available) followed by the Entrez Gene Symbol. See also <tt>i.displayLabel_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneID</tt>||List||<tt>[42066]</tt>||All NCBI Entrez Gene identifiers that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneID]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneID_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneSymbol</tt>||List||<tt>[CHER]</tt>||All NCBI Entrez Gene official symbols that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneSymbol]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneSymbol_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.interactor_description</tt>||List||<tt>[Q7KSF4_DROME, CHER, DMEL_CG3937, SKO, DMEL CG3937, FLN, CG3937, CHER, DMEL\\CG3937, FLN, SKO, CHER, NAME=CHER, DMEL_CG3937]</tt>||A collection of all the names in their short form as given by the original interaction databases. See also <tt>i.interactor_description_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.mass</tt>||Integer|| <tt>259142</tt> ||Mass associated with the protein sequence for this node. From UniProt, if available. You can search for nodes inside a mass range using the <tt>mass</tt> search type in the iRefIndex plugin.<br />
|-<br />
| <tt>i.omim</tt>||List||<tt>[608053]</tt>||List of OMIM disease identifiers associated with this protein. Right click on the entry and select "Search for ''[omim]'' on the web -- Entrez -- OMIM" for more information. <br />
|-<br />
| <tt>i.order</tt>||Integer|| <tt>0</tt> || The distance of this node from the query node (the query node has distance <tt>0</tt>, direct neighbours have a value of <tt>1</tt>, and nodes returned by a query because they are part of the same canonical group have a value of <tt>10</tt>). Pseudonodes have negative values (<tt>-1</tt> is a complex holder, <tt>-2</tt> is a collapsed instance).<br />
|-<br />
| <tt>i.overall_degree_TOP</tt>||Integer|| <tt>42</tt> ||The total number of interactions described for this node in the iRefIndex database; not all of these edges will necessarily be shown in the current view. This is the node degree in the full iRefIndex interactome: it is calculated using all proteins in iRefIndex, not only those currently loaded.<br />
|-<br />
| <tt>i.popularity</tt>||List|| <tt>42</tt> || '''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.pseudonode</tt>||Boolean|| <tt>false</tt> || This is set to true if the node represents a "complex" or n-ary interaction record. Protein nodes with edges incident to a pseudonode are member interactors from an interaction record in which the specific interactions between pairs of interactors are unknown. Pseudonodes appear as hexagons when using the iRefIndex VizMapper style.<br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user query used to retrieve this specific node. Neighbours of "query" nodes will not have an <tt>i.query</tt> value. Nodes returned by queries are coloured blue when using the iRefIndex VizMapper style.<br />
|-<br />
| <tt>i.rog</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence associated with a distinct taxonomy identifier. <tt>i.rog</tt> also appears as the <tt>ID</tt> attribute. Each <tt>i.rog</tt> has a corresponding <tt>i.rogid</tt> - see below.<br />
|-<br />
| <tt>i.rogid</tt>||String||<tt>2mL9oLZ9g/SSPyK0nOz97RmOzPg3702</tt>||This is a unique alphanumeric key for the protein represented by this node. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27 character key and this is prepended to the taxonomy identifier for the protein's source organism in order to make the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.rog</tt> attribute. All <tt>i.rog</tt> instances (each being an integer) have one corresponding <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.taxid</tt>||Integer||<tt>7227</tt>||The NCBI taxonomy identifier for this protein's source organism. See http://www.ncbi.nlm.nih.gov/taxonomy?term=7227 for more details of this example value for <tt>i.taxid</tt>.<br />
|-<br />
| <tt>i.xref</tt>||List||<tt>[AAF70826.1,Q9M6R5]</tt> ||All the accessions as given by the original interaction database records to describe this protein. See also <tt>i.xref_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.alive</tt>||Boolean||<tt>true or false</tt> ||This is true for all nodes after a search operation. It is used by the iRefScape filter: after a filter is applied, all nodes matching the filter criteria have a true value for this attribute and all other nodes have false.<br />
|-<br />
| <tt>i.alive_degree</tt>||Integer||<tt>0,1,2-...</tt> ||This gives the node degree after a search. When an iRefScape filter is applied, it gives the number of nodes with <tt>i.alive=true</tt> connected to this node (that is, how many nodes matching the filter criteria are connected to it).<br />
|-<br />
|}<br />
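<br />
The rogid construction described in the table (an SHA-1 digest of the amino acid sequence turned into a 27-character key, followed by the taxonomy identifier) can be sketched with Python's standard library. This assumes the key is the standard Base-64 encoding of the digest with its trailing <tt>=</tt> padding removed, and that the sequence is upper-cased and stripped of whitespace before hashing; both are assumptions, and PMID 18823568 remains the authoritative definition. The function names are hypothetical.<br />

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Base-64 SHA-1 digest of a sequence, '=' padding removed (27 characters)."""
    # Assumption: whitespace is stripped and residues are upper-cased first.
    normalised = "".join(sequence.split()).upper()
    digest = hashlib.sha1(normalised.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rogid(sequence: str, taxid: int) -> str:
    """27-character sequence key followed by the NCBI taxonomy identifier."""
    return seguid(sequence) + str(taxid)
```

A human protein's rogid computed this way ends in <tt>9606</tt>, like the <tt>ROGID</tt> search example <tt>5IrM14EfdlehbVJ0WAcAoQM3pFw9606</tt>.<br />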
<br />
===Edge Attributes===<br />
<br />
Each edge represents a distinct primary database record that supports some relationship between the two incident nodes. So, if an interaction between two proteins has been annotated by two databases (or twice by the same database) then two edges will appear between those two protein nodes.<br />
<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||String||<tt>10121899 (2771704(40952)) 13911416</tt>||This is a unique identifier for the edge assigned by Cytoscape (no two edges will have the same <tt>ID</tt>). See <tt>i.rig</tt> and <tt>i.rigid</tt> for unique identifiers for the edge assigned by iRefIndex.<br />
|-<br />
| <tt>i.PMID</tt>||Integer||<tt>14605208</tt>||PubMed identifier of the publication in which the interaction represented by this edge is described. Right click on this entry and select "Search ''[PMID]'' on the web -- Entrez -- Pubmed" for more details on the publication.<br />
|-<br />
| <tt>i.bait</tt>||Integer||<tt>13911416</tt>||Node ID for the protein that was used as a bait in this experiment. Only applicable where the experimental system (see <tt>i.method_name</tt>) used to support this relationship was a bait-prey system (for example, two hybrid).<br />
|-<br />
| <tt>i.canonical_rig</tt>||Integer||<tt>27799</tt>||See notes for the <tt>i.rig</tt> edge feature. This is the rig constructed for the interaction using its canonical rogs. Use a web browser to query http://wodaklab.org/iRefWeb/interaction/show/27799 (where <tt>27799</tt> is the <tt>i.canonical_rig</tt> value) to retrieve more information on this interaction and equivalent source interaction records.<br />
|-<br />
| <tt>i.experiment</tt>||String||<tt>Giot L [2003]</tt>||A short label for the experiment where this interaction was found (usually contains authors' names).<br />
|-<br />
| <tt>i.flag</tt>||Integer||<tt>1</tt>||Used by iRefIndex plugin to control display of edges (<tt>0</tt> being the representative edge, used in edge toggle; <tt>1</tt> being an edge which will disappear during edge toggle; <tt>2</tt> being a complex holder edge; <tt>6</tt> being a path; <tt>7</tt> being an edge from or to a collapsed node).<br />
|-<br />
| <tt>i.host_taxid</tt>||Integer||<tt>7227</tt>||Indicates the organism taxonomy identifier where the interaction was experimentally demonstrated.<br />
|-<br />
| <tt>i.isLoop</tt>||Integer||<tt>1</tt>||Indicates whether the interaction is a self interaction (such as a dimer or possibly multimer of the same protein type). See the source interaction record for details.<br />
|-<br />
| <tt>i.method_cv</tt>||String||<tt>MI:0018</tt>||PSI-MI controlled vocabulary term identifier for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The name of the method is also given in the <tt>i.method_name</tt> feature.<br />
|-<br />
| <tt>i.method_name</tt>||String||<tt>two hybrid</tt>||PSI-MI controlled vocabulary term name for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.method_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_identification</tt>||String||<tt>predetermined participant</tt>||PSI-MI controlled vocabulary term for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The identifier for the term is also given in the <tt>i.participant_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_cv</tt>||String||<tt>MI:0396</tt>||PSI-MI controlled vocabulary term identifier for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.participant_identification</tt> feature.<br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user's query that is responsible for returning this edge.<br />
|-<br />
| <tt>i.rig</tt>||Integer||<tt>27799</tt>||Redundant interaction group identifier for the interaction. <br />
This is an integer equivalent of <tt>i.rigid</tt>. Every rig has one corresponding rigid.<br />
|-<br />
| <tt>i.rigid</tt>||String||<tt>TAabV6yJ1XzUvEhYwZLpu5reBU0</tt>||Redundant interaction group identifier for the interaction. This is a universal key generated for the interaction by sorting the rogids of the participating proteins by ASCII value, concatenating them, and then generating a Base-64 representation of an SHA-1 digest of the resulting string. See PMID 18823568 for details on how this key can be generated.<br />
|-<br />
| <tt>i.score_hpr</tt>||Integer||<tt>15</tt>||The hpr score (highest pmid re-use) is the highest number of interactions that any one PMID (supporting this interaction) is used to support. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.score_lpr</tt>||Integer||<tt>11</tt>||The lpr score (lowest pmid re-use) is the lowest number of distinct interactions that any one PMID (supporting this interaction) is used to support. An lpr of greater than 20 is considered to be a high-throughput experiment. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_hpr</tt>.<br />
|-<br />
| <tt>i.score_np</tt>||Integer||<tt>2</tt>||Number of PubMed Identifiers (PMIDs) pointing to literature where this interaction is supported. See PMID 18823568 for details. See also <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.source_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.src_intxn_db</tt>||String||<tt>grid</tt>||Original interaction database where this interaction record was obtained.<br />
|-<br />
| <tt>i.src_intxn_id</tt>||String||<tt>38677</tt>||Identifier of this interaction record in the original interaction database (see <tt>i.src_intxn_db</tt>).<br />
In some cases, it may be possible to right click and "Search ''[src_intxn_id]'' on the web -- Interaction databases -- the database" to see the original record.<br />
|-<br />
| <tt>i.type_cv</tt>||String||<tt>MI:0407</tt>||PSI-MI controlled vocabulary term identifier for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.type_name</tt> feature.<br />
|-<br />
| <tt>i.type_name</tt>||String||<tt>direct interaction</tt>||PSI-MI controlled vocabulary term name for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.type_cv</tt> feature.<br />
|-<br />
| <tt>i.target_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
|}<br />
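<br />
The <tt>i.rigid</tt> construction and the <tt>i.score_np</tt>, <tt>i.score_hpr</tt> and <tt>i.score_lpr</tt> definitions above can be sketched in Python. This is a hypothetical illustration: it assumes the same Base-64 SHA-1 key (trailing <tt>=</tt> removed) as used for rogids, and a made-up <tt>interactions_by_pmid</tt> mapping from each PMID to the set of interactions it supports. PMID 18823568 remains the authoritative definition.<br />

```python
import base64
import hashlib
from typing import Dict, Iterable, Set, Tuple


def seguid(text: str) -> str:
    """Base-64 SHA-1 digest with the trailing '=' padding removed (27 characters)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rigid(rogids: Iterable[str]) -> str:
    """Sort participant rogids by ASCII value, concatenate them, then digest."""
    return seguid("".join(sorted(rogids)))


def scores(interaction: str,
           interactions_by_pmid: Dict[str, Set[str]]) -> Tuple[int, int, int]:
    """Return (np, hpr, lpr) for one interaction.

    For each PMID supporting the interaction, count how many distinct
    interactions that PMID supports; np is the number of such PMIDs,
    hpr the highest count and lpr the lowest.
    """
    reuse = [len(supported) for supported in interactions_by_pmid.values()
             if interaction in supported]
    if not reuse:
        return 0, 0, 0
    return len(reuse), max(reuse), min(reuse)
```

Because the rogids are sorted before hashing, the same rigid results whichever order the participants are listed in.<br />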
<br />
=== User Attributes ===<br />
<br />
See [[iRefScape Batch Files]] for information on adding attributes to search results.<br />
<br />
== Obtaining Updates to the Data ==<br />
<br />
You can check for and download updates to the dataset used by your plugin using the Wizard (see "Check for iRefIndex updates").<br />
<br />
iRefIndex updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
==Obtaining Updates to the Plugin==<br />
<br />
If you already have a plugin called iRefScape (a menu entry "iRefScape" under the plugin menu of Cytoscape) and you want to make sure you have the latest version, use "Update plugins" from the "Plugins" menu. However, if you want to reinstall the plugin, you should uninstall any previous version of the plugin first.<br />
<br />
Plugin updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
<!--<br />
<br />
==Integrating User Data into the Plugin==<br />
<br />
===How to create node and edge attributes ===<br />
<br />
Example: Attaching [[DiG: Disease groups]] identifiers to nodes<br />
<br />
==Updating==<br />
# From Cytoscape updater<br />
# Using plugins update feature<br />
<br />
== Log Files, Search Details and Errors ==<br />
# How to interpret log messages and save them for later reference. <br />
<br />
==Using the plugin as a search tool ==<br />
The plugin could also be used to search the current network. However, Cytoscape's own search option, with Google suggest, may be more convenient to use. The search function was included because the Cytoscape search field remained inactive on some occasions for networks created using the plugin. The reason for this is still unknown, and deleting a node from the network seems to activate it; once this bug is fixed, users are encouraged to use the Cytoscape search option.<br />
Currently, if a user performs a search with a term and if the corresponding protein is already loaded, the loaded protein (corresponding node) would be highlighted with Cytoscape default highlight colors. <br />
<br />
<br />
== Exit plugin and force terminate operations ==<br />
The exit button performs two functions. <br />
# First one is to exit iRefIndex plugin, where the outcome is to detach the plugin from Cytoscape. <br />
# The second function, "FORCE STOP" (only available during an active task), terminates the current operation. "FORCE STOP" is useful when the search query or a subsequent operation takes too long to finish or becomes unresponsive. When a force stop is performed, the outcome is unpredictable and behaviour is undefined, so results after such an operation cannot be trusted.<br />
<br />
--><br />
<br />
==Advanced features==<br />
<br />
The advanced features panel holds a number of tabbed panels, most of which expose settings which can be adjusted to change the behaviour of the normal search operations. Many panels offer contextual help via the iRefScape help system, but a brief description of each panel is also given here.<br />
<br />
{| cellpadding="10" cellspacing="0" border="1"<br />
! Preferences<br />
| This panel configures the range of search types (such as <tt>UniProt_Ac</tt>) presented in the main query interface. More search types can be added, and existing search types can be removed.<br />
|-<br />
! Statistics<br />
| A selection of statistical measures for the current network can be calculated and displayed using this panel.<br />
|-<br />
! Compare<br />
| This panel configures the <tt>COMPARE</tt> search operation and the equivalent functionality in the "Grouping" submenu of the iRefScape menu.<br />
|-<br />
! Summary<br />
| This panel generates node-by-node summaries where the attributes of each selected node (or of all nodes in the current network, if no nodes are selected) are presented in a separate table in the help viewer.<br />
|-<br />
! Filter<br />
| As an alternative to the manual selection of nodes and edges using the graphical user interface, this panel permits the selection of nodes and edges according to certain criteria based on node and edge attributes.<br />
|-<br />
! Path parameters<br />
| This panel provides options that configure the path-finding functionality described below.<br />
|-<br />
! Loading options<br />
| The options presented here affect the retrieval of data in search operations, including or excluding certain kinds of data (such as lists of values for certain attributes) in order to either simplify the results or speed up each search operation.<br />
|-<br />
! Import<br />
| The import panel provides the ability to import a generic Cytoscape network into iRefScape by interpreting node attributes as iRefScape queries.<br />
|-<br />
! Export<br />
| The export panel provides the ability to export an iRefScape network in such a way that other Cytoscape plugins may be able to access and manipulate the network's essential information.<br />
|}<br />
<br />
=== Path-finding ===<br />
<br />
[[Image:NP_002515-NP_742031.png|thumb|187px|The path in the results, highlighted in green. Solid green lines indicate presence of evidence for this step of the path in the direction specified by the query ''or'' the presence of evidence that has no directionality. A dashed green line indicates there is evidence for this step of the path but only in the direction that is opposite to that specified in the query.]]<br />
<br />
iRefScape can be used to find interaction events connecting two proteins or a sequence of events involving several proteins. <br />
<br />
This process takes two terminal nodes as input and returns all reasonable paths connecting them. The results returned here are pathway-independent: the sequences of interactions connecting the nodes are not constructed using currently published pathways. However, the paths returned may contain pathway-centric information.<br />
<br />
The query format is as follows:<br />
<br />
NP_203524 <==> NP_002871<br />
<br />
Additional type and taxonomy parameters were also supplied as required:<br />
<br />
* '''Search type:''' <tt>RefSeq_Ac</tt><br />
* '''Taxonomy:''' <tt>9606 (Homo sapiens)</tt><br />
<br />
This query located all reasonable paths between <tt>NP_203524</tt> and <tt>NP_002871</tt>, and the results also include the shortest path between them. The paths found were sorted in ascending order of path length, and the maximum path length was restricted to the default value of 6; this limit can be changed using the "Maximum distance" setting on the "Path parameters" tab of the advanced options panel. Finding "reasonable paths" is different from finding only the shortest path or finding all paths. A "reasonable path" from A to B is a path in which none of the intermediate nodes can be reached from A in fewer steps by some other route; routes that continue beyond B are ignored when making this comparison (in other words, when evaluating a path from A to B, nodes beyond B are not considered).<br />
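<br />
The reasonable-path rule can be illustrated with a short Python sketch. It follows one reading of the definition above: distances are measured from A by breadth-first search, B is never expanded (so nodes beyond B are ignored), and the intermediate node at position ''i'' of a path must lie exactly ''i'' steps from A. The adjacency-dictionary input and the function names are hypothetical; this is not the plugin's own implementation.<br />
<br />
```python
from collections import deque


def bfs_distances(graph, source, target, max_dist):
    """Breadth-first distances from source. The target is never expanded,
    so nodes that can only be reached beyond it get no distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target or dist[node] >= max_dist:
            continue
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def reasonable_paths(graph, source, target, max_dist=6):
    """Paths from source to target (at most max_dist steps) in which each
    intermediate node lies exactly as many steps from source as its position."""
    dist = bfs_distances(graph, source, target, max_dist)
    paths = []

    def extend(path):
        step = len(path)  # number of steps once the next node is added
        if step > max_dist:
            return
        for nbr in graph.get(path[-1], ()):
            if nbr == target:
                paths.append(path + [nbr])
            elif nbr not in path and dist.get(nbr) == step:
                extend(path + [nbr])

    extend([source])
    return sorted(paths, key=len)  # ascending path length
```
<br />
Because distances are measured from the starting node, swapping A and B can change which paths qualify under this definition. Edge directionality (the dashed lines in the figure) is not modelled in this sketch.<br />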
<br />
<div style="clear: right"></div><br />
<br />
=== Reversing the Path ===<br />
<br />
[[Image:NP_742031-NP_002515.png|thumb|187px|The path in the results, highlighted in green]]<br />
<br />
The query rewritten to find the reversed path is as follows:<br />
<br />
NP_002871 <==> NP_203524<br />
<br />
In this case, the same nodes and edges are retrieved and the path is merely reversed.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Differences in Forward and Reverse Directions ===<br />
<br />
[[Image:P62070-Q13322.png|thumb|198px|The path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
Consider the following path query (using <tt>UniProt_Ac</tt> as the search type):<br />
<br />
P62070 <==> Q13322<br />
<br />
This produces a network of 214 nodes and 253 edges, and the result is shown in the illustration.<br />
<br />
<div style="clear: right"></div><br />
<br />
[[Image:Q13322-P62070.png|thumb|270px|The reverse path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
However, when searching with the accessions reversed...<br />
<br />
Q13322 <==> P62070<br />
<br />
...a network of 46 nodes and 91 edges was produced, as illustrated.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Path Selection ===<br />
<br />
[[File:IRefScape-1.18-path-selector.png|thumb|500px|The path selector for the results]]<br />
<br />
After path-finding has completed, the "path selection" panel can be used to load paths selectively. To make selection easier, the paths found can be described using a particular attribute type: after selecting a value for "Convert pop-up type to" (such as <tt>UniProt_Ac</tt>) and pressing the "Convert" button, a tooltip appearing over each path description will show the requested attribute values for each component of the path. Thus, a path description such as...<br />
<br />
4664766 -> 2079075 -> 4770079<br />
<br />
...will provide a tooltip showing the following identifiers:<br />
<br />
Q13322 -> P06241 -> P62070<br />
<br />
A "query helper" panel will also show the converted identifiers.<br />
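<br />
Conceptually, the conversion looks up each component of a path description in a table that maps the numeric identifiers to the requested attribute type. The following Python sketch assumes such a table is already available; the function name is hypothetical and the table simply reuses the identifiers from the example above.<br />
<br />
```python
def convert_path(description, mapping, separator=" -> "):
    """Replace each identifier in a path description using a lookup table.

    Identifiers without an entry in the table are left unchanged."""
    parts = description.split(separator)
    return separator.join(mapping.get(part, part) for part in parts)


# Lookup table built from the example above (path identifier -> UniProt_Ac).
example_mapping = {
    "4664766": "Q13322",
    "2079075": "P06241",
    "4770079": "P62070",
}
print(convert_path("4664766 -> 2079075 -> 4770079", example_mapping))
```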
<br />
=== List Comparison ===<br />
<br />
This feature is available with version 0.91 and later.<br />
<br />
This feature provides a way to compare two lists of proteins. When a <tt>COMPARE{<List1>|<List2>}</tt> format query is issued with default settings, an interaction network is loaded containing only interactions involving the proteins in the lists and proteins which are not in the lists but interact with at least two proteins from each list (intermediate components). At the end of the operation, in addition to the Cytoscape network, an adjacency cube (an adjacency matrix with colours as the third dimension) is also created. This adjacency cube is synchronized with the network and can be used to examine the results easily. A summary report function lists an overall summary for each protein in the lists, sorted so that the most connected proteins appear first. The identifiers used to display the proteins in the adjacency cube are either iROGIDs or the ROGIDs of complexes; these can be displayed as other popular identifier types using the convert feature.<br />
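<br />
The ordering used by the summary report can be sketched as follows: count the interaction partners of each listed protein in the loaded network and sort by that count in descending order. This is a simplified Python illustration; the input format (pairs of identifiers), the alphabetical tie-breaking and the function name are assumptions, and the report itself may summarize more than a partner count.<br />
<br />
```python
from collections import defaultdict


def summary_report(list1, list2, interactions):
    """Count interaction partners for each listed protein and sort so that
    the most connected proteins come first (ties broken alphabetically)."""
    listed = set(list1) | set(list2)
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        partners[a].add(b)
        partners[b].add(a)
    counts = {protein: len(partners[protein]) for protein in listed}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```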
<br />
An example query (from PMID:20670417):<br />
<br />
COMPARE{P08588,P16671|P07550,P13945}<br />
<br />
This query compares two groups:<br />
<br />
# P08588,P16671<br />
# P07550,P13945<br />
<br />
Members within the group are separated with a comma (<tt>,</tt>); groups are separated by a pipe (<tt>|</tt>).<br />
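<br />
The query syntax can be parsed in a few lines of Python. This is a sketch under the rules stated above (members separated by commas, groups by a pipe, and at most two groups in the current version); the function name and the error handling are hypothetical.<br />
<br />
```python
import re

COMPARE_PATTERN = re.compile(r"^COMPARE\{(.*)\}$")


def parse_compare(query):
    """Split a COMPARE{...} query into its groups of protein identifiers."""
    match = COMPARE_PATTERN.match(query.strip())
    if match is None:
        raise ValueError("not a COMPARE{...} query: " + query)
    groups = []
    for group in match.group(1).split("|"):
        # Members are comma-separated; surrounding whitespace is ignored.
        members = [m.strip() for m in group.split(",") if m.strip()]
        groups.append(members)
    if len(groups) != 2:
        raise ValueError("expected exactly two groups, found %d" % len(groups))
    return groups
```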
<br />
====Questions and answers about list comparison====<br />
<br />
''What is the maximum number of members a group can have?''<br />
<br />
There is no fixed limit on the number of members. The more members there are, the more time the operation will take and the more memory it will need. For instance, the example search above completes comfortably within one minute with 256 MB of allocated memory. If you have more than 100 members, we recommend dedicating at least 1 GB of memory to Cytoscape.<br />
<br />
''Can I compare more than two groups?''<br />
<br />
No. Only two groups can be compared in the current version. If a protein appears in both of the groups being compared, it is treated as a member of a third group; this third group is defined after the comparison has run.<br />
<br />
''What if a protein, or a protein resulting from the query, appears in more than one group?''<br />
<br />
All proteins found in more than one group are treated as a new group (group 3).<br />
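<br />
The grouping rule can be sketched in Python as follows. It only considers the identifiers named in the two input groups; how the plugin assigns proteins that end up in more than one group as a result of the query itself is not modelled, and the function name is hypothetical.<br />
<br />
```python
def assign_groups(group1, group2):
    """Map each protein to group 1, 2 or 3 (3 = present in both groups)."""
    set1, set2 = set(group1), set(group2)
    assignment = {}
    for protein in set1 | set2:
        if protein in set1 and protein in set2:
            assignment[protein] = 3
        elif protein in set1:
            assignment[protein] = 1
        else:
            assignment[protein] = 2
    return assignment
```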
<br />
==Troubleshooting==<br />
<br />
* See http://cytoscape.org/ for a manual and a set of tutorials which describe the installation and use of Cytoscape.<br />
* For problems with Cytoscape installation or use, try the [http://groups-beta.google.com/group/cytoscape-helpdesk Cytoscape Help Desk].<br />
* If you have problems with installation or use, please share your experience with us through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
* When updating data on Microsoft Windows XP and Vista, a "Failed to find resources" message may appear in the log message window. If this happens, please run the update again; the plugin will check for and correct the problem during the second attempt.<br />
* If you are working with large graphs, make sure Cytoscape has at least 128 MB of memory. See the [http://cytoscape.org/cgi-bin/moin.cgi/How_to_increase_memory_for_Cytoscape Cytoscape documentation] for more information on setting up memory allowances.<br />
<br />
<br />
==Internal Testing==<br />
Our internal test results for this release of the plugin can be found on the [[iRefScape Test Cases 1.0]] page.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
See the [[:Category:iRefIndex|iRefIndex category]] for a listing of all iRefIndex-related pages (archived and current).<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefR&diff=4012
iRefR
2011-11-24T23:20:39Z
<p>PaulBoddie: Made the page use the iRefScape layout. Added the package description.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div style="float:right"><br />
<facebook-like /><br />
</div><br />
<br />
iRefR is an R package that provides access to [[iRefIndex]]. It allows the user to load any version of the consolidated protein interaction database "iRefIndex" and perform tasks such as selecting databases, PMIDs and experimental methods, searching for specific proteins, separating binary interactions from complexes and polymers, generating complexes using an algorithm that looks for possible binary-represented complexes, computing general database statistics and creating network graphs, among other things.<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
| style="vertical-align: top" |<br />
== Installation ==<br />
<br />
iRefR is available from CRAN as the [http://cran.r-project.org/web/packages/iRefR/index.html iRefR package] and can also be downloaded from...<br />
<br />
ftp://ftp.no.embnet.org/irefindex/iRefR/current/<br />
<br />
Documentation and tutorial material is included. First time users should refer to <tt>iRefR_tutorial.pdf</tt> in the <tt>doc</tt> directory of the source distribution.<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [ftp://ftp.no.embnet.org/irefindex/iRefR/current/]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Publication ==<br />
<br />
[http://www.biomedcentral.com/1471-2105/12/455 iRefR: an R package to manipulate the iRefIndex consolidated protein interaction database.]<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px|left<br />
default [http://www.biomedcentral.com/1471-2105/12/455]<br />
</imagemap><br />
|}</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Bioseminar&diff=4006
Bioseminar
2011-11-24T23:06:53Z
<p>PaulBoddie: Removed image.</p>
<hr />
<div>The Biotechnology Centre of Oslo seminars are held every Tuesday at Forskningsparken. More information can be found on the [http://www.biotek.uio.no/events/tuesday-seminar/ Tuesday Seminar] page of the [http://www.biotek.uio.no/ Biotechnology Centre's Web site].<br />
<br />
See the [http://www.biotek.uio.no/events/tuesday-seminar/directions/ directions] for information on getting to the seminar.<br />
<br />
Previous seminars from 2009 and earlier are listed here for reference.<br />
<br />
{|class="wikitable" style="text-align:center" border="1" cellspacing="0" cellpadding="5"<br />
|+ '''Archive of talks held in 2009'''<br />
!width="50"|Date<br />
!width="50"|Room<br />
!width="50"|Time<br />
!width="150"|Speaker<br />
!width="150"|Group<br />
!width="425"|Title<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| January<br />
|-<br />
|Jan 6 || 10 || 3 PM || Khalid Naseem || External:Tasken || TBA<br />
|-<br />
|Jan 13 || 10 || 3 PM || || Open || TBA<br />
|-<br />
|Jan 20|| 10 || 3 PM || Weiwen Yang || External: Hilsen || A novel protein CGI-128 stimulates angiogenesis via degradation of E2-2<br />
|-<br />
|Jan 27|| 10 || 3 PM || Lene Alsøe || External: Nilsen || An immunological fingerprint of an adenocarcinoma of the pancreas<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| February<br />
|-<br />
|Feb 3 || 10 || 3 PM || Ian Donaldson || Donaldson || Working with Cytoscape and the iRefIndex plugin<br />
|-<br />
|Feb 10 || 10 || 3 PM || Eirik Torheim || Tasken || Development of RI anchoring disruptor (RIAD) peptides for in vivo immunomodulation<br />
|-<br />
|Feb 17 || 10 || 3 PM || Nuriye Basdag || Chaudhry || Functional characterization of a novel transporter<br />
|-<br />
|Feb 24 || 10 || 3 PM || || || Cancelled<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| March<br />
|-<br />
|Mar 3 || Forum || 3 PM || Erik Sonnhammer || External:Donaldson (ES) || See http://sonnhammer.sbc.su.se/<br />
|-<br />
|Mar 10 || 10 || 3 PM || || || No seminar<br />
|-<br />
|Mar 17|| 10 || 3 PM || Kasia Arczewska || Nilsen || Identification and characterisation of NDX proteins from C. elegans<br />
|-<br />
|Mar 24|| 10 || 3 PM || Philipp Sell || Leitges || Generation of conditional PKD knock-out mice<br />
|-<br />
|Mar 31|| 10 || 3 PM || || External || TBA<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| April<br />
|-<br />
|Apr 7 || 10 || 3 PM || || || No seminar<br />
|-<br />
|Apr 14 || Forum || 3 PM || Søren Brunak || External: Donaldson || Using the human interactome to find new disease gene networks. <!-- Contact louisejh@cbs.dtu.dk --><br />
|-<br />
|Apr 21|| 7 || 3 PM || || || No seminar<br />
|-<br />
|Apr 28|| 10 || 3 PM || Antonio Mora || Donaldson || Identifying Protein Complexes related to Multigenic Diseases<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| May<br />
|-<br />
|May 5 || 10 || 3 PM || Randi Mosenden || Tasken || Transgenic mice expressing RIAD in peripheral T-cells.<br />
|-<br />
|May 12 || 10 || 3 PM || Professor S. Ivar Walaas, from the Institute for Basic Medical Sciences || External: Chaudhry || Biochemical modulation of synaptic function <br />
|-<br />
|May 19|| 10 || 3 PM || Knut Tomas Dalen, PhD. || Chaudhry || Conditional disruption of lipid droplet-binding PAT proteins<br />
|-<br />
|May 26|| 10 || 3 PM || Christian Koehler || Thiede || IPTL (Isobaric peptide termini labeling) for MS/MS-based quantitative proteomics<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| June<br />
|-<br />
|Jun 2 || 10 || 3 PM || || External || TBA<br />
|-<br />
|Jun 9 || 10 || 3 PM || || || TBA<br />
|-<br />
|Jun 16|| 10 || 3 PM || || || Cancelled<br />
|-<br />
|Jun 23|| 10 || 3 PM || Katerina Michalickova || Donaldson || TBA<br />
|-<br />
|Jun 30|| 10 || 3 PM || TBA || Tasken || TBA<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| No Bioseminar in July<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| No Bioseminar in August<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| September<br />
|-<br />
|Sept 1 || 10 || 3 PM || Therese Solstad || Thiede || A proteomic approach identifies a marker for a highly suppressive subset of regulatory T cells<br />
|-<br />
|Sept 8 || 7 || 3 PM || Anne-Cathrine Lehre || External: Chaudhry || TBA: Co-occurs with Bioinformatics PhD course (in room 10) <br />
|-<br />
|Sept 15|| 7 || 3 PM || Nikolaus Oberprieler || Tasken ||<br />
High resolution mapping of Prostaglandin E2-dependent signaling networks identifies a constitutively active PKA signaling node in CD8+ memory T cells<br />
<br />
Co-occurs with Bioinformatics PhD course (in room 10)<br />
|-<br />
|Sept 22 || 10 || 3 PM || Øyvind Fensgård || Nilsen || Loss of DNA repair induces compensatory transcriptional responses modulating aging and stress phenotypes<br />
|-<br />
|Sept 29|| 10 || 3 PM || || Leitges || TBA<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| October<br />
|-<br />
|Oct 6 || 7 || 3 PM || Mona Bjørkmo || Chaudhry || Functional expression of two System A glutamine transporter isoforms in rat auditory brainstem neurons - A morphological and electrophysiological study<br />
|-<br />
|Oct 13 || 10 || 3 PM || Vibeke Bull || Thiede || Quantitative proteomic profiling of sorafenib-induced apoptosis in human neuroblastoma cells<br />
|-<br />
|Oct 20 || 7 || 3 PM || H. Werner Mewes || External: Donaldson || <br />
TBA: [http://www.helmholtz-muenchen.de/mips/ MIPS, Institute for Bioinformatics and Systems Biology, Germany] <br />
<br />
Co-occurs with PhD course (room 10)<br />
|-<br />
|Oct 27 || 7 || 3 PM || Isabelle Cornez || Tasken || cAMP immunomodulating response of T cells in tumorigenesis: Co-occurs with PhD course (room 10)<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| November<br />
|-<br />
|Nov 3 || 10 || 3 PM || - || Nilsen || No Bioseminar<br />
|-<br />
|Nov 10 || 10 || 3 PM || Dave Ussery (TBC) || External: Donaldson|| <br />
On The Origins of a Vibrio Species: [http://www.cbs.dtu.dk/staff/dave/ Centre for Biological Sequence Analysis. Denmark]<br />
|-<br />
|Nov 17 || FORUM || 3 PM || Haakon B. Benestad || External: Chaudhry || Scientific fraud and other irregularities - What can we do about it?<br />
|-<br />
|Nov 24 || 7 || 3 PM || Magnus Arntzen || Thiede || New tools to enhance the detection of post-translational modifications: Development and application of a mass spectrometry-based proteomic software and of specific antibodies detecting propionylated lysines<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| December<br />
|-<br />
|Dec 1 || 10 || 3 PM || Sebastian Seidl || Leitges || Sequence-based localization study of atypical Protein kinase C isoforms Iota/lambda and Zeta<br />
|-<br />
|Dec 8 || 10 || 3 PM || Johannes Landskron || Tasken || The Ovarian Carcinoma Project<br />
|-<br />
|Dec 15 || 7 || 3 PM || || Nilsen || TBA<br />
|-<br />
|Dec 22 || 7 || 3 PM || || Leitges || No bioseminar<br />
|-<br />
|Dec 29 || 7 || 3 PM || || Donaldson || No bioseminar<br />
|}<br />
<br />
<br />
<br />
{|class="wikitable" style="text-align:center" border="1" cellspacing="0" cellpadding="5"<br />
|+ '''Archive of talks held in 2008'''<br />
!width="50"|Date<br />
!width="50"|Room<br />
!width="50"|Time<br />
!width="150"|Speaker<br />
!width="150"|Group<br />
!width="425"|Title<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| September<br />
|-<br />
|Sept 2 || 10 || 3 PM || Dr. Torkel Vang || Tasken || Regulation of TCR signaling by LYP<br />
|-<br />
|Sept 9 || 10 || 3 PM || Dr Anne Jorunn Stokka || Tasken || Chemical Biology Screening and the BiO Chemical Biology Platform<br />
|-<br />
|Sept 16 || 10 || 3 PM || Therese Solstad || Thiede || TBA<br />
|-<br />
|Sept 23 || 10 || 3 PM || Tanima Sengupta || Nilsen || Responses to 5-Fluorouracil in C. elegans<br />
|-<br />
|Sept 30 || 10 || 3 PM <br />
| colspan="3" align="center"| No seminar<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| October<br />
|-<br />
|Oct 7 || 10 || 3 PM || Sabry Razick || Donaldson || Cytoscape and the iRefIndex plugin.<br />
|-<br />
|Oct 14 || - || 3 PM || No seminar || External: Chaudhry || concurrent with [http://www.biotek.uio.no/MNBTS/ MNBTS] <br />
|-<br />
|Oct 21 || 7 || 3 PM || Zainab Jallow || Chaudhry || Characterisation of TBP-Like Factor 2 <br/> (Concurrent with [http://www.biotek.uio.no/MNBTS/ MNBTS]. Note room change.) <br />
|-<br />
|Oct 28 || 10 || 3 PM || || External || TBA<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| November<br />
|-<br />
|Nov 4 || 7 || 3 PM || Nikolaus Oberprieler || Tasken || High temporal resolution of the PGE2-stimulated phosphoproteome in primary human T-cells <br/>(concurrent with [http://www.biotek.uio.no/PhDschool/ Ph.D. School of Molecular Medicine]. Note room change.)<br />
|-<br />
|Friday, Nov 7 || 10 || 2 PM || Edvard Moser || External: Chaudhry|| Where am I now? Mechanisms for representation of space in the brain.<br />
|-<br />
|Nov 11 || 7 || 3 PM || || External || concurrent with [http://www.biotek.uio.no/PhDschool/ Ph.D. School of Molecular Medicine]<br />
|-<br />
|Nov 18 || 10 || 3 PM || Vibeke Bull || Thiede || TBA <br />
|-<br />
|Nov 25 || 10 || 3 PM || || External || TBA<br />
|-<br />
|style="background:DarkGray; color:white; border-style: none" colspan="6" align="center"| December<br />
|-<br />
|Dec 2 || 10 || 3 PM || Hanne Kim Tuven || Nilsen || Base Excision repair of uracil in C. elegans<br />
|-<br />
|Friday, Dec 5 || 10 || 11 AM || Elzbieta Speina || External: Nilsen || Biochemical studies of RECQ DNA helicases associated with human disease and aging.<br />
|-<br />
|Dec 9 || 10 || 10 AM || || || No seminar<br />
|-<br />
|Dec 16|| 10 || 3 PM || Sandra Kunz || Leitges || Role of classical PKCs in colon cancer<br />
|-<br />
|Dec 23|| 10 || 3 PM || || - || No seminar<br />
|-<br />
|Dec 30|| 10 || 3 PM || || - || No seminar<br />
|}</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefScape_1.0&diff=4005
iRefScape 1.0
2011-11-24T17:38:18Z
<p>PaulBoddie: Fixed the table to work better with version 1.1 of the theme.</p>
<hr />
<div>__NOTOC__<br />
<br />
[[Image:NP_499166-NP_501526-iterations-1-400x278.png|right]]<br />
<br />
iRefScape is a plugin for Cytoscape that exposes iRefIndex data as a navigable graphical network.<br />
<br />
This page describes the iRefScape 1.0 plug-in for Cytoscape 2.8.x. See the [[#Compatibility_Information|compatibility information section]] for information on other versions.<br />
<br />
Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
Release date: 2011-07-13<br />
<br />
{|class="wikitable" style="text-align:left; clear:left; min-width:50%" border="0" cellpadding="10"<br />
| style="vertical-align: top" |<br />
== Installation ==<br />
<br />
See the [[#Installing_iRefScape|installation section]] for quick installation instructions and references to other documentation.<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[#Installing_iRefScape|installation section]]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Publication ==<br />
<br />
[http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf iRefScape. A Cytoscape plug-in for visualization and data mining of protein interaction data from iRefIndex.]<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px|left<br />
default [http://www.biomedcentral.com/content/pdf/1471-2105-12-388.pdf]<br />
</imagemap><br />
|-<br />
| style="vertical-align: top" |<br />
== Contact information and mailing list ==<br />
Join the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group] to be informed of updates. See also the [[iRefScape|latest release of iRefScape]] which may differ from the release described here.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [http://groups.google.com/group/irefindex?hl=en]<br />
</imagemap><br />
|}<br />
<br />
__TOC__<br />
<br />
== Compatibility Information ==<br />
<br />
See the following table for more detailed iRefScape compatibility information.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Cytoscape<br />
! align="center" style="background:#f0f0f0;"|iRefScape<br />
|-<br />
| 2.8.1, 2.8.2<br />
| iRefScape 1.0 (described on this page)<br />
|-<br />
| 2.7.0<br />
| [[iRefScape 0.9]]<br />
|-<br />
| 2.6.3<br />
| [[iRefScape 0.8]]<br />
|}<br />
<br />
== Installing iRefScape ==<br />
<br />
The plugin can be installed using Cytoscape's plugin menu. Select...<br />
<br />
# "Manage plugins"<br />
# "Available for Install"<br />
# "Network and Attribute I/O"<br />
# "iRefScape" (the entry will name a specific version, such as "iRefScape 1.0")<br />
<br />
Then follow the on-screen instructions.<br />
<br />
{|class="wikitable" style="text-align:left; clear: left; border: 1px solid #cccccc" cellpadding="10"<br />
| style="vertical-align: top" |<br />
=== Installation Guide ===<br />
<br />
More detailed instructions, troubleshooting tips and alternative methods are available in the [[iRefScape 1.0 Installation|installation guide]].<br />
|<imagemap><br />
Image:Applications-system-80x80.png<br />
default [[iRefScape 1.0 Installation|installation guide]]<br />
</imagemap><br />
|}<br />
<br />
After installation, select the "iRefScape" entry from Cytoscape's plugin menu.<br />
<br />
When the plugin is started for the first time, it will download the publicly available data set.<br />
<br />
=== Tested systems ===<br />
This version of the iRefScape plugin has been tested with the following system configurations:<br />
<br />
{| cellspacing="0" cellpadding="10" border="1" style="margin: 2em"<br />
! style="background:#f0f0f0;" | Operating System<br />
! style="background:#f0f0f0;" | Java Version<br />
|-<br />
| Red Hat Enterprise Linux 5 (32-bit) (kernel 2.6.18)<br />
| 1.6.0_01 (32-bit)<br />
|-<br />
| Microsoft Windows 7 (64-bit)<br />
| 1.6.0_25 (64-bit)<br />
|-<br />
| Microsoft Windows Vista (32-bit)<br />
| 1.6.0 (32-bit)<br />
|-<br />
| Ubuntu Linux 8.04 (32-bit)<br />
| 1.6 (32-bit)<br />
|-<br />
| Mac OS X 10.6 (64-bit)<br />
| 1.6.0_15 (32-bit)<br />
|}<br />
<br />
Please refer to the [[iRefScape 1.0 Installation|installation guide]] for more details on system configuration issues.<br />
<br />
=== Source Code ===<br />
<br />
Since iRefScape is made available under version 3 or later of the [http://www.gnu.org/licenses/gpl.html GNU General Public License], the source code is also made available:<br />
<br />
* iRefScape 1.18:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/e12f853c5951.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/e12f853c5951 Source browser]<br />
* iRefScape 1.17:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/0001288b7527.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/0001288b7527 Source browser]<br />
* iRefScape 1.16:<br />
** Source downloads: [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.bz2 tar.bz2 archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.tar.gz tar.gz archive], [http://irefindex.uio.no/hg/iRefScape/archive/3ade99fc92b6.zip zip archive]<br />
** [http://irefindex.uio.no/hg/iRefScape/file/3ade99fc92b6 Source browser]<br />
* [http://irefindex.uio.no/hg/iRefScape/ iRefScape repository home]<br />
<br />
Please consult the <tt>README.txt</tt> file in the source distribution for information on building the software.<br />
<br />
== Using the Wizard - an example search ==<br />
<br />
Click the "Wizard" button - a pop-up window will appear. <br />
<br />
Follow the prompts. Here is an example search:<br />
<br />
# Select "Search protein-protein interactions for a protein".<br />
# Select "UniProt identifier".<br />
# For "Taxonomy identifier", select "9606 (Human)".<br />
# Type <tt>QCR2_HUMAN</tt> in the provided space. Click "Next".<br />
# Click "Search & load".<br />
<!-- commenting these out since they are outdated<br />
The images below show each of the steps in the wizard.<br />
<br />
<gallery perrow="5"><br />
Image:IRefIndex-Cytoscape-Wizard.png|The iRefIndex wizard<br />
Image:IRefIndex-Cytoscape-Wizard-step2.png|Choosing a result type<br />
Image:IRefIndex-Cytoscape-Wizard-step3.png|Choosing a taxonomy type<br />
Image:IRefIndex-Cytoscape-Wizard-step4.png|Specifying the search term<br />
Image:IRefIndex-Cytoscape-Wizard-step5.png|Additional options<br />
</gallery><br />
--><br />
<br />
== Using the Search Panel ==<br />
<br />
To perform a search, the following steps are involved:<br />
<br />
# Enter query term(s)<br />
# Select a search type<br />
# Select taxonomy/organism<br />
# Adjust search options (iterations, new view, canonical expansion) - this is optional<br />
# Start the search<br />
<br />
=== Enter query term(s) ===<br />
<br />
Queries may be loaded from a file or by pasting the query into the text box (one query per line). Multiple queries can also be separated by pipe characters (<tt>|</tt>) or by tab characters. Queries with spaces in them should be enclosed in double quotes.<br />
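<br />
The splitting rules above can be sketched in Python. This sketch assumes that only newlines, pipes and tabs separate queries and that double quotes simply keep the enclosed text together (the quotes themselves are dropped); the plugin's actual parser may differ, and this sketch does not treat special forms such as <tt>COMPARE{...}</tt> queries, where a pipe separates groups, any differently. The function name is hypothetical.<br />
<br />
```python
def split_queries(text):
    """Split query text on newlines, pipes and tabs, keeping text inside
    double quotes together."""
    queries = []
    current = []
    in_quotes = False
    for char in text:
        if char == '"':
            in_quotes = not in_quotes  # quotes group a term but are not kept
        elif char in "\n|\t" and not in_quotes:
            queries.append("".join(current))
            current = []
        else:
            current.append(char)
    queries.append("".join(current))
    # Drop surrounding whitespace and empty entries (e.g. from blank lines).
    return [q.strip() for q in queries if q.strip()]
```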
<br />
=== Select a search type ===<br />
<br />
Example searches are listed below.<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Search Type<br />
! align="center" style="background:#f0f0f0;"|Example<br />
! align="center" style="background:#f0f0f0;"|Notes<br />
|-<br />
| <tt>RefSeq_Ac</tt>||<tt>NP_996224</tt>||See http://www.ncbi.nlm.nih.gov/protein/221379660<br />
|-<br />
| <tt>UniProt_Ac</tt>||<tt>Q7KSF4</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>UniProt_ID</tt>||<tt>Q7KSF4_DROME</tt>||See http://www.uniprot.org/uniprot/Q7KSF4<br />
|-<br />
| <tt>geneID</tt>||<tt>42066</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>geneSymbol</tt>||<tt>cher</tt>||See http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&cmd=Retrieve&dopt=full_report&list_uids=42066<br />
|-<br />
| <tt>mass</tt>||<tt>72854<-->72866</tt>||Search protein interactors for a range of molecular mass (in Da).<br />
|-<br />
| <tt>rog</tt>||<tt>10121899</tt>||Redundant object group: iRefIndex's internal identifier for a protein. See the i.rog node feature.<br />
|-<br />
| <tt>PMID</tt>||<tt>14605208</tt>||PubMed Identifier where an interaction is described. See http://www.ncbi.nlm.nih.gov/pubmed. Iterations and "Use canonical expansion" have no effect on this search type. This search will return all protein interactors in the given PMID and will automatically draw all interactions known between these proteins (even if these interactions are supported by different PMIDs). Select edges in the resulting graph, and see the i.PMID attribute in the Edge Attribute Browser.<br />
|-<br />
| <tt>src_intxn_id</tt>||<tt>EBI-212627</tt>||Source interaction database identifier. Iterations and "Use canonical expansion" have no effect on this search type. Caution: multiple databases may have overlapping interaction record identifiers (e.g. <tt>147805</tt> returns records from both BIND and BioGrid) and there is no way to limit this search to a specific database at this time.<br />
Equivalent interactions from other databases will be automatically retrieved using this search type (see provided example).<br />
|-<br />
| <tt>omim</tt>||<tt>227650</tt>||OMIM identifier. See http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=227650<br />
|-<br />
| <tt>digid</tt>||<tt>449</tt>||Internal identifier for a group of phenotypically related diseases. See [[DiG: Disease groups]]. A digid can be found by first performing a search for an OMIM identifier; the digid will then appear as the i.digid node attribute.<br />
|-<br />
|style="background:#f0f0f0;" colspan="3" align="center"| Additional search types: first select from Advanced features/Preferences.<br />
|-<br />
| <tt>dig_title</tt>||<tt>fanconi</tt>||Non-exact text search of OMIM titles. Select matching titles from the Query Helper and press Return to copy them to the search box. Then press "Search and load". See [[DiG: Disease groups]].<br />
|-<br />
| <tt>ROGID</tt>||<tt>5IrM14EfdlehbVJ0WAcAoQM3pFw9606</tt>||Returns exact matches for the ROGID of a protein. This searches the i.rogid_TOP node feature. Users can also generate a ROGID for an amino acid sequence and taxon identifier pair using the Wizard/Create SEGUID/ROGID for sequence tool. See PMID 18823568.<br />
|-<br />
| <tt>RIGID</tt>||<tt>cXAoT7JjMde7J+CN/2tOR6gETyA</tt>||Returns exact matches for the RIGID of an interaction. This searches the i.rigid edge feature. See PMID 18823568.<br />
|-<br />
|}<br />
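<br />
A ROGID combines the SEGUID of a protein sequence with its taxonomy identifier, as the example ROGID above (ending in <tt>9606</tt>) suggests; see PMID 18823568 for the definition. The Python sketch below assumes the sequence contains only residue letters (no whitespace or line breaks) and uses the usual SEGUID recipe: the Base64-encoded SHA-1 digest of the upper-cased sequence with the trailing <tt>=</tt> padding removed. The function names are hypothetical.<br />
<br />
```python
import base64
import hashlib


def seguid(sequence):
    """SEGUID: Base64-encoded SHA-1 digest of the upper-cased sequence,
    with the trailing '=' padding removed."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rogid(sequence, taxon_id):
    """ROGID: the sequence's SEGUID followed by the NCBI taxonomy identifier."""
    return seguid(sequence) + str(taxon_id)
```
<br />
A SHA-1 digest is 20 bytes, so the SEGUID part is always 27 characters long, matching the length of the prefix in the example ROGID.<br />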
<br />
=== Select taxonomy/organism ===<br />
<br />
This will limit the search results to a particular organism. An organism can be selected from the list, or a taxonomy identifier can be entered into the field itself. See [http://www.ncbi.nlm.nih.gov/taxonomy Entrez Taxonomy] for more details on taxonomy identifiers. For most search types, it is acceptable to leave this field set to <tt>Any</tt>.<br />
<br />
=== Adjust search options ===<br />
<br />
The following optional adjustments can be made:<br />
<br />
==== Iterations ====<br />
<br />
A distance from the query list's members can be specified:<br />
<br />
* Selecting <tt>0</tt> will return only interactions between nodes found by the query list<br />
* Selecting <tt>1</tt> will return immediate neighbours of nodes in the query list<br />
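<br />
The effect of the iterations setting can be sketched as a breadth-first expansion from the query nodes. In this Python sketch the graph is a hypothetical adjacency dictionary, and returning the interactions among all collected nodes (including those between neighbours) is an assumption rather than documented plugin behaviour.<br />
<br />
```python
def expand(graph, query_nodes, iterations):
    """Return the nodes within `iterations` steps of the query nodes and the
    interactions among those nodes (the edge rule is an assumption)."""
    nodes = {n for n in query_nodes if n in graph}
    frontier = set(nodes)
    for _ in range(iterations):
        # Neighbours of the current frontier that have not been collected yet.
        frontier = {nbr for node in frontier for nbr in graph.get(node, ())} - nodes
        nodes |= frontier
    # Each undirected interaction is stored once, as a sorted pair.
    edges = {tuple(sorted((a, b))) for a in nodes for b in graph.get(a, ()) if b in nodes}
    return nodes, edges
```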
<br />
==== Create new view ====<br />
<br />
A new view will be opened for the search results if this option is selected. Otherwise, the results will be added to the current view.<br />
<br />
==== Use canonical expansion ====<br />
<br />
Selecting this option will expand the search to include all proteins that are related to the query protein (for example, splice isoforms). See [[Canonicalization]] for technical details.<br />
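<br />
As a rough illustration, canonical expansion can be thought of as replacing each query node by every node that shares its canonical group (the <tt>i.canonical_rog</tt> node attribute described below). The following Python sketch assumes that mapping is already available in memory; <tt>canonical_expansion</tt> and the node and group names are hypothetical, and the actual grouping rules are those given in the [[Canonicalization]] document.<br />
<br />
```python
from collections import defaultdict


def canonical_expansion(query_nodes, canonical_rog):
    """Expand query nodes to every node that shares their canonical group.

    canonical_rog: hypothetical mapping of node ID -> canonical group ID,
    mirroring the i.canonical_rog node attribute.
    """
    members = defaultdict(set)
    for node, group in canonical_rog.items():
        members[group].add(node)
    expanded = set()
    for node in query_nodes:
        group = canonical_rog.get(node)
        if group is None:
            # A node with no known canonical group is kept on its own.
            expanded.add(node)
        else:
            expanded |= members[group]
    return expanded


groups = {"isoform-1": "gene-A", "isoform-2": "gene-A", "other": "gene-B"}
print(sorted(canonical_expansion({"isoform-2"}, groups)))  # ['isoform-1', 'isoform-2']
```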
<br />
=== Start the search ===<br />
<br />
Press the "Search and load" button to perform the search.<br />
<br />
{{Note|<br />
See the [[iRefScape Batch Files]] document for information on using text files to describe searches, annotate result nodes and to define new search types using user-supplied data.<br />
}}<br />
<br />
== Viewing the Results ==<br />
<br />
=== Colours and Shapes ===<br />
<br />
* Blue nodes correspond to proteins found by your query<br />
* Green nodes are interaction partners of your query proteins<br />
* Purple hexagons are complex-nodes (also called pseudo-nodes); they keep partners of a complex together (e.g. QCR6_HUMAN is found in two complexes also involving "QCR2_HUMAN")<br />
* Orange-yellow edges indicate protein-protein interactions and pink edges represent membership of a protein in a complex<br />
<br />
=== Toggling Edges ===<br />
<br />
Multiple edges may appear between two nodes. These represent separate interaction records that support this link. Details on each original record can be viewed using the edge attribute viewer (below). You can toggle this multi-view on and off by selecting "Toggle selected multi-edges" in the iRefScape/View Tools menu. Only one of the edges will be shown in the collapsed view.<br />
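<br />
One way to picture the collapsed view is as grouping edge records by their unordered pair of nodes and keeping one representative per pair, in the spirit of the <tt>i.flag</tt> values <tt>0</tt> (representative edge) and <tt>1</tt> (edge hidden by the toggle) listed in the edge attributes table. In this hypothetical Python sketch the first record seen for each pair is chosen as the representative; how the plugin actually chooses the representative is an assumption not covered here.<br />
<br />
```python
def assign_toggle_flags(edges):
    """Return a dict of edge ID -> flag (0 = representative, 1 = hidden).

    edges: hypothetical list of (edge_id, node_a, node_b) tuples, one per
    source record. The first record seen for each unordered node pair is
    treated as the representative (an assumption of this sketch).
    """
    seen_pairs = set()
    flags = {}
    for edge_id, a, b in edges:
        pair = frozenset((a, b))
        flags[edge_id] = 1 if pair in seen_pairs else 0
        seen_pairs.add(pair)
    return flags


def collapsed_view(edges, flags):
    """Keep only the representative edge for each pair of nodes."""
    return [edge for edge in edges if flags[edge[0]] == 0]


records = [("e1", "A", "B"), ("e2", "B", "A"), ("e3", "A", "C")]
flags = assign_toggle_flags(records)
print(flags)                           # {'e1': 0, 'e2': 1, 'e3': 0}
print(collapsed_view(records, flags))  # [('e1', 'A', 'B'), ('e3', 'A', 'C')]
```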
<br />
=== iRefScape Menu ===<br />
<br />
The iRefScape menu in the Cytoscape menu bar contains a number of other functions that may help with searching and viewing interaction data. These are described in more detail in the [[iRefScape plugin menu]] document.<br />
<br />
=== Expanding the Interaction Map ===<br />
<br />
You can search for additional interactions by right-clicking on a node and selecting "iRefIndex -- Retrieve interactions".<br />
<br />
Some example result displays are shown below.<br />
<br />
<gallery widths="500px" heights="300px"><br />
Image:QCR2_HUMAN_initial.png|Results<br />
Image:QCR2_HUMAN.png|Results (tidied)<br />
</gallery><br />
<br />
== Attributes ==<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-closed.png|right|The node attributes menu]]<br />
<br />
There are two types of attributes available from iRefIndex: node attributes and edge attributes. These may be used to view information about selected nodes or edges (like <tt>i.taxid</tt>). Some features may allow the user to link out to additional data sources through the "right-click" menu (like <tt>i.geneID</tt>). Features may also be used to sort and select nodes and edges with specific attributes (like <tt>i.order</tt>). The <tt>i.query</tt> feature shows the user's query that is responsible for returning the node or edge.<br />
<br />
Brief descriptions and examples of each attribute are provided below. <br />
<br />
The user must first select the attributes that are to be displayed. This can be done by clicking on the "attribute" icon at the top of the node or edge attribute browser, as shown in the illustrative images.<br />
<br />
<div style="clear: right"></div><br />
=== Node Attributes ===<br />
<br />
[[Image:iRefIndex-0.83-node-attributes-close-up-open.png|right|The open node attributes menu]]<br />
<br />
Each node represents a distinct amino acid sequence (protein) from a distinct organism (taxonomy identifier). Each of the attributes below provides additional information about the node. Although each node is distinct, a graph produced by iRefIndex may contain multiple nodes that are related proteins (such as splice isoform products from the same gene). These nodes will all have the same <tt>i.canonical_rog</tt> and <tt>i.canonical_rogid</tt> feature values. See the notes below.<br />
<br />
Node attributes that can be lists of items (like <tt>i.UniProt</tt>) will have a corresponding attribute called <tt>i.''attribute name''_TOP</tt> (for example, <tt>i.UniProt_TOP</tt>) which provides the first item of the associated list.<br />
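<br />
The relationship between a list attribute and its <tt>_TOP</tt> counterpart can be sketched in Python as follows, assuming the attributes of a node are held in a dictionary. <tt>add_top_attributes</tt> is a hypothetical helper, and leaving empty lists without a <tt>_TOP</tt> entry is an assumption of the sketch.<br />
<br />
```python
def add_top_attributes(attributes):
    """Return a copy of a node's attributes with a *_TOP entry per list.

    attributes: hypothetical dict of attribute name -> value.
    Each non-empty list gets a companion "<name>_TOP" holding its first item;
    empty lists get no *_TOP entry in this sketch.
    """
    result = dict(attributes)
    for name, value in attributes.items():
        if isinstance(value, list) and value:
            result[name + "_TOP"] = value[0]
    return result


node = {"i.UniProt_Ac": ["Q7KSF4"], "i.geneID": [42066], "i.taxid": 7227}
print(add_top_attributes(node)["i.geneID_TOP"])  # 42066
```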
<br />
<div style="clear: right"></div><br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence from a distinct taxonomy identifier. See also <tt>i.rog</tt> and <tt>i.rogid</tt>.<br />
|-<br />
| <tt>canonicalName</tt>||Integer||<tt>10121899</tt>||This is the same as <tt>ID</tt>. This attribute is set by Cytoscape and is unrelated to the <tt>i.canonical_rog</tt> or <tt>i.canonical_rogid</tt> used by iRefIndex.<br />
|-<br />
| <tt>i.RefSeq_Ac</tt>||List||<tt>[NP_996224]</tt> ||All RefSeq accessions with an amino acid sequence and taxon identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[RefSeq_Ac]'' on the web -- Entrez -- Protein" for more information. See also <tt>i.RefSeq_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_Ac</tt>||List||<tt>[Q7KSF4]</tt>||All UniProt accessions with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_Ac]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_Ac_TOP</tt> for the first entry in this list of accessions.<br />
|-<br />
| <tt>i.UniProt_ID</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||All UniProt identifiers with an amino acid sequence and taxonomy identifier identical to the protein represented by this node. Right click on this entry and select "Search ''[UniProt_ID]'' on the web -- UniProt -- KB Beta" for more information. See also <tt>i.UniProt_ID_TOP</tt> for the first entry in this list of IDs.<br />
|-<br />
| <tt>i.canonical_rog</tt>||Integer||<tt>10121899</tt>||Related proteins (for example, splice isoforms from the same gene) all belong to the same canonical group, and one member of this group is assigned as its canonical representative. The <tt>i.canonical_rog</tt> attribute gives the identifier of the protein's canonical group. For example, all products of Entrez Gene 42066 have the same <tt>i.canonical_rog</tt> (<tt>10121899</tt>). Each of these gene products has its own identifier (because they each have a distinct amino acid sequence). One of the splice isoforms (<tt>NP_996224</tt>) was chosen as the canonical representative of this group. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen.<br />
|-<br />
| <tt>i.canonical_rogid</tt>||String||<tt>1ZFb1WlW0OgOlhiAPtkJTdb6oOg7227</tt>||This is a unique alphanumeric key for the canonical representative of the canonical group to which this node belongs. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27 character key and this is prepended to the taxonomy identifier for the protein's source organism in order to make the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.canonical_rog</tt> attribute. All <tt>i.canonical_rog</tt> instances (each being an integer) have one corresponding <tt>i.canonical_rogid</tt>. See the [http://irefindex.uio.no/wiki/Canonicalization canonicalization document] for more details on how canonical groups are constructed and how canonical representatives are chosen. Note that the rogid for the protein represented by this specific node is listed under <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.dataset</tt>||Integer||<tt>0</tt>||In batch query mode, this can be used to locate the query batch (that is, which group of queries was responsible for the node). In single query mode, when a sequence of queries is issued one after another, this attribute can be used to distinguish the results from each step. All nodes with an <tt>i.dataset</tt> value higher than 999 were returned by more than one batch of queries. <br />
|-<br />
| <tt>i.digid</tt>||List||<tt>449</tt>||This is an integer identifier that is shared by a group of disease entries in OMIM that are related by their titles. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.dig_title</tt>.<br />
|-<br />
| <tt>i.dig_title</tt>||List||<tt>[Fanconi anemia, complementation group B, 300514 (3), VACTERL association with hydrocephalus, X-linked, 314390 (3)]</tt>||These are entries from OMIM's Morbid Map that are all part of the same disease group. See [[DiG: Disease groups]] for more details. Also see <tt>i.omim</tt> and <tt>i.digid</tt>.<br />
|-<br />
| <tt>i.displayLabel</tt>||List||<tt>[Q7KSF4_DROME]</tt> ||This is a list of short labels chosen by iRefIndex to label the node using the VizMapper. The UniProt identifier is preferentially chosen (if one is available) followed by the Entrez Gene Symbol. See also <tt>i.displayLabel_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneID</tt>||List||<tt>[42066]</tt>||All NCBI Entrez Gene identifiers that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneID]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneID_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.geneSymbol</tt>||List||<tt>[CHER]</tt>||All NCBI Entrez Gene official symbols that encode a protein sequence identical to that of this node. Right click on this entry and select "Search ''[geneSymbol]'' on the web -- Entrez -- Gene" for more information. See also <tt>i.geneSymbol_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.interactor_description</tt>||List||<tt>[Q7KSF4_DROME, CHER, DMEL_CG3937, SKO, DMEL CG3937, FLN, CG3937, CHER, DMEL\\CG3937, FLN, SKO, CHER, NAME=CHER, DMEL_CG3937]</tt>||A collection of all the names in their short form as given by the original interaction databases. See also <tt>i.interactor_description_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.mass</tt>||Integer|| <tt>259142</tt> ||Mass associated with the protein sequence for this node. From UniProt, if available. You can search for nodes inside a mass range using the <tt>mass</tt> search type in the iRefIndex plugin.<br />
|-<br />
| <tt>i.omim</tt>||List||<tt>[608053]</tt>||List of OMIM disease identifiers associated with this protein. Right click on the entry and select "Search for ''[omim]'' on the web -- Entrez -- OMIM" for more information. <br />
|-<br />
| <tt>i.order</tt>||Integer|| <tt>0</tt> || The distance of this node from the query node (the query node has distance <tt>0</tt>, nodes returned by a query because they are part of the same canonical group have a value of <tt>10</tt>, and direct neighbours have a value of <tt>1</tt>). Pseudonodes have negative values (<tt>-1</tt> is a complex holder, <tt>-2</tt> is a collapsed instance).<br />
|-<br />
| <tt>i.overall_degree_TOP</tt>||Integer|| <tt>42</tt> ||The total number of interactions described for this node in the iRefIndex database, that is, the node degree in the full iRefIndex interactome. The value is calculated using all proteins in iRefIndex (not only the ones currently loaded), so not all of these edges will necessarily be shown in the current view.<br />
|-<br />
| <tt>i.popularity</tt>||List|| <tt>42</tt> || '''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.pseudonode</tt>||Boolean|| <tt>false</tt> || This is set to true if the node represents a "complex" or n-ary interaction record. Protein nodes with edges incident to a pseudonode are member interactors from an interaction record in which the specific interactions between pairs of interactors are unknown. Pseudonodes appear as hexagons when using the iRefIndex VizMapper style. <br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user query used to retrieve this specific node. Neighbours of "query" nodes will not have an <tt>i.query</tt> value. Nodes returned by queries are coloured blue when using the iRefIndex VizMapper style.<br />
|-<br />
| <tt>i.rog</tt>||Integer||<tt>10121899</tt>||This is a unique identifier for the node assigned by iRefIndex (no two nodes will have the same ID). Each node corresponds to a distinct amino acid sequence associated with a distinct taxonomy identifier. <tt>i.rog</tt> also appears as the <tt>ID</tt> attribute. Each <tt>i.rog</tt> has a corresponding <tt>i.rogid</tt> - see below.<br />
|-<br />
| <tt>i.rogid</tt>||String||<tt>2mL9oLZ9g/SSPyK0nOz97RmOzPg3702</tt>||This is a unique alphanumeric key for the protein represented by this node. Briefly, an SHA-1 digest of the amino acid sequence is used to generate a unique 27 character key and this is prepended to the taxonomy identifier for the protein's source organism in order to make the rogid. See PMID 18823568 for details on how this key can be generated. This is a string equivalent of the <tt>i.rog</tt> attribute. All <tt>i.rog</tt> instances (each being an integer) have one corresponding <tt>i.rogid</tt>.<br />
|-<br />
| <tt>i.taxid</tt>||Integer||<tt>7227</tt>||The NCBI taxonomy identifier for this protein's source organism. See http://www.ncbi.nlm.nih.gov/taxonomy?term=7227 for more details of this example value for <tt>i.taxid</tt>.<br />
|-<br />
| <tt>i.xref</tt>||List||<tt>[AAF70826.1,Q9M6R5]</tt> ||All the accessions as given by the original interaction database records to describe this protein. See also <tt>i.xref_TOP</tt> for the first entry in this list.<br />
|-<br />
| <tt>i.alive</tt>||Boolean||<tt>true or false</tt> ||This is true for all nodes after a search operation. This variable is used by the iRefScape filter and after a filter is applied, all nodes matching the filter criteria will have a true value for this variable (all other nodes will have false).<br />
|-<br />
| <tt>i.alive_degree</tt>||Integer||<tt>0,1,2-...</tt> ||This gives the node degree after a search. When an iRefScape filter is applied, this gives the number of nodes with <tt>i.alive=true</tt> connected to a particular node (that is, how many nodes matching the filter criteria are connected to it). <br />
|-<br />
|}<br />
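<br />
The key construction described for <tt>i.rogid</tt> and <tt>i.canonical_rogid</tt> can be sketched in Python: a SEGUID (the Base64-encoded SHA-1 digest of the upper-case sequence, with the trailing <tt>=</tt> padding removed, giving 27 characters) followed by the taxonomy identifier. This minimal sketch assumes the sequence is already a clean string of amino acid letters; PMID 18823568 remains the authoritative description, including any further normalisation of the sequence. The helper names <tt>seguid</tt> and <tt>rogid</tt> are hypothetical.<br />
<br />
```python
import base64
import hashlib


def seguid(sequence):
    """Base64 SHA-1 digest of the upper-case sequence, '=' padding removed.

    Assumes the sequence contains only amino acid letters (no whitespace).
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rogid(sequence, taxid):
    """Sketch of a ROGID: the 27-character SEGUID followed by the taxon ID."""
    return seguid(sequence) + str(taxid)


key = rogid("MKTAYIAKQRQISFVKSHFSRQ", 7227)
print(len(key), key.endswith("7227"))  # 31 True
```

The example sequence is arbitrary; with a four-digit taxonomy identifier the key has 27 + 4 = 31 characters, matching the length of the example values in the table.<br />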
<br />
===Edge Attributes===<br />
<br />
Each edge represents a distinct primary database record that supports some relationship between the two incident nodes. So, if an interaction between two proteins has been annotated by two databases (or twice by the same database) then two edges will appear between those two protein nodes.<br />
<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;"|Attribute name<br />
! align="center" style="background:#f0f0f0;"|Data type<br />
! align="center" style="background:#f0f0f0;"|Example value<br />
! align="center" style="background:#f0f0f0;"|Description<br />
|-<br />
| <tt>ID</tt>||String||<tt>10121899 (2771704(40952)) 13911416</tt>||This is a unique identifier for the edge assigned by Cytoscape (no two edges will have the same <tt>ID</tt>). See <tt>i.rig</tt> and <tt>i.rigid</tt> for unique identifiers for the edge assigned by iRefIndex.<br />
|-<br />
| <tt>i.PMID</tt>||Integer||<tt>14605208</tt>||PubMed identifier of the publication in which the interaction represented by this edge is reported. Right click on this entry and select "Search ''[PMID]'' on the web -- Entrez -- Pubmed" for more details on the publication.<br />
|-<br />
| <tt>i.bait</tt>||Integer||<tt>13911416</tt>||Node ID for the protein that was used as a bait in this experiment. Only applicable where the experimental system (see <tt>i.method_name</tt>) used to support this relationship was a bait-prey system (for example, two hybrid).<br />
|-<br />
| <tt>i.canonical_rig</tt>||Integer||<tt>27799</tt>||See notes for the <tt>i.rig</tt> edge feature. This is the rig constructed for the interaction using its canonical rogs. Use a web browser to query http://wodaklab.org/iRefWeb/interaction/show/27799 (where <tt>27799</tt> is the <tt>i.canonical_rig</tt> value) to retrieve more information on this interaction and equivalent source interaction records.<br />
|-<br />
| <tt>i.experiment</tt>||String||<tt>Giot L [2003]</tt>||A short label for the experiment in which this interaction was found (usually containing the authors' names).<br />
|-<br />
| <tt>i.flag</tt>||Integer||<tt>1</tt>||Used by the iRefIndex plugin to control the display of edges (<tt>0</tt> being the representative edge, used in edge toggle; <tt>1</tt> being an edge which will disappear during edge toggle; <tt>2</tt> being a complex holder edge; <tt>6</tt> being a path; <tt>7</tt> being an edge from or to a collapsed node).<br />
|-<br />
| <tt>i.host_taxid</tt>||Integer||<tt>7227</tt>||Indicates the organism taxonomy identifier where the interaction was experimentally demonstrated.<br />
|-<br />
| <tt>i.isLoop</tt>||Integer||<tt>1</tt>||Indicates whether the interaction is a self interaction (such as a dimer or possibly multimer of the same protein type). See the source interaction record for details.<br />
|-<br />
| <tt>i.method_cv</tt>||String||<tt>MI:0018</tt>||PSI-MI controlled vocabulary term identifier for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The name of the method is also given in the <tt>i.method_name</tt> feature.<br />
|-<br />
| <tt>i.method_name</tt>||String||<tt>two hybrid</tt>||PSI-MI controlled vocabulary term name for the method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.method_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_identification</tt>||String||<tt>predetermined participant</tt>||PSI-MI controlled vocabulary term for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The identifier for the term is also given in the <tt>i.participant_cv</tt> feature.<br />
|-<br />
| <tt>i.participant_cv</tt>||String||<tt>MI:0396</tt>||PSI-MI controlled vocabulary term identifier for the participant identification method used to provide evidence for this interaction. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.participant_identification</tt> feature.<br />
|-<br />
| <tt>i.query</tt>||String||<tt>NP_996224</tt>||The user's query that is responsible for returning this edge.<br />
|-<br />
| <tt>i.rig</tt>||Integer||<tt>27799</tt>||Redundant interaction group identifier for the interaction. <br />
This is an integer equivalent of <tt>i.rigid</tt>. Every rig has one corresponding rigid.<br />
|-<br />
| <tt>i.rigid</tt>||String||<tt>TAabV6yJ1XzUvEhYwZLpu5reBU0</tt>||Redundant interaction group identifier for the interaction. This is a universal key generated for the interaction by sorting the rogids of the participating interactors according to ASCII value, concatenating them, and then generating a Base64 representation of an SHA-1 digest of the resulting string. See PMID 18823568 for details on how this key can be generated.<br />
|-<br />
| <tt>i.score_hpr</tt>||Integer||<tt>15</tt>||The hpr score (highest pmid re-use) is the highest number of interactions that any one PMID (supporting this interaction) is used to support. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.score_lpr</tt>||Integer||<tt>11</tt>||The lpr score (lowest pmid re-use) is the lowest number of distinct interactions that any one PMID (supporting this interaction) is used to support. An lpr of greater than 20 is considered to indicate a high-throughput experiment. See PMID 18823568 for details. See also <tt>i.score_np</tt> and <tt>i.score_hpr</tt>.<br />
|-<br />
| <tt>i.score_np</tt>||Integer||<tt>2</tt>||Number of PubMed Identifiers (PMIDs) pointing to literature where this interaction is supported. See PMID 18823568 for details. See also <tt>i.score_lpr</tt>.<br />
|-<br />
| <tt>i.source_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
| <tt>i.src_intxn_db</tt>||String||<tt>grid</tt>||Original interaction database where this interaction record was obtained.<br />
|-<br />
| <tt>i.src_intxn_id</tt>||String||<tt>38677</tt>||Identifier of this interaction record in the original interaction database (see <tt>i.src_intxn_db</tt>). <br />
In some cases, it may be possible to right click and select "Search ''[src_intxn_id]'' on the web -- Interaction databases -- the database" to see the original record.<br />
|-<br />
| <tt>i.type_cv</tt>||String||<tt>MI:0407</tt>||PSI-MI controlled vocabulary term identifier for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term itself is also given in the <tt>i.type_name</tt> feature.<br />
|-<br />
| <tt>i.type_name</tt>||String||<tt>direct interaction</tt>||PSI-MI controlled vocabulary term name for the interaction type that occurs between the two proteins. See http://www.ebi.ac.uk/ontology-lookup/ for more details. The term identifier is also given in the <tt>i.type_cv</tt> feature.<br />
|-<br />
| <tt>i.target_protein</tt>||Integer||<tt>-1</tt>||'''TO BE DESCRIBED'''<br />
|-<br />
|}<br />
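<br />
The <tt>i.rigid</tt> construction described in the table can be sketched in Python as follows. This minimal sketch assumes that the participant ROGIDs are already known and that the Base64 padding is removed as for ROGIDs (the 27-character example values suggest this); how repeated participants (for example, in self interactions) are handled is not covered. The function name <tt>rigid</tt> is hypothetical.<br />
<br />
```python
import base64
import hashlib


def rigid(participant_rogids):
    """Sketch of a RIGID: Base64 SHA-1 digest ('=' padding removed) of the
    participants' ROGIDs sorted by ASCII value and concatenated."""
    # Python sorts str by code point, which is ASCII order for ROGIDs.
    joined = "".join(sorted(participant_rogids))
    digest = hashlib.sha1(joined.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


first = "1ZFb1WlW0OgOlhiAPtkJTdb6oOg7227"
second = "2mL9oLZ9g/SSPyK0nOz97RmOzPg3702"
print(rigid([first, second]) == rigid([second, first]))  # True
```

The two ROGIDs are taken from the node attribute examples purely as input; the result is not claimed to match any RIGID listed on this page. Sorting first is what makes the key independent of the order in which the participants are listed.<br />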
<br />
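The three scores <tt>i.score_np</tt>, <tt>i.score_hpr</tt> and <tt>i.score_lpr</tt> can be computed from a mapping of interactions to their supporting PMIDs. The following Python sketch follows the table descriptions, assumes each interaction (for example, each rig) is paired with a set of PMIDs, and uses hypothetical interaction names and PMIDs; see PMID 18823568 for the definitive scoring rules.<br />
<br />
```python
from collections import Counter


def interaction_scores(pmids_by_interaction):
    """Return a dict of interaction ID -> (np, hpr, lpr).

    pmids_by_interaction: hypothetical dict of interaction ID -> set of
    supporting PMIDs. Interactions without PMIDs are skipped.
    """
    # Number of distinct interactions that each PMID is used to support.
    usage = Counter(
        pmid for pmids in pmids_by_interaction.values() for pmid in set(pmids)
    )
    scores = {}
    for interaction, pmids in pmids_by_interaction.items():
        pmids = set(pmids)
        if not pmids:
            continue
        reuse = [usage[pmid] for pmid in pmids]
        # np = number of PMIDs; hpr/lpr = highest/lowest PMID re-use.
        scores[interaction] = (len(pmids), max(reuse), min(reuse))
    return scores


support = {"rig-1": {101, 102}, "rig-2": {102}, "rig-3": {102, 103}}
print(interaction_scores(support))
# {'rig-1': (2, 3, 1), 'rig-2': (1, 3, 3), 'rig-3': (2, 3, 1)}
```
<br />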
=== User Attributes ===<br />
<br />
See [[iRefScape Batch Files]] for information on adding attributes to search results.<br />
<br />
== Obtaining Updates to the Data ==<br />
<br />
You can check for and download updates to the dataset used by your plugin using the Wizard (see "Check for iRefIndex updates").<br />
<br />
iRefIndex updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
==Obtaining Updates to the Plugin==<br />
<br />
If you already have a plugin called iRefScape (a menu entry "iRefScape" under the plugin menu of Cytoscape) and you want to make sure you have the latest version, use "Update plugins" from the "Plugins" menu. However, if you want to reinstall the plugin, you should uninstall any previous version of the plugin first.<br />
<br />
Plugin updates are announced through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
<br />
<!--<br />
<br />
==Integrating User Data into the Plugin==<br />
<br />
===How to create node and edge attributes ===<br />
<br />
Example: Attaching [[DiG: Disease groups]] identifiers to nodes<br />
<br />
==Updating==<br />
# From Cytoscape updater<br />
# Using plugins update feature<br />
<br />
== Log Files, Search Details and Errors ==<br />
# How to interpret log messages and save them for later reference. <br />
<br />
==Using the plugin as a search tool ==<br />
The plugin can also be used to search the current network. However, the Cytoscape search option with Google-style suggestions may be more convenient to use. The search function was included because the Cytoscape search field remained inactive on some occasions for networks created using the plugin. The reason for this is still unknown, and deleting a node from the network seems to activate the field; once this bug is fixed, users are encouraged to use the Cytoscape search option.<br />
Currently, if a user performs a search with a term and the corresponding protein is already loaded, the loaded protein (the corresponding node) will be highlighted with the Cytoscape default highlight colours. <br />
<br />
<br />
== Exit plugin and force terminate operations ==<br />
The exit button performs two functions. <br />
# The first is to exit the iRefIndex plugin, which detaches the plugin from Cytoscape. <br />
# The second function, "FORCE STOP" (only available during an active task), terminates the current operation. "FORCE STOP" is useful when the search query or a subsequent operation takes too long to finish or is not responding. When a force stop is performed, the outcome is unpredictable and the behaviour is undefined, so results after such an operation cannot be trusted. <br />
<br />
--><br />
<br />
==Advanced features==<br />
<br />
The advanced features panel holds a number of tabbed panels, most of which expose settings which can be adjusted to change the behaviour of the normal search operations. Many panels offer contextual help via the iRefScape help system, but a brief description of each panel is also given here.<br />
<br />
{| cellpadding="10" cellspacing="0" border="1"<br />
! Preferences<br />
| This panel configures the range of search types (such as <tt>UniProt_Ac</tt>) presented in the main query interface. More search types can be added, and existing search types can be removed.<br />
|-<br />
! Statistics<br />
| A selection of statistics measures for the current network can be calculated and displayed using this panel.<br />
|-<br />
! Compare<br />
| This panel configures the <tt>COMPARE</tt> search operation and the equivalent functionality in the "Grouping" submenu of the iRefScape menu.<br />
|-<br />
! Summary<br />
| This panel generates node-by-node summaries where the attributes of each selected node (or of all nodes in the current network, if no nodes are selected) are presented in a separate table in the help viewer.<br />
|-<br />
! Filter<br />
| As an alternative to the manual selection of nodes and edges using the graphical user interface, this panel permits the selection of nodes and edges according to certain criteria based on node and edge attributes.<br />
|-<br />
! Path parameters<br />
| This pane provides options that configure the path-finding functionality described below.<br />
|-<br />
! Loading options<br />
| The options presented here affect the retrieval of data in search operations, including or excluding certain kinds of data (such as lists of values for certain attributes) in order to either simplify the results or speed up each search operation.<br />
|-<br />
! Import<br />
| The import panel provides the ability to import a generic Cytoscape network into iRefScape by interpreting node attributes as iRefScape queries.<br />
|-<br />
! Export<br />
| The export panel provides the ability to export an iRefScape network in such a way that other Cytoscape plugins may be able to access and manipulate the network's essential information.<br />
|}<br />
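<br />
The behaviour of the Filter panel on the <tt>i.alive</tt> and <tt>i.alive_degree</tt> node attributes can be pictured with the following Python sketch. It assumes node attributes are held in dictionaries and that the filter criterion is an arbitrary Python function; <tt>apply_filter</tt> and the example data are hypothetical, and this is not the plugin's filter implementation.<br />
<br />
```python
def apply_filter(nodes, edges, criterion):
    """Return (alive, alive_degree) dictionaries keyed by node ID.

    nodes: hypothetical dict of node ID -> attribute dict.
    edges: list of (node_a, node_b) pairs; repeated pairs count once.
    criterion: function that takes an attribute dict and returns True/False.
    """
    alive = {node: bool(criterion(attrs)) for node, attrs in nodes.items()}
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Count the distinct neighbours that satisfy the filter criteria.
    alive_degree = {
        node: sum(1 for other in others if alive[other])
        for node, others in neighbours.items()
    }
    return alive, alive_degree


nodes = {"A": {"i.taxid": 9606}, "B": {"i.taxid": 9606}, "C": {"i.taxid": 7227}}
edges = [("A", "B"), ("A", "C"), ("A", "B")]
alive, alive_degree = apply_filter(nodes, edges, lambda attrs: attrs["i.taxid"] == 9606)
print(alive)         # {'A': True, 'B': True, 'C': False}
print(alive_degree)  # {'A': 1, 'B': 1, 'C': 1}
```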
<br />
=== Path-finding ===<br />
<br />
[[Image:NP_002515-NP_742031.png|thumb|187px|The path in the results, highlighted in green. Solid green lines indicate presence of evidence for this step of the path in the direction specified by the query ''or'' the presence of evidence that has no directionality. A dashed green line indicates there is evidence for this step of the path but only in the direction that is opposite to that specified in the query.]]<br />
<br />
iRefScape can be used to find interaction events connecting two proteins or a sequence of events involving several proteins. <br />
<br />
This process takes two terminal nodes as input and returns all reasonable paths connecting them. The results returned here are pathway-independent: the sequences of interactions connecting the nodes are not constructed using currently published pathways. However, the paths returned may contain pathway-centric information.<br />
<br />
The query format is as follows:<br />
<br />
NP_203524 <==> NP_002871<br />
<br />
Additional type and taxonomy parameters were also supplied as required:<br />
<br />
* '''Search type:''' <tt>RefSeq_Ac</tt><br />
* '''Taxonomy:''' <tt>9606 (Homo sapiens)</tt><br />
<br />
This query located all reasonable paths between <tt>NP_203524</tt> and <tt>NP_002871</tt>, and the returned paths also include the shortest path between them. The results of the path-finding were sorted in ascending order of path length, and the maximum path length was restricted to a default value of 6; this value can be modified by changing the value of "Maximum distance" in the "Path parameters" tab of the advanced options panel. The paths found in this way are "reasonable paths", a concept that differs from both finding the shortest path and finding all paths. A "reasonable path" from A to B is a path extending from A to B where none of the intermediate points can be reached from A in fewer steps by a path that extends from A via B (in other words, when evaluating a path from A to B, nodes beyond B are not considered).<br />
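<br />
The definition of a reasonable path can be read in more than one way. The following Python sketch implements one reading, offered as an illustration rather than as the plugin's algorithm: step counts from A are computed by breadth-first search without passing through B, and a path is only extended to nodes whose step count from A equals their position in the path. Interactions are treated as undirected, unlike the directional evidence shown in the figure; the function names and the small example graph are hypothetical.<br />
<br />
```python
from collections import deque


def distances_avoiding(graph, source, avoid):
    """Breadth-first step counts from source that never pass through avoid."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour != avoid and neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def reasonable_paths(graph, a, b, max_length=6):
    """List paths from a to b of at most max_length steps, shortest first.

    graph: hypothetical dict of node -> neighbours (list both directions).
    An intermediate node at step i is accepted only if it cannot be reached
    from a in fewer than i steps without passing through b.
    """
    dist = distances_avoiding(graph, a, b)
    found = []

    def extend(path):
        for neighbour in graph.get(path[-1], ()):
            if neighbour == b:
                found.append(path + [b])
            elif len(path) < max_length and dist.get(neighbour) == len(path):
                extend(path + [neighbour])

    extend([a])
    return sorted(found, key=len)


graph = {
    "A": ["C", "D", "F"], "B": ["C", "E"], "C": ["A", "B", "G"],
    "D": ["A", "E"], "E": ["D", "B"], "F": ["A", "G"], "G": ["F", "C"],
}
print(reasonable_paths(graph, "A", "B"))  # [['A', 'C', 'B'], ['A', 'D', 'E', 'B']]
```

Under this reading the path A-F-G-C-B is rejected because C can be reached from A in one step. Because distances are always measured from the first node, swapping A and B can change which paths qualify, which is consistent with forward and reverse searches producing different networks.<br />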
<br />
<div style="clear: right"></div><br />
<br />
=== Reversing the Path ===<br />
<br />
[[Image:NP_742031-NP_002515.png|thumb|187px|The path in the results, highlighted in green]]<br />
<br />
The query rewritten to find the reversed path is as follows:<br />
<br />
NP_002871 <==> NP_203524<br />
<br />
In this case, the same nodes and edges are retrieved and the path is merely reversed.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Differences in Forward and Reverse Directions ===<br />
<br />
[[Image:P62070-Q13322.png|thumb|198px|The path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
Consider the following path query (using <tt>UniProt_Ac</tt> as the search type):<br />
<br />
P62070 <==> Q13322<br />
<br />
This produces a network of 214 nodes and 253 edges, and the result is shown in the illustration.<br />
<br />
<div style="clear: right"></div><br />
<br />
[[Image:Q13322-P62070.png|thumb|270px|The reverse path in the results, highlighted in green. Here, many nodes have been hidden in order to show the nodes involved in the path.]]<br />
<br />
However, when searching with the accessions reversed...<br />
<br />
Q13322 <==> P62070<br />
<br />
...a network of 46 nodes and 91 edges was produced, as illustrated.<br />
<br />
<div style="clear: right"></div><br />
<br />
=== Path Selection ===<br />
<br />
[[File:IRefScape-1.18-path-selector.png|thumb|500px|The path selector for the results]]<br />
<br />
After the path-finding is completed the "path selection" panel can be used to selectively load the paths. In order to make the selection easier, the paths found can be described by a particular attribute type: by selecting a value from the list for "Convert pop-up type to" (such as <tt>UniProt_Ac</tt>) and pressing the "Convert" button, a tooltip appearing over each path description will show the requested attribute values for each component of the path. Thus, a path description such as...<br />
<br />
4664766 -> 2079075 -> 4770079<br />
<br />
...will provide a tooltip showing the following identifiers:<br />
<br />
Q13322 -> P06241 -> P62070<br />
<br />
A "query helper" panel will also show the converted identifiers.<br />
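<br />
The conversion performed by "Convert pop-up type to" amounts to looking up each node ID of a path in a table of identifiers. A minimal Python sketch, using the node IDs and accessions from the example above as a hypothetical lookup table (the function <tt>convert_path</tt> is also hypothetical), is:<br />
<br />
```python
def convert_path(path_description, id_map):
    """Translate a path description such as '1 -> 2 -> 3' using id_map.

    id_map: hypothetical dict of node ID -> preferred identifier (for
    example a UniProt accession); IDs without an entry are left unchanged.
    """
    steps = [step.strip() for step in path_description.split("->")]
    return " -> ".join(id_map.get(step, step) for step in steps)


id_map = {"4664766": "Q13322", "2079075": "P06241", "4770079": "P62070"}
print(convert_path("4664766 -> 2079075 -> 4770079", id_map))
# Q13322 -> P06241 -> P62070
```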
<br />
=== List Comparison ===<br />
<br />
This feature is available with version 0.91 and later.<br />
<br />
This feature provides a way to compare two lists of proteins. When a query in the format <tt>COMPARE{<List1>|<List2>}</tt> is issued with default settings, an interaction network is loaded containing interactions involving only the proteins in the lists and proteins which are not in the lists but interact with at least two proteins from each list (intermediate components). At the end of the operation, in addition to the Cytoscape network, an adjacency cube (an adjacency matrix with colours as the third dimension) is also created. This adjacency cube is synchronized with the network and can be used to examine the results easily. A summary report function is provided to list an overall summary of each protein in the list in sorted order, so that the most connected proteins appear first. The identifiers used to display the proteins in the adjacency cube are either iROGIDs or the ROGIDs of complexes. The user has the option to display these as popular identifier types using the convert feature.<br />
<br />
An example query (from PMID:20670417):<br />
<br />
COMPARE{P08588,P16671|P07550,P13945}<br />
<br />
This query compares two groups:<br />
<br />
# P08588,P16671<br />
# P07550,P13945<br />
<br />
Members within the group are separated with a comma (<tt>,</tt>); groups are separated by a pipe (<tt>|</tt>).<br />
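<br />
The query syntax can be assembled or checked programmatically, for example when comparing long lists copied from elsewhere. The following Python helpers are a hypothetical sketch and are not part of iRefScape. The names `build_compare_query` and `parse_compare_query` are illustrative. They only encode the comma and pipe rules stated above.<br />
<br />
```python
def build_compare_query(group1: list[str], group2: list[str]) -> str:
    """Join two groups into a COMPARE{...} query string (commas within, pipe between)."""
    return "COMPARE{" + ",".join(group1) + "|" + ",".join(group2) + "}"


def parse_compare_query(query: str) -> tuple[list[str], list[str]]:
    """Split a COMPARE{...} query string into its two member lists."""
    query = query.strip()
    prefix = "COMPARE{"
    if not (query.startswith(prefix) and query.endswith("}")):
        raise ValueError("not a COMPARE{...} query")
    groups = query[len(prefix):-1].split("|")
    if len(groups) != 2:
        # Only two groups can be compared in the current version.
        raise ValueError("exactly two groups are required")
    first, second = (
        [member.strip() for member in group.split(",") if member.strip()]
        for group in groups
    )
    return first, second
```
<br />
For example, `build_compare_query(["P08588", "P16671"], ["P07550", "P13945"])` yields the example query shown above.<br />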
<br />
====Questions and answers about list comparison====<br />
<br />
''What is the maximum number of members a group can have?''<br />
<br />
A group can have any number of members. However, the more members there are, the longer the operation will take and the more memory it will need. For instance, the example search above completes comfortably within 1 minute with 256MB of allocated memory. If a group has more than 100 members, we recommend dedicating at least 1GB of memory to Cytoscape.<br />
<br />
''Can I compare more than two groups?''<br />
<br />
No. Only two groups can be compared in the current version. If a protein appears in both groups being compared, it is treated as a member of a third group, although this third group is only defined after execution.<br />
<br />
''What if a protein, or a protein resulting from the query, appears in more than one group?''<br />
<br />
All proteins found in more than one group are treated as a new group (group 3).<br />
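<br />
The grouping rule amounts to a set intersection followed by removal of the shared members from the original groups. The following Python sketch is hypothetical (`assign_groups` is an illustrative name). It compares identifiers as plain strings, whereas iRefScape compares the proteins resolved from the query, which the sketch does not model.<br />
<br />
```python
def assign_groups(group1: list[str], group2: list[str]) -> dict[int, set[str]]:
    """Split two protein groups so that proteins found in both form group 3."""
    shared = set(group1) & set(group2)  # proteins present in both groups
    return {
        1: set(group1) - shared,
        2: set(group2) - shared,
        3: shared,
    }
```
<br />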
<br />
==Troubleshooting==<br />
<br />
* See http://cytoscape.org/ for a manual and a set of tutorials which describe the installation and use of Cytoscape.<br />
* For problems with Cytoscape installation or use, try the [http://groups-beta.google.com/group/cytoscape-helpdesk Cytoscape Help Desk].<br />
* If you have problems with installation or use, please share your experience with us through the [http://groups.google.com/group/irefindex?hl=en iRefIndex Google Group].<br />
* When updating data on Microsoft Windows XP and Vista, a "Failed to find resources" message may appear in the log message window. If this happens, please run the update again and the plugin will check and correct the problem during the second attempt.<br />
* If you are working with large graphs, make sure Cytoscape has at least 128MB of memory. See the [http://cytoscape.org/cgi-bin/moin.cgi/How_to_increase_memory_for_Cytoscape Cytoscape documentation] for more information on setting up memory allowances.<br />
<br />
<br />
==Internal Testing==<br />
Our internal test results for this release of the plugin can be found on the [[iRefScape Test Cases 1.0]] page.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
See the [[:Category:iRefIndex|iRefIndex category]] for a listing of all iRefIndex-related pages (archived and current).<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefScape_Test_Cases_1.0&diff=4004
iRefScape Test Cases 1.0
2011-11-24T17:32:48Z
<p>PaulBoddie: /* To be corrected */ Added another thing.</p>
<hr />
<div>Last edited: {{REVISIONYEAR}}-{{padleft:{{REVISIONMONTH}}|2}}-{{REVISIONDAY2}}<br />
<br />
All tests have been performed against iRefIndex 8.1 data.<br />
<br />
==Search cases==<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;" | Query<br />
! align="center" style="background:#f0f0f0;" | Search Type<br />
! align="center" style="background:#f0f0f0;" | Options<br />
! align="center" style="background:#f0f0f0;" | Expected Result<br />
! align="center" style="background:#f0f0f0;" | Pass/Fail<br />
|-<br />
| rowspan="10" valign="top" | <pre>Q39009<br />
Q9ZNV8</pre><br />
| rowspan="12" valign="top" | <pre>UniProt_Ac</pre><br />
| rowspan="9" valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: no<br />
Use canonical expansion: no<br />
Add edges between neighbours: no</pre><br />
| 33 nodes, 94 edges. 2 queried nodes are blue (DMC1_ARATH and AHP2_ARATH)<br />
| Pass<br />
|-<br />
| Node AHP2_ARATH linkout from i.RefSeq_Ac_TOP attribute (NP_189581) to Entrez Protein gives http://www.ncbi.nlm.nih.gov/protein/NP_189581?report=GenPept<br />
| Pass<br />
|-<br />
| i.taxid is 3702 and i.geneID is 822593<br />
| Pass<br />
|-<br />
| Node AHP2_ARATH linkout from i.UniProt_Ac_TOP attribute (Q9ZNV8) to UniProt/KB beta gives http://www.uniprot.org/uniprot/Q9ZNV8<br />
| Pass<br />
|-<br />
| UniProt record agrees with iRefScape on i.RefSeq_Ac_TOP (see "Sequence databases"), i.taxid and i.geneID (see "Genome annotation databases")<br />
| Pass<br />
|-<br />
| Two edges between query nodes are EBI-1555390, EBI-1555417<br />
| Pass<br />
|-<br />
| Linkouts for query node edges (i.src_intxn_id) to IntAct ("Interaction databases") provide PubMed #17937504, which should match i.PMID, and an interaction detection method of "anti tag coip", which should match i.method_name<br />
| Pass<br />
|-<br />
| The molecule names are DMC1 and ATHP1 in IntAct and these names should be available under the i.interactor_alias node attribute<br />
| Pass<br />
|-<br />
| Both interactions should have http://wodaklab.org/iRefWeb/interaction/show/102203 as i.iRefWEB<br />
| Pass<br />
|-<br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: no<br />
Use canonical expansion: no<br />
Add edges between neighbours: no</pre><br />
| 2 nodes, 3 edges; 2 queried nodes are blue (DMC1_ARATH and AHP2_ARATH) and are connected by two edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>Q39009-1<br />
Q9ZNV8-2</pre><br />
| rowspan="2" valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: no<br />
Use canonical expansion: no<br />
Add edges between neighbours: no</pre><br />
| Returns the same results as the search for the plain UniProtKB accessions Q39009 and Q9ZNV8: isoform information is ignored when searching<br />
| Pass<br />
|-<br />
| valign="top" | <pre>Q39009.1<br />
Q9ZNV8.2</pre><br />
| Returns no results. Version information is not a valid annotation for UniProtKB.<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_188928.2</pre><br />
| rowspan="5" valign="top" | <pre>RefSeq_Ac</pre><br />
| rowspan="11" valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 11 nodes, 29 edges. i.RefSeq_Ac = NP_188928 was returned; edges returned include EBI-1555390, EBI-1555417 between DMC1_ARATH and AHP2_ARATH<br />
| Pass<br />
|-<br />
| <pre>NP_188928</pre><br />
| The same results are returned<br />
| Pass<br />
|-<br />
| <pre>NP_188928.567</pre><br />
| The same results are returned<br />
| Pass<br />
|-<br />
| <pre>188928</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| <pre>NP 188928</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_188928.2</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_188928.2</pre><br />
| valign="top" | <pre>geneID</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_188928.2</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_188928.2</pre><br />
| valign="top" | <pre>ipi</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_188928.2</pre><br />
| valign="top" | <pre>mass</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_188928.2</pre><br />
| valign="top" | <pre>rog</pre><br />
| No results are returned<br />
| Pass (''Modified to give a warning'')<br />
|-<br />
| valign="top" | <pre>AHP2_ARATH</pre><br />
| valign="top" rowspan="6" | <pre>UniProt_ID</pre><br />
| valign="top" rowspan="10" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 26 nodes, 67 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AHP2</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AHP2 ARATH</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>ARATH</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AH2_ARATH</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AHP2_ARATH.2</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AHP2_ARATH</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AHP2_ARATH</pre><br />
| valign="top" | <pre>geneID</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AHP2_ARATH</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>AHP2_ARATH</pre><br />
| valign="top" | <pre>ipi</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>821860<br />
822593</pre><br />
| valign="top" | <pre>geneID</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 33 nodes and 94 edges. (Should return the same result as a UniProt_Ac query for Q39009 and Q9ZNV8.)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>821860</pre><br />
| valign="top" | <pre>geneID</pre><br />
| valign="top" rowspan="4"| <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 1 node<br />
| Pass<br />
|-<br />
| valign="top" | <pre>821860</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>821860</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>821860</pre><br />
| valign="top" | <pre>UniProt_ID</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>1234</pre><br />
| valign="top" | <pre>geneID</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 47 nodes and 158 edges<br />
| Pass<br />
|-<br />
| valign="top" | All geneIDs from a search for <pre>1234</pre> (45 values producing 44 unique gene identifiers; highlight values in i.geneID and use control-C to copy)<br />
| valign="top" | <pre>geneID</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 80 nodes and 733 edges (upon last attempt)<br />
| Pass<br />
|-<br />
| colspan="5" | '''Note:''' This test is made very difficult by a current bug in the Cytoscape 2.8.1 attribute browser (right-click and copy of multiple values does not work).<br />
|-<br />
| valign="top" | <pre>CXCR4</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 2 nodes<br />
| Pass<br />
|-<br />
| valign="top" | <pre>P61073-2</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| valign="top" rowspan="2" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 2 nodes (CXCR4 and CXCR4_HUMAN)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>P61073</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| 1 node (CXCR4_HUMAN)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>P61073-2</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| valign="top" rowspan="2" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: yes</pre><br />
| 2 nodes (CXCR4 and CXCR4_HUMAN)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>P61073</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| 2 nodes (CXCR4 and CXCR4_HUMAN)<br />
| Pass<br />
|-<br />
| colspan="5" | '''Note:''' P61073-2 and P61073 are isoforms of CXCR4. Searching for P61073-2 actually results in the removal of the "-2" and a search for all isoforms, under the assumption that the user is unsure which isoform should be retrieved; as a result, all isoforms are returned, even though a specific isoform was requested. In contrast, the search without a "-" character results in just one protein with that exact name being returned.<br />
|-<br />
| valign="top" | <pre>CXCR</pre><br />
| valign="top" rowspan="4" | <pre>geneSymbol</pre><br />
| valign="top" rowspan="4" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>CXCR4</pre><br />
| rowspan="3"| 2 nodes with i.UniProt_Ac_TOP set to P61073 and P61073-2<br />
| Pass<br />
|-<br />
| valign="top" | <pre>cxcR4</pre><br />
| Pass<br />
|-<br />
| valign="top" | <pre>CXC<br />
CXCR<br />
CXCR4</pre><br />
| Pass<br />
|-<br />
| colspan="5" | '''Note:''' no indication is given that CXC and CXCR failed to provide matches when the successfully used CXCR4 term is present. Maybe some feedback could be given about this.<br />
|-<br />
| valign="top" | <pre>CXCR5</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
| valign="top" rowspan="3" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 3 nodes<br />
| Pass<br />
|-<br />
| valign="top" | <pre>CXCR5</pre><br />
| valign="top" | <pre>UniProt_ID</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>PTK2</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
| 6 nodes (involving 5 taxa)<br />
| Pass<br />
|-<br />
| valign="top" rowspan="3" | <pre>RPB1</pre><br />
| valign="top" rowspan="3" | <pre>geneSymbol</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 1 node (RPB1_SCHPO)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>Taxon id: 4932<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| No results returned (even though RPB1 is an alias for yeast RPO21, gene identifier 851415, this search only uses official gene symbols from Entrez)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>Taxon id: 9606<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| No results returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>5366033</pre><br />
| valign="top" | <pre>rog</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: no<br />
Use canonical expansion: no</pre><br />
| 2 nodes (POL_HV1H2 and POL_HV1B1 interacting with it)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>IPI00543858<br />
IPI00517160</pre><br />
| valign="top" rowspan="4" | <pre>ipi</pre><br />
| valign="top" rowspan="4" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: no<br />
Use canonical expansion: no</pre><br />
| rowspan="3" | 33 nodes, 94 edges. 2 queried nodes are blue (DMC1_ARATH and AHP2_ARATH)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>IPI00543858.1<br />
IPI00517160.1</pre><br />
| Pass<br />
|-<br />
| valign="top" | <pre>00543858<br />
00517160</pre><br />
| Pass<br />
|-<br />
| valign="top" | <pre>00543858<br />
0051716</pre><br />
| 33 nodes, 94 edges, but only after IPI00517160 has been chosen from the query helper<br />
| Pass<br />
|-<br />
| colspan="5" | '''Note:''' transferring the search term from the query helper and augmenting the results with it is not particularly easy, or at least it is not obvious how to do so successfully. The iRefScape panel is hidden in the main window (a Cytoscape bug which appears to switch the visible panel all the time), and a new search is required without a new view being created, even though the original search may have been set up to create one.<br />
|-<br />
| valign="top" | <pre>IPI</pre><br />
| valign="top" rowspan="2" | <pre>ipi</pre><br />
| valign="top" rowspan="2" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: no<br />
Use canonical expansion: no</pre><br />
| Initiates query helper<br />
| Pass<br />
|-<br />
| valign="top" | <pre>IPI00543858</pre><br />
| 26 nodes, 67 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>72854<-->72866</pre><br />
| valign="top" | <pre>mass</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| The query helper is shown. Upon transferring the 4 suggestions into the query box and searching again, 5 nodes are retrieved.<br />
| Pass<br />
|-<br />
| colspan="5" | '''Note:''' the nodes are not laid out in a nice way, probably because no edges connect them.<br />
|-<br />
| valign="top" | <pre>72854 kda</pre><br />
| valign="top" | <pre>mass</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>72854<-->72856 kda</pre><br />
| valign="top" | <pre>mass</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| Query helper with 3 possible results<br />
| Pass<br />
|-<br />
| valign="top" | <pre>11401546</pre><br />
| valign="top" rowspan="12" | <pre>PMID</pre><br />
| valign="top" rowspan="12" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 4 nodes and 7 edges are returned. Three edges have PMID 11401546 and these involve the 4 nodes shown. All other edges (from different PMIDs) involve these proteins.<br />
|<br />
|-<br />
| valign="top" | <pre>11401546.1</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>1140154</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>SPTAN1</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>11401546<br />
SPTAN1</pre><br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10551855<br />
11401546</pre><br />
| 6 nodes and 27 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10551855|11401546</pre><br />
| 6 nodes and 27 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10551855| 11401546</pre><br />
| 6 nodes and 27 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10551855 | 11401546</pre><br />
| 6 nodes and 27 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10551855 [tab] 11401546</pre><br />
| 6 nodes and 27 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10551855, 11401546</pre><br />
| 6 nodes and 27 edges<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10551855 11401546</pre><br />
| No results are returned (since space-delimited queries are not supported)<br />
| Pass<br />
|-<br />
| colspan="5" | The following example searches are listed in the [[README_Cytoscape_plugin_0.8x#Using_the_Search_Panel|Using the Search Panel]] documentation.<br />
|-<br />
| valign="top" | <pre>Q7KSF4</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: yes</pre><br />
| 69 nodes, 213 edges, with the 4 query nodes having i.query = Q7KSF4, one of the query nodes having i.order = 0 (Q7KSF4_DROME) and the others having i.order = 10 (as canonical group members)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>Q7KSF4</pre><br />
| valign="top" | <pre>UniProt_Ac</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| 1 node (Q7KSF4_DROME) verifying the attributes in the previous search<br />
| Pass<br />
|-<br />
| valign="top" | <pre>NP_996224</pre><br />
| valign="top" | <pre>RefSeq_Ac</pre><br />
| valign="top" rowspan="3" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: yes</pre><br />
| rowspan="2" | 69 nodes, 213 edges (same as the above query for Q7KSF4)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>Q7KSF4_DROME</pre><br />
| valign="top" | <pre>UniProt_ID</pre><br />
| Pass<br />
|-<br />
| valign="top" | <pre>42066</pre><br />
| valign="top" | <pre>geneID</pre><br />
| rowspan="3" | 69 nodes, 213 edges, with the 4 query nodes having i.order = 0 since all also have i.geneID = 42066<br />
| Pass<br />
|-<br />
| valign="top" | <pre>42066</pre><br />
| valign="top" | <pre>geneID</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| Pass<br />
|-<br />
| valign="top" | <pre>cher</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
| valign="top" rowspan="4" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: yes</pre><br />
| Pass<br />
|-<br />
| valign="top" | <pre>72854<-->72866</pre><br />
| valign="top" | <pre>mass</pre><br />
| 44 nodes, 140 edges, with 8 query nodes: the 5 with i.mass in (72854, 72855, 72856, 72861) have i.order = 0 and the remaining 3, with i.mass outside the given range, have i.order = 10<br />
| Pass<br />
|-<br />
| valign="top" | <pre>10121899</pre><br />
| valign="top" | <pre>rog</pre><br />
| 69 nodes, 213 edges (same as the above query for Q7KSF4), but with one of the query nodes having i.order = 0 and i.query = 10121899 and the remaining 3 query nodes having i.order = 10<br />
| Pass<br />
|-<br />
| valign="top" | <pre>14605208</pre><br />
| valign="top" | <pre>PMID</pre><br />
| rowspan="3" | 929 nodes and 1605 edges returned all with PMID of 14605208<br />
|<br />
|-<br />
| valign="top" | <pre>14605208</pre><br />
| valign="top" | <pre>PMID</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: yes</pre><br />
|<br />
|-<br />
| valign="top" | <pre>14605208</pre><br />
| valign="top" | <pre>PMID</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
|<br />
|-<br />
| valign="top" | <pre>47513</pre><br />
| valign="top" | <pre>src_intxn_id</pre><br />
| valign="top" rowspan="6" | <pre>Taxon id: Any<br />
Iterations: 1<br />
Create new view: yes<br />
Use canonical expansion: yes</pre><br />
| 2 nodes and 1 edge returned (one query node, Q7KSF4_DROME)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>EBI-212627</pre><br />
| valign="top" | <pre>src_intxn_id</pre><br />
| 2 nodes and 5 edges returned (from bind, dip, intact, mint and BIND_Translation), with 2 query nodes (CRBN_DROME and Q9W279_DROME)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>147805</pre><br />
| valign="top" | <pre>src_intxn_id</pre><br />
| Returns 4 nodes and 4 interactions because the BIND/BIND_Translation and BioGRID interaction identifier spaces overlap (so 147805 refers to completely different interactions in different databases)<br />
| Pass<br />
|-<br />
| valign="top" | <pre>227650</pre><br />
| valign="top" | <pre>omim</pre><br />
| 96 nodes (including 18 complex nodes) and 498 edges returned, with 10 query nodes having i.digid = 460<br />
| Pass<br />
|-<br />
| valign="top" | <pre>449</pre><br />
| valign="top" | <pre>digid</pre><br />
| 321 nodes and 1010 edges returned, with the 3 query nodes having i.omim = 612219<br />
| Pass<br />
|-<br />
| valign="top" | <pre>460</pre><br />
| valign="top" | <pre>digid</pre><br />
| 668 nodes and 17554 edges returned, with the 16 query nodes having i.digid = 460<br />
| Pass<br />
|-<br />
| valign="top" | <pre>fanconi</pre><br />
| valign="top" | <pre>dig_title</pre><br />
| valign="top" | <pre>Taxon id: Any<br />
Iterations: 0<br />
Create new view: yes<br />
Use canonical expansion: no</pre><br />
| ''(Query helper invoked as the dig_title is a non-exact match, all choices selected)'' 44 nodes and 473 edges returned, 14 nodes have a digid of 460<br />
| Pass<br />
|}<br />
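<br />
The multi-term PMID cases above indicate which characters separate search terms. The following Python sketch reproduces that observed behaviour. The function name `split_query_terms` and the exact separator set are assumptions inferred from the test results rather than taken from the plugin. COMPARE queries give commas and pipes a different meaning and are not covered here.<br />
<br />
```python
import re

# Separators observed in the test cases: newline, pipe, comma and tab.
# A plain space is deliberately NOT a separator, so "10551855 11401546"
# stays a single (non-matching) term, as does "NP 188928".
# Reconstructed from observed behaviour, not iRefScape's own parser.
TERM_SEPARATORS = re.compile(r"[|,\t\r\n]")


def split_query_terms(query: str) -> list[str]:
    """Split the contents of the search box into individual search terms."""
    terms = (term.strip(" ") for term in TERM_SEPARATORS.split(query))
    return [term for term in terms if term]
```
<br />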
<br />
=== Invalid input tests ===<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;" | Query<br />
! align="center" style="background:#f0f0f0;" | Search Type<br />
! align="center" style="background:#f0f0f0;" | Options<br />
! align="center" style="background:#f0f0f0;" | Expected Result<br />
! align="center" style="background:#f0f0f0;" | Pass/Fail<br />
|-<br />
| valign="top" | <pre>0</pre><br />
| valign="top" | <pre>RefSeq_Ac</pre><br />
|<br />
| No results are returned<br />
|Pass<br />
|-<br />
| valign="top" | <pre>0</pre><br />
| valign="top" | <pre>UniProt_ID</pre><br />
|<br />
| No results are returned<br />
|Pass<br />
|-<br />
| valign="top" | <pre>0</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
|<br />
| No results are returned<br />
|Pass<br />
|-<br />
| valign="top" | <pre>0</pre><br />
| valign="top" | <pre>geneID</pre><br />
|<br />
| No results are returned<br />
|Pass<br />
|-<br />
| valign="top" | <pre>0</pre><br />
| valign="top" | <pre>mass</pre><br />
|<br />
| No results are returned<br />
|Pass<br />
|-<br />
| valign="top" | <pre>0</pre><br />
| valign="top" | <pre>pmid</pre><br />
|<br />
| No results are returned<br />
|Pass<br />
|-<br />
| valign="top" | <pre>23</pre><br />
| valign="top" | <pre>UniProt_ID</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>-1</pre><br />
| valign="top" | <pre>mass</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>-1</pre><br />
| valign="top" | <pre>pmid</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>0000</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>00001</pre><br />
| valign="top" | <pre>geneID</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>12345</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>abcd</pre><br />
| valign="top" | <pre>ipi</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>abcde</pre><br />
| valign="top" | <pre>RefSeq_Ac</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>abcde</pre><br />
| valign="top" | <pre>UniProt_ID</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>abcde</pre><br />
| valign="top" | <pre>geneSymbol</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>abcde</pre><br />
| valign="top" | <pre>geneID</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>abcdes</pre><br />
| valign="top" | <pre>geneID</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
| valign="top" | <pre>MW</pre><br />
| valign="top" | <pre>mass</pre><br />
|<br />
| No results are returned<br />
| Pass<br />
|-<br />
|}<br />
<br />
=== Not currently tested ===<br />
<br />
* src_intxn_id search 1<br />
* audit against external database - Intact<br />
* audit against external database - MINT<br />
* audit against external database - BioGRID<br />
<br />
=== User interface notes ===<br />
<br />
Many search types, such as UniProt_ID, now perform exact-only searches. Imprecise protein names, for example, should not produce results: a user might have entered gene names, selected UniProt_ID by mistake, and then not noticed the mistake because the search still returned results. Also, since the first few characters of a UniProt_ID search term may be shared by multiple proteins from different organisms, an inexact match would need to trigger the query helper.<br />
<br />
Generally, searches should provide predictable outcomes without the user having to consult the attribute browsers to discover which search terms produced which results. For example, AH2_ARATH, which returns no results from an exact-match search, should not cause similar terms to be searched for instead. Previously, AH2_ARATH returned CAH2_ARATH. If such a query term were accidentally embedded in a long list, the user would never detect this search error.<br />
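<br />
The difference between the two behaviours can be shown with a small Python sketch. It assumes, without confirmation from the source, that the earlier behaviour amounted to a substring match. `KNOWN_IDS` is a made-up three-entry index, and the function names are hypothetical.<br />
<br />
```python
# Hypothetical identifier index used only for this illustration.
KNOWN_IDS = ["AHP2_ARATH", "CAH2_ARATH", "DMC1_ARATH"]


def exact_matches(term: str, ids: list[str]) -> list[str]:
    """Exact-only behaviour: only identical identifiers match."""
    return [identifier for identifier in ids if identifier == term]


def substring_matches(term: str, ids: list[str]) -> list[str]:
    """Assumed earlier behaviour: any identifier containing the term matches."""
    return [identifier for identifier in ids if term in identifier]
```
<br />
With "AH2_ARATH", `exact_matches` returns an empty list, while `substring_matches` returns `["CAH2_ARATH"]`. This is the silent substitution described above.<br />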
<br />
Where the taxonomy field is set to <tt>Any</tt>, a warning will be given. It is envisaged that the user will most frequently be working with a single organism's proteins or would at least tolerate being reminded that potentially irrelevant proteins might be searched for due to naming coincidences.<br />
<br />
The iterations setting resets to 1 after every query, even a query that caused the query helper to be shown (in which case the query is completed by running the search again).<br />
<br />
The i.query attribute on nodes will collect queries as they are performed. Thus, nodes will appear blue in a graph even if the current query had no direct relationship with the node.<br />
<br />
Exporting lists of attribute values should be as simple as selecting the values in the attribute browser, opening a context menu and copying the selection. When the context menus do not work, it is also possible to use the "File" -> "Export" -> "Node Attributes" menu entry, choose "i.geneID", save the file and then process it to obtain a list, although this approach seems to be rather unreliable.<br />
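<br />
The "processing the saved file" step can be sketched in Python. This is a sketch under an assumption about the export layout: it assumes a Cytoscape 2.x style node attribute file with one header line naming the attribute, then one "node = value" line per node, with list values written as "(a::b)". Please check this against an actual export. `read_attribute_values` is a hypothetical name.<br />
<br />
```python
from collections.abc import Iterable


def read_attribute_values(lines: Iterable[str]) -> list[str]:
    """Return the unique values from an exported node attribute file.

    Assumed layout (Cytoscape 2.x style, to be verified): a header line,
    then "node = value" lines, with list values written as "(a::b)".
    Values are returned in the order in which they are first seen.
    """
    values: list[str] = []
    seen: set[str] = set()
    for index, line in enumerate(lines):
        if index == 0:
            continue  # skip the header line naming the attribute
        if " = " not in line:
            continue  # ignore blank or unexpected lines
        raw = line.split(" = ", 1)[1].strip()
        if raw.startswith("(") and raw.endswith(")"):
            items = raw[1:-1].split("::")  # list-valued attribute
        else:
            items = [raw]
        for item in items:
            item = item.strip()
            if item and item not in seen:
                seen.add(item)
                values.append(item)
    return values
```
<br />
An open file can be passed directly, for example `read_attribute_values(open("geneIDs.noa"))`, where "geneIDs.noa" stands for whatever name the export was saved under.<br />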
<br />
== Export cases ==<br />
<br />
{| border="1" cellspacing="0" cellpadding="5" style="margin: 2em"<br />
! align="center" style="background:#f0f0f0;" | Query/Search Type/Options<br />
! align="center" style="background:#f0f0f0;" | Export Type<br />
! align="center" style="background:#f0f0f0;" | Expected Result<br />
! align="center" style="background:#f0f0f0;" | Pass/Fail<br />
|-<br />
| valign="top" rowspan="2" | <pre>CXCR4<br />
<br />
geneSymbol<br />
<br />
Taxon id: Any<br />
Iterations: 0<br />
Create new view: no<br />
Use canonical expansion: no</pre><br />
| valign="top" | <pre>i.UniProt_Ac_TOP</pre><br />
| 1 node showing P61073<br />
| Pass<br />
|-<br />
| valign="top" | <pre>i.canonical_rog_TOP</pre><br />
| 1 node showing 107322<br />
| Pass<br />
|}<br />
<br />
== Currently untested areas ==<br />
<br />
* Preferences<br />
* iRefScape menu<br />
* Right-click menu<br />
* Node attributes<br />
* Edge attributes<br />
* Wizard<br />
* Installation<br />
* Help system<br />
* Windows and sessions<br />
* Loading from file<br />
<br />
==To be corrected==<br />
*Remove the non-proprietary flag/check for current data<br />
*Handle the neighbourhood completion when expanding network (do not use all the nodes)<br />
*show_inxc dynamic index behaviour change not working<br />
*Scaling GUI at low resolution, maximise button may get hidden<br />
*Path finding cancelling time<br />
*Focus progress when path finding<br />
*Ending with collapsed node error<br />
*Unselect all nodes before edge filtering<br />
*Repeated errors when importing networks not providing the chosen attribute for import<br />
<br />
==List of GeneIDs to test the new canonical expansion==<br />
<br />
Available in data version 8.4:<br />
<br />
*945577 http://www.ncbi.nlm.nih.gov/gene/?term=945577<br />
*947704 http://www.ncbi.nlm.nih.gov/gene/?term=947704<br />
*2765365 http://www.ncbi.nlm.nih.gov/gene/?term=2765365<br />
*944797 http://www.ncbi.nlm.nih.gov/gene/?term=944797<br />
*29924 http://www.ncbi.nlm.nih.gov/gene/?term=29924 <br />
*948517 http://www.ncbi.nlm.nih.gov/gene/?term=948517<br />
*3673 http://www.ncbi.nlm.nih.gov/gene/?term=3673 <br />
*946848 http://www.ncbi.nlm.nih.gov/gene/?term=946848<br />
*653361 http://www.ncbi.nlm.nih.gov/gene/?term=653361<br />
*5657 http://www.ncbi.nlm.nih.gov/gene/?term=5657<br />
<br />
== All iRefIndex Pages ==<br />
<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Bioscape&diff=4003
Bioscape
2011-11-24T15:14:57Z
<p>PaulBoddie: Updated current status.</p>
<hr />
<div>Currently, this service is unavailable. It is anticipated that it will be deployed again in 2012.<br />
<br />
The service can be used with Firefox, Internet Explorer, Google Chrome, and probably any modern Web browser, although some browsers may exhibit rendering issues. We aim for maximum compatibility while adhering to official Web standards.<br />
<br />
See the [[Bioscape Manual]], which is currently in preparation.<br />
<br />
[[Category:Bioscape]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Magrathea&diff=4002
Magrathea
2011-11-24T15:09:36Z
<p>PaulBoddie: /* Example movies */ Removed the heavy header and footer.</p>
<hr />
<div>[[Image:Magrathea_logo.png|right|Magrathea logo]]<br />
<br />
==What is Magrathea?==<br />
Magrathea is a prototype application that demonstrates the use of Coordinated Agent Modeling (CAM).<br />
Coordinated Agent Modeling, also referred to as Context Aware Modelling, explores the idea that animation and simulation of molecular pathways belong to the same continuum. The prototype also explores an abstract concept called a '''coordinator''' that captures context in molecular pathways. You can read more about Magrathea and CAM in the Magrathea Manual.<br />
<br />
<br />
Click [http://www.biotek.uio.no/research_groups/donaldson_group.html here] to go back to the Donaldson page on the BiO web site.<br />
<br />
<br />
<br />
==Downloading the Magrathea package==<br />
<br />
Magrathea can be downloaded via anonymous FTP from:<br />
<br />
ftp://ftp.no.embnet.org/magrathea<br />
<br />
<br />
user name: anonymous<br />
<br />
password: ftp or your email address<br />
<br />
The package includes a tutorial, a manual and everything needed to create your own animations (Magrathea itself and a Windows version of breve). Versions of breve for other operating systems can be downloaded from http://spiderland.org.<br />
<br />
The current version of Magrathea (1.68) is compatible with breve 2.72.<br />
<br />
<br />
<br />
<!--This is comment<br />
<br />
[[The Magrathea Manual: Coordinated Agent Modelling Explained]]<br />
<br />
[[The Magrathea Manual: Coordinated Agent Modelling By Example]]<br />
<br />
[[The Magrathea Manual: Building Coordinated Agent Models]]<br />
<br />
[[Magrathea Installation and Use]]<br />
<br />
[[Example model files for Magrathea]]<br />
<br />
--><br />
<br />
<br />
==Example movies==<br />
Here are some sample movies from the Magrathea project. <br />
<br />
Click on the pictures to view the movies. Your web browser needs a plugin capable of playing MPEG files (such as QuickTime). Click the back button in your browser to return here when you have finished viewing a movie.<br />
<br />
{|class="wikitable" style="text-align:left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:rules.1.png|200x200px<br />
default [[Special:Player/rules.1.mpeg]]<br />
</imagemap> <br />
|| This is a simple demonstration of the use of rules by coordinators.<br />
This example shows a substrate (molecule 1) that undergoes a state-change (phosphorylation) upon interaction with a kinase (molecule 2). The state change is visualized by a colour change.<br />
<br />
|-<br />
|<imagemap><br />
Image:rules.2.png|200x200px<br />
default [[Special:Player/rules.2.mpeg]]<br />
</imagemap> <br />
|| This demonstration shows a somewhat more complex use of rules to demonstrate the process of lateral signalling.<br />
In this example, a ligand molecule binds to a membrane-bound receptor. On binding, the receptor changes state (becomes activated) as indicated by a colour change. The receptor in this new state also gains the ability to bind to and activate other membrane-bound receptors. The overall effect is that a single ligand-receptor binding event initiates a lateral cascade of receptor activation events.<br />
|-<br />
|<imagemap><br />
Image:rules.3.png|200x200px<br />
default [[Special:Player/Rules.3.mpeg]]<br />
</imagemap> <br />
|| This demonstration shows a set of rules that control translocation into and out of a spherical compartment. A number of ligand molecules (blue) bind to a surface receptor (black) on the orange compartment’s surface. The coordinator handling this ligand receptor interaction encodes a rule that translocates the ligand into the orange compartment and changes the state of the ligand such that it is capable of binding to a second receptor (grey) that is also a surface component of the orange compartment. This state change is associated with a colour change for the ligand molecule (blue to yellow). A second coordinator handles the interaction between the ligand and the second receptor and encodes a rule that translocates the ligand out of the orange compartment. Once the ligand is outside the compartment, its state (yellow) prevents it from assembling with the importing receptor (black) for a second time.<br />
|}<br />
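<br />
The rule-based behaviour shown in these movies can be pictured with a small toy model. The Python sketch below is purely illustrative and is not the Magrathea or breve API: the `Molecule` and `Coordinator` classes, their fields and the state names are hypothetical. Each coordinator handles one kind of pairwise interaction and fires its rule only when the participants are in the right context, mirroring the translocation movie: the black receptor imports free (blue) ligands and turns them yellow, the grey receptor exports them, and an exported ligand can no longer be imported.<br />
<br />
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Molecule:
    """Hypothetical molecule: a kind, a state, a display colour and a location."""
    kind: str
    state: str
    colour: str
    compartment: str = "outside"


@dataclass
class Coordinator:
    """Hypothetical coordinator for one kind of pairwise interaction.

    Its rule fires only when the partner has the expected kind and the
    target is in the required state; it then sets a new state and colour
    and, optionally, moves the target to another compartment.
    """
    partner_kind: str
    target_kind: str
    required_state: str
    new_state: str
    new_colour: str
    move_to: Optional[str] = None

    def handle(self, partner: Molecule, target: Molecule) -> bool:
        """Apply the rule to an encountering pair; return True if it fired."""
        if partner.kind != self.partner_kind or target.kind != self.target_kind:
            return False
        if target.state != self.required_state:
            return False  # context prevents the interaction, e.g. an exported ligand
        target.state = self.new_state
        target.colour = self.new_colour
        if self.move_to is not None:
            target.compartment = self.move_to
        return True


# Rules loosely modelled on the translocation movie described above.
importer = Coordinator("black receptor", "ligand", "free", "imported", "yellow",
                       move_to="orange compartment")
exporter = Coordinator("grey receptor", "ligand", "imported", "exported", "yellow",
                       move_to="outside")
```

<br />
Because each rule checks the target's state before firing, context (here, whether the ligand has already passed through the compartment) decides which interactions can happen, rather than a fixed script.<br />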
<br />
<br />
http://groups.google.com/intl/en/images/logos/groups_logo_sm.gif<br />
<br />
<b>magrathea-models</b> [http://groups.google.com/group/magrathea-models Visit this group]<br />
<br />
<b>Subscribe to magrathea-models</b> [http://groups.google.com/group/magrathea-models/subscribe Subscribe]<br />
<br />
<!--<br />
==About the name==<br />
Magrathea is a reference to a [http://en.wikipedia.org/wiki/Places_in_The_Hitchhiker%27s_Guide_to_the_Galaxy#Magrathea fictional planet in Douglas Adams' ''The Hitchhiker's Guide to the Galaxy'']. Magrathea's economy was based on the manufacture of other planets. Among the clients who commissioned planets was a race of hyper-intelligent pan-dimensional beings who asked the Magratheans to create the Earth which, in addition to being a planet, was a super-computer designed to calculate the ultimate question to the ultimate answer to life, the universe, and everything. Among the people who worked on it was Slartibartfast, a coastal designer who won an award for his work on Norway.<br />
--><br />
<br />
<!--<br />
Here is another way of doing things, but the player does not give a preview of the movie and the movie itself is so small that only a<br />
keyhole view of it is observable.<br />
<player>rules.1.mpeg</player><br />
<player>rules.2.mpeg</player><br />
<player>Rules.3.mpeg</player><br />
--></div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=Donaldson_Group&diff=4001
Donaldson Group
2011-11-24T15:08:49Z
<p>PaulBoddie: Fixed the table to work better with version 1.1 of the theme.</p>
<hr />
<div>__NOTOC__<br />
<br />
= The Donaldson Group at the Biotechnology Centre of Oslo =<br />
<br />
<div class="floatright"><br />
<imagemap><br />
Image:BiO-logo-liten-pms-border.png<br />
default [http://www.biotek.uio.no]<br />
</imagemap><br />
<br />
<facebook-like /><br />
</div><br />
<br />
== Research Interests ==<br />
<br />
Our primary interests include protein interaction data consolidation, text mining and data mining, especially with respect to diseases.<br />
<br />
Our recent work on a consolidated protein interaction database can be found at http://irefindex.uio.no/ .<br />
<br />
== Projects ==<br />
<br />
{|class="wikitable" style="text-align:left; clear:left" border="0" cellpadding="10"<br />
<br />
|-<br />
|<imagemap><br />
Image:iRefIndex_logo.png|100x100px<br />
default [[iRefIndex]]<br />
</imagemap><br />
|<br />
=== [[iRefIndex | iRefIndex, iRefWeb, iRefScape, iRefR]] ===<br />
<br />
[[iRefIndex|http://irefindex.uio.no/]]<br/> iRefIndex (interaction Reference Index) provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.<br />
<br />
iRefIndex is available via a number of interfaces: as MITAB tab-delimited text (iRefIndex), a web site (iRefWeb), a Cytoscape plugin (iRefScape) and an R package (iRefR).<br />
<br />
|-<br />
|<imagemap><br />
Image:Magrathea_logo.png|100x100px<br />
default [[Magrathea]]<br />
</imagemap><br />
|<br />
=== [[Magrathea]] ===<br />
<br />
[[Magrathea|http://magrathea.uio.no/]]<br/> Magrathea is prototype software demonstrating how animations of molecular pathways can be driven automatically using the local context of the participating molecules.<br />
<br />
|-<br />
|<imagemap><br />
Image:ancientlibraryalex.jpg|100x100px<br />
default [[The Biolibrarian Proposal]]<br />
</imagemap><br />
|<br />
=== [[The Biolibrarian Proposal]] ===<br />
<br />
The Biolibrarian proposal calls for the creation of new positions at university libraries around the world.<br />
These biolibrarians would act as local biocurators, helping university researchers to submit data to relevant biological databases.<br />
<br />
|-<br />
|<imagemap><br />
Image:Vitruvian_man.jpg|100x150px<br />
default [[DiG:_Disease_groups]]<br />
</imagemap><br />
|<br />
=== [[DiG: Disease groups|DiG: Disease Groups]] ===<br />
<br />
[[DiG:_Disease_groups|http://donaldson.uio.no/wiki/DiG:_Disease_groups]]<br/> The Disease Groups project groups together phenotypically related disease-gene associations found in OMIM's Morbid Map. The resulting map of disease genes may be used to explore relationships between disease genes in the human protein-interactome.<br />
<br />
|-<br />
|<imagemap><br />
Image:Bioscape_logo.gif|140x140px<br />
default [[Bioscape]]<br />
</imagemap><br />
|<br />
=== [[Bioscape]] ===<br />
<br />
http://bioscape.uio.no/<br/> Bioscape is our in-house text-mining system used to locate gene and protein mentions in PubMed abstracts.<br />
|}<br />
<br />
== Group Members ==<br />
<br />
* [http://www.biotek.uio.no/english/about/organization/donaldson-group/people/iand/ Ian Donaldson]<br />
* [http://www.biotek.uio.no/english/about/organization/donaldson-group/people/paulbodd/ Paul Boddie]<br />
* [[Antonio Mora]]<br />
<br />
== Past Group Members ==<br />
* Katerina Michalickova<br />
* Hanna Nemchenko<br />
* Sabry Razick<br />
<br />
==Local Seminar Series==<br />
<br />
The Biotechnology Centre of Oslo holds a weekly [[Bioseminar|Tuesday seminar]] at Forskningsparken, Gaustadalléen 21, Oslo.<br />
<br />
The [http://www.ifi.uio.no/research/clsi/seminars.html Computational Life Science seminars] are held every Wednesday at Ole-Johan Dahls hus, located at Gaustadalléen 23D, Oslo (opposite the Forskningsparken main entrance).<br />
<br />
==Courses==<br />
<br />
{|class="wikitable" style="text-align:left" border="0" cellpadding="10"<br />
|-<br />
|<imagemap><br />
Image:Bioinfo_course_logo.jpg|100x100px<br />
default [[Bioinformatics course]]<br />
</imagemap> ||<br />
=== [[Bioinformatics_course|Bioinformatics for molecular biology]] ===<br />
<br />
A new, intensive two-week course covering various aspects of bioinformatics analysis for molecular biology, including statistics, multiple hypothesis testing, microarray analysis, sequence alignments, working with protein structures, protein interaction networks and more. The course is composed of lectures and practical tutorials. See the [[Bioinformatics course|course page]] for schedule information along with all material used in the course.<br />
|}<br />
<br />
Introductory Perl is taught by Antonio Mora and Ian Donaldson as part of the [http://www.uio.no/studier/emner/matnat/molbio/MBV3070/ MBV3070] course. The slides for these lectures are available at [[MBV3070|Perl lectures for MBV3070]].<br />
<!--Antonio Mora and Ian Donaldson also hold the "Applied readings in mathematics, computer science and biology" course every second Autumn term. See [http://www.uio.no/studier/emner/matnat/molbio/MBV-INF4410/ MBV-INF4410].<br />
--><br />
<br />
Ian Donaldson is organizing this year's Molecular Biotechnology Course at the Biotechnology Centre of Oslo. You can find the MBV9100 course web page [https://www.biotek.uio.no/events/courses_workshops/2011/MBV9100BTS.html here] and the latest schedule [[MBV9100|here]].<br />
<br />
==Contact==<br />
<br />
ian.donaldson at biotek.uio.no</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex&diff=4000
iRefIndex
2011-11-24T15:04:22Z
<p>PaulBoddie: Changed the positioning to work with version 1.1 of the theme.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div style="float:right"><br />
<facebook-like /><br />
</div><br />
<br />
<div class="floatleft"><br />
<imagemap><br />
Image:iRefIndex_logo.png|120x120px<br />
default [[iRefIndex#A_reference_index_for_protein_interaction_data]]<br />
</imagemap><br />
</div><br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. <br />
<br />
The iRefScape plugin for Cytoscape has been published recently [http://www.biomedcentral.com/1471-2105/12/388 here].<br />
<br />
[[iRefIndex#A_reference_index_for_protein_interaction_data|Read more]]<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:Document-save-80x80.png<br />
default [[README_MITAB2.6_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[README_MITAB2.6_for_iRefIndex|Download]] ===<br />
<br />
Download the current iRefIndex release in PSI-MITAB tab-delimited format via FTP.<br />
|<imagemap><br />
Image:Accessories-text-editor-80x80.png<br />
default [[iRefIndex_Release_Notes]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Release Notes]] ===<br />
<br />
View release notes and news for each release of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px<br />
default [[iRefIndex_Citations]]<br />
</imagemap><br />
| style="vertical-align: top" colspan="3" |<br />
=== [[iRefIndex_Citations | Publications, citing, citations and further reading]] ===<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:CytoscapeLogo.png|80x80px<br />
default [[iRefScape]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefScape|iRefScape]] ===<br />
<br />
iRefScape is a plugin for [http://www.cytoscape.org/ Cytoscape] that exposes iRefIndex data as a navigable graphical network.<br />
|<imagemap><br />
Image:firefox.png|80x80px<br />
default [http://wodaklab.org/iRefWeb/]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [http://wodaklab.org/iRefWeb/ iRefWeb] ===<br />
<br />
iRefWeb provides a searchable web interface to the iRefIndex. This interface was developed as part of a collaboration with the Wodak group at the Hospital for Sick Children in Toronto, Canada.<br />
|-<br />
|<imagemap><br />
Image:R-logo.jpg|80x80px<br />
default [[iRefR]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefR]] ===<br />
<br />
An R package providing access to iRefIndex data.<br />
|<imagemap><br />
Image:Applications-internet-80x80.png<br />
default [[README_PSICQUIC_web_services_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
<br />
=== [[PSICQUIC|Web services]] ===<br />
<br />
iRefIndex PSICQUIC and PSISCORE web services are now running on release 9.0 of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:Youtube-256.png|80x80px<br />
default [[iRefIndex_Videos]]<br />
</imagemap> <br />
| style="vertical-align: top" |<br />
<br />
=== [[iRefIndex Videos|iRefIndex videos]] ===<br />
<br />
Video learning materials for iRefIndex, iRefScape and iRefWeb.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [[iRefIndex#Contact and mailing list]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex#Contact and mailing list|Contact information and mailing list]] ===<br />
<br />
How to get in touch with the developers.<br />
|-<br />
|<imagemap><br />
Image:Emblem-notice-80x80.png<br />
default [[Sources_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Sources_iRefIndex|Source data information]] ===<br />
<br />
Details of all the different source databases that provide the foundation for iRefIndex.<br />
|<imagemap><br />
Image:X-office-spreadsheet-80x80.png<br />
default [[Statistics_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Statistics_iRefIndex|Statistical information]] ===<br />
<br />
Statistics for the current iRefIndex release.<br />
|}<br />
<br />
== Technical information on the iRefIndex database ==<br />
<br />
Build process: [[iRefIndex Manual]]<br />
<br />
Feedback files: [[README iRefIndex Feedback]]<br />
<br />
Mapping files: [[Protein identifier mapping]]<br />
<br />
Normalization of MI cv terms: [[Mapping of terms to MI term ids - iRefIndex]]<br />
<br />
Canonicalization: [[Canonicalization]]<br />
<br />
Disease Groups: [[DiG: Disease groups]]<br />
<br />
All iRefIndex pages and archived releases: [[iRefIndex#All_iRefIndex_Pages|see below]]<br />
<br />
License and disclaimer: [[iRefIndex#License_and_disclaimer|see below]]<br />
<br />
----<br />
<br />
== A reference index for protein interaction data ==<br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases, including [http://bond.unleashedinformatics.com/ BIND], [http://www.thebiogrid.org/ BioGRID], [http://mips.gsf.de/genre/proj/corum/index.html CORUM], [http://dip.doe-mbi.ucla.edu/ DIP], [http://www.hprd.org/ HPRD], [http://www.ebi.ac.uk/intact/site/index.jsf IntAct], [http://mint.bio.uniroma2.it/mint/Welcome.do MINT], [http://mips.gsf.de/genre/proj/mpact MPact], [http://mips.gsf.de/proj/ppi/ MPPI] and [http://ophid.utoronto.ca/ OPHID]. This index covers multiple interaction types, including physical and genetic interactions (mapped to their corresponding protein products), as determined by a multitude of methods. This index allows the user to search for a protein and retrieve a non-redundant list of interactors for that protein.<br />
<br />
iRefIndex assigns a globally unique identifier (rigid), which looks like 'tjWXXjgPyHyT2J6EwED8zK2x18U', to identify interactions that are identical according to the sequences and taxon ids of their interactors. iRefIndex also assigns similar-looking keys (rogids) to protein interactors. These keys are global, meaning that anyone can generate them using the method described in the paper. This allows users to integrate their own data with the iRefIndex in a way that ensures proteins with exactly the same sequence are represented only once.<br />
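<br />
The key scheme can be sketched in Python as follows. This is a minimal sketch assuming the approach described in the iRefIndex paper: a protein key (rogid) is the SHA-1 digest of the upper-cased amino acid sequence, Base64-encoded with the trailing '=' padding removed, followed by the NCBI taxon id; an interaction key (rigid) is the same kind of digest computed over the interactors' rogids, sorted and concatenated. The function names and example sequences are hypothetical, and details such as the handling of repeated interactors should be checked against the paper before comparing the output with real iRefIndex identifiers.<br />
<br />
```python
import base64
import hashlib
from typing import Iterable


def digest_key(text: str) -> str:
    """SHA-1 digest of the text, Base64-encoded without '=' padding (27 chars)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def rogid(sequence: str, taxon_id: int) -> str:
    """Protein key: digest of the normalised sequence, then the taxon id."""
    normalised = "".join(sequence.split()).upper()  # drop whitespace, upper-case
    return digest_key(normalised) + str(taxon_id)


def rigid(rogids: Iterable[str]) -> str:
    """Interaction key: digest of the sorted, concatenated interactor rogids."""
    return digest_key("".join(sorted(rogids)))
```

<br />
Because the rogids are sorted before hashing, the rigid does not depend on the order in which the interactors are listed, and a 20-byte SHA-1 digest always yields a 27-character key like the example above.<br />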
<br />
== Publications and further reading ==<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex are provided on the [[iRefIndex Citations]] page.<br />
<br />
----<br />
<br />
== Long term goals of the iRefIndex project ==<br />
<br />
We believe that protein interaction data hold incredible potential for biomedical research. Presently, these data are collected and archived by multiple groups around the world and the number of groups taking part in this work is growing rather than diminishing. <br />
<br />
As such, it is important that these databases have the means to effectively exchange and compare data and that they are curating and representing data using similar standards in order to make their data accessible and allow effective use.<br />
<br />
To this end, the iRefIndex project has three long term objectives:<br />
<br />
;1) to facilitate exchange of interaction data between interaction databases. <br />
<br />
:The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.<br />
<br />
;2) to consolidate interaction data from multiple sources. <br />
<br />
:The method has been used to index interaction records from multiple sources. The resulting iRefIndex may be used to search for the existence of interaction data for any protein, regardless of the original resource. Nine interaction databases have been incorporated so far; others will follow.<br />
<br />
;3) to provide feedback to source interaction databases. <br />
<br />
:During the process of data consolidation, iRefIndex uses a sophisticated method to keep track of potential problems with source records such as outdated or unfound protein identifiers or incorrectly assigned taxonomy identifiers. These data are provided as feedback files to source interaction databases for correction, clarification or improvements to our own system. This process will help to harmonize data representation and improve the overall quality of interaction records for all source databases. This process will also help source databases to exchange data with one another.<br />
<br />
== iRefIndex availability ==<br />
<br />
iRefIndex is made available in a number of formats: MITAB tab-delimited text files, iRefWeb interface, iRefScape plugin for Cytoscape, PSICQUIC Web services, and an interface for the R programming language environment. See the links at the top of this page. For the license and disclaimer, [[iRefIndex#License_and_disclaimer | see below]].<br />
<br />
== Credits and collaborations ==<br />
<br />
'''Sabry Razick and Ian Donaldson''' developed iRefIndex at the Biotechnology Centre of Oslo, University of Oslo. <br />
<br />
'''Paul Boddie''' provides ongoing maintenance and development.<br />
<br />
'''George Magklaras''' provides systems engineering support and [http://www.no.embnet.org/ EMBNet Norway] provided hardware support.<br />
<br />
'''Antonio Mora''' developed [[iRefR|iRefR]].<br />
<br />
'''Katerina Michalickova''' developed [[DiG:_Disease_groups|Disease groups]].<br />
<br />
'''Brian Turner and Andrei Turinsky''' from the [http://wodaklab.org/ws/ Wodak group] at the Hospital for Sick Children in Toronto, Canada developed the [http://wodaklab.org/iRefWeb/ iRefWeb interface].<br />
<br />
<imagemap><br />
Image:IMEx_logo_webmedium.jpg|100x100px<br />
default [http://www.psimex.org]<br />
</imagemap> <br />
<br />
iRefIndex is a PSIMex partner: http://www.psimex.org<br />
<br />
<!--<br />
=== iRefWeb in the NCBI LinkOut programme ===<br />
<br />
Many [http://www.ncbi.nlm.nih.gov/gene Entrez Gene] records provided by NCBI contain links to iRefWeb in the [http://www.ncbi.nlm.nih.gov/projects/linkout/index.html LinkOut] section, allowing users to consult iRefWeb for related protein interactions when browsing gene information. The software which exposes iRefIndex information to the LinkOut programme can be found on the [[iRefWeb LinkOut Generator]] page.<br />
<br />
--><br />
----<br />
<br />
== License and disclaimer==<br />
<br />
Data released on the public FTP site is released under the [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution 2.5 Generic (CC BY 2.5) license].<br />
<br />
<imagemap><br />
Image:By-100x35.png<br />
default [http://creativecommons.org/licenses/by/2.5/]<br />
</imagemap><br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
----<br />
<br />
== Contact and mailing list ==<br />
<br />
Suggestions, requests and comments are welcome.<br />
<br />
<pre>ian.donaldson@biotek.uio.no</pre><br />
<br />
Full contact details are available at the [http://www.biotek.uio.no/english/research/groups/donaldson-group/ group home page].<br />
<br />
<imagemap><br />
Image:google-groups-logo.gif<br />
default [http://groups.google.com/group/irefindex]<br />
</imagemap><br />
<br />
See the [http://groups.google.com/group/irefindex iRefIndex Google Group] for announcements and discussion.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=PSICQUIC_Web_Services_for_iRefIndex&diff=3999
PSICQUIC Web Services for iRefIndex
2011-11-24T14:41:04Z
<p>PaulBoddie: Updated the MITAB/download link.</p>
<hr />
<div>These services provide programmatic access to iRefIndex data as well as many other interaction databases and are further described in a Nature Methods article [http://www.nature.com/nmeth/journal/v8/n7/full/nmeth.1637.html available here].<br />
<br />
These services are based on a template from the IntAct group at EBI that was developed and implemented by a consortium of authors and interaction database groups. A related interface, called PSISCORE, provides programmatic access to interaction data scoring methods.<br />
<br />
"PSICQUIC" is pronounced like the English word "psychic" as in the ability to perceive information hidden from the normal senses through extrasensory perception.<br />
<br />
A number of use cases are envisioned for these services. For example, once implemented by multiple interaction databases, they would allow a user to make a query for information at one database web-site and then have that query replicated at all other interaction databases (that have implemented the service interface). The user could then receive a compiled list of results that match their query regardless of the database that contains the information.<br />
<br />
Note that only the first 15 columns of the iRefIndex MITAB files are returned from this service. The full data set is available via FTP download as described on the [[README_MITAB2.6_for_iRefIndex|iRefIndex MITAB page]] and via [http://wodaklab.org/iRefWeb/ iRefWeb].<br />
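<br />
Each result line returned by the service can be split into those 15 tab-separated columns. The Python sketch below assumes the standard MITAB 2.5 conventions ('|' between multiple values in a column, '-' for an empty column); the short field names are labels chosen for this sketch (uidA and uidB follow the usage further down this page), so consult the [[README_MITAB2.6_for_iRefIndex]] page for the authoritative column descriptions. The '|' split is naive and ignores any quoting inside values.<br />
<br />
```python
from typing import Dict, List

# Labels (chosen for this sketch) for the 15 standard MITAB 2.5 columns.
MITAB25_FIELDS: List[str] = [
    "uidA", "uidB",                  # identifiers of interactors A and B
    "altA", "altB",                  # alternative identifiers
    "aliasA", "aliasB",              # aliases
    "method",                        # interaction detection method(s)
    "author",                        # first author(s) of the publication
    "pmids",                         # publication identifier(s)
    "taxa", "taxb",                  # taxon ids of interactors A and B
    "interactionType",
    "sourcedb",                      # source database(s)
    "interactionIdentifier",
    "confidence",
]


def parse_mitab_line(line: str) -> Dict[str, List[str]]:
    """Split one tab-delimited MITAB line into its first 15 columns.

    '-' becomes an empty list and other values are split on '|'.
    Any columns beyond the fifteenth are ignored.
    """
    columns = line.rstrip("\r\n").split("\t")
    if len(columns) < len(MITAB25_FIELDS):
        raise ValueError(f"expected at least 15 columns, got {len(columns)}")
    record: Dict[str, List[str]] = {}
    for name, value in zip(MITAB25_FIELDS, columns):
        record[name] = [] if value == "-" else value.split("|")
    return record
```

<br />
Ignoring extra columns means the same parser can be applied to full MITAB 2.6 lines from the FTP files, keeping only the part that the web service returns.<br />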
<br />
WSDL locations:<br />
*http://irefindex.uio.no/psicquic-ws/webservices/psicquic?wsdl<br />
*http://irefindex.uio.no/psiscore-ws/webservices/psiscore?wsdl<br />
<br />
Version information:<br />
*http://irefindex.uio.no/psicquic-ws/webservices/psicquic/getVersion<br />
<br />
Available services:<br />
*http://irefindex.uio.no/psicquic-ws/webservices/<br />
*http://irefindex.uio.no/psiscore-ws/webservices/<br />
<br />
All active services:<br />
*http://www.ebi.ac.uk/Tools/webservices/psicquic/registry/registry?action=STATUS<br />
<br />
REST service samples:<br />
*http://irefindex.uio.no/psicquic-ws/webservices/current/search/query/species:4932<br />
*http://irefindex.uio.no/psicquic-ws/webservices/current/search/query/species:9606<br />
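<br />
The REST samples above follow a simple pattern: the query (here a species filter on an NCBI taxon id, 4932 for yeast and 9606 for human) is appended to the service's current/search/query/ path. The Python sketch below assumes that this pattern also holds for other queries and that results come back as plain MITAB text with one line per result; the function names are hypothetical, and the irefindex.uio.no host may no longer be serving requests. Passing the base URL of another PSICQUIC service is how the same query could be replayed against several databases, as described above.<br />
<br />
```python
from typing import List
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the REST samples above; the service may no longer be online.
IREFINDEX_REST = "http://irefindex.uio.no/psicquic-ws/webservices/current/search/query/"


def search_url(query: str, base: str = IREFINDEX_REST) -> str:
    """Build a REST search URL, percent-encoding the query but keeping ':'."""
    return base + quote(query, safe=":")


def fetch_mitab(query: str, base: str = IREFINDEX_REST,
                timeout: float = 30.0) -> List[str]:
    """Fetch the results of a query as a list of non-empty MITAB lines."""
    with urlopen(search_url(query, base), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line for line in text.splitlines() if line.strip()]
```

<br />
For example, `search_url("species:9606")` reproduces the second sample URL above, while `fetch_mitab("species:4932")` would download the yeast results if the service is available.<br />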
<br />
More details about the project:<br />
*http://code.google.com/p/psicquic/<br />
<br />
IntAct PSICQUIC View:<br />
*http://www.ebi.ac.uk/Tools/webservices/psicquic/view/main.xhtml<br />
<br />
The format and content of the interaction data returned by these services is described on the [[README_MITAB2.6_for_iRefIndex]] page.<br />
<br />
===Changes from last version===<br />
<br />
There are major changes between iRefIndex PSICQUIC service versions 1.1.5 and 1.1.6. In 1.1.5, the results returned were interaction-centric (one line in the results represented one binary interaction or one complex membership). From 1.1.6 onwards, the results are experiment-centric (one line per evidence per source). The new extended PSI-MITAB format (version 2.6) is used in the implementation, but only the first 15 columns are currently available in order to maintain compatibility with other services. For further details on the returned results, please refer to the [[README_MITAB2.6_for_iRefIndex]] page.<br />
<br />
The second major change is the use of UniProt and RefSeq identifiers as uidA and uidB wherever possible.<br />
<br />
Users are advised that there are differences between our implementation of these services and that of other databases.<br />
Most notably, interactions between three or more proteins (n-ary data) are represented in the PSI-MITAB files using a bipartite model: one of the first two columns represents the complex and the other represents a member of the complex. Other PSICQUIC services will likely use a spoke model representation.<br />
<br />
Please refer to our documentation above and documentation provided by other PSICQUIC providers if you plan to combine results from multiple services.<br />
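<br />
The difference between the two representations can be illustrated for a single complex with members A, B and C. In this hypothetical Python sketch, the bipartite model yields one row per member, each pairing a complex identifier with that member, while a spoke model pairs a chosen bait with every other member. Converting between them requires choosing a bait, which a bipartite row does not record; the function names and the complex identifier are illustrative only.<br />
<br />
```python
from typing import List, Tuple

Pair = Tuple[str, str]


def bipartite_rows(complex_id: str, members: List[str]) -> List[Pair]:
    """Bipartite model: one row per member, pairing the complex with it."""
    return [(complex_id, member) for member in members]


def spoke_rows(bait: str, members: List[str]) -> List[Pair]:
    """Spoke model: one row per non-bait member, pairing the bait with it."""
    return [(bait, member) for member in members if member != bait]


def bipartite_to_spoke(rows: List[Pair], bait: str) -> List[Pair]:
    """Re-express the bipartite rows of a single complex in spoke form."""
    members = [member for _, member in rows]
    if bait not in members:
        raise ValueError("the bait must be one of the complex members")
    return spoke_rows(bait, members)
```

<br />
The same three-member complex gives three rows in the bipartite model but two in a spoke model, so row counts from different services are not directly comparable.<br />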
<br />
== Main iRefIndex Page ==<br />
<br />
http://irefindex.uio.no/<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex&diff=3998
iRefIndex
2011-11-24T14:35:02Z
<p>PaulBoddie: /* Web services */ Updated the Web services.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div class="floatright"><br />
<facebook-like /><br />
</div><br />
<br />
<div class="floatleft"><br />
<imagemap><br />
Image:iRefIndex_logo.png|120x120px<br />
default [[iRefIndex#A_reference_index_for_protein_interaction_data]]<br />
</imagemap><br />
</div><br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. <br />
<br />
The iRefScape plugin for Cytoscape has been published recently [http://www.biomedcentral.com/1471-2105/12/388 here].<br />
<br />
[[iRefIndex#A_reference_index_for_protein_interaction_data|Read more]]<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:Document-save-80x80.png<br />
default [[README_MITAB2.6_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[README_MITAB2.6_for_iRefIndex|Download]] ===<br />
<br />
Download the current iRefIndex release in PSI-MITAB tab-delimited format via FTP.<br />
|<imagemap><br />
Image:Accessories-text-editor-80x80.png<br />
default [[iRefIndex_Release_Notes]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Release Notes]] ===<br />
<br />
View release notes and news for each release of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px<br />
default [[iRefIndex_Citations]]<br />
</imagemap><br />
| style="vertical-align: top" colspan="3" |<br />
=== [[iRefIndex_Citations | Publications, citing, citations and further reading]] ===<br />
<br />
iRefIndex related publications, references for source databases and works citing and using the iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:CytoscapeLogo.png|80x80px<br />
default [[iRefScape]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefScape|iRefScape]] ===<br />
<br />
iRefScape is a plugin for [http://www.cytoscape.org/ Cytoscape] that exposes iRefIndex data as a navigable graphical network.<br />
|<imagemap><br />
Image:firefox.png|80x80px<br />
default [http://wodaklab.org/iRefWeb/]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [http://wodaklab.org/iRefWeb/ iRefWeb] ===<br />
<br />
iRefWeb provides a searchable web interface to the iRefIndex. This interface was developed as part of a collaboration with the Wodak group at the Hospital for Sick Children in Toronto, Canada.<br />
|-<br />
|<imagemap><br />
Image:R-logo.jpg|80x80px<br />
default [[iRefR]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefR]] ===<br />
<br />
An R package providing access to iRefIndex data.<br />
|<imagemap><br />
Image:Applications-internet-80x80.png<br />
default [[README_PSICQUIC_web_services_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
<br />
=== [[PSICQUIC|Web services]] ===<br />
<br />
iRefIndex PSICQUIC and PSISCORE web services are now running on release 9.0 of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:Youtube-256.png|80x80px<br />
default [[iRefIndex_Videos]]<br />
</imagemap> <br />
| style="vertical-align: top" |<br />
<br />
=== [[iRefIndex Videos|iRefIndex videos]] ===<br />
<br />
Video learning materials for iRefIndex, iRefScape and iRefWeb.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [[iRefIndex#Contact and mailing list]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex#Contact and mailing list|Contact information and mailing list]] ===<br />
<br />
How to get in touch with the developers.<br />
|-<br />
|<imagemap><br />
Image:Emblem-notice-80x80.png<br />
default [[Sources_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Sources_iRefIndex|Source data information]] ===<br />
<br />
Details of all the different source databases that provide the foundation for iRefIndex.<br />
|<imagemap><br />
Image:X-office-spreadsheet-80x80.png<br />
default [[Statistics_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Statistics_iRefIndex|Statistical information]] ===<br />
<br />
Statistics for the current iRefIndex release.<br />
|}<br />
<br />
== Technical information on the iRefIndex database ==<br />
<br />
Build process: [[iRefIndex Manual]]<br />
<br />
Feedback files: [[README iRefIndex Feedback]]<br />
<br />
Mapping files: [[Protein identifier mapping]]<br />
<br />
Normalization of MI CV terms: [[Mapping of terms to MI term ids - iRefIndex]]<br />
<br />
Canonicalization: [[Canonicalization]]<br />
<br />
Disease Groups: [[DiG: Disease groups]]<br />
<br />
All iRefIndex pages and archived releases: [[iRefIndex#All_iRefIndex_Pages|see below]]<br />
<br />
License and disclaimer: [[iRefIndex#License_and_disclaimer|see below]]<br />
<br />
----<br />
<br />
== A reference index for protein interaction data ==<br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases, including [http://bond.unleashedinformatics.com/ BIND], [http://www.thebiogrid.org/ BioGRID], [http://mips.gsf.de/genre/proj/corum/index.html CORUM], [http://dip.doe-mbi.ucla.edu/ DIP], [http://www.hprd.org/ HPRD], [http://www.ebi.ac.uk/intact/site/index.jsf IntAct], [http://mint.bio.uniroma2.it/mint/Welcome.do MINT], [http://mips.gsf.de/genre/proj/mpact MPact], [http://mips.gsf.de/proj/ppi/ MPPI] and [http://ophid.utoronto.ca/ OPHID]. The index covers several interaction types, including physical and genetic interactions (the latter mapped to their corresponding protein products), as determined by many different methods. Users can search for a protein and retrieve a non-redundant list of its interactors.<br />
<br />
iRefIndex assigns a globally unique identifier, the rigid, which looks like 'tjWXXjgPyHyT2J6EwED8zK2x18U', to identify interactions that are identical according to the sequences and taxon ids of their interactors. iRefIndex also assigns similar-looking keys to protein interactors. These keys are global, meaning that anyone can generate them using the method described in the iRefIndex paper. Users can therefore integrate their own data with iRefIndex in a way that ensures proteins with exactly the same sequence are represented only once.<br />
<br />
== Publications and further reading ==<br />
<br />
iRefIndex-related publications, references for source databases, and works citing or using iRefIndex are listed on the [[iRefIndex Citations]] page.<br />
<br />
----<br />
<br />
== Long term goals of the iRefIndex project ==<br />
<br />
We believe that protein interaction data hold great potential for biomedical research. These data are currently collected and archived by multiple groups around the world, and the number of groups taking part in this work is growing rather than shrinking.<br />
<br />
It is therefore important that these databases have the means to exchange and compare data effectively, and that they curate and represent data using similar standards so that the data are accessible and can be used effectively.<br />
<br />
To this end, the iRefIndex project has three long-term objectives:<br />
<br />
;1) to facilitate exchange of interaction data between interaction databases. <br />
<br />
:The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.<br />
<br />
;2) to consolidate interaction data from multiple sources. <br />
<br />
:The method has been used to index interaction records from multiple sources. The resulting iRefIndex can be used to check whether interaction data exist for any protein, regardless of the original resource. Nine interaction databases have been incorporated so far, and others will follow.<br />
<br />
;3) to provide feedback to source interaction databases. <br />
<br />
:During data consolidation, iRefIndex uses a sophisticated method to keep track of potential problems with source records, such as outdated protein identifiers, protein identifiers that cannot be found, or incorrectly assigned taxonomy identifiers. These problems are reported to the source interaction databases as feedback files, either for correction or clarification by the sources or to guide improvements to our own system. This process will help to harmonize data representation, improve the overall quality of interaction records in all source databases, and make it easier for source databases to exchange data with one another.<br />
<br />
== iRefIndex availability ==<br />
<br />
iRefIndex is available in several forms: PSI-MITAB tab-delimited text files, the iRefWeb interface, the iRefScape plugin for Cytoscape, PSICQUIC web services, and an interface for the R programming environment. See the links at the top of this page. For the license and disclaimer, [[iRefIndex#License_and_disclaimer | see below]].<br />
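<br />
For the tab-delimited files, a minimal Python reader might look like the sketch below. It assumes that each file begins with a single header line naming the columns (possibly prefixed with '#'), that fields are separated by tabs with no quoting, and that every following non-empty line is one interaction record. The function name and the sample data are made up, and the sample is far shorter than a real PSI-MITAB row; the README for the download lists the actual columns.<br />
<br />
```python
import csv
import io
from typing import Dict, Iterator, TextIO


def read_mitab(stream: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per record, keyed by the column names in the header line."""
    # Fields may contain quote characters, so quote handling is disabled.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    header[0] = header[0].lstrip("#")  # tolerate a '#'-prefixed header
    for row in reader:
        if row:  # skip blank lines
            yield dict(zip(header, row))


if __name__ == "__main__":
    # Made-up, truncated sample with only four columns.
    sample = (
        "#uidA\tuidB\ttaxa\ttaxb\n"
        "uniprotkb:EXAMPLE_A\tuniprotkb:EXAMPLE_B\ttaxid:9606\ttaxid:9606\n"
    )
    for record in read_mitab(io.StringIO(sample)):
        print(record["uidA"], record["uidB"])
```
<br />
A real file would be opened with open(path, newline="", encoding="utf-8") and passed to read_mitab. Using a generator keeps memory use flat for large releases; very long fields may require raising csv.field_size_limit.<br />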
<br />
== Credits and collaborations ==<br />
<br />
'''Sabry Razick and Ian Donaldson''' developed iRefIndex at the Biotechnology Centre of Oslo, University of Oslo. <br />
<br />
'''Paul Boddie''' provides ongoing maintenance and development.<br />
<br />
'''George Magklaras''' provides systems engineering support, and [http://www.no.embnet.org/ EMBNet Norway] provided hardware support.<br />
<br />
'''Antonio Mora''' developed [[iRefR|iRefR]].<br />
<br />
'''Katerina Michalickova''' developed [[DiG:_Disease_groups|Disease groups]].<br />
<br />
'''Brian Turner and Andrei Turinsky''' from the [http://wodaklab.org/ws/ Wodak group] at the Hospital for Sick Children in Toronto, Canada, developed the [http://wodaklab.org/iRefWeb/ iRefWeb interface].<br />
<br />
<imagemap><br />
Image:IMEx_logo_webmedium.jpg|100x100px<br />
default [http://www.psimex.org]<br />
</imagemap> <br />
<br />
iRefIndex is a PSIMex partner: http://www.psimex.org<br />
<br />
<!--<br />
=== iRefWeb in the NCBI LinkOut programme ===<br />
<br />
Many [http://www.ncbi.nlm.nih.gov/gene Entrez Gene] records provided by NCBI contain links to iRefWeb in the [http://www.ncbi.nlm.nih.gov/projects/linkout/index.html LinkOut] section, allowing users to consult iRefWeb for related protein interactions when browsing gene information. The software which exposes iRefIndex information to the LinkOut programme can be found on the [[iRefWeb LinkOut Generator]] page.<br />
<br />
--><br />
----<br />
<br />
== License and disclaimer ==<br />
<br />
Data on the public FTP site is released under the [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution 2.5 Generic (CC BY 2.5) license].<br />
<br />
<imagemap><br />
Image:By-100x35.png<br />
default [http://creativecommons.org/licenses/by/2.5/]<br />
</imagemap><br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
----<br />
<br />
== Contact and mailing list ==<br />
<br />
Suggestions, requests and comments are welcome.<br />
<br />
<pre>ian.donaldson@biotek.uio.no</pre><br />
<br />
Full contact details are available at the [http://www.biotek.uio.no/english/research/groups/donaldson-group/ group home page].<br />
<br />
<imagemap><br />
Image:google-groups-logo.gif<br />
default [http://groups.google.com/group/irefindex]<br />
</imagemap><br />
<br />
See the [http://groups.google.com/group/irefindex iRefIndex Google Group] for announcements and discussion.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie
http://irefindex.vib.be/wiki/index.php?title=iRefIndex&diff=3981
iRefIndex
2011-11-24T00:15:21Z
<p>PaulBoddie: /* Contact and mailing list */ Updated link to BiO site.</p>
<hr />
<div>__NOTOC__<br />
<br />
<div class="floatright"><br />
<facebook-like /><br />
</div><br />
<br />
<div class="floatleft"><br />
<imagemap><br />
Image:iRefIndex_logo.png|120x120px<br />
default [[iRefIndex#A_reference_index_for_protein_interaction_data]]<br />
</imagemap><br />
</div><br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases including BIND, BioGRID, CORUM, DIP, HPRD, IntAct, MINT, MPact, MPPI and OPHID. <br />
<br />
A paper describing the iRefScape plugin for Cytoscape has recently been [http://www.biomedcentral.com/1471-2105/12/388 published in BMC Bioinformatics].<br />
<br />
[[iRefIndex#A_reference_index_for_protein_interaction_data|Read more]]<br />
<br />
{|class="wikitable" style="text-align:left; clear: left" border="0" cellpadding="10"<br />
|<imagemap><br />
Image:Document-save-80x80.png<br />
default [[README_MITAB2.6_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[README_MITAB2.6_for_iRefIndex|Download]] ===<br />
<br />
Download the current iRefIndex release in PSI-MITAB tab-delimited format via FTP.<br />
|<imagemap><br />
Image:Accessories-text-editor-80x80.png<br />
default [[iRefIndex_Release_Notes]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Release Notes]] ===<br />
<br />
View release notes and news for each release of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:pubmed-logo.png|80x80px<br />
default [[iRefIndex_Citations]]<br />
</imagemap><br />
| style="vertical-align: top" colspan="3" |<br />
=== [[iRefIndex_Citations|Publications, citations and further reading]] ===<br />
<br />
iRefIndex-related publications, references for source databases, and works citing or using iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:CytoscapeLogo.png|80x80px<br />
default [[iRefScape]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefScape|iRefScape]] ===<br />
<br />
iRefScape is a plugin for [http://www.cytoscape.org/ Cytoscape] that exposes iRefIndex data as a navigable graphical network.<br />
|<imagemap><br />
Image:firefox.png|80x80px<br />
default [http://wodaklab.org/iRefWeb/]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [http://wodaklab.org/iRefWeb/ iRefWeb] ===<br />
<br />
iRefWeb provides a searchable web interface to iRefIndex. It was developed in collaboration with the Wodak group at the Hospital for Sick Children in Toronto, Canada.<br />
|-<br />
|<imagemap><br />
Image:R-logo.jpg|80x80px<br />
default [[iRefR]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefR]] ===<br />
<br />
An R package providing access to iRefIndex data.<br />
|<imagemap><br />
Image:Applications-internet-80x80.png<br />
default [[README_PSICQUIC_web_services_for_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[PSICQUIC|Web services]] ===<br />
<br />
iRefIndex PSICQUIC web services are now running on release 8.0 of iRefIndex.<br />
|-<br />
|<imagemap><br />
Image:Youtube-256.png|80x80px<br />
default [[iRefIndex_Videos]]<br />
</imagemap> <br />
| style="vertical-align: top" |<br />
=== [[iRefIndex Videos|iRefIndex videos]] ===<br />
<br />
Video learning materials for iRefIndex, iRefScape and iRefWeb.<br />
|<imagemap><br />
Image:Internet-mail-80x80.png<br />
default [[iRefIndex#Contact and mailing list]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[iRefIndex#Contact and mailing list|Contact information and mailing list]] ===<br />
<br />
How to get in touch with the developers.<br />
|-<br />
|<imagemap><br />
Image:Emblem-notice-80x80.png<br />
default [[Sources_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Sources_iRefIndex|Source data information]] ===<br />
<br />
Details of all the different source databases that provide the foundation for iRefIndex.<br />
|<imagemap><br />
Image:X-office-spreadsheet-80x80.png<br />
default [[Statistics_iRefIndex]]<br />
</imagemap><br />
| style="vertical-align: top" |<br />
=== [[Statistics_iRefIndex|Statistical information]] ===<br />
<br />
Statistics for the current iRefIndex release.<br />
|}<br />
<br />
== Technical information on the iRefIndex database ==<br />
<br />
Build process: [[iRefIndex Manual]]<br />
<br />
Feedback files: [[README iRefIndex Feedback]]<br />
<br />
Mapping files: [[Protein identifier mapping]]<br />
<br />
Normalization of MI CV terms: [[Mapping of terms to MI term ids - iRefIndex]]<br />
<br />
Canonicalization: [[Canonicalization]]<br />
<br />
Disease Groups: [[DiG: Disease groups]]<br />
<br />
All iRefIndex pages and archived releases: [[iRefIndex#All_iRefIndex_Pages|see below]]<br />
<br />
License and disclaimer: [[iRefIndex#License_and_disclaimer|see below]]<br />
<br />
----<br />
<br />
== A reference index for protein interaction data ==<br />
<br />
iRefIndex provides an index of protein interactions available in a number of primary interaction databases, including [http://bond.unleashedinformatics.com/ BIND], [http://www.thebiogrid.org/ BioGRID], [http://mips.gsf.de/genre/proj/corum/index.html CORUM], [http://dip.doe-mbi.ucla.edu/ DIP], [http://www.hprd.org/ HPRD], [http://www.ebi.ac.uk/intact/site/index.jsf IntAct], [http://mint.bio.uniroma2.it/mint/Welcome.do MINT], [http://mips.gsf.de/genre/proj/mpact MPact], [http://mips.gsf.de/proj/ppi/ MPPI] and [http://ophid.utoronto.ca/ OPHID]. The index covers several interaction types, including physical and genetic interactions (the latter mapped to their corresponding protein products), as determined by many different methods. Users can search for a protein and retrieve a non-redundant list of its interactors.<br />
<br />
iRefIndex assigns a globally unique identifier, the rigid, which looks like 'tjWXXjgPyHyT2J6EwED8zK2x18U', to identify interactions that are identical according to the sequences and taxon ids of their interactors. iRefIndex also assigns similar-looking keys to protein interactors. These keys are global, meaning that anyone can generate them using the method described in the iRefIndex paper. Users can therefore integrate their own data with iRefIndex in a way that ensures proteins with exactly the same sequence are represented only once.<br />
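<br />
Because records describing the same interaction share a rigid, a non-redundant view can be obtained by grouping records on that key. The Python sketch below assumes that records have already been reduced to (rigid, source record) pairs; the function name, the pair layout and the identifiers in the example are hypothetical and are not iRefIndex column names.<br />
<br />
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_rigid(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each rigid to the source records that describe that interaction."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rigid, source_record in pairs:
        groups[rigid].append(source_record)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical keys and record identifiers, for illustration only.
    pairs = [
        ("rigid-1", "BIND:record-10"),
        ("rigid-1", "IntAct:record-20"),
        ("rigid-2", "MINT:record-30"),
    ]
    interactions = group_by_rigid(pairs)
    print(len(interactions))        # 2 distinct interactions
    print(interactions["rigid-1"])  # ['BIND:record-10', 'IntAct:record-20']
```
<br />
Each key of the resulting dictionary stands for one distinct interaction, while the lists keep track of every source record that supports it.<br />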
<br />
== Publications and further reading ==<br />
<br />
iRefIndex-related publications, references for source databases, and works citing or using iRefIndex are listed on the [[iRefIndex Citations]] page.<br />
<br />
----<br />
<br />
== Long term goals of the iRefIndex project ==<br />
<br />
We believe that protein interaction data hold great potential for biomedical research. These data are currently collected and archived by multiple groups around the world, and the number of groups taking part in this work is growing rather than shrinking.<br />
<br />
It is therefore important that these databases have the means to exchange and compare data effectively, and that they curate and represent data using similar standards so that the data are accessible and can be used effectively.<br />
<br />
To this end, the iRefIndex project has three long-term objectives:<br />
<br />
;1) to facilitate exchange of interaction data between interaction databases. <br />
<br />
:The iRefIndex paper describes a method for assigning unique and global identifiers to protein interactors, interactions and complexes. This method is independent of the iRefIndex resource and may be used by anyone to facilitate exchange and consolidation of data.<br />
<br />
;2) to consolidate interaction data from multiple sources. <br />
<br />
:The method has been used to index interaction records from multiple sources. The resulting iRefIndex can be used to check whether interaction data exist for any protein, regardless of the original resource. Nine interaction databases have been incorporated so far, and others will follow.<br />
<br />
;3) to provide feedback to source interaction databases. <br />
<br />
:During data consolidation, iRefIndex uses a sophisticated method to keep track of potential problems with source records, such as outdated protein identifiers, protein identifiers that cannot be found, or incorrectly assigned taxonomy identifiers. These problems are reported to the source interaction databases as feedback files, either for correction or clarification by the sources or to guide improvements to our own system. This process will help to harmonize data representation, improve the overall quality of interaction records in all source databases, and make it easier for source databases to exchange data with one another.<br />
<br />
== iRefIndex availability ==<br />
<br />
iRefIndex is available in several forms: PSI-MITAB tab-delimited text files, the iRefWeb interface, the iRefScape plugin for Cytoscape, PSICQUIC web services, and an interface for the R programming environment. See the links at the top of this page. For the license and disclaimer, [[iRefIndex#License_and_disclaimer | see below]].<br />
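<br />
The PSICQUIC web services are queried over HTTP. The Python sketch below only builds a query URL, assuming the REST conventions of the PSICQUIC specification (a /search/query/ path followed by the query, with firstResult, maxResults and format parameters). The base URL is a placeholder, not the actual iRefIndex service address, and the function name is hypothetical.<br />
<br />
```python
from urllib.parse import quote, urlencode


def psicquic_query_url(base_url: str, query: str, first_result: int = 0,
                       max_results: int = 100, fmt: str = "tab25") -> str:
    """Build a PSICQUIC REST search URL (assumed pattern: <base>/search/query/<query>)."""
    # Percent-encode the whole query, including ':' and '/', as one path segment.
    path = base_url.rstrip("/") + "/search/query/" + quote(query, safe="")
    params = urlencode({"firstResult": first_result,
                        "maxResults": max_results,
                        "format": fmt})
    return path + "?" + params


if __name__ == "__main__":
    # Placeholder base URL; substitute the real iRefIndex PSICQUIC endpoint.
    base = "http://example.org/psicquic/webservices/current"
    print(psicquic_query_url(base, "brca2", max_results=10))
```
<br />
The printed URL could then be fetched with urllib.request.urlopen. Paging through firstResult and maxResults keeps each response small when a query matches many interactions.<br />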
<br />
== Credits and collaborations ==<br />
<br />
'''Sabry Razick and Ian Donaldson''' developed iRefIndex at the Biotechnology Centre of Oslo, University of Oslo. <br />
<br />
'''Paul Boddie''' provides ongoing maintenance and development.<br />
<br />
'''George Magklaras''' provides systems engineering support, and [http://www.no.embnet.org/ EMBNet Norway] provided hardware support.<br />
<br />
'''Antonio Mora''' developed [[iRefR|iRefR]].<br />
<br />
'''Katerina Michalickova''' developed [[DiG:_Disease_groups|Disease groups]].<br />
<br />
'''Brian Turner and Andrei Turinsky''' from the [http://wodaklab.org/ws/ Wodak group] at the Hospital for Sick Children in Toronto, Canada, developed the [http://wodaklab.org/iRefWeb/ iRefWeb interface].<br />
<br />
<imagemap><br />
Image:IMEx_logo_webmedium.jpg|100x100px<br />
default [http://www.psimex.org]<br />
</imagemap> <br />
<br />
iRefIndex is a PSIMex partner: http://www.psimex.org<br />
<br />
<!--<br />
=== iRefWeb in the NCBI LinkOut programme ===<br />
<br />
Many [http://www.ncbi.nlm.nih.gov/gene Entrez Gene] records provided by NCBI contain links to iRefWeb in the [http://www.ncbi.nlm.nih.gov/projects/linkout/index.html LinkOut] section, allowing users to consult iRefWeb for related protein interactions when browsing gene information. The software which exposes iRefIndex information to the LinkOut programme can be found on the [[iRefWeb LinkOut Generator]] page.<br />
<br />
--><br />
----<br />
<br />
== License and disclaimer ==<br />
<br />
Data on the public FTP site is released under the [http://creativecommons.org/licenses/by/2.5/ Creative Commons Attribution 2.5 Generic (CC BY 2.5) license].<br />
<br />
<imagemap><br />
Image:By-100x35.png<br />
default [http://creativecommons.org/licenses/by/2.5/]<br />
</imagemap><br />
<br />
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.<br />
<br />
----<br />
<br />
== Contact and mailing list ==<br />
<br />
Suggestions, requests and comments are welcome.<br />
<br />
<pre>ian.donaldson@biotek.uio.no</pre><br />
<br />
Full contact details are available at the [http://www.biotek.uio.no/english/research/groups/donaldson-group/ group home page].<br />
<br />
<imagemap><br />
Image:google-groups-logo.gif<br />
default [http://groups.google.com/group/irefindex]<br />
</imagemap><br />
<br />
See the [http://groups.google.com/group/irefindex iRefIndex Google Group] for announcements and discussion.<br />
<br />
== All iRefIndex Pages ==<br />
<br />
Follow this link for a listing of all iRefIndex related pages (archived and current).<br />
[[Category:iRefIndex]]</div>
PaulBoddie