Difference between revisions of "README iRefIndex expanded MITAB proposal"
PaulBoddie (talk | contribs) (Added deprecation note.) |
|||
(127 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | + | {{Note| | |
− | This is | + | This is an expansion of the MITAB format that was proposed for use in iRefIndex 7.0 and subsequent releases; it does not correspond to any released product and is considered '''obsolete'''. |
+ | |||
+ | * See [[README MITAB2.6 for iRefIndex 7.0]] for the revised MITAB format eventually adopted for iRefIndex 7.0 and for future releases. | ||
+ | * See http://irefindex.uio.no for links to the latest release and relevant README documentation. | ||
+ | |||
This proposal is based on the experimental form of the iRefIndex MITAB format found at | This proposal is based on the experimental form of the iRefIndex MITAB format found at | ||
http://irefindex.uio.no/wiki/README_iRefIndex_experiment_MITAB_6.0 | http://irefindex.uio.no/wiki/README_iRefIndex_experiment_MITAB_6.0 | ||
− | Look for xxx for things that need to be changed to create a version specific form of | + | |
+ | Look for xxx for things that need to be changed to create a version specific form of this README. | ||
Look for CHANGE for items that differ significantly from the current MITAB format. | Look for CHANGE for items that differ significantly from the current MITAB format. | ||
− | This format is based on recent changes agreed | + | |
− | See http://code.google.com/p/psimi/issues/detail?id=2 | + | This format is based on recent changes agreed upon by the PSI-MI working group in Turku, Finland. |
− | + | }} | |
+ | |||
+ | See also: | ||
+ | |||
+ | * http://code.google.com/p/psimi/issues/detail?id=2 | ||
+ | * http://code.google.com/p/psimi/wiki/PsimiTabFormat | ||
Last edited: xxx | Last edited: xxx | ||
Line 132: | Line 142: | ||
Since this PSI-MITAB format allows for only two interactors to be described on each line, it is best suited for describing binary interaction data (the original experiment, say yeast two hybrid, gives a binary readout). However, other PSI-MI XML source records will describe interactions involving only one interactor type (dimers or multimers) or they will contain associative (a.k.a. "n-ary") interaction data (say from immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown). These cases are problematic for the PSI-MITAB format. This README describes exactly how we use the MITAB format to describe these alternate (non-binary) interaction types. | | Since this PSI-MITAB format allows for only two interactors to be described on each line, it is best suited for describing binary interaction data (the original experiment, say yeast two hybrid, gives a binary readout). However, other PSI-MI XML source records will describe interactions involving only one interactor type (dimers or multimers) or they will contain associative (a.k.a. "n-ary") interaction data (say from immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown). These cases are problematic for the PSI-MITAB format. This README describes exactly how we use the MITAB format to describe these alternate (non-binary) interaction types. | |
− | CHANGE | + | === CHANGE === |
− | + | <pre> | |
− | Each row in the MITAB file represents a **single** interaction | + | Each row in the MITAB file represents a **single** interaction record from one primary data source. |
+ | </pre> | ||
+ | |||
Previously, each line represented a **collection** of interaction records where each member of this collection describes an interaction involving the exact same set of proteins (as defined by their primary sequence and taxon ids). We have moved to representing a single interaction on each line since this allows us to convey additional information about each of the original source records. Users can still "collapse" or find all lines that describe an interaction between the same set of proteins by using the "RIG" (column xxx). Rows with identical rigids (redundant interaction group identifiers) all describe interactions between the same set of proteins. | Previously, each line represented a **collection** of interaction records where each member of this collection describes an interaction involving the exact same set of proteins (as defined by their primary sequence and taxon ids). We have moved to representing a single interaction on each line since this allows us to convey additional information about each of the original source records. Users can still "collapse" or find all lines that describe an interaction between the same set of proteins by using the "RIG" (column xxx). Rows with identical rigids (redundant interaction group identifiers) all describe interactions between the same set of proteins. | ||
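The "collapse" step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `group_rows_by_rigid` is hypothetical, the file is assumed to be tab-delimited with an optional header line starting with `#`, and the RIG column number is passed in by the caller because this README still lists it as "xxx". The sample rows are made up for the demonstration.

```python
import csv
import io
from collections import defaultdict


def group_rows_by_rigid(mitab_text, rigid_column):
    """Group MITAB rows by the value in the RIG identifier column.

    rigid_column is a 1-based column number supplied by the caller;
    no default is assumed because the README leaves it as "xxx".
    """
    groups = defaultdict(list)
    # QUOTE_NONE: MITAB fields may contain literal double quotes.
    reader = csv.reader(io.StringIO(mitab_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and an assumed header line
        groups[row[rigid_column - 1]].append(row)
    return dict(groups)


# Hypothetical two-column sample: record identifier, then RIG identifier.
sample = (
    "intact:EBI-1\trigid:AAA\n"
    "biogrid:42\trigid:AAA\n"
    "mint:MINT-7\trigid:BBB\n"
)
groups = group_rows_by_rigid(sample, rigid_column=2)
```

Rows that share a key in `groups` describe interactions between the same set of proteins, while each row still carries the details of its own source record.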
− | |||
The natural keys for each interaction record in this group (i.e. the record identifiers from the source database) are listed under interactionIdentifier (column xxx). For example: | The natural keys for each interaction record in this group (i.e. the record identifiers from the source database) are listed under interactionIdentifier (column xxx). For example: | ||
Line 142: | Line 153: | ||
<pre>intact:EBI-761694</pre> | <pre>intact:EBI-761694</pre> | ||
− | CHANGE | + | === CHANGE === |
− | + | ||
− | Our surrogate (primary) key for | + | <pre> |
− | + | Our surrogate (primary) key for a group of redundant interaction records (RIG) is no longer listed in column xxx. | |
+ | Only the source database record is listed in this column. The source db name and record id (separated by a colon) | ||
+ | are given in this column. The RIG identifier is now listed (by itself) in column xxx. | ||
+ | </pre> | ||
The RIG identifier is a 27 character key that is derived from the ROGIDs of the interactors involved in the interaction record (see columns xxx and xxx). The RIG identifier is listed (by itself) in column xxx for convenience. The ROGID is a SHA-1 digest of the protein interactor's primary amino acid sequence concatenated with the NCBI taxon id (see the paper for details). | The RIG identifier is a 27 character key that is derived from the ROGIDs of the interactors involved in the interaction record (see columns xxx and xxx). The RIG identifier is listed (by itself) in column xxx for convenience. The ROGID is a SHA-1 digest of the protein interactor's primary amino acid sequence concatenated with the NCBI taxon id (see the paper for details). | ||
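As a rough illustration of how such a key could be built, the sketch below hashes a sequence with SHA-1 and appends the taxon id. Several details are assumptions, not statements from this README: the digest is assumed to be base64-encoded with the trailing padding removed (which happens to give 27 characters), the sequence is assumed to be upper-cased first, and the taxon id is assumed to be appended after hashing. The function name `rogid_sketch` is hypothetical; the paper remains the authoritative description.

```python
import base64
import hashlib


def rogid_sketch(sequence, taxon_id):
    """Return an illustrative ROGID-like key for a protein sequence.

    Assumptions (not confirmed by this README): upper-case the sequence,
    base64-encode the 20-byte SHA-1 digest, strip the "=" padding,
    then append the NCBI taxon id as decimal text.
    """
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    key = base64.b64encode(digest).decode("ascii").rstrip("=")
    return key + str(taxon_id)
```

A 20-byte digest always encodes to 28 base64 characters including one padding character, so the unpadded key is 27 characters long regardless of the input sequence.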
Line 223: | Line 237: | ||
− | === Column number: 1 | + | |
+ | |||
+ | |||
+ | === Column number: 1 (uidA) === | ||
{| | {| | ||
|Column name: ||uidA | |Column name: ||uidA | ||
|- | |- | ||
− | |Column type: || | + | |Column type: ||Integer |
|- | |- | ||
− | |Description: ||Unique identifier for interactor A | + | |Description: ||Unique identifier for the canonical group to which interactor A belongs. |
|- | |- | ||
− | |Example: ||<pre>irefindex: | + | |Example: ||<pre>irefindex:2345</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | CHANGE | |
− | + | This column contains an internal, integer key for the canonical group to which interactor A belongs. | |
− | + | ||
− | | + | An alphanumeric equivalent of this key (that can be generated by anyone, i.e., universal) appears in column 41. | |
− | + | See the notes for column 41 for more details on how protein identifiers were mapped from the original database record to this key. | |
− | + | ||
− | + | Column 3 lists database names and accessions that belong to this group. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). Members of a canonical group may include splice isoform products from the same or related genes. One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column. | |
− | + | ||
+ | The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0. | ||
− | + | Internal note: Also referred to as iROG-A. | |
− | |||
− | |||
− | |||
− | |||
=== Column number: 2 (uidB)=== | === Column number: 2 (uidB)=== | ||
Line 267: | Line 272: | ||
|Column name: ||uidB | |Column name: ||uidB | ||
|- | |- | ||
− | |Column type: || | + | |Column type: ||Integer |
|- | |- | ||
− | |Description: ||Unique identifier for interactor B | + | |Description: ||Unique identifier for the canonical group to which interactor B belongs. Also referred to as iROG-B. |
|- | |- | ||
− | |Example: ||<pre>irefindex: | + | |Example: ||<pre>irefindex:456543</pre> |
|} | |} | ||
Line 277: | Line 282: | ||
See notes for column 1. | See notes for column 1. | ||
+ | |||
=== Column number: 3 (altA)=== | === Column number: 3 (altA)=== | ||
Line 291: | Line 297: | ||
'''Notes''' | '''Notes''' | ||
+ | |||
+ | Column 3 lists database names and accessions that belong to the same canonical group. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). Members of a canonical group may include splice isoform products from the same or related genes. One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in column 1. | ||
Each pipe-delimited entry is a database_name:accession pair delimited by | Each pipe-delimited entry is a database_name:accession pair delimited by | ||
Line 298: | Line 306: | ||
;uniprotkb | ;uniprotkb | ||
− | :The accessions this protein is known by in UniProt(http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line | + | :The accessions this protein is known by in UniProt(http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line. |
;refseq | ;refseq | ||
− | :If a protein accession exists in the RefSeq data base (http://www.ncbi.nlm.nih.gov/RefSeq/) that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession. | + | :If a protein accession exists in the RefSeq data base (http://www.ncbi.nlm.nih.gov/RefSeq/) that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession. |
;entrezgene/locuslink | ;entrezgene/locuslink | ||
:NCBI gene Identifiers for the gene encoding this protein. See ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq column GeneID given protein's accession.version | :NCBI gene Identifiers for the gene encoding this protein. See ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq column GeneID given protein's accession.version | ||
Line 314: | Line 322: | ||
<pre>irefindex:xBr9cTXgzPLNxsaKiYyHcoEm/DM</pre> | <pre>irefindex:xBr9cTXgzPLNxsaKiYyHcoEm/DM</pre> | ||
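A reader of this column might split it as sketched below. The helper name `parse_alt_identifiers` is hypothetical, and treating a lone dash as "no entries" is an assumption borrowed from how other columns in this README mark unused values. Each entry is split on the first colon only, since accessions may themselves contain colons.

```python
def parse_alt_identifiers(field):
    """Split a pipe-delimited altA/altB field into (database, accession) pairs."""
    if field in ("", "-"):
        return []  # assumed convention for an empty column
    pairs = []
    for entry in field.split("|"):
        # partition splits on the first colon only
        database, _, accession = entry.partition(":")
        pairs.append((database, accession))
    return pairs


example = "uniprotkb:P23367|irefindex:xBr9cTXgzPLNxsaKiYyHcoEm/DM"
pairs = parse_alt_identifiers(example)
```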
− | |||
− | |||
=== Column number: 4 (altB)=== | === Column number: 4 (altB)=== | ||
Line 384: | Line 390: | ||
|Column name: ||Method | |Column name: ||Method | ||
|- | |- | ||
− | |Column type: || | + | |Column type: ||string |
|- | |- | ||
|Description: ||Interaction detection method | |Description: ||Interaction detection method | ||
Line 398: | Line 404: | ||
Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers. | Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers. | ||
− | The interaction detection method is from the | + | The interaction detection method is from the original record. Path for PSI-MI 2.5: |
<pre>entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/</pre> | <pre>entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/</pre> | ||
Line 407: | Line 413: | ||
If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then <tt>MI:0000</tt> will appear before the shortLabels. | If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then <tt>MI:0000</tt> will appear before the shortLabels. | ||
− | |||
<tt>NA</tt> or <tt>-1</tt> may appear in place of a recognised shortLabel. | <tt>NA</tt> or <tt>-1</tt> may appear in place of a recognised shortLabel. | ||
Line 417: | Line 422: | ||
MI:0000(NA) | MI:0000(NA) | ||
</pre> | </pre> | ||
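The identifier-plus-short-label layout shown above can be taken apart with a small regular expression. This is a sketch only: the names `CV_TERM` and `parse_cv_term` are hypothetical, and the pattern assumes a four-digit MI identifier followed by a label in parentheses, as in the examples in this section.

```python
import re

# Assumed layout: MI:dddd(short label)
CV_TERM = re.compile(r"^(?P<identifier>MI:\d{4})\((?P<label>.*)\)$")


def parse_cv_term(field):
    """Return (identifier, short_label), or None if the field does not match."""
    match = CV_TERM.match(field)
    if match is None:
        return None
    return match.group("identifier"), match.group("label")
```

Returning `None` for anything that does not match lets the caller decide how to treat unexpected values instead of guessing.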
+ | |||
+ | xxx Sabry check above that there is only one method per line and that the normalized cv term id is used when appropriate and check what is used when no mapping can be made. | ||
=== Column number: 8 (author) === | === Column number: 8 (author) === | ||
Line 436: | Line 443: | ||
CHANGE | CHANGE | ||
− | This column will usually include only one author name reference. | + | This column will usually include only one author name reference. However, some experimental evidence has secondary references, which could be included here. |
xxx Sabry Check this | xxx Sabry Check this | ||
− | |||
− | |||
=== Column number: 9 (pmids) === | === Column number: 9 (pmids) === | ||
Line 464: | Line 469: | ||
CHANGE | CHANGE | ||
− | This column will usually include only one pubmed reference. | + | This column will usually include only one pubmed reference that describes where the experimental evidence is found. In some cases, secondary references will be included here. |
xxx Sabry Check this | xxx Sabry Check this | ||
Line 480: | Line 485: | ||
|Description: ||Taxonomy identifier for interactor A | |Description: ||Taxonomy identifier for interactor A | ||
|- | |- | ||
− | |Example: ||<pre>taxid:83333</pre> | + | |Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre> |
+ | |||
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be different than what is listed in the interaction record. See the methods section for more details. See the NCBI taxonomy database at http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy . According to MITAB2.5 format, this column should contain a pipe delimited set of databaseName:identifier pairs such as <tt>taxid:12345</tt>. The source database name has been listed as taxid since it is always NCBI's taxonomy database. The value in this column will be <tt>NA</tt> if the interactor is a complex. | + | The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be different than what is listed in the interaction record. See the methods section for more details. See the NCBI taxonomy database at http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy . According to MITAB2.5 format, this column should contain a pipe delimited set of databaseName:identifier pairs such as <tt>taxid:12345</tt>. The source database name has been listed as taxid since it is always NCBI's taxonomy database. The value in this column will be <tt>NA</tt> if the interactor is a complex. |
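A minimal sketch of reading this column follows. The names `TAXID_ENTRY` and `parse_taxid_field` are hypothetical. The pattern accepts both the bare form (`taxid:12345`) and the form with a name in parentheses, and it treats `NA` (and, as an assumption, a lone dash) as "no taxonomy identifier".

```python
import re

# Accepts taxid:12345 and taxid:83333(Escherichia coli K-12)
TAXID_ENTRY = re.compile(r"^taxid:(?P<taxid>-?\d+)(?:\((?P<name>.*)\))?$")


def parse_taxid_field(field):
    """Return a list of (taxid, name_or_None) tuples for a taxa/taxb field."""
    if field in ("NA", "-"):
        return []  # e.g. the interactor is a complex
    result = []
    for entry in field.split("|"):
        match = TAXID_ENTRY.match(entry)
        if match is None:
            raise ValueError("unrecognised taxid entry: %r" % entry)
        result.append((int(match.group("taxid")), match.group("name")))
    return result
```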
=== Column number: 11 (taxb) === | === Column number: 11 (taxb) === | ||
Line 496: | Line 502: | ||
|Description: ||Taxonomy identifier for interactor B | |Description: ||Taxonomy identifier for interactor B | ||
|- | |- | ||
− | |Example: ||<pre>taxid:83333</pre> | + | |Example: ||<pre>taxid:83333(Escherichia coli K-12)</pre> |
|} | |} | ||
Line 508: | Line 514: | ||
|Column name: ||interactionType | |Column name: ||interactionType | ||
|- | |- | ||
− | |Column type: || | + | |Column type: ||string |
|- | |- | ||
|Description: ||Interaction Type from controlled vocabulary or short label | |Description: ||Interaction Type from controlled vocabulary or short label | ||
Line 516: | Line 522: | ||
'''Notes''' | '''Notes''' | ||
+ | CHANGE | ||
+ | |||
+ | Only one interaction type will be present in each line of the file (previously, multiple types were listed). | ||
− | + | The interaction type is taken from the PSI-MI controlled vocabulary and represented as... | |
<pre>database:identifier(interaction type)</pre> | <pre>database:identifier(interaction type)</pre> | ||
Line 559: | Line 568: | ||
Only one source database will be listed in each row. | Only one source database will be listed in each row. | ||
− | === Column number: 14 ( | + | === Column number: 14 (interactionIdentifier) === |
{| | {| | ||
− | |Column name: || | + | |Column name: ||interactionIdentifier |
|- | |- | ||
|Column type: ||String | |Column type: ||String | ||
Line 585: | Line 594: | ||
http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI where possible | http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI where possible | ||
− | + | If an interaction record identifier is not provided by the source database, this entry will appear as | |
− | |||
− | |||
− | |||
− | + | database-name:- | |
− | xxx Sabry | + | xxx Sabry check this |
=== Column number: 15 (confidence) === | === Column number: 15 (confidence) === | ||
Line 643: | Line 649: | ||
COLUMNS PAST THIS POINT (16 - 36) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT | | COLUMNS PAST THIS POINT (16 - 36) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT | |
+ | |||
+ | |||
=== Column number: 16 (expansion) === | === Column number: 16 (expansion) === | ||
Line 657: | Line 665: | ||
'''Notes''' | '''Notes''' | ||
− | For iRefIndex, this column will always contain either bipartite or | + | For iRefIndex, this column will always contain either "bipartite" or "none". |
− | Other databases may use either "spoke" or "matrix" in this column. | + | Other databases may use either "spoke" or "matrix" or "none" in this column. |
See | See | ||
− | http://irefindex.uio.no/wiki/ | + | [http://irefindex.uio.no/wiki/README_iRefIndex_expanded_MITAB_proposal#Understanding_the_iRefIndex_MITAB_format Understanding_the_iRefIndex_MITAB_format] at the top of this file for an explanation. |
− | |||
− | |||
− | |||
− | |||
=== Column number: 17 (biological role A) === | === Column number: 17 (biological role A) === | ||
{| | {| | ||
− | |Column name: ||biological role A | + | |Column name: ||biological_role_A (change by Sabry from biological role A) |
|- | |- | ||
|Column type: ||String | |Column type: ||String | ||
Line 677: | Line 681: | ||
|Description: ||Biological role of interactor A | |Description: ||Biological role of interactor A | ||
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>MI:0501(enzyme)</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | + | When provided by the source database, this column contains a single entry such as MI:0501(enzyme), MI:0502(enzyme target), MI:0580(electron acceptor), or MI:0499(unspecified role). | |
− | MI:0499(unspecified role). | ||
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role. | See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role. | ||
+ | |||
+ | xxx Sabry check this format | ||
+ | <span style="color:#ff00C1"> <b>Sabry: For complexes and when no role specified this will be <pre>unspecified role</pre></b></span> | ||
=== Column number: 18 (biological role B) === | === Column number: 18 (biological role B) === | ||
{| | {| | ||
− | |Column name: ||biological role B | + | |Column name: || biological_role_B (changed by Sabry from biological role B) |
|- | |- | ||
|Column type: ||String | |Column type: ||String | ||
Line 694: | Line 700: | ||
|Description: ||Biological role of interactor B | |Description: ||Biological role of interactor B | ||
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>MI:0501(enzyme)</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
See notes for column 17. | See notes for column 17. | ||
− | |||
− | |||
− | |||
=== Column number: 19 (experimental role A) === | === Column number: 19 (experimental role A) === | ||
{| | {| | ||
− | |Column name: ||experimental role A | + | |Column name: ||experimental_role_A (changed by Sabry from experimental role A) |
|- | |- | ||
|Column type: ||String | |Column type: ||String | ||
Line 714: | Line 717: | ||
|- | |- | ||
|Example: ||<pre>MI:0498(prey)</pre> | |Example: ||<pre>MI:0498(prey)</pre> | ||
− | |||
− | |||
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | This column indicates the experimental role (if any) that was played by interactor A (column 1). | + | |
+ | This column indicates the experimental role (if any was provided by the source database) that was played by interactor A (column 1). | ||
+ | |||
+ | See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to see definitions of bait and prey, | ||
+ | as well as to browse other possible values of experimental role that may appear in this column for other databases. | ||
+ | |||
+ | xxx Sabry check format | ||
+ | |||
+ | <span style="color:#ff00C1"> <b>Sabry: For complexes and when no role specified this will be <pre>MI:0499(unspecified role)</pre></b></span> | ||
=== Column number: 20 (experimental role B) === | === Column number: 20 (experimental role B) === | ||
{| | {| | ||
− | |Column name: ||experimental role B | + | |Column name: ||experimental_role_B (changed by Sabry from experimental role B) |
|- | |- | ||
|Column type: ||String | |Column type: ||String | ||
Line 732: | Line 741: | ||
|- | |- | ||
|Example: ||<pre>MI:0498(prey)</pre> | |Example: ||<pre>MI:0498(prey)</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This column indicates the experimental role (if any) that was played by interactor B (column 2). | ||
+ | |||
+ | See notes above for column 19. | ||
+ | |||
+ | === Column number: 21 (interactor type A) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||interactor_type_A (change by sabry from interactor type A) | ||
+ | |- | ||
+ | |Column type: ||string | ||
+ | |- | ||
+ | |Description: ||describes the type of molecule that A is | ||
+ | |- | ||
+ | |Example: ||<pre>MI:0326(protein)</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | For iRefIndex, this will always be one of... | ||
+ | |||
+ | <pre> | ||
+ | MI:0326(protein) | ||
+ | MI:0315(protein complex) | ||
+ | </pre> | ||
+ | |||
+ | xxx Sabry check why - sometimes appear here | ||
+ | |||
+ | === Column number: 22 (interactor type B) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||interactor_type_B (change by sabry from interactor type B) | ||
+ | |- | ||
+ | |Column type: ||string | ||
+ | |- | ||
+ | |Description: ||describes the type of molecule that B is | ||
+ | |- | ||
+ | |Example: ||<pre>MI:0326(protein)</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | See column 21. | ||
+ | |||
+ | === Column number: 23 (xrefs A) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||xrefs_A (changed from xrefs A by Sabry) | ||
+ | |- | ||
+ | |Column type: ||a|b: pipe-delimited set of strings | ||
+ | |- | ||
+ | |Description: ||xrefs for molecule A | ||
+ | |- | ||
+ | |Example: || omim:152430(longevity)|go:"GO:0016233"(telomere capping) | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This is not used by iRefIndex. A dash ( - ) will always appear in this column. | ||
+ | |||
+ | This column may be used to list cross-references to annotation information for molecule A. | ||
+ | For example, Gene Ontology identifiers or OMIM identifiers. | ||
+ | |||
+ | === Column number: 24 (xrefs B) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||xrefs_B (changed from xrefs B by Sabry) | ||
+ | |- | ||
+ | |Column type: ||a|b: pipe-delimited set of strings | ||
+ | |- | ||
+ | |Description: ||xrefs for molecule B | ||
+ | |- | ||
+ | |Example: || - | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This is not used by iRefIndex. A dash ( - ) will always appear in this column. | ||
+ | |||
+ | See notes to column 23. | ||
+ | |||
+ | === Column number: 25 (xrefs Interaction) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||xrefs_Interaction (changed from xrefs Interaction by Sabry) | ||
+ | |- | ||
+ | |Column type: ||a|b: pipe-delimited set of strings | ||
+ | |- | ||
+ | |Description: ||xrefs for the interaction | ||
+ | |- | ||
+ | |Example: || go:"GO:0048786"(presynaptic active zone) | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This is not used by iRefIndex. A dash ( - ) will always appear in this column. | ||
+ | |||
+ | This column may be used to list cross-references to annotation information for the interaction. For example, Gene Ontology identifiers or OMIM identifiers. | ||
+ | |||
+ | === Column number: 26 (Annotations A) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Annotations_A (changed from Annotations A by Sabry) | ||
+ | |- | ||
+ | |Column type: || a|b: pipe-delimited set of strings | ||
+ | |- | ||
+ | |Description: ||Annotations for molecule A | ||
+ | |- | ||
+ | |Example: || This protein has an apparent MW of 25 kDa|This protein binds 7 zinc molecules | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This is not used by iRefIndex. A dash ( - ) will always appear in this column. | ||
+ | |||
+ | This column may be used to list free-text annotation information for molecule A. | ||
+ | |||
+ | Some databases may use dataset:* or data-processing:* (where * is non-controlled free-text) in this column. | ||
+ | |||
+ | === Column number: 27 (Annotations B) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Annotations_B (changed from Annotations B by Sabry) | ||
+ | |- | ||
+ | |Column type: ||string | ||
+ | |- | ||
+ | |Description: ||Annotations for molecule B | ||
+ | |- | ||
+ | |Example: || - | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This is not used by iRefIndex. A dash ( - ) will always appear in this column. | ||
+ | |||
+ | See notes to column 26. | ||
+ | |||
+ | === Column number: 28 (Annotations Interaction) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Annotations Interaction | ||
+ | |- | ||
+ | |Column type: || a|b: pipe-delimited set of strings | ||
+ | |- | ||
+ | |Description: ||Annotations for interaction | ||
+ | |- | ||
+ | |Example: || figure-legend:F1A|prediction score:432|comment:prediction based on phage display consensus|author-confidence:8|comment:AD-ORFeome library used in the experiment. |Interaction of the NON_CORE set.|number of hits:1 | ||
+ | |||
+ | pbs signal:-8.966|pbs category:B|figure-legend:NFA | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This is not used by iRefIndex. A dash ( - ) will always appear in this column. | ||
+ | |||
+ | This column may be used to list free-text annotation information for the interaction. | ||
+ | The keys used before the : (like "comment") are database specific and not controlled. | ||
+ | |||
+ | Some databases may use dataset:* or data-processing:* (where * is non-controlled free-text) in this column. | ||
+ | |||
+ | === Column number: 29 (Host organism taxid) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Host_organism_taxid (Changed from Host organism taxid by Sabry) | ||
+ | |- | ||
+ | |Column type: ||string | ||
+ | |- | ||
+ | |Description: ||Host organism taxid where the interaction was experimentally demonstrated | ||
+ | |- | ||
+ | |Example: || taxid:10090(Mus musculus) | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | |||
+ | |||
+ | Host organism taxid where the interaction was experimentally demonstrated. This may differ from the taxid of the interactors. Other possible entries are: | ||
+ | |||
+ | taxid:-1(in vitro) | ||
+ | |||
+ | taxid:-4(in vivo) | ||
+ | |||
+ | A dash ( - ) will be used when no information about the Host organism taxid is available. | ||
+ | |||
+ | taxid:32644(unidentified) will be used when the source specifies the Host organism taxid as 32644. | ||
+ | |||
+ | xxx Sabry check format | ||
+ | |||
+ | === Column number: 30 (parameters Interaction) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||parameters_Interaction (changed from parameters Interaction by Sabry) | ||
+ | |- | ||
+ | |Column type: ||string | ||
+ | |- | ||
+ | |Description: ||Parameters for the interaction | ||
+ | |- | ||
+ | |Example: || - | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This is not used by iRefIndex. A dash ( - ) will always appear in this column. | ||
+ | |||
+ | Internal note : use of this column is not well-defined or characterized. | ||
+ | |||
+ | === Column number: 31 (Creation date) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Creation_date (Changed from Creation date by sabry ) | ||
+ | |- | ||
+ | |Column type: || string (yyyy/mm/dd) | ||
+ | |- | ||
+ | |Description: ||When was the entry created? | ||
+ | |- | ||
+ | |Example: || 2010/05/06 | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This will be the release date of iRefIndex for all entries in this file. | ||
+ | |||
+ | This date will not match the date for the corresponding record in the source database. | ||
+ | |||
+ | === Column number: 32 (Update date) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Update_date (Changed from Update date by Sabry) | ||
+ | |- | ||
+ | |Column type: ||string (yyyy/mm/dd) | ||
+ | |- | ||
+ | |Description: ||When was this record last updated? | ||
+ | |- | ||
+ | |Example: || 2010/05/06 | ||
+ | |} | ||
+ | |||
+ | |||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This will be the release date of iRefIndex for all entries in this file. | ||
+ | |||
+ | This date will not match the date for the corresponding record in the source database. | ||
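The yyyy/mm/dd layout used by the creation and update date columns can be read with the standard library, as in the sketch below. The helper name `parse_mitab_date` is hypothetical, and returning `None` for a lone dash is an assumption based on how this README marks empty columns.

```python
from datetime import datetime


def parse_mitab_date(field):
    """Parse a yyyy/mm/dd date field; return None for a lone dash."""
    value = field.strip()
    if value == "-":
        return None  # assumed marker for a missing date
    return datetime.strptime(value, "%Y/%m/%d").date()
```

Because both columns carry the iRefIndex release date, every row in one file is expected to parse to the same date.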
+ | |||
+ | === Column number: 33 (Checksum A) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Checksum_A (Changed from Checksum A by Sabry) | ||
+ | |- | ||
+ | |Column type: ||String | ||
+ | |- | ||
+ | |Description: ||Calculation of the checksum is as described in PMID 18823568. | ||
+ | |- | ||
+ | |Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This only applies if interactor A is of type protein. | ||
+ | This value is a ROGID and is the same as column 39. See notes for column 39. | ||
+ | This ROGID is calculated by iRefIndex. The source database should generate an equivalent ROGID value for one of the interactors in the same source record unless the underlying sequence has been updated during the iRefIndex build process. | ||
+ | |||
+ | === Column number: 34 (Checksum B) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Checksum B | ||
+ | |- | ||
+ | |Column type: ||String | ||
+ | |- | ||
+ | |Description: ||Calculation of the checksum is as described in PMID 18823568. | ||
+ | |- | ||
+ | |Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This only applies if interactor B is of type protein. | ||
+ | This value is a ROGID and is the same as column 40. See notes for column 40. | ||
+ | This ROGID is calculated by iRefIndex. The source database should generate an equivalent ROGID value for one of the interactors in the same source record unless the underlying sequence has been updated during the iRefIndex build process. | ||
+ | |||
+ | |||
+ | === Column number: 35 (Checksum Interaction) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Checksum_Interaction (Changed from Checksum Interaction by Sabry) | ||
+ | |- | ||
+ | |Column type: ||String | ||
+ | |- | ||
+ | |Description: ||Calculation of the checksum is as described in PMID 18823568. | ||
+ | |- | ||
+ | |Example: ||<pre>rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | This only applies if all interactors in the interaction are of type protein. | ||
+ | This value is a RIGID and is the same as column 49. See notes for column 49. | ||
+ | This RIGID is calculated by iRefIndex. The source database should generate an equivalent RIGID value for the same source record unless one of the participating protein interactor sequences has been updated during the iRefIndex build process. | ||
+ | |||
+ | === Column number: 36 (Negative) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Negative | ||
+ | |- | ||
+ | |Column type: || Boolean (true or false) | ||
+ | |- | ||
+ | |Description: ||Does the interaction record provide evidence that some interaction does NOT occur? | ||
+ | |- | ||
+ | |Example: ||<pre>false</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | This value will be false for all lines in this file since iRefIndex does not include "negative" interactions from any of the source databases. | ||
+ | |||
+ | |||
+ | |||
+ | COLUMNS PAST THIS POINT (37 -) ARE NOT DEFINED BY THE PSI-MITAB2.6 STANDARD. | ||
+ | THESE COLUMNS ARE SPECIFIC TO THIS IREFINDEX RELEASE AND MAY CHANGE FROM ONE RELEASE TO ANOTHER | ||
+ | |||
+ | === Column number: 37 (OriginalReferenceA) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||OriginalReferenceA | ||
+ | |- | ||
+ | |Column type: ||database name:accession | ||
+ | |- | ||
+ | |Description: ||Database name and reference used in the original interaction record to describe interactor A | ||
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>uniprotkb:P23367</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | This column | + | This is the protein reference that was found in the original interaction record to describe interactor A. It is a colon-delimited pair of database name and accession. It may be either the primary or secondary reference for the protein provided by the source database. |
+ | |||
+ | <span style="color:#ff00C1"> <b>Sabry: For complexes this will be ROGID of complex</b></span> | ||
+ | |||
+ | === Column number: 38 (OriginalReferenceB) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||OriginalReferenceB | ||
+ | |- | ||
+ | |Column type: ||database name:accession | ||
+ | |- | ||
+ | |Description: ||Database name and reference used in the original interaction record to describe interactor B | ||
+ | |- | ||
+ | |Example: ||<pre>uniprotkb:P23367</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | See notes for column 37. | ||
+ | |||
+ | |||
+ | === Column number: 39 (Before-C13N-ROGID-A) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Final_ROGID_A (Changed from Final-ROGID-A by sabry) | ||
+ | |- | ||
+ | |Column type: ||String | ||
+ | |- | ||
+ | |Description: ||Unique identifier for interactor A. Before Canonicalization (C13N). | ||
+ | |- | ||
+ | |Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | CHANGE | ||
+ | |||
+ | This column contains a universal key for the interactor. It corresponds to the ROGID (redundant object group identifier) described in the original iRefIndex paper BEFORE canonicalization has been performed. PMID 18823568. | ||
+ | |||
+ | Protein references from the original interaction record (and a description of how they were mapped to the final form and then to the canonical form) can be found in the corresponding iRefWeb record. See column xxx and search for this interaction record at http://wodaklab.org/iRefWeb/. | ||
+ | |||
+ | === Column number: 40 (Before-C13N-ROGID-B) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||Final_ROGID_B (Changed from Final-ROGID-B by sabry) | ||
+ | |- | ||
+ | |Column type: ||String | ||
+ | |- | ||
+ | |Description: ||Unique identifier for interactor B. Before Canonicalization. | ||
+ | |- | ||
+ | |Example: ||<pre>rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | See notes for column 39. | ||
+ | |||
+ | === Column number: 41 (After-C13N-ROGID-A) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||After_C13N_ROGID_A (changed from After-C13N-ROGID-A (Canonical ROGID A) by Sabry) | ||
+ | |- | ||
+ | |Column type: ||String | ||
+ | |- | ||
+ | |Description: ||Unique identifier for the canonical group to which interactor A belongs. Column 1 is an integer equivalent to this identifier. | ||
+ | |- | ||
+ | |Example: ||<pre>crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | CHANGE | ||
+ | |||
+ | This column contains a universal key for the canonical group to which interactor A belongs. | ||
+ | |||
+ | Column 3 lists database names and accessions that belong to this group. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). Members of a canonical group may include splice isoform products from the same or related genes. One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column. | ||
+ | |||
+ | See http://irefindex.uio.no/wiki/Canonicalization for a description of canonicalization. Protein references from the original interaction record (and a description of how they were mapped to the canonical form) can be found in columns 33 - 38 as well as in the corresponding iRefWeb record. See column xxx and search for this interaction record at http://wodaklab.org/iRefWeb/. | ||
+ | |||
+ | The universal key listed here is the ROGID (redundant object group identifier) described in the original iRefIndex paper (PMID 18823568). However, an additional round of processing called canonicalization (C13N) has been performed before choosing a protein to represent this interactor. Therefore, this identifier may differ from the value in column 39 (before canonicalization). | ||
+ | |||
+ | An internal, integer equivalent of this universal key appears in column 1 of this table. | ||
+ | |||
+ | If this line (entry) describes a binary interaction between two proteins, | ||
+ | then the protein with the 'ascibetically' (ASCII value sort order) larger ROGID is listed as interactor A (see | ||
+ | After-C13N-ROGID-A (column 41) and uidA (column 1)). | ||
+ | |||
+ | If this entry describes the membership of a protein in a complex, then the ROGID of the complex is always listed first as interactor A and the member protein's ROGID is listed second (see After-C13N-ROGID-B (column 42) and uidB (column 2)). | ||
+ | |||
+ | If this entry describes an interaction involving only one protein type, then the ROGID of that protein is listed for both interactor A and B. | ||
+ | |||
+ | For proteins, the ROGID (redundant object group identifier) consists of the SEGUID for the protein concatenated with the taxon identifier for the protein. For complex nodes, the ROGID is calculated as the SHA-1 digest of the ROGIDs of all the protein participants (after first ordering them by ASCII-based lexicographical sorting in ascending order and concatenating them). See the iRefIndex paper for details. The SEGUID is always 27 characters long, so the ROGID for a protein will be composed of 27 characters concatenated with a taxon identifier. | ||
+ | |||
+ | === Column number: 42 (After-C13N-ROGID-B) === | ||
+ | |||
+ | {| | ||
+ | |Column name: ||After_C13N_ROGID_B (changed from After-C13N-ROGID-B (Canonical ROGID B) by Sabry) | ||
+ | |- | ||
+ | |Column type: ||String | ||
+ | |- | ||
+ | |Description: ||Unique identifier for the canonical group to which interactor B belongs. Column 2 is an integer equivalent to this identifier. | ||
+ | |- | ||
+ | |Example: ||<pre>crogid:AhmYiMtz8lR12Gixt91txbAd3JY83333</pre> | ||
+ | |} | ||
+ | |||
+ | '''Notes''' | ||
+ | |||
+ | See notes for column 41. | ||
− | === Column number: | + | === Column number: 43 (entrezGeneIds-A) === |
{| | {| | ||
− | |Column name: || | + | |Column name: ||entrezGeneIds_A (Changed from entrezGeneIds-A by Sabry) |
|- | |- | ||
|Column type: ||pipe delimited list of integers or a string | |Column type: ||pipe delimited list of integers or a string | ||
Line 754: | Line 1,202: | ||
'''Notes''' | '''Notes''' | ||
− | If | + | CHANGE |
− | ROGID will appear in this column (see notes to column | + | |
+ | xxx discuss with Sabry | ||
+ | |||
+ | This column contains a pipe-delimited list of integers that are Entrez GeneIds. This list makes up a related gene group (RGG) that was used in the canonicalization procedure. See [http://bioinformatics.uio.no/wiki/Canonicalization Canonicalization] for more details. Briefly, EntrezGene identifiers were grouped together into related gene groups (RGGs) if they shared at least one identical protein product. | ||
+ | |||
+ | If you are looking for the specific Entrez Gene identifier for molecule A in column 1, then refer to column 3. | ||
+ | |||
+ | If no EntrezGene identifier is available for the interactor, then a | ||
+ | ROGID will appear in this column (see notes to column 41). | ||
If the interactor is a node representing a complex, then the ROGID for | If the interactor is a node representing a complex, then the ROGID for | ||
the complex will appear here. | the complex will appear here. | ||
− | === Column number: | + | === Column number: 44 (entrezGeneIds-B) === |
{| | {| | ||
− | |Column name: || | + | |Column name: ||entrezGeneIds_B (Changed from entrezGeneIds-B by Sabry) |
|- | |- | ||
− | |Column type: || | + | |Column type: ||pipe delimited list of integers or a string |
|- | |- | ||
− | |Description: ||EntrezGene identifier for interactor B | + | |Description: ||EntrezGene identifier (possibly a pipe-delimited list) for interactor B |
|- | |- | ||
|Example: ||<pre>948691</pre> | |Example: ||<pre>948691</pre> | ||
Line 774: | Line 1,230: | ||
'''Notes''' | '''Notes''' | ||
− | See notes for column | + | See notes for column 43. |
− | === Column number: | + | === Column number: 45 (MappingScoreA) === |
{| | {| | ||
− | |Column name: || | + | |Column name: ||MappingScoreA |
|- | |- | ||
− | |Column type: || | + | |Column type: ||String |
|- | |- | ||
− | |Description: || | + | |Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 37) to the final protein reference (column 39). |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>PTUO+</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | + | CHANGE | |
− | + | This column contains a description of mapping operations as a condensed string of letters. See the original iRefIndex paper. PMID 18823568. Protein references from the original interaction record (and a description of how they were mapped to the canonical form) can also be found in the corresponding iRefWeb record. See column xxx and search for this interaction record at http://wodaklab.org/iRefWeb/. | |
− | + | ||
− | + | <span style="color:#ff00C1"> <b>Sabry: For complexes this will be (includig 'NA' might confuse as it looks like a score)<pre> - </pre></b></span> | |
− | </pre> | ||
− | === Column number: | + | === Column number: 46 (MappingScoreB) === |
{| | {| | ||
− | |Column name: || | + | |Column name: ||MappingScoreB |
|- | |- | ||
− | |Column type: || | + | |Column type: ||String |
|- | |- | ||
− | |Description: || | + | |Description: ||String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 38) to the final protein reference (column 40). |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>SU</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | See column | + | See notes for column 45. |
+ | <span style="color:#ff00C1"> <b> ---------------------------------------------------------------------------- checked</b></span> | ||
− | === Column number: | + | === Column number: 47 (C13N-rigid) === |
{| | {| | ||
− | |Column name: ||rigid | + | |Column name: ||C13N_rigid (Changed from C13N-rigid (Canonical RIGID) by Sabry) |
|- | |- | ||
|Column type: ||string | |Column type: ||string | ||
Line 822: | Line 1,278: | ||
|Description: ||Redundant interaction group identifier | |Description: ||Redundant interaction group identifier | ||
|- | |- | ||
− | |Example: ||<pre>3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre> | + | |Example: ||<pre>crigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | The RIGID (for redundant interaction group identifier) consists of the | + | The Canonical RIGID (for redundant interaction group identifier) consists of the |
− | ROG identifiers for each of the protein participants (see notes above) | + | canonical (C13N) ROG identifiers for each of the protein participants (see notes above) |
ordered by ASCII-based lexicographic sorting in ascending order, | ordered by ASCII-based lexicographic sorting in ascending order, | ||
concatenated and then digested with the SHA-1 algorithm. See the iRefIndex | concatenated and then digested with the SHA-1 algorithm. See the iRefIndex | ||
Line 835: | Line 1,291: | ||
exact same primary sequences. | exact same primary sequences. | ||
− | === Column number: | + | === Column number: 48 (C13N-rig) === |
{| | {| | ||
− | |Column name: || | + | |Column name: ||C13N_rig (Changed from rig by Sabry) |
|- | |- | ||
− | |Column type: || | + | |Column type: ||string |
|- | |- | ||
− | |Description: || | + | |Description: ||Redundant interaction group |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>irefindex:12345</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | + | CHANGE | |
− | + | xxx discuss with Sabry | |
− | |||
− | |||
− | |||
− | |||
− | |||
− | + | This is an internal, integer equivalent of the C13N-RIGID. See column 47. | |
− | |||
− | |||
− | + | This integer may be used to query the iRefWeb interface for the interaction record. For example | |
+ | http://wodaklab.org/iRefWeb/interaction/show/13653 | ||
+ | where 13653 is the C13N-rig. | ||
+ | Starting with release 6.0, this C13N-rig is stable from one release of iRefIndex to another. | ||
− | + | === Column number: 49 (Before-C13N-rigid) === | |
− | |||
− | === Column number: | ||
{| | {| | ||
− | |Column name: || | + | |Column name: ||rigid |
|- | |- | ||
− | |Column type: || | + | |Column type: ||string |
|- | |- | ||
− | |Description: || | + | |Description: ||Redundant interaction group identifier - before canonicalization (C13N). |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | + | The RIGID (for redundant interaction group identifier) consists of the | |
− | + | ROG identifiers for each of the protein participants (see notes above) | |
− | + | ordered by ASCII-based lexicographic sorting in ascending order, | |
+ | concatenated and then digested with the SHA-1 algorithm. See the iRefIndex | ||
+ | paper for details. This identifier points to a set of redundant | ||
+ | protein-protein interactions that involve the same set of proteins with the | ||
+ | exact same primary sequences. | ||
+ | The rigid is constructed from ROGs **before** canonicalization. This identifier can be easily and universally constructed by data providers to facilitate data integration and exchange. | ||
− | === Column number: | + | === Column number: 50 (imex-id) === |
{| | {| | ||
− | |Column name: || | + | |Column name: ||imex_id (changed from imex-id by Sabry) |
+ | |- | ||
+ | |Column type: || string | ||
|- | |- | ||
− | | | + | |Description: ||IMEx identifier if available |
|- | |- | ||
− | | | + | |Example: ||<pre>imex:IM-12202-3</pre> |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>When no information is available, a dash ( - ) will be used</pre> |
|} | |} | ||
− | === Column number: | + | '''Notes''' |
+ | |||
+ | === Column number: 51 (edgetype) === | ||
{| | {| | ||
− | |Column name: || | + | |Column name: ||edgetype |
|- | |- | ||
− | |Column type: || | + | |Column type: ||Character |
|- | |- | ||
− | |Description: || | + | |Description: ||Does the edge represent a binary interaction (X), member of complex (C) data, or a multimer (Y)? |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>X</pre> |
|} | |} | ||
− | === Column number: | + | '''Notes''' |
+ | |||
+ | CHANGE xxx discuss with Sabry | ||
+ | |||
+ | Edges can be labelled as either X, C or Y: | ||
+ | |||
+ | ;X | ||
+ | :a binary interaction with two protein participants | ||
+ | |||
+ | ;C | ||
+ | :denotes that this edge is a binary expansion of an interaction record that had 3 or more interactors (so-called "complex" or "n-ary" data). The expansion type is described in column 16 (expansion). In the case of iRefIndex, the expansion is always "bipartite", meaning that Interactor A (column 1) of this row represents the collection of interactors and Interactor B (column 2) represents a protein that is a member of this group. | ||
+ | See [http://irefindex.uio.no/wiki/README_iRefIndex_expanded_MITAB_proposal#Understanding_the_iRefIndex_MITAB_format Understanding_the_iRefIndex_MITAB_format] for further explanation. | ||
+ | |||
+ | ;Y | ||
+ | :for dimers and polymers. In case of dimers and polymers when the number of subunits is not described in the original interaction record, the edge is labelled by a Y. Interactor A (column 1) will be identical to the Interactor B (column 2). The graphical representation of this will appear as a single node connected to itself (loop). The actual number of self-interacting subunits may be 2 (dimer) or more (say 5 for a pentamer). Refer to the original interaction record for more details and see column "numParticipants". | ||
+ | |||
+ | |||
+ | === Column number: 52 (numParticipants) === | ||
{| | {| | ||
− | |Column name: || | + | |Column name: ||numParticipants |
|- | |- | ||
− | |Column type: || | + | |Column type: ||Integer |
|- | |- | ||
− | |Description: || | + | |Description: ||Number of participants in the interaction |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>2</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | * | + | CHANGE xxx discuss with Sabry |
− | * | + | |
+ | * For edges labelled <tt>X</tt> (see column 51) this value will be two. | ||
+ | * For edges labelled <tt>C</tt>, this value will be equivalent to the number of protein interactors in the original n-ary interaction record. | ||
+ | * For interactions labelled <tt>Y</tt>, this value will either be the number of self-interacting subunits (if present in the original interaction record) or 1 where the exact number of subunits is unknown or unspecified. | ||
− | |||
− | |||
− | |||
− | === Column number: | + | === Column number: 53 (interaction_name)=== |
{| | {| | ||
− | |Column name: || | + | |Column name: ||interaction_name |
|- | |- | ||
|Column type: ||String | |Column type: ||String | ||
|- | |- | ||
− | |Description: || | + | |Description: ||The name of the interaction |
|- | |- | ||
− | |Example: ||<pre> | + | |Example: ||<pre>MTA1-HDAC core complex</pre> |
|} | |} | ||
'''Notes''' | '''Notes''' | ||
− | + | CHANGE xxx discuss with Sabry | |
− | + | * A name was selected from the original interaction data provided when available. | |
+ | * When no interaction name was available, a name was constructed using the names of the interactors (e.g. Interaction involving HCK_HUMAN and RASA1_HUMAN). | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
[[Category:iRefIndex]] | [[Category:iRefIndex]] |
Latest revision as of 13:44, 27 October 2010
Note |
This is an expansion of the MITAB format that was proposed for use in iRefIndex 7.0 and subsequent releases; it does not correspond to any released product and is considered obsolete.
This proposal is based on the experimental form of the iRefIndex MITAB format found at http://irefindex.uio.no/wiki/README_iRefIndex_experiment_MITAB_6.0 Look for xxx for things that need to be changed to create a version specific form of this README. Look for CHANGE for items that differ significantly from the current MITAB format. This format is based on recent changes agreed upon by the PSI-MI working group in Turku, Finland. |
See also:
- http://code.google.com/p/psimi/issues/detail?id=2
- http://code.google.com/p/psimi/wiki/PsimiTabFormat
Last edited: xxx
Applies to iRefIndex release: xxx
Release date: xxx
Download location: xxx
Authors: Ian Donaldson, Sabry Razick, Paul Boddie
Database: iRefIndex (http://irefindex.uio.no)
Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)
Note: this distribution includes only those data that may be freely distributed under the copyright license of the source database. See Description below. xxx
Contents
- 1 Description
- 2 Directory contents
- 3 Changes from last version
- 4 Known Issues
- 5 Understanding the iRefIndex MITAB format
- 6 License
- 7 Citation
- 8 Disclaimer
- 9 Description of PSI-MITAB2.5 file
- 9.1 Column number: 1 (uidA)
- 9.2 Column number: 2 (uidB)
- 9.3 Column number: 3 (altA)
- 9.4 Column number: 4 (altB)
- 9.5 Column number: 5 (aliasA)
- 9.6 Column number: 6 (aliasB)
- 9.7 Column number: 7 (Method)
- 9.8 Column number: 8 (author)
- 9.9 Column number: 9 (pmids)
- 9.10 Column number: 10 (taxa)
- 9.11 Column number: 11 (taxb)
- 9.12 Column number: 12 (interactionType)
- 9.13 Column number: 13 (sourcedb)
- 9.14 Column number: 14 (interactionIdentifier)
- 9.15 Column number: 15 (confidence)
- 9.16 Column number: 16 (expansion)
- 9.17 Column number: 17 (biological role A)
- 9.18 Column number: 18 (biological role B)
- 9.19 Column number: 19 (experimental role A)
- 9.20 Column number: 20 (experimental role B)
- 9.21 Column number: 21 (interactor type A)
- 9.22 Column number: 22 (interactor type B)
- 9.23 Column number: 23 (xrefs A)
- 9.24 Column number: 24 (xrefs B)
- 9.25 Column number: 25 (xrefs Interaction)
- 9.26 Column number: 26 (Annotations A)
- 9.27 Column number: 27 (Annotations B)
- 9.28 Column number: 28 (Annotations Interaction)
- 9.29 Column number: 29 (Host organism taxid)
- 9.30 Column number: 30 (parameters Interaction)
- 9.31 Column number: 31 (Creation date)
- 9.32 Column number: 32 (Update date)
- 9.33 Column number: 33 (Checksum A)
- 9.34 Column number: 34 (Checksum B)
- 9.35 Column number: 35 (Checksum Interaction)
- 9.36 Column number: 36 (Negative)
- 9.37 Column number: 37 (OriginalReferenceA)
- 9.38 Column number: 38 (OriginalReferenceB)
- 9.39 Column number: 39 (Before-C13N-ROGID-A)
- 9.40 Column number: 40 (Before-C13N-ROGID-B)
- 9.41 Column number: 41 (After-C13N-ROGID-A)
- 9.42 Column number: 42 (After-C13N-ROGID-B)
- 9.43 Column number: 43 (entrezGeneIds-A)
- 9.44 Column number: 44 (entrezGeneIds-B)
- 9.45 Column number: 45 (MappingScoreA)
- 9.46 Column number: 46 (MappingScoreB)
- 9.47 Column number: 47 (C13N-rigid)
- 9.48 Column number: 48 (C13N-rig)
- 9.49 Column number: 49 (Before-C13N-rigid)
- 9.50 Column number: 50 (imex-id)
- 9.51 Column number: 51 (edgetype)
- 9.52 Column number: 52 (numParticipants)
- 9.53 Column number: 53 (interaction_name)
Description
This file describes the contents of the
xxx
directory and the format of the tab-delimited text files contained within. Each index file follows the PSI-MITAB2.5 format with additional columns for annotating edges and nodes. Each line in PSI-MITAB2.5 format represents a single interaction record from an experiment. Assignment of source interaction records to these redundant groups is described at http://irefindex.uio.no. The PSI-MITAB2.5 format and the additional columns are described at the end of the file.
Details on the build process are available from the publication PMID 18823568.
There are two sets of data: free and proprietary. The free version includes only those data that may be freely distributed under the copyright license of the source database. This includes data from BIND, BioGRID, IntAct, MINT, MPPI and OPHID.
iRefIndex also integrates data from CORUM, DIP, HPRD and MPact. These data are not distributed publicly. These data may be made available to academic users under a collaborative agreement.
Contact ian.donaldson at biotek.uio.no if you are interested in using the iRefIndex database or would like your database included in the public release of the index.
Sources | http://irefindex.uio.no/wiki/Sources_iRefIndex_xxx |
Statistics | http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx |
Download location | ftp://ftp.no.embnet.org/irefindex/data/archive/xxx |
Directory contents
README | pointer to this file at http://irefindex.uio.no/wiki/README_iRefIndex_MITAB_xxx |
Sources | pointer to data files for this release at http://irefindex.uio.no/wiki/Sources_iRefIndex_xxx |
Statistics | pointer to statistics for this release at http://irefindex.uio.no/wiki/Statistics_iRefIndex_xxx |
xxxx.mitab.mmddyyyy.txt.zip | individual indices in PSI-MITAB2.5 format |
iRefIndex data is distributed as a set of tab-delimited text files with names of the form xxxx.mitab.mmddyyyy.txt.zip where mmddyyyy represents the file's creation date.
The complete index is available as All.mitab.mmddyyyy.txt.zip .
Taxon specific data sets are also available for:
Taxon Id | |
Homo sapiens | 9606 (human) |
Mus musculus | 10090 (mouse) |
Rattus norvegicus | 10116 (brown rat) |
Caenorhabditis elegans | 6239 (nematode) |
Drosophila melanogaster | 7227 (fruit fly) |
Saccharomyces cerevisiae | 4932 (baker's yeast) |
Escherichia coli | 562 (E. coli) |
Other | other |
All | all |
Taxon specific subsets of the data are named xxxx.mitab.mmddyyyy.txt.zip where xxxx is the taxonomy identifier of at least one of the interactors according to either the source interaction database or the sequence database record. Each zip compressed file contains a single text file with the corresponding name xxxx.mitab.mmddyyyy.txt.
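The naming scheme above can be unpacked mechanically. The following is a minimal Python sketch, not part of the iRefIndex distribution: the function name `parse_mitab_filename` and the example date are illustrative assumptions, and only the `xxxx.mitab.mmddyyyy.txt(.zip)` pattern comes from this README.

```python
import re
from datetime import date

# Matches names of the form <prefix>.mitab.<mmddyyyy>.txt with an optional .zip suffix.
FILENAME_RE = re.compile(
    r"^(?P<prefix>[^.]+)\.mitab\.(?P<mm>\d{2})(?P<dd>\d{2})(?P<yyyy>\d{4})\.txt(?:\.zip)?$"
)

def parse_mitab_filename(name):
    """Return (prefix, creation_date) for a file name following the README's scheme.

    The prefix is a taxonomy identifier such as "9606", or "All" for the complete index.
    """
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not an iRefIndex MITAB file name: {name!r}")
    created = date(int(match["yyyy"]), int(match["mm"]), int(match["dd"]))
    return match["prefix"], created

# Hypothetical file name, used only to illustrate the pattern.
prefix, created = parse_mitab_filename("9606.mitab.10272010.txt.zip")
print(prefix, created.isoformat())
```

The same function accepts the name of the text file inside the archive, since the `.zip` suffix is optional in the pattern.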
In some cases, other objects may belong to other taxons if a virus-host interaction is being represented or if a protein from another organism has been used to model a protein in the specified organism.
Taxonomy identifiers are provided in the data sets allowing these exceptions to be identified. The taxonomy identifiers listed are derived from the source protein sequence record. In some cases, this taxonomy identifier will be a child of the taxon listed in the file's title; for example, Escherichia coli K12 (taxonomy identifier 83333) will appear in the Escherichia coli (taxonomy identifier 562) file.
A description of the NCBI taxon identifiers is available at the following location:
http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy
The above data taxon division scheme leads to duplications; for instance, an interaction present in the mouse index could also appear in the human index if the interaction record lists protein sequence records from both human and mouse. The All.mitab.mmddyyyy file is a complete and non-redundant listing.
The data format and divisions provided in this initial release were chosen in the hopes that they would be immediately useful to the largest possible set of users. Other formats and divisions are possible and we welcome your input on future releases.
Changes from last version
xxx
Known Issues
xxx
Understanding the iRefIndex MITAB format
iRefIndex is distributed in PSI-MITAB format. This format was originally described in a recent PSI-MI paper (PMID 17925023).
Since this PSI-MITAB format allows for only two interactors to be described on each line, it is best suited for describing binary interaction data (where the original experiment, say yeast two-hybrid, gives a binary readout). However, other PSI-MI XML source records will describe interactions involving only one interactor type (dimers or multimers), or they will contain associative (a.k.a. "n-ary") interaction data (say, from immunoprecipitation experiments where the exact interactions between any pair of interactors are unknown). These cases are problematic for the PSI-MITAB format. This README describes exactly how we use the MITAB format to describe these alternate (non-binary) interaction types.
CHANGE
Each row in the MITAB file represents a **single** interaction record from one primary data source.
Previously, each line represented a **collection** of interaction records where each member of this collection describes an interaction involving the exact same set of proteins (as defined by their primary sequence and taxon ids). We have moved to representing a single interaction on each line since this allows us to convey additional information about each of the original source records. Users can still "collapse" or find all lines that describe an interaction between the same set of proteins by using the "RIG" (column xxx). Rows with identical rigids (redundant interaction group identifiers) all describe interactions between the same set of proteins.
The natural keys for each interaction record in this group (i.e. the record identifiers from the source database) are listed under interactionIdentifier (column xxx). For example:
intact:EBI-761694
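Collapsing rows that share a RIG, as described above, amounts to grouping lines by the value of one column. The sketch below is one possible way to do this in Python. The column number is left as a parameter because this README does not yet fix it (column xxx), and the three-column rows in the usage example are a toy layout, not the real file format.

```python
from collections import defaultdict

def group_by_column(lines, column_number):
    """Group tab-delimited lines by the value found in a 1-based column.

    Returns a dict mapping each distinct value to the list of rows
    (each row is a list of fields) that carry that value.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        groups[fields[column_number - 1]].append(fields)
    return dict(groups)

# Toy rows: source record identifier, then a made-up rigid value.
rows = [
    "intact:EBI-761694\trigid:AAA\n",
    "mint:MINT-0000001\trigid:AAA\n",
    "bind:12345\trigid:BBB\n",
]
for rig, members in group_by_column(rows, 2).items():
    print(rig, [fields[0] for fields in members])
```

Each group then lists the source records that describe the same set of proteins, while the individual rows keep their per-record annotation.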
CHANGE
Our surrogate (primary) key for a group of redundant interaction records (RIG) is no longer listed in column xxx. Only the source database record is listed in this column: the source db name and record id, separated by a colon. The RIG identifier is now listed (by itself) in column xxx.
The RIG identifier is a 27 character key that is derived from the ROGIDs of the interactors involved in the interaction record (see columns xxx and xxx). The RIG identifier is listed (by itself) in column xxx for convenience. The ROGID is a SHA-1 digest of the protein interactor's primary amino acid sequence concatenated with the NCBI taxon id (see the paper for details).
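The construction of these keys can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the reference implementation: it assumes the 27-character form is the base64 encoding of the SHA-1 digest with padding removed (the SEGUID convention), that sequences are uppercased before hashing, and that the RIG key is the digest of the participant ROGIDs sorted in ASCII order and concatenated, as the notes for the rigid columns describe. The function names are hypothetical; PMID 18823568 remains the authoritative description.

```python
import base64
import hashlib

def sha1_base64(text):
    """SHA-1 digest of text, base64-encoded with padding stripped (27 characters)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")

def rogid(sequence, taxon_id):
    """Digest of the uppercased amino acid sequence followed by the NCBI taxon id."""
    return sha1_base64(sequence.upper()) + str(taxon_id)

def rigid(rogids):
    """Digest of the participant ROGIDs after ASCII sorting and concatenation."""
    return sha1_base64("".join(sorted(rogids)))

# Toy sequences, for illustration only.
a = rogid("MKTAYIAKQR", 9606)
b = rogid("MSDNELLKQA", 9606)
print(a, b, rigid([a, b]))
```

Because the ROGIDs are sorted before hashing, the order in which the participants are supplied does not affect the result.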
Sometimes source interaction records in PSI-MI format only list one interactor. These are cases where either 1) an intramolecular interaction is being represented or 2) a multimer (3 or more) of some protein is being represented. These records are difficult to represent in the PSI-MITAB format because PSI-MITAB requires that each row (interaction) list two interactors. The way we handle this is to list the ROG identifier for the single interactor twice (once in each of columns xxx and xxx) of the MITAB. The RIG identifier for these interactions will be the SHA-1 digest of the interactor’s ROG id (see column xxx). These interactions are marked by a Y in column xxx (see the README).
Note that column xxx may also contain a C. This indicates that the MITAB entry describes membership of a protein in some complex. These entries correspond to PSI-MI records where more than two interactors are listed (associative interaction data; a.k.a. n-ary data cf. binary data). In these cases, the first column holds the ROG identifier of the complex and the second column contains the ROG id of the protein. We refer to this method of representation as a bi-partite model since there are two kinds of nodes corresponding to complexes and proteins.
As an example, let’s say that a source interaction record contained interactors A, B and C found by affinity purification and mass-spec where a tagged version of protein A was used as the bait protein to perform the immunoprecipitation.
Then we would represent the complex in the MITAB file using three lines: X-A, X-B, and X-C. All three entries would have the same string in column 1 (the RIG id for the complex). All three entries would have the same string in column 21 (again, the RIG id for the complex). All three entries would have a C in column xxx.
xxx
Other databases take an interaction record with multiple interactors (n-ary data) and make a list of binary interactions (based on the spoke or matrix model) and then list these binary interactions in the MITAB. For the example above, using a spoke model to transform the data into a set of binary interactions, these data would be represented using two lines in the MITAB file: A-B and A-C. Here A is chosen as the "hub" of the spoke model since it was the "bait" protein. For experimental systems that do not have "baits" and "preys" (e.g. x-ray crystallography), an arbitrary protein might be chosen as the bait.
Alternatively, a matrix model might be used to transform the n-ary data into a list of binary interactions. Here all pairwise combinations of interactors in the original n-ary data are represented as binary interactions. So, in the above example, the immunoprecipitated complex would be represented using three lines of the MITAB file: A-B, B-C, and A-C.
All three methods for representing n-ary data in a MITAB file (bi-partite, spoke, and matrix) are different representations of the same data.
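The three representations can be compared side by side with a minimal Python sketch. The function names and the complex identifier "X" are illustrative assumptions and are not part of any iRefIndex tool; the interactors A, B and C and the bait A come from the example above.

```python
from itertools import combinations


def bipartite(complex_id, members):
    """Bi-partite model: one row per member, pairing the complex node with it."""
    return [(complex_id, member) for member in members]


def spoke(bait, members):
    """Spoke model: pair the bait (hub) with every other member."""
    return [(bait, member) for member in members if member != bait]


def matrix(members):
    """Matrix model: every pairwise combination of members."""
    return list(combinations(members, 2))


members = ["A", "B", "C"]
bipartite_rows = bipartite("X", members)  # X-A, X-B, X-C
spoke_rows = spoke("A", members)          # A-B, A-C
matrix_rows = matrix(members)             # A-B, A-C, B-C
```

Only the bi-partite rows carry an identifier (here "X") that cannot be mistaken for a protein, which is the property the next paragraph relies on.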
We have chosen to use the bi-partite method of representation so that it is impossible to mistake spoke or matrix binary entries for true binary entries; the identifiers used for complexes will, of course, not appear in a protein database and any programme that tries to treat complex identifiers as though they were protein identifiers will fail. The method allows you to reconstruct the members of the original interaction record that describes a complex of proteins (say from an affinity purification experiment). From there, you can choose to make a spoke or matrix model by yourself if you want.
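As a rough illustration of that reconstruction, the following Python sketch groups member proteins by complex node and then derives a matrix model from them. It assumes each row has already been reduced to a hypothetical (interactor A, interactor B, expansion) tuple and that complex-membership rows are the ones whose expansion value is "bipartite"; the identifiers are placeholders.

```python
from collections import defaultdict
from itertools import combinations


def complex_members(rows):
    """Collect the member proteins of each complex from bi-partite rows."""
    members = defaultdict(list)
    for id_a, id_b, expansion in rows:
        if expansion == "bipartite":
            # Interactor A is the complex node, interactor B is the member.
            members[id_a].append(id_b)
    return dict(members)


rows = [
    ("X", "A", "bipartite"),
    ("X", "B", "bipartite"),
    ("X", "C", "bipartite"),
    ("P", "Q", "none"),  # a true binary interaction, left untouched
]

complexes = complex_members(rows)

# Optional: expand each reconstructed complex into a matrix model.
matrix_pairs = {
    complex_id: list(combinations(sorted(proteins), 2))
    for complex_id, proteins in complexes.items()
}
```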
xxx Users are advised that other databases will use spoke and matrix model representations of complexes. In these cases, column xxx will indicate this fact. The pairs of proteins found in these entries do not necessarily represent observations of real binary interactions: they merely represent membership in some larger list of proteins observed to be somehow associated.
For binary interaction data, column xxx will contain an X. Two protein interactor ROGIDs will be listed in columns 1 and 2.
License
Data released on this public ftp site are released under the Creative Commons Attribution License http://creativecommons.org/licenses/by/2.5/. This means that you are free to use, modify and redistribute these data for personal or commercial use so long as you provide appropriate credit. See next section.
iRefIndex data distributed on the FTP site include only those data that may be freely distributed under the copyright license of the source database. This includes data from BIND, BioGRID, IntAct, MINT, MPPI and OPHID.
iRefIndex also integrates data from CORUM, DIP, HPRD and MPact. These data are not distributed publicly. These data may be made available to academic users under a collaborative agreement.
Contact ian.donaldson at biotek.uio.no if you are interested in using the iRefIndex database or would like your database included in the public release of the index.
Copyright © 2008-2010 Ian Donaldson
Citation
Credit should include citing the iRefIndex paper (PMID 18823568) and any of the source databases upon which this resource is based. See http://irefindex.uio.no for appropriate citations.
Disclaimer
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Description of PSI-MITAB2.5 file
Each line in this file represents either
- an interaction between two proteins (binary interaction) or
- the membership of a protein in some complex (complex membership) or
- an interaction that involves only one protein type (multimer or self-interaction).
See column xxx for more details.
Column number: 1 (uidA)
Column name: | uidA |
Column type: | Integer |
Description: | Unique identifier for the canonical group to which interactor A belongs. |
Example: | irefindex:2345 |
Notes
CHANGE This column contains an internal, integer key for the canonical group to which interactor A belongs.
An alphanumeric equivalent of this key (that can be generated by anyone, i.e., universal) appears in column 41. See the notes for column 41 for more details on how protein identifiers were mapped from the original database record to this key.
Column 3 lists database names and accessions that belong to this group. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). Members of a canonical group may include splice isoform products from the same or related genes. One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column.
The identifier listed here is stable from one release of iRefIndex to another starting from release 6.0.
Internal note: Also referred to as iROG-A.
Column number: 2 (uidB)
Column name: | uidB |
Column type: | Integer |
Description: | Unique identifier for the canonical group to which interactor B belongs. Also referred to as iROG-B. |
Example: | irefindex:456543 |
Notes
See notes for column 1.
Column number: 3 (altA)
Column name: | altA |
Column type: | a|b: pipe-delimited set of strings |
Description: | Alternative identifiers for interactor A |
Example: | uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691 |
Notes
Column 3 lists database names and accessions that belong to the same canonical group. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). Members of a canonical group may include splice isoform products from the same or related genes. One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in column 1.
Each pipe-delimited entry is a database_name:accession pair delimited by a colon. Database names are taken from the MI controlled vocabulary at http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI Database references listed in this column may include the following:
- uniprotkb
- The accessions this protein is known by in UniProt (http://www.uniprot.org/). More information regarding this protein can be retrieved using this accession from UniProt. See the AC line in the flat file. http://au.expasy.org/sprot/userman.html#AC_line.
- refseq
- If a protein accession exists in the RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/) that reference is indicated here. More information about this protein can be obtained from RefSeq using this accession.
- entrezgene/locuslink
- NCBI gene Identifiers for the gene encoding this protein. See ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq column GeneID given protein's accession.version
- other
- If none of the three identifier types are available then other databasename:accession pairs will be listed. These database names may not follow the MI controlled vocabulary.
Example:
emb:CAA44868.1|gb:AAA23715.1|gb:AAB02995.1|emb:CAA56736.1|uniprot:P24991
- irefindex
- If the node represents a complex, then the rogid for the complex will be listed here, such as the following:
irefindex:xBr9cTXgzPLNxsaKiYyHcoEm/DM
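A field of this kind can be split into its database name and accession parts with a short Python sketch such as the one below. The function name is hypothetical, and skipping "-" and "NA" entries is an assumption made for convenience.

```python
def parse_alt_ids(field):
    """Split a pipe-delimited field into (database_name, accession) pairs."""
    pairs = []
    for entry in field.split("|"):
        if not entry or entry in ("-", "NA"):
            continue  # assumed placeholders for missing values
        # Split on the first colon only, so accessions may contain colons.
        database_name, _, accession = entry.partition(":")
        pairs.append((database_name, accession))
    return pairs


alt_a = parse_alt_ids(
    "uniprotkb:P23367|refseq:NP_418591|entrezgene/locuslink:948691"
)
```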
Column number: 4 (altB)
Column name: | altB |
Column type: | a|b: pipe-delimited set of strings |
Description: | Alternative identifiers for interactor B |
Example: | uniprotkb:P06722|refseq:NP_417308|entrezgene/locuslink:947299 |
Notes
See notes for column 3.
Column number: 5 (aliasA)
Column name: | aliasA |
Column type: | a|b: pipe-delimited set of strings |
Description: | Aliases for interactor A |
Example: | uniprotkb:MUTL_ECOLI|entrezgene/locuslink:mutL |
Notes
Each pipe-delimited entry is a databasename:alias pair delimited by a colon. Database names are taken from the PSI-MI controlled vocabulary at http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI
Database names and sources listed in this column may include the following:
- uniprotkb:entry name
- the entry name given by UniProt. See Entry name in the ID line of the flat file. http://au.expasy.org/sprot/userman.html#ID_line
- entrezgene/locuslink:symbol
- the NCBI gene symbol for the gene encoding this protein. See ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info column Symbol given GeneID
- irefindex:complex
- If the node is a complex then irefindex:complex will be listed here.
- NA
- NA may be listed here if aliases are Not Available
Column number: 6 (aliasB)
Column name: | aliasB |
Column type: | a|b: pipe-delimited set of strings |
Description: | Aliases for interactor B |
Example: | uniprotkb:MUTH_ECOLI|entrezgene/locuslink:mutH |
Notes
See notes for column 5.
Column number: 7 (Method)
Column name: | Method |
Column type: | string |
Description: | Interaction detection method |
Example: | MI:0399(2h fragment pooling) |
Notes
CHANGE
Only a single method will appear in this column. Previously, multiple methods appeared. Both the controlled vocabulary term identifier for the method (e.g. MI:0399) and the controlled vocabulary term short label in brackets (e.g. 2h fragment pooling) will appear in this column. See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers.
The interaction detection method is from the original record. Path for PSI-MI 2.5:
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel/
CHANGE
If a controlled vocabulary term identifier was not provided by the source database then an attempt was made to use the supplied short label to find the correct term identifier. If a term identifier could not be found, then MI:0000 will appear before the shortLabels.
NA or -1 may appear in place of a recognised shortLabel.
For example:
MI:0000(-1)
MI:0000(NA)
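Values of the form identifier(short label) can be separated with a regular expression. The sketch below is only one way to do it; the names parse_cv_term and is_unmapped are hypothetical, and treating MI:0000 as "no mapping found" follows the note above.

```python
import re

# Matches values such as "MI:0399(2h fragment pooling)" or "MI:0000(NA)".
CV_TERM = re.compile(r"^(MI:\d{4})\((.*)\)$")


def parse_cv_term(field):
    """Return (term_identifier, short_label), or None if the value does not match."""
    match = CV_TERM.match(field.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_unmapped(field):
    """True when no controlled vocabulary term identifier could be assigned."""
    parsed = parse_cv_term(field)
    return parsed is None or parsed[0] == "MI:0000"


method = parse_cv_term("MI:0399(2h fragment pooling)")
```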
xxx Sabry check above that there is only one method per line and that the normalized cv term id is used when appropriate and check what is used when no mapping can be made.
Column number: 8 (author)
Column name: | author |
Column type: | a|b: pipe-delimited set of strings |
Description: | |
Example: | hall-1999-1|hall-1999-2|mansour-2001-1|mansour-2001-2|hall-1999 |
Notes
According to MITAB2.5 format this column should contain a pipe-delimited list of the first-author surnames of the publications in which the interaction has been shown.
CHANGE
This column will usually include only one author name reference. However, some experimental evidences have secondary references which could be included here.
xxx Sabry Check this
Column number: 9 (pmids)
Column name: | pmids |
Column type: | a|b: pipe-delimited set of strings |
Description: | PubMed Identifiers |
Example: | pubmed:9880500|pubmed:11585365 |
Notes
This is a non-redundant list of PubMed identifiers pointing to literature that supports the interaction. According to MITAB2.5 format, this column should contain a pipe delimited set of databaseName:identifier pairs such as pubmed:12345. The source database name is always pubmed.
CHANGE
This column will usually include only one pubmed reference that describes where the experimental evidence is found. In some cases, secondary references will be included here.
xxx Sabry Check this
The special value - may appear in place of the identifiers.
Column number: 10 (taxa)
Column name: | taxa |
Column type: | string |
Description: | Taxonomy identifier for interactor A |
Example: | taxid:83333(Escherichia coli K-12) |
Notes
The NCBI taxonomy identifier listed here is that of the sequence record for the interactor and may be different than what is listed in the interaction record. See the methods section for more details. See the NCBI taxonomy database at http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy . According to MITAB2.5 format, this column should contain a pipe delimited set of databaseName:identifier pairs such as taxid:12345. The source database name has been listed as taxid since it is always NCBI's taxonomy database. The value in this column will be NA if the interactor is a complex.
Column number: 11 (taxb)
Column name: | taxb |
Column type: | string |
Description: | Taxonomy identifier for interactor B |
Example: | taxid:83333(Escherichia coli K-12) |
Notes
See notes for column 10.
Column number: 12 (interactionType)
Column name: | interactionType |
Column type: | string |
Description: | Interaction Type from controlled vocabulary or short label |
Example: | MI:0218(physical interaction) |
Notes
CHANGE
Only one interaction type will be present in each line of the file (previously, multiple types were listed).
The interaction type is taken from the PSI-MI controlled vocabulary and represented as...
database:identifier(interaction type)
...when the term identifier is available in the interaction record; otherwise the short label from the record is used. Path for PSI-MI 2.5:
entrySet/entry/interactionList/interaction/interactionType/names/shortLabel
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for interaction types.
CHANGE
If the MI controlled vocabulary identifier was not provided by the source database, but a text description was provided, then an attempt was made to map the text to the correct controlled vocabulary term identifier. If this was not possible then MI:0000 is listed.
xxx Sabry discuss
NA may be listed here if the interaction type is not available (meaning that we could not find the interaction type in the record provided by the source database).
Column number: 13 (sourcedb)
Column name: | sourcedb |
Column type: | String |
Description: | Source databases containing this interaction |
Example: | MI:0469(intact) |
Notes
Taken from the PSI-MI controlled vocabulary and represented as...
database:identifier(sourceName)
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to look up controlled vocabulary term identifiers for database sources.
CHANGE Only one source database will be listed in each row.
Column number: 14 (interactionIdentifier)
Column name: | interactionIdentifier |
Column type: | String |
Description: | source interaction database and accession |
Example: | intact:EBI-761694 |
Notes
Each reference is presented as a databaseName:identifier pair.
CHANGE
Only one source database reference will be listed in each row. The RIGID (from iRefIndex) is no longer listed in this column. See column xxx instead.
The source databaseNames that appear in this column are taken from the PSI-MI controlled vocabulary at http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI where possible
If an interaction record identifier is not provided by the source database, this entry will appear as
database-name:-
xxx Sabry check this
Column number: 15 (confidence)
Column name: | confidence |
Column type: | a|b: pipe-delimited set of strings |
Description: | Confidence scores |
Example: | lpr:1|hpr:12|np:1 |
Notes
Each reference is presented as a scoreName:score pair. Three confidence scores are provided: lpr, hpr and np.
PubMed Identifiers (PMIDs) point to literature references that support an interaction. A PMID may be used to support more than one interaction.
The lpr score (lowest pmid re-use) is the lowest number of distinct interactions (RIGIDs: see column 14) that any one PMID (supporting the interaction in this row) is used to support. A value of one indicates that at least one of the PMIDs supporting this interaction has never been used to support any other interaction. This likely indicates that only one interaction was described by that reference and that the present interaction is not derived from high throughput methods.
The hpr score (highest pmid re-use) is the highest number of interactions (RIGIDs: see column 14) that any one PMID (supporting the interaction in this row) is used to support. A high value (e.g. greater than 50) indicates that one PMID describes at least 50 other interactions and it is more likely that high-throughput methods were used.
The np score (number pmids) is the total number of unique PMIDs used to support the interaction described in this row.
- may appear in the score field, indicating the absence of a score value.
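The three scores can be derived from a mapping of each PMID to the set of distinct interactions (RIGIDs) it supports. The Python sketch below is an illustration under that assumption; the function names, the mapping and the RIGID labels are hypothetical, and only the two PMIDs come from the example for column 9.

```python
def confidence_scores(pmids, rigids_by_pmid):
    """Compute lpr, hpr and np for one interaction.

    pmids: PMIDs supporting the interaction in this row.
    rigids_by_pmid: maps each PMID to the set of RIGIDs it supports.
    """
    unique_pmids = set(pmids)
    reuse = [len(rigids_by_pmid[pmid]) for pmid in unique_pmids]
    return {"lpr": min(reuse), "hpr": max(reuse), "np": len(unique_pmids)}


def format_scores(scores):
    """Render the scores as a pipe-delimited list of scoreName:score pairs."""
    return "|".join(f"{name}:{scores[name]}" for name in ("lpr", "hpr", "np"))


rigids_by_pmid = {
    "9880500": {"rigid1"},                               # supports 1 interaction
    "11585365": {f"rigid{i}" for i in range(1, 13)},     # supports 12 interactions
}
scores = confidence_scores(["9880500", "11585365", "9880500"], rigids_by_pmid)
score_field = format_scores(scores)
```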
CHANGE
NEW
COLUMNS PAST THIS POINT (16 - 36) ARE PART OF THE NEW PSI-MITAB 2.6 FORMAT
Column number: 16 (expansion)
Column name: | expansion |
Column type: | String |
Description: | Model used to convert n-ary data into binary data for purpose of export in MITAB file |
Example: | bipartite |
Notes
For iRefIndex, this column will always contain either "bipartite" or "none".
Other databases may use either "spoke" or "matrix" or "none" in this column.
See Understanding_the_iRefIndex_MITAB_format at the top of this file for an explanation.
Column number: 17 (biological role A)
Column name: | biological_role_A (change by Sabry from biological role A) |
Column type: | String |
Description: | Biological role of interactor A |
Example: | MI:0501(enzyme) |
Notes
When provided by the source database, this column contains a single entry such as MI:0501(enzyme), MI:0502(enzyme target), MI:0580(electron acceptor), or MI:0499(unspecified role). See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to browse possible values for biological role.
xxx Sabry check this format
Sabry: For complexes and when no role specified this will be unspecified role
Column number: 18 (biological role B)
Column name: | biological_role_B (changed by Sabry from biological role B) |
Column type: | String |
Description: | Biological role of interactor B |
Example: | MI:0501(enzyme) |
Notes
See notes for column 17.
Column number: 19 (experimental role A)
Column name: | experimental_role_A (changed by Sabry from experimental role A) |
Column type: | String |
Description: | Indicates the experimental role of the interactor (such as bait or prey). |
Example: | MI:0496(bait) |
Example: | MI:0498(prey) |
Notes
This column indicates the experimental role (if any was provided by the source database) that was played by interactor A (column 1).
See http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=MI to see definitions of bait and prey, as well as to browse other possible values of experimental role that may appear in this column for other databases.
xxx Sabry check format
Sabry: For complexes and when no role specified this will be MI:0499(unspecified role)
Column number: 20 (experimental role B)
Column name: | experimental_role_B (changed by Sabry from experimental role B) |
Column type: | String |
Description: | Indicates the experimental role of the interactor (such as bait or prey). |
Example: | MI:0496(bait) |
Example: | MI:0498(prey) |
Notes
This column indicates the experimental role (if any) that was played by interactor B (column 2).
See notes above for column 19.
Column number: 21 (interactor type A)
Column name: | interactor_type_A (change by sabry from interactor type A) |
Column type: | string |
Description: | describes the type of molecule that A is |
Example: | MI:0326(protein) |
Notes
For iRefIndex, this will always be one of...
MI:0326(protein) MI:0315(protein complex)
xxx Sabry check why - sometimes appear here
Column number: 22 (interactor type B)
Column name: | interactor_type_B (change by sabry from interactor type B) |
Column type: | string |
Description: | describes the type of molecule that B is |
Example: | MI:0326(protein) |
Notes
See column 21.
Column number: 23 (xrefs A)
Column name: | xrefs_A (changed from xrefs A by Sabry) |
Column type: | a|b: pipe-delimited set of strings |
Description: | xrefs for molecule A |
Example: | go:"GO:0016233"(telomere capping) |
Notes
This is not used by iRefIndex. A dash ( - ) will always appear in this column.
This column may be used to list cross-references to annotation information for molecule A. For example, Gene Ontology identifiers or OMIM identifiers.
Column number: 24 (xrefs B)
Column name: | xrefs_B (changed from xrefs B by Sabry) |
Column type: | a|b: pipe-delimited set of strings |
Description: | xrefs for molecule B |
Example: | - |
Notes
This is not used by iRefIndex. A dash ( - ) will always appear in this column.
See notes to column 23.
Column number: 25 (xrefs Interaction)
Column name: | xrefs_Interaction (changed from xrefs Interaction by Sabry) |
Column type: | a|b: pipe-delimited set of strings |
Description: | xrefs for the interaction |
Example: | go:"GO:0048786"(presynaptic active zone) |
Notes
This is not used by iRefIndex. A dash ( - ) will always appear in this column.
This column may be used to list cross-references to annotation information for the interaction. For example, Gene Ontology identifiers or OMIM identifiers.
Column number: 26 (Annotations A)
Column name: | Annotations_A (changed from Annotations A by Sabry) |
Column type: | a|b: pipe-delimited set of strings |
Description: | Annotations for molecule A |
Example: | This protein binds 7 zinc molecules |
Notes
This is not used by iRefIndex. A dash ( - ) will always appear in this column.
This column may be used to list free-text annotation information for molecule A.
Some databases may use dataset:* or data-processing:* (where * is non-controlled free-text) in this column.
Column number: 27 (Annotations B)
Column name: | Annotations_B (changed from Annotations B by Sabry) |
Column type: | string |
Description: | Annotations for molecule B |
Example: | - |
Notes
This is not used by iRefIndex. A dash ( - ) will always appear in this column.
See notes to column 26.
Column number: 28 (Annotations Interaction)
Column name: | Annotations Interaction |
Column type: | a|b: pipe-delimited set of strings |
Description: | Annotations for interaction |
Example: | prediction score:432|comment:prediction based on phage display consensus|author-confidence:8|comment:AD-ORFeome library used in the experiment. |Interaction of the NON_CORE set.|number of hits:1
pbs signal:-8.966|pbs category:B|figure-legend:NFA |
Notes
This is not used by iRefIndex. A dash ( - ) will always appear in this column.
This column may be used to list free-text annotation information for the interaction. The keys used before the : (like "comment") are database specific and not controlled.
Some databases may use dataset:* or data-processing:* (where * is non-controlled free-text) in this column.
Column number: 29 (Host organism taxid)
Column name: | Host_organism_taxid (Changed from Host organism taxid by Sabry) |
Column type: | string |
Description: | Host organism taxid where the interaction was experimentally demonstrated |
Example: | taxid:10090(Mus musculus) |
Notes
Host organism taxid where the interaction was experimentally demonstrated. This may differ from the taxid of the interactors. Other possible entries are:
taxid:-1(in vitro)
taxid:-4(in vivo)
A dash ( - ) will be used when no information about the Host organism taxid is available.
taxid:32644(unidentified) will be used when the source specifies the Host organism taxid as 32644.
xxx Sabry check format
Column number: 30 (parameters Interaction)
Column name: | parameters_Interaction (changed from parameters Interaction by Sabry) |
Column type: | string |
Description: | Parameters for the interaction |
Example: | - |
Notes
This is not used by iRefIndex. A dash ( - ) will always appear in this column.
Internal note : use of this column is not well-defined or characterized.
Column number: 31 (Creation date)
Column name: | Creation_date (Changed from Creation date by sabry ) |
Column type: | string (yyyy/mm/dd) |
Description: | When was the entry created. |
Example: | 2010/05/06 |
Notes
This will be the release date of iRefIndex for all entries in this file.
This date will not match the date for the corresponding record in the source database.
Column number: 32 (Update date)
Column name: | Update_date (Changed from Update date by Sabry) |
Column type: | string (yyyy/mm/dd) |
Description: | When was this record last updated? |
Example: | 2010/05/06 |
Notes
This will be the release date of iRefIndex for all entries in this file.
This date will not match the date for the corresponding record in the source database.
Column number: 33 (Checksum A)
Column name: | Checksum_A (Changed from Checksum A by Sabry) |
Column type: | String |
Description: | Calculation of the checksum is as described in PMID 18823568. |
Example: | rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333 |
Notes
This only applies if interactor A is of type protein. This value is a ROGID and is the same as column 39. See notes for column 39. This ROGID is calculated by iRefIndex. The source database should generate an equivalent ROGID value for one of the interactors in the same source record unless the underlying sequence has been updated during the iRefIndex build process.
Column number: 34 (Checksum B)
Column name: | Checksum B |
Column type: | String |
Description: | Calculation of the checksum is as described in PMID 18823568. |
Example: | rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333 |
Notes
This only applies if interactor B is of type protein. This value is a ROGID and is the same as column 40. See notes for column 40. This ROGID is calculated by iRefIndex. The source database should generate an equivalent ROGID value for one of the interactors in the same source record unless the underlying sequence has been updated during the iRefIndex build process.
Column number: 35 (Checksum Interaction)
Column name: | Checksum_Interaction (Changed from Checksum Interaction by Sabry) |
Column type: | String |
Description: | Calculation of the checksum is as described in PMID 18823568. |
Example: | rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA |
Notes
This only applies if all interactors in the interaction are of type protein. This value is a RIGID and is the same as column 49. See notes for column 49. This RIGID is calculated by iRefIndex. The source database should generate an equivalent RIGID value for the same source record unless one of the participating protein interactor sequences has been updated during the iRefIndex build process.
Column number: 36 (Negative)
Column name: | Negative |
Column type: | Boolean (true or false) |
Description: | Does the interaction record provide evidence that some interaction does NOT occur. |
Example: | false |
Notes
This value will be false for all lines in this file since iRefIndex does not include "negative" interactions from any of the source databases.
COLUMNS PAST THIS POINT (37 -) ARE NOT DEFINED BY THE PSI-MITAB2.6 STANDARD. THESE COLUMNS ARE SPECIFIC TO THIS IREFINDEX RELEASE AND MAY CHANGE FROM ONE RELEASE TO ANOTHER
Column number: 37 (OriginalReferenceA)
Column name: | OriginalReferenceA |
Column type: | database name:accession |
Description: | Database name and reference used in the original interaction record to describe interactor A |
Example: | uniprotkb:P23367 |
Notes
This is the protein reference that was found in the original interaction record to describe interactor A. It is a colon-delimited pair of database name and accession. It may be either the primary or secondary reference for the protein provided by the source database.
Sabry: For complexes this will be ROGID of complex
Column number: 38 (OriginalReferenceB)
Column name: | OriginalReferenceB |
Column type: | database name:accession |
Description: | Database name and reference used in the original interaction record to describe interactor B |
Example: | uniprotkb:P23367 |
Notes
See notes for column 37.
Column number: 39 (Before-C13N-ROGID-A)
Column name: | Final_ROGID_A (Changed from Final-ROGID-A by sabry) |
Column type: | String |
Description: | Unique identifier for interactor A. Before Canonicalization (C13N). |
Example: | rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333 |
Notes
CHANGE
This column contains a universal key for the interactor. It corresponds to the ROGID (redundant object group identifier) described in the original iRefIndex paper BEFORE canonicalization has been performed. PMID 18823568.
Protein references from the original interaction record (and a description of how they were mapped to the final form and then to the canonical form) can be found in the corresponding iRefWeb record. See column xxx and search for this interaction record at http://wodaklab.org/iRefWeb/.
Column number: 40 (Before-C13N-ROGID-B)
Column name: | Final_ROGID_B (Changed from Final-ROGID-B by sabry) |
Column type: | String |
Description: | Unique identifier for interactor B. Before Canonicalization. |
Example: | rogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333 |
Notes
See notes for column 39.
Column number: 41 (After-C13N-ROGID-A)
Column name: | After_C13N_ROGID_A (changed from After-C13N-ROGID-A (Canonical ROGID A) by Sabry) |
Column type: | String |
Description: | Unique identifier for the canonical group to which interactor A belongs. Column 1 is an integer equivalent to this identifier. |
Example: | crogid:hhZYhMtr5JC1lGIKtR1wxHAd3JY83333 |
Notes
CHANGE
This column contains a universal key for the canonical group to which interactor A belongs.
Column 3 lists database names and accessions that belong to this group. Members of a canonical group do not all necessarily have the same sequence (although they all belong to the same taxon). Members of a canonical group may include splice isoform products from the same or related genes. One member of the canonical group is chosen to represent the entire group. The identifier for that canonical representative is listed in this column.
See http://irefindex.uio.no/wiki/Canonicalization for a description of canonicalization. Protein references from the original interaction record (and a description of how they were mapped to the canonical form) can be found in columns 33 - 38 as well as in the corresponding iRefWeb record. See column xxx and search for this interaction record at http://wodaklab.org/iRefWeb/.
The universal key listed here is the ROGID (redundant object group identifier) described in the original iRefIndex paper. PMID 18823568. However, an additional round of processing called canonicalization (C13N) has been performed before choosing a protein to represent this interactor. Therefore this identifier may differ from the value in column 39 (before canonicalization).
An internal, integer equivalent of this universal key appears in column 1 of this table.
If this line (entry) describes a binary interaction between two proteins, then the protein with the 'ascibetically' (ASCII value sort order) larger ROGID is listed as interactor A (see After-C13N-ROGID-A (column 41) and uidA (column 1)).
If this entry describes the membership of a protein in a complex, then the ROGID of the complex is always listed first as interactor A and the member protein's ROGID is listed second (see After-C13N-ROGID-B (column 42) and uidB (column 2)).
If this entry describes an interaction involving only one protein type, then the ROGID of that protein is listed for both interactor A and B.
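The ordering rule for a binary interaction can be sketched in Python as follows. The function name is hypothetical. Python compares strings by code point, which coincides with ASCII order for the characters used in ROGIDs. The two ROGIDs are taken from the examples for columns 41 and 42.

```python
def order_binary_interactors(rogid_1, rogid_2):
    """Return (interactor A, interactor B) with the larger ROGID listed as A."""
    if rogid_1 >= rogid_2:
        return rogid_1, rogid_2
    return rogid_2, rogid_1


interactor_a, interactor_b = order_binary_interactors(
    "AhmYiMtz8lR12Gixt91txbAd3JY83333",
    "hhZYhMtr5JC1lGIKtR1wxHAd3JY83333",
)
```

For complex membership the complex ROGID is simply placed first, and for an interaction involving one protein type the same ROGID fills both positions, so neither case needs a comparison.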
The ROGID (redundant object group identifier) for proteins consists of the SEGUID for the protein concatenated with the taxon identifier for the protein. For complex nodes, the ROGID is calculated as the SHA-1 digest of the ROGIDs of all the protein participants (after first ordering them by ASCII-based lexicographical sorting in ascending order and concatenating them). See the iRefIndex paper for details. The SEGUID is always 27 characters long, so the ROGID for a protein is composed of 27 characters followed by a taxon identifier.
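The construction can be approximated in Python with the standard hashlib and base64 modules. This is a sketch under stated assumptions, not the iRefIndex implementation: it assumes the SEGUID is the base64-encoded SHA-1 digest of the upper-case sequence with the trailing "=" removed, and that complex ROGIDs use the same base64 encoding. All function names and the toy sequence are hypothetical; the iRefIndex paper remains the authority on the details.

```python
import base64
import hashlib


def sha1_base64(text):
    """Base64-encoded SHA-1 digest without padding (always 27 characters)."""
    digest = hashlib.sha1(text.encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def protein_rogid(sequence, taxon_id):
    """SEGUID of the sequence followed by the taxon identifier."""
    return sha1_base64(sequence.upper()) + str(taxon_id)


def complex_rogid(member_rogids):
    """Digest of the member ROGIDs, sorted ascending and concatenated."""
    return sha1_base64("".join(sorted(member_rogids)))


def split_protein_rogid(rogid):
    """Separate the 27-character SEGUID from the taxon identifier."""
    return rogid[:27], rogid[27:]


toy_rogid = protein_rogid("MKVLAAGIVG", 83333)  # toy sequence, not a real protein
seguid, taxon = split_protein_rogid("hhZYhMtr5JC1lGIKtR1wxHAd3JY83333")
```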
Column number: 42 (After-C13N-ROGID-B)
Column name: | After_C13N_ROGID_B (changed from After-C13N-ROGID-B (Canonical ROGID B) by Sabry) |
Column type: | String |
Description: | Unique identifier for the canonical group to which interactor B belongs. Column 2 is an integer equivalent to this identifier. |
Example: | crogid:AhmYiMtz8lR12Gixt91txbAd3JY83333 |
Notes
See notes for column 41.
Column number: 43 (entrezGeneIds-A)
Column name: | entrezGeneIds_A (Changed from entrezGeneIds-A by Sabry) |
Column type: | pipe delimited list of integers or a string |
Description: | EntrezGene identifier(possibly a pipe-delimited list) for interactor A |
Example: | 947299 |
Notes
CHANGE
xxx discuss with Sabry
This column contains a pipe-delimited list of integers that are Entrez GeneIds. This list makes up a related gene group (RGG) that was used in the canonicalization procedure. See [Canonicalization] for more details. Briefly, EntrezGene identifiers were grouped together into related gene groups (RGGs) if they shared at least one identical protein product.
If you are looking for the specific Entrez Gene identifier for molecule A in column 1, then refer to column 3.
If no EntrezGene identifier is available for the interactor, then a ROGID will appear in this column (see notes to column 41).
If the interactor is a node representing a complex, then the ROGID for the complex will appear here.
Column number: 44 (entrezGeneIds-B)
Column name: | entrezGeneIds_B (Changed from entrezGeneIds-B by Sabry) |
Column type: | pipe delimited list of integers or a string |
Description: | EntrezGene identifier (possibly a pipe-delimited list) for interactor B |
Example: | 948691 |
Notes
See notes for column 43.
Column number: 45 (MappingScoreA)
Column name: | MappingScoreA |
Column type: | String |
Description: | String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 32) to the final protein reference (column 34). |
Example: | PTUO+ |
Notes
CHANGE
This column contains a description of mapping operations as a condensed string of letters. See the original iRefIndex paper (PMID 18823568). Protein references from the original interaction record (and a description of how they were mapped to the canonical form) can also be found in the corresponding iRefWeb record. See column xxx and search for this interaction record at http://wodaklab.org/iRefWeb/.
Sabry: For complexes this will be a dash (-); including 'NA' might confuse, as it looks like a score.
Column number: 46 (MappingScoreB)
Column name: | MappingScoreB |
Column type: | String |
Description: | String describing operations performed by the iRefIndex procedure during mapping from the original protein reference (column 33) to the final protein reference (column 35). |
Example: | SU |
Notes
See notes for column 45. ---------------------------------------------------------------------------- checked
Column number: 47 (C13N-rigid)
Column name: | C13N_rigid (Changed from C13N-rigid (Canonical RIGID) by Sabry) |
Column type: | string |
Description: | Redundant interaction group identifier |
Example: | crigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA |
Notes
The Canonical RIGID (for redundant interaction group identifier) consists of the canonical (C13N) ROG identifiers for each of the protein participants (see notes above) ordered by ASCII-based lexicographic sorting in ascending order, concatenated and then digested with the SHA-1 algorithm. See the iRefIndex paper for details. This identifier points to a set of redundant protein-protein interactions that involve the same set of proteins with the exact same primary sequences.
Column number: 48 (C13N-rig)
Column name: | C13N_rig (Changed from rig by Sabry) |
Column type: | string |
Description: | Redundant interaction group |
Example: | irefindex:12345 |
Notes
CHANGE
xxx discuss with Sabry
This is an internal, integer equivalent of the C13N-RIGID. See column 47.
This integer may be used to query the iRefWeb interface for the interaction record. For example
http://wodaklab.org/iRefWeb/interaction/show/13653
where 13653 is the C13N-rig.
Starting with release 6.0, this C13N-rig is stable from one release of iRefIndex to another.
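As a small illustration of the lookup described above, the following sketch turns a column 48 value into an iRefWeb URL. The function name is hypothetical, and the handling of the "irefindex:" prefix is an assumption based on the example value given for this column.

```python
IREFWEB_SHOW_URL = "http://wodaklab.org/iRefWeb/interaction/show/"


def irefweb_url(c13n_rig: str) -> str:
    """Build the iRefWeb URL for a C13N-rig value such as 'irefindex:13653'.
    A bare integer string such as '13653' is accepted as well."""
    rig = c13n_rig.strip().split(":", 1)[-1]
    if not rig.isdigit():
        raise ValueError("not an integer C13N-rig: %r" % c13n_rig)
    return IREFWEB_SHOW_URL + rig
```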
Column number: 49 (Before-C13N-rigid)
Column name: | rigid |
Column type: | string |
Description: | Redundant interaction group identifier - before canonicalization (C13N). |
Example: | rigid:3ERiFkUFsm7ZUHIRJTx8ZlHILRA |
Notes
The RIGID (for redundant interaction group identifier) consists of the ROG identifiers for each of the protein participants (see notes above) ordered by ASCII-based lexicographic sorting in ascending order, concatenated and then digested with the SHA-1 algorithm. See the iRefIndex paper for details. This identifier points to a set of redundant protein-protein interactions that involve the same set of proteins with the exact same primary sequences.
The rigid is constructed from ROGs **before** canonicalization. This identifier can be easily and universally constructed by data providers to facilitate data integration and exchange.
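The RIGID construction described above (sort, concatenate, digest) can be sketched as follows. This is not the iRefIndex code. Two details are assumptions: that the ROGIDs are concatenated without a separator, and that the SHA-1 digest is base64-encoded with the "=" padding removed (inferred from the 27-character example value). The function name is hypothetical.

```python
import base64
import hashlib
from typing import Iterable


def rigid(rogids: Iterable[str]) -> str:
    """Sort the participants' ROGIDs, concatenate them and digest with SHA-1."""
    # Python orders str values by code point, which for ASCII-only
    # identifiers is the same as ASCII-based lexicographic order.
    ordered = sorted(rogids)
    digest = hashlib.sha1("".join(ordered).encode("ascii")).digest()
    # Assumed output encoding: base64 without the trailing '=' padding.
    return base64.b64encode(digest).decode("ascii").rstrip("=")
```

Because the ROGIDs are sorted first, the result does not depend on the order in which a data provider lists the participants. The same steps applied to canonical ROGIDs would give the canonical RIGID of column 47.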
Column number: 50 (imex-id)
Column name: | imex_id (changed from imex-id by Sabry) |
Column type: | string |
Description: | IMEx identifier if available |
Example: | imex:IM-12202-3 |
Example: | When no information is available, a dash ( - ) will be used |
Notes
Column number: 51 (edgetype)
Column name: | edgetype |
Column type: | Character |
Description: | Does the edge represent a binary interaction (X), member of complex (C) data, or a multimer (Y)? |
Example: | X |
Notes
CHANGE xxx discuss with Sabry
Edges can be labelled as either X, C or Y:
- X
- a binary interaction with two protein participants
- C
- denotes that this edge is a binary expansion of an interaction record that had 3 or more interactors (so-called "complex" or "n-ary" data). The expansion type is described in column 16 (expansion). In the case of iRefIndex, the expansion is always "bipartite", meaning that Interactor A (column 1) of this row represents the collection of interactors and Interactor B (column 2) represents a protein that is a member of this group.
See Understanding_the_iRefIndex_MITAB_format for further explanation.
- Y
- for dimers and polymers. In the case of dimers and polymers where the number of subunits is not described in the original interaction record, the edge is labelled with a Y. Interactor A (column 1) will be identical to Interactor B (column 2). The graphical representation of this will appear as a single node connected to itself (a loop). The actual number of self-interacting subunits may be 2 (dimer) or more (say 5 for a pentamer). Refer to the original interaction record for more details and see column "numParticipants".
Column number: 52 (numParticipants)
Column name: | numParticipants |
Column type: | Integer |
Description: | Number of participants in the interaction |
Example: | 2 |
Notes
CHANGE xxx discuss with Sabry
- For edges labelled X (see column 51) this value will be two.
- For edges labelled C, this value will be equivalent to the number of protein interactors in the original n-ary interaction record.
- For interactions labelled Y, this value will either be the number of self-interacting subunits (if present in the original interaction record) or 1 where the exact number of subunits is unknown or unspecified.
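The rules for edgetype and numParticipants can be read together as a consistency check on each row. The sketch below is one possible reading, not part of the iRefIndex pipeline; the function name and the `same_interactor` parameter (true when Interactor A and Interactor B are identical) are hypothetical.

```python
from typing import List


def check_edge(edgetype: str, num_participants: int, same_interactor: bool) -> List[str]:
    """Return a list of problems found; an empty list means the row is consistent."""
    problems = []
    if edgetype == "X":
        # Binary interaction: exactly two participants.
        if num_participants != 2:
            problems.append("X edge should have numParticipants == 2")
    elif edgetype == "C":
        # Expansion of an n-ary record with 3 or more interactors.
        if num_participants < 3:
            problems.append("C edge should have numParticipants >= 3")
    elif edgetype == "Y":
        # Dimer or polymer: known subunit count, or 1 when unknown.
        if num_participants < 1:
            problems.append("Y edge should have numParticipants >= 1")
        if not same_interactor:
            problems.append("Y edge should have identical interactors A and B")
    else:
        problems.append("unknown edgetype: %r" % edgetype)
    return problems
```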
Column number: 53 (interaction_name)
Column name: | interaction_name |
Column type: | String |
Description: | The name of the interaction. |
Example: | MTA1-HDAC core complex |
Notes
CHANGE xxx discuss with Sabry
- When available, a name was selected from the original interaction data provided.
- When no interaction name was available, a name was constructed using the names of the interactors (e.g. Interaction involving HCK_HUMAN and RASA1_HUMAN).