Statistics iRefIndex 16.0

Revision as of 18:25, 4 October 2019 by Admin

Interactions available from major taxonomies (corrected)

The taxa of the protein interactors have been corrected to match the taxon given in the protein sequence record, regardless of the taxon listed in the interaction record. See PMID 18823568 for details.

NCBI taxonomy identifier Scientific name Number of interactions
9606 Homo sapiens 651005
559292 Saccharomyces cerevisiae S288C 121184
7227 Drosophila melanogaster 74579
10090 Mus musculus 68849
3702 Arabidopsis thaliana 57809
40674 Mammalia 36350
4932 Saccharomyces cerevisiae 34840
83333 Escherichia coli K-12 16956
6239 Caenorhabditis elegans 16947
10116 Rattus norvegicus 14250
316407 Escherichia coli str. K-12 substr. W3110 12816
192222 Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819 11930
284812 Schizosaccharomyces pombe 972h- 11130
381518 Influenza A virus (A/Wilson-Smith/1933(H1N1)) 5045


Summary of mapping interaction records to RIGs (redundant interaction groups)

Source: Interaction data source. Total records: Total number of interaction records found in the source. Protein-related interactions: Total number of interactions involving only protein interactors. PPI assigned to RIGID: Number of interactions where all protein interactors were assigned to a ROG, also shown as a percentage of column 3. Unique RIGIDs: Number of unique protein interactions and complexes (RIGIDs) found in the data source, also shown as a percentage of column 4. For a description of the term RIG, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper, PMID 18823568.

Source Total records Protein-related interactions PPI assigned to RIGID % Unique RIGIDs %
BHF_UCL 2341 2328 2328 100.00 1515 65.08
BIND 157736 153063 73206 47.83 54161 73.98
BIND_TRANSLATION 192923 84138 82228 97.73 60872 74.03
BIOGRID 1653530 778945 775480 99.56 568254 73.28
CORUM 4274 4274 4270 99.91 4018 94.10
DIP 81731 80134 79878 99.68 77472 96.99
HPIDB 3007 2840 2840 100.00 1558 54.86
HPRD 83022 83022 82983 99.95 40542 48.86
INNATEDB 18408 18300 17807 97.31 12728 71.48
INTACT 571739 520992 520864 99.98 329941 63.34
INTCOMPLEX 2536 2026 2026 100.00 1995 98.47
MATRIXDB 36945 36867 36867 100.00 22374 60.69
MBINFO 542 522 522 100.00 331 63.41
MINT 81305 80746 80731 99.98 44969 55.70
MPACT 16504 16504 16373 99.21 13398 81.83
MPIDB 1505 1504 1425 94.75 893 62.67
MPPI 1814 1758 1578 89.76 776 49.18
QUICKGO 71979 58723 56583 96.36 28741 50.79
REACTOME 141996 141996 141844 99.89 130128 91.74
UNIPROTPP 11118 11033 11033 100.00 6239 56.55
VIRUSHOST 34760 34760 34760 100.00 30178 86.82
(All) 3169715 2114475 2025626 95.80 1079693 53.30
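The two percentage columns can be recomputed from the counts in each row. The following is a minimal sketch, assuming the column order described above; the helper name `percent` is hypothetical, and the numbers are copied from the BIND row.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts copied from the BIND row of the table above.
protein_related = 153063    # column 3: protein-related interactions
assigned_to_rigid = 73206   # column 4: PPI assigned to RIGID
unique_rigids = 54161       # column 6: unique RIGIDs

# First % column: assigned interactions relative to column 3.
assigned_pct = percent(assigned_to_rigid, protein_related)
# Second % column: unique RIGIDs relative to column 4.
unique_pct = percent(unique_rigids, assigned_to_rigid)

print(f"{assigned_pct:.2f} {unique_pct:.2f}")
```

For the BIND row this reproduces the 47.83 and 73.98 listed in the table.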


Assignment of protein interactors to ROGs (redundant object groups)

Source: Interaction data source (see methods). Protein interactors: Total number of interactors found in all interaction records. Assigned: Number of proteins assigned unambiguously to a ROG. Assignments listed in columns 5 and 6 are not included here. %: Column 3 expressed as a percentage of column 2. Arbitrary: Total number of ROG assignments that were ambiguous and were resolved with an arbitrary method (see ROG scores with 'L'). Matching sequence: Total number of assignments made where a sequence in the interaction record matched a known sequence. Unassigned: Total number of protein interactors that could not be assigned to a ROG. Unique proteins: Total number of unique proteins (ROGs). For a description of the term ROG, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper, PMID 18823568.

Source Protein interactors Assigned % Arbitrary Matching sequence New or obsolete sequence Unassigned Unique proteins
BHF_UCL 6186 6186 100.00 0 0 0 0 1793
BIND 322722 226129 70.07 55 0 5783 96593 30482
BIND_TRANSLATION 257681 254875 98.91 20550 0 9727 2806 36908
BIOGRID 64812 63774 98.40 3282 0 6177 1038 63507
CORUM 17317 17313 99.98 2 0 3 4 6125
DIP 28066 27916 99.47 648 0 1344 150 27170
HPIDB 7703 7703 100.00 0 0 1 0 2472
HPRD 123812 123812 100.00 14795 97009 116 0 9842
INNATEDB 42511 41922 98.61 0 0 9 589 6282
INTACT 430046 429865 99.96 204 57 433 181 90591
INTCOMPLEX 8856 8856 100.00 0 0 1 0 4779
MATRIXDB 180685 180685 100.00 0 0 5 0 21578
MBINFO 1136 1136 100.00 0 0 0 0 274
MINT 211273 211256 99.99 138 0 8 17 26287
MPACT 40349 40199 99.63 0 0 0 150 4995
MPIDB 3238 3090 95.43 0 0 1 148 930
MPPI 3568 3361 94.20 16 0 0 207 833
QUICKGO 130696 128509 98.33 0 0 1 2187 25779
REACTOME 283992 283839 99.95 704 0 0 153 5938
UNIPROTPP 29349 29349 100.00 1 0 0 0 6832
VIRUSHOST 69520 69520 100.00 0 0 4 0 8841
(All) 2263518 2159295 95.40 40395 97066 23613 104223 144942
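In the rows above, the Assigned and Unassigned counts appear to add up to the Protein interactors total. That relationship is an observation from the table rather than a documented rule, so the sketch below treats it as an assumption and checks it for a few rows. The dictionary `rows` and both function names are hypothetical.

```python
# (protein interactors, assigned, unassigned), copied from the table above.
rows = {
    "BIND": (322722, 226129, 96593),
    "BIND_TRANSLATION": (257681, 254875, 2806),
    "MPPI": (3568, 3361, 207),
    "(All)": (2263518, 2159295, 104223),
}


def is_consistent(total: int, assigned: int, unassigned: int) -> bool:
    """Check the assumed identity: assigned + unassigned == total."""
    return assigned + unassigned == total


def assigned_percent(total: int, assigned: int) -> float:
    """Column 3 expressed as a percentage of column 2, two decimals."""
    return round(100.0 * assigned / total, 2)


for source, (total, assigned, unassigned) in rows.items():
    ok = is_consistent(total, assigned, unassigned)
    pct = assigned_percent(total, assigned)
    print(f"{source}: {pct:.2f}% assigned, consistent={ok}")
```

A row that fails the check would suggest a transcription error in one of its three counts.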

Mapping score summary

See below for definitions of the mapping score codes.

BHF_UCL BIND BIND_TRANSLATION BIOGRID CORUM DIP HPIDB HPRD INNATEDB INTACT INTCOMPLEX MATRIXDB MBINFO MINT MPACT MPIDB MPPI QUICKGO REACTOME UNIPROTPP VIRUSHOST
P 6177 180574 42048 17276 7698 41911 428555 8842 180670 1136 210702 3075 125745 270448 29306 69501
P+IN 385
P+N 8
PD 129267 7098 5 2994
PD+IN 1
PD+L 38
PD+LQ 10169
PD+LYQ 1
PD+XQ 26
PDQ 31409
PDY 4828
PDYQ 5
PE 541
PGD 643 1778 3
PGD+L 6272 3265 6
PGD+X 2
PT 8659 2794 19 2 30579 2 2763 14
PTD 90937 2 2 44
PTD+LQ 4041
PTD+LYQ 3
PTDQ 2634
PTDY 955
PTDYQ 2
PTGD 14
PTGD+L 23 2
PTM 3
PTY 1 3 1
PU 9 27 13 4 2 576 13 10 395 6 12687 42 1
PU+L 17 2 157 129 704
PU+O 43
PU+X 610 1
PUD 13 9 145
PUD+L 7 9 13
PUD+X 60 162
PUT 4 11 4 2527 6
PUT+L 24 41 9 1
PUT+O 14
PUTD 14
PUTD+L 10 3
PV 9 9
PY 9715 6170 1 9 26 1 5 8 1 4
S 2 43 13024 86 3
S+L 4 222 554
S+LE 1
S+LY 1 59
S+N 4
S+O 304
S+X 214
S+XY 216
SD 5403 2541 1
SD+L 247 278
SD+N 116
SD+O 11612
SD+X 1267
SDY 3
SE 1
SGD 570
SGD+L 2822
SGD+O 15232
ST 4697 110 7093
ST+L 26 3455
ST+LY 4
ST+O 855
STD 728 6950
STD+L 9 569
STD+O 30000
STGD 1635
STGD+L 7077
STGD+O 38840
STY 25 1
SUD 65
SUD+L 49 25
SUD+O 7
SUD+X 566
SUTD 23
SUTD+L 32 15
SUTD+O 159
SY 6 1037 8


Mapping score code definitions

Character Description of feature (when the value is 1)
D The source database (D) listed in the interaction record is different than what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.
E The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
L More than one assignment is possible (see the + entry), e.g. isoforms for a gene ID. In such a situation, references are picked using a ranking system (first look for RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see their entries in this table).
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
I The protein reference used was an NCBI GenInfo Identifier (I).
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
P The interaction record's primary (P) reference for the protein was used to make the assignment.
S One of the interaction record's secondary (S) references for the protein was used to make the assignment.
Y The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
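A mapping score such as PTD+LYQ is a string of the single-character codes defined above, so it can be read one character at a time. The sketch below is illustrative only: the dictionary `FEATURES` holds shortened paraphrases of the definitions in this table, and `decode_score` is a hypothetical helper, not part of any iRefIndex tooling.

```python
# Shortened paraphrases of the code definitions in the table above.
FEATURES = {
    "P": "primary reference used",
    "S": "secondary reference used",
    "D": "source database differs from the expected one",
    "E": "retired identifier resolved through eUtils",
    "G": "EntrezGene identifier used",
    "I": "NCBI GenInfo Identifier used",
    "M": "typographical modification of a known accession",
    "N": "new SEGUID entry created",
    "Q": "reference of type 'see-also' used",
    "T": "taxonomy identifier differed from the sequence record",
    "U": "secondary UniProt accession updated to primary",
    "V": "version information ignored",
    "Y": "accession removed from RefSeq or UniProt",
    "+": "more than one assignment possible",
    "L": "ambiguity resolved by ranking and sequence length",
    "O": "ambiguity resolved by matching the original sequence SEGUID",
    "X": "ambiguity resolved by matching the taxonomy identifier",
}


def decode_score(score: str) -> list:
    """Return one description per character of a mapping score string."""
    unknown = [ch for ch in score if ch not in FEATURES]
    if unknown:
        raise ValueError(f"unknown score characters: {unknown}")
    return [FEATURES[ch] for ch in score]


for feature in decode_score("PTD+LYQ"):
    print(feature)
```

Raising on an unknown character, rather than skipping it, makes it obvious when a score string contains a code that is not in this table.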