Statistics iRefIndex 17.0

Revision as of 20:16, 30 June 2020 by Admin

Interactions available from major taxonomies (corrected)

The taxa of the protein interactors have been corrected to match the taxon given in the protein sequence record, regardless of the taxon listed in the interaction record. See PMID 18823568 for details.

NCBI taxonomy identifier Scientific name Number of interactions
9606 Homo sapiens 732100
559292 Saccharomyces cerevisiae S288C 139956
7227 Drosophila melanogaster 76585
10090 Mus musculus 70996
3702 Arabidopsis thaliana 59726
6239 Caenorhabditis elegans 32382
83333 Escherichia coli K-12 16968
10116 Rattus norvegicus 14516
316407 Escherichia coli str. K-12 substr. W3110 12822
4896 Schizosaccharomyces pombe 12432
192222 Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819 11930
632 Yersinia pestis 4166
243276 Treponema pallidum subsp. pallidum str. Nichols 3643
1111708 Synechocystis sp. PCC 6803 substr. Kazusa 3275
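
The taxon correction described above the table can be illustrated with a minimal sketch. The function and field names here are hypothetical and are not taken from the iRefIndex code base; the sketch only assumes that the taxon from the protein sequence record always wins and that a mismatch is worth flagging (compare the T feature in the mapping score code definitions).

```python
from typing import NamedTuple, Optional


class TaxonResolution(NamedTuple):
    taxon_id: int       # taxon reported for the interactor after correction
    discrepancy: bool   # True if the interaction record listed a different taxon


def corrected_taxon(record_taxon: Optional[int], sequence_taxon: int) -> TaxonResolution:
    """Prefer the taxon of the protein sequence record (hypothetical helper)."""
    discrepancy = record_taxon is not None and record_taxon != sequence_taxon
    return TaxonResolution(sequence_taxon, discrepancy)


# Illustrative input: the interaction record lists 4932 (Saccharomyces cerevisiae)
# while the sequence record lists the strain-level taxon 559292.
print(corrected_taxon(4932, 559292))
```

Under this reading, an interaction counted for 559292 in the table may have been submitted under a different, typically broader, taxon identifier.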

Summary of mapping interaction records to RIGs (redundant interaction groups)

Source: Interaction data source. Total records: Total number of interaction records found in the source. Protein-related interactions: Total number of interactions involving only protein interactors. PPI assigned to RIGID: Number of interactions in which all protein interactors were assigned to a ROG, also expressed as a percentage of column 3. Unique RIGIDs: Number of unique protein interactions and complexes (RIGIDs) found in the data source, also expressed as a percentage of column 4. For a description of the term RIG, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper, PMID 18823568.

Source Total records Protein-related interactions PPI assigned to RIGID % Unique RIGIDs %
BAR 10396 10396 10383 99.87 10369 99.87
BHF_UCL 2341 2327 2327 100.00 1514 65.06
BIND 157736 91309 68064 74.54 49513 72.74
BIND_TRANSLATION 192923 84138 82233 97.74 60855 74.00
BIOGRID 1760395 873924 869987 99.55 646657 74.33
CORUM 4274 4274 4270 99.91 4018 94.10
DIP 81731 80134 79878 99.68 77468 96.98
HPIDB 6038 5769 5769 100.00 2432 42.16
HPRD 83022 83022 82983 99.95 40530 48.84
HURI 171545 168756 168750 100.00 51482 30.51
INNATEDB 18408 18408 6903 37.50 4815 69.75
INTACT 651130 597534 597404 99.98 345369 57.81
INTCOMPLEX 2821 2261 2261 100.00 2226 98.45
MATRIXDB 37217 36866 36866 100.00 22361 60.65
MBINFO 1084 1057 1057 100.00 539 50.99
MINT 165946 165136 165118 99.99 61283 37.11
MPACT 16504 16504 16373 99.21 13398 81.83
MPIDB 1505 1504 1425 94.75 893 62.67
MPPI 1814 1758 1578 89.76 776 49.18
QUICKGO 75574 60741 58630 96.52 29763 50.76
REACTOME 141996 141996 141844 99.89 126328 89.06
SPIKE 29686 29686 28327 95.42 27828 98.24
UNIPROTPP 12863 12775 12775 100.00 7233 56.62
VIRUSHOST 15000 15000 15000 100.00 9397 62.65
(All) 3641949 2505275 2460205 98.20 1185907 48.20
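
The two percentage columns can be recomputed from the counts in each row. The following is a minimal sketch, assuming the first percentage is column 4 over column 3 and the second is column 5 over column 4, both rounded to two decimals; the function name is hypothetical and the sample rows are copied from the table above.

```python
def rig_percentages(protein_related: int, assigned: int, unique_rigids: int) -> tuple:
    """Return (% of protein-related interactions assigned to a RIGID,
    % of assigned interactions that are unique RIGIDs)."""
    pct_assigned = round(100 * assigned / protein_related, 2)
    pct_unique = round(100 * unique_rigids / assigned, 2)
    return pct_assigned, pct_unique


# (protein-related interactions, PPI assigned to RIGID, unique RIGIDs)
rows = {
    "BIND": (91309, 68064, 49513),
    "INNATEDB": (18408, 6903, 4815),
    "(All)": (2505275, 2460205, 1185907),
}

for source, counts in rows.items():
    print(source, *rig_percentages(*counts))
```

A low second percentage (for example HURI or MINT) indicates that many records in that source collapse onto the same RIGID.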

Assignment of protein interactors to ROGs (redundant object group)

Source: Interaction data source (see methods). Protein interactors: Total number of interactors found in all interaction records. Assigned: Number of proteins assigned unambiguously to a ROG. Assignments listed in columns 5 and 6 are not included here. %: Column 3 expressed as a percentage of column 2. Arbitrary: Total number of ROG assignments that were ambiguous and were resolved with an arbitrary method (see ROG scores with 'L'). Matching sequence: Total number of assignments made where a sequence in the interaction record matched a known sequence. Unassigned: Total number of protein interactors that could not be assigned to a ROG. Unique proteins: Total number of unique proteins (ROGs). For a description of the term ROG, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper, PMID 18823568.

Source Protein interactors Assigned % Arbitrary Matching sequence New or obsolete sequence Unassigned Unique proteins
BAR 20792 20779 99.94 0 0 0 13 3267
BHF_UCL 6185 6185 100.00 0 0 0 0 1792
BIND 252251 215207 85.31 17 0 6491 37044 30237
BIND_TRANSLATION 257681 254882 98.91 20507 0 10469 2799 36881
BIOGRID 70364 69209 98.36 3194 0 6748 1155 68896
CORUM 17317 17313 99.98 2 0 7 4 6125
DIP 28066 27916 99.47 643 0 1398 150 27166
HPIDB 12529 12529 100.00 0 0 0 0 2481
HPRD 123812 123812 100.00 16325 87744 169 0 9837
HURI 340295 340289 100.00 33 0 438 6 8181
INNATEDB 42658 25503 59.78 0 0 0 17155 3742
INTACT 541927 541743 99.97 202 60 480 184 95845
INTCOMPLEX 10287 10287 100.00 0 0 2 0 5753
MATRIXDB 180936 180936 100.00 0 0 27 0 21540
MBINFO 1746 1746 100.00 0 0 0 0 274
MINT 480615 480588 99.99 312 0 31 27 26859
MPACT 40349 40199 99.63 0 0 0 150 4995
MPIDB 3238 3090 95.43 0 0 1 148 930
MPPI 3568 3361 94.20 16 0 0 207 833
QUICKGO 136308 134148 98.42 0 0 0 2160 26894
REACTOME 283992 283839 99.95 704 0 0 153 5860
SPIKE 65934 64565 97.92 889 0 17 1369 8809
UNIPROTPP 35584 35584 100.00 1 0 0 0 8132
VIRUSHOST 30000 30000 100.00 0 0 378 0 3882
(All) 2986434 2923710 97.90 42845 87804 26656 62724 153527
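
A quick consistency check on this table can be sketched as follows. In the rows sampled here, Assigned plus Unassigned adds up to Protein interactors, and the % column is Assigned over Protein interactors; the function name is hypothetical and the check is an observation about the published numbers, not a documented invariant.

```python
def check_row(protein_interactors: int, assigned: int, unassigned: int) -> tuple:
    """Return (whether assigned + unassigned equals the total, % assigned)."""
    consistent = assigned + unassigned == protein_interactors
    pct_assigned = round(100 * assigned / protein_interactors, 2)
    return consistent, pct_assigned


# (protein interactors, assigned, unassigned), copied from the table above
rows = {
    "BIND": (252251, 215207, 37044),
    "HPRD": (123812, 123812, 0),
    "INNATEDB": (42658, 25503, 17155),
    "(All)": (2986434, 2923710, 62724),
}

for source, counts in rows.items():
    print(source, *check_row(*counts))
```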

Mapping score summary

See below for definitions of the mapping score codes.

BAR BHF_UCL BIND BIND_TRANSLATION BIOGRID CORUM DIP HPIDB HPRD HURI INNATEDB INTACT INTCOMPLEX MATRIXDB MBINFO MINT MPACT MPIDB MPPI QUICKGO REACTOME SPIKE UNIPROTPP VIRUSHOST
P 20640 6185 180023 45936 17272 12521 339113 25501 540427 10270 180764 1746 479471 3075 131370 266602 54981 35563 29622
P+IN 385
P+N 28
PD 127837 7245 3 2994
PD+LQ 10123
PD+LYQ 43
PD+XQ 26
PDQ 31341
PDY 5447 1
PDYQ 16
PE 1114
PGD 675 1844 1 397
PGD+L 6237 3164 3 876
PGD+X 1
PT 8450 3076 19 1 30579 2 2778
PTD 80723 2 2 44
PTD+LQ 4036
PTD+LYQ 8
PTDQ 2696
PTDY 1044
PTDYQ 6
PTGD 18 1
PTGD+L 19 2
PTM 3
PTY 1 1 3 378
PU 139 118 13 8 705 2 539 15 145 739 6 16533 8281 20
PU+L 17 2 33 157 293 704 13
PU+O 46
PU+X 604 1
PUD 81 9 145
PUD+L 7 9 13
PUD+X 54 162
PUT 4 12 8 2527 6
PUT+L 24 42 19 1
PUT+O 14
PUTD 4
PUTD+L 10 3
PV 9 27
PY 10395 6729 4 438 54 2 27 31 1 17
S 2 45 12982 147 3
S+IN 1
S+L 10 214 732
S+LE 1
S+LY 8 63
S+N 4
S+O 243
S+X 215
S+XY 216
SD 5384 4286
SD+L 245 607
SD+N 133
SD+O 10167
SD+X 1268
SDY 10
SE 2
SGD 936
SGD+L 2215
SGD+O 14365
ST 4699 136 7093
ST+L 26 4614
ST+LY 4 36
ST+O 829
STD 733 11603
STD+L 9 1149
STD+O 26380
STGD 2502
STGD+L 6922
STGD+O 35594
STY 25
SUD 70
SUD+L 50 27
SUD+O 7
SUD+X 568
SUTD 23
SUTD+L 32 23
SUTD+O 159
SY 9 1080 8

Mapping score code definitions

Character Description of feature (when the value is 1)
D The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.
E The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
L More than one assignment is possible (see + below), e.g. isoforms for a gene ID. In this situation, references are picked using a ranking system (RefSeq first, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by the ROG score features O, X or L (see the entries for those characters).
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
I The protein reference used was an NCBI GenInfo Identifier (I).
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
P The interaction record's primary (P) reference for the protein was used to make the assignment.
S One of the interaction record's secondary (S) references for the protein was used to make the assignment.
Y The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
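
A mapping score such as PTD+LQ is a string of the feature characters defined above. The following is a minimal sketch of a decoder, assuming that every character in a score is one of those features; the short descriptions paraphrase the definitions, and the function names are hypothetical rather than part of any iRefIndex tool.

```python
# Short paraphrases of the feature definitions in the table above.
FEATURES = {
    "P": "primary reference used",
    "S": "secondary reference used",
    "U": "secondary UniProt accession updated to a primary accession",
    "V": "version information ignored",
    "T": "taxonomy identifier differed from the sequence record (tolerated)",
    "G": "EntrezGene identifier used",
    "D": "source database differed from the expected one (tolerated)",
    "M": "typographical modification of a known accession",
    "I": "NCBI GenInfo Identifier used",
    "E": "eUtils used to retrieve the current accession or sequence",
    "Y": "accession removed from RefSeq or UniProt",
    "N": "new SEGUID entry created",
    "Q": "reference of type 'see-also'",
    "+": "more than one assignment possible",
    "O": "ambiguity resolved by SEGUID of the original sequence",
    "X": "ambiguity resolved by taxonomy identifier",
    "L": "ambiguity resolved by ranking, then longest sequence",
}


def decode_score(code: str) -> list:
    """Return the description of each feature character in a mapping score."""
    unknown = [ch for ch in code if ch not in FEATURES]
    if unknown:
        raise ValueError(f"unknown feature character(s): {unknown}")
    return [FEATURES[ch] for ch in code]


def is_ambiguous(code: str) -> bool:
    """True if the score records more than one possible assignment (+)."""
    return "+" in code


for line in decode_score("PTD+LQ"):
    print(line)
```

Scores containing + are the ones counted as ambiguous; the characters after the + indicate how the ambiguity was resolved, if at all.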