Statistics iRefIndex 19.0

From irefindex
Revision as of 06:49, 30 October 2022 by Admin.

Interactions available from major taxonomies (corrected)

The taxa of protein interactors have been corrected to match the taxon given in the protein sequence record, regardless of the taxon listed in the interaction record. See PMID 18823568 for details.

NCBI taxonomy identifier | Scientific name | Number of interactions
9606 | Homo sapiens | 1096914
559292 | Saccharomyces cerevisiae S288C | 155935
10090 | Mus musculus | 104631
7227 | Drosophila melanogaster | 86357
3702 | Arabidopsis thaliana | 87766
6239 | Caenorhabditis elegans | 41801
10116 | Rattus norvegicus | 15952
316407 | Escherichia coli str. K-12 substr. W3110 | 12822
4896 | Schizosaccharomyces pombe | 13816
192222 | Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819 | 11931
83333 | Escherichia coli K-12 | 16903
381518 | Influenza A virus (A/Wilson-Smith/1933(H1N1)) | 5104
632 | Yersinia pestis | 4147
2697049 | Severe acute respiratory syndrome coronavirus | 22047

Summary of mapping interaction records to RIGs (redundant interaction groups)

Source: Interaction data source.
Total records: Total number of interaction records found in the source.
Protein-related interactions: Total number of interactions involving only protein interactors.
PPI assigned to RIGID: Number of interactions in which all protein interactors were assigned to a ROG, also shown as a percentage of column 3.
Unique RIGIDs (interactions): Number of unique protein interactions and complexes (RIGIDs) found in the data source, also shown as a percentage of column 4.
For a description of the term RIGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper PMID 18823568.

Source | Total records | Protein-related interactions | PPI assigned to RIGID | % (of col. 3) | Unique RIGIDs | % (of col. 4)
BAR | 23432 | 23432 | 23420 | 99.95 | 23414 | 99.97
BHF_UCL | 2341 | 2326 | 2326 | 100.00 | 1513 | 65.05
BIND | 157736 | 153063 | 73206 | 47.83 | 54156 | 73.98
BIND_TRANSLATION | 192923 | 84138 | 74093 | 88.06 | 56577 | 76.36
BIOGRID | 1855067 | 967393 | 963176 | 99.56 | 727908 | 75.57
CORUM | 4274 | 4274 | 4270 | 99.91 | 4018 | 94.10
DIP | 76796 | 0 | 0 | n/a | 0 | n/a
HPIDB | 3019 | 2845 | 2845 | 100.00 | 1560 | 54.83
HPRD | 83022 | 83022 | 83022 | 100.00 | 40548 | 48.84
HURI | 171545 | 168756 | 168750 | 100.00 | 51482 | 30.51
I2D_IMEX | 596071 | 575676 | 575479 | 99.97 | 292694 | 50.86
INNATEDB | 18408 | 18408 | 6895 | 37.46 | 4809 | 69.75
INTACT | 731876 | 671221 | 671089 | 99.98 | 372746 | 55.54
INTCOMPLEX | 3287 | 2658 | 2658 | 100.00 | 2611 | 98.23
MATRIXDB | 1776 | 1775 | 1775 | 100.00 | 1419 | 79.94
MBINFO | 542 | 522 | 522 | 100.00 | 331 | 63.41
MINT | 85784 | 85180 | 85165 | 99.98 | 48860 | 57.37
MPACT | 16504 | 16504 | 16373 | 99.21 | 13398 | 81.83
MPIDB | 1497 | 1461 | 1461 | 100.00 | 923 | 63.18
MPPI | 1814 | 1758 | 1578 | 89.76 | 776 | 49.18
QUICKGO | 58761 | 50321 | 49835 | 99.03 | 23889 | 47.94
REACTOME | 141996 | 141996 | 141844 | 99.89 | 126328 | 89.06
SPIKE | 29686 | 29686 | 28329 | 95.43 | 27830 | 98.24
UNIPROTPP | 15042 | 14950 | 14950 | 100.00 | 8692 | 58.14
VIRUSHOST | 40005 | 40005 | 40004 | 100.00 | 35076 | 87.68
(All) | 4313204 | 3141370 | 3033065 | 96.55 | 1269053 | 41.84
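The two percentage columns follow directly from the column definitions. The minimal Python sketch below recomputes them for a few rows copied from the table as literals. A zero denominator, as in the DIP row, has no defined percentage, so the sketch returns "n/a" in that case. The `unique_rig_count` helper is hypothetical. It approximates "Unique RIGIDs" by grouping fully assigned records on the sorted tuple of their interactors' ROG identifiers. This grouping key is an illustrative assumption and is not the RIGID calculation that iRefIndex performs.

```python
from typing import Optional, Sequence


def percent(part: int, whole: int) -> str:
    """Format part/whole as a percentage with two decimals; "n/a" when whole is 0."""
    if whole == 0:
        return "n/a"
    return f"{100 * part / whole:.2f}"


def unique_rig_count(records: Sequence[Sequence[Optional[str]]]) -> tuple[int, int]:
    """Hypothetical illustration of counting redundant interaction groups.

    Each record lists the ROG identifier of every interactor, or None for an
    interactor that could not be assigned. Returns a pair:
    (records whose interactors were all assigned, distinct interactor sets among them).
    """
    assigned = [rec for rec in records if all(rog is not None for rog in rec)]
    # Sorting makes the key independent of interactor order; a tuple (not a set)
    # keeps a homodimer ("A", "A") distinct from a single-protein record ("A",).
    keys = {tuple(sorted(rec)) for rec in assigned}
    return len(assigned), len(keys)


# (protein-related interactions, PPI assigned to RIGID, unique RIGIDs), copied from the table
rows = {
    "BIND": (153063, 73206, 54156),
    "HURI": (168756, 168750, 51482),
    "DIP": (0, 0, 0),
    "(All)": (3141370, 3033065, 1269053),
}

for source, (protein_related, assigned, unique) in rows.items():
    # Column 5 is a percentage of column 3; column 7 is a percentage of column 4.
    print(source, percent(assigned, protein_related), percent(unique, assigned))

print(unique_rig_count([["P2", "P1"], ["P1", "P2"], ["P1", None], ["P1", "P1"]]))
```

For these rows the printed values match the table. For example, BIND gives `47.83 73.98`. HURI gives `100.00 30.51`, because 99.996 rounds up to 100.00. The toy call prints `(3, 2)`.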

Assignment of protein interactors to ROGs (redundant object groups)

Source: Interaction data source (see methods).
Protein interactors: Total number of interactors found in all interaction records.
Assigned: Number of proteins assigned unambiguously to a ROG. Assignments listed in columns 5 and 6 are not included here.
%: Column 3 expressed as a percentage of column 2.
Arbitrary: Total number of ROG assignments that were ambiguous and resolved with an arbitrary method (see mapping scores containing 'L').
Matching sequence: Total number of assignments made where a sequence in the interaction record matched a known sequence.
Unassigned: Total number of protein interactors that could not be assigned to a ROG.
Unique proteins: Total number of unique proteins (ROGs).
For a description of the term ROGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper PMID 18823568.

Source | Protein interactors | Assigned | % | Arbitrary | Matching sequence | New or obsolete sequence | Unassigned | Unique proteins
BAR | 46864 | 46852 | 99.97 | 1 | 0 | 0 | 12 | 4649
BHF_UCL | 6184 | 6184 | 100.00 | 0 | 0 | 0 | 0 | 1791
BIND | 322722 | 226129 | 70.07 | 17 | 0 | 41 | 96593 | 30475
BIND_TRANSLATION | 257681 | 235695 | 91.47 | 20505 | 0 | 56 | 21986 | 33559
BIOGRID | 73125 | 71929 | 98.36 | 3158 | 0 | 1 | 1196 | 71603
CORUM | 17317 | 17313 | 99.98 | 2 | 0 | 7 | 4 | 6124
HPIDB | 7724 | 7724 | 100.00 | 0 | 0 | 0 | 0 | 2481
HPRD | 123812 | 123812 | 100.00 | 16529 | 87317 | 162 | 0 | 9838
HURI | 340295 | 340289 | 100.00 | 39 | 0 | 941 | 6 | 8181
I2D_IMEX | 1496885 | 1496674 | 99.99 | 445 | 0 | 8 | 211 | 86702
INNATEDB | 42658 | 25490 | 59.75 | 0 | 0 | 2 | 17168 | 3739
INTACT | 601576 | 601387 | 99.97 | 170 | 62 | 512 | 189 | 99546
INTCOMPLEX | 11807 | 11807 | 100.00 | 0 | 0 | 0 | 0 | 6645
MATRIXDB | 66584 | 66584 | 100.00 | 0 | 0 | 22 | 0 | 14891
MBINFO | 1136 | 1136 | 100.00 | 0 | 0 | 0 | 0 | 274
MINT | 229166 | 229149 | 99.99 | 141 | 0 | 0 | 17 | 28458
MPACT | 40349 | 40199 | 99.63 | 0 | 0 | 0 | 150 | 4995
MPIDB | 3157 | 3157 | 100.00 | 0 | 0 | 0 | 0 | 967
MPPI | 3568 | 3361 | 94.20 | 16 | 0 | 0 | 207 | 833
QUICKGO | 109082 | 108565 | 99.53 | 0 | 0 | 0 | 517 | 20082
REACTOME | 283992 | 283839 | 99.95 | 704 | 0 | 0 | 153 | 5860
SPIKE | 65934 | 64567 | 97.93 | 888 | 0 | 19 | 1367 | 8810
UNIPROTPP | 42085 | 42085 | 100.00 | 1 | 0 | 0 | 0 | 9468
VIRUSHOST | 80010 | 80009 | 100.00 | 0 | 0 | 0 | 1 | 9892
(All) | 4273713 | 4133936 | 96.73 | 42616 | 87379 | 1771 | 139777 | 153059
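The % column is a simple ratio. "Unique proteins" counts distinct ROGs, not interactor occurrences. The minimal Python sketch below assumes a hypothetical input that gives one ROG identifier (or None) for each interactor occurrence. It does not model the Arbitrary, Matching sequence, or New or obsolete sequence columns.

```python
from typing import Optional, Sequence


def assignment_summary(rog_ids: Sequence[Optional[str]]) -> dict[str, object]:
    """Summarise ROG assignment for a list of protein interactor occurrences.

    Each element is the ROG identifier assigned to one interactor occurrence,
    or None if that interactor could not be assigned.
    """
    total = len(rog_ids)
    assigned = [rog for rog in rog_ids if rog is not None]
    return {
        "protein_interactors": total,
        "assigned": len(assigned),
        # Assigned as a percentage of all protein interactors.
        "percent": f"{100 * len(assigned) / total:.2f}" if total else "n/a",
        "unassigned": total - len(assigned),
        # A protein appearing in many records still counts once here.
        "unique_proteins": len(set(assigned)),
    }


# Toy input with hypothetical ROG identifiers.
print(assignment_summary(["ROG_A", "ROG_B", None, "ROG_A"]))

# The BIND row of the table: 226129 assigned out of 322722 protein interactors.
print(f"{100 * 226129 / 322722:.2f}")
```

The toy input reports 4 interactors, 3 assigned (75.00%), 1 unassigned and 2 unique proteins. The BIND calculation prints 70.07, which matches the table.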

Mapping score summary

See below for definitions of the mapping score codes.

BAR BHF_UCL BIND BIND_TRANSLATION BIOGRID CORUM HPIDB HPRD HURI I2D_IMEX INNATEDB INTACT INTCOMPLEX MATRIXDB MBINFO MINT MPACT MPIDB MPPI QUICKGO REACTOME SPIKE UNIPROTPP VIRUSHOST
P 46456 6175 187409 56273 17271 7720 338595 1494943 25486 600045 11790 66522 1136 228607 3157 108565 266602 54963 42029 80009
P+IN 385
P+N 116
PD 133916 7424 2994
PD+LQ 9895
PD+LYQ 39
PD+XQ 26
PDQ 15743
PDY 33 1
PDYQ 11
PGD 675 1888 400
PGD+L 6236 3108 4 875
PGD+X 1
PT 8417 3112 19 1 2 30579 2
PTD 92006 2 1 44
PTD+LQ 4274
PTDQ 2156
PTDY 8
PTDYQ 6
PTGD 18 1
PTGD+L 19 2
PTM 3
PTY 3
PU 395 9 118 14 4 714 1264 2 560 17 40 388 16533 8297 53
PU+L 1 17 2 39 436 158 134 704 13 1
PU+O 48
PU+X 604 3 1
PUD 81 9 145
PUD+L 7 10 13
PUD+X 54 162
PUT 4 1 11 4 2527
PUT+L 25 9 8 7
PUT+O 14
PUTD 14
PUTD+L 10 3
PV 9 9 9
PY 4 941 8 2 6 22 19
S 3 60 147 11
S+IN 1
S+L 38 737
S+N 4
S+O 243
SD 4279
SD+L 607
SD+N 133
SD+O 10176
SD+OY 2
SGD 1012
SGD+L 2227
SGD+O 14252
ST 136 7093
ST+L 4787
ST+O 829
STD 11722
STD+L 1181
STD+O 26255
STD+OY 27
STGD 2537
STGD+L 6940
STGD+O 35367
SUD+L 27
SUD+O 7
SUTD+L 23
SUTD+O 159

Mapping score code definitions

Character Description of feature (when the value is 1)
D The source database (D) listed in the interaction record differs from the one expected for the given accession of the protein. In specific cases, this difference is tolerated and the assignment is made.
E The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after the eUtils lookup, sequence information was obtained from UniProt.
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
L More than one assignment is possible (see +), for example several isoforms for one gene ID. In this situation, a reference is picked using a ranking system: RefSeq first, then UniProt. If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating, but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by the ROG score features O, X or L.
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID identical to the SEGUID of the original (O) sequence provided in the interaction record.
I The protein reference used was an NCBI GenInfo Identifier (I).
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from the one found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
P The interaction record's primary (P) reference for the protein was used to make the assignment.
S One of the interaction record's secondary (S) references for the protein was used to make the assignment.
Y The accession refers to an entry that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
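Each score in the mapping score summary can be read as a string of the single-character features defined above, so it can be decoded one character at a time. Below is a minimal Python sketch. The short descriptions paraphrase the definitions and are not official labels, and the function name is hypothetical.

```python
# Short paraphrases of the feature definitions above (not official labels).
SCORE_FEATURES = {
    "P": "primary reference used",
    "S": "secondary reference used",
    "U": "secondary UniProt accession updated to primary",
    "V": "version information ignored",
    "T": "taxonomy mismatch with sequence record tolerated",
    "G": "EntrezGene identifier used",
    "D": "source database differs from the one expected",
    "M": "typographical variant of a known accession tolerated",
    "E": "current accession/sequence retrieved via NCBI eUtils",
    "I": "NCBI GenInfo identifier used",
    "Q": "'see-also' reference used",
    "Y": "accession removed from RefSeq/UniProt after the beta3 build",
    "N": "new ROG generated from the record's own sequence",
    "+": "more than one possible assignment",
    "O": "ambiguity resolved by matching the original sequence's SEGUID",
    "X": "ambiguity resolved by matching the taxonomy identifier",
    "L": "ambiguity resolved by ranking (RefSeq, UniProt, longest sequence)",
}


def decode_score(score: str) -> list[tuple[str, str]]:
    """Split a mapping score such as "PTD+LQ" into (character, description) pairs."""
    unknown = sorted(set(score) - set(SCORE_FEATURES))
    if unknown:
        raise ValueError(f"unknown score character(s): {unknown}")
    return [(char, SCORE_FEATURES[char]) for char in score]


for char, meaning in decode_score("PTD+LQ"):
    print(char, meaning)
```

An unknown character raises `ValueError` rather than being skipped. This way, a score with an unexpected feature stands out instead of being decoded only partially.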
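Features N and O both rely on SEGUIDs. In its original definition, the SEGUID of a sequence is the base64-encoded SHA-1 digest of the upper-cased sequence with the trailing '=' padding removed. Below is a minimal Python sketch of the O rule. It assumes the candidate sequences have already been retrieved into a dictionary keyed by accession, and the accessions and sequences used are hypothetical. The sketch shows only the comparison, not how iRefIndex stores SEGUIDs or ROGs.

```python
import base64
import hashlib


def seguid(sequence: str) -> str:
    """Base64-encoded SHA-1 digest of the upper-cased sequence, '=' padding removed."""
    digest = hashlib.sha1(sequence.upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def resolve_by_original_sequence(candidates: dict[str, str], original_sequence: str) -> list[str]:
    """Return accessions whose sequence has the same SEGUID as the sequence given in
    the interaction record. Identical sequences share a SEGUID, so several accessions
    may be returned; an empty list means this rule cannot resolve the ambiguity."""
    target = seguid(original_sequence)
    return [acc for acc, seq in candidates.items() if seguid(seq) == target]


# Hypothetical accessions and toy sequences, for illustration only.
candidates = {"CAND_1": "MKTAYIAKQR", "CAND_2": "MKTAYIAKQRQISFVK", "CAND_3": "MSEQNNTEMT"}
print(resolve_by_original_sequence(candidates, "mktayiakqrqisfvk"))
print(seguid(""))
```

Because the sequence is upper-cased before hashing, the lower-case sequence from the record still matches `CAND_2`.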
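The tie-breaking described for feature L can be sketched as a two-step selection. First, keep only the candidates from the best-ranked database (RefSeq before UniProt). Then take the candidate with the longest sequence. In this minimal Python sketch, the `Candidate` type and the database labels are hypothetical. The tie-break on equal lengths (keeping the first candidate) is an assumption, because the definition does not say how such ties are handled.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical database labels; a lower number means a higher preference.
SOURCE_PREFERENCE = {"refseq": 0, "uniprot": 1}


@dataclass(frozen=True)
class Candidate:
    accession: str
    database: str  # "refseq" or "uniprot" in this sketch
    sequence: str


def resolve_by_ranking(candidates: Sequence[Candidate]) -> Candidate:
    """Pick one candidate: prefer RefSeq over UniProt, then the longest sequence."""
    if not candidates:
        raise ValueError("no candidates to choose from")

    def rank(c: Candidate) -> int:
        # Databases outside the ranking sort after both ranked ones.
        return SOURCE_PREFERENCE.get(c.database, len(SOURCE_PREFERENCE))

    best = min(rank(c) for c in candidates)
    preferred = [c for c in candidates if rank(c) == best]
    # max() returns the first candidate on a length tie (an assumption, see above).
    return max(preferred, key=lambda c: len(c.sequence))


isoforms = [
    Candidate("UNIPROT_ISO1", "uniprot", "MKTAYIAKQR"),
    Candidate("REFSEQ_ISO1", "refseq", "MKTAY"),
    Candidate("REFSEQ_ISO2", "refseq", "MKTAYIAK"),
]
print(resolve_by_ranking(isoforms).accession)
```

In this example, the longer UniProt isoform loses to the RefSeq candidates because database rank is applied before length. The longer of the two RefSeq isoforms, `REFSEQ_ISO2`, is then selected.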