Difference between revisions of "Statistics iRefIndex 20.0"

Revision as of 05:38, 28 August 2023

Interactions available from major taxonomies (corrected)

NCBI taxonomy identifier Scientific name Number of interactions
9606 Homo sapiens 1188264
559292 Saccharomyces cerevisiae S288C 163873
10090 Mus musculus 111999
3702 Arabidopsis thaliana 88692
7227 Drosophila melanogaster 86809
4577 Zea mays 51057
6239 Caenorhabditis elegans 44373
2697049 Severe acute respiratory syndrome CoV 2 26199
83333 Escherichia coli K-12 17342
10116 Rattus norvegicus 16126
4896 Schizosaccharomyces pombe 14141
316407 Escherichia coli str. K-12 substr. W3110 12824
192222 Campylobacter jejuni subsp. jejuni NCTC 11168 11931
381518 Influenza A virus (A/Wilson-Smith/1933(H1N1)) 5102

Summary of mapping interaction records to RIGs (redundant interaction groups)

Source: Interaction data source. Total records: Total number of interaction records found in the source. Protein-related interactions: Total number of interactions involving only protein interactors. PPI assigned to RIGID: Number of interactions where all protein interactors were assigned to a ROG; the % column that follows expresses this as a percentage of column 3. Unique RIGIDs: Number of unique protein interactions and complexes (RIGIDs) found in the data source; the % column that follows expresses this as a percentage of column 4. For a description of the term RIGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper PMID 18823568.

Source Total records Protein-related interactions PPI assigned to RIGID % Unique RIGIDs %
BAR 9641 9637 9631 99.94 9511 98.75
BHF_UCL 2341 2325 2325 100.00 1512 65.03
BIND 157736 91309 68064 74.54 49512 72.74
BIND_TRANSLATION 162914 72276 70695 97.81 55331 78.27
BIOGRID 4932724 3080120 3064694 99.50 1215219 39.65
CORUM 4274 4274 4270 99.91 4018 94.10
DIP 81731 80134 79875 99.68 77465 96.98
HPIDB 3019 2845 2845 100.00 1560 54.83
HPRD 83022 83022 83022 100.00 40559 48.85
HURI 171545 168756 168750 100.00 51482 30.51
IMEX 658536 638460 638442 100.00 351421 55.04
INNATEDB 18408 18408 6830 37.10 4775 69.91
INTACT 794733 733322 733199 99.98 431054 58.79
INTCOMPLEX 4158 3391 3391 100.00 3340 98.50
MATRIXDB 1776 1775 1775 100.00 1419 79.94
MBINFO 542 522 522 100.00 331 63.41
MINT 87982 87366 87351 99.98 50482 57.79
MPACT 16504 16504 16373 99.21 13398 81.83
MPIDB 1497 1458 1458 100.00 920 63.10
MPPI 1814 1758 1578 89.76 776 49.18
QUICKGO 62188 53367 53367 100.00 25366 47.53
REACTOME 141996 141996 141844 99.89 126328 89.06
UNIPROTPP 17004 16904 16904 100.00 9823 58.11
VIRUSHOST 55115 55115 55115 100.00 47007 85.29
(All) 7471200 5365044 5312320 99.02 1808539 34.04
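
As a quick check on how the two percentage columns are derived, the following is a minimal sketch. It assumes, as the column descriptions state, that the first % is "PPI assigned to RIGID" over "Protein-related interactions" and the second % is "Unique RIGIDs" over "PPI assigned to RIGID". The function name `rig_percentages` is hypothetical and not part of iRefIndex.

```python
def rig_percentages(protein_related: int, assigned: int, unique_rigids: int) -> tuple[float, float]:
    """Return (% assigned to RIGID, % unique RIGIDs) for one table row.

    Hypothetical helper: the denominators follow the column descriptions
    (column 3 for the first percentage, column 4 for the second).
    """
    if protein_related <= 0 or assigned <= 0:
        raise ValueError("denominators must be positive")
    pct_assigned = round(100.0 * assigned / protein_related, 2)
    pct_unique = round(100.0 * unique_rigids / assigned, 2)
    return pct_assigned, pct_unique


# Values copied from the BIND and (All) rows of the table above.
bind = rig_percentages(91309, 68064, 49512)
overall = rig_percentages(5365044, 5312320, 1808539)
print("BIND:", bind)
print("(All):", overall)
```

Running the same function over any other row is a simple way to spot transcription errors in the counts.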

Assignment of protein interactors to ROGs (redundant object groups)

Source: Interaction data source (see methods). Protein interactors: Total number of interactors found in all interaction records. Assigned: Number of proteins assigned unambiguously to a ROG. Assignments listed in columns 5 and 6 are not included here. %: Column 3 expressed as a percentage of column 2. Arbitrary: Total number of ROG assignments that were ambiguous and resolved with an arbitrary method (see ROG scores with 'L'). Matching sequence: Total number of assignments made where a sequence in the interaction record matched a known sequence. Unassigned: Total number of protein interactors that could not be assigned to a ROG. Unique proteins: Total number of unique proteins (ROGs). For a description of the term ROGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper PMID 18823568.

Source Protein interactors Assigned % Arbitrary Matching sequence New or obsolete sequence Unassigned Unique proteins
BAR 19431 19425 99.97 0 0 0 6 2299
BHF_UCL 6182 6182 100.00 0 0 0 0 1789
BIND 252251 215207 85.31 17 0 11428 37044 30232
BIND_TRANSLATION 217051 214024 98.61 11509 0 15128 3027 33454
BIOGRID 161882 159221 98.36 8311 0 22218 2661 80008
CORUM 17317 17313 99.98 2 0 62 4 6124
DIP 28066 27913 99.45 654 0 2418 153 27154
HPIDB 7720 7720 100.00 0 0 0 0 2478
HPRD 123812 123812 100.00 16285 95204 161 0 9842
HURI 340295 340289 100.00 63 0 2252 6 8181
IMEX 1686189 1686169 100.00 464 0 53 20 104243
INNATEDB 42658 25328 59.37 0 0 2 17330 3718
INTACT 702160 701983 99.97 176 63 430 177 115488
INTCOMPLEX 15582 15582 100.00 0 9 0 0 8359
MATRIXDB 66584 66584 100.00 0 0 488 0 14884
MBINFO 1136 1136 100.00 0 0 3 0 274
MINT 223411 223394 99.99 124 0 9 17 27894
MPACT 40349 40199 99.63 0 0 0 150 4995
MPIDB 3151 3151 100.00 0 0 0 0 962
MPPI 3568 3361 94.20 16 0 0 207 833
QUICKGO 115555 115555 100.00 0 0 0 0 20966
REACTOME 283992 283839 99.95 704 0 0 153 5860
UNIPROTPP 50254 50254 100.00 1 0 0 0 11129
VIRUSHOST 110230 110230 100.00 0 0 909 0 10586
(All) 4518826 4457871 98.65 38326 95276 55561 60955 176285
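
The % column in this table can be recomputed from the counts. The sketch below assumes the stated definition (Assigned as a percentage of Protein interactors) and additionally tests whether Assigned plus Unassigned equals the total, which holds for the rows sampled here. The helper name `check_rog_row` is hypothetical.

```python
def check_rog_row(total: int, assigned: int, unassigned: int) -> tuple[float, bool]:
    """Return (assigned %, whether assigned + unassigned equals the total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    pct = round(100.0 * assigned / total, 2)
    return pct, assigned + unassigned == total


# (Protein interactors, Assigned, Unassigned) copied from the table above.
rows = {
    "BIND": (252251, 215207, 37044),
    "INNATEDB": (42658, 25328, 17330),
    "(All)": (4518826, 4457871, 60955),
}

results = {source: check_rog_row(*counts) for source, counts in rows.items()}
for source, (pct, consistent) in results.items():
    print(source, pct, consistent)
```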

Mapping score summary

See below for definitions of the mapping score codes.

BAR BHF_UCL BIND BIND_TRANSLATION BIOGRID CORUM DIP HPIDB HPRD HURI IMEX INNATEDB INTACT INTCOMPLEX MATRIXDB MBINFO MINT MPACT MPIDB MPPI QUICKGO REACTOME UNIPROTPP VIRUSHOST
P 19270 6173 174608 50530 17216 7720 337140 1562392 25459 624153 13622 66472 1136 218609 3155 111515 266602 47710 109363
PD 123093 7998 3 2994
PD+LQ 9459
PD+LYQ 39
PDQ 31177
PD+XQ 26
PD+XYQ 50
PDY 9815 2
PDYQ 12
PE 417
PGD 627 1966 3
PGD+L 6287 4076 4
PGD+X 2
P+IN 385
P+N 15
PT 8779 3318 19 4 3 2 30579
PTD 80564 2 2 44
PTD+LQ 4163
PTDQ 2854
PTDY 1579
PTDYQ 6
PTGD 14
PTGD+L 23 2
PTM 3
PTY 250 16 3
PU 155 9 118 14 4 975 1324 2 532 14 55 335 16533 68
PUD 81 12 145
PUD+L 7 11 13
PUD+X 54 162
PU+L 11 2 39 438 156 115 704 1
PU+O 49 9
PUT 4 2 12 4 2527
PUTD 4
PUTD+L 10 3
PUT+L 31 9 8 7
PUT+O 14
PU+X 604 3 1
PV 9 9 9
PY 15198 10477 59 2135 76 2 30 57 10 2 867
S 2 62 12010 79 3
SD 5348 2567
SD+L 229 442
SD+LY 3
SD+N 121
SD+O 11679
SD+OY 2
SD+X 1283
SDY 15
SE 1
SGD 698
SGD+L 2684
SGD+O 14727
S+IN 1
S+L 7 210 757
S+LY 3 88
S+N 4
S+O 309
ST 4686 100 7093
STD 752 7270
STD+L 11 996
STD+O 30211
STD+OY 27
STDY 3
STGD 1615
STGD+L 7572
STGD+O 36908
ST+L 22 3962
ST+LY 5
ST+O 865
STY 43
SUD 79 2
SUD+L 44 25
SUD+O 7
SUD+X 569
SUTD 24 2
SUTD+L 36 23
SUTD+O 159
S+X 196
S+XY 233
SY 8 2028 7

Mapping score code definitions

Character Description of feature (when the value is 1)
D The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.
E The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
L More than one assignment is possible (see +), e.g. isoforms for a gene ID. In such a situation, references are picked using a ranking system (first RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L.
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
I The protein reference used was an NCBI GenInfo Identifier (I).
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
P The interaction record's primary (P) reference for the protein was used to make the assignment.
S One of the interaction record's secondary (S) references for the protein was used to make the assignment.
Y The reference was to an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
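
Because each mapping score is a string of the single-character features defined above, a score such as PTD+LQ can be read one character at a time. The following is a minimal sketch of such a decoder. The short labels are paraphrases of the definitions above, and the names `FEATURES` and `decode_score` are hypothetical, not part of any iRefIndex tool.

```python
# Short labels paraphrased from the feature definitions above (illustrative only).
FEATURES = {
    "P": "primary reference used",
    "S": "secondary reference used",
    "U": "secondary UniProt accession updated to primary",
    "V": "version information ignored",
    "T": "taxonomy identifier mismatch tolerated",
    "G": "EntrezGene identifier used",
    "D": "source database mismatch tolerated",
    "M": "typographical modification tolerated",
    "+": "more than one assignment possible",
    "O": "resolved by matching the original sequence (SEGUID)",
    "X": "resolved by matching the taxonomy identifier",
    "L": "resolved by ranking, then longest sequence",
    "I": "NCBI GenInfo Identifier used",
    "E": "eUtils used to retrieve current accession/sequence",
    "Y": "accession removed from RefSeq or UniProt",
    "N": "new SEGUID entry created",
    "Q": "'see-also' reference used",
}


def decode_score(score: str) -> list[tuple[str, str]]:
    """Split a mapping score into (character, label) pairs, in order."""
    unknown = [ch for ch in score if ch not in FEATURES]
    if unknown:
        raise ValueError(f"unknown feature character(s): {unknown}")
    return [(ch, FEATURES[ch]) for ch in score]


for ch, label in decode_score("PTD+LQ"):
    print(ch, "-", label)
```

Raising an error on unrecognized characters, rather than skipping them, makes it obvious when a score uses a feature that this illustrative lookup table does not cover.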