Statistics iRefIndex 4.0

From irefindex

Revision as of 11:28, 9 June 2009

Summary

  • Total distinct interactions: 369,457
  • Total distinct proteins: 83,388

This page lists statistics for our internal build of iRefIndex, which includes all of the data from the sources used for the current build (see Sources_iRefIndex_4.0). This full build contains data that cannot be redistributed under the usage policies of some source databases (namely DIP, HPRD and MPact). Please contact ian.donaldson at biotek.uio.no if you would like to obtain a copy of the full iRefIndex build under an academic, collaborative agreement.

The data that are freely available at ftp://ftp.no.embnet.org/irefindex/data/current/ are a subset of the full build that we can freely redistribute according to the usage policies of the source databases. Please refer to http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_4.0 for statistics that are applicable to this free dataset.

Interactions available from major taxonomies

Taxonomy Number of interactions
562 (Escherichia coli) 1243
4932 (Saccharomyces cerevisiae) 116321
6239 (Caenorhabditis elegans) 11912
7227 (Drosophila melanogaster) 46794
9606 (Homo sapiens) 117535
10090 (Mus musculus) 10098
10116 (Rattus norvegicus) 3372

Interactions (Corresponds to Table 6 in PMID 18823568)

BIND 62921
BIOGRID 20497 163891
DIP 25914 28969 56441
HPRD 2893 1958 839 37956
INTACT 24239 25653 24807 8075 111235
MINT 21991 34654 29988 6270 45260 76607
MPACT 6904 8489 6777 0 6087 6426 13321
MPPI 385 26 41 303 89 71 0 829
OPHID 2210 1333 887 17913 7196 6396 0 183 47297
CORUM 113 18 29 390 121 66 0 9 158 1919
BIND BIOGRID DIP HPRD INTACT MINT MPACT MPPI OPHID CORUM
(25903) (111633) (13594) (15201) (55807) (15712) (1137) (238) (26571) (1403)
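
The matrix above is lower-triangular: each row gives the number of interactions a source shares with every source that precedes it in the column order of the second-to-last line, and ends with the source's own total on the diagonal (the diagonal values match the Unique RIGIDs column of the RIG mapping table further down). A minimal Python sketch of looking up a pairwise overlap under that reading follows. The helper names `parse_triangle` and `shared` are illustrative and not part of iRefIndex, and only the first three rows are copied to keep the example short.

```python
# First three rows of the matrix, copied from the table above.
TRIANGLE = """\
BIND 62921
BIOGRID 20497 163891
DIP 25914 28969 56441
"""

def parse_triangle(text):
    """Split each row into a source name and its list of integer counts."""
    names, rows = [], []
    for line in text.splitlines():
        name, *counts = line.split()
        names.append(name)
        rows.append([int(c) for c in counts])
    return names, rows

def shared(names, rows, a, b):
    """Return the count stored for the pair (a, b); a == b gives the diagonal."""
    i, j = names.index(a), names.index(b)
    if i < j:
        i, j = j, i  # only the lower triangle is stored
    return rows[i][j]

names, rows = parse_triangle(TRIANGLE)
print(shared(names, rows, "BIND", "DIP"))   # 25914
print(shared(names, rows, "DIP", "DIP"))    # 56441
```

The lookup is symmetric, so `shared(names, rows, "DIP", "BIND")` returns the same value as the first call.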

Interactors

BIND 40801
BIOGRID 14442 27471
DIP 15395 13084 20111
HPRD 3320 2472 1249 9539
INTACT 18121 16827 15687 5792 41587
MINT 16178 15064 14987 4620 23418 28428
MPACT 4638 4560 4632 0 4859 4756 4972
MPPI 671 212 292 429 575 504 0 862
OPHID 3242 2300 1241 7357 5747 4709 1 421 9629
CORUM 1551 746 670 1845 2293 1733 0 321 1849 3581
BIND BIOGRID DIP HPRD INTACT MINT MPACT MPPI OPHID CORUM
(18591) (8169) (1838) (1040) (11893) (3241) (17) (45) (1280) (626)

Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source  Total records  Protein-only interactors (PPI)  Assigned to RIGID  Unique RIGIDs
bind    193648         93957                           91291 (97.1625%)   62921 (68.9236%)
grid    240501         240501                          240197 (99.8736%)  163891 (68.2319%)
dip     57675          57675                           56608 (98.1500%)   56441 (99.7050%)
intact  129092         128326                          127893 (99.6626%)  111235 (86.9750%)
mint    109412         109412                          107823 (98.5477%)  76607 (71.0488%)
HPRD    38037          38037                           38028 (99.9763%)   37956 (99.8107%)
ophid   73257          73257                           72907 (99.5222%)   47297 (64.8731%)
MPACT   16504          16504                           16286 (98.6791%)   13321 (81.7942%)
MPPI    1814           1814                            1697 (93.5502%)    829 (48.8509%)
CORUM   2104           2104                            2104 (100.0000%)   1919 (91.2072%)
ALL     862044         761587                          754834 (99.1133%)  369457 (48.9455%)
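
The percentages in this table appear to be plain ratios: the "Assigned to RIGID" percentage is taken relative to the protein-only interaction count, and the "Unique RIGIDs" percentage relative to the number of assigned records. The Python sketch below recomputes three rows under that reading. The function name `pct` is ours, and the interpretation of the denominators is inferred from the numbers rather than stated on this page.

```python
def pct(part, whole):
    """Return part/whole as a percentage rounded to four decimals."""
    return round(100.0 * part / whole, 4)

# (protein-only interactions, assigned to RIGID, unique RIGIDs),
# copied from the table above.
rows = {
    "bind": (93957, 91291, 62921),
    "CORUM": (2104, 2104, 1919),
    "ALL": (761587, 754834, 369457),
}

for source, (ppi, assigned, unique) in rows.items():
    # First value: share of PPI records assigned a RIGID.
    # Second value: unique RIGIDs per assigned record.
    print(source, pct(assigned, ppi), pct(unique, assigned))
```

Read this way, the low ALL value in the last column (48.9455%) reflects redundancy: on average roughly two assigned records map to the same RIGID.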

Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source Protein_Interactors Assigned % Arbitrary New Unassigned Unique proteins
bind 285482 273646 95.8540 0 7942 3894 40801
CORUM 10316 10314 99.9806 0 2 0 3581
dip 20728 18513 89.3140 1261 479 475 20111
grid 29318 19162 65.3592 10053 5 98 27471
HPRD 9565 9493 99.2473 53 15 4 9539
intact 97988 94387 96.3251 18 3347 236 41587
mint 76908 73745 95.8873 2 2689 472 28428
MPACT 40349 40112 99.4126 0 0 237 4972
MPPI 3628 3456 95.2591 0 39 133 862
ophid 146423 145362 99.2754 103 699 259 9629
All 720705 688190 95.4884 11490 15217 5808 83388
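
In the figures above, the Assigned, Arbitrary, New and Unassigned columns add up to the Protein_Interactors total in every row, and the % column equals Assigned divided by that total. The short Python check below makes this bookkeeping explicit. The function name `check_row` is illustrative, and reading the four columns as mutually exclusive outcomes is an inference from the arithmetic, not something this page states.

```python
# (total, assigned, arbitrary, new, unassigned), copied from the table above.
rows = {
    "bind": (285482, 273646, 0, 7942, 3894),
    "CORUM": (10316, 10314, 0, 2, 0),
    "dip": (20728, 18513, 1261, 479, 475),
    "grid": (29318, 19162, 10053, 5, 98),
    "HPRD": (9565, 9493, 53, 15, 4),
    "intact": (97988, 94387, 18, 3347, 236),
    "mint": (76908, 73745, 2, 2689, 472),
    "MPACT": (40349, 40112, 0, 0, 237),
    "MPPI": (3628, 3456, 0, 39, 133),
    "ophid": (146423, 145362, 103, 699, 259),
    "All": (720705, 688190, 11490, 15217, 5808),
}

def check_row(total, assigned, arbitrary, new, unassigned):
    """Return (outcomes_sum_to_total, assigned_percentage)."""
    balanced = assigned + arbitrary + new + unassigned == total
    return balanced, round(100.0 * assigned / total, 4)

for source, values in rows.items():
    print(source, *check_row(*values))
```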

ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_score Binary_flag String_score Score_class Proteins Percentage BIND BioGrid DIP MINT HPRD OPHID MPPI MPACT
1802 000000011100001010 SVOX+ -1 4 0.0006% 0 0 0 0 0 0 0 0
1 000000000000000001 P 1 562341 78.0265% 232685 7503 0 71616 0 125715 3023 30666
810 000000001100101010 SVGO+ 1 59 0.0082% 0 0 0 0 59 0 0 0
8193 000010000000000001 PI 1 48 0.0067% 0 2 0 0 0 0 0 0
554 000000001000101010 SVGO 1 611 0.0848% 0 0 0 0 611 0 0 0
130 000000000010000010 SM 1 551 0.0765% 0 551 0 0 0 0 0 0
66 000000000001000010 SD 1 2 0.0003% 0 2 0 0 0 0 0 0
65 000000000001000001 PD 1 9533 1.3227% 8084 1446 0 3 0 0 0 0
42 000000000000101010 SVG 1 149 0.0207% 0 0 0 0 149 0 0 0
8194 000010000000000010 SI 1 12395 1.7198% 12336 59 0 0 0 0 0 0
10 000000000000001010 SV 1 13 0.0018% 0 0 2 4 0 0 0 0
2 000000000000000010 S 1 34755 4.8224% 0 7402 17432 242 2510 0 0 6927
129 000000000010000001 PM 1 523 0.0726% 473 1 0 0 0 0 32 0
5 000000000000000101 PU 2 22818 3.1661% 0 0 0 264 0 19520 320 2519
16385 000100000000000001 PE 2 189 0.0262% 0 0 0 0 0 0 0 0
6 000000000000000110 SU 2 737 0.1023% 0 659 60 4 6 0 0 0
778 000000001100001010 SVO+ 2 1 0.0001% 0 0 0 0 0 0 0 0
774 000000001100000110 SUO+ 2 1 0.0001% 0 0 0 0 0 0 0 0
16386 000100000000000010 SE 2 5405 0.7500% 5405 0 0 0 0 0 0 0
773 000000001100000101 PUO+ 2 9 0.0012% 0 0 0 3 0 0 0 0
146 000000000010010010 STM 3 1 0.0001% 0 1 0 0 0 0 0 0
8209 000010000000010001 PTI 3 13 0.0018% 0 0 0 0 0 0 0 0
8210 000010000000010010 STI 3 903 0.1253% 855 48 0 0 0 0 0 0
18 000000000000010010 ST 3 8651 1.2004% 0 1487 1015 0 6146 0 0 0
81 000000000001010001 PTD 3 1487 0.2063% 1486 0 0 1 0 0 0 0
17 000000000000010001 PT 3 26398 3.6628% 11876 0 0 1604 0 122 46 0
145 000000000010010001 PTM 3 170 0.0236% 132 0 0 0 0 0 35 0
26 000000000000011010 SVT 3 1 0.0001% 0 0 0 0 0 0 0 0
16401 000100000000010001 PTE 4 3 0.0004% 0 0 0 0 0 0 0 0
16402 000100000000010010 STE 4 315 0.0437% 314 0 1 0 0 0 0 0
22 000000000000010110 SUT 4 15 0.0021% 0 1 3 0 11 0 0 0
789 000000001100010101 PUTO+ 4 14 0.0019% 0 0 0 0 0 0 0 0
790 000000001100010110 SUTO+ 4 1 0.0001% 0 0 0 0 1 0 0 0
4354 000001000100000010 SL+ 5 4193 0.5818% 0 2984 1209 0 0 0 0 0
21 000000000000010101 PUT 5 33 0.0046% 0 0 0 4 0 5 0 0
4394 000001000100101010 SVGL+ 5 50 0.0069% 0 0 0 0 50 0 0 0
131073 100000000000000001 PQ 5 2 0.0003% 0 0 0 0 0 0 0 0
131077 100000000000000101 PUQ 5 1 0.0001% 0 0 0 0 0 0 0 0
131089 100000000000010001 PTQ 5 38 0.0053% 0 0 0 0 0 0 0 0
12546 000011000100000010 SLI+ 5 6660 0.9241% 0 6660 0 0 0 0 0 0
4357 000001000100000101 PUL+ 5 84 0.0117% 0 0 0 0 0 84 0 0
5381 000001010100000101 PUXL+ 5 27 0.0037% 0 0 0 0 0 19 0 0
5378 000001010100000010 SXL+ 5 1 0.0001% 0 0 0 0 0 0 0 0
4482 000001000110000010 SML+ 5 409 0.0567% 0 409 0 0 0 0 0 0
5382 000001010100000110 SUXL+ 5 3 0.0004% 0 0 0 0 3 0 0 0
5386 000001010100001010 SVXL+ 5 2 0.0003% 0 0 0 1 0 0 0 0
4374 000001000100010110 SUTL+ 5 52 0.0072% 0 0 52 0 0 0 0 0
4373 000001000100010101 PUTL+ 5 9 0.0012% 0 0 0 1 0 0 0 0
86274 010101000100000010 SLEN+ 6 3 0.0004% 0 2 1 0 0 0 0 0
82034 010100000001110010 STGDEN 6 2 0.0003% 0 0 0 0 2 0 0 0
81938 010100000000010010 STEN 6 24 0.0033% 24 0 0 0 0 0 0 0
81937 010100000000010001 PTEN 6 5 0.0007% 3 0 0 0 0 0 2 0
81922 010100000000000010 SEN 6 5823 0.8080% 5452 3 366 2 0 0 0 0
81921 010100000000000001 PEN 6 2869 0.3981% 2462 0 0 49 0 98 37 0
65601 010000000001000001 PDN 6 1 0.0001% 1 0 0 0 0 0 0 0
65553 010000000000010001 PTN 6 26 0.0036% 0 0 0 0 0 0 0 0
65537 010000000000000001 PN 6 6463 0.8968% 0 0 112 2638 13 601 0 0
196625 110000000000010001 PTNQ 6 1 0.0001% 0 0 0 0 0 0 0 0
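
The Decimal_score, Binary_flag and String_score columns carry the same information in three encodings. Comparing the rows suggests that each feature letter (described under Scores) occupies one bit, that letters are written in ascending bit order, and that "+" is always written last. The Python sketch below decodes a score on that basis. The bit layout is inferred from the rows of this table and is an assumption, not taken from the iRefIndex code; bits 11 and 15 never occur here and are left unmapped.

```python
# Bit position -> feature letter, inferred from the table rows (assumption).
BIT_TO_LETTER = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 16: "N", 17: "Q",
}
PLUS_BIT = 8  # "+" is appended after all letters in String_score

def binary_flag(score, width=18):
    """Render the decimal score as a zero-padded binary string."""
    return format(score, "0{}b".format(width))

def string_score(score):
    """Translate the set bits into letters, lowest bit first, '+' last."""
    letters = [BIT_TO_LETTER[b] for b in sorted(BIT_TO_LETTER)
               if (score >> b) & 1]
    if (score >> PLUS_BIT) & 1:
        letters.append("+")
    return "".join(letters)

print(binary_flag(1802), string_score(1802))  # 000000011100001010 SVOX+
```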

Scores (Table 2)

Character Description of feature (when the value is 1) Frequency
D The source database (D) listed in the interaction record differs from what is expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made. 11025(1.5422%)
E The protein reference was a retired NCBI Identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. 14638(2.0476%)
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. 871(0.1218%)
L More than one assignment is possible (see + below). The assignment with the largest (L) SEGUID is arbitrarily chosen (see Methods). 11493(1.6076%)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. 1654(0.2314%)
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity, as indicated by ROG score features O, X or L (see the corresponding rows of this table). 11582(1.6201%)
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. 15217(2.1286%)
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. 700(0.0979%)
I The protein reference used was an NCBI GenInfo Identifier (I). 20019(2.8003%)
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. 23804(3.3297%)
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. 38162(5.3381%)
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. 890(0.1245%)
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. 42(0.0059%)
P The interaction record's primary (P) reference for the protein was used to make the assignment. 633105(88.5589%)
S One of the interaction record's secondary (S) references for the protein was used to make the assignment. 81792(11.4411%)
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. 37(0.0052%)