Statistics iRefIndex 5.0

From irefindex

Revision as of 13:06, 23 July 2009

Summary

  • Total interactions: 871,172
  • Total distinct interactions (based on RIGID): 357,156
  • Total distinct proteins (based on ROGID): 83,938

This page lists statistics for our internal version of iRefIndex, which includes all of the data from the sources used for the current build (Sources_iRefIndex_4.0). This full build contains data that cannot be redistributed under the usage policies of the source databases (namely DIP, HPRD and MPact). Please contact ian.donaldson at biotek.uio.no if you would like to obtain a copy of the full iRefIndex build under an academic, collaborative agreement.

The data freely available at ftp://ftp.no.embnet.org/irefindex/data/current/ are the subset of the full build that we can redistribute under the usage policies of the source databases. Please refer to http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_4.0 for statistics on this free dataset.

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (taxonomy identifiers as they appear in the original source)

NCBI_taxonomy_identifier Name Number_of_interactions
4932 Saccharomyces cerevisiae 115570
9606 Homo sapiens 103700
7227 Drosophila melanogaster 46240
40674 Mammalia 35023
197 Campylobacter jejuni 11998
6239 Caenorhabditis elegans 11793
284812 Schizosaccharomyces pombe 972h- 11556
10090 Mus musculus 9964
562 Escherichia coli 8512
3702 Arabidopsis thaliana 5348
160 Treponema pallidum 3646
83333 Escherichia coli K-12 3490
10116 Rattus norvegicus 3477
1142 Synechocystis 3057
36329 Plasmodium falciparum 3D7 2731

Top 15 corrected taxonomy groups in iRefIndex (taxonomy identifiers corrected using sequence database information)

NCBI_taxonomy_identifier Name Number_of_interactions
4932 Saccharomyces cerevisiae 115562
9606 Homo sapiens 114618
7227 Drosophila melanogaster 46244
197 Campylobacter jejuni 11998
4896 Schizosaccharomyces pombe 11831
6239 Caenorhabditis elegans 11793
10090 Mus musculus 8569
83333 Escherichia coli K-12 7482
3702 Arabidopsis thaliana 5354
155864 Escherichia coli O157:H7 EDL933 4928
160 Treponema pallidum 3646
1148 Synechocystis sp. PCC 6803 3166
36329 Plasmodium falciparum 3D7 2731
10116 Rattus norvegicus 2650
85962 Helicobacter pylori 26695 1598

Interactions (Corresponds to Table 6 in PMID 18823568)

         BIND     BIOGRID  DIP      HPRD     INTACT   MINT     MPACT    MPPI     OPHID    CORUM
BIND     62896
BIOGRID  20564    164770
DIP      25930    29004    56430
HPRD     2947     2016     858      39966
INTACT   24400    26946    25008    8442     113877
MINT     22066    34683    30052    6563     45338    76602
MPACT    6938     8489     6793     0        6132     6426     13321
MPPI     385      27       41       304      89       71       0        826
OPHID    2226     1357     899      18063    7248     6396     0        183      47297
CORUM    116      18       29       403      122      66       0        9        158      1919
         (25793)  (111301) (13496)  (16472)  (56805)  (15495)  (1132)   (235)    (26461)  (1398)
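
The Interactions matrix above is lower-triangular: each pair of sources has a single cell, and the diagonal matches the Unique RIGIDs column of Table 5 further down this page (BIND 62896, DIP 56430, and so on). The following Python sketch stores the rows as nested lists and looks up the cell for a pair of sources given in either order. It assumes the off-diagonal cells count RIGIDs shared by the two sources, which this page does not state explicitly; the function name `overlap` is hypothetical, and the parenthesized bottom row is not used.

```python
# Lower-triangular matrix copied from the "Interactions" table on this page.
# Assumption: off-diagonal cells count RIGIDs shared by the two sources and
# diagonal cells count the RIGIDs of a single source.
SOURCES = ["BIND", "BIOGRID", "DIP", "HPRD", "INTACT",
           "MINT", "MPACT", "MPPI", "OPHID", "CORUM"]

ROWS = [
    [62896],
    [20564, 164770],
    [25930, 29004, 56430],
    [2947, 2016, 858, 39966],
    [24400, 26946, 25008, 8442, 113877],
    [22066, 34683, 30052, 6563, 45338, 76602],
    [6938, 8489, 6793, 0, 6132, 6426, 13321],
    [385, 27, 41, 304, 89, 71, 0, 826],
    [2226, 1357, 899, 18063, 7248, 6396, 0, 183, 47297],
    [116, 18, 29, 403, 122, 66, 0, 9, 158, 1919],
]


def overlap(a: str, b: str) -> int:
    """Return the matrix cell for sources a and b, in either order."""
    i, j = SOURCES.index(a), SOURCES.index(b)
    if j > i:  # only the lower triangle is stored, so swap into it
        i, j = j, i
    return ROWS[i][j]


if __name__ == "__main__":
    print(overlap("BIND", "DIP"))   # 25930 (row DIP, column BIND)
    print(overlap("DIP", "BIND"))   # 25930 (same cell, arguments swapped)
    print(overlap("HPRD", "HPRD"))  # 39966 (diagonal)
```

The same lookup works for the Interactors matrix below if its rows are substituted for `ROWS`.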

Interactors

         BIND     BIOGRID  DIP      HPRD     INTACT   MINT     MPACT    MPPI     OPHID    CORUM
BIND     40783
BIOGRID  14503    27694
DIP      15398    13106    20108
HPRD     3357     2522     1251     9750
INTACT   18205    16989    15721    5967     42061
MINT     16198    15112    14999    4669     23460    28424
MPACT    4651     4560     4639     0        4874     4756     4972
MPPI     674      219      294      430      579      505      0        861
OPHID    3253     2325     1242     7426     5809     4708     1        422      9626
CORUM    1561     756      671      1867     2312     1733     0        322      1849     3581
         (18529)  (8247)   (1834)   (1094)   (12125)  (3228)   (16)     (42)     (1257)   (620)

Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source Total_records Protein-only_interactors PPI_assigned_to_RIGID Unique_RIGIDs
bind 193648 93957 91265(97.1349%) 62896(68.9158%)
grid 242126 242126 241823(99.8749%) 164770(68.1366%)
dip 57675 57675 56597(98.1309%) 56430(99.7049%)
intact 133302 132525 132077(99.6620%) 113877(86.2202%)
mint 109398 109398 107808(98.5466%) 76602(71.0541%)
HPRD 40075 40075 40075(100.0000%) 39966(99.7280%)
ophid 73257 73257 72907(99.5222%) 47297(64.8731%)
MPACT 16504 16504 16286(98.6791%) 13321(81.7942%)
MPPI 1814 1814 1688(93.0540%) 826(48.9336%)
CORUM 2104 2104 2104(100.0000%) 1919(91.2072%)
ALL 869903 769435 762630(99.1156%) 372649(48.8637%)
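
The two percentages in the table above use different denominators: the PPI assigned to RIGID percentage is taken over the protein-only interactor records, while the Unique RIGIDs percentage is taken over the records assigned to a RIGID. For bind, 91265 / 93957 = 97.1349% and 62896 / 91265 = 68.9158%. A minimal Python sketch that recomputes both values from counts copied out of the table (the helper `percentages` is illustrative only):

```python
# (source, total_records, protein_only, assigned_to_rigid, unique_rigids),
# copied from the table above.
ROWS = [
    ("bind", 193648, 93957, 91265, 62896),
    ("grid", 242126, 242126, 241823, 164770),
    ("MPPI", 1814, 1814, 1688, 826),
    ("ALL", 869903, 769435, 762630, 372649),
]


def percentages(protein_only: int, assigned: int, unique: int) -> tuple[float, float]:
    """Return (% of protein-only records assigned to a RIGID,
    unique RIGIDs as a % of the assigned records), rounded to 4 places."""
    assigned_pct = 100.0 * assigned / protein_only
    unique_pct = 100.0 * unique / assigned
    return round(assigned_pct, 4), round(unique_pct, 4)


for name, _total, protein_only, assigned, unique in ROWS:
    a, u = percentages(protein_only, assigned, unique)
    print(f"{name:5s} {a:.4f}% {u:.4f}%")
```

For the rows listed, the printed values should match the percentages in the table to four decimal places.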

Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source Protein_interactors Assigned % Arbitrary New Unassigned Unique_proteins
bind 285482 273658 95.8582 0 7887 3937 40783
CORUM 10316 10314 99.9806 0 2 0 3581
dip 20728 18527 89.3815 1246 477 478 20108
grid 29599 19354 65.3873 10141 6 98 27694
HPRD 9773 9676 99.0075 55 42 0 9750
intact 100752 97166 96.4408 19 3323 244 42061
mint 76898 73745 95.8998 2 2678 473 28424
MPACT 40349 40112 99.4126 0 0 237 4972
MPPI 3628 3457 95.2867 0 30 141 861
ophid 146423 145362 99.2754 103 699 259 9626
All 723948 691371 95.5001 11566 15144 5867 83809
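
In every row of the table above, Assigned + Arbitrary + New + Unassigned equals the number of protein interactors in the second column (for All: 691371 + 11566 + 15144 + 5867 = 723948), and the % column is Assigned divided by that number. The Python sketch below checks both relationships for a few rows; the row values are copied from the table and `check_row` is a hypothetical helper.

```python
# (source, protein_interactors, assigned, arbitrary, new, unassigned, percent),
# copied from the table above.
ROWS = [
    ("bind", 285482, 273658, 0, 7887, 3937, 95.8582),
    ("dip", 20728, 18527, 1246, 477, 478, 89.3815),
    ("grid", 29599, 19354, 10141, 6, 98, 65.3873),
    ("All", 723948, 691371, 11566, 15144, 5867, 95.5001),
]


def check_row(total, assigned, arbitrary, new, unassigned, percent):
    """True if the four outcome columns partition the interactors and the
    % column equals assigned / total, rounded to 4 decimal places."""
    partitions = assigned + arbitrary + new + unassigned == total
    pct_matches = abs(round(100.0 * assigned / total, 4) - percent) < 1e-9
    return partitions and pct_matches


for name, *values in ROWS:
    print(name, check_row(*values))
```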

ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_score Binary_flag String_score Score_class Proteins Percentage BIND BioGrid DIP MINT HPRD OPHID MPPI MPACT IntAct CORUM
1 000000000000000001 P 1 565134 78.0628% 232690 7520 0 71606 0 125715 3023 30666 93914 0
554 000000001000101010 SVGO 1 624 0.0862% 0 0 0 0 624 0 0 0 0 0
66 000000000001000010 SD 1 2 0.0003% 0 2 0 0 0 0 0 0 0 0
65 000000000001000001 PD 1 9581 1.3234% 8084 1494 0 3 0 0 0 0 0 0
42 000000000000101010 SVG 1 163 0.0225% 0 0 0 0 163 0 0 0 0 0
8193 000010000000000001 PI 1 49 0.0068% 0 2 0 0 0 0 0 0 47 0
129 000000000010000001 PM 1 523 0.0722% 473 1 0 0 0 0 32 0 17 0
8194 000010000000000010 SI 1 12399 1.7127% 12336 63 0 0 0 0 0 0 0 0
10 000000000000001010 SV 1 13 0.0018% 0 0 2 4 0 0 0 0 7 0
2 000000000000000010 S 1 35124 4.8517% 0 7473 17447 252 2772 0 0 6927 253 0
130 000000000010000010 SM 1 570 0.0787% 0 570 0 0 0 0 0 0 0 0
778 000000001100001010 SVO+ 2 1 0.0001% 0 0 0 0 0 0 0 0 1 0
774 000000001100000110 SUO+ 2 1 0.0001% 0 0 0 0 0 0 0 0 1 0
16385 000100000000000001 PE 2 184 0.0254% 0 0 0 0 0 0 0 0 184 0
16386 000100000000000010 SE 2 5414 0.7478% 5414 0 0 0 0 0 0 0 0 0
773 000000001100000101 PUO+ 2 12 0.0017% 0 0 0 3 0 1 0 0 8 0
5 000000000000000101 PU 2 22812 3.1511% 0 0 0 264 0 19519 320 2519 190 0
6 000000000000000110 SU 2 767 0.1059% 0 690 60 4 5 0 0 0 8 0
145 000000000010010001 PTM 3 170 0.0235% 132 0 0 0 0 0 35 0 3 0
8210 000010000000010010 STI 3 905 0.1250% 855 50 0 0 0 0 0 0 0 0
8209 000010000000010001 PTI 3 12 0.0017% 0 0 0 0 0 0 0 0 12 0
17 000000000000010001 PT 3 26392 3.6456% 11873 0 0 1604 0 122 47 0 2456 10290
18 000000000000010010 ST 3 8547 1.1806% 0 1487 1015 0 6042 0 0 0 3 0
26 000000000000011010 SVT 3 1 0.0001% 0 0 0 0 0 0 0 0 1 0
81 000000000001010001 PTD 3 1487 0.2054% 1486 0 0 1 0 0 0 0 0 0
146 000000000010010010 STM 3 1 0.0001% 0 1 0 0 0 0 0 0 0 0
790 000000001100010110 SUTO+ 4 1 0.0001% 0 0 0 0 1 0 0 0 0 0
16401 000100000000010001 PTE 4 3 0.0004% 0 0 0 0 0 0 0 0 3 0
789 000000001100010101 PUTO+ 4 14 0.0019% 0 0 0 0 0 0 0 0 14 0
16402 000100000000010010 STE 4 315 0.0435% 315 0 0 0 0 0 0 0 0 0
22 000000000000010110 SUT 4 18 0.0025% 0 1 3 0 14 0 0 0 0 0
131073 100000000000000001 PQ 5 2 0.0003% 0 0 0 0 0 0 0 0 2 0
21 000000000000010101 PUT 5 33 0.0046% 0 0 0 4 0 5 0 0 0 24
12546 000011000100000010 SLI+ 5 6716 0.9277% 0 6716 0 0 0 0 0 0 0 0
131077 100000000000000101 PUQ 5 1 0.0001% 0 0 0 0 0 0 0 0 1 0
131089 100000000000010001 PTQ 5 38 0.0052% 0 0 0 0 0 0 0 0 38 0
4373 000001000100010101 PUTL+ 5 9 0.0012% 0 0 0 1 0 0 0 0 8 0
4354 000001000100000010 SL+ 5 4208 0.5813% 0 3014 1194 0 0 0 0 0 0 0
1802 000000011100001010 SVOX+ 5 3 0.0004% 0 0 0 0 0 0 0 0 3 0
810 000000001100101010 SVGO+ 5 55 0.0076% 0 0 0 0 55 0 0 0 0 0
4357 000001000100000101 PUL+ 5 84 0.0116% 0 0 0 0 0 84 0 0 0 0
4374 000001000100010110 SUTL+ 5 52 0.0072% 0 0 52 0 0 0 0 0 0 0
4394 000001000100101010 SVGL+ 5 52 0.0072% 0 0 0 0 52 0 0 0 0 0
4482 000001000110000010 SML+ 5 411 0.0568% 0 411 0 0 0 0 0 0 0 0
5381 000001010100000101 PUXL+ 5 29 0.0040% 0 0 0 0 0 19 0 0 10 0
5382 000001010100000110 SUXL+ 5 3 0.0004% 0 0 0 0 3 0 0 0 0 0
5386 000001010100001010 SVXL+ 5 2 0.0003% 0 0 0 1 0 0 0 0 1 0
86274 010101000100000010 SLEN+ 6 3 0.0004% 0 2 1 0 0 0 0 0 0 0
81938 010100000000010010 STEN 6 24 0.0033% 24 0 0 0 0 0 0 0 0 0
81937 010100000000010001 PTEN 6 3 0.0004% 3 0 0 0 0 0 0 0 0 0
81922 010100000000000010 SEN 6 5766 0.7965% 5397 4 364 1 0 0 0 0 0 0
81921 010100000000000001 PEN 6 2858 0.3948% 2462 0 0 49 0 98 30 0 217 2
65601 010000000001000001 PDN 6 1 0.0001% 1 0 0 0 0 0 0 0 0 0
65553 010000000000010001 PTN 6 10 0.0014% 0 0 0 0 0 0 0 0 10 0
65537 010000000000000001 PN 6 6478 0.8948% 0 0 112 2628 42 601 0 0 3095 0
196625 110000000000010001 PTNQ 6 1 0.0001% 0 0 0 0 0 0 0 0 1 0
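
In the table above, Decimal_score is Binary_flag read as a base-2 number, and each set bit contributes one letter to String_score. The bit-to-letter mapping used below is inferred from the rows of this table, not taken from iRefIndex documentation: for example, bit 0 (value 1) is P, bit 1 (2) is S, bit 2 (4) is U, bit 8 (256) is + and bit 9 (512) is O, while bits 11 and 15 never occur and are left unmapped. Letters appear in increasing bit order, except that + is always written last. A Python sketch of a decoder under those assumptions (the function names are hypothetical):

```python
# Bit position -> feature letter. ASSUMPTION: inferred from the ROG summary
# table on this page; bits 11 and 15 never appear there, so they are unmapped.
BIT_TO_CHAR = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 16: "N", 17: "Q",
}
WIDTH = 18  # number of digits in the Binary_flag column


def binary_flag(score: int) -> str:
    """Zero-padded 18-digit binary string, as in the Binary_flag column."""
    if score < 0 or score >> WIDTH:
        raise ValueError("score does not fit in 18 bits")
    return format(score, f"0{WIDTH}b")


def string_score(score: int) -> str:
    """Letters for each set bit in increasing bit order, with '+' moved to the end."""
    if score < 0 or score >> WIDTH:
        raise ValueError("score does not fit in 18 bits")
    letters = []
    plus = False
    for bit in range(WIDTH):
        if (score >> bit) & 1:
            char = BIT_TO_CHAR.get(bit)
            if char is None:
                raise ValueError(f"bit {bit} has no known feature letter")
            if char == "+":
                plus = True  # '+' is always written last in String_score
            else:
                letters.append(char)
    return "".join(letters) + ("+" if plus else "")


for score in (1, 554, 790, 5381, 86274):
    print(score, binary_flag(score), string_score(score))
```

Applied to each Decimal_score in the table, this should reproduce the Binary_flag and String_score columns; it does not attempt to derive Score_class.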

Scores (Corresponds to Table 2 in PMID 18823568)

Character Description of feature (when the value is 1) Frequency
D The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made. 11071 (1.5417%)
E The protein reference was a retired NCBI identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. 14570 (2.029%)
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. 894 (0.1245%)
L More than one assignment is possible (see +). The assignment with the largest (L) SEGUID is arbitrarily chosen (see Methods). 11569 (1.6111%)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. 1675 (0.2333%)
+ More than one assignment is possible (+). This can arise in one of three ways: 1) the reference supplied by the interaction record requires updating, but more than one possibility exists (for example, Q7XJL8 was found to be a secondary accession in three separate UniProt records: Q3EBZ2, Q6DR20 and Q8GWA9); 2) the secondary references supplied by the interaction record point to more than one unique protein sequence; 3) an EntrezGene identifier provided in the interaction record as a protein reference points to more than one protein product. An attempt is made to resolve this ambiguity, as indicated by ROG score features O, X or L. 11656 (1.6232%)
N The protein reference, taxonomy identifier and sequence provided in the interaction record are used to make a new entry in the SEGUID table, and the protein interactor is assigned the newly (N) generated ROG identifier. 15144 (2.109%)
O More than one assignment is possible (see +). The assignment chosen has a SEGUID identical to the SEGUID of the original (O) sequence provided in the interaction record. 711 (0.099%)
I The protein reference used was an NCBI GenInfo Identifier (I). 20081 (2.7965%)
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession, which was updated (U) to a primary UniProt accession in order to make the assignment. 23836 (3.3194%)
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from the one found in the protein sequence record. This discrepancy was tolerated and the assignment was made. 38036 (5.2969%)
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. 914 (0.1273%)
Q The protein reference used to make the assignment was of type 'see-also' (PSI-MI path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'). 42 (0.0058%)
P The interaction record's primary (P) reference for the protein was used to make the assignment. 635920 (88.5583%)
S One of the interaction record's secondary (S) references for the protein was used to make the assignment. 82161 (11.4417%)
X More than one assignment is possible (see +). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. 37 (0.0052%)
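
The P and S percentages in the table above sum to 100%, and P + S = 635920 + 82161 = 718081, which equals Assigned + Arbitrary + New in the All row of Table 3 (691371 + 11566 + 15144). The Frequency percentages therefore appear to be relative to the 718081 interactors that received a ROG identifier. A short Python sketch recomputing a few of them under that assumption:

```python
# Counts copied from the Frequency column of the Scores table above.
FREQUENCIES = {"P": 635920, "S": 82161, "T": 38036, "U": 23836, "N": 15144, "X": 37}

# ASSUMPTION: percentages are relative to P + S, i.e. every interactor that
# was assigned a ROG through either a primary or a secondary reference.
total = FREQUENCIES["P"] + FREQUENCIES["S"]

for char, count in FREQUENCIES.items():
    print(f"{char}: {count} ({100.0 * count / total:.4f}%)")
```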