Difference between revisions of "Statistics iRefIndex 8.0"


Revision as of 15:25, 28 December 2010

Summary

Last updated: 2010-12-28

  • Total interaction source records: 1,057,642
  • Total distinct interactions (based on RIGID): 480,368 (45.4188% of total interaction source records)
  • Total distinct proteins (based on ROGID): 91,936

This page lists statistics for our internal version of iRefIndex, which includes all of the data from the sources used for the current build (see Sources_iRefIndex_8.0).

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in original source)

NCBI taxonomy identifier  Scientific_name                       Number_of_interactions
559292                    Saccharomyces cerevisiae S288c        162083
9606                      Homo sapiens                          130732
4932                      Saccharomyces cerevisiae              60164
7227                      Drosophila melanogaster               46917
40674                     Mammalia                              36385
10090                     Mus musculus                          18085
83333                     Escherichia coli K-12                 17224
6239                      Caenorhabditis elegans                13831
4896                      Schizosaccharomyces pombe             13460
197                       Campylobacter jejuni                  12025
3702                      Arabidopsis thaliana                  7994
10116                     Rattus norvegicus                     6688
562                       Escherichia coli                      5294
632                       Yersinia pestis                       3818
160                       Treponema pallidum                    3647
  • Full list [[1]]

Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)

NCBI taxonomy identifier  Scientific_name                       Number_of_interactions
4932                      Saccharomyces cerevisiae              186503
9606                      Homo sapiens                          138480
7227                      Drosophila melanogaster               46921
83333                     Escherichia coli K-12                 17008
10090                     Mus musculus                          14615
6239                      Caenorhabditis elegans                13831
4896                      Schizosaccharomyces pombe             13471
197                       Campylobacter jejuni                  12025
3702                      Arabidopsis thaliana                  7996
10116                     Rattus norvegicus                     5057
155864                    Escherichia coli O157:H7 str. EDL933  4924
632                       Yersinia pestis                       3822
160                       Treponema pallidum                    3647
1148                      Synechocystis sp. PCC 6803            3229
1392                      Bacillus anthracis                    3087
  • Full list: http://irefindex.uio.no/wikifiles//images/5/57/Interactions_by_taxonomy_beta8_corected.pdf

Interactions (Corresponds to Table 6 in PMID 18823568)

BIND              62862
GRID              24223     245516
DIP               26437     38962     89630
INTACT            24797     34903     37889     131339
MINT              22660     41142     36873     48037     85802
HPRD              2653      11892     1574      6777      5537      40569
OPHID             2346      8736      1442      7366      6822      13071     47522
MPACT             7084      8466      7002      6171      6470      0         0         13328
MPPI              381       145       63        95        93        212       183       0         830
CORUM             238       172       114       238       119       342       236       0         15        2607
BIND_TRANSLATION  47304     22231     25048     22972     21585     2         0         6883      113       14        49527
                  BIND      GRID      DIP       INTACT    MINT      HPRD      OPHID     MPACT     MPPI      CORUM     BIND_TRANSLATION
                  (9440)    (170793)  (27921)   (61935)   (18079)   (19394)   (28503)   (1152)    (267)     (1918)    (1812)
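
The lower-triangular layout above can be read programmatically. The following is a minimal sketch, assuming (as in Table 6 of PMID 18823568) that a diagonal cell holds the number of distinct interactions for a source and an off-diagonal cell holds the number shared by the row and column sources. The names `LOWER_TRIANGLE`, `parse_lower_triangle` and `shared` are illustrative and not part of iRefIndex.

```python
# Lower-triangular matrix copied from the Interactions table above.
# Each row lists one value per source, in the same order as the row labels.
LOWER_TRIANGLE = """\
BIND 62862
GRID 24223 245516
DIP 26437 38962 89630
INTACT 24797 34903 37889 131339
MINT 22660 41142 36873 48037 85802
HPRD 2653 11892 1574 6777 5537 40569
OPHID 2346 8736 1442 7366 6822 13071 47522
MPACT 7084 8466 7002 6171 6470 0 0 13328
MPPI 381 145 63 95 93 212 183 0 830
CORUM 238 172 114 238 119 342 236 0 15 2607
BIND_TRANSLATION 47304 22231 25048 22972 21585 2 0 6883 113 14 49527
"""


def parse_lower_triangle(text):
    """Return (source names, {frozenset of one or two names: count})."""
    rows = [line.split() for line in text.strip().splitlines()]
    names = [row[0] for row in rows]
    table = {}
    for i, row in enumerate(rows):
        values = [int(v) for v in row[1:]]
        if len(values) != i + 1:
            raise ValueError(f"row {names[i]} should have {i + 1} values")
        for j, value in enumerate(values):
            # A frozenset key makes the lookup independent of argument order.
            table[frozenset((names[i], names[j]))] = value
    return names, table


def shared(table, source_a, source_b):
    """Look up the cell for a pair of sources (or the diagonal if equal)."""
    return table[frozenset((source_a, source_b))]


NAMES, TABLE = parse_lower_triangle(LOWER_TRIANGLE)
print(shared(TABLE, "BIND", "GRID"))
print(shared(TABLE, "CORUM", "CORUM"))
```

Keying the dictionary on a frozenset stores each pair once, so `shared(TABLE, "BIND", "GRID")` and `shared(TABLE, "GRID", "BIND")` return the same cell.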

Interactors

BIND              40752
GRID              17804     31832
DIP               17676     18297     29980
INTACT            18892     22275     24274     51140
MINT              16979     18412     19687     25467     31660
HPRD              3288      6324      3868      6321      4934      9851
OPHID             3377      5783      4203      6770      5379      6712      9642
MPACT             4718      4610      4725      4880      4796      0         1         4978
MPPI              679       454       469       626       562       369       422       0         864
CORUM             2049      2246      2208      3137      2528      2032      2244      0         416       4365
BIND_TRANSLATION  28781     14239     14973     15386     14073     758       794       4303      292       659       30021
                  BIND      GRID      DIP       INTACT    MINT      HPRD      OPHID     MPACT     MPPI      CORUM     BIND_TRANSLATION
                  (6570)    (5419)    (2336)    (15150)   (3987)    (1185)    (1080)    (15)      (39)      (555)     (1003)


Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source            Total records  Protein-only interactors (PPI)  Assigned to RIGID   Unique RIGIDs
bind              193648         93957                           91228 (97.0955%)    62862 (68.9065%)
grid              362355         357976                          357524 (99.8737%)   245516 (68.6712%)
dip               90994          90994                           89911 (98.8098%)    89630 (99.6875%)
intact            156558         154962                          154305 (99.5760%)   131339 (85.1165%)
mint              122775         122775                          122298 (99.6115%)   85802 (70.1581%)
HPRD              83022          83022                           83022 (100.0000%)   40569 (48.8654%)
ophid             73257          73257                           73160 (99.8676%)    47522 (64.9563%)
MPACT             16504          16504                           16293 (98.7215%)    13328 (81.8020%)
MPPI              1814           1814                            1699 (93.6604%)     830 (48.8523%)
CORUM             2844           2844                            2844 (100.0000%)    2607 (91.6667%)
BIND_Translation  149918         66583                           65358 (98.1602%)    49527 (75.7780%)
ALL               1253689        1064688                         1057642 (99.3382%)  480368 (45.4188%)
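
The percentages in this table can be reproduced from the raw counts. The sketch below assumes, based on the numbers themselves, that the "Assigned to RIGID" percentage is taken relative to the protein-only count and that the "Unique RIGIDs" percentage is taken relative to the assigned count. Only three rows are copied from the table for brevity; `ROWS`, `pct` and `mapping_percentages` are illustrative names.

```python
# (source, total_records, protein_only, assigned_to_rigid, unique_rigids)
# A subset of rows copied from the table above.
ROWS = [
    ("bind", 193648, 93957, 91228, 62862),
    ("HPRD", 83022, 83022, 83022, 40569),
    ("ALL", 1253689, 1064688, 1057642, 480368),
]


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to four decimals."""
    return round(100.0 * part / whole, 4)


def mapping_percentages(row):
    """Return (source, assigned %, unique %) under the stated assumption."""
    source, _total, protein_only, assigned, unique = row
    return source, pct(assigned, protein_only), pct(unique, assigned)


RESULTS = {}
for row in ROWS:
    source, assigned_pct, unique_pct = mapping_percentages(row)
    RESULTS[source] = (assigned_pct, unique_pct)
    print(f"{source}: assigned {assigned_pct:.4f}%, unique {unique_pct:.4f}%")
```

Under that assumption, the ALL row gives the same 45.4188% that the Summary reports for distinct interactions.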


Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source            Protein_Interactors  Assigned  %         Arbitrary  N_and_Y  Unassigned  Unique proteins
bind              285482               272804    95.5591   0          8705     3924        40752
BIND_Translation  201856               186453    92.3693   61         13807    1497        30021
CORUM             12916                12916     100.0000  0          0        0           4365
dip               30978                29430     95.0029   641        425      478         29980
grid              39352                32304     82.0899   6833       3        212         31832
HPRD              123812               120541    97.3581   3115       156      0           9851
intact            129043               124852    96.7522   37         3780     328         51140
mint              87509                83727     95.6782   2          3639     138         31660
MPACT             40349                40118     99.4275   0          1        230         4978
MPPI              3628                 3457      95.2867   0          41       130         864
ophid             146423               145174    99.1470   241        1002     6           9642
All               1101348              1051916   95.5117   10930      31559    6943        91936


ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_score Binary_flag String_score Score_class Proteins Percentage bind grid dip intact mint HPRD ophid MPACT BIND_Translation MPPI CORUM
131201 100000000010000001 PMQ -1 21014 1.9080% 0 0 0 0 0 0 0 0 21014 0 0
131217 100000000010010001 PTMQ -1 3856 0.3501% 0 0 0 0 0 0 0 0 3856 0 0
147458 100100000000000010 SEQ -1 1873 0.1701% 0 0 0 0 0 0 0 0 1873 0 0
147474 100100000000010010 STEQ -1 507 0.0460% 0 0 0 1 0 0 0 0 506 0 0
212993 110100000000000001 PENQ -1 258 0.0234% 0 0 0 0 0 0 0 0 258 0 0
139265 100010000000000001 PIQ -1 234 0.0212% 0 0 0 0 0 0 0 0 234 0 0
163969 101000000010000001 PMYQ -1 211 0.0192% 0 0 0 0 0 0 0 0 211 0 0
147457 100100000000000001 PEQ -1 44 0.0040% 0 0 0 0 0 0 0 0 44 0 0
139281 100010000000010001 PTIQ -1 41 0.0037% 0 0 0 0 0 0 0 0 41 0 0
196737 110000000010000001 PMNQ -1 40 0.0036% 0 0 0 0 0 0 0 0 40 0 0
163985 101000000010010001 PTMYQ -1 27 0.0025% 0 0 0 0 0 0 0 0 27 0 0
196609 110000000000000001 PNQ -1 17 0.0015% 0 0 0 1 0 0 0 0 16 0 0
213009 110100000000010001 PTENQ -1 14 0.0013% 0 0 0 0 0 0 0 0 14 0 0
16530 000100000010010010 STME -1 13 0.0012% 13 0 0 0 0 0 0 0 0 0 0
16514 000100000010000010 SME -1 3 0.0003% 3 0 0 0 0 0 0 0 0 0 0
196753 110000000010010001 PTMNQ -1 2 0.0002% 0 0 0 0 0 0 0 0 2 0 0
196673 110000000001000001 PDNQ -1 2 0.0002% 0 0 0 0 0 0 0 0 2 0 0
4358 000001000100000110 SUL+ -1 1 0.0001% 0 0 1 0 0 0 0 0 0 0 0
1 000000000000000001 P 1 708614 64.3406% 231926 19822 0 121076 81586 0 125682 30666 81917 3023 12916
2 000000000000000010 S 1 57858 5.2534% 0 5 28292 361 268 21415 0 6935 582 0 0
554 000000001000101010 SVGO 1 33637 3.0542% 0 0 2 0 0 33635 0 0 0 0 0
8194 000010000000000010 SI 1 12336 1.1201% 12336 0 0 0 0 0 0 0 0 0 0
65 000000000001000001 PD 1 8075 0.7332% 8073 0 0 0 2 0 0 0 0 0 0
42 000000000000101010 SVG 1 2422 0.2199% 0 0 104 0 0 2318 0 0 0 0 0
41 000000000000101001 PVG 1 1895 0.1721% 0 1895 0 0 0 0 0 0 0 0 0
129 000000000010000001 PM 1 548 0.0498% 473 0 0 43 0 0 0 0 0 32 0
10 000000000000001010 SV 1 172 0.0156% 0 0 5 144 23 0 0 0 0 0 0
8193 000010000000000001 PI 1 59 0.0054% 0 0 0 51 8 0 0 0 0 0 0
9 000000000000001001 PV 1 6 0.0005% 0 0 0 1 5 0 0 0 0 0 0
66 000000000001000010 SD 1 4 0.0004% 0 4 0 0 0 0 0 0 0 0 0
130 000000000010000010 SM 1 1 0.0001% 0 0 0 0 0 1 0 0 0 0 0
5 000000000000000101 PU 2 23266 2.1125% 0 0 0 372 116 0 19356 2517 585 320 0
16386 000100000000000010 SE 2 5405 0.4908% 5405 0 0 0 0 0 0 0 0 0 0
16385 000100000000000001 PE 2 156 0.0142% 0 0 0 147 9 0 0 0 0 0 0
6 000000000000000110 SU 2 129 0.0117% 0 0 100 17 4 8 0 0 0 0 0
773 000000001100000101 PUO+ 2 18 0.0016% 0 0 0 9 0 0 9 0 0 0 0
770 000000001100000010 SO+ 2 6 0.0005% 0 0 0 6 0 0 0 0 0 0 0
1797 000000011100000101 PUOX+ 2 2 0.0002% 0 0 0 2 0 0 0 0 0 0 0
774 000000001100000110 SUO+ 2 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0
778 000000001100001010 SVO+ 2 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0
17 000000000000010001 PT 3 89640 8.1391% 11776 10558 0 2543 1706 0 122 0 62888 47 0
18 000000000000010010 ST 3 56632 5.1421% 0 20 925 1 0 55686 0 0 0 0 0
81 000000000001010001 PTD 3 1496 0.1358% 1496 0 0 0 0 0 0 0 0 0 0
8210 000010000000010010 STI 3 855 0.0776% 855 0 0 0 0 0 0 0 0 0 0
145 000000000010010001 PTM 3 189 0.0172% 132 0 0 22 0 0 0 0 0 35 0
8209 000010000000010001 PTI 3 13 0.0012% 0 0 0 13 0 0 0 0 0 0 0
146 000000000010010010 STM 3 3 0.0003% 0 0 0 0 0 3 0 0 0 0 0
26 000000000000011010 SVT 3 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0
16402 000100000000010010 STE 4 317 0.0288% 316 0 1 0 0 0 0 0 0 0 0
789 000000001100010101 PUTO+ 4 14 0.0013% 0 0 0 14 0 0 0 0 0 0 0
22 000000000000010110 SUT 4 9 0.0008% 0 0 1 0 0 8 0 0 0 0 0
16401 000100000000010001 PTE 4 2 0.0002% 0 0 0 2 0 0 0 0 0 0 0
131073 100000000000000001 PQ 5 12109 1.0995% 0 0 0 6 0 0 0 0 12103 0 0
810 000000001100101010 SVGO+ 5 7467 0.6780% 0 0 0 0 0 7467 0 0 0 0 0
4393 000001000100101001 PVGL+ 5 6826 0.6198% 0 6826 0 0 0 0 0 0 0 0 0
4394 000001000100101010 SVGL+ 5 3219 0.2923% 0 0 104 0 0 3115 0 0 0 0 0
131089 100000000000010001 PTQ 5 816 0.0741% 0 0 0 12 0 0 0 0 804 0 0
4354 000001000100000010 SL+ 5 535 0.0486% 0 7 526 2 0 0 0 0 0 0 0
4357 000001000100000101 PUL+ 5 226 0.0205% 0 0 0 0 0 0 222 0 4 0 0
4373 000001000100010101 PUTL+ 5 65 0.0059% 0 0 0 8 0 0 0 0 57 0 0
5381 000001010100000101 PUXL+ 5 29 0.0026% 0 0 0 9 1 0 19 0 0 0 0
5386 000001010100001010 SVXL+ 5 18 0.0016% 0 0 0 17 1 0 0 0 0 0 0
21 000000000000010101 PUT 5 11 0.0010% 0 0 0 2 0 0 5 0 4 0 0
4374 000001000100010110 SUTL+ 5 10 0.0009% 0 0 10 0 0 0 0 0 0 0 0
1802 000000011100001010 SVOX+ 5 4 0.0004% 0 0 0 4 0 0 0 0 0 0 0
5378 000001010100000010 SXL+ 5 1 0.0001% 0 0 0 1 0 0 0 0 0 0 0
32769 001000000000000001 PY 6 14728 1.3373% 3325 1 0 2548 2824 0 732 0 5293 5 0
65601 010000000001000001 PDN 6 8195 0.7441% 52 0 0 2 247 0 253 0 7636 5 0
81922 010100000000000010 SEN 6 4422 0.4015% 4422 0 0 0 0 0 0 0 0 0 0
65537 010000000000000001 PN 6 1973 0.1791% 96 0 189 923 529 153 17 0 55 11 0
32833 001000000001000001 PDY 6 755 0.0686% 755 0 0 0 0 0 0 0 0 0 0
32770 001000000000000010 SY 6 386 0.0350% 0 2 232 94 25 0 0 1 32 0 0
73729 010010000000000001 PIN 6 202 0.0183% 0 0 0 202 0 0 0 0 0 0 0
163841 101000000000000001 PYQ 6 117 0.0106% 0 0 0 0 0 0 0 0 117 0 0
32785 001000000000010001 PTY 6 81 0.0074% 35 0 0 0 0 0 0 0 46 0 0
81921 010100000000000001 PEN 6 28 0.0025% 0 0 0 1 11 0 0 0 0 16 0
196625 110000000000010001 PTNQ 6 23 0.0021% 0 0 0 1 0 0 0 0 22 0 0
65617 010000000001010001 PTDN 6 21 0.0019% 0 0 0 0 0 0 0 0 21 0 0
81938 010100000000010010 STEN 6 20 0.0018% 19 0 1 0 0 0 0 0 0 0 0
163857 101000000000010001 PTYQ 6 15 0.0014% 0 0 0 0 0 0 0 0 15 0 0
65553 010000000000010001 PTN 6 12 0.0011% 0 0 1 8 0 3 0 0 0 0 0
81937 010100000000010001 PTEN 6 5 0.0005% 0 0 0 0 3 0 0 0 0 2 0
32897 001000000010000001 PMY 6 2 0.0002% 0 0 0 0 0 0 0 0 0 2 0
147473 100100000000010001 PTEQ 6 2 0.0002% 0 0 0 0 0 0 0 0 2 0 0
32786 001000000000010010 STY 6 2 0.0002% 0 0 2 0 0 0 0 0 0 0 0
81986 010100000001000010 SDEN 6 1 0.0001% 1 0 0 0 0 0 0 0 0 0 0
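
The Decimal_score, Binary_flag and String_score columns encode the same information: each feature character corresponds to one bit. The sketch below decodes a decimal score into the other two forms. The bit positions in `FLAG_BITS` and the character order in `DISPLAY_ORDER` are not stated on this page; they are inferred from the rows of the table above (for example 1 = P, 2 = S, 5 = PU, 554 = SVGO), so treat them as assumptions. Bit 11 (value 2048) does not occur in the table and is left unmapped.

```python
# Bit position -> feature character, inferred from the table rows (assumption).
FLAG_BITS = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 15: "Y",
    16: "N", 17: "Q",
}

# Character order that reproduces every String_score in the table (assumption).
DISPLAY_ORDER = "PSUVTGMDIEYNOXLQ+"

KNOWN_MASK = sum(1 << bit for bit in FLAG_BITS)


def binary_flag(decimal_score, width=18):
    """Render the score as a zero-padded binary string, as in Binary_flag."""
    return format(decimal_score, f"0{width}b")


def string_score(decimal_score):
    """Translate the set bits of a decimal score into feature characters."""
    if decimal_score < 0 or decimal_score & ~KNOWN_MASK:
        raise ValueError(f"score {decimal_score} uses an unmapped bit")
    present = {
        char for bit, char in FLAG_BITS.items() if decimal_score & (1 << bit)
    }
    return "".join(char for char in DISPLAY_ORDER if char in present)


for score in (1, 554, 1797, 131201):
    print(score, binary_flag(score), string_score(score))
```

Raising an error for unmapped bits, rather than silently ignoring them, makes it obvious if a later build introduces a feature that this inferred mapping does not cover.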



Scores (Corresponds to Table 2 in PMID 18823568)

Character Description of feature (when the value is 1) Frequency
D The source database (D) listed in the interaction record differs from what is expected for the protein's accession. In specific cases, this difference is tolerated and the assignment is made. 18549 (1.6951%)
E The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt. 13070 (1.1944%)
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. 55466 (5.0688%)
L More than one assignment is possible (see +), e.g. isoforms for a gene identifier. In such a situation, references are picked using a ranking system (first RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.) 10930 (0.9988%)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. 25909 (2.3677%)
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity, as indicated by ROG score features O, X or L. 18443 (1.6854%)
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. 15235 (1.3923%)
O More than one assignment is possible (see +). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. 41150 (3.7605%)
I The protein reference used was an NCBI GenInfo Identifier (I). 13740 (1.2556%)
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. 23781 (2.1732%)
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. 154714 (14.1386%)
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. 55668 (5.0873%)
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. 41222 (3.7671%)
P The interaction record's primary (P) reference for the protein was used to make the assignment. 905994 (82.7948%)
S One of the interaction record's secondary (S) references for the protein was used to make the assignment. 188271 (17.2052%)
Y The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009). 16324 (1.4918%)
X More than one assignment is possible (see +). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. 54 (0.0049%)
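
The Frequency percentages do not say which total they are relative to. The P and S percentages sum to 100%, which suggests that the denominator is the number of interactors assigned through either a primary or a secondary reference (P + S = 1,094,265). The following sketch checks a few rows under that assumption, which is inferred from the numbers and not confirmed by the page. `FEATURE_COUNTS` holds a subset of counts copied from the table; the names are illustrative.

```python
# Feature counts copied from the Scores table above (subset).
FEATURE_COUNTS = {"P": 905994, "S": 188271, "T": 154714, "D": 18549, "X": 54}


def frequencies(counts):
    """Percentage of each feature relative to the P + S total (assumption)."""
    total = counts["P"] + counts["S"]
    return {char: round(100.0 * n / total, 4) for char, n in counts.items()}


FREQ = frequencies(FEATURE_COUNTS)
for char, value in FREQ.items():
    print(f"{char}: {value:.4f}%")
```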



All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).