Difference between revisions of "Statistics iRefIndex 7.0"

Revision as of 20:17, 8 March 2010

Summary

Last updated : 08 March, 2010

  • Total interactions : 926,113
  • Total distinct interactions (based on RIGID): 433,617 ( 46.8 % of total interactions)
  • Total distinct proteins (based on ROGID) : 83,234
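
The percentage quoted for distinct interactions can be reproduced from the two counts above. A minimal Python sketch (the variable names are illustrative and not part of iRefIndex):

```python
# Counts copied from the summary list above.
total_interactions = 926_113
distinct_interactions = 433_617  # distinct by RIGID

# Share of all interaction records that remain after collapsing
# records with the same RIGID.
distinct_share = 100 * distinct_interactions / total_interactions
print(f"{distinct_share:.1f} % of total interactions")
```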

This page lists statistics for our internal version of iRefIndex that includes all of the data from sources used for the current build Sources_iRefIndex_7.0. This full build of the iRefIndex contains data that cannot be redistributed according to usage policies of the source databases (namely, from DIP, HPRD, CORUM and MPact databases). Please contact ian.donaldson at biotek.uio.no if you would like to obtain a copy of the full iRefIndex build under an academic, collaborative agreement.

The data that are freely available at ftp://ftp.no.embnet.org/irefindex/data/archive/release_7.0/ are a subset of the full build that we can freely redistribute according to the usage policies of the source databases. Please refer to http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_7.0 for statistics that are applicable to this free dataset.

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in original source)

  • Full list [[1]]

Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)

  • Full list [[2]]

Interactions (Corresponds to Table 6 in PMID 18823568)

BIND              62903
BIOGRID           23140     226496
DIP               26433     32865     61680
HPRD              3029      11612     843       39966
INTACT            24214     32627     25059     8652      117840
MINT              22100     37903     30107     6758      46689     79710
MPACT             6953      8271      6847      0         6147      6430      13321
MPPI              376       141       41        304       92        76        0         830
OPHID             2295      8356      882       18093     7330      6482      0         183       47530
CORUM             232       167       64        549       220       107       0         15        236       2607
I2D               0         0         0         0         0         0         0         0         0         0         0
BIND_TRANSLATION  42739     22497     18370     3980      19401     17755     2145      369       2928      193       0         48765
                  BIND      BIOGRID   DIP       HPRD      INTACT    MINT      MPACT     MPPI      OPHID     CORUM     I2D       BIND_TRANSLATION
                  (10382)   (157990)  (16953)   (13960)   (58119)   (16958)   (1278)    (240)     (26312)   (1836)    (0)       (2633)
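
One way to read the matrix above is sketched below in Python. It assumes that each diagonal cell is the number of distinct interactions (RIGIDs) in a source and that each off-diagonal cell is the number shared by the row and column sources. The diagonal values do match the Unique RIGIDs reported per source in the RIG mapping summary on this page, but the off-diagonal reading is an assumption. The names SOURCES, LOWER_TRIANGLE and shared are illustrative only.

```python
SOURCES = ["BIND", "BIOGRID", "DIP", "HPRD", "INTACT", "MINT",
           "MPACT", "MPPI", "OPHID", "CORUM", "I2D", "BIND_TRANSLATION"]

# Lower triangle of the interactions matrix, one list per row, copied from above.
LOWER_TRIANGLE = [
    [62903],
    [23140, 226496],
    [26433, 32865, 61680],
    [3029, 11612, 843, 39966],
    [24214, 32627, 25059, 8652, 117840],
    [22100, 37903, 30107, 6758, 46689, 79710],
    [6953, 8271, 6847, 0, 6147, 6430, 13321],
    [376, 141, 41, 304, 92, 76, 0, 830],
    [2295, 8356, 882, 18093, 7330, 6482, 0, 183, 47530],
    [232, 167, 64, 549, 220, 107, 0, 15, 236, 2607],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [42739, 22497, 18370, 3980, 19401, 17755, 2145, 369, 2928, 193, 0, 48765],
]


def shared(source_a, source_b):
    """Return the matrix cell for a pair of sources, in either order."""
    i, j = SOURCES.index(source_a), SOURCES.index(source_b)
    if i < j:
        i, j = j, i  # the matrix is symmetric; only the lower triangle is stored
    return LOWER_TRIANGLE[i][j]


# Example: overlap between HPRD and OPHID relative to HPRD's own total.
overlap = shared("HPRD", "OPHID")
print(overlap, f"{100 * overlap / shared('HPRD', 'HPRD'):.1f}% of HPRD")
```

The same lookup works for the Interactors matrix if its rows are substituted for LOWER_TRIANGLE.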

Interactors

BIND              40790
BIOGRID           17313     29090
DIP               15526     14440     20171
HPRD              3407      6005      1236      9750
INTACT            18310     20511     15628     6189      44616
MINT              16422     17286     15000     4935      23979     29217
MPACT             4655      4547      4646      0         4877      4761      4972
MPPI              675       435       289       429       592       520       0         865
OPHID             3307      5444      1226      7453      6036      4918      1         421       9645
CORUM             2028      2064      842       2293      2908      2225      0         415       2245      4365
I2D               0         0         0         0         0         0         0         0         0         0         0
BIND_TRANSLATION  24977     15261     12008     3576      15673     13975     2370      680       3448      1958      0         26792
                  BIND      BIOGRID   DIP       HPRD      INTACT    MINT      MPACT     MPPI      OPHID     CORUM     I2D       BIND_TRANSLATION
                  (10653)   (4470)    (1727)    (810)     (13535)   (3380)    (15)      (47)      (1212)    (682)     (0)       (617)


Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source            Total records  Protein-only interactors  PPI Assigned to RIGID  Unique RIGIDs
bind              193648         93957                     91276 (97.1466%)       62903 (68.9152%)
grid              333977         329642                    329185 (99.8614%)      226496 (68.8051%)
dip               62903          62903                     61843 (98.3149%)       61680 (99.7364%)
intact            140723         139787                    139310 (99.6588%)      117840 (84.5883%)
mint              113258         113258                    112159 (99.0296%)      79710 (71.0688%)
HPRD              40075          40075                     40075 (100.0000%)      39966 (99.7280%)
ophid             73257          73257                     73160 (99.8676%)       47530 (64.9672%)
MPACT             16504          16504                     16286 (98.6791%)       13321 (81.7942%)
MPPI              1814           1814                      1701 (93.7707%)        830 (48.7948%)
CORUM             2844           2844                      2844 (100.0000%)       2607 (91.6667%)
BIND_Translation  140601         61585                     58274 (94.6237%)       48765 (83.6823%)
ALL               1119604        935626                    926113 (98.9832%)      433617 (46.8212%)
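
The two percentages in each row appear to be chained ratios: the assigned count relative to the protein-only count, and the unique RIGID count relative to the assigned count. This reading is inferred from the numbers rather than stated on the page. The Python sketch below recomputes them for three rows; the dictionary layout and names are illustrative.

```python
# source: (protein_only, assigned_to_rigid, unique_rigids), copied from the table above
ROWS = {
    "bind": (93957, 91276, 62903),
    "HPRD": (40075, 40075, 39966),
    "ALL": (935626, 926113, 433617),
}


def row_percentages(protein_only, assigned, unique):
    """Return (% assigned of protein-only records, % unique of assigned records)."""
    return 100 * assigned / protein_only, 100 * unique / assigned


for source, counts in ROWS.items():
    pct_assigned, pct_unique = row_percentages(*counts)
    print(f"{source:5s} {pct_assigned:8.4f}% {pct_unique:8.4f}%")
```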


Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source            Protein_Interactors  Assigned  %         Arbitrary  N_and_Y  Unassigned  Unique proteins
bind              285482               272970    95.6172   0          8602     3905        40790
BIND_Translation  187563               173313    92.4026   9625       496      4129        26792
CORUM             12916                12916     100.0000  0          0        0           4365
dip               20785                18536     89.1797   1255       527      467         20171
grid              36333                30255     83.2714   5869       5        204         29090
HPRD              9773                 9663      98.8745   64         46       0           9750
intact            108593               105448    97.1039   19         2862     264         44616
mint              79057                75743     95.8081   2          3086     226         29217
MPACT             40349                40112     99.4126   0          0        237         4972
MPPI              3628                 3456      95.2591   0          45       125         865
ophid             146423               145330    99.2535   103        984      6           9645
All               930902               887749    95.3644   16937      16653    9563        83234


ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_score Binary_flag String_score Score_class Proteins Percentage BIND BioGrid DIP MINT HPRD OPHID MPPI MPACT IntAct CORUM BIND_Translation
1 000000000000000001 P 1 607631 65.2734% 232081 28623 0 73314 0 125683 3023 30666 101306 12916 19
8193 000010000000000001 PI 1 152094 16.3383% 0 0 0 1 0 0 0 0 48 0 152045
2 000000000000000010 S 1 43100 4.6299% 0 44 17426 244 2772 0 0 6927 255 0 15432
8194 000010000000000010 SI 1 12336 1.3252% 12336 0 0 0 0 0 0 0 0 0 0
65 000000000001000001 PD 1 8076 0.8675% 8073 0 0 3 0 0 0 0 0 0 0
41 000000000000101001 PVG 1 1583 0.1701% 0 1583 0 0 0 0 0 0 0 0 0
10 000000000000001010 SV 1 1293 0.1389% 0 0 5 14 0 0 0 0 237 0 1037
129 000000000010000001 PM 1 528 0.0567% 473 0 0 0 0 0 32 0 23 0 0
554 000000001000101010 SVGO 1 477 0.0512% 0 0 0 0 477 0 0 0 0 0 0
42 000000000000101010 SVG 1 171 0.0184% 0 0 12 0 159 0 0 0 0 0 0
66 000000000001000010 SD 1 99 0.0106% 0 4 9 0 0 0 0 0 0 0 86
5 000000000000000101 PU 2 22826 2.4520% 0 0 0 289 0 19519 320 2519 179 0 0
16386 000100000000000010 SE 2 5429 0.5832% 5423 0 6 0 0 0 0 0 0 0 0
16385 000100000000000001 PE 2 1532 0.1646% 0 0 0 202 0 0 0 0 720 0 610
6 000000000000000110 SU 2 815 0.0875% 0 1 58 4 5 0 0 0 10 0 737
16449 000100000001000001 PDE 2 120 0.0129% 0 0 0 34 0 0 0 0 86 0 0
773 000000001100000101 PUO+ 2 13 0.0014% 0 0 0 3 0 1 0 0 9 0 0
774 000000001100000110 SUO+ 2 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
778 000000001100001010 SVO+ 2 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
17 000000000000010001 PT 3 16060 1.7252% 11785 0 0 1632 0 122 46 0 2475 0 0
18 000000000000010010 ST 3 8104 0.8706% 0 0 1016 0 6042 0 0 0 3 0 1043
8209 000010000000010001 PTI 3 2052 0.2204% 0 0 0 0 0 0 0 0 13 0 2039
81 000000000001010001 PTD 3 1497 0.1608% 1496 0 0 1 0 0 0 0 0 0 0
8210 000010000000010010 STI 3 855 0.0918% 855 0 0 0 0 0 0 0 0 0 0
145 000000000010010001 PTM 3 184 0.0198% 132 0 0 0 0 0 35 0 17 0 0
26 000000000000011010 SVT 3 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
16402 000100000000010010 STE 4 317 0.0341% 316 0 1 0 0 0 0 0 0 0 0
16401 000100000000010001 PTE 4 269 0.0289% 0 0 0 0 0 0 0 0 4 0 265
22 000000000000010110 SUT 4 17 0.0018% 0 0 3 0 14 0 0 0 0 0 0
789 000000001100010101 PUTO+ 4 14 0.0015% 0 0 0 0 0 0 0 0 14 0 0
790 000000001100010110 SUTO+ 4 1 0.0001% 0 0 0 0 1 0 0 0 0 0 0
4393 000001000100101001 PVGL+ 5 5854 0.6289% 0 5854 0 0 0 0 0 0 0 0 0
4362 000001000100001010 SVL+ 5 5498 0.5906% 0 0 0 0 0 0 0 0 0 0 5498
4354 000001000100000010 SL+ 5 5336 0.5732% 0 14 1195 0 0 0 0 0 0 0 4127
810 000000001100101010 SVGO+ 5 193 0.0207% 0 0 0 0 193 0 0 0 0 0 0
4357 000001000100000101 PUL+ 5 84 0.0090% 0 0 0 0 0 84 0 0 0 0 0
4394 000001000100101010 SVGL+ 5 70 0.0075% 0 1 8 0 61 0 0 0 0 0 0
4374 000001000100010110 SUTL+ 5 52 0.0056% 0 0 52 0 0 0 0 0 0 0 0
131089 100000000000010001 PTQ 5 39 0.0042% 0 0 0 0 0 0 0 0 39 0 0
5381 000001010100000101 PUXL+ 5 29 0.0031% 0 0 0 0 0 19 0 0 10 0 0
4373 000001000100010101 PUTL+ 5 9 0.0010% 0 0 0 1 0 0 0 0 8 0 0
21 000000000000010101 PUT 5 7 0.0008% 0 0 0 2 0 5 0 0 0 0 0
1802 000000011100001010 SVOX+ 5 4 0.0004% 0 0 0 0 0 0 0 0 4 0 0
5382 000001010100000110 SUXL+ 5 3 0.0003% 0 0 0 0 3 0 0 0 0 0 0
5386 000001010100001010 SVXL+ 5 2 0.0002% 0 0 0 1 0 0 0 0 1 0 0
131073 100000000000000001 PQ 5 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
131077 100000000000000101 PUQ 5 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
32769 001000000000000001 PY 6 8873 0.9532% 3051 4 0 2785 0 731 5 0 2297 0 0
81922 010100000000000010 SEN 6 4524 0.4860% 4515 0 9 0 0 0 0 0 0 0 0
32833 001000000001000001 PDY 6 755 0.0811% 755 0 0 0 0 0 0 0 0 0 0
65537 010000000000000001 PN 6 608 0.0653% 178 1 112 26 46 0 11 0 234 0 0
32770 001000000000000010 SY 6 567 0.0609% 0 0 405 25 0 0 0 0 95 0 42
65601 010000000001000001 PDN 6 559 0.0600% 52 0 0 247 0 253 5 0 2 0 0
81921 010100000000000001 PEN 6 418 0.0449% 0 0 0 3 0 0 22 0 1 0 392
73729 010010000000000001 PIN 6 223 0.0240% 0 0 0 0 0 0 0 0 223 0 0
81937 010100000000010001 PTEN 6 64 0.0069% 0 0 0 0 0 0 2 0 0 0 62
32785 001000000000010001 PTY 6 32 0.0034% 32 0 0 0 0 0 0 0 0 0 0
81938 010100000000010010 STEN 6 19 0.0020% 19 0 0 0 0 0 0 0 0 0 0
65553 010000000000010001 PTN 6 8 0.0009% 0 0 0 0 0 0 0 0 8 0 0
147473 100100000000010001 PTEQ 6 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
40961 001010000000000001 PIY 6 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
32786 001000000000010010 STY 6 1 0.0001% 0 0 1 0 0 0 0 0 0 0 0
163841 101000000000000001 PYQ 6 1 0.0001% 0 0 0 0 0 0 0 0 1 0 0
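
The Decimal_score, Binary_flag and String_score columns encode the same information: each feature letter corresponds to one bit. The Python sketch below uses a bit-to-letter mapping inferred from the rows of the table above (it is not taken from iRefIndex code, and bit 11 is left unassigned because no row in the table uses it). The function names are illustrative.

```python
# Bit position -> feature letter, inferred from the rows of the ROG summary table.
FLAG_BITS = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 15: "Y",
    16: "N", 17: "Q",
}


def binary_flag(decimal_score, width=18):
    """Render a decimal score as the zero-padded binary flag shown in the table."""
    return format(decimal_score, f"0{width}b")


def decode_score(decimal_score):
    """Return the set of feature letters whose bits are set in the score."""
    return {letter for bit, letter in FLAG_BITS.items()
            if (decimal_score >> bit) & 1}


for score in (1, 8193, 4393, 81937):
    print(score, binary_flag(score), "".join(sorted(decode_score(score))))
```

The letters are returned as a set because the ordering used in the String_score column is not documented here; compare sets rather than strings when checking a row.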


Scores (Corresponds to Table 2 in PMID 18823568)

Character Description of feature (when the value is 1) Frequency
D The source database (D) listed in the interaction record is different than what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made. 11106(1.2054%)
E The protein reference was a retired NCBI Identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt. 12693(1.3777%)
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. 8348(0.9061%)
L More than one assignment is possible (see + below), e.g. isoforms for a GeneID. In such a situation, references are picked using a ranking system (first look for RefSeq, then UniProt). If ambiguity remains even after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.) 16937(1.8383%)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. 712(0.0773%)
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see their rows in this table). 17164(1.863%)
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. 6423(0.6971%)
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. 704(0.0764%)
I The protein reference used was an NCBI GenInfo Identifier (I). 167561(18.1868%)
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. 23872(2.591%)
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. 29603(3.2131%)
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. 15147(1.644%)
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. 43(0.0047%)
P The interaction record's primary (P) reference for the protein was used to make the assignment. 832046(90.309%)
S One of the interaction record's secondary (S) references for the protein was used to make the assignment. 89286(9.691%)
Y The accession referred to a record that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009). 10230(1.1103%)
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. 38(0.0041%)
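
The Frequency percentages appear to use the sum of the P and S counts as their denominator, since every assigned interactor was assigned through either a primary or a secondary reference. This denominator is inferred from the numbers and is not stated on the page. The Python sketch below recomputes a few of the percentages under that assumption; the names are illustrative.

```python
# Feature counts copied from the Frequency column above.
COUNTS = {"P": 832046, "S": 89286, "I": 167561, "T": 29603, "D": 11106}

# Assumed denominator: interactors assigned via a primary (P) or secondary (S) reference.
assigned_total = COUNTS["P"] + COUNTS["S"]


def frequency_percent(feature):
    """Percentage of assigned interactors that carry the given feature."""
    return 100 * COUNTS[feature] / assigned_total


for feature in COUNTS:
    print(feature, COUNTS[feature], f"{frequency_percent(feature):.4f}%")
```

Because a single interactor can carry several features at once (for example PTI), the percentages of features other than P and S are not expected to sum to 100.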