Statistics iRefIndex 6.0

From irefindex

Revision as of 09:06, 17 September 2009

Summary

  • Total interactions: 759,742
  • Total distinct interactions (based on RIGID): 364,708 (48% of total interactions)
  • Total distinct proteins (based on ROGID): 80,677

This page lists statistics for our internal version of iRefIndex, which includes all of the data from the sources used for the current build (see Sources_iRefIndex_6.0). This full build of the iRefIndex contains data that cannot be redistributed under the usage policies of the source databases (namely, DIP, HPRD, CORUM and MPact). Please contact ian.donaldson at biotek.uio.no if you would like to obtain a copy of the full iRefIndex build under an academic, collaborative agreement.

The data that are freely available at ftp://ftp.no.embnet.org/irefindex/data/archive/release_6.0/ are a subset of the full build that we can freely redistribute according to the usage policies of the source databases. Please refer to http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_6.0 for statistics that are applicable to this free dataset.

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (taxonomy identifiers as they appear in the original source)

  • Full list [[1]]

Top 15 corrected taxonomy groups in iRefIndex (taxonomy identifiers corrected using sequence database information)

  • Full list [[2]]

Interactions (Corresponds to Table 6 in PMID 18823568)

BIND        62319
BIOGRID     22289   163090
DIP         26090    28943    56451
HPRD         2947     8838      831    39966
INTACT      24444    28247    25033     8467   113843
MINT        22342    34515    30275     6641    45828    77919
MPACT        6938     8226     6748        0     6130     6425    13321
MPPI          386      111       40      304       90       73        0      823
OPHID        2228     6169      876    18149     7277     6454        0      186    47495
CORUM         116       71       28      403      125       69        0        9      158     1917
I2D             0        0        0        0        0        0        0        0        0        0        0
             BIND  BIOGRID      DIP     HPRD   INTACT     MINT    MPACT     MPPI    OPHID    CORUM      I2D
          (24368) (103151)  (13356)  (14909)  (56201)  (16753)   (1291)    (238)  (26475)   (1392)      (0)
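Only the lower triangle of this table is printed: the diagonal entry is the source's own total (these values match the Unique RIGIDs column of the RIG mapping summary), and reading each off-diagonal entry as the number of RIGs two sources have in common, a lookup has to try both orders. The bracketed bottom row is reproduced as given. A small Python sketch, using the hypothetical helper `build_overlap`, that expands the printed rows into a symmetric lookup:

```python
from typing import Dict, List, Tuple

SOURCES = ["BIND", "BIOGRID", "DIP", "HPRD", "INTACT", "MINT",
           "MPACT", "MPPI", "OPHID", "CORUM", "I2D"]

# Lower-triangular rows copied from the interactions table (row i has i + 1 values).
ROWS = [
    [62319],
    [22289, 163090],
    [26090, 28943, 56451],
    [2947, 8838, 831, 39966],
    [24444, 28247, 25033, 8467, 113843],
    [22342, 34515, 30275, 6641, 45828, 77919],
    [6938, 8226, 6748, 0, 6130, 6425, 13321],
    [386, 111, 40, 304, 90, 73, 0, 823],
    [2228, 6169, 876, 18149, 7277, 6454, 0, 186, 47495],
    [116, 71, 28, 403, 125, 69, 0, 9, 158, 1917],
    [0] * 11,
]


def build_overlap(names: List[str], rows: List[List[int]]) -> Dict[Tuple[str, str], int]:
    """Expand a lower-triangular matrix into a dict keyed by both (a, b) and (b, a)."""
    overlap: Dict[Tuple[str, str], int] = {}
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {names[i]} should have {i + 1} values, got {len(row)}")
        for j, value in enumerate(row):
            overlap[(names[i], names[j])] = value
            overlap[(names[j], names[i])] = value
    return overlap


overlap = build_overlap(SOURCES, ROWS)
print(overlap[("INTACT", "MINT")], overlap[("MINT", "INTACT")])  # 45828 45828
```

For example, the INTACT/MINT cell gives 45,828 shared RIGs, against totals of 113,843 for INTACT and 77,919 for MINT.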

Interactors

BIND        40841
BIOGRID     16602    27409
DIP         15480    13566    20119
HPRD         3357     5205     1215     9750
INTACT      18205    18409    15634     5967    41925
MINT        16446    16206    15025     4787    23640    28904
MPACT        4651     4445     4609        0     4874     4755     4972
MPPI          673      390      284      429      578      509        0      861
OPHID        3252     4619     1205     7438     5817     4808        1      422     9631
CORUM        1561     1414      642     1867     2312     1769        0      321     1850     3581
I2D             0        0        0        0        0        0        0        0        0        0        0
             BIND  BIOGRID      DIP     HPRD   INTACT     MINT    MPACT     MPPI    OPHID    CORUM      I2D
          (17893)   (5077)   (1838)    (880)  (12034)   (3370)     (18)     (46)   (1273)    (618)      (0)
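The interactor table has the same lower-triangular layout, and its diagonal values match the Unique proteins column of the ROG assignment table. Raw shared counts are hard to compare across sources of very different sizes. One common normalization, not used on this page and shown here only as an illustration, is the Jaccard index |A ∩ B| / (|A| + |B| − |A ∩ B|). A Python sketch with the hypothetical function `jaccard`, applied to two pairs read from the table:

```python
def jaccard(size_a: int, size_b: int, shared: int) -> float:
    """Jaccard index from two set sizes and the size of their intersection."""
    if shared > min(size_a, size_b):
        raise ValueError("intersection cannot exceed either set")
    union = size_a + size_b - shared
    return shared / union if union else 0.0


# Values from the interactor table: diagonal entries are the per-source totals.
bind_biogrid = jaccard(40841, 27409, 16602)
dip_mpact = jaccard(20119, 4972, 4609)
print(f"BIND/BIOGRID: {bind_biogrid:.3f}")  # BIND/BIOGRID: 0.321
print(f"DIP/MPACT: {dip_mpact:.3f}")        # DIP/MPACT: 0.225
```

A symmetric index hides asymmetry: 4,609 of MPACT's 4,972 interactors (about 93%) also appear in DIP, even though the Jaccard index for the pair is only about 0.225.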

Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source Total_records Protein-only_interactors PPI_assigned_to_RIGID Unique_RIGIDs
bind 193648 93957 90370(96.1823%) 62319(68.9598%)
grid 239485 238211 237796(99.8258%) 163090(68.5840%)
dip 57675 57675 56614(98.1604%) 56451(99.7121%)
intact 133302 132525 132073(99.6589%) 113843(86.1970%)
mint 110788 110788 109608(98.9349%) 77919(71.0888%)
HPRD 40075 40075 40075(100.0000%) 39966(99.7280%)
ophid 73257 73257 73133(99.8307%) 47495(64.9433%)
MPACT 16504 16504 16286(98.6791%) 13321(81.7942%)
MPPI 1814 1814 1685(92.8886%) 823(48.8427%)
CORUM 2104 2104 2102(99.9049%) 1917(91.1989%)
ALL 868652 766910 759742(99.0653%) 364708(48.0042%)
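The two percentages in each row appear to be ratios of adjacent columns: PPI_assigned_to_RIGID as a share of Protein-only_interactors, and Unique_RIGIDs as a share of PPI_assigned_to_RIGID. A Python sketch that recomputes them for a few rows copied from the table (the `Row` type and the `percentages` helper are illustrative names, not part of iRefIndex):

```python
from typing import NamedTuple, Tuple


class Row(NamedTuple):
    """One row of the RIG mapping summary (column names paraphrased)."""
    source: str
    total_records: int
    protein_only: int
    assigned_to_rigid: int
    unique_rigids: int


def percentages(row: Row) -> Tuple[float, float]:
    """Return (assigned / protein-only, unique / assigned), both in percent."""
    assigned_pct = 100 * row.assigned_to_rigid / row.protein_only
    unique_pct = 100 * row.unique_rigids / row.assigned_to_rigid
    return assigned_pct, unique_pct


ROWS = [
    Row("bind", 193648, 93957, 90370, 62319),
    Row("MPPI", 1814, 1814, 1685, 823),
    Row("ALL", 868652, 766910, 759742, 364708),
]

for row in ROWS:
    assigned_pct, unique_pct = percentages(row)
    print(f"{row.source}: {assigned_pct:.4f}% assigned, {unique_pct:.4f}% unique")
```

The ALL row's 48.0042% is the same ratio as the 48% quoted in the Summary (364,708 of 759,742).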

Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source Protein_Interactors Assigned % Arbitrary New Unassigned Unique_proteins
bind 285482 273657 95.8579 0 7887 3938 40841
CORUM 10316 10314 99.9806 0 2 0 3581
dip 20728 18533 89.4105 1246 485 464 20119
grid 27629 22371 80.9693 5124 1 133 27409
HPRD 9773 9676 99.0075 55 42 0 9750
intact 100752 97661 96.9321 110 2740 241 41925
mint 77936 74931 96.1443 50 2727 228 28904
MPACT 40349 40112 99.4126 0 0 237 4972
MPPI 3628 3457 95.2867 0 32 139 861
ophid 146423 145362 99.2754 103 952 6 9631
All 723016 696074 96.2737 6688 14868 5386 80677
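In every row of this table, Protein_Interactors equals Assigned + Arbitrary + New + Unassigned, and % is Assigned divided by Protein_Interactors. These relationships are inferred from the figures rather than stated on the page. The Python sketch below checks them for all rows; the tuple layout in `ROWS` and the `check_row` helper are illustrative choices.

```python
from typing import List, Tuple

# (source, protein_interactors, assigned, pct, arbitrary, new, unassigned, unique_proteins)
Row = Tuple[str, int, int, float, int, int, int, int]

ROWS: List[Row] = [
    ("bind", 285482, 273657, 95.8579, 0, 7887, 3938, 40841),
    ("CORUM", 10316, 10314, 99.9806, 0, 2, 0, 3581),
    ("dip", 20728, 18533, 89.4105, 1246, 485, 464, 20119),
    ("grid", 27629, 22371, 80.9693, 5124, 1, 133, 27409),
    ("HPRD", 9773, 9676, 99.0075, 55, 42, 0, 9750),
    ("intact", 100752, 97661, 96.9321, 110, 2740, 241, 41925),
    ("mint", 77936, 74931, 96.1443, 50, 2727, 228, 28904),
    ("MPACT", 40349, 40112, 99.4126, 0, 0, 237, 4972),
    ("MPPI", 3628, 3457, 95.2867, 0, 32, 139, 861),
    ("ophid", 146423, 145362, 99.2754, 103, 952, 6, 9631),
    ("All", 723016, 696074, 96.2737, 6688, 14868, 5386, 80677),
]


def check_row(row: Row) -> List[str]:
    """Return a description of every inferred identity that the row violates."""
    source, total, assigned, pct, arbitrary, new, unassigned, _unique = row
    problems = []
    if assigned + arbitrary + new + unassigned != total:
        problems.append(f"{source}: categories do not sum to {total}")
    if abs(100 * assigned / total - pct) > 1e-3:
        problems.append(f"{source}: % column differs from assigned/total")
    return problems


for row in ROWS:
    for problem in check_row(row):
        print(problem)
```

Nothing is printed for the 6.0 figures. The All row is also the column-wise sum of the per-source rows for every count except Unique_proteins, which cannot be summed because the same protein can appear in several sources.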

ROG summary (Corresponds to Table 4 in PMID 18823568)

Decimal_score Binary_flag String_score Score_class Proteins Percentage BIND BioGrid DIP MINT HPRD OPHID MPPI MPACT IntAct CORUM
1 000000000000000001 P 1 576699 79.7630% 232419 19992 0 71896 0 125492 3023 30666 93211 0
65 000000000001000001 PD 1 8086 1.1184% 8083 0 0 3 0 0 0 0 0 0
42 000000000000101010 SVG 1 163 0.0225% 0 0 0 0 163 0 0 0 0 0
41 000000000000101001 PVG 1 2362 0.3267% 0 2362 0 0 0 0 0 0 0 0
554 000000001000101010 SVGO 1 624 0.0863% 0 0 0 0 624 0 0 0 0 0
129 000000000010000001 PM 1 522 0.0722% 473 0 0 0 0 0 32 0 17 0
10 000000000000001010 SV 1 240 0.0332% 0 0 2 9 0 0 0 0 229 0
8193 000010000000000001 PI 1 48 0.0066% 0 0 0 0 0 0 0 0 48 0
8194 000010000000000010 SI 1 12336 1.7062% 12336 0 0 0 0 0 0 0 0 0
2 000000000000000010 S 1 29459 4.0745% 0 3 17397 231 4665 0 0 6927 236 0
16449 000100000001000001 PDE 2 116 0.0160% 0 0 0 34 0 0 0 0 82 0
16385 000100000000000001 PE 2 884 0.1223% 0 0 0 199 0 0 0 0 685 0
778 000000001100001010 SVO+ 2 1 0.0001% 0 0 0 0 0 0 0 0 1 0
774 000000001100000110 SUO+ 2 166 0.0230% 0 0 0 61 0 0 0 0 105 0
16386 000100000000000010 SE 2 5419 0.7495% 5413 0 6 0 0 0 0 0 0 0
1798 000000011100000110 SUOX+ 2 382 0.0528% 0 0 0 194 0 0 0 0 188 0
773 000000001100000101 PUO+ 2 9 0.0012% 0 0 0 3 0 1 0 0 5 0
5 000000000000000101 PU 2 23087 3.1932% 0 0 0 318 0 19713 320 2519 217 0
6 000000000000000110 SU 2 126 0.0174% 0 0 84 11 5 0 0 0 26 0
145 000000000010010001 PTM 3 170 0.0235% 132 0 0 0 0 0 35 0 3 0
8210 000010000000010010 STI 3 855 0.1183% 855 0 0 0 0 0 0 0 0 0
17 000000000000010001 PT 3 26406 3.6522% 11871 0 0 1620 0 122 47 0 2456 10290
18 000000000000010010 ST 3 5162 0.7140% 0 0 1010 0 4149 0 0 0 3 0
8209 000010000000010001 PTI 3 12 0.0017% 0 0 0 0 0 0 0 0 12 0
81 000000000001010001 PTD 3 1487 0.2057% 1486 0 0 1 0 0 0 0 0 0
26 000000000000011010 SVT 3 1 0.0001% 0 0 0 0 0 0 0 0 1 0
789 000000001100010101 PUTO+ 4 11 0.0015% 0 0 0 0 0 0 0 0 11 0
790 000000001100010110 SUTO+ 4 4 0.0006% 0 0 0 0 1 0 0 0 3 0
22 000000000000010110 SUT 4 17 0.0024% 0 0 3 0 14 0 0 0 0 0
16401 000100000000010001 PTE 4 4 0.0006% 0 0 0 0 0 0 0 0 4 0
16402 000100000000010010 STE 4 315 0.0436% 315 0 0 0 0 0 0 0 0 0
131089 100000000000010001 PTQ 5 38 0.0053% 0 0 0 0 0 0 0 0 38 0
131073 100000000000000001 PQ 5 1 0.0001% 0 0 0 0 0 0 0 0 1 0
131077 100000000000000101 PUQ 5 1 0.0001% 0 0 0 0 0 0 0 0 1 0
4354 000001000100000010 SL+ 5 1202 0.1662% 0 8 1194 0 0 0 0 0 0 0
5386 000001010100001010 SVXL+ 5 2 0.0003% 0 0 0 1 0 0 0 0 1 0
4373 000001000100010101 PUTL+ 5 9 0.0012% 0 0 0 1 0 0 0 0 8 0
4357 000001000100000101 PUL+ 5 84 0.0116% 0 0 0 0 0 84 0 0 0 0
810 000000001100101010 SVGO+ 5 55 0.0076% 0 0 0 0 55 0 0 0 0 0
21 000000000000010101 PUT 5 31 0.0043% 0 0 0 2 0 5 0 0 0 24
4374 000001000100010110 SUTL+ 5 52 0.0072% 0 0 52 0 0 0 0 0 0 0
4393 000001000100101001 PVGL+ 5 5116 0.7076% 0 5116 0 0 0 0 0 0 0 0
4394 000001000100101010 SVGL+ 5 52 0.0072% 0 0 0 0 52 0 0 0 0 0
5381 000001010100000101 PUXL+ 5 32 0.0044% 0 0 0 2 0 19 0 0 11 0
5382 000001010100000110 SUXL+ 5 139 0.0192% 0 0 0 46 3 0 0 0 90 0
147473 100100000000010001 PTEQ 6 1 0.0001% 0 0 0 0 0 0 0 0 1 0
86274 010101000100000010 SLEN+ 6 1 0.0001% 0 0 1 0 0 0 0 0 0 0
82034 010100000001110010 STGDEN 6 1 0.0001% 0 1 0 0 0 0 0 0 0 0
81938 010100000000010010 STEN 6 19 0.0026% 19 0 0 0 0 0 0 0 0 0
81937 010100000000010001 PTEN 6 4 0.0006% 2 0 0 0 0 0 2 0 0 0
81922 010100000000000010 SEN 6 5035 0.6964% 4661 0 372 2 0 0 0 0 0 0
81921 010100000000000001 PEN 6 2163 0.2992% 1342 0 0 263 0 324 19 0 215 0
65665 010000000010000001 PMN 6 2 0.0003% 0 0 0 0 0 0 2 0 0 0
32769 001000000000000001 PY 6 696 0.0963% 271 14 0 326 0 29 0 0 56 0
32770 001000000000000010 SY 6 73 0.0101% 0 0 30 23 0 0 0 0 20 0
32785 001000000000010001 PTY 6 2 0.0003% 2 0 0 0 0 0 0 0 0 0
32786 001000000000010010 STY 6 1 0.0001% 0 0 1 0 0 0 0 0 0 0
32833 001000000001000001 PDY 6 1 0.0001% 1 0 0 0 0 0 0 0 0 0
65537 010000000000000001 PN 6 7485 1.0352% 1717 0 112 2462 42 628 7 0 2515 2
65553 010000000000010001 PTN 6 16 0.0022% 6 0 0 0 0 0 0 0 10 0
65601 010000000001000001 PDN 6 142 0.0196% 140 0 0 0 0 0 2 0 0 0
163841 101000000000000001 PYQ 6 1 0.0001% 0 0 0 0 0 0 0 0 1 0
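The Decimal_score, Binary_flag and String_score columns carry the same information: each feature letter from the Scores table corresponds to one bit of an 18-bit flag, and the decimal value is that flag read as an integer (for example 65 = 2^6 + 2^0 gives PD). The bit positions in the Python sketch below are inferred from the pairs listed in this table; bit 11 does not occur in any row, so it is left unmapped, and the rule of writing "+" last is read off the String_score column. `decode_score` and `binary_flag` are hypothetical helpers, not iRefIndex tools.

```python
from typing import Dict

# Bit position -> feature letter, inferred from Decimal_score/String_score pairs
# in the ROG summary. Bit 11 never appears in the table, so it is not mapped.
ROG_FEATURE_BITS: Dict[int, str] = {
    0: "P", 1: "S", 2: "U", 3: "V", 4: "T", 5: "G", 6: "D", 7: "M",
    8: "+", 9: "O", 10: "X", 12: "L", 13: "I", 14: "E", 15: "Y",
    16: "N", 17: "Q",
}
FLAG_WIDTH = 18


def binary_flag(decimal_score: int) -> str:
    """Zero-padded 18-character binary string, as in the Binary_flag column."""
    return format(decimal_score, f"0{FLAG_WIDTH}b")


def decode_score(decimal_score: int) -> str:
    """Feature letters in increasing bit order, with '+' moved to the end."""
    letters = []
    ambiguous = False
    for bit in range(FLAG_WIDTH):
        if (decimal_score >> bit) & 1:
            letter = ROG_FEATURE_BITS.get(bit)
            if letter is None:
                raise ValueError(f"bit {bit} has no known feature letter")
            if letter == "+":
                ambiguous = True
            else:
                letters.append(letter)
    return "".join(letters) + ("+" if ambiguous else "")


print(binary_flag(82034), decode_score(82034))  # 010100000001110010 STGDEN
```

The Percentage column appears to be relative to the 723,016 protein interactors in the ROG assignment table; for the P row, 576,699 / 723,016 ≈ 79.763%.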

Scores (Corresponds to Table 2 in PMID 18823568)

Character Description of feature (when the value is 1) Frequency
D The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made. 9833(1.3702%)
E The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after the eUtils lookup, sequence information was obtained from UniProt. 13962(1.9456%)
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. 8373(1.1668%)
L More than one assignment is possible (see + below), for example isoforms for a gene ID. In such a situation, references are picked using a ranking system (RefSeq first, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.) 6689(0.9321%)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. 694(0.0967%)
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see their entries in this table). 7317(1.0196%)
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. 14868(2.0718%)
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. 1252(0.1745%)
I The protein reference used was an NCBI GenInfo Identifier (I). 13251(1.8465%)
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. 24150(3.3652%)
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. 34618(4.8239%)
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. 8616(1.2006%)
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. 42(0.0059%)
P The interaction record's primary (P) reference for the protein was used to make the assignment. 655728(91.3741%)
S One of the interaction record's secondary (S) references for the protein was used to make the assignment. 61902(8.6259%)
Y The protein reference pointed to an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009). 774(0.1079%)
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. 555(0.0773%)
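The Frequency column can be reproduced from the ROG summary: summing the Proteins count over every row whose Decimal_score has a given feature's bit set gives that feature's frequency. The percentages appear to be relative to 717,630, which is the P count plus the S count (655,728 + 61,902) and also Assigned + Arbitrary + New in the All row of the ROG assignment table. Both relationships are inferred from the published numbers, not stated on the page. The Python sketch below checks D (bit value 64) and M (bit value 128) using the ROG summary rows whose scores include those bits; the bit values are inferred from that table, not documented constants, and `feature_frequency` is a hypothetical helper.

```python
from typing import List, Tuple

D_BIT = 64    # inferred bit value for feature D
M_BIT = 128   # inferred bit value for feature M
ASSIGNED_TOTAL = 655728 + 61902  # P frequency + S frequency = 717,630

# (Decimal_score, Proteins) for every ROG summary row whose score includes D or M.
ROG_ROWS: List[Tuple[int, int]] = [
    (65, 8086), (16449, 116), (81, 1487), (82034, 1), (32833, 1), (65601, 142),  # D rows
    (129, 522), (145, 170), (65665, 2),                                         # M rows
]


def feature_frequency(rows: List[Tuple[int, int]], bit_value: int) -> int:
    """Sum the protein counts of rows whose decimal score has the given bit set."""
    return sum(proteins for score, proteins in rows if score & bit_value)


for name, bit in (("D", D_BIT), ("M", M_BIT)):
    count = feature_frequency(ROG_ROWS, bit)
    print(f"{name}: {count} ({100 * count / ASSIGNED_TOTAL:.4f}%)")
# D: 9833 (1.3702%)
# M: 694 (0.0967%)
```

Both results match the D and M entries above, which supports reading the ROG score as a bit field over the features listed in this table.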