Statistics iRefIndex 6.0

Revision as of 08:20, 15 September 2009 by Sabry

Summary

  • Total interactions :
  • Total distinct interactions (based on RIGID): ( % of total interactions)
  • Total distinct proteins (based on ROGID) :

This page lists statistics for our internal version of iRefIndex, which includes all data from the sources used for the current build (see Sources_iRefIndex_6.0). This full build of iRefIndex contains data that cannot be redistributed under the usage policies of some source databases (namely the DIP, HPRD, CORUM and MPact databases). Please contact ian.donaldson at biotek.uio.no if you would like to obtain a copy of the full iRefIndex build under an academic, collaborative agreement.

The data that are freely available at ftp://ftp.no.embnet.org/irefindex/data/archive/release_6.0/ are a subset of the full build that we can freely redistribute according to the usage policies of the source databases. Please refer to http://irefindex.uio.no/wiki/Statistics_iRefIndex_free_6.0 for statistics that are applicable to this free dataset.

Interactions available from major taxonomies

Top 15 uncorrected taxonomy groups in iRefIndex (Taxonomy identifiers as they appear in original source)

  • Full list [[1]]

Top 15 corrected taxonomy groups in iRefIndex (Taxonomy identifiers corrected using sequence database information)

  • Full list [[2]]

Interactions (Corresponds to Table 6 in PMID 18823568)

Source       BIND  BIOGRID      DIP     HPRD   INTACT     MINT    MPACT     MPPI    OPHID    CORUM      I2D
BIND        62449
BIOGRID     22283   163095
DIP         26178    28930    56434
HPRD         2947     8794      830    39966
INTACT      24393    28128    24805     8442   113871
MINT        22073    34302    29797     6612    45427    77509
MPACT        6938     8226     6748        0     6132     6427    13321
MPPI          387      111       40      304       89       73        0      824
OPHID        2226     6141      876    18063     7248     6417        0      183    47274
CORUM         116       71       28      403      122       69        0        9      158     1917
I2D             0        0        0        0        0        0        0        0        0        0        0
          (24433) (103192)  (13350)  (14994)  (56394)  (16824)   (1291)    (238)  (26331)   (1394)      (0)
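
The matrix above is lower-triangular, so each pair of sources is listed once. A minimal Python sketch of turning such rows into an order-independent lookup follows. It assumes that a diagonal cell is the count for a single source and an off-diagonal cell is the count shared by two sources; that reading is an assumption and is not stated on this page. The names SOURCES, ROWS and build_lookup are illustrative only, and just the first four sources are copied to keep the example short.

```python
# Lower-triangular rows copied from the Interactions table (first four sources only).
SOURCES = ["BIND", "BIOGRID", "DIP", "HPRD"]
ROWS = [
    [62449],
    [22283, 163095],
    [26178, 28930, 56434],
    [2947, 8794, 830, 39966],
]


def build_lookup(sources, rows):
    """Map (source_a, source_b) to the table cell, in both key orders."""
    lookup = {}
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            lookup[(sources[i], sources[j])] = value
            lookup[(sources[j], sources[i])] = value
    return lookup


cells = build_lookup(SOURCES, ROWS)
print(cells[("DIP", "BIND")])   # same cell as ("BIND", "DIP"): 26178
print(cells[("HPRD", "HPRD")])  # diagonal cell: 39966
```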

Interactors

Source       BIND  BIOGRID      DIP     HPRD   INTACT     MINT    MPACT     MPPI    OPHID    CORUM      I2D
BIND        40546
BIOGRID     16592    27409
DIP         15498    13562    20106
HPRD         3357     5193     1215     9750
INTACT      18199    18390    15604     5967    42060
MINT        16359    16155    14914     4777    23565    28854
MPACT        4651     4445     4609        0     4874     4756     4972
MPPI          673      391      284      429      578      510        0      860
OPHID        3253     4608     1206     7426     5809     4794        1      421     9626
CORUM        1561     1413      643     1867     2312     1770        0      321     1849     3581
I2D             0        0        0        0        0        0        0        0        0        0        0
          (17589)   (5089)   (1829)    (888)  (12173)   (3425)     (18)     (45)   (1278)    (618)      (0)

Summary of mapping interaction records to RIGs (Corresponds to Table 5 in PMID 18823568)

Source  Total records  Protein-only interactors (PPI)  Assigned to RIGID     Unique RIGIDs
bind           193648                           93957    90574(96.3994%)   62449(68.9480%)
grid           239485                          238211   237795(99.8254%)  163095(68.5864%)
dip             57675                           57675    56597(98.1309%)   56434(99.7120%)
intact         133302                          132525   132071(99.6574%)  113871(86.2195%)
mint           110788                          110788   109164(98.5341%)   77509(71.0023%)
HPRD            40075                           40075   40075(100.0000%)   39966(99.7280%)
ophid           73257                           73257    72880(99.4854%)   47274(64.8655%)
MPACT           16504                           16504    16286(98.6791%)   13321(81.7942%)
MPPI             1814                            1814     1685(92.8886%)     824(48.9021%)
CORUM            2104                            2104     2102(99.9049%)    1917(91.1989%)
ALL            868652                          766910   759229(98.9984%)  364987(48.0734%)
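
The percentages in this table can be reproduced from the counts. The Python sketch below is a check on the published figures, not the code used to build iRefIndex. It rests on an observation from the numbers themselves: the "assigned" percentage is relative to the protein-only interactor records, and the "unique" percentage is relative to the assigned records. The helper name percent is hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole, rounded to four decimals."""
    return round(100.0 * part / whole, 4)


# (source, protein-only interactor records, assigned to RIGID, unique RIGIDs)
ROWS = [
    ("bind", 93957, 90574, 62449),
    ("HPRD", 40075, 40075, 39966),
    ("ALL", 766910, 759229, 364987),
]

for source, ppi, assigned, unique in ROWS:
    # First value: assigned / protein-only records; second: unique / assigned.
    print(source, percent(assigned, ppi), percent(unique, assigned))
```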

Assignment of protein interactors to ROGs (Corresponds to Table 3 in PMID 18823568)

Source  Protein interactors  Assigned        %  Arbitrary    New  Unassigned  Unique proteins
bind                 285482    273052  95.6460          0   7887        4543            40546
CORUM                 10316     10314  99.9806          0      2           0             3581
dip                   20728     18527  89.3815       1246    477         478            20106
grid                  27629     22416  81.1321       5079      0         134            27409
HPRD                   9773      9676  99.0075         55     42           0             9750
intact               100752     97167  96.4418         19   3323         243            42060
mint                  77936     74746  95.9069          2   2711         477            28854
MPACT                 40349     40112  99.4126          0      0         237             4972
MPPI                   3628      3457  95.2867          0     30         141              860
ophid                146423    145362  99.2754        103    699         259             9626
All                  723016    694829  96.1015       6504  15171        6512            80593
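
In every row of this table, the Assigned, Arbitrary, New and Unassigned counts add up to the number of protein interactors, and the % column is Assigned divided by that total. The Python sketch below checks both properties on three rows copied from the table. It is a consistency check on the published numbers, and check_row is a hypothetical helper name.

```python
# (protein interactors, assigned, arbitrary, new, unassigned), copied from the table.
ROWS = {
    "bind": (285482, 273052, 0, 7887, 4543),
    "dip": (20728, 18527, 1246, 477, 478),
    "All": (723016, 694829, 6504, 15171, 6512),
}


def check_row(total, assigned, arbitrary, new, unassigned):
    """Return (do the four categories sum to total?, assigned % to 4 decimals)."""
    balanced = assigned + arbitrary + new + unassigned == total
    return balanced, f"{100 * assigned / total:.4f}"


for source, counts in ROWS.items():
    print(source, *check_row(*counts))
```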

Scores (Corresponds to Table 2 in PMID 18823568)

Character Description of feature (when the value is 1) Frequency
D The source database (D) listed in the interaction record differs from what is expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made. 9715(1.3559%)
E The protein reference was a retired NCBI Identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. 13556(1.892%)
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. 894(0.1248%)
L New system 6505(0.9079%)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. 87(0.0121%)
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see below). 6592(0.92%)
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. 15171(2.1174%)
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. 711(0.0992%)
I The protein reference used was an NCBI GenInfo Identifier (I). 13251(1.8494%)
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. 23172(3.234%)
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. 36382(5.0777%)
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. 1143(0.1595%)
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. 42(0.0059%)
P The interaction record's primary (P) reference for the protein was used to make the assignment. 647668(90.3928%)
S One of the interaction record's secondary (S) references for the protein was used to make the assignment. 68836(9.6072%)
Y The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009). 774(0.108%)
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. 37(0.0052%)
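
The version handling described for feature V can be illustrated with a short Python sketch. It assumes the version is a trailing dot followed by digits, as in the NP_012420.1 example; strip_version is a hypothetical helper and not the code used to build iRefIndex.

```python
import re

# Assumption: a version suffix is a final "." followed only by digits.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(accession):
    """Return the accession without a trailing ".<digits>" version, if present."""
    return _VERSION_SUFFIX.sub("", accession)


print(strip_version("NP_012420.1"))  # NP_012420
print(strip_version("Q7XJL8"))       # unchanged: Q7XJL8
```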