Statistics iRefIndex 2.0

Revision as of 00:52, 14 November 2008 by Ian.donaldson

Assignment of protein interactors to ROGs

Source  Interactors  Assigned  % Assigned  Arbitrary    New  Unassigned  Unique proteins
bind         285482    273695       95.87          0   7782        4005            40766
grid          28218     19295       68.38       8843      3          77            26945
intact        89190     85695       96.08         19   3250         226            39678
mint          80543     77294       95.97          6   2764         479            28055
MPPI           3628      3281       90.44        178     33         136              857
ophid        146423    145408       99.31        103    653         259             9663
All          633484    604668       95.45       9149  14485        5182            77745
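
As a consistency check on the table above: in every row the Assigned, Arbitrary, New and Unassigned counts add up to the interactor count, and the percentage column is Assigned divided by that count. The following is a minimal Python sketch of the check. The names ROWS, check_row and percent_assigned are illustrative only and are not part of iRefIndex.

```python
# Counts copied from the table above:
# source -> (interactors, assigned, arbitrary, new, unassigned)
ROWS = {
    "bind":   (285482, 273695, 0,    7782, 4005),
    "grid":   (28218,  19295,  8843, 3,    77),
    "intact": (89190,  85695,  19,   3250, 226),
    "mint":   (80543,  77294,  6,    2764, 479),
    "MPPI":   (3628,   3281,   178,  33,   136),
    "ophid":  (146423, 145408, 103,  653,  259),
}


def check_row(interactors, assigned, arbitrary, new, unassigned):
    """True if the four outcome counts account for every interactor."""
    return assigned + arbitrary + new + unassigned == interactors


def percent_assigned(interactors, assigned):
    """Percentage of interactors assigned to a ROG, to two decimals."""
    return round(100.0 * assigned / interactors, 2)


# Column-wise totals; these should reproduce the "All" row.
totals = tuple(sum(column) for column in zip(*ROWS.values()))

for source, row in ROWS.items():
    print(source, check_row(*row), percent_assigned(row[0], row[1]))
print("All", totals, percent_assigned(totals[0], totals[1]))
```

The "Unique proteins" column is left out of the check because a protein can occur in several sources, so that column is not expected to sum to the "All" value.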


Summary of mapping interaction records to RIGs. The "Assigned to RIGID" percentages are relative to the protein-only (PPI) records, and the "Unique RIGIDs" percentages are relative to the assigned records.

Source  Total records  Protein-only (PPI)  Assigned to RIGID     Unique RIGIDs
bind           193648               93957    91234(97.1019%)   62859(68.8987%)
grid           218165              218165   217902(99.8794%)  142872(65.5671%)
intact         111572              111008   110598(99.6307%)   97012(87.7159%)
mint           104847              104847   103522(98.7363%)   73733(71.2245%)
HPRD            38037               38037    38017(99.9474%)   37980(99.9027%)
ophid           73257               73257    72907(99.5222%)   47318(64.9019%)
MPACT           16504               16504    16286(98.6791%)   13321(81.7942%)
MPPI             1814                1814     1695(93.4399%)     827(48.7906%)
ALL            757844              657589   652161(99.1746%)  475922(72.9762%)
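
The percentages in the table above can be reproduced by dividing the assigned count by the protein-only (PPI) record count, and the unique-RIGID count by the assigned count. The sketch below recomputes them for three rows. The helper pct and the dictionary ROWS are illustrative names only, not iRefIndex code.

```python
# source -> (protein-only PPI records, assigned to a RIGID, unique RIGIDs),
# with counts copied from the table above.
ROWS = {
    "bind": (93957, 91234, 62859),
    "MPPI": (1814, 1695, 827),
    "ALL":  (657589, 652161, 475922),
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to four decimals."""
    return round(100.0 * part / whole, 4)


for source, (ppi, assigned, unique) in ROWS.items():
    # First value: share of PPI records assigned to a RIGID.
    # Second value: share of assigned records with a distinct RIGID.
    print(source, pct(assigned, ppi), pct(unique, assigned))
```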


Features of the ROG assignment score and their corresponding character representation

Character Description of feature (when the value is 1) Frequency
D The source database (D) listed in the interaction record differs from the one expected for the given protein accession. In specific cases, this difference is tolerated and the assignment is made. 12719(2.0243%)
E The protein reference was a retired NCBI identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. 10909(1.7363%)
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment. 0(0.0%)
L More than one assignment is possible (see + below). The assignment with the largest (L) SEGUID is arbitrarily chosen (see Methods). 9150(1.4563%)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made. 1882(0.2995%)
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference, and this identifier points to more than one protein product. An attempt is made to resolve this ambiguity, as indicated by ROG score features O, X or L (see the corresponding rows). 9172(1.4598%)
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier. 14485(2.3054%)
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record. 22(0.0035%)
I The protein reference used was an NCBI GenInfo Identifier (I). 13495(2.1479%)
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment. 20644(3.2857%)
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made. 24056(3.8287%)
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420. 0(0.0%)
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'. 27(0.0043%)
P The interaction record's primary (P) reference for the protein was used to make the assignment. 584548(93.0362%)
S One of the interaction record's secondary (S) references for the protein was used to make the assignment. 43754(6.9638%)
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record. 35(0.0056%)



Number of protein references successfully assigned to ROGs, broken down by assignment score

Binary_flag String_score Score_class Proteins Percentage BIND BioGrid DIP MINT HPRD OPHID MPPI MPACT
010101000100000010 SLEN+ -1 1 0.0002% 0 1 0 0 0 0 0 0
010100000000010001 PTEN -1 18 0.0028% 0 0 0 0 0 0 18 0
010000000010010001 PTMN -1 2 0.0003% 0 0 0 0 0 0 2 0
000001010100000010 SXL+ -1 1 0.0002% 0 0 0 0 0 0 0 0
100000000000000101 PUQ -1 1 0.0002% 0 0 0 0 0 0 0 0
000000000010010010 STM -1 3 0.0005% 0 3 0 0 0 0 0 0
000000001100010101 PUTO+ -1 15 0.0024% 0 0 0 0 0 0 0 0
000000000000000001 P 1 523340 82.6130% 232755 7313 0 74939 0 126446 0 0
000010000000000001 PI 1 19 0.0030% 0 2 0 0 0 0 0 0
000010000001000010 SDI 1 1066 0.1683% 1044 22 0 0 0 0 0 0
000010000001000001 PDI 1 40 0.0063% 0 0 0 0 0 0 0 0
000010000000000010 SI 1 11457 1.8086% 11287 170 0 0 0 0 0 0
000000000010000010 SM 1 1190 0.1879% 0 1190 0 0 0 0 0 0
000000000001000010 SD 1 6 0.0009% 0 6 0 0 0 0 0 0
000000000000000010 S 1 8209 1.2958% 0 7754 0 236 0 0 0 0
000000000001000001 PD 1 9244 1.4592% 8091 1134 0 19 0 0 0 0
000000000010000001 PM 1 488 0.0770% 473 1 0 0 0 0 0 0
000000000000000101 PU 2 19850 3.1335% 0 0 0 346 0 18835 0 0
000000000000000110 SU 2 286 0.0451% 0 275 0 5 0 0 0 0
000100000000000001 PE 2 246 0.0388% 0 0 0 0 0 0 0 0
000000001100000110 SUO+ 2 1 0.0002% 0 0 0 0 0 0 0 0
000000001100000101 PUO+ 2 6 0.0009% 0 0 0 1 0 0 0 0
000100000000000010 SE 2 5434 0.8578% 5434 0 0 0 0 0 0 0
000010000001010010 STDI 3 66 0.0104% 29 37 0 0 0 0 0 0
000010000000010010 STI 3 834 0.1317% 826 8 0 0 0 0 0 0
000010000001010001 PTDI 3 13 0.0021% 0 0 0 0 0 0 0 0
000000000001010001 PTD 3 1483 0.2341% 1483 0 0 0 0 0 0 0
000000000000010010 ST 3 1383 0.2183% 0 1380 0 0 0 0 0 0
000000000000010001 PT 3 19265 3.0411% 11825 0 0 1742 0 122 3072 0
000000000010010001 PTM 3 199 0.0314% 132 0 0 0 0 0 67 0
000100000000010010 STE 4 316 0.0499% 316 0 0 0 0 0 0 0
000100000000010001 PTE 4 2 0.0003% 0 0 0 0 0 0 0 0
000001000100000101 PUL+ 5 84 0.0133% 0 0 0 0 0 84 0 0
000001000100010101 PUTL+ 5 187 0.0295% 0 0 0 1 0 0 178 0
000001000100000010 SL+ 5 8843 1.3959% 0 8843 0 0 0 0 0 0
100000000000000001 PQ 5 2 0.0003% 0 0 0 0 0 0 0 0
000001010100000101 PUXL+ 5 34 0.0054% 0 0 0 5 0 19 0 0
000000000000010101 PUT 5 180 0.0284% 0 0 0 6 0 5 142 0
100000000000010001 PTQ 5 24 0.0038% 0 0 0 0 0 0 0 0
010100000000010010 STEN 6 22 0.0035% 22 0 0 0 0 0 0 0
010100000000000010 SEN 6 4636 0.7318% 4635 0 0 1 0 0 0 0
010100000000000001 PEN 6 234 0.0369% 0 0 0 0 0 0 0 0
010000000001000001 PDN 6 801 0.1264% 799 2 0 0 0 0 0 0
010000000000000001 PN 6 8727 1.3776% 2322 0 0 2763 0 653 11 0
010000000000010001 PTN 6 44 0.0069% 4 0 0 0 0 0 2 0
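
The relationship between the Binary_flag and String_score columns can be read off the rows above: each set bit corresponds to one feature character. The sketch below decodes a flag under a mapping inferred from those rows alone, which is an assumption and not a documented iRefIndex specification. Bit positions are counted from the left, starting at 1. Positions 3, 7, 13 and 15 are never set in the table, so their meaning is unknown here (G and V are the two features without an observed position). Characters appear to be written from the rightmost bit to the leftmost, with "+" always last. The names BIT_TO_CHAR and decode_flag are hypothetical.

```python
# Bit position (1 = leftmost character of the 18-character flag) -> feature.
# Inferred from the table rows only; unobserved positions are left unmapped.
BIT_TO_CHAR = {
    18: "P", 17: "S", 16: "U", 14: "T", 12: "D", 11: "M",
    10: "+", 9: "O", 8: "X", 6: "L", 5: "I", 4: "E", 2: "N", 1: "Q",
}


def decode_flag(flag):
    """Turn an 18-character binary flag into its string score."""
    if len(flag) != 18 or set(flag) - {"0", "1"}:
        raise ValueError("expected a string of exactly 18 characters, each 0 or 1")
    chars = []
    has_plus = False
    # Walk from the rightmost bit (position 18) to the leftmost (position 1).
    for pos in range(18, 0, -1):
        if flag[pos - 1] != "1":
            continue
        char = BIT_TO_CHAR.get(pos)
        if char is None:
            raise ValueError(f"bit {pos} is set but has no inferred meaning")
        if char == "+":
            has_plus = True  # "+" is written at the end of the score
        else:
            chars.append(char)
    return "".join(chars) + ("+" if has_plus else "")


print(decode_flag("010101000100000010"))
print(decode_flag("000001010100000101"))
```

Run against the table, the two calls correspond to the rows scored SLEN+ and PUXL+.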


Redundancy between pairs of interaction datasets processed in this study

BIND       40766
BIOGRID    14613    26945
DIP            0        0        0
HPRD           0        0        0        0
INTACT     18047    16580        0        0    39678
MINT       16465    15288        0        0    22659    28055
MPACT          0        0        0        0        0        0        0
MPPI         647      181        0        0      548      539        0      857
OPHID       3172     2732        0        0     5549     4796        0      404     9663
            BIND  BIOGRID      DIP     HPRD   INTACT     MINT    MPACT     MPPI    OPHID
           20177        0     2301     3476     8162        0        0       53    12338
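
One way to read the triangular matrix above is as pairwise overlap counts. The sketch below assumes that each diagonal cell holds the number of unique proteins in that source (the non-zero diagonal values match the "Unique proteins" column of the first table) and that each off-diagonal cell counts proteins shared by the two sources. That interpretation is an assumption, not something stated on this page. The final unlabelled row of the matrix is not used. The names SOURCES, LOWER, shared and fraction_shared are illustrative only.

```python
SOURCES = ["BIND", "BIOGRID", "DIP", "HPRD", "INTACT",
           "MINT", "MPACT", "MPPI", "OPHID"]

# Lower triangle of the matrix, one list per row, copied from the table above.
LOWER = [
    [40766],
    [14613, 26945],
    [0, 0, 0],
    [0, 0, 0, 0],
    [18047, 16580, 0, 0, 39678],
    [16465, 15288, 0, 0, 22659, 28055],
    [0, 0, 0, 0, 0, 0, 0],
    [647, 181, 0, 0, 548, 539, 0, 857],
    [3172, 2732, 0, 0, 5549, 4796, 0, 404, 9663],
]


def shared(a, b):
    """Cell for sources a and b; the matrix is symmetric, so order is ignored."""
    i, j = SOURCES.index(a), SOURCES.index(b)
    if i < j:
        i, j = j, i
    return LOWER[i][j]


def fraction_shared(a, b):
    """Fraction of a's proteins that also occur in b (assumed interpretation)."""
    total = shared(a, a)
    if total == 0:
        raise ValueError(f"no proteins recorded for {a}")
    return shared(a, b) / total


print(round(fraction_shared("MINT", "INTACT"), 4))
print(round(fraction_shared("INTACT", "MINT"), 4))
```

Under this reading the measure is asymmetric: the same shared count is a larger fraction of the smaller dataset than of the larger one.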