Difference between revisions of "Statistics iRefIndex 13.0"

From irefindex
(Created page with " == Interactions available from major taxonomies == {| cellspacing="0" cellpadding="5" | align="center" style="background:#f0f0f0;"|'''NCBI taxonomy identifier''' ||align="c...")
 
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
 
== Interactions available from major taxonomies ==
 
 
{| cellspacing="0" cellpadding="5"
 
| align="center" style="background:#f0f0f0;"|'''NCBI taxonomy identifier''' ||align="center" style="background:#f0f0f0;"|'''Scientific name''' ||align="center" style="background:#f0f0f0;"|'''Number of interactions'''
 
|}
 
 
 
== Interactions available from major taxonomies (corrected) ==
 
== Interactions available from major taxonomies (corrected) ==
  
 
{| cellspacing="0" cellpadding="5"
 
{| cellspacing="0" cellpadding="5"
 
| align="center" style="background:#f0f0f0;"|'''NCBI taxonomy identifier''' ||align="center" style="background:#f0f0f0;"|'''Scientific name''' ||align="center" style="background:#f0f0f0;"|'''Number of interactions'''
 
| align="center" style="background:#f0f0f0;"|'''NCBI taxonomy identifier''' ||align="center" style="background:#f0f0f0;"|'''Scientific name''' ||align="center" style="background:#f0f0f0;"|'''Number of interactions'''
 +
|-
 +
| 9606 ||Homo sapiens ||259634
 +
|-
 +
| 559292 ||Saccharomyces cerevisiae S288c ||117420
 +
|-
 +
| 7227 ||Drosophila melanogaster ||58383
 +
|-
 +
| 10090 ||Mus musculus ||31434
 +
|-
 +
| 3702 ||Arabidopsis thaliana ||23376
 +
|-
 +
| 6239 ||Caenorhabditis elegans ||17514
 +
|-
 +
| 83333 ||Escherichia coli K-12 ||15118
 +
|-
 +
| 192222 ||Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819 ||11974
 +
|-
 +
| 284812 ||Schizosaccharomyces pombe 972h- ||8629
 +
|-
 +
| 10116 ||Rattus norvegicus ||8545
 +
|-
 +
| 632 ||Yersinia pestis ||3958
 +
|-
 +
| 243276 ||Treponema pallidum subsp. pallidum str. Nichols ||3642
 +
|-
 +
| 1111708 ||Synechocystis sp. PCC 6803 substr. Kazusa ||3231
 +
|-
 +
| 1392 ||Bacillus anthracis ||3042
 
|}
 
|}
  
Line 118: Line 138:
 
| OPHID ||73257 ||73257 ||73257 ||100.00 ||47497 ||64.84
 
| OPHID ||73257 ||73257 ||73257 ||100.00 ||47497 ||64.84
 
|-
 
|-
| (All) ||1630519 ||1151707 ||1135346 ||98.58 ||795032 ||70.03
+
| (All) ||1630519 ||1151707 ||1135346 ||98.58 ||566670 ||49.91
 
|}
 
|}
  
Line 348: Line 368:
 
| SY ||  ||  ||34 ||  ||780 ||  ||  ||1 ||  ||  ||  ||  ||  ||  
 
| SY ||  ||  ||34 ||  ||780 ||  ||  ||1 ||  ||  ||  ||  ||  ||  
 
|}
 
|}
 +
 +
== Scores definitions (Table 2) ==
 +
 +
{|
 +
| align="center" style="background:#f0f0f0;"|'''Character'''||align="center" style="background:#f0f0f0;"|'''Description of feature (when the value is 1)'''
 +
|-
 +
| D||The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.
 +
|-
 +
| E||The protein reference was a retired NCBI Identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.
 +
|-
 +
| G||The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
 +
|-
 +
| L||More than one assignment is possible (see + below), for example isoforms for a gene identifier. In such a situation, references are picked using a ranking system (first RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
 +
|-
 +
| M||The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
 +
|-
 +
| +||More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see their entries in this table).
 +
|-
 +
| N||The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
 +
|-
 +
| O||More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
 +
|-
 +
| I||The protein reference used was an NCBI GenInfo Identifier (I).
 +
|-
 +
| U||The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
 +
|-
 +
| T||The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
 +
|-
 +
| V||The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
 +
|-
 +
| Q||The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
 +
|-
 +
| P||The interaction record's primary (P) reference for the protein was used to make the assignment.
 +
|-
 +
| S||One of the interaction record's secondary (S) references for the protein was used to make the assignment.
 +
|-
 +
| Y||The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
 +
|-
 +
| X||More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
 +
|}
 +
 +
[[Category:iRefIndex]]

Latest revision as of 13:36, 11 December 2013

Interactions available from major taxonomies (corrected)

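{| cellspacing="0" cellpadding="5"
| align="center" style="background:#f0f0f0;"|'''NCBI taxonomy identifier''' ||align="center" style="background:#f0f0f0;"|'''Scientific name''' ||align="center" style="background:#f0f0f0;"|'''Number of interactions'''
|-
| 9606 ||Homo sapiens ||259634
|-
| 559292 ||Saccharomyces cerevisiae S288c ||117420
|-
| 7227 ||Drosophila melanogaster ||58383
|-
| 10090 ||Mus musculus ||31434
|-
| 3702 ||Arabidopsis thaliana ||23376
|-
| 6239 ||Caenorhabditis elegans ||17514
|-
| 83333 ||Escherichia coli K-12 ||15118
|-
| 192222 ||Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819 ||11974
|-
| 284812 ||Schizosaccharomyces pombe 972h- ||8629
|-
| 10116 ||Rattus norvegicus ||8545
|-
| 632 ||Yersinia pestis ||3958
|-
| 243276 ||Treponema pallidum subsp. pallidum str. Nichols ||3642
|-
| 1111708 ||Synechocystis sp. PCC 6803 substr. Kazusa ||3231
|-
| 1392 ||Bacillus anthracis ||3042
|}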

Interactions

BIND BIND_TRANSLATION BIOGRID CORUM DIP HPRD INNATEDB INTACT MATRIXDB MPACT MPI-IMEX MPI-LIT MPPI OPHID
BIND 62978 52195 22636 221 25228 1957 168 26435 4 6318 6 25 353 2159
BIND_TRANSLATION 60774 24740 195 25245 2684 236 26936 4 6284 6 21 362 2757
BIOGRID 286455 149 30154 10285 618 53317 6 4219 1 114 6619
CORUM 2607 146 154 30 361 15 239
DIP 72351 518 218 30454 3 6704 42 186 54 1160
HPRD 40536 439 5299 17 111 7327
INNATEDB 5495 686 3 18 677
INTACT 200788 16 6747 292 164 159 9901
MATRIXDB 229 1 25
MPACT 13338
MPI-IMEX 468
MPI-LIT 738
MPPI 778 181
OPHID 47497
(Exclusive to source) 8940 4364 208329 1934 26422 24895 3916 124327 186 5337 166 452 224 30445
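
The diagonal of the matrix above matches the "Unique RIGIDs" column of Table 5 (for example, 62978 for BIND), which suggests that each off-diagonal cell counts RIGIDs shared by two sources and that the last row counts RIGIDs found in a single source only. The page does not describe the computation, so the following is only a minimal sketch under that assumption. The `rigids_by_source` mapping, the function names and the toy identifiers are hypothetical and are not iRefIndex data.

```python
from itertools import combinations_with_replacement


def overlap_matrix(rigids_by_source):
    """Count shared identifiers for every pair of sources (upper triangle).

    The (A, A) entry is simply the number of identifiers in source A.
    """
    names = sorted(rigids_by_source)
    return {
        (a, b): len(rigids_by_source[a] & rigids_by_source[b])
        for a, b in combinations_with_replacement(names, 2)
    }


def exclusive_counts(rigids_by_source):
    """Count identifiers that appear in exactly one source."""
    counts = {}
    for name, ids in rigids_by_source.items():
        others = set().union(
            *(v for k, v in rigids_by_source.items() if k != name)
        )
        counts[name] = len(ids - others)
    return counts


# Toy data, invented for illustration only.
rigids_by_source = {
    "BIND": {"r1", "r2", "r3"},
    "DIP": {"r2", "r3", "r4"},
    "HPRD": {"r3", "r5"},
}
matrix = overlap_matrix(rigids_by_source)
exclusive = exclusive_counts(rigids_by_source)
```

Only the upper triangle is stored because the overlap is symmetric, which mirrors the layout of the table.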

Interactors

BIND BIND_TRANSLATION BIOGRID CORUM DIP HPRD INNATEDB INTACT MATRIXDB MPACT MPI-IMEX MPI-LIT MPPI OPHID
BIND 37513 30347 17198 2035 15852 2751 1355 20028 99 4360 32 85 660 3088
BIND_TRANSLATION 36160 17915 2003 16038 3095 1476 20286 117 4003 30 94 662 3342
BIOGRID 49568 2500 15195 6439 2103 30814 115 4514 2 480 5776
CORUM 4363 1581 1544 938 3735 51 408 2248
DIP 24567 1621 1211 20312 77 4553 127 387 404 2325
HPRD 9839 1377 5873 102 268 5153
INNATEDB 3776 2998 85 275 1993
INTACT 68510 191 4948 382 553 727 8039
MATRIXDB 249 15 145
MPACT 4982 1
MPI-IMEX 470 81
MPI-LIT 922
MPPI 835 418
OPHID 9525
(Exclusive to source) 5661 3529 14570 353 2352 1908 385 26107 33 18 79 307 19 652

Summary of mapping interaction records to RIGs (Table 5)

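{| cellspacing="0" cellpadding="5"
| align="center" style="background:#f0f0f0;"|'''Source''' ||align="center" style="background:#f0f0f0;"|'''Total records''' ||align="center" style="background:#f0f0f0;"|'''Protein-related interactions''' ||align="center" style="background:#f0f0f0;"|'''PPI assigned to RIGID''' ||align="center" style="background:#f0f0f0;"|'''%''' ||align="center" style="background:#f0f0f0;"|'''Unique RIGIDs''' ||align="center" style="background:#f0f0f0;"|'''%'''
|-
| BIND ||157736 ||91309 ||91096 ||99.77 ||62978 ||69.13
|-
| BIND_TRANSLATION ||192923 ||84138 ||82034 ||97.50 ||60774 ||74.08
|-
| BIOGRID ||722541 ||433702 ||431705 ||99.54 ||286455 ||66.35
|-
| CORUM ||2844 ||2844 ||2844 ||100.00 ||2607 ||91.67
|-
| DIP ||76271 ||74759 ||74728 ||99.96 ||72351 ||96.82
|-
| HPRD ||83022 ||83022 ||82983 ||99.95 ||40536 ||48.85
|-
| INNATEDB ||19531 ||19531 ||8059 ||41.26 ||5495 ||68.18
|-
| INTACT ||281793 ||269273 ||269148 ||99.95 ||200788 ||74.60
|-
| MATRIXDB ||1065 ||392 ||392 ||100.00 ||229 ||58.42
|-
| MPACT ||16504 ||16504 ||16308 ||98.81 ||13338 ||81.79
|-
| MPI-IMEX ||473 ||473 ||468 ||98.94 ||468 ||100.00
|-
| MPI-LIT ||745 ||745 ||741 ||99.46 ||738 ||99.60
|-
| MPPI ||1814 ||1758 ||1583 ||90.05 ||778 ||49.15
|-
| OPHID ||73257 ||73257 ||73257 ||100.00 ||47497 ||64.84
|-
| (All) ||1630519 ||1151707 ||1135346 ||98.58 ||566670 ||49.91
|}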
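
The two percentage columns in Table 5 are consistent with simple ratios: the first is PPI assigned to RIGID divided by protein-related interactions, and the second is unique RIGIDs divided by PPI assigned to RIGID. The page does not state these formulas, so the sketch below treats them as an assumption and recomputes three rows copied from the table. The function name `percent` is illustrative.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# (source, protein-related interactions, PPI assigned to RIGID, unique RIGIDs)
rows = [
    ("BIND", 91309, 91096, 62978),
    ("INNATEDB", 19531, 8059, 5495),
    ("(All)", 1151707, 1135346, 566670),
]

# Maps each source to (% assigned to a RIGID, % unique among assigned).
table5 = {
    source: (percent(assigned, related), percent(unique, assigned))
    for source, related, assigned, unique in rows
}
```

Under the same assumption, the earlier (All) figure of 795032 unique RIGIDs corresponds to the 70.03 shown in the older revision.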

Assignment of protein interactors to ROGs (Table 3)

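{| cellspacing="0" cellpadding="5"
| align="center" style="background:#f0f0f0;"|'''Source''' ||align="center" style="background:#f0f0f0;"|'''Protein interactors''' ||align="center" style="background:#f0f0f0;"|'''Assigned''' ||align="center" style="background:#f0f0f0;"|'''%''' ||align="center" style="background:#f0f0f0;"|'''Arbitrary''' ||align="center" style="background:#f0f0f0;"|'''Matching sequence''' ||align="center" style="background:#f0f0f0;"|'''New or obsolete sequence''' ||align="center" style="background:#f0f0f0;"|'''Unassigned''' ||align="center" style="background:#f0f0f0;"|'''Unique proteins'''
|-
| BIND ||252251 ||251999 ||99.90 ||0 ||0 ||40561 ||252 ||37513
|-
| BIND_TRANSLATION ||257681 ||252213 ||97.88 ||20799 ||0 ||23960 ||5468 ||36160
|-
| BIOGRID ||50562 ||49780 ||98.45 ||10270 ||0 ||347 ||782 ||49568
|-
| CORUM ||12916 ||12916 ||100.00 ||7 ||0 ||0 ||0 ||4363
|-
| DIP ||25335 ||25320 ||99.94 ||600 ||0 ||1290 ||15 ||24567
|-
| HPRD ||123812 ||123812 ||100.00 ||13701 ||96018 ||217 ||0 ||9839
|-
| INNATEDB ||41975 ||26539 ||63.23 ||0 ||0 ||0 ||15436 ||3776
|-
| INTACT ||223699 ||223586 ||99.95 ||108 ||35 ||469 ||113 ||68510
|-
| MATRIXDB ||1274 ||1274 ||100.00 ||0 ||0 ||0 ||0 ||249
|-
| MPACT ||40349 ||40134 ||99.47 ||0 ||0 ||3 ||215 ||4982
|-
| MPI-IMEX ||946 ||940 ||99.37 ||2 ||0 ||0 ||6 ||470
|-
| MPI-LIT ||1460 ||1455 ||99.66 ||7 ||0 ||0 ||5 ||922
|-
| MPPI ||3568 ||3366 ||94.34 ||16 ||0 ||5 ||202 ||835
|-
| OPHID ||146514 ||146514 ||100.00 ||405 ||12 ||1014 ||0 ||9525
|-
| (All) ||1182342 ||1159848 ||98.10 ||45915 ||96065 ||67866 ||22494 ||113221
|}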
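
In Table 3 the "Unassigned" column equals protein interactors minus assigned, and the "%" column equals assigned divided by protein interactors. This relationship is inferred from the numbers rather than stated on the page. The sketch below recomputes it for three rows copied from the table, using a hypothetical helper named `assignment_stats`.

```python
def assignment_stats(interactors, assigned):
    """Return (percent assigned, unassigned count) for one source."""
    pct = round(100.0 * assigned / interactors, 2)
    return pct, interactors - assigned


# source -> (protein interactors, assigned), copied from Table 3
rows = {
    "BIND": (252251, 251999),
    "INNATEDB": (41975, 26539),
    "(All)": (1182342, 1159848),
}
stats = {name: assignment_stats(*pair) for name, pair in rows.items()}
```

The INNATEDB row stands out in this check: 15436 of its 41975 interactors remain unassigned, which accounts for most of the 22494 unassigned interactors overall.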

ROG summary

BIND BIND_TRANSLATION BIOGRID CORUM DIP HPRD INNATEDB INTACT MATRIXDB MPACT MPI-IMEX MPI-LIT MPPI OPHID
P 185575 33846 12877 26539 222535 616 1044
P+IN 2
P+LY 160 2
P+N 9
PD 124746 1271 2996 124455
PD+IN 1
PD+LQ 10194
PD+LYQ 67
PD+N 22
PD+XQ 26
PDIQ 219
PDIYQ 513
PDQ 15769
PDY 4628 5 992
PDYQ 15454
PGD 623 2050 1
PGD+L 6290 10243 7
PGD+X 12
PI 10
PIY 409
PT 2664 2538 1 30579 320 400
PTD 86577 1 3 44 114
PTD+LQ 4025
PTD+LYQ 12
PTDIYQ 13
PTDQ 2162
PTDY 218
PTDYQ 138
PTGD 23 1
PTGD+L 15 2
PTI 14
PTY 1
PU 16 32 387 2
PU+L 17 7 71 7
PU+O 19
PU+X 610 2 1
PUD 7 143 16979
PUD+L 13 265
PUD+O 12
PUD+X 82 162 3526
PUT 4 15 2527 2 1
PUT+L 19 29 2
PUT+O 16
PUTD 4 9
PUTD+L 3 140
PV 9
PV+LY 1
PVY 8
PY 7603 293 34
S 2 735 11965 84
S+L 6 190 545
S+LY 17 66 6
S+N 2
S+O 306
S+X 86
S+XY 176
SD 4061 3107
SD+L 225 378
SD+LY 29
SD+N 128
SD+O 11135
SD+OY 14
SD+X 1117
SD+XY 3
SDY 232
SGD 627
SGD+L 2094
SGD+O 15516
SI 17
SIY 32240
ST 4796 88 7025
ST+L 25 3755
ST+LY 61
ST+O 876
STD 670 8457
STD+L 4 919
STD+O 28238
STD+OY 6
STDY 2 2
STGD 1600
STGD+L 5928
STGD+O 39794
STI 5
STIY 3475
STY 2 2 3
SUD 46
SUD+L 34 8
SUD+O 2
SUD+X 773
SUTD 11
SUTD+L 27 7
SUTD+O 131
SY 34 780 1

Scores definitions (Table 2)

Character Description of feature (when the value is 1)
D The source database (D) listed in the interaction record is different from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.
E The protein reference was a retired NCBI Identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.
G The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
L More than one assignment is possible (see + below), for example isoforms for a gene identifier. In such a situation, references are picked using a ranking system (first RefSeq, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
M The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+ More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see their entries in this table).
N The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
O More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
I The protein reference used was an NCBI GenInfo Identifier (I).
U The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
T The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
V The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
Q The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
P The interaction record's primary (P) reference for the protein was used to make the assignment.
S One of the interaction record's secondary (S) references for the protein was used to make the assignment.
Y The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
X More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
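
Each ROG score in the ROG summary table is a string of the feature characters defined above, so a score such as PTD+LYQ can be expanded character by character. The following is a minimal sketch of such a decoder and is not code from the iRefIndex pipeline. The `FEATURES` dictionary holds shortened paraphrases of the definitions, and the names `decode_score` and `strip_version` are hypothetical. `strip_version` illustrates the V feature with the NP_012420.1 example from the table.

```python
# Shortened paraphrases of the Table 2 definitions (not the official wording).
FEATURES = {
    "P": "primary reference used",
    "S": "secondary reference used",
    "U": "secondary UniProt accession updated to primary",
    "V": "version information ignored",
    "T": "taxonomy identifier mismatch tolerated",
    "G": "EntrezGene identifier; gene products used",
    "D": "source database mismatch tolerated",
    "M": "typographical modification tolerated",
    "+": "more than one assignment possible",
    "O": "chosen by matching SEGUID of original sequence",
    "X": "chosen by matching taxonomy identifier",
    "L": "chosen by ranking, then longest sequence",
    "I": "NCBI GenInfo Identifier used",
    "E": "eUtils used to retrieve current accession or sequence",
    "Y": "accession removed from RefSeq or UniProt",
    "N": "new SEGUID entry and ROG identifier generated",
    "Q": "reference of type 'see-also'",
}


def decode_score(score):
    """Map each character of a ROG score string to its description."""
    unknown = [c for c in score if c not in FEATURES]
    if unknown:
        raise ValueError(f"unknown score characters: {unknown}")
    return [(c, FEATURES[c]) for c in score]


def strip_version(accession):
    """Drop a trailing '.<digits>' version suffix, if present."""
    base, dot, version = accession.rpartition(".")
    return base if dot and version.isdigit() else accession


decoded = decode_score("PTD+LYQ")
```

Raising an error on unknown characters is a design choice of this sketch; it keeps a mistyped score from being silently decoded as a shorter one.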