Difference between revisions of "Statistics iRefIndex 14.0"

Latest revision as of 14:08, 21 April 2015

These statistics apply to the extended version of iRefIndex. See the iRefIndex_Release_Notes for details.

Interactions available from major taxonomies (corrected)

The taxon of each protein interactor has been corrected to match the taxon given in the protein sequence record, regardless of the taxon listed in the interaction record. See PMID 18823568 for details.

NCBI taxonomy identifier	Scientific name	Number of interactions
9606	Homo sapiens	472494
559292	Saccharomyces cerevisiae S288c	122323
7227	Drosophila melanogaster	60888
10090	Mus musculus	35318
3702	Arabidopsis thaliana	24946
6239	Caenorhabditis elegans	17843
83333	Escherichia coli K-12	16450
192222	Campylobacter jejuni subsp. jejuni NCTC 11168 = ATCC 700819	11971
10116	Rattus norvegicus	9679
284812	Schizosaccharomyces pombe 972h-	9387
381518	Influenza A virus (A/Wilson-Smith/1933(H1N1))	4087
632	Yersinia pestis	3956
243276	Treponema pallidum subsp. pallidum str. Nichols	3642
1111708	Synechocystis sp. PCC 6803 substr. Kazusa	3232

Summary of mapping interaction records to RIGs (redundant interaction groups)

Source: Interaction data source. Total records: Total number of interaction records found in the source. Protein-only interactors (the "Protein-related interactions" column): Total number of interactions involving only protein interactors. PPI assigned to RIGID: Number of interactions where all protein interactors were assigned to a ROG; the % column that follows expresses this as a percentage of column 3. Unique RIGIDs (interactions): Number of unique protein interactions and complexes (RIGIDs) found in the data source; the % column that follows expresses this as a percentage of column 4. For a description of the term RIGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper PMID 18823568.

Source	Total records	Protein-related interactions	PPI assigned to RIGID	%	Unique RIGIDs	%
BHF_UCL	928	915	915	100.00	518	56.61
BIND	157736	91309	90816	99.46	62858	69.21
BIND_TRANSLATION	192923	84138	81773	97.19	60720	74.25
BIOGRID	790004	493818	491294	99.49	324083	65.97
CORUM	2844	2844	2844	100.00	2607	91.67
DIP	78781	77225	77052	99.78	74638	96.87
HPIDB	1458	1405	1405	100.00	725	51.60
HPRD	83022	83022	82983	99.95	40536	48.85
I2D_IMEX	892	891	891	100.00	434	48.71
INNATEDB	17496	17496	7111	40.64	4932	69.36
INTACT	344906	327730	327637	99.97	224568	68.54
INTCOMPLEX	1100	982	982	100.00	968	98.57
MATRIXDB	596	575	575	100.00	324	56.35
MBINFO	542	521	521	100.00	330	63.34
MOLCON	377	375	375	100.00	212	56.53
MPACT	16504	16504	16373	99.21	13398	81.83
MPIDB	1505	1504	1504	100.00	954	63.43
MPPI	1814	1758	1578	89.76	776	49.18
OPHID	73257	73257	73257	100.00	47464	64.79
REACTOME	141996	141996	141993	100.00	141818	99.88
SPIKE	29686	29686	28323	95.41	27824	98.24
UNIPROTPP	8952	8890	8890	100.00	5049	56.79
VIRUSHOST	45540	45540	45539	100.00	45538	100.00
(All)	1992859	1502381	1484631	98.82	797994	53.75
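The two percentage columns can be recomputed from the counts as described above. The following is a minimal sketch, not the code used to build the release; the function name `rig_percentages` and its argument names are hypothetical, and the input values are copied from the BIND and INNATEDB rows.

```python
def rig_percentages(protein_related, assigned, unique_rigids):
    """Recompute the two % columns of the RIG summary table.

    Returns a pair: (assigned as a percentage of column 3,
    unique RIGIDs as a percentage of column 4), rounded to 2 places.
    """
    pct_assigned = round(100.0 * assigned / protein_related, 2)
    pct_unique = round(100.0 * unique_rigids / assigned, 2)
    return pct_assigned, pct_unique


# Counts copied from the BIND and INNATEDB rows of the table above.
bind = rig_percentages(91309, 90816, 62858)
innatedb = rig_percentages(17496, 7111, 4932)
print("BIND", bind)
print("INNATEDB", innatedb)
```

The INNATEDB row shows why both percentages are worth reading together: fewer than half of its interactions were assigned to a RIGID, so the unique-RIGID percentage is relative to a much smaller base.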

Assignment of protein interactors to ROGs (redundant object group)

Source: Interaction data source (see methods). Protein interactors: Total number of interactors found in all interaction records. Assigned: Number of proteins assigned unambiguously to a ROG. Assignments listed in columns 5 and 6 are not included here. %: Column 3 expressed as a percentage of column 2. Arbitrary: Total number of ROG assignments that were ambiguous and were resolved with an arbitrary method (see ROG scores containing 'L'). Matching sequence: Total number of assignments made where a sequence in the interaction record matched a known sequence. Unassigned: Total number of protein interactors that could not be assigned to a ROG. Unique proteins: Total number of unique proteins (ROGs). For a description of the term ROGs, see README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format and the original paper PMID 18823568.

Source	Protein interactors	Assigned	%	Arbitrary	Matching sequence	New or obsolete sequence	Unassigned	Unique proteins
BHF_UCL	2060	2060	100.00	0	0	0	0	494
BIND	252251	251706	99.78	0	0	0	545	37441
BIND_TRANSLATION	257681	251597	97.64	40883	0	0	6084	36124
BIOGRID	53047	52116	98.24	11433	0	0	931	51873
CORUM	12916	12916	100.00	7	0	0	0	4363
DIP	26633	26551	99.69	2084	0	0	82	25804
HPIDB	3221	3221	100.00	0	0	0	0	782
HPRD	123812	123812	100.00	13563	95615	130	0	9841
I2D_IMEX	1932	1932	100.00	0	0	0	0	448
INNATEDB	40104	24918	62.13	0	0	0	15186	3619
INTACT	265428	265292	99.95	115	39	74	136	73955
INTCOMPLEX	3256	3256	100.00	0	0	0	0	2194
MATRIXDB	1171	1171	100.00	5	0	0	0	231
MBINFO	1134	1134	100.00	0	0	0	0	273
MOLCON	862	862	100.00	0	0	0	0	275
MPACT	40349	40199	99.63	0	0	0	150	4995
MPIDB	3238	3238	100.00	0	0	0	0	995
MPPI	3568	3361	94.20	16	0	0	207	833
OPHID	146514	146514	100.00	405	20	1014	0	9476
REACTOME	283992	283988	100.00	19	0	0	4	6013
SPIKE	65934	64561	97.92	967	0	0	1373	8811
UNIPROTPP	21185	21185	100.00	1	0	0	0	4642
VIRUSHOST	94874	94873	100.00	22	0	0	1	10283
(All)	1705162	1680463	98.55	69520	95674	1218	24699	122677
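A quick consistency check on rows of this table can be sketched as follows. The % check follows the column definition above. The second check, that Assigned plus Unassigned equals Protein interactors, is not stated in the text; it is an observation that holds for the rows used here and is treated as an assumption. The names `ROWS` and `check_row` are hypothetical, and only the columns needed for the check are copied.

```python
# Columns: source, protein interactors, assigned, %, unassigned
# (values copied from the BIND, INNATEDB and MPPI rows above).
ROWS = (
    "BIND\t252251\t251706\t99.78\t545\n"
    "INNATEDB\t40104\t24918\t62.13\t15186\n"
    "MPPI\t3568\t3361\t94.20\t207\n"
)


def check_row(line):
    """Check one tab-separated row against the column definitions."""
    source, total, assigned, pct, unassigned = line.split("\t")
    total, assigned, unassigned = int(total), int(assigned), int(unassigned)
    return {
        "source": source,
        # Assumption: every interactor is either assigned or unassigned.
        "counts_add_up": assigned + unassigned == total,
        # Column 3 expressed as a percentage of column 2.
        "pct_matches": round(100.0 * assigned / total, 2) == float(pct),
    }


results = [check_row(line) for line in ROWS.splitlines()]
for result in results:
    print(result)
```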

Mapping score summary

See below for definitions of the mapping score codes.

	BHF_UCL	BIND	BIND_TRANSLATION	BIOGRID	CORUM	DIP	HPIDB	HPRD	I2D_IMEX	INNATEDB	INTACT	INTCOMPLEX	MATRIXDB	MBINFO	MOLCON	MPACT	MPIDB	MPPI	OPHID	REACTOME	SPIKE	UNIPROTPP	VIRUSHOST
P	2060		173353	33822	12875		3221		1932	24918	264204	3256	1166	1134	862		3238			283963	55189	21180	94851
P+IN											6
P+L			19764	746							2												22
P+N											64
P+X			3	2
PD		116272																2996	124085
PD+IN											2
PD+LQ			10197
PD+N																			1014
PD+X		10
PD+XQ			26
PDIQ			732
PDQ			30573
PGD			613	2079																	306
PGD+L			6300	10659							6										962
PGD+X				13
PI											418
PT			2084	2579							1					30579
PT+L			541	1
PTD		84164									1							44	114
PTD+LQ			4022
PTDIQ			13
PTDQ			2492
PTGD			17	1
PTGD+L			21	2
PTI											16
PTM											3
PU			16		34						396									6	8099	4
PU+L			17		7						76		5							19	5
PU+O											23
PU+X			610								2
PUD		7																143	17341
PUD+L																		13	265
PUD+O																			20
PUD+X		60																162	3526
PUT			4								15					2527
PUT+L			21								30											1
PUT+O											16
PUTD		4																	9
PUTD+L																		3	140
PV											7
PV+L											1
S			146	831		12454		115			1
S+L				25		1560		634
S+N											2
S+O								275
S+X						263
SD				1338		4690		3119
SD+L						215		327
SD+N								130
SD+O								11114
SD+X						1173
SGD								680
SGD+L								2124
SGD+O								15462
SI		45114
ST						4557		112								7093
ST+L						243		3767
ST+O								852
STD				18		702		8455
STD+L						5		645
STD+O								28208
STGD								2023
STGD+L								6026
STGD+O								39571
STI		6075
SU			32
SUD						47
SUD+L						33		25
SUD+O								2
SUD+X						568
SUTD						13
SUTD+L						28		15
SUTD+O								131

Mapping score code definitions

Character	Description of feature (when the value is 1)
D	The source database (D) listed in the interaction record differs from what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.
E	The protein reference was a retired NCBI identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For identifiers that still had no sequence after going through eUtils, sequence information was obtained from UniProt.
G	The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
L	More than one assignment is possible (see + below), e.g. isoforms for a gene ID. In such a situation, references are picked using a ranking system (RefSeq first, then UniProt). If ambiguity remains after this ranking, the reference with the longest sequence is selected. (Please note that this score class definition differs from the originally published one.)
M	The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+	More than one assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see their entries in this table).
N	The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
O	More than one assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
I	The protein reference used was an NCBI GenInfo Identifier (I).
U	The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
T	The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made.
V	The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
Q	The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
P	The interaction record's primary (P) reference for the protein was used to make the assignment.
S	One of the interaction record's secondary (S) references for the protein was used to make the assignment.
Y	The protein reference was an accession that was removed from RefSeq or UniProt after the beta3 build of iRefIndex (March 9th, 2009).
X	More than one assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record.
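Mapping scores such as PTD+LQ or STGD+O in the summary table can be read one character at a time against these definitions. The sketch below assumes that a score is simply a concatenation of the single-character codes defined here; the names `FEATURES` and `decode_score` are hypothetical, and the short labels are paraphrases of the definitions rather than official wording.

```python
# Short paraphrases of the mapping score code definitions.
FEATURES = {
    "D": "source database differs from the expected one",
    "E": "accession or sequence retrieved through NCBI eUtils",
    "G": "EntrezGene identifier used as the protein reference",
    "L": "ambiguity resolved by ranking, then longest sequence",
    "M": "typographical modification of a known accession",
    "+": "more than one assignment possible",
    "N": "new SEGUID entry and new ROG identifier",
    "O": "chosen SEGUID matches the original sequence",
    "I": "NCBI GenInfo Identifier used",
    "U": "secondary UniProt accession updated to primary",
    "T": "taxonomy identifier differed from the sequence record",
    "V": "version information ignored",
    "Q": "reference of type 'see-also' used",
    "P": "primary reference used",
    "S": "secondary reference used",
    "Y": "accession removed from RefSeq or UniProt",
    "X": "chosen assignment has the same taxonomy identifier",
}


def decode_score(score):
    """Split a mapping score into (code, description) pairs.

    Assumes each character of the score is one feature code.
    """
    unknown = [code for code in score if code not in FEATURES]
    if unknown:
        raise ValueError(f"unknown mapping score codes: {unknown}")
    return [(code, FEATURES[code]) for code in score]


for code, description in decode_score("PTD+LQ"):
    print(code, description)
```

Raising on unknown characters rather than skipping them keeps a mistyped score from being silently misread.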

@@ Line 44: / Line 44: @@
 == Summary of mapping interaction records to RIGs (redundant interaction groups) ==
-'''Source''': Interaction data source.  '''Total records''': Total number of interaction records found in source.  '''Protein-only interactors''':Total number of interactions involving only protein interactors.  '''PPI assigned to RIGID''': Number of interactions where all protein interactors were assigned to a ROG. Percentage of column 3 is shown.  '''Unique interactions''': Number of unique protein interactions and complexes (RIGID's) found in the data source (also expressed as a percentage of column 4).   For a description of the term RIGs, see [[README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format]] and the original paper PMID 18823568.
+'''Source''': Interaction data source.  '''Total records''': Total number of interaction records found in source.  '''Protein-only interactors''':Total number of interactions involving only protein interactors.  '''PPI assigned to RIGID''': Number of interactions where all protein interactors were assigned to a ROG. Percentage of column 3 is shown.  '''Unique RIGIDs (interactions)''': Number of unique protein interactions and complexes (RIGID's) found in the data source (also expressed as a percentage of column 4).   For a description of the term RIGs, see [[README_MITAB2.6_for_iRefIndex#Understanding_the_iRefIndex_MITAB_format]] and the original paper PMID 18823568.
 {| cellspacing="0" cellpadding="5"
@@ Line 153: / Line 153: @@
 |}
-== ROG summary ==
+== Mapping score summary ==
+See below for definitions of the mapping score codes.
 {| cellspacing="0" cellpadding="5"
@@ Line 306: / Line 308: @@
 | SUTD+O ||  ||  ||  ||  ||  ||  ||  ||131 ||  ||  ||  ||  ||  ||  ||  ||  ||  ||  ||  ||  ||  ||  ||
 |}
+== Mapping score code definitions ==
+{|
+| align="center" style="background:#f0f0f0;"|'''Character'''||align="center" style="background:#f0f0f0;"|'''Description of feature (when the value is 1)'''||align="center" style="background:#f0f0f0;"
+|-
+| D||The source database (D) listed in the interaction record is different than what is expected for the given accession for the protein. In specific cases, this difference is tolerated and the assignment is made.
+|-
+| E||The protein reference was a retired NCBI Identifier or a UniProt identifier. NCBI's eUtils (E) were used to retrieve the current accession and/or sequence. For the identifiers still with no sequence after going through eUtils, sequence information obtained from UniProt.
+|-
+| G||The interaction record's reference for the protein was an EntrezGene (G) identifier. The corresponding products of the gene were used to make the assignment.
+|-
+| L||More than one possible assignment is possible (see + above). (e.g. isoforms for a geneid) In such a situation, references are picked using a ranking system (first look for RefSeq, then UniProt). Even after this ranking if ambiguity exists, the reference with lengthiest sequences selected. (Please note that this score class definition is different from originally published one)
+|-
+| M||The protein reference listed by the interaction record was a typographical modification (M) of a known accession. In specific cases, this variation is tolerated and the assignment is made.
+|-
+| +||More than one possible assignment is possible (+). This case may arise in one of three ways. 1) The reference supplied by the interaction record requires updating but more than one possibility exists. For example, Q7XJL8 was found to be a secondary accession in three separate UniProt records (Q3EBZ2, Q6DR20, and Q8GWA9). 2) The secondary references supplied by the interaction record point to more than one unique protein sequence. 3) An EntrezGene identifier is provided in the interaction record as a protein reference. This identifier points to more than one protein product. An attempt is made to resolve this ambiguity as indicated by ROG score features O, X or L (see below).
+|-
+| N||The protein reference, taxonomy identifier and sequence for the protein as provided in the interaction record are used to make a new entry in the SEGUID table. The protein interactor is assigned the newly (N) generated ROG identifier.
+|-
+| O||More than one possible assignment is possible (see + above). The assignment chosen has a SEGUID that is identical to the SEGUID of the original (O) sequence provided in the interaction record.
+|-
+| I||The protein reference used was an NCBI GenInfo Identifier (I).
+|-
+| U||The protein reference listed in the interaction record and used to make the assignment was a secondary UniProt accession and was updated (U) to a primary UniProt accession in order to make the assignment.
+|-
+| T||The taxonomy (T) identifier for the protein (as supplied by the interaction record) differed from what was found in the protein sequence record. This discrepancy was tolerated and the assignment was made
+|-
+| V||The protein reference listed by the interaction record contained version (V) information that was ignored. For example, RefSeq accession.version NP_012420.1 was listed but treated as RefSeq accession NP_012420.
+|-
+| Q||The protein reference used to make the assignment was of the type 'see-also'. See PSI-MI Path: entrySet/entry/interactorList/interactor/xref/primaryRef/refType = 'see-also'.
+|-
+| P||The interaction record's primary (P) reference for the protein was used to make the assignment
+|-
+| S||One of the interaction record's secondary (S) references for the protein was used to make the assignment
+|-
+| Y|| the accession was referring an accession which was removed from RefSeq or UniProt after beta3 build of iRefIndex (March 9th, 2009)
+|-
+| X||More than one possible assignment is possible (see + above). The assignment chosen has the same taxonomy (X) identifier as listed in the interaction record
+|}
 [[Category:iRefIndex]]
