Difference between revisions of "iRefIndex/PBR Notes"
PaulBoddie (talk | contribs) (→Retrieving BIND Identifiers: Added query for identifiers without considering interactions.) |
PaulBoddie (talk | contribs) (→Observations: Fixed first observation: "BIND identifiers" -> "distinct human interaction identifiers".) |
||
Line 395: | Line 395: | ||
=== Observations === | === Observations === | ||
− | There are 183614 | + | There are 183614 distinct human interaction identifiers in the iRefIndex 5.1 release, 188690 in the BIND PSI-XML dataset. |
The following distribution of BIND identifiers '''for interactions''' was observed (with each cell representing an intersection of the listed sets): | The following distribution of BIND identifiers '''for interactions''' was observed (with each cell representing an intersection of the listed sets): |
Revision as of 11:00, 26 October 2009
Preliminary BIND PSI-XML parsing results.
Contents
Results
Importing MITAB output from the iRefIndex 5.1 build and a build done using the BIND PSI-XML data, then comparing the interaction details for BIND, the following distributions are observed:
Interactions
iRefIndex 5.1 | |||
(Total) | (7122) | ||
1324 | |||
PSI-XML | (8706) | 2908 | 5798 |
To summarise:
- Common interactions: 5798
- Interactions only in iRefIndex 5.1: 1324
- Interactions only in BIND PSI-XML: 2908
Interactors
iRefIndex 5.1 | |||
(Total) | (5669) | ||
1450 | |||
PSI-XML | (5152) | 933 | 4219 |
To summarise:
- Common interactors: 4219
- Interactors only in iRefIndex 5.1: 1450
- Interactors only in BIND PSI-XML: 933
Queries
Common interactions:
select count(distinct mitab_sources.rigid) from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 inner join mitab_sources on i5_mitab_sources.rigid = mitab_sources.rigid and i5_mitab_sources.name = mitab_sources.name inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 where mitab_sources.name = 'bind';
Interactions only in iRefIndex 5.1:
select count(distinct i5_mitab_sources.rigid) from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 where i5_mitab_sources.name = 'bind' and i5_mitab_sources.rigid not in ( select distinct mitab_sources.rigid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 where mitab_sources.name = 'bind' );
Interactions only in BIND PSI-XML:
select count(distinct mitab_sources.rigid) from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 where mitab_sources.name = 'bind' and mitab_sources.rigid not in ( select distinct i5_mitab_sources.rigid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 where i5_mitab_sources.name = 'bind' );
Common interactors:
select count(distinct X.uid) from ( select i5_mitab_sources.uidA as uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA where name = 'bind' and taxA = 9606 union select i5_mitab_sources.uidB as uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidB = i5_mitab_interactions.uidB where name = 'bind' and taxB = 9606 ) as X inner join ( select mitab_sources.uidA as uid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA where name = 'bind' and taxA = 9606 union select mitab_sources.uidB as uid from mitab_sources inner join mitab_interactions on mitab_sources.uidB = mitab_interactions.uidB where name = 'bind' and taxB = 9606 ) as Y on X.uid = Y.uid;
Interactors only in iRefIndex 5.1:
select count(distinct X.uid) from ( select i5_mitab_sources.uidA as uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA where name = 'bind' and taxA = 9606 union select i5_mitab_sources.uidB as uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidB = i5_mitab_interactions.uidB where name = 'bind' and taxB = 9606 ) as X where X.uid not in ( select mitab_sources.uidA as uid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA where name = 'bind' and taxA = 9606 union select mitab_sources.uidB as uid from mitab_sources inner join mitab_interactions on mitab_sources.uidB = mitab_interactions.uidB where name = 'bind' and taxB = 9606 );
Interactors only in BIND PSI-XML:
select count(distinct X.uid) from ( select mitab_sources.uidA as uid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA where name = 'bind' and taxA = 9606 union select mitab_sources.uidB as uid from mitab_sources inner join mitab_interactions on mitab_sources.uidB = mitab_interactions.uidB where name = 'bind' and taxB = 9606 ) as X where X.uid not in ( select i5_mitab_sources.uidA as uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA where name = 'bind' and taxA = 9606 union select i5_mitab_sources.uidB as uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidB = i5_mitab_interactions.uidB where name = 'bind' and taxB = 9606 );
Retrieving BIND Identifiers
The following queries determine the appropriate set of interactions and then derive the BIND identifiers from them.
Find all BIND identifiers in iRefIndex 5.1:
create table export_all_i5_bind as select distinct i5_mitab_interaction_identifiers.uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 inner join i5_mitab_interaction_identifiers on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid where name = 'bind' and uid like 'bind:%';
Find all BIND identifiers in BIND PSI-XML:
create table export_all_xml_bind as select distinct mitab_interaction_identifiers.uid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 inner join mitab_interaction_identifiers on mitab_sources.uidA = mitab_interaction_identifiers.uidA and mitab_sources.uidB = mitab_interaction_identifiers.uidB and mitab_sources.rigid = mitab_interaction_identifiers.rigid where name = 'bind' and uid like 'bind:%';
Find BIND identifiers for common interactions where those interactions are known by the same identifier:
create table export_common_bind as select distinct i5_mitab_interaction_identifiers.uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 inner join i5_mitab_interaction_identifiers on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid inner join mitab_sources on i5_mitab_sources.rigid = mitab_sources.rigid and i5_mitab_sources.name = mitab_sources.name inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 inner join mitab_interaction_identifiers on mitab_sources.uidA = mitab_interaction_identifiers.uidA and mitab_sources.uidB = mitab_interaction_identifiers.uidB and mitab_sources.rigid = mitab_interaction_identifiers.rigid and i5_mitab_interaction_identifiers.uid = mitab_interaction_identifiers.uid where mitab_sources.name = 'bind' and i5_mitab_interaction_identifiers.uid like 'bind:%';
Find BIND identifiers for interactions only in iRefIndex 5.1:
create table export_i5_bind as select distinct i5_mitab_interaction_identifiers.uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 inner join i5_mitab_interaction_identifiers on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid where i5_mitab_sources.name = 'bind' and i5_mitab_interaction_identifiers.uid like 'bind:%' and i5_mitab_sources.rigid not in ( select distinct mitab_sources.rigid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 inner join mitab_interaction_identifiers on mitab_sources.uidA = mitab_interaction_identifiers.uidA and mitab_sources.uidB = mitab_interaction_identifiers.uidB and mitab_sources.rigid = mitab_interaction_identifiers.rigid where mitab_sources.name = 'bind' );
Find BIND identifiers for interactions only in BIND PSI-XML:
create table export_xml_bind as select distinct mitab_interaction_identifiers.uid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 inner join mitab_interaction_identifiers on mitab_sources.uidA = mitab_interaction_identifiers.uidA and mitab_sources.uidB = mitab_interaction_identifiers.uidB and mitab_sources.rigid = mitab_interaction_identifiers.rigid where mitab_sources.name = 'bind' and mitab_interaction_identifiers.uid like 'bind:%' and mitab_sources.rigid not in ( select distinct i5_mitab_sources.rigid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 inner join i5_mitab_interaction_identifiers on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid where i5_mitab_sources.name = 'bind' );
Find BIND identifier coverage for any usage of an identifier:
create table export_identifiers_bind as select coalesce(A.uid, B.uid) as uid, (case when A.uid is not null then 1 else 0 end) as irefindex, (case when B.uid is not null then 1 else 0 end) as bindpsixml from ( select uid from i5_mitab_interaction_identifiers as A where A.uid like 'bind:%' ) as A full outer join ( select uid from mitab_interaction_identifiers as B where B.uid like 'bind:%' ) as B on A.uid = B.uid order by A.uid, B.uid;
Find BIND identifier coverage for human interactions:
create table export_identifiers_human_bind as select coalesce(A.uid, B.uid) as uid, (case when A.uid is not null then 1 else 0 end) as irefindex, (case when B.uid is not null then 1 else 0 end) as bindpsixml from ( select distinct i5_mitab_interaction_identifiers.uid from i5_mitab_sources inner join i5_mitab_interactions on i5_mitab_sources.uidA = i5_mitab_interactions.uidA and i5_mitab_sources.uidB = i5_mitab_interactions.uidB and i5_mitab_sources.rigid = i5_mitab_interactions.rigid and i5_mitab_interactions.taxA = 9606 and i5_mitab_interactions.taxB = 9606 inner join i5_mitab_interaction_identifiers on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid where i5_mitab_sources.name = 'bind' and i5_mitab_interaction_identifiers.uid like 'bind:%' ) as A full outer join ( select distinct mitab_interaction_identifiers.uid from mitab_sources inner join mitab_interactions on mitab_sources.uidA = mitab_interactions.uidA and mitab_sources.uidB = mitab_interactions.uidB and mitab_sources.rigid = mitab_interactions.rigid and mitab_interactions.taxA = 9606 and mitab_interactions.taxB = 9606 inner join mitab_interaction_identifiers on mitab_sources.uidA = mitab_interaction_identifiers.uidA and mitab_sources.uidB = mitab_interaction_identifiers.uidB and mitab_sources.rigid = mitab_interaction_identifiers.rigid where mitab_sources.name = 'bind' and mitab_interaction_identifiers.uid like 'bind:%' ) as B on A.uid = B.uid order by A.uid, B.uid;
Observations
There are 183614 distinct human interaction identifiers in the iRefIndex 5.1 release, 188690 in the BIND PSI-XML dataset.
The following distribution of BIND identifiers for interactions was observed (with each cell representing an intersection of the listed sets):
Common | iRefIndex 5.1 only | BIND PSI-XML only | |
Common | (8639) | ||
iRefIndex 5.1 only | 76 | (1613) | |
BIND PSI-XML only | 0 | 1088 | (3248) |
In other words:
- There are 8639 common BIND identifiers in total.
- There are 1613 BIND identifiers appearing for interactions only in iRefIndex 5.1.
- There are 3248 BIND identifiers appearing for interactions only in the BIND PSI-XML dataset.
- Although the intersection between "iRefIndex 5.1 only" and "BIND PSI-XML only" should yield zero interactions, since by definition an interaction in one of these sets should not appear in another, 1088 BIND identifiers appear in both.
- Of the interactions that should only be referenced by iRefIndex 5.1, 76 BIND identifiers also refer to interactions in both iRefIndex 5.1 and the BIND PSI-XML dataset.
- Of the interactions that should only be referenced by the BIND PSI-XML dataset, 0 BIND identifiers also refer to interactions in both iRefIndex 5.1 and the BIND PSI-XML dataset. This is as expected.
In fact, some BIND identifiers appear to refer to more than one RIG identifier. For example:
- BIND identifier 100771 refers to 5TlbBV7VB0MFa/R1cmyeByxSFjw in only the iRefIndex 5.1 release, but also to pXZtvCDYsluwWrmdXf5FJmQ0y54 in that release and in the BIND PSI-XML dataset.
- BIND identifier 100321 refers to VwWuNAmhR/FDx6FtF6FacwAmzr0 in only the iRefIndex 5.1 release, but to Dr5ri7vTj02leEUMKr35jj1fP/c in only the BIND PSI-XML dataset.
See bug #133 for further details and discussion.
See bug #135 for details and discussion of the broader activity.
Other Stuff
This is what I did for the unnecessary "new BIND vs. other sources" experiment...
Downloaded the following:
- BioGrid
- CORUM
- DIP
- HPRD
- Intact
- MINT
- MIPS (MPPI)
- OPHID
Modified the following:
- BioGrid - removed all but human data
Parsed all sources mentioned above.
Removed all non-9606 records.
Did ROG assignment and so on.
Made the reports.