Difference between revisions of "iRefIndex/PBR Notes"

From irefindex
(→‎Retrieving BIND Identifiers: Added query for identifiers without considering interactions.)
(→‎Observations: Fixed first observation: "BIND identifiers" -> "distinct human interaction identifiers".)
Line 395: Line 395:
 
=== Observations ===
 
=== Observations ===
  
There are 183614 BIND identifiers in the iRefIndex 5.1 release, 188690 in the BIND PSI-XML dataset.
+
There are 183614 distinct human interaction identifiers in the iRefIndex 5.1 release, 188690 in the BIND PSI-XML dataset.
  
 
The following distribution of BIND identifiers '''for interactions''' was observed (with each cell representing an intersection of the listed sets):
 
The following distribution of BIND identifiers '''for interactions''' was observed (with each cell representing an intersection of the listed sets):

Revision as of 11:00, 26 October 2009

Preliminary BIND PSI-XML parsing results.

Results

Importing MITAB output from the iRefIndex 5.1 build and a build done using the BIND PSI-XML data, then comparing the interaction details for BIND, the following distributions are observed:

Interactions

iRefIndex 5.1
(Total) (7122)
1324
PSI-XML (8706) 2908 5798

To summarise:

  • Common interactions: 5798
  • Interactions only in iRefIndex 5.1: 1324
  • Interactions only in BIND PSI-XML: 2908

Interactors

iRefIndex 5.1
(Total) (5669)
1450
PSI-XML (5152) 933 4219

To summarise:

  • Common interactors: 4219
  • Interactors only in iRefIndex 5.1: 1450
  • Interactors only in BIND PSI-XML: 933

Queries

Common interactions:

select count(distinct mitab_sources.rigid)
from i5_mitab_sources
inner join i5_mitab_interactions
  on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
  and i5_mitab_interactions.taxA = 9606
  and i5_mitab_interactions.taxB = 9606
inner join mitab_sources
  on i5_mitab_sources.rigid = mitab_sources.rigid
  and i5_mitab_sources.name = mitab_sources.name
inner join mitab_interactions
  on mitab_sources.uidA = mitab_interactions.uidA
  and mitab_sources.uidB = mitab_interactions.uidB
  and mitab_sources.rigid = mitab_interactions.rigid
  and mitab_interactions.taxA = 9606
  and mitab_interactions.taxB = 9606
where mitab_sources.name = 'bind';

Interactions only in iRefIndex 5.1:

select count(distinct i5_mitab_sources.rigid)
from i5_mitab_sources
inner join i5_mitab_interactions
  on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
  and i5_mitab_interactions.taxA = 9606
  and i5_mitab_interactions.taxB = 9606
where i5_mitab_sources.name = 'bind'
  and i5_mitab_sources.rigid not in (
    select distinct mitab_sources.rigid
    from mitab_sources
    inner join mitab_interactions
      on mitab_sources.uidA = mitab_interactions.uidA
      and mitab_sources.uidB = mitab_interactions.uidB
      and mitab_sources.rigid = mitab_interactions.rigid
      and mitab_interactions.taxA = 9606
      and mitab_interactions.taxB = 9606
    where mitab_sources.name = 'bind'
    );

Interactions only in BIND PSI-XML:

select count(distinct mitab_sources.rigid)
from mitab_sources
inner join mitab_interactions
  on mitab_sources.uidA = mitab_interactions.uidA
  and mitab_sources.uidB = mitab_interactions.uidB
  and mitab_sources.rigid = mitab_interactions.rigid
  and mitab_interactions.taxA = 9606
  and mitab_interactions.taxB = 9606
where mitab_sources.name = 'bind'
  and mitab_sources.rigid not in (
    select distinct i5_mitab_sources.rigid
    from i5_mitab_sources
    inner join i5_mitab_interactions
      on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
      and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
      and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
      and i5_mitab_interactions.taxA = 9606
      and i5_mitab_interactions.taxB = 9606
    where i5_mitab_sources.name = 'bind'
    );

Common interactors:

select count(distinct X.uid)
from (
  select i5_mitab_sources.uidA as uid
  from i5_mitab_sources
  inner join i5_mitab_interactions
    on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  where name = 'bind' and taxA = 9606
  union
  select i5_mitab_sources.uidB as uid
  from i5_mitab_sources
  inner join i5_mitab_interactions
    on i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  where name = 'bind' and taxB = 9606
  ) as X
inner join (
  select mitab_sources.uidA as uid
  from mitab_sources
  inner join mitab_interactions
    on mitab_sources.uidA = mitab_interactions.uidA
  where name = 'bind' and taxA = 9606
  union
  select mitab_sources.uidB as uid
  from mitab_sources
  inner join mitab_interactions
    on mitab_sources.uidB = mitab_interactions.uidB
  where name = 'bind' and taxB = 9606
  ) as Y
  on X.uid = Y.uid;

Interactors only in iRefIndex 5.1:

select count(distinct X.uid)
from (
  select i5_mitab_sources.uidA as uid
  from i5_mitab_sources
  inner join i5_mitab_interactions
    on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  where name = 'bind' and taxA = 9606
  union
  select i5_mitab_sources.uidB as uid
  from i5_mitab_sources
  inner join i5_mitab_interactions
    on i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  where name = 'bind' and taxB = 9606
  ) as X
where X.uid not in (
  select mitab_sources.uidA as uid
  from mitab_sources
  inner join mitab_interactions
    on mitab_sources.uidA = mitab_interactions.uidA
  where name = 'bind' and taxA = 9606
  union
  select mitab_sources.uidB as uid
  from mitab_sources
  inner join mitab_interactions
    on mitab_sources.uidB = mitab_interactions.uidB
  where name = 'bind' and taxB = 9606
  );

Interactors only in BIND PSI-XML:

select count(distinct X.uid)
from (
  select mitab_sources.uidA as uid
  from mitab_sources
  inner join mitab_interactions
    on mitab_sources.uidA = mitab_interactions.uidA
  where name = 'bind' and taxA = 9606
  union
  select mitab_sources.uidB as uid
  from mitab_sources
  inner join mitab_interactions
    on mitab_sources.uidB = mitab_interactions.uidB
  where name = 'bind' and taxB = 9606
  ) as X
where X.uid not in (
  select i5_mitab_sources.uidA as uid
  from i5_mitab_sources
  inner join i5_mitab_interactions
    on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  where name = 'bind' and taxA = 9606
  union
  select i5_mitab_sources.uidB as uid
  from i5_mitab_sources
  inner join i5_mitab_interactions
    on i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  where name = 'bind' and taxB = 9606
  );

Retrieving BIND Identifiers

The following queries determine the appropriate set of interactions and then derive the BIND identifiers from them.

Find all BIND identifiers in iRefIndex 5.1:

create table export_all_i5_bind as
select distinct i5_mitab_interaction_identifiers.uid
from i5_mitab_sources
inner join i5_mitab_interactions
  on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
  and i5_mitab_interactions.taxA = 9606
  and i5_mitab_interactions.taxB = 9606
inner join i5_mitab_interaction_identifiers
  on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA
  and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB
  and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid
where name = 'bind' and uid like 'bind:%';

Find all BIND identifiers in BIND PSI-XML:

create table export_all_xml_bind as
select distinct mitab_interaction_identifiers.uid
from mitab_sources
inner join mitab_interactions
  on mitab_sources.uidA = mitab_interactions.uidA
  and mitab_sources.uidB = mitab_interactions.uidB
  and mitab_sources.rigid = mitab_interactions.rigid
  and mitab_interactions.taxA = 9606
  and mitab_interactions.taxB = 9606
inner join mitab_interaction_identifiers
  on mitab_sources.uidA = mitab_interaction_identifiers.uidA
  and mitab_sources.uidB = mitab_interaction_identifiers.uidB
  and mitab_sources.rigid = mitab_interaction_identifiers.rigid
where name = 'bind' and uid like 'bind:%';

Find BIND identifiers for common interactions where those interactions are known by the same identifier:

create table export_common_bind as
select distinct i5_mitab_interaction_identifiers.uid
from i5_mitab_sources
inner join i5_mitab_interactions
  on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
  and i5_mitab_interactions.taxA = 9606
  and i5_mitab_interactions.taxB = 9606
inner join i5_mitab_interaction_identifiers
  on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA
  and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB
  and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid
inner join mitab_sources
  on i5_mitab_sources.rigid = mitab_sources.rigid
  and i5_mitab_sources.name = mitab_sources.name
inner join mitab_interactions
  on mitab_sources.uidA = mitab_interactions.uidA
  and mitab_sources.uidB = mitab_interactions.uidB
  and mitab_sources.rigid = mitab_interactions.rigid
  and mitab_interactions.taxA = 9606
  and mitab_interactions.taxB = 9606
inner join mitab_interaction_identifiers
  on mitab_sources.uidA = mitab_interaction_identifiers.uidA
  and mitab_sources.uidB = mitab_interaction_identifiers.uidB
  and mitab_sources.rigid = mitab_interaction_identifiers.rigid
  and i5_mitab_interaction_identifiers.uid = mitab_interaction_identifiers.uid
where mitab_sources.name = 'bind'
  and i5_mitab_interaction_identifiers.uid like 'bind:%';

Find BIND identifiers for interactions only in iRefIndex 5.1:

create table export_i5_bind as
select distinct i5_mitab_interaction_identifiers.uid
from i5_mitab_sources
inner join i5_mitab_interactions
  on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
  and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
  and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
  and i5_mitab_interactions.taxA = 9606
  and i5_mitab_interactions.taxB = 9606
inner join i5_mitab_interaction_identifiers
  on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA
  and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB
  and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid
where i5_mitab_sources.name = 'bind'
  and i5_mitab_interaction_identifiers.uid like 'bind:%'
  and i5_mitab_sources.rigid not in (
    select distinct mitab_sources.rigid
    from mitab_sources
    inner join mitab_interactions
      on mitab_sources.uidA = mitab_interactions.uidA
      and mitab_sources.uidB = mitab_interactions.uidB
      and mitab_sources.rigid = mitab_interactions.rigid
      and mitab_interactions.taxA = 9606
      and mitab_interactions.taxB = 9606
    inner join mitab_interaction_identifiers
      on mitab_sources.uidA = mitab_interaction_identifiers.uidA
      and mitab_sources.uidB = mitab_interaction_identifiers.uidB
      and mitab_sources.rigid = mitab_interaction_identifiers.rigid
    where mitab_sources.name = 'bind'
    );

Find BIND identifiers for interactions only in BIND PSI-XML:

create table export_xml_bind as
select distinct mitab_interaction_identifiers.uid
from mitab_sources
inner join mitab_interactions
  on mitab_sources.uidA = mitab_interactions.uidA
  and mitab_sources.uidB = mitab_interactions.uidB
  and mitab_sources.rigid = mitab_interactions.rigid
  and mitab_interactions.taxA = 9606
  and mitab_interactions.taxB = 9606
inner join mitab_interaction_identifiers
  on mitab_sources.uidA = mitab_interaction_identifiers.uidA
  and mitab_sources.uidB = mitab_interaction_identifiers.uidB
  and mitab_sources.rigid = mitab_interaction_identifiers.rigid
where mitab_sources.name = 'bind'
  and mitab_interaction_identifiers.uid like 'bind:%'
  and mitab_sources.rigid not in (
    select distinct i5_mitab_sources.rigid
    from i5_mitab_sources
    inner join i5_mitab_interactions
      on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
      and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
      and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
      and i5_mitab_interactions.taxA = 9606
      and i5_mitab_interactions.taxB = 9606
    inner join i5_mitab_interaction_identifiers
      on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA
      and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB
      and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid
    where i5_mitab_sources.name = 'bind'
    );

Find BIND identifier coverage for any usage of an identifier:

create table export_identifiers_bind as
select coalesce(A.uid, B.uid) as uid, (case when A.uid is not null then 1 else 0 end) as irefindex, (case when B.uid is not null then 1 else 0 end) as bindpsixml
from (
  select uid
  from i5_mitab_interaction_identifiers as A
  where A.uid like 'bind:%'
  ) as A
full outer join (
  select uid
  from mitab_interaction_identifiers as B
  where B.uid like 'bind:%'
  ) as B
  on A.uid = B.uid
order by A.uid, B.uid;

Find BIND identifier coverage for human interactions:

create table export_identifiers_human_bind as
select coalesce(A.uid, B.uid) as uid, (case when A.uid is not null then 1 else 0 end) as irefindex, (case when B.uid is not null then 1 else 0 end) as bindpsixml
from (
  select distinct i5_mitab_interaction_identifiers.uid
  from i5_mitab_sources
  inner join i5_mitab_interactions
    on i5_mitab_sources.uidA = i5_mitab_interactions.uidA
    and i5_mitab_sources.uidB = i5_mitab_interactions.uidB
    and i5_mitab_sources.rigid = i5_mitab_interactions.rigid
    and i5_mitab_interactions.taxA = 9606
    and i5_mitab_interactions.taxB = 9606
  inner join i5_mitab_interaction_identifiers
    on i5_mitab_sources.uidA = i5_mitab_interaction_identifiers.uidA
    and i5_mitab_sources.uidB = i5_mitab_interaction_identifiers.uidB
    and i5_mitab_sources.rigid = i5_mitab_interaction_identifiers.rigid
  where i5_mitab_sources.name = 'bind'
    and i5_mitab_interaction_identifiers.uid like 'bind:%'
  ) as A
full outer join (
  select distinct mitab_interaction_identifiers.uid
  from mitab_sources
  inner join mitab_interactions
    on mitab_sources.uidA = mitab_interactions.uidA
    and mitab_sources.uidB = mitab_interactions.uidB
    and mitab_sources.rigid = mitab_interactions.rigid
    and mitab_interactions.taxA = 9606
    and mitab_interactions.taxB = 9606
  inner join mitab_interaction_identifiers
    on mitab_sources.uidA = mitab_interaction_identifiers.uidA
    and mitab_sources.uidB = mitab_interaction_identifiers.uidB
    and mitab_sources.rigid = mitab_interaction_identifiers.rigid
  where mitab_sources.name = 'bind'
    and mitab_interaction_identifiers.uid like 'bind:%'
  ) as B
  on A.uid = B.uid
order by A.uid, B.uid;

Observations

There are 183614 distinct human interaction identifiers in the iRefIndex 5.1 release, 188690 in the BIND PSI-XML dataset.

The following distribution of BIND identifiers for interactions was observed (with each cell representing an intersection of the listed sets):

Common iRefIndex 5.1 only BIND PSI-XML only
Common (8639)
iRefIndex 5.1 only 76 (1613)
BIND PSI-XML only 0 1088 (3248)

In other words:

  • There are 8639 common BIND identifiers in total.
  • There are 1613 BIND identifiers appearing for interactions only in iRefIndex 5.1.
  • There are 3248 BIND identifiers appearing for interactions only in the BIND PSI-XML dataset.
  • Although the intersection between "iRefIndex 5.1 only" and "BIND PSI-XML only" should yield zero interactions, since by definition an interaction in one of these sets should not appear in another, 1088 BIND identifiers appear in both.
  • Of the interactions that should only be referenced by iRefIndex 5.1, 76 BIND identifiers also refer to interactions in both iRefIndex 5.1 and the BIND PSI-XML dataset.
  • Of the interactions that should only be referenced by the BIND PSI-XML dataset, 0 BIND identifiers also refer to interactions in both iRefIndex 5.1 and the BIND PSI-XML dataset. This is as expected.

In fact, some BIND identifiers appear to refer to more than one RIG identifier. For example:

  • BIND identifier 100771 refers to 5TlbBV7VB0MFa/R1cmyeByxSFjw in only the iRefIndex 5.1 release, but also to pXZtvCDYsluwWrmdXf5FJmQ0y54 in that release and in the BIND PSI-XML dataset.
  • BIND identifier 100321 refers to VwWuNAmhR/FDx6FtF6FacwAmzr0 in only the iRefIndex 5.1 release, but to Dr5ri7vTj02leEUMKr35jj1fP/c in only the BIND PSI-XML dataset.

See bug #133 for further details and discussion.

See bug #135 for details and discussion of the broader activity.

Other Stuff

This is what I did for the unnecessary "new BIND vs. other sources" experiment...

Downloaded the following:

  • BioGrid
  • CORUM
  • DIP
  • HPRD
  • Intact
  • MINT
  • MIPS (MPPI)
  • OPHID

Modified the following:

  • BioGrid - removed all but human data

Parsed all sources mentioned above.

Removed all non-9606 records.

Did ROG assignment and so on.

Made the reports.