iRefIndex Development

From irefindex

Revision as of 18:11, 7 October 2010

See iRefIndex Issues and Notes for details of ongoing work to improve the iRefIndex software.

Adding Sources to iRefIndex

  1. Identify the location of the downloaded data.
  2. Evaluate the form of the data:
    1. For PSI-MI XML (Proteomics Standards Initiative Molecular Interaction XML) documents, check which version of the format the data documents employ.
    2. For that version, review the format's schema and how the data actually uses it. For example, PSI-MI XML permits interactors to be specified within interaction descriptions as well as in a separate interactor list.
  3. Review existing, similar mapper definition files.
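The version check in step 2.1 can be done without loading a whole file. The sketch below is not part of iRef_PSI_XML2RDBMS, and the name psi_mi_version is hypothetical. It assumes that the root entrySet element carries level, version and (optionally) minorVersion attributes, as PSI-MI XML 2.5 files typically do. Only the start of the document is parsed.

```python
import xml.etree.ElementTree as ET


def _root_version(f):
    # Only the first "start" event (the root element) is needed, so parsing
    # stops long before the end of a large file.
    for _event, elem in ET.iterparse(f, events=("start",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix if present
        if tag != "entrySet":
            raise ValueError("root element is %r, not entrySet" % tag)
        # Assumption: the version is encoded in these root attributes.
        parts = (elem.get("level"), elem.get("version"), elem.get("minorVersion"))
        return ".".join(p for p in parts if p is not None)
    raise ValueError("no root element found")


def psi_mi_version(source):
    """Return a version string such as "2.5.3" for a PSI-MI XML file.

    source may be a filename or a binary file object.
    """
    if hasattr(source, "read"):
        return _root_version(source)
    with open(source, "rb") as f:
        return _root_version(f)
```

A file whose root element reads `<entrySet level="2" version="5" minorVersion="3" ...>` yields "2.5.3". A file with a different root element raises ValueError, which flags data that is not PSI-MI XML at all.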

Evaluating the Data

The show_xml_paths.py script in the iRef_PSI_XML2RDBMS directory lists the distinct element paths under which an XML data file stores its data items. For example:

python show_xml_paths.py --data /home/irefindex/data/MINT/2010-09-14/10023771.psi25.xml
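The internals of show_xml_paths.py are not described on this page. As a rough approximation of what the --data mode reports, the hypothetical data_paths function below streams through a file with xml.etree.ElementTree.iterparse and collects the paths of elements that contain non-blank text. Namespace prefixes are dropped from the paths. The real script may differ in detail, for example in how it treats attributes or mixed content.

```python
import xml.etree.ElementTree as ET


def data_paths(source):
    """Return the sorted distinct paths of elements holding non-blank text.

    source may be a filename or a binary file object.
    """
    paths = set()
    stack = []  # local names of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem.tag.rsplit("}", 1)[-1])  # drop any namespace
        else:
            if elem.text and elem.text.strip():
                paths.add("/".join(stack))
            stack.pop()
            elem.clear()  # discard the finished element's children and text
    return sorted(paths)
```

For example, `for p in data_paths("/home/irefindex/data/MINT/2010-09-14/10023771.psi25.xml"): print(p)` prints one path per line, in the same form as the list that follows.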

The resulting list of paths indicates the places in the element hierarchy of a PSI-MI XML file where information is actually stored. For example:

entrySet/entry/experimentList/experimentDescription/attributeList/attribute
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/fullName
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/shortLabel
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/fullName
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
entrySet/entry/experimentList/experimentDescription/names/fullName
entrySet/entry/experimentList/experimentDescription/names/shortLabel
entrySet/entry/interactionList/interaction/attributeList/attribute
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/fullName
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/shortLabel
entrySet/entry/interactionList/interaction/confidenceList/confidence/value
entrySet/entry/interactionList/interaction/experimentList/experimentRef
entrySet/entry/interactionList/interaction/interactionType/names/fullName
entrySet/entry/interactionList/interaction/interactionType/names/shortLabel
entrySet/entry/interactionList/interaction/intraMolecular
entrySet/entry/interactionList/interaction/modelled
entrySet/entry/interactionList/interaction/names/shortLabel
entrySet/entry/interactionList/interaction/negative
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/isLink
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/interactorRef
entrySet/entry/interactionList/interaction/participantList/participant/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/shortLabel
entrySet/entry/interactorList/interactor/attributeList/attribute
entrySet/entry/interactorList/interactor/interactorType/names/fullName
entrySet/entry/interactorList/interactor/interactorType/names/shortLabel
entrySet/entry/interactorList/interactor/names/alias
entrySet/entry/interactorList/interactor/names/fullName
entrySet/entry/interactorList/interactor/names/shortLabel
entrySet/entry/interactorList/interactor/organism/names/fullName
entrySet/entry/interactorList/interactor/organism/names/shortLabel
entrySet/entry/interactorList/interactor/sequence
entrySet/entry/source/attributeList/attribute
entrySet/entry/source/names/fullName
entrySet/entry/source/names/shortLabel

With this information, a suitable mapper file can be identified for the conversion of the XML-encoded data into tabular data to be stored in a database. In the above example, it is apparent that the experiment, interaction and interactor details reside alongside each other within each entry element:

entrySet/entry/experimentList/experimentDescription
entrySet/entry/interactionList/interaction
entrySet/entry/interactionList/interaction/participantList/participant
entrySet/entry/interactorList/interactor

In contrast, other PSI-MI XML files adopt a different structure which can be reduced to the following:

entrySet/entry/interactionList/interaction
entrySet/entry/interactionList/interaction/experimentList/experimentDescription
entrySet/entry/interactionList/interaction/participantList/participant

The different sources can be divided into a number of subformats as follows:

Subformat | Sources | Notes
Separate experiment, interaction, interactor lists | BioGRID, HPRD, IntAct, MINT, OPHID | BioGRID uses proteininteractor instead of interactor; OPHID uses proteinParticipant and proteinInteractor
Interaction contains experiment; separate interactor list | DIP | (none)
Interaction contains experiment and interactor/participant | BIND Translation, CORUM, MPACT, MPPI | MPPI uses proteinParticipant and proteinInteractor
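The classification in the table can also be approximated mechanically. The subformat function and the SYNONYMS mapping below are hypothetical and are not part of the existing tools. The sketch takes element paths such as those printed by show_xml_paths.py --data and renames the protein* elements from the Notes column to their PSI-MI 2.5 equivalents. It then reports which of the three layouts the paths match.

Fragments are matched against whole element names. This prevents "participant/interactor" from matching the participant/interactorRef path that appears in files with a separate interactor list.

```python
# Hypothetical mapping of the element names noted in the table onto the
# PSI-MI 2.5 names, so that all sources can be compared the same way.
SYNONYMS = {
    "proteinInteractor": "interactor",
    "proteininteractor": "interactor",  # spelling as given for BioGRID
    "proteinParticipant": "participant",
}

SEPARATE = "Separate experiment, interaction, interactor lists"
EXPERIMENT_IN_INTERACTION = "Interaction contains experiment; separate interactor list"
ALL_IN_INTERACTION = "Interaction contains experiment and interactor/participant"


def _normalise(path):
    # Surround with slashes so fragments can only match whole element names.
    steps = [SYNONYMS.get(step, step) for step in path.split("/")]
    return "/" + "/".join(steps) + "/"


def subformat(paths):
    """Return the subformat suggested by a file's element paths, or None."""
    normalised = [_normalise(p) for p in paths]

    def has(fragment):
        return any("/" + fragment + "/" in p for p in normalised)

    top_experiments = has("entry/experimentList/experimentDescription")
    nested_experiments = has("interaction/experimentList/experimentDescription")
    top_interactors = has("entry/interactorList/interactor")
    nested_interactors = has("participant/interactor")

    if top_experiments and top_interactors:
        return SEPARATE
    if nested_experiments and top_interactors:
        return EXPERIMENT_IN_INTERACTION
    if nested_experiments and nested_interactors:
        return ALL_IN_INTERACTION
    return None  # layout not covered by the table
```

Given the MINT paths listed earlier, the check reports the first subformat. A None result suggests that a source needs a closer look before a mapper is chosen.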

Reviewing Mapper Files

Existing mapper files reside in the mapper subdirectory of the iRef_PSI_XML2RDBMS directory. Any of them can be reviewed by passing it to the show_xml_paths.py script with the --mapper option. For example:

python show_xml_paths.py --mapper mapper/Map25_CORUM.xml

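The resulting output describes the structure of the data and how the mapper will attempt to interpret it. It is grouped by element, then by database table, then by field (the names beginning with an underscore), with the element paths that supply each field listed beneath it. For example (for CORUM):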

Element experimentDescription ...
  Table int_name ...
    _euid ...
      entry/interactionList/interaction/experimentList/experimentDescription
    _idetlbl ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullname
    _idetncat ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullname
  Table int_xref ...
    _euid ...
      entry/interactionList/interaction/experimentList/experimentDescription
    _brefdb ...
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@db
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@db
    _brefid ...
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@id
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@id
    _brefct ...
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef
  Table int_xref ...
    _euid ...
      entry/interactionList/interaction/experimentList/experimentDescription
    _idetdb ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@db
    _idetid ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@id
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id
    _idetct ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef
Element experimentList ...
  Table int_experiment ...
    _euidr ...
      entry/interactionList/interaction/experimentList/experimentDescription/bibref
    _iuider ...
      entry/interactionList/interaction/experimentList/experimentDescription/bibref
Element interaction ...
  Table int_name ...
    _iuid ...
      entry/interactionList/interaction
    _iuiflnm ...
      entry/interactionList/interaction/names/fullName
    _iuiflnmct ...
      entry/interactionList/interaction/names/fullName
  Table int_source ...
    _iuid ...
      entry/interactionList/interaction
    _itp ...
      entry/interactionList/interaction/xref
    _isrc ...
      entry/interactionList/interaction/xref
    _ifle ...
      entry/interactionList/interaction/xref
  Table int_xref ...
    _iuid ...
      entry/interactionList/interaction
    _idb ...
      entry/interactionList/interaction/xref/primaryRef/@db
    _iref ...
      entry/interactionList/interaction/xref/primaryRef/@id
    _irefcat ...
      entry/interactionList/interaction/xref/primaryRef
Element participant ...
  Table int_name ...
    _ouid ...
      entry/interactionList/interaction/participantList/participant/interactor
    _olb ...
      entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel
      entry/interactionList/interaction/participantList/participant/interactor/names/alias
      entry/interactionList/interaction/participantList/participant/interactor/names/fullName
    _olbct ...
      entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel
      entry/interactionList/interaction/participantList/participant/interactor/names/alias
      entry/interactionList/interaction/participantList/participant/interactor/names/fullName
  Table int_object ...
    _ouid ...
      entry/interactionList/interaction/participantList/participant/interactor
    _oltyp ...
      entry/interactionList/interaction/participantList/participant/interactor/interactorType/names/shortLabel
    _osrc ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _ofil ...
      entry/interactionList/interaction/participantList/participant/interactor/names
  Table int_sequence ...
    _ouid ...
      entry/interactionList/interaction/participantList/participant/interactor
    _obsq ...
      entry/interactionList/interaction/participantList/participant/interactor/sequence
  Table int_xref ...
    _ouid ...
      entry/interactionList/interaction/participantList/participant/interactor
    _odb ...
      entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@db
      entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@db
    _orefid ...
      entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@id
      entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@id
    _oicat ...
      entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef
      entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef
    _otax ...
      entry/interactionList/interaction/participantList/participant/interactor/organism/@ncbiTaxId
    _otp ...
      entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@refType
      entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@refType
Element participantList ...
  Table int_source2object ...
    _iuidr ...
      entry/interactionList/interaction/participantList
    _what ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _isrcr ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _ifler ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _refob ...
      entry/interactionList/interaction/participantList/participant/interactor/names
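When comparing several mappers, it can help to load this report into a data structure. The functions parse_mapper_report and fields_reading are hypothetical names. The sketch assumes the indentation shown in the CORUM example: zero, two, four and six spaces for elements, tables, fields and paths respectively.

Tables are kept in a list rather than a dictionary. As with int_xref above, one table can appear more than once under the same element.

```python
def parse_mapper_report(text):
    """Parse the indented report printed by show_xml_paths.py --mapper.

    Returns a list of (element, tables) pairs, where tables is a list of
    (table, fields) pairs and fields maps each field name to its paths.
    """
    report = []
    tables = fields = paths = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        item = line.strip()
        if item.endswith(" ..."):
            item = item[: -len(" ...")]
        if indent == 0:      # "Element <name>"
            tables = []
            report.append((item.split(None, 1)[1], tables))
        elif indent == 2:    # "Table <name>"
            fields = {}
            tables.append((item.split(None, 1)[1], fields))
        elif indent == 4:    # field name, e.g. "_euid"
            paths = fields.setdefault(item, [])
        else:                # element path feeding the current field
            paths.append(item)
    return report


def fields_reading(report, fragment):
    """List (element, table, field) triples with a path containing fragment."""
    return [
        (element, table, field)
        for element, tables in report
        for table, fields in tables
        for field, field_paths in fields.items()
        if any(fragment in p for p in field_paths)
    ]
```

The report text could be captured from the script's standard output. Calling fields_reading(report, "interactor/names") then lists every field that takes its values from interactor names.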

All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).