Difference between revisions of "iRefIndex Development"

(→‎Evaluating the Data: Added InnateDB observations.)
(Added note.)
 
(12 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
{{Note|
 +
This page describes development processes related to the code supporting iRefIndex release 9 and earlier.
 +
}}
 +
 
See [[iRefIndex Issues and Notes]] for details of ongoing work to improve the iRefIndex software.
  
Line 5: Line 9:
 
# Identify the location of the downloaded data.
 
# Evaluate the form of the data:
## For PSI MI XML (Molecular Interaction XML) documents, check the version of the format employed by the data documents.
+
#* For PSI MI XML (Molecular Interaction XML) documents, check the version of the format employed by the data documents.
## For the specific version, review the format's schema and how the data uses the schema. For example, PSI MI XML permits the specification of interactors within interaction descriptions as well as in a separate interactor list.
+
#* For the specific version, review the format's schema and how the data uses the schema. For example, PSI MI XML permits the specification of interactors within interaction descriptions as well as in a separate interactor list.
 +
#* For MITAB files, see [[iRefIndex MITAB Mapping]].
 
# Review existing, similar mapper definition files.
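
For the version check in the list above, the <tt>level</tt>, <tt>version</tt> and <tt>minorVersion</tt> attributes of the root <tt>entrySet</tt> element can be read directly. The following is a minimal sketch and not part of the iRefIndex tools: the function name <tt>psi_mi_version</tt> and the sample document are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; real files carry the same attributes on entrySet.
SAMPLE = b"""<?xml version="1.0"?>
<entrySet xmlns="net:sf:psidev:mi" level="2" version="5" minorVersion="3">
  <entry/>
</entrySet>
"""


def psi_mi_version(stream):
    """Return (level, version, minorVersion) from the root element of an
    XML document supplied as a binary file object.

    Attributes that are absent are returned as None.
    """
    for _event, root in ET.iterparse(stream, events=("start",)):
        # The first "start" event belongs to the root element, so the
        # loop stops as soon as that element has been seen.
        return (root.get("level"), root.get("version"), root.get("minorVersion"))
    return None


if __name__ == "__main__":
    print(psi_mi_version(io.BytesIO(SAMPLE)))
```

With a real data file, open it in binary mode and pass the file object to the function.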
  
 
=== Evaluating the Data ===
  
The <tt>show_xml_paths.py</tt> script in the <tt>iRef_PSI_XML2RDBMS</tt> directory can be used to show the different element paths used in an XML data file to hold data items. For example:
+
The <tt>show_xml_paths.py</tt> script in the <tt>tools</tt> directory within the <tt>iRef_PSI_XML2RDBMS</tt> directory can be used to show the different element paths used in an XML data file to hold data items. For example:
  
  python show_xml_paths.py --data /home/irefindex/data/MINT/2010-09-14/10023771.psi25.xml
+
  python tools/show_xml_paths.py --data /home/irefindex/data/MINT/2010-09-14/10023771.psi25.xml
  
 
The resulting list of paths indicates the places in the element hierarchy of a PSI-MI XML file where information is actually stored. For example:
  
 
<pre>
 
<pre>
 +
entrySet/@level
 +
entrySet/@minorVersion
 +
entrySet/@version
 +
entrySet/@xmlns
 +
entrySet/@xmlns:xsi
 +
entrySet/@xsi:schemaLocation
 +
entrySet/entry/experimentList/experimentDescription/@id
 
entrySet/entry/experimentList/experimentDescription/attributeList/attribute
 +
entrySet/entry/experimentList/experimentDescription/attributeList/attribute/@name
 +
entrySet/entry/experimentList/experimentDescription/attributeList/attribute/@nameAc
 +
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@db
 +
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@dbAc
 +
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@id
 +
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@refType
 +
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@refTypeAc
 +
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/@ncbiTaxId
 
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/fullName
 
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/shortLabel
 
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias/@type
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias/@typeAc
 
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/fullName
 
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@dbAc
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@id
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@refType
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@refTypeAc
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@db
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@dbAc
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@refType
 +
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@refTypeAc
 
entrySet/entry/experimentList/experimentDescription/names/fullName
 
entrySet/entry/experimentList/experimentDescription/names/shortLabel
 +
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@db
 +
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@dbAc
 +
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@id
 +
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@refType
 +
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@refTypeAc
 +
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@db
 +
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@dbAc
 +
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@id
 +
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@refType
 +
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/@id
 
entrySet/entry/interactionList/interaction/attributeList/attribute
 +
entrySet/entry/interactionList/interaction/attributeList/attribute/@name
 +
entrySet/entry/interactionList/interaction/attributeList/attribute/@nameAc
 
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/fullName
 
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/shortLabel
 +
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/confidenceList/confidence/value
 
entrySet/entry/interactionList/interaction/experimentList/experimentRef
 
entrySet/entry/interactionList/interaction/interactionType/names/fullName
 
entrySet/entry/interactionList/interaction/interactionType/names/shortLabel
 +
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/intraMolecular
 
entrySet/entry/interactionList/interaction/modelled
 
entrySet/entry/interactionList/interaction/names/shortLabel
 
entrySet/entry/interactionList/interaction/negative
 +
entrySet/entry/interactionList/interaction/participantList/participant/@id
 
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/fullName
 
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/fullName
 
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/fullName
 
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/@id
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/fullName
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/isLink
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/fullName
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/fullName
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@refTypeAc
 
entrySet/entry/interactionList/interaction/participantList/participant/interactorRef
 
entrySet/entry/interactionList/interaction/participantList/participant/names/shortLabel
 
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias/@type
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias/@typeAc
 
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/fullName
 
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/shortLabel
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactionList/interaction/xref/primaryRef/@db
 +
entrySet/entry/interactionList/interaction/xref/primaryRef/@dbAc
 +
entrySet/entry/interactionList/interaction/xref/primaryRef/@id
 +
entrySet/entry/interactionList/interaction/xref/primaryRef/@refType
 +
entrySet/entry/interactionList/interaction/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactorList/interactor/@id
 
entrySet/entry/interactorList/interactor/attributeList/attribute
 +
entrySet/entry/interactorList/interactor/attributeList/attribute/@name
 +
entrySet/entry/interactorList/interactor/attributeList/attribute/@nameAc
 
entrySet/entry/interactorList/interactor/interactorType/names/fullName
 
entrySet/entry/interactorList/interactor/interactorType/names/shortLabel
 +
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@db
 +
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@dbAc
 +
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@id
 +
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@refType
 +
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@db
 +
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@id
 +
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@refType
 +
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@refTypeAc
 
entrySet/entry/interactorList/interactor/names/alias
 +
entrySet/entry/interactorList/interactor/names/alias/@type
 +
entrySet/entry/interactorList/interactor/names/alias/@typeAc
 
entrySet/entry/interactorList/interactor/names/fullName
 
entrySet/entry/interactorList/interactor/names/shortLabel
 +
entrySet/entry/interactorList/interactor/organism/@ncbiTaxId
 
entrySet/entry/interactorList/interactor/organism/names/fullName
 
entrySet/entry/interactorList/interactor/organism/names/shortLabel
 
entrySet/entry/interactorList/interactor/sequence
 +
entrySet/entry/interactorList/interactor/xref/primaryRef/@db
 +
entrySet/entry/interactorList/interactor/xref/primaryRef/@dbAc
 +
entrySet/entry/interactorList/interactor/xref/primaryRef/@id
 +
entrySet/entry/interactorList/interactor/xref/primaryRef/@refType
 +
entrySet/entry/interactorList/interactor/xref/primaryRef/@refTypeAc
 +
entrySet/entry/interactorList/interactor/xref/primaryRef/@version
 +
entrySet/entry/interactorList/interactor/xref/secondaryRef/@db
 +
entrySet/entry/interactorList/interactor/xref/secondaryRef/@dbAc
 +
entrySet/entry/interactorList/interactor/xref/secondaryRef/@id
 +
entrySet/entry/interactorList/interactor/xref/secondaryRef/@refType
 +
entrySet/entry/interactorList/interactor/xref/secondaryRef/@refTypeAc
 +
entrySet/entry/interactorList/interactor/xref/secondaryRef/@secondary
 +
entrySet/entry/interactorList/interactor/xref/secondaryRef/@version
 +
entrySet/entry/source/@releaseDate
 
entrySet/entry/source/attributeList/attribute
 +
entrySet/entry/source/attributeList/attribute/@name
 +
entrySet/entry/source/attributeList/attribute/@nameAc
 
entrySet/entry/source/names/fullName
 
entrySet/entry/source/names/shortLabel
 +
entrySet/entry/source/xref/primaryRef/@db
 +
entrySet/entry/source/xref/primaryRef/@dbAc
 +
entrySet/entry/source/xref/primaryRef/@id
 +
entrySet/entry/source/xref/primaryRef/@refType
 +
entrySet/entry/source/xref/primaryRef/@refTypeAc
 +
entrySet/entry/source/xref/primaryRef/@secondary
 +
entrySet/entry/source/xref/secondaryRef/@db
 +
entrySet/entry/source/xref/secondaryRef/@dbAc
 +
entrySet/entry/source/xref/secondaryRef/@id
 +
entrySet/entry/source/xref/secondaryRef/@refType
 +
entrySet/entry/source/xref/secondaryRef/@refTypeAc
 
</pre>
 
</pre>
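
A listing of this kind can be approximated with the Python standard library. The sketch below is an illustration only and is not the actual <tt>show_xml_paths.py</tt> script: the names <tt>collect_paths</tt> and <tt>local_name</tt> and the sample document are assumptions. It records a path for every attribute and for every element that holds non-blank text. Unlike the listing above, it does not report namespace declarations such as <tt>@xmlns</tt>, because ElementTree does not expose them as attributes, and it strips namespace prefixes from attribute names.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document used only to exercise the sketch.
SAMPLE = b"""<entrySet xmlns="net:sf:psidev:mi" level="2" version="5">
  <entry>
    <interactorList>
      <interactor id="1">
        <names><shortLabel>abc</shortLabel></names>
      </interactor>
    </interactorList>
  </entry>
</entrySet>
"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an element or attribute name."""
    return tag.rsplit("}", 1)[-1]


def collect_paths(stream):
    """Return the sorted paths of attributes and text-bearing elements
    found in an XML document supplied as a binary file object."""
    paths = set()
    stack = []  # local names of the currently open elements
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            stack.append(local_name(elem.tag))
            path = "/".join(stack)
            # Attributes are already available when the element starts.
            for attr in elem.attrib:
                paths.add(path + "/@" + local_name(attr))
        else:
            # Element text is only guaranteed to be complete at "end".
            if elem.text and elem.text.strip():
                paths.add("/".join(stack))
            stack.pop()
            elem.clear()  # release children that are no longer needed
    return sorted(paths)


if __name__ == "__main__":
    for path in collect_paths(io.BytesIO(SAMPLE)):
        print(path)
```

Collecting the paths in a set means each path is reported once, however many times it occurs in the data.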
  
Line 103: Line 295:
 
|-
 
| Interaction contains experiment and interactor/participant
| BIND Translation, CORUM, InnateDB, MPACT, MPPI
+
| BIND Translation, CORUM, InnateDB, MatrixDB, MPACT, MPPI
 
| InnateDB provides apparently redundant lists of experiments and interactors<br>MPPI uses proteinParticipant, proteinInteractor
 
|}
 
|}
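
Whether a PSI MI XML 2.5 source places experiments and interactors inside each interaction, or refers to separate lists, can be checked by counting the relevant elements beneath each <tt>interaction</tt>. The following is a rough sketch under stated assumptions: the function name <tt>interaction_layout</tt> and the sample document are hypothetical, and only the 2.5 element names are considered, so sources such as MPPI that use <tt>proteinParticipant</tt> and <tt>proteinInteractor</tt> would need different names.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: one inline experiment, one inline interactor
# and one interactor reference.
SAMPLE = b"""<entrySet xmlns="net:sf:psidev:mi" level="2" version="5">
  <entry>
    <interactionList>
      <interaction id="1">
        <experimentList>
          <experimentDescription id="2"/>
        </experimentList>
        <participantList>
          <participant id="3"><interactor id="4"/></participant>
          <participant id="5"><interactorRef>4</interactorRef></participant>
        </participantList>
      </interaction>
    </interactionList>
  </entry>
</entrySet>
"""

# Element names counted beneath each interaction element.
WATCHED = ("experimentDescription", "experimentRef", "interactor", "interactorRef")


def local_name(tag):
    """Strip a '{namespace}' prefix from an element name."""
    return tag.rsplit("}", 1)[-1]


def interaction_layout(stream):
    """Count inline definitions and references found inside interaction
    elements of a document supplied as a binary file object."""
    counts = dict.fromkeys(WATCHED, 0)
    root = ET.parse(stream).getroot()
    for elem in root.iter():
        if local_name(elem.tag) != "interaction":
            continue
        # iter() also yields the interaction itself, which is not counted
        # because its name is not one of the watched names.
        for child in elem.iter():
            name = local_name(child.tag)
            if name in counts:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    print(interaction_layout(io.BytesIO(SAMPLE)))
```

Non-zero counts for <tt>experimentDescription</tt> or <tt>interactor</tt> indicate inline definitions, while non-zero counts for <tt>experimentRef</tt> or <tt>interactorRef</tt> indicate references to separate lists.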
Line 111: Line 303:
 
Existing mapper files reside in the <tt>mapper</tt> subdirectory of the <tt>iRef_PSI_XML2RDBMS</tt> directory and can be reviewed by passing one of them to the <tt>show_xml_paths.py</tt> script. For example:
  
  python show_xml_paths.py --mapper mapper/Map25_CORUM.xml
+
  python tools/show_xml_paths.py --mapper mapper/Map25_CORUM.xml
  
 
The resulting output describes the structure of the data and how the mapper will attempt to interpret that data. For example (for CORUM):
 
The resulting output describes the structure of the data and how the mapper will attempt to interpret that data. For example (for CORUM):
Line 119: Line 311:
 
   Table int_name ...
 
   Table int_name ...
 
     _euid ...
 
     _euid ...
       entry/interactionList/interaction/experimentList/experimentDescription
+
       <incremental>
 
     _idetlbl ...
 
     _idetlbl ...
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
Line 125: Line 317:
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullname
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullname
 
     _idetncat ...
 
     _idetncat ...
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
+
       24
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias
+
       25
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullname
+
       25
 
   Table int_xref ...
 
   Table int_xref ...
 
     _euid ...
 
     _euid ...
       entry/interactionList/interaction/experimentList/experimentDescription
+
       <incremental>
 
     _brefdb ...
 
     _brefdb ...
 
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@db
 
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@db
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@db
+
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@db
 
     _brefid ...
 
     _brefid ...
 
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@id
 
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@id
 
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@id
 
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@id
 
     _brefct ...
 
     _brefct ...
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef
+
       4
       entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef
+
       5
 
   Table int_xref ...
 
   Table int_xref ...
 
     _euid ...
 
     _euid ...
       entry/interactionList/interaction/experimentList/experimentDescription
+
       <incremental>
 
     _idetdb ...
 
     _idetdb ...
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db
Line 150: Line 342:
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id
 
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id
 
     _idetct ...
 
     _idetct ...
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef
+
       6
       entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef
+
       7
 
Element experimentList ...
 
Element experimentList ...
 
   Table int_experiment ...
 
   Table int_experiment ...
 
     _euidr ...
 
     _euidr ...
       entry/interactionList/interaction/experimentList/experimentDescription/bibref
+
       _euid
 
     _iuider ...
 
     _iuider ...
       entry/interactionList/interaction/experimentList/experimentDescription/bibref
+
       _iuid
 
Element interaction ...
 
Element interaction ...
 
   Table int_name ...
 
   Table int_name ...
 
     _iuid ...
 
     _iuid ...
       entry/interactionList/interaction
+
       <incremental>
 
     _iuiflnm ...
 
     _iuiflnm ...
 
       entry/interactionList/interaction/names/fullName
 
       entry/interactionList/interaction/names/fullName
 
     _iuiflnmct ...
 
     _iuiflnmct ...
       entry/interactionList/interaction/names/fullName
+
       12
 
   Table int_source ...
 
   Table int_source ...
 
     _iuid ...
 
     _iuid ...
       entry/interactionList/interaction
+
       <incremental>
 
     _itp ...
 
     _itp ...
 
       entry/interactionList/interaction/xref
 
       entry/interactionList/interaction/xref
Line 177: Line 369:
 
   Table int_xref ...
 
   Table int_xref ...
 
     _iuid ...
 
     _iuid ...
       entry/interactionList/interaction
+
       <incremental>
 
     _idb ...
 
     _idb ...
 
       entry/interactionList/interaction/xref/primaryRef/@db
 
       entry/interactionList/interaction/xref/primaryRef/@db
Line 183: Line 375:
 
       entry/interactionList/interaction/xref/primaryRef/@id
 
       entry/interactionList/interaction/xref/primaryRef/@id
 
     _irefcat ...
 
     _irefcat ...
       entry/interactionList/interaction/xref/primaryRef
+
       0
 
Element participant ...
 
Element participant ...
 
   Table int_name ...
 
   Table int_name ...
 
     _ouid ...
 
     _ouid ...
       entry/interactionList/interaction/participantList/participant/interactor
+
       <incremental>
 
     _olb ...
 
     _olb ...
 
       entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel
 
       entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel
Line 193: Line 385:
 
       entry/interactionList/interaction/participantList/participant/interactor/names/fullName
 
       entry/interactionList/interaction/participantList/participant/interactor/names/fullName
 
     _olbct ...
 
     _olbct ...
       entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel
+
       13
       entry/interactionList/interaction/participantList/participant/interactor/names/alias
+
       14
       entry/interactionList/interaction/participantList/participant/interactor/names/fullName
+
       15
 
   Table int_object ...
 
   Table int_object ...
 
     _ouid ...
 
     _ouid ...
       entry/interactionList/interaction/participantList/participant/interactor
+
       <incremental>
 
     _oltyp ...
 
     _oltyp ...
 
       entry/interactionList/interaction/participantList/participant/interactor/interactorType/names/shortLabel
 
       entry/interactionList/interaction/participantList/participant/interactor/interactorType/names/shortLabel
Line 207: Line 399:
 
   Table int_sequence ...
 
   Table int_sequence ...
 
     _ouid ...
 
     _ouid ...
       entry/interactionList/interaction/participantList/participant/interactor
+
       <incremental>
 
     _obsq ...
 
     _obsq ...
 
       entry/interactionList/interaction/participantList/participant/interactor/sequence
 
       entry/interactionList/interaction/participantList/participant/interactor/sequence
 
   Table int_xref ...
 
   Table int_xref ...
 
     _ouid ...
 
     _ouid ...
       entry/interactionList/interaction/participantList/participant/interactor
+
       <incremental>
 
     _odb ...
 
     _odb ...
 
       entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@db
 
       entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@db
Line 220: Line 412:
 
       entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@id
 
       entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@id
 
     _oicat ...
 
     _oicat ...
       entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef
+
       2
       entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef
+
       3
 
     _otax ...
 
     _otax ...
 
       entry/interactionList/interaction/participantList/participant/interactor/organism/@ncbiTaxId
 
       entry/interactionList/interaction/participantList/participant/interactor/organism/@ncbiTaxId
Line 230: Line 422:
 
   Table int_source2object ...
 
   Table int_source2object ...
 
     _iuidr ...
 
     _iuidr ...
       entry/interactionList/interaction/participantList
+
       _iuid
 
     _what ...
 
     _what ...
 
       entry/interactionList/interaction/participantList/participant/interactor/names
 
       entry/interactionList/interaction/participantList/participant/interactor/names
Line 238: Line 430:
 
       entry/interactionList/interaction/participantList/participant/interactor/names
 
       entry/interactionList/interaction/participantList/participant/interactor/names
 
     _refob ...
 
     _refob ...
       entry/interactionList/interaction/participantList/participant/interactor/names
+
      _ouid
 +
</pre>
 +
 
 +
=== Adapting an Existing Mapper File ===
 +
 
 +
Once the data has been analysed and its "subformat" identified (explained above), it should be possible to take an existing mapper file which supports the same subformat and modify it to understand the new data source. For example, the InnateDB data resembles the data of various other sources (listed in the table above). To see which of those sources is closest in structure to InnateDB, the path listings for each source can be compared using a <tt>diff</tt>-like program, potentially a graphical one such as <tt>kompare</tt> or <tt>kdiff3</tt>.
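The structural comparison described here can also be scripted. The following is a minimal sketch, assuming that the path listings for two sources have already been collected as lists of strings. The function names <tt>compare_path_lists</tt> and <tt>similarity</tt> and the short sample lists are hypothetical and serve only as illustration; they are not part of the iRefIndex tools and do not reproduce real source data.

```python
import difflib


def compare_path_lists(paths_a, paths_b, name_a="source A", name_b="source B"):
    """Return unified-diff lines between two sorted, de-duplicated path lists."""
    a = sorted(set(paths_a))
    b = sorted(set(paths_b))
    return list(difflib.unified_diff(a, b, fromfile=name_a, tofile=name_b, lineterm=""))


def similarity(paths_a, paths_b):
    """Fraction of distinct paths shared by both sources (Jaccard index)."""
    a, b = set(paths_a), set(paths_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical, shortened path listings used only to exercise the functions.
corum = [
    "entrySet/entry/interactionList/interaction/names/fullName",
    "entrySet/entry/interactionList/interaction/xref/primaryRef/@db",
]
innatedb = [
    "entrySet/entry/interactionList/interaction/names/fullName",
    "entrySet/entry/interactionList/interaction/names/shortLabel",
    "entrySet/entry/interactionList/interaction/xref/primaryRef/@db",
]

for line in compare_path_lists(corum, innatedb, "CORUM", "InnateDB"):
    print(line)
print(similarity(corum, innatedb))
```

Ranking candidate sources by the similarity score gives a quick first impression, while the diff output shows exactly which paths would need new or altered mappings.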
 +
 
 +
Once a similar source has been identified, the corresponding mapper file can be copied and modified. For example:
 +
 
 +
cp mapper/Map25_CORUM.xml mapper/Map25_InnateDB.xml
 +
 
 +
Then, it is necessary to update the new mapper file with details that differ from those in the closest source. In some cases, it can also be useful to consult other mapper files. For example, the following path may not be present in the CORUM mapper file despite being provided by the data file:
 +
 
 +
entry/experimentList/experimentDescription/hostOrganismList/hostOrganism
 +
 
 +
However, such information can be used by iRefIndex and may be supported by other mapper files. We may therefore decide to incorporate such information into our new mapper file (and perhaps into CORUM's mapper file, too). To do so, we first inspect other mapper files for the presence of such information and then isolate the section which supports its retrieval. For example, from the IntAct mapper file, using the <tt>show_xml_paths.py</tt> script...
 +
 
 +
python tools/show_xml_paths.py --mapper mapper/Map25_INTACT_MINT_BIOGRID.xml --verbose
 +
 
 +
Here, the <tt>--verbose</tt> flag provides identifier information which makes finding the element, table and mapping definitions easier:
 +
 
 +
<pre>
 +
Element experimentDescription (grouper/id=3)...
 +
 
 +
  [...]
 +
 
 +
  Table int_name (sqlref/id=23)...
 +
    _euid ...
 +
      <incremental> (provides experimentDescription)
 +
    _exorg ...
 +
      entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/@ncbiTaxId
 +
    _exorgc ...
 +
      38
 +
</pre>
 +
 
 +
From the above, it becomes apparent that we need a <tt>group</tt> definition for <tt>experimentDescription</tt>, and that <tt>int_name</tt> must be populated according to the stated mappings for <tt>_euid</tt>, <tt>_exorg</tt> and <tt>_exorgc</tt>.
 +
 
 +
Fortunately, a <tt>group</tt> definition already exists in our new mapper file:
 +
 
 +
<pre>
 +
        <group  id ="3" element ="experimentDescription" parpos="4" atrib="_AUTO_">
 +
            <path></path>
 +
            <ref choice="no" />
 +
        </group>
 +
</pre>
 +
 
 +
Meanwhile, the following table modifying section (corresponding to the identifier <tt>23</tt>) must be located in the IntAct mapper file:
 +
 
 +
<pre>
 +
        <sql id="23" userefs="no" >
 +
            <stmt>INSERT INTO int_name(uid,name,category) VALUES ('_euid','_exorg','_exorgc');</stmt>
 +
            <variablelist>
 +
                <variable name="_euid" ></variable>
 +
                <variable name="_exorg"></variable>
 +
                <variable name="_exorgc"></variable>
 +
            </variablelist>
 +
        </sql>
 +
</pre>
 +
 
 +
Since no conflicting section (with the same identifier for an <tt>sql</tt> element) exists in the new mapper file, this can be copied without changes. Then, it is necessary to ensure that mappings for <tt>_euid</tt>, <tt>_exorg</tt> and <tt>_exorgc</tt> are present; the following mappings happen to be found in the IntAct mapper file:
 +
 
 +
<pre>
 +
        <map id="42" sqlref="23"  name="_euid" grouper="3">
 +
            <instruct>
 +
                <param choice="yes" />
 +
            </instruct>
 +
        </map>
 +
        <map id="43" sqlref="23"  name="_exorg" grouper="3">
 +
            <instruct>
 +
                <readfromfile choice="yes">
 +
                    <path variable="_exorg" groupTag="ex" usetext="no" attribute="ncbiTaxId">entry,experimentList,experimentDescription,hostOrganismList,hostOrganism</path>
 +
                </readfromfile>
 +
            </instruct>
 +
        </map>
 +
        <map id="43" sqlref="23"  name="_exorgc" grouper="3">
 +
            <instruct>
 +
                <readfromfile choice="yes">
 +
                    <path variable="_exorgc" groupTag="ex" usetext="no"  prefix="yes" val="38">entry,experimentList,experimentDescription,hostOrganismList,hostOrganism</path>
 +
                </readfromfile>
 +
            </instruct>
 +
        </map>
 +
</pre>
 +
 
 +
Although <tt>_euid</tt> is a known name in the new mapper file already, no mapper definition exists to connect it to the table modification section (with identifier <tt>23</tt>). Thus, it is necessary to copy the above mappings, to adjust their identifiers to avoid conflicts with other mappings, and to make sure that the <tt>grouper</tt> attributes refer to the correct group definition for <tt>experimentDescription</tt>. Fortunately, no such adjustments are required in this case and the definitions can be copied directly.
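The checks described above (identifier conflicts and correct <tt>sqlref</tt> and <tt>grouper</tt> references) can be automated before running the parser. The following is a minimal sketch, assuming that <tt>group</tt>, <tt>sql</tt> and <tt>map</tt> elements carry the attributes shown in the excerpts above. The helper names, the <tt>mapper</tt> root element and the sample fragment are hypothetical, and whether the parser itself requires unique <tt>map</tt> identifiers is also an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(root, tag):
    """Return the ids that are used by more than one element with the given tag."""
    counts = Counter(el.get("id") for el in root.iter(tag) if el.get("id") is not None)
    return sorted(i for i, n in counts.items() if n > 1)


def dangling_refs(root):
    """Return (map id, attribute, value) for each map element whose sqlref or
    grouper attribute does not match any sql or group id in the document."""
    sql_ids = {el.get("id") for el in root.iter("sql")}
    group_ids = {el.get("id") for el in root.iter("group")}
    problems = []
    for m in root.iter("map"):
        if m.get("sqlref") not in sql_ids:
            problems.append((m.get("id"), "sqlref", m.get("sqlref")))
        if m.get("grouper") not in group_ids:
            problems.append((m.get("id"), "grouper", m.get("grouper")))
    return problems


# Hypothetical, simplified mapper fragment with two deliberate mistakes:
# a repeated map id and a grouper that refers to no group definition.
SAMPLE = """
<mapper>
  <group id="3" element="experimentDescription"/>
  <sql id="23"/>
  <map id="42" sqlref="23" name="_euid" grouper="3"/>
  <map id="43" sqlref="23" name="_exorg" grouper="3"/>
  <map id="42" sqlref="23" name="_exorgc" grouper="9"/>
</mapper>
"""

root = ET.fromstring(SAMPLE)
print(duplicate_ids(root, "map"))
print(dangling_refs(root))
```

For a real mapper file, <tt>ET.parse(filename).getroot()</tt> would replace the <tt>ET.fromstring</tt> call.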
 +
 
 +
=== Choosing Sources of Data ===
 +
 
 +
In the above example, the <tt>hostOrganism</tt> information was extracted from the separate <tt>experimentList</tt>, but in order to maintain consistency with the other sources in the data file, we may choose a different source in the <tt>interactionList</tt>, particularly since the information appears to be duplicated in that location. Thus, a mapping is required that can extract data from the following path:
 +
 
 +
entry/interactionList/interaction/experimentList/experimentDescription/hostOrganismList/hostOrganism
 +
 
 +
Even if a mapping does not exist for the above path, there may be mappings involving similar paths:
 +
 
 +
entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
 +
 
 +
In this case, a different mapper file should be chosen to provide a suitable definition:
 +
 
 +
python tools/show_xml_paths.py --mapper mapper/Map25_DIP.xml --verbose
 +
 
 +
A suitable section of that file can be summarised as follows:
 +
 
 +
<pre>
 +
Element experimentDescription (grouper/id=3)...
 +
 
 +
  [...]
 +
 
 +
  Table int_name (sqlref/id=15)...
 +
    _euid ...
 +
      <incremental> (provides experimentDescription)
 +
    _idetlbl ...
 +
      (intdetectionshortLabel) ...
 +
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
 +
      (intdetectionalias) ...
 +
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias
 +
      (intdtfull) ...
 +
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullName
 +
    _idetncat ...
 +
      (intdetectionshlacategory) ...
 +
      24
 +
      (intdetectionoblac) ...
 +
      25
 +
      (indtcatful) ...
 +
      26
 +
</pre>
 +
 
 +
The definitions involved can then be incorporated into the new mapper file, adjusting identifiers appropriately. The <tt>hostOrganism</tt> data in the above example originates from a different place in the element hierarchy. However, altering its path to point to a location in the <tt>interactionList</tt> hierarchy is probably sufficient to make the initial mapping definitions consistent with the newly incorporated definitions for <tt>interactionDetectionMethod</tt>, since the <tt>grouper</tt> identifiers for the mapping definitions refer to <tt>experimentDescription</tt> in all cases.
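The path alteration mentioned here can be sketched as follows, assuming that mapping definitions store paths as comma-separated element names inside <tt>path</tt> elements, as in the IntAct excerpt above. The function name <tt>repoint_paths</tt> is hypothetical, and the sketch only rewrites the path text; any other adjustments a real mapper file might need are not covered.

```python
import xml.etree.ElementTree as ET

# Prefix used by the separate experimentList, and its counterpart
# inside the interactionList hierarchy.
OLD_PREFIX = "entry,experimentList,experimentDescription"
NEW_PREFIX = "entry,interactionList,interaction,experimentList,experimentDescription"


def repoint_paths(map_element, old_prefix=OLD_PREFIX, new_prefix=NEW_PREFIX):
    """Rewrite path elements whose text starts with old_prefix.

    Returns the number of path elements that were changed.
    """
    changed = 0
    for path in map_element.iter("path"):
        text = (path.text or "").strip()
        if text == old_prefix or text.startswith(old_prefix + ","):
            path.text = new_prefix + text[len(old_prefix):]
            changed += 1
    return changed


# A mapping definition shaped like the _exorg mapping shown earlier.
fragment = ET.fromstring(
    '<map id="43" sqlref="23" name="_exorg" grouper="3"><instruct>'
    '<readfromfile choice="yes">'
    '<path variable="_exorg" groupTag="ex" usetext="no" attribute="ncbiTaxId">'
    'entry,experimentList,experimentDescription,hostOrganismList,hostOrganism'
    '</path></readfromfile></instruct></map>'
)

print(repoint_paths(fragment))
print(fragment.find(".//path").text)
```

Paths that already point into the <tt>interactionList</tt> hierarchy do not start with the old prefix and are therefore left unchanged.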
 +
 
 +
=== Defining the Database Identifier ===
 +
 
 +
A new database identifier is required for new data sources. Source identifiers are defined in each data source's configuration file in the <tt>source</tt> element. For example, for InnateDB:
 +
 
 +
<pre>
 +
    <specs>
 +
        <source>InnateDB</source>
 +
        <filetype>.xml</filetype>
 +
    </specs>
 
</pre>
 
</pre>
 +
 +
These identifiers are then mapped to database identifiers in the iRefIndex <tt>int_db</tt> table. Thus, an <tt>int_db</tt> record must be defined, assigning a database identifier (an integer) which corresponds to this source identifier.
 +
 +
To define a new database identifier, the <tt>Create_iRefIndex.sql</tt> file, which resides in the <tt>SQL</tt> directory of the <tt>BioPSI_Suplimenter</tt> software distribution, must be modified by adding a new statement as follows:
 +
 +
INSERT INTO int_db(id,name) VALUES(<database identifier>,'<source identifier>');
 +
 +
For example:
 +
 +
INSERT INTO int_db(id,name) VALUES(178,'InnateDB');
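Before choosing a database identifier, it can be worth confirming that neither the identifier nor the source name is already in use. The following is a minimal sketch of such a check, using an in-memory SQLite database as a stand-in for the real iRefIndex database. The <tt>add_source</tt> function and the table definition are assumptions made for illustration; only the <tt>id</tt> and <tt>name</tt> columns are taken from the statement above.

```python
import sqlite3


def add_source(conn, db_id, source_name):
    """Insert a source into int_db unless the id or the name is already taken."""
    row = conn.execute(
        "SELECT id, name FROM int_db WHERE id = ? OR name = ?",
        (db_id, source_name),
    ).fetchone()
    if row is not None:
        raise ValueError("conflict with existing int_db record: %r" % (row,))
    conn.execute("INSERT INTO int_db(id, name) VALUES (?, ?)", (db_id, source_name))


# Stand-in database: the real int_db table may have a different definition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE int_db (id INTEGER PRIMARY KEY, name TEXT)")

add_source(conn, 178, "InnateDB")
print(conn.execute("SELECT id, name FROM int_db").fetchall())
```

A second call with the same identifier or the same name raises <tt>ValueError</tt> instead of inserting a conflicting record.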
 +
 +
=== Re-running the Parser ===
 +
 +
If data files have already been parsed but the development process requires them to be parsed again, for instance to populate a test database, the <tt>lastUpdate.obj</tt> files written to the filesystem by the parser must first be deleted. These files contain information about previous parsing operations and prevent the same data from being parsed repeatedly. For example:
 +
 +
find /home/irefindex/data -name lastUpdate.obj -exec rm {} +
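The same clean-up can be done from Python, which makes it easy to preview what would be deleted. The following is a minimal sketch; the function name <tt>remove_parser_state</tt> and its <tt>dry_run</tt> parameter are hypothetical, and the demonstration runs against a temporary directory rather than the real data directory.

```python
import tempfile
from pathlib import Path


def remove_parser_state(data_root, filename="lastUpdate.obj", dry_run=True):
    """Find parser state files below data_root and delete them unless dry_run is set.

    Returns the list of matching paths.
    """
    matches = []
    for path in sorted(Path(data_root).rglob(filename)):
        if path.is_file():
            if not dry_run:
                path.unlink()
            matches.append(path)
    return matches


# Demonstration in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "MINT" / "2010-09-14"
    target.mkdir(parents=True)
    (target / "lastUpdate.obj").write_bytes(b"")
    (target / "10023771.psi25.xml").write_text("<entrySet/>")

    # Dry run: report matches without deleting anything.
    print([p.name for p in remove_parser_state(tmp)])

    # Real run: the state file is removed, the data file is kept.
    remove_parser_state(tmp, dry_run=False)
    print((target / "lastUpdate.obj").exists())
```

Calling <tt>remove_parser_state("/home/irefindex/data", dry_run=False)</tt> would correspond to the command above.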
 +
 +
=== Modifying Generated Data ===
 +
 +
New sources require changes to some generated data tables and to the programs that populate them:
 +
 +
* In the definition of the <tt>cy_edgeatrib_canonical</tt> table in the <tt>make_canonical_tables.sql</tt> file in the <tt>SQL_commands</tt> directory, a new column is required for each newly defined source.
 +
* The <tt>src/process/no/uio/biotek/Make_Cy_tables.java</tt> file in <tt>BioPSI_Suplimenter</tt> needs to be changed so that the <tt>popCombine_edge</tt> method has a set of statements populating the new column created in the <tt>cy_edgeatrib_canonical</tt> table.
* The <tt>src/process/no/uio/biotek/PreProcess_process.java</tt> file also needs changing so that the definition of the <tt>cy_edgeatrib</tt> table in the <tt>commit</tt> method includes a new column for each newly defined source.
 +
* The <tt>make_iRefWeb.sql</tt> file in the <tt>SQL_commands</tt> directory requires a statement defining each new source in the <tt>source_db</tt> table.
  
 
== All iRefIndex Pages ==
 
== All iRefIndex Pages ==

Latest revision as of 16:04, 26 October 2012

Note

This page describes development processes related to the code supporting iRefIndex release 9 and earlier.

See iRefIndex Issues and Notes for details of ongoing work to improve the iRefIndex software.

Adding Sources to iRefIndex

  1. Identify the location of the downloaded data.
  2. Evaluate the form of the data:
    • For PSI MI XML (Molecular Interaction XML) documents, check the version of the format employed by the data documents.
    • For the specific version, review the format's schema and how the data uses the schema. For example, PSI MI XML permits the specification of interactors within interaction descriptions as well as in a separate interactor list.
    • For MITAB files, see iRefIndex MITAB Mapping.
  3. Review existing, similar mapper definition files.

Evaluating the Data

The show_xml_paths.py script in the tools directory within the iRef_PSI_XML2RDBMS directory can be used to show the different element paths used in an XML data file to hold data items. For example:

python tools/show_xml_paths.py --data /home/irefindex/data/MINT/2010-09-14/10023771.psi25.xml

The resulting list of paths indicates the places in the element hierarchy of a PSI-MI XML file where information is actually stored. For example:

entrySet/@level
entrySet/@minorVersion
entrySet/@version
entrySet/@xmlns
entrySet/@xmlns:xsi
entrySet/@xsi:schemaLocation
entrySet/entry/experimentList/experimentDescription/@id
entrySet/entry/experimentList/experimentDescription/attributeList/attribute
entrySet/entry/experimentList/experimentDescription/attributeList/attribute/@name
entrySet/entry/experimentList/experimentDescription/attributeList/attribute/@nameAc
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@db
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@dbAc
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@id
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@refType
entrySet/entry/experimentList/experimentDescription/bibref/xref/primaryRef/@refTypeAc
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/@ncbiTaxId
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/fullName
entrySet/entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/names/shortLabel
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias/@type
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/alias/@typeAc
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/fullName
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@dbAc
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@id
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@refType
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@refTypeAc
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@db
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@dbAc
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@refType
entrySet/entry/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@refTypeAc
entrySet/entry/experimentList/experimentDescription/names/fullName
entrySet/entry/experimentList/experimentDescription/names/shortLabel
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@db
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@dbAc
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@id
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@refType
entrySet/entry/experimentList/experimentDescription/xref/primaryRef/@refTypeAc
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@db
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@dbAc
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@id
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@refType
entrySet/entry/experimentList/experimentDescription/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/@id
entrySet/entry/interactionList/interaction/attributeList/attribute
entrySet/entry/interactionList/interaction/attributeList/attribute/@name
entrySet/entry/interactionList/interaction/attributeList/attribute/@nameAc
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/fullName
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/names/shortLabel
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/confidenceList/confidence/unit/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/confidenceList/confidence/value
entrySet/entry/interactionList/interaction/experimentList/experimentRef
entrySet/entry/interactionList/interaction/interactionType/names/fullName
entrySet/entry/interactionList/interaction/interactionType/names/shortLabel
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/interactionType/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/interactionType/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/intraMolecular
entrySet/entry/interactionList/interaction/modelled
entrySet/entry/interactionList/interaction/names/shortLabel
entrySet/entry/interactionList/interaction/negative
entrySet/entry/interactionList/interaction/participantList/participant/@id
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/biologicalRole/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/experimentalPreparationList/experimentalPreparation/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/experimentalRoleList/experimentalRole/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/endStatus/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/isLink
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureRangeList/featureRange/startStatus/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/featureType/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/featureList/feature/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/interactorRef
entrySet/entry/interactionList/interaction/participantList/participant/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias/@type
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/alias/@typeAc
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/fullName
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/names/shortLabel
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/participantIdentificationMethodList/participantIdentificationMethod/xref/secondaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/participantList/participant/xref/primaryRef/@refTypeAc
entrySet/entry/interactionList/interaction/xref/primaryRef/@db
entrySet/entry/interactionList/interaction/xref/primaryRef/@dbAc
entrySet/entry/interactionList/interaction/xref/primaryRef/@id
entrySet/entry/interactionList/interaction/xref/primaryRef/@refType
entrySet/entry/interactionList/interaction/xref/primaryRef/@refTypeAc
entrySet/entry/interactorList/interactor/@id
entrySet/entry/interactorList/interactor/attributeList/attribute
entrySet/entry/interactorList/interactor/attributeList/attribute/@name
entrySet/entry/interactorList/interactor/attributeList/attribute/@nameAc
entrySet/entry/interactorList/interactor/interactorType/names/fullName
entrySet/entry/interactorList/interactor/interactorType/names/shortLabel
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@db
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@dbAc
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@id
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@refType
entrySet/entry/interactorList/interactor/interactorType/xref/primaryRef/@refTypeAc
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@db
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@dbAc
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@id
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@refType
entrySet/entry/interactorList/interactor/interactorType/xref/secondaryRef/@refTypeAc
entrySet/entry/interactorList/interactor/names/alias
entrySet/entry/interactorList/interactor/names/alias/@type
entrySet/entry/interactorList/interactor/names/alias/@typeAc
entrySet/entry/interactorList/interactor/names/fullName
entrySet/entry/interactorList/interactor/names/shortLabel
entrySet/entry/interactorList/interactor/organism/@ncbiTaxId
entrySet/entry/interactorList/interactor/organism/names/fullName
entrySet/entry/interactorList/interactor/organism/names/shortLabel
entrySet/entry/interactorList/interactor/sequence
entrySet/entry/interactorList/interactor/xref/primaryRef/@db
entrySet/entry/interactorList/interactor/xref/primaryRef/@dbAc
entrySet/entry/interactorList/interactor/xref/primaryRef/@id
entrySet/entry/interactorList/interactor/xref/primaryRef/@refType
entrySet/entry/interactorList/interactor/xref/primaryRef/@refTypeAc
entrySet/entry/interactorList/interactor/xref/primaryRef/@version
entrySet/entry/interactorList/interactor/xref/secondaryRef/@db
entrySet/entry/interactorList/interactor/xref/secondaryRef/@dbAc
entrySet/entry/interactorList/interactor/xref/secondaryRef/@id
entrySet/entry/interactorList/interactor/xref/secondaryRef/@refType
entrySet/entry/interactorList/interactor/xref/secondaryRef/@refTypeAc
entrySet/entry/interactorList/interactor/xref/secondaryRef/@secondary
entrySet/entry/interactorList/interactor/xref/secondaryRef/@version
entrySet/entry/source/@releaseDate
entrySet/entry/source/attributeList/attribute
entrySet/entry/source/attributeList/attribute/@name
entrySet/entry/source/attributeList/attribute/@nameAc
entrySet/entry/source/names/fullName
entrySet/entry/source/names/shortLabel
entrySet/entry/source/xref/primaryRef/@db
entrySet/entry/source/xref/primaryRef/@dbAc
entrySet/entry/source/xref/primaryRef/@id
entrySet/entry/source/xref/primaryRef/@refType
entrySet/entry/source/xref/primaryRef/@refTypeAc
entrySet/entry/source/xref/primaryRef/@secondary
entrySet/entry/source/xref/secondaryRef/@db
entrySet/entry/source/xref/secondaryRef/@dbAc
entrySet/entry/source/xref/secondaryRef/@id
entrySet/entry/source/xref/secondaryRef/@refType
entrySet/entry/source/xref/secondaryRef/@refTypeAc

With this information, a suitable mapper file can be identified for the conversion of the XML-encoded data into tabular data to be stored in a database. In the above example, it is apparent that the experiment, interaction and interactor details reside alongside each other within each entry element:

entrySet/entry/experimentList/experimentDescription
entrySet/entry/interactionList/interaction
entrySet/entry/interactionList/interaction/participantList/participant
entrySet/entry/interactorList/interactor

In contrast, other PSI-MI XML files adopt a different structure which can be reduced to the following:

entrySet/entry/interactionList/interaction
entrySet/entry/interactionList/interaction/experimentList/experimentDescription
entrySet/entry/interactionList/interaction/participantList/participant
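
Path listings like those shown above can be approximated with a few lines of Python. The following is only a minimal sketch, not the actual show_xml_paths.py implementation: the function names, the sample document and its namespace are invented for illustration, and it assumes that a "data item" is any attribute or any element with non-blank text.

```python
import xml.etree.ElementTree as ET
from io import BytesIO


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree adds to a name."""
    return tag.rsplit("}", 1)[-1]


def collect_paths(source):
    """Return the sorted, distinct paths of attributes and text-bearing elements."""
    paths = set()
    stack = []  # names of the elements currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(local_name(elem.tag))
            for attr in elem.attrib:
                paths.add("/".join(stack) + "/@" + local_name(attr))
        else:
            # Element text is only complete at the "end" event.
            if elem.text and elem.text.strip():
                paths.add("/".join(stack))
            stack.pop()
            elem.clear()  # release memory; data files can be large
    return sorted(paths)


# Invented sample document (the namespace is a placeholder).
SAMPLE = b"""<entrySet xmlns="urn:example"><entry>
  <interactorList>
    <interactor id="1">
      <names><shortLabel>abc</shortLabel></names>
      <xref><primaryRef db="uniprotkb" id="P12345"/></xref>
    </interactor>
  </interactorList>
</entry></entrySet>"""

for path in collect_paths(BytesIO(SAMPLE)):
    print(path)
```

Because the parser streams through the document and discards each element once it has been seen, memory use should stay modest even for large data files.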

The different sources can be divided into a number of subformats as follows:

Subformat | Sources | Notes
Separate experiment, interaction, interactor lists | BioGRID, HPRD, IntAct, MINT, OPHID | BioGRID uses proteininteractor instead of interactor; OPHID uses proteinParticipant, proteinInteractor
Interaction contains experiment; separate interactor list | DIP | (none)
Interaction contains experiment and interactor/participant | BIND Translation, CORUM, InnateDB, MatrixDB, MPACT, MPPI | InnateDB provides apparently redundant lists of experiments and interactors; MPPI uses proteinParticipant, proteinInteractor
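
The subformat of a new source can be guessed from its path listing. The sketch below is an assumption-laden illustration rather than part of the iRefIndex tools: the function name and the matching rules are invented here, and nested structures are given priority so that a source with redundant top-level lists (as noted for InnateDB) is still placed in the third row of the table.

```python
def classify_subformat(paths):
    """Guess the subformat (see the table above) from slash-separated paths."""

    def has(fragment):
        return any(fragment in path for path in paths)

    # Experiments described inside each interaction.
    nested_experiment = has(
        "interactionList/interaction/experimentList/experimentDescription")
    # Interactors described inside each participant (both naming schemes).
    nested_interactor = (has("participant/interactor/")
                         or has("proteinParticipant/proteinInteractor/"))
    # Top-level lists directly beneath the entry element.
    separate_experiment = has("entry/experimentList/experimentDescription")
    separate_interactor = has("entry/interactorList/")

    if nested_experiment and nested_interactor:
        return "Interaction contains experiment and interactor/participant"
    if nested_experiment:
        return "Interaction contains experiment; separate interactor list"
    if separate_experiment and separate_interactor:
        return "Separate experiment, interaction, interactor lists"
    return "unknown"


# Invented, heavily reduced path listings for illustration only.
separate = [
    "entrySet/entry/experimentList/experimentDescription/@id",
    "entrySet/entry/interactionList/interaction/participantList/participant/interactorRef",
    "entrySet/entry/interactorList/interactor/@id",
]
nested = [
    "entrySet/entry/interactionList/interaction/experimentList/experimentDescription/@id",
    "entrySet/entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel",
]

print(classify_subformat(separate))
print(classify_subformat(nested))
```

The result is only a hint: the listing should still be inspected by hand before choosing a mapper file to adapt.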

Reviewing Mapper Files

Existing mapper files, which reside in the mapper subdirectory of the iRef_PSI_XML2RDBMS directory, can be reviewed by passing one of them to the show_xml_paths.py script. For example:

python tools/show_xml_paths.py --mapper mapper/Map25_CORUM.xml

The resulting output describes the structure of the data and how the mapper will attempt to interpret that data. For example (for CORUM):

Element experimentDescription ...
  Table int_name ...
    _euid ...
      <incremental>
    _idetlbl ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullname
    _idetncat ...
      24
      25
      25
  Table int_xref ...
    _euid ...
      <incremental>
    _brefdb ...
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@db
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@db
    _brefid ...
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/primaryRef/@id
      entry/interactionList/interaction/experimentList/experimentDescription/bibref/xref/secondaryRef/@id
    _brefct ...
      4
      5
  Table int_xref ...
    _euid ...
      <incremental>
    _idetdb ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@db
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@db
    _idetid ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/primaryRef/@id
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/xref/secondaryRef/@id
    _idetct ...
      6
      7
Element experimentList ...
  Table int_experiment ...
    _euidr ...
      _euid
    _iuider ...
      _iuid
Element interaction ...
  Table int_name ...
    _iuid ...
      <incremental>
    _iuiflnm ...
      entry/interactionList/interaction/names/fullName
    _iuiflnmct ...
      12
  Table int_source ...
    _iuid ...
      <incremental>
    _itp ...
      entry/interactionList/interaction/xref
    _isrc ...
      entry/interactionList/interaction/xref
    _ifle ...
      entry/interactionList/interaction/xref
  Table int_xref ...
    _iuid ...
      <incremental>
    _idb ...
      entry/interactionList/interaction/xref/primaryRef/@db
    _iref ...
      entry/interactionList/interaction/xref/primaryRef/@id
    _irefcat ...
      0
Element participant ...
  Table int_name ...
    _ouid ...
      <incremental>
    _olb ...
      entry/interactionList/interaction/participantList/participant/interactor/names/shortLabel
      entry/interactionList/interaction/participantList/participant/interactor/names/alias
      entry/interactionList/interaction/participantList/participant/interactor/names/fullName
    _olbct ...
      13
      14
      15
  Table int_object ...
    _ouid ...
      <incremental>
    _oltyp ...
      entry/interactionList/interaction/participantList/participant/interactor/interactorType/names/shortLabel
    _osrc ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _ofil ...
      entry/interactionList/interaction/participantList/participant/interactor/names
  Table int_sequence ...
    _ouid ...
      <incremental>
    _obsq ...
      entry/interactionList/interaction/participantList/participant/interactor/sequence
  Table int_xref ...
    _ouid ...
      <incremental>
    _odb ...
      entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@db
      entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@db
    _orefid ...
      entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@id
      entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@id
    _oicat ...
      2
      3
    _otax ...
      entry/interactionList/interaction/participantList/participant/interactor/organism/@ncbiTaxId
    _otp ...
      entry/interactionList/interaction/participantList/participant/interactor/xref/primaryRef/@refType
      entry/interactionList/interaction/participantList/participant/interactor/xref/secondaryRef/@refType
Element participantList ...
  Table int_source2object ...
    _iuidr ...
      _iuid
    _what ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _isrcr ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _ifler ...
      entry/interactionList/interaction/participantList/participant/interactor/names
    _refob ...
      _ouid

Adapting an Existing Mapper File

Given an analysis of the data and the identification of its "subformat" (explained above), it should be possible to take an existing mapper file which supports the same subformat and modify it to understand the new data source. For example, the InnateDB data resembles the data of various other sources (listed in the table above). To see which of these sources is closest in structure to InnateDB, the structure of the data can be compared using a diff-like program, potentially a graphical one such as kompare or kdiff3.
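
As a complement to a visual diff, the closeness of two sources can be given a rough numeric score. The following sketch assumes that each source has been reduced to a list of paths; the scoring method (Jaccard similarity of the path sets), the function names and the sample data are all choices made for this illustration, not part of the iRefIndex tools.

```python
def jaccard(first, second):
    """Return the shared fraction of two path collections (0.0 to 1.0)."""
    first, second = set(first), set(second)
    if not first and not second:
        return 1.0
    return len(first & second) / len(first | second)


def rank_sources(new_paths, known_sources):
    """Return (name, score) pairs, most similar source first."""
    scores = [(name, jaccard(new_paths, paths))
              for name, paths in known_sources.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Invented path sets; real ones would come from the path listings.
new_source = {"entry/a/@id", "entry/a/names/shortLabel", "entry/b/@id"}
known = {
    "source_a": {"entry/a/@id", "entry/a/names/shortLabel", "entry/b/@id",
                 "entry/c"},
    "source_b": {"entry/a/@id", "entry/d"},
}

for name, score in rank_sources(new_source, known):
    print(f"{name}: {score:.2f}")
```

A high score only suggests a starting point; the differing paths still need to be reviewed individually, which is where a diff-like program remains useful.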

Once a similar source has been identified, the corresponding mapper file can be copied and modified. For example:

cp mapper/Map25_CORUM.xml mapper/Map25_InnateDB.xml

Then, it is necessary to update the new mapper file with details that differ from those in the closest source. In some cases, it can also be useful to consult other mapper files. For example, the following path may not be present in the CORUM mapper file despite being provided by the data file:

entry/experimentList/experimentDescription/hostOrganismList/hostOrganism

However, such information can be used by iRefIndex and may be supported by other mapper files. We may therefore decide to incorporate such information into our new mapper file (and perhaps into CORUM's mapper file, too). To do so, we first inspect other mapper files for the presence of such information and then isolate the section which supports its retrieval. For example, from the IntAct mapper file, using the show_xml_paths.py script...

python tools/show_xml_paths.py --mapper mapper/Map25_INTACT_MINT_BIOGRID.xml --verbose

Here, the --verbose flag provides identifier information which makes finding the element, table and mapping definitions easier:

Element experimentDescription (grouper/id=3)...

  [...]

  Table int_name (sqlref/id=23)...
    _euid ...
      <incremental> (provides experimentDescription)
    _exorg ...
      entry/experimentList/experimentDescription/hostOrganismList/hostOrganism/@ncbiTaxId
    _exorgc ...
      38

From the above, it becomes apparent that we need a group definition for experimentDescription, and that int_name must be populated according to the stated mappings for _euid, _exorg and _exorgc.

Fortunately, a group definition already exists in our new mapper file:

        <group  id ="3" element ="experimentDescription" parpos="4" atrib="_AUTO_">
            <path></path>
            <ref choice="no" />
        </group>

Meanwhile, the following table modifying section (corresponding to the identifier 23) must be located in the IntAct mapper file:

        <sql id="23" userefs="no" >
            <stmt>INSERT INTO int_name(uid,name,category) VALUES ('_euid','_exorg','_exorgc');</stmt>
            <variablelist>
                <variable name="_euid" ></variable>
                <variable name="_exorg"></variable>
                <variable name="_exorgc"></variable>
            </variablelist>
        </sql>

Since no conflicting section (with the same identifier for an sql element) exists in the new mapper file, this can be copied without changes. Then, it is necessary to ensure that mappings for _euid, _exorg and _exorgc are present; the following mappings happen to be found in the IntAct mapper file:

        <map id="42" sqlref="23"  name="_euid" grouper="3">
            <instruct>
                <param choice="yes" />
            </instruct>
        </map>
        <map id="43" sqlref="23"  name="_exorg" grouper="3">
            <instruct>
                <readfromfile choice="yes">
                    <path variable="_exorg" groupTag="ex" usetext="no" attribute="ncbiTaxId">entry,experimentList,experimentDescription,hostOrganismList,hostOrganism</path>
                </readfromfile>
            </instruct>
        </map>
        <map id="43" sqlref="23"  name="_exorgc" grouper="3">
            <instruct>
                <readfromfile choice="yes">
                    <path variable="_exorgc" groupTag="ex" usetext="no"  prefix="yes" val="38">entry,experimentList,experimentDescription,hostOrganismList,hostOrganism</path>
                </readfromfile>
            </instruct>
        </map>

Although _euid is a known name in the new mapper file already, no mapper definition exists to connect it to the table modification section (with identifier 23). Thus, it is necessary to copy the above mappings, to adjust their identifiers to avoid conflicts with other mappings, and to make sure that the grouper attributes refer to the correct group definition for experimentDescription. Fortunately, no such adjustments are required in this case and the definitions can be copied directly.
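
After copying definitions between mapper files, the identifiers and references can be checked mechanically. The sketch below relies only on the element and attribute names visible in the excerpts above (group, sql, map, id, sqlref, grouper); the root element name, the function name and the sample data are invented, and the sample deliberately contains a repeated identifier and a dangling reference.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_mapper(root):
    """Return a list of identifier problems found in a mapper document."""
    problems = []
    # Identifiers should be unique within each kind of definition.
    for tag in ("group", "sql", "map"):
        counts = Counter(elem.get("id", "") for elem in root.iter(tag))
        for ident, count in sorted(counts.items()):
            if count > 1:
                problems.append(f"duplicate {tag} id {ident}")
    # Every mapping should refer to an existing sql and group definition.
    sql_ids = {elem.get("id") for elem in root.iter("sql")}
    group_ids = {elem.get("id") for elem in root.iter("group")}
    for mapping in root.iter("map"):
        ident = mapping.get("id")
        if mapping.get("sqlref") not in sql_ids:
            problems.append(
                f"map {ident} refers to missing sql {mapping.get('sqlref')}")
        if mapping.get("grouper") not in group_ids:
            problems.append(
                f"map {ident} refers to missing group {mapping.get('grouper')}")
    return problems


# Invented sample; "mapper" is a placeholder root element name.
SAMPLE = """<mapper>
  <group id="3" element="experimentDescription"/>
  <sql id="23"/>
  <map id="42" sqlref="23" name="_euid" grouper="3"/>
  <map id="43" sqlref="23" name="_exorg" grouper="3"/>
  <map id="43" sqlref="23" name="_exorgc" grouper="3"/>
  <map id="44" sqlref="99" name="_other" grouper="3"/>
</mapper>"""

for problem in check_mapper(ET.fromstring(SAMPLE)):
    print(problem)
```

Whether the parser itself tolerates repeated identifiers is not established here, so any reported problem should be treated as a prompt for manual review.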

Choosing Sources of Data

In the above example, the hostOrganism information was extracted from the separate experimentList, but in order to maintain consistency with the other sources in the data file, we may choose a different source in the interactionList, particularly since the information appears to be duplicated in that location. Thus, a mapping is required that can extract data from the following path:

entry/interactionList/interaction/experimentList/experimentDescription/hostOrganismList/hostOrganism

Even if a mapping does not exist for the above path, there may be mappings involving similar paths:

entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel

In this case, a different mapper file should be chosen to provide a suitable definition:

python tools/show_xml_paths.py --mapper mapper/Map25_DIP.xml --verbose

A suitable section of that file can be summarised as follows:

Element experimentDescription (grouper/id=3)...

  [...]

  Table int_name (sqlref/id=15)...
    _euid ...
      <incremental> (provides experimentDescription)
    _idetlbl ...
      (intdetectionshortLabel) ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/shortLabel
      (intdetectionalias) ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/alias
      (intdtfull) ...
      entry/interactionList/interaction/experimentList/experimentDescription/interactionDetectionMethod/names/fullName
    _idetncat ...
      (intdetectionshlacategory) ...
      24
      (intdetectionoblac) ...
      25
      (indtcatful) ...
      26

The definitions involved can then be incorporated into the new mapper file, adjusting identifiers appropriately. Although the hostOrganism data in the above example originates from a different place in the element hierarchy, a simple path alteration to point to a location in the interactionList hierarchy is probably sufficient to make the initial mapping definitions consistent with the newly incorporated definitions for interactionDetectionMethod, since the grouper identifiers for the mapping definitions refer to experimentDescription in all cases.
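
The path alteration described above can be sketched as a small transformation of the comma-separated path text used in the mapping definitions. This is an illustration only: the function name and the placeholder root element are invented, and it assumes that changing the path text is the only adjustment needed, which should be confirmed against a working mapper file.

```python
import xml.etree.ElementTree as ET

OLD_PREFIX = "entry,experimentList,experimentDescription"
NEW_PREFIX = ("entry,interactionList,interaction,"
              "experimentList,experimentDescription")


def retarget_paths(root, old_prefix=OLD_PREFIX, new_prefix=NEW_PREFIX):
    """Rewrite path elements that start with old_prefix; return the count."""
    changed = 0
    for path in root.iter("path"):
        text = (path.text or "").strip()  # empty group paths are skipped
        if text == old_prefix or text.startswith(old_prefix + ","):
            path.text = new_prefix + text[len(old_prefix):]
            changed += 1
    return changed


# Sample based on the excerpts above; "mapper" is a placeholder root name.
SAMPLE = """<mapper>
  <group id="3" element="experimentDescription" parpos="4" atrib="_AUTO_">
    <path></path>
  </group>
  <map id="43" sqlref="23" name="_exorg" grouper="3">
    <instruct>
      <readfromfile choice="yes">
        <path variable="_exorg" groupTag="ex" usetext="no" attribute="ncbiTaxId">entry,experimentList,experimentDescription,hostOrganismList,hostOrganism</path>
      </readfromfile>
    </instruct>
  </map>
</mapper>"""

root = ET.fromstring(SAMPLE)
changed = retarget_paths(root)
new_text = root.find("map/instruct/readfromfile/path").text
print(changed, new_text)
```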

Defining the Database Identifier

A new database identifier is required for new data sources. Source identifiers are defined in each data source's configuration file in the source element. For example, for InnateDB:

    <specs>
        <source>InnateDB</source>
        <filetype>.xml</filetype>
    </specs>

These identifiers are then mapped to database identifiers in the iRefIndex int_db table. Thus, an int_db record must be defined, assigning a database identifier (an integer) which corresponds to this source identifier.

To define a new database identifier, modify the Create_iRefIndex.sql file, which resides in the SQL directory of the BioPSI_Suplimenter software distribution, adding a new statement as follows:

INSERT INTO int_db(id,name) VALUES(<database identifier>,'<source identifier>');

For example:

INSERT INTO int_db(id,name) VALUES(178,'InnateDB');
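
Before adding such a statement, it may be worth confirming that neither the identifier nor the source name is already in use. The sketch below scans SQL text for int_db statements of the form shown above; the function names and the sample SQL are invented for illustration, and statements written in any other form would not be detected.

```python
import re

# Matches statements of the form shown above, ignoring case and spacing.
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+int_db\s*\(\s*id\s*,\s*name\s*\)\s*"
    r"VALUES\s*\(\s*(\d+)\s*,\s*'([^']*)'\s*\)",
    re.IGNORECASE,
)


def existing_sources(sql_text):
    """Return {database identifier: source identifier} found in sql_text."""
    return {int(ident): name for ident, name in INSERT_RE.findall(sql_text)}


def new_insert(sql_text, ident, name):
    """Build an INSERT statement, refusing identifiers already in use."""
    if "'" in name:
        raise ValueError("quotes in source names are not handled here")
    known = existing_sources(sql_text)
    if ident in known:
        raise ValueError(f"identifier {ident} is already used by {known[ident]}")
    if name in known.values():
        raise ValueError(f"source {name} is already defined")
    return f"INSERT INTO int_db(id,name) VALUES({ident},'{name}');"


# Invented stand-in for the contents of Create_iRefIndex.sql.
SAMPLE_SQL = """
INSERT INTO int_db(id,name) VALUES(1,'ExampleSourceA');
INSERT INTO int_db(id,name) VALUES(2,'ExampleSourceB');
"""

print(new_insert(SAMPLE_SQL, 178, "InnateDB"))
```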

Re-running the Parser

If data files have already been parsed, but the development process dictates that they be parsed again (for example, to populate a test database), it is necessary to delete the lastUpdate.obj files which the parser writes to the filesystem. These files contain information about previous parsing operations and are used to prevent repeated parsing of data. For example:

find /home/irefindex/data -name lastUpdate.obj -exec rm {} +
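
The same clean-up can be performed from Python, which makes it easy to preview what would be deleted first. This is a minimal sketch: the function name and the dry_run parameter are invented here, and the demonstration uses a temporary directory rather than the real data location.

```python
import tempfile
from pathlib import Path


def remove_update_markers(data_dir, dry_run=True):
    """Find lastUpdate.obj files below data_dir; delete them unless dry_run."""
    matched = []
    for marker in sorted(Path(data_dir).rglob("lastUpdate.obj")):
        if marker.is_file():
            if not dry_run:
                marker.unlink()
            matched.append(marker)
    return matched


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = Path(tmp) / "source_a"
    source_dir.mkdir()
    (source_dir / "lastUpdate.obj").write_text("")

    found = remove_update_markers(tmp)                   # preview only
    deleted = remove_update_markers(tmp, dry_run=False)  # actually delete
    remaining = remove_update_markers(tmp)               # nothing left

print(len(found), len(deleted), len(remaining))
```

Defaulting to a preview is a deliberate choice, since deleting the markers causes all affected data files to be parsed again.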

Modifying Generated Data

New sources require changes to some generated data tables and to the programs that populate them:

  • In the definition of the cy_edgeatrib_canonical table in the make_canonical_tables.sql file in the SQL_commands directory, a new column is required for each newly defined source.
  • The src/process/no/uio/biotek/Make_Cy_tables.java file in BioPSI_Suplimenter needs to be changed so that the popCombine_edge method has a set of statements populating the new column created in the cy_edgeatrib_canonical table.
  • The src/process/no/uio/biotek/PreProcess_process.java file also needs changing so that the definition of the cy_edgeatrib table in the commit method includes a new column for each newly defined source.
  • The make_iRefWeb.sql file in the SQL_commands directory requires a statement defining each new source in the source_db table.
