iRefIndex Output and Statistics

Latest revision as of 13:34, 16 February 2012

The production of output and statistics involves two separate programs: PSI_MI_TAB_Maker and BioPSI_Suplimenter.

PSI-MI Controlled Vocabulary Mapping

Note

Sabry to help document this part.

In the data maintained by iRefIndex, various controlled vocabulary terms are used which do not match genuine terms defined in the molecular interaction ontology. As a result, a process is followed involving the extraction of such unrecognised terms, the curation of a mapping to replacement terms, and the processing of the maintained data to use the replacement terms.
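The extract-curate-replace cycle described above can be sketched in a few lines of Python. This is an illustration only, not the actual iRefIndex implementation; the function name and the sample terms are hypothetical, although MI:0018 (two hybrid) is a genuine molecular interaction ontology term.

```python
def apply_term_mapping(terms, mapping):
    """Replace unrecognised controlled vocabulary terms using a curated
    mapping; terms with no curated replacement pass through unchanged."""
    return [mapping.get(term, term) for term in terms]

# Hypothetical curated mapping: unrecognised term -> genuine MI term id.
mapping = {"two hybrid test": "MI:0018"}

# "MI:0004" is already a genuine term, so it is left as-is.
print(apply_term_mapping(["two hybrid test", "MI:0004"], mapping))
# ['MI:0018', 'MI:0004']
```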

Currently, the curation process is performed by assembling the unrecognised terms in a spreadsheet which is then modified, adding suggested replacements alongside the existing terms.

Creating a Mapping Wiki Page

A page summarising the mapping of unrecognised terms to known terms should be prepared, such as the Mapping of terms to MI term ids - iRefIndex 8.0 page.

The cv2wiki.py script needs to be obtained. Get the program's source code from this location:

https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/cv2wiki.py

Using CVS with the appropriate CVSROOT setting, run the following command:

cvs co bioscape/bioscape/modules/interaction/Sabry/cv2wiki.py

The CVSROOT environment variable should be set to the following for this to work:

export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot

(The <username> should be replaced with your actual username.)

Running the Program

The program must be run on comma-separated value files exported from the curation spreadsheet. First, the "interaction type" and "interaction detection method" sheets must be individually exported using the following settings:

  • The field delimiter is defined to be the comma (,) character
  • Field quoting is done using the double-quote (") character
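Files exported with these settings can be parsed with Python's csv module configured the same way. This sketch assumes a two-column layout of unrecognised term followed by suggested replacement, which is an assumption about the spreadsheet, not something cv2wiki.py documents here.

```python
import csv
import io

# A sample row in the assumed layout: unrecognised term, suggested MI term,
# using comma as the field delimiter and double quotes for field quoting.
exported = 'two hybrid test,"MI:0018"\n'

reader = csv.reader(io.StringIO(exported), delimiter=",", quotechar='"')
rows = list(reader)
print(rows)  # [['two hybrid test', 'MI:0018']]
```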

With exported files defined, for example, as cv_int_type.csv and cv_int_det_method.csv for the "interaction type" and "interaction detection method" sheet files respectively, the following command can then be run:

python cv2wiki.py cv_int_type.csv cv_int_det_method.csv MediaWiki

The output in the above example will be written to standard output (the terminal/console). To write to a file, add a filename as an argument to the program. For example:

python cv2wiki.py cv_int_type.csv cv_int_det_method.csv MediaWiki CVMapping

This file could potentially be uploaded to the Wiki using a tool suitable for this purpose. For a Wiki such as MoinMoin (also supported by the program), the file could potentially be copied into place with a certain degree of care.

Building PSI_MI_TAB_Maker

The PSI_MI_TAB_Maker.jar file needs to be obtained or built.

  1. Get the program's source code from the CVS repository:

    Using CVS with the appropriate CVSROOT setting, run the following command:

    cvs co bioscape/bioscape/modules/interaction/Sabry/PSI_MI_TAB_Maker

    The CVSROOT environment variable should be set to the following for this to work:

    export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot
    (The <username> should be replaced with your actual username.)
  2. Obtain the program's dependencies. This program uses the MySQL Connector/J library, which can be downloaded from the MySQL Web site.
  3. Extract the dependencies:
    tar zxf mysql-connector-java-5.1.6.tar.gz

    This will produce a directory called mysql-connector-java-5.1.6 containing a file called mysql-connector-java-5.1.6-bin.jar, which should be placed in a lib directory inside the PSI_MI_TAB_Maker directory:

      mkdir lib
      cp mysql-connector-java-5.1.6/mysql-connector-java-5.1.6-bin.jar lib/

    You may instead choose to copy the library from the BioPSI_Suplimenter/lib directory:

      mkdir lib
      cp ../BioPSI_Suplimenter/lib/mysql-connector-java-5.1.6-bin.jar lib/

    The filenames in the above example will need adjusting, depending on the exact version of the library downloaded.

    The SHA.jar file needs copying from its build location:

    cp ../SHA/dist/SHA.jar lib/

    Alternatively, the external libraries can also be found in the following location:

    /biotek/dias/donaldson3/iRefIndex/External_libraries
  4. Compile the source code. In order to build the software on a computer which does not have the NetBeans IDE installed, copy the generic build file into the PSI_MI_TAB_Maker directory:
    cp Build_files/build.xml .

    Compile and create the .jar file as follows:

    ant jar

Preparing the MITAB Tables

Before the PSI_MI_TAB_Maker program can be run, mapping and MITAB-related tables must be created in the database.

Note

The tables described here are prepared by the cv.no.uio.biotek.Preprocess class in a step which will be documented later.

Obtaining the SQL Scripts

Get the scripts from this location:

https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/SQL_commands/

Using CVS with the appropriate CVSROOT setting, run the following command:

cvs co bioscape/bioscape/modules/interaction/Sabry/SQL_commands

The CVSROOT environment variable should be set to the following for this to work:

export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot

(The <username> should be replaced with your actual username.)

Preparing the SQL Scripts

The mapping tables script needs to be parameterised before use, substituting the name of a previously built iRefIndex database for <old_db>:

sed -e 's/<old_db>/<actual_irefindex_db>/g' make_mapping_tables_for_output.sql > make_mapping_tables_for_output_specific.sql
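The effect of this substitution can be illustrated with a short Python sketch. The database name irefindex_v8 and the sample statement are hypothetical examples; the real method is the sed command above.

```python
def parameterise(template_text, old_db):
    """Substitute the actual database name for the <old_db> placeholder,
    mirroring the sed command shown above."""
    return template_text.replace("<old_db>", old_db)

# An illustrative line of the kind found in the mapping tables script.
template = "insert into mapping_intType select * from <old_db>.mapping_intType;"
print(parameterise(template, "irefindex_v8"))
# insert into mapping_intType select * from irefindex_v8.mapping_intType;
```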

Running the SQL Scripts

In the SQL_commands directory, one script (preprocess_for_output.sql) provides the basis for all data output, while two other scripts (preprocess_for_mitab.sql and TAB_MAKE.sql) together provide a large number of SQL statements that create the MITAB-related tables. The parameterised mapping script and the first two of these scripts should be run as follows, noting any error conditions:

mysql -h <hostname> -u <username> -p -A -D <database> < make_mapping_tables_for_output_specific.sql
mysql -h <hostname> -u <username> -p -A -D <database> < preprocess_for_output.sql
mysql -h <hostname> -u <username> -p -A -D <database> < preprocess_for_mitab.sql

The final script can then be run if no errors were experienced:

mysql -h <hostname> -u <username> -p -A -D <database> < TAB_MAKE.sql

Running PSI_MI_TAB_Maker

Run the program as follows:

java -jar -Xms256m -Xmx8192m build/jar/PSI_MI_TAB_Maker.jar <config filename>

A sample configuration file is located in the config directory. It can be copied, modified and supplied to the program.

Running BioPSI_Suplimenter to Produce Statistics

This program was already built and run in the iRefIndex Build Process. For a completed build process it should not need to be run again, but an option does exist to explicitly produce statistics for the system, should this be required. The basic details of running the program are described in Running BioPSI_Suplimenter.

Create reports

Upon selecting this option in the running program, complete the fields as previously described. The reports will be written to the designated log file directory.

Creating a Statistics Wiki Page

The reports2wiki.py file needs to be obtained. Get the program's source code from the CVS repository:

Using CVS with the appropriate CVSROOT setting, run the following command:

cvs co bioscape/bioscape/modules/interaction/Sabry/reports2wiki.py

The CVSROOT environment variable should be set to the following for this to work:

export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot

(The <username> should be replaced with your actual username.)

Running the Program

The program can be run on the report files in the log file directory as follows:

python reports2wiki.py /home/irefindex/output/Suplimenter03052009 MediaWiki

The "prefix" of the report files should be the common part of all such files such that...

ls /home/irefindex/output/Suplimenter03052009*

...should list the report files (and log files) produced by iRefIndex.

The output in the above example will be written to standard output (the terminal/console). To write to a file, add a filename as an argument to the program. For example:

python reports2wiki.py /home/irefindex/output/Suplimenter03052009 MediaWiki Statistics_iRefIndex_3.0

This file could potentially be uploaded to the Wiki using a tool suitable for this purpose. For a Wiki such as MoinMoin (also supported by the program), the file could potentially be copied into place with a certain degree of care.

Creating Additional Statistics

The make_taxonomy_summary.sql script in the SQL_commands directory can be used to generate a table of interactions by species:

mysql -h <hostname> -u <username> -p -A -D <database> < make_taxonomy_summary.sql > taxonomy_summary.txt

This file can be incorporated into the statistics page, at least in part, and otherwise published in full.

Creating Other Mapping Files

In addition to the MITAB data, a mapping file should be generated using a script in the SQL_commands directory. First prepare the script, substituting a real filesystem path for <actual_mapping_file>:

sed -e 's/<mapping_file>/<actual_mapping_file>/g' mapper_tables.sql > mapper_tables_specific.sql

You will need to escape the slashes in the path so that sed does not interpret them as the delimiters of its s/// expression. For example:

sed -e 's/<mapping_file>/\/home\/irefindex\/output\/mappings.txt/g' mapper_tables.sql > mapper_tables_specific.sql
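Rather than typing the escaped path by hand, it can be generated. This Python sketch builds the escaped replacement string and the resulting sed command line, using the example path from above.

```python
path = "/home/irefindex/output/mappings.txt"

# Escape the slashes so sed does not read them as s/// delimiters.
escaped = path.replace("/", "\\/")
command = (
    "sed -e 's/<mapping_file>/" + escaped + "/g' "
    "mapper_tables.sql > mapper_tables_specific.sql"
)
print(command)
```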

Then execute the script as follows:

mysql -h <hostname> -u <username> -p -A -D <database> < mapper_tables_specific.sql

This will write a file to the specified location.

All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).