iRefIndex Data Preparation for iRefWeb

From irefindex
Revision as of 10:11, 29 September 2011 by PaulBoddie (talk | contribs) (→‎Preparing the SQL Scripts: Added a table of values.)

A special iRefWeb database needs to be created and populated so that iRefWeb may publish iRefIndex data.

Creating a Database

Enter MySQL using a command like the following:

mysql -h <host> -u <admin> -p -A

The <admin> is the name of the user with administrative privileges. For example:

mysql -h myhost -u admin -p -A

Then create a database using commands of the following form:

create database <database>;
grant all privileges on <database>.* to '<username>'@'%';

For example, with <database> given as iRefWeb, <username> given as irefindex, and a substitution for <password>:

create database iRefWeb;
grant all privileges on iRefWeb.* to 'irefindex'@'%';

If difficulties occur granting privileges in this way, try the following statements:

grant select, insert, update, delete, create, drop, references, index, alter, create temporary tables, lock tables, execute, create view, show view, create routine, alter routine on <database>.* to '<username>'@'%';
grant process, file on *.* to '<username>'@'%';

You should choose the same user as the one used to build the iRefIndex database since that database will need to be accessed during this activity.

Accessing the Previous iRefWeb Database

NoteNote

This should probably be changed so that the process of building an iRefWeb database depends only on the iRefIndex database.

In order to populate the iRefWeb database, the previous version of the database needs to be referenced. Thus, a command of the following form may be required:

grant select on <old_irefweb_db>.* to '<username>'@'%';

Obtaining the SQL Scripts

Get the scripts from this location:

Using CVS with the appropriate CVSROOT setting, run the following command:

cvs co bioscape/bioscape/modules/interaction/Sabry/SQL_commands

The CVSROOT environment variable should be set to the following for this to work:

export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot

(The <username> should be replaced with your actual username.)

Preparing the SQL Scripts

The make_iRefWeb.sql and test_iRefWeb.sql scripts needs to be parameterised to refer to specific databases. The following commands should prepare a suitable version of the scripts given appropriate values for <actual_irefindex_db>, <actual_old_irefweb_db> and <actual_irefweb_db>:

sed -e 's/<irefindex_db>/<actual_irefindex_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' make_iRefWeb.sql > make_iRefWeb_specific.sql
sed -e 's/<irefweb_db>/<actual_irefweb_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' test_iRefWeb.sql > test_iRefWeb_specific.sql

For example, given the following substitutions...

Parameter Substitution Value
<irefindex_db> <actual_irefindex_db> iRefIndex_full_beta9
<old_irefweb_db> <actual_old_irefweb_db> iRefWeb8
<irefweb_db> <actual_irefweb_db> iRefWeb9

...the commands would be as follows:

sed -e 's/<irefindex_db>/iRefIndex_full_beta9/g;s/<old_irefweb_db>/iRefWeb8/g' make_iRefWeb.sql > make_iRefWeb_specific.sql
sed -e 's/<irefweb_db>/iRefWeb9/g;s/<old_irefweb_db>/iRefWeb8/g' test_iRefWeb.sql > test_iRefWeb_specific.sql

Running the SQL Scripts

In the SQL_commands directory, two scripts - preprocess_for_iRefWeb.sql and make_iRefWeb.sql - together provide a large number of SQL statements for the creation of iRefWeb data. The first of these scripts should be run as follows and any error conditions noted:

mysql -h <hostname> -u <username> -p -A -D <irefindex_database> < preprocess_for_iRefWeb.sql

The prepared version of the second script can then be run if no errors were experienced:

mysql -h <hostname> -u <username> -p -A -D <irefweb_database> < make_iRefWeb_specific.sql

Here, <irefweb_database> refers to the current iRefWeb database created above, whereas <irefindex_database> refers to the current iRefIndex database.

The built database can be tested by running the following:

mysql -h <hostname> -u <username> -p -A -D <irefweb_database> < test_iRefWeb_specific.sql

This will output a number of figures corresponding to changes in the number of interactions and interactors. Large deviations from the typical scale of change may indicate a problem in the generation or assignment of identifiers.

Generating the Database Dump

A database dump can be produced by running a command of the following form:

mysqldump -h <hostname> -u <username> -p --databases <irefweb_database> --tables \
  alias interaction interaction_detection_type interaction_interactor interaction_interactor_assignment \
  interaction_source_db interaction_source_db_experiment interaction_type interactor \
  interactor_alias interactor_detection_type interactor_type name_space score \
  sequence sequence_source_db source_db statistics geneid2rog \
  > iRefWeb.sql

Mapping RIGIDs

For iRefIndex 9.0, a table will be made available mapping the RIGIDs used in that release (and subsequent releases) to "legacy" RIGIDs used in earlier releases of iRefIndex. This can be produced by running the following SQL command in the iRefIndex database:

select sourceid, experiment_uid, Represent_rigid, Represent_rogid
from join_2
order by sourceid, experiment_uid
into outfile '/tmp/iRefIndex_interaction_interactors';

This output file can then be processed using the make_legacy_mapping.py program:

python make_legacy_mapping.py /tmp/iRefIndex_interaction_interactors /tmp/iRefIndex_interaction_rigids

(The program needs Python 2.5 or later since it uses the hashlib module. It could probably be made compatible with earlier Python versions if necessary.)

The resulting output file (/tmp/iRefIndex_interaction_rigids in the above example) is a tab-separated file with correct RIGIDs in the first column and "legacy" RIGIDs in the second column.

All iRefIndex Pages

Follow this link for a listing of all iRefIndex related pages (archived and current).