Difference between revisions of "iRefIndex Data Preparation for iRefWeb"
PaulBoddie (talk | contribs) (Added database dump and testing details along with a category link.) |
PaulBoddie (talk | contribs) (→Mapping RIGIDs: Added sorting, de-duplication, header notes.) |
||
(4 intermediate revisions by the same user not shown) | |||
Line 64: | Line 64: | ||
(The <tt><username></tt> should be replaced with your actual username.) | (The <tt><username></tt> should be replaced with your actual username.) | ||
+ | |||
+ | == Updating the SQL Scripts == | ||
+ | |||
+ | The <tt>make_iRefWeb.sql</tt> script needs to be updated so that each statement defining a source in the <tt>source_db</tt> table refers to the appropriate release version and date. | ||
== Preparing the SQL Scripts == | == Preparing the SQL Scripts == | ||
Line 72: | Line 76: | ||
sed -e 's/<irefweb_db>/<actual_irefweb_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' test_iRefWeb.sql > test_iRefWeb_specific.sql | sed -e 's/<irefweb_db>/<actual_irefweb_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' test_iRefWeb.sql > test_iRefWeb_specific.sql | ||
− | For example: | + | For example, given the following substitutions... |
+ | |||
+ | {| border="1" cellspacing="0" cellpadding="5" | ||
+ | ! Parameter | ||
+ | ! Substitution | ||
+ | ! Value | ||
+ | |- | ||
+ | | <irefindex_db> | ||
+ | | <actual_irefindex_db> | ||
+ | | iRefIndex_full_beta9 | ||
+ | |- | ||
+ | | <old_irefweb_db> | ||
+ | | <actual_old_irefweb_db> | ||
+ | | iRefWeb8 | ||
+ | |- | ||
+ | | <irefweb_db> | ||
+ | | <actual_irefweb_db> | ||
+ | | iRefWeb9 | ||
+ | |} | ||
+ | |||
+ | ...the commands would be as follows: | ||
sed -e 's/<irefindex_db>/iRefIndex_full_beta9/g;s/<old_irefweb_db>/iRefWeb8/g' make_iRefWeb.sql > make_iRefWeb_specific.sql | sed -e 's/<irefindex_db>/iRefIndex_full_beta9/g;s/<old_irefweb_db>/iRefWeb8/g' make_iRefWeb.sql > make_iRefWeb_specific.sql | ||
Line 99: | Line 123: | ||
A database dump can be produced by running a command of the following form: | A database dump can be produced by running a command of the following form: | ||
− | + | ./dump_iRefWeb.sh <hostname> <username> <database> | |
− | + | ||
− | + | == Mapping RIGIDs == | |
− | + | ||
− | + | For iRefIndex 9.0, a table will be made available mapping the RIGIDs used in that release (and subsequent releases) to "legacy" RIGIDs used in earlier releases of iRefIndex. This can be produced by running the following SQL command in the iRefIndex database: | |
− | + | ||
− | + | select sourceid, experiment_uid, Represent_rigid, Represent_rogid | |
− | </ | + | from join_2 |
+ | order by sourceid, experiment_uid | ||
+ | into outfile '/tmp/output/iRefIndex_interaction_interactors'; | ||
+ | |||
+ | This output file can then be processed using the <tt>make_legacy_mapping.py</tt> program: | ||
+ | |||
+ | python make_legacy_mapping.py /tmp/output/iRefIndex_interaction_interactors /tmp/output/iRefIndex_interaction_rigids | ||
+ | |||
+ | (The program needs Python 2.5 or later since it uses the <tt>hashlib</tt> module. It could probably be made compatible with earlier Python versions if necessary.) | ||
+ | |||
+ | The resulting output file (<tt>/tmp/iRefIndex_interaction_rigids</tt> in the above example) is a tab-separated file with correct RIGIDs in the first column and "legacy" RIGIDs in the second column. Since the output may include duplicate lines, the following step is advised: | ||
+ | |||
+ | sort -u /tmp/output/iRefIndex_interaction_rigids > /tmp/iRefIndex_interaction_rigids | ||
+ | |||
+ | A header can also be included in the file: | ||
+ | |||
+ | cat header_legacy_mapping.txt /tmp/iRefIndex_interaction_rigids > iRefIndex_interaction_rigids | ||
== All iRefIndex Pages == | == All iRefIndex Pages == |
Latest revision as of 13:04, 20 October 2011
A special iRefWeb database needs to be created and populated so that iRefWeb may publish iRefIndex data.
Contents
Creating a Database
Enter MySQL using a command like the following:
mysql -h <host> -u <admin> -p -A
The <admin> is the name of the user with administrative privileges. For example:
mysql -h myhost -u admin -p -A
Then create a database using commands of the following form:
create database <database>; grant all privileges on <database>.* to '<username>'@'%';
For example, with <database> given as iRefWeb, <username> given as irefindex, and a substitution for <password>:
create database iRefWeb; grant all privileges on iRefWeb.* to 'irefindex'@'%';
If difficulties occur granting privileges in this way, try the following statements:
grant select, insert, update, delete, create, drop, references, index, alter, create temporary tables, lock tables, execute, create view, show view, create routine, alter routine on <database>.* to '<username>'@'%'; grant process, file on *.* to '<username>'@'%';
You should choose the same user as the one used to build the iRefIndex database since that database will need to be accessed during this activity.
Accessing the Previous iRefWeb Database
Note |
This should probably be changed so that the process of building an iRefWeb database depends only on the iRefIndex database. |
In order to populate the iRefWeb database, the previous version of the database needs to be referenced. Thus, a command of the following form may be required:
grant select on <old_irefweb_db>.* to '<username>'@'%';
Obtaining the SQL Scripts
Get the scripts from this location:
- https://hfaistos.uio.no/cgi-bin/viewvc.cgi/bioscape/bioscape/modules/interaction/Sabry/SQL_commands/
Using CVS with the appropriate CVSROOT setting, run the following command:
cvs co bioscape/bioscape/modules/interaction/Sabry/SQL_commands
The CVSROOT environment variable should be set to the following for this to work:
export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot
(The <username> should be replaced with your actual username.)
Updating the SQL Scripts
The make_iRefWeb.sql script needs to be updated so that each statement defining a source in the source_db table refers to the appropriate release version and date.
Preparing the SQL Scripts
The make_iRefWeb.sql and test_iRefWeb.sql scripts needs to be parameterised to refer to specific databases. The following commands should prepare a suitable version of the scripts given appropriate values for <actual_irefindex_db>, <actual_old_irefweb_db> and <actual_irefweb_db>:
sed -e 's/<irefindex_db>/<actual_irefindex_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' make_iRefWeb.sql > make_iRefWeb_specific.sql sed -e 's/<irefweb_db>/<actual_irefweb_db>/g;s/<old_irefweb_db>/<actual_old_irefweb_db>/g' test_iRefWeb.sql > test_iRefWeb_specific.sql
For example, given the following substitutions...
Parameter | Substitution | Value |
---|---|---|
<irefindex_db> | <actual_irefindex_db> | iRefIndex_full_beta9 |
<old_irefweb_db> | <actual_old_irefweb_db> | iRefWeb8 |
<irefweb_db> | <actual_irefweb_db> | iRefWeb9 |
...the commands would be as follows:
sed -e 's/<irefindex_db>/iRefIndex_full_beta9/g;s/<old_irefweb_db>/iRefWeb8/g' make_iRefWeb.sql > make_iRefWeb_specific.sql sed -e 's/<irefweb_db>/iRefWeb9/g;s/<old_irefweb_db>/iRefWeb8/g' test_iRefWeb.sql > test_iRefWeb_specific.sql
Running the SQL Scripts
In the SQL_commands directory, two scripts - preprocess_for_iRefWeb.sql and make_iRefWeb.sql - together provide a large number of SQL statements for the creation of iRefWeb data. The first of these scripts should be run as follows and any error conditions noted:
mysql -h <hostname> -u <username> -p -A -D <irefindex_database> < preprocess_for_iRefWeb.sql
The prepared version of the second script can then be run if no errors were experienced:
mysql -h <hostname> -u <username> -p -A -D <irefweb_database> < make_iRefWeb_specific.sql
Here, <irefweb_database> refers to the current iRefWeb database created above, whereas <irefindex_database> refers to the current iRefIndex database.
The built database can be tested by running the following:
mysql -h <hostname> -u <username> -p -A -D <irefweb_database> < test_iRefWeb_specific.sql
This will output a number of figures corresponding to changes in the number of interactions and interactors. Large deviations from the typical scale of change may indicate a problem in the generation or assignment of identifiers.
Generating the Database Dump
A database dump can be produced by running a command of the following form:
./dump_iRefWeb.sh <hostname> <username> <database>
Mapping RIGIDs
For iRefIndex 9.0, a table will be made available mapping the RIGIDs used in that release (and subsequent releases) to "legacy" RIGIDs used in earlier releases of iRefIndex. This can be produced by running the following SQL command in the iRefIndex database:
select sourceid, experiment_uid, Represent_rigid, Represent_rogid from join_2 order by sourceid, experiment_uid into outfile '/tmp/output/iRefIndex_interaction_interactors';
This output file can then be processed using the make_legacy_mapping.py program:
python make_legacy_mapping.py /tmp/output/iRefIndex_interaction_interactors /tmp/output/iRefIndex_interaction_rigids
(The program needs Python 2.5 or later since it uses the hashlib module. It could probably be made compatible with earlier Python versions if necessary.)
The resulting output file (/tmp/iRefIndex_interaction_rigids in the above example) is a tab-separated file with correct RIGIDs in the first column and "legacy" RIGIDs in the second column. Since the output may include duplicate lines, the following step is advised:
sort -u /tmp/output/iRefIndex_interaction_rigids > /tmp/iRefIndex_interaction_rigids
A header can also be included in the file:
cat header_legacy_mapping.txt /tmp/iRefIndex_interaction_rigids > iRefIndex_interaction_rigids
All iRefIndex Pages
Follow this link for a listing of all iRefIndex related pages (archived and current).