iRefIndex Build Process
Downloading the Source Data
Before downloading the source data, a location must be chosen for the downloaded files. For example:
/biotek/prometheus/storage/Sabry/data
Download the files to create local copies. Automated download is not possible for every data source: some sources require special links that must be requested from the source administrators by e-mail. The FTPtransfer program downloads data from the following sources:
- RefSeq
- MMDB
- PDB
- gene2refseq
- IntAct
- MINT
Manual Downloads
More information can be found at the following location:
ftp://ftp.no.embnet.org/irefindex/data/current/sources.htm
For each manual download, a subdirectory hierarchy must be created in the main data directory using a command of the following form:
mkdir -p <path-to-data>/<source>/<date>/
Here, <path-to-data> should be replaced by the location of the data directory, <source> should be replaced by the name of the source, and <date> should be replaced by the current date.
For example, for BIND this directory might be created as follows:
mkdir -p /biotek/prometheus/storage/Sabry/data/BIND/09_22_2008/
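When several manual sources are being prepared on the same day, the directory creation can be scripted. The following is a minimal sketch: the function name make_source_dirs is hypothetical, and the MM_DD_YYYY date form is an assumption taken from the BIND example above.

```shell
# Create <data_dir>/<source>/<date>/ for each named source and print each path.
# Assumption: the date uses the MM_DD_YYYY form seen in the BIND example.
make_source_dirs() {
    data_dir=$1
    shift
    today=$(date +%m_%d_%Y)
    for source in "$@"; do
        mkdir -p "$data_dir/$source/$today" || return 1
        echo "$data_dir/$source/$today"
    done
}
```

It might be called as make_source_dirs /biotek/prometheus/storage/Sabry/data BIND DIP HPRD, with the source names adjusted to whichever downloads are being performed.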
BIND
The FTP site was previously available at the following location:
ftp://ftp.bind.ca/pub/BIND/data/bindflatfiles/bindindex/
An archived copy of the data can be found at the following internal location:
/biotek/dias/donaldson3/Sabry/DATA_2006/BINDftp/
In the main downloaded data directory, create a subdirectory hierarchy as noted above.
Copy the following files into the newly created data directory:
20060525.complex2refs.txt 20060525.complex2subunits.txt 20060525.ints.txt 20060525.labels.txt 20060525.refs.txt
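Since the BIND files are copied by hand, it can be worth checking that all five arrived. This is only a sketch: check_bind_files is a hypothetical helper, and the default 20060525 prefix is taken from the file names listed above.

```shell
# Report any of the five expected BIND files missing from a directory.
# Returns non-zero if at least one file is absent.
check_bind_files() {
    dir=$1
    prefix=${2:-20060525}
    missing=0
    for suffix in complex2refs complex2subunits ints labels refs; do
        if [ ! -f "$dir/$prefix.$suffix.txt" ]; then
            echo "missing: $prefix.$suffix.txt" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example: check_bind_files /biotek/prometheus/storage/Sabry/data/BIND/09_22_2008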
BioGrid
The location of BioGrid downloads is as follows:
http://www.thebiogrid.org/downloads.php
In the main downloaded data directory, create a subdirectory hierarchy as noted above.
Select the BIOGRID-ORGANISM-XXXXX.psi25.zip file and download it to the newly created data directory.
In the data directory, uncompress the downloaded file; for example:
unzip BIOGRID-ORGANISM-2.0.44.psi25.zip
CORUM
The location of CORUM downloads is as follows:
http://mips.gsf.de/genre/proj/corum/index.html
The specific download file is this one:
http://mips.gsf.de/genre/export/sites/default/corum/allComplexes.psimi.zip
Uncompress the downloaded file:
unzip allComplexes.psimi.zip
Important Note
The CORUM data needs adjusting to work with the StaxPSIXML software. Using a suitable XSLT tool such as xsltproc, transform the uncompressed downloaded file as follows:
mv allComplexes.psimi allComplexes.psimi.orig
xsltproc fix_corum.xsl allComplexes.psimi.orig > allComplexes.psimi
The fix_corum.xsl file can be found in the XSLT directory within StaxPSIXML.
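If the CORUM transformation may be run more than once, it helps to avoid overwriting the original backup with an already transformed file. A possible sketch follows; the function name fix_corum is hypothetical, and the path to fix_corum.xsl is passed in because its exact location depends on where StaxPSIXML is checked out.

```shell
# Back up the CORUM file once, then regenerate it from the backup.
# Re-running is safe: an existing .orig file is never overwritten.
fix_corum() {
    xsl=$1
    file=${2:-allComplexes.psimi}
    if [ ! -f "$file.orig" ]; then
        mv "$file" "$file.orig" || return 1
    fi
    xsltproc "$xsl" "$file.orig" > "$file"
}
```

For example: fix_corum <path-to-StaxPSIXML>/XSLT/fix_corum.xsl allComplexes.psimi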
DIP
DIP data is accessed via the following location:
http://dip.doe-mbi.ucla.edu/dip/Login.cgi?
You have to register, agree to terms, and get a user account.
Access credentials for internal users are available from Sabry.
In the main downloaded data directory, create a subdirectory hierarchy as noted above.
Select the FULL - complete DIP data set from the Files page:
http://dip.doe-mbi.ucla.edu/dip/Download.cgi?SM=3
Download the latest PSI-MI 2.5 file (dip<date>.mif25) to the newly created data directory. If a compressed version of the file was chosen, uncompress the file using the gunzip tool. For example:
gunzip dip20080708.mif25.gz
HPRD
In the main downloaded data directory, create a subdirectory hierarchy as noted above.
Download the PSI-MI single file (HPRD_SINGLE_PSIMI_<date>.xml.tar.gz) to the newly created data directory.
Note: you have to register each and every time, unfortunately.
Uncompress the downloaded file. For example:
tar zxf HPRD_SINGLE_PSIMI_090107.xml.tar.gz
OPHID
OPHID is no longer available, so you have to use the local copy of the data:
/biotek/dias/donaldson3/Sabry/iRefIndex_Backup/BckUp15SEP2008/OPHID/2008MAR16
In the main downloaded data directory, create a subdirectory hierarchy as noted above.
Copy the file ophid1153236640123.xml to the newly created data directory.
MIPS
In the main downloaded data directory, create a subdirectory hierarchy as noted above.
For MPPI, download the following file:
http://mips.gsf.de/proj/ppi/data/mppi.gz
For MPACT, download the following file:
ftp://ftpmips.gsf.de/yeast/PPI/mpact-complete.psi25.xml.gz
Uncompress the downloaded files:
gunzip mpact-complete.psi25.xml.gz
gunzip mppi.gz
UniProt
In the main downloaded data directory, create a subdirectory hierarchy as noted above.
Visit the following site:
http://www.uniprot.org/downloads
Download the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL files in text format:
- ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
- ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz
- ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot_varsplic.fasta.gz
Or from the EBI UK mirror:
- ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.dat.gz
- ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_trembl.dat.gz
- ftp://ftp.ebi.ac.uk/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot_varsplic.fasta.gz
These files should be moved into the newly created data directory and uncompressed:
gunzip uniprot_sprot.dat.gz
gunzip uniprot_trembl.dat.gz
gunzip uniprot_sprot_varsplic.fasta.gz
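The three UniProt URLs differ only in the mirror host, so the list can be generated instead of typed out. This is a sketch under that assumption; uniprot_urls is a hypothetical helper, not part of the iRefIndex tools.

```shell
# Print the three UniProt download URLs for a given mirror host.
# With no argument, the main UniProt FTP host is used.
uniprot_urls() {
    host=${1:-ftp.uniprot.org}
    base="ftp://$host/pub/databases/uniprot/current_release/knowledgebase/complete"
    for name in uniprot_sprot.dat.gz uniprot_trembl.dat.gz uniprot_sprot_varsplic.fasta.gz; do
        echo "$base/$name"
    done
}
```

Run from within the newly created data directory, the output could be piped to a downloader, for instance uniprot_urls ftp.ebi.ac.uk | wget -i - to fetch from the EBI mirror.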
Building FTPtransfer
The FTPtransfer.jar file needs to be obtained or built.
- Get the program's source code from this location:
Using CVS with the appropriate CVSROOT setting, run the following command:
cvs co bioscape/bioscape/modules/interaction/Sabry/FTPtransfer
The CVSROOT environment variable should be set to the following for this to work:
export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot
(The <username> should be replaced with your actual username.)
- Obtain the program's dependencies. This program uses the Apache commons-net package, and this must be available during compilation. This library could be retrieved from the Apache site...
...or from a mirror such as the following:
- Extract the dependencies:
tar zxf commons-net-1.4.1.tar.gz
This will produce a directory called commons-net-1.4.1 containing a file called commons-net-1.4.1.jar which should be placed in the lib directory in the FTPtransfer directory...
mkdir lib
cp commons-net-1.4.1/commons-net-1.4.1.jar lib/
Alternatively, the external libraries can also be found in the following location:
/biotek/dias/donaldson3/iRefIndex/External_libraries
- Customise the output locations. Currently, the output locations are hard-coded, and changing them would involve searching for the following...
/biotek/prometheus/storage/Sabry/data
...and replacing it with the path to the preferred output directory. The source code is found in the following directory within the FTPtransfer directory:
src/ftptransfer
- Compile the source code. In order to build the software on a computer which does not have the NetBeans IDE installed, copy the generic build file into the FTPtransfer directory:
cp Build_files/build.xml .
Compile and create the .jar file as follows:
ant jar
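The output-location customisation described in the steps above, which has to be done before compiling, might be scripted roughly as follows. This is a sketch: replace_output_path is a hypothetical helper, and it assumes the new path contains no | or & characters, since those are special in the sed expression.

```shell
# Replace the hard-coded output path in every file under a source directory.
# Assumption: new_path contains no '|' or '&' characters.
replace_output_path() {
    src_dir=$1
    new_path=$2
    old_path=/biotek/prometheus/storage/Sabry/data
    # List files containing the old path, then rewrite each one via a temporary file.
    grep -rlF "$old_path" "$src_dir" | while IFS= read -r file; do
        sed "s|$old_path|$new_path|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}
```

From the FTPtransfer directory this could be invoked as replace_output_path src/ftptransfer <path-to-data>, followed by a review of the changes before running ant jar.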
Running FTPtransfer
To run the program, invoke the .jar file as follows:
java -Xms256m -Xmx256m -jar build/jar/FTPtransfer.jar log
The specified log argument can be replaced with a suitable location for the program's execution log.
Building SHA
The SHA.jar file needs to be obtained or built.
- Get the program's source code from this location:
Using CVS with the appropriate CVSROOT setting, run the following command:
cvs co bioscape/bioscape/modules/interaction/Sabry/SHA
The CVSROOT environment variable should be set to the following for this to work:
export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot
(The <username> should be replaced with your actual username.)
- Compile the source code. Compile and create the .jar file as follows:
ant jar
The SHA.jar file will be created in the dist directory.
Building BioPSI_Suplimenter
The BioPSI_Suplimenter.jar file needs to be obtained or built.
- Get the program's source code from this location:
Using CVS with the appropriate CVSROOT setting, run the following command:
cvs co bioscape/bioscape/modules/interaction/Sabry/BioPSI_Suplimenter
The CVSROOT environment variable should be set to the following for this to work:
export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot
(The <username> should be replaced with your actual username.)
- Obtain the program's dependencies. This program uses the SHA.jar file created above as well as the MySQL Connector/J library which can be found at the following location:
- Extract the dependencies:
tar zxf mysql-connector-java-5.1.6.tar.gz
This will produce a directory called mysql-connector-java-5.1.6 containing a file called mysql-connector-java-5.1.6-bin.jar which should be placed in the lib directory in the BioPSI_Suplimenter directory...
mkdir lib
cp mysql-connector-java-5.1.6/mysql-connector-java-5.1.6-bin.jar lib/
The SHA.jar file needs copying from its build location:
cp ../SHA/dist/SHA.jar lib/
Alternatively, the external libraries can also be found in the following location:
/biotek/dias/donaldson3/iRefIndex/External_libraries
- Compile the source code. In order to build the software on a computer which does not have the NetBeans IDE installed, copy the generic build file into the BioPSI_Suplimenter directory:
cp Build_files/build.xml .
Compile and create the .jar file as follows:
ant jar
Creating the Database
Enter MySQL using a command like the following:
mysql -h <host> -u <admin> -p -A
The <admin> is the name of the user with administrative privileges. For example:
mysql -h myhost -u admin -p -A
Then create a database and user using commands of the following form:
create database <database>;
create user '<username>'@'%' identified by '<password>';
grant all privileges on <database>.* to '<username>'@'%';
For example, with <database> given as irefindex, <username> given as irefindex, and a substitution for <password>:
create database irefindex;
create user 'irefindex'@'%' identified by 'mysecretpassword';
grant all privileges on irefindex.* to 'irefindex'@'%';
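To avoid retyping the statements for each new database, they can be generated from the three substituted values. The following is only a sketch: make_db_sql is a hypothetical helper, and it assumes that none of the values contains a single quote.

```shell
# Print the database-creation statements for the given database, user and password.
# Assumption: the arguments contain no single quotes.
make_db_sql() {
    database=$1
    username=$2
    password=$3
    cat <<EOF
create database $database;
create user '$username'@'%' identified by '$password';
grant all privileges on $database.* to '$username'@'%';
EOF
}
```

The output could then be reviewed and piped into the client, for example make_db_sql irefindex irefindex mysecretpassword | mysql -h <host> -u <admin> -p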