Bioscape Installation
Revision as of 15:19, 10 February 2009
Installation
Before installing, review the dependencies listed in the Dependencies section below. This document does not give precise instructions for installing them; it is recommended that you use your system's package management tools, perhaps installing Bioscape itself from a suitable package, to save the time and effort of a manual installation. For those interested in installing Bioscape from the source code distribution, the procedure is given below.
Installation from Source Code
Bioscape can be installed as follows:
python setup.py install
Note that you may need to be a privileged user to perform the above command, and it might be preferable to choose an alternative installation location if you do not have administrative or superuser rights. The following command provides an example of installing the software in another location:
python setup.py install --prefix=/home/user/software/usr
You will need to change the location according to your own system's conventions and your own preferences. Once installed, you may also need to tell your system where to find the installed libraries and programs; this is usually done by modifying environment variables, and could be done for the above example by adding the following definitions to your environment configuration:
export PATH=${PATH}:/home/user/software/usr/bin
export PYTHONPATH=${PYTHONPATH}:/home/user/software/usr/lib/python2.3/site-packages
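Note that the exact details of the latter definition, particularly the version of Python (2.3) and the library directory (lib), depend on your system: the version must match the Python release used to run setup.py, and some systems use a different library directory, such as lib64.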
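As a rough illustration of how the PYTHONPATH entry varies, the following sketch builds the site-packages path from an installation prefix, a Python version and a library directory name. The function name site_packages_dir is hypothetical and not part of Bioscape; it only mirrors the directory layout shown in the example above, so the result should be checked against the directories that setup.py actually created.

```python
import posixpath
import sys


def site_packages_dir(prefix, version=None, libdir="lib"):
    """Return the site-packages path for a Unix-style installation prefix.

    Hypothetical helper: it reproduces the layout used in the example
    above (<prefix>/<libdir>/python<major>.<minor>/site-packages).
    """
    if version is None:
        # Default to the interpreter running this script.
        version = sys.version_info[:2]
    major, minor = version[0], version[1]
    return posixpath.join(
        prefix, libdir, "python%d.%d" % (major, minor), "site-packages"
    )
```

Calling site_packages_dir("/home/user/software/usr", (2, 3)) gives the path used in the export line above; passing libdir="lib64" covers systems that use that directory name instead.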
Dependency Configuration
For some of the dependencies, even with pre-installed packages, you will need to do some preparatory work in order to use Bioscape. Some brief details of this work are given below.
PostgreSQL
It is necessary to initialise a "database cluster" for Bioscape. This is typically done using commands such as the following:
mkdir -p /home/user/software/var/lib/pgsql
initdb -D /home/user/software/var/lib/pgsql
Setting the PGDATA environment variable to the directory given in the above commands will save you the effort of specifying it later with other PostgreSQL-related commands.
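For those who prefer to script this step, the following is a minimal sketch of the same two commands driven from Python, with PGDATA set for the child process. The function names are hypothetical and not part of Bioscape; the sketch assumes that the initdb program is on the PATH.

```python
import os
import subprocess


def initdb_command(datadir):
    """Return the argument list for initialising a cluster in datadir."""
    return ["initdb", "-D", datadir]


def environment_with_pgdata(datadir, base_env=None):
    """Return a copy of the environment with PGDATA set to datadir."""
    env = dict(os.environ if base_env is None else base_env)
    env["PGDATA"] = datadir
    return env


def initialise_cluster(datadir):
    """Create datadir if necessary and run initdb on it."""
    if not os.path.isdir(datadir):
        os.makedirs(datadir)  # equivalent of mkdir -p
    # -D is redundant once PGDATA is set, but it keeps the command explicit.
    subprocess.check_call(
        initdb_command(datadir), env=environment_with_pgdata(datadir)
    )
```

Nothing runs until initialise_cluster is called, for example with "/home/user/software/var/lib/pgsql" as in the commands above.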
In order to get improved performance from PostgreSQL, consider replacing the postgresql.conf file in the database cluster with the version found in the docs/database directory.
Configuration
Before use, the distribution must be configured according to the environment in which the software will operate. This is done most conveniently by running the configuration program:
python bioscape_configure.py
The configuration program takes the bioscape.cfg.in template and produces a specific bioscape.cfg configuration file. An alternative approach is to copy bioscape.cfg.in to bioscape.cfg and to edit the file manually.
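The manual alternative can be sketched as follows. The helper name copy_template is hypothetical; it only copies the template and deliberately leaves an existing bioscape.cfg alone, on the assumption that an edited configuration should not be overwritten. The settings inside the file must still be edited by hand.

```python
import os
import shutil


def copy_template(template="bioscape.cfg.in", target="bioscape.cfg"):
    """Copy the template to the target unless the target already exists.

    Returns True if a copy was made, False if an existing file was kept.
    """
    if os.path.exists(target):
        # Avoid overwriting a configuration that has already been edited.
        return False
    shutil.copyfile(template, target)
    return True
```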
Once the bioscape.cfg file has been produced, it may be left in a "working directory" where all Bioscape-related tasks will be performed, or it can be copied or moved to your home directory; for example:
mv bioscape.cfg /home/user
See below for advice on setting database parameters in the configuration.
Useful Configuration Value Groups
The following groups of settings and values may be of use when choosing particular configurations of the software.
Setting | Value |
---|---|
database_system | pgsql |
jdbc_database_url | jdbc:postgresql://localhost/bioscape |
jdbc_driver_class | org.postgresql.Driver |
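When adapting these values, it can help to check that the JDBC URL names the intended host and database. The following sketch splits a URL of the form shown in the table into its parts. The function parse_jdbc_url is hypothetical and not part of Bioscape, and it is written for a current Python 3 interpreter rather than the Python 2 releases listed under Dependencies.

```python
from urllib.parse import urlsplit


def parse_jdbc_url(url):
    """Split a JDBC URL of the form jdbc:<subprotocol>://<host>[:port]/<database>."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError("not a JDBC URL: %r" % url)
    # The remainder has ordinary URL syntax, so the standard parser applies.
    parts = urlsplit(url[len(prefix):])
    return {
        "subprotocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL gives no port
        "database": parts.path.lstrip("/"),
    }
```

For the value in the table, the result has the subprotocol postgresql, the host localhost, no port and the database bioscape.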
Database Configuration
In order to use certain modules (or packages) within the distribution, the database support must be configured, preferably using the database configuration program:
python bioscape_dbconfigure.py
Each of the modules (or packages) requiring database support can be listed, and the specific table and data definitions can be prepared and invoked using the database configuration program.
Quick Start
Use the quick start program in order to initialise Bioscape as quickly as possible:
bioscape_quickstart.py -t quickstart
Or, from the distribution directory:
python scripts/bioscape_quickstart.py -t quickstart
The program has a range of "targets" that can be specified; running the program without the arguments given above (-t quickstart) will list some of these targets.
Dependencies
Bioscape has the following basic dependencies:
Package | Release Information | Purpose | Notes |
---|---|---|---|
Python | Tested with 2.3.6, 2.4.4 | Runs most of the software | Note that Python releases in the 2.3 series earlier than 2.3.5 have threading issues which are exposed by PyLucene, causing deadlock situations. Additional compatibility issues with gcj apply to PyLucene, and it is recommended that the software be compiled with gcj 3.4.6, potentially together with a suitable version of Python (such as 2.3.5 or 2.4.4 or later). |
PyLucene | Tested with 2.0.0, 2.1.0-2 | Indexes textual documents | |
CMDsyntax | 0.91 | Processes command line options | |
XSLTools | 0.6 | Produces the Web interface | |
WebStack | 1.3 | Produces the Web interface | |
libxml2dom | 0.4.6 | Required by XSLTools | |
libxslt | Tested with 1.1.20 | Required by XSLTools | |
libxml2 | Tested with 2.6.27 | Required by libxml2dom | |
PostgreSQL | Tested with 8.1.9 | Storage of information | Currently PostgreSQL is the only supported database system |
pyPgSQL | Tested with 2.5.1 | Database access | |
egenix-mx-base | Tested with 3.0.0 | Required by pyPgSQL | |
Optional: to collect words from WordNet, the following dependencies apply:
Package | Release Information | Purpose | Notes |
---|---|---|---|
WordNet | 3.0 | Provides the WordNet database | |
pywordnet | 2.0.1 | A Python interface to WordNet | |
Alternative: to use Bioscape with LingPipe, the following dependencies apply:
Package | Release Information | Purpose | Notes |
---|---|---|---|
Jython | Tested with 2.2a1 | Used to run LingPipe-related software | |
LingPipe | Tested with 2.3.0 | Sentence splitting in textual documents | |
Lucene | Tested with 2.0.0 | Indexes textual documents | |
PostgreSQL JDBC Driver | Tested with 8.1-407 JDBC 3 | Database access (if PostgreSQL is used) | Required by Jython |
Optional: the following dependencies are related to improving the software:
Package | Release Information | Purpose | Notes |
---|---|---|---|
Epydoc | Tested with 3.0a3 | API document generation | |
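The note on Python releases in the basic dependencies table can be turned into a simple check. The following is a sketch only: the function name has_threading_issue is hypothetical, and it encodes just the one rule stated in that note (2.3 series releases earlier than 2.3.5), not the gcj compatibility issues.

```python
import sys


def has_threading_issue(version_info=None):
    """Return True for Python 2.3 releases earlier than 2.3.5."""
    if version_info is None:
        # Default to the interpreter running this script.
        version_info = sys.version_info
    series = tuple(version_info[:2])
    release = tuple(version_info[:3])
    return series == (2, 3) and release < (2, 3, 5)
```

For example, has_threading_issue((2, 3, 4)) is true, whereas the tested releases 2.3.6 and 2.4.4 both give a false result.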
Bundled Resources
The following resources are currently bundled with the software:
english.words | ftp://ftp.cs.cornell.edu/pub/smart/ |
abbreviations.txt | A combination of the following, plus additional terms, with fragments incorporated in the list, in place of the full abbreviations, where appropriate: |
official.txt | A combination of files from the downloadable archive found at the following location: http://www.dcs.shef.ac.uk/research/ilash/Moby/mwords.html
The following files from the archive were concatenated and sorted, with duplicate and multiple-word entries removed: 113809of.fic, 4160offi.cia
The following command was used to prepare the file:
cat 113809of.fic 4160offi.cia | sort | uniq > official.txt
According to a notice at the following location, the Moby lexicon project has been placed in the public domain: |
wordnet.txt | A list of distinct nouns, verbs, adjectives and adverbs from the WordNet 3.0 database, prepared using the bioscape_get_wordnet.py script. See the docs/licences/LICENSE-WordNet file for copyright and licensing information. |
common_english.txt | Common English word token dictionary processed from the common_english file (taking stripped text after the . field separator), with the original file retrieved from the following location: |
adjectives.txt | Animal adjectives. See the permissive licensing details in the docs/licences/adjectives.txt file for more information. |
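The command used to prepare official.txt can be approximated in Python as sketched below. The function names are hypothetical and not part of Bioscape. Two assumptions should be noted: the sort order here is by character code and may differ from that of sort under some locales, and the drop_multiword option is an addition that reflects the description of multiple-word entries being removed, which the shell command alone does not do. The files are read using the platform's default text encoding, which may need adjusting for the Moby files.

```python
def merge_word_lists(paths, drop_multiword=False):
    """Return the sorted, de-duplicated lines from the given files."""
    words = set()
    for path in paths:
        with open(path) as f:
            for line in f:
                word = line.rstrip("\r\n")
                if drop_multiword and " " in word:
                    # Optional step: skip entries containing a space.
                    continue
                words.add(word)
    return sorted(words)


def write_word_list(words, target):
    """Write one word per line to the target file."""
    with open(target, "w") as f:
        for word in words:
            f.write(word + "\n")
```

A call such as write_word_list(merge_word_lists(["113809of.fic", "4160offi.cia"]), "official.txt") corresponds to the command given for official.txt.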
Additional Resources
- Entrez Gene
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
- Entrez Taxonomy
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy
- NCBI PubMed
- http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed