Bioscape Installation

From irefindex

Revision as of 14:31, 21 July 2009


Please note that this documentation covers an unreleased product and is for internal use only.


Installation

Bioscape consists of three separate applications which must be combined to provide all the facilities of a functional Bioscape installation:

  • The administrative application: bsadmin
  • The text-indexing application: bsindex
  • The Web front-end application: bsweb

Before installing, consider the dependencies listed on the Bioscape Dependencies page (see also the Dependency Configuration section below). This document does not give precise instructions for installing the dependencies. It is recommended that you use your system's package management tools for them, and perhaps also install Bioscape itself from suitable packages, to save the time and effort of working through the installation process manually. For those interested in installing Bioscape from the source code distribution of each application, the procedure is given below.

Installation from Source Code

First, nominate a common directory to hold the Bioscape application directories. For example:

/home/bioscape/apps

Then, acquire each application's source code distribution (details to be provided) and unpack the archives in this common directory:

cd /home/bioscape/apps
tar zxf bsadmin-x.y.tar.gz
tar zxf bsindex-x.y.tar.gz
tar zxf bsweb-x.y.tar.gz
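The unpacking step can be scripted as in the following minimal sketch. It assumes the three archives have already been placed in the common directory and follow the <app>-<version>.tar.gz naming used above. The function name unpack_bioscape is hypothetical, and x.y remains a placeholder for the real version. The text does not say which directory each archive unpacks into. If it is a versioned directory such as bsadmin-x.y, the later paths that refer to /home/bioscape/apps/bsadmin would need adjusting, for example with a symbolic link.

```shell
# Hypothetical helper: unpack the three Bioscape archives in the common
# directory. Assumes the archives are already present and are named
# <app>-<version>.tar.gz, as in the commands above.
unpack_bioscape() {
    (
        apps_dir=$1   # e.g. /home/bioscape/apps
        version=$2    # placeholder: the release you downloaded (x.y above)
        cd "$apps_dir" || exit 1
        # Check that every archive exists before unpacking any of them.
        for app in bsadmin bsindex bsweb; do
            if [ ! -f "${app}-${version}.tar.gz" ]; then
                echo "missing archive: $apps_dir/${app}-${version}.tar.gz" >&2
                exit 1
            fi
        done
        for app in bsadmin bsindex bsweb; do
            tar zxf "${app}-${version}.tar.gz" || exit 1
        done
    )
}

# Usage:
#   unpack_bioscape /home/bioscape/apps x.y
```

Checking for all three archives first avoids leaving a half-unpacked set of applications behind when one download is missing.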

Since these applications contain Python libraries, the environment must be configured so that Python can find them. The bsadmin distribution provides a short configuration file for this purpose, docs/configuration/env.sh. You can use it directly or as the basis for your own version, and then source it from your .bashrc (or equivalent shell start-up file) as follows:

source /home/bioscape/apps/bsadmin/docs/configuration/env.sh
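For orientation, the following is a hypothetical sketch of what such an env.sh might contain. The real file in bsadmin may differ. The sketch assumes that each application's top-level directory is the one Python needs on PYTHONPATH. It also sets PGDATA, which the PostgreSQL section below notes is mentioned in env.sh. BIOSCAPE_APPS is an illustrative variable name.

```shell
# Hypothetical env.sh sketch; the file shipped in bsadmin may differ.
BIOSCAPE_APPS=/home/bioscape/apps

# Assumption: each application's top-level directory contains its Python
# packages, so adding it to PYTHONPATH makes them importable.
for app in bsadmin bsindex bsweb; do
    case ":${PYTHONPATH}:" in
        *":${BIOSCAPE_APPS}/${app}:"*) ;;   # already present: do not add twice
        *) PYTHONPATH="${PYTHONPATH:+${PYTHONPATH}:}${BIOSCAPE_APPS}/${app}" ;;
    esac
done
export PYTHONPATH

# Location of the PostgreSQL database cluster used by Bioscape.
PGDATA=/home/bioscape/data
export PGDATA
```

The duplicate check means the file can be sourced more than once (for example from nested shells) without PYTHONPATH growing each time.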

Dependency Configuration

For some of the dependencies, even with pre-installed packages, you will need to do some preparatory work in order to use Bioscape. Some brief details of this work are given below.

See also the Bioscape Dependencies page for a list of the dependencies.

Installation of Dependencies

The docs/dependencies/download.sh file in the bsadmin distribution contains commands intended to download the source distributions of various dependencies. Run this file, or a modified copy of it, in a nominated directory; that directory will then hold copies of the dependencies' archive files.

The docs/dependencies/build.sh file in the bsadmin distribution provides some commands which could be run to build each of the dependencies from the previously downloaded archive files.
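Running the two scripts in a nominated directory might look like the sketch below. It assumes both scripts can be run with sh and operate on the current working directory, and that build.sh expects to find the archives fetched by download.sh in that same directory. The function name build_dependencies and the /home/bioscape/deps directory are hypothetical.

```shell
# Hypothetical helper: download and build the dependencies in a nominated
# directory using the scripts shipped with bsadmin.
# Assumption: both scripts work with sh (substitute bash if they need it)
# and operate on the current directory.
build_dependencies() {
    (
        deps_dir=$1      # e.g. /home/bioscape/deps (hypothetical location)
        bsadmin_dir=$2   # e.g. /home/bioscape/apps/bsadmin (absolute path)
        mkdir -p "$deps_dir" && cd "$deps_dir" || exit 1
        sh "$bsadmin_dir/docs/dependencies/download.sh" || exit 1
        sh "$bsadmin_dir/docs/dependencies/build.sh"
    )
}

# Usage:
#   build_dependencies /home/bioscape/deps /home/bioscape/apps/bsadmin
```

Stopping after a failed download keeps build.sh from running against an incomplete set of archives.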

PostgreSQL

It is necessary to initialise a "database cluster" for Bioscape. This is typically done using commands such as the following:

mkdir -p /home/bioscape/data
initdb -D /home/bioscape/data

Setting the PGDATA environment variable to the directory given in the above commands will save you the effort of specifying it later with other PostgreSQL-related commands. This variable is mentioned in the env.sh file referenced above.
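With PGDATA exported, initdb can be run without -D. The sketch below uses a hypothetical helper, init_cluster, that skips initialisation when the directory already contains a PG_VERSION file, which initdb writes into every cluster it creates. The usage line then starts the server with pg_ctl, which also reads PGDATA. The log file location is an arbitrary choice.

```shell
# Hypothetical helper: initialise the Bioscape database cluster once,
# relying on PGDATA instead of passing -D to initdb.
PGDATA=${PGDATA:-/home/bioscape/data}   # path from the example above
export PGDATA

init_cluster() {
    # initdb writes PG_VERSION into each cluster it creates, so its presence
    # means the cluster exists and must not be initialised again.
    if [ -f "$PGDATA/PG_VERSION" ]; then
        echo "cluster already initialised in $PGDATA"
        return 0
    fi
    mkdir -p "$PGDATA" && initdb   # initdb uses PGDATA when -D is omitted
}

# Usage (log file location chosen arbitrarily):
#   init_cluster && pg_ctl -l /home/bioscape/postgresql.log start
```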

In order to get improved performance from PostgreSQL, consider replacing the postgresql.conf file in the database cluster with the version found in the docs/database directory of the bsadmin distribution.
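The replacement can be done while keeping a copy of the original file, as in this sketch. It assumes the tuned file in docs/database is named postgresql.conf. install_tuned_conf is a hypothetical helper. The server reads postgresql.conf at start-up, and some settings (memory-related ones such as shared_buffers, for example) only take effect after a restart, hence the pg_ctl restart in the usage line. The tuned settings may also need checking against your PostgreSQL version.

```shell
# Hypothetical helper: install the tuned postgresql.conf into a cluster,
# backing up the original the first time only.
install_tuned_conf() {
    tuned=$1     # e.g. /home/bioscape/apps/bsadmin/docs/database/postgresql.conf
    data_dir=$2  # e.g. "$PGDATA"
    if [ -f "$data_dir/postgresql.conf" ] && [ ! -f "$data_dir/postgresql.conf.orig" ]; then
        cp "$data_dir/postgresql.conf" "$data_dir/postgresql.conf.orig" || return 1
    fi
    cp "$tuned" "$data_dir/postgresql.conf"
}

# Usage:
#   install_tuned_conf /home/bioscape/apps/bsadmin/docs/database/postgresql.conf "$PGDATA" \
#       && pg_ctl restart
```

Backing up only once means that running the helper again does not overwrite the original file with the tuned one.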

Configuration

Before use, the distribution must be configured according to the environment in which the software will operate. This is done most conveniently by running the configuration program in the bsadmin directory:

python scripts/bioscape_configure.py

The configuration program takes the bioscape.cfg.in template and produces a specific bioscape.cfg configuration file. An alternative approach is to copy bioscape.cfg.in to bioscape.cfg and to edit the file manually.
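The manual alternative can be guarded against overwriting an existing configuration, as sketched here. It assumes bioscape.cfg.in sits at the top of the bsadmin directory, where the configuration program is run. prepare_manual_config is a hypothetical name, and falling back to vi when EDITOR is unset is an arbitrary choice.

```shell
# Hypothetical helper: create bioscape.cfg from the bioscape.cfg.in template
# for manual editing, without overwriting an existing configuration.
# Assumption: the template is at the top of the given directory.
prepare_manual_config() {
    (
        cd "$1" || exit 1
        if [ -e bioscape.cfg ]; then
            echo "bioscape.cfg already exists; not overwriting" >&2
            exit 0
        fi
        cp bioscape.cfg.in bioscape.cfg
    )
}

# Usage:
#   prepare_manual_config /home/bioscape/apps/bsadmin &&
#       "${EDITOR:-vi}" /home/bioscape/apps/bsadmin/bioscape.cfg
```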

Once the bioscape.cfg file has been produced, it may be left in the bsadmin directory, or it can be copied or moved to your home directory; for example:

mv bioscape.cfg /home/bioscape

Database Configuration

The database support must also be configured, preferably using the database configuration program in the bsadmin distribution:

python scripts/bioscape_dbconfigure.py

The database configuration program can list each of the modules (or packages) requiring database support, and can prepare and apply the specific table and data definitions for them.

Quick Start

Use the quick start program, provided in the bsindex distribution, to initialise Bioscape as quickly as possible:

python scripts/bioscape_quickstart.py -t init

On a modern, multi-core system, it is recommended that updates be performed as follows:

python scripts/bioscape_quickstart.py -t update-parallel

Otherwise, use the plain update method:

python scripts/bioscape_quickstart.py -t update
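The choice between the two update targets can be automated by counting online processors, as in this sketch. getconf _NPROCESSORS_ONLN is widely available on Linux and macOS but is not guaranteed everywhere. Treating any count above one as multi-core is an assumption, since the text does not define the threshold. choose_update_target is a hypothetical helper. Run the usage line from the bsindex directory.

```shell
# Hypothetical helper: print the quick start update target to use.
# An optional argument overrides the detected processor count.
# Assumption: more than one online processor counts as "multi-core".
choose_update_target() {
    cpus=${1:-$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
    if [ "$cpus" -gt 1 ] 2>/dev/null; then
        echo update-parallel
    else
        echo update
    fi
}

# Usage, from the bsindex directory:
#   python scripts/bioscape_quickstart.py -t "$(choose_update_target)"
```

If the processor count cannot be determined, the helper falls back to the plain update target, which is the conservative choice.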