Difference between revisions of "Bioscape Installation"

From irefindex
= Installation =

Before installing, it is necessary to consider the dependencies listed in the Dependencies section below. Precise information about installing the dependencies is not provided in this document, and it is recommended that you make use of your system's package management tools, perhaps installing Bioscape itself from a suitable package, in order to save time and effort working through the installation process manually. However, for those interested in installing Bioscape from the source code distribution, the procedure is given below.

== Installation from Source Code ==

Bioscape can be installed as follows:

<pre>
  python setup.py install
</pre>

Note that you may need to be a privileged user to perform the above command, and it might be preferable to choose an alternative installation location if you do not have administrative or superuser rights. The following command provides an example of installing the software in another location:

<pre>
  python setup.py install --prefix=/home/user/software/usr
</pre>

You will need to change the location according to your own system's conventions and your own preferences. Once installed, you may also need to tell your system where to find the installed libraries and programs. This is usually done by modifying environment variables, and could be done for the above example by adding the following definitions to your environment configuration:

<pre>
  export PATH=${PATH}:/home/user/software/usr/bin
  export PYTHONPATH=${PYTHONPATH:+${PYTHONPATH}:}/home/user/software/usr/lib/python2.3/site-packages
</pre>

The <tt>${PYTHONPATH:+...}</tt> form only adds a separating colon when <tt>PYTHONPATH</tt> is already set. This avoids an empty entry, which Python may treat as the current directory.

Note that the exact details of the latter definition, particularly the Python version (2.3) and the library directory (<tt>lib</tt>), depend on your system.

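The Python version and library directory in the <tt>PYTHONPATH</tt> definition can be derived from the interpreter rather than typed by hand. The following is a minimal sketch that assumes the conventional Unix <tt>lib/pythonX.Y/site-packages</tt> layout under the installation prefix. The function name <tt>site_packages_dir</tt> is hypothetical, and systems using <tt>lib64</tt> or other layouts will need adjusting.

```shell
# Hypothetical helper: print the site-packages directory that a
# "python setup.py install --prefix=PREFIX" installation would normally use.
# Assumes the conventional Unix layout PREFIX/lib/pythonX.Y/site-packages;
# set PYTHON to choose the interpreter that will run Bioscape.
site_packages_dir() {
    "${PYTHON:-python}" -c 'import sys; print("%s/lib/python%d.%d/site-packages" % (sys.argv[1], sys.version_info[0], sys.version_info[1]))' "$1"
}

# Example: extend PYTHONPATH without leaving an empty (current directory) entry.
#   PYTHONPATH=${PYTHONPATH:+${PYTHONPATH}:}$(site_packages_dir /home/user/software/usr)
#   export PYTHONPATH
```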
= Dependency Configuration =

For some of the dependencies, even with pre-installed packages, you will need to do some preparatory work in order to use Bioscape. Some brief details of this work are given below.

== PostgreSQL ==

It is necessary to initialise a "database cluster" for Bioscape. This is typically done using commands such as the following:

<pre>
  mkdir -p /home/user/software/var/lib/pgsql
  initdb -D /home/user/software/var/lib/pgsql
</pre>

  
Setting the <tt>PGDATA</tt> environment variable to the cluster directory used in the above commands will save you the effort of specifying it later (for example with the <tt>-D</tt> option) in other PostgreSQL-related commands.

In order to get improved performance from PostgreSQL, consider replacing the <tt>postgresql.conf</tt> file in the database cluster with the version found in the <tt>docs/database</tt> directory.
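The two suggestions above can be combined in a few lines of shell. This is only a sketch. It assumes the cluster directory from the <tt>initdb</tt> example and that it is run from the top of the distribution, so that <tt>docs/database</tt> is a relative path. The function name <tt>replace_pg_conf</tt> is hypothetical. The original configuration is kept as <tt>postgresql.conf.orig</tt> so that the change can be undone.

```shell
# Use the cluster created above; PostgreSQL tools such as initdb and
# pg_ctl read PGDATA when no -D option is given.
export PGDATA=/home/user/software/var/lib/pgsql

# Hypothetical helper: install a tuned postgresql.conf into the cluster,
# keeping the first original copy as postgresql.conf.orig.
replace_pg_conf() {
    conf_src=${1:-docs/database/postgresql.conf}
    if [ ! -f "$conf_src" ]; then
        echo "replace_pg_conf: $conf_src not found" >&2
        return 1
    fi
    if [ ! -f "$PGDATA/postgresql.conf.orig" ]; then
        cp "$PGDATA/postgresql.conf" "$PGDATA/postgresql.conf.orig" || return 1
    fi
    cp "$conf_src" "$PGDATA/postgresql.conf"
}

# Typical use (the server must be restarted to pick up the new settings):
#   replace_pg_conf && pg_ctl restart
```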
 
  
= Configuration =
 
 
Before use, the distribution must be configured according to the environment in which the software will operate. This is done most conveniently by running the configuration program:

<pre>
  python bioscape_configure.py
</pre>
 
 
 
The configuration program takes the <tt>bioscape.cfg.in</tt> template and produces a specific <tt>bioscape.cfg</tt> configuration file. An alternative approach is to copy <tt>bioscape.cfg.in</tt> to <tt>bioscape.cfg</tt> and to edit the file manually.
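For the manual approach, a cautious option is to copy the template only when no <tt>bioscape.cfg</tt> exists yet, so that an already-edited configuration is not overwritten. This is a minimal sketch, and the function name <tt>init_bioscape_cfg</tt> is hypothetical.

```shell
# Hypothetical helper: create bioscape.cfg from the bioscape.cfg.in template
# in the given directory (default: current), never overwriting an existing file.
init_bioscape_cfg() {
    cfg_dir=${1:-.}
    if [ -e "$cfg_dir/bioscape.cfg" ]; then
        echo "$cfg_dir/bioscape.cfg already exists; leaving it unchanged" >&2
        return 0
    fi
    cp "$cfg_dir/bioscape.cfg.in" "$cfg_dir/bioscape.cfg"
}

# Then edit the copy by hand, e.g.: ${EDITOR:-vi} bioscape.cfg
```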
 
 
 
Once the <tt>bioscape.cfg</tt> file has been produced, it may be left in a "working directory" where all Bioscape-related tasks will be performed, or it can be copied or moved to your home directory; for example:

<pre>
  mv bioscape.cfg /home/user
</pre>
 
 
 
See below for advice on setting database parameters in the configuration.
 
 
 
== Useful Configuration Value Groups ==

The following groups of settings and values may be of use when choosing particular configurations of the software. The group below selects PostgreSQL as the database system, with JDBC access to a database named <tt>bioscape</tt> on the local host:

{| border="1" cellpadding="5" cellspacing="0"
! Setting !! Value
|-
| database_system || pgsql
|-
| jdbc_database_url || jdbc:postgresql://localhost/bioscape
|-
| jdbc_driver_class || org.postgresql.Driver
|}
 
 
 
= Database Configuration =

In order to use certain modules (or packages) within the distribution, the database support must be configured, preferably using the database configuration program:

<pre>
  python bioscape_dbconfigure.py
</pre>

Using the database configuration program, each of the modules (or packages) requiring database support can be listed, and its table and data definitions can be prepared and applied.
 
 
 
= Quick Start =

Use the quick start program in order to initialise Bioscape as quickly as possible:

<pre>
  bioscape_quickstart.py -t quickstart
</pre>

Or, from the distribution directory:

<pre>
  python scripts/bioscape_quickstart.py -t quickstart
</pre>

The program has a range of "targets" that can be specified. Running the program without any arguments (rather than with <tt>-t quickstart</tt> as above) will list some of these targets.
 
 
 
= Dependencies =

Bioscape has the following basic dependencies:

{| border="1" cellspacing="0" cellpadding="5"
! Package !! Release Information !! Purpose !! Notes
|-
| [http://www.python.org/ Python] || Tested with 2.3.6, 2.4.4 || Runs most of the software
| rowspan="2" | Note that Python releases in the 2.3 series earlier than 2.3.5 have threading issues which are exposed by PyLucene, causing deadlock situations. Additional compatibility issues with gcj apply to PyLucene, and it is recommended that PyLucene be compiled with gcj 3.4.6, potentially together with a suitable version of Python (such as 2.3.5 or 2.4.4 or later).
|-
| [http://pylucene.osafoundation.org/ PyLucene] || Tested with 2.0.0, 2.1.0-2 || Indexes textual documents
|-
| [http://www.boddie.org.uk/david/Projects/Python/CMDSyntax/ CMDsyntax] || 0.91 || Processes command line options
|-
| [http://www.boddie.org.uk/python/XSLTools.html XSLTools] || 0.6 || Produces the Web interface
|-
| [http://www.boddie.org.uk/python/WebStack.html WebStack] || 1.3 || Produces the Web interface
|-
| [http://www.boddie.org.uk/python/libxml2dom.html libxml2dom] || 0.4.6 || Required by XSLTools
|-
| [http://www.xmlsoft.org/XSLT.html libxslt] || Tested with 1.1.20 || Required by XSLTools
|-
| [http://www.xmlsoft.org/ libxml2] || Tested with 2.6.27 || Required by libxml2dom
|-
| [http://www.postgresql.org/ PostgreSQL] || Tested with 8.1.9 || Storage of information
| rowspan="3" | Currently PostgreSQL is the only supported database system
|-
| [http://pypgsql.sourceforge.net/ pyPgSQL] || Tested with 2.5.1 || Database access
|-
| [http://www.egenix.com/products/python/mxBase/ egenix-mx-base] || Tested with 3.0.0 || Required by pyPgSQL
|-
! colspan="4" | Optional: to collect words from WordNet, the following dependencies apply:
|-
! Package !! Release Information !! Purpose !! Notes
|-
| [http://wordnet.princeton.edu/ WordNet] || 3.0 || Provides the WordNet database
|-
| [http://pywordnet.sourceforge.net/ pywordnet] || 2.0.1 || A Python interface to WordNet
|-
! colspan="4" | Alternative: to use Bioscape with LingPipe, the following dependencies apply:
|-
! Package !! Release Information !! Purpose !! Notes
|-
| [http://www.jython.org/ Jython] || Tested with 2.2a1 || Used to run LingPipe-related software
|-
| [http://www.alias-i.com/lingpipe/ LingPipe] || Tested with 2.3.0 || Sentence splitting in textual documents
|-
| [http://lucene.apache.org/java/docs/index.html Lucene] || Tested with 2.0.0 || Indexes textual documents
|-
| [http://jdbc.postgresql.org/ PostgreSQL JDBC Driver] || Tested with 8.1-407 JDBC 3 || Database access (if PostgreSQL is used) || Required by Jython
|-
! colspan="4" | Optional: the following dependencies are related to improving the software:
|-
| [http://epydoc.sourceforge.net/ Epydoc] || Tested with 3.0a3 || API document generation
|}
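Once the Python-based dependencies are installed, you can confirm that the interpreter can see them (for example, after setting <tt>PYTHONPATH</tt>) by trying to import each one. The sketch below does this, and the function name <tt>check_modules</tt> is hypothetical. The module names in the usage line are assumptions about how each package is imported. Examples are <tt>PyLucene</tt> for the gcj-based PyLucene 2.x releases and <tt>XSLForms</tt> for XSLTools. Check them against each package's documentation.

```shell
# Hypothetical helper: try to import each named Python module and report
# "ok" or "missing" for it; returns non-zero if any import fails.
check_modules() {
    status=0
    for module in "$@"; do
        if "${PYTHON:-python}" -c "import $module" >/dev/null 2>&1; then
            printf '%-8s%s\n' ok "$module"
        else
            printf '%-8s%s\n' missing "$module"
            status=1
        fi
    done
    return $status
}

# Example (module names are assumptions; adjust to your installed packages):
#   check_modules PyLucene cmdsyntax XSLForms WebStack libxml2dom \
#                 libxml2 libxslt pyPgSQL mx.DateTime
```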
 
 
 
= Bundled Resources =

The following resources are currently bundled with the software:

{| border="1" cellspacing="0" cellpadding="5"
! File !! Source and preparation
|-
| english.words || ftp://ftp.cs.cornell.edu/pub/smart/
|-
| abbreviations.txt
| A combination of the following, plus additional terms, with fragments incorporated in the list, in place of the full abbreviations, where appropriate:

* http://en.wikipedia.org/wiki/List_of_medical_abbreviations
* http://web.cn.edu/kwheeler/latin.html
* http://www.daube.ch/docu/glossary/latin_abbrev.html
|-
| official.txt
| A combination of files from the downloadable archive found at the following location:

http://www.dcs.shef.ac.uk/research/ilash/Moby/mwords.html

The following files from the archive were concatenated and sorted, with duplicate and multiple-word entries removed:

<pre>113809of.fic 4160offi.cia</pre>

The following command was used to prepare the file:

<pre>cat 113809of.fic 4160offi.cia | sort | uniq > official.txt</pre>

According to a notice at the following location, the Moby lexicon project has been placed in the public domain:

http://www.dcs.shef.ac.uk/research/ilash/Moby/
|-
| wordnet.txt
| A list of distinct nouns, verbs, adjectives and adverbs from the WordNet 3.0 database, prepared using the <tt>bioscape_get_wordnet.py</tt> script. See the <tt>docs/licences/LICENSE-WordNet</tt> file for copyright and licensing information.
|-
| common_english.txt
| A dictionary of common English word tokens, processed from the common_english file (taking the stripped text after the <tt>.</tt> field separator), with the original file retrieved from the following location:

http://pir.georgetown.edu/pirwww/iprolink/protname.shtml
|-
| adjectives.txt
| Animal adjectives. See the permissive licensing details in the <tt>docs/licences/adjectives.txt</tt> file for more information.
|}
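The preparation of two of these lists can be sketched in shell. Note that the recorded command for <tt>official.txt</tt> removes duplicates but does not itself appear to drop multiple-word entries. The sketch below adds that step by assuming that multiple-word entries contain a space. It also strips carriage returns in case the archive files use DOS line endings. For <tt>common_english.txt</tt>, the sketch assumes that each relevant line of the original file has the word after its first <tt>.</tt> separator. Both function names are hypothetical, and neither is the exact procedure used to produce the bundled files.

```shell
# Hypothetical helper: concatenate Moby word files, drop multiple-word entries
# (assumed to contain a space), then sort and remove duplicates.
prepare_official() {
    cat "$@" | tr -d '\r' | grep -v ' ' | sort -u
}

# Hypothetical helper: take the text after the first "." on each line
# (lines without a "." are skipped by cut -s), strip leading and trailing
# whitespace, and drop lines that end up empty. Input order is preserved.
prepare_common_english() {
    cut -s -d. -f2- "$1" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | grep -v '^$'
}

# Usage, from the directory holding the source files:
#   prepare_official 113809of.fic 4160offi.cia > official.txt
#   prepare_common_english common_english > common_english.txt
```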
 
 
 
= Additional Resources =

; Entrez Gene : http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
; Entrez Taxonomy : http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy
; NCBI PubMed : http://www.ncbi.nlm.nih.gov/sites/entrez?db=PubMed

Latest revision as of 13:36, 14 July 2010

Note: This documentation covers an unreleased product and is for internal use only.

Installation

Bioscape consists of three separate applications which must be combined to provide all the facilities of a functional Bioscape installation:

  • The administrative application: bsadmin
  • The text-indexing application: bsindex
  • The Web front-end application: bsweb

Before installing, consider the dependencies described under Installation of Dependencies below. Detailed instructions for installing the dependencies are not provided in this document. It is recommended that you use your system's package management tools, perhaps installing Bioscape itself from suitable packages, to save the time and effort of working through the installation process manually. For those who prefer to install Bioscape from the source code distribution of each application, the procedure is given below.

Installation from Source Code

First, nominate a common directory to hold the Bioscape application directories. For example:

/home/bioscape/apps

Then, acquire each application's source code distribution (details to be provided) and unpack the archives in this common directory:

cd /home/bioscape/apps
tar zxf bsadmin-x.y.tar.gz
tar zxf bsindex-x.y.tar.gz
tar zxf bsweb-x.y.tar.gz

Since these applications contain Python libraries, the environment must be configured so that Python can find them. This may be done by creating a short configuration file resembling docs/configuration/env.sh in the bsadmin distribution and then incorporating it into your environment from a .bashrc or equivalent file. For example, to use the supplied file directly:

source /home/bioscape/apps/bsadmin/docs/configuration/env.sh
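The contents of env.sh are not reproduced here. As a rough guide, a minimal equivalent might add each application directory to PYTHONPATH, as in the following sketch. It assumes that each archive unpacks into a directory named after its application, as the /home/bioscape/apps/bsadmin path above suggests. It also assumes that the Python libraries sit at the top of each directory. The variable BIOSCAPE_APPS is hypothetical, so compare the result with the file supplied in bsadmin.

```shell
# Hypothetical env.sh equivalent; the file shipped with bsadmin may differ.
# Assumes the archives were unpacked into bsadmin, bsindex and bsweb
# directories under a common directory, with Python libraries at their tops.
BIOSCAPE_APPS=${BIOSCAPE_APPS:-/home/bioscape/apps}

for app in bsadmin bsindex bsweb; do
    # Append each application directory, avoiding an empty leading entry
    # (which Python may treat as the current directory).
    PYTHONPATH=${PYTHONPATH:+${PYTHONPATH}:}${BIOSCAPE_APPS}/${app}
done
export PYTHONPATH
```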

Installation of Dependencies

See the Bioscape Dependencies page for a list of the dependencies.

The docs/dependencies/download.sh file in the bsadmin distribution contains commands that should download the source distributions of various dependencies. Running this file, or a modified version of it, in a nominated directory should leave copies of the dependencies' archive files in that directory.
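The download step amounts to fetching a list of archive URLs into one directory. Since download.sh itself is not reproduced here, the following is only an illustrative sketch. It assumes a hypothetical text file listing one URL per line and uses wget. It skips archives that are already present, so it can be re-run safely. The function name fetch_archives is hypothetical.

```shell
# Hypothetical helper: download each URL listed (one per line) in a file
# into a nominated directory, skipping blank lines, "#" comments and
# archives that have already been downloaded.
fetch_archives() {
    url_list=$1
    dest_dir=$2
    mkdir -p "$dest_dir" || return 1
    while IFS= read -r url; do
        case $url in
            ''|'#'*) continue ;;
        esac
        archive=$dest_dir/${url##*/}   # file name = last component of the URL
        [ -e "$archive" ] && continue
        if ! wget -O "$archive" "$url"; then
            rm -f "$archive"           # do not leave a partial download behind
            return 1
        fi
    done < "$url_list"
}

# Usage (both paths are examples):
#   fetch_archives dependency-urls.txt /home/bioscape/deps
```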

The docs/dependencies/build.sh file in the bsadmin distribution contains commands that can be run to build each of the dependencies from the previously downloaded archive files.
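For the Python packages among the dependencies, building from an archive usually means unpacking it and running the standard distutils command "setup.py install" with a chosen prefix. The following sketch shows that pattern for a single archive. It is not the contents of build.sh, and it assumes that NAME.tar.gz unpacks into a directory called NAME. The function name build_python_archive is hypothetical. Dependencies such as PostgreSQL or libxml2 use their own build systems and are not covered.

```shell
# Hypothetical helper: unpack a setup.py-based source archive into a working
# directory and install it under the given prefix.
# Assumes NAME.tar.gz unpacks into a directory called NAME.
build_python_archive() {
    archive=$1
    prefix=$2
    work_dir=${3:-.}
    name=$(basename "$archive" .tar.gz)
    tar zxf "$archive" -C "$work_dir" || return 1
    # Run setup.py in a subshell so the caller's working directory is unchanged.
    ( cd "$work_dir/$name" && "${PYTHON:-python}" setup.py install --prefix="$prefix" )
}

# Usage (NAME and the paths are placeholders):
#   build_python_archive /home/bioscape/deps/NAME.tar.gz /home/bioscape/software /tmp/build
```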

For some of the dependencies, even with pre-installed packages, you will need to do some preparatory work in order to use Bioscape. This is documented on the Bioscape Configuration page.