Difference between revisions of "Bioscape Distribution"

From irefindex
m (Added category.)
(Revised distribution details.)
Line 7: Line 7:
 
== Distribution Structure ==
 
  
The distribution consists of the following elements:
+
The whole Bioscape system is divided into three distinct distributions:
  
=== Software ===
+
* The administrative application: <tt>bsadmin</tt>
 +
* The text-indexing application: <tt>bsindex</tt>
 +
* The Web front-end application: <tt>bsweb</tt>
  
A Python package called <tt>bioscape</tt> which consists of a number of subpackages. Each subpackage contains modules and directories containing resources.
+
Each distribution contains a Python package along with scripts and tools.  
  
{| border="1" cellspacing="0" cellpadding="5"
+
* <tt>bsadmin</tt> provides a package called <tt>bioscape</tt>, which offers basic services to the different applications, focusing on data management.
! Subpackage !! Purpose !! Notes
+
* <tt>bsindex</tt> provides a package called <tt>bsindex</tt>, which supports text indexing and searching.
|-
+
* <tt>bsweb</tt> provides a package called <tt>bsweb</tt>, which contains the Web interface code and resources.
| <tt>bioscape.modules.chebi</tt>
 
| chemical/molecule data aggregation
 
| rowspan="5" | These subpackages are related to activities and can be considered as functional modules.
 
|-
 
| <tt>bioscape.modules.gene</tt>
 
| gene data aggregation
 
|-
 
| <tt>bioscape.modules.pubmed</tt>
 
| PubMed abstract aggregation
 
|-
 
| <tt>bioscape.modules.taxonomy</tt>
 
| taxonomy data aggregation
 
|-
 
| <tt>bioscape.modules.text</tt>
 
| text mining
 
|-
 
| <tt>bioscape.utils.database</tt>
 
| utilities for accessing databases
 
| rowspan="5" | These subpackages are focused on functionality which may be employed in a number of activities and contain modules for particular groups of functions.
 
|-
 
| <tt>bioscape.utils.files</tt>
 
| utilities for managing filesystem resources
 
|-
 
| <tt>bioscape.utils.ftp</tt>
 
| utilities for accessing FTP resources
 
|-
 
| <tt>bioscape.utils.index</tt>
 
| utilities for manipulating text indexes
 
|-
 
| <tt>bioscape.utils.templates</tt>
 
| templating utilities
 
|-
 
| <tt>bioscape.config</tt>
 
| configuration management
 
| rowspan="2" | Some modules exist to provide global services to the software
 
|-
 
| <tt>bioscape.constants</tt>
 
| constants employed throughout Bioscape
 
|}
 
  
=== Configuration ===
+
=== Data Sources ===
  
A properties file (<tt>bioscape.cfg</tt>) located in the top-level directory configures the system and is accessed via the <tt>bioscape.config</tt> subpackage mentioned above.
+
The notion of a data source exists in more than one of these distributions. Each data source is supported by a subpackage within a <tt>sources</tt> package hierarchy (such as <tt>bioscape.sources</tt>), which provides the means by which a particular kind of data is acquired, processed and imported into the system. In <tt>bsadmin</tt>, such source subpackages are typically concerned only with making data available to the database; in <tt>bsindex</tt>, they also provide access to textual information, which is indexed before data is presented to the database.
  
== Adding New Modules ==
+
=== Schemas and Templates ===
  
To add a new activity module to the distribution and to integrate it into the various mechanisms, follow these steps.
+
The <tt>bsadmin</tt> application is responsible for managing the database used by Bioscape. In the <tt>bioscape/sql</tt> directory, a number of subdirectories can be found, each of which provides support for initialising portions of the database in association with a particular activity. The <tt>bioscape_dbconfigure.py</tt> script (found in the <tt>scripts</tt> directory of the <tt>bsadmin</tt> distribution) is able to execute the template files found within each of these subdirectories and will accept parameters for substitution into the templates.
  
<ol>
+
=== Configuration ===
<li>The new module should be inserted as a new directory under <tt>bioscape/modules</tt>.</li>
 
<li>If the new module is written in Python and is to be usable as a genuine component in Bioscape, there must be an <tt>__init__.py</tt> file in the directory.</li>
 
<li>Any database initialisation or finalisation templates should be placed in an <tt>sql</tt> subdirectory of the new module directory.
 
<ol>
 
<li>Initialisation templates should have names of the form <tt>activity-dbsystem.sql.in</tt>. For example:
 
<pre>geneparse-pgsql.sql.in</pre>
 
</li>
 
<li>Finalisation templates should have names of the form <tt>drop-activity-dbsystem.sql.in</tt> (since they typically drop resources from the database). For example:
 
<pre>drop-geneparse-pgsql.sql.in</pre>
 
</li>
 
<li>In template names, <tt>activity</tt> is the name of the activity for which the template defines database resources, and <tt>dbsystem</tt> is the name of a database system chosen from the list of acceptable values in the <tt>bioscape.cfg.in</tt> file.</li>
 
<li>Typically, activity names should be the same as the Python module or class (in other programming languages) which requires or populates the described database resources. For example...
 
<pre>
 
geneparse-pgsql.sql.in
 
drop-geneparse-pgsql.sql.in</pre>
 
...both describe operations on resources which are populated by the <tt>geneparse</tt> Python module (<tt>bioscape/modules/gene/geneparse.py</tt>).</li>
 
<li>Since the initialisation of the database must usually be performed by applying the templates in a specific order, a <tt>dependencies.txt</tt> file must be added to the <tt>sql</tt> directory, listing the template activities in the order of initialisation. For example:
 
<pre>
 
bionames
 
index
 
search</pre>
 
This indicates that <tt>bionames</tt> should be applied first and <tt>search</tt> last in any initialisation of the database, whereas <tt>search</tt> should be applied first and <tt>bionames</tt> last in any finalisation of the database.</li>
 
<li>Any maintenance-related templates must be documented in a <tt>tasks.txt</tt> file in the <tt>sql</tt> directory containing those templates. For example:
 
<pre>bionames</pre>
 
This indicates that <tt>bionames</tt> supports the backup and restore maintenance tasks.</li>
 
</ol>
 
<li>Any special configuration settings for the module should be added to the <tt>bioscape.cfg.in</tt> template. Before the module is used, the <tt>bioscape.cfg</tt> file specific to any particular installation of the software must then be prepared again from this template.</li>
 
<li>To ensure that the module is installed, it should be added to the <tt>setup.py</tt> file in the packages list. For example:
 
<pre>
 
packages=[
 
    ...
 
    "bioscape.modules.newmodule",
 
    # Add new modules here.
 
    ]</pre>
 
If any resource files (such as database descriptions) are provided within the module, an entry should be made in the <tt>data_files</tt> list. For example:
 
<pre>
 
data_files=[
 
    ...
 
    data_dir("bioscape.modules.newmodule", "sql", ["*.in", "*.txt"]),
 
    # Add new module resources here.
 
    ]</pre>
 
The result of these modifications should be the successful installation of the new module in any installation of Bioscape.</li>
 
<li>The module should be mentioned in this file and in other parts of the documentation in examples or lists of modules.</li>
 
</ol>
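
The naming and ordering rules described in the steps above can be sketched as follows. This is a minimal illustration rather than the code used by Bioscape: the function names <tt>read_activities</tt>, <tt>initialisation_templates</tt> and <tt>finalisation_templates</tt> are hypothetical, and only the <tt>activity-dbsystem.sql.in</tt> naming scheme and the reversed order for finalisation are taken from the text.

```python
def read_activities(text):
    """Return the activity names listed in dependencies.txt-style text,
    one per line, ignoring blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def initialisation_templates(activities, dbsystem):
    """Template names in the order they would be applied when initialising."""
    return ["%s-%s.sql.in" % (activity, dbsystem) for activity in activities]


def finalisation_templates(activities, dbsystem):
    """Template names for finalisation: the reverse of the initialisation
    order, with the 'drop-' prefix."""
    return ["drop-%s-%s.sql.in" % (activity, dbsystem)
            for activity in reversed(activities)]


# Hypothetical usage with the example list from the text.
activities = read_activities("bionames\nindex\nsearch\n")
init_names = initialisation_templates(activities, "pgsql")
drop_names = finalisation_templates(activities, "pgsql")
```

With the example list, <tt>bionames</tt> is applied first on initialisation and last on finalisation.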
 
 
 
== Generating API Documentation ==
 
  
The <tt>tools</tt> directory contains a program which can be run to generate API documentation and to put such documentation in a special <tt>apidocs</tt> directory at the root of the distribution:
+
A properties file (<tt>bioscape.cfg</tt>) located in the top-level directory configures the system and is accessed via the <tt>bioscape.config</tt> subpackage found in the <tt>bsadmin</tt> distribution. This subpackage is also exposed as <tt>bsindex.config</tt> and <tt>bsweb.config</tt> for convenience within the other applications.
  
<pre>
+
=== Constants ===
  python tools/apidocs.py
 
</pre>
 
  
The generated documentation is principally useful as a reference to the API, rather than as a resource illustrating the architecture of the system or as a guide to writing new components.
+
A module containing constant value definitions is accessible as <tt>bioscape.constants</tt> and defined in the <tt>bsadmin</tt> distribution. This module is also exposed as <tt>bsindex.constants</tt> and <tt>bsweb.constants</tt> for convenience within the other applications. These values are made available to database templates and in other parts of the system, and are generally employed in data presented to the database.
  
 
[[Category:Bioscape]]
 

Revision as of 17:04, 21 July 2009


Please note that this documentation covers an unreleased product and is for internal use only.


Distribution Structure

The whole Bioscape system is divided into three distinct distributions:

  • The administrative application: bsadmin
  • The text-indexing application: bsindex
  • The Web front-end application: bsweb

Each distribution contains a Python package along with scripts and tools.

  • bsadmin provides a package called bioscape, which offers basic services to the different applications, focusing on data management.
  • bsindex provides a package called bsindex, which supports text indexing and searching.
  • bsweb provides a package called bsweb, which contains the Web interface code and resources.

Data Sources

The notion of a data source exists in more than one of these distributions. Each data source is supported by a subpackage within a sources package hierarchy (such as bioscape.sources), which provides the means by which a particular kind of data is acquired, processed and imported into the system. In bsadmin, such source subpackages are typically concerned only with making data available to the database; in bsindex, they also provide access to textual information, which is indexed before data is presented to the database.

Schemas and Templates

The bsadmin application is responsible for managing the database used by Bioscape. In the bioscape/sql directory, a number of subdirectories can be found, each of which provides support for initialising portions of the database in association with a particular activity. The bioscape_dbconfigure.py script (found in the scripts directory of the bsadmin distribution) is able to execute the template files found within each of these subdirectories and will accept parameters for substitution into the templates.
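
As a rough illustration of parameter substitution into a template, the sketch below uses Python's standard string.Template class. The placeholder syntax actually used by Bioscape templates is not described here, so the ${name} form, the render_template function and the example SQL are all assumptions made for illustration.

```python
from string import Template


def render_template(template_text, parameters):
    """Substitute parameters into template text.

    Assumes ${name}-style placeholders (a hypothetical choice); a missing
    parameter raises KeyError rather than leaving the placeholder in place.
    """
    return Template(template_text).substitute(parameters)


# Hypothetical template text and parameter name.
example = "CREATE TABLE ${prefix}_genes (id INTEGER);"
sql = render_template(example, {"prefix": "bioscape"})
```

Failing on a missing parameter, rather than silently emitting incomplete SQL, is a deliberate choice in this sketch.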

Configuration

A properties file (bioscape.cfg) located in the top-level directory configures the system and is accessed via the bioscape.config subpackage found in the bsadmin distribution. This subpackage is also exposed as bsindex.config and bsweb.config for convenience within the other applications.
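
The format of bioscape.cfg and the interface of bioscape.config are not shown here. Purely as an illustration of what reading a properties file can involve, the sketch below assumes simple "key = value" lines with "#" comments. The parse_properties function and the setting names in the example are hypothetical.

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dictionary.

    Blank lines and lines starting with '#' are skipped; lines without
    an '=' separator are ignored. This format is an assumption.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical settings; the real keys in bioscape.cfg may differ.
settings = parse_properties("# database\ndbsystem = pgsql\n\nname=bioscape\n")
```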

Constants

A module containing constant value definitions is accessible as bioscape.constants and defined in the bsadmin distribution. This module is also exposed as bsindex.constants and bsweb.constants for convenience within the other applications. These values are made available to database templates and in other parts of the system, and are generally employed in data presented to the database.