Difference between revisions of "Bioscape Development"

From irefindex

Revision as of 17:33, 4 March 2009


Please note that this documentation covers an unreleased product and is for internal use only.


Bioscape Development

This document describes a selection of development tasks undertaken when improving Bioscape.

Adding New Scoring Methods

The following steps should be sufficient to define and make available a new scoring method.

  1. Add a new entry to the data/text/resources/methods.txt file.
  2. Add new templates for scoring to the bioscape/modules/text/sql directory. For example...
    importdb-score-N-pgsql.sql.in
    ...where N is the method name and pgsql refers to a database system (PostgreSQL, according to the bioscape.cfg file conventions).
  3. Create a new record in the text_method table. In PostgreSQL this can be done using a COPY command together with a file containing new lines from the methods.txt file.
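
As a rough illustration of steps 2 and 3, the sketch below builds the template filename for a scoring method and selects the lines of methods.txt that are not yet known, so that they can be written to a file for a COPY command. Both function names are hypothetical, and the sketch assumes that each method occupies exactly one line; the actual layout of methods.txt is not described in this document.

```python
def scoring_template_name(method, dbsystem="pgsql"):
    """Return the template filename for a scoring method.

    Follows the importdb-score-N-pgsql.sql.in pattern given in step 2.
    """
    return "importdb-score-%s-%s.sql.in" % (method, dbsystem)


def new_method_lines(all_lines, known_lines):
    """Return the lines of methods.txt that are absent from known_lines.

    Assumption: one method per line. Blank lines are skipped and the
    original order is preserved.
    """
    known = set(line.rstrip("\n") for line in known_lines)
    return [line.rstrip("\n") for line in all_lines
            if line.strip() and line.rstrip("\n") not in known]
```

The lines returned by new_method_lines could be written to a temporary file and loaded into the text_method table with COPY.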

Adding New Search Result Types

Defining new kinds of search results involves a number of modifications:

  1. Add a constant in bioscape.constants for the new search result type if appropriate. For example...
    text_context_gene_ontology_term = 7
  2. Add infrastructure to acquire and to emit the results, such as classes in modules within the bioscape.modules.text.finders package. Since Lucene is typically used, such classes will be in the bioscape.modules.text.finders.lucene module and will include...
    • A locator class
    • A finder class
    • An iterator class
  3. Add convenience functions to the bioscape.modules.text.finders package, or modify existing functions such as get_context_term_finder.
  4. Add any reader or writer classes to the appropriate modules, such as the bioscape.modules.text.finders.files module, which contains classes that consume inputs and produce result output.
  5. Modify the bioscape_search_text.py and bioscape_search_cache.py scripts to include options and invocations for the new search type.
  6. Add database templates for the result data. For example...
      acronyms-pgsql.sql.in
      drop-acronyms-pgsql.sql.in
      acronyms-constraints-pgsql.sql.in
      drop-acronyms-constraints-pgsql.sql.in
      acronyms-partition-pgsql.sql.in
      drop-acronyms-partition-pgsql.sql.in
      acronyms-partition-constraints-pgsql.sql.in
      drop-acronyms-partition-constraints-pgsql.sql.in
      import-acronyms-pgsql.sql.in
    Also add entries for these templates to the dependencies.txt file.
  7. Add functions in the bioscape.modules.text.bulk module which employ the above import template.
  8. Modify the bioscape_import_text.py script to include options and invocations for the new search type.
  9. Add functions to the scripts/bioscape_quickstart.py script to support the new search type.
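
The template names in step 6 follow a regular pattern, so a small helper can list the expected files for a new result type and report which are still missing. This is only a sketch: the function names are hypothetical, and it assumes that a new result type needs the same nine templates as the acronyms example, which may not hold for every type.

```python
# Filename patterns taken from the acronyms example in step 6.
TEMPLATE_PATTERNS = [
    "%(name)s-%(db)s.sql.in",
    "drop-%(name)s-%(db)s.sql.in",
    "%(name)s-constraints-%(db)s.sql.in",
    "drop-%(name)s-constraints-%(db)s.sql.in",
    "%(name)s-partition-%(db)s.sql.in",
    "drop-%(name)s-partition-%(db)s.sql.in",
    "%(name)s-partition-constraints-%(db)s.sql.in",
    "drop-%(name)s-partition-constraints-%(db)s.sql.in",
    "import-%(name)s-%(db)s.sql.in",
]


def result_template_names(name, db="pgsql"):
    """Return the expected template filenames for a result type."""
    return [pattern % {"name": name, "db": db} for pattern in TEMPLATE_PATTERNS]


def missing_templates(name, existing, db="pgsql"):
    """Return the expected templates that do not appear in existing."""
    existing = set(existing)
    return [t for t in result_template_names(name, db) if t not in existing]
```

Passing the contents of the bioscape/modules/text/sql directory (for example, from os.listdir) as the existing argument gives a quick check before the import scripts are modified.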

Adding New Data Sources and Types

Defining new kinds of data types involves a number of modifications:

  1. Add a new module (see "Adding New Modules" above). For example...
    bioscape.modules.chebi
  2. Define modules which retrieve data from sources, such as a module which uses FTP to download files and place them in a special downloads directory. For example...
    bioscape.modules.chebi.chebiftp
  3. Define modules which parse the downloaded data, if necessary, producing import data files. For example...
    bioscape.modules.chebi.chebiparse
  4. Add templates to implement the database schema for the data type, along with templates which support the import and update of such data. For example...
      chebi-pgsql.sql.in
      drop-chebi-pgsql.sql.in
      chebi-constraints-pgsql.sql.in
      drop-chebi-constraints-pgsql.sql.in
      import-chebi-pgsql.sql.in
      update-chebi-pgsql.sql.in
  5. Add a bulk import module for the data type. For example...
    bioscape.modules.chebi.bulk
  6. Define configuration settings for the locations and details used in the above modules. For example...
      chebi_ftp_address
      chebi_data_directory
  7. Add scripts which download and process data and import such data. For example...
      scripts/bioscape_get_chebi.py
      scripts/bioscape_import_chebi.py
  8. Add functions to the scripts/bioscape_quickstart.py script to support the new data type.
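
A retrieval module of the kind described in step 2 might be organised along the following lines. This is a minimal sketch in Python 3, not the contents of bioscape.modules.chebi.chebiftp: the function name download_files and its arguments are assumptions, and the caller is expected to supply an open connection that behaves like a logged-in ftplib.FTP instance.

```python
import os


def download_files(ftp, filenames, downloads_dir):
    """Fetch each named file over an open FTP connection.

    ftp is assumed to behave like a logged-in ftplib.FTP instance.
    Files are stored under downloads_dir using their base names, and
    the list of local paths written is returned.
    """
    os.makedirs(downloads_dir, exist_ok=True)
    paths = []
    for filename in filenames:
        path = os.path.join(downloads_dir, os.path.basename(filename))
        with open(path, "wb") as f:
            # retrbinary passes each received block to the callback.
            ftp.retrbinary("RETR " + filename, f.write)
        paths.append(path)
    return paths
```

In a script such as scripts/bioscape_get_chebi.py, the connection would presumably be created from the chebi_ftp_address setting and the files placed beneath chebi_data_directory, with the parsing module then reading from that directory.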

Adding New Word Lists for Searching

New lists of words which shall be searched as part of finding contextual information can be added as follows:

  1. Define a list of words. This list may be used directly by the finder implementations, or it may be imported into the database and combined with other information.
  2. Where the list is imported into the database, write database templates to define the tables involved, along with a template to import the data. For example...
      adjectives-pgsql.sql.in
      drop-adjectives-pgsql.sql.in
      import-adjectives-pgsql.sql.in
  3. If necessary, add a bulk import function for the list. For simple lists, this step is not necessary.
  4. Where a database table is involved, potentially in combination with other tables, a database reader class must be defined in the bioscape.modules.text.finders.database module along with a function returning instances of such a class.
  5. Update the following scripts to include the new data source:
      scripts/bioscape_search_cache.py
      scripts/bioscape_search_text.py (potentially only messages and comments)
  6. A constant indicating the type of contextual information must be added to the bioscape.constants module and the translations.xml file provided for the Web application.
  7. A finder class is needed to actually search for the data provided by the new source. In the bioscape.modules.text.finders.lucene module, such a class should be defined, using suitable mix-in classes, employing the newly defined constant for the data source.
  8. The bioscape.modules.text.finders module needs updating to identify the appropriate finder class when a context type is supplied to the get_context_term_finder function.
  9. Add functions to the scripts/bioscape_quickstart.py script to support the new data source.
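
Steps 6 to 8 amount to registering a new constant and mapping it to a finder class. The sketch below shows one way such a lookup could be arranged. It is illustrative only: the class names, the value 8 and the argument handling are assumptions, and the real get_context_term_finder function may be structured quite differently.

```python
# In bioscape.constants: only the first value appears in this document;
# the second is a hypothetical constant for a new word list.
text_context_gene_ontology_term = 7
text_context_adjective = 8


class GeneOntologyTermFinder:
    """Stand-in for an existing finder class."""


class AdjectiveFinder:
    """Stand-in for the finder class written for the new word list."""


# Mapping from context type constants to finder classes.
_finders = {
    text_context_gene_ontology_term: GeneOntologyTermFinder,
    text_context_adjective: AdjectiveFinder,
}


def get_context_term_finder(context_type, *args, **kw):
    """Return a finder instance for the given context type."""
    try:
        cls = _finders[context_type]
    except KeyError:
        raise ValueError("No finder for context type %r" % (context_type,))
    return cls(*args, **kw)
```

With a table of this kind, supporting a new word list only requires one extra entry once the constant and finder class exist.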

Database Constants

Some constant values stored in the database are referenced explicitly in various parts of the software. For such values, it is most convenient to define them in the bioscape.constants module and to reference them in the database templates used to initialise and populate the database.

Other kinds of values may not be referenced in the source code in this way, and may also belong to data sets which may change over time (thus being only the initial values in a data set, rather than true constants). Such values should instead be defined in files which are used to import data into the database.
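
To illustrate the first approach, the sketch below substitutes a value from a constants mapping into template text using Python's string.Template. The placeholder syntax actually used by Bioscape's .sql.in templates is not described in this document, so the $name notation, the table and column names, and the expand_template function are all assumptions.

```python
from string import Template

# Stand-in for values defined in the bioscape.constants module.
constants = {"text_context_gene_ontology_term": 7}

# Hypothetical template text; the table and columns are invented.
template_text = (
    "INSERT INTO context_type (id, name) "
    "VALUES ($text_context_gene_ontology_term, 'GO term');"
)


def expand_template(text, values):
    """Replace $name placeholders in text with entries from values.

    Raises KeyError if the template refers to an undefined constant.
    """
    return Template(text).substitute(values)


print(expand_template(template_text, constants))
```

Because the value is defined once and substituted wherever it is needed, the source code and the initialised database cannot drift apart.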

Generating API Documentation

The tools directory contains a program which can be run to generate API documentation and to put such documentation in a special apidocs directory at the root of the distribution:

python tools/apidocs.py

The generated documentation is principally useful as a reference to the API, rather than as a resource illustrating the architecture of the system or as a guide to writing new components.