Bioscape Development

From irefindex
Revision as of 15:05, 10 February 2009 by PaulBoddie

This document describes a selection of different development tasks undertaken when improving Bioscape.

Adding New Scoring Methods

The following steps should be sufficient to define and make available a new scoring method.

  1. Add a new entry to the data/text/resources/methods.txt file.
  2. Add new templates for scoring to the bioscape/modules/text/sql directory. For example...
    importdb-score-N-pgsql.sql.in
    ...where N is the method name and pgsql refers to a database system (PostgreSQL, according to the bioscape.cfg file conventions).
  3. Create a new record in the text_method table. In PostgreSQL this can be done using a COPY command together with a file containing only the lines newly added to the methods.txt file.
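
The naming convention in step 2 and the data preparation in step 3 can be sketched in Python. This is a minimal sketch under stated assumptions, not Bioscape's own code: the helper names score_template_name, new_method_lines and copy_statement are hypothetical, the lines of methods.txt are treated as opaque text because their column layout is not described here, and the file path passed to copy_statement is a placeholder.

```python
def score_template_name(method, dbsystem="pgsql"):
    """Return the scoring template filename for a method name and database system."""
    return "importdb-score-%s-%s.sql.in" % (method, dbsystem)


def new_method_lines(old_lines, current_lines):
    """Return the lines of current_lines that are absent from old_lines.

    Lines are compared without their trailing newline and the original
    order is preserved. The content of each line is not interpreted.
    """
    seen = set(line.rstrip("\n") for line in old_lines)
    return [line for line in current_lines if line.rstrip("\n") not in seen]


def copy_statement(table, filename):
    """Build a PostgreSQL COPY statement loading filename into table.

    Single quotes in the filename are doubled, as SQL string literals require.
    """
    return "COPY %s FROM '%s';" % (table, filename.replace("'", "''"))


if __name__ == "__main__":
    # "mymethod" and the path below are illustrative values only.
    print(score_template_name("mymethod"))
    print(copy_statement("text_method", "/tmp/new_methods.txt"))
```

Note that COPY ... FROM with a filename reads the file on the database server, so the file holding the new lines has to be readable there.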

Adding New Search Result Types

Defining new kinds of search results involves a number of modifications:

  1. Add a constant in bioscape.constants for the new search result type if appropriate. For example...
    text_context_gene_ontology_term = 7
  2. Add infrastructure to acquire and to emit the results, such as classes in modules within the bioscape.modules.text.finders package. Since Lucene is typically used, such classes will be in the bioscape.modules.text.finders.lucene module and will include...
    • A locator class
    • A finder class
    • An iterator class
  3. Add convenience functions to the bioscape.modules.text.finders package, or modify existing functions such as get_context_term_finder.
  4. Add any reader or writer classes to the appropriate modules, such as bioscape.modules.text.finders.files, which contains classes that consume inputs and produce result output.
  5. Modify the bioscape_search_text.py and bioscape_search_cache.py scripts to include options and invocations for the new search type.
  6. Add database templates for the result data. For example...
      acronyms-pgsql.sql.in
      drop-acronyms-pgsql.sql.in
      acronyms-constraints-pgsql.sql.in
      drop-acronyms-constraints-pgsql.sql.in
      acronyms-partition-pgsql.sql.in
      drop-acronyms-partition-pgsql.sql.in
      acronyms-partition-constraints-pgsql.sql.in
      drop-acronyms-partition-constraints-pgsql.sql.in
      import-acronyms-pgsql.sql.in
    Also add corresponding entries to the dependencies.txt file.
  7. Add functions in the bioscape.modules.text.bulk module which employ the above import template.
  8. Modify the bioscape_import_text.py script to include options and invocations for the new search type.
  9. Add functions to the scripts/bioscape_quickstart.py script to support the new search type.
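
The set of template files listed in step 6 follows a regular pattern, which can be sketched in Python as a checklist helper. This is a minimal sketch, not part of Bioscape: the function names result_template_names and missing_templates are hypothetical, the patterns are inferred only from the acronyms example above, and the template directory is passed in as a parameter because its location is not stated here.

```python
import os

# Filename patterns inferred from the acronyms example in step 6.
TEMPLATE_PATTERNS = [
    "%(name)s-%(db)s.sql.in",
    "drop-%(name)s-%(db)s.sql.in",
    "%(name)s-constraints-%(db)s.sql.in",
    "drop-%(name)s-constraints-%(db)s.sql.in",
    "%(name)s-partition-%(db)s.sql.in",
    "drop-%(name)s-partition-%(db)s.sql.in",
    "%(name)s-partition-constraints-%(db)s.sql.in",
    "drop-%(name)s-partition-constraints-%(db)s.sql.in",
    "import-%(name)s-%(db)s.sql.in",
]


def result_template_names(name, db="pgsql"):
    """Return the template filenames expected for a search result type."""
    return [pattern % {"name": name, "db": db} for pattern in TEMPLATE_PATTERNS]


def missing_templates(name, directory, db="pgsql"):
    """Return the expected template filenames that do not exist in directory."""
    return [
        filename
        for filename in result_template_names(name, db)
        if not os.path.exists(os.path.join(directory, filename))
    ]


if __name__ == "__main__":
    for filename in result_template_names("acronyms"):
        print(filename)
```

Running missing_templates against the template directory after adding a new result type gives a quick indication of which files still need to be written.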

Adding New Data Sources and Types

Defining new data sources and data types involves a number of modifications:

  1. Add a new module (see "Adding New Modules"). For example...
    bioscape.modules.chebi
  2. Define modules which retrieve data from sources, such as a module which uses FTP to download files and place them in a special downloads directory. For example...
    bioscape.modules.chebi.chebiftp
  3. Define modules which parse the downloaded data, if necessary, producing import data files. For example...
    bioscape.modules.chebi.chebiparse
  4. Add templates to implement the database schema for the data type, along with templates which support the import and update of such data. For example...
      chebi-pgsql.sql.in
      drop-chebi-pgsql.sql.in
      chebi-constraints-pgsql.sql.in
      drop-chebi-constraints-pgsql.sql.in
      import-chebi-pgsql.sql.in
      update-chebi-pgsql.sql.in
  5. Add a bulk import module for the data type. For example...
    bioscape.modules.chebi.bulk
  6. Define configuration settings for the locations and details used in the above modules. For example...
      chebi_ftp_address
      chebi_data_directory
  7. Add scripts which download and process data and import such data. For example...
      scripts/bioscape_get_chebi.py
      scripts/bioscape_import_chebi.py
  8. Add functions to the scripts/bioscape_quickstart.py script to support the new data type.
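
The retrieval module described in step 2 might be sketched as follows using Python's standard ftplib module. This is a minimal sketch and not the actual bioscape.modules.chebi.chebiftp code: the function names download_files and fetch are hypothetical, the ftp_address argument is assumed to be a plain host name of the kind a chebi_ftp_address setting might hold, anonymous login is assumed, and the list of files to fetch is supplied by the caller.

```python
import ftplib
import os


def download_files(ftp, filenames, data_directory):
    """Download each named file over an open FTP connection.

    Files are written into data_directory, which is created if necessary.
    Returns the list of local paths written.
    """
    if not os.path.isdir(data_directory):
        os.makedirs(data_directory)
    paths = []
    for filename in filenames:
        path = os.path.join(data_directory, os.path.basename(filename))
        with open(path, "wb") as f:
            # retrbinary calls f.write with each block of data received.
            ftp.retrbinary("RETR " + filename, f.write)
        paths.append(path)
    return paths


def fetch(ftp_address, filenames, data_directory):
    """Connect to ftp_address (assumed to be a host name) and download files."""
    ftp = ftplib.FTP(ftp_address)
    try:
        ftp.login()  # assumption: the server permits anonymous access
        return download_files(ftp, filenames, data_directory)
    finally:
        ftp.close()
```

Keeping the transfer loop separate from the connection handling allows download_files to be exercised with a stand-in connection object, without network access.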