Bioscape Development

Note: This documentation covers an unreleased product and is for internal use only.

This document describes a selection of development tasks undertaken when improving Bioscape.

Adding New Scoring Methods

The following steps should be sufficient to define and make available a new scoring method.

  1. Add a new entry to the bioscape/sources/score/Resources/methods.txt file in the bsadmin distribution.
  2. Add new templates for scoring to the appropriate subdirectory of the bioscape/sql directory. For example, for a result score, add importdb-N.sql.in to the bioscape/sql/resultscore directory for a method called N.
  3. Create a new record in the text_method table. In PostgreSQL this can be done using a COPY command together with a file containing the newly added lines from the methods.txt file.
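
Step 3 can be prepared with a short script. The following is a minimal sketch and not part of bsadmin: it assumes that methods.txt holds one record per line, and the helper names new_method_lines and copy_statement are hypothetical. It picks out the lines that were not present in an earlier copy of the file and builds the corresponding PostgreSQL COPY statement.

```python
def new_method_lines(previous_text, current_text):
    """Return the lines of current_text that do not occur in previous_text."""
    seen = set(previous_text.splitlines())
    return [line for line in current_text.splitlines()
            if line.strip() and line not in seen]


def copy_statement(table, filename):
    """Build a PostgreSQL COPY statement loading filename into table."""
    # Single quotes in the filename are doubled, as SQL string literals require.
    return "COPY %s FROM '%s';" % (table, filename.replace("'", "''"))
```

The selected lines would be written to a file of their own and that file named in the statement. Note that COPY reads the file on the database server; with psql, the \copy command reads a file on the client instead.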

Adding New Search Result Types

Defining new kinds of search results involves a number of modifications:

  1. Add a constant in bioscape.constants (found in bsadmin) for the new search result type if appropriate. For example...

    text_termtype_gene_ontology_term = 13

    ...for predefined search result types, or...

    text_termid_chromosome = -3

    ...for speculative search result types.
  2. Add infrastructure to acquire and to emit the results, such as classes in modules within the bsindex.search package (found in bsindex). Such classes may include a phrase class in bsindex.search.phrases and a policy class in bsindex.search.policies, if the data of interest is found in an unconventional way.
  3. Add convenience functions to the bsindex.search package.
  4. Modify the bsindex_quickstart.py script (in bsindex) to configure the export of an appropriate search cache for the new result type, or to create a new search cache for the new type.
  5. The database import template should not normally need modifying but can be found in the bioscape/sql/search directory (in the bsadmin distribution).
  6. Add a translation for the constant in the bsweb/Resources/translations.xml file (in the bsweb distribution).
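
Steps 1 and 6 are easy to leave out of step with each other, so a quick consistency check may help. The following is only a sketch: the two constants are the examples given above, but the dictionaries stand in for bioscape.constants and translations.xml, whose real structure is not described here, and the function missing_translations is hypothetical. In particular, it is an assumption that translations are keyed by constant name.

```python
def missing_translations(constants, translations):
    """Return the names of result type constants that have no translation."""
    return sorted(name for name in constants if name not in translations)


# Constant values taken from the examples in step 1.
constants = {
    "text_termtype_gene_ontology_term": 13,
    "text_termid_chromosome": -3,
}

# Hypothetical translations; the real ones live in translations.xml.
translations = {"text_termtype_gene_ontology_term": "Gene Ontology term"}

print(missing_translations(constants, translations))
# prints ['text_termid_chromosome']
```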

Adding New Data Sources and Types

Defining new kinds of data sources and types involves a number of modifications:

  1. Add a new data source module. For example, for a "pure data" source (in bsadmin) involving only the database:

    bioscape.sources.chebi

    For a "data plus indexed text" source (in bsindex):

    bsindex.sources.pmcweb

    This involves creating a Python package at the appropriate place in the directory hierarchy, with an __init__.py file to indicate that a package (or subpackage) is present.
  2. Define a module which retrieves data from the actual source:
    bioscape.sources.chebi.download
  3. Define a module which parses the downloaded data:
    bioscape.sources.chebi.parse
  4. Add templates to implement the database schema for the data type, along with templates which support the import and update of such data. For example, within the bioscape/sql/chebi directory:
      init.sql.in
      drop.sql.in
      init-constraints.sql.in
      drop-constraints.sql.in
      import.sql.in
  5. Define configuration settings for the locations and details used in the above modules. For example...
      chebi_ftp_address
      chebi_data_directory
  6. Add functions to the scripts/bioscape_quickstart.py script to support the new data type.

See the Bioscape Data Sources document for more information about the structure of data sources.
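
The file layout implied by steps 1 to 4 for a "pure data" source can be sketched as a small scaffolding script. This is an illustration under stated assumptions rather than an existing Bioscape tool: the function scaffold_source is hypothetical, the root argument is assumed to be the top of a bsadmin checkout, and the files are created empty so that their contents can be written by hand. The template names are those listed in step 4.

```python
import os

# Template names as listed for the bioscape/sql/chebi directory.
SQL_TEMPLATES = [
    "init.sql.in",
    "drop.sql.in",
    "init-constraints.sql.in",
    "drop-constraints.sql.in",
    "import.sql.in",
]


def scaffold_source(root, name):
    """Create empty placeholder files for a "pure data" source called name."""
    package_dir = os.path.join(root, "bioscape", "sources", name)
    sql_dir = os.path.join(root, "bioscape", "sql", name)
    created = []
    for directory, filenames in (
        (package_dir, ["__init__.py", "download.py", "parse.py"]),
        (sql_dir, SQL_TEMPLATES),
    ):
        os.makedirs(directory, exist_ok=True)
        for filename in filenames:
            path = os.path.join(directory, filename)
            # Append mode creates missing files without truncating existing ones.
            with open(path, "a"):
                pass
            created.append(os.path.relpath(path, root))
    return created
```

Calling scaffold_source(".", "chebi") from the top of the distribution would produce the package and template files named in the steps above. Configuration settings (step 5) and the quickstart functions (step 6) still need to be added separately.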

Adding New Word Lists for Searching

New lists of words to be searched for when finding contextual information can be added as follows:

  1. Define a list of words in the Resources subdirectory of the bioscape.sources.bioentities package.
  2. Write a database template to import the data into the appropriate tables. For example, in the bioscape/sql/bioentities directory:
      import-adjectives-pgsql.sql.in
    This will make the new search terms available.
  3. The search result type can then be added as described above.
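
Before a word list is placed in the Resources subdirectory (step 1), it may be worth tidying it so that the import does not encounter blank or repeated entries. The sketch below assumes one word per line, which is not confirmed by this document, and the function clean_word_list is hypothetical.

```python
def clean_word_list(text):
    """Return the distinct, non-empty words in text, in first-seen order."""
    seen = set()
    words = []
    for line in text.splitlines():
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


print(clean_word_list("acidic\n\nbasic \nacidic\n"))
# prints ['acidic', 'basic']
```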

Database Constants

Some constant values stored in the database are referenced explicitly in various parts of the software. For such values, it is most convenient to define them in the bioscape.constants module and to reference them in the database templates used to initialise and populate the database.
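
The idea of defining a value once and referencing it from a template can be sketched as follows. The placeholder syntax of the real .sql.in templates is not described here, so this example substitutes Python's string.Template with $name placeholders; the table and column names are invented for illustration, and the dictionary stands in for bioscape.constants.

```python
from string import Template

# Stand-in for bioscape.constants; the value is the example given earlier.
CONSTANTS = {"text_termtype_gene_ontology_term": 13}

# Hypothetical template text: the table name, the column name and the
# $name placeholder syntax are all assumptions made for this sketch.
TEMPLATE = Template(
    "SELECT * FROM example_terms "
    "WHERE termtype = $text_termtype_gene_ontology_term;"
)


def render(template, constants):
    """Substitute constants into a template, failing on unknown names."""
    return template.substitute(constants)


print(render(TEMPLATE, CONSTANTS))
# prints SELECT * FROM example_terms WHERE termtype = 13;
```

Using substitute rather than safe_substitute means that a template referring to an undefined constant raises a KeyError instead of silently producing incomplete SQL.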

Other kinds of values may not be referenced in the source code in this way, and may also belong to data sets which may change over time (thus being only the initial values in a data set, rather than true constants). Such values should instead be defined in files which are used to import data into the database.