iRefIndex MITAB2.6 Parser

A tool has been developed to parse the MITAB files produced in the iRefIndex Build Process. Currently, the tool is capable of parsing the MITAB format described on the page README_iRefIndex_MITAB2.6_7.0.

Obtaining the MITAB Parser

The parser and associated resources can be obtained from this location:

https://hfaistos.uio.no/cgi-bin/viewvc.cgi/mitab/

Using CVS with the appropriate CVSROOT setting, run the following command:

cvs co mitab

The CVSROOT environment variable should be set to the following for this to work:

export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot

(The <username> should be replaced with your actual username.)

Prerequisites

The following programs are required to use the parser:

- Python (tested with 2.3.5)
- PostgreSQL (tested with 8.1.9)

Running the Parser

Given a directory for the iRefIndex output files such as...

/home/irefindex/output

...run the parser as follows:

python parse_mitab.py /home/irefindex/output/All.mitab.03042009.txt

It will be necessary to change the date details included in the above filename to match the actual name of the appropriate file found in your own output directory.
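Because the date in the filename varies between builds, it can help to list the candidate files before running the parser. The following is a minimal sketch, not part of the parser distribution: the function name find_mitab_files is hypothetical, the All.mitab.<date>.txt pattern is inferred from the example filename above, and the code targets a current Python 3 rather than the Python 2.3 the parser was tested with.

```python
import glob
import os


def find_mitab_files(output_dir):
    """Return paths matching All.mitab.<date>.txt in output_dir, sorted by name."""
    # The pattern is an assumption based on the example filename on this page.
    pattern = os.path.join(output_dir, "All.mitab.*.txt")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # /home/irefindex/output is the example output directory used above.
    for path in find_mitab_files("/home/irefindex/output"):
        print(path)
```

The results are sorted by filename only, which need not correspond to chronological order, so the appropriate file should still be chosen by inspection.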

Creating the Database

A database can be created using the usual PostgreSQL tools:

createdb -E unicode mitab_irefindex

This database is initialised as follows:

psql -f init_mitab.sql mitab_irefindex

Should the database tables need to be dropped (perhaps in case of problems with the import), the following command can be used:

psql -f drop_mitab.sql mitab_irefindex

Populating the Database

The database is populated as follows:

python import_mitab.py mitab_irefindex

As a result, a number of tables representing the structure of the data should be available in the database. No indexes are necessarily provided for application queries, so applications built to use this data may need to create indexes on the columns they query in order to make querying more efficient.
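As an illustration of the indexing step, the sketch below generates CREATE INDEX statements for a chosen set of tables and columns. It is an assumption-laden example rather than part of the distributed scripts: the function name index_statements and the <table>_<column>_idx naming pattern are hypothetical, and the table and column names are taken from the notes on the populated database on this page. Which columns actually merit an index depends on the queries an application runs.

```python
def index_statements(columns_by_table):
    """Build CREATE INDEX statements from a {table: [columns]} mapping."""
    statements = []
    for table, columns in sorted(columns_by_table.items()):
        for column in columns:
            # Hypothetical naming pattern: <table>_<column>_idx.
            statements.append(
                "CREATE INDEX %s_%s_idx ON %s (%s);" % (table, column, table, column)
            )
    return statements


if __name__ == "__main__":
    # Illustrative selection only; adjust to the application's queries.
    candidates = {
        "mitab_interactions": ["rigid"],
        "mitab_aliases": ["uid"],
        "mitab_pubmed": ["rigid"],
    }
    print("\n".join(index_statements(candidates)))
```

The printed statements can be reviewed and then run against the mitab_irefindex database using psql.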

Notes on the Populated Database

The schema used by the populated database attempts to model the data as effectively as possible using a number of tables:

Entity type	Tables	Table purpose	Notable columns	Source columns (if different or converted)
Interaction	mitab_interactions	Model each interaction referencing interactors	rigid, intType, edgetype, numParticipants, crigid
Interaction	mitab_sources	Represent sources for each interaction	rigid, sourcedb, name	sourcedb
Interaction	mitab_interaction_type_names	Represent interaction types for each interaction	rigid, code, name	interactionType
Interaction	mitab_interaction_identifiers	Represent interaction identifiers for each interaction	rigid, dbname, uid	interactionIdentifiers
Interaction	mitab_confidence	Represent confidence scores for each interaction	rigid, type, confidence	confidence
Experiment	mitab_method_names	Represent detection methods for each interaction	rigid, code, name	method
Experiment	mitab_authors	Represent publication authors for each interaction	rigid, author	author
Experiment	mitab_pubmed	Represent publication identifiers for each interaction	rigid, pmid	pmids
Interactor	mitab_interactions	Model each interaction referencing interactors	uidA, uidB, intType, taxA, taxB, atype, btype
Interactor	mitab_aliases	Represent aliases for each interactor	uid, intType, dbname, alias	uidA or uidB, aliasA or aliasB
Interactor	mitab_alternatives	Represent alternative identifiers for each interactor	uid, intType, dbname, alt	uidA or uidB, altA or altB
Interactor	mitab_interactor_rogs	Represent alternative integer identifiers for each interactor	uid, intType, rog	uidA or uidB, irogA or irogB

Some changes in representation occur when creating the database:

- Prefixed values are generally split to expose the prefix and the identifier, name or value following it.
  - The various interaction and interactor prefixes (such as irefindex:, rigid:, rogid:, crigid: and crogid:) are omitted from interaction and interactor columns.
  - Source identifiers are split with the prefix (such as intact:) used to make a dbname column with the actual identifier stored in its own column (such as alias or alt).
- The "empty value" (-) should never appear as an identifier, and where such a value is used in a list, that element should be excluded. This is pertinent in the case of vocabulary terms where MI:0000 might be used together with an empty list of identifiers or names as an "empty collection" indicator.
- Duplicate values in lists are generally discarded.
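The changes listed above can be sketched in a few lines of Python. This is a minimal illustration and not the code used by the import scripts: the function name split_prefixed_list is hypothetical, the "|" list separator is assumed from the PSI-MITAB convention, and vocabulary terms with parenthesised names are not handled.

```python
def split_prefixed_list(field):
    """Split a '|'-separated field into unique (prefix, value) pairs."""
    pairs = []
    seen = set()
    for element in field.split("|"):
        element = element.strip()
        if not element or element == "-":
            continue  # exclude the "empty value" from the list
        prefix, sep, value = element.partition(":")
        if not sep:
            prefix, value = "", element  # element carries no prefix
        pair = (prefix, value)
        if pair not in seen:  # discard duplicates, keeping first-seen order
            seen.add(pair)
            pairs.append(pair)
    return pairs
```

For example, given the made-up field value "intact:EBI-1|-|intact:EBI-1", the function yields a single pair with "intact" as the dbname and "EBI-1" as the identifier.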

Further work may include the introduction of a separate interactor table, collecting related information for each interactor.

Canonical interactors and interactions

An intType column has been introduced into some tables in order to indicate whether an interactor or interaction involves canonical information.

- Where an interactor is a specific, observed interactor, intType will be set to S.
- Where an interactor is a canonical group, intType will be set to C.
- Where an interaction involves only specific, observed interactors, intType will be set to S, and the crigid column will refer to the rigid column of the associated canonical interaction.
- Where an interaction involves canonical groups, intType will be set to C.

Thus, the mitab_interactions table effectively has two levels:

- A "parent" level describing interactions between canonical groups, grouping together records in...
- ...a "child" level describing interactions between specific, observed interactors, each referencing a parent record.
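The two-level relationship can be illustrated by grouping child records under their parents. The sketch below works on plain dictionaries rather than on the database itself; the function name group_by_canonical and the sample identifiers are hypothetical, and only the rigid, intType and crigid columns described above are assumed.

```python
def group_by_canonical(interactions):
    """Map each canonical rigid to the rigids of its specific interactions."""
    groups = {}
    for row in interactions:
        if row["intType"] == "C":
            # Parent record: make sure it appears even without children.
            groups.setdefault(row["rigid"], [])
        elif row["intType"] == "S":
            # Child record: crigid refers to the parent's rigid.
            groups.setdefault(row["crigid"], []).append(row["rigid"])
    return groups


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    rows = [
        {"rigid": "c1", "intType": "C", "crigid": "c1"},
        {"rigid": "s1", "intType": "S", "crigid": "c1"},
        {"rigid": "s2", "intType": "S", "crigid": "c1"},
    ]
    print(group_by_canonical(rows))
```

Rows fetched from mitab_interactions could be passed to the same function once converted to dictionaries with these three keys.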
