iRefIndex MITAB2.5 Parser

From irefindex

Revision as of 18:35, 23 March 2010

A tool has been developed to parse the MITAB files produced in the iRefIndex Build Process.

Obtaining the MITAB Parser

The parser and associated resources can be obtained from this location:

https://hfaistos.uio.no/cgi-bin/viewvc.cgi/mitab/

To check out the parser using CVS, first set the CVSROOT environment variable as follows:

export CVSROOT=:ext:<username>@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot

(The <username> should be replaced with your actual username.)

Then run the following command:

cvs co mitab
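Setting CVSROOT and checking out the module can be combined into a small shell function that takes the username as an argument. This is a minimal sketch: the function name `checkout_mitab` is hypothetical and not part of the mitab distribution, and it assumes that the `cvs` client is installed and that you have an account on the server.

```shell
# Hypothetical helper (not part of the mitab distribution):
# check out the mitab module for the given username.
checkout_mitab() {
    if [ -z "$1" ]; then
        echo "usage: checkout_mitab <username>" >&2
        return 1
    fi
    # Same CVSROOT value as above, with <username> taken from the argument.
    CVSROOT=":ext:$1@hfaistos.uio.no:/mn/hfaistos/storage/cvsroot"
    export CVSROOT
    cvs co mitab
}
```

For example, `checkout_mitab jsmith` would check out the module as the (example) user `jsmith`.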

Prerequisites

The following programs are required to use the parser:

Running the Parser

Given a directory for the iRefIndex output files such as...

/home/irefindex/output

...run the parser as follows:

python parse_mitab.py /home/irefindex/output/All.mitab.03042009.txt

The date in the above filename (03042009) will need to be changed to match the name of the corresponding file in your own output directory.
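Since the date in the filename changes from build to build, a small shell function can pick the file instead of it being typed by hand. The sketch below is hypothetical: the function names are illustrative, the default directory is the example one given above, and it assumes that the newest `All.mitab.*.txt` file by modification time is the one to parse, which may not hold if older files have been copied or touched since they were produced.

```shell
# Hypothetical helpers (not part of the mitab distribution).

# Print the most recently modified All.mitab.*.txt file in a directory.
# Assumption: the newest file by modification time is the one wanted.
latest_mitab_file() {
    local dir="${1:-/home/irefindex/output}"
    local file
    # ls -t sorts by modification time, newest first; the error from an
    # unmatched pattern is discarded so an empty result can be detected.
    file=$(ls -t "$dir"/All.mitab.*.txt 2>/dev/null | head -n 1)
    if [ -z "$file" ]; then
        echo "no All.mitab.*.txt file found in $dir" >&2
        return 1
    fi
    echo "$file"
}

# Run the parser on that file (run from the directory containing parse_mitab.py).
run_parser() {
    local file
    file=$(latest_mitab_file "$1") || return 1
    python parse_mitab.py "$file"
}
```

For example, `run_parser /home/irefindex/output` would be equivalent to the command above when `All.mitab.03042009.txt` is the newest matching file in that directory.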

Creating the Database

A database can be created using the usual PostgreSQL tools:

createdb -E unicode mitab_irefindex

This database is initialised as follows:

psql -f init_mitab.sql mitab_irefindex
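The creation and initialisation steps can be chained so that initialisation only runs if the database was actually created. This is a sketch with a hypothetical function name; it assumes `init_mitab.sql` is in the current directory and uses psql's `ON_ERROR_STOP` variable so that an error in the script makes psql exit with a non-zero status.

```shell
# Hypothetical helper (not part of the mitab distribution):
# create the database and initialise its tables.
create_mitab_db() {
    local db="${1:-mitab_irefindex}"
    # Only initialise if createdb succeeded; ON_ERROR_STOP=1 makes psql
    # abort the script and exit non-zero at the first SQL error.
    createdb -E unicode "$db" &&
        psql -v ON_ERROR_STOP=1 -f init_mitab.sql "$db"
}
```

Called without an argument, it uses the database name `mitab_irefindex` from the commands above.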

Should the database tables need to be dropped (perhaps in case of problems with the import), the following command can be used:

psql -f drop_mitab.sql mitab_irefindex
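After a failed import, the tables can be dropped and re-created in one step. Again this is a hypothetical helper, assuming both SQL scripts are in the current directory. Because `ON_ERROR_STOP` is set, initialisation is skipped if `drop_mitab.sql` reports an error, for example if the tables are already absent.

```shell
# Hypothetical helper (not part of the mitab distribution):
# drop the mitab tables and create them again, e.g. after a failed import.
reset_mitab_tables() {
    local db="${1:-mitab_irefindex}"
    # Re-initialise only if the drop script ran without errors.
    psql -v ON_ERROR_STOP=1 -f drop_mitab.sql "$db" &&
        psql -v ON_ERROR_STOP=1 -f init_mitab.sql "$db"
}
```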

Populating the Database

The database is populated as follows:

python import_mitab.py mitab_irefindex

As a result, a number of tables representing the structure of the data should be available in the database. For applications that use this data, indexes may need to be created to make queries more efficient.
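The tables that were created can be listed with psql's `\dt` command, and indexes can be added with `CREATE INDEX`. The sketch below is illustrative only: the function names are hypothetical, and the table and column names passed to `add_index` must be taken from the actual schema (for example from the `\dt` listing or from `init_mitab.sql`), since they are not documented here.

```shell
# List the tables created by the import.
list_mitab_tables() {
    psql -c '\dt' "${1:-mitab_irefindex}"
}

# Hypothetical helper: add_index <database> <table> <column>
# Creates an index named <table>_<column>_idx on a single column.
add_index() {
    if [ $# -ne 3 ]; then
        echo "usage: add_index <database> <table> <column>" >&2
        return 1
    fi
    psql -v ON_ERROR_STOP=1 -c "CREATE INDEX ${2}_${3}_idx ON $2 ($3)" "$1"
}
```

For example, `add_index mitab_irefindex some_table some_column`, where both names are placeholders for a real table and column.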