iRefScape Batch Files

From irefindex
Revision as of 15:41, 6 July 2011 by PaulBoddie (talk | contribs) (Moved batch search information to another page.)

When there are many search terms, it is convenient to construct a batch file containing a list of all terms. This can then be loaded into iRefScape using the "Load from file" button in the main iRefScape panel. There are three types of batch files:

  • Simple searching (recommended for users just wanting to search for a list of things)
  • Attribute annotation and searching (attaching additional attributes to the results of searches)
  • Search type definition and indexing (creating new search types)

All three types of batch files share the same general file format; the differences between them are documented below.

Simple Batch Searches

Here is an example batch file:

#geneID:3702
814707
814714
814714
817659
818662

The format of this resembles the following:

#<search type>:<taxid>
<search term>
...

In formal terms:

  • The first line starts with a hash (#) and indicates the search type and taxonomy identifier.
  • Subsequent lines contain search terms to be used in queries on iRefIndex data.
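As a minimal sketch of the format described above, the following Python function builds the text of a simple batch file from a search type, a taxonomy identifier and a list of search terms. The function name `format_simple_batch` is hypothetical and is not part of iRefScape; the values are taken from the example batch file above.

```python
def format_simple_batch(search_type, taxid, terms):
    """Return the text of a simple batch file.

    The first line is '#<search type>:<taxid>'; each following line
    holds one search term.
    """
    header = f"#{search_type}:{taxid}"
    lines = [header] + [str(term).strip() for term in terms]
    return "\n".join(lines) + "\n"


# Values taken from the example batch file above.
text = format_simple_batch("geneID", 3702, [814707, 814714, 817659, 818662])
print(text, end="")
```

The resulting text can be saved to a file and loaded using the "Load from file" button.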

The following table lists the supported search types. Use the label as the <search type> value in the first line of the batch file:

Label to use in the batch file    Description
rog                               iROGID, the integer reference redundant object group identifier (details of mapper files to locate the iROGID can be found in the Protein identifier mapping document)
geneID                            NCBI Gene ID, this is always an integer
UniProt_Ac                        UniProt/KB accession
RefSeq_Ac                         RefSeq Accession
UniProt_ID                        UniProt identifier
geneSymbol                        NCBI Gene symbol
PMID                              PubMed identifier
src_intxn_id                      Interaction identifiers used by source databases
omim                              OMIM identifier (OMIM home page)
digid                             diseases group identifier (DiG home page)

Attaching Attributes to Searches

Users can attach additional attributes when performing batch queries (instead of loading the attributes after the search).

The following example shows a search for a number of terms, where for each term an additional attribute called "NOONAN_SYNDROME_TYPE" is attached to the matching nodes.

#geneSymbol:9606:NOONAN_SYNDROME_TYPE
PTPN11	NS1
SHOC2	NS2
KRAS	NS3
SOS1	NS4
RAF1	NS5
NRAS	NS6
NF1	NFNS

The format of this resembles the following:

#<search type>:<taxid>:<attribute name>[:<attribute name>]...
<search term>	<attribute value>[	<attribute value>]...
...

Some notes on the format:

  • The first line is compulsory and contains controlling information.
  • The first value after the hash (#) is the iRefIndex search type to be used (as defined above).
  • The second value is the NCBI taxonomy identifier (which can be found using the NCBI taxonomy browser).
  • The third value is the name of the user attribute to be added.
  • All three values are separated by a colon (:).
  • The second and subsequent lines contain the search term and corresponding user attribute value, separated by a tab character.

It is possible to include multiple attributes in the same file. To do this, each attribute name must be defined in the first line, separated by colon characters, and in the second and subsequent lines additional columns must be used to define the attribute values.
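The layout with several attributes can be illustrated with a small parser. This is only a sketch of the format as described here, not the code used by iRefScape: the function name `parse_attribute_batch` is hypothetical, and the second attribute name `EXAMPLE_NOTE` and its values are invented purely for illustration.

```python
def parse_attribute_batch(text):
    """Parse an attribute batch file into (search_type, taxid, attribute_names, rows)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("the first line must be a header starting with '#'")
    # Header: search type, taxonomy identifier, then one or more attribute names.
    search_type, taxid, *attribute_names = lines[0][1:].split(":")
    if not attribute_names:
        raise ValueError("the header must name at least one attribute")
    rows = []
    for line in lines[1:]:
        # Data row: search term, then one tab-separated value per attribute.
        term, *values = line.split("\t")
        if len(values) != len(attribute_names):
            raise ValueError(f"wrong number of attribute values for {term!r}")
        rows.append((term, dict(zip(attribute_names, values))))
    return search_type, taxid, attribute_names, rows


# EXAMPLE_NOTE and its values are made up for this illustration.
example = (
    "#geneSymbol:9606:NOONAN_SYNDROME_TYPE:EXAMPLE_NOTE\n"
    "PTPN11\tNS1\tfirst\n"
    "SHOC2\tNS2\tsecond\n"
)
search_type, taxid, names, rows = parse_attribute_batch(example)
print(search_type, taxid, names)
for term, attributes in rows:
    print(term, attributes)
```

The taxonomy identifier is kept as a string here, since the sketch only splits the header and does not interpret its values.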

Note that if the batch file is also used as a template to construct a user-searchable index, only one user attribute is allowed in the file.

After performing a search with attached attributes, the new attributes can be inspected by adding them to the Node Attribute Browser display.

Creating Search Types

Users can construct their own indices to be used as search types. The easiest way to create an index is to use the same method that is used to add attributes to result nodes. This process will:

  1. Find the iROGID for each search term.
  2. Associate the supplied search term with the iROGID.
  3. Construct a searchable index for the new search type based on the generated associations.

The example shown below constructs an index, where the user can perform searches using the Noonan syndrome type:

#geneSymbol:9606:NOONAN_SYNDROME_TYPE
PTPN11	NS1
SHOC2	NS2
KRAS	NS3
SOS1	NS4
RAF1	NS5
NRAS	NS6
NF1	NFNS

The format of this resembles the following:

#<search type>:<taxid>:<new search type>
<search term>	<new search term>

In more formal terms:

  • The first line is compulsory and contains controlling information. The first value after the hash character (#) is the search type to be used to query iRefIndex (as defined above). The second value is the NCBI taxonomy identifier (NCBI taxonomy browser). The third value is the name of the search type to be created. All three values are separated by a colon (:).
  • Each data row must contain exactly two columns, separated by a tab character. The first column is the search string used to query iRefIndex; the values in this column will appear in the search box at the end of the operation. The second column contains the search term to be defined for future searches performed using the new search type.

The file name of the batch file must start with the prefix INDEX_THIS_ for the file to be considered for indexing. For example:

INDEX_THIS_Sample_batch4.txt

This file is loaded into iRefScape using the "Load from file" button in the main iRefScape panel. If the format of the file is correct, iRefScape will immediately create an index for the new search type and will also load the contents of the file into the query box as a conventional batch file search.
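Before loading such a file, the rules described on this page can be checked with a short script. The following is a sketch under stated assumptions, not the validation that iRefScape itself performs: the function name `check_index_batch` is hypothetical, and the checks simply restate this page (the INDEX_THIS_ file name prefix, a header with exactly three colon-separated values, and two tab-separated columns per data row).

```python
from pathlib import Path

INDEX_PREFIX = "INDEX_THIS_"


def check_index_batch(filename, text):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not Path(filename).name.startswith(INDEX_PREFIX):
        problems.append(f"file name does not start with {INDEX_PREFIX}")
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("#"):
        problems.append("first line must be a header starting with '#'")
        return problems
    # Header: <search type>:<taxid>:<new search type>, with only one new search type.
    if len(lines[0][1:].split(":")) != 3:
        problems.append("header must contain exactly three colon-separated values")
    for number, line in enumerate(lines[1:], start=1):
        if len(line.split("\t")) != 2:
            problems.append(f"data row {number}: expected two tab-separated columns")
    return problems


content = "#geneSymbol:9606:NOONAN_SYNDROME_TYPE\nPTPN11\tNS1\nSHOC2\tNS2\n"
print(check_index_batch("INDEX_THIS_Sample_batch4.txt", content))
print(check_index_batch("Sample_batch4.txt", content))
```

Returning a list of problems rather than raising on the first one lets all issues in a file be reported in a single pass.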

When performing searches using a new search type, the user-defined search term (such as NS1 from the above example) does not appear as a separate node attribute, but instead appears as part of the i.query attribute.
