Protein identifier mapping
Last edited: 2010-10-21
We provide a file that maps iRefIndex identifiers to popular external identifiers. It is a tab-delimited text file; the first row, which starts with "#", gives the column headers.
File download location: ftp://ftp.no.embnet.org/irefindex/data/current/Mappingfiles/
The column descriptions:
Column number | Column name | Description |
1 | db | Source of the external identifier (e.g. UniProt, RefSeq) |
2 | acc | The external identifier (e.g. Q4U9M9) |
3 | entrezGeneid | Entrez Gene ID. Provided only for RefSeq identifiers |
4 | irogid | Integer version of the redundant object group identifier (e.g. 3156116; current maximum value = 14005379; this is a MySQL int(11) field) |
5 | rogid | String version of the redundant object group identifier (64 bit version of the hash digest of the primary amino acid sequence, with the NCBI taxonomy identifier appended at the end) |
6 | icrogid | Integer version of the canonical (1) redundant object group (the irogid selected to represent the canonical group) |
7 | crogid | String version of the canonical (1) redundant object group (the rogid selected to represent the canonical group) |
(1) Please refer to the following page for details on the canonicalization process: http://irefindex.uio.no/wiki/Canonicalization
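As a rough illustration of how the file might be used, the Python sketch below reads a tab-delimited mapping file and builds a lookup from an external identifier (db, acc) to its integer canonical group (icrogid). It assumes that the header names in the file match the column names in the table above, which has not been checked against a real download. The function names read_mappings and build_canonical_index are hypothetical, and the sample rows are invented for illustration (including the accession XP_000001, the Entrez Gene ID and the rogid strings); they are not real iRefIndex data.

```python
import csv
import io

# Column names as listed in the table above. That the real header row uses
# exactly these spellings is an assumption.
COLUMNS = ["db", "acc", "entrezGeneid", "irogid", "rogid", "icrogid", "crogid"]


def read_mappings(handle):
    """Yield one dict per data row of a tab-delimited mapping file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = None
    for row in reader:
        if not row:
            continue  # skip blank lines
        if row[0].startswith("#"):
            # Header row: drop the leading "#" from the first column name.
            header = [row[0].lstrip("#").strip()] + row[1:]
            continue
        # Fall back to the documented column order if no header was seen.
        yield dict(zip(header or COLUMNS, row))


def build_canonical_index(records):
    """Map (db, acc) to the integer canonical group identifier (icrogid)."""
    return {(r["db"], r["acc"]): int(r["icrogid"]) for r in records}


# Hypothetical sample rows, invented for illustration only.
SAMPLE = (
    "#db\tacc\tentrezGeneid\tirogid\trogid\ticrogid\tcrogid\n"
    "UniProt\tQ4U9M9\t\t3156116\tROGID_A\t3156116\tROGID_A\n"
    "RefSeq\tXP_000001\t12345\t3156117\tROGID_B\t3156116\tROGID_A\n"
)

records = list(read_mappings(io.StringIO(SAMPLE)))
index = build_canonical_index(records)

# Two external identifiers share a canonical group when their icrogid matches.
same_group = index[("UniProt", "Q4U9M9")] == index[("RefSeq", "XP_000001")]
print(same_group)
```

To run this against a downloaded file, pass an open text-mode file handle to read_mappings in place of the io.StringIO sample. For a full mapping file it may be preferable to filter records while iterating rather than loading every row into memory.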