Protein identifier mapping

Last edited: 2010-10-21

We provide a file that maps iRefIndex identifiers to popular external identifiers. The current file contains all UniProt and all RefSeq identifiers (please refer for version information) and, in selected cases, other identifiers. Other identifiers are listed as accessions for an iRefIndex identifier only when that identifier has no UniProt or RefSeq identifier.

File download location:

The column descriptions:

Column number Column name Description
1 db Source of the external identifier (e.g. UniProt, RefSeq)
2 acc The external identifier (e.g. Q4U9M9)
3 entrezGeneid Entrez Gene ID. Provided only for RefSeq identifiers; for all other identifiers this field is -1.
4 irogid Integer version of the redundant object group identifier (e.g. 3156116; current maximum value is 14005379; this is a MySQL int(11) field).
5 rogid String version of the redundant object group identifier (base-64 version of the hash digest of the primary amino acid sequence, with the NCBI taxonomy identifier appended at the end)
6 icrogid Integer version of the canonical (1) redundant object group identifier (the irogid selected to represent the canonical group)
7 crogid String version of the canonical (1) redundant object group identifier (the rogid selected to represent the canonical group)

(1) Please refer to the following page for details on the canonicalization process.
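
As a rough illustration of the seven-column layout described above, the following Python sketch reads mapping rows into typed records and builds a lookup from an external accession to its canonical group. It rests on several assumptions that this page does not confirm: that the file is tab-delimited, that it may begin with a header row whose first field is "db", and that the columns appear in the order listed. The names Mapping, parse_mappings and canonical_index are hypothetical, and every sample value is a placeholder except Q4U9M9 and 3156116, which come from the examples in the table.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    """One row of the mapping file (field names follow the column table)."""
    db: str
    acc: str
    entrez_gene_id: Optional[int]  # None when the file holds -1
    irogid: int
    rogid: str
    icrogid: int
    crogid: str


def parse_mappings(lines: Iterable[str], delimiter: str = "\t") -> Iterator[Mapping]:
    """Yield Mapping records from raw lines.

    Assumption: fields are tab-delimited and an optional header row
    starts with the literal column name "db".
    """
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if fields == [""] or fields[0] == "db":
            continue  # skip blank lines and the assumed header row
        if len(fields) < 7:
            raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
        db, acc, gene, irogid, rogid, icrogid, crogid = fields[:7]
        gene_id = int(gene)
        yield Mapping(
            db=db,
            acc=acc,
            entrez_gene_id=None if gene_id == -1 else gene_id,  # -1 means "not provided"
            irogid=int(irogid),
            rogid=rogid,
            icrogid=int(icrogid),
            crogid=crogid,
        )


def canonical_index(mappings: Iterable[Mapping]) -> Dict[Tuple[str, str], int]:
    """Map (db, acc) to the integer canonical group identifier (icrogid)."""
    return {(m.db, m.acc): m.icrogid for m in mappings}


# Placeholder rows for illustration only; they are not real file contents.
SAMPLE = [
    "\t".join(["db", "acc", "entrezGeneid", "irogid", "rogid", "icrogid", "crogid"]),
    "\t".join(["UniProt", "Q4U9M9", "-1", "3156116", "rogid_placeholder_A",
               "3156116", "rogid_placeholder_A"]),
    "\t".join(["RefSeq", "NP_000000", "12345", "42", "rogid_placeholder_B",
               "7", "rogid_placeholder_C"]),
]

records = list(parse_mappings(SAMPLE))
index = canonical_index(records)
```

Keying the lookup on the (db, acc) pair rather than on acc alone is a cautious choice, since nothing here guarantees that accessions are unique across source databases.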