README DiG 1.0
Latest revision as of 09:25, 30 June 2010
Last edited: June 30, 2010
Applies to Disease Groups (DiG) release: 2.0
Release date: June 14, 2010
Download location: currently not available for download. Contact ian.donaldson@biotek.uio.no
Authors: Katerina Michalickova and Ian Donaldson
Database: DiG (http://donaldson.uio.no/wiki/DiG:_Disease_groups)
Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)
Description
This file describes the contents of the tab-delimited format of the Disease Groups list.
Details on the build process are available from http://donaldson.uio.no/wiki/DiG:_Disease_groups
Contact ian.donaldson at biotek.uio.no if you are interested in using DiG.
Directory contents
README | pointer to this file at http://donaldson.uio.no/wiki/README_DiG_1.0 |
diseasegroups.mmddyyyy.txt.zip | the DiG list |
DiG data consists of one tab-delimited text file with the name diseasegroups.mmddyyyy.txt.zip where mmddyyyy represents the file's creation date.
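The creation date can be recovered from the file name. The following is a minimal sketch, assuming the name follows the pattern above exactly; the helper name `creation_date` and the sample file name are hypothetical and are not part of the DiG distribution.

```python
import re
from datetime import date, datetime

# Pattern taken from the naming scheme described above:
# diseasegroups.mmddyyyy.txt.zip
_NAME_RE = re.compile(r"^diseasegroups\.(\d{8})\.txt\.zip$")


def creation_date(filename: str) -> date:
    """Return the creation date encoded as mmddyyyy in a DiG file name."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a DiG file name: {filename!r}")
    # strptime also rejects impossible dates such as month 13.
    return datetime.strptime(match.group(1), "%m%d%Y").date()


# Made-up file name, used only to illustrate the pattern.
print(creation_date("diseasegroups.06142010.txt.zip"))  # 2010-06-14
```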
Changes from last version
Not applicable
Source data used for this build | http://donaldson.uio.no/wiki/Sources_DiG_1.0 |
Statistics for this release | Not available |
Known Issues
None. First release.
Understanding the DiG format
License
Copyright © 2008, 2009 Ian Donaldson
Citation
DiG is not yet published.
Disclaimer
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Description of the DiG file
Each line in the DiG list represents a single gene and its association with some disease (taken from Morbid Map). Each gene (line) has been assigned a disease group number (column 7). Each group represents a set of phenotypically related diseases as determined by their Morbid Map titles.
Column number: 1
Column name: | title |
Column type: | text, contains multiple fields |
Description: | title column as listed in Morbid Map |
Example: | 17,20-lyase deficiency, isolated, 202110 (3) |
Notes
"17,20-lyase deficiency, isolated" is the disease title
"202110" is the OMIM identifier
(3) is the evidence code (diseasetag)
See http://irefindex/wiki/DiG:_Disease_groups for more information.
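The three fields in the title column can be separated with a regular expression. This is a sketch only: it assumes the OMIM identifier is a six-digit number followed by an evidence code of (1), (2) or (3) at the end of the title, as in the example above. The names `Title` and `parse_title` are hypothetical, and titles that do not follow this layout return `None` rather than a guess.

```python
import re
from typing import NamedTuple, Optional


class Title(NamedTuple):
    disease: str   # disease title
    omim_id: int   # OMIM identifier
    tag: str       # evidence code, parentheses included


# Lazy match on the disease title so that commas inside it are preserved.
_TITLE_RE = re.compile(
    r"^(?P<disease>.*?),\s*(?P<omim>\d{6})\s*(?P<tag>\([123]\))\s*$"
)


def parse_title(title: str) -> Optional[Title]:
    """Split a Morbid Map title into disease title, OMIM id and evidence code."""
    match = _TITLE_RE.match(title)
    if match is None:
        return None
    return Title(match.group("disease"), int(match.group("omim")), match.group("tag"))


print(parse_title("17,20-lyase deficiency, isolated, 202110 (3)"))
```

For the example title, the result holds "17,20-lyase deficiency, isolated", 202110 and "(3)", which correspond to columns 4 and 5 described below.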
Column number: 2
Column name: | genesymbols |
Column type: | text, multiple values comma delimited |
Description: | gene symbols as originally listed in Morbid Map |
Example: | CYP17A1, CYP17, P450C17 |
Notes
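Because this column holds several comma-delimited values, it usually needs to be split before use. A minimal sketch follows; the helper name `split_gene_symbols` is hypothetical.

```python
from typing import List


def split_gene_symbols(field: str) -> List[str]:
    """Split the comma-delimited genesymbols field and trim surrounding spaces."""
    return [symbol.strip() for symbol in field.split(",") if symbol.strip()]


print(split_gene_symbols("CYP17A1, CYP17, P450C17"))
# ['CYP17A1', 'CYP17', 'P450C17']
```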
Column number: 3
Column name: | locus |
Column type: | text |
Description: | gene locus as originally listed in Morbid Map |
Example: | 10q24.3 |
Notes
Column number: 4
Column name: | diseaseomimid |
Column type: | integer |
Description: | OMIMID extracted from title column in Morbid Map (see Column 1 above) |
Example: | 202110 |
Notes
This OMIM identifier usually refers to a record describing a disease phenotype; it is a descriptive entry that does not refer to a unique locus.
See [1].
Column number: 5
Column name: | diseasetag |
Column type: | string ((1), (2) or (3)) |
Description: | evidence code extracted from title column in Morbid Map (see Column 1 above) |
Example: | (3) |
Notes
Only entries with (3) in this column have been mapped to a disease group (see column 7). For an explanation of disease tags, see [2].
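To work only with entries that carry a disease group, rows can be filtered on this column. The sketch below assumes each row has already been split on tabs into a list of nine strings. The function name `grouped_rows` and the sample rows are hypothetical; the sample values are placeholders, not DiG data.

```python
from typing import Iterable, Iterator, List

DISEASETAG_COLUMN = 4  # column 5 in the description, zero-based here


def grouped_rows(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield only the rows whose diseasetag is (3)."""
    for row in rows:
        if row[DISEASETAG_COLUMN].strip() == "(3)":
            yield row


# Placeholder rows: only the diseasetag field matters for this illustration.
sample = [
    ["title a", "GENE1", "1p1", "100001", "(3)", "1", "1", "title a", "200001"],
    ["title b", "GENE2", "2q2", "100002", "(2)", "2", "0", "title b", "200002"],
]
print([row[0] for row in grouped_rows(sample)])  # ['title a']
```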
Column number: 6
Column name: | geneid |
Column type: | integer |
Description: | EntrezGene identifier |
Example: | 64087 |
Notes
Cross-reference to EntrezGene (see http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene). The gene identifiers were mined from the gene_info file at ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz. A zero appears in this column when no mapping to an EntrezGene identifier could be made using the gene names provided by Morbid Map (see Column 2 above).
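Since zero is a placeholder rather than a real identifier, it is safer to convert it to a missing value before joining against other EntrezGene data. A minimal sketch, with a hypothetical helper name:

```python
from typing import Optional


def entrez_gene_id(field: str) -> Optional[int]:
    """Convert the geneid field to an int, mapping the 0 placeholder to None."""
    gene_id = int(field)
    return gene_id if gene_id != 0 else None


print(entrez_gene_id("64087"))  # 64087
print(entrez_gene_id("0"))      # None
```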
Column number: 7
Column name: | digid |
Column type: | integer |
Description: | Disease Group identifier |
Example: | 1 |
Notes
This column is the main purpose of the table. The identifier is not stable between releases of DiG, so it should not be stored as a permanent key. Entries with identical DiG identifiers are deemed to belong to a set of phenotypically-related diseases (and genes).
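Collecting the entries that share an identifier gives the disease groups themselves. The sketch below assumes the digid and title of each entry have already been extracted; the function name `titles_by_group` and the sample pairs are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def titles_by_group(entries: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Collect the titles of all entries that share a digid."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for digid, title in entries:
        groups[digid].append(title)
    return dict(groups)


# Placeholder (digid, title) pairs, not taken from a DiG release.
sample_entries = [(1, "disease a"), (2, "disease b"), (1, "disease c")]
print(titles_by_group(sample_entries))
# {1: ['disease a', 'disease c'], 2: ['disease b']}
```

Because the identifiers change between releases, a grouping built this way is only valid for the file it was read from.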
Column number: 8
Column name: | mantitle |
Column type: | text |
Description: | manually created title |
Example: | 17,20-lyase deficiency, isolated, 202110 (3) |
Notes
In some rare cases, titles provided by Morbid Map could not be properly processed by the text matching process. These titles were manually re-written to avoid these problems. In most cases, the text in this column is identical to that in column 1. See http://irefindex/wiki/DiG:_Disease_groups for details. All these manual changes can be propagated from release to release. Note that the manual titles are not guaranteed to contain OMIM identifiers and evidence codes.
Column number: 9
Column name: | geneomimid |
Column type: | integer |
Description: | OMIM identifier as originally listed in Morbid Map |
Example: | 609300 |
Notes
This OMIM identifier usually refers to a record describing a gene.
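The nine columns described above can be read into typed records as follows. This is a sketch under stated assumptions: the file has no header row, every line has exactly nine tab-separated fields, and the integer columns are never empty. The names `DigRecord` and `read_dig` are hypothetical. The sample line is assembled from the per-column examples in this README and is not an actual line from a DiG release.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, List, TextIO


@dataclass
class DigRecord:
    title: str              # column 1
    genesymbols: List[str]  # column 2, split on commas
    locus: str              # column 3
    diseaseomimid: int      # column 4
    diseasetag: str         # column 5
    geneid: int             # column 6, 0 means no EntrezGene mapping
    digid: int              # column 7
    mantitle: str           # column 8
    geneomimid: int         # column 9


def read_dig(handle: TextIO) -> Iterator[DigRecord]:
    """Yield one DigRecord per line of an unzipped DiG text file."""
    # QUOTE_NONE keeps quote characters in titles as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for number, row in enumerate(reader, start=1):
        if len(row) != 9:
            raise ValueError(f"line {number}: expected 9 columns, got {len(row)}")
        yield DigRecord(
            title=row[0],
            genesymbols=[s.strip() for s in row[1].split(",") if s.strip()],
            locus=row[2],
            diseaseomimid=int(row[3]),
            diseasetag=row[4],
            geneid=int(row[5]),
            digid=int(row[6]),
            mantitle=row[7],
            geneomimid=int(row[8]),
        )


# Sample line built from the column examples above (illustration only).
sample_line = "\t".join([
    "17,20-lyase deficiency, isolated, 202110 (3)",
    "CYP17A1, CYP17, P450C17",
    "10q24.3",
    "202110",
    "(3)",
    "64087",
    "1",
    "17,20-lyase deficiency, isolated, 202110 (3)",
    "609300",
]) + "\n"

records = list(read_dig(io.StringIO(sample_line)))
print(records[0].genesymbols, records[0].digid)
```

To read a real file, unzip it first and pass the open text handle to `read_dig`.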