README DiG 1.0
Latest revision as of 09:25, 30 June 2010
Last edited: June 30, 2010
Applies to Disease Groups (DiG) release: 2.0
Release date: June 14, 2010
Download location: currently not available for download. Contact ian.donaldson@biotek.uio.no
Authors: Katerina Michalickova and Ian Donaldson
Database: DiG (http://donaldson.uio.no/wiki/DiG:_Disease_groups)
Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)
Description
This file describes the contents of the tab-delimited format of the Disease Groups list.
Details on the build process are available from http://donaldson.uio.no/wiki/DiG:_Disease_groups
Contact ian.donaldson at biotek.uio.no if you are interested in using DiG.
Directory contents
README | pointer to this file at http://donaldson.uio.no/wiki/README_DiG_1.0 |
diseasegroups.mmddyyyy.txt.zip | the DiG list |
DiG data consists of one tab-delimited text file with the name diseasegroups.mmddyyyy.txt.zip where mmddyyyy represents the file's creation date.
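The naming convention above can be checked in code. The following is a minimal sketch, not part of the DiG distribution: the function name `creation_date` and the sample file name are assumptions made for illustration only.

```python
import re
from datetime import date, datetime


def creation_date(filename: str) -> date:
    """Return the creation date encoded in a diseasegroups.mmddyyyy.txt.zip name."""
    match = re.fullmatch(r"diseasegroups\.(\d{8})\.txt\.zip", filename)
    if match is None:
        raise ValueError(f"unexpected DiG file name: {filename!r}")
    # mmddyyyy: two-digit month, two-digit day, four-digit year.
    return datetime.strptime(match.group(1), "%m%d%Y").date()


# Hypothetical file name; the real creation date of any release file may differ.
example = creation_date("diseasegroups.06142010.txt.zip")
```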
Changes from last version
Not applicable
Source data used for this build | http://donaldson.uio.no/wiki/Sources_DiG_1.0 |
Statistics for this release | Not available |
Known Issues
None. First release.
Understanding the DiG format
License
Copyright © 2008, 2009 Ian Donaldson
Citation
DiG is not yet published.
Disclaimer
Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Description of the DiG file
Each line in the DiG list represents a single gene and its association with some disease (taken from Morbid Map). Each gene (line) has been assigned a disease group number (column 7). Each group represents a set of phenotypically related diseases as determined by their Morbid Map titles.
Column number: 1
Column name: | title |
Column type: | text, contains multiple fields |
Description: | title column as listed in Morbid Map |
Example: | 17,20-lyase deficiency, isolated, 202110 (3) |
Notes
"17,20-lyase deficiency, isolated" is the disease title
"202110" is the OMIM identifier
(3) is the evidence code (diseasetag)
See http://irefindex/wiki/DiG:_Disease_groups for more information.
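The three fields in a title can be separated with a regular expression. This is a sketch under the assumption that titles follow the "disease title, six-digit OMIM identifier (evidence code)" layout shown in the example. It is not the parser used to build DiG, and the names `ParsedTitle` and `parse_title` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ParsedTitle(NamedTuple):
    disease: str
    omim_id: Optional[int]
    tag: Optional[str]


# Lazy disease text, then an optional ", 202110" and an optional " (3)".
_TITLE_RE = re.compile(
    r"^(?P<disease>.*?)(?:,\s*(?P<omim>\d{6}))?(?:\s*(?P<tag>\(\d\)))?\s*$"
)


def parse_title(title: str) -> ParsedTitle:
    """Split a Morbid Map title into disease title, OMIM id and evidence code."""
    match = _TITLE_RE.match(title)
    assert match is not None  # every part of the pattern is optional
    omim = match.group("omim")
    return ParsedTitle(
        disease=match.group("disease").strip(),
        omim_id=int(omim) if omim is not None else None,
        tag=match.group("tag"),
    )


parsed = parse_title("17,20-lyase deficiency, isolated, 202110 (3)")
```

Titles that lack an identifier or an evidence code yield `None` for the missing field rather than raising an error.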
Column number: 2
Column name: | genesymbols |
Column type: | text, multiple values comma delimited |
Description: | gene symbols as originally listed in Morbid Map |
Example: | CYP17A1, CYP17, P450C17 |
Notes
Column number: 3
Column name: | locus |
Column type: | text |
Description: | gene locus as originally listed in Morbid Map |
Example: | 10q24.3 |
Notes
Column number: 4
Column name: | diseaseomimid |
Column type: | integer |
Description: | OMIMID extracted from title column in Morbid Map (see Column 1 above) |
Example: | 202110 |
Notes
This OMIM identifier usually refers to a record describing a disease phenotype; it is a descriptive entry that does not refer to a unique locus.
See http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim for the OMIM database.
Column number: 5
Column name: | diseasetag |
Column type: | string ((1), (2) or (3)) |
Description: | evidence code extracted from title column in Morbid Map (see Column 1 above) |
Example: | (3) |
Notes
Only entries with (3) in this column have been mapped to a disease group (see column 7). For an explanation of disease tags, see http://www.ncbi.nlm.nih.gov/Omim/omimfaq.html#gene_map_symbols
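Because only (3) entries carry a disease group, a consumer will often want to drop the other rows first. The sketch below assumes each row has already been split on tabs into nine fields in the column order of this README. The function name `mapped_rows` is hypothetical, and the sample rows are illustrative: the first is assembled from the per-column examples in this file and need not match a real DiG line, and the second is a pure placeholder.

```python
from typing import Iterable, Iterator, List

DISEASETAG = 4  # zero-based index of column 5 (diseasetag)


def mapped_rows(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield only the rows whose diseasetag is "(3)"."""
    for row in rows:
        if row[DISEASETAG].strip() == "(3)":
            yield row


rows = [
    # Assembled from the per-column examples in this README.
    ["17,20-lyase deficiency, isolated, 202110 (3)", "CYP17A1, CYP17, P450C17",
     "10q24.3", "202110", "(3)", "64087", "1",
     "17,20-lyase deficiency, isolated, 202110 (3)", "609300"],
    # Placeholder row; the digid value of unmapped rows is not documented here.
    ["Placeholder disease, 000000 (1)", "GENE1", "1p1", "000000", "(1)", "0", "",
     "Placeholder disease, 000000 (1)", "000000"],
]
kept = list(mapped_rows(rows))
```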
Column number: 6
Column name: | geneid |
Column type: | integer |
Description: | EntrezGene identifier |
Example: | 64087 |
Notes
Cross-reference to EntrezGene (see http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene). The gene_info file (ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz) was used to look up gene identifiers. A zero appears in this column when a mapping to an EntrezGene identifier could not be made using the gene names provided by Morbid Map (see Column 2 above).
Column number: 7
Column name: | digid |
Column type: | integer |
Description: | Disease Group identifier |
Example: | 1 |
Notes
This column is the main purpose of the table. The identifier is not stable between releases of DiG. Entries with identical DiG identifiers are deemed to belong to a set of phenotypically related diseases (and genes).
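One way to use this column is to collect the genes that share a disease group. The following sketch assumes rows already split into nine fields in the order given in this README. It skips rows without a (3) evidence code and rows whose geneid is zero. The function name `genes_by_group` and all sample values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def genes_by_group(rows: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Map each digid (column 7) to the set of geneids (column 6) in that group."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        tag, geneid, digid = row[4].strip(), row[5].strip(), row[6].strip()
        if tag != "(3)":
            continue  # only (3) entries are mapped to a disease group
        if geneid == "0":
            continue  # zero means no EntrezGene mapping was found
        groups[digid].add(geneid)
    return dict(groups)


# Invented rows: only columns 5, 6 and 7 matter for this sketch.
sample_rows = [
    ["t1", "G1", "1p1", "100001", "(3)", "101", "1", "t1", "600001"],
    ["t2", "G2", "2q2", "100002", "(3)", "102", "1", "t2", "600002"],
    ["t3", "G3", "3p3", "100003", "(3)", "0", "2", "t3", "600003"],
    ["t4", "G4", "4q4", "100004", "(1)", "104", "", "t4", "600004"],
]
grouped = genes_by_group(sample_rows)
```

Since the identifier is not stable between releases, any such grouping should be rebuilt for each release rather than joined across releases.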
Column number: 8
Column name: | mantitle |
Column type: | text |
Description: | manually created title |
Example: | 17,20-lyase deficiency, isolated, 202110 (3) |
Notes
In rare cases, titles provided by Morbid Map could not be processed correctly by the text-matching step. These titles were manually rewritten to avoid the problem. In most cases, the text in this column is identical to that in column 1. See http://irefindex/wiki/DiG:_Disease_groups for details. These manual changes can be propagated from release to release. Note that the manual titles are not guaranteed to contain OMIM identifiers and evidence codes.
Column number: 9
Column name: | geneomimid |
Column type: | integer |
Description: | OMIM identifier as originally listed in Morbid Map |
Example: | 609300 |
Notes
This OMIM identifier usually refers to a record describing a gene.
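Putting the nine columns together, a reader for the unzipped text file might look like the sketch below. It rests on several assumptions not confirmed by this README: the file has no header row, every line has exactly nine tab-separated fields, and geneid is always an integer. The names `DigRecord` and `read_dig` are hypothetical, and the sample lines are assembled from the per-column examples above, so they need not match real DiG lines.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional

COLUMNS = ("title", "genesymbols", "locus", "diseaseomimid", "diseasetag",
           "geneid", "digid", "mantitle", "geneomimid")


@dataclass
class DigRecord:
    title: str
    genesymbols: List[str]
    locus: str
    diseaseomimid: str  # kept as text; values for unmapped rows are undocumented
    diseasetag: str
    geneid: Optional[int]  # None when the file holds 0 (no EntrezGene mapping)
    digid: str
    mantitle: str
    geneomimid: str


def read_dig(lines: Iterable[str]) -> Iterator[DigRecord]:
    """Parse tab-delimited DiG lines into records (assumes no header row)."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for number, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(
                f"line {number}: expected {len(COLUMNS)} columns, got {len(fields)}"
            )
        row = dict(zip(COLUMNS, fields))
        geneid = int(row["geneid"])
        yield DigRecord(
            title=row["title"],
            genesymbols=[s.strip() for s in row["genesymbols"].split(",") if s.strip()],
            locus=row["locus"],
            diseaseomimid=row["diseaseomimid"],
            diseasetag=row["diseasetag"],
            geneid=geneid if geneid != 0 else None,
            digid=row["digid"],
            mantitle=row["mantitle"],
            geneomimid=row["geneomimid"],
        )


title = "17,20-lyase deficiency, isolated, 202110 (3)"
sample = (
    f"{title}\tCYP17A1, CYP17, P450C17\t10q24.3\t202110\t(3)\t64087\t1\t{title}\t609300\n"
    f"{title}\tCYP17A1, CYP17, P450C17\t10q24.3\t202110\t(3)\t0\t1\t{title}\t609300\n"
)
records = list(read_dig(io.StringIO(sample)))
```

Identifier columns other than geneid are kept as strings here so that unexpected or empty values do not abort parsing.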