README DiG 1.0

From irefindex

Latest revision as of 09:25, 30 June 2010

Last edited: June 30, 2010

Applies to Disease Groups (DiG) release: 2.0

Release date: June 14, 2010

Download location: currently not available for download. Contact ian.donaldson@biotek.uio.no

Authors: Katerina Michalickova and Ian Donaldson

Database: DiG (http://donaldson.uio.no/wiki/DiG:_Disease_groups)

Organization: Biotechnology Centre of Oslo, University of Oslo (http://www.biotek.uio.no/)


Description

This file describes the contents of the tab-delimited format of the Disease Groups list.

Details on the build process are available from http://donaldson.uio.no/wiki/DiG:_Disease_groups


Contact ian.donaldson at biotek.uio.no if you are interested in using DiG.

Directory contents

README: pointer to this file at http://donaldson.uio.no/wiki/README_DiG_1.0
diseasegroups.mmddyyyy.txt.zip: the DiG list

DiG data consists of one tab-delimited text file with the name diseasegroups.mmddyyyy.txt.zip where mmddyyyy represents the file's creation date.
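The naming convention above can be checked mechanically. The following is a minimal sketch, assuming the archive name follows the pattern exactly as written; the function name `creation_date` and the sample file name are hypothetical and used only for illustration.

```python
import re
from datetime import date, datetime

# Assumed pattern: "diseasegroups." + mmddyyyy + ".txt.zip", as described above.
_NAME_RE = re.compile(r"^diseasegroups\.(\d{8})\.txt\.zip$")


def creation_date(filename: str) -> date:
    """Return the creation date encoded in a DiG archive name."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a DiG archive name: {filename!r}")
    # %m%d%Y reads a two-digit month, two-digit day and four-digit year.
    return datetime.strptime(match.group(1), "%m%d%Y").date()


# Hypothetical file name; the real archive name for a release may differ.
print(creation_date("diseasegroups.06142010.txt.zip"))
```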

Changes from last version

Not applicable

Source data used for this build: http://donaldson.uio.no/wiki/Sources_DiG_1.0
Statistics for this release: Not available

Known Issues

None. First release.

Understanding the DiG format

License

Copyright © 2008, 2009 Ian Donaldson

Citation

DiG is not yet published.

Disclaimer

Data is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Description of the DiG file

Each line in the DiG list represents a single gene and its association with some disease (taken from Morbid Map). Each gene (line) has been assigned a disease group number (column 7). Each group represents a set of phenotypically related diseases as determined by their Morbid Map titles.


Column number: 1

Column name: title
Column type: text, contains multiple fields
Description: title column as listed in Morbid Map
Example: 17,20-lyase deficiency, isolated, 202110 (3)

Notes

"17,20-lyase deficiency, isolated" is the disease title

"202110" is the OMIM identifier

(3) is the evidence code (diseasetag)

See http://irefindex/wiki/DiG:_Disease_groups for more information.
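The three parts of the title described in these notes can be separated with a regular expression. This is a sketch only: it assumes the title ends with a six-digit OMIM identifier followed by a single-digit evidence code in parentheses, as in the example. Titles that do not fit that shape return `None`. The names `TitleParts` and `split_title` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TitleParts(NamedTuple):
    disease: str   # disease title
    omim_id: int   # OMIM identifier
    tag: str       # evidence code, e.g. "(3)"


# Assumption: "<disease title>, <six-digit OMIM id> (<digit>)" as in the example.
_TITLE_RE = re.compile(
    r"^(?P<disease>.*?),\s*(?P<omim>\d{6})\s*(?P<tag>\(\d\))\s*$"
)


def split_title(title: str) -> Optional[TitleParts]:
    """Split a Morbid Map title into its parts, or return None if it does not match."""
    match = _TITLE_RE.match(title.strip())
    if match is None:
        return None
    return TitleParts(match.group("disease"), int(match.group("omim")), match.group("tag"))


print(split_title("17,20-lyase deficiency, isolated, 202110 (3)"))
```

The commas inside the disease title are left intact because the pattern only treats the last comma before the identifier as a separator.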

Column number: 2

Column name: genesymbols
Column type: text, multiple values comma delimited
Description: gene symbols as originally listed in Morbid Map
Example: CYP17A1, CYP17, P450C17

Notes
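Because this column holds several comma-delimited values, it usually needs to be split before use. A minimal sketch, assuming symbols never contain commas themselves (the helper name `split_symbols` is hypothetical):

```python
from typing import List


def split_symbols(genesymbols: str) -> List[str]:
    """Split the comma-delimited genesymbols field into individual symbols."""
    # Strip surrounding whitespace and drop empty pieces.
    return [part.strip() for part in genesymbols.split(",") if part.strip()]


print(split_symbols("CYP17A1, CYP17, P450C17"))
```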

Column number: 3

Column name: locus
Column type: text
Description: gene locus as originally listed in Morbid Map
Example: 10q24.3

Notes

Column number: 4

Column name: diseaseomimid
Column type: integer
Description: OMIM identifier extracted from title column in Morbid Map (see Column 1 above)
Example: 202110

Notes

This OMIM identifier usually refers to a record describing a disease phenotype; it is a descriptive entry that does not refer to a unique locus.

See http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim for the OMIM database.

Column number: 5

Column name: diseasetag
Column type: string ((1), (2) or (3))
Description: evidence code extracted from title column in Morbid Map (see Column 1 above)
Example: (3)

Notes

Only entries with (3) in this column have been mapped to a disease group (see column 7). For an explanation of disease tags, see http://www.ncbi.nlm.nih.gov/Omim/omimfaq.html#gene_map_symbols
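Selecting the entries that carry a disease group can be sketched as a simple filter on this column. The example assumes each row has already been split on tabs into a list of nine strings; the function name and the sample rows are invented for illustration and are not taken from the DiG list.

```python
from typing import Iterable, Iterator, Sequence

DISEASETAG = 4  # zero-based index of column 5 (diseasetag)


def with_disease_group(rows: Iterable[Sequence[str]]) -> Iterator[Sequence[str]]:
    """Yield only the rows whose diseasetag is "(3)"."""
    for row in rows:
        if row[DISEASETAG] == "(3)":
            yield row


# Placeholder rows: only the diseasetag field matters for this sketch.
sample = [
    ["title A", "SYM1", "1p1", "100000", "(3)", "1", "1", "title A", "200000"],
    ["title B", "SYM2", "2q2", "100001", "(1)", "2", "0", "title B", "200001"],
]
print([row[0] for row in with_disease_group(sample)])
```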

Column number: 6

Column name: geneid
Column type: integer
Description: EntrezGene identifier
Example: 64087

Notes

Cross-reference to EntrezGene (see http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene). Gene identifiers were looked up in the gene_info file at ftp://ftp.ncbi.nih.gov/gene/DATA/gene_info.gz. A zero appears in this column when a mapping to an EntrezGene identifier could not be made using the gene names provided by Morbid Map (see Column 2 above).
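Since zero is a placeholder rather than a real identifier, it may be convenient to convert it to an explicit missing value when reading the column. A minimal sketch (the helper name `parse_geneid` is hypothetical):

```python
from typing import Optional


def parse_geneid(field: str) -> Optional[int]:
    """Convert the geneid field to an int, treating 0 as 'no mapping found'."""
    value = int(field)
    return None if value == 0 else value


print(parse_geneid("64087"), parse_geneid("0"))
```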

Column number: 7

Column name: digid
Column type: integer
Description: Disease Group identifier
Example: 1

Notes

This column is the main purpose of the table. The identifier is not stable between releases of DiG. Entries with identical DiG identifiers are deemed to belong to a set of phenotypically related diseases (and genes).
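Collecting the entries that share a DiG identifier can be sketched as follows. The example assumes rows are already split into nine string fields and that every row passed in has an integer in column 7; how rows without a disease group are encoded is not stated here, so filter them out first. The function name and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence

DIGID = 6  # zero-based index of column 7 (digid)


def group_by_digid(rows: Iterable[Sequence[str]]) -> Dict[int, List[Sequence[str]]]:
    """Collect rows that share a Disease Group identifier."""
    groups: Dict[int, List[Sequence[str]]] = defaultdict(list)
    for row in rows:
        groups[int(row[DIGID])].append(row)
    return dict(groups)


# Placeholder rows: only the digid field matters for this sketch.
sample = [
    ["title A", "SYM1", "1p1", "100000", "(3)", "1", "1", "title A", "200000"],
    ["title B", "SYM2", "2q2", "100001", "(3)", "2", "1", "title B", "200001"],
    ["title C", "SYM3", "3q3", "100002", "(3)", "3", "2", "title C", "200002"],
]
for digid, members in sorted(group_by_digid(sample).items()):
    print(digid, [row[1] for row in members])
```

Because the identifier is not stable between releases, groups built this way should only be compared within a single release.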


Column number: 8

Column name: mantitle
Column type: text
Description: manually created title
Example: 17,20-lyase deficiency, isolated, 202110 (3)

Notes

In rare cases, titles provided by Morbid Map could not be handled properly by the text-matching step. These titles were manually rewritten to avoid the problem. In most cases, the text in this column is identical to that in column 1. See http://irefindex/wiki/DiG:_Disease_groups for details. These manual changes can be propagated from release to release. Note that the manual titles are not guaranteed to contain OMIM identifiers and evidence codes.


Column number: 9

Column name: geneomimid
Column type: integer
Description: OMIM identifier as originally listed in Morbid Map
Example: 609300

Notes

This OMIM identifier usually refers to a record describing a gene.
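Putting the nine columns together, a reader for the tab-delimited file might look like the sketch below. It is written under several assumptions: the file has already been extracted from the zip archive, it has no header row (this README does not say either way), and every data line has exactly nine fields. All fields are kept as strings to avoid guessing at edge cases. The names `DigRecord` and `read_dig` are hypothetical; the field names are the column names given above.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class DigRecord(NamedTuple):
    title: str
    genesymbols: str
    locus: str
    diseaseomimid: str
    diseasetag: str
    geneid: str
    digid: str
    mantitle: str
    geneomimid: str


def read_dig(handle: TextIO) -> Iterator[DigRecord]:
    """Yield one DigRecord per nine-field line of a tab-delimited DiG file."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != 9:
            continue  # skip blank or unexpected lines in this sketch
        yield DigRecord(*fields)


# Sample line assembled from the per-column examples in this README;
# the values may not all belong to one real entry.
line = "\t".join([
    "17,20-lyase deficiency, isolated, 202110 (3)",
    "CYP17A1, CYP17, P450C17",
    "10q24.3",
    "202110",
    "(3)",
    "64087",
    "1",
    "17,20-lyase deficiency, isolated, 202110 (3)",
    "609300",
]) + "\n"

for record in read_dig(io.StringIO(line)):
    print(record.locus, record.digid, record.geneomimid)
```

Quoting is disabled so that quotation marks inside a title are read literally rather than treated as field delimiters.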