Bioscape Scoring Techniques

From irefindex
[[Category:Bioscape]]
 

Revision as of 13:29, 23 November 2009

This document provides examples of some of the adopted and proposed techniques for scoring source and result data in order to more accurately identify bioentities in the literature.

Confirmation of Mention Suggestions

PubMed #11137999

Method: to be implemented

"Here we report the identification of a new transmembrane serine protease (TMPRSS3; also known as ECHOS1) expressed in many tissues, including fetal cochlea, which is mutated in the families used to describe both the DFNB10 and DFNB8 loci."

( [TMPRSS3] ; also known as [ECHOS1] )

Mention    Suggestions
TMPRSS3    TMPRSS3 (#64699)
           TMPRSS4 (#56649)
ECHOS1     TMPRSS3 (#64699)
  • TMPRSS3 (#64699) occurs for both mentions
  • "also known as" could be used as a key to such situations
  • close proximity of mentions (as seen with acronyms) could be sufficient

Note that the above also involves an acronym.
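
Since this method is still to be implemented, the following is only a minimal sketch of how the "also known as" cue might be used. It assumes mentions arrive as character spans with sets of suggested gene identifiers; the function name confirmed_by_cue, the helper span_of and the cue pattern are hypothetical and do not come from Bioscape.

```python
import re

# Hypothetical cue list: only "also known as" is taken from the example above.
CUE_PATTERN = re.compile(r"also known as", re.IGNORECASE)


def span_of(text, name):
    """Return the (start, end) character span of the first occurrence of name."""
    start = text.index(name)
    return start, start + len(name)


def confirmed_by_cue(text, mentions):
    """Return gene ids shared by adjacent mentions joined by a cue phrase.

    mentions is a list of ((start, end), gene_ids) pairs sorted by start.
    """
    confirmed = set()
    for (span1, genes1), (span2, genes2) in zip(mentions, mentions[1:]):
        between = text[span1[1]:span2[0]]
        if CUE_PATTERN.search(between):
            # A gene suggested on both sides of the cue is supported twice.
            confirmed |= genes1 & genes2
    return confirmed


text = "(TMPRSS3; also known as ECHOS1)"
mentions = [
    (span_of(text, "TMPRSS3"), {64699, 56649}),
    (span_of(text, "ECHOS1"), {64699}),
]
result = confirmed_by_cue(text, mentions)
```

Here result contains only 64699 (TMPRSS3), since TMPRSS4 (#56649) is suggested for one mention alone. A proximity-only variant could drop the cue test and compare span distances instead.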

PubMed #7479798

Method: confirmed_by_competing_names

"Cloning and analysis of the full-length cDNA of the human CSE1 homologue, which we name CAS for cellular apoptosis susceptibility gene, reveals a protein coding region with similar length (971 amino acids for CAS, 960 amino acids for CSE1) and 59% overall protein homology to the yeast CSE1 protein."

human CSE1 homologue, which we name CAS

Name position                  Suggestions
before "which we name"         CSE1L (#1434)
after "which we name" (CAS)    CSE1L (#1434)
                               CTNND1 (#1500)
                               BCAR1 (#9564)
  • CSE1L (#1434) is supported by two different names
  • CTNND1 (#1500) and BCAR1 (#9564) are not supported by any other names
  • the "which we name" text also confirms the equivalence of the two entities
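
The idea behind confirmed_by_competing_names can be sketched as counting how many different names support each suggested gene. This is an illustrative reading of the example, not the Bioscape implementation; the function signature, the min_names threshold and the use of "CSE1" as the first mention's text are assumptions.

```python
from collections import defaultdict


def confirmed_by_competing_names(suggestions_by_name, min_names=2):
    """Return gene ids suggested by at least min_names different names.

    suggestions_by_name maps a mention name to the set of gene ids suggested
    for it. The threshold of two names is an assumption for illustration.
    """
    names_for_gene = defaultdict(set)
    for name, genes in suggestions_by_name.items():
        for gene in genes:
            names_for_gene[gene].add(name)
    return {
        gene for gene, names in names_for_gene.items()
        if len(names) >= min_names
    }


# Suggestions from PubMed #7479798; the key "CSE1" is an assumed mention text.
example = {
    "CSE1": {1434},
    "CAS": {1434, 1500, 9564},
}
confirmed = confirmed_by_competing_names(example)
```

With these inputs only CSE1L (#1434) is confirmed; CTNND1 (#1500) and BCAR1 (#9564) are each supported by a single name.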

Finding Unambiguous Gene Mentions

Several disambiguation techniques depend on an unambiguous gene mention to choose between competing gene suggestions. A reliable method is therefore needed to find "high quality" suggestions that can be treated as unambiguous gene mentions. Consider the following document excerpt:

PubMed #9788873

"In mouse embryo fibroblasts, TCDD activates expression of multiple genes, including CYP1B1, the predominant cytochrome P450 expressed in these cells."

                               Methods
Mentions   Suggestions         unambiguous_at_exact_location   not_part_of_other_mentions
CYP1B1     CYP1B1 (#1545)      X                               X
CYP1       CYP1A1 (#1543)
           CYP2A (#1546)
           CYP27B1 (#1594)
CYP        PPIG (#9360)        X
  • Here, CYP1B1 (#1545) satisfies both methods, so it can be considered an unambiguous gene mention at this location.

Taken together, the two methods above form an "unambiguous gene mention" method, and the fundamental disambiguation technique can be defined in terms of it. However, other methods must also be applied to eliminate obviously bad gene suggestions and so identify good suggestions more reliably.
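
The two checks can be sketched as follows. The definitions are inferred from the CYP1B1 example rather than taken from the Bioscape source: unambiguous_at_exact_location is assumed to mean that exactly one gene is suggested for the mention, and not_part_of_other_mentions that no longer mention covers it. The Mention class and the character offsets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention
    text: str
    genes: frozenset  # gene ids suggested for this mention


def unambiguous_at_exact_location(mention):
    """Assumed meaning: exactly one gene is suggested for this mention."""
    return len(mention.genes) == 1


def not_part_of_other_mentions(mention, all_mentions):
    """Assumed meaning: no strictly longer mention covers this one."""
    length = mention.end - mention.start
    return not any(
        other.start <= mention.start
        and mention.end <= other.end
        and (other.end - other.start) > length
        for other in all_mentions
    )


def unambiguous_gene_mentions(all_mentions):
    """Keep only the mentions that pass both checks."""
    return [
        m for m in all_mentions
        if unambiguous_at_exact_location(m)
        and not_part_of_other_mentions(m, all_mentions)
    ]


# Offsets are relative to the start of "CYP1B1" in the excerpt.
mentions = [
    Mention(0, 6, "CYP1B1", frozenset({1545})),
    Mention(0, 4, "CYP1", frozenset({1543, 1546, 1594})),
    Mention(0, 3, "CYP", frozenset({9360})),
]
selected = [m.text for m in unambiguous_gene_mentions(mentions)]
```

Under these assumptions only CYP1B1 is selected: CYP1 has three suggestions, and CYP, although it has a single suggestion, lies inside the longer CYP1B1 mention.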

Disambiguating using Unambiguous Mentions

Within a document, unambiguous gene mentions can help to disambiguate other mention locations where the unambiguously identified gene is "competing" with other genes, typically because the name used there is ambiguous. For example:

PubMed #10484773

Method: disambiguated_by_unambiguous_gene_mention

"A common genetic variant (V) of the human luteinizing hormone (LH) beta-subunit gene was recently discovered."

of the [human luteinizing hormone (LH) beta]-subunit gene

Mention                                Suggestion
human luteinizing hormone (LH) beta    LHB (#3972)

With this unambiguous mention identified and the presence of the suggested gene confirmed, this knowledge can be applied to other mention locations. For example:

"We have now studied whether additional mutations in the V-LHbeta promoter sequence could contribute to the altered physiology of the LH variant molecules."

in the [V-LHbeta] promoter sequence

Mention     Suggestions
V-LHbeta    LHB (#3972)
            PLOD2 (#5352)
            LHX2 (#9355)

The latter two genes are not unambiguously identified anywhere in the document, whereas the first gene has been (see above). The latter two genes are therefore scored negatively and regarded as not being referenced.

Note that the confirmed_by_competing_names method also resolves these mentions.
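
The disambiguation step described in this section can be sketched as below. The score values, the function name and its signature are assumptions for illustration; the source states only that competing genes are "scored negatively".

```python
def disambiguate_by_unambiguous_mentions(suggestions, unambiguous_genes,
                                         reward=1.0, penalty=-1.0):
    """Score the gene suggestions at one ambiguous mention location.

    suggestions: set of gene ids suggested at this location.
    unambiguous_genes: gene ids identified unambiguously elsewhere in the
    same document. The reward and penalty values are hypothetical.
    """
    confirmed = suggestions & unambiguous_genes
    if not confirmed:
        # Nothing in the document helps here, so leave the scores neutral.
        return {gene: 0.0 for gene in suggestions}
    return {
        gene: (reward if gene in confirmed else penalty)
        for gene in suggestions
    }


# PubMed #10484773: LHB (#3972) is unambiguous at the first mention.
unambiguous = {3972}
scores = disambiguate_by_unambiguous_mentions({3972, 5352, 9355}, unambiguous)
```

For the V-LHbeta mention this gives LHB (#3972) a positive score and PLOD2 (#5352) and LHX2 (#9355) negative scores, matching the outcome described above.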