Bioscape Scoring Techniques

From irefindex
Revision as of 12:54, 23 November 2009 by PaulBoddie (Started a collection of scoring techniques.)

This document collects examples of adopted and proposed techniques for scoring source and result data, so that bioentities mentioned in the literature can be identified more accurately.

Confirmation of Mention Suggestions

PubMed #11137999

Method: to be implemented

"Here we report the identification of a new transmembrane serine protease (TMPRSS3; also known as ECHOS1) expressed in many tissues, including fetal cochlea, which is mutated in the families used to describe both the DFNB10 and DFNB8 loci."

(TMPRSS3; also known as ECHOS1)

Mention    Suggestions
TMPRSS3    TMPRSS3 (#64699), TMPRSS4 (#56649)
ECHOS1     TMPRSS3 (#64699)

  • TMPRSS3 (#64699) is suggested for both mentions
  • the phrase "also known as" could serve as a cue for detecting such cases
  • close proximity of the mentions (as seen with acronyms) might be enough on its own to confirm a shared suggestion

Note that the above also involves an acronym.
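Since this method is still marked "to be implemented", the following Python is only a minimal sketch of the idea, not irefindex code. It assumes a hypothetical `suggestions` mapping from each mention to its (symbol, identifier) suggestions; the functions `linked_mentions` and `confirmed_suggestions` and the parameters `max_gap` (an arbitrary placeholder threshold) and `require_cue` are likewise hypothetical. It keeps only the suggestions shared by two mentions that are joined by "also known as" or, when `require_cue` is false, that are simply close together.

```python
import re

# Hypothetical layout: each mention maps to its (symbol, identifier) suggestions.
Suggestion = tuple[str, int]

# "also known as" is the cue noted above; further cues would be added here.
ALIAS_CUE = re.compile(r"\balso known as\b", re.IGNORECASE)


def linked_mentions(text: str, first: str, second: str,
                    max_gap: int = 30, require_cue: bool = True) -> bool:
    """Decide whether `second` closely follows `first` as an alternative name.

    Only the first occurrence of `first`, and the next occurrence of `second`
    after it, are considered. They must be at most `max_gap` characters apart
    and, if `require_cue` is set, the text between them must contain a cue
    such as "also known as".
    """
    start = text.find(first)
    if start == -1:
        return False
    end_first = start + len(first)
    pos = text.find(second, end_first)
    if pos == -1 or pos - end_first > max_gap:
        return False
    return not require_cue or ALIAS_CUE.search(text[end_first:pos]) is not None


def confirmed_suggestions(text: str, first: str, second: str,
                          suggestions: dict[str, list[Suggestion]],
                          max_gap: int = 30,
                          require_cue: bool = True) -> set[Suggestion]:
    """Return the suggestions shared by two linked mentions (empty if not linked)."""
    if not linked_mentions(text, first, second, max_gap, require_cue):
        return set()
    return set(suggestions[first]) & set(suggestions[second])


if __name__ == "__main__":
    sentence = ("Here we report the identification of a new transmembrane serine "
                "protease (TMPRSS3; also known as ECHOS1) expressed in many tissues")
    suggestions = {
        "TMPRSS3": [("TMPRSS3", 64699), ("TMPRSS4", 56649)],
        "ECHOS1": [("TMPRSS3", 64699)],
    }
    print(confirmed_suggestions(sentence, "TMPRSS3", "ECHOS1", suggestions))
    # {('TMPRSS3', 64699)}
```

Applied to the sentence from PubMed #11137999, only TMPRSS3 (#64699) survives, while TMPRSS4 (#56649) is dropped because only one mention suggests it. Setting `require_cue=False` tests the weaker proximity-only variant.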

PubMed #7479798

Method: confirmed_by_competing_names

"Cloning and analysis of the full-length cDNA of the human CSE1 homologue, which we name CAS for cellular apoptosis susceptibility gene, reveals a protein coding region with similar length (971 amino acids for CAS, 960 amino acids for CSE1) and 59% overall protein homology to the yeast CSE1 protein."

human CSE1 homologue, which we name CAS

Mention    Suggestions
CSE1       CSE1L (#1434)
CAS        CSE1L (#1434), CTNND1 (#1500), BCAR1 (#9564)

  • CSE1L (#1434) is supported by two different names
  • CTNND1 (#1500) and BCAR1 (#9564) are not supported by any other names
  • the phrase "which we name" also confirms that the two names refer to the same entity
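The support counting behind confirmed_by_competing_names might look like the following Python sketch. It is illustrative only, not the irefindex implementation: the data layout and the helper names `support_by_names`, `confirm_by_name_support` and `has_naming_cue` are hypothetical, and the default of two supporting names simply mirrors this example. Each suggestion is mapped to the set of distinct names that propose it, and those backed by at least `min_names` names are kept; a separate check flags explicit naming phrases such as "which we name".

```python
import re
from collections import defaultdict

# Hypothetical layout: (symbol, identifier) pairs as listed on this page.
Suggestion = tuple[str, int]

# "which we name" comes from the example above; "which we call" is an assumed variant.
NAMING_CUE = re.compile(r"\bwhich we (?:name|call)\b", re.IGNORECASE)


def support_by_names(suggestions: dict[str, list[Suggestion]]) -> dict[Suggestion, set[str]]:
    """Map each suggestion to the set of distinct names that propose it."""
    support: dict[Suggestion, set[str]] = defaultdict(set)
    for name, candidates in suggestions.items():
        for candidate in candidates:
            support[candidate].add(name)
    return dict(support)


def confirm_by_name_support(suggestions: dict[str, list[Suggestion]],
                            min_names: int = 2) -> set[Suggestion]:
    """Keep only the suggestions proposed by at least `min_names` different names."""
    return {
        candidate
        for candidate, names in support_by_names(suggestions).items()
        if len(names) >= min_names
    }


def has_naming_cue(text: str) -> bool:
    """Return True if the text contains an explicit naming phrase."""
    return NAMING_CUE.search(text) is not None


if __name__ == "__main__":
    suggestions = {
        "CSE1": [("CSE1L", 1434)],
        "CAS": [("CSE1L", 1434), ("CTNND1", 1500), ("BCAR1", 9564)],
    }
    print(confirm_by_name_support(suggestions))  # {('CSE1L', 1434)}
    print(has_naming_cue("human CSE1 homologue, which we name CAS"))  # True
```

Counting distinct names rather than raw occurrences keeps a single name that is repeated many times from confirming its own suggestions; the naming cue could then act as extra evidence that the supporting names really are equivalent.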