Difference between revisions of "Bioscape Searching Techniques"
PaulBoddie (talk | contribs) m (→PubMed #10484778: Fixed quoted text in table.) |
PaulBoddie (talk | contribs) (Added possible synonym definition technique.) |
||
Line 140: | Line 140: | ||
"chromosome Xp11.2" | "chromosome Xp11.2" | ||
+ | |||
+ | == Synonym Definitions == | ||
+ | |||
+ | === PubMed #10880513 === | ||
+ | |||
+ | "Our previous studies have shown that activation of a '''related adhesion focal tyrosine kinase (RAFTK) (also known as Pyk2)''' is required for dexamethasone (Dex)-induced apoptosis in multiple myeloma (MM) cells and that human interleukin-6 (IL-6), a known growth and survival factor for MM cells, blocks both RAFTK activation and apoptosis induced by Dex." | ||
+ | |||
+ | {| cellspacing="0" cellpadding="5" border="0" style="margin: 2em" | ||
+ | | style="border: 1px solid #000000" | related adhesion focal tyrosine kinase | ||
+ | | ( | ||
+ | | style="border: 1px solid #000000" | RAFTK | ||
+ | | ) ( | ||
+ | | style="border: 1px solid #000000" | also known as | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | Pyk2 | ||
+ | | ) | ||
+ | |- | ||
+ | | suggestion | ||
+ | | | ||
+ | | suggestion | ||
+ | | | ||
+ | | synonym correspondence | ||
+ | | | ||
+ | | suggestion | ||
+ | |} | ||
+ | |||
+ | === PubMed #10910894 === | ||
+ | |||
+ | "Liver-expressed chemokine '''(LEC) is an unusually large CC chemokine, which is also known as LMC, HCC-4, NCC-4, and CCL16'''." | ||
+ | |||
+ | {| cellspacing="0" cellpadding="5" border="0" style="margin: 2em" | ||
+ | | ( | ||
+ | | style="border: 1px solid #000000" | LEC | ||
+ | | ) is an unusually large CC chemokine, which is | ||
+ | | also known as | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | LMC | ||
+ | | , | ||
+ | | style="border: 1px solid #000000" | HCC-4 | ||
+ | | , | ||
+ | | style="border: 1px solid #000000" | NCC-4 | ||
+ | | , and | ||
+ | | style="border: 1px solid #000000" | CCL16 | ||
+ | |- | ||
+ | | | ||
+ | | suggestion | ||
+ | | | ||
+ | | synonym correspondence | ||
+ | | | ||
+ | | suggestion | ||
+ | | | ||
+ | | suggestion | ||
+ | | | ||
+ | | suggestion | ||
+ | | | ||
+ | | suggestion | ||
+ | |- | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | CCL16 (#6360) | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | CCL16 (#6360) | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | CCL16 (#6360)<br>RBMS1 (#5937) | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | CCL16 (#6360) | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | CCL16 (#6360) | ||
+ | |} | ||
+ | |||
+ | === PubMed #11137999 === | ||
+ | |||
+ | "Here we report the identification of a new transmembrane serine protease ('''TMPRSS3; also known as ECHOS1''') expressed in many tissues, including fetal cochlea, which is mutated in the families used to describe both the DFNB10 and DFNB8 loci." | ||
+ | |||
+ | {| cellspacing="0" cellpadding="5" border="0" style="margin: 2em" | ||
+ | | style="border: 1px solid #000000" | TMPRSS3 | ||
+ | | ; | ||
+ | | also known as | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | ECHOS1 | ||
+ | |- | ||
+ | | suggestion | ||
+ | | | ||
+ | | synonym correspondence | ||
+ | | | ||
+ | | suggestion | ||
+ | |- | ||
+ | | style="border: 1px solid #000000" | TMPRSS3 (#64699)<br>TMPRSS4 (#56649) | ||
+ | | | ||
+ | | | ||
+ | | | ||
+ | | style="border: 1px solid #000000" | TMPRSS3 (#64699) | ||
+ | |} | ||
[[Category:Bioscape]] | [[Category:Bioscape]] |
Revision as of 19:31, 4 December 2009
A number of searching techniques are applied to find textual mentions of entities or concepts. Of particular interest are speculative techniques, where no predefined list of search terms is used and certain characteristic patterns are sought in the text instead.
Acronym Mentions
Acronym definition phrases are found by detecting sentences containing brackets, ( and ), and then inspecting such sentences more closely using regular expressions which look for one of the following patterns:
- An acronym-like term (upper-case letters, digits and hyphens) followed by a parenthesis phrase (a phrase in brackets)
- An acronym-like term in brackets, with the preceding text then being considered as the definition or explanation of the acronym
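The two patterns above can be sketched with regular expressions as follows. This is a minimal illustration, not the Bioscape implementation: the names `ACRONYM`, `ACRONYM_THEN_PHRASE`, `PHRASE_THEN_ACRONYM` and `find_acronym_candidates` are hypothetical, and the requirement that an acronym-like term has at least two characters is an assumption.

```python
import re

# Assumption: an acronym-like term is two or more upper-case letters,
# digits or hyphens, starting with a letter or digit.
ACRONYM = r"[A-Z0-9][A-Z0-9-]+"

# Pattern 1: an acronym-like term followed by a phrase in brackets.
ACRONYM_THEN_PHRASE = re.compile(rf"\b({ACRONYM})\s*\(([^()]+)\)")

# Pattern 2: an acronym-like term in brackets; the preceding text is
# the candidate explanation.
PHRASE_THEN_ACRONYM = re.compile(rf"([^()]+?)\s*\(({ACRONYM})\)")


def find_acronym_candidates(sentence):
    """Return (acronym, explanation_text) pairs found in a sentence."""
    # Cheap pre-filter: only sentences containing both brackets are inspected.
    if "(" not in sentence or ")" not in sentence:
        return []
    candidates = []
    for m in ACRONYM_THEN_PHRASE.finditer(sentence):
        candidates.append((m.group(1), m.group(2).strip()))
    for m in PHRASE_THEN_ACRONYM.finditer(sentence):
        candidates.append((m.group(2), m.group(1).strip()))
    return candidates
```

For the second pattern the explanation text is everything preceding the bracket, so a later step still has to decide how much of that text actually explains the acronym.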
Upon identifying a possible acronym and explanation, a test is performed which attempts to match each character of the acronym (letter or digit) with a word from the explanatory text. Although it is tempting to take only the first letter (or digit) of each word, more sophisticated tokenisation may be necessary. Consider the following examples:
PubMed #10639512
"In this study, we isolated and characterized the crucial gene at the breast cancer antiestrogen resistance 1 (BCAR1) locus."
breast cancer antiestrogen resistance 1 | ( | BCAR1 | )
explanation | | acronym |
b, c, a, r, 1 | | BCAR1 |
- explanation initials correspond to acronym
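The simplest form of this test can be sketched as below, assuming whitespace tokenisation and exactly one initial per word. The function names are illustrative only. The check compares the acronym against the initials of the last words of the preceding text, since the explanation is assumed to end immediately before the bracket.

```python
def naive_initials(text):
    """First character of each whitespace-separated word, lower-cased."""
    return [word[0].lower() for word in text.split()]


def initials_match(acronym, preceding_text):
    """True if the initials of the trailing words spell the acronym in order."""
    letters = [c.lower() for c in acronym if c.isalnum()]
    if not letters:
        return False
    initials = naive_initials(preceding_text)
    return len(initials) >= len(letters) and initials[-len(letters):] == letters
```

This naive check accepts the BCAR1 example but rejects cases where the initials are reordered, hidden inside words, or interrupted by words that the acronym ignores.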
PubMed #10484778
"Anhidrotic ectodermal dysplasia (EDA) is a human genetic disorder of impaired ectodermal appendage development."
Anhidrotic ectodermal dysplasia | ( | EDA | )
explanation | | acronym |
a, e, d (not detectable in order) | | EDA |
- presumed explanation initials only correspond to acronym if reordered
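One way to tolerate reordering is to compare the initials and the acronym as multisets rather than as sequences. The source only observes that the initials correspond if reordered; whether such matches should be accepted is not stated, so the following is an assumption, with a hypothetical function name. Ignoring order is looser and is likely to admit more false positives.

```python
from collections import Counter


def initials_match_any_order(acronym, explanation):
    """True if the word initials and the acronym use the same characters,
    the same number of times, regardless of order."""
    letters = Counter(c.lower() for c in acronym if c.isalnum())
    initials = Counter(word[0].lower() for word in explanation.split())
    return letters == initials
```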
PubMed #10484776
"We identified a glyoxylate reductase/hydroxypyruvate reductase (GRHPR) cDNA clone from a human liver expressed sequence tag (EST) library."
glyoxylate reductase/hydroxypyruvate reductase | ( | GRHPR | )
explanation | | acronym |
g, r, h, p (within a word), r (requiring word analysis) | | GRHPR |
- presumed explanation initials only correspond if words are inspected more closely
- words must also be isolated using a more sophisticated tokeniser than one which splits words using whitespace characters
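Both points can be sketched together: a tokeniser which splits on any non-alphanumeric character (so that "/" and "-" separate words), and a matcher which lets a word contribute its initial plus, optionally, later characters from inside the word. The names and the exact matching rule are assumptions for illustration, and the explanation is assumed to have been trimmed to the relevant words already.

```python
import re


def tokenise(text):
    """Split on anything that is not a letter or digit, e.g. spaces, '/', '-'."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def matches_with_inner_letters(acronym, explanation):
    """True if every word contributes its initial, in order, and any
    remaining acronym characters are found inside the words."""
    letters = [c.lower() for c in acronym if c.isalnum()]
    words = tokenise(explanation)

    def match(li, wi):
        # letters[li:] must be explained by words[wi:].
        if wi == len(words):
            return li == len(letters)
        if li == len(letters) or words[wi][0] != letters[li]:
            return False
        return extend(li + 1, wi, 1)

    def extend(li, wi, pos):
        # Either move on to the next word...
        if match(li, wi + 1):
            return True
        # ...or take the next acronym character from inside the current word.
        if li < len(letters):
            idx = words[wi].find(letters[li], pos)
            while idx != -1:
                if extend(li + 1, wi, idx + 1):
                    return True
                idx = words[wi].find(letters[li], idx + 1)
        return False

    return match(0, 0)
```

With whitespace splitting alone, "reductase/hydroxypyruvate" would remain a single word and the "h" could not be treated as an initial.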
PubMed #10226785
"The insulin receptor related receptor (IRR) is a heterotetrameric transmembrane receptor with intrinsic tyrosine kinase activity."
insulin receptor related receptor | ( | IRR | )
explanation | | acronym |
i, r, r (should be ignored), r (requiring stop-word detection) | | IRR |
- presumed explanation initials only correspond if stop-words are discarded
"The IRR shares large homology with the insulin and the insulin-like growth factor-1 (IGF-I) receptor with regard to amino acid sequence and protein structure."
insulin-like growth factor-1 | ( | IGF-I | )
explanation | | acronym |
i, l (should be ignored), g, f, 1 (requiring stop-word detection, numeral conversion) | | IGF-I |
- presumed explanation initials only correspond if stop-words are discarded
- numerals must also be converted so that 1 and I can be matched
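Stop-word removal and numeral conversion might be combined as in the sketch below. The contents of `STOP_WORDS` and `ROMAN` are assumptions chosen to cover the two examples above, not lists taken from Bioscape, and only a trailing hyphen-separated Roman numeral in the acronym is converted.

```python
import re

# Hypothetical stop-word list: words whose initials the acronym ignores.
STOP_WORDS = {"related", "like", "of", "the", "and"}

# Small Roman numeral table so that "I" and "1" can be compared.
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}


def normalise_acronym(acronym):
    """Lower-case the acronym, drop hyphens and convert a trailing
    Roman numeral part, e.g. "IGF-I" -> "igf1"."""
    parts = acronym.lower().split("-")
    if len(parts) > 1 and parts[-1] in ROMAN:
        parts[-1] = ROMAN[parts[-1]]
    return "".join(parts)


def explanation_initials(explanation):
    """Initials of the non-stop-words, with numerals kept as digits."""
    tokens = [t.lower() for t in re.split(r"[^A-Za-z0-9]+", explanation) if t]
    initials = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        if token.isdigit():
            initials.append(token)
        else:
            # A token that is itself a Roman numeral becomes its digit form.
            initials.append(ROMAN.get(token, token[0]))
    return "".join(initials)


def matches_with_stop_words(acronym, explanation):
    return normalise_acronym(acronym) == explanation_initials(explanation)
```

A fixed stop-word list is a blunt instrument: a word such as "related" may be ignored by one acronym and represented in another, so a fuller approach would probably try the match both with and without each candidate stop-word.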
Chromosome and Maplocation Mentions
To be expanded...
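Since the technique for this section is not yet described, the following is only a guess at a suitable pattern, derived from the quoted examples in this section: the word "chromosome", a chromosome number or X/Y, and an optional arm (p or q) with a band such as 13.3. The names `MAPLOCATION` and `find_chromosome_mentions` are hypothetical.

```python
import re

# Assumption: chromosome is 1-2 digits or X/Y; the map location is an
# optional arm letter (p or q) followed by a band, e.g. "p13.3".
MAPLOCATION = re.compile(
    r"\b[Cc]hromosome\s+(\d{1,2}|[XY])([pq]\d+(?:\.\d+)?)?\b"
)


def find_chromosome_mentions(text):
    """Return (chromosome, band) pairs; band is '' when none is given."""
    return [(m.group(1), m.group(2) or "") for m in MAPLOCATION.finditer(text)]
```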
PubMed #10684944
"mouse chromosome 17 and to human chromosome 16p13.3"
PubMed #10639512
"chromosome 16q23.1"
PubMed #10484772
"chromosome Xp11.2"
Synonym Definitions
PubMed #10880513
"Our previous studies have shown that activation of a related adhesion focal tyrosine kinase (RAFTK) (also known as Pyk2) is required for dexamethasone (Dex)-induced apoptosis in multiple myeloma (MM) cells and that human interleukin-6 (IL-6), a known growth and survival factor for MM cells, blocks both RAFTK activation and apoptosis induced by Dex."
related adhesion focal tyrosine kinase | ( | RAFTK | ) ( | also known as | | Pyk2 | )
suggestion | | suggestion | | synonym correspondence | | suggestion |
PubMed #10910894
"Liver-expressed chemokine (LEC) is an unusually large CC chemokine, which is also known as LMC, HCC-4, NCC-4, and CCL16."
( | LEC | ) is an unusually large CC chemokine, which is | also known as | | LMC | , | HCC-4 | , | NCC-4 | , and | CCL16
 | suggestion | | synonym correspondence | | suggestion | | suggestion | | suggestion | | suggestion
 | CCL16 (#6360) | | | | CCL16 (#6360) | | CCL16 (#6360), RBMS1 (#5937) | | CCL16 (#6360) | | CCL16 (#6360)
PubMed #11137999
"Here we report the identification of a new transmembrane serine protease (TMPRSS3; also known as ECHOS1) expressed in many tissues, including fetal cochlea, which is mutated in the families used to describe both the DFNB10 and DFNB8 loci."
TMPRSS3 | ; | also known as | | ECHOS1
suggestion | | synonym correspondence | | suggestion
TMPRSS3 (#64699), TMPRSS4 (#56649) | | | | TMPRSS3 (#64699)
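The "also known as" correspondence in the three examples above can be sketched as a search for the phrase followed by a list of names. The pattern and function names are hypothetical, and the sketch only extracts the names after the phrase: finding the term they are synonyms of (in brackets, before a semicolon, or as the sentence subject) and looking up the gene identifiers are left out.

```python
import re

# Names are assumed to run until the next bracket, full stop or semicolon.
ALSO_KNOWN_AS = re.compile(r"also known as\s+([^().;]+)")

# Separators within the list of names: ",", ", and" or " and ".
NAME_SEPARATOR = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+")


def synonyms_after_also_known_as(sentence):
    """Return the names listed after each "also known as" phrase."""
    names = []
    for m in ALSO_KNOWN_AS.finditer(sentence):
        for part in NAME_SEPARATOR.split(m.group(1).strip()):
            if part:
                names.append(part)
    return names
```

Stopping at a full stop is a simplification: it would truncate names which themselves contain one, and a name list followed by further text without punctuation would be over-captured.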