Represented indicator measurement and corpus distillation on focus species detection

Chih Hsuan Wei, Hung-Yu Kao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In extracting information from the biomedical literature, name disambiguation of domain-specific entities, such as proteins, is one of the most important issues. The highest-dimension entity ambiguity is the species with which an entity is associated. Furthermore, species disambiguation is one of the bottlenecks in inter-species gene name normalization. Detecting the focus species, which is needed to improve species disambiguation, remains a substantial challenge. This study presents a method that addresses this issue. The results include evaluations on all articles from the BioCreative I and II gene normalization (GN) tasks. Our method is robust for all types of articles, particularly those without explicit species entity information. Since our method requires a training corpus from which to derive the indicator vectors, we developed an iterative corpus distillation method to extend the corpus. In the experiments, the proposed method achieved accuracies of 85.64% and 84.32% without species entity information.
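The abstract does not describe the algorithm in detail. The following Python sketch is only a generic illustration of the two ideas it names, not the authors' method: each species gets an indicator vector built from a labelled training corpus, an article's focus species is taken to be the one whose indicator vector is most similar to the article's term vector, and the corpus is extended iteratively by adding unlabelled articles that are classified with high confidence (a simple self-training loop). Whitespace tokenization, raw term counts, cosine similarity, the 0.3 threshold, and all function names are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical sketch only; NOT the method of Wei & Kao (2010).
# Assumptions: lowercase whitespace tokens, raw term counts, cosine similarity,
# and a fixed confidence threshold for adding articles to the corpus.


def term_vector(text: str) -> Counter:
    """Raw term-frequency vector of a text."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_indicators(corpus):
    """Sum the term vectors of all training texts labelled with each species."""
    indicators = {}
    for text, species in corpus:
        indicators.setdefault(species, Counter()).update(term_vector(text))
    return indicators


def detect_focus_species(text, indicators):
    """Return (species, score) for the most similar indicator vector.

    Assumes `indicators` contains at least one species.
    """
    vec = term_vector(text)
    scores = {sp: cosine(vec, ind) for sp, ind in indicators.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def distill(seed, unlabeled, threshold=0.3, max_rounds=5):
    """Self-training loop: grow the labelled corpus with confident predictions."""
    corpus = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Indicator vectors are rebuilt from the enlarged corpus each round.
        indicators = build_indicators(corpus)
        confident, remaining = [], []
        for text in pool:
            species, score = detect_focus_species(text, indicators)
            if score >= threshold:
                confident.append((text, species))
            else:
                remaining.append(text)
        if not confident:  # nothing new passed the threshold; stop early
            break
        corpus.extend(confident)
        pool = remaining
    return corpus
```

With a two-species seed corpus, distill(seed, unlabeled) returns the seed pairs followed by every unlabelled text that scored at or above the threshold, each tagged with its predicted species; texts that never pass the threshold are left out.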

Original language: English
Title of host publication: Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010
Pages: 657-662
Number of pages: 6
DOI: 10.1109/BIBM.2010.5706647
Publication status: Published - 2010 Dec 1
Event: 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010 - Hong Kong, China
Duration: 2010 Dec 18 to 2010 Dec 21

Publication series

Name: Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010



All Science Journal Classification (ASJC) codes

  • Biomedical Engineering
  • Health Informatics

Cite this

Wei, C. H., & Kao, H-Y. (2010). Represented indicator measurement and corpus distillation on focus species detection. In Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010 (pp. 657-662). [5706647] (Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010). https://doi.org/10.1109/BIBM.2010.5706647
@inproceedings{fb1ad48a6bf54033b6bc87a39486b6ec,
title = "Represented indicator measurement and corpus distillation on focus species detection",
author = "Wei, {Chih Hsuan} and Hung-Yu Kao",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/BIBM.2010.5706647",
language = "English",
isbn = "9781424483075",
series = "Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010",
pages = "657--662",
booktitle = "Proceedings - 2010 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2010",

}
