Unsupervised corpus distillation for represented indicator measurement on focus species detection

Chih Hsuan Wei, Hung Yu Kao

Research output: Contribution to journal, Article, peer-review

1 Citation (Scopus)

Abstract

In biomedical text mining, the highest dimension of gene ambiguity is the species with which an entity is associated. Focus species detection is therefore one of the bottlenecks in gene normalisation. This study presents a method that is robust across all types of articles, particularly those without explicit species mentions. Since the method requires a training corpus, we developed an iterative distillation method to extend the corpus. The unsupervised corpus obtained in this way is therefore helpful for focus species detection. In experiments, the proposed method achieved accuracies of 85.64% and 84.32% on datasets with and without species mentions, respectively.

Original language: English
Pages (from-to): 413-426
Number of pages: 14
Journal: International Journal of Data Mining and Bioinformatics
Volume: 8
Issue number: 4
Publication status: Published - 2013

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology (all)
  • Library and Information Sciences
