TY - JOUR
T1 - Unsupervised corpus distillation for represented indicator measurement on focus species detection
AU - Wei, Chih Hsuan
AU - Kao, Hung Yu
PY - 2013
Y1 - 2013
N2 - The gene ambiguity with the highest dimension is the species with which an entity is associated in biomedical text mining. Furthermore, one of the bottlenecks in gene normalisation is focus species detection. This study presents a method which is robust for all types of articles, particularly those without explicit species mentions. Since our method requires a training corpus, we developed an iterative distillation method to extend the corpus. Unsupervised corpus is therefore helpful for the detection of focus species. In experiments, the proposed method achieved a high accuracy of 85.64% and 84.32% in datasets with and without species mentions respectively.
AB - The gene ambiguity with the highest dimension is the species with which an entity is associated in biomedical text mining. Furthermore, one of the bottlenecks in gene normalisation is focus species detection. This study presents a method which is robust for all types of articles, particularly those without explicit species mentions. Since our method requires a training corpus, we developed an iterative distillation method to extend the corpus. Unsupervised corpus is therefore helpful for the detection of focus species. In experiments, the proposed method achieved a high accuracy of 85.64% and 84.32% in datasets with and without species mentions respectively.
UR - http://www.scopus.com/inward/record.url?scp=84885029280&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84885029280&partnerID=8YFLogxK
U2 - 10.1504/IJDMB.2013.056615
DO - 10.1504/IJDMB.2013.056615
M3 - Article
C2 - 24400519
AN - SCOPUS:84885029280
SN - 1748-5673
VL - 8
SP - 413
EP - 426
JO - International Journal of Data Mining and Bioinformatics
JF - International Journal of Data Mining and Bioinformatics
IS - 4
ER -