An enhanced CRF-based Method for Disease Named Entity Recognition and Normalization in Biomedical Literature

  • 李 昕純

Student thesis: Master's Thesis


Diseases play central roles in many areas of biomedical research and healthcare. Consequently, aggregating disease knowledge and treatment research reports has become a critical issue, especially in rapidly growing knowledge bases (e.g., PubMed). A framework for disease named entity recognition and normalization is therefore increasingly important for biomedical text mining. In this work, we not only define five kinds of diversity in disease names but also develop a system, AuDis, for disease mention recognition and normalization in biomedical texts. AuDis uses a second-order conditional random fields (CRFs) model for recognition and refines the results with several customized post-processing steps: abbreviation resolution, consistency improvement, stopword filtering, and adjective reorganization. Furthermore, we use a dictionary-lookup approach to solve the normalization problem, which includes collecting and extending stable medical lexicons. In the official evaluation of the CDR task in BioCreative V, AuDis obtained the best performance (an F-score of 86.46%) among 40 runs (16 unique teams) on disease normalization in the DNER subtask. After the official evaluation, AuDis reached an F-score of 87.26%. These results suggest that AuDis is a high-performance, state-of-the-art system for disease recognition and normalization in biomedical literature.
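The dictionary-lookup normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the AuDis implementation: the lexicon entries, the concept identifiers (`C001`, `C002`), and the helper names `normalize_text` and `normalize_mention` are hypothetical placeholders chosen for illustration. The sketch assumes that a mention is first expanded through a document-level abbreviation map and then matched against a lexicon after simple string normalization.

```python
import re

# Hypothetical toy lexicon: normalized disease name -> concept identifier.
# The identifiers are placeholders, not entries from a real vocabulary.
LEXICON = {
    "breast cancer": "C001",
    "breast neoplasms": "C001",
    "hypertension": "C002",
}


def normalize_text(mention):
    """Lowercase, replace non-alphanumeric runs with a space, and trim."""
    cleaned = re.sub(r"[^a-z0-9]+", " ", mention.lower())
    return cleaned.strip()


def normalize_mention(mention, abbreviations=None):
    """Return the concept identifier for a mention, or None if not found.

    `abbreviations` is an assumed per-document map from a short form to
    its long form (the output of an abbreviation resolution step).
    """
    abbreviations = abbreviations or {}
    # Expand the short form first, so that "BC" can match "breast cancer".
    expanded = abbreviations.get(mention, mention)
    return LEXICON.get(normalize_text(expanded))
```

For example, `normalize_mention("BC", {"BC": "breast cancer"})` and `normalize_mention("Breast-Cancer")` both resolve to the same placeholder identifier, while a mention absent from the lexicon yields `None`.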
Date of Award: 2016 Aug 11
Original language: English
Supervisor: Hung-Yu Kao
