In recent years, natural language processing has faced several obstacles in professional text mining in biomedical fields, because the way it is applied to professional languages differs substantially from the way it is applied to general language. Owing to the lack of training data, powerful deep learning techniques have not been applicable to the small datasets available for highly specific areas of biological research such as gene ontology. The Colorado Richly Annotated Full-Text (CRAFT) corpus used in this study contains 67 full-text documents annotated by biologists. In this research, we aimed to identify the key difficulty of the gene ontology concept recognition task and addressed it using dictionary-matching and machine-learning techniques. Accordingly, the solution was divided into two steps, dictionary matching and machine learning, corresponding to the respective roles of named concepts. In the first step, we reconstructed the gene ontology concepts after mining the named concepts. In the second step, we leveraged this reconstructed data to meet the needs of the proposed hybrid method. The proposed concept recognizer achieved an improvement of approximately 20% in F1-measure compared with the state-of-the-art system, with a precision of 0.804 and a recall of 0.715. This result indicates that the named concept may also be applied to concept recognition in other professional languages.
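For readers who want to relate the reported precision and recall to the F1-measure, the following minimal sketch computes it, assuming the standard balanced F1 (the harmonic mean of precision and recall). The abstract does not state the resulting F1 value, so the number produced here is derived from the two reported figures rather than quoted from the thesis, and the function name `f1_score` is purely illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.804
reported_recall = 0.715

# Derived value, not a figure quoted from the thesis.
derived_f1 = f1_score(reported_precision, reported_recall)
print(f"Derived F1: {derived_f1:.3f}")
```

Under this assumption, the derived F1 is roughly 0.757.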
A Study on Concept Recognition in Biomedical Field Using Gene Ontology as an Example
家融, 楊. (Author). 2019
Student thesis: Doctoral Thesis