DNN-based Ontology Population based on Named Entity Extraction and Active Learning

  • 施 帛辰

Student thesis: Master's Thesis

Abstract

An ontology is a representation used to store knowledge. It can serve as an external resource to improve a system's performance or accuracy in many domains and research areas. Ontology research covers several related areas, including ontology population, ontology enrichment, and inconsistency resolution, which are collectively known as ontology learning. This thesis focuses on ontology population. We aim to develop a system that populates an ontology automatically, without manually defined rules. The work consists of two parts: automatic ontology population with neural networks, and population based on active learning. In the automatic population system, we use a named entity recognition model with an improved character-level embedding to extract terms from a sentence that may be concepts in the ontology. We then use a multi-layer perceptron network to decide the predicate between each pair of named entities. Because the named entities and the predicates depend on each other, we analyze the distribution of entity-predicate combinations and filter out the low-accuracy combinations. For active learning, the study proposes three algorithms: uncertainty estimation, rule matching, and high-correlation evaluation. Uncertainty estimation is used to identify the uncertain triples. The other two methods induce new triples from the certain triples, and the induced triples are then confirmed through active learning. We collected 1,268 documents to evaluate the proposed method. The system automatically populates the ontology from these documents. Topic detection was selected as the task for evaluating the effectiveness of the populated ontology. The experimental results show that the proposed method can extract triples from the documents and that 74.59% of the triples are correct. In addition, the populated triples improve topic detection performance: the accuracy of our system is 2 percent higher than that of the baseline model, Labeled LDA.
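The two filtering ideas in the abstract (keeping only reliable entity-predicate combinations, and separating certain from uncertain triples) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not state which uncertainty measure or thresholds were used, so the entropy criterion, the function names, and the `threshold` and `min_accuracy` parameters below are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of a predicate probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def split_by_uncertainty(candidates, threshold):
    """Split candidate triples into (certain, uncertain) lists.

    candidates: iterable of (triple, probs), where probs is the classifier's
    probability distribution over predicates for that entity pair.
    Assumption: a triple counts as uncertain when the entropy of its
    predicate distribution exceeds `threshold` (a hypothetical criterion).
    """
    certain, uncertain = [], []
    for triple, probs in candidates:
        if entropy(probs) > threshold:
            uncertain.append(triple)
        else:
            certain.append(triple)
    return certain, uncertain


def reliable_combinations(validation, min_accuracy):
    """Return the (subject type, predicate, object type) combinations whose
    accuracy on labeled validation examples is at least `min_accuracy`.

    validation: iterable of (combination, is_correct) pairs.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for combo, is_correct in validation:
        totals[combo] += 1
        hits[combo] += int(is_correct)
    return {c for c in totals if hits[c] / totals[c] >= min_accuracy}
```

In an active learning loop of this shape, the uncertain triples would be the ones sent to a human annotator for confirmation, while the certain triples could be added to the ontology directly or used to induce further candidates.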
Date of Award: 2018 Aug 29
Original language: English
Supervisor: Chung-Hsien Wu
