DNN-based Ontology Population based on Named Entity Extraction and Active Learning

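Translated thesis title: 基於深度神經網路結合命名實體辨別和主動學習於知識本體擴充 (Ontology Population Based on Deep Neural Networks Combined with Named Entity Recognition and Active Learning)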
  • 施 帛辰

Student thesis: Master's Thesis


An ontology is a representation used to store knowledge. It can serve as an external resource to improve a system's performance or accuracy in many domains and research areas. Ontology research covers several related areas, including ontology population, ontology enrichment, and inconsistency resolution; together, these areas are known as ontology learning. This thesis focuses on ontology population. We aim to develop a system that populates the ontology automatically, without requiring rules to be defined manually.

The proposed approach to ontology population has two parts: automatic population of the ontology with neural networks, and population based on active learning. In the automatic population system, we use a named entity recognition model with an improved character-level embedding to extract terms from a sentence that may be concepts in the ontology. We then use a multilayer perceptron network to decide the predicate between each pair of named entities. Because of the dependency relations between the named entities and the predicates, we analyze the distribution between them and, according to this distribution, filter out low-accuracy combinations.

For active learning, the study proposes three algorithms: uncertainty estimation, rule matching, and high-correlation evaluation. Uncertainty estimation identifies the uncertain triples. The other two methods induce new triples from the certain triples, and the induced triples are then confirmed through active learning.

We collected 1,268 documents to evaluate the proposed method, and the system automatically populates the ontology based on these documents. Topic detection is selected as the task for evaluating the effectiveness of the populated ontology. The experimental results show that the proposed method can extract triples from the documents, and 74.59% of the extracted triples are correct. In addition, the populated triples help improve topic detection performance: the accuracy of our system is 2 percent higher than that of the baseline model, Labeled LDA.
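The abstract does not specify the exact filtering rule or uncertainty measure, so the following is only a minimal sketch of how predicted triples might be filtered and triaged for active learning. It assumes that the predicate classifier outputs a probability for each predicate, that a "combination" means a (head entity type, predicate, tail entity type) tuple whose accuracy is measured on a labeled development set, and that uncertainty is scored with prediction entropy. Entropy is a common active-learning choice substituted here and is not necessarily the thesis's measure. All names, entity types, predicates, and thresholds (`Candidate`, `triage`, `min_acc`, `max_entropy`, `PER`/`ORG`/`LOC`) are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate triple between two recognized named entities (hypothetical structure)."""
    head: str
    head_type: str
    tail: str
    tail_type: str
    probs: dict  # predicate -> probability, as output by a (hypothetical) MLP classifier

    def predicate(self) -> str:
        # Take the most probable predicate as the classifier's decision.
        return max(self.probs, key=self.probs.get)


def combination_accuracy(dev_examples):
    """Accuracy of each (head_type, predicate, tail_type) combination on labeled dev data.

    dev_examples: iterable of (head_type, predicted_predicate, tail_type, is_correct).
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for head_type, predicate, tail_type, is_correct in dev_examples:
        key = (head_type, predicate, tail_type)
        totals[key] += 1
        hits[key] += int(is_correct)
    return {key: hits[key] / totals[key] for key in totals}


def entropy(probs):
    """Shannon entropy (natural log) of the predicate distribution; higher means less certain."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def triage(candidates, comb_acc, min_acc=0.5, max_entropy=0.5):
    """Drop low-accuracy combinations, then split the rest into accepted vs. sent to an annotator."""
    accepted, to_query = [], []
    for c in candidates:
        predicate = c.predicate()
        # Combinations never seen on the dev data default to 0.0 accuracy and are filtered out.
        if comb_acc.get((c.head_type, predicate, c.tail_type), 0.0) < min_acc:
            continue
        triple = (c.head, predicate, c.tail)
        if entropy(c.probs) > max_entropy:
            to_query.append(triple)  # uncertain: ask a human to confirm
        else:
            accepted.append(triple)  # certain: add to the ontology directly
    return accepted, to_query


# Toy usage with made-up entity types, predicates, and probabilities.
dev = [
    ("PER", "works_for", "ORG", True),
    ("PER", "works_for", "ORG", True),
    ("PER", "works_for", "ORG", False),
    ("ORG", "located_in", "LOC", True),
    ("PER", "located_in", "ORG", False),
]
comb_acc = combination_accuracy(dev)
candidates = [
    Candidate("Alice", "PER", "Acme", "ORG", {"works_for": 0.95, "located_in": 0.05}),
    Candidate("Acme", "ORG", "Tainan", "LOC", {"located_in": 0.55, "works_for": 0.45}),
    Candidate("Bob", "PER", "Acme", "ORG", {"located_in": 0.9, "works_for": 0.1}),
]
accepted, to_query = triage(candidates, comb_acc)
print("accepted:", accepted)
print("to_query:", to_query)
```

In this toy run, Alice's triple is accepted directly. The near-tie for Acme/Tainan (entropy about 0.69) is sent to the annotator. Bob's triple is dropped because its (PER, located_in, ORG) combination was never correct on the development data. The sketch does not cover rule matching or high-correlation evaluation, the two methods that induce new triples from the certain ones.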
Award date: 29 August 2018
Supervisor: Chung-Hsien Wu