Annotation and verification of sense pools in OntoNotes

Liang Chih Yu, Chung Hsien Wu, Ru Yng Chang, Chao Hong Liu, Eduard Hovy

Research output: Article › Peer-review

12 Citations (Scopus)

Abstract

This paper describes OntoNotes, a multilingual (English, Chinese, and Arabic) corpus with large-scale semantic annotations, including predicate-argument structure, word senses, ontology linking, and coreference. The underlying semantic model of OntoNotes groups word senses into so-called sense pools, i.e., sets of near-synonymous senses of words. Such information is useful for many applications, including query expansion for information retrieval (IR) systems, (near-)duplicate detection for text summarization systems, and alternative word selection for writing support systems. Although a sense pool provides a set of near-synonymous word senses, it does not indicate whether two words in a pool are interchangeable in practical use. This paper therefore devises an unsupervised algorithm that combines Google n-grams with a statistical test to determine whether a word in a pool can be substituted by other words in the same pool. The n-gram features measure the degree of context mismatch caused by a substitution, and the statistical test then determines, based on that degree of mismatch, whether the substitution is adequate. The proposed method is compared with a supervised method, Linear Discriminant Analysis (LDA). Experimental results show that the proposed unsupervised method achieves performance comparable to that of the supervised method.
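The abstract does not specify which n-gram features or which statistical test the authors use, so the following is only a minimal sketch of the general idea under stated assumptions. A small hypothetical bigram table (`TOY_COUNTS`) stands in for Google n-gram counts; the mismatch of a substitution in one context is taken to be the share of attested n-grams around the target word that become unattested after the swap; and a one-sided binomial test (a swapped-in choice, not necessarily the paper's) decides whether mismatches occur significantly more often than an assumed baseline rate `p0`. All function names and parameters (`cutoff`, `p0`, `alpha`) are hypothetical.

```python
from math import comb

# Hypothetical toy bigram counts standing in for Google n-gram frequencies.
TOY_COUNTS = {
    ("a", "big"): 100, ("big", "house"): 50, ("big", "mistake"): 60,
    ("big", "deal"): 80, ("a", "large"): 90, ("large", "house"): 40,
}


def ngram_starts(length, pos, n):
    """Start indices of every n-gram in a `length`-token sentence covering `pos`."""
    return range(max(0, pos - n + 1), min(pos, length - n) + 1)


def context_mismatch(tokens, pos, candidate, counts, n=2):
    """Share of attested n-grams around `pos` that become unattested when the
    word at `pos` is replaced by `candidate`; None if no n-gram is attested."""
    substituted = tokens[:pos] + [candidate] + tokens[pos + 1:]
    attested = missing = 0
    for start in ngram_starts(len(tokens), pos, n):
        if counts.get(tuple(tokens[start:start + n]), 0) == 0:
            continue  # original n-gram unseen: it carries no evidence either way
        attested += 1
        if counts.get(tuple(substituted[start:start + n]), 0) == 0:
            missing += 1
    return None if attested == 0 else missing / attested


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def substitution_test(contexts, candidate, counts, n=2, cutoff=0.5, p0=0.2, alpha=0.05):
    """One-sided binomial test of H0: 'mismatch rate <= p0' over the contexts.
    Returns (adequate, p_value), or None when no context is usable."""
    scores = [s for s in (context_mismatch(t, pos, candidate, counts, n)
                          for t, pos in contexts) if s is not None]
    if not scores:
        return None
    mismatched = sum(1 for s in scores if s >= cutoff)
    p_value = binomial_upper_tail(mismatched, len(scores), p0)
    # The substitution is rejected only if mismatches are significantly frequent.
    return p_value >= alpha, p_value


if __name__ == "__main__":
    contexts = [(["a", "big", "house"], 1), (["a", "big", "mistake"], 1),
                (["a", "big", "deal"], 1)]
    for cand in ("large", "huge"):
        print(cand, substitution_test(contexts, cand, TOY_COUNTS))
```

With these toy counts, "large" mismatches in two of three contexts and is still judged adequate (p ≈ 0.104), while "huge" is unattested everywhere and is rejected (p ≈ 0.008). Framing adequacy as a hypothesis test, rather than a fixed threshold on the raw mismatch score, lets the decision account for how many contexts were actually observed.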

Original language: English
Pages (from - to): 436-447
Number of pages: 12
Journal: Information Processing and Management
Volume: 46
Issue number: 4
Publication status: Published - Jul 2010

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
