OntoNotes: Corpus cleanup of mistaken agreement using word sense disambiguation

Liang Chih Yu, Chung Hsien Wu, Eduard Hovy

Research output: Conference contribution

5 Citations (Scopus)

Abstract

Annotated corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically identify exemplars on which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations that includes word senses, predicate-argument structure, ontology linking, and coreference. To find the mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. Experiments examine the performance of WSD from three aspects: precision, cost-effectiveness ratio, and entropy. The experimental results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, while a baseline is better in other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% remaining erroneous agreements in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
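The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the WSD model, the baseline, or the exact metric definitions, so the instance layout, the `toy_predict` stub, and the reading of precision as "confirmed errors divided by flagged candidates" are all assumptions made here for illustration.

```python
import math
from collections import Counter


def sense_entropy(senses):
    """Shannon entropy (in bits) of a word's annotated sense distribution.

    Used here as a rough proxy for how ambiguous a word is.
    """
    counts = Counter(senses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def suspicious_candidates(instances, predict):
    """Return instances whose annotated sense differs from the WSD prediction.

    `predict` is any callable mapping an instance to a sense label
    (hypothetical interface, not taken from the paper).
    """
    return [inst for inst in instances if predict(inst) != inst["sense"]]


def precision(candidates, is_error):
    """Fraction of flagged candidates that a human reviewer confirms as errors
    (assumed definition)."""
    if not candidates:
        return 0.0
    return sum(1 for c in candidates if is_error(c)) / len(candidates)


# Toy data: three annotated instances of "bank" (invented for illustration).
instances = [
    {"id": 1, "context": "river bank erosion", "sense": "bank.1"},
    {"id": 2, "context": "bank loan approved", "sense": "bank.2"},
    {"id": 3, "context": "bank account opened", "sense": "bank.1"},
]


def toy_predict(inst):
    """Stand-in for a trained WSD classifier (assumption)."""
    return "bank.1" if "river" in inst["context"] else "bank.2"


flagged = suspicious_candidates(instances, toy_predict)
flagged_ids = [inst["id"] for inst in flagged]
ambiguity = sense_entropy([inst["sense"] for inst in instances])
```

In this toy run only instance 3 is flagged, because its annotated sense disagrees with the stub classifier. A human reviewer would then decide whether the annotation or the classifier is wrong, which is the step that precision measures.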

Original language: English
Title of host publication: Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
Pages: 1057-1064
Number of pages: 8
Publication status: Published - 1 Dec 2008
Event: 22nd International Conference on Computational Linguistics, Coling 2008 - Manchester, United Kingdom
Duration: 18 Aug 2008 to 22 Aug 2008

Publication series

Name: Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
1

Other

Other: 22nd International Conference on Computational Linguistics, Coling 2008
Country/Territory: United Kingdom
City: Manchester
Period: 2008-08-18 to 2008-08-22

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Computational Theory and Mathematics
  • Linguistics and Language
