TY - GEN
T1 - OntoNotes: Corpus cleanup of mistaken agreement using word sense disambiguation
T2 - 22nd International Conference on Computational Linguistics, Coling 2008
AU - Yu, Liang Chih
AU - Wu, Chung Hsien
AU - Hovy, Eduard
PY - 2008
Y1 - 2008
N2 - Annotated corpora are only useful if their annotations are consistent. Most large-scale annotation efforts take special measures to reconcile inter-annotator disagreement. To date, however, no one has investigated how to automatically determine exemplars in which the annotators agree but are wrong. In this paper, we use OntoNotes, a large-scale corpus of semantic annotations, including word senses, predicate-argument structure, ontology linking, and coreference. To determine the mistaken agreements in word sense annotation, we employ word sense disambiguation (WSD) to select a set of suspicious candidates for human evaluation. Experiments are conducted along three dimensions (precision, cost-effectiveness ratio, and entropy) to examine the performance of WSD. The experimental results show that WSD is most effective at identifying erroneous annotations for highly ambiguous words, while a baseline is better for other cases. The two methods can be combined to improve the cleanup process. This procedure allows us to find approximately 2% remaining erroneous agreements in the OntoNotes corpus. A similar procedure can easily be defined to check other annotated corpora.
UR - http://www.scopus.com/inward/record.url?scp=77955231415&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955231415&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:77955231415
SN - 9781905593446
T3 - Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
SP - 1057
EP - 1064
BT - Coling 2008 - 22nd International Conference on Computational Linguistics, Proceedings of the Conference
Y2 - 18 August 2008 through 22 August 2008
ER -