A web-based unsupervised algorithm for learning transliteration model to improve translation of low-frequency proper names

Min Shiang Shia, Jiun Hung Lin, Scott Yu, Wen Hsiang Lu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

In machine translation, cross-language information retrieval, and cross-language question answering, translating unknown terms remains a difficult problem. We have previously proposed several effective Web-based term translation extraction methods that exploit Web resources to translate frequent Web query terms. However, many low-frequency unknown terms are still difficult to translate with those methods. Therefore, in this paper we propose a two-stage hybrid translation extraction method, which combines our previous Web-based term translation extraction method with a new Web-based transliteration method, to improve the translation of low-frequency unknown proper names. Additionally, to construct a good-quality transliteration model, we present a Web-based unsupervised learning algorithm that automatically collects diverse English-Chinese transliteration pairs from the Web. Experimental results showed that the new method substantially improves the translation of unknown proper names.
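
The two-stage idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two stages are wired together, so the ordering here (try Web-based extraction first, fall back to transliteration when it returns nothing) is an assumption, and every name below (`two_stage_translate`, `toy_extract`, `toy_transliterate`, the placeholder strings) is hypothetical rather than taken from the paper.

```python
from typing import Callable, Optional

# Hypothetical type: a translator maps a source term to one target-language
# candidate, or None when it finds nothing usable.
Translator = Callable[[str], Optional[str]]


def two_stage_translate(term: str,
                        extract_from_web: Translator,
                        transliterate: Translator) -> Optional[str]:
    """Stage 1: Web-based extraction. Stage 2: transliteration fallback.

    The fallback ordering is an assumption made for illustration only.
    """
    candidate = extract_from_web(term)
    if candidate is not None:
        return candidate
    return transliterate(term)


# Toy stand-ins so the sketch runs; they are NOT the paper's components.
_FREQUENT_TERMS = {"frequent-term": "web-extracted-translation"}


def toy_extract(term: str) -> Optional[str]:
    # Pretend only frequent terms have enough Web evidence to extract from.
    return _FREQUENT_TERMS.get(term)


def toy_transliterate(term: str) -> Optional[str]:
    # Placeholder for a learned transliteration model.
    return f"translit({term})"


if __name__ == "__main__":
    for t in ("frequent-term", "rare-name"):
        print(t, "->", two_stage_translate(t, toy_extract, toy_transliterate))
```

Passing the two stages in as functions keeps the sketch independent of how either one is actually implemented; a real system would more plausibly rank several candidates per stage than return a single string.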

Original language: English
Title of host publication: Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE'05
Pages: 420-425
Number of pages: 6
Publication status: Published - 2005
Event: 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE'05 - Wuhan, China
Duration: 2005 Oct 30 to 2005 Nov 1

Publication series

Name: Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE'05
Volume: 2005

Other

Other: 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering, IEEE NLP-KE'05
Country/Territory: China
City: Wuhan
Period: 05-10-30 to 05-11-01

All Science Journal Classification (ASJC) codes

  • General Engineering
