Anchor text mining for translation of web queries: A transitive translation approach

Wen-Hsiang Lu, Lee Feng Chien, Hsi Jian Lee

Research output: Contribution to journal, Article

54 Citations (Scopus)

Abstract

To discover translation knowledge in diverse data resources on the Web, this article proposes an effective approach to finding translation equivalents of query terms and constructing multilingual lexicons through the mining of Web anchor texts and link structures. Although Web anchor texts are wide-scoped hypertext resources, not every particular pair of languages contains sufficient anchor texts for effective extraction of translations for Web queries. For more generalized applications, the approach is designed based on a transitive translation model. The translation equivalents of a query term can be extracted via its translation in an intermediate language. To reduce interference from translation errors, the approach further integrates a competitive linking algorithm into the process of determining the most probable translation. A series of experiments has been conducted, including performance tests on term translation extraction, cross-language information retrieval, and translation suggestions for practical Web search services, respectively. The obtained experimental results have shown that the proposed approach is effective in extracting translations of unknown queries, is easy to combine with the probabilistic retrieval model to improve the cross-language retrieval performance, and is very useful when the considered language pairs lack a sufficient number of anchor texts. Based on the approach, an experimental system called LiveTrans has been developed for English-Chinese cross-language Web search.
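The transitive step described in the abstract (reaching a target-language translation through an intermediate language) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `transitive_scores`, the placeholder terms (`m1`, `t1`, ...), the toy scores, and the sum-of-products combination rule are all assumptions made for illustration. The competitive linking step is not shown.

```python
from collections import defaultdict


def transitive_scores(src_to_mid, mid_to_tgt):
    """Score target-language candidates for one source term.

    src_to_mid: {intermediate_term: score} for the source term.
    mid_to_tgt: {intermediate_term: {target_term: score}}.
    Assumed combination rule: sum, over intermediate terms, of the
    product of the two scores along each source -> mid -> target path.
    """
    combined = defaultdict(float)
    for mid, s_mid in src_to_mid.items():
        # Intermediate terms with no known target translations add nothing.
        for tgt, s_tgt in mid_to_tgt.get(mid, {}).items():
            combined[tgt] += s_mid * s_tgt
    return dict(combined)


# Toy, made-up scores for a single source term (hypothetical values).
src_to_mid = {"m1": 0.6, "m2": 0.4}
mid_to_tgt = {
    "m1": {"t1": 0.9, "t2": 0.1},
    "m2": {"t1": 0.2, "t2": 0.8},
}

scores = transitive_scores(src_to_mid, mid_to_tgt)
best = max(scores, key=scores.get)  # "t1" ranks first with these toy scores
```

In this sketch a candidate reachable through several intermediate terms accumulates score from each path, which is one simple way to let an intermediate language compensate for sparse direct evidence between the source and target languages.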

Original language: English
Pages (from-to): 242-269
Number of pages: 28
Journal: ACM Transactions on Information Systems
Volume: 22
Issue number: 2
DOI: 10.1145/984321.984324
Publication status: Published - 2004 Apr 1

Fingerprint

Anchors
Query languages
Query
Text mining
World Wide Web
Experiments
Language

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Business, Management and Accounting (all)
  • Computer Science Applications

Cite this

@article{ca487eefcf7846379dcfcfaf01765cb3,
title = "Anchor text mining for translation of web queries: A transitive translation approach",
abstract = "To discover translation knowledge in diverse data resources on the Web, this article proposes an effective approach to finding translation equivalents of query terms and constructing multilingual lexicons through the mining of Web anchor texts and link structures. Although Web anchor texts are wide-scoped hypertext resources, not every particular pair of languages contains sufficient anchor texts for effective extraction of translations for Web queries. For more generalized applications, the approach is designed based on a transitive translation model. The translation equivalents of a query term can be extracted via its translation in an intermediate language. To reduce interference from translation errors, the approach further integrates a competitive linking algorithm into the process of determining the most probable translation. A series of experiments has been conducted, including performance tests on term translation extraction, cross-language information retrieval, and translation suggestions for practical Web search services, respectively. The obtained experimental results have shown that the proposed approach is effective in extracting translations of unknown queries, is easy to combine with the probabilistic retrieval model to improve the cross-language retrieval performance, and is very useful when the considered language pairs lack a sufficient number of anchor texts. Based on the approach, an experimental system called LiveTrans has been developed for English-Chinese cross-language Web search.",
author = "Lu, Wen-Hsiang and Chien, {Lee Feng} and Lee, {Hsi Jian}",
year = "2004",
month = "4",
day = "1",
doi = "10.1145/984321.984324",
language = "English",
volume = "22",
pages = "242--269",
journal = "ACM Transactions on Information Systems",
issn = "1046-8188",
publisher = "Association for Computing Machinery (ACM)",
number = "2"
}

Anchor text mining for translation of web queries: A transitive translation approach. / Lu, Wen-Hsiang; Chien, Lee Feng; Lee, Hsi Jian.

In: ACM Transactions on Information Systems, Vol. 22, No. 2, 01.04.2004, p. 242-269.


TY - JOUR

T1 - Anchor text mining for translation of web queries

T2 - A transitive translation approach

AU - Lu, Wen-Hsiang

AU - Chien, Lee Feng

AU - Lee, Hsi Jian

PY - 2004/4/1

Y1 - 2004/4/1

N2 - To discover translation knowledge in diverse data resources on the Web, this article proposes an effective approach to finding translation equivalents of query terms and constructing multilingual lexicons through the mining of Web anchor texts and link structures. Although Web anchor texts are wide-scoped hypertext resources, not every particular pair of languages contains sufficient anchor texts for effective extraction of translations for Web queries. For more generalized applications, the approach is designed based on a transitive translation model. The translation equivalents of a query term can be extracted via its translation in an intermediate language. To reduce interference from translation errors, the approach further integrates a competitive linking algorithm into the process of determining the most probable translation. A series of experiments has been conducted, including performance tests on term translation extraction, cross-language information retrieval, and translation suggestions for practical Web search services, respectively. The obtained experimental results have shown that the proposed approach is effective in extracting translations of unknown queries, is easy to combine with the probabilistic retrieval model to improve the cross-language retrieval performance, and is very useful when the considered language pairs lack a sufficient number of anchor texts. Based on the approach, an experimental system called LiveTrans has been developed for English-Chinese cross-language Web search.

AB - To discover translation knowledge in diverse data resources on the Web, this article proposes an effective approach to finding translation equivalents of query terms and constructing multilingual lexicons through the mining of Web anchor texts and link structures. Although Web anchor texts are wide-scoped hypertext resources, not every particular pair of languages contains sufficient anchor texts for effective extraction of translations for Web queries. For more generalized applications, the approach is designed based on a transitive translation model. The translation equivalents of a query term can be extracted via its translation in an intermediate language. To reduce interference from translation errors, the approach further integrates a competitive linking algorithm into the process of determining the most probable translation. A series of experiments has been conducted, including performance tests on term translation extraction, cross-language information retrieval, and translation suggestions for practical Web search services, respectively. The obtained experimental results have shown that the proposed approach is effective in extracting translations of unknown queries, is easy to combine with the probabilistic retrieval model to improve the cross-language retrieval performance, and is very useful when the considered language pairs lack a sufficient number of anchor texts. Based on the approach, an experimental system called LiveTrans has been developed for English-Chinese cross-language Web search.

UR - http://www.scopus.com/inward/record.url?scp=3042778919&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=3042778919&partnerID=8YFLogxK

U2 - 10.1145/984321.984324

DO - 10.1145/984321.984324

M3 - Article

AN - SCOPUS:3042778919

VL - 22

SP - 242

EP - 269

JO - ACM Transactions on Information Systems

JF - ACM Transactions on Information Systems

SN - 1046-8188

IS - 2

ER -