Exploiting the web as the multilingual corpus for unknown query translation

Jenq Haur Wang, Jei Wen Teng, Wen-Hsiang Lu, Lee Feng Chien

Research output: Contribution to journal (Article)

19 Citations (Scopus)

Abstract

Users' cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this article, the authors investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. They propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms, and in assisting bilingual lexicon construction for a real digital library system.
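The abstract describes the approach only at a high level: translations for an unknown term are mined from mixed-language search-result pages. The Python sketch below is a rough illustration of that idea. It is not the authors' method, and the paper's actual scoring is not reproduced here. It ranks Chinese character n-grams by how many result snippets that mention an English term also contain them. The function name `candidate_translations`, the English-to-Chinese setting, the made-up snippets and the plain frequency heuristic are all hypothetical assumptions.

```python
import re
from collections import Counter

# Hypothetical assumption: target-language candidates are runs of CJK Unified
# Ideographs (U+4E00..U+9FFF), i.e. an English-to-Chinese setting.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def candidate_translations(term, snippets, min_len=2, max_len=4, top_k=3):
    """Rank CJK n-grams by how many snippets mentioning `term` contain them."""
    counts = Counter()
    for snippet in snippets:
        if term.lower() not in snippet.lower():
            continue  # only mine snippets that mention the source-language term
        grams = set()
        for run in CJK_RUN.findall(snippet):
            for n in range(min_len, max_len + 1):
                for i in range(len(run) - n + 1):
                    grams.add(run[i:i + n])
        counts.update(grams)  # each n-gram counted at most once per snippet
    # Higher snippet frequency first; ties go to the longer, then lexically smaller, string.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


# Made-up snippets standing in for mixed-language search-result pages.
example_snippets = [
    "Yahoo 雅虎 search portal",
    "雅虎 (Yahoo) 奇摩",
    "Yahoo 雅虎新聞",
]

if __name__ == "__main__":
    print(candidate_translations("Yahoo", example_snippets))
```

Here 雅虎 ranks first because it appears in every snippet that mentions "Yahoo". Raw frequency also rewards fragments such as 雅虎新聞 in the lower ranks. A practical system would therefore need a stronger association measure, and none is implied here.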

Original language: English
Pages (from-to): 660-670
Number of pages: 11
Journal: Journal of the American Society for Information Science and Technology
Volume: 57
Issue number: 5
DOIs: 10.1002/asi.20328
Publication status: Published - 2006 Mar 1

Fingerprint

Digital libraries
Query languages
Glossaries
Search engines
information retrieval
dictionary
search engine
technical language
Query
World Wide Web
Query translation
language

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Human-Computer Interaction
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

@article{a40a21bf06324ce5942fb7c2c24f017e,
title = "Exploiting the web as the multilingual corpus for unknown query translation",
abstract = "Users' cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this article, the authors investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. They propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms, and in assisting bilingual lexicon construction for a real digital library system.",
author = "Wang, {Jenq Haur} and Teng, {Jei Wen} and Lu, Wen-Hsiang and Chien, {Lee Feng}",
year = "2006",
month = "3",
day = "1",
doi = "10.1002/asi.20328",
language = "English",
volume = "57",
pages = "660--670",
journal = "Journal of the American Society for Information Science and Technology",
issn = "2330-1635",
publisher = "John Wiley and Sons Ltd",
number = "5",

}

Exploiting the web as the multilingual corpus for unknown query translation. / Wang, Jenq Haur; Teng, Jei Wen; Lu, Wen-Hsiang; Chien, Lee Feng.

In: Journal of the American Society for Information Science and Technology, Vol. 57, No. 5, 01.03.2006, p. 660-670.


TY  - JOUR
T1  - Exploiting the web as the multilingual corpus for unknown query translation
AU  - Wang, Jenq Haur
AU  - Teng, Jei Wen
AU  - Lu, Wen-Hsiang
AU  - Chien, Lee Feng
PY  - 2006/3/1
Y1  - 2006/3/1
N2  - Users' cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this article, the authors investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. They propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms, and in assisting bilingual lexicon construction for a real digital library system.
AB  - Users' cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this article, the authors investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. They propose a Web-based term translation approach to determine effective translations for unknown query terms by mining bilingual search-result pages obtained from a real Web search engine. This approach can enhance the construction of a domain-specific bilingual lexicon and bring multilingual support to a digital library that only has monolingual document collections. Very promising results have been obtained in generating effective translation equivalents for many unknown terms, including proper nouns, technical terms, and Web query terms, and in assisting bilingual lexicon construction for a real digital library system.
UR  - http://www.scopus.com/inward/record.url?scp=33645039047&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33645039047&partnerID=8YFLogxK
U2  - 10.1002/asi.20328
DO  - 10.1002/asi.20328
M3  - Article
VL  - 57
SP  - 660
EP  - 670
JO  - Journal of the American Society for Information Science and Technology
JF  - Journal of the American Society for Information Science and Technology
SN  - 2330-1635
IS  - 5
ER  - 