A sense based similarity measure for cross-lingual documents

Hsun Hui Huang, Horng Chang Yang, Yau Hwang Kuo

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

As cross-lingual information retrieval attracts increasing attention, tools that measure the similarity of documents written in different languages are becoming desirable. Because the way people convey thoughts at the level of abstract concepts differs little, if at all, across the languages they use, the semantic similarity of documents in different languages can be measured from the concepts those documents convey. In this paper, we represent documents by senses to lower the barrier between languages, adopt fuzzy set functions to cope with the inherent fuzziness among senses, and propose two document similarity measures: one based on Tversky's notion of similarity and the other on a widely used information retrieval criterion. We compare their performance experimentally. We focus only on English and Chinese documents, but the proposed approach can be easily extended to documents in other languages.
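The abstract does not give the exact formulas, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes each document has already been mapped to a fuzzy set of language-independent senses (the sense IDs, membership degrees, and the `alpha`/`beta` weights are hypothetical). Tversky's ratio model is evaluated with standard fuzzy operators: `min` for intersection, `min(mu_A, 1 - mu_B)` for difference, and the sigma-count (sum of memberships) for cardinality. Cosine similarity stands in for "the much used information retrieval criterion", which the abstract does not name.

```python
import math
from typing import Dict, Iterable

# Hypothetical document representation: sense ID -> membership degree in [0, 1].
# Assumes senses are language-independent, so English and Chinese documents share IDs.
FuzzySenseSet = Dict[str, float]


def sigma_count(degrees: Iterable[float]) -> float:
    """Fuzzy cardinality (sigma-count): the sum of membership degrees."""
    return sum(degrees)


def tversky_similarity(a: FuzzySenseSet, b: FuzzySenseSet,
                       alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky's ratio model on fuzzy sets; alpha and beta are assumed weights."""
    senses = set(a) | set(b)
    # Fuzzy intersection A ∩ B: min of the two membership degrees.
    common = sigma_count(min(a.get(s, 0.0), b.get(s, 0.0)) for s in senses)
    # Fuzzy differences A - B and B - A: min(mu_A, 1 - mu_B) and vice versa.
    only_a = sigma_count(min(a.get(s, 0.0), 1.0 - b.get(s, 0.0)) for s in senses)
    only_b = sigma_count(min(b.get(s, 0.0), 1.0 - a.get(s, 0.0)) for s in senses)
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom > 0 else 0.0


def cosine_similarity(a: FuzzySenseSet, b: FuzzySenseSet) -> float:
    """Cosine of the angle between two sense-weight vectors (stand-in IR criterion)."""
    dot = sum(w * b.get(s, 0.0) for s, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sense sets for an English and a Chinese document.
    en_doc = {"bank.n.01": 0.8, "money.n.01": 0.6}
    zh_doc = {"bank.n.01": 0.7, "money.n.01": 0.5, "river.n.01": 0.2}
    print(tversky_similarity(en_doc, zh_doc))  # approximately 0.6
    print(cosine_similarity(en_doc, zh_doc))   # approximately 0.974
```

One consequence of the `min(mu_A, 1 - mu_B)` difference operator is that a document with partial memberships is not perfectly similar to itself under the Tversky measure; the choice of difference operator and of `alpha` and `beta` are open design decisions in this sketch.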

Original language: English
Title of host publication: Proceedings - 8th International Conference on Intelligent Systems Design and Applications, ISDA 2008
Pages: 9-13
Number of pages: 5
Publication status: Published - 2008
Event: 8th International Conference on Intelligent Systems Design and Applications, ISDA 2008 - Kaohsiung, Taiwan
Duration: 2008 Nov 26 to 2008 Nov 28

Publication series

Name: Proceedings - 8th International Conference on Intelligent Systems Design and Applications, ISDA 2008
Volume: 1

Other

Other: 8th International Conference on Intelligent Systems Design and Applications, ISDA 2008
Country: Taiwan
City: Kaohsiung
Period: 08-11-26 to 08-11-28

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Control and Systems Engineering
