A fuzzy-rough set based semantic similarity measure between cross-lingual documents

Hsun Hui Huang, Horng Chang Yang, Yau-Hwang Kuo

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

6 Citations (Scopus)

Abstract

As cross-lingual information retrieval attracts increasing attention, tools that measure the similarity of documents written in different languages are becoming desirable. Because people express thoughts at the level of abstract concepts in much the same way regardless of the language they use, the semantic similarity between documents in different languages can be measured on the basis of the concepts the documents convey. In this paper, a novel fuzzy-rough set based method for measuring semantic similarity between cross-lingual (Chinese and English) documents is proposed. With the aid of a bilingual dictionary and WordNet, translation is treated as a word sense disambiguation task, and all the distilled senses are used to construct a fuzzy approximation space by means of a fuzzy partition algorithm. In this fuzzy approximation space, each document is approximated by its fuzzy upper and lower approximations, and the similarity measure is defined on these approximations. The upper and lower approximations correspond, respectively, to the loose (slack) and tight extents of the concepts in the associated document. The method makes it possible to recognize documents whose original texts appear dissimilar but which convey similar concepts.
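The abstract does not give the formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the standard Dubois-Prade fuzzy-rough approximations of a fuzzy set with respect to a fuzzy partition (upper: sup of min; lower: inf of max with the complement), and it uses a fuzzy Jaccard overlap of the concatenated approximation vectors as a hypothetical similarity measure; neither choice is confirmed to be the measure defined in the paper. The function names, concept labels, partition and weights are all hypothetical, and the bilingual dictionary plus WordNet disambiguation step is not shown: each document is assumed to be already mapped to weighted senses.

```python
from typing import Dict, List

# A fuzzy set over concept (sense) identifiers: concept -> membership in [0, 1].
FuzzySet = Dict[str, float]


def upper_approx(doc: FuzzySet, partition: List[FuzzySet]) -> List[float]:
    """Fuzzy upper approximation of `doc` (Dubois-Prade style, an assumption here).

    For each fuzzy class C: sup_u min(mu_C(u), mu_doc(u)).
    Concepts outside C's support contribute 0, so only C's support is scanned.
    """
    return [
        max((min(mu_c, doc.get(u, 0.0)) for u, mu_c in cls.items()), default=0.0)
        for cls in partition
    ]


def lower_approx(doc: FuzzySet, partition: List[FuzzySet]) -> List[float]:
    """Fuzzy lower approximation of `doc` (Dubois-Prade style, an assumption here).

    For each fuzzy class C: inf_u max(1 - mu_C(u), mu_doc(u)).
    Concepts outside C's support contribute 1, so only C's support is scanned.
    """
    return [
        min((max(1.0 - mu_c, doc.get(u, 0.0)) for u, mu_c in cls.items()), default=1.0)
        for cls in partition
    ]


def similarity(doc_a: FuzzySet, doc_b: FuzzySet, partition: List[FuzzySet]) -> float:
    """Hypothetical measure: fuzzy Jaccard overlap of the concatenated
    [upper, lower] approximation vectors of the two documents."""
    a = upper_approx(doc_a, partition) + lower_approx(doc_a, partition)
    b = upper_approx(doc_b, partition) + lower_approx(doc_b, partition)
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    # If both profiles are all zeros there is no evidence of similarity.
    return num / den if den > 0 else 0.0


# Hypothetical fuzzy partition of disambiguated senses (e.g. output of a fuzzy clustering step).
partition = [
    {"bank/finance": 1.0, "money": 0.8, "loan": 0.6},  # finance-like class
    {"bank/river": 1.0, "river": 0.9},                # geography-like class
]

# Hypothetical sense weights, assumed to come from dictionary + WordNet disambiguation.
doc_zh = {"bank/finance": 0.9, "money": 0.7, "loan": 0.5}  # Chinese document, after translation
doc_en = {"bank/finance": 0.8, "money": 0.9}               # English document, same topic
doc_geo = {"bank/river": 0.9, "river": 0.8}                # English document, other topic

print(similarity(doc_zh, doc_en, partition))
print(similarity(doc_zh, doc_geo, partition))
```

With these made-up numbers the Chinese document and the finance-oriented English document share most of their approximation mass (about 0.857), while the river-oriented document shares none (0.0), even though the surface vocabularies differ. Concatenating the upper (loose) and lower (tight) vectors lets both kinds of evidence count equally; a weighted combination of two separate overlaps would be an equally plausible design.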

Original language: English
Title of host publication: 3rd International Conference on Innovative Computing Information and Control, ICICIC'08
DOI: https://doi.org/10.1109/ICICIC.2008.33
Publication status: Published - 2008 Sep 30
Event: 3rd International Conference on Innovative Computing Information and Control, ICICIC'08 - Dalian, Liaoning, China
Duration: 2008 Jun 18 to 2008 Jun 20

Publication series

Name: 3rd International Conference on Innovative Computing Information and Control, ICICIC'08

Other

Other: 3rd International Conference on Innovative Computing Information and Control, ICICIC'08
Country: China
City: Dalian, Liaoning
Period: 08-06-18 to 08-06-20

Fingerprint

Semantics
Glossaries
Information retrieval

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
  • Control and Systems Engineering

Cite this

Huang, H. H., Yang, H. C., & Kuo, Y-H. (2008). A fuzzy-rough set based semantic similarity measure between cross-lingual documents. In 3rd International Conference on Innovative Computing Information and Control, ICICIC'08 [4603271] (3rd International Conference on Innovative Computing Information and Control, ICICIC'08). https://doi.org/10.1109/ICICIC.2008.33
Huang, Hsun Hui ; Yang, Horng Chang ; Kuo, Yau-Hwang. / A fuzzy-rough set based semantic similarity measure between cross-lingual documents. 3rd International Conference on Innovative Computing Information and Control, ICICIC'08. 2008. (3rd International Conference on Innovative Computing Information and Control, ICICIC'08).
@inproceedings{e38b42bb498f4b3bb4e205c5ab3de48b,
title = "A fuzzy-rough set based semantic similarity measure between cross-lingual documents",
author = "Huang, {Hsun Hui} and Yang, {Horng Chang} and Yau-Hwang Kuo",
year = "2008",
month = "9",
day = "30",
doi = "10.1109/ICICIC.2008.33",
language = "English",
isbn = "9780769531618",
series = "3rd International Conference on Innovative Computing Information and Control, ICICIC'08",
booktitle = "3rd International Conference on Innovative Computing Information and Control, ICICIC'08",

}

Huang, HH, Yang, HC & Kuo, Y-H 2008, A fuzzy-rough set based semantic similarity measure between cross-lingual documents. in 3rd International Conference on Innovative Computing Information and Control, ICICIC'08., 4603271, 3rd International Conference on Innovative Computing Information and Control, ICICIC'08, 3rd International Conference on Innovative Computing Information and Control, ICICIC'08, Dalian, Liaoning, China, 08-06-18. https://doi.org/10.1109/ICICIC.2008.33

A fuzzy-rough set based semantic similarity measure between cross-lingual documents. / Huang, Hsun Hui; Yang, Horng Chang; Kuo, Yau-Hwang.

3rd International Conference on Innovative Computing Information and Control, ICICIC'08. 2008. 4603271 (3rd International Conference on Innovative Computing Information and Control, ICICIC'08).

TY - GEN

T1 - A fuzzy-rough set based semantic similarity measure between cross-lingual documents

AU - Huang, Hsun Hui

AU - Yang, Horng Chang

AU - Kuo, Yau-Hwang

PY - 2008/9/30

Y1 - 2008/9/30

UR - http://www.scopus.com/inward/record.url?scp=52449110754&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52449110754&partnerID=8YFLogxK

U2 - 10.1109/ICICIC.2008.33

DO - 10.1109/ICICIC.2008.33

M3 - Conference contribution

SN - 9780769531618

T3 - 3rd International Conference on Innovative Computing Information and Control, ICICIC'08

BT - 3rd International Conference on Innovative Computing Information and Control, ICICIC'08

ER -

Huang HH, Yang HC, Kuo Y-H. A fuzzy-rough set based semantic similarity measure between cross-lingual documents. In 3rd International Conference on Innovative Computing Information and Control, ICICIC'08. 2008. 4603271. (3rd International Conference on Innovative Computing Information and Control, ICICIC'08). https://doi.org/10.1109/ICICIC.2008.33