A similarity measure for text processing

Jung-Yi Jiang, Wen Hao Cheng, Yu Shu Chiou, Shie Jue Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) the feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we treat it as an identity. Experimental results show that our proposed method can work more effectively than others.
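The abstract does not give the formula, so the following is only a minimal sketch of how a three-case measure of this kind could be written. Everything specific in it is an assumption made for illustration: the function name `three_case_similarity`, the parameters `lam` (the fixed penalty for case b) and `sigma` (how quickly case a decays), the Gaussian-shaped decay with a lower bound of 0.5, the choice to let case c contribute nothing, and the final rescaling to [0, 1]. It should not be read as the authors' definition.

```python
import math


def three_case_similarity(x, y, lam=1.0, sigma=1.0):
    """Illustrative three-case similarity between two equal-length vectors.

    lam and sigma are hypothetical parameters, not taken from the abstract.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")

    score = 0.0  # accumulated per-feature contributions
    active = 0   # features present in at least one of the two documents

    for a, b in zip(x, y):
        if a != 0 and b != 0:
            # Case (a): feature present in both documents.
            # The contribution lies in [0.5, 1.0]: 0.5 is the assumed lower
            # bound, and the value shrinks as |a - b| grows.
            score += 0.5 * (1.0 + math.exp(-(((a - b) / sigma) ** 2)))
            active += 1
        elif a != 0 or b != 0:
            # Case (b): feature present in only one document.
            # Fixed penalty; the magnitude of the nonzero value is ignored.
            score -= lam
            active += 1
        # Case (c): feature absent from both documents.
        # Assumed here to contribute nothing to either sum.

    if active == 0:
        # Assumption: two documents with no features at all count as identical.
        return 1.0

    # score / active lies in [-lam, 1]; rescale it to [0, 1].
    return (score / active + lam) / (1.0 + lam)
```

Under these assumptions, appending features that are zero in both vectors leaves the result unchanged, which is one way to read case c, and two documents that share no features score 0 whatever their feature magnitudes.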

Original language: English
Title of host publication: Proceedings of 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011
Pages: 1460-1465
Number of pages: 6
DOIs: https://doi.org/10.1109/ICMLC.2011.6016998
Publication status: Published - 2011 Nov 7
Event: 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011 - Guilin, Guangxi, China
Duration: 2011 Jul 10 to 2011 Jul 13

Publication series

Name: Proceedings - International Conference on Machine Learning and Cybernetics
Volume: 4
ISSN (Print): 2160-133X
ISSN (Electronic): 2160-1348

Other

Other: 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011
Country: China
City: Guilin, Guangxi
Period: 11-07-10 to 11-07-13

Fingerprint

Text processing

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Human-Computer Interaction

Cite this

Jiang, J-Y., Cheng, W. H., Chiou, Y. S., & Lee, S. J. (2011). A similarity measure for text processing. In Proceedings of 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011 (pp. 1460-1465). [6016998] (Proceedings - International Conference on Machine Learning and Cybernetics; Vol. 4). https://doi.org/10.1109/ICMLC.2011.6016998
Jiang, Jung-Yi ; Cheng, Wen Hao ; Chiou, Yu Shu ; Lee, Shie Jue. / A similarity measure for text processing. Proceedings of 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011. 2011. pp. 1460-1465 (Proceedings - International Conference on Machine Learning and Cybernetics).
@inproceedings{7b42f20784bb402b9506daa01a2b4ea5,
title = "A similarity measure for text processing",
abstract = "In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) the feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we treat it as an identity. Experimental results show that our proposed method can work more effectively than others.",
author = "Jiang, Jung-Yi and Cheng, {Wen Hao} and Chiou, {Yu Shu} and Lee, {Shie Jue}",
year = "2011",
month = "11",
day = "7",
doi = "10.1109/ICMLC.2011.6016998",
language = "English",
isbn = "9781457703065",
series = "Proceedings - International Conference on Machine Learning and Cybernetics",
pages = "1460--1465",
booktitle = "Proceedings of 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011",
}

Jiang, J-Y, Cheng, WH, Chiou, YS & Lee, SJ 2011, A similarity measure for text processing. in Proceedings of 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011., 6016998, Proceedings - International Conference on Machine Learning and Cybernetics, vol. 4, pp. 1460-1465, 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011, Guilin, Guangxi, China, 11-07-10. https://doi.org/10.1109/ICMLC.2011.6016998


TY - GEN
T1 - A similarity measure for text processing
AU - Jiang, Jung-Yi
AU - Cheng, Wen Hao
AU - Chiou, Yu Shu
AU - Lee, Shie Jue
PY - 2011/11/7
Y1 - 2011/11/7
N2 - In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) the feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we treat it as an identity. Experimental results show that our proposed method can work more effectively than others.
AB - In this paper, we propose a novel similarity measure for document data processing. For two document vectors, the proposed measure takes three cases into account: a) the feature considered appears in both documents, b) the feature considered appears in only one document, and c) the feature considered appears in neither document. For the first case, we give a lower bound and decrease the similarity according to the difference between the feature values of the two documents. For the second case, we give a fixed value regardless of the magnitude of the feature value. For the last case, we treat it as an identity. Experimental results show that our proposed method can work more effectively than others.
UR - http://www.scopus.com/inward/record.url?scp=80155138569&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80155138569&partnerID=8YFLogxK
U2 - 10.1109/ICMLC.2011.6016998
DO - 10.1109/ICMLC.2011.6016998
M3 - Conference contribution
AN - SCOPUS:80155138569
SN - 9781457703065
T3 - Proceedings - International Conference on Machine Learning and Cybernetics
SP - 1460
EP - 1465
BT - Proceedings of 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011
ER -

Jiang J-Y, Cheng WH, Chiou YS, Lee SJ. A similarity measure for text processing. In Proceedings of 2011 International Conference on Machine Learning and Cybernetics, ICMLC 2011. 2011. p. 1460-1465. 6016998. (Proceedings - International Conference on Machine Learning and Cybernetics). https://doi.org/10.1109/ICMLC.2011.6016998