TY - JOUR
T1 - Applying VSM and LCS to develop an integrated text retrieval mechanism
AU - Tasi, Cheng Shiun
AU - Zang, Yong Ming
AU - Liu, Chien Hung
AU - Huang, Yueh Min
N1 - Funding Information:
The authors thank the National Science Council of the Republic of China for financially supporting this research under Contract Nos. NSC 97-2511-S-006-001-MY3, NSC 98-2631-S-024-001, and NSC 99-2631-S-006-001.
PY - 2012/3
Y1 - 2012/3
N2 - Text retrieval has received considerable attention in computer science. In this field, the most widely adopted similarity technique uses the vector space model (VSM) to weight terms and the Cosine, Jaccard, or Dice measure to compute the similarity between a query and the texts. However, these techniques do not consider the effect of term sequence. In this paper, we propose an integrated text retrieval (ITR) mechanism that takes advantage of both the VSM and the longest common subsequence (LCS) algorithm. The key idea of the ITR mechanism is to use LCS to re-evaluate term weights, so that the sequence and weight relationships between the query and the texts are considered simultaneously. Mathematical analysis shows that the ITR mechanism can increase Jaccard and Dice similarity when a sequential relationship exists between the query and the texts.
AB - Text retrieval has received considerable attention in computer science. In this field, the most widely adopted similarity technique uses the vector space model (VSM) to weight terms and the Cosine, Jaccard, or Dice measure to compute the similarity between a query and the texts. However, these techniques do not consider the effect of term sequence. In this paper, we propose an integrated text retrieval (ITR) mechanism that takes advantage of both the VSM and the longest common subsequence (LCS) algorithm. The key idea of the ITR mechanism is to use LCS to re-evaluate term weights, so that the sequence and weight relationships between the query and the texts are considered simultaneously. Mathematical analysis shows that the ITR mechanism can increase Jaccard and Dice similarity when a sequential relationship exists between the query and the texts.
UR - http://www.scopus.com/inward/record.url?scp=82255175639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=82255175639&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2011.09.039
DO - 10.1016/j.eswa.2011.09.039
M3 - Article
AN - SCOPUS:82255175639
SN - 0957-4174
VL - 39
SP - 3974
EP - 3982
JO - Expert Systems with Applications
JF - Expert Systems with Applications
IS - 4
ER -