Short-text semantic similarity is an essential technique in natural language search and is widely used in social network analysis and opinion mining to uncover previously unknown knowledge. Such similarity measures typically operate on short texts of 10 to 20 words. Like spoken utterances, short texts do not necessarily follow formal grammatical rules. The limited information they contain, together with their syntactic and semantic flexibility, makes similarity difficult to measure. This study therefore designed and tested a part-of-speech-based short-text similarity algorithm to address these problems. The effects of evaluating different parts of speech are discussed in detail. The proposed algorithm achieved its best performance when using word measures corresponding to different parts of speech.
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications