TY - GEN
T1 - Automatic domain-specific sentiment lexicon generation with label propagation
AU - Tai, Yen Jen
AU - Kao, Hung Yu
PY - 2013
Y1 - 2013
N2 - Nowadays, the advance of social media has led to the explosive growth of opinion data. Therefore, sentiment analysis has attracted a lot of attention. Currently, sentiment analysis applications are divided into two main approaches, the lexicon-based approach and the machine-learning approach. However, both of them face the challenge of obtaining a large amount of human-labeled training data and corpora. The lexicon-based approach requires a sentiment lexicon to determine the opinion polarity. There are many existing benchmark sentiment lexicons, but they cannot cover all domain-specific word meanings. Thus, automatic generation of a domain-specific sentiment lexicon becomes an important task. We propose a framework to automatically generate a sentiment lexicon. First, we determine the semantic similarity between two words in the entire unlabeled corpus. We treat the words as nodes and similarities as weighted edges to construct word graphs. A graph-based semi-supervised label propagation method finally assigns the polarity to unlabeled words through the proposed propagation process. Experiments conducted on Twitter microblog data show that our approach performs better than baseline approaches and general-purpose sentiment dictionaries.
AB - Nowadays, the advance of social media has led to the explosive growth of opinion data. Therefore, sentiment analysis has attracted a lot of attention. Currently, sentiment analysis applications are divided into two main approaches, the lexicon-based approach and the machine-learning approach. However, both of them face the challenge of obtaining a large amount of human-labeled training data and corpora. The lexicon-based approach requires a sentiment lexicon to determine the opinion polarity. There are many existing benchmark sentiment lexicons, but they cannot cover all domain-specific word meanings. Thus, automatic generation of a domain-specific sentiment lexicon becomes an important task. We propose a framework to automatically generate a sentiment lexicon. First, we determine the semantic similarity between two words in the entire unlabeled corpus. We treat the words as nodes and similarities as weighted edges to construct word graphs. A graph-based semi-supervised label propagation method finally assigns the polarity to unlabeled words through the proposed propagation process. Experiments conducted on Twitter microblog data show that our approach performs better than baseline approaches and general-purpose sentiment dictionaries.
UR - http://www.scopus.com/inward/record.url?scp=84896814545&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84896814545&partnerID=8YFLogxK
U2 - 10.1145/2539150.2539190
DO - 10.1145/2539150.2539190
M3 - Conference contribution
AN - SCOPUS:84896814545
SN - 9781450321136
T3 - ACM International Conference Proceeding Series
SP - 53
EP - 62
BT - Proceedings - 15th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2013
T2 - 15th International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2013
Y2 - 2 December 2013 through 4 December 2013
ER -