TY - GEN
T1 - Learning question focus and semantically related features from Web search results for Chinese question classification
AU - Lin, Shu Jung
AU - Lu, Wen Hsiang
PY - 2006
Y1 - 2006
N2 - Recently, machine learning techniques such as support vector machines have been employed for question classification. However, these techniques depend heavily on the availability of large amounts of training data and may struggle with the variety of new questions posed by real users on the Web. To mitigate the lack of sufficient training data, we present a simple learning method that explores Web search results to collect more training data automatically from a few seed terms (question answers). In addition, we propose a novel semantically related feature model (SRFM), which uses question focuses and their semantically related features, learned from the larger set of collected training data, to help determine the question type. Our experimental results show that the proposed learning method achieves better classification performance than the bigram language modeling (LM) approach on questions with untrained question focuses.
AB - Recently, machine learning techniques such as support vector machines have been employed for question classification. However, these techniques depend heavily on the availability of large amounts of training data and may struggle with the variety of new questions posed by real users on the Web. To mitigate the lack of sufficient training data, we present a simple learning method that explores Web search results to collect more training data automatically from a few seed terms (question answers). In addition, we propose a novel semantically related feature model (SRFM), which uses question focuses and their semantically related features, learned from the larger set of collected training data, to help determine the question type. Our experimental results show that the proposed learning method achieves better classification performance than the bigram language modeling (LM) approach on questions with untrained question focuses.
UR - http://www.scopus.com/inward/record.url?scp=33751353581&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33751353581&partnerID=8YFLogxK
U2 - 10.1007/11880592_22
DO - 10.1007/11880592_22
M3 - Conference contribution
AN - SCOPUS:33751353581
SN - 3540457801
SN - 9783540457800
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 284
EP - 296
BT - Information Retrieval Technology - Third Asia Information Retrieval Symposium, AIRS 2006, Proceedings
PB - Springer Verlag
T2 - 3rd Asia Information Retrieval Symposium, AIRS 2006
Y2 - 16 October 2006 through 18 October 2006
ER -