TY - GEN
T1 - Multi-label text categorization forecasting probability problem using support vector machine techniques
AU - Chiang, Hui Min
AU - Wang, Tai-Yue
AU - Chiang, Yu Min
PY - 2011/1/1
Y1 - 2011/1/1
N2 - The pervasiveness of information available on the Internet means that increasing numbers of documents must be classified. Text categorization is undertaken not only by domain experts but also by automatic text categorization systems. Therefore, a text categorization system with a multi-label classifier is necessary to process the large number of documents. In this study, a multi-label text categorization system is proposed and developed to classify multi-label documents. Data mapping is performed to transform data from a high-dimensional space to a lower-dimensional space with paired SVM output values, thus lowering the complexity of the computation. A pair-wise comparison approach is applied to set the membership function in each predicted class to judge all possible classified classes. Finally, the overlapped area of two classes is obtained from the decision function to determine where a document is classified. A comparative study is performed on multi-label approaches using Reuters data sets. The results of the empirical experiment indicate that the proposed multi-label text categorization system performs better than other methods in terms of overall performance indices. Additionally, the empirical results indicate that a probability of 0.5 for the model membership function is a good criterion for distinguishing correctly classified documents from incorrectly classified ones.
AB - The pervasiveness of information available on the Internet means that increasing numbers of documents must be classified. Text categorization is undertaken not only by domain experts but also by automatic text categorization systems. Therefore, a text categorization system with a multi-label classifier is necessary to process the large number of documents. In this study, a multi-label text categorization system is proposed and developed to classify multi-label documents. Data mapping is performed to transform data from a high-dimensional space to a lower-dimensional space with paired SVM output values, thus lowering the complexity of the computation. A pair-wise comparison approach is applied to set the membership function in each predicted class to judge all possible classified classes. Finally, the overlapped area of two classes is obtained from the decision function to determine where a document is classified. A comparative study is performed on multi-label approaches using Reuters data sets. The results of the empirical experiment indicate that the proposed multi-label text categorization system performs better than other methods in terms of overall performance indices. Additionally, the empirical results indicate that a probability of 0.5 for the model membership function is a good criterion for distinguishing correctly classified documents from incorrectly classified ones.
UR - http://www.scopus.com/inward/record.url?scp=84879619295&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84879619295&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-19536-5_3
DO - 10.1007/978-3-642-19536-5_3
M3 - Conference contribution
AN - SCOPUS:84879619295
SN - 9783642195358
T3 - Environmental Science and Engineering (Subseries: Environmental Science)
SP - 39
EP - 48
BT - Information Technologies in Environmental Engineering
PB - Kluwer Academic Publishers
T2 - 5th International Symposium on Information Technologies in Environmental Engineering, ITEE 2011
Y2 - 6 July 2011 through 8 July 2011
ER -