Multi-label text categorization forecasting probability problem using support vector machine techniques

Hui Min Chiang, Tai-Yue Wang, Yu Min Chiang

Research output: Conference contribution

Abstract

The pervasiveness of information on the Internet means that ever more documents must be classified. Text categorization is performed not only by domain experts but also by automatic text categorization systems, and a system with a multi-label classifier is needed to process such large numbers of documents. This study develops a multi-label text categorization system for classifying multi-label documents. Data mapping transforms the data from a high-dimensional space to a lower-dimensional space of paired SVM output values, thus lowering the computational complexity. A pairwise comparison approach is used to set a membership function for each predicted class so that all candidate classes can be judged. Finally, the overlapped area of two classes is obtained from the decision function to determine how a document is classified. A comparative study of multi-label approaches is performed on the Reuters data sets. The empirical results indicate that the proposed multi-label text categorization system performs better than the other methods in terms of overall performance indices. The results also indicate that a probability of 0.5 for the model's membership function is a good criterion for distinguishing correctly from incorrectly classified documents.
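
The abstract outlines the method only at a high level. The Python sketch below is one possible reading of it under stated assumptions, not the authors' implementation: it assumes linear one-vs-one SVMs are already trained, treats the vector of their outputs as the lower-dimensional representation, derives a per-class membership by averaging logistic-transformed pairwise outputs (a plain logistic link standing in for a fitted Platt-style calibration), and assigns every class whose membership is at least 0.5. The function names, toy weights, and category labels are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic link; a stand-in (assumption) for a fitted Platt-style calibration."""
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_outputs(x, models):
    """Map a document vector x to the vector of one-vs-one SVM outputs,
    one value f_ab(x) = w_ab . x + b_ab per class pair.
    models[(a, b)] = (w, bias); a positive output favors class a over class b."""
    return {
        pair: sum(wi * xi for wi, xi in zip(w, x)) + bias
        for pair, (w, bias) in models.items()
    }


def class_memberships(scores, classes):
    """Membership of each class = mean probability of winning its pairwise comparisons."""
    memberships = {}
    for c in classes:
        probs = []
        for other in classes:
            if other == c:
                continue
            # Each pair is stored once; flip the sign when c is the second class.
            z = scores[(c, other)] if (c, other) in scores else -scores[(other, c)]
            probs.append(sigmoid(z))
        memberships[c] = sum(probs) / len(probs)
    return memberships


def assign_labels(memberships, threshold=0.5):
    """Multi-label decision: keep every class whose membership reaches the threshold."""
    return sorted(c for c, m in memberships.items() if m >= threshold)


# Hypothetical pre-trained linear pairwise SVMs over a toy 3-term vocabulary.
classes = ["earn", "acq", "trade"]
models = {
    ("earn", "acq"): ([1.0, 0.0, 0.5], 0.0),
    ("earn", "trade"): ([0.5, 0.0, 0.25], 0.0),
    ("acq", "trade"): ([-1.0, 0.0, -1.0], 0.0),
}
x = [1.0, 0.0, 2.0]  # toy term-weight vector for one document

scores = pairwise_outputs(x, models)
memberships = class_memberships(scores, classes)
print(assign_labels(memberships))  # ['earn', 'trade']
```

With a realistic vocabulary, x would hold thousands of term weights while the mapped vector has only K(K-1)/2 entries for K classes; the toy example keeps both sizes at 3 for readability. The sketch omits SVM training and the overlapped-area step described in the abstract.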

Original language: English
Host publication title: Information Technologies in Environmental Engineering
Host publication subtitle: New Trends and Challenges, ITEE 2011
Publisher: Kluwer Academic Publishers
Pages: 39-48
Number of pages: 10
ISBN (Print): 9783642195358
DOIs: https://doi.org/10.1007/978-3-642-19536-5_3
Publication status: Published - 2011 January 1
Event: 5th International Symposium on Information Technologies in Environmental Engineering, ITEE 2011 - Poznan, Poland
Duration: 2011 July 6 to 2011 July 8

Publication series

Name: Environmental Science and Engineering (Subseries: Environmental Science)
ISSN (Print): 1863-5520

Other

Other: 5th International Symposium on Information Technologies in Environmental Engineering, ITEE 2011
Country: Poland
City: Poznan
Period: 2011-07-06 to 2011-07-08


All Science Journal Classification (ASJC) codes

  • Environmental Engineering
  • Information Systems

Cite this

Chiang, H. M., Wang, T-Y., & Chiang, Y. M. (2011). Multi-label text categorization forecasting probability problem using support vector machine techniques. In Information Technologies in Environmental Engineering: New Trends and Challenges, ITEE 2011 (pp. 39-48). (Environmental Science and Engineering (Subseries: Environmental Science)). Kluwer Academic Publishers. https://doi.org/10.1007/978-3-642-19536-5_3