Hierarchically SVM classification based on support vector clustering method and its application to document categorization

Pei Yi Hao, Jung-Hsien Chiang, Yi Kun Tu

Research output: Contribution to journal / Article / Peer-review

89 Citations (Scopus)

Abstract

Automatic categorization of documents into predefined topic hierarchies or taxonomies is a crucial step in knowledge and content management. Standard machine learning techniques such as support vector machines (SVMs) and related large-margin methods have been applied successfully to this task, although they ignore the relationships between classes. In document categorization, moreover, we face a large number of classes and a huge number of relevant features needed to distinguish between them, and the computational cost of training a single classifier for a problem of this size is prohibitive. It has also been observed that obtaining a classifier that discriminates between two groups of classes is much easier than distinguishing among all classes simultaneously. This has prompted substantial research on using hierarchical classifiers to solve single multi-class problems. In this paper, we propose a novel hierarchical classification method that generalizes support vector machine learning. The resulting classifiers are based on the results of the support vector clustering method and are structured to mirror the class hierarchy. On the standard Reuters corpus, the proposed hierarchical SVM classifier achieves higher classification accuracy than previous non-hierarchical SVM classifiers and well-known document categorization systems.
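The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of a hierarchical SVM classifier: each internal node of a class tree trains one binary linear SVM that separates two groups of classes, and prediction descends the tree from the root to a single class. It is not the authors' method. In particular, the support vector clustering step is swapped out for a plain 2-means split of class centroids, and the SVM is a small dual-coordinate-descent linear SVM written from scratch rather than any library's API. All names (`Node`, `split_classes`, `train_linear_svm`) and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def augment(x: Sequence[float]) -> Vector:
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def mean(points: List[Vector]) -> Vector:
    return [sum(col) / len(points) for col in zip(*points)]


def train_linear_svm(X: List[Vector], y: List[int], C: float = 10.0,
                     epochs: int = 500) -> Vector:
    """Soft-margin linear SVM trained by dual coordinate descent.
    X holds bias-augmented vectors, y holds labels in {+1, -1}."""
    w = [0.0] * len(X[0])
    alpha = [0.0] * len(X)
    q = [dot(x, x) for x in X]  # diagonal of the dual Hessian
    for _ in range(epochs):
        for i, x in enumerate(X):
            g = y[i] * dot(w, x) - 1.0  # partial derivative w.r.t. alpha_i
            new = min(max(alpha[i] - g / q[i], 0.0), C)  # clip to [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                alpha[i] = new
                w = [wj + delta * y[i] * xj for wj, xj in zip(w, x)]
    return w


def split_classes(centroids: Dict[str, Vector]) -> Tuple[List[str], List[str]]:
    """Hypothetical stand-in for support vector clustering (NOT SVC):
    2-means over class centroids, seeded with the farthest pair."""
    names = sorted(centroids)
    a, b = max(((p, r) for i, p in enumerate(names) for r in names[i + 1:]),
               key=lambda pr: sq_dist(centroids[pr[0]], centroids[pr[1]]))
    means = [centroids[a], centroids[b]]
    groups: Tuple[List[str], List[str]] = ([a], [b])
    for _ in range(10):
        groups = ([], [])
        for n in names:
            near_first = sq_dist(centroids[n], means[0]) <= sq_dist(centroids[n], means[1])
            groups[0 if near_first else 1].append(n)
        if not groups[0] or not groups[1]:
            return names[:1], names[1:]  # degenerate split: peel off one class
        means = [mean([centroids[n] for n in g]) for g in groups]
    return groups


class Node:
    """One node of the class tree; internal nodes hold a binary SVM."""

    def __init__(self, X: List[Vector], labels: List[str]):
        self.leaf: Optional[str] = None
        classes = sorted(set(labels))
        if len(classes) == 1:
            self.leaf = classes[0]
            return
        by_class: Dict[str, List[Vector]] = {}
        for x, c in zip(X, labels):
            by_class.setdefault(c, []).append(x)
        self.groups = split_classes({c: mean(p) for c, p in by_class.items()})
        left = set(self.groups[0])
        y = [1 if c in left else -1 for c in labels]
        self.w = train_linear_svm([augment(x) for x in X], y)

        def child(keep: set) -> "Node":
            # Recurse on the samples whose class lies in this group.
            return Node([x for x, c in zip(X, labels) if c in keep],
                        [c for c in labels if c in keep])

        self.left = child(left)
        self.right = child(set(self.groups[1]))

    def predict(self, x: Sequence[float]) -> str:
        node = self
        while node.leaf is None:
            go_left = dot(node.w, augment(x)) >= 0.0
            node = node.left if go_left else node.right
        return node.leaf


# Toy 2-D data (hypothetical): four classes forming two well-separated pairs.
centers = {"c0": (5.0, 1.0), "c1": (5.0, -1.0), "c2": (-5.0, 1.0), "c3": (-5.0, -1.0)}
offsets = [(0.0, 0.0), (0.3, 0.0), (-0.3, 0.0), (0.0, 0.3), (0.0, -0.3)]
X = [[cx + dx, cy + dy] for (cx, cy) in centers.values() for dx, dy in offsets]
labels = [name for name in centers for _ in offsets]
tree = Node(X, labels)
print(tree.groups)               # (['c0', 'c1'], ['c2', 'c3'])
print(tree.predict([4.8, 0.9]))  # c0
```

The root SVM only has to separate the group {c0, c1} from {c2, c3}, and each child then solves a smaller two-class problem on a subset of the data. This illustrates why discriminating between two groups of classes can be cheaper than a single flat multi-class classifier; real document vectors would be sparse and high-dimensional rather than 2-D.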

Original language: English
Pages (from-to): 627-635
Number of pages: 9
Journal: Expert Systems With Applications
Volume: 33
Issue number: 3
Publication status: Published - 2007 Oct 1

All Science Journal Classification (ASJC) codes

  • Engineering (all)
  • Computer Science Applications
  • Artificial Intelligence

