A Novel Classification Method Based on a Two-Phase Technique for Learning Imbalanced Text Data

  • 徐 文彥

Student thesis: Doctoral Thesis

Abstract

Imbalanced data has a heavy impact on the performance of learning models. In imbalanced text datasets, minority-class data are often assigned to the majority class, resulting in a loss of minority information and low accuracy. Handling datasets with a high imbalance ratio is therefore a serious challenge. In this project, a two-phase classification method is proposed, aimed at a text learning model that is free of distribution skewness and is tuned to its optimal condition. The proposed method has two core stages: stage one creates a balanced dataset, and stage two classifies the balanced dataset using a symmetric cost-sensitive support vector machine. The learning parameters in both stages are also tuned with a genetic algorithm in order to obtain the optimal model. The Yelp review datasets are used in this study to validate the effectiveness of the proposed method. In addition, four criteria are used to evaluate the proposed method and compare it with other well-known algorithms: Accuracy, F-measure, Adjusted G-mean, and AUC. The experimental results reveal that the new method significantly improves learning performance.
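As an illustration of the evaluation criteria named in the abstract, the following is a minimal sketch, not the thesis's own implementation, of how Accuracy, F-measure, G-mean, and AUC can be computed for a binary task. The function names (`confusion_counts`, `evaluate`, `auc`) and the toy labels are hypothetical, the minority class is assumed to be labeled 1, and the plain G-mean is shown because the abstract does not give the formula of the adjusted variant.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FN, FP, TN, treating `positive` as the minority class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, F-measure, and (plain) G-mean as a dict."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + fp + tn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "f_measure": 2 * precision * recall / denom if denom else 0.0,
        "g_mean": sqrt(recall * specificity),
    }


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (hypothetical): 2 minority and 8 majority examples,
    # with a classifier that always predicts the majority class.
    y_true = [1, 1] + [0] * 8
    y_pred = [0] * 10
    print(evaluate(y_true, y_pred))
```

On this toy data, a classifier that always predicts the majority class still reaches 0.8 accuracy while its F-measure and G-mean are both zero, which is why criteria beyond accuracy are reported for imbalanced data.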
Date of Award: 2020
Original language: English
Supervisor: Der-Chiang Li
