A Local Information Based Synthetic Minority Oversampling Technique for Imbalanced Dataset Learning

  • 廖 書緯

Student thesis: Doctoral Thesis


A dataset is imbalanced if its classes are not approximately equally represented. Data mining on imbalanced datasets has received increasing attention in recent years. The class imbalance problem occurs when one class has only a few samples compared with the other classes. The Synthetic Minority Over-sampling Technique (SMOTE) is an effective method for the imbalanced learning problem. It takes one minority sample as the seed sample and picks a nearby minority sample as the selected sample; a virtual sample is then generated between these two minority samples. In this study, we consider both the influence between majority samples and the selected sample and the influence between minority samples and the selected sample, and we develop a new sample-generating procedure that uses local majority class information and local minority class information. Four datasets taken from the UCI Machine Learning Repository are used in the experiments. We compare the proposed method with SMOTE and several of its extensions, including Borderline-SMOTE1 (B1-SMOTE), Safe-Level SMOTE (SL-SMOTE), Local-Neighborhood SMOTE (LN-SMOTE), and ADASYN. The results show that, when the datasets are examined with C4.5 decision trees, the proposed method achieves better classifier performance for the minority class than the other methods.
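The seed-sample and selected-sample step described in the abstract can be illustrated with a minimal sketch of basic SMOTE-style interpolation. This is not the procedure proposed in the thesis: it ignores majority-class information entirely, and the function name `smote_sample`, its parameters (`k`, `n_new`, `seed`), and the toy data are assumptions made only for illustration.

```python
import math
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Generate n_new synthetic points by basic SMOTE-style interpolation.

    Hypothetical helper for illustration; not the thesis's proposed method.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a seed sample from the minority class.
        i = rng.randrange(len(minority))
        seed_pt = minority[i]
        # Rank the other minority samples by Euclidean distance to the seed.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(seed_pt, p))
        # Choose the selected sample among the k nearest minority neighbors.
        selected = rng.choice(others[:k])
        # Place the virtual sample at a random position on the segment
        # between the seed sample and the selected sample.
        gap = rng.random()
        synthetic.append(
            tuple(s + gap * (t - s) for s, t in zip(seed_pt, selected))
        )
    return synthetic


# Toy two-dimensional minority class (made-up values).
minority = [(1.0, 1.0), (1.5, 1.2), (2.0, 0.8), (1.2, 1.8)]
new_points = smote_sample(minority)
```

Because every virtual sample lies on a segment between two minority samples, basic SMOTE never looks at where the majority samples are; the abstract's stated motivation is to bring that local majority and minority information into the generation step.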
Date of Award: 2019
Original language: English
Supervisor: Der-Chiang Li
