Learning class-imbalanced data with region-impurity synthetic minority oversampling technique

Der Chiang Li, Ssu Yang Wang, Kuan Cheng Huang, Tung I. Tsai

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

Learning from class-imbalanced data is a difficult task, and classifiers trained on such data often fail to identify the minority class. To balance the class ratio, the synthetic minority oversampling technique (SMOTE) generates synthetic minority instances and has been shown to improve minority-class classification. However, in some scenarios SMOTE and its extensions generate noisy instances, which degrades performance. This is because they are built on kNN (k nearest neighbors), which cannot identify the class distribution between a pair of minority instances. Furthermore, the number of synthetic instances to generate remains an open question in this field. To address these issues, we propose a new algorithm named Region-Impurity Synthetic Minority Oversampling Technique (RIOT). Specifically, using a region radius, we locate the neighbors of each minority instance, use the class ratio within the region to identify the relatively hard-to-learn minority instances, and select these as the base for sample generation. Synthetic instances are then generated until the region is approximately balanced. In experiments on twelve real-world datasets, RIOT performed better than several SMOTE extensions on several model performance indicators while using fewer synthetic instances.
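The region-based idea in the abstract can be illustrated with a minimal sketch. This is not the published RIOT algorithm: the abstract does not give the selection rule or the generation step, so the function name `region_oversample`, the rule "a region is hard when majority instances outnumber minority instances", the choice to skip isolated minority instances, and the SMOTE-style interpolation are all assumptions made for illustration.

```python
import math
import random


def region_oversample(X, y, minority_label, radius, seed=0):
    """Illustrative region-based oversampling (assumed details, not RIOT itself)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    synthetic = []
    for i in minority_idx:
        # Region: every instance within `radius` of the minority instance (itself included).
        region = [k for k in range(len(X)) if math.dist(X[i], X[k]) <= radius]
        partners = [k for k in region if y[k] == minority_label and k != i]
        n_min = len(partners) + 1          # minority count in the region
        n_maj = len(region) - n_min        # majority count in the region
        # Assumed rule: only majority-dominated regions count as hard to learn.
        # Assumed rule: a minority instance with no minority neighbor is skipped.
        if n_maj <= n_min or not partners:
            continue
        # Generate until the region's class counts are approximately balanced.
        for _ in range(n_maj - n_min):
            j = rng.choice(partners)
            gap = rng.random()
            # Interpolate between the base instance and a minority neighbor.
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    X_new = [list(p) for p in X] + synthetic
    y_new = list(y) + [minority_label] * len(synthetic)
    return X_new, y_new
```

Under these assumptions the number of synthetic instances is driven by the local class counts rather than by a global oversampling rate, which is the contrast with kNN-based SMOTE that the abstract draws.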

Original language: English
Pages (from-to): 1391-1407
Number of pages: 17
Journal: Information Sciences
Volume: 607
Publication status: Published - Aug 2022

All Science Journal Classification (ASJC) codes

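  • Theoretical Computer Science
  • Software
  • Control and Systems Engineering
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence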
