Chain based sampling for monotonic imbalanced classification

Sergio González, Salvador García, Sheng-Tun Li, Francisco Herrera

Research output: Contribution to journal (Article)

Abstract

Classification with monotonic constraints arises in some ordinal real-life problems. In these problems, it is common to find a large difference between the number of instances representing the middle-ranked classes and the number representing the top classes, because the former usually represent the average or normal case, while the latter are exceptional and uncommon. This is known as the class imbalance problem, and it hinders the learning of the under-represented classes. However, traditional solutions cannot be used in applications that require monotonic restrictions to hold: because they were not designed with monotonic constraints in mind, they compromise the monotonicity of the data sets and the performance of monotonic classifiers. In this paper, we propose a set of new sampling techniques that mitigate the imbalanced class distribution while maintaining the monotonicity of the data sets. These methods perform the sampling inside monotonic chains, that is, sets of comparable instances, in order to preserve the chains and, as a result, the monotonicity. Five approaches are redesigned from well-known under- and over-sampling techniques and compared against their standard and ordinal versions, with outstanding results.
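The notions of comparable instances, monotonicity, and chains mentioned in the abstract can be illustrated with a short Python sketch. It is not the authors' method: it assumes the usual componentwise partial order on feature vectors, and the function names (`dominates`, `monotonicity_violations`, `greedy_chain`), the greedy chain construction, and the toy data are all hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

# An instance is (feature vector, ordinal class label).
Instance = Tuple[Tuple[float, ...], int]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """True if x <= y in every feature (assumed componentwise partial order)."""
    return all(a <= b for a, b in zip(x, y))


def monotonicity_violations(data: List[Instance]) -> List[Tuple[int, int]]:
    """Index pairs (i, j) with x_i <= x_j but a strictly higher label for i."""
    return [
        (i, j)
        for i, (xi, yi) in enumerate(data)
        for j, (xj, yj) in enumerate(data)
        if i != j and dominates(xi, xj) and yi > yj
    ]


def greedy_chain(data: List[Instance]) -> List[int]:
    """Build one chain of pairwise comparable instances (illustrative heuristic).

    Sorting by feature sum gives an order compatible with the partial order;
    an instance is kept only if it dominates the last one already kept.
    """
    order = sorted(range(len(data)), key=lambda i: (sum(data[i][0]), data[i][0]))
    chain: List[int] = []
    for i in order:
        if not chain or dominates(data[chain[-1]][0], data[i][0]):
            chain.append(i)
    return chain


# Toy data set (hypothetical): two features, three ordinal classes.
data: List[Instance] = [
    ((1.0, 1.0), 0),
    ((2.0, 1.0), 1),
    ((1.0, 3.0), 1),
    ((3.0, 3.0), 2),
    ((2.0, 2.0), 0),
]

print(monotonicity_violations(data))
print(greedy_chain(data))
```

A sampling step restricted to the indices returned by `greedy_chain` would only add or remove instances among mutually comparable points, which is the general idea the abstract describes; how the paper actually builds and samples its chains is not specified here.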

Language: English
Pages: 187-204
Number of pages: 18
Journal: Information Sciences
Volume: 474
DOI: 10.1016/j.ins.2018.09.062
Publication status: Published - 2019 Feb 1

Fingerprint

Monotonic
Sampling
Monotonicity
Oversampling
Classifiers
Normality
Classifier
Class
Restriction
Life

All Science Journal Classification (ASJC) codes

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

González, Sergio; García, Salvador; Li, Sheng-Tun; Herrera, Francisco. Chain based sampling for monotonic imbalanced classification. In: Information Sciences. 2019; Vol. 474. pp. 187-204.
@article{3c2bb804dafd48158c08dbae7f5c156c,
title = "Chain based sampling for monotonic imbalanced classification",
author = "Sergio Gonz{\'a}lez and Salvador Garc{\'i}a and Sheng-Tun Li and Francisco Herrera",
year = "2019",
month = "2",
day = "1",
doi = "10.1016/j.ins.2018.09.062",
language = "English",
volume = "474",
pages = "187--204",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
}

TY - JOUR
T1 - Chain based sampling for monotonic imbalanced classification
AU - González, Sergio
AU - García, Salvador
AU - Li, Sheng-Tun
AU - Herrera, Francisco
PY - 2019/2/1
Y1 - 2019/2/1
UR - http://www.scopus.com/inward/record.url?scp=85054299588&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054299588&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2018.09.062
DO - 10.1016/j.ins.2018.09.062
M3 - Article
VL - 474
SP - 187
EP - 204
JO - Information Sciences
T2 - Information Sciences
JF - Information Sciences
SN - 0020-0255
ER -