TY - JOUR
T1 - Chain based sampling for monotonic imbalanced classification
AU - González, Sergio
AU - García, Salvador
AU - Li, Sheng Tun
AU - Herrera, Francisco
N1 - Funding Information:
This work was supported by the Spanish National Research Project TIN2017-89517-P, by the Project BigDaP-TOOLS - Ayudas Fundación BBVA a Equipos de Investigación Científica 2016, and by a research scholarship (FPU) awarded to the author Sergio González by the Spanish Ministry of Education, Culture and Sports. Additionally, this paper is the result of a collaboration with Prof. Sheng-Tun Li during the international stay of Sergio González at National Cheng Kung University. This stay was partially supported by the 2017 Summer Program in Taiwan for Spanish Graduate Students held by the Ministry of Science and Technology of Taiwan (R.O.C.).
Publisher Copyright:
© 2018 Elsevier Inc.
PY - 2019/2
Y1 - 2019/2
N2 - Classification with monotonic constraints arises in some ordinal real-life problems. In these problems, there is often a large difference between the number of instances representing the middle-ranked classes and the number representing the top classes, because the former usually represent the average or normal case, while the latter are exceptional and uncommon. This is known as the class imbalance problem, and it degrades the learning of the under-represented classes. However, traditional solutions cannot be applied to applications that require monotonic restrictions to be enforced: because they were not designed to consider monotonic constraints, they compromise the monotonicity of the data sets and the performance of monotonic classifiers. In this paper, we propose a set of new sampling techniques that mitigate the imbalanced class distribution while maintaining the monotonicity of the data sets. These methods perform the sampling inside monotonic chains, which are sets of comparable instances, in order to preserve the chains and, as a result, the monotonicity. Five approaches are redesigned from well-known under- and over-sampling techniques and compared against their standard and ordinal versions, with outstanding results.
UR - http://www.scopus.com/inward/record.url?scp=85054299588&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054299588&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2018.09.062
DO - 10.1016/j.ins.2018.09.062
M3 - Article
AN - SCOPUS:85054299588
SN - 0020-0255
VL - 474
SP - 187
EP - 204
JO - Information Sciences
JF - Information Sciences
ER -