TY - JOUR
T1 - Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions
AU - Li, Der Chiang
AU - Shi, Qi Shi
AU - Chen, Hung Yu
N1 - Publisher Copyright:
© 2018, Springer-Verlag GmbH Germany, part of Springer Nature.
PY - 2019/10/1
Y1 - 2019/10/1
N2 - Learning with small data is challenging for most algorithms in regard to building statistically robust models. In previous studies, virtual sample generation (VSG) approaches have been verified as effective in terms of meeting this challenge. However, most VSG methods were developed for numerical inputs. Therefore, to address situations where data has nominal inputs and continuous outputs, a systemic VSG procedure is proposed to generate samples based on fuzzy techniques to further enhance modelling capability. Based on the concept of the data preprocess in the M5′ model tree, we reveal a useful procedure by which to extract the fuzzy relations between nominal inputs and continuous outputs. Further, with the idea of nonparametric operations, we employ trend similarity to present the fuzzy relations between inputs and outputs. Then, these relations are represented by possibility distributions, and sample candidates are created based on these distributions. Finally, the candidates filtered using α-cut are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches using five public datasets and two prediction models. Moreover, three parameters used in our approaches are discussed. However, determining how to find the most fit parameters requires further study in the future.
AB - Learning with small data is challenging for most algorithms in regard to building statistically robust models. In previous studies, virtual sample generation (VSG) approaches have been verified as effective in terms of meeting this challenge. However, most VSG methods were developed for numerical inputs. Therefore, to address situations where data has nominal inputs and continuous outputs, a systemic VSG procedure is proposed to generate samples based on fuzzy techniques to further enhance modelling capability. Based on the concept of the data preprocess in the M5′ model tree, we reveal a useful procedure by which to extract the fuzzy relations between nominal inputs and continuous outputs. Further, with the idea of nonparametric operations, we employ trend similarity to present the fuzzy relations between inputs and outputs. Then, these relations are represented by possibility distributions, and sample candidates are created based on these distributions. Finally, the candidates filtered using α-cut are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches using five public datasets and two prediction models. Moreover, three parameters used in our approaches are discussed. However, determining how to find the most fit parameters requires further study in the future.
UR - http://www.scopus.com/inward/record.url?scp=85064273904&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85064273904&partnerID=8YFLogxK
U2 - 10.1007/s13042-018-00905-2
DO - 10.1007/s13042-018-00905-2
M3 - Article
AN - SCOPUS:85064273904
SN - 1868-8071
VL - 10
SP - 2805
EP - 2822
JO - International Journal of Machine Learning and Cybernetics
JF - International Journal of Machine Learning and Cybernetics
IS - 10
ER -