Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions

Der-Chiang Li, Qi Shi Shi, Hung Yu Chen

Research output: Contribution to journal › Article

Abstract

Building statistically robust models from small data is challenging for most learning algorithms. Previous studies have shown virtual sample generation (VSG) approaches to be effective at meeting this challenge. However, most VSG methods were developed for numerical inputs. To address data with nominal inputs and continuous outputs, this paper proposes a systematic VSG procedure that generates samples using fuzzy techniques to further enhance modelling capability. Drawing on the data preprocessing used in the M5′ model tree, we present a procedure for extracting the fuzzy relations between nominal inputs and continuous outputs. Following the idea of nonparametric operations, we then use trend similarity to express these fuzzy relations between inputs and outputs. The relations are represented by possibility distributions, from which sample candidates are created. Finally, the candidates that pass an α-cut filter are retained as qualified virtual samples. In experiments on five public datasets with two prediction models, we demonstrate the effectiveness of our approach by comparing it with two other VSG approaches. We also discuss the three parameters used in our approach; how to determine their best-fitting values remains a subject for future study.
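The abstract only outlines the procedure, so the following Python sketch illustrates two of its named ingredients under stated assumptions. The first is the M5′-style preprocessing of a nominal attribute: its k values are ranked by their mean output and the attribute is replaced with k-1 binary attributes. The second is generating candidate outputs from a possibility distribution and keeping only those that pass an α-cut. The triangular shape (minimum, mean and maximum of each category's outputs), the uniform candidate sampling, the toy data and all function names are hypothetical choices for illustration. The paper's trend-similarity step is not reproduced, so this is not the authors' method.

```python
import random
from statistics import mean


def group_outputs(categories, targets):
    """Collect the continuous outputs observed for each nominal value."""
    groups = {}
    for c, y in zip(categories, targets):
        groups.setdefault(c, []).append(y)
    return groups


def m5_binary_encode(categories, targets):
    """M5'-style preprocessing of one nominal attribute: rank its k values by
    mean output, then replace it with k-1 binary attributes."""
    groups = group_outputs(categories, targets)
    order = sorted(groups, key=lambda c: mean(groups[c]))
    rank = {c: i for i, c in enumerate(order)}
    k = len(order)
    # Binary attribute j (j = 1..k-1) is 0 when the value is among the first
    # j values of the ordering and 1 otherwise.
    encoded = [[0 if rank[c] < j else 1 for j in range(1, k)] for c in categories]
    return order, encoded


def triangular_possibility(a, b, c):
    """Triangular possibility distribution with support [a, c] and peak at b
    (a hypothetical shape chosen for illustration)."""
    def mu(x):
        if x < a or x > c:
            return 0.0
        if x <= b:
            return 1.0 if b == a else (x - a) / (b - a)
        return (c - x) / (c - b)  # here x > b, so c > b and no division by zero
    return mu


def generate_virtual_samples(categories, targets, n_candidates, alpha, seed=0):
    """Draw candidate outputs per nominal value and keep those whose
    possibility is at least alpha (the alpha-cut)."""
    rng = random.Random(seed)
    virtual = []
    for cat, ys in group_outputs(categories, targets).items():
        a, b, c = min(ys), mean(ys), max(ys)  # assumed: min / mean / max
        mu = triangular_possibility(a, b, c)
        for _ in range(n_candidates):
            y = rng.uniform(a, c)  # assumed: uniform sampling over the support
            if mu(y) >= alpha:
                virtual.append((cat, y))
    return virtual


# Hypothetical toy data: one nominal input and one continuous output.
cats = ["red", "blue", "red", "green", "blue", "green"]
ys = [3.0, 1.0, 5.0, 8.0, 2.0, 10.0]
order, enc = m5_binary_encode(cats, ys)
print(order)  # ['blue', 'red', 'green']
print(enc)    # [[1, 0], [0, 0], [1, 0], [1, 1], [0, 0], [1, 1]]
samples = generate_virtual_samples(cats, ys, n_candidates=50, alpha=0.5)
```

Sampling candidates only within each category's observed range keeps the virtual outputs plausible, and raising α tightens the filter toward the peak of the distribution. The abstract mentions three parameters without naming them, so `n_candidates` and `alpha` here are illustrative only.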

Original language: English
Journal: International Journal of Machine Learning and Cybernetics
DOI: 10.1007/s13042-018-00905-2
Publication status: Published - 2019 Jan 1


All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

@article{419d7ca4adf540068f6d241f2310a629,
title = "Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions",
abstract = "Learning with small data is challenging for most algorithms in regard to building statistically robust models. In previous studies, virtual sample generation (VSG) approaches have been verified as effective in terms of meeting this challenge. However, most VSG methods were developed for numerical inputs. Therefore, to address situations where data has nominal inputs and continuous outputs, a systemic VSG procedure is proposed to generate samples based on fuzzy techniques to further enhance modelling capability. Based on the concept of the data preprocess in the M5′ model tree, we reveal a useful procedure by which to extract the fuzzy relations between nominal inputs and continuous outputs. Further, with the idea of nonparametric operations, we employ trend similarity to present the fuzzy relations between inputs and outputs. Then, these relations are represented by possibility distributions, and sample candidates are created based on these distributions. Finally, the candidates filtered using α-cut are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches using five public datasets and two prediction models. Moreover, three parameters used in our approaches are discussed. However, determining how to find the most fit parameters requires further study in the future.",
author = "Li, {Der-Chiang} and Shi, {Qi Shi} and Chen, {Hung Yu}",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/s13042-018-00905-2",
language = "English",
journal = "International Journal of Machine Learning and Cybernetics",
issn = "1868-8071",
publisher = "Springer Science + Business Media",

}

TY  - JOUR

T1  - Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions

AU  - Li, Der-Chiang

AU  - Shi, Qi Shi

AU  - Chen, Hung Yu

PY  - 2019/1/1

Y1  - 2019/1/1

N2  - Learning with small data is challenging for most algorithms in regard to building statistically robust models. In previous studies, virtual sample generation (VSG) approaches have been verified as effective in terms of meeting this challenge. However, most VSG methods were developed for numerical inputs. Therefore, to address situations where data has nominal inputs and continuous outputs, a systemic VSG procedure is proposed to generate samples based on fuzzy techniques to further enhance modelling capability. Based on the concept of the data preprocess in the M5′ model tree, we reveal a useful procedure by which to extract the fuzzy relations between nominal inputs and continuous outputs. Further, with the idea of nonparametric operations, we employ trend similarity to present the fuzzy relations between inputs and outputs. Then, these relations are represented by possibility distributions, and sample candidates are created based on these distributions. Finally, the candidates filtered using α-cut are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches using five public datasets and two prediction models. Moreover, three parameters used in our approaches are discussed. However, determining how to find the most fit parameters requires further study in the future.

AB  - Learning with small data is challenging for most algorithms in regard to building statistically robust models. In previous studies, virtual sample generation (VSG) approaches have been verified as effective in terms of meeting this challenge. However, most VSG methods were developed for numerical inputs. Therefore, to address situations where data has nominal inputs and continuous outputs, a systemic VSG procedure is proposed to generate samples based on fuzzy techniques to further enhance modelling capability. Based on the concept of the data preprocess in the M5′ model tree, we reveal a useful procedure by which to extract the fuzzy relations between nominal inputs and continuous outputs. Further, with the idea of nonparametric operations, we employ trend similarity to present the fuzzy relations between inputs and outputs. Then, these relations are represented by possibility distributions, and sample candidates are created based on these distributions. Finally, the candidates filtered using α-cut are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches using five public datasets and two prediction models. Moreover, three parameters used in our approaches are discussed. However, determining how to find the most fit parameters requires further study in the future.

UR  - http://www.scopus.com/inward/record.url?scp=85064273904&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85064273904&partnerID=8YFLogxK

U2  - 10.1007/s13042-018-00905-2

DO  - 10.1007/s13042-018-00905-2

M3  - Article

JO  - International Journal of Machine Learning and Cybernetics

JF  - International Journal of Machine Learning and Cybernetics

SN  - 1868-8071

ER  -