Building robust models for small data containing nominal inputs and continuous outputs based on possibility distributions

Der Chiang Li, Qi Shi Shi, Hung Yu Chen

Research output: Contribution to journal › Article › peer-review


Learning from small data is challenging for most algorithms, because it is hard to build statistically robust models from few samples. Previous studies have verified that virtual sample generation (VSG) approaches are effective in meeting this challenge. However, most VSG methods were developed for numerical inputs. Therefore, to address situations where data have nominal inputs and continuous outputs, a systemic VSG procedure is proposed that generates samples based on fuzzy techniques to further enhance modelling capability. Based on the data preprocessing concept in the M5′ model tree, we present a procedure for extracting the fuzzy relations between nominal inputs and continuous outputs. Further, following the idea of nonparametric operations, we employ trend similarity to express the fuzzy relations between inputs and outputs. These relations are then represented by possibility distributions, and sample candidates are created based on these distributions. Finally, the candidates that pass an α-cut filter are regarded as qualified virtual samples. In the experiments, we demonstrate the effectiveness of our approach through a comparison with two other VSG approaches using five public datasets and two prediction models. Moreover, the three parameters used in our approach are discussed. However, determining the best-fitting parameters requires further study.
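The last two steps of the abstract (creating candidates from a possibility distribution, then filtering them with an α-cut) can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract does not specify how the possibility distributions are built from trend similarity or from the M5′ preprocessing, so the sketch assumes a triangular possibility distribution per nominal category, with the category mean as its peak and a range widened by a hypothetical `spread` factor. The function names and the parameters `alpha`, `n_candidates`, and `spread` are illustrative only.

```python
import random
from collections import defaultdict


def triangular_membership(y, low, mode, high):
    """Possibility value of y under a triangular distribution on [low, high]."""
    if y < low or y > high:
        return 0.0
    if y == mode:
        return 1.0
    if y < mode:
        return (y - low) / (mode - low)
    return (high - y) / (high - mode)


def generate_virtual_samples(samples, alpha=0.5, n_candidates=100,
                             spread=0.5, seed=0):
    """Return (category, y) virtual samples that pass the alpha-cut.

    samples: list of (nominal_category, continuous_output) pairs.
    Assumption: one triangular possibility distribution per category.
    """
    rng = random.Random(seed)

    # Group the observed outputs by nominal input value.
    groups = defaultdict(list)
    for category, y in samples:
        groups[category].append(y)

    virtual = []
    for category, ys in groups.items():
        lo, hi = min(ys), max(ys)
        mode = sum(ys) / len(ys)  # assumed peak of the distribution
        # Widen the observed range; fall back to 1.0 if all outputs are equal.
        margin = spread * (hi - lo) if hi > lo else 1.0
        low, high = lo - margin, hi + margin

        for _ in range(n_candidates):
            candidate = rng.uniform(low, high)
            # Alpha-cut: keep only candidates whose possibility is >= alpha.
            if triangular_membership(candidate, low, mode, high) >= alpha:
                virtual.append((category, candidate))
    return virtual


# Tiny made-up dataset: nominal input (colour) and continuous output.
observed = [("red", 10.0), ("red", 12.0), ("red", 11.0),
            ("blue", 20.0), ("blue", 26.0)]
virtual_samples = generate_virtual_samples(observed, alpha=0.5)
```

Raising `alpha` keeps only candidates close to the peak of each distribution, which yields fewer but more conservative virtual samples. Lowering it admits candidates further from the observed data.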

Original language: English
Pages (from-to): 2805-2822
Number of pages: 18
Journal: International Journal of Machine Learning and Cybernetics
Issue number: 10
Publication status: Published - 2019 Oct 1

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

