Generating virtual samples to improve learning performance in small datasets with non-linear and asymmetric distributions

Liang Sian Lin, Yao San Lin, Der Chiang Li

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

In today's highly competitive environment, modeling the relationship between inputs and outputs of a management system from limited data at an early stage is important but difficult. Virtual sample generation (VSG) methods have been proposed in many studies to extract potential information and improve the prediction performance of learning models on small datasets. However, those studies generally must assume an underlying distribution, such as a linear triangular membership function or a Gaussian distribution, to generate virtual samples. Thus, previous VSG methods may not effectively improve learning performance when the assumed distribution is not flexible enough for small datasets. To address this issue, we propose a novel VSG method, called the Newton-VSG method, to generate virtual samples for small datasets with non-linear and asymmetric distributions. In the proposed method, we use Newton's method to estimate the minimum value of the data range and the shape of the data distribution. Further, we develop two deep learning models to ensure the quality of the virtual samples: a Siamese network (SN) model for screening virtual sample input values and a bagging auto-encoder (AE) model for predicting virtual sample outputs. Two real datasets are used to demonstrate the efficacy of the proposed method: one of solidification cracking susceptibility test data and another obtained from the TFT-LCD process of a leading company in Taiwan. Using partial least squares regression (PLSR) and back-propagation neural network (BPNN) predictive models, we compare the proposed method with three state-of-the-art VSG methods in terms of the mean absolute error (MAE) and the root mean squared error (RMSE). The experimental results demonstrate that the proposed method outperforms the other three VSG methods in prediction accuracy for small datasets.
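The two error measures named in the abstract follow their standard definitions and can be sketched in a few lines. This is a minimal illustration only: the function names and the toy values are assumptions made for the example and are not taken from the paper or its datasets.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical values, chosen only to show the calculation.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

print(mae(y_true, y_pred))   # absolute errors 1, 0, 2, 1 give a mean of 1.0
print(rmse(y_true, y_pred))  # squared errors 1, 0, 4, 1 give sqrt(1.5)
```

Because RMSE squares each residual before averaging, it weights large errors more heavily than MAE does, which is one reason the two are often reported together.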

Original language: English
Article number: 126408
Journal: Neurocomputing
Volume: 548
Publication status: Published - 2023 Sept 1

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence
