Generating virtual samples to improve learning performance in small datasets with non-linear and asymmetric distributions

Liang Sian Lin, Yao San Lin, Der Chiang Li

Research output: Article, peer-reviewed


In today's highly competitive environment, modeling the relationship between the inputs and outputs of a management system from limited data at its early stages is important but difficult. Many studies have proposed virtual sample generation (VSG) methods that explore potential information in small datasets to improve the prediction performance of learning models. However, these methods generally must assume an underlying distribution, such as a linear triangular membership function or a Gaussian distribution, to generate virtual samples. Consequently, previous VSG methods may not effectively improve learning performance when the assumed distribution is not flexible enough for a small dataset. To address this issue, we propose a novel VSG method, called the Newton-VSG method, to generate virtual samples for small datasets with non-linear and asymmetric distributions. In the proposed method, Newton's method is used to estimate the minimum value of the data range and the shape of the data distribution. Further, we develop two deep learning models to ensure the quality of the virtual samples: a Siamese network (SN) model for screening virtual sample input values and a bagging auto-encoder (AE) model for predicting virtual sample outputs. Two real datasets, one of solidification cracking susceptibility test data and the other obtained from the TFT-LCD process of a leading company in Taiwan, were used to demonstrate the efficacy of the proposed method. Using partial least squares regression (PLSR) and back propagation neural network (BPNN) predictive models, we compared the proposed method with three state-of-the-art VSG methods in terms of the mean absolute error (MAE) and the root mean squared error (RMSE). The experimental results demonstrate that the proposed method outperforms the other three VSG methods in prediction accuracy for small datasets.
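The abstract evaluates methods by MAE and RMSE and contrasts Newton-VSG with earlier VSG methods that assume a fixed distribution shape. The Python sketch below is not the authors' implementation; it only illustrates both ideas under stated assumptions. `mae` and `rmse` follow the standard metric definitions. `triangular_virtual_inputs` is a hypothetical helper that draws virtual input values from a triangular distribution spanning the observed range, with its mode at the sample mean. Published triangular-membership approaches often estimate a range wider than the observed minimum and maximum; this simplified helper does not. It exists only to show the kind of fixed-shape assumption the Newton-VSG method is designed to relax.

```python
import math
import random
from typing import List, Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat|."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: the square root of the average of (y - y_hat)^2."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def triangular_virtual_inputs(
    sample: Sequence[float], n: int, rng: random.Random
) -> List[float]:
    """Hypothetical fixed-shape baseline (NOT Newton-VSG).

    Draws n virtual values from a triangular distribution whose support is
    the observed [min, max] and whose mode is the sample mean.
    """
    low, high = min(sample), max(sample)
    if low == high:
        # Degenerate sample: every virtual value equals the single observed value.
        return [low] * n
    mode = sum(sample) / len(sample)  # the mean always lies within [low, high]
    return [rng.triangular(low, high, mode) for _ in range(n)]


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0]
    y_pred = [1.0, 2.0, 5.0]
    print(mae(y_true, y_pred))   # 2/3
    print(rmse(y_true, y_pred))  # sqrt(4/3)
    virtual = triangular_virtual_inputs([2.0, 3.0, 7.0], 5, random.Random(0))
    print(virtual)
```

RMSE squares each error before averaging, so it penalizes large individual errors more heavily than MAE. This is why reporting both, as the abstract does, gives a fuller picture of prediction accuracy.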

Publication status: Published - 1 Sep 2023

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

