TY - JOUR
T1 - Generating virtual samples to improve learning performance in small datasets with non-linear and asymmetric distributions
AU - Lin, Liang Sian
AU - Lin, Yao San
AU - Li, Der Chiang
N1 - Funding Information:
This study was supported by the National Science and Technology Council, Taiwan, and the National Taipei University of Nursing and Health Sciences, Taiwan. The research was funded by the National Science and Technology Council under the Taiwan Ministry of Science and Technology grant contract MOST 110-2222-E-227-001-MY2.
Publisher Copyright:
© 2023 Elsevier B.V.
PY - 2023/9/1
Y1 - 2023/9/1
N2 - In today's highly competitive environment, modeling the relationship between inputs and outputs from limited data at the early stages of a management system is important but difficult. Virtual sample generation (VSG) methods have been proposed in many studies to exploit potential information and improve the prediction performance of learning models on small datasets. However, these studies generally assume an underlying distribution, such as a linear triangular membership function or a Gaussian distribution, to generate virtual samples. Consequently, previous VSG methods may not effectively improve learning performance when the assumed distribution is not flexible enough for small datasets. To address this issue, in this paper we propose a novel VSG method, called the Newton-VSG method, to generate virtual samples for small datasets with non-linear and asymmetric distributions. In the proposed method, we use Newton's method to estimate the minimum value of the data range and the shape of the data distribution. Furthermore, we develop two deep learning models to ensure the quality of the virtual samples: a Siamese network (SN) model for screening virtual sample input values and a bagging auto-encoder (AE) model for predicting virtual sample outputs. One real dataset of solidification cracking susceptibility test data and another real dataset obtained from the TFT-LCD process of a leading company in Taiwan were used to demonstrate the efficacy of the proposed method. Using partial least squares regression (PLSR) and back-propagation neural network (BPNN) predictive models, we compared the proposed method with three state-of-the-art VSG methods in terms of the mean absolute error (MAE) and the root mean squared error (RMSE). The experimental results demonstrate that the proposed method outperforms the other three VSG methods in prediction accuracy for small datasets.
AB - In today's highly competitive environment, modeling the relationship between inputs and outputs from limited data at the early stages of a management system is important but difficult. Virtual sample generation (VSG) methods have been proposed in many studies to exploit potential information and improve the prediction performance of learning models on small datasets. However, these studies generally assume an underlying distribution, such as a linear triangular membership function or a Gaussian distribution, to generate virtual samples. Consequently, previous VSG methods may not effectively improve learning performance when the assumed distribution is not flexible enough for small datasets. To address this issue, in this paper we propose a novel VSG method, called the Newton-VSG method, to generate virtual samples for small datasets with non-linear and asymmetric distributions. In the proposed method, we use Newton's method to estimate the minimum value of the data range and the shape of the data distribution. Furthermore, we develop two deep learning models to ensure the quality of the virtual samples: a Siamese network (SN) model for screening virtual sample input values and a bagging auto-encoder (AE) model for predicting virtual sample outputs. One real dataset of solidification cracking susceptibility test data and another real dataset obtained from the TFT-LCD process of a leading company in Taiwan were used to demonstrate the efficacy of the proposed method. Using partial least squares regression (PLSR) and back-propagation neural network (BPNN) predictive models, we compared the proposed method with three state-of-the-art VSG methods in terms of the mean absolute error (MAE) and the root mean squared error (RMSE). The experimental results demonstrate that the proposed method outperforms the other three VSG methods in prediction accuracy for small datasets.
UR - http://www.scopus.com/inward/record.url?scp=85161652183&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85161652183&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2023.126408
DO - 10.1016/j.neucom.2023.126408
M3 - Article
AN - SCOPUS:85161652183
SN - 0925-2312
VL - 548
JO - Neurocomputing
JF - Neurocomputing
M1 - 126408
ER -
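
Note on the record above: the abstract states that Newton's method is used to estimate the lower bound of the data range and the shape of the data distribution, but the paper's actual objective function is not reproduced in this record. As a generic, hedged illustration only (not the authors' Newton-VSG algorithm), the minimal Python sketch below shows the standard Newton iteration x_{k+1} = x_k - f(x_k)/f'(x_k) applied to a hypothetical root-finding problem that stands in for such a bound-estimation equation.

import numpy as np

def newton(f, df, x0, tol=1e-8, max_iter=100):
    """Standard Newton iteration: x_{k+1} = x_k - f(x_k) / f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:  # stop once the update is negligibly small
            break
    return x

# Hypothetical example: find the smaller root of f(x) = exp(x) - 3x,
# standing in for whatever bound-estimation equation the paper actually solves.
lower_bound = newton(lambda x: np.exp(x) - 3.0 * x,
                     lambda x: np.exp(x) - 3.0,
                     x0=0.5)
print(lower_bound)  # converges to about 0.619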