Improved learning performance for small datasets in high dimensions by new dual-net model for non-linear interpolation virtual sample generation

Liang Sian Lin, Yao San Lin, Der Chiang Li, Yun Hsuan Liu

Research output: Article, peer-reviewed

1 citation (Scopus)

Abstract

In the early stages of decision-making, the number of reliable samples available is usually small. Because small datasets have variable distributions and incomplete structure, it is challenging to build reliable and robust predictive models with classic statistical and machine learning methods in small-sample settings. The virtual sample generation (VSG) technique improves model learning accuracy for small datasets across diverse applications. Established VSG methods generate virtual samples for the independent variables by assuming a probability distribution or a membership function to fill data gaps. In practice, however, non-linear functional relationships between variables are common. To address this issue, this paper develops a novel VSG method called Dual-VSG, which generates non-linear interpolation virtual samples using a self-supervised learning (SSL) framework to improve learning performance on small datasets. Unlabeled non-linear interpolation virtual samples are generated by estimating non-linear functions and transforming them into a high-dimensional space using the proposed dual-net model. The weights of the dual-net model are then transferred to a downstream task to generate the virtual sample labels. The effectiveness of the proposed method is demonstrated on five datasets. Using a Backpropagation Neural Network (BPNN) as the predictive model, we compared the prediction performance of the proposed method with that of two state-of-the-art VSG approaches. Prediction performance on regression data is assessed with the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE); classification performance on classification datasets is assessed with the classification accuracy (ACC) and the F1 measure. In addition, a paired t-test was used to determine whether the proposed Dual-VSG approach differed significantly from the other VSG methods in terms of RMSE, MAPE, ACC, or F1 score. According to the experimental results, the proposed Dual-VSG method outperforms both comparison VSG methods on small datasets.
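
The evaluation measures named in the abstract (RMSE, MAPE, ACC, F1, and the paired t-test) can be illustrated with a minimal sketch. This is not the authors' code: the function names and the `positive` parameter are hypothetical, F1 is shown for the binary case only, and the paired t-test returns just the test statistic and degrees of freedom, assuming two equally long lists of per-run scores.

```python
import math
from statistics import mean, stdev


def rmse(y_true, y_pred):
    """Root Mean Square Error for regression predictions."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (undefined if a true value is 0)."""
    return 100.0 * mean([abs((t - p) / t) for t, p in zip(y_true, y_pred)])


def accuracy(y_true, y_pred):
    """Share of predictions that match the true class label."""
    return mean([1.0 if t == p else 0.0 for t, p in zip(y_true, y_pred)])


def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for two matched score lists."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # sample std. dev. of the differences
    return mean(diffs) / standard_error, n - 1
```

To turn the statistic into a p-value, it would be compared against a Student t distribution with the returned degrees of freedom; a library routine such as `scipy.stats.ttest_rel` performs both steps in one call.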

Original language: English
Article number: 113996
Journal: Decision Support Systems
Volume: 172
Publication status: Published - Sep 2023

All Science Journal Classification (ASJC) codes

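  • Management Information Systems
  • Information Systems
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Information Systems and Management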
