Machine learning algorithms are widely applied to extract useful information, but their reliability often depends strongly on sample size. Small-dataset learning tasks are difficult mainly because the information in such datasets cannot fully represent the characteristics of the entire population. To overcome this problem, this study systematically adds artificial samples to fill the data gaps, employing the mega-trend-diffusion technique to generate virtual samples that extend the data size. The approach is applied to a real small-dataset learning task in the array process of a thin-film transistor liquid-crystal display (TFT-LCD) panel manufacturer, where only 20 samples are available for learning the relationship between 15 input and 36 output attributes. The experimental results show that the approach is effective in building robust back-propagation neural network (BPN) and support vector regression (SVR) models. In addition, a sensitivity analysis is performed on the 20 samples using SVR to extract the relationship between the 15 factors and the 36 outputs, helping engineers infer process knowledge.