Small data-set learning problems are attracting growing attention because global competition keeps shortening product lifecycles. Although statistical approaches and machine learning algorithms are widely applied to extract information from such data, they are developed on the assumption that the training samples represent the properties of the whole population. When the training samples are few, the properties they contain are limited, so the knowledge that the learning algorithms extract may also be deficient. Virtual sample generation, used as a form of data pretreatment, has proved effective in handling small data-set problems. To take the relationships among attributes into account during value generation, this research proposes a non-parametric procedure that learns the trend similarities among attributes and uses them to estimate the range in which an attribute value may lie when the values of other attributes are given. The ranges of the attribute values of the virtual samples are then estimated stepwise using triangular membership functions (MFs) built to represent the sample distribution of each attribute. In the experiments, two real cases are examined with four modelling tools: the M5′ model tree (M5′), multiple linear regression, support vector regression and a back-propagation neural network. The results show that the forecasting accuracy of all four modelling tools improves when the training sets contain virtual samples. In addition, the proposed procedure yields significantly smaller predictive errors than other approaches.
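As a rough illustration of the triangular MF idea mentioned above, the following Python sketch builds a triangular membership function for a single attribute and draws virtual values from the matching triangular distribution. It is not the procedure proposed in this research: the trend-similarity step and the stepwise range estimation are not reproduced, and the choices of the observed minimum and maximum as bounds and the sample mean as the peak are assumptions made only for this example. The names `triangular_mf` and `generate_virtual_values` are hypothetical.

```python
import random
from statistics import mean


def triangular_mf(low, peak, high):
    """Return a function giving the membership degree of x, in [0, 1]."""
    def membership(x):
        if x < low or x > high:
            return 0.0
        if x == peak:
            return 1.0
        if x < peak:
            return (x - low) / (peak - low)
        return (high - x) / (high - peak)
    return membership


def generate_virtual_values(observed, n, seed=0):
    """Draw n virtual values for one attribute from a triangular distribution.

    Assumption for this sketch: the bounds are the observed min and max,
    and the peak is the sample mean. The source may define these differently.
    """
    low, high = min(observed), max(observed)
    peak = mean(observed)
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    values = [rng.triangular(low, high, peak) for _ in range(n)]
    return values, triangular_mf(low, peak, high)


# Illustrative data only, not taken from the paper's experiments.
observed = [2.0, 4.0, 9.0]
virtual_values, mf = generate_virtual_values(observed, n=5)
for value in virtual_values:
    print(f"value={value:.3f}  membership={mf(value):.3f}")
```

In this simplified form each attribute is treated independently. Respecting the relationships among attributes, as the abstract describes, would require narrowing `low` and `high` for one attribute according to the values already chosen for the others.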
All Science Journal Classification (ASJC) codes
- Strategy and Management
- Management Science and Operations Research
- Industrial and Manufacturing Engineering