Learning from small datasets containing nominal attributes

Der Chiang Li, Hung Yu Chen, Qi Shi Shi

Research output: Contribution to journalArticlepeer-review

3 Citations (Scopus)


In many small-data-learning problems, owing to the incomplete data structure, explicit information for decision makers is limited. Although machine learning algorithms are extensively applied to extract knowledge, most of them are developed without considering whether the training sets can fully represent the population properties. Focusing on small data which contains nominal inputs and continuous outputs, this paper develops an effective sample generating procedure based on fuzzy theories to tackle the learning issue by data preprocessing. According to the derived fuzzy relations between categories and continuous outputs, the possibilities of the combinations of categories (virtual samples) can be aggregated when continuous outputs are given. Proper virtual samples are further selected by using fuzzy alpha-cut on the possibility distributions, and these are added to the training sets to form new ones. In the experiment, sixteen datasets taken from the UC Irvine Machine Learning Repository are examined with back-propagation neural networks and support vector regressions. The results reveal that the forecasting accuracies of the two models are significantly improved when they are built with the proposed new training sets. Moreover, the results also indicate the proposed method outperforms bootstrap aggregating and the synthetic minority over-sampling technique-Nominal-Continuous with the greatest amount of statistical support.

Original languageEnglish
Pages (from-to)226-236
Number of pages11
Publication statusPublished - 2018 May 24

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence


Dive into the research topics of 'Learning from small datasets containing nominal attributes'. Together they form a unique fingerprint.

Cite this