Learning from small datasets containing nominal attributes

Der-Chiang Li, Hung Yu Chen, Qi Shi Shi

Research output: Contribution to journal (Article)

Abstract

In many small-data-learning problems, owing to the incomplete data structure, explicit information for decision makers is limited. Although machine learning algorithms are extensively applied to extract knowledge, most of them are developed without considering whether the training sets can fully represent the population properties. Focusing on small data which contains nominal inputs and continuous outputs, this paper develops an effective sample generating procedure based on fuzzy theories to tackle the learning issue by data preprocessing. According to the derived fuzzy relations between categories and continuous outputs, the possibilities of the combinations of categories (virtual samples) can be aggregated when continuous outputs are given. Proper virtual samples are further selected by using fuzzy alpha-cut on the possibility distributions, and these are added to the training sets to form new ones. In the experiment, sixteen datasets taken from the UC Irvine Machine Learning Repository are examined with back-propagation neural networks and support vector regressions. The results reveal that the forecasting accuracies of the two models are significantly improved when they are built with the proposed new training sets. Moreover, the results also indicate the proposed method outperforms bootstrap aggregating and the synthetic minority over-sampling technique-Nominal-Continuous with the greatest amount of statistical support.
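The selection step described in the abstract (possibilities of category combinations for a given continuous output, filtered by an alpha-cut) can be illustrated with a minimal sketch. This is not the authors' procedure: the triangular membership functions built from each category's (min, mean, max) output, the min-aggregation across attributes, and the names `triangular`, `fit_memberships`, and `virtual_samples` are all assumptions made only for illustration.

```python
from itertools import product


def triangular(y, lo, mode, hi):
    """Triangular membership of y with support [lo, hi] and peak at mode."""
    if y < lo or y > hi:
        return 0.0
    if y == mode:
        return 1.0
    if y < mode:
        return (y - lo) / (mode - lo)
    return (hi - y) / (hi - mode)


def fit_memberships(rows, outputs):
    """Map each (attribute index, category) to the (min, mean, max) of its outputs.

    Assumption: these three numbers define a triangular fuzzy relation
    between a category and the continuous output.
    """
    groups = {}
    for row, y in zip(rows, outputs):
        for j, cat in enumerate(row):
            groups.setdefault((j, cat), []).append(y)
    return {k: (min(v), sum(v) / len(v), max(v)) for k, v in groups.items()}


def virtual_samples(rows, outputs, y, alpha):
    """Return category combinations whose possibility at output y is >= alpha.

    Assumption: the possibility of a combination is the minimum of the
    per-attribute memberships (a common fuzzy AND), which may differ from
    the aggregation used in the paper.
    """
    params = fit_memberships(rows, outputs)
    levels = [sorted({r[j] for r in rows}) for j in range(len(rows[0]))]
    selected = []
    for combo in product(*levels):
        poss = min(triangular(y, *params[(j, c)]) for j, c in enumerate(combo))
        if poss >= alpha:  # alpha-cut on the possibility distribution
            selected.append((combo, poss))
    return selected


# Hypothetical toy data: two nominal inputs, one continuous output.
rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large")]
outputs = [1.0, 3.0, 5.0, 7.0]
print(virtual_samples(rows, outputs, y=2.5, alpha=0.5))
```

In this sketch, raising `alpha` keeps only combinations that are highly plausible for the given output, while lowering it admits more virtual samples into the training set.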

Original language: English
Pages (from-to): 226-236
Number of pages: 11
Journal: Neurocomputing
Volume: 291
DOI: 10.1016/j.neucom.2018.02.069
Publication status: Published - 2018 May 24

Fingerprint

Learning systems
Learning
Backpropagation
Learning algorithms
Data structures
Sampling
Neural networks
Population
Experiments
Machine Learning
Datasets
Support Vector Machine

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Cognitive Neuroscience
  • Artificial Intelligence

Cite this

Li, Der-Chiang ; Chen, Hung Yu ; Shi, Qi Shi. / Learning from small datasets containing nominal attributes. In: Neurocomputing. 2018 ; Vol. 291. pp. 226-236.
@article{6c6cd19f6be147fdbd91f3c01f197aa9,
title = "Learning from small datasets containing nominal attributes",
author = "Li, Der-Chiang and Chen, {Hung Yu} and Shi, {Qi Shi}",
year = "2018",
month = "5",
day = "24",
doi = "10.1016/j.neucom.2018.02.069",
language = "English",
volume = "291",
pages = "226--236",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",

}

TY - JOUR

T1 - Learning from small datasets containing nominal attributes

AU - Li, Der-Chiang

AU - Chen, Hung Yu

AU - Shi, Qi Shi

PY - 2018/5/24

Y1 - 2018/5/24

UR - http://www.scopus.com/inward/record.url?scp=85042855912&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85042855912&partnerID=8YFLogxK

U2 - 10.1016/j.neucom.2018.02.069

DO - 10.1016/j.neucom.2018.02.069

M3 - Article

VL - 291

SP - 226

EP - 236

JO - Neurocomputing

JF - Neurocomputing

SN - 0925-2312

ER -