Generating information for small data sets with a multi-modal distribution

Der Chiang Li, Liang Sian Lin

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)


Virtual sample generation approaches have been used with small data sets to enhance classification performance in a number of reports. The appropriate estimation of data distribution plays an important role in this process, with performance usually better for data sets that have a simple distribution rather than a complex one. Mixed-type data sets often have a multi-modal distribution instead of a simple, uni-modal one. This study thus proposes a new approach to detect multi-modality in data sets, to avoid the problem of inappropriately using a uni-modal distribution. We utilize the common k-means clustering method to detect possible clusters, and, based on the clustered sample sets, a Weibull variate is developed for each of these to produce multi-modal virtual data. In this approach, the degree of error variation in the Weibull skewness between the original and virtual data is measured and used as the criterion for determining the sizes of virtual samples. Six data sets with different training data sizes are employed to check the performance of the proposed method, and comparisons are made based on the classification accuracies. The results using non-parametric testing show that the proposed method has better classification performance than the recently presented Mega-Trend-Diffusion method.
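
The pipeline outlined in the abstract (cluster the small sample, fit a Weibull variate per cluster, draw virtual data from each) can be illustrated with a minimal sketch. This is not the authors' implementation; several choices here are assumptions made only for illustration: the data are one-dimensional and strictly positive, a two-parameter Weibull is fitted by the method of moments, and virtual samples are allocated to clusters in proportion to cluster size. The paper's skewness-based criterion for choosing the virtual sample size is not reproduced. The function names `kmeans_1d`, `fit_weibull`, and `generate_virtual` are hypothetical.

```python
import math
import random
import statistics


def kmeans_1d(values, k, iters=50, seed=0):
    """Plain Lloyd's k-means on 1-D data; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(values, k)  # initial centers drawn from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        new_centers = [
            statistics.fmean(c) if c else centers[j]
            for j, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments are stable
            break
        centers = new_centers
    return [c for c in clusters if c]


def weibull_cv(shape):
    """Coefficient of variation of a Weibull with the given shape."""
    g1 = math.gamma(1 + 1 / shape)
    g2 = math.gamma(1 + 2 / shape)
    return math.sqrt(g2 / g1 ** 2 - 1)


def fit_weibull(sample):
    """Method-of-moments fit (an assumption); returns (scale, shape)."""
    mean = statistics.fmean(sample)
    cv = statistics.stdev(sample) / mean
    lo, hi = 0.1, 50.0  # search range for the shape parameter
    for _ in range(60):  # bisection: the CV decreases as shape grows
        mid = (lo + hi) / 2
        if weibull_cv(mid) > cv:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    scale = mean / math.gamma(1 + 1 / shape)
    return scale, shape


def generate_virtual(values, k, n_virtual, seed=0):
    """Draw virtual samples from one fitted Weibull per detected cluster."""
    rng = random.Random(seed)
    virtual = []
    for cluster in kmeans_1d(values, k, seed=seed):
        if len(cluster) < 2:  # too few points to estimate a spread
            continue
        scale, shape = fit_weibull(cluster)
        # Proportional allocation is an illustrative assumption.
        share = round(n_virtual * len(cluster) / len(values))
        virtual += [rng.weibullvariate(scale, shape) for _ in range(share)]
    return virtual


if __name__ == "__main__":
    data = [0.8, 1.0, 1.1, 1.2, 1.4, 9.0, 9.6, 10.0, 10.4, 11.0]
    print(generate_virtual(data, k=2, n_virtual=20, seed=1))
```

Fitting one distribution per cluster keeps the two modes of the toy data separate; a single uni-modal fit over all ten points would place most of its mass between the modes, where no original observations lie.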

Original language: English
Pages (from-to): 71-81
Number of pages: 11
Journal: Decision Support Systems
Publication status: Published - 2014 Oct

All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Information Systems
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Information Systems and Management
