TY - JOUR
T1 - Alternative prior assumptions for improving the performance of naïve Bayesian classifiers
AU - Wong, Tzu Tsung
PY - 2009/4
Y1 - 2009/4
AB - The prior distribution of an attribute in a naïve Bayesian classifier is typically assumed to be a Dirichlet distribution, and this is called the Dirichlet assumption. The variables in a Dirichlet random vector can never be positively correlated and must have the same confidence level as measured by normalized variance. Both the generalized Dirichlet and the Liouville distributions include the Dirichlet distribution as a special case. These two multivariate distributions, also defined on the unit simplex, are employed to investigate the impact of the Dirichlet assumption in naïve Bayesian classifiers. We propose methods to construct appropriate generalized Dirichlet and Liouville priors for naïve Bayesian classifiers. Our experimental results on 18 data sets reveal that the generalized Dirichlet distribution has the best performance among the three distribution families. Not only is the Dirichlet assumption inappropriate, but also forcing the variables in a prior to be all positively correlated can deteriorate the performance of the naïve Bayesian classifier.
UR - http://www.scopus.com/inward/record.url?scp=60849091358&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=60849091358&partnerID=8YFLogxK
U2 - 10.1007/s10618-008-0101-6
DO - 10.1007/s10618-008-0101-6
M3 - Article
AN - SCOPUS:60849091358
SN - 1384-5810
VL - 18
SP - 183
EP - 213
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
IS - 2
ER -