TY - JOUR
T1 - Implications of the Dirichlet Assumption for Discretization of Continuous Variables in Naive Bayesian Classifiers
AU - Hsu, Chun Nan
AU - Huang, Hung Ju
AU - Wong, Tzu Tsung
N1 - Funding Information:
The research reported here was supported in part by the National Science Council in Taiwan under Grant No. NSC 89-2213-E-001-031.
PY - 2003/12
Y1 - 2003/12
N2 - In a naive Bayesian classifier, discrete variables as well as discretized continuous variables are assumed to have Dirichlet priors. This paper describes the implications and applications of this model selection choice. We start by reviewing key properties of Dirichlet distributions. Among these properties, the most important one is "perfect aggregation," which allows us to explain why discretization works for a naive Bayesian classifier. Since perfect aggregation holds for Dirichlets, we can explain that, in general, discretization can outperform parameter estimation assuming a normal distribution. In addition, we can explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and bin-log l, can perform well with insignificant difference. We designed experiments to verify our explanation using synthesized and real data sets and showed that, in addition to well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to test data. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We empirically confirmed this implication and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier.
AB - In a naive Bayesian classifier, discrete variables as well as discretized continuous variables are assumed to have Dirichlet priors. This paper describes the implications and applications of this model selection choice. We start by reviewing key properties of Dirichlet distributions. Among these properties, the most important one is "perfect aggregation," which allows us to explain why discretization works for a naive Bayesian classifier. Since perfect aggregation holds for Dirichlets, we can explain that, in general, discretization can outperform parameter estimation assuming a normal distribution. In addition, we can explain why a wide variety of well-known discretization methods, such as entropy-based, ten-bin, and bin-log l, can perform well with insignificant difference. We designed experiments to verify our explanation using synthesized and real data sets and showed that, in addition to well-known methods, a wide variety of discretization methods all perform similarly. Our analysis leads to a lazy discretization method, which discretizes continuous variables according to test data. The Dirichlet assumption implies that lazy methods can perform as well as eager discretization methods. We empirically confirmed this implication and extended the lazy method to classify set-valued and multi-interval data with a naive Bayesian classifier.
UR - http://www.scopus.com/inward/record.url?scp=0345306663&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0345306663&partnerID=8YFLogxK
U2 - 10.1023/A:1026367023636
DO - 10.1023/A:1026367023636
M3 - Article
AN - SCOPUS:0345306663
SN - 0885-6125
VL - 53
SP - 235
EP - 263
JO - Machine Learning
JF - Machine Learning
IS - 3
ER -