Generalized Dirichlet priors for Naïve Bayesian classifiers with multinomial models in document classification

Research output: Article, peer-reviewed

9 citations (Scopus)

Abstract

The generalized Dirichlet distribution has been shown to be a more appropriate prior than the Dirichlet distribution for naïve Bayesian classifiers. When the dimension of a generalized Dirichlet random vector is large, the computational effort for calculating the expected value of a random variable can be high. In document classification, the number of distinct words, which is the dimension of the prior for a naïve Bayesian classifier, generally exceeds ten thousand. Generalized Dirichlet priors can therefore be impractical for document classification from the viewpoint of computational efficiency. In this paper, some properties of the generalized Dirichlet distribution are established to accelerate the calculation of the expected values of random variables. Those properties are then used to construct noninformative generalized Dirichlet priors for naïve Bayesian classifiers with multinomial models. Our experimental results on two document sets show that generalized Dirichlet priors can achieve a significantly higher prediction accuracy and that the computational efficiency of naïve Bayesian classifiers is preserved.
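To make the computational concern in the abstract concrete, the sketch below computes the component means of a generalized Dirichlet distribution in a single linear pass, using the standard Connor-Mosimann stick-breaking construction and the usual conjugate update for multinomial counts. It is an illustration under stated assumptions, not the paper's own method: the specific properties the authors establish are not given in the abstract, and the function names `generalized_dirichlet_mean` and `posterior_parameters` are hypothetical.

```python
def generalized_dirichlet_mean(alpha, beta):
    """Return E[X_1], ..., E[X_{k+1}] for a generalized Dirichlet with
    parameters alpha_1..alpha_k and beta_1..beta_k.

    Assumes the Connor-Mosimann construction: X_j = Z_j * prod_{i<j}(1 - Z_i)
    with independent Z_j ~ Beta(alpha_j, beta_j).
    """
    if len(alpha) != len(beta):
        raise ValueError("alpha and beta must have the same length")
    means = []
    remaining = 1.0  # running product of E[1 - Z_i] = beta_i / (alpha_i + beta_i)
    for a, b in zip(alpha, beta):
        total = a + b
        means.append(remaining * a / total)
        remaining *= b / total
    means.append(remaining)  # the last component takes the leftover mass
    return means


def posterior_parameters(alpha, beta, counts):
    """Conjugate update for multinomial counts n_1..n_{k+1}:
    alpha_j' = alpha_j + n_j and beta_j' = beta_j + n_{j+1} + ... + n_{k+1}.
    """
    k = len(alpha)
    if len(beta) != k or len(counts) != k + 1:
        raise ValueError("expected len(beta) == len(alpha) == len(counts) - 1")
    tail = [0.0] * k
    running = 0.0
    for j in range(k - 1, -1, -1):  # suffix sums of the counts, built right to left
        running += counts[j + 1]
        tail[j] = running
    new_alpha = [a + n for a, n in zip(alpha, counts[:k])]
    new_beta = [b + t for b, t in zip(beta, tail)]
    return new_alpha, new_beta


# A uniform Dirichlet(1, 1, 1) written as a generalized Dirichlet:
# alpha_j = a_j and beta_j = a_{j+1} + ... + a_{k+1}.
prior_alpha, prior_beta = [1.0, 1.0], [2.0, 1.0]
word_counts = [2, 0, 1]  # hypothetical counts for a three-word vocabulary
post_alpha, post_beta = posterior_parameters(prior_alpha, prior_beta, word_counts)
word_probabilities = generalized_dirichlet_mean(post_alpha, post_beta)
```

Both functions run in time linear in the vocabulary size, which is the scale that matters when the prior has more than ten thousand dimensions. With the parameter choice shown, the generalized Dirichlet reduces to an ordinary Dirichlet, so the resulting word probabilities match Laplace-smoothed estimates.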

Original language: English
Pages (from-to): 123-144
Number of pages: 22
Journal: Data Mining and Knowledge Discovery
Volume: 28
Issue number: 1
Publication status: Published - Jan 2014

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications
