TY - JOUR
T1 - Naïve Bayesian classifiers with multinomial models for rRNA taxonomic assignment.
AU - Liu, Kuan Liang
AU - Wong, Tzu Tsung
PY - 2013/1/1
Y1 - 2013/1/1
N2 - The introduction of next-generation sequencing in ecological studies has created a major revolution in microbial and fungal ecology. Direct sequencing of hypervariable regions from ribosomal RNA genes can provide rapid and inexpensive analysis of ecological communities. To gain a deeper understanding of these rRNA fragments, the Ribosomal Database Project developed the "RDP Classifier", which uses 8-mer nucleotide frequencies with Bayes' theorem to obtain taxonomic affiliation. The classifier is computationally efficient and works well with massive numbers of short sequences. However, the binary model employed in the RDP classifier does not consider the repetitive 8-mers in each reference sequence. Previous studies have pointed out that a multinomial model usually results in better performance than a binary model. In this study, we present naïve Bayesian classifiers with multinomial models that take repetitive 8-mers into account for classifying microbial 16S and fungal 28S rRNA sequences. The results obtained from the multinomial approach were compared with those obtained from the binomial RDP classifier on 250-bp, 400-bp, 800-bp, and full-length reads to demonstrate that the multinomial approach can generally achieve a higher predictive accuracy in most hypervariable regions.
AB - The introduction of next-generation sequencing in ecological studies has created a major revolution in microbial and fungal ecology. Direct sequencing of hypervariable regions from ribosomal RNA genes can provide rapid and inexpensive analysis of ecological communities. To gain a deeper understanding of these rRNA fragments, the Ribosomal Database Project developed the "RDP Classifier", which uses 8-mer nucleotide frequencies with Bayes' theorem to obtain taxonomic affiliation. The classifier is computationally efficient and works well with massive numbers of short sequences. However, the binary model employed in the RDP classifier does not consider the repetitive 8-mers in each reference sequence. Previous studies have pointed out that a multinomial model usually results in better performance than a binary model. In this study, we present naïve Bayesian classifiers with multinomial models that take repetitive 8-mers into account for classifying microbial 16S and fungal 28S rRNA sequences. The results obtained from the multinomial approach were compared with those obtained from the binomial RDP classifier on 250-bp, 400-bp, 800-bp, and full-length reads to demonstrate that the multinomial approach can generally achieve a higher predictive accuracy in most hypervariable regions.
UR - http://www.scopus.com/inward/record.url?scp=84906216437&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906216437&partnerID=8YFLogxK
M3 - Article
C2 - 24384717
AN - SCOPUS:84906216437
SN - 1545-5963
VL - 10
SP - 1334
EP - 1339
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 5
ER -