Background: Next-generation sequencing (NGS) technology has transformed metagenomics because the high-throughput data allow an in-depth exploration of a complex microbial community. However, accurate species identification with NGS data is challenging because NGS sequences are relatively short. Assembling 16S rDNA segments into longer sequences has been proposed for improving species identification. Current approaches, however, either suffer from amplification bias due to one single primer or insufficient 16S rDNA reads in whole genome sequencing data. Results: Multiple primers were used to amplify different 16S rDNA segments for 454 sequencing, followed by 454 read classification and assembly. This permitted targeted sequencing while reducing primer bias. For test samples containing four known bacteria, accurate and near full-length 16S rDNAs of three known bacteria were obtained. For real soil and sediment samples containing dioxins in various concentrations, 16S rDNA sequences were lengthened by 50% for about half of the non-rare microbes, and 16S rDNAs of several microbes reached more than 1000 bp. In addition, reduced primer bias using multiple primers was illustrated. Conclusions: A new experimental and computational pipeline for obtaining long 16S rDNA sequences was proposed. The capability of the pipeline was validated on test samples and illustrated on real samples. For dioxin-containing samples, the pipeline revealed several microbes suitable for future studies of dioxin chemistry.
All Science Journal Classification (ASJC) codes
- Structural Biology
- Molecular Biology
- Computer Science Applications
- Applied Mathematics