TY - JOUR
T1 - Steps to ensure accuracy in genotype and SNP calling from Illumina sequencing data.
AU - Liu, Q.
AU - Guo, Yan
AU - Li, Jiang
AU - Long, Jirong
AU - Zhang, Bing
AU - Shyr, Y.
N1 - Funding Information:
The authors wish to thank Peggy Schuyler for editorial work on this manuscript and Wei Zheng for his support. This work was supported by National Cancer Institute grants U01 CA163056, P50 CA090949, P50 CA095103, P50 CA098131 and P30 CA068485 (to YS) and National Institutes of Health grant R01GM088822 (to BZ). Subject recruitment and exome sequencing were supported by CA124558 (to WZ) and CA137013 (to JRL). QL's work was partially supported by National Natural Science Foundation of China grant 31070746. This article has been published as part of BMC Genomics Volume 13 Supplement 8, 2012: Proceedings of The International Conference on Intelligent Biology and Medicine (ICIBM): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S8.
PY - 2012
Y1 - 2012
AB - Accurate calling of SNPs and genotypes from next-generation sequencing data is an essential prerequisite for most human genetics studies. A number of computational steps are required or recommended when translating the raw sequencing data into the final calls. However, whether each step contributes to the performance of variant calling, and how it affects accuracy, remains unclear, making it difficult to select and arrange appropriate steps to derive high-quality variants from different sequencing data. In this study, we systematically assessed the relative contribution of each step to the accuracy of variant calling from Illumina DNA sequencing data. We found that the read preprocessing step did not improve the accuracy of variant calling, contrary to the general expectation. Although trimming off low-quality tails helped align more reads, it introduced many false positives. The ability of duplicate marking, local realignment, and recalibration to help eliminate false positive variants depended on the sequencing depth. Rearranging these steps did not affect the results. The relative performance of three popular multi-sample SNP callers, SAMtools, GATK, and GlfMultiples, also varied with the sequencing depth. Our findings clarify the necessity and effectiveness of computational steps for improving the accuracy of SNP and genotype calls from Illumina sequencing data and can serve as a general guideline for choosing SNP calling strategies for data with different coverage.
UR - http://www.scopus.com/inward/record.url?scp=84874896774&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84874896774&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-13-s8-s8
DO - 10.1186/1471-2164-13-s8-s8
M3 - Article
C2 - 23281772
AN - SCOPUS:84874896774
SN - 1471-2164
VL - 13 Suppl 8
JO - BMC Genomics
JF - BMC Genomics
ER -