Incorporating sequence quality data into alignment improves DNA read mapping

Martin C. Frith, Raymond Wan, Brice Horton Ii Paul

Research output: Contribution to journal › Article

44 Citations (Scopus)

Abstract

New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2%. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the amount of correctly mapped reads from 49 to 66%. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct or are highly polymorphic.
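The idea of scoring an aligned base pair by combining the sequencer's per-base error probability with a model of real sequence difference can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the uniform `p/3` error and substitution models, the uniform 1/4 background, and the default `divergence=0.002` (chosen only to echo the 0.2% figure in the abstract) are all assumptions made for illustration. The only standard fact used is the Phred convention, where a quality score Q corresponds to an error probability of 10^(-Q/10).

```python
import math

BASES = "ACGT"


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def quality_aware_score(ref_base: str, read_base: str, q: float,
                        divergence: float = 0.002) -> float:
    """Log-odds score (in bits) for aligning read_base to ref_base.

    Hypothetical model, assumed for illustration only:
    - real differences: the true base differs from the reference with
      probability `divergence`, spread uniformly over the other 3 bases;
    - sequencer errors: the reported base differs from the true base with
      the Phred-derived probability, also spread uniformly;
    - background: each base has probability 1/4.
    """
    p_err = phred_to_error_prob(q)
    likelihood = 0.0
    # Sum over the unknown true base in the sequenced molecule.
    for true_base in BASES:
        p_real = 1 - divergence if true_base == ref_base else divergence / 3
        p_obs = 1 - p_err if read_base == true_base else p_err / 3
        likelihood += p_real * p_obs
    return math.log2(likelihood / 0.25)


if __name__ == "__main__":
    for q in (5, 20, 40):
        match = quality_aware_score("A", "A", q)
        mismatch = quality_aware_score("A", "C", q)
        print(f"Q={q:2d}  match={match:.3f}  mismatch={mismatch:.3f}")
```

Under these assumptions, a mismatch at a low-quality position is penalized less than a mismatch at a high-quality position, because a sequencer error is the more plausible explanation when the reported error probability is high.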

Original language: English
Article number: gkq010
Journal: Nucleic Acids Research
Volume: 38
Issue number: 7
DOI: 10.1093/nar/gkq010
Publication status: Published - 2010 Jan 27

Fingerprint

Genome
DNA
Drosophila melanogaster
DNA Sequence Analysis
Technology
Data Accuracy
Drosophila simulans

All Science Journal Classification (ASJC) codes

  • Genetics

Cite this

@article{947b183186f54f38b52c8096bd92d85a,
title = "Incorporating sequence quality data into alignment improves DNA read mapping",
abstract = "New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2{\%}. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the amount of correctly mapped reads from 49 to 66{\%}. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct or are highly polymorphic.",
author = "Frith, {Martin C.} and Raymond Wan and Paul, {Brice Horton Ii}",
year = "2010",
month = "1",
day = "27",
doi = "10.1093/nar/gkq010",
language = "English",
volume = "38",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "7",

}

Incorporating sequence quality data into alignment improves DNA read mapping. / Frith, Martin C.; Wan, Raymond; Paul, Brice Horton Ii.

In: Nucleic Acids Research, Vol. 38, No. 7, gkq010, 27.01.2010.


TY - JOUR

T1 - Incorporating sequence quality data into alignment improves DNA read mapping

AU - Frith, Martin C.

AU - Wan, Raymond

AU - Paul, Brice Horton Ii

PY - 2010/1/27

Y1 - 2010/1/27

AB - New DNA sequencing technologies have achieved breakthroughs in throughput, at the expense of higher error rates. The primary way of interpreting biological sequences is via alignment, but standard alignment methods assume the sequences are accurate. Here, we describe how to incorporate the per-base error probabilities reported by sequencers into alignment. Unlike existing tools for DNA read mapping, our method models both sequencer errors and real sequence differences. This approach consistently improves mapping accuracy, even when the rate of real sequence difference is only 0.2%. Furthermore, when mapping Drosophila melanogaster reads to the Drosophila simulans genome, it increased the amount of correctly mapped reads from 49 to 66%. This approach enables more effective use of DNA reads from organisms that lack reference genomes, are extinct or are highly polymorphic.

UR - http://www.scopus.com/inward/record.url?scp=77954139210&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954139210&partnerID=8YFLogxK

U2 - 10.1093/nar/gkq010

DO - 10.1093/nar/gkq010

M3 - Article

VL - 38

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 7

M1 - gkq010

ER -