OMACC: an Optical-Map-Assisted Contig Connector for improving de novo genome assembly.

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Genome sequencing and assembly are essential for revealing the secrets of life hidden in genomes. Because most genomes contain repeats, current assembly programs collate sequencing data into a set of assembled sequences, called contigs, rather than a complete genome. Toward completing a genome, optical mapping is a powerful way to determine the relative order of contigs on the genome, a step called scaffolding. However, connecting neighboring contigs with nucleotide sequences requires further effort. Nagarajan et al. recently proposed a software module, FINISH, that closes the gaps between contigs with other contig sequences after the contigs have been scaffolded using an optical map. The results, however, are not yet satisfactory. To increase the accuracy of contig connections, we developed OMACC, which carefully takes into account the length information in optical maps. Specifically, it rescales the optical map and applies a length constraint when selecting the correct contig sequences for gap closure. In addition, it uses an advanced graph search algorithm to help estimate the number of repeat copies within the gaps between contigs. On both simulated and real datasets, OMACC achieves a <10% false gap-closing rate, about three times lower than the ~27% false rate of FINISH, while maintaining similar sensitivity. As optical mapping is becoming popular and repeats are the bottleneck of assembly, OMACC should benefit various downstream biological studies by accurately connecting contigs into a more complete genome. OMACC is available at http://140.116.235.124/~tliu/omacc.
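The abstract names three ingredients: rescaling the optical map, a length constraint on candidate gap-filling sequences, and a graph search that allows repeated contigs. The Python sketch below illustrates how such ingredients could fit together. It is not the OMACC implementation. The function names (`rescale_factor`, `paths_within_gap`), the median-ratio rescaling, the 10% tolerance, the visit cap, and the toy data are all assumptions made for illustration, and overlaps between adjacent contigs are ignored.

```python
from statistics import median


def rescale_factor(map_lengths, contig_lengths):
    """Median ratio of contig length (bp) to optical-map length.

    Assumption: each pair describes the same region, measured once on the
    optical map and once on an aligned contig.
    """
    ratios = [c / m for m, c in zip(map_lengths, contig_lengths) if m > 0]
    return median(ratios)


def paths_within_gap(graph, lengths, start, end, gap, tol=0.1, max_visits=3):
    """Depth-first search for contig paths that fill a gap of a given length.

    graph:   dict mapping a contig to the contigs that can follow it
    lengths: dict mapping interior contigs to their lengths in bp
    A contig may be visited up to max_visits times, so a repeat can appear
    in several copies. Only paths whose interior length lies within
    gap * (1 +/- tol) are kept.
    """
    lo, hi = gap * (1 - tol), gap * (1 + tol)
    results = []

    def dfs(node, interior, total, visits):
        for nxt in graph.get(node, ()):
            if nxt == end:
                # Length constraint: accept only if the filled length fits.
                if lo <= total <= hi:
                    results.append(list(interior))
                continue
            seen = visits.get(nxt, 0)
            new_total = total + lengths[nxt]
            # Prune paths that are already too long or reuse a contig too often.
            if seen >= max_visits or new_total > hi:
                continue
            visits[nxt] = seen + 1
            interior.append(nxt)
            dfs(nxt, interior, new_total, visits)
            interior.pop()
            visits[nxt] = seen

    dfs(start, [], 0, {})
    return results


# Toy example (hypothetical data): contigs A and B flank a gap, and R is a
# 1,000 bp repeat contig that can follow itself.
scale = rescale_factor([10000, 20000], [10500, 21000])
gap_bp = 2900 * scale  # gap length measured on the map, rescaled to bp
graph = {"A": ["R"], "R": ["R", "B"]}
paths = paths_within_gap(graph, {"R": 1000}, "A", "B", gap_bp)
copies = [path.count("R") for path in paths]
```

In this toy case the length constraint rejects fillings with one or two copies of R and keeps only the path with three, which is how a length-aware search can double as an estimate of repeat copy number.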

Original language: English
Pages (from-to): S7
Journal: BMC Systems Biology
Volume: 7 Suppl 6
DOI: 10.1186/1752-0509-7-S6-S7
Publication status: Published - 2013

Fingerprint

Connector
Genome
Genes
Sequencing
Graph Search
Chromosome Mapping
Graph Algorithms
Nucleotides
Rendering
Software
Search Algorithm
Closure
Module

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Modelling and Simulation
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{3a673c5916fb42b68861296c19f61c0f,
title = "OMACC: an Optical-Map-Assisted Contig Connector for improving de novo genome assembly.",
author = "Chen, {Yi Min} and Yu, {Chun Hui} and Hwang, {Chi Chuan} and Liu, Tsunglin",
year = "2013",
doi = "10.1186/1752-0509-7-S6-S7",
language = "English",
volume = "7 Suppl 6",
pages = "S7",
journal = "BMC Systems Biology",
issn = "1752-0509",
publisher = "BioMed Central",

}

OMACC: an Optical-Map-Assisted Contig Connector for improving de novo genome assembly. / Chen, Yi Min; Yu, Chun Hui; Hwang, Chi Chuan; Liu, Tsunglin.

In: BMC systems biology, Vol. 7 Suppl 6, 2013, p. S7.

TY - JOUR

T1 - OMACC

T2 - an Optical-Map-Assisted Contig Connector for improving de novo genome assembly.

AU - Chen, Yi Min

AU - Yu, Chun Hui

AU - Hwang, Chi Chuan

AU - Liu, Tsunglin

PY - 2013

Y1 - 2013

UR - http://www.scopus.com/inward/record.url?scp=84908509274&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908509274&partnerID=8YFLogxK

U2 - 10.1186/1752-0509-7-S6-S7

DO - 10.1186/1752-0509-7-S6-S7

M3 - Article

C2 - 24564959

AN - SCOPUS:84908509274

VL - 7 Suppl 6

SP - S7

JO - BMC Systems Biology

JF - BMC Systems Biology

SN - 1752-0509

ER -