Advances in next-generation sequencing have produced massive amounts of short reads. However, assembling genome sequences from short reads remains a challenging task. Because of errors in reads and large repeats in the genome, many current assembly tools produce only collections of contigs whose relative positions and orientations along the sequenced genome are unknown. To address this issue, a scaffolding process that orders and orients the contigs of a draft genome is needed to complete the genome sequence. In this work, we propose a new scaffolding tool called CSAR that efficiently and more accurately orders and orients the contigs of a given draft genome based on a reference genome of a related organism. In particular, the reference genome required by CSAR need not be complete in sequence. Our experimental results on real datasets show that CSAR outperforms similar tools such as Projector2, OSLay and Mauve Aligner in terms of average sensitivity, precision, F-score, genome coverage, NGA50 and running time.
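To illustrate what "ordering and orienting contigs against a reference" means, the Python sketch below shows the simplest possible form of the idea. It is not CSAR's algorithm. It assumes each draft contig already has one best alignment hit to the reference, given as a reference-sequence index, a start coordinate and a strand. The `ContigHit` record, the helper names and the 100-N gap length are all hypothetical. Contigs are sorted by their hit position and reverse-complemented when they align to the minus strand. Grouping hits by reference-sequence index is a further simplification: when the reference is itself a draft with several sequences, this sketch does not try to order those reference sequences relative to each other.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ContigHit:
    """Best alignment of one draft contig to the reference (hypothetical input format)."""
    contig_id: str
    ref_seq: int    # index of the reference sequence the hit lies on
    ref_start: int  # start coordinate of the hit on that reference sequence
    strand: int     # +1 if the contig aligns forward, -1 if reverse-complemented


def order_and_orient(hits: List[ContigHit]) -> List[Tuple[str, str]]:
    """Order contigs by reference position and orient them by alignment strand."""
    ordered = sorted(hits, key=lambda h: (h.ref_seq, h.ref_start))
    return [(h.contig_id, "+" if h.strand > 0 else "-") for h in ordered]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N, either case)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def build_scaffold(order: List[Tuple[str, str]], seqs: Dict[str, str], gap: int = 100) -> str:
    """Concatenate oriented contigs, separating them with runs of 'N' of assumed length `gap`."""
    parts = [seqs[cid] if sign == "+" else reverse_complement(seqs[cid]) for cid, sign in order]
    return ("N" * gap).join(parts)


if __name__ == "__main__":
    hits = [
        ContigHit("c1", 0, 5000, -1),
        ContigHit("c2", 0, 100, 1),
        ContigHit("c3", 1, 20, 1),
    ]
    print(order_and_orient(hits))  # [('c2', '+'), ('c1', '-'), ('c3', '+')]
```

Real scaffolders must also handle contigs with no hit, contigs with conflicting hits caused by repeats or rearrangements, and evolutionary divergence between the draft and reference genomes. That is why the tools compared above produce different results.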
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics