Evaluating the impact of deviation of a query genome from reference on variant calling

  • 林 政翰

Student thesis: Doctoral Thesis

Abstract

Variant calling is the process of identifying genetic variants, such as single nucleotide polymorphisms, insertions, deletions, and large-scale rearrangements, in the genome of an individual by aligning sequence data against a reference genome. When a query genome differs moderately from the reference, the alignments may be inaccurate, resulting in false variants. Currently, very few studies assess the extent of false variants caused by differences between a query and the reference genome. Here, we evaluate the extent of false variants stemming from the deviation of a query genome (CHM1) from the reference (GRCh38). By comparing the two whole genomes, we identified the true variants in the query genome. We then simulated next-generation sequencing data from the query genome and called variants by aligning the simulated data against the reference. The called variants were compared to the true variants, and the inconsistent variants were examined. Within the unique regions of the genome, most called variants were consistent with the true variants, except for hundreds of false cases. In the non-unique regions, about twenty thousand called variants differed from the true variants. These inconsistent variants could be classified into two categories based on their mechanisms: local duplication and local divergence. We further estimated the functional impact of these false variants and found that hundreds of genes could be affected. The results suggest the necessity of establishing a population-specific reference genome for accurate calling of genetic variants.
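
The comparison of called variants against true variants described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual pipeline: the `Variant` record, the `compare_calls` function, and the example coordinates are all hypothetical, and exact matching on (chromosome, position, ref, alt) is a simplifying assumption. A real comparison would likely also need to normalize how insertions and deletions are represented.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """Hypothetical variant record, used only for illustration."""
    chrom: str
    pos: int   # 1-based position on the reference
    ref: str   # reference allele
    alt: str   # alternate allele


def compare_calls(true_variants: Iterable[Variant],
                  called_variants: Iterable[Variant]) -> dict:
    """Split called variants into consistent and inconsistent sets.

    Assumes variants match only when all four fields are identical.
    """
    truth = set(true_variants)
    called = set(called_variants)
    return {
        "consistent": called & truth,   # called and present in the truth set
        "false_calls": called - truth,  # called but absent from the truth set
        "missed": truth - called,       # true variants that were not called
    }


# Made-up example data, not taken from the thesis.
truth = [Variant("chr1", 100, "A", "G"), Variant("chr1", 250, "CT", "C")]
calls = [Variant("chr1", 100, "A", "G"), Variant("chr2", 75, "G", "T")]
result = compare_calls(truth, calls)
```

In this toy input, one call agrees with the truth set, one call has no true counterpart, and one true variant goes uncalled.
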
Date of Award: 2019
Original language: English
Supervisor: Tsung-lin Liu
