Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis

Yan Guo, Yulin Dai, Hui Yu, Shilin Zhao, David C. Samuels, Yu Shyr

Research output: Contribution to journal, Article, peer-review

68 Citations (Scopus)


Analysis of high throughput sequencing data starts with alignment against a reference genome, which is the foundation for all re-sequencing data analyses. Each new release of the human reference genome has improved in accuracy and completeness. The latest release, GRCh38, is presumed to benefit high throughput sequencing data analysis through greater accuracy, but the amount of improvement has not yet been quantified. We conducted a study to compare genomic analysis results between the GRCh38 reference and its predecessor, GRCh37. Through analyses of alignment, single nucleotide polymorphisms, small insertions/deletions, copy number and structural variants, we show that GRCh38 offers overall more accurate analysis of human sequencing data. More importantly, GRCh38 produced fewer false positive structural variants. In conclusion, GRCh38 improves on GRCh37 not only as a genome assembly but also in yielding more reliable genomic analysis results.

Original language: English
Pages (from-to): 83-90
Number of pages: 8
Issue number: 2
Publication status: Published - 2017 Mar 1

All Science Journal Classification (ASJC) codes

  • Genetics

