Sample size calculation based on generalized linear models for differential expression analysis in RNA-seq data

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study's optimal sample size is now a vital step in experimental design. Current methods for calculating a study's required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates. To address this limitation, we propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, all without requiring complicated mathematical approximations or formulas. Even more attractive, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, we apply it to three real-world studies, and introduce our online calculator developed to determine the optimal sample size for an RNA-seq study.
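To make the exemplary-dataset idea concrete, the following is a minimal sketch and not the authors' procedure. It assumes a single gene, two balanced groups, no covariates, a known negative binomial dispersion, and no multiple-testing adjustment. The expected counts under the alternative stand in for the data, the likelihood-ratio statistic of that exemplary dataset is used as a noncentrality parameter, and power is read from the noncentral chi-square distribution with one degree of freedom. All function and parameter names (`nb_lr_noncentrality`, `power_from_ncp`, `min_samples_per_group`, `dispersion`) are hypothetical and chosen for illustration only.

```python
import math
from statistics import NormalDist


def nb_lr_noncentrality(mu0, mu1, dispersion, n_per_group):
    """Likelihood-ratio statistic of the exemplary dataset.

    The exemplary dataset sets every observation equal to its expected
    count (mu0 in group 0, mu1 in group 1). The full model then fits
    perfectly, so the statistic equals the deviance of the null model.
    Assumes balanced groups and a known dispersion (illustrative only).
    """
    mbar = (mu0 + mu1) / 2.0   # null fit: one common mean for both groups
    size = 1.0 / dispersion    # negative binomial "size" parameter

    def unit_deviance(y, mu):
        # Negative binomial unit deviance with fixed dispersion (y > 0).
        return 2.0 * (y * math.log(y / mu)
                      - (y + size) * math.log((y + size) / (mu + size)))

    ncp = n_per_group * (unit_deviance(mu0, mbar) + unit_deviance(mu1, mbar))
    return max(ncp, 0.0)       # guard against tiny negative rounding error


def power_from_ncp(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with noncentrality parameter ncp.

    For one degree of freedom the noncentral chi-square tail reduces to
    two normal tail probabilities, so only the standard library is needed.
    """
    nd = NormalDist()
    z = nd.inv_cdf(1.0 - alpha / 2.0)
    root = math.sqrt(ncp)
    return nd.cdf(root - z) + nd.cdf(-root - z)


def min_samples_per_group(mu0, mu1, dispersion,
                          alpha=0.05, target_power=0.8, n_max=10000):
    """Smallest per-group sample size reaching the target power, or None."""
    for n in range(2, n_max + 1):
        ncp = nb_lr_noncentrality(mu0, mu1, dispersion, n)
        if power_from_ncp(ncp, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: a two-fold change from 100 to 200 expected reads.
    n = min_samples_per_group(mu0=100.0, mu1=200.0, dispersion=0.1)
    print("samples per group:", n)
```

Accommodating covariates, as the abstract describes, would replace the closed-form null fit with a fitted generalized linear model on the exemplary dataset. A genome-wide design would also need a significance level adjusted for multiple testing. Neither is attempted in this sketch.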

Original language: English
Pages (from-to): 491-505
Number of pages: 15
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 15
Issue number: 6
DOIs
Publication status: Published - 2016 Dec 1

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics
