Abstract
As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study's optimal sample size is now a vital step in experimental design. Current methods for calculating a study's required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates. To address this limitation, we propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, all without requiring complicated mathematical approximations or formulas. Even more attractive, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, we apply it to three real-world studies, and introduce our online calculator developed to determine the optimal sample size for an RNA-seq study.
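As a rough illustration of the kind of power calculation the abstract refers to, here is a minimal sketch for the simplest possible setting: a single gene, two equal-sized groups, no covariates, and a negative binomial count model with a log link. It is not the authors' exemplary-dataset procedure and does not use their R/Bioconductor workflow; it uses a closed-form Wald approximation as a stand-in. The function names `wald_power` and `smallest_n`, the parameter names, and the example values are all hypothetical, and no multiple-testing adjustment across genes is applied.

```python
from math import log, sqrt
from statistics import NormalDist


def wald_power(mu_control, fold_change, dispersion, n_per_group, alpha=0.05):
    """Approximate power of a two-sided Wald test for the log fold change.

    Assumes (illustrative only) negative binomial counts with
    variance mu + dispersion * mu**2 and a log link.
    """
    mu_treated = mu_control * fold_change
    beta = log(fold_change)  # log fold change under the alternative
    # Delta method: Var(log of a group's sample mean) ~ (1/mu + dispersion) / n
    var_beta = ((1 / mu_control + dispersion)
                + (1 / mu_treated + dispersion)) / n_per_group
    shift = abs(beta) / sqrt(var_beta)  # square root of the noncentrality
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the Wald statistic falls in either rejection tail
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


def smallest_n(mu_control, fold_change, dispersion,
               target_power=0.8, alpha=0.05, n_max=10_000):
    """Smallest per-group sample size reaching the target power, or None."""
    for n in range(2, n_max + 1):
        if wald_power(mu_control, fold_change, dispersion, n, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: mean count 10, two-fold change, dispersion 0.1
    n = smallest_n(mu_control=10.0, fold_change=2.0, dispersion=0.1)
    print("per-group sample size:", n)
```

With covariates in the model, the variance of the coefficient no longer has this simple two-term form, which is the situation the generalized linear model approach in the abstract is meant to handle.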
| Original language | English |
|---|---|
| Pages (from-to) | 491-505 |
| Number of pages | 15 |
| Journal | Statistical Applications in Genetics and Molecular Biology |
| Volume | 15 |
| Issue number | 6 |
| Publication status | Published - 2016 Dec 1 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Molecular Biology
- Genetics
- Computational Mathematics
Fingerprint
Dive into the research topics of 'Sample size calculation based on generalized linear models for differential expression analysis in RNA-seq data'. Together they form a unique fingerprint.