Sample size calculation based on generalized linear models for differential expression analysis in RNA-seq data

Research output: Article (peer-reviewed)

4 Citations (Scopus)

Abstract

As RNA-seq rapidly develops and costs continually decrease, the quantity and frequency of samples being sequenced will grow exponentially. With proteomic investigations becoming more multivariate and quantitative, determining a study's optimal sample size is now a vital step in experimental design. Current methods for calculating a study's required sample size are mostly based on the hypothesis testing framework, which assumes each gene count can be modeled through Poisson or negative binomial distributions; however, these methods are limited when it comes to accommodating covariates. To address this limitation, we propose an estimating procedure based on the generalized linear model. This easy-to-use method constructs a representative exemplary dataset and estimates the conditional power, all without requiring complicated mathematical approximations or formulas. Even more attractive, the downstream analysis can be performed with current R/Bioconductor packages. To demonstrate the practicability and efficiency of this method, we apply it to three real-world studies, and introduce our online calculator developed to determine the optimal sample size for an RNA-seq study.
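
To make the idea of a power-driven sample size concrete, the sketch below computes approximate power for a single gene in a two-group comparison and searches for the smallest per-group sample size that reaches a target power. It is not the procedure proposed in the article: it handles no covariates, builds no exemplary dataset, and ignores multiple testing. It assumes a negative binomial model with a log link and a two-sided Wald test on the log fold change, using the large-sample variance (1/mu + dispersion)/n for each estimated log group mean. The function names `wald_power` and `required_sample_size` and all parameter names are hypothetical, chosen only for this illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for quantiles and tail areas


def wald_power(n_per_group, mean_control, fold_change, dispersion, alpha=0.05):
    """Approximate two-sided Wald power for one gene, two equal-sized groups.

    Assumption (illustrative only): counts follow a negative binomial model
    with log link, so each estimated log group mean has large-sample
    variance (1/mu + dispersion) / n.
    """
    mean_treated = mean_control * fold_change
    var_log_fc = (
        (1.0 / mean_control + dispersion) + (1.0 / mean_treated + dispersion)
    ) / n_per_group
    # Standardized effect: |log fold change| divided by its standard error.
    shift = abs(math.log(fold_change)) / math.sqrt(var_log_fc)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the Wald statistic falls in either rejection tail.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def required_sample_size(mean_control, fold_change, dispersion,
                         alpha=0.05, target_power=0.8, max_n=10000):
    """Smallest per-group n whose approximate power reaches target_power.

    Returns None if no n up to max_n is sufficient.
    """
    for n in range(2, max_n + 1):
        if wald_power(n, mean_control, fold_change, dispersion, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical inputs: mean count 50 in controls, 2-fold change, dispersion 0.1.
    n = required_sample_size(mean_control=50, fold_change=2.0, dispersion=0.1)
    print("per-group sample size:", n)
    print("approximate power at that n:", wald_power(n, 50, 2.0, 0.1))
```

A linear search is used because the approximate power increases with the sample size, so the first n that meets the target is the minimum. In a real RNA-seq design, `alpha` would typically be replaced by a per-gene threshold that accounts for the number of genes tested.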

Original language: English
Pages (from-to): 491-505
Number of pages: 15
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 15
Issue number: 6
Publication status: Published - 1 Dec 2016

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics
