TY - JOUR
T1 - Statistical aspects of omics data analysis using the random compound covariate
AU - Su, Pei Fang
AU - Chen, Xi
AU - Chen, Heidi
AU - Shyr, Yu
N1 - Funding Information:
The authors wish to thank editorial assistants, Lynne Berry and Yvonne Poindexter, for their editorial work on this manuscript. This work was supported by National Cancer Institute (grant numbers P30 CA068485, P50 CA095103, P50 CA090949, P50 CA098131). This article has been published as part of BMC Systems Biology Volume 6 Supplement 3, 2012: Proceedings of The International Conference on Intelligent Biology and Medicine (ICIBM) - Systems Biology. The full contents of the supplement are available online at http://www.biomedcentral.com/ bmcsystbiol/supplements/6/S3.
PY - 2012/12/17
Y1 - 2012/12/17
N2 - Background: Dealing with high dimensional markers, such as gene expression data obtained using microarray chip technology or genomics studies, is a key challenge because the numbers of features greatly exceeds the number of biological samples. After selecting biologically relevant genes, how to summarize the expression of selected genes and then further build predicted model is an important issue in medical applications. One intuitive method of addressing this challenge assigns different weights to different features, subsequently combining this information into a single score, named the compound covariate. Investigators commonly employ this score to assess whether an association exists between the compound covariate and clinical outcomes adjusted for baseline covariates. However, we found that some clinical papers concerned with such analysis report bias p-values based on flawed compound covariate in their training data set.Results: We correct this flaw in the analysis and we also propose treating the compound score as a random covariate, to achieve more appropriate results and significantly improve study power for survival outcomes. With this proposed method, we thoroughly assess the performance of two commonly used estimated gene weights through simulation studies. When the sample size is 100, and censoring rates are 50%, 30%, and 10%, power is increased by 10.6%, 3.5%, and 0.4%, respectively, by treating the compound score as a random covariate rather than a fixed covariate. Finally, we assess our proposed method using two publicly available microarray data sets.Conclusion: In this article, we correct this flaw in the analysis and the propose method, treating the compound score as a random covariate, can achieve more appropriate results and improve study power for survival outcomes.
AB - Background: Dealing with high dimensional markers, such as gene expression data obtained using microarray chip technology or genomics studies, is a key challenge because the numbers of features greatly exceeds the number of biological samples. After selecting biologically relevant genes, how to summarize the expression of selected genes and then further build predicted model is an important issue in medical applications. One intuitive method of addressing this challenge assigns different weights to different features, subsequently combining this information into a single score, named the compound covariate. Investigators commonly employ this score to assess whether an association exists between the compound covariate and clinical outcomes adjusted for baseline covariates. However, we found that some clinical papers concerned with such analysis report bias p-values based on flawed compound covariate in their training data set.Results: We correct this flaw in the analysis and we also propose treating the compound score as a random covariate, to achieve more appropriate results and significantly improve study power for survival outcomes. With this proposed method, we thoroughly assess the performance of two commonly used estimated gene weights through simulation studies. When the sample size is 100, and censoring rates are 50%, 30%, and 10%, power is increased by 10.6%, 3.5%, and 0.4%, respectively, by treating the compound score as a random covariate rather than a fixed covariate. Finally, we assess our proposed method using two publicly available microarray data sets.Conclusion: In this article, we correct this flaw in the analysis and the propose method, treating the compound score as a random covariate, can achieve more appropriate results and improve study power for survival outcomes.
UR - http://www.scopus.com/inward/record.url?scp=84878849348&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878849348&partnerID=8YFLogxK
U2 - 10.1186/1752-0509-6-S3-S11
DO - 10.1186/1752-0509-6-S3-S11
M3 - Article
C2 - 23281681
AN - SCOPUS:84878849348
VL - 6
JO - BMC Systems Biology
JF - BMC Systems Biology
SN - 1752-0509
IS - SUPPL3
M1 - S11
ER -