Efficient Signal Inclusion With Genomic Applications

X. Jessie Jeng, Teng Zhang, Jung Ying Tzeng

Research output: Contribution to journal (Article)

Abstract

This article addresses the challenge of efficiently capturing a high proportion of true signals for subsequent data analyses when sample sizes are relatively limited with respect to data dimension. We propose the signal missing rate (SMR) as a new measure for false-negative control to account for the variability of false-negative proportion. Novel data-adaptive procedures are developed to control SMR without incurring many unnecessary false positives under dependence. We justify the efficiency and adaptivity of the proposed methods via theory and simulation. The proposed methods are applied to GWAS on human height to effectively remove irrelevant single nucleotide polymorphisms (SNPs) while retaining a high proportion of relevant SNPs for subsequent polygenic analysis. Supplementary materials for this article are available online.

Original language: English
Pages (from-to): 1787-1799
Number of pages: 13
Journal: Journal of the American Statistical Association
Volume: 114
Issue number: 528
DOI: 10.1080/01621459.2018.1518236
Publication status: Published - 2019 Oct 2

Fingerprint

Genomics
Proportion
Single Nucleotide Polymorphism
Inclusion
Adaptive Procedure
Adaptivity
Signal Control
False Positive
Justify
Sample Size
Simulation
False
Polymorphism
Human

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Jeng, X. Jessie; Zhang, Teng; Tzeng, Jung Ying. Efficient Signal Inclusion With Genomic Applications. In: Journal of the American Statistical Association. 2019; Vol. 114, No. 528. pp. 1787-1799.
@article{a834a6dbbcea413abe0abd55cc45e5de,
title = "Efficient Signal Inclusion With Genomic Applications",
author = "Jeng, {X. Jessie} and Zhang, Teng and Tzeng, {Jung Ying}",
year = "2019",
month = "10",
day = "2",
doi = "10.1080/01621459.2018.1518236",
language = "English",
volume = "114",
pages = "1787--1799",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "528"
}

TY - JOUR

T1 - Efficient Signal Inclusion With Genomic Applications

AU - Jeng, X. Jessie

AU - Zhang, Teng

AU - Tzeng, Jung Ying

PY - 2019/10/2

Y1 - 2019/10/2

UR - http://www.scopus.com/inward/record.url?scp=85062334610&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062334610&partnerID=8YFLogxK

U2 - 10.1080/01621459.2018.1518236

DO - 10.1080/01621459.2018.1518236

M3 - Article

AN - SCOPUS:85062334610

VL - 114

SP - 1787

EP - 1799

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 528

ER -