A generalized framework for controlling FDR in gene regulatory network inference

Daniel Morgan, Andreas Tjärnberg, Torbjörn Nordling, Erik L.L. Sonnhammer

Research output: Contribution to journal › Article

Abstract

Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied.

Results: To achieve this, we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference and, by repeated bootstrap runs, assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates, we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. Inference accuracy improved in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences.
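The workflow described in the abstract can be illustrated with a small, self-contained sketch. It is not the GeneSPIDER implementation and does not use any of the evaluated inference methods: the inference step is a stand-in (thresholded absolute Pearson correlation), and all function names, the `cutoff` parameter, the use of minimum support across runs as a stability criterion, and the ratio-based FDR estimate are assumptions made for illustration only. The published estimator may differ.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def infer_links(data, cutoff):
    """Placeholder inference (assumption): link genes whose |correlation| exceeds cutoff.

    data maps gene name -> list of expression values, one per sample.
    """
    return {
        (g, h)
        for g, h in combinations(sorted(data), 2)
        if abs(pearson(data[g], data[h])) > cutoff
    }


def bootstrap_support(data, cutoff, n_boot, rng):
    """Fraction of bootstrap resamples (samples drawn with replacement) that contain each link."""
    n_samples = len(next(iter(data.values())))
    counts = {}
    for _ in range(n_boot):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        resampled = {g: [v[i] for i in idx] for g, v in data.items()}
        for link in infer_links(resampled, cutoff):
            counts[link] = counts.get(link, 0) + 1
    return {link: c / n_boot for link, c in counts.items()}


def nested_support(data, cutoff, n_outer, n_boot, rng):
    """Repeat the whole bootstrap n_outer times; return a list of support values per link."""
    runs = [bootstrap_support(data, cutoff, n_boot, rng) for _ in range(n_outer)]
    links = set().union(*runs)
    return {link: [run.get(link, 0.0) for run in runs] for link in links}


def shuffle_data(data, rng):
    """Null data: permute each gene's values independently across samples."""
    return {g: rng.sample(v, len(v)) for g, v in data.items()}


def estimated_fdr(measured, null, threshold):
    """Null links divided by measured links whose minimum support reaches threshold (capped at 1)."""
    n_meas = sum(min(s) >= threshold for s in measured.values())
    n_null = sum(min(s) >= threshold for s in null.values())
    return min(1.0, n_null / n_meas) if n_meas else 0.0


# Toy usage: gene B follows gene A closely, gene C is independent noise.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(30)]
data = {
    "A": a,
    "B": [x + rng.gauss(0, 0.1) for x in a],
    "C": [rng.gauss(0, 1) for _ in range(30)],
}
measured = nested_support(data, cutoff=0.6, n_outer=5, n_boot=50, rng=rng)
null = nested_support(shuffle_data(data, rng), cutoff=0.6, n_outer=5, n_boot=50, rng=rng)
fdr = estimated_fdr(measured, null, threshold=0.9)
print(f"estimated FDR at support >= 0.9: {fdr:.2f}")
```

Because the shuffled data pass through exactly the same pipeline as the measured data, the same comparison can in principle be repeated for any inference method or parameter setting by swapping out `infer_links`.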

Original language: English
Pages (from-to): 1026-1032
Number of pages: 7
Journal: Bioinformatics
Volume: 35
Issue number: 6
DOIs: 10.1093/bioinformatics/bty764
Publication status: Published - 2019 Mar 15

Fingerprint

Gene Regulatory Networks
Gene Regulatory Network
Genes
Bootstrapping
Bootstrap
Biological systems
Data Perturbation
Framework
Least-Squares Analysis
Pipelines
Noise
Least Squares

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Morgan, Daniel ; Tjärnberg, Andreas ; Nordling, Torbjörn ; Sonnhammer, Erik L.L. / A generalized framework for controlling FDR in gene regulatory network inference. In: Bioinformatics. 2019 ; Vol. 35, No. 6. pp. 1026-1032.
@article{858c87d25b72457d9465112ba802d4bf,
title = "A generalized framework for controlling FDR in gene regulatory network inference",
abstract = "Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied. Results: To achieve this, we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference and, by repeated bootstrap runs, assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates, we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. Inference accuracy improved in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences.",
author = "Daniel Morgan and Andreas Tj{\"a}rnberg and Torbj{\"o}rn Nordling and Sonnhammer, {Erik L.L.}",
year = "2019",
month = "3",
day = "15",
doi = "10.1093/bioinformatics/bty764",
language = "English",
volume = "35",
pages = "1026--1032",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "6",

}

TY - JOUR

T1 - A generalized framework for controlling FDR in gene regulatory network inference

AU - Morgan, Daniel

AU - Tjärnberg, Andreas

AU - Nordling, Torbjörn

AU - Sonnhammer, Erik L.L.

PY - 2019/3/15

Y1 - 2019/3/15

N2 - Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied. Results: To achieve this, we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference and, by repeated bootstrap runs, assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates, we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. Inference accuracy improved in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences.

UR - http://www.scopus.com/inward/record.url?scp=85062946143&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062946143&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bty764

DO - 10.1093/bioinformatics/bty764

M3 - Article

VL - 35

SP - 1026

EP - 1032

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 6

ER -