TY - JOUR
T1 - A generalized framework for controlling FDR in gene regulatory network inference
AU - Morgan, Daniel
AU - Tjärnberg, Andreas
AU - Nordling, Torbjörn E.M.
AU - Sonnhammer, Erik L.L.
N1 - Publisher Copyright: © 2018 The Author(s). Published by Oxford University Press. All rights reserved.
PY - 2019/3/15
Y1 - 2019/3/15
N2 - Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data are not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied. Results: To achieve this, we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference and, through repeated bootstrap runs, assesses the stability of the estimated support values. To translate bootstrap support values into false discovery rates, we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. Improved inference accuracy was observed in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as for running and analyzing the inferences.
UR - http://www.scopus.com/inward/record.url?scp=85062946143&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062946143&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty764
DO - 10.1093/bioinformatics/bty764
M3 - Article
C2 - 30169550
AN - SCOPUS:85062946143
SN - 1367-4803
VL - 35
SP - 1026
EP - 1032
JO - Bioinformatics
JF - Bioinformatics
IS - 6
ER -