Optimal sparsity criteria for network inference

Andreas Tjärnberg, Torbjörn Nordling, Matthew Studham, Erik L.L. Sonnhammer

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call ζ (zeta), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular, for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of ζ. In order to avoid such poor choices, we propose a method for optimization of ζ, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the ζ value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference as well as our method for optimization of ζ. We demonstrate that our ζ optimization method for two widely used inference algorithms, Glmnet and NIR, gives accurate and informative estimates of the network structure, given that the data is informative enough.
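The selection step described in the abstract (leave one sample out, fit on the rest, and keep the coefficient value with the lowest prediction error) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic linear model y = Xw with a lasso penalty solved by a small proximal-gradient loop, standing in for Glmnet or NIR, and the function names `lasso_ista` and `loo_select_zeta` are hypothetical.

```python
import numpy as np


def lasso_ista(X, y, zeta, n_iter=500):
    """Minimise 0.5*||y - Xw||^2 + zeta*||w||_1 by proximal gradient (ISTA).

    Assumption: a plain lasso is used here as a stand-in for any
    sparsity-dependent inference method; larger zeta gives a sparser w.
    """
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L is the squared spectral norm of X.
    L = np.linalg.norm(X, 2) ** 2
    if L == 0:
        return w
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        z = w - grad / L
        # Soft-thresholding sets small coefficients exactly to zero.
        w = np.sign(z) * np.maximum(np.abs(z) - zeta / L, 0.0)
    return w


def loo_select_zeta(X, y, zetas):
    """Return the zeta with the lowest leave-one-out prediction error."""
    n = X.shape[0]
    errors = []
    for zeta in zetas:
        sq_err = 0.0
        for i in range(n):
            keep = np.arange(n) != i          # hold out sample i
            w = lasso_ista(X[keep], y[keep], zeta)
            sq_err += (y[i] - X[i] @ w) ** 2  # error on the held-out sample
        errors.append(sq_err / n)
    errors = np.array(errors)
    return zetas[int(np.argmin(errors))], errors
```

Called as `loo_select_zeta(X, y, [0.01, 0.1, 1.0])`, the function returns the selected value together with the mean held-out squared error for each candidate. A very large candidate drives every coefficient to zero, which corresponds to the empty network mentioned in the abstract.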

Original language: English
Pages (from-to): 398-408
Number of pages: 11
Journal: Journal of Computational Biology
Volume: 20
Issue number: 5
DOI: 10.1089/cmb.2012.0268
Publication status: Published - 2013 May 1

Fingerprint

Sparsity
Genes
Estimate
Optimization
Gene Regulatory Network
Systems Biology
Coefficient
Prediction Error
Network Structure
Gene Regulatory Networks
Optimization Methods
Regularization
Maximise
Noise
Gene
Minimise
Experiments
Vertex of a graph
Interaction
Demonstrate

All Science Journal Classification (ASJC) codes

  • Modelling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

Tjärnberg, Andreas ; Nordling, Torbjörn ; Studham, Matthew ; Sonnhammer, Erik L.L. / Optimal sparsity criteria for network inference. In: Journal of Computational Biology. 2013 ; Vol. 20, No. 5. pp. 398-408.
@article{e4a2cd1d11cb42f3ac5d8a753f459652,
title = "Optimal sparsity criteria for network inference",
abstract = "Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call $\zeta$ (zeta), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular, for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of $\zeta$. In order to avoid such poor choices, we propose a method for optimization of $\zeta$, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the $\zeta$ value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference as well as our method for optimization of $\zeta$. We demonstrate that our $\zeta$ optimization method for two widely used inference algorithms, Glmnet and NIR, gives accurate and informative estimates of the network structure, given that the data is informative enough.",
author = "Andreas Tj{\"a}rnberg and Torbj{\"o}rn Nordling and Matthew Studham and Sonnhammer, {Erik L.L.}",
year = "2013",
month = "5",
day = "1",
doi = "10.1089/cmb.2012.0268",
language = "English",
volume = "20",
pages = "398--408",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "5",

}

TY - JOUR

T1 - Optimal sparsity criteria for network inference

AU - Tjärnberg, Andreas

AU - Nordling, Torbjörn

AU - Studham, Matthew

AU - Sonnhammer, Erik L.L.

PY - 2013/5/1

Y1 - 2013/5/1

AB - Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call ζ (zeta), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular, for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of ζ. In order to avoid such poor choices, we propose a method for optimization of ζ, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the ζ value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference as well as our method for optimization of ζ. We demonstrate that our ζ optimization method for two widely used inference algorithms, Glmnet and NIR, gives accurate and informative estimates of the network structure, given that the data is informative enough.

UR - http://www.scopus.com/inward/record.url?scp=84881569817&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84881569817&partnerID=8YFLogxK

U2 - 10.1089/cmb.2012.0268

DO - 10.1089/cmb.2012.0268

M3 - Article

C2 - 23641867

AN - SCOPUS:84881569817

VL - 20

SP - 398

EP - 408

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 5

ER -