Predicting Spatial and Temporal Gene Expression Using an Integrative Model of Transcription Factor Occupancy and Chromatin State

Bartek Wilczynski, Ya Hsin Liu, Zhen Xuan Yeo, Eileen E.M. Furlong

Research output: Contribution to journalArticle

29 Citations (Scopus)

Abstract

Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converge to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal predictions and 50% for spatial. While this is, to our knowledge, the first genome-wide approach to predict tissue-specific gene expression in metazoan development, our results suggest that integrative models of this type will become more prevalent in the future.

Original languageEnglish
Article numbere1002798
JournalPLoS Computational Biology
Volume8
Issue number12
DOIs
Publication statusPublished - 2012 Dec 1

Fingerprint

Transcription factors
Chromatin
Transcription Factor
Gene expression
Gene Expression
gene expression
chromatin
Transcription Factors
transcription factors
Genes
Gene
gene
Tissue
prediction
metazoan
Prediction
Genome
Histone Code
Model
genes

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Modelling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics

Cite this

@article{1f2f409a3c474da486046fa5d0fa718a,
title = "Predicting Spatial and Temporal Gene Expression Using an Integrative Model of Transcription Factor Occupancy and Chromatin State",
abstract = "Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converge to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a na{\"i}ve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95{\%} for temporal predictions and 50{\%} for spatial. While this is, to our knowledge, the first genome-wide approach to predict tissue-specific gene expression in metazoan development, our results suggest that integrative models of this type will become more prevalent in the future.",
author = "Bartek Wilczynski and Liu, {Ya Hsin} and Yeo, {Zhen Xuan} and Furlong, {Eileen E.M.}",
year = "2012",
month = "12",
day = "1",
doi = "10.1371/journal.pcbi.1002798",
language = "English",
volume = "8",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "12",

}

Predicting Spatial and Temporal Gene Expression Using an Integrative Model of Transcription Factor Occupancy and Chromatin State. / Wilczynski, Bartek; Liu, Ya Hsin; Yeo, Zhen Xuan; Furlong, Eileen E.M.

In: PLoS Computational Biology, Vol. 8, No. 12, e1002798, 01.12.2012.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Predicting Spatial and Temporal Gene Expression Using an Integrative Model of Transcription Factor Occupancy and Chromatin State

AU - Wilczynski, Bartek

AU - Liu, Ya Hsin

AU - Yeo, Zhen Xuan

AU - Furlong, Eileen E.M.

PY - 2012/12/1

Y1 - 2012/12/1

N2 - Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converge to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal predictions and 50% for spatial. While this is, to our knowledge, the first genome-wide approach to predict tissue-specific gene expression in metazoan development, our results suggest that integrative models of this type will become more prevalent in the future.

AB - Precise patterns of spatial and temporal gene expression are central to metazoan complexity and act as a driving force for embryonic development. While there has been substantial progress in dissecting and predicting cis-regulatory activity, our understanding of how information from multiple enhancer elements converge to regulate a gene's expression remains elusive. This is in large part due to the number of different biological processes involved in mediating regulation as well as limited availability of experimental measurements for many of them. Here, we used a Bayesian approach to model diverse experimental regulatory data, leading to accurate predictions of both spatial and temporal aspects of gene expression. We integrated whole-embryo information on transcription factor recruitment to multiple cis-regulatory modules, insulator binding and histone modification status in the vicinity of individual gene loci, at a genome-wide scale during Drosophila development. The model uses Bayesian networks to represent the relation between transcription factor occupancy and enhancer activity in specific tissues and stages. All parameters are optimized in an Expectation Maximization procedure providing a model capable of predicting tissue- and stage-specific activity of new, previously unassayed genes. Performing the optimization with subsets of input data demonstrated that neither enhancer occupancy nor chromatin state alone can explain all gene expression patterns, but taken together allow for accurate predictions of spatio-temporal activity. Model predictions were validated using the expression patterns of more than 600 genes recently made available by the BDGP consortium, demonstrating an average 15-fold enrichment of genes expressed in the predicted tissue over a naïve model. We further validated the model by experimentally testing the expression of 20 predicted target genes of unknown expression, resulting in an accuracy of 95% for temporal predictions and 50% for spatial. While this is, to our knowledge, the first genome-wide approach to predict tissue-specific gene expression in metazoan development, our results suggest that integrative models of this type will become more prevalent in the future.

UR - http://www.scopus.com/inward/record.url?scp=84872020892&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84872020892&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1002798

DO - 10.1371/journal.pcbi.1002798

M3 - Article

C2 - 23236268

AN - SCOPUS:84872020892

VL - 8

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 12

M1 - e1002798

ER -