An upper bound on the hardness of exact matrix based motif discovery

Paul Horton, Wataru Fujibuchi

Research output: Contribution to journalArticlepeer-review

2 Citations (Scopus)


Motif discovery is the problem of finding local patterns or motifs from a set of unlabeled sequences. One common representation of a motif is a Markov model known as a score matrix. Matrix based motif discovery has been extensively studied but no positive results have been known regarding its theoretical hardness. We present the first non-trivial upper bound on the complexity (worst-case computation time) of this problem. Other than linear terms, our bound depends only on the motif width w (which is typically 5-20) and is a dramatic improvement relative to previously known bounds. We prove this bound by relating the motif discovery problem to a search problem over permutations of strings of length w, in which the permutations have a particular property. We give a constructive proof of an upper bound on the number of such permutations. For an alphabet size of σ (typically 4) the trivial bound is n ! ≈ (frac(n, e))n, n = σw. Our bound is roughly n (σ logσ n)n. We relate this theoretical result to the exact motif discovery program, TsukubaBB, whose algorithm contains ideas which inspired the result. We describe a recent improvement to the TsukubaBB program which can give a speed up of nine or more and use a dataset of REB1 transcription factor binding sites to illustrate that exact methods can indeed be used in some practical situations.

Original languageEnglish
Pages (from-to)706-713
Number of pages8
JournalJournal of Discrete Algorithms
Issue number4 SPEC. ISS.
Publication statusPublished - 2007 Dec

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Discrete Mathematics and Combinatorics
  • Computational Theory and Mathematics

Fingerprint Dive into the research topics of 'An upper bound on the hardness of exact matrix based motif discovery'. Together they form a unique fingerprint.

Cite this