## Abstract

Motif discovery is the problem of finding local patterns or motifs from a set of unlabeled sequences. One common representation of a motif is a Markov model known as a score matrix. Matrix based motif discovery has been extensively studied but no positive results have been known regarding its theoretical hardness. We present the first non-trivial upper bound on the complexity (worst-case computation time) of this problem. Other than linear terms, our bound depends only on the motif width w (which is typically 5-20) and is a dramatic improvement relative to previously known bounds. We prove this bound by relating the motif discovery problem to a search problem over permutations of strings of length w, in which the permutations have a particular property. We give a constructive proof of an upper bound on the number of such permutations. For an alphabet size of σ (typically 4) the trivial bound is n ! ≈ (frac(n, e))^{n}, n = σ^{w}. Our bound is roughly n (σ log_{σ} n)^{n}. We relate this theoretical result to the exact motif discovery program, TsukubaBB, whose algorithm contains ideas which inspired the result. We describe a recent improvement to the TsukubaBB program which can give a speed up of nine or more and use a dataset of REB1 transcription factor binding sites to illustrate that exact methods can indeed be used in some practical situations.

Original language | English |
---|---|

Pages (from-to) | 706-713 |

Number of pages | 8 |

Journal | Journal of Discrete Algorithms |

Volume | 5 |

Issue number | 4 SPEC. ISS. |

DOIs | |

Publication status | Published - 2007 Dec |

## All Science Journal Classification (ASJC) codes

- Theoretical Computer Science
- Discrete Mathematics and Combinatorics
- Computational Theory and Mathematics