CSAM: Using clustering-hashing-signal anchoring method to explore human novel genes

Research output: Contribution to journal (article, peer-reviewed)

1 Citation (Scopus)


The expression of genes in mammalian cells can be constitutive, transient, or inducible. Transcripts of transient and inducible genes are difficult to discover using the expressed sequence tag (EST) approach. Transiently expressed genes, however, are crucial to embryo development and the pathogenesis of disease because they determine the outcome of disease. We aimed to identify novel exons in transiently expressed genes using a new bioinformatics approach, which we believe will facilitate verification of novel transcripts in developing embryos or pathogen-induced cells. First, the proposed method uses a general gene predictor that must be able to produce all possible optimal or suboptimal candidate exons in the human genome. After signal processing is applied, an anchoring procedure rapidly transforms and groups the candidate sequences into many clusters of numeric hashing signals. Meanwhile, an entropy-based theorem is used to remove most of the erroneous matches, namely repeat matches. Finally, the method reports the exons identified by cross-species alignment with other genomic or EST sequences. Our results indicated that 3,223 filtered target exons were potential novel exons. The theoretical threshold determined computationally for filtering repeat matches had 95.3% sensitivity and 81.8% specificity. The inferential threshold, however, was close to the experimental threshold, which is a practical expected value that takes both sensitivity and specificity into account. Our results therefore demonstrated the feasibility of the method. Combining the anchoring method, which embeds an entropy-based filter, with an inherently unreliable gene predictor can yield a small set of potentially novel exons, because the combination avoids many drawbacks of some traditional gene predictors.
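To illustrate the general idea of an entropy-based filter for repeat matches, here is a minimal sketch. It is not the theorem or threshold used in the paper, which the abstract does not specify. It assumes that repetitive candidate sequences can be flagged by the low Shannon entropy of their k-mer composition. The function names, the choice of k = 2, and the threshold of 2.5 bits are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (in bits) of the k-mer distribution of a sequence.

    Assumption: k-mer composition is used as a simple proxy for
    repetitiveness; the paper's own entropy measure may differ.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    total = len(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_low_entropy(candidates, threshold: float = 2.5, k: int = 2):
    """Keep only candidates whose k-mer entropy reaches the threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return [s for s in candidates if shannon_entropy(s, k) >= threshold]


if __name__ == "__main__":
    candidates = [
        "AAAAAAAA",          # homopolymer run
        "ATATATATATATATAT",  # dinucleotide repeat
        "ACGTTGCAAGCTGATC",  # more complex sequence
    ]
    for seq in candidates:
        print(seq, round(shannon_entropy(seq), 3))
    print(filter_low_entropy(candidates))
```

In practice, a threshold like this would be tuned against labelled repeat and non-repeat matches, which is where sensitivity and specificity figures such as those reported above come from.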

Original language: English
Pages (from-to): 1775-1789
Number of pages: 15
Journal: Journal of Computational Biology
Issue number: 10
Publication status: Published - 2006 Dec 1

All Science Journal Classification (ASJC) codes

  • Modelling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

