KSPF: Using gene sequence patterns and data mining for biological knowledge management

Research output: Contribution to journal › Article › peer-review

13 Citations (Scopus)


Most traditional approaches to annotating protein families are inefficient: sequences are produced at high throughput, the analytic tools are complex, the literature is unorganized, and results cannot be reused. Here, we describe a framework, knowledge sharing for protein families (KSPF), that uses sequence-pattern data mining and knowledge management to improve upon these traditional approaches. It is divided into three modules: automation, retrieval and refinement. The framework provides an environment in which biological researchers can submit an unknown protein sequence and search for information on its sub-family. Once the sub-family has been identified, the related literature and the knowledge records contributed by previous users can be retrieved, and the possible functions of the protein can then be predicted from that literature and those records. The proposed framework is applicable to any type of protein family. We illustrate it with a search for a plant lipid transfer protein (PLTP). The resulting system, KS-PLTP, maps an unknown sequence to a sub-family in the PLTP knowledge base and predicts the sequence's possible function. KS-PLTP achieved a prediction rate of 89.6%.
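The abstract does not spell out how KSPF matches sequence patterns, so the following is only a minimal sketch under stated assumptions: each sub-family's signatures are written as PROSITE-style patterns, a sequence is assigned to every sub-family whose signatures all occur in it, and each sub-family carries a list of knowledge records that are returned on a match. The helpers `prosite_to_regex`, `SubFamily` and `annotate`, and the sub-family names, patterns and records in `KNOWLEDGE_BASE`, are all hypothetical; they are not taken from the paper and are not real PLTP motifs.

```python
import re
from dataclasses import dataclass, field


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern to a Python regex.

    Supported subset (an assumption, not full PROSITE syntax): 'x' (any
    residue), '[ABC]' (allowed set), '{ABC}' (forbidden set), repeats
    'e(n)' / 'e(n,m)', and '<' / '>' anchors on the first / last element.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start = "^" if element.startswith("<") else ""
        end = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split the element into its core and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"bad pattern element: {element!r}")
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core.lower() == "x":
            rx = "."
        elif core.startswith("{"):
            rx = "[^" + core[1:-1] + "]"
        else:
            rx = core  # a single residue or an [ABC] set is already valid regex
        if high:
            rx += "{%s,%s}" % (low, high)
        elif low:
            rx += "{%s}" % low
        parts.append(start + rx + end)
    return "".join(parts)


@dataclass
class SubFamily:
    name: str
    patterns: list[str]                               # PROSITE-style signatures
    records: list[str] = field(default_factory=list)  # literature / user notes

    def matches(self, seq: str) -> bool:
        # Assign the sequence only if every signature occurs somewhere in it.
        return all(re.search(prosite_to_regex(p), seq) for p in self.patterns)


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    SubFamily("subfamily-A", ["C-x(6,9)-C", "C-C-x(3)-[LIVM]"],
              ["Record A1: literature note", "Record A2: user comment"]),
    SubFamily("subfamily-B", ["C-x(2)-{P}-C", "G-x-[ST]"],
              ["Record B1: literature note"]),
]


def annotate(seq: str, kb=KNOWLEDGE_BASE) -> dict[str, list[str]]:
    """Map an unknown sequence to matching sub-families and return their records."""
    seq = seq.upper()
    return {sf.name: sf.records for sf in kb if sf.matches(seq)}
```

For example, `annotate("MKC" + "A" * 7 + "CCAAAL")` returns only the `subfamily-A` records, because that sequence contains both of its signatures but neither of `subfamily-B`'s. Requiring every signature to match is a deliberately strict simplification; scoring partial matches would be an equally plausible design.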

Original language: English
Pages (from-to): 537-545
Number of pages: 9
Journal: Expert Systems With Applications
Issue number: 3
Publication status: Published - 2005 Apr

All Science Journal Classification (ASJC) codes

  • Engineering(all)
  • Computer Science Applications
  • Artificial Intelligence

