Better prediction of protein cellular localization sites with the k nearest neighbors classifier.

P. Horton, K. Nakai

Research output: Contribution to journal › Article › peer-review

414 Citations (Scopus)

Abstract

We have compared four classifiers on the problem of predicting the cellular localization sites of proteins in yeast and E. coli. A set of sequence-derived features, such as regions of high hydrophobicity, was used for each classifier. The methods compared were a structured probabilistic model specifically designed for the localization problem, the k nearest neighbors classifier, the binary decision tree classifier, and the naïve Bayes classifier. The results of tests using stratified cross-validation show the k nearest neighbors classifier to perform better than the other methods. In the case of yeast, this difference was statistically significant using a cross-validated paired t test. The resulting accuracy is approximately 60% for 10 yeast classes and 86% for 8 E. coli classes. The best previously reported accuracies for these datasets were 55% and 81%, respectively.
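The core procedure the abstract describes, a k nearest neighbors classifier evaluated by stratified cross-validation, can be sketched in plain Python. This is a minimal sketch under stated assumptions: Euclidean distance, a simple majority vote, the tie-breaking rule, the helper names (`knn_predict`, `stratified_folds`, `cross_validated_accuracy`) and the toy data are all illustrative and are not taken from the paper, whose feature set and choice of k are not reproduced here.

```python
import math
import random
from collections import Counter, defaultdict


def knn_predict(train_x, train_y, query, k):
    """Return the majority label among the k training points nearest to query.

    Assumption: Euclidean distance; vote ties go to the tied label whose
    member is closest to the query (an illustrative choice).
    """
    # Sort training indices by distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbors = order[:k]
    votes = Counter(train_y[i] for i in neighbors)
    best = max(votes.values())
    # Walk neighbors from nearest to farthest; the first tied label wins.
    for i in neighbors:
        if votes[train_y[i]] == best:
            return train_y[i]


def stratified_folds(labels, n_folds, seed=0):
    """Split example indices into folds so each class is spread evenly across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal each class's examples round-robin, continuing across classes.
        for idx in members:
            folds[position % n_folds].append(idx)
            position += 1
    return folds


def cross_validated_accuracy(x, y, k, n_folds, seed=0):
    """Per-fold accuracy of k-NN under stratified cross-validation."""
    accuracies = []
    for test_idx in stratified_folds(y, n_folds, seed):
        if not test_idx:
            continue  # skip empty folds (only possible with very small data)
        test_set = set(test_idx)
        train_x = [x[i] for i in range(len(x)) if i not in test_set]
        train_y = [y[i] for i in range(len(y)) if i not in test_set]
        correct = sum(knn_predict(train_x, train_y, x[i], k) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Hypothetical toy data: two well-separated 2-D clusters (not the yeast or E. coli sets).
points = [(0.0, 0.1 * i) for i in range(10)] + [(5.0, 0.1 * i) for i in range(10)]
labels = ["A"] * 10 + ["B"] * 10
print(cross_validated_accuracy(points, labels, k=3, n_folds=5))
```

Stratification keeps each fold's class proportions close to those of the whole dataset, which matters when some localization classes are much rarer than others; on this separable toy set every fold scores 1.0.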

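The significance test mentioned for yeast compares two classifiers on the same cross-validation folds by looking at the per-fold differences in accuracy. A minimal sketch, assuming per-fold accuracies for both classifiers are already available; the function name `paired_t_statistic` and the sample accuracies are hypothetical, not results from the paper.

```python
import math
import statistics


def paired_t_statistic(acc_a, acc_b):
    """Paired t statistic for per-fold accuracy differences of two classifiers.

    d[i] = acc_a[i] - acc_b[i] on the same fold i;
    t = mean(d) / (stdev(d) / sqrt(n)), with n - 1 degrees of freedom.
    Returns (t, degrees_of_freedom).
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-fold accuracies for two classifiers evaluated on the same folds.
knn_acc = [0.62, 0.60, 0.64, 0.61]
other_acc = [0.58, 0.57, 0.59, 0.60]
print(paired_t_statistic(knn_acc, other_acc))
```

Turning t into a p-value requires the Student t distribution with n - 1 degrees of freedom; `scipy.stats.ttest_rel` computes the same statistic together with a p-value. Because training sets overlap across folds, the per-fold differences are not fully independent, so this test tends to be somewhat optimistic.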
Original language: English
Pages (from-to): 147-152
Number of pages: 6
Journal: Proceedings / ... International Conference on Intelligent Systems for Molecular Biology ; ISMB. International Conference on Intelligent Systems for Molecular Biology
Volume: 5
Publication status: Published - 1997

All Science Journal Classification (ASJC) codes

  • Medicine(all)
