RaPiDS: an algorithm for rapid expression profile database search.

Brice Horton Ii Paul, Larisa Kiseleva, Wataru Fujibuchi

Research output: Contribution to journal, Article, peer-review

14 Citations (Scopus)


In this paper we present a fast algorithm and implementation for computing the Spearman rank correlation (SRC) between a query expression profile and each expression profile in a database of profiles. The algorithm is linear in the size of the profile database, with a very small constant factor. It is designed to handle multiple profile platforms and missing values efficiently. We show that our specialized algorithm and C++ implementation achieve an approximately 100-fold speed-up over a reasonable baseline implementation using Perl hash tables. RaPiDS is designed for general similarity search rather than classification, but to assess the usefulness of SRC as a similarity measure we investigate how well the program classifies normal human cell types based on gene expression. Specifically, we use the k nearest neighbor classifier with a t statistic derived from SRC as the similarity measure for profile pairs. We estimate the accuracy using a jackknife test on microarray data with manually checked cell type annotation. Preliminary results suggest the measure is useful (64% accuracy on 1,685 profiles vs. the majority class classifier's 17.5%) for profiles measured under similar conditions (same laboratory and chip platform), but requires improvement when comparing profiles from different experimental series.
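To make the quantities in the abstract concrete, the following is a minimal Python sketch of a naive baseline, not the RaPiDS algorithm itself or its C++ implementation. It computes SRC over the genes present in both profiles, converts it to a t statistic, and uses that score in a k nearest neighbor vote. The abstract does not give the exact t statistic, so the textbook transform t = rho * sqrt((n - 2) / (1 - rho^2)) is assumed here. The function names, the use of `None` for missing values, and the toy labels are all illustrative assumptions.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(query, profile):
    """SRC over genes measured in both profiles (None = missing value).

    Returns (rho, n), where n is the number of shared genes;
    rho is None when the correlation is undefined.
    """
    pairs = [(q, p) for q, p in zip(query, profile)
             if q is not None and p is not None]
    n = len(pairs)
    if n < 3:
        return None, n
    rq = average_ranks([q for q, _ in pairs])
    rp = average_ranks([p for _, p in pairs])
    mq, mp = sum(rq) / n, sum(rp) / n
    cov = sum((a - mq) * (b - mp) for a, b in zip(rq, rp))
    vq = sum((a - mq) ** 2 for a in rq)
    vp = sum((b - mp) ** 2 for b in rp)
    if vq == 0 or vp == 0:
        return None, n
    return cov / math.sqrt(vq * vp), n


def t_statistic(rho, n):
    """Assumed transform: t = rho * sqrt((n - 2) / (1 - rho^2))."""
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)
    return rho * math.sqrt((n - 2) / (1.0 - rho * rho))


def knn_classify(query, database, k=3):
    """Majority vote among the k database profiles with the largest t.

    database is a list of (label, profile) pairs (hypothetical layout).
    """
    scored = []
    for label, profile in database:
        rho, n = spearman(query, profile)
        if rho is not None:
            scored.append((t_statistic(rho, n), label))
    scored.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0] if votes else None
```

One plausible reason to score with t rather than raw SRC is that the number of shared genes n differs between profile pairs when platforms differ or values are missing, and t takes n into account. This sketch re-ranks the query for every database profile, which is the kind of per-pair cost a specialized algorithm would aim to avoid.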

Original language: English
Pages (from-to): 67-76
Number of pages: 10
Journal: Genome informatics. International Conference on Genome Informatics
Issue number: 2
Publication status: Published - 2006 Jan 1

All Science Journal Classification (ASJC) codes

  • Medicine(all)

