Proteomics aims to determine the structure, function, and expression of proteins. High-throughput mass spectrometry (MS) is emerging as a leading technique in proteomics. Although MS can be used to find disease-related protein patterns in protein mixtures derived from easily obtained samples, key challenges remain in processing proteomic MS data. Multiscale mathematical tools such as wavelets play an important role in signal processing and statistical data analysis. A wavelet-based algorithm for proteomic MS data processing is developed, and its MATLAB implementation, a software package called WaveSpect0, is presented. The package includes procedures for step-interval unification, adaptive stationary discrete wavelet denoising, baseline correction using splines, normalization, peak detection, and a newly designed peak alignment method based on clustering techniques. Applications to real MS data sets from several cancer research projects at the Vanderbilt-Ingram Cancer Center show that the algorithm is efficient and performs satisfactorily in MS data mining.
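To make the early stages of such a pipeline concrete, the following is a minimal Python sketch of four of the steps named above. It is not the WaveSpect0 code, which is written in MATLAB and is not reproduced here. All function names are hypothetical, and several techniques are simplified stand-ins chosen for illustration: linear interpolation for step-interval unification, a single-level undecimated Haar transform with a fixed universal soft threshold in place of the adaptive stationary wavelet denoising, total-ion-current scaling for normalization, and a plain local-maximum rule for peak detection. Spline baseline correction and clustering-based peak alignment are omitted.

```python
import numpy as np


def unify_step_interval(mz, intensity, step):
    """Resample a spectrum onto an evenly spaced m/z grid (linear interpolation).

    Assumption: mz is sorted in increasing order.
    """
    n = int(np.floor((mz[-1] - mz[0]) / step)) + 1
    grid = mz[0] + step * np.arange(n)
    return grid, np.interp(grid, mz, intensity)


def haar_swt_denoise(x, threshold=None):
    """One-level undecimated (stationary) Haar denoising with soft thresholding.

    Illustrative stand-in only: a fixed universal threshold, not an adaptive one.
    Boundaries are handled circularly.
    """
    x = np.asarray(x, dtype=float)
    nxt = np.roll(x, -1)                 # x[n+1], wrapping at the end
    approx = (x + nxt) / 2               # smooth coefficients
    detail = (x - nxt) / 2               # detail coefficients
    if threshold is None:
        sigma = np.median(np.abs(detail)) / 0.6745   # robust noise estimate
        threshold = sigma * np.sqrt(2 * np.log(x.size))
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    # Each sample belongs to two overlapping pairs; average both reconstructions.
    from_pair_n = approx + detail                    # pair (n, n+1)
    from_pair_prev = np.roll(approx - detail, 1)     # pair (n-1, n)
    return (from_pair_n + from_pair_prev) / 2


def normalize_tic(intensity):
    """Scale intensities so they sum to 1 (total-ion-current normalization)."""
    total = intensity.sum()
    return intensity / total if total > 0 else intensity


def detect_peaks(intensity, min_height):
    """Return indices of interior local maxima with height >= min_height."""
    y = np.asarray(intensity)
    is_peak = (y[1:-1] > y[:-2]) & (y[1:-1] >= y[2:]) & (y[1:-1] >= min_height)
    return np.flatnonzero(is_peak) + 1


if __name__ == "__main__":
    # Synthetic spectrum: two Gaussian peaks plus noise on an uneven m/z axis.
    rng = np.random.default_rng(0)
    mz = np.sort(rng.uniform(1000.0, 1100.0, size=400))
    clean = np.exp(-((mz - 1030.0) ** 2) / 8.0) + 0.6 * np.exp(-((mz - 1070.0) ** 2) / 8.0)
    raw = clean + rng.normal(scale=0.05, size=mz.size)

    grid, y = unify_step_interval(mz, raw, step=0.25)
    y = normalize_tic(np.clip(haar_swt_denoise(y), 0.0, None))
    peaks = detect_peaks(y, min_height=0.5 * y.max())
    print("candidate peak m/z values:", grid[peaks])
```

A single Haar level removes only the finest-scale noise, so a realistic pipeline would use several decomposition levels and a smoother wavelet. The sketch keeps one level so that the transform, the thresholding, and the inverse are all visible in a few lines.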
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Computational Mathematics
- Computational Theory and Mathematics
- Applied Mathematics