Protein acetylation, which is catalyzed by acetyltransferases, is a post-translational modification that is crucial to numerous essential biological processes, including transcriptional regulation, apoptosis, and cytokine signaling. Because experimental identification of protein acetylation sites is time-consuming and labor-intensive, several computational approaches have been developed to identify candidate sites for experimental validation. In this work, solvent accessibility and the physicochemical properties of proteins are used to identify acetylated alanine, glycine, lysine, methionine, serine, and threonine residues. A two-stage support vector machine (SVM) was applied to learn computational models from combinations of amino acid sequences, accessible surface area, and physicochemical properties. The resulting predictive accuracy is 5% to 14% higher than that of models trained using amino acid sequences alone. Additionally, the substrate specificity of acetylated sites was investigated in detail with reference to the subcellular colocalization of acetyltransferases and acetylated proteins. The proposed method, N-Ace, was evaluated on independent test sets for the various acetylated residues and achieved predictive accuracies of 90%, indicating that its performance is comparable with that of other acetylation prediction methods. N-Ace not only provides a user-friendly input/output interface but also offers a new approach to predicting protein acetylation sites. This novel analytical resource is now freely available at.
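To illustrate what combining sequence, accessible surface area, and physicochemical features can look like, the following is a minimal sketch of a windowed feature encoding for a candidate site. It is not N-Ace's actual encoding. The function name `encode_window`, the window half-width of 4, the use of Kyte-Doolittle hydropathy as the single physicochemical property, and the caller-supplied per-residue ASA values are all assumptions made for illustration. A vector like this would be the input to an SVM classifier.

```python
# Hypothetical feature encoding for one candidate acetylation site (not N-Ace's exact scheme).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as an example physicochemical property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(sequence, site, asa, half_window=4):
    """Encode residues in [site - half_window, site + half_window].

    Each position contributes 22 features: a 20-dim one-hot vector for
    residue identity, one hydropathy value, and one accessible-surface-area
    value (assumed to come from an external ASA predictor). Positions that
    fall outside the sequence are zero-padded.
    """
    if not 0 <= site < len(sequence):
        raise ValueError("site index out of range")
    if len(asa) != len(sequence):
        raise ValueError("asa must have one value per residue")

    features = []
    for pos in range(site - half_window, site + half_window + 1):
        inside = 0 <= pos < len(sequence)
        residue = sequence[pos] if inside else None
        # One-hot residue identity (all zeros for padded positions).
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
        # Hydropathy; padding and non-standard residues map to 0.0.
        features.append(KYTE_DOOLITTLE.get(residue, 0.0))
        # Accessible surface area at this position.
        features.append(float(asa[pos]) if inside else 0.0)
    return features


if __name__ == "__main__":
    # Lysine at index 2 of a toy sequence, with uniform placeholder ASA values.
    vec = encode_window("MSKGEELFTG", 2, [0.5] * 10)
    print(len(vec))
```

With a half-width of 4, each site yields 9 × 22 = 198 features. Zero-padding keeps the vector length fixed for sites near either terminus, which a standard SVM requires.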