TY - JOUR
T1 - End-to-End Deep Learning Model to Predict and Design Secondary Structure Content of Structural Proteins
AU - Yu, Chi Hua
AU - Chen, Wei
AU - Chiang, Yu Hsuan
AU - Guo, Kai
AU - Martin Moldes, Zaira
AU - Kaplan, David L.
AU - Buehler, Markus J.
N1 - Publisher Copyright: © 2022 American Chemical Society.
PY - 2022/03/14
Y1 - 2022/03/14
AB - Structural proteins are the basis of many biomaterials and key construction and functional components of all life. Further, it is well-known that the diversity of proteins' function relies on their local structures derived from their primary amino acid sequences. Here, we report a deep learning model to predict the secondary structure content of proteins directly from primary sequences, with high computational efficiency. Understanding the secondary structure content of proteins is crucial to designing proteins with targeted material functions, especially mechanical properties. Using convolutional and recurrent architectures and natural language models, our deep learning model predicts the content of two essential types of secondary structures, the α-helix and the β-sheet. The training data are collected from the Protein Data Bank and contain many existing protein geometries. We find that our model can learn the hidden features as patterns of input sequences that can then be directly related to secondary structure content. The α-helix and β-sheet content predictions show excellent agreement with training data and newly deposited protein structures that were recently identified and that were not included in the original training set. We further demonstrate the features of the model by a search for de novo protein sequences that optimize max/min α-helix/β-sheet content and compare the predictions with folded models of these sequences based on AlphaFold2. Excellent agreement is found, underscoring that our model has predictive potential for rapidly designing proteins with specific secondary structures and could be widely applied to biomedical industries, including protein biomaterial designs and regenerative medicine applications.
UR - http://www.scopus.com/inward/record.url?scp=85125053618&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125053618&partnerID=8YFLogxK
U2 - 10.1021/acsbiomaterials.1c01343
DO - 10.1021/acsbiomaterials.1c01343
M3 - Article
C2 - 35129957
AN - SCOPUS:85125053618
SN - 2373-9878
VL - 8
SP - 1156
EP - 1165
JO - ACS Biomaterials Science and Engineering
JF - ACS Biomaterials Science and Engineering
IS - 3
ER -