TY - JOUR
T1 - ColGen
T2 - An end-to-end deep learning model to predict thermal stability of de novo collagen sequences
AU - Yu, Chi Hua
AU - Khare, Eesha
AU - Narayan, Om Prakash
AU - Parker, Rachael
AU - Kaplan, David L.
AU - Buehler, Markus J.
N1 - Publisher Copyright: © 2021 Elsevier Ltd
PY - 2022/1
Y1 - 2022/1
AB - Collagen is the most abundant structural protein in humans, with dozens of sequence variants accounting for over 30% of the protein in an animal body. The fibrillar and hierarchical arrangements of collagen are critical in providing mechanical properties with high strength and toughness. Due to this ubiquitous role in human tissues, collagen-based biomaterials are commonly used for tissue repairs and regeneration, requiring chemical and thermal stability over a range of temperatures during materials preparation ex vivo and subsequent utility in vivo. Collagen unfolds from a triple helix to a random coil structure during a temperature interval in which the midpoint or Tm is used as a measure to evaluate the thermal stability of the molecules. However, finding a robust framework to facilitate the design of a specific collagen sequence to yield a specific Tm remains a challenge, including using conventional molecular dynamics modeling. Here we propose a de novo framework to provide a model that outputs the Tm values of input collagen sequences by incorporating deep learning trained on a large data set of collagen sequences and corresponding Tm values. By using this framework, we are able to quickly evaluate how mutations and order in the primary sequence affect the stability of collagen triple helices. Specifically, we confirm that mutations to glycines, mutations in the middle of a sequence, and short sequence lengths cause the greatest drop in Tm values.
UR - http://www.scopus.com/inward/record.url?scp=85118567294&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118567294&partnerID=8YFLogxK
U2 - 10.1016/j.jmbbm.2021.104921
DO - 10.1016/j.jmbbm.2021.104921
M3 - Article
C2 - 34758444
AN - SCOPUS:85118567294
SN - 1751-6161
VL - 125
JO - Journal of the Mechanical Behavior of Biomedical Materials
JF - Journal of the Mechanical Behavior of Biomedical Materials
M1 - 104921
ER -