Abstract
Copolymers are widely used in everyday life, yet designing their sequences for target properties remains a significant challenge. Recent advances in computational simulation and data science offer a promising route to this problem, but the scarcity of labeled data remains an obstacle. In this study, we introduce an uncertainty-based active learning framework for predicting the properties of random copolymers. The active learning strategy required labeling only 40 of the 1550 points in the design space, reducing the labeling effort by about 97%. Most of the data selected by active learning lay on the periphery of the design space, which turned the learning task into an interpolation problem. By integrating active learning with molecular dynamics, we overcame the combinatorial explosion problem in copolymer sequence design, streamlined the data labeling process, and obtained a highly accurate model. This research demonstrates the potential of data science in polymer design, especially when labeled data are scarce.
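To make the idea of uncertainty-based selection concrete, the following is a minimal sketch of a generic active learning loop, not the authors' implementation. It assumes a Gaussian process surrogate with an RBF kernel (the abstract does not name the model), a hypothetical two-dimensional design space laid out as a 50 x 31 grid so that it contains 1550 candidates, and a cheap analytic function `hypothetical_label` standing in for a molecular dynamics run. All function names, the kernel length scale, and the initial sample size are illustrative assumptions.

```python
import numpy as np


def make_pool():
    """Hypothetical design space: a 50 x 31 grid in the unit square (1550 points)."""
    g1 = np.linspace(0.0, 1.0, 50)
    g2 = np.linspace(0.0, 1.0, 31)
    return np.array([(a, b) for a in g1 for b in g2])


def hypothetical_label(x):
    """Stand-in for an expensive simulation; NOT a real copolymer property."""
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP with unit prior variance."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    v = np.linalg.solve(k_tt, k_qt.T)
    var = 1.0 - np.einsum("ij,ji->i", k_qt, v)
    return mean, np.clip(var, 0.0, None)


def run_active_learning(pool, budget=40, n_init=5, seed=0):
    """Label `budget` pool points, always picking the most uncertain one next."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    y = list(hypothetical_label(pool[labeled]))
    while len(labeled) < budget:
        _, var = gp_posterior(pool[labeled], np.array(y), pool)
        var[labeled] = -np.inf  # never re-select an already labeled point
        nxt = int(np.argmax(var))
        labeled.append(nxt)
        y.append(float(hypothetical_label(pool[[nxt]])[0]))  # "run" one simulation
    return labeled


if __name__ == "__main__":
    pool = make_pool()
    labeled = run_active_learning(pool)
    saved = 1.0 - len(labeled) / len(pool)
    print(f"labeled {len(labeled)} of {len(pool)} points, saving {saved:.1%} of labels")
```

With a budget of 40 labels out of 1550 candidates, the fraction of labeling avoided is 1 - 40/1550, roughly 97.4%, which matches the figure of about 97% quoted in the abstract. In a real workflow, the surrogate model, its hyperparameters, and the stopping criterion would be chosen to suit the property being predicted.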
Field | Value |
---|---|
Original language | English |
Article number | 113489 |
Journal | Computational Materials Science |
Volume | 247 |
DOIs | |
Publication status | Published - 2025 Jan 31 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- General Chemistry
- General Materials Science
- Mechanics of Materials
- General Physics and Astronomy
- Computational Mathematics