Because the possible combinations of materials are essentially endless, figuring out which ones will make the most effective polymers is a monumental, time-consuming task. To help with this work, researchers at Georgia Tech have developed a machine-learning model that could revolutionize how scientists and manufacturers virtually search the chemical space to identify and develop these all-important polymers. The U.S. National Science Foundation-supported team published its findings in Nature Communications.
The work was conceived and guided by engineer Rampi Ramprasad at Georgia Tech. The new tool, called polyBERT, aims to overcome the challenges of searching the large chemical space of polymers. Trained on a massive dataset of 80 million polymer chemical structures, polyBERT has become an expert in understanding the language of polymers.
"This is a novel application of language models within polymer informatics," said Ramprasad. "While natural language models may be used to extract materials data from the literature, here, we aim such capabilities at understanding the complex grammar and syntax followed by atoms as they come together to make up polymers."
PolyBERT treats chemical structures and the connectivity of atoms as a form of chemical language and uses techniques inspired by natural language processing to extract the most meaningful information from those structures. The tool is built on the Transformer architecture, which is also used in natural language models, to capture patterns and relationships and to learn the grammar and syntax that occur at the atomic and higher levels of the polymer structure.
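To make the "chemical language" idea concrete, here is a minimal sketch of how a polymer written as a text string could be split into tokens and mapped to integer IDs, the form in which a Transformer consumes text. Everything in it is an assumption for illustration: the article does not name the string notation or the tokenizer polyBERT uses, so the SMILES-like strings (with `[*]` marking the two ends of the repeat unit), the regular expression, and the helper names `tokenize_psmiles` and `build_vocab` are hypothetical stand-ins, not the researchers' method.

```python
import re

# Illustrative pattern only (an assumption, not polyBERT's tokenizer):
# bracketed atoms such as [*], two-letter halogens, single-letter atoms,
# aromatic atoms, bond/branch symbols, and ring-closure labels.
TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[()=#\-+/\\.:]|%\d{2}|\d"
)


def tokenize_psmiles(psmiles):
    """Split a SMILES-like polymer string into chemically meaningful tokens."""
    tokens = TOKEN_PATTERN.findall(psmiles)
    # findall silently skips unmatched characters, so check nothing was lost.
    if "".join(tokens) != psmiles:
        raise ValueError(f"unrecognized characters in {psmiles!r}")
    return tokens


def build_vocab(token_lists):
    """Assign each distinct token an integer ID in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


# Three small example repeat units, written by hand for this sketch.
polymers = {
    "polyethylene": "[*]CC[*]",
    "polypropylene": "[*]CC(C)[*]",
    "poly(vinyl chloride)": "[*]CC(Cl)[*]",
}

tokenized = {name: tokenize_psmiles(s) for name, s in polymers.items()}
vocab = build_vocab(tokenized.values())
encoded = {name: [vocab[t] for t in toks] for name, toks in tokenized.items()}

for name in polymers:
    print(name, tokenized[name], encoded[name])
```

In a real language model, sequences of IDs like these would be fed to the network, which learns from millions of examples which tokens tend to appear together. The hand-written rules here only show the first step of that process.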
Speed is one remarkable advantage of polyBERT. Compared to traditional methods, polyBERT is over two orders of magnitude (more than 100 times) faster. This speed makes polyBERT an ideal tool for high-throughput polymer informatics pipelines, the researchers said, allowing for the rapid screening of massive polymer spaces. With advancements in graphics processing unit technology, the computation time for polyBERT fingerprints is expected to improve even further, according to the researchers.
PolyBERT's multitask deep neural networks enable it to simultaneously predict multiple properties of polymers, leveraging hidden correlations within the data. This approach outperforms single-task models, enhancing the accuracy of property predictions. Property predictions for large datasets by polyBERT can offer valuable insights into the true limits of the polymer property space. Researchers can establish standardized benchmarks, explore uncharted areas, and even directly select polymers with specific properties. By analyzing the chemical relevance of polyBERT-generated fingerprints, scientists can unravel the functions and interactions of different structural components in polymers. This opens possibilities for designing polymers based on an even wider array of specific properties.
The dataset comprising 100 million hypothetical polymers and their predictions for 29 properties is now available for academic use. This vast collection, generated using polyBERT, presents researchers with ample opportunities to delve into the polymer universe, unlocking new discoveries, design rules, and practical applications.
“Researchers funded by the NSF Partnership for Innovation program are developing a new artificial intelligence tool to overcome the challenge of determining which combinations of chemicals will make the most effective polymers,” says Debora Rodrigues, a program director in NSF’s Directorate for Technology, Innovation and Partnerships. “They’re using AI to train on a massive dataset of 80 million polymer chemical structures, allowing for the rapid screening of diverse polymers without the need of laboratory experimentations.”
Learn more at https://research.gatech.edu and www.nsf.gov.