Scanning electron microscope image of tin crystals, stimulated by electricity and growing on a copper surface. A new method developed by Princeton researchers could speed up the process of designing and testing new crystalline materials. Credit: Lynn Trahey, Argonne National Laboratory
Princeton researchers have created an artificial intelligence (AI) tool to predict the behavior of crystalline materials, a key step in advancing technologies such as batteries and semiconductors. Although computer simulations are commonly used in crystal design, the new method relies on a large language model, similar to those that power text generators like ChatGPT.
By synthesizing information from textual descriptions that include details such as the lengths and angles of bonds between atoms and measurements of electronic and optical properties, the new method can predict the properties of new materials with greater precision and depth than existing simulations, and potentially speed up the process of designing and testing new technologies.
The researchers developed a textual repository consisting of descriptions of more than 140,000 crystals from the Materials Project, then used it to train an adapted version of a large language model called T5, originally created by Google Research. They tested the tool’s ability to predict the properties of previously studied crystal structures, from ordinary table salt to silicon semiconductors. Now that they have demonstrated its predictive power, they are working to apply the tool to the design of new crystalline materials.
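As a rough illustration of what turning structural data into a textual description might look like, here is a minimal Python sketch. The `Bond` and `Crystal` records and the `describe_crystal` helper are hypothetical names chosen for this example, not the researchers' actual pipeline, and the table salt bond length is an approximate textbook value.

```python
from dataclasses import dataclass


@dataclass
class Bond:
    # Hypothetical record for one bond between two atoms.
    atom_a: str
    atom_b: str
    length_angstrom: float


@dataclass
class Crystal:
    # Hypothetical, highly simplified crystal record.
    formula: str
    lattice: str
    bonds: list


def describe_crystal(crystal: Crystal) -> str:
    """Render a crystal record as plain English sentences."""
    parts = [f"{crystal.formula} crystallizes in a {crystal.lattice} structure."]
    for bond in crystal.bonds:
        parts.append(
            f"The {bond.atom_a}-{bond.atom_b} bond length is "
            f"{bond.length_angstrom:.2f} angstroms."
        )
    return " ".join(parts)


# Table salt, with an approximate Na-Cl distance (assumed value).
salt = Crystal("NaCl", "cubic", [Bond("Na", "Cl", 2.82)])
description = describe_crystal(salt)
print(description)
```

Text of this kind, paired with a known property value, is the sort of input a language model could in principle be trained on; the real descriptions used in the study are presumably far richer.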
The method, presented Nov. 29 at the fall meeting of the Materials Research Society in Boston, represents a new benchmark that could help accelerate the discovery of materials for a wide range of applications, according to the lead author of the study, Adji Bousso Dieng, an assistant professor of computer science at Princeton.
The paper describing the method, “LLM-Prop: Predicting the physical and electronic properties of crystalline solids from their textual descriptions,” is now published on the arXiv preprint server.
Existing AI-based tools for predicting crystal properties rely on methods called graph neural networks, but these have limited computational power and cannot adequately capture the nuances of the geometry and lengths of bonds between atoms in a crystal, or the electronic and optical properties that result from these structures. Dieng’s team is the first to tackle the problem using large language models, she said.
“We’ve made huge strides in computer vision and natural language,” Dieng said, “but we’re not yet very advanced when it comes to processing graphs (in AI). So I wanted to move from graphs to actually translating them into an area where we already have great tools. If we have text, then we can leverage all of these powerful (large language models) on that text.”
The language model approach “gives us a whole new way to approach the problem” of materials design, said study co-author Craig Arnold, the Susan Dod Brown Professor of Mechanical and Aerospace Engineering at Princeton and associate dean for innovation. “It’s really about how do I access all this knowledge that humanity has developed and how can I process this knowledge moving forward? It’s typically different from our current approaches, and I think that’s what gives it a lot of power.”
For insight into the challenges of designing crystals, Dieng and Ph.D. student Andre Niyongabo Rubungo teamed up with Arnold and with Barry Rand, a professor of electrical and computer engineering and the Andlinger Center for Energy and the Environment, who focuses on semiconductor materials and solar energy. Arnold is interested in laser-material interactions, with applications for energy storage.
“The materials of our world are all those that have been developed through scientific hypothesis testing and sometimes luck,” Rand said. This process “leads to good results, but it takes time. With artificial intelligence methods, we could really speed this up.” Plus, he added, “it allows us to identify things that we humans probably couldn’t intuit.”
Given a crystal with a particular composition of chemical elements, the team’s method can predict properties including the bandgap, which is related to the electronic states and conductivity of the crystal.
“If you can predict that with great accuracy, when you then undertake careful work of experimentation, you can be more confident that it will result in success,” Rand said.
Doctoral student Rubungo received the best poster award for presenting his work to materials researchers at the fall meeting. Many were surprised by the power of large language models in this context. The field is more accustomed to structured data used as input for graph neural networks, but “texts are easier to process,” Rubungo said. “It’s easier to include the information you want in your description, edit it, and remove what you don’t want. People were very excited to see that.”
As a new tool, he pointed out, the prediction method has limitations. It uses more computing power and is slower than graph neural networks typically used for this purpose. It could also benefit from expanded training data to strengthen its ability to predict the properties of new materials.
Dieng continues her collaborations with other materials researchers and aims to expand her work beyond crystals to a wider variety of materials. “This is a nascent area of research, and what advances research is having a well-established, well-curated reference,” she said. “We are gathering more datasets into a repository that will be hosted at Princeton for researchers to use.”
More information:
Andre Niyongabo Rubungo et al, LLM-Prop: Predicting the physical and electronic properties of crystalline solids from their textual descriptions, arXiv (2023). DOI: 10.48550/arxiv.2310.14029
Provided by Princeton University
Citation: Researchers leverage large language models to accelerate materials discovery (January 29, 2024) retrieved January 29, 2024 from