Artificial intelligence is becoming an essential tool in chemical research, offering new methods to tackle complex challenges that traditional approaches struggle with. One subtype of artificial intelligence that is increasingly used in chemistry is machine learning, which uses algorithms and statistical models to make decisions based on data and to perform tasks it has not been explicitly programmed for.
However, to make reliable predictions, machine learning requires large amounts of data, which are not always available in chemical research. Small chemical data sets simply do not provide enough information for these algorithms to train on, which limits their effectiveness.
Scientists in Berend Smit’s team at EPFL have found a solution in large language models such as GPT-3. These models are pre-trained on massive amounts of text and are known for their broad capabilities in understanding and generating human-like text. GPT-3 forms the basis of the more popular artificial intelligence ChatGPT.
The study, published in Nature Machine Intelligence, unveils a new approach that significantly simplifies chemical analysis using artificial intelligence. Despite initial skepticism, the method does not ask GPT-3 chemical questions directly.
“GPT-3 hasn’t seen most of the chemical literature, so if we ask ChatGPT a chemical question, the answers are usually limited to what can be found on Wikipedia,” says Kevin Jablonka, the study’s lead researcher.
“Instead, we fine-tune GPT-3 with a small data set converted into questions and answers, creating a new model capable of providing accurate chemical information.”
This process involves feeding GPT-3 a curated list of questions and answers. “For example, for high-entropy alloys, it is important to know whether an alloy occurs in a single phase or has multiple phases,” says Smit. “The curated list of questions and answers is something like: Q = ‘Is the (name of the high-entropy alloy) single phase?’ A = ‘Yes/No.’”
He continues: “In the literature, we found many alloys for which the answer is known, and we used this data to fine-tune GPT-3. What we get is a fine-tuned AI model that is trained solely to answer this question with a yes or a no.”
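The conversion Smit describes, from labelled records to a list of questions and answers, can be sketched roughly as follows. This is only an illustration, not the authors' code: the alloy names and labels are placeholders, and the `prompt`/`completion` field names and the JSON Lines layout are assumptions about how such a training file might be organized.

```python
import json

# Placeholder records for illustration only; these are not data from the study.
records = [
    {"alloy": "ALLOY_A", "single_phase": True},
    {"alloy": "ALLOY_B", "single_phase": False},
]


def to_qa(record):
    """Turn one labelled record into a question/answer pair."""
    question = f"Is {record['alloy']} single phase?"
    answer = "Yes" if record["single_phase"] else "No"
    # Field names are an assumption, not taken from the article.
    return {"prompt": question, "completion": answer}


def to_jsonl(rows):
    """Serialize the pairs as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(to_qa(row)) for row in rows)


qa_jsonl = to_jsonl(records)
print(qa_jsonl)
```

Each line of the resulting text holds one question with its known answer, which is the kind of small, curated data set the researchers describe using for fine-tuning.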
In tests, the model, trained with relatively few questions and answers, correctly answered more than 95% of a wide variety of chemical problems, often surpassing the accuracy of state-of-the-art machine learning models. “The point is that this is as simple as doing a literature search, which works for many chemical problems,” says Smit.
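A figure like "more than 95% correct" is typically obtained by comparing the model's yes/no answers with known answers held back from training. A minimal sketch of that comparison is shown below; the answer lists are made-up placeholders rather than results from the study, and the normalization step is an assumption about how free-text answers might be matched.

```python
def normalize(answer):
    """Ignore case, surrounding whitespace and a trailing period."""
    return answer.strip().rstrip(".").lower()


def accuracy(predictions, labels):
    """Fraction of predictions that match the known answers."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one labelled example is required")
    correct = sum(normalize(p) == normalize(t) for p, t in zip(predictions, labels))
    return correct / len(labels)


# Placeholder values for illustration only.
predicted = ["Yes", "no", "Yes.", "No"]
expected = ["Yes", "No", "No", "No"]
score = accuracy(predicted, expected)
print(f"accuracy: {score:.2f}")
```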
One of the most striking aspects of this study is its simplicity and speed. Traditional machine learning models take months to develop and require extensive knowledge. In contrast, the approach developed by Jablonka takes five minutes and requires no knowledge.
The implications of the study are profound. It presents a method as simple as a literature search, applicable to a variety of chemical problems. The ability to formulate questions such as “Is the yield of a (chemical) made with this (recipe) high?” and to receive accurate answers can revolutionize the way chemical research is planned and carried out.
In the article, the authors state: “Alongside a literature search, querying a foundation model (e.g. GPT-3,4) could become a routine way to start a project by exploiting the collective knowledge encoded in these foundation models.” Or, as Smit succinctly puts it, “It’s going to change the way we do chemistry.”
More information:
Kevin Maik Jablonka, Is GPT All You Need for Low-Data Discovery in Chemistry? Nature Machine Intelligence (2024). DOI: 10.1038/s42256-023-00788-1
Provided by the Ecole Polytechnique Fédérale de Lausanne
Citation: GPT-3 transforms chemical research (February 6, 2024) retrieved February 6, 2024 from