The excitement surrounding machine learning, a form of artificial intelligence, can make it seem like it’s only a matter of time before these techniques are used to solve every scientific problem. While the claims are often impressive, they don’t always hold up to scrutiny. Machine learning may be useful for solving some problems, but it fails to solve others.
In a new article published in Nature Machine Intelligence, researchers from the U.S. Department of Energy’s Princeton Plasma Physics Laboratory (PPPL) and Princeton University report a systematic review of research comparing machine learning with traditional methods for solving partial differential equations (PDEs) for fluids. These equations are important in many areas of science, including the plasma research that supports the development of fusion energy for the electrical grid.
The researchers found that comparisons between machine learning methods for solving fluid-related PDEs and traditional methods are often biased in favor of machine learning. They also found that negative results are systematically underreported. They propose rules for making fair comparisons, but argue that cultural changes are also needed to address what appear to be systemic problems.
“Our research suggests that while machine learning has great potential, the current literature paints an overly optimistic picture of how well machine learning works for solving these particular types of equations,” said Ammar Hakim, PPPL’s deputy director of computational sciences and principal investigator of the research.
Comparing results against weak baselines
PDEs are ubiquitous in physics and are particularly useful for explaining natural phenomena, such as heat, fluid flow, and waves. For example, this type of equation can be used to determine temperatures along a spoon placed in hot soup.
Knowing the initial temperature of the soup and the spoon, as well as the type of metal in the spoon, a PDE could be used to determine the temperature at any point along the utensil at a given time after it was placed in the soup. Such equations are used in plasma physics, because many of the equations governing plasmas are mathematically similar to those for fluids.
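For concreteness, the spoon scenario is governed by the one-dimensional heat equation, written below in a standard textbook form (the notation, T for temperature, α for the metal’s thermal diffusivity, x for position along the spoon, and t for time, is conventional and not taken from the paper):

```latex
\frac{\partial T}{\partial t} = \alpha \, \frac{\partial^2 T}{\partial x^2}
```

together with an initial condition giving the spoon’s starting temperature and boundary conditions fixing the temperature at the end immersed in the soup. Solving the PDE yields T(x, t), the temperature at any point along the spoon at any later time.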
Scientists and engineers have developed various mathematical approaches to solving PDEs. One family of approaches is known as numerical methods: rather than seeking exact analytical or symbolic solutions, they compute approximate numerical solutions to problems that are difficult or impossible to solve exactly.
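To illustrate what “solving numerically” means in practice, here is a minimal explicit finite-difference sketch for the heat equation above. The grid size, time step, temperatures, and diffusivity are made-up demonstration values, not parameters from the paper:

```python
import numpy as np

# Illustrative parameters only; these values are assumptions, not from the paper.
alpha = 1e-4               # thermal diffusivity of the spoon's metal, m^2/s
length = 0.2               # spoon length, m
nx = 101                   # grid points along the spoon
dx = length / (nx - 1)
dt = 0.4 * dx**2 / alpha   # time step chosen within the explicit stability limit

T = np.full(nx, 20.0)      # spoon starts at room temperature (deg C)
T[0] = 80.0                # the end sitting in the hot soup

def step(T):
    """One explicit finite-difference update of the 1D heat equation."""
    T_next = T.copy()
    # A central difference over neighboring grid points approximates the
    # second spatial derivative in the PDE.
    T_next[1:-1] = T[1:-1] + alpha * dt / dx**2 * (T[2:] - 2.0 * T[1:-1] + T[:-2])
    T_next[0], T_next[-1] = 80.0, 20.0   # ends held at fixed temperatures
    return T_next

for _ in range(5000):      # march forward in time
    T = step(T)

print(f"Temperature at the spoon's midpoint: {T[nx // 2]:.1f} deg C")
```

The update rule replaces the derivatives in the PDE with differences between neighboring grid points, which is the essence of a numerical method: an approximate answer produced by repeated arithmetic rather than a closed-form formula.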
Recently, researchers have been investigating whether machine learning could be used to solve these PDEs. The goal is to solve the problems faster than with other methods.
The systematic review found that, in most of the journal articles, machine learning was not as successful as claimed. “Our research indicates that there may be cases where machine learning is slightly faster at solving fluid-related PDEs, but in most cases numerical methods are faster,” said Nick McGreivy. McGreivy is the lead author of the paper and recently completed his PhD in Princeton’s Plasma Physics Program.
Numerical methods involve a fundamental tradeoff between accuracy and running time. “If you spend more time solving the problem, you’ll get a more accurate answer,” McGreivy said. “A lot of papers haven’t factored that into their comparisons.”
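That tradeoff can be seen directly by rerunning a solver on progressively finer grids and recording both the error and the wall-clock time. The sketch below uses a contrived heat-equation test case with a known exact solution so the error can be measured; all values are illustrative assumptions, not anything from the paper:

```python
import time
import numpy as np

# Contrived test problem with a known exact solution (illustrative values only):
# T(x, t) = sin(pi x / L) * exp(-alpha (pi / L)^2 t), with T = 0 held at both ends.
alpha, length, t_final = 1e-4, 0.2, 60.0

def solve(nx):
    """Explicit finite-difference solve; returns (max error, wall-clock seconds)."""
    x = np.linspace(0.0, length, nx)
    dx = length / (nx - 1)
    dt = 0.4 * dx**2 / alpha               # within the explicit stability limit
    steps = int(np.ceil(t_final / dt))
    dt = t_final / steps                   # adjust so the run lands exactly on t_final
    T = np.sin(np.pi * x / length)         # initial temperature profile
    start = time.perf_counter()
    for _ in range(steps):
        T[1:-1] += alpha * dt / dx**2 * (T[2:] - 2.0 * T[1:-1] + T[:-2])
        T[0] = T[-1] = 0.0
    elapsed = time.perf_counter() - start
    exact = np.sin(np.pi * x / length) * np.exp(-alpha * (np.pi / length) ** 2 * t_final)
    return np.max(np.abs(T - exact)), elapsed

# Finer grids: smaller errors, longer runtimes.
for nx in (26, 51, 101, 201):
    err, sec = solve(nx)
    print(f"nx = {nx:3d}   max error = {err:.2e}   runtime = {sec:.3f} s")
```

Each refinement of the grid reduces the error but lengthens the runtime, so quoting a speedup without fixing either the accuracy or the running time is not meaningful.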
Additionally, there can be a significant difference in speed between numerical methods. To be useful, machine learning methods must outperform the best numerical methods, McGreivy said. Yet his research found that comparisons were often made with numerical methods that were much slower than the fastest methods.
Two rules for making fair comparisons
The paper therefore proposes two rules to try to overcome these problems. The first rule is to only compare machine learning methods to numerical methods of equal accuracy or running time. The second is to compare machine learning methods to an efficient numerical method.
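In code, the two rules might look something like the sketch below; the solver interfaces are hypothetical placeholders meant only to show where each rule applies, not an API from the paper:

```python
def compare_fairly(numerical_solver, ml_surrogate, target_error):
    """Sketch of the two proposed rules for a fair comparison.

    numerical_solver(target_error) -> (achieved_error, seconds) is assumed to
    wrap an *efficient* standard method tuned to just meet the accuracy target
    (rule two). ml_surrogate() -> (achieved_error, seconds) is the trained
    model under evaluation. Both callables are hypothetical placeholders.
    """
    baseline_error, baseline_time = numerical_solver(target_error)
    ml_error, ml_time = ml_surrogate()

    # Rule one: only compare at matched accuracy (or, equivalently, matched runtime).
    if baseline_error > target_error or ml_error > target_error:
        return "no valid comparison: at least one method missed the accuracy target"
    return f"speedup at equal accuracy: {baseline_time / ml_time:.1f}x"
```

Matching on running time instead of accuracy works the same way: fix a time budget for both methods and then compare the errors they achieve.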
Of the 82 journal articles studied, 76 claimed that the machine learning method outperformed a numerical method. The researchers found that 79% of the articles claiming superiority for a machine learning method actually used a weak baseline, violating at least one of these rules. Four of the journal articles reported results inferior to those of a numerical method, and two reported mixed or comparable performance.
“Very few papers have reported worse performance with machine learning, not because machine learning almost always does better, but because researchers almost never publish papers in which machine learning does worse,” McGreivy said.
McGreivy believes that comparisons against weak baselines are often driven by perverse incentives in academic publishing. “For a paper to get accepted, it helps to have impressive results. That gives you an incentive to make your machine learning model work as well as possible, which is a good thing. However, you can also get impressive results if the baseline you compare against doesn’t work very well. So you have no incentive to improve your baseline, which is bad,” he said.
The net result is that researchers end up working hard on their own models, but put little effort into finding the strongest possible numerical method to serve as the baseline.
The researchers also found evidence of reporting biases, including publication bias and outcome reporting bias. Publication bias occurs when researchers choose not to publish their results after realizing that their machine learning model does not outperform a numerical method, while outcome reporting bias can involve dropping negative results from an analysis or using nonstandard measures of success that make machine learning models appear to perform better than they do.
Overall, reporting biases tend to suppress negative results and create a general impression that machine learning is more effective than it is at solving fluid-related PDEs. “There is a lot of hype in the field. We hope our work will establish guidelines for principled approaches to using machine learning to improve the state of the art,” Hakim said.
To overcome these systemic and cultural problems, Hakim argues that research funding agencies and major conferences should adopt policies to prevent the use of weak baselines or require a more detailed description of the baseline used and why it was selected.
“They need to encourage their researchers to be skeptical of their own results,” Hakim said. “If I find results that seem too good to be true, they probably are.”
More information:
Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations, Nature Machine Intelligence (2024). DOI: 10.1038/s42256-024-00897-5. www.nature.com/articles/s42256-024-00897-5
Provided by the Princeton Plasma Physics Laboratory