Artificial intelligence has learned to master language, generate art, and even beat grandmasters at chess. But can it crack the code of abstract reasoning, the kind of tricky visual puzzle that leaves even humans perplexed?
Researchers at the USC Viterbi School of Engineering's Information Sciences Institute (ISI) are putting AI's cognitive capabilities to the test, pushing large multimodal language models (MLLMs) to solve visual problems once reserved for human IQ tests. The result? A look at how far AI has come, and where it still stumbles.
USC Viterbi ISI research assistants Kian Ahrabian and Zhivar Sourati recently studied whether MLLMs could perform nonverbal abstract reasoning, tasks that require both visual perception and logical reasoning, and presented their results at the Conference on Language Modeling (COLM 2024) in Philadelphia, Pennsylvania, on October 7–9, 2024. The work is also available on the arXiv preprint server.
Jay Pujara, research associate professor of computer science at the USC Viterbi School of Engineering and an author of the paper, said: “Every day we are bombarded with new headlines about what AI can (and can't) do, which are often very controversial. We still have a very limited understanding of what new AI models can do, and until we understand these limitations, we won't be able to make AI better, safer, and more useful. This paper helps fill in a missing part of the story: where AI struggles.”
The challenge: can AI see and think?
“We wanted to see whether this new generation of large models, capable of processing images, could reason on their own,” explained Ahrabian. “For example, if you see a yellow circle turn into a blue triangle, can the model apply the same pattern in a different scenario?”
To answer this question, the team tested 24 different MLLMs on puzzles based on Raven's progressive matrices, a well-known test of abstract reasoning. They found that open-source models struggled badly. “They were really bad. They couldn't get anything right,” Ahrabian said plainly.
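The article doesn't show how the evaluation was run, but benchmarking models on multiple-choice puzzles like Raven's matrices typically reduces to comparing each model's chosen answer against the correct one. A minimal sketch, with an entirely hypothetical scoring helper and made-up predictions:

```python
# Hypothetical sketch of scoring a model on Raven-style multiple-choice
# puzzles. The function name and data are illustrative, not from the paper.

def score_puzzles(predictions, gold_answers):
    """Return accuracy: fraction of puzzles where the model's chosen
    answer index matches the correct one."""
    correct = sum(p == g for p, g in zip(predictions, gold_answers))
    return correct / len(gold_answers)

# Example: a model picks answer-choice indices for four puzzles.
preds = [2, 0, 3, 1]
gold = [2, 1, 3, 1]
print(score_puzzles(preds, gold))  # → 0.75
```

In practice a study like this would run the same scoring loop for each of the 24 models, which is what allows the open-source vs. closed-source comparison the researchers describe.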
In contrast, closed-source models, such as GPT-4V (models developed by private companies and not publicly available for modification), performed better. These models are typically trained with more advanced resources, including larger datasets and more powerful computing systems, giving them a notable advantage. “We saw non-trivial results with closed-source models,” Ahrabian added. “Specifically, GPT-4V was relatively good at reasoning, but it is far from perfect.”
Where AI stumbles
A key part of the study was analyzing where these models failed. One key issue was the AI's ability to accurately process visual information. “We wanted to know whether the models could see details, such as colors or intersecting lines, and whether that was where they were going wrong,” Ahrabian said.
To isolate the problem, the researchers provided detailed text descriptions of the images, ensuring that the models had all the necessary information in a different format. “Even when we removed the visual element and just gave them text, many models still couldn't reason effectively,” explained Sourati.
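The idea behind this ablation is to replace each puzzle image with an equivalent plain-text rendering, so any remaining failures must come from reasoning rather than perception. A sketch of what such a textual rendering might look like (the format and helper are assumptions for illustration):

```python
# Hypothetical sketch: render a 3x3 Raven-style matrix (last cell missing)
# as plain text, so a model can attempt the puzzle with no image input.

def describe_matrix(cells):
    """Turn a grid of cell descriptions into one text block;
    None marks the missing cell the model must fill in."""
    lines = []
    for i, row in enumerate(cells):
        parts = [c if c is not None else "?" for c in row]
        lines.append(f"Row {i + 1}: " + " | ".join(parts))
    return "\n".join(lines)

matrix = [
    ["yellow circle", "yellow triangle", "yellow square"],
    ["blue circle", "blue triangle", "blue square"],
    ["red circle", "red triangle", None],  # pattern implies "red square"
]
print(describe_matrix(matrix))
```

If a model fails even on this text-only version, the bottleneck is the logical step (inferring the row/column rule), not the visual one.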
This revealed a crucial insight: the problem was not just with visual processing, but also with reasoning itself. Now the team had a clearer picture of what wasn’t working, allowing them to refine their focus and direct future improvements.
The way forward: improving AI reasoning
One promising method the researchers explored was “chain-of-thought” prompting, where the AI is guided to work through reasoning tasks step by step. This approach led to significant improvements in some cases. “By guiding the models with cues, we were able to see performance improvements of up to 100%,” Ahrabian noted.
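Chain-of-thought prompting amounts to rewriting the prompt so the model is asked to reason before answering. The exact prompts used in the study aren't given in the article; the wrapper below is a generic, hypothetical example of the technique:

```python
# Hypothetical sketch of a chain-of-thought prompt wrapper; the wording
# is illustrative, not the prompt used in the study.

def with_chain_of_thought(puzzle_prompt):
    """Append a step-by-step instruction so the model reasons
    out loud before committing to an answer choice."""
    return (
        puzzle_prompt
        + "\n\nLet's think step by step: first describe each panel, "
        "then state the rule that changes across rows and columns, "
        "and only then pick the answer choice."
    )

prompt = with_chain_of_thought("Which shape completes the 3x3 matrix?")
print(prompt)
```

The model's free-form reasoning then precedes its final choice, which is where the reported gains come from.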
Despite remaining challenges, researchers are optimistic. The study results highlight both the current limitations of AI and the exciting possibilities for future advancements. As these models continue to develop, USC research could pave the way for AI that not only understands but reasons, blurring the line between artificial intelligence and human cognition.
More information: Kian Ahrabian et al, The curious case of nonverbal abstract reasoning with large multimodal language models, arXiv (2024). DOI: 10.48550/arxiv.2401.12117
Provided by University of Southern California
Citation: Can advanced AI solve visual puzzles and perform abstract reasoning? (October 9, 2024) retrieved October 9, 2024 from
This document is subject to copyright. Except for fair use for private study or research purposes, no part may be reproduced without written permission. The content is provided for informational purposes only.