An embedding-based execution monitoring scheme using fast and slow language model reasoners in concert. Credit: arXiv (2024). DOI: 10.48550/arxiv.2407.08735
Large language models (LLMs), such as OpenAI’s ChatGPT, are known to be highly effective at answering a wide range of user queries and to generalize well to many natural language processing (NLP) tasks. Recently, some studies have also explored the potential of these models to detect and mitigate failures in robotic systems.
Researchers from Stanford University and NVIDIA recently presented a new two-stage framework that could make it easier to use LLMs to detect system anomalies and plan robotic actions in real time.
This approach, presented in a paper that won the best paper award at the Robotics: Science and Systems (RSS 2024) conference, could significantly improve the reliability of various robotic systems, including autonomous vehicles. The work is available on the arXiv preprint server.
“This line of work started when we came across examples of real-world failure modes of autonomous vehicles, such as a case where a self-driving car was confused by a set of traffic lights being carried by a truck or a case where a self-driving car stopped on the highway because it passed a billboard with an image of a stop sign on it,” Rohan Sinha, co-author of the paper, told Tech Xplore.
“Such examples are often called out-of-distribution (OOD) inputs, rare special cases that differ significantly from the data on which AVs are trained.”
In their previous studies, Sinha and his collaborators identified OOD failures that still hamper the performance of autonomous vehicles. They then sought to determine how well existing OOD detection methods could detect these failures.
“For example, existing methods that track visual novelty have struggled to detect these outliers, because seeing stop signs, billboards, or similar objects is not visually novel relative to the training data; it is only when a stop sign appears on a billboard that the scene becomes anomalous,” Sinha said.
“Furthermore, we found that these types of failure modes are not easily attributed to a failure of a specific component (e.g., a perception system), but rather reflect system-level deficiencies in contextual reasoning. This makes them difficult to detect with existing component-level monitoring techniques.”
In a paper published in 2023, the researchers demonstrated the potential of LLMs to detect and understand these “semantic anomalies.” But to effectively use these models to prevent OOD failures affecting autonomous robots, they first had to overcome two major research challenges.
“First, we had to mitigate the computational costs of LLMs to enable real-time responsiveness: the best LLMs are very large, which makes them slow and therefore impractical for a fast-moving robot,” Sinha said.
“Second, we needed to integrate LLM-based reasoners into the control of dynamic and agile robots. The goal of our recent paper was to address these two key challenges and thus demonstrate that LLMs can significantly increase the safety of autonomous robots.”
Compared to other computational models, LLMs can be slow to process information. The main reason is that, to create new text, they generate tokens autoregressively, one at a time. To produce a chain-of-thought text that reasons about what a robot should do (i.e., to plan its actions), the transformer model underlying an LLM must therefore predict hundreds or even thousands of tokens sequentially.
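To make the cost concrete, the following Python sketch (not from the paper) contrasts the two regimes using small open models: generating a couple hundred chain-of-thought tokens with GPT-2 takes that many sequential forward passes, whereas embedding the same text with an MPNet encoder is a single pass. The model choices, prompt, and token count are illustrative assumptions only.

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
from sentence_transformers import SentenceTransformer

prompt = ("The robot sees a stop sign printed on a billboard. "
          "Should it stop? Let's think step by step.")

# Chain-of-thought-style generation: each new token requires another forward pass.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok(prompt, return_tensors="pt")
t0 = time.time()
lm.generate(**inputs, max_new_tokens=200, do_sample=False)
print(f"autoregressive generation of 200 tokens: {time.time() - t0:.2f} s")

# Embedding the same text: one forward pass, no token-by-token decoding.
embedder = SentenceTransformer("all-mpnet-base-v2")
t0 = time.time()
embedder.encode(prompt)
print(f"single embedding pass: {time.time() - t0:.3f} s")
```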
Closed-loop trajectory of a quadcopter using the AESOP algorithm. Credit: arXiv (2024). DOI: 10.48550/arxiv.2407.08735
“To overcome this limitation, we propose a two-stage reasoning pipeline, where the first (fast) stage exploits intermediate outputs, a single embedding produced by a single forward pass through a transformer model, to enable low-latency responsiveness,” Sinha explained.
“In the second (slow) stage, we still rely on the full generative chain-of-thought capabilities of larger models to make decisions in OOD scenarios that have never been recorded in the data before.”
Sinha and colleagues first created a database of semantic embedding vectors offline, using a baseline language model and an existing dataset of nominal experience. At runtime, the team’s framework embeds what the robot is currently observing and computes the similarity of that embedding to those in the database. This is the first, fast stage of their pipeline.
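As a minimal sketch of what such a fast stage could look like, assuming text descriptions of observations and the open-source MPNet sentence encoder mentioned below, the example nominal observations and the similarity threshold here are illustrative placeholders rather than the authors’ implementation:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Offline: embed text descriptions of nominal experience into a small database.
embedder = SentenceTransformer("all-mpnet-base-v2")  # ~110M-parameter MPNet encoder
nominal_observations = [
    "driving down a two-lane road, stop sign at the upcoming intersection",
    "following a truck on the highway, billboard advertising a restaurant",
    "waiting at a red traffic light behind two cars",
]
nominal_db = embedder.encode(nominal_observations, normalize_embeddings=True)

def fast_anomaly_check(observation_text: str, threshold: float = 0.6) -> bool:
    """Flag the observation as out-of-distribution (True) or nominal (False).

    A single forward pass embeds the current observation; its maximum cosine
    similarity to the nominal database is compared to a (hypothetical) threshold.
    """
    z = embedder.encode(observation_text, normalize_embeddings=True)
    max_sim = float(np.max(nominal_db @ z))  # embeddings are unit-norm, so dot = cosine
    return max_sim < threshold

print(fast_anomaly_check("a truck ahead is carrying a rack of traffic lights on its bed"))
```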
“If the observation is similar to previous observations, we proceed with the decisions made by the base autonomy stack,” Sinha said. “If the observation is anomalous, we query a large model to determine which safety-preserving intervention to take (the second, slow stage). We coupled this two-stage reasoning framework with a model predictive control (MPC) framework that plans multiple fallbacks and accounts for the latency of the slow reasoner.”
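A rough sketch of how the two stages might be coupled to the control loop is shown below; the function names, action labels, and latency constant are hypothetical placeholders, and the MPC is reduced here to the idea of committing to a trajectory that keeps safe fallbacks feasible while the slow reasoner is still thinking:

```python
SLOW_REASONER_LATENCY_STEPS = 10  # control steps a GPT-4-class query is assumed to take

def fast_anomaly_check(observation_text: str) -> bool:
    """Stand-in for the embedding-based check sketched above."""
    return "traffic lights" in observation_text and "truck" in observation_text

def query_llm_for_fallback(observation_text: str) -> str:
    """Placeholder for a chain-of-thought query to a large model (e.g., GPT-4)."""
    return "pull_over"  # e.g., one of: continue, slow_down, pull_over, stop

def control_step(t: int, observation_text: str, state: dict) -> str:
    if "fallback" in state:
        return state["fallback"]  # keep executing the committed safety intervention
    pending = state.get("pending_query")
    if pending is None:
        if not fast_anomaly_check(observation_text):
            return "nominal_plan"  # no anomaly: follow the base autonomy stack
        # Fast stage flags an anomaly: query the large model and, in the meantime,
        # commit to a trajectory that keeps safe fallbacks feasible.
        state["pending_query"] = {"asked_at": t, "obs": observation_text}
        return "slow_down_keep_fallbacks_feasible"
    if t - pending["asked_at"] < SLOW_REASONER_LATENCY_STEPS:
        return "slow_down_keep_fallbacks_feasible"  # still waiting on the slow reasoner
    # Slow stage has had time to answer: execute its recommended intervention.
    state["fallback"] = query_llm_for_fallback(pending["obs"])
    return state["fallback"]

# Toy rollout: the anomaly appears at t = 3; the LLM-recommended fallback is
# executed once the slow reasoner's assumed latency has elapsed.
state: dict = {}
for t in range(16):
    obs = "a truck ahead carrying traffic lights" if t >= 3 else "clear two-lane road"
    print(t, control_step(t, obs, state))
```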
Through these two stages, the framework allows a robot to quickly detect an anomaly and slow down its actions, so that an LLM can reason about how to mitigate the failure. The adaptive plan proposed by the LLM is then executed by the robot.
Sinha and colleagues evaluated their proposed framework in a series of benchmarks and found that it could improve anomaly detection and reactive planning in autonomous robotic systems. Notably, their approach outperformed other methods that rely solely on generative reasoning in LLMs.
“Interestingly, we found that smaller models (e.g., MPNet with 110 million parameters) can detect anomalies just as well as larger models (e.g., Mistral 7B),” Sinha said. “Embedding-based anomaly detectors are really good at detecting when observations differ from previous experience, whereas zero-shot chain-of-thought reasoning with large models is really needed to determine the safety criticality of an OOD scenario and the appropriate fallback.”
Overall, the team’s recent work suggests that combining fast and slow reasoning can improve the performance and practicality of LLMs for anomaly detection and robotic planning tasks. In the future, their framework could facilitate the use of LLMs to improve the robustness of a wide range of autonomous robotic systems.
“Our fast reasoners run in real time, approximately 360 times faster than querying GPT-4, while slow reasoning with GPT-4 achieved the highest accuracy in determining the safety risk posed by nuanced anomalies in our evaluations,” Sinha added.
“We now plan to further develop this framework. For example, we plan to use continual learning from the generative reasoner’s delayed anomaly assessments, so that the slow reasoner is not triggered a second time on anomalies already judged not to be safety-critical.”
More information:
Rohan Sinha et al, Real-time anomaly detection and reactive planning with large language models, arXiv (2024). DOI: 10.48550/arxiv.2407.08735
© 2024 Science X Network