Despite their impressive capabilities, large language models are far from perfect. These artificial intelligence models sometimes “hallucinate” by generating incorrect or unsupported information in response to a query.
Because of this hallucination problem, an LLM’s answers are often verified by human fact-checkers, especially if a model is deployed in a high-stakes context like healthcare or finance. However, validation processes typically require users to read lengthy documents cited by the model, a task so cumbersome and error-prone that it can prevent some users from deploying generative AI models in the first place.
To help human validators, MIT researchers created a user-friendly system that allows users to check an LLM’s answers much more quickly. With this tool, called SymGen, an LLM generates answers with citations that point directly to the location in a source document, such as a given cell in a database.
Users hover over highlighted parts of its text response to see the data the model used to generate a specific word or phrase. At the same time, the unhighlighted parts show users which phrases need additional attention to check and verify.
“We give users the ability to selectively focus on the parts of the text that concern them most. Ultimately, SymGen can give people greater confidence in a model’s answers because they can easily take a closer look to make sure the information is verified,” says Shannon Shen, a graduate student in electrical engineering and computer science (EECS) and co-lead author of a paper on SymGen, published on the arXiv preprint server.
Through a user study, Shen and colleagues found that SymGen sped up verification by about 20 percent compared to manual procedures. By allowing humans to validate model results more quickly and easily, SymGen could help users identify errors in LLMs deployed in a variety of real-world situations, from generating clinical notes to summarizing financial market reports.
Shen is joined on the paper by co-lead author Lucas Torroba Hennigen, an EECS graduate student; Aniruddha “Ani” Nrusimha, an EECS graduate student; Bernhard Gapp, president of the Good Data Initiative; and senior authors David Sontag, an EECS professor, member of the MIT Jameel Clinic, and leader of the Clinical Machine Learning Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Yoon Kim, an EECS assistant professor and CSAIL member. The research was recently presented at the Conference on Language Modeling.
Symbolic references
To facilitate validation, many LLMs are designed to generate citations that point to external documents alongside their language-based responses, so users can verify them. However, these verification systems are usually designed after the fact, without considering the effort it takes for people to sift through numerous citations, Shen says.
“Generative AI aims to reduce the time it takes for the user to complete a task. If you have to spend hours reading all these documents to verify that the model says something reasonable, then it’s less useful to have generations in practice,” explains Shen.
The researchers approached the validation problem from the perspective of the humans who will do the work.
A SymGen user first provides the LLM with data that it can reference in its response, such as a table containing statistics from a basketball game. Then, rather than immediately asking the model to perform a task, such as generating a game summary from that data, the researchers add an intermediate step: they prompt the model to generate its response in a symbolic form.
With this prompt, whenever the model wants to cite words in its response, it must write the specific cell in the data table that contains the information it is referring to. For example, if the model wants to quote the phrase “Portland Trailblazers” in its response, it will replace that text with the name of the cell in the data table containing those words.
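To make the idea concrete, here is a purely illustrative sketch; the table layout, cell names, scores, and the {row[column]} placeholder syntax are all assumptions made for this example, not SymGen’s actual format.

```python
# Hypothetical data table and symbolic response, for illustration only.
# The cell names and {row[column]} placeholder syntax are assumptions,
# not SymGen's actual format.
game_data = {
    "team_1": {"name": "Portland Trailblazers", "points": 118},
    "team_2": {"name": "Utah Jazz", "points": 113},
}

# Instead of writing team names and scores directly, the model emits
# placeholders naming the table cells it is quoting from.
symbolic_response = (
    "The {team_1[name]} beat the {team_2[name]}, "
    "{team_1[points]} to {team_2[points]}."
)
```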
“Thanks to this intermediate step, which presents the text in a symbolic format, we are able to have very fine-grained references. We can say that, for each span of text in the output, this is exactly where it corresponds in the data,” says Torroba Hennigen.
SymGen then resolves each reference using a rules-based tool that copies the corresponding text from the data table into the model response.
“This way we know it’s a verbatim copy, so we know there won’t be any errors in the part of the text that corresponds to the actual data variable,” adds Shen.
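A minimal sketch of such a rules-based resolution step, under the same assumptions as the example above, might look like this:

```python
import re

# Hypothetical table and symbolic response, repeated from the sketch above.
game_data = {
    "team_1": {"name": "Portland Trailblazers", "points": 118},
    "team_2": {"name": "Utah Jazz", "points": 113},
}
symbolic_response = (
    "The {team_1[name]} beat the {team_2[name]}, "
    "{team_1[points]} to {team_2[points]}."
)

def resolve(symbolic: str, data: dict) -> tuple[str, list[str]]:
    """Replace each {row[column]} reference with a verbatim copy of the cell's
    value and record the copied spans (the parts an interface could highlight
    as directly backed by the data)."""
    copied_spans: list[str] = []

    def substitute(match: re.Match) -> str:
        value = str(data[match.group(1)][match.group(2)])
        copied_spans.append(value)
        return value

    resolved = re.sub(r"\{(\w+)\[(\w+)\]\}", substitute, symbolic)
    return resolved, copied_spans

text, verified_spans = resolve(symbolic_response, game_data)
# text -> "The Portland Trailblazers beat the Utah Jazz, 118 to 113."
# Everything outside verified_spans would be left unhighlighted and flagged
# for closer manual checking.
```

Because the highlighted spans come from a deterministic copy rather than from the model, checking can focus on the unhighlighted text instead.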
Streamlining validation
The model can create symbolic responses because of the way it is trained. Large language models are fed reams of data from the Internet, and some data is saved in a “placeholder format” where codes replace actual values.
When SymGen asks the model to generate a symbolic response, it uses a similar structure. “We design the prompt in a specific way to take advantage of the capabilities of the LLM,” adds Shen.
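The article does not give the actual prompt, but the instruction could be sketched, hypothetically, along these lines:

```python
# Purely illustrative prompt wording (an assumption, not the authors' actual
# prompt): the model is asked to emit cell references instead of literal
# values, in a placeholder style similar to formats it has seen in training.
prompt = (
    "You are given a table of basketball game statistics. "
    "Write a short game summary. Whenever you state a fact taken from the "
    "table, do not write the value itself; instead, write a placeholder of "
    "the form {row[column]} naming the cell it comes from."
)
```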
In a user study, the majority of participants said that SymGen made it easier to verify LLM-generated text. They could validate the model’s responses about 20% faster than if they used standard methods.
However, SymGen is limited by the quality of the source data. The LLM might cite an incorrect variable, and a human verifier might be none the wiser. Additionally, the user must have source data in a structured format, such as a table, to populate SymGen. For now, the system only works with tabular data.
In the future, the researchers plan to improve SymGen so it can handle arbitrary text and other forms of data. With this capability, it could help validate parts of AI-generated legal document summaries, for example. They also plan to test SymGen with doctors to study how it might identify errors in AI-generated clinical summaries.
More information:
Lucas Torroba Hennigen et al., Towards Verifiable Text Generation with Symbolic References, arXiv (2023). DOI: 10.48550/arXiv.2311.09188
Provided by the Massachusetts Institute of Technology
This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and education.