“Co-LLM” uses a general-purpose language model to begin answering a prompt, with a “switch variable” intervening on certain words to request a more precise response from the expert model. Credit: Alex Shipps/MIT CSAIL
Have you ever been asked a question to which you only knew part of the answer? To give a more informed answer, the best solution would be to call a friend who is more knowledgeable on the subject.
This collaborative process can also help large language models (LLMs) improve their accuracy. Yet it’s difficult to teach LLMs to recognize when they need to collaborate with another model to get an answer. Instead of using complex formulas or large amounts of labeled data to specify where models should work together, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) came up with a more organic approach.
Their new algorithm, called “Co-LLM,” can pair a general-purpose base LLM with a more specialized model and help them work together. As the former drafts an answer, Co-LLM examines each word (or token) in that draft to see where it can swap in a more accurate token from the expert model. This process yields more accurate answers to medical questions and to math and reasoning problems. Because the expert model is not needed at every step, it also makes answer generation more efficient.
To determine when a base model needs help from an expert model, the framework uses machine learning to train a “switch variable,” a component that indicates, token by token, which of the two LLMs is more competent to generate the next word. The switch acts like a project manager, spotting the areas where it needs to call in a specialist.
If you ask Co-LLM to name some examples of extinct bear species, for example, two models will work together to construct answers. The general-purpose LLM starts to construct an answer, with the switch variable stepping in where it can insert a better token from the expert model, such as adding the year the bear species went extinct.
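The decoding loop described above can be sketched in a few lines. This is a toy, purely illustrative version: `base_next`, `expert_next`, and `switch_prob` are hypothetical stand-ins for the two models and the learned switch variable, and the sentence template, fact, and threshold are invented for the demo (the real system operates on neural-network token distributions).

```python
def collm_decode(prompt_tokens, base_next, expert_next, switch_prob,
                 threshold=0.5, max_tokens=50):
    """Greedy decoding: at each step, a learned switch decides whether
    to keep the base model's token or defer to the expert model."""
    out = list(prompt_tokens)
    for _ in range(max_tokens):
        p_defer = switch_prob(out)  # switch variable's deferral probability
        tok = expert_next(out) if p_defer > threshold else base_next(out)
        if tok == "<eos>":
            break
        out.append(tok)
    return out[len(prompt_tokens):]

# Toy models: the base knows the sentence template; the expert knows the fact.
template = ["The", "cave", "bear", "went", "extinct", "about",
            "FACT", "years", "ago", "<eos>"]

def base_next(ctx):
    return template[len(ctx)]  # base emits the scaffolding words

def expert_next(ctx):
    return "24,000"  # expert fills in the factual token

def switch_prob(ctx):
    # The learned switch fires where the base is unsure (here: the FACT slot).
    return 1.0 if template[len(ctx)] == "FACT" else 0.0

answer = collm_decode([], base_next, expert_next, switch_prob, max_tokens=20)
# answer == ["The", "cave", "bear", "went", "extinct", "about",
#            "24,000", "years", "ago"]
```

The key design point the sketch captures is that the expert is consulted only at the tokens where the switch fires, not at every step.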
“With Co-LLM, we’re essentially training a general-purpose LLM to ‘phone in’ to an expert model when needed,” says Shannon Shen, a PhD student in electrical and computer engineering at MIT and affiliated with CSAIL, who is a lead author of a new paper on the approach. The paper is available on the arXiv preprint server.
“We use domain-specific data to teach the base model about its counterpart’s expertise in domains such as biomedical tasks and math and reasoning questions. This process automatically finds the parts of the data that the base model struggles to generate, and then asks the base model to hand those parts off to the expert LLM, which has been pre-trained on data from a similar domain. The general-purpose model provides the ‘scaffolding’ generation, and when it calls on the specialized LLM, it prompts the expert to generate the desired tokens. Our results indicate that LLMs learn collaboration patterns organically, in the same way that humans recognize when to call on an expert to fill in the gaps.”
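The step of automatically finding where the base model struggles can be pictured with a simple heuristic. This is an illustrative sketch only, not the paper’s actual training procedure: it marks a token for deferral wherever the base model assigns the reference token a low log-probability, and the threshold value is an arbitrary choice for the demo.

```python
def deferral_labels(base_token_logprobs, threshold=-2.0):
    """Illustrative heuristic: label a token 1 (defer to the expert) when
    the base model gives the reference token low log-probability under
    teacher forcing, and 0 (keep the base model's token) otherwise."""
    return [1 if lp < threshold else 0 for lp in base_token_logprobs]

# Example: the third token is hard for the base model (log-prob -5.2),
# so it alone gets a deferral label.
labels = deferral_labels([-0.1, -0.3, -5.2, -0.2])
# labels == [0, 0, 1, 0]
```

Labels like these could then serve as supervision for the switch variable, so that at inference time it fires on similarly difficult positions.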
A combination of flexibility and factuality
Imagine asking a general-purpose LLM to name the ingredients in a specific prescription drug. It might answer incorrectly, requiring the expertise of a specialized model.
To demonstrate Co-LLM’s flexibility, the researchers used data such as the BioASQ medical dataset to pair a baseline LLM with expert LLMs in different domains, such as the Meditron model, which is pre-trained on unlabeled medical data. This allowed the algorithm to help answer questions that a biomedical expert would typically receive, such as naming the mechanisms behind a particular disease.
With the added expertise of a model that specializes in biomedical data, the user gets a more accurate answer to a question like the prescription-drug example above. Co-LLM also shows users which parts of an answer they should double-check.
Another example of Co-LLM’s improved performance: when asked to solve a math problem such as “a³ · a² where a = 5,” the general-purpose model incorrectly calculated the answer as 125. When Co-LLM trained it to collaborate with a large mathematics LLM called Llemma, together they determined that the correct solution is 3,125.
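As a quick check of the arithmetic in that example:

```python
# The expression from the example: a^3 * a^2 with a = 5.
a = 5
base_guess = a**3        # 125: computing a^3 alone, the base model's error
correct = a**3 * a**2    # 3125: a^(3+2) = a^5
print(base_guess, correct)  # 125 3125
```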
Co-LLM yielded more accurate answers than fine-tuned general-purpose LLMs and than untuned specialized models working on their own. Co-LLM can guide two models that were trained differently to work together, whereas other effective LLM collaboration approaches, such as Proxy Tuning, require all of their component models to be trained similarly. That baseline also requires each model to run simultaneously to produce the answer, whereas the MIT algorithm activates its expert model only for particular tokens, leading to more efficient generation.
When to seek expert advice
The MIT researchers’ algorithm shows that more closely mimicking human teamwork can increase the accuracy of collaboration between multiple LLMs. To further improve factual accuracy, the team could draw on human-style self-correction: They envision a more robust rollback approach that can backtrack when the expert model fails to provide a correct answer. This upgrade would allow Co-LLM to course-correct so that the algorithm can still provide a satisfactory answer.
The team also wants to update the system when new information becomes available, by training only the base model, keeping answers as current as possible. This would allow Co-LLM to combine the most recent information with strong reasoning power. Eventually, the approach could help with enterprise document management, using the most recent information available to keep documents up to date. Co-LLM could also train small, private models to work with a more powerful LLM to improve documents that must remain on a private server.
“The Co-LLM program presents an interesting approach to learning how to choose between two models to improve efficiency and performance,” says Colin Raffel, an associate professor at the University of Toronto and associate research director at the Vector Institute, who was not involved in the research.
“Because routing decisions are made at the token level, Co-LLM provides a granular way to defer difficult generation steps to a more powerful model. The unique combination of model- and token-level routing also provides a great deal of flexibility that similar methods lack. Co-LLM contributes to an important line of work that aims to develop specialized model ecosystems to outperform expensive monolithic AI systems.”
More information:
Shannon Zejiang Shen et al., Learning to decode collaboratively with multiple language models, arXiv (2024). DOI: 10.48550/arxiv.2403.03870
Provided by the Massachusetts Institute of Technology
This article is republished with kind permission from MIT News (web.mit.edu/newsoffice/), a popular site covering the latest research, innovation, and teaching at MIT.
Quote: New Algorithm Improves LLM Collaboration for Smarter, More Efficient Solutions (2024, September 16), retrieved September 17, 2024.
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for informational purposes only.