Researchers boost AI performance by having multiple AIs “debate” each other

"Multiagent debate" boosts AI accuracy by having multiple AIs debate and refine answers through several rounds.

In a significant advance for artificial intelligence, researchers from MIT and Google DeepMind have developed a novel approach to enhance the reasoning abilities and factual accuracy of AI systems.

Their method, described in a new paper presented this week at the International Conference on Machine Learning in Vienna, involves orchestrating a “debate” between multiple instances of large language models – the type of AI systems that power chatbots like ChatGPT.

Multiagent debate

The technique, which the researchers call “multiagent debate,” aims to address key limitations in current AI systems, including inconsistent logical reasoning and the generation of false information.

Inspired by Marvin Minsky’s “Society of Mind” theory, the multiagent debate method involves multiple instances of a language model proposing and debating their individual responses to a given query.

Each model instance critiques and refines the answers of others, engaging in a collaborative process that leads to a more accurate and reliable final answer.

In the system tested in this research paper, several copies of an AI model are given the same task.

Each generates an initial answer independently, then reviews and critiques the others’ responses, refining its own answer over multiple rounds.
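To make the procedure concrete, here is a minimal sketch of that debate loop in Python. The `generate` function is a stand-in for whatever language-model API is used, and the prompt wording, agent count, and round count are illustrative assumptions, not the paper’s exact settings.

```python
from typing import Callable

# Minimal sketch of the multiagent-debate loop described above.
# `generate` stands in for any chat-model API call; NUM_AGENTS,
# NUM_ROUNDS, and the prompts are illustrative assumptions.

NUM_AGENTS = 3  # number of model instances "debating"
NUM_ROUNDS = 2  # rounds of critique and refinement after the initial answers

def multiagent_debate(question: str, generate: Callable[[str], str]) -> list[str]:
    # Round 0: each agent answers the question independently.
    answers = [generate(f"Answer this question: {question}") for _ in range(NUM_AGENTS)]

    for _ in range(NUM_ROUNDS):
        new_answers = []
        for i in range(NUM_AGENTS):
            # Show agent i the other agents' current answers...
            others = "\n\n".join(
                f"Agent {j + 1} answered:\n{ans}"
                for j, ans in enumerate(answers)
                if j != i
            )
            # ...and ask it to critique them and update its own answer.
            prompt = (
                f"Question: {question}\n\n"
                f"Your previous answer:\n{answers[i]}\n\n"
                f"Other agents' answers:\n{others}\n\n"
                "Use the other answers as additional information, point out "
                "any mistakes, and give an updated final answer."
            )
            new_answers.append(generate(prompt))
        answers = new_answers  # all agents update before the next round

    return answers
```

A final answer can then be drawn from the last round, for example by majority vote among the agents.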

Improved results

The researchers found that on basic arithmetic tasks, accuracy increased from 67% for a single AI to 81.8% after debate among multiple AIs.

When tackling more complex word problems, performance improved from 77% for a single AI to 85% with the debate method.

The benefits extended beyond mathematical reasoning.

In chess, the quality of moves suggested by the AI improved significantly with the debate process.

This suggests that the back-and-forth between different instances of the AI enhances its strategic thinking capabilities.

Factual accuracy also saw significant improvements.

On a test designed to measure an AI’s general knowledge across a wide range of academic subjects, the MMLU benchmark, accuracy rose from 63.9% for a single AI to 71.1% with debate.

When generating biographies of famous computer scientists, for example, the debating AIs produced more accurate information than a single AI, with fewer invented “facts.”

“Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks,” the researchers write in their paper. “We also demonstrate that our approach improves the factual validity of generated content, reducing false answers and made-up information that contemporary models are prone to.”

The more debaters the better

The researchers found that increasing the number of AI “debaters” and the number of debate rounds generally led to better results.

They even experimented with having different types of AI models debate each other, such as pitting Google’s Bard against OpenAI’s ChatGPT.

While promising, the technique does have drawbacks.

Running multiple AI models and having them debate is more computationally expensive and time-consuming than using a single model.

The researchers suggest this issue could potentially be addressed by condensing the knowledge gained through debate back into a single, improved model.
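The paper proposes this distillation only as a direction, not a recipe. As a hedged illustration, one could harvest consensus answers from debates and use them as fine-tuning targets, reusing the `multiagent_debate` sketch above; the fine-tuning step itself is omitted here and entirely hypothetical.

```python
from collections import Counter

# Hedged sketch of the distillation idea: collect consensus answers
# produced by debate and use them as training targets for one model.
# `multiagent_debate` is the sketch above; the fine-tuning step is
# not shown and is hypothetical.

def build_distillation_set(questions, generate):
    dataset = []
    for q in questions:
        answers = multiagent_debate(q, generate)
        # Naive consensus: most common final answer string. Real free-form
        # answers would need normalization or answer extraction first.
        consensus, _count = Counter(answers).most_common(1)[0]
        dataset.append({"prompt": q, "target": consensus})
    return dataset
```

A single model fine-tuned on these (question, consensus) pairs could then approximate the debate’s accuracy at one model’s inference cost.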

As AI systems become increasingly prevalent in our daily lives, ensuring their reliability and truthfulness is paramount.

This debate technique offers a practical way to improve AI performance without changing the underlying models themselves.

It demonstrates that in the world of artificial intelligence, as in human endeavors, multiple perspectives can lead to more accurate and reliable outcomes.

The researchers believe their work could have far-reaching implications. “Our findings suggest that such ‘society of minds’ approach has the potential to significantly advance the capabilities of large language models and pave the way for further breakthroughs in language generation and understanding,” they conclude.

Reference:

Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2024). Improving Factuality and Reasoning in Language Models through Multiagent Debate. Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, July 21–27, 2024.