
Testing the limits of large language models in debating humans

Sci Rep. 2025 Apr 22;15(1):13852. doi: 10.1038/s41598-025-98378-1.

ABSTRACT

Large Language Models (LLMs) have shown remarkable promise in communicating with humans. Their potential use as artificial partners with humans in sociological experiments involving conversation is an exciting prospect. But how viable is it? Here, we rigorously test the limits of agents that debate using LLMs in a preregistered study that runs multiple debate-based opinion consensus games. Each game starts with six humans, six agents, or three humans and three agents. We found that agents can blend in and concentrate on a debate’s topic better than humans, improving the productivity of all players. Yet, humans perceive agents as less convincing and confident than other humans, and several of the behavioral metrics we collected for humans and agents deviate measurably from each other. We observed that agents are already decent debaters, but their behavior generates a pattern distinctly different from the human-generated data.

PMID: 40263531 | DOI: 10.1038/s41598-025-98378-1
