Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure

#ai

GitHub: lechmazur / step_game

A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance. The first LLM to reach or surpass 16–24 steps wins outright; if several players cross on the same turn, the highest total takes it, and exact ties share the victory.
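
These rules are simple to state in code. Below is a minimal Python sketch of the collision and win-condition logic as described above; the function names (`resolve_turn`, `pick_winners`) and the default `target=20` are illustrative choices, not taken from the repository.

```python
from collections import Counter

VALID_MOVES = (1, 3, 5)  # each player secretly picks one of these per turn

def resolve_turn(moves: dict[str, int], positions: dict[str, int]) -> dict[str, int]:
    """Advance players with unique picks; colliding players stay put."""
    counts = Counter(moves.values())
    for player, step in moves.items():
        if step in VALID_MOVES and counts[step] == 1:
            positions[player] += step  # unique pick: advance by 1, 3, or 5
        # if two or more players chose `step`, none of them advance this turn
    return positions

def pick_winners(positions: dict[str, int], target: int = 20) -> list[str]:
    """First to reach or surpass the target wins; if several cross at once,
    the highest total wins, and exact ties share the victory."""
    crossers = {p: s for p, s in positions.items() if s >= target}
    if not crossers:
        return []
    best = max(crossers.values())
    return [p for p, s in crossers.items() if s == best]
```

For example, if two players both pick 5 while a third picks 3, only the third advances that turn.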

This setup goes beyond static Q&A by focusing on social reasoning—models must decide whether to cooperate, negotiate, or deceive. Each turn’s conversation is publicly visible, but final choices remain private, forcing collisions when strategic talk doesn’t match actual moves. By monitoring these dialogues and outcomes, we capture deeper dimensions of multi-agent interaction and see how advanced language models balance shared knowledge with hidden intentions to outmaneuver or cooperate:

  • Communication vs. Silence: Do…
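
To make the talk-then-move structure concrete, here is a hypothetical driver loop that reuses `resolve_turn` and `pick_winners` from the sketch above. The `Agent`-style interface with `talk()` and `secret_move()` methods is assumed for illustration and is not part of the benchmark's code; the point it shows is that public messages go into a shared history, while picks are collected privately and only revealed through the turn's outcome.

```python
import random

class ScriptedBluffer:
    """Toy stand-in for an LLM player: publicly claims 1, privately picks at random."""
    def __init__(self, name: str):
        self.name = name

    def talk(self, history: list[str]) -> str:
        # Public phase: every player sees this message.
        return f"{self.name}: I'll take 1 this turn, go ahead and grab 5."

    def secret_move(self, history: list[str]) -> int:
        # Private phase: the actual pick may contradict the public claim.
        return random.choice(VALID_MOVES)

def play_game(agents, target: int = 20, max_turns: int = 50):
    positions = {a.name: 0 for a in agents}
    history: list[str] = []
    for _ in range(max_turns):
        # 1) Public conversation, visible to everyone.
        history.extend(a.talk(history) for a in agents)
        # 2) Simultaneous secret picks, never shown before resolution.
        moves = {a.name: a.secret_move(history) for a in agents}
        # 3) Resolve collisions and check the win condition.
        positions = resolve_turn(moves, positions)
        winners = pick_winners(positions, target)
        if winners:
            return winners, positions
    return [], positions

if __name__ == "__main__":
    print(play_game([ScriptedBluffer("A"), ScriptedBluffer("B"), ScriptedBluffer("C")]))
```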
