CollabBench

JavaScript ★ 18 updated 22d ago

An ICML 2026 research benchmark that measures how well AI agents cooperate with human-like partners in multi-player games, testing across five player personality profiles derived from real player data.

JavaScript · Python · setup: hard · complexity 4/5

CollabBench is a research benchmark published at ICML 2026 that measures how well AI language model agents cooperate with human-like partners in multi-player games. The research addresses a gap in how AI models are typically evaluated: most tests check whether a model can answer a question or complete a task alone, but real-world settings often require working alongside others who have different personalities and habits.

The benchmark uses two cooperative game environments as test beds. In these games, an AI agent must work with a simulated partner, whose behavioral profile varies from session to session, to complete shared goals. The researchers modeled five distinct player types the AI might encounter: an efficient collaborator, a hesitant laggard, an anxious doubter, a proactive leader, and an independent loner. Each profile was derived from the recorded game behavior of real players.

The framework has three main components. The first is a system that generates realistic simulated player profiles from recorded game data. The second is a training setup that teaches the AI agent to adapt its communication and task-taking behavior based on who it is working with. The third is an evaluation pipeline that collects game session data and scores the AI's collaboration quality using another AI model as a judge.
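To make the evaluation step more concrete, here is a minimal Python sketch of how judge scores might be averaged per partner profile. It is not taken from the CollabBench code: the function name `mean_score_by_profile`, the `(profile, score)` record shape, and the numeric score scale are all assumptions made for illustration. Only the five profile names come from the description above.

```python
from collections import defaultdict
from statistics import mean

# The five partner profiles named in the description above.
PROFILES = (
    "efficient collaborator",
    "hesitant laggard",
    "anxious doubter",
    "proactive leader",
    "independent loner",
)


def mean_score_by_profile(sessions):
    """Average judge scores per partner profile.

    `sessions` is an iterable of (profile, score) pairs. Both the record
    shape and the numeric score scale are hypothetical, not the
    repository's actual data format.
    """
    buckets = defaultdict(list)
    for profile, score in sessions:
        if profile not in PROFILES:
            raise ValueError(f"unknown profile: {profile!r}")
        buckets[profile].append(score)
    # Report only profiles that have at least one scored session.
    return {p: mean(buckets[p]) for p in PROFILES if buckets[p]}
```

Grouping by profile rather than reporting a single overall mean keeps it visible when an agent cooperates well with one partner type but poorly with another.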

The repository provides code for running the benchmark in both game environments, named CWAH-MultiPlayer and Cook-MultiPlayer, along with the training code for the collaborative agents and the judging system. Each component lives in its own subdirectory with its own setup instructions.

This is a research artifact that requires following subdirectory-level setup guides rather than a single quick-start command. It was developed by researchers at East China Normal University, Shanghai Innovation Institute, and Tencent.

Where it fits