
tau2-bench-verified

Python ★ 0 updated 2mo ago ⑂ fork

τ²-Bench-Verified is a corrected and verified version of the original τ²-bench. This release fixes issues found in the original dataset, where task definitions, expected actions, and evaluation criteria did not align with the stated policies or the database contents.
