tau2-bench-verified
Python
★ 0
updated 2mo ago
⑂ fork
τ²-Bench-Verified is a corrected and verified version of the original τ²-bench benchmark. This release fixes cases in the original dataset where task definitions, expected actions, or evaluation criteria did not match the stated policies or the database contents.