t013-logical-fallacies
Python
★ 1
updated 2mo ago
T013: Reasoning Depth Assessment - Logical Fallacies. A psychometric benchmark for evaluating AI agent reasoning quality using Item Response Theory (IRT) principles.
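As a rough illustration of the IRT idea mentioned in the description, here is a minimal sketch that assumes a two-parameter logistic (2PL) model. The repository's actual model, item parameters, and scoring code are not shown here, so the names `p_correct` and `estimate_ability` and the sample item values are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function (assumed model, not confirmed by the repo).

    theta: ability of the agent being assessed
    a:     item discrimination
    b:     item difficulty
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of theta.

    responses: list of booleans (True = item answered correctly)
    items:     list of (a, b) tuples, one per response
    """
    best_theta, best_ll = lo, float("-inf")
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        # Log-likelihood of the observed response pattern at this theta.
        ll = 0.0
        for correct, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            ll += math.log(p if correct else 1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta

# Hypothetical items: (discrimination, difficulty) for three fallacy questions.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
theta_hat = estimate_ability([True, True, False], items)
```

Under this assumption, an agent that answers the easier items correctly and misses the hardest one gets a moderately positive ability estimate. A grid search is used only to keep the sketch dependency-free.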