llm-evaluation-system

Python ★ 17 updated 1d ago

Agentic AI-guided evaluation system for comparing LLMs with multi-judge jury scoring
