sos-bench
This repository contains the complete artifacts for the paper "Style over Substance: Failure modes of LLM judges in alignment benchmarking", including the MisMo-Bench meta-benchmark, and describes how to reproduce or extend the paper's results.