RNGBench

Python ★ 38 updated 1d ago

The official implementation of "Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games"
