POLAR
Python
★ 167
updated 9mo ago
Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning.
No plain-English explanation yet — one is being written right now. Check back in a minute.