simpleRL-reason
★ 1
updated 1y ago
This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data.