
ReMoE

Python ★ 116 updated 1y ago

[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
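As a rough illustration of what "ReLU routing" in the title could mean, here is a minimal pure-Python sketch. It is not taken from the ReMoE codebase or from Megatron-LM; the names `relu_route`, `moe_forward`, `router_weights`, and the toy experts are all hypothetical. The assumption is that each expert's gate is the ReLU of a linear router score, so an expert with a non-positive score gets a gate of exactly zero and can be skipped, while the gates of active experts stay continuous in the router weights.

```python
# Illustrative sketch only: names and shapes are assumptions,
# not the API of the ReMoE repository or Megatron-LM.

def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def relu_route(x, router_weights):
    """Return one non-negative gate per expert: ReLU(W_router @ x)."""
    return relu(matvec(router_weights, x))

def moe_forward(x, router_weights, experts):
    """Sum expert outputs weighted by their ReLU gates.

    Experts whose gate is exactly zero are skipped, which is where
    the sparsity comes from in this sketch.
    """
    gates = relu_route(x, router_weights)
    out = [0.0] * len(x)
    for gate, expert in zip(gates, experts):
        if gate == 0.0:
            continue  # inactive expert: contributes nothing
        y = expert(x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, gates

# Toy example: a 2-dimensional token and 3 hypothetical experts.
x = [1.0, -2.0]
router_weights = [
    [1.0, 0.0],  # expert 0 score: 1.0
    [0.0, 1.0],  # expert 1 score: -2.0
    [1.0, 1.0],  # expert 2 score: -1.0
]
experts = [
    lambda v: [2.0 * t for t in v],
    lambda v: [t + 1.0 for t in v],
    lambda v: list(v),
]

out, gates = moe_forward(x, router_weights, experts)
print(gates)  # [1.0, 0.0, 0.0]
print(out)    # [2.0, -4.0]
```

In this toy run only expert 0 has a positive router score, so it is the only expert evaluated. How many experts are active per token is not fixed in advance here; it depends on the router scores.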
