DenseMixer
Python
★ 67
updated 10mo ago
Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient