Mixture-of-Transformers
Python
★ 248
updated 9mo ago
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025.