Mixture-of-Depths

Python · ★ 121 · updated 1d ago

Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
