qwen35-mtp-mlx

Python ★ 0 updated 2mo ago

Reproduction of native MTP (multi-token prediction) speculative decoding for Qwen3.5 9B/27B on MLX (Apple Silicon), with a reported 1.3x speedup. Blog: https://blog.web-of-anion.top/archives/qwen-3-5-mtp-mlx
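For readers new to the idea, here is a minimal sketch of greedy speculative decoding in plain Python. It is not taken from this repository and does not use MLX. The `toy_target` and `toy_draft` functions, and every other name in it, are hypothetical stand-ins. In an MTP setup the draft tokens would presumably come from the model's own multi-token-prediction head rather than a separate draft model, and verification would be a single batched forward pass rather than the per-token loop shown here.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 2,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding.

    Returns the generated sequence and the number of verification rounds.
    Each round stands in for one (batched) target forward pass.
    """
    out = list(prompt)
    goal = len(prompt) + max_new
    rounds = 0
    while len(out) < goal:
        # Step 1: the cheap drafter proposes k tokens.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # Step 2: the target checks the proposal position by position.
        rounds += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], rounds


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'large model': counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical drafter: agrees with the target except after 3 or 7."""
    return 0 if ctx[-1] % 4 == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    tokens, rounds = speculative_decode(toy_target, toy_draft, [0], max_new=12)
    print(tokens, rounds)
```

Because every accepted token is checked against the target's own greedy choice, the output matches plain greedy decoding exactly. Any speedup comes from needing fewer target passes than generated tokens, so it depends on how often the drafts are accepted.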

