CPM.cu

Cuda · ★ 240 · updated 5mo ago

CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge techniques in sparse architecture, speculative sampling and quantization.
