CPM.cu
Cuda
★ 240
updated 5mo ago
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference. It combines sparse architecture, speculative sampling, and quantization.