CUDA-Learn-Notes

Cuda ★ 83 updated 1y ago ⑂ fork

📚200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
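A "percent of cuBLAS TFLOPS" figure is usually derived from timings. The minimal sketch below assumes the common GEMM convention of 2·M·N·K floating-point operations (one multiply and one add per term of each dot product). It is not taken from this repository's benchmark code. The function names `gemm_tflops` and `percent_of_baseline` are hypothetical.

```python
def gemm_tflops(m: int, n: int, k: int, elapsed_ms: float) -> float:
    """Achieved TFLOPS for C = A @ B, with A of shape (m, k) and B of shape (k, n).

    Assumed convention: each of the m*n outputs needs k multiplies and k adds,
    so the total work is 2*m*n*k floating-point operations.
    """
    flops = 2 * m * n * k
    seconds = elapsed_ms * 1e-3
    return flops / seconds / 1e12


def percent_of_baseline(kernel_ms: float, baseline_ms: float) -> float:
    """Throughput of a kernel relative to a baseline (e.g. cuBLAS) on the same shape.

    Both kernels do the same number of FLOPs, so the TFLOPS ratio reduces to
    baseline_time / kernel_time.
    """
    return 100.0 * baseline_ms / kernel_ms


if __name__ == "__main__":
    # Hypothetical timings, for illustration only (not measured results).
    m = n = k = 4096
    print(f"{gemm_tflops(m, n, k, 1.0):.2f} TFLOPS")
    print(f"{percent_of_baseline(1.02, 1.00):.1f}% of baseline")
```

With these made-up timings, a kernel taking 1.02 ms against a 1.00 ms baseline lands at about 98% of the baseline's throughput. Attention kernels compared against FA2 use a different FLOP count, but the ratio is computed the same way.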
