omniserve
C++
★ 844
updated 1y ago
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention