Mooncake ★ PINNED
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
C++ ★ 5.6k 1h ago
ktransformers ★ PINNED
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
Python ★ 17k 2d ago
TrEnv-X ★ PINNED
No description.
Go ★ 87 9mo ago
vllm ⑂
A high-throughput and memory-efficient inference and serving engine for LLMs
Python ★ 15 4d ago
kvcache-blog
No description.
JavaScript ★ 12 23h ago
sglang ⑂
SGLang is a fast serving framework for large language models and vision language models.
Python ★ 11 4d ago
custom_flashinfer ⑂
FlashInfer: Kernel Library for LLM Serving
Cuda ★ 7 11mo ago
DeepEP_fault_tolerance ⑂
DeepEP: an efficient expert-parallel communication library that supports fault tolerance
Cuda ★ 3 5mo ago
sglang_awq ⑂
SGLang is a fast serving framework for large language models and vision language models.
Python ★ 2 2mo ago
accelerate ⑂
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
★ 1 1mo ago
Model-Optimizer ⑂
A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
★ 0 3d ago
evalscope ⑂
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
Python ★ 0 2mo ago
transformers ⑂
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
★ 0 1mo ago
gpustack ⑂
GPU cluster manager for optimized AI model deployment
★ 0 6mo ago
sglang-npu ⑂
SGLang is a fast serving framework for large language models and vision language models.
★ 0 10mo ago