RedAI Infra ORG

@redai-infra · China

Building the infrastructure for large model training, inference, optimization, and serving — empowering creators and developers to harness AI at scale.

13 repos
41 followers
0 following

Python 100%

All public repos (13)

Relax ★ PINNED

An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale

Python ★ 429 1d ago
PIPO ★ PINNED

Implementation of an efficient LLM architecture: the Pair-In / Pair-Out Model (PIPO)

A research project that speeds up AI text generation by compressing two input tokens into one and predicting an extra output token per step, cutting inference time without sacrificing accuracy.

Python ★ 31 9d ago
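The PIPO summary describes two pieces of bookkeeping: two input tokens share one position, and each step emits two output tokens instead of one. The Python sketch below shows only that arithmetic, under stated assumptions. It is not PIPO's architecture or code. `pair_tokens`, `decode_steps` and the padding id `PAD_ID` are hypothetical names invented for this example.

```python
import math

# Hypothetical padding id, used only for this illustration (not from PIPO).
PAD_ID = 0


def pair_tokens(token_ids: list[int], pad_id: int = PAD_ID) -> list[tuple[int, int]]:
    """Group a token sequence into consecutive (left, right) pairs.

    An odd-length sequence is padded with `pad_id` so that every position
    holds exactly two tokens. In a pair-in model, each pair would occupy a
    single input position.
    """
    ids = list(token_ids)
    if len(ids) % 2:
        ids.append(pad_id)
    return [(ids[i], ids[i + 1]) for i in range(0, len(ids), 2)]


def decode_steps(num_new_tokens: int, tokens_per_step: int) -> int:
    """Number of autoregressive forward passes needed to emit `num_new_tokens`."""
    return math.ceil(num_new_tokens / tokens_per_step)


if __name__ == "__main__":
    prompt = [101, 7, 42, 9, 13]
    pairs = pair_tokens(prompt)
    print(pairs)  # [(101, 7), (42, 9), (13, 0)]
    print(len(prompt), "tokens ->", len(pairs), "input positions")
    # Baseline emits one token per step; a pair-out model emits two.
    print(decode_steps(256, 1), "vs", decode_steps(256, 2))  # 256 vs 128
```

Under these assumptions, a 5-token prompt occupies 3 input positions. Generating 256 tokens takes 128 forward passes instead of 256. How PIPO actually merges embeddings and predicts the extra token is defined in its repository, not here.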
HiSVD ★ PINNED

[ACL 2026] HiSVD: Principled Low-Rank Approximation of LLMs via Hierarchical Modeling of Information Capacity and Spectral Structure

Python ★ 2 3mo ago
hint-tuning

Official code, data, and models for "Hint Tuning: Less Data Makes Better Reasoners"

A research project that trains AI reasoning models more efficiently by giving each problem only as much step-by-step reasoning as it actually needs: short answers for easy problems, long chains for hard ones.

Python ★ 22 16d ago
slime ⑂

slime is an LLM post-training framework for RL scaling.

Python ★ 0 10d ago
torch_memory_saver ⑂

Allow torch tensor memory to be released and resumed later

Python ★ 0 1mo ago
sglang ⑂

SGLang is a high-performance serving framework for large language models and multimodal models.

Python ★ 0 1mo ago
Megatron-Bridge ⑂

Training library for Megatron-based models with bidirectional Hugging Face conversion capability

Python ★ 0 29d ago
.github

Xiaohongshu AI Platform Team

★ 0 2mo ago
TransferQueue ⑂

This is the official **live** mirror of https://gitcode.com/Ascend/TransferQueue. Feel free to contribute!

Python ★ 0 8d ago
Model-Optimizer ⑂

A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.

★ 0 3mo ago
dynamo ⑂

A Datacenter Scale Distributed Inference Serving Framework

Rust ★ 0 3mo ago
vllm ⑂

A high-throughput and memory-efficient inference and serving engine for LLMs

Python ★ 0 1mo ago