gitmyhub

TensorRT

C++ ★ 13k updated 17d ago

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

TensorRT is NVIDIA's toolkit for making AI models run faster on NVIDIA GPUs. It takes a trained model and compiles it into an optimized version with lower latency and less memory use at inference time.

C++, Python, CUDA, ONNX, CMake, Docker | setup: hard | complexity 5/5

TensorRT is NVIDIA's toolkit for running AI models as fast as possible on NVIDIA GPUs. When you train an AI model, the result is a large file describing a network of mathematical operations. TensorRT takes that file and optimizes it specifically for the GPU it will run on, producing a much faster version that consumes less memory and delivers lower latency than running the original model directly.

This repository contains only the open-source portions of TensorRT, a subset of the full product. The open-source components are plugins (modular pieces of custom compute logic that extend what TensorRT can handle), an ONNX parser (ONNX is a standard file format for AI models, and the parser lets TensorRT read models saved in that format), and a collection of sample applications showing how to use the toolkit.

The easiest way to use TensorRT with Python is a pip install, which pulls in prebuilt packages so nothing has to be compiled locally. Building from source is more involved and requires a compatible NVIDIA GPU, CUDA libraries, CMake, and several other system dependencies. The repository provides Docker container setups to make the build more consistent across machines.
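After a pip install (the package name on PyPI is `tensorrt`), it can be useful to confirm that the package is actually visible to the current Python environment. The sketch below is only an illustration: the helper name `tensorrt_status` is hypothetical and not part of TensorRT, and it checks the installed package metadata only, without loading any GPU libraries.

```python
import importlib.util
from importlib import metadata


def tensorrt_status():
    """Return the installed tensorrt version string, or None if it is not installed.

    Hypothetical helper for illustration; it only inspects the Python
    environment and does not verify that a GPU or driver is present.
    """
    # find_spec locates the module without importing it, so no CUDA
    # libraries are loaded just to answer "is it installed?".
    if importlib.util.find_spec("tensorrt") is None:
        return None
    try:
        return metadata.version("tensorrt")
    except metadata.PackageNotFoundError:
        # Importable but without pip metadata (for example a source build).
        return "unknown"


if __name__ == "__main__":
    version = tensorrt_status()
    if version is None:
        print("tensorrt is not installed; try: pip install tensorrt")
    else:
        print(f"tensorrt found, version: {version}")
```

A `None` result means the pip install has not happened in this environment. A version string says nothing about whether a compatible GPU and driver are available, which still has to be checked separately.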

TensorRT is widely used in production environments where inference speed matters, such as real-time video processing, autonomous vehicles, and serving large language models at scale. It supports models from frameworks like PyTorch and TensorFlow by first exporting them to the ONNX format and then compiling them with TensorRT.
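The export-then-compile flow can be sketched with the TensorRT Python bindings. This is a minimal sketch under stated assumptions, not the repository's own sample code: it assumes a model has already been exported to an ONNX file, that the `tensorrt` package (8.x or 10.x style API) is installed on a machine with an NVIDIA GPU, and that a 1 GiB workspace limit is acceptable. The function name `build_engine_from_onnx` and both file paths are hypothetical, and some calls may need adjusting for TensorRT 11.0.

```python
from pathlib import Path


def build_engine_from_onnx(onnx_path, engine_path, workspace_bytes=1 << 30):
    """Parse an ONNX file and write a serialized TensorRT engine to disk.

    Hypothetical helper for illustration only.
    """
    # Read the model first so a bad path fails before any GPU work starts.
    onnx_bytes = Path(onnx_path).read_bytes()

    # Imported lazily so this module loads even where TensorRT is absent.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)

    # Older releases require the EXPLICIT_BATCH flag; newer ones treat it as
    # the default. Fall back to 0 if the flag no longer exists.
    flag = getattr(trt.NetworkDefinitionCreationFlag, "EXPLICIT_BATCH", None)
    network = builder.create_network((1 << int(flag)) if flag is not None else 0)

    # The ONNX parser populates the network definition from the file bytes.
    parser = trt.OnnxParser(network, logger)
    if not parser.parse(onnx_bytes):
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace_bytes)

    # Optimization happens here and is specific to the GPU doing the build.
    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("TensorRT failed to build an engine")

    with open(engine_path, "wb") as f:
        f.write(serialized)
    return Path(engine_path)
```

Calling `build_engine_from_onnx("model.onnx", "model.engine")` on a GPU machine would produce an engine file tuned for that GPU. Because the optimization targets specific hardware, the engine is normally rebuilt on the deployment GPU rather than copied between different GPU models.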

A major new version, TensorRT 11.0, is planned for mid-2026 and will remove several older APIs while introducing a cleaner interface. The README notes specific older features that will be dropped and points to their replacements for developers who need to migrate existing code.

Where it fits