19-day current streak · 20-day longest streak
I built Cache-DiT, ffpa-attn, LeetCUDA, lite.ai.toolkit, xlite-dev, ...
I contributed to FastDeploy, SGLang, vLLM, Diffusers, ...
- lite.ai.toolkit (pinned)
  A lite C++ toolkit: contains 100+ Awesome AI models, supports MNN, NCNN, TNN, ONNXRuntime and TensorRT.
  C++ · ★ 32 · 1y ago
- FastDeploy (pinned)
  ⚡️An Easy-to-use and Fast Deep Learning Model Deployment Toolkit
  C++ · ★ 5 · 2y ago
- sglang (pinned)
  SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
  Python · ★ 0 · 6mo ago
- cache-dit (pinned)
  No description.
  Python · ★ 0 · 4mo ago
- CUDA-Learn-Notes
  200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2).
  Cuda · ★ 83 · 1y ago
- Awesome-LLM-Inference
  A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc.
  ★ 16 · 1y ago
- lite.ai.toolkit.demo
  Demos for how to use the shared libs of Lite.AI.ToolKit. (https://github.com/DefTruth/lite.ai.toolkit)
  C++ · ★ 9 · 4y ago
- BlogLearning
  My own learning journey, focusing on various fun image processing algorithms, motion capture, and machine learning
  ★ 9 · 4y ago
- flash-attention-minimal
  Flash Attention in ~100 lines of CUDA (forward pass only)
  ★ 8 · 2y ago
- cmake-cookbook
  CMake Cookbook recipes.
  ★ 8 · 5y ago
- nanodet
  ⚡Super fast and lightweight anchor-free object detection model. Only 980 KB (int8) / 1.8 MB (fp16) and runs at 97 FPS on cellphone
  Python · ★ 8 · 4y ago
- YOLOP
  You Only Look Once for Panoptic Driving Perception. (https://arxiv.org/abs/2108.11250)
  Python · ★ 8 · 4y ago
- DefTruth
  No description.
  ★ 6 · 11d ago
- hgemm-mma
  ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.
  Cuda · ★ 5 · 1y ago
- awesome-AI-system
  paper and its code for AI System
  ★ 5 · 2y ago
- PaddleOCR
  Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
  ★ 5 · 3y ago
- triton
  Development repository for the Triton language and compiler
  C++ · ★ 4 · 1y ago
- MGMatting
  This repository includes the official project of Mask Guided (MG) Matting, presented in our paper: Mask Guided Matting via Progressive Refinement Network
  Python · ★ 4 · 4y ago
- RobustVideoMatting
  Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
  Python · ★ 4 · 4y ago
- YOLOv6
  YOLOv6: a single-stage object detection framework dedicated to industrial applications.
  Jupyter Notebook · ★ 4 · 3y ago
- ComputeLibrary
  The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
  C++ · ★ 3 · 3y ago
- torch-tensorrt
  PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
  Python · ★ 3 · 2y ago
- PTX-ISA-8.2-zh
  (Continuously updated) CUDA 12.2 PTX-ISA-8.2 study notes (partial Chinese translation + personal understanding + inline assembly examples), explaining the CUDA 12.2 PTX-ISA-8.2 assembly instructions; in progress.....
  ★ 3 · 2y ago
- pybind11-Chinese-docs
  pybind11 Chinese documentation (personal translation)
  ★ 3 · 3y ago
- Paddle-Lite
  Multi-platform high performance deep learning inference engine (PaddlePaddle's multi-device, multi-platform, high-performance deep learning inference engine)
  C++ · ★ 3 · 3y ago
- kernel-pilot
  No description.
  ★ 2 · 1mo ago
- TensorRT-LLM
  TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
  Python · ★ 2 · 5mo ago
- vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs
  Python · ★ 2 · 1y ago
- TensorRT-Model-Optimizer
  TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed on NVIDIA GPUs.
  Python · ★ 2 · 1y ago
- mediapipe
  Cross-platform, customizable ML solutions for live and streaming media.
  ★ 2 · 4y ago
- Yolo-FastestV2
  A low-power, ultra-lightweight general-purpose object detection algorithm based on Yolo; it has only 250k parameters and can reach ~300fps+ on a smartphone
  ★ 2 · 4y ago
- TensorRT_Tutorial
  No description.
  ★ 2 · 3y ago
- X2Paddle
  Deep learning model converter for PaddlePaddle.
  ★ 2 · 4y ago
- vscode-pdfviewer
  Show PDF preview in VSCode.
  ★ 2 · 3y ago
- XNNPACK
  High-efficiency floating-point neural network inference operators for mobile, server, and Web
  ★ 2 · 3y ago
- PL-Compiler-Resource
  Resources on programming languages and compiler technology (continuously updated)
  ★ 2 · 3y ago
- simde
  Implementations of SIMD instruction sets for systems which don't natively support them.
  ★ 2 · 3y ago
- FlyCV
  No description.
  C++ · ★ 2 · 3y ago
- ffpa-attn-mma
  FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑ vs SDPA EA.
  Cuda · ★ 1 · 1y ago
- DefTruth.github.io
  My Personal GitHub Pages generated by Github Copilot.
  JavaScript · ★ 1 · 2mo ago
- ai-infra-skills
  No description.
  ★ 1 · 2mo ago
- ptx-isa-markdown
  PTX ISA 9.1 documentation converted to searchable markdown. Includes Claude Code skill for CUDA development.
  ★ 1 · 5mo ago
- ChatGLM2-6B
  ChatGLM2-6B: An Open Bilingual Chat LLM
  ★ 1 · 2y ago
- ChatGLM-6B
  ChatGLM-6B: An Open Bilingual Dialogue Language Model
  ★ 1 · 2y ago
- ByteTrack
  ByteTrack: Multi-Object Tracking by Associating Every Detection Box
  ★ 1 · 4y ago
- MInference
  [NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLM inference, computes attention with approximate and dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
  ★ 1 · 1y ago
- llm-compressor
  Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
  ★ 1 · 1y ago
- FlashMLA
  FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs
  ★ 1 · 1y ago
- MHA2MLA
  Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
  ★ 1 · 1y ago
- Cutlass_EX
  study of cutlass
  ★ 1 · 2y ago
- InternVL
  [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
  ★ 1 · 1y ago
- cutlass
  CUDA Templates for Linear Algebra Subroutines
  C++ · ★ 1 · 1y ago
- flash-attention
  Fast and memory-efficient exact attention
  Python · ★ 1 · 1y ago
- cuda-mode-lectures
  Material for cuda-mode lectures
  ★ 1 · 2y ago
- llm-action
  This project aims to share the technical principles behind large models and hands-on experience.
  ★ 1 · 1y ago
- TransformerCompression
  For releasing code related to compression methods for transformers, accompanying our publications
  ★ 1 · 2y ago
- CLIP
  CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
  ★ 1 · 2y ago
- PIDM
  Person Image Synthesis via Denoising Diffusion Model (CVPR 2023)
  ★ 1 · 2y ago
- CUDA-Programming
  Sample codes for my CUDA programming book
  ★ 1 · 2y ago
- How_to_optimize_in_GPU
  This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
  ★ 1 · 2y ago
- stb
  stb single-file public domain libraries for C/C++
  ★ 1 · 4y ago
- subsampling-scale-image-view
  Android library (AAR). Highly configurable, easily extendable deep zoom view for displaying huge images without loss of detail. Perfect for photo galleries, maps, building plans etc.
  ★ 1 · 4y ago
- GFM
  [IJCV 2022] Bridging Composite and Real: Towards End-to-end Deep Image Matting
  ★ 1 · 4y ago
- DECA
  DECA: Detailed Expression Capture and Animation (SIGGRAPH 2021)
  ★ 1 · 4y ago
- armnn
  Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
  ★ 1 · 3y ago
- FBGEMM
  FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
  ★ 1 · 3y ago
- Paddle
  PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
  C++ · ★ 1 · 2y ago
- OpenBLAS
  OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
  C · ★ 1 · 3y ago
- PaddleNLP
  Easy-to-use and powerful NLP library with Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including Text Classification, Neural Search, Question Answering, Information Extraction, Document Intelligence, Sentiment Analysis and Diffusion AIGC system etc.
  Python · ★ 1 · 3y ago
- CV-CUDA
  CV-CUDA™ is an open-source, graphics processing unit (GPU)-accelerated library for cloud-scale image processing and computer vision.
  C++ · ★ 1 · 3y ago
- sse2neon
  A translator from Intel SSE intrinsics to Arm/Aarch64 NEON implementation
  ★ 1 · 3y ago
- Paddle2ONNX
  ONNX Model Exporter for PaddlePaddle
  Python · ★ 1 · 3y ago
- opencv-mobile
  The minimal opencv for Android, iOS, ARM Linux, Windows, Linux, MacOS, WebAssembly
  C · ★ 1 · 3y ago
- ClassificationForAndroid
  Image recognition on Android using deep learning models. This project provides several usage options, using the following frameworks: Tensorflow Lite, Paddle Lite, MNN, TNN
  ★ 1 · 4y ago
- insightface
  State-of-the-art 2D and 3D Face Analysis Project
  Python · ★ 1 · 4y ago
- xmake
  A cross-platform build utility based on Lua
  ★ 1 · 4y ago
- yolov5-face
  YOLO5Face: Why Reinventing a Face Detector (https://arxiv.org/abs/2105.12931)
  Python · ★ 1 · 4y ago
- Audio2Face
  http://www.facegood.cc
  ★ 1 · 4y ago
- ncnn_Android_face
  Android face detection, segmentation, and facemesh with ncnn
  ★ 1 · 4y ago
- FasterTransformer
  Transformer related optimization, including BERT, GPT
  C++ · ★ 1 · 2y ago
- TransformerEngine
  A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
  Python · ★ 1 · 2y ago
- ARMv9-ACLE-SVE2-zh
  Chinese translation + some personal notes: ARMv9 SVE/SVE2 intrinsics
  ★ 1 · 2y ago
- pfld-ncnn
  No description.
  ★ 1 · 5y ago
- Awesome-Chinese-LLM
  A collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and trained at lower cost, including base models, domain-specific fine-tuning and applications, datasets, and tutorials.
  ★ 1 · 2y ago
- mattematte
  mattematte: A C++ ToolKit for matting, segmentation, SR and colorization with MNN, ONNXRuntime and Vulkan.
  ★ 1 · 4y ago
- MODNet
  A Trimap-Free Solution for Portrait Matting in Real Time
  Python · ★ 1 · 4y ago
- dlib
  A toolkit for making real world machine learning and data analysis applications in C++
  ★ 1 · 7y ago
- Soft-NMS-for-Rotated-Rectangles
  CPU implementation of Soft-NMS for rotated rectangles
  Python · ★ 1 · 7y ago
- python3-source-code-analysis
  "Python 3 Source Code Analysis"
  Makefile · ★ 1 · 7y ago
- checkmate
  A lightweight class for saving the best Tensorflow checkpoints.
  Python · ★ 1 · 7y ago
- Soft-NMS-1
  An implementation of the Soft NMS algorithm in Python
  Python · ★ 1 · 7y ago
- claude-code
  An independent Python feature port of Claude Code, rewritten entirely from scratch using oh-my-codex. Educational purposes only.
  ★ 0 · 2mo ago
- onnx-simplifier
  Simplify your onnx model
  C++ · ★ 0 · 3y ago
- HunyuanImage-3.0
  HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
  ★ 0 · 8mo ago
- Awesome-Diffusion-Acceleration-Cache
  A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration techniques.
  ★ 0 · 8mo ago
- Awesome-Video-Attention
  A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and caching, etc.
  ★ 0 · 11mo ago
- MeanFlow
  Pytorch Implementation (unofficial) of the paper "Mean Flows for One-step Generative Modeling" by Geng et al.
  ★ 0 · 11mo ago
- PIPNet
  Efficient facial landmark detector
  Python · ★ 0 · 4y ago
- CogVideo
  text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
  Python · ★ 0 · 1y ago
- SpargeAttn
  SpargeAttention: A training-free sparse attention that can accelerate any model inference.
  ★ 0 · 1y ago
- chain-of-draft
  Code and data for the Chain-of-Draft (CoD) paper
  ★ 0 · 1y ago
- xDiT
  xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
  Python · ★ 0 · 1y ago
- unlock-deepseek
  Interpretation, extension, and reproduction of the DeepSeek series of work.
  ★ 0 · 1y ago
- ParaAttention
  Context parallel attention that accelerates DiT model inference with dynamic caching
  ★ 0 · 1y ago
- flash-linear-attention
  Fast implementations of causal linear attention for autoregressive language modeling (Pytorch)
  Python · ★ 0 · 2y ago
- lmdeploy
  LMDeploy is a toolkit for compressing, deploying, and serving LLM
  Python · ★ 0 · 1y ago
- DeepFaceLive
  Real-time face swap for PC streaming or video calls
  ★ 0 · 4y ago
- photo2cartoon
  Portrait cartoonization exploration project (photo-to-cartoon translation project)
  Python · ★ 0 · 4y ago
- cuda_hgemm
  Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
  ★ 0 · 1y ago
- cuda-tensorcore-hgemm
  No description.
  ★ 0 · 3y ago
- CainCamera
  CainCamera is an Android Project to learn about development of beauty camera, image and short video
  ★ 0 · 4y ago
- DeepFaceLab
  DeepFaceLab is the leading software for creating deepfakes.
  ★ 0 · 6y ago
- AutoFP8
  No description.
  Python · ★ 0 · 2y ago
- tensorrtllm_backend
  The Triton TensorRT-LLM Backend
  Python · ★ 0 · 2y ago
- DeepSeek-V2
  No description.
  ★ 0 · 2y ago
- LLM-Viewer
  Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
  ★ 0 · 2y ago
- cute-gemm
  No description.
  ★ 0 · 2y ago
- DiT
  Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
  ★ 0 · 2y ago
- TensorRT
  NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
  C++ · ★ 0 · 1y ago
- LookaheadDecoding
  No description.
  ★ 0 · 2y ago
- LLaVA
  [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
  Python · ★ 0 · 2y ago
- DeepCache
  DeepCache: Accelerating Diffusion Models for Free
  ★ 0 · 2y ago
- stable-fast
  Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
  ★ 0 · 2y ago
- BEVFormer_tensorrt
  BEVFormer inference on TensorRT, including INT8 Quantization and Custom TensorRT Plugins (float/half/half2/int8).
  ★ 0 · 2y ago
- xformers
  Hackable and optimized Transformers building blocks, supporting a composable construction.
  Python · ★ 0 · 2y ago
- TinyLlama
  The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
  ★ 0 · 2y ago
- mixtral-offloading
  Run Mixtral-8x7B models in Colab or consumer desktops
  ★ 0 · 2y ago
- awesome-distributed-systems
  A curated list of awesome distributed systems books, papers, resources and shiny things.
  ★ 0 · 3y ago
- Awesome-LLM
  Awesome-LLM: a curated list of Large Language Model
  ★ 0 · 2y ago
- LLMSys-PaperList
  Large Language Model Systems Paper List
  ★ 0 · 2y ago
- awesome-graph-self-supervised-learning
  No description.
  ★ 0 · 4y ago
- Awesome-LLMOps
  An awesome & curated list of best LLMOps tools for developers
  ★ 0 · 2y ago
- cutlass_fpA_intB_gemm
  A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
  ★ 0 · 2y ago
- BentoML
  Build Production-Grade AI Applications
  ★ 0 · 2y ago
- awesome-python
  A curated list of awesome Python frameworks, libraries, software and resources
  ★ 0 · 2y ago
- lite3d.ai.toolkit
  No description.
  ★ 0 · 4y ago
- PaddleClas
  A treasure chest for visual classification and recognition powered by PaddlePaddle
  ★ 0 · 3y ago
- gemm-sve2
  row-major matmul optimization
  ★ 0 · 3y ago
- tflite-micro
  Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
  C++ · ★ 0 · 3y ago
- PaddleSeg
  Easy-to-use image segmentation library with awesome pre-trained model zoo, supporting wide-range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc.
  Python · ★ 0 · 3y ago
- PaddleDetection
  Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
  ★ 0 · 3y ago
- onnxruntime
  ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
  C++ · ★ 0 · 3y ago
- wmma_extension
  An extension library of WMMA API (Tensor Core API)
  Cuda · ★ 0 · 2y ago
- smoothquant
  SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
  ★ 0 · 3y ago
- pfld_106_face_landmarks
  PFLD implementation for 106-point facial landmark detection
  ★ 0 · 5y ago
- Llama-2-Onnx
  No description.
  ★ 0 · 2y ago
- CUDA-Programs
  Examples from Programming in Parallel with CUDA
  ★ 0 · 3y ago
- cuda-samples
  Samples for CUDA Developers which demonstrate features in CUDA Toolkit
  ★ 0 · 3y ago
- onnxruntime-android-libs
  Some prebuilt libs of onnxruntime (1.7.0~1.10.0) for Android.
  ★ 0 · 4y ago
- triton-python-backend
  Triton backend that enables pre-processing, post-processing and other logic to be implemented in Python.
  ★ 0 · 3y ago
- triton-server-backend
  Common source, scripts and utilities for creating Triton backends.
  ★ 0 · 3y ago
- fastdeploy_ci
  No description.
  Python · ★ 0 · 3y ago
- optimizer
  Actively maintained ONNX Optimizer
  ★ 0 · 3y ago
- netron
  Visualizer for neural network, deep learning, and machine learning models
  JavaScript · ★ 0 · 3y ago
- node-python-bridge
  Node.js to Python bridge
  ★ 0 · 3y ago
- rvv-intrinsic-doc
  No description.
  ★ 0 · 3y ago
- taichi
  Productive & portable high-performance programming in Python.
  C++ · ★ 0 · 3y ago
- tiny-cuda-nn
  Lightning fast C++/CUDA neural network framework
  ★ 0 · 3y ago
- MNN
  MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
  C++ · ★ 0 · 3y ago
- gemmlowp
  Low-precision matrix multiplication
  ★ 0 · 3y ago
- how-to-optimize-gemm
  No description.
  ★ 0 · 4y ago
- Image-processing-algorithm-Speed
  opencv
  ★ 0 · 5y ago
- ARM_NEON_2_x86_SSE
  The platform independent header allowing to compile any C/C++ code containing ARM NEON intrinsic functions for x86 target systems using SIMD up to SSE4 intrinsic functions
  ★ 0 · 3y ago
- jni-bind
  JNI Bind is a set of advanced syntactic sugar for writing efficient correct JNI Code in C++17 (and up).
  ★ 0 · 3y ago
- Paddle-Lite-Demo
  lib, demo, model, data
  ★ 0 · 3y ago
- ArmNeonOptimization
  Arm neon optimization practice
  ★ 0 · 5y ago
- onnxruntime-extensions
  The pre- and post- processing library for ONNX Runtime
  ★ 0 · 3y ago
- Pytorch_Retinaface
  Retinaface gets 80.99% in widerface hard val using mobilenet0.25.
  ★ 0 · 3y ago
- YOLOU
  YOLOv3, YOLOv4, YOLOv5, YOLOv5-Lite, YOLOv6, YOLOv7, YOLOX, YOLOX-Lite, TensorRT, NCNN, Tengine, OpenVINO
  ★ 0 · 3y ago
- PINTO_model_zoo
  A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.
  Python · ★ 0 · 4y ago
- FastestDet
  A newly designed ultra-lightweight anchor-free object detection algorithm with only 250K parameters; it reduces time consumption by 30% compared with yolo-fastest, and the post-processing is simpler
  ★ 0 · 4y ago
- pytorch3d
  PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
  ★ 0 · 4y ago
- yolov5
  YOLOv5 in PyTorch > ONNX > CoreML > TFLite
  Python · ★ 0 · 3y ago
- SynergyNet
  3DV 2021: Synergy between 3DMM and 3D Landmarks for Accurate 3D Facial Geometry
  ★ 0 · 4y ago
- SLPT-master
  No description.
  ★ 0 · 4y ago
- CPEM
  PyTorch implementation of "Towards Accurate Facial Motion Retargeting with Identity-Consistent and Expression-Exclusive Constraints" (AAAI2022)
  ★ 0 · 4y ago
- yolact
  A simple, fully convolutional model for real-time instance segmentation.
  ★ 0 · 4y ago
- Background-Matting
  Background Matting: The World is Your Green Screen
  ★ 0 · 4y ago
- GoogleMediapipePackageDll
  package google mediapipe hand and holistic tracking into a dynamic link library
  ★ 0 · 4y ago
- FBA_Matting
  Official repository for the paper F, B, Alpha Matting
  ★ 0 · 4y ago
- AIM
  [IJCAI'21] Deep Automatic Natural Image Matting
  ★ 0 · 4y ago
- P3M
  [ACM MM 2021] Privacy-Preserving Portrait Matting
  ★ 0 · 4y ago
- SIM
  Official repository of Semantic Image Matting
  ★ 0 · 4y ago
- landmark-detection
  Four landmark detection algorithms, implemented in PyTorch.
  ★ 0 · 5y ago
- kalidoface
  Become a virtual character with just your webcam!
  ★ 0 · 4y ago
- kalidoface-3d
  Face and Body Tracking for VRM 3D models on the web.
  ★ 0 · 4y ago
- ue4-mediapipe-plugin
  UE4 MediaPipe plugin
  ★ 0 · 4y ago
- kalidokit
  Blendshape and kinematics calculator for Mediapipe/Tensorflow.js Face, Eyes, Pose, and Finger tracking models.
  ★ 0 · 4y ago
- OnnxRuntimeAndorid
  No description.
  ★ 0 · 5y ago
- OcrLiteAndroidOnnx
  chineseocr lite android onnx: an ultra-lightweight Chinese OCR Android demo that supports vertical text recognition and ONNX inference (psenet+anglenet+crnn)
  ★ 0 · 4y ago
- example_based_facial_rigging_ARkit_blendshapes
  No description.
  ★ 0 · 4y ago
- deformation_transfer_ARkit_blendshapes
  Implementation of the deformation transfer paper and its application in generating all the ARkit facial blend shapes for any 3D face
  ★ 0 · 4y ago
- TD-FaceBlendshapes
  Use ARkit application to drive face-blendshapes in TouchDesigner.
  ★ 0 · 5y ago
- CNN
  C++ implementation of CNN (Convolutional Neural Network) for image classification
  ★ 0 · 4y ago
- RapidOCR
  A cross platform OCR Library based on PaddleOCR & OnnxRuntime
  ★ 0 · 4y ago
- quarrying-plant-id
  Plant Recognition
  ★ 0 · 4y ago
- Deep3DFaceRecon_pytorch
  Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.
  ★ 0 · 4y ago
- quarrying-insect-id
  Insect Recognition
  ★ 0 · 4y ago
- AvatarMe
  Public repository for the CVPR 2020 paper AvatarMe and the TPAMI 2021 AvatarMe++
  ★ 0 · 4y ago
- yolov5-rt-stack
  yolort is a runtime stack for yolov5 on specialized accelerators such as libtorch, onnxruntime, tvm, tensorrt and ncnn.
  ★ 0 · 4y ago