6-day longest streak
Paddle ★ PINNED ⑂
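PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle『飞桨』: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)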
C++ ★ 0 26d ago
.emacs.d ★ PINNED
No description.
Emacs Lisp ★ 0 2y ago
test_flashmask
No description.
Python ★ 3 1mo ago
flash-attention ⑂
Fast and memory-efficient exact attention
★ 1 1mo ago
auto-gpu-kernel ⑂
Winner 🏆 (Agent-only) MLSys 2026 - FlashInfer AI Kernel Generation Contest for the DeepSeek Sparse Attention (DSA) track with an average speedup of 34.93x
★ 0 1mo ago
ncu-report-skill ⑂
No description.
★ 0 27d ago
KernelWiki ⑂
No description.
★ 0 27d ago
kernel-design-agents ⑂
No description.
★ 0 27d ago
PaddleFleet ⑂
Core Functional Library for Distributed Training
★ 0 2d ago
claude-code ⑂
An independent Python feature port of Claude Code, rewritten entirely from scratch using oh-my-codex. For educational purposes only.
★ 0 2mo ago
claude-code-analysis ⑂
Comprehensive reverse-engineering analysis of Claude Code's internal architecture, modules, and design patterns
★ 0 2mo ago
PaddleFormers ⑂
PaddleFormers is an easy-to-use library and model zoo of pre-trained large language models, based on PaddlePaddle.
★ 0 4d ago
NeMo ⑂
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
★ 0 1y ago
PaddleNLP ⑂
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
★ 0 9mo ago
DocumentSASS ⑂
Unofficial description of the CUDA assembly (SASS) instruction sets.
★ 0 3y ago
flux ⑂
A fast communication-overlapping library for tensor parallelism on GPUs.
★ 0 1y ago
PaddleApiTest ⑂
No description.
★ 0 2y ago
PaddleFlashattnTest ⑂
Additional tests of the flash attention API in Paddle
★ 0 1y ago
ppl.llm.kernel.cuda ⑂
No description.
★ 0 2y ago
DissectingTensorCores ⑂
No description.
★ 0 2y ago
nv_isa_solver ⑂
No description.
★ 0 1y ago
maxas ⑂
Assembler for NVIDIA Maxwell architecture
★ 0 3y ago
NiuTrans.NMT ⑂
A fast neural machine translation system. It is developed in C++ and relies on NiuTensor for fast tensor APIs.
★ 0 2y ago
NiuTrans.ST ⑂
No description.
★ 0 2y ago
st
No description.
C ★ 0 2y ago
dwm
No description.
C ★ 0 2y ago
emacs-abyss-theme ⑂
A dark theme for Emacs
★ 0 8y ago
cudnn_test
No description.
Cuda ★ 0 2y ago
umiswing.github.io
No description.
HTML ★ 0 2y ago
draw.io
No description.
★ 0 3y ago
emacs-catppuccin ⑂
🍄 Soothing pastel theme for Emacs
Emacs Lisp ★ 0 3y ago
cutlassProfilerUsage
No description.
★ 0 3y ago
YHs_Sample ⑂
Yinghan's Code Sample
★ 0 3y ago
how-to-optimize-gemm ⑂
No description.
★ 0 4y ago
How_to_optimize_in_GPU ⑂
This is a series of GPU optimization topics explaining in detail how to optimize programs on the GPU. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
★ 0 4y ago