Phil Wang

@lucidrains · San Francisco · lucidrains.github.io

Working with Attention. It's all we need

399 repos
61k followers
4 following

Python 98%
Jupyter Notebook 1%
Nim 1%
HTML 1%

4.6k contributions in the last year

125-day current streak · 210-day longest streak

Contribution calendar: Jun 2025 to Jun 2026

All public repos (200)


vit-pytorch ★ PINNED

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

PyTorch implementation of Vision Transformer (ViT) for image classification, treating image patches as tokens and processing them through a Transformer encoder.

Python ★ 25k 5d ago
alphafold3-pytorch ★ PINNED

Implementation of Alphafold 3 from Google Deepmind in Pytorch

Python ★ 1.7k 9mo ago
imagen-pytorch ★ PINNED

Implementation of Imagen, Google's Text-to-Image Neural Network, in Pytorch

Python ★ 8.4k 1y ago
x-transformers ★ PINNED

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Python ★ 5.9k 16h ago
vector-quantize-pytorch ★ PINNED

Vector (and Scalar) Quantization, in Pytorch

Python ★ 4.0k 14d ago
transfusion-pytorch ★ PINNED

Pytorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from MetaAI

Python ★ 1.4k 4mo ago
DALLE2-pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

Python ★ 11k 2y ago
denoising-diffusion-pytorch

Implementation of Denoising Diffusion Probabilistic Model in Pytorch

Python ★ 11k 4mo ago
PaLM-rlhf-pytorch

Implementation of RLHF (Reinforcement Learning from Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM

Python ★ 7.9k 21d ago
DALLE-pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Python ★ 5.6k 2y ago
deep-daze

Simple command line tool for text to image generation using OpenAI's CLIP and Siren (Implicit neural representation network). Technique was originally created by https://twitter.com/advadnoun

Python ★ 4.3k 4y ago
stylegan2-pytorch

Simplest working implementation of Stylegan2, state of the art generative adversarial network, in Pytorch. Enabling everyone to experience disentanglement

Python ★ 3.8k 1y ago
musiclm-pytorch

Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch

Python ★ 3.3k 2y ago
audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

Python ★ 2.6k 1y ago
big-sleep

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN. Technique was originally created by https://twitter.com/advadnoun

Python ★ 2.6k 4y ago
reformer-pytorch

Reformer, the efficient Transformer, in Pytorch

Python ★ 2.2k 3y ago
lion-pytorch

🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch

Python ★ 2.2k 1y ago
toolformer-pytorch

Implementation of Toolformer, Language Models That Can Use Tools, by MetaAI

Python ★ 2.1k 1y ago
make-a-video-pytorch

Implementation of Make-A-Video, new SOTA text to video generator from Meta AI, in Pytorch

Python ★ 2.0k 2y ago
titans-pytorch

Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch

Python ★ 2.0k 13d ago
gigagan-pytorch

Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research into GANs

Python ★ 1.9k 1y ago
byol-pytorch

Usable Implementation of "Bootstrap Your Own Latent" self-supervised learning, from Deepmind, in Pytorch

Python ★ 1.9k 1mo ago
lightweight-gan

Implementation of 'lightweight' GAN, proposed in ICLR 2021, in Pytorch. High resolution image generations that can be trained within a day or two

Python ★ 1.7k 1y ago
alphafold2

To eventually become an unofficial Pytorch implementation / replication of Alphafold2, as details of the architecture get released

Python ★ 1.6k 3y ago
soundstorm-pytorch

Implementation of SoundStorm, Efficient Parallel Audio Generation from Google Deepmind, in Pytorch

Python ★ 1.5k 1y ago
lambda-networks

Implementation of LambdaNetworks, a new approach to image recognition that reaches SOTA with less compute

Python ★ 1.5k 5y ago
self-rewarding-lm-pytorch

Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI

Python ★ 1.4k 2y ago
video-diffusion-pytorch

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch

Python ★ 1.4k 2y ago
naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch

Python ★ 1.3k 2y ago
flamingo-pytorch

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

Python ★ 1.3k 3y ago
perceiver-pytorch

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch

Python ★ 1.2k 11d ago
CoCa-pytorch

Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch

Python ★ 1.2k 2y ago
performer-pytorch

An implementation of Performer, a linear attention-based transformer, in Pytorch

Python ★ 1.2k 4y ago
tab-transformer-pytorch

Implementation of TabTransformer, attention network for tabular data, in Pytorch

Python ★ 1.1k 5mo ago
mlp-mixer-pytorch

An All-MLP solution for Vision, from Google AI

Python ★ 1.1k 11mo ago
muse-maskgit-pytorch

Implementation of Muse: Text-to-Image Generation via Masked Generative Transformers, in Pytorch

Python ★ 919 2y ago
RETRO-pytorch

Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

Python ★ 877 2y ago
mixture-of-experts

A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

Python ★ 862 2y ago
meshgpt-pytorch

Implementation of MeshGPT, SOTA Mesh generation using Attention, in Pytorch

Python ★ 861 1y ago
BS-RoFormer

Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs

Python ★ 848 5d ago
linear-attention-transformer

Transformer based on a variant of attention with linear complexity with respect to sequence length

Python ★ 838 2y ago
PaLM-pytorch

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways

Python ★ 824 3y ago
rotary-embedding-torch

Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch

Python ★ 812 4mo ago
native-sparse-attention-pytorch

Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper

Python ★ 808 10mo ago
phenaki-pytorch

Implementation of Phenaki Video, which uses MaskGIT to produce text-guided videos of up to 2 minutes in length, in Pytorch

Python ★ 792 1y ago
TimeSformer-pytorch

Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification

Python ★ 730 4y ago
x-clip

A concise but complete implementation of CLIP with various experimental improvements from recent papers

Python ★ 723 2y ago
voicebox-pytorch

Implementation of Voicebox, new SOTA Text-to-speech network from MetaAI, in Pytorch

Python ★ 691 1y ago
bottleneck-transformer-pytorch

Implementation of Bottleneck Transformer in Pytorch

Python ★ 678 4y ago
magvit2-pytorch

Implementation of MagViT2 Tokenizer in Pytorch

Python ★ 665 1y ago
MEGABYTE-pytorch

Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch

Python ★ 656 1y ago
ema-pytorch

A simple way to keep track of an Exponential Moving Average (EMA) version of your Pytorch model

Python ★ 654 6mo ago
memorizing-transformers-pytorch

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Python ★ 645 2y ago
point-transformer-pytorch

Implementation of the Point Transformer layer, in Pytorch

Python ★ 601 4y ago
pi-zero-pytorch

Implementation of π₀, the robotic foundation model architecture proposed by Physical Intelligence

Python ★ 579 4mo ago
enformer-pytorch

Implementation of Enformer, Deepmind's attention network for predicting gene expression, in Pytorch

Python ★ 566 11mo ago
ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Python ★ 547 1y ago
classifier-free-guidance-pytorch

Implementation of Classifier Free Guidance in Pytorch, with emphasis on text conditioning, and flexibility to include multiple text embedding models

Python ★ 543 1y ago
mmdit

Implementation of a single layer of the MMDiT, proposed in Stable Diffusion 3, in Pytorch

Python ★ 541 5mo ago
iTransformer

Unofficial implementation of iTransformer - SOTA Time Series Forecasting using Attention networks, out of Tsinghua / Ant group

Python ★ 535 1y ago
egnn-pytorch

Implementation of E(n)-Equivariant Graph Neural Networks, in Pytorch

Python ★ 528 1y ago
e2-tts-pytorch

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch

Python ★ 517 6mo ago
siren-pytorch

Pytorch implementation of SIREN - Implicit Neural Representations with Periodic Activation Functions

Python ★ 509 2y ago
local-attention

An implementation of local windowed attention for language modeling

Python ★ 500 11mo ago
slot-attention

Implementation of Slot Attention from GoogleAI

Python ★ 488 13d ago
rectified-flow-pytorch

Implementation of rectified flow and some of its followup research / improvements in Pytorch

Python ★ 469 11d ago
robotic-transformer-pytorch

Implementation of RT1 (Robotic Transformer) in Pytorch

Python ★ 453 1y ago
conformer

Implementation of the convolutional module from the Conformer paper, for use in Transformers

Python ★ 437 3y ago
autoregressive-diffusion-pytorch

Implementation of Autoregressive Diffusion in Pytorch

Python ★ 437 6mo ago
g-mlp-pytorch

Implementation of gMLP, an all-MLP replacement for Transformers, in Pytorch

Python ★ 431 4y ago
q-transformer

Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, out of Google Deepmind

Python ★ 408 1y ago
axial-attention

Implementation of Axial attention - attending to multi-dimensional data efficiently

Python ★ 395 4y ago
memory-efficient-attention-pytorch

Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory"

Python ★ 392 2y ago
st-moe-pytorch

Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch

Python ★ 385 2y ago
segformer-pytorch

Implementation of Segformer, Attention + MLP neural network for segmentation, in Pytorch

Python ★ 372 3y ago
deformable-attention

Implementation of Deformable Attention in Pytorch from the paper "Vision Transformer with Deformable Attention"

Python ★ 369 1y ago
bit-diffusion

Implementation of Bit Diffusion, Hinton's group's attempt at discrete denoising diffusion, in Pytorch

Python ★ 357 2y ago
soft-moe-pytorch

Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch

Python ★ 347 1y ago
se3-transformer-pytorch

Implementation of SE3-Transformers for Equivariant Self-Attention, in Pytorch. This specific repository is geared towards integration with eventual Alphafold2 replication.

Python ★ 330 9mo ago
minGRU-pytorch

Implementation of the proposed minGRU in Pytorch

Python ★ 324 6mo ago
clinical-calculator-tooluse

Explorations into training LLMs to use clinical calculators from patient history, using open sourced models. Will start with Wells' Criteria

Python ★ 315 9mo ago
speculative-decoding

Explorations into some recent techniques surrounding speculative decoding

Python ★ 307 1y ago
transformer-in-transformer

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

Python ★ 306 4y ago
linformer

Implementation of Linformer for Pytorch

Python ★ 306 2y ago
routing-transformer

Fully featured implementation of Routing Transformer

Python ★ 300 4y ago
nGPT-pytorch

Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI

Python ★ 300 1y ago
x-unet

Implementation of a U-net complete with efficient attention as well as the latest research findings

Python ★ 293 2y ago
equiformer-pytorch

Implementation of the Equiformer, SE3/E3 equivariant attention network that reaches new SOTA, and adopted for use by EquiFold for protein folding

Python ★ 290 1y ago
lumiere-pytorch

Implementation of Lumiere, SOTA text-to-video generation from Google Deepmind, in Pytorch

Python ★ 282 1y ago
triton-transformer

Implementation of a Transformer, but completely in Triton

Python ★ 279 4y ago
spear-tts-pytorch

Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in Pytorch

Python ★ 278 2y ago
sinkhorn-transformer

Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention

Python ★ 271 4y ago
jax2torch

Use Jax functions in Pytorch

Python ★ 263 3y ago
metnet3-pytorch

Implementation of MetNet-3, SOTA neural weather model out of Google Deepmind, in Pytorch

Python ★ 240 2y ago
graph-transformer-pytorch

Implementation of Graph Transformer in Pytorch, for potential use in replicating Alphafold2

Python ★ 238 3y ago
electra-pytorch

A simple and working implementation of Electra, the fastest way to pretrain language models from scratch, in Pytorch

Python ★ 237 3y ago
CoLT5-attention

Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch

Python ★ 230 1y ago
flash-attention-jax

Implementation of Flash Attention in Jax

Python ★ 228 2y ago
simple-hierarchical-transformer

Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT

Python ★ 228 2mo ago
block-recurrent-transformer-pytorch

Implementation of Block Recurrent Transformer - Pytorch

Python ★ 226 1y ago
recurrent-interface-network-pytorch

Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in Pytorch

Python ★ 209 8d ago
bidirectional-cross-attention

A simple cross attention that updates both the source and target in one step

Python ★ 197 10mo ago
dreamer4

Implementation of Danijar's latest iteration for his Dreamer line of work

Python ★ 193 1h ago
PaLM-jax

Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework)

Python ★ 189 4y ago
hyper-connections

Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public

Python ★ 186 1mo ago
tiny-recursive-model

Unofficial implementation of Tiny Recursive Model (TRM), improvement to HRM from Sapient AI, by Alexia Jolicoeur-Martineau

Python ★ 184 5mo ago
titok-pytorch

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

Python ★ 184 2y ago
coconut-pytorch

Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch

Python ★ 183 1y ago
CALM-pytorch

Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind

Python ★ 178 1y ago
h-transformer-1d

Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning

Python ★ 167 2y ago
protein-bert-pytorch

Implementation of ProteinBERT in Pytorch

Python ★ 165 4y ago
compressive-transformer-pytorch

Pytorch implementation of Compressive Transformers, from Deepmind

Python ★ 165 4y ago
chroma-pytorch

Implementation of Chroma, generative models of protein using DDPM and GNNs, in Pytorch

Python ★ 159 3y ago
genie2-pytorch

Implementation of a framework for Genie2 in Pytorch

Python ★ 157 1y ago
improving-transformers-world-model-for-rl

Implementation of the new SOTA for model based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in Pytorch

Python ★ 155 1y ago
ETSformer-pytorch

Implementation of ETSformer, state of the art time-series Transformer, in Pytorch

Python ★ 154 2y ago
contrastive-learner

A simple to use pytorch wrapper for contrastive self-supervised learning on any neural network

Python ★ 153 5y ago
nystrom-attention

Implementation of Nyström Self-attention, from the paper Nyströmformer

Python ★ 145 1y ago
adam-atan2-pytorch

Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch

Python ★ 137 1mo ago
diffusion-policy

Implementation of Diffusion Policy, Toyota Research's supposed breakthrough in leveraging DDPMs for learning policies for real-world Robotics

Python ★ 136 1y ago
PEER-pytorch

Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind

Python ★ 136 7mo ago
MIMO-pytorch

Pytorch implementation of MIMO, Controllable Character Video Synthesis with Spatial Decomposed Modeling, from Alibaba Intelligence Group

Python ★ 135 1y ago
STAM-pytorch

Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

Python ★ 133 5y ago
gradnorm-pytorch

A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch

Python ★ 131 17d ago
RQ-Transformer

Implementation of RQ Transformer, proposed in the paper "Autoregressive Image Generation using Residual Quantization"

Python ★ 127 4y ago
ppo

An implementation of PPO in Pytorch

Python ★ 126 1mo ago
TRI-LBM

Implementation of the Large Behavioral Model architecture for dexterous manipulation from Toyota Research Institute

Python ★ 120 9mo ago
locoformer

LocoFormer - Generalist Locomotion via Long-Context Adaptation

Python ★ 115 21d ago
mimic-video

Implementation of Mimic-Video, Video-Action Models for SOTA Generalizable Robot Control Beyond VLAs

Python ★ 113 19d ago
evolutionary-policy-optimization

Pytorch implementation of Evolutionary Policy Optimization, from Wang et al. of the Robotics Institute at Carnegie Mellon University

Python ★ 110 1mo ago
multimodal-dit-pytorch

Implementation of a multimodal diffusion transformer in Pytorch

★ 108 2y ago
HS-TasNet

Implementation of HS-TasNet, "Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet"

Python ★ 106 1mo ago
metacontroller

Implementation of the MetaController proposed in "Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning" from the Paradigms of Intelligence team at Google

Jupyter Notebook ★ 105 27d ago
deep-cross-attention

Implementation of the proposed DeepCrossAttention by Heddes et al at Google research, in Pytorch

Python ★ 105 2mo ago
grokfast-pytorch

Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"

Python ★ 104 1y ago
VN-transformer

A Transformer made of Rotation-equivariant Attention using Vector Neurons

Python ★ 102 2y ago
gated-state-spaces-pytorch

Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch

Python ★ 101 3y ago
alphagenome

Implementation of AlphaGenome, Deepmind's updated genomic attention model

Jupyter Notebook ★ 100 2mo ago
lie-transformer-pytorch

Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch

Python ★ 98 5y ago
perceiver-ar-pytorch

Implementation of Perceiver AR, Deepmind's new long-context attention network based on Perceiver architecture, in Pytorch

Python ★ 95 3y ago
complex-valued-transformer

Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture"

Python ★ 92 2y ago
TPDNE

Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it.

Python ★ 91 2y ago
anymal-belief-state-encoder-decoder-pytorch

Implementation of the Belief State Encoder / Decoder in the new breakthrough robotics paper from ETH Zürich

Python ★ 85 1y ago
ponder-transformer

Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper

Python ★ 83 4y ago
n-grammer-pytorch

Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch

Python ★ 81 3y ago
h-net-dynamic-chunking

Implementation of the dynamic chunking mechanism in H-net by Hwang et al. of Carnegie Mellon

Python ★ 79 5d ago
dreamcraft3d-pytorch

Implementation of Dreamcraft3D, 3D content generation in Pytorch

★ 79 2y ago
hl-gauss-pytorch

The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al. with a few convenient wrappers for regression, in Pytorch

Python ★ 78 2mo ago
HRM

Exploration into the proposed architecture from Sapient Intelligence of Singapore 🇸🇬

Python ★ 76 10mo ago
AMIE-pytorch

Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google Deepmind

Python ★ 75 1y ago
d4rt

Implementation of D4RT, Efficiently Reconstructing Dynamic Scenes, from Deepmind

Python ★ 72 11d ago
transformer-directed-evolution

Explorations into whether a transformer with RL can direct a genetic algorithm to converge faster

Python ★ 71 17d ago
memory-compressed-attention

Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"

Python ★ 71 3y ago
isab-pytorch

An implementation of (Induced) Set Attention Block, from the Set Transformers paper

Python ★ 70 11d ago
SAC-pytorch

Implementation of Soft Actor Critic and some of its improvements in Pytorch

Python ★ 69 15d ago
PoPE-pytorch

Efficient implementation of (and explorations into) polar coordinate positional embedding (PoPE) - from Gopalakrishnan et al. under Schmidhuber

Python ★ 65 2mo ago
contrastive-rl

Contrastive Reinforcement Learning

Python ★ 65 2mo ago
RL-100

Implementation of RL-100, Performant Robotic Manipulation with Real-World Reinforcement Learning

Python ★ 63 6mo ago
gaia2-pytorch

Implementation of the world model architecture for self driving out of Wayve

Python ★ 62 11mo ago
genetic-algorithm-pytorch

Toy genetic algorithm in Pytorch

Python ★ 57 1mo ago
quartic-transformer

Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)

Python ★ 56 1y ago
LVMAE-pytorch

Implementation of the proposed LVMAE, from the paper, Extending Video Masked Autoencoders to 128 frames, in Pytorch

Python ★ 55 1y ago
hippoformer

Unofficial implementation of Hippoformer, Integrating Hippocampus-inspired Spatial Memory with Transformers

Python ★ 53 1mo ago
x-evolution

Implementation of various evolutionary algorithms, starting with evolutionary strategies

Python ★ 51 1mo ago
torch-einops-utils

Some utility functions to help myself (and perhaps others) go faster with ML/AI work

Python ★ 50 8d ago
light-recurrent-unit-pytorch

Implementation of a Light Recurrent Unit in Pytorch

Python ★ 50 1y ago
x-mlps-pytorch

Just a repository that will house some MLPs and their variants, so as to avoid having to reimplement them again and again for different projects (especially RL)

Python ★ 50 1mo ago
simplicial-attention

Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make it practical in Fast and Simplex, Roy et al. (2025)

Python ★ 49 9mo ago
HoST-pytorch

Implementation of Humanoid Standing Up, from the paper "Learning Humanoid Standing-up Control across Diverse Postures" out of Shanghai, in Pytorch

Python ★ 46 1y ago
kalmanformer

Implementation of Kalmanformer, modeling the Kalman gain with a transformer

Python ★ 43 17d ago
neat

Explorations into NEAT and some of its derivative research

Nim ★ 39 16d ago
discrete-distribution-network

Exploration into Discrete Distribution Network, by Lei Yang out of Beijing

Python ★ 37 5mo ago
ITTR-pytorch

Implementation of the Hybrid Perception Block and Dual-Pruned Self-Attention block from the ITTR paper for Image to Image Translation using Transformers

Python ★ 35 4y ago
streaming-deep-rl

Explorations into the proposed Streaming Deep Reinforcement Learning, from University of Alberta

Python ★ 33 1mo ago
poly-attention

Implementation of Poly-attention, a higher-order self-attention proposed by Chakrabarti et al. of Columbia

Python ★ 32 4h ago
fast-weight-attention

Implementation of Fast Weight Attention

Python ★ 32 16d ago
sdft-pytorch

Explorations into the proposed SDFT, Self-Distillation Enables Continual Learning, from Shenfeld et al. of MIT

Python ★ 32 4mo ago
SRT-H

Implementation of the model architecture for SRT-H

Python ★ 28 17d ago
lookahead-keys-attention

Causal Attention with Lookahead Keys

Python ★ 27 8mo ago
RIM-pytorch

Implementation of Recurrent Independent Mechanisms in Pytorch

Python ★ 27 2mo ago
rigidformer

Implementation of RigidFormer, Learning Rigid Dynamics using Transformers

Python ★ 23 5d ago
worldparticle

Implementation of WorldParticle, Unified World Simulation of Lagrangian Particle Dynamics via Transformer

Python ★ 22 3d ago
disco-rl-pytorch

Implementation of and explorations into DiscoRL, Discovering state-of-the-art reinforcement learning algorithms, David Silver's last work at Deepmind

Python ★ 19 6d ago
discrete-continuous-embed-readout

Embedding and readout for simple multi-categorical and gaussian continuous

Python ★ 19 1mo ago
HiLAM

Implementation of the Hierarchical Latent Action Model, proposed by Hanjung Kim et al. of Yonsei University

Python ★ 17 1mo ago
value-network

Exploration into some new research surrounding value networks

Python ★ 16 4mo ago
populora

Implementation of and explorations into PopuLoRA, Co-Evolving LLM Populations for Reasoning Self-Play

Python ★ 15 16d ago
multiscreen

Implementation of Multiscreen proposed by Ken Nakanishi for "Screening is Enough"

Python ★ 15 1mo ago
lucidrains.github.io

No description.

HTML ★ 13 9mo ago
ASAC

Implementation of Attention Schema-based Attention Control (ASAC), A Cognitive-Inspired Approach for Attention Management in Transformers

Python ★ 12 3d ago
dmpo

Implementation of and explorations into MPO / DMPO

Python ★ 9 14d ago
pseudo-projector

Implementation of the pseudo projector proposed by Vitaly Bulgakov at Mass General Brigham

Python ★ 9 2mo ago
memmap-replay-buffer

A simple numpy memmap replay buffer for RL and personal use-cases

Python ★ 8 27d ago
jvp_flash_attention ⑂

Flash Attention Triton kernel with support for second-order derivatives

★ 8 9mo ago
env-ssl-wrapper

Wrappers around environments that will take care of providing representations from self supervised learning automagically

Python ★ 8 2mo ago
two-stage-dexterity-learning

Explorations into the proposal from the paper, Visual-tactile pretraining and online multitask learning for humanlike manipulation dexterity

★ 6 4mo ago
RISE

Implementation of RISE

Python ★ 3 2mo ago
triton ⑂

Development repository for the Triton language and compiler

C++ ★ 2 3y ago
ds4 ⑂

DeepSeek 4 Flash local inference engine for Metal and CUDA

★ 1 23d ago
Protenix ⑂

Toward High-Accuracy Open-Source Biomolecular Structure Prediction.

★ 1 4mo ago
