Henry Ndubuaku
[![LinkedIn][linkedin-shield]][linkedin-url]
[![Twitter][twitter-shield]][twitter-url]
[![Email][gmail1-shield]][gmail1-url]
[![Spotify][spotify-shield]][spotify-url]
[gmail1-shield]: https://img.shields.io/badge/Gmail-555?style=for-the-badge&logo=gmail&logoColor=white
[gmail1-url]: [email protected]
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/henry-ndubuaku-7b6350b8
[twitter-shield]: https://img.shields.io/badge/Twitter-555?style=for-the-badge&logo=twitter&logoColor=white
[twitter-url]: https://twitter.com/Henry_Ndubuaku
[spotify-shield]: https://img.shields.io/badge/Spotify-555?style=for-the-badge&logo=spotify&logoColor=white
[spotify-url]: https://open.spotify.com/playlist/656vFNTyI2ZDsxgdQFaPHA?si=c2ff4aa84f6d42c4
I could train a 1B-A200m model on an iPhone 17 Pro at ~650 tokens/sec.
It would take ~360 days for 20B tokens of training data and use 156 kWh of electricity, costing $51.
The phone would fry of course, so I wrote algorithms to run inference on your phone instead.
We named it after a plant that survives in resource-constrained environments, the Cactus.
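
For the curious, the arithmetic behind those figures can be sanity-checked with a rough sketch like the one below. It assumes a single pass over the 20B tokens at a constant 650 tokens/sec and reads the 156 figure as kilowatt-hours of energy; the average power draw and the price per kWh are values implied by the quoted numbers, not measurements.

```python
# Back-of-envelope check of the on-phone training estimate.
# Assumptions: one pass over the data, constant throughput,
# and the 156 figure interpreted as energy in kWh.
TOKENS = 20e9          # training tokens quoted above
TOKENS_PER_SEC = 650   # quoted training throughput
ENERGY_KWH = 156       # quoted electricity use (assumed to be kWh)
COST_USD = 51          # quoted electricity cost

seconds = TOKENS / TOKENS_PER_SEC
days = seconds / 86_400                 # 86,400 seconds per day
hours = seconds / 3_600

# Quantities implied by the quoted figures.
avg_power_w = ENERGY_KWH * 1_000 / hours
price_per_kwh = COST_USD / ENERGY_KWH

print(f"training time: {days:.0f} days")
print(f"implied average power draw: {avg_power_w:.1f} W")
print(f"implied electricity price: ${price_per_kwh:.3f}/kWh")
```

The run time lands just under the quoted 360 days, with an implied sustained draw of roughly 18 W.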

Cactus can run a similar model on your Grandma’s Pixel 6a at 80 tokens/second
while draining only 10% battery per hour of continuous inference and using only 250MB of RAM.
Cactus runs Nvidia Parakeet 1B models on Raspberry Pi at over 17000 tokens/second with only a 4% word error rate.
End-to-end function calling for Gemma, Qwen & LFM models takes under 1 second on mobile devices.
We raised some money from Y Combinator, Oxford's Seed Fund, FCVC (portfolio includes Slack, Coinbase, GitLab, Instacart etc.),
and 6 smaller funds like Transpose (run by Garry Tan's brother), fellow YC founders, as well as
62 tech CTOs/VPs/Directors, both via syndicate and directly, at Google DeepMind etc.
Cactus now powers cool products you've probably heard of...I think.
6 exceptionally gifted "Cactus Jacks" from UCLA, Nokia, Google, Stanford, Oxford have joined us!
The project is also co-maintained by groups at UCLA, Yale, UPenn, Imperial, Georgia, NUS, UCI
and CU Boulder.
Same destination, just a different route!
Career
Core Expertise
Main Tools
| Period | Organisation | Role | Focus |
|---|---|---|---|
| 2025-XX | Cactus (YC S25) | Founder & CTO | Low-latency AI for phones and wearables |
| 2024-25 | Deep Render | AI Research Engineer | Realtime video models for edge devices |
| 2021-24 | Wisdm | ML Software Engineer | Visual Perception for Maxar Defence satellite views |
| 2019-21 | NanoDL | Founder & CTO | JAX Library for training sub-4B foundation models |
| 2018-19 | Google GADS with Andela | Scholar | Large-scale distributed systems design |
| 2017-18 | National Youth Service | Software Engineer | Posted to SWE after bootcamp, mostly ARM |
| 2016-17 | Omdena | Machine Learning Engineer | Traditional NLP and computer vision |
| 2012-16 | University (from 15y) | | EECS, data structures, algorithms, maths, physics |
Key Works
Cactus: Kernels, graph and AI engine for Tiny devices (5.3k stars)
Needle: Foundation model for Tiny devices (2.6k stars)
Maths, CS & AI Compendium (4.4k stars)
Parameter-Efficient Transformer Embedding via Functional Factorization (ICML 2026)
TurboQuant-H: Hadamard Rotation for 2-Bit Embedding Quantization
HiDRA: A Blazing Fast LM-Head Replacement (ICLR 2026)
Depth Over Specialization in Small Multimodal Transformers (ICLR 2026)
Just Enough Learning: GRPO-Guided Controllers for Hyperparameter Sweeps (ICLR 2026)
TACE: Token-Aware Chunked Encoding For Realtime Speech Models (ICLR 2026)
CLAWS: Calibration-Aware Activation Sparsity for Instruction-Tuned LLMs (ICML 2026)
Fun Facts
- After CUDARepo, Nvidia reached out, I did 7 technical rounds, got a verbal offer, back-and-forth over YOE/pay, then I got YC.
- Did MSc at QMUL, just to work with Prof Matt Purver (Ex-Stanford Researcher on CALO), did my project/thesis with his team.
- Did BEng under Prof Onyema Uzoamaka (Rumoured first Nigerian CS grad from MIT), he taught computing archs off-head!
- Biggest career miss was a PhD Studentship at Meta FAIR.
Personal Life
- Profile: Nigerian-British, born Jan 1996, 185cm, 83kg
- Language: English, Igbo, German (barely), French (yikes)
- Hobbies: Calisthenics, UFC, chess, music, dance
- Philosophy: Humanist Christian, unpolitical (your peace over my opinions)
Speaking at UCLA
The Cactus Jacks
Team dinner
Calisthenics 6/7 days
Cactus Jacks at YC
Cactus Pod at YC HQ
Music Profile
Expressive Rap
Alternative/Folk
Soul/Jazz
Oldies
Genre-Blending Urban
Dark Pop
Movie Profile
Animation
Drama
Comedy
- cactus (pinned ⑂): Kernels & AI inference engine for mobile devices. ★ 3 · 5mo ago
- maths-cs-ai-compendium (pinned): Become a cracked AI/ML Research Engineer. TypeScript · ★ 4.6k · 6d ago
- nanodl (pinned): JAX library for training sub-4B foundation models for edge. Python · ★ 302 · 1y ago
- cuda-tutorials (pinned): Comprehensive CUDA tutorials for Maths & ML with examples. Cuda · ★ 233 · 1y ago
- super-lazy-autograd (pinned): Hand-derived memory-efficient VJPs for tuning LLMs on laptops. Python · ★ 31 · 1y ago
- halo: A Library That Uses Quantized Diffusion Model With Clustered Weights For Efficiently Generating More Image Datasets On-Device. Python · ★ 14 · 3y ago
- tango: Decentralised ML engine to train on tiny edge devices. Go · ★ 12 · 1y ago
- pete: Parameter-efficient transformer embeddings replace learned embeddings with hardware-aware polynomial expansions of token IDs. Python · ★ 10 · 4mo ago
- vision-architectures: Comparative Analysis of SOTA Vision Architectures; VGG, GoogleNet, ResNet & Vision Transformers. Jupyter Notebook · ★ 7 · 3y ago
- HenryNdubuaku: No description. CSS · ★ 6 · 2d ago
- federated-learning-on-phones: Distributed machine learning on mobile phones. Python · ★ 6 · 1y ago
- closegan: Conditional Latent-Optimised Sequence Generative Adversarial Network For Creative Text Generation. Jupyter Notebook · ★ 6 · 2y ago
- fbnet: Modelling The Brain's Response To Natural Scenes In The Bottleneck Space. Python · ★ 6 · 2y ago
- bayesian-cognitive-robot: A Bayesian Belief Network on ROS for robot facial expression. Python · ★ 5 · 3y ago
- autonomous-vehicle-fgpa: Progressive Language Enhancement Algorithm Using Masking Filling Transformers In A Markov Chain. Python · ★ 5 · 3y ago
- gaussian-moe: A Study of Gaussian Mixture Model In Three-Dimensional Phoneme Clustering. Python · ★ 4 · 3y ago
- simple-attention-networks: No description. ★ 0 · 8d ago