OpenMythos

Python ★ 14k updated 28d ago

A theoretical reconstruction of the Claude Mythos architecture, built from first principles using the available research literature.

OpenMythos is a Python library that implements a Recurrent-Depth Transformer. This is a theoretical AI architecture, assembled from published research papers and inspired by speculation about how Anthropic's Claude might work internally.

Python · PyTorch · setup: hard · complexity 4/5

OpenMythos is a Python library that implements a theoretical guess at how the Claude AI model (made by Anthropic) might be built internally. The author starts from a hypothesis that Claude uses a specific architecture called a Recurrent-Depth Transformer, then builds a working version of that architecture from scratch using publicly available research papers. The project is explicitly marked as independent and not affiliated with Anthropic.

The central idea of a Recurrent-Depth Transformer is that, instead of stacking many distinct layers and passing through each one once, a smaller set of shared layers is run repeatedly in a loop. Each pass through the loop updates an internal state, and the original input signal is re-injected at every step so the model does not lose track of what it was asked. This looped processing happens entirely inside a single forward pass, with no intermediate text outputs, so the model can do more "thinking" without generating any visible chain-of-thought tokens.
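The loop can be sketched in dependency-free Python. This is a toy illustration, not code from OpenMythos: `LoopedBlock`, `recurrent_depth_forward`, the zero initial state, and the tanh update rule are hypothetical simplifications. A real block would contain attention and feed-forward sublayers operating on token sequences.

```python
import math
import random

# Hypothetical toy, not the OpenMythos API: one shared block looped n times.


def matvec(matrix, vec):
    # Plain matrix-vector product over lists of floats.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.3):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LoopedBlock:
    """Toy shared block: the SAME weights are reused on every loop iteration."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w_state = random_matrix(dim, dim, rng)  # mixes the recurrent state h
        self.w_input = random_matrix(dim, dim, rng)  # re-injects the input embedding e

    def step(self, h, e):
        # h_next = tanh(W_h @ h + W_e @ e); e is added back at every iteration.
        mixed = matvec(self.w_state, h)
        injected = matvec(self.w_input, e)
        return [math.tanh(a + b) for a, b in zip(mixed, injected)]

    def num_params(self):
        return sum(len(row) for row in self.w_state + self.w_input)


def recurrent_depth_forward(block, e, n_loops):
    h = [0.0] * len(e)  # initial latent state (zeros, a simplifying assumption)
    for _ in range(n_loops):  # all loops run inside one call; no tokens are emitted
        h = block.step(h, e)
    return h


block = LoopedBlock(dim=4)
e = [0.5, -1.0, 0.25, 0.0]
shallow = recurrent_depth_forward(block, e, n_loops=2)
deep = recurrent_depth_forward(block, e, n_loops=16)
print(block.num_params())  # prints 32: the parameter count does not grow with n_loops
```

Going from 2 to 16 loops multiplies the compute spent on the input eightfold while the parameter count stays at 32. That trade-off is the reason to loop a shared block rather than stack distinct ones.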

The library is installable via pip and provides pre-configured model sizes ranging from 1 billion to 1 trillion parameters. Each size preset specifies the hidden dimension, the number of expert modules, the number of loop iterations, and the context length. The attention mechanism can be switched between two styles: one that reduces key-value cache memory by using fewer key-value heads than query heads, and one that compresses key-value representations using a low-rank factorization.
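The memory saving from the two attention styles can be estimated with standard back-of-envelope KV-cache accounting. The first style is what is commonly called grouped-query attention (GQA); the second resembles the latent KV compression used in multi-head latent attention (MLA). Whether OpenMythos implements either exactly this way is an assumption. The function names and the shape numbers below are hypothetical and illustrative, not OpenMythos presets. The latent variant is also simplified: it ignores, for example, any separately cached positional key component.

```python
from dataclasses import dataclass

# Illustrative KV-cache accounting; names and numbers are hypothetical.


@dataclass
class AttnShape:
    n_layers: int   # layers that each keep their own cache
    n_heads: int    # query heads
    head_dim: int


def kv_bytes_full(s: AttnShape, bytes_per_elem: int = 2) -> int:
    # Standard multi-head attention: a K and a V vector for every head in every layer.
    # bytes_per_elem=2 assumes a 16-bit cache.
    return s.n_layers * 2 * s.n_heads * s.head_dim * bytes_per_elem


def kv_bytes_grouped(s: AttnShape, n_kv_heads: int, bytes_per_elem: int = 2) -> int:
    # Fewer key-value heads: groups of query heads share one K/V head.
    if s.n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    return s.n_layers * 2 * n_kv_heads * s.head_dim * bytes_per_elem


def kv_bytes_latent(s: AttnShape, latent_dim: int, bytes_per_elem: int = 2) -> int:
    # Low-rank compression: cache one latent vector per layer; K and V are
    # reconstructed from it with up-projection weights (parameters, not cache).
    return s.n_layers * latent_dim * bytes_per_elem


shape = AttnShape(n_layers=32, n_heads=32, head_dim=128)
full = kv_bytes_full(shape)                      # 524288 bytes = 512 KiB per token
grouped = kv_bytes_grouped(shape, n_kv_heads=8)  # 131072 bytes = 128 KiB per token
latent = kv_bytes_latent(shape, latent_dim=512)  # 32768 bytes = 32 KiB per token
print(full * 4096 / 2**30)  # GiB of full cache for a 4096-token context: 2.0
```

In a looped model, how caching interacts with repeated passes through the same layers is a separate design choice that this sketch does not model.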

A training script for the 3-billion-parameter variant is included, targeting the FineWeb-Edu dataset. It supports both single-GPU and multi-GPU training, uses the AdamW optimizer, and trains in lower-precision floating point to reduce memory use. The documentation folder includes a full API reference and a guide to recommended training datasets.
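AdamW is a standard optimizer, so its update rule can be shown independently of the repository. The sketch below performs one AdamW step over a flat list of floats. `adamw_step` is a hypothetical helper, and its defaults mirror PyTorch's `torch.optim.AdamW` defaults. The training script's actual hyperparameters are not stated here.

```python
import math

# Hypothetical plain-Python AdamW step, for illustration only.


def adamw_step(params, grads, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update; `state` holds the step count and both moment buffers."""
    b1, b2 = betas
    if not state:
        state["t"] = 0
        state["m"] = [0.0] * len(params)
        state["v"] = [0.0] * len(params)
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Decoupled weight decay: shrink the weight directly, not via the gradient.
        p = p * (1 - lr * weight_decay)
        # Exponential moving averages of the gradient and squared gradient.
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        # Bias correction for the zero-initialised moments.
        m_hat = state["m"][i] / (1 - b1 ** t)
        v_hat = state["v"][i] / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Usage: minimise (x - 3)^2 starting from x = 0.
x = [0.0]
state = {}
for _ in range(500):
    grad = [2 * (x[0] - 3.0)]  # gradient of (x - 3)^2
    x = adamw_step(x, grad, state, lr=0.1, weight_decay=0.0)
print(x[0])  # close to the minimum at 3.0
```

The "W" is the decoupling: weight decay is applied to the weights themselves rather than folded into the gradient, so it is not rescaled by the adaptive denominator.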

This repository is a research and experimentation tool, not a finished product. It is useful for developers and researchers interested in exploring alternative transformer architectures inspired by speculation about frontier AI model internals.

Where it fits

Train a recurrent-depth transformer on a single GPU or multi-GPU setup using the included 3B parameter training script.
Experiment with looped-layer architectures as a research alternative to standard one-pass transformers.
Use pre-configured model presets from 1B to 1T parameters to prototype experiments without writing architecture code.
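The preset range spans three orders of magnitude. A back-of-envelope count of the memory needed just to store weights and AdamW moments shows why lower precision and multi-GPU training matter at the top end. This is a general calculation, not figures published by OpenMythos. The helper names are hypothetical, fp32 optimizer state is an assumption, and gradients and activations are excluded.

```python
# Back-of-envelope storage estimate; helper names are hypothetical.
BYTES = {"fp32": 4, "bf16": 2}  # bytes per stored value


def weight_memory_gb(n_params: int, dtype: str) -> float:
    """Memory to hold the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES[dtype] / 1e9


def adamw_state_gb(n_params: int, state_dtype: str = "fp32") -> float:
    """AdamW keeps two moment buffers (m and v) per parameter."""
    return 2 * n_params * BYTES[state_dtype] / 1e9


for n_params in (1_000_000_000, 3_000_000_000, 1_000_000_000_000):
    print(
        f"{n_params / 1e9:>6.0f}B params: "
        f"bf16 weights {weight_memory_gb(n_params, 'bf16'):,.0f} GB, "
        f"fp32 weights {weight_memory_gb(n_params, 'fp32'):,.0f} GB, "
        f"fp32 AdamW moments {adamw_state_gb(n_params):,.0f} GB"
    )
```

These counts use total parameters. With mixture-of-experts presets, only some experts run per token. That lowers compute per token but not the storage counted here.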
