aler-distill

Python ★ 16 updated 23d ago

The official repo of the ICML 2026 paper: Adversarial Latent Embedding Repair for LLM Continual Learning

Official ICML 2026 research code for AlerDistill, a method that stops AI language models from forgetting old knowledge when fine-tuned on new data. It detects high-risk embeddings and repairs the model through knowledge distillation.

Python · Hydra · Qwen3-4B · reinforcement learning · setup: hard · complexity 4/5

This repository contains the official code for AlerDistill, a research method introduced in an ICML 2026 paper. The problem it addresses is a common issue in AI development: when you train a large language model on new information, it tends to forget things it previously learned. This is sometimes called catastrophic forgetting, and it makes it hard to keep updating a model over time without losing old capabilities.

AlerDistill tackles this in two steps. First, it searches for "high-risk" prompt embeddings: internal representations that are likely to cause forgetting when the model is updated on new data. Second, it repairs the updated model by comparing it against a frozen copy of the original model, pulling it back toward the original behavior where necessary. This repair step uses a technique called knowledge distillation, where one model learns from another.
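As a rough illustration of the distillation idea behind the repair step, the sketch below computes a generic temperature-scaled KL divergence between a frozen model's output logits and an updated model's logits. It is not taken from the repository: the function names, the temperature value, and the toy logits are all assumptions, and the paper's actual repair loss may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions (hypothetical helper).

    The temperature**2 factor is the usual scaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Toy logits (made up): the frozen original model vs. a drifted, updated model.
frozen_logits = [2.0, 0.5, -1.0]
drifted_logits = [0.2, 1.8, -0.5]

print(distillation_loss(frozen_logits, frozen_logits))   # identical outputs: no penalty
print(distillation_loss(frozen_logits, drifted_logits))  # drifted outputs: positive penalty
```

Minimizing a loss of this shape pushes the updated model's predictions back toward the frozen copy's, which is the sense in which the frozen model acts as a teacher.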

The code is written in Python and uses a configuration system called Hydra, which lets you adjust training settings from the command line without editing files directly. The default setup trains a model called Qwen3-4B-Instruct on chemistry question data. During training, the code can spin up a separate inference server to evaluate the model as it learns, testing it on benchmarks like HumanEval (a coding task dataset) and MMLU (a broad knowledge test).
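To make the command-line override idea concrete, here is a simplified stand-in written in plain Python. It is not Hydra's implementation and does not use Hydra's API; it only mimics the dotted `key=value` style of override. The config keys (`train.lr`, `train.epochs`) and default values are hypothetical, not the repository's real settings.

```python
import copy


def parse_value(raw):
    """Best-effort conversion of an override string to int, float, bool, or str."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})  # create missing groups on the way down
        node[leaf] = parse_value(raw)
    return result


# Hypothetical defaults; only the model name comes from the description above.
defaults = {
    "model": {"name": "Qwen3-4B-Instruct"},
    "train": {"lr": 1e-5, "epochs": 1},
}

# Equivalent in spirit to passing `train.lr=2e-5 train.epochs=3` on the command line.
config = apply_overrides(defaults, ["train.lr=2e-5", "train.epochs=3"])
print(config)
```

The defaults stay untouched, so each run's settings are determined entirely by the base configuration plus the overrides given for that run.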

The repository is structured around a main training script, a trainer that handles both the standard fine-tuning and the latent repair logic, a search module that finds the risky embeddings, and an evaluation suite. Configuration files live in a separate folder and cover the model, data, and repair settings. Outputs from each run are saved with timestamps, and checkpoints are stored so training can be resumed.

This is a research release aimed at other AI researchers who want to reproduce the paper's results or build on the method. It is not a general-purpose tool for non-researchers, and the README does not describe a consumer-facing product.

Where it fits

Reproduce the AlerDistill ICML 2026 results on chemistry QA fine-tuning without losing scores on coding and knowledge benchmarks.
Apply the latent embedding search module to identify high-risk representations in your own model before a fine-tuning run.
Use the knowledge distillation repair step to pull an updated model back toward its original capabilities during training.
