mechanistic_interpretability
Jupyter Notebook
★ 1
updated 4mo ago
Mechanistic Interpretability (MI) is a subfield of AI alignment and safety research that reverse-engineers neural networks to discover the algorithms and circuits they learn, and so to understand their internal computations.