🍈 Zettelkasten

Top-Down Interpretability

Mar 09, 2026 · 1 min read

ai_safety

An approach to Interpretability that studies emergent, representation-level properties (as in Representation Engineering, RepE) rather than individual neurons or Circuits. Adheres to the Reverse Engineer Complexity Pitfall. Closer to GOFAI (Good Old-Fashioned AI).
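
A minimal sketch of the top-down idea, assuming toy data: instead of inspecting single units, estimate one population-level "reading direction" for a concept and score activations by projecting onto it. Everything here is hypothetical and for illustration only: the activations are synthetic Gaussian vectors, the helper names (`reading_direction`, `project`, `synthetic_activations`) are made up, and difference of means is used as a simple stand-in rather than the specific procedure any RepE work prescribes.

```python
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def reading_direction(pos_acts, neg_acts):
    """Unit-length difference of class means (a stand-in for a concept direction)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    diff = [p - q for p, q in zip(mu_pos, mu_neg)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff], diff


def project(activation, direction):
    """Scalar score: dot product of an activation with the direction."""
    return sum(a * d for a, d in zip(activation, direction))


def synthetic_activations(n, dim, concept_present, rng):
    # Assumption: the concept shifts every unit a little (distributed code),
    # so no single neuron carries it on its own.
    shift = 0.5 if concept_present else -0.5
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 64
pos = synthetic_activations(200, dim, True, rng)
neg = synthetic_activations(200, dim, False, rng)

direction, raw_diff = reading_direction(pos, neg)

# Mean score of each class along the population-level direction.
pos_score = sum(project(a, direction) for a in pos) / len(pos)
neg_score = sum(project(a, direction) for a in neg) / len(neg)
direction_gap = pos_score - neg_score

# Largest mean gap visible on any single unit (the bottom-up, per-neuron view).
best_single_unit_gap = max(abs(d) for d in raw_diff)

print(f"gap along direction: {direction_gap:.3f}")
print(f"best single-unit gap: {best_single_unit_gap:.3f}")
```

In this toy setup the gap along the direction equals the norm of the mean difference, so it can never be smaller than the gap on any single unit. That is the sense in which a representation-level readout can capture a property that is spread across many neurons.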

Backlinks

Bluedot Technical AI Safety
Model Transparency
