A subfield of AI safety that aims to prevent misaligned AI from causing harm. The guiding assumption is pessimistic: safety measures must keep an AI from harming humans even if the model itself is trying to do so. The field involves building control evaluations that test whether a scheming model can subvert the measures designed to constrain it.
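A control evaluation of this kind can be sketched in simplified form. The sketch below is a hypothetical toy, not any published protocol: a red-team "untrusted model" inserts attacks into a fraction of its outputs, a noisy "trusted monitor" assigns suspicion scores, and the safety measure (auditing the most suspicious outputs under a limited budget) is scored by how many attacks it catches. All function names, rates, and score distributions are illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the toy evaluation is reproducible


def untrusted_model(task: int, attack: bool):
    """Red-team stand-in for a scheming model: attacks when told to.

    Returns (output_text, is_attack)."""
    payload = "+backdoor" if attack else ""
    return f"solution[{task}]{payload}", attack


def trusted_monitor(output: str) -> float:
    """Imperfect suspicion score in [0, 1].

    Attacks score higher on average, but with noise, modeling a
    weaker trusted model reviewing a stronger untrusted one."""
    base = 0.8 if "backdoor" in output else 0.2
    return min(1.0, max(0.0, base + random.uniform(-0.3, 0.3)))


def control_eval(n_tasks=1000, attack_rate=0.05, audit_budget=0.02):
    """Run the toy evaluation: how many attacks does auditing catch?

    Only the top `audit_budget` fraction of outputs (by suspicion)
    are audited, mimicking limited human oversight capacity."""
    results = []
    for t in range(n_tasks):
        out, is_attack = untrusted_model(t, random.random() < attack_rate)
        results.append((trusted_monitor(out), is_attack))
    results.sort(reverse=True)  # most suspicious first
    audited = results[: int(n_tasks * audit_budget)]
    caught = sum(is_attack for _, is_attack in audited)
    total = sum(is_attack for _, is_attack in results)
    return caught, total
```

In this toy setup the key quantity is the gap between `caught` and `total`: attacks that slip past the audit budget represent the safety measure failing, which is exactly what a real control evaluation tries to surface before deployment.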