A subfield of AI safety that aims to prevent misaligned AI from causing harm. The guiding assumption is pessimistic: safety measures must keep an AI from harming humans even if the model itself is trying to do so. The field involves building control evaluations that test whether a scheming model can subvert the measures designed to constrain it.
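A control evaluation of this kind can be sketched in simplified form. The sketch below is a hypothetical toy, not any published protocol: a red-team "untrusted model" inserts attacks into a fraction of its outputs, a noisy "trusted monitor" assigns suspicion scores, and the safety measure (auditing the most suspicious outputs under a limited budget) is scored by how many attacks it catches. All function names, rates, and score distributions are illustrative assumptions.

```python
import random

random.seed(0)  # fixed seed so the toy evaluation is reproducible


def untrusted_model(task: int, attack: bool):
    """Red-team stand-in for a scheming model: attacks when told to.

    Returns (output_text, is_attack)."""
    payload = "+backdoor" if attack else ""
    return f"solution[{task}]{payload}", attack


def trusted_monitor(output: str) -> float:
    """Imperfect suspicion score in [0, 1].

    Attacks score higher on average, but with noise, modeling a
    weaker trusted model reviewing a stronger untrusted one."""
    base = 0.8 if "backdoor" in output else 0.2
    return min(1.0, max(0.0, base + random.uniform(-0.3, 0.3)))


def control_eval(n_tasks=1000, attack_rate=0.05, audit_budget=0.02):
    """Run the toy evaluation: how many attacks does auditing catch?

    Only the top `audit_budget` fraction of outputs (by suspicion)
    are audited, mimicking limited human oversight capacity."""
    results = []
    for t in range(n_tasks):
        out, is_attack = untrusted_model(t, random.random() < attack_rate)
        results.append((trusted_monitor(out), is_attack))
    results.sort(reverse=True)  # most suspicious first
    audited = results[: int(n_tasks * audit_budget)]
    caught = sum(is_attack for _, is_attack in audited)
    total = sum(is_attack for _, is_attack in results)
    return caught, total
```

In this toy setup the key quantity is the gap between `caught` and `total`: attacks that slip past the audit budget represent the safety measure failing, which is exactly what a real control evaluation tries to surface before deployment.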