🍈 Zettelkasten

Upfront Auditing

Mar 16, 2026

ai_safety

A proposed protocol for developing safe AI systems. It does not work if the red team inserts a backdoor after the initial checks, because results outside the audited window are never inspected.

Protocol

Humans check the first 2% of the untrusted model's results
If any backdoor attempt is found, the untrusted model is shut down
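
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `upfront_audit`, `RunReport`, and `is_backdoor` are hypothetical, `is_backdoor` stands in for the human auditor, and rounding the 2% window up to a whole result is an assumption the note does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

AUDIT_PERCENT = 2  # from the note: humans check the first 2% of results


@dataclass
class RunReport:
    shut_down: bool           # True if an audited result contained a backdoor attempt
    caught_at: Optional[int]  # index of the result that triggered shutdown, if any
    accepted: int             # number of results accepted before the run ended


def upfront_audit(
    results: List[str],
    is_backdoor: Callable[[str], bool],  # hypothetical stand-in for the human check
    audit_percent: int = AUDIT_PERCENT,
) -> RunReport:
    """Audit only the leading window of results; shut down on the first backdoor found."""
    # Assumption: the audit window is rounded up to a whole number of results.
    audit_count = (len(results) * audit_percent + 99) // 100
    for i in range(audit_count):
        if is_backdoor(results[i]):
            return RunReport(shut_down=True, caught_at=i, accepted=i)
    # Everything after the audit window is accepted without inspection.
    return RunReport(shut_down=False, caught_at=None, accepted=len(results))


def looks_malicious(result: str) -> bool:
    return result == "backdoor"


# With 100 results, only indices 0 and 1 are audited.
early_attack = ["clean"] * 100
early_attack[1] = "backdoor"
late_attack = ["clean"] * 100
late_attack[50] = "backdoor"

early_report = upfront_audit(early_attack, looks_malicious)
late_report = upfront_audit(late_attack, looks_malicious)
```

In this sketch `early_report` records a shutdown at index 1, while `late_report` records no shutdown and all 100 results accepted. The second case is the failure mode described above: a backdoor placed after the initial checks is never seen by the auditors.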

Backlinks

AI Control
Bluedot Technical AI Safety
