A proposed protocol for safely deploying powerful AI systems. Uses a weaker, trusted AI to monitor the outputs of a stronger, untrusted AI. Fails if the untrusted AI can reliably trick the weaker monitor.

Protocol

  • The trusted (weaker) AI assigns a suspiciousness rating to every solution produced by the untrusted AI
  • Humans audit the most suspicious 2% of solutions
  • Set a suspiciousness threshold: any solution rated above it is automatically discarded and replaced by a solution generated by the trusted, weaker AI
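
The steps above can be sketched in code. This is a toy illustration, not an implementation from the source: the function names (`run_protocol`, `monitor`, etc.), the 0-10 scoring scale, and the default threshold value are all assumptions made for the example.

```python
def run_protocol(tasks, untrusted_solve, trusted_solve, monitor,
                 audit_fraction=0.02, replace_threshold=8.0):
    """Trusted-monitoring sketch: score the untrusted AI's solutions,
    queue the top audit_fraction for human review, and replace any
    solution at or above replace_threshold with trusted output.

    Scale and threshold are hypothetical (suspiciousness in 0-10)."""
    # Untrusted model produces a solution; trusted model scores it.
    scored = []
    for task in tasks:
        solution = untrusted_solve(task)
        scored.append((task, solution, monitor(solution)))

    # Humans audit the most suspicious fraction of solutions.
    by_suspicion = sorted(scored, key=lambda row: row[2], reverse=True)
    audit_count = max(1, round(len(scored) * audit_fraction))
    audit_queue = [task for task, _, _ in by_suspicion[:audit_count]]

    # Solutions above the threshold are swapped for trusted output.
    final = [
        (task, trusted_solve(task) if score >= replace_threshold else solution)
        for task, solution, score in scored
    ]
    return final, audit_queue


# Toy usage: task 7's solution is flagged as suspicious and replaced.
tasks = list(range(100))
untrusted = lambda t: "evil code" if t == 7 else f"solution {t}"
trusted = lambda t: f"safe solution {t}"
monitor = lambda sol: 9.0 if "evil" in sol else 1.0

final, audit_queue = run_protocol(tasks, untrusted, trusted, monitor)
```

Note the two mechanisms are independent: auditing is capped at a fixed human budget (2%), while automatic replacement applies to everything over the threshold, however many solutions that is.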