A proposed protocol for developing AI systems in which untrusted models monitor each other. The protocol fails if the two models collude.