An AI that has:
- A constitution that defines acceptable and unacceptable behavior
- A model that critiques and revises its own outputs using these rules

Caveat: not as good as RLHF when it comes to edge cases.
Process

- Create a policy
- Given a prompt, the AI answers and is then re-prompted with a constitutional question randomly selected from the constitution
- Fine-tune a judge model on the initial prompt and the final answer
- Use the judge model for RLAIF (reinforcement learning from AI feedback)
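
A minimal sketch of the critique-and-revise step, to make the loop concrete. Everything here is an assumption for illustration: `Model` is a hypothetical "prompt text in, completion text out" callable, the prompt wording is made up, and `stub_model` only stands in for a real language model call. The (prompt, final answer) pair it returns is the kind of data the judge model would then be fine-tuned on.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping prompt text to completion text.
Model = Callable[[str], str]


def critique_and_revise(
    model: Model,
    prompt: str,
    constitution: List[str],
    rng: random.Random,
    rounds: int = 1,
) -> Tuple[str, str]:
    """Return (initial prompt, final answer) after `rounds` critique/revise passes."""
    answer = model(prompt)
    for _ in range(rounds):
        # Pick one constitutional question at random for this pass.
        principle = rng.choice(constitution)
        critique = model(
            f"Prompt: {prompt}\nAnswer: {answer}\n"
            f"Critique the answer using this principle: {principle}"
        )
        answer = model(
            f"Prompt: {prompt}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer to address the critique."
        )
    return prompt, answer


def stub_model(text: str) -> str:
    """Placeholder for a real model call; returns a dummy string."""
    return f"<completion for {len(text)} chars of input>"


pair = critique_and_revise(
    stub_model,
    "How do I pick a lock?",
    ["Choose the response that is least harmful."],  # illustrative principle
    random.Random(0),
)
print(pair)
```

Passing the random generator in explicitly keeps the principle selection reproducible, which helps when comparing runs.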
Example Constitution
Please choose the response that is most supportive and encouraging of life, liberty, and personal security.
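
A rough sketch of how a principle like this might be put to a judge model as a pairwise comparison, producing a preference label for RLAIF. The prompt layout, the `(A)`/`(B)` answer format, and the helper names `build_judge_prompt` and `parse_choice` are assumptions made for illustration, not a documented format.

```python
from typing import Optional


def build_judge_prompt(
    principle: str, user_prompt: str, response_a: str, response_b: str
) -> str:
    """Format a two-way comparison question around one constitutional principle."""
    return (
        f"Consider this prompt:\n{user_prompt}\n\n"
        f"{principle}\n\n"
        f"(A) {response_a}\n"
        f"(B) {response_b}\n\n"
        "Answer with (A) or (B)."
    )


def parse_choice(judge_output: str) -> Optional[str]:
    """Map the judge's reply to 'A' or 'B'; return None if it is unclear."""
    text = judge_output.strip().upper()
    if text.startswith("(A)"):
        return "A"
    if text.startswith("(B)"):
        return "B"
    return None


principle = (
    "Please choose the response that is most supportive and encouraging "
    "of life, liberty, and personal security."
)
judge_prompt = build_judge_prompt(
    principle, "I feel unsafe at home.", "Ignore it.", "Here are some steps to stay safe."
)
print(judge_prompt)
```

Replies that do not parse cleanly come back as `None` so they can be dropped rather than counted as a preference.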