Using AI to assist in generating feedback for another AI (RLHF)
Method
- Train an AI assistant using human feedback (RLHF)
- Use those AI assistants to help humans give feedback on harder tasks, then use that feedback to train new models for those harder tasks
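
The two steps above can be sketched as a loop in which each trained assistant helps produce the feedback used to train the next one. This is only a toy illustration under stated assumptions: `Assistant`, `human_feedback`, `assisted_feedback`, `train`, and `bootstrap` are hypothetical names invented for this sketch, the "training" step is a stub, and the scoring rule is a placeholder rather than a real reward signal.

```python
from typing import Callable, List

# Hypothetical type for this sketch: (task, answer) -> score.
Feedback = Callable[[str, str], float]


class Assistant:
    """Stand-in for a trained model; stores the feedback signal it was trained on."""

    def __init__(self, level: int, feedback: Feedback):
        self.level = level        # difficulty tier this assistant was trained for
        self.feedback = feedback  # feedback function used during its training

    def critique(self, task: str, answer: str) -> str:
        # A real assistant would write a useful critique; this stub only labels one.
        return f"[level-{self.level} critique of {answer!r} for {task!r}]"


def human_feedback(task: str, answer: str) -> float:
    # Placeholder for an unaided human rating (assumption: non-empty answers score 1.0).
    return 1.0 if answer else 0.0


def assisted_feedback(assistant: Assistant) -> Feedback:
    # The human still gives the rating, but sees the assistant's critique first.
    def feedback(task: str, answer: str) -> float:
        critique = assistant.critique(task, answer)
        return human_feedback(task, answer + " " + critique)

    return feedback


def train(level: int, feedback: Feedback) -> Assistant:
    # Placeholder for an actual RLHF training run against `feedback`.
    return Assistant(level, feedback)


def bootstrap(num_levels: int) -> List[Assistant]:
    # Step 1: train the first assistant on plain human feedback.
    assistants = [train(0, human_feedback)]
    # Step 2: each later assistant is trained on feedback the previous one assisted.
    for level in range(1, num_levels):
        assistants.append(train(level, assisted_feedback(assistants[-1])))
    return assistants
```

Calling `bootstrap(3)` returns three stub assistants for tiers 0, 1, and 2. In a real system, each `train` call would be a full training run, and the critique would need to genuinely improve the human's judgment for the scheme to help.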