The practice of removing unwanted data from an AI model’s training set, such as:
- Misinformation
- Dangerous instructions
- Biased or offensive content
Process
- During data collection, adhere to Data Minimization
- Audit the data with a Classification Model
- Use a data filtering tool to remove manually blacklisted content from the data set
- Backfill the removed data with synthetic data
- Post-training Model Fine Tuning
- Machine Unlearning
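
The auditing and filtering steps above can be sketched roughly as follows. This is a minimal illustration, not a production pipeline: `BLACKLIST`, `toy_risk_score`, `filter_dataset`, and the `threshold` value are all hypothetical names and settings chosen for the example, and the word-counting scorer merely stands in for a trained Classification Model.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical, manually curated blacklist (assumption for this sketch).
BLACKLIST = {"build a bomb", "miracle cure"}


def toy_risk_score(text: str) -> float:
    """Stand-in for a trained classifier: fraction of words that are flagged."""
    flagged = {"hate", "scam"}  # assumed toy vocabulary
    words = [w.strip(".,!?") for w in text.lower().split()]
    if not words:
        return 0.0
    return sum(w in flagged for w in words) / len(words)


def filter_dataset(
    records: Iterable[str],
    score_fn: Callable[[str], float] = toy_risk_score,
    threshold: float = 0.2,
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split records into kept texts and (text, reason) pairs for removed ones."""
    kept: List[str] = []
    removed: List[Tuple[str, str]] = []
    for text in records:
        lowered = text.lower()
        if any(term in lowered for term in BLACKLIST):
            # Exact-phrase match against the manual blacklist.
            removed.append((text, "blacklist"))
        elif score_fn(text) >= threshold:
            # Classifier score at or above the cutoff.
            removed.append((text, "classifier"))
        else:
            kept.append(text)
    return kept, removed


if __name__ == "__main__":
    sample = [
        "How to bake bread at home",
        "Steps to build a bomb quickly",
        "I hate this scam",
        "The weather is nice today",
    ]
    kept, removed = filter_dataset(sample)
    print("kept:", kept)
    print("removed:", removed)
```

Recording the reason for each removal keeps the audit trail needed to review false positives, which matters because overly aggressive filtering is one way benign performance can suffer.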
Side Effects
- Models can exhibit Emergent Capabilities when scaled up
- Data filtering can degrade performance, even on benign tasks