A method for orchestrating jailbreak requests by turning human-provided red-teaming principles into a strategy for the LLM. https://neurips.cc/virtual/2025/loc/san-diego/poster/119752