StrongREJECT: an LLM jailbreak evaluation benchmark. It includes a dataset of forbidden prompts drawn from six categories of harmful behavior (a loading sketch follows the list):
- Disinformation and deception
- Illegal goods and services
- Hate/harassment/discrimination
- Non-violent crime
- Violence
- Sexual content

Documentation: https://strong-reject.readthedocs.io/en/latest/
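A minimal sketch of how the prompt dataset might be consumed, assuming a local CSV copy with `category` and `forbidden_prompt` columns (the file name and both column names are assumptions; see the documentation linked above for the actual loading utilities and schema):

```python
# Sketch: group StrongREJECT forbidden prompts by harm category.
# DATASET_CSV and the column names are assumptions, not the package's API.
import pandas as pd

DATASET_CSV = "strongreject_dataset.csv"  # hypothetical local copy of the dataset


def prompts_by_category(csv_path: str) -> dict[str, list[str]]:
    """Return a mapping from harm category to its list of forbidden prompts."""
    df = pd.read_csv(csv_path)
    return (
        df.groupby("category")["forbidden_prompt"]
        .apply(list)
        .to_dict()
    )


if __name__ == "__main__":
    for category, prompts in prompts_by_category(DATASET_CSV).items():
        print(f"{category}: {len(prompts)} prompts")
```

A per-category breakdown like this is useful when reporting jailbreak success rates, since attack effectiveness often varies widely across harm categories.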