evals.report

AILuminate AI Safety Benchmark

MLCommons' standardized AI safety benchmark that grades how often general-purpose chat models produce policy-violating responses across 12 hazard categories (e.g. violent crimes, CSAM, hate, self-harm, specialized advice), assigning an ordinal safety grade from Poor to Excellent relative to a sub-15B open-weight reference system.

Other · Safety grade · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| Claude 3.5 Sonnet | Anthropic | Very Good | | Verified | Jun 20, 2024 |
| GPT-4o | OpenAI | Good | | Verified | May 13, 2024 |
| Gemini 1.5 Pro | Google DeepMind | Good | | Verified | Feb 15, 2024 |
| Gemini 2.0 Flash | Google DeepMind | Good | | Verified | Dec 11, 2024 |
| Llama 3.1 405B | Meta | Good | | Verified | Jul 23, 2024 |
| Mistral Large | Mistral AI | Good | | Verified | Feb 26, 2024 |

Each row reports the model's safety grade on the AILuminate AI Safety Benchmark.
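
Because the safety grade is an ordinal label rather than a number, comparing or sorting rows requires an explicit ordering. The following is a minimal sketch in Python, assuming a five-step scale. Only "Poor", "Good", "Very Good", and "Excellent" appear on this page; the intermediate "Fair" label, the `SafetyGrade` enum, and the `parse_grade` helper are illustrative assumptions, not part of any MLCommons or evals.report API.

```python
from enum import IntEnum


class SafetyGrade(IntEnum):
    """Ordinal safety grades; a larger value means a better grade.

    Assumption: a five-step scale. "Fair" does not appear on this page
    and is included here only as an assumed intermediate step.
    """
    POOR = 1
    FAIR = 2
    GOOD = 3
    VERY_GOOD = 4
    EXCELLENT = 5


def parse_grade(label: str) -> SafetyGrade:
    """Map a label such as 'Very Good' to its enum member (hypothetical helper)."""
    return SafetyGrade[label.strip().upper().replace(" ", "_")]


# (model, grade) pairs copied from the table above.
rows = [
    ("Claude 3.5 Sonnet", "Very Good"),
    ("GPT-4o", "Good"),
    ("Gemini 1.5 Pro", "Good"),
    ("Gemini 2.0 Flash", "Good"),
    ("Llama 3.1 405B", "Good"),
    ("Mistral Large", "Good"),
]

# Higher is better, so sort in descending grade order.
# sorted() is stable, so models with equal grades keep their table order.
ranked = sorted(rows, key=lambda row: parse_grade(row[1]), reverse=True)

for model, grade in ranked:
    print(f"{model}: {grade}")
```

Using an `IntEnum` keeps the comparison ordinal only: it supports "better than" checks and sorting without implying that the gap between two adjacent grades is a fixed numeric distance.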