KernelBench (Hard)

Name: KernelBench (Hard)
Creator: evals.report

Stanford KernelBench's hardest tier: generate correct, high-performance GPU (CUDA) kernels from PyTorch reference operators. Models are scored on the fraction of tasks whose generated kernel is both correct and faster than the PyTorch reference (fast₁).

Coding · fast₁ · Higher is better


What this benchmark measures

Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps the benchmark version, the source model name, and any available run details attached to its score.

The metric shown here is fast₁. It should be interpreted within KernelBench (Hard), not compared as part of a site-wide ranking.
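For reference, fast₁ is the p = 1 case of KernelBench's fast_p metric: the share of tasks whose kernel is correct and has a speedup strictly greater than p over the reference. The sketch below follows that definition as we understand it. The `TaskResult` record and the `fast_p` helper are hypothetical names for illustration, not part of KernelBench's own code, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    # Hypothetical per-task record: did the generated kernel pass the
    # correctness check, and what was its speedup over the PyTorch
    # reference (reference_time / kernel_time)?
    correct: bool
    speedup: float


def fast_p(results: list[TaskResult], p: float = 1.0) -> float:
    """Fraction of tasks whose kernel is correct AND has speedup > p."""
    if not results:
        return 0.0
    hits = sum(1 for r in results if r.correct and r.speedup > p)
    return hits / len(results)


# Illustrative (made-up) results for four tasks.
results = [
    TaskResult(correct=True, speedup=1.8),
    TaskResult(correct=True, speedup=0.9),   # correct but slower: not counted in fast_1
    TaskResult(correct=False, speedup=3.0),  # fast but wrong: never counted
    TaskResult(correct=True, speedup=1.2),
]

print(fast_p(results))         # 0.5  (fast_1: 2 of 4 tasks)
print(fast_p(results, p=0.0))  # 0.75 (fast_0: the plain correctness rate here)
```

A wrong kernel never counts, however fast it is. That is why fast₁ is lower than or equal to the correctness rate.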

What to be careful about

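fast₁ depends heavily on the GPU and on the numerical tolerance used for correctness checks. MiniMax reported its Hard-tier run on NVIDIA Blackwell GPUs, so its score may not be directly comparable with runs on other hardware or with different tolerances.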
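The tolerance caveat can be made concrete with a small sketch. `outputs_match` is a hypothetical elementwise check of the same form that `torch.allclose` uses (|candidate − reference| ≤ atol + rtol · |reference|). The atol/rtol values and sample outputs are illustrative assumptions, not the settings used by KernelBench or by MiniMax.

```python
def outputs_match(reference, candidate, atol, rtol):
    """Hypothetical correctness check: every element must satisfy
    |candidate - reference| <= atol + rtol * |reference|."""
    if len(reference) != len(candidate):
        return False
    return all(
        abs(c - r) <= atol + rtol * abs(r)
        for r, c in zip(reference, candidate)
    )


reference = [1.0, -2.5, 100.0]
# Illustrative output from a kernel whose results drift slightly,
# e.g. because it accumulates in a different order or precision.
candidate = [1.0004, -2.5003, 100.05]

print(outputs_match(reference, candidate, atol=1e-2, rtol=1e-2))  # True
print(outputs_match(reference, candidate, atol=1e-4, rtol=1e-4))  # False
```

The same kernel output passes under the looser tolerance and fails under the tighter one. That alone can flip whether the kernel counts toward fast₁.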

No composite ranking

evals.report never combines benchmarks. fast₁ on KernelBench (Hard) is its own number — don’t average it with other metrics.