KernelBench (Hard)
Stanford KernelBench's hardest tier: generate correct, high-performance GPU (CUDA) kernels from PyTorch reference operators, scored on the fraction of kernels that are both correct and faster than the baseline (fast₁).
What this benchmark measures
Rows on this page are sourced from public benchmark artifacts, leaderboard exports, or source-linked model reports. Each row keeps benchmark version, source model name, and available run details attached to the score.
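The metric shown here is fast₁: the fraction of tasks whose generated kernel is both correct and faster than the PyTorch baseline. Interpret it only within KernelBench (Hard); it is not part of a site-wide ranking and should not be compared across benchmarks.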
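As a rough illustration of how a fast₁-style score can be computed, the sketch below counts the tasks whose kernel is both correct and has a speedup above a threshold p (p = 1 for fast₁). The `KernelResult` record, its field names, and the sample timings are hypothetical and are not taken from the KernelBench harness; speedup is assumed to be baseline time divided by kernel time.

```python
from dataclasses import dataclass


@dataclass
class KernelResult:
    """Hypothetical per-task record; field names are illustrative only."""
    correct: bool       # did the generated kernel pass the correctness check?
    baseline_ms: float  # runtime of the PyTorch reference
    kernel_ms: float    # runtime of the generated kernel (assumed > 0)


def fast_p(results, p=1.0):
    """Fraction of tasks that are correct AND have speedup > p."""
    if not results:
        raise ValueError("results must not be empty")
    hits = sum(
        1 for r in results
        if r.correct and (r.baseline_ms / r.kernel_ms) > p
    )
    return hits / len(results)


# Made-up sample data, for illustration only.
results = [
    KernelResult(correct=True, baseline_ms=2.0, kernel_ms=1.0),   # correct and faster
    KernelResult(correct=True, baseline_ms=1.0, kernel_ms=1.5),   # correct but slower
    KernelResult(correct=False, baseline_ms=3.0, kernel_ms=1.0),  # faster but incorrect
    KernelResult(correct=True, baseline_ms=4.0, kernel_ms=3.0),   # correct and faster
]

score = fast_p(results)  # fast_1 over the four sample tasks
print(score)
```

Note that the denominator is the total number of tasks, so an incorrect kernel lowers the score even when it runs faster than the baseline.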
What to be careful about
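fast₁ depends heavily on the GPU used for timing and on the numerical tolerance used in the correctness check, so scores from different setups are not directly comparable. MiniMax reported its Hard-tier run on NVIDIA Blackwell GPUs.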
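To see why tolerance matters, the sketch below applies an element-wise closeness test of the form |out - ref| <= atol + rtol * |ref| (the same form used by `torch.allclose`) to one output under two tolerance settings. The tolerance values and sample numbers are assumptions chosen for illustration; they are not the settings used by KernelBench or by any reported run.

```python
def all_close(out, ref, rtol, atol):
    """Element-wise check: |out - ref| <= atol + rtol * |ref| for every pair."""
    return all(abs(o - r) <= atol + rtol * abs(r) for o, r in zip(out, ref))


# Made-up values: the candidate differs from the reference by 5e-4 in one element.
reference = [1.0, 2.0, 3.0]
candidate = [1.0005, 2.0, 3.0]

# Hypothetical tolerance settings, not the benchmark's actual ones.
strict = all_close(candidate, reference, rtol=1e-5, atol=1e-5)
loose = all_close(candidate, reference, rtol=1e-2, atol=1e-2)

print(strict, loose)
```

The same kernel output counts as incorrect under the strict setting and correct under the loose one, which would change whether that task contributes to fast₁.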