KernelBench (Hard)

Name: KernelBench (Hard)
Creator: evals.report

Stanford KernelBench's hardest tier: generate correct, high-performance GPU (CUDA) kernels from PyTorch reference operators, scored on the fraction of kernels that are both correct and faster than the baseline (fast₁).
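
As a rough illustration of how a fast₁-style score could be computed, here is a minimal sketch. It assumes each generated kernel yields a correctness flag plus measured runtimes for the PyTorch baseline and the kernel. The `KernelResult` type, its field names, and the sample values are hypothetical and are not taken from the benchmark harness.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class KernelResult:
    """Hypothetical per-task record; field names are illustrative only."""
    correct: bool        # output matched the PyTorch reference
    baseline_ms: float   # measured runtime of the PyTorch reference
    kernel_ms: float     # measured runtime of the generated kernel


def fast_p(results: Sequence[KernelResult], p: float = 1.0) -> float:
    """Fraction of tasks whose kernel is correct AND has speedup > p.

    With p = 1.0 this is the fast_1 reading described above: a kernel
    counts only if it is both correct and strictly faster than baseline.
    """
    if not results:
        return 0.0
    hits = 0
    for r in results:
        if not r.correct or r.kernel_ms <= 0:
            continue  # incorrect or unmeasurable kernels never count
        speedup = r.baseline_ms / r.kernel_ms
        if speedup > p:
            hits += 1
    return hits / len(results)


# Made-up sample data, purely to exercise the function.
sample = [
    KernelResult(correct=True, baseline_ms=2.0, kernel_ms=1.0),   # counts
    KernelResult(correct=True, baseline_ms=1.0, kernel_ms=2.0),   # slower
    KernelResult(correct=False, baseline_ms=2.0, kernel_ms=1.0),  # wrong
    KernelResult(correct=True, baseline_ms=1.0, kernel_ms=1.0),   # tie
]
print(f"fast_1 = {fast_p(sample):.2%}")
```

Note that a kernel that is fast but incorrect, or correct but no faster than the baseline, contributes nothing, so the score is bounded above by plain correctness.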

Coding | fast₁ | Higher is better


Model	Lab	Score (fast₁)	Source model	Status	Date
MiniMax M3	MiniMax	28.8%	MiniMax M3	Verified	—

Each row reports the model’s fast₁ on KernelBench (Hard).