SWE-fficiency

Name: SWE-fficiency
Creator: evals.report

Measures whether coding agents can optimize real-world repositories for performance. Each task asks the agent to generate a pull request that speeds up a target workload while keeping the repository's existing tests passing. The benchmark contains 498 tasks across 9 large Python repositories.
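As a rough illustration of the task's pass condition, the sketch below treats a patch as valid only if the existing tests still pass, and then reports a runtime ratio for the target workload. The function names, the ratio definition, and the gating logic are assumptions made for illustration; this page does not specify how the benchmark computes or aggregates its speedup score.

```python
from typing import Optional


def speedup(baseline_seconds: float, patched_seconds: float) -> float:
    """Hypothetical helper: baseline runtime divided by patched runtime.

    A value above 1.0 means the patched workload ran faster.
    """
    if baseline_seconds <= 0 or patched_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return baseline_seconds / patched_seconds


def evaluate_patch(
    tests_pass: bool, baseline_seconds: float, patched_seconds: float
) -> Optional[float]:
    """Hypothetical gate: a patch only counts if the existing tests still pass.

    Returns None for a patch that breaks tests, otherwise the runtime ratio.
    This is an assumed simplification, not the benchmark's actual scoring.
    """
    if not tests_pass:
        return None
    return speedup(baseline_seconds, patched_seconds)


if __name__ == "__main__":
    # A workload that dropped from 2.0 s to 1.0 s with tests still passing.
    print(evaluate_patch(True, 2.0, 1.0))   # 2.0
    # The same timing change is rejected when the tests fail.
    print(evaluate_patch(False, 2.0, 1.0))  # None
```

The point of the gate is that a faster workload alone is not enough: under this assumed model, a patch that breaks existing behavior earns no credit regardless of its runtime.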

Category: Coding
Metric: speedup score (higher is better)


Model	Lab	Score	Source model	Status	Date
MiniMax M3	MiniMax	34.8%	MiniMax M3	Verified	—

Each row reports the model’s speedup score on SWE-fficiency.