SWE-fficiency
Measures whether coding agents can optimize real-world repositories for performance: generate a pull request that speeds up a target workload while keeping the repository's existing tests passing (498 tasks across 9 large Python repos).
Coding · speedup score · Higher is better
| Model | Lab | Score | Source model | Status | Date |
|---|---|---|---|---|---|
| MiniMax M3 | MiniMax | 34.8% | MiniMax M3 | Verified | — |
Each row reports the model’s speedup score on SWE-fficiency.
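To make the task description concrete, here is a minimal Python sketch of how a single task could be evaluated: time the target workload before and after a patch, and count the speedup only if the repository's tests still pass. The function names `time_workload` and `speedup`, the best-of-N timing, and the zero-on-failure rule are illustrative assumptions, not the benchmark's actual harness or scoring formula, which this page does not specify.

```python
import time
from typing import Callable


def time_workload(workload: Callable[[], object], repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs.

    Hypothetical helper: the real harness may time workloads differently.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, patched_seconds: float, tests_pass: bool) -> float:
    """Illustrative per-task ratio: baseline time divided by patched time.

    Assumption: a patch that breaks existing tests earns no credit (0.0).
    """
    if not tests_pass:
        return 0.0
    if patched_seconds <= 0:
        raise ValueError("patched_seconds must be positive")
    return baseline_seconds / patched_seconds


if __name__ == "__main__":
    # Toy stand-ins for the workload before and after an optimization.
    baseline = time_workload(lambda: sum(i * i for i in range(200_000)))
    patched = time_workload(lambda: sum(i * i for i in range(100_000)))
    print(f"speedup: {speedup(baseline, patched, tests_pass=True):.2f}x")
```

A ratio above 1.0 means the patched workload ran faster than the baseline. How such per-task results are aggregated into the percentage shown in the table is not stated here, so the sketch stops at the single-task level.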