evals.report
Selected model: Mistral Large (Mistral AI)
Selected benchmarks: SWE-bench Verified (Coding), GPQA Diamond (Reasoning)
Benchmark (metric)                 Mistral Large (Mistral AI)
SWE-bench Verified (% resolved)    no score shown
GPQA Diamond (accuracy)            38.8%

No aggregate score is calculated. Each row uses its benchmark’s own metric. Compare rows independently.