evals.report

Claude Mythos Preview

Anthropic · Claude. Released Apr 7, 2026.

Benchmark results (9)

Benchmark                                    Category    Score       Metric             Status      Date
SWE-bench Verified                           Coding      93.9%       % resolved         Unverified  Apr 7, 2026
SWE-bench Pro                                Coding      77.8%       % resolved         Unverified  Apr 7, 2026
GPQA Diamond                                 Reasoning   94.6%       accuracy           Unverified  Apr 7, 2026
Humanity's Last Exam                         Reasoning   56.8%       accuracy           Unverified  Apr 7, 2026
OSWorld                                      Agents      79.6%       task success rate  Unverified  Apr 7, 2026
GAIA: A Benchmark for General AI Assistants  Agents      52.3%       accuracy           Unverified  Apr 7, 2026
METR Task-Completion Time Horizons           Agents      1044.8 min  50% time horizon   Official    Apr 7, 2026
CharXiv                                      Multimodal  93.2%       accuracy           Unverified  Apr 7, 2026
SWE-bench Multimodal                         Coding      59.0%       % resolved         Verified    Apr 7, 2026