evals.report

Moonshot AI

Model provider for Kimi-family public benchmark rows.

3 models · 13 results · kimi.moonshot.cn

Models 3

Progress by benchmark

Kimi K2 Instruct, Jul 11, 2025
Kimi K2.5, Jan 1, 2026: 73.8%
Kimi K2.6, Apr 1, 2026: 76.7%
Single benchmark only
This view shows SWE-bench Verified (% resolved) only. Other benchmarks use different metrics and are not directly comparable.
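
As a rough illustration of reading this single-benchmark view, the sketch below computes the change in SWE-bench Verified (% resolved) between consecutive models that have a value listed here. The data structure and the `progress` helper are hypothetical names introduced only for this example. Kimi K2 Instruct is left out because no value is shown for it in this view.

```python
from datetime import date

# SWE-bench Verified (% resolved) values as listed in this view.
# Kimi K2 Instruct has no value shown here, so it is not included.
swe_bench_verified = [
    ("Kimi K2.5", date(2026, 1, 1), 73.8),
    ("Kimi K2.6", date(2026, 4, 1), 76.7),
]


def progress(rows):
    """Return (from_model, to_model, delta_pp, days) for consecutive rows by date.

    delta_pp is the difference in percentage points on the same benchmark,
    so it is only meaningful within a single metric.
    """
    ordered = sorted(rows, key=lambda r: r[1])
    steps = []
    for (m0, d0, s0), (m1, d1, s1) in zip(ordered, ordered[1:]):
        steps.append((m0, m1, round(s1 - s0, 2), (d1 - d0).days))
    return steps


for m0, m1, delta, days in progress(swe_bench_verified):
    print(f"{m0} -> {m1}: {delta:+.1f} pp over {days} days")
# prints "Kimi K2.5 -> Kimi K2.6: +2.9 pp over 90 days"
```

The delta is expressed in percentage points on one benchmark only. The same subtraction across different benchmarks would mix metrics and would not be meaningful.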

Progress matrix

Columns: Model, followed by one column per benchmark (metric in parentheses):
SWE-bench Verified (% resolved)
GPQA Diamond (accuracy)
LiveCodeBench Pro (Codeforces Elo)
Berkeley Function Calling Leaderboard (accuracy)
LiveBench (score)
Terminal-Bench 2.1 (task success)
SWE-bench Pro (% resolved)
DeepSWE (% resolved)
Humanity's Last Exam (accuracy)
MMMU-Pro (accuracy)
LMArena (source-defined rating)
ARC-AGI-3 (accuracy)
ARC-AGI-2 (accuracy)
FrontierMath (accuracy)
AIME (OTIS Mock) (accuracy)
SimpleQA Verified (accuracy)
Reported values per model (empty cells omitted):
Kimi K2 Instruct (Kimi): 59.06%, 27.67%
Kimi K2.5 (Kimi): 73.8%, 87.6%, 27.9%, 92.2%
Kimi K2.6 (Kimi): 76.7%, 90.8%, 72.17%, 23.89%, 29.9%, 38.97%, 96.1%

Scores are not normalised across benchmarks. Each column uses its own metric. Compare columns independently.