Question 1

What is EQ-Bench Creative Writing v3?

Accepted Answer

An LLM-judged creative writing benchmark that scores models on 32 prompts (3 iterations each). It combines rubric scoring with pairwise comparisons, and the pairwise results are turned into ratings by a margin-weighted Glicko-2 system. It measures chat preference, and results are reported as Elo.

Question 2

What does Elo mean on EQ-Bench Creative Writing v3?

Accepted Answer

EQ-Bench Creative Writing v3 reports Elo; higher is better. Scores are shown only within EQ-Bench Creative Writing v3 and are never averaged with other benchmarks.
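Elo is a relative rating: what a number means depends on the gap between two models, not on its absolute value. The sketch below assumes the reported ratings use the conventional logistic Elo scale, where a 400-point gap corresponds to 10:1 odds. This FAQ does not state that, so treat it as an assumption. Under that assumption, a rating gap converts to the expected probability that the judge prefers one model's output over the other's. The function name and the ratings are hypothetical and are not leaderboard data.

```python
def expected_win_probability(rating_a: float, rating_b: float) -> float:
    """Expected probability that A is preferred over B.

    Assumption: ratings follow the standard logistic Elo model on a
    400-point scale; this is not confirmed for EQ-Bench's Glicko-2 ratings.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hypothetical ratings chosen only to show how a gap maps to a preference rate.
    for gap in (0, 100, 200, 400):
        p = expected_win_probability(1500 + gap, 1500)
        print(f"gap {gap:>3}: expected preference rate {p:.3f}")
```

Only the difference between two ratings enters the formula. Adding the same constant to both ratings leaves the result unchanged, which is one reason an Elo number cannot be averaged with scores from other benchmarks.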

Question 3

What is the top reported EQ-Bench Creative Writing v3 score?

Accepted Answer

Claude Opus 4.7 has the top reported score on EQ-Bench Creative Writing v3: 2206 (Elo).

Question 4

Why do EQ-Bench Creative Writing v3 scores differ across runs?

Accepted Answer

Harness, scaffold, reasoning effort, and prompt setup change results, so two runs of the same model can differ. evals.report keeps each score with its run context so the differences stay visible.

Question 5

Does evals.report rank models across benchmarks?

Accepted Answer

No. EQ-Bench Creative Writing v3 scores are shown within their own metric; evals.report never combines benchmarks into a composite ranking or a single "best model".