evals.report

SciCode

A scientist-curated benchmark that evaluates language models on realistic scientific research coding problems. It comprises 338 subproblems decomposed from 80 challenging main problems, covering 16 subfields across the natural sciences (physics, math, chemistry, biology, materials science).

Coding · accuracy · Higher is better

No run guide for this benchmark yet.