SciCode
A scientist-curated benchmark that evaluates language models on realistic scientific research coding problems. It comprises 338 subproblems decomposed from 80 challenging main problems, covering 16 subfields across natural-science domains including physics, math, chemistry, biology, and materials science.
Category: Coding. Metric: accuracy (higher is better).
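Because main problems are decomposed into subproblems, accuracy can be read at two levels. The sketch below is illustrative only: this page does not state how the score is aggregated, so it assumes that subproblem accuracy is the fraction of subproblems passed and that a main problem counts as solved only when all of its subproblems pass. The function name `accuracy`, the tuple layout, and the sample data are hypothetical.

```python
from collections import defaultdict


def accuracy(results):
    """Return (subproblem_accuracy, main_problem_accuracy).

    `results` is a list of (main_id, sub_id, passed) tuples.
    Assumption: a main problem is solved only if every one of its
    subproblems passed. This rule is not confirmed by the page.
    """
    if not results:
        raise ValueError("results must not be empty")

    # Fraction of individual subproblems that passed.
    sub_acc = sum(1 for _, _, passed in results if passed) / len(results)

    # Group pass/fail flags by main problem.
    by_main = defaultdict(list)
    for main_id, _, passed in results:
        by_main[main_id].append(passed)

    # Fraction of main problems whose subproblems all passed.
    main_acc = sum(all(flags) for flags in by_main.values()) / len(by_main)
    return sub_acc, main_acc


# Made-up sample: two main problems with two subproblems each.
results = [
    ("p1", "p1.1", True),
    ("p1", "p1.2", True),
    ("p2", "p2.1", True),
    ("p2", "p2.2", False),
]
sub_acc, main_acc = accuracy(results)
print(sub_acc, main_acc)
```

Under these assumptions, one failed subproblem lowers subproblem accuracy slightly but marks its whole main problem as unsolved, so the two figures can differ widely.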
No run guide for this benchmark yet.