evals.report

ScreenSpot-Pro

A GUI grounding benchmark that measures how accurately a multimodal model can locate a UI element. Given a natural-language instruction and a full-screen, high-resolution screenshot of professional desktop software, the model must return the position of the element the instruction refers to. The screenshots span 23 applications, 5 industries, and 3 operating systems.

Multimodal · accuracy · Higher is better
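As a rough illustration of how an accuracy score for a grounding task like this could be computed, the sketch below counts a prediction as correct when the predicted point falls inside the target element's bounding box. This scoring rule is an assumption based on common practice for GUI grounding, not something stated on this page, and the names `Box` and `grounding_accuracy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a target UI element, in pixels (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Points on the box edge count as inside.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def grounding_accuracy(
    predictions: Sequence[Tuple[float, float]],
    targets: Sequence[Box],
) -> float:
    """Fraction of predicted (x, y) points that land inside their target box.

    Assumed scoring rule: one predicted point per sample, hit if inside the box.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    # Two made-up samples: the first point hits its box, the second misses.
    preds = [(10.0, 10.0), (500.0, 500.0)]
    boxes = [Box(0, 0, 20, 20), Box(0, 0, 20, 20)]
    print(grounding_accuracy(preds, boxes))
```

Because the screenshots are high-resolution and the target elements can be small, a point-in-box rule of this kind is strict: a prediction a few pixels outside the element counts as a miss.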

No run guide for this benchmark yet.