evals.report

Online-Mind2Web

A live web-agent benchmark of 300 realistic tasks across 136 real websites that measures whether an autonomous agent can complete end-to-end web tasks on dynamic, online pages, scored as task success rate.

Agents | Task success rate | Higher is better
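
As a rough illustration of the metric, task success rate can be read as the fraction of tasks judged successful out of all tasks attempted. The sketch below assumes each task yields a single pass/fail judgment. The function name `task_success_rate` and the example counts are hypothetical and are not taken from the benchmark's own tooling or results.

```python
def task_success_rate(outcomes):
    """Return the fraction of tasks judged successful, in [0, 1].

    `outcomes` is a sequence of booleans, one per task (an assumed
    representation, not the benchmark's actual output format).
    """
    if not outcomes:
        raise ValueError("no task outcomes to score")
    return sum(1 for ok in outcomes if ok) / len(outcomes)


# Hypothetical run over 300 tasks, 120 of them judged successful.
outcomes = [True] * 120 + [False] * 180
print(f"{task_success_rate(outcomes):.1%}")  # prints 40.0%
```

A higher value means the agent completed a larger share of the tasks.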
