Online-Mind2Web
A live web-agent benchmark of 300 realistic tasks across 136 real websites. It measures whether an autonomous agent can complete end-to-end tasks on dynamic, online pages, and is scored by task success rate.
Agents · Task success rate · Higher is better
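As a rough illustration of the metric, task success rate can be read as the fraction of attempted tasks judged successful. The sketch below assumes each task yields a single pass/fail outcome; the function name `task_success_rate` and the sample outcomes are hypothetical and are not part of the benchmark's own tooling or results.

```python
from typing import Iterable


def task_success_rate(outcomes: Iterable[bool]) -> float:
    """Return the fraction of tasks marked successful (hypothetical helper)."""
    results = list(outcomes)
    if not results:
        raise ValueError("no task outcomes provided")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(results) / len(results)


if __name__ == "__main__":
    # Made-up example run: 300 tasks, 90 of them judged successful.
    sample = [True] * 90 + [False] * 210
    print(f"{task_success_rate(sample):.1%}")  # prints "30.0%"
```

How a task is judged successful (human review or an automatic judge) is outside this sketch; only the final aggregation is shown.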
No run guide for this benchmark yet.