UniPat AI team debuts SaaS-Bench as agents finish under 4% of real SaaS workflows

runtimewire.com

New arXiv paper introduces an open benchmark spanning 23 live SaaS apps and 106 tasks; the strongest model completed under 4% of them end-to-end. Code is available on GitHub.
