UniPat AI team debuts SaaS-Bench as agents finish under 4% of real SaaS workflows
A new arXiv paper introduces an open benchmark spanning 23 live SaaS apps and 106 tasks; the strongest model completed under 4% of the tasks end-to-end. The code is available on GitHub.