GeistHaus

What skills does SWE-bench Verified evaluate?

epoch.ai

We take a deep dive into SWE-bench Verified, a prominent agentic coding benchmark. Although it is one of the best public tests of AI coding agents, it is limited by its focus on simple bug fixes in familiar open-source repositories.