I built a benchmark that pits coding agents against each other in a bug-finding treasure hunt.