GeistHaus

MirrorCode: Evidence AI can already do some weeks-long coding tasks

epoch.ai

Early results from the MirrorCode benchmark with METR: AI agents can complete weeks-long coding tasks, including reimplementing a 16,000-line codebase.
