MirrorCode: Evidence AI can already do some weeks-long coding tasks

epoch.ai

Early results from the MirrorCode benchmark with METR: AI agents can complete weeks-long coding tasks, including reimplementing a 16,000-line codebase.

6 pages link to this URL

RIP Classic Reasoning Benchmarks. What’s Next?

Epoch AI Greg Burnham May 5, 2026

Give up at least one of: text only, short time horizon, easy to grade, and expert human superiority.

1 inbound link article en

ProgramBench: Can Language Models Rebuild Programs From Scratch?

arxiv.org John Yang May 5, 2026

1 inbound link en

Import AI

Import AI Import AI May 11, 2026

0 inbound links website en

Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

Import AI Import AI Apr 13, 2026

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. A shorter issue than usual as I was attendi…

0 inbound links article en Uncategorized

Import AI

Import AI Import AI May 11, 2026

0 inbound links website en

Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment

Import AI Jack Clark Apr 13, 2026

Was fire equivalent to a singularity for people at the time?

0 inbound links article en