Opus 4.7 is not generally a worse model than Opus 4.6, but there is a real downgrade: with Opus 4.7, control over the thinking budget is now fully owned by Anthropic. This change matters in a way…
This is a technical report on three bugs that intermittently degraded responses from Claude. Below we explain what happened, why it took time to fix, and what we're changing.
Track Claude Code's daily performance on SWE-Bench-Pro. Monitor for degradation with statistical significance testing.
I like Thinking Machines, but more than that, I’m grateful.
This is an in-depth post on bugs and how to prevent them in AI software and AI compilers specifically. I was the software lead for TPUv3 at Google and I’ve worked on a variety of AI compilers and projects across Google, Nvidia, Amazon and Facebook.
Practical techniques for getting great results from AI coding agents, from project setup and context management to effective prompting patterns.
Adam Karvonen, Daniel Reuter, Roy Rinberg, Luke Marks, Adrià Garriga-Alonso, Keri Warr · arXiv (paper link) · GitHub