Today, we’re announcing Claude 3.7 Sonnet, our most intelligent model to date and the first hybrid reasoning model generally available on the market.
This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …
Inference-Time Compute Scaling Methods to Improve Reasoning Models
I recently released JustHTML, a Python-based HTML5 parser. It passes 100% of the html5lib test suite, has zero dependencies, and includes a CSS selector...
Reasoning models were as big an improvement as the Transformer, at least on some benchmarks
Since Gemini 2.5 Pro’s release last week, I’ve been seeing a lot of hype claiming that the model is the new state of the art. How well does it know R?
How AI coding tools evolved from autocomplete to autonomous agents, and why I expect to abandon my IDE entirely this month.
Last weekend I thought I would try out Cursor, the AI code editor. I have been using Claude for the odd programming question or small script, but never for anything larger.
These are personal notes on how generative AI is affecting software applications. This post is translated from English and follows up on a previous post, also in English. What generative AI can do in August 2025 1. Take multimodal files as input Until last year, most generative AI apps were limited in the types of files they could read. If I remember correctly, Gemini could not even read PDFs. That barrier has fallen. Most apps now accept txt, csv, pdf, docx, xlsx, pptx, and any XML or JSON. More importantly, all the major generative AI apps are now multimodal: they accept images as well as text as input and can analyze their content in great detail. I tested Mistral's Le Chat (free version), Gemini, Claude (free version) and ChatGPT: all of them could describe a screenshot of my computer with a high degree of precision. Audio analysis has arrived too. You can get a detailed analysis of a .wav file, for example (as of August 2025, only ChatGPT has this capability). And video? For now, only ChatGPT can process video: on the video I tested, it extracts a sample of frames and then tries to infer their meaning. 2. Run advanced analyses on the fly ChatGPT can now spin up a small computing environment to handle your requests on the fly: it can run any Python code that relies on freely available packaged libraries from the web. 3. Reason Reasoning ability is what made generative AI successful: the striking thing about LLMs is that they (seem to) reason the way a human would. Whether that human is "an intern", "a student" or "PhD level" is up to you, but it remains fairly close to what a human would do. Reasoning took a new turn starting in autumn 2024 (Op
The agent is the model plus the harness. The runtime is where the harness lives. As models get better, the structure we put around them turns from scaffoldin...
When I went looking for the iconic books for learning the .NET stack, I came to a shocking realization: there are too many books! 53 books; 30,656 pages; over 757 hours.
Explaining the Model Context Protocol and everything that might go wrong.
During model distillation, large language models can subtly transmit traits unrelated to the training data.
The Who is Who in AI Land.
Coding agents like Anthropic’s Claude Code and OpenAI’s Codex CLI represent a genuine step change in how useful LLMs can be for producing working code. These agents can now directly …
We take a deep dive into SWE-bench Verified, a prominent agentic coding benchmark. While one of the best public tests of AI coding agents, it is limited by its focus on simple bug fixes in familiar open-source repositories.
This is an agentic coding tool that reasons about and writes code.
The Zen of Claude Code: How Simplicity Beat Complexity in AI Agents It's amazing how quickly the world of AI agents has changed, especially in the last co...
The story behind me migrating this blog site from Zola to Astro
As a long-time developer advocate, one constant struggle of mine has been building good-looking demo applications. I have little-to-no design talent, and alt...
AI dev tooling is consolidating: acquisitions, coding agents, and fierce competition are reshaping interfaces, pricing, and memory.
I’ve always wanted my own assistant, like JARVIS from Iron Man. Every year we get closer to JARVIS being possible, but we’re not there yet. Let me...
My workflow with generative AI has been changing quite rapidly over the past two weeks. DeepSeek's release really sped up innovation and release cycles among its competitors. I may draft a longer note
Exploring Anthropic's new CLI-based agentic coding tool
One step closer to the irreducible loss of the data
Predictions on near-term AI inference spending
I run Claude Code with the --dangerously-skip-permissions flag, giving it full system access. Let me show you a new way of approaching computers.
I've struggled to publish a 2026 essay that captures my headspace appropriately. It's not because of the blog chill, this time. It's because of AI/LLM tools.
The Bitter Lesson in AI Engineering.
Note: This post comes with a NotebookLM podcast (linked at the bottom) and three generated audio recordings. You can read the conversation I had
@amontalenti - Founder/CTO. Coding in: Python, JavaScript, Zig.
Testing Deque's axe Assistant for accuracy and ability.
What remains if/when coding is ‘solved’
The potential of formal verification for reasoning about systems.