Opus 4.6 is much smarter than the other one. It feels like I’m working with someone from Bronx Science. I had been using Sonnet 4.6, which I switched to after reading somewhere that it costs …
We’re upgrading our smartest model. Across agentic coding, computer use, tool use, search, and finance, Opus 4.6 is an industry-leading model, often by a wide margin.
114 posts tagged ‘pelican-riding-a-bicycle’. My benchmark for LLMs: "Generate an SVG of a pelican riding a bicycle". Here's my answer to what happens if AI labs train for pelicans riding bicycles?. "User …
Ultraplan hands the planning phase of a coding task off to a Claude Code on the web session running in plan mode, then lets you review it in the browser and decide where to execute. Here's what it actually changes about your workflow, what it costs, and where the sharp edges are.
Are we expected to be keeping up?
113 posts tagged ‘pelican-riding-a-bicycle’. My benchmark for LLMs: "Generate an SVG of a pelican riding a bicycle". Here's my answer to what happens if AI labs train for pelicans riding bicycles?. "User …
Claude Code’s Head of Product Cat Wu shares how teams should rethink their workflows and roadmaps in the face of rapidly evolving model intelligence.
A journey from TLS decryption and Frida patching to a surprisingly simple solution for extracting the system prompt that Xcode feeds to Claude Code.
Gemini 3.1 Pro, Claude Sonnet 4.6, Grok 4.20 and more dropped in February 2026. We rank the best AI models, benchmarks & break down costs.
3 Geeks and a Law Blog: A law blog addressing the foci of 3 intrepid law geeks, specializing in their respective fields of knowledge management, internet marketing and library sciences, melding together to form the Dynamic Trio.
How agentic development compressed the SWE hierarchy overnight
Research notes on foundation models, evaluation, and ML in biology
Week three. It’s been another busy week with work and spending any free time I do have with these LLMs, exploring many different things along the way.
(Yes, even with the just-released Claude Opus 4.6.) Generative AI is an amazing technology because it’s so…human. No, we should NOT anthropomorphize AI (that’s a subject for another post) but inevitably we will because…we’re human. It's much easier to understand something we don't understand when we can relate it to something that we do understand.
In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at...
Artificial intelligence will reshape how wars are fought, and the United States enters this era with genuine advantages. American companies build the most capable models in the world. US-based chip designers dominate the advanced semiconductor supply chain. Private investment in AI flows into American firms at a rate that dwarfs every other nation. These are
If you watch one thing this weekend, watch the video about being misled about renewable energy. Spoiler: It’s not just about renewable energy. I still haven’t decided what I think about coding with AI. It has been a great help and I see all the drawbacks. My links reflect this. Leadership Culture is built on ‘moments of truth’ [Podcast] - not a technology company, but still very applicable to all organisations.
Thanks to The Workflow and my AI pair, Claude Opus 4.6.
I built an MCP that lets Claude see its own Excalidraw diagrams, iterate on them, and save them to Obsidian.
I wrestled current LLMs into behaving like my childhood AI hero: here’s what worked, what didn’t, and why. Uno is my favorite character from my favorite Italian childhood comic PKNA. He’s the friendly and sarcastic AI helping Donald Duck in adventures spanning 56 issues in the mid ’90s. My first attempt at re-creating him was a complicated Excel spreadsheet in my early teens with lots of nested IF functions. This is my second attempt.
Three-way comparison of Kimi K2.6, Claude Opus 4.6, and GPT-5.4 on agentic coding benchmarks, cost, and real trade-offs for production teams.
I built a benchmark that pits coding agents against each other in a bug-finding treasure hunt.
Claude Code creator Boris Cherny on building AI-powered coding tools, parallel agents, and how the engineer's role is evolving in an AI-first world.
Random thoughts | about
Claude Opus 4.6 dropped yesterday. Agent Teams is in research preview. The Superpowers plugin now offers subagent-driven execution. Plan Mode still exists. A...
The software space is facing serious market concerns this week, after the release of new AI tools from AI triggered a market sell-off.
I'll be the first to admit it. Back on February 5th, when Anthropic dropped Claude Opus 4.6 and OpenAI fired back with GPT-5.3 Codex on the same day, I
Era of FOMO - Flock of meandering oracles - is unaffordable for me
Claude Code has gotten extremely good at finding security vulnerabilities, and this is only the beginning.
Yeah, that title’s kind of sensationalist. “You won’t believe what happened next!” Since I’m a human and we pack-bond with and anthropomorphize everything, I actually …
TL;DR: I theorize that coding agent errors compound over time leading to increasingly worse outcomes as a session continues. I built Blackbird based on this theory, which restarts each task in a plan with a fresh context window, with a new task’s instructions as its starting point. This minimizes deviation from intention leading to better outcomes.
First impressions of Opus 4.6, and two small tools—an interview plugin and a markdown annotator—for staying engaged with your own work.
AI everywhere, agents everywhere. We just finished the first quarter of 2026, and a lot happened in those first 3 months. It feels like 3 y...
I was listening to Dario Amodei’s interview with Dwarkesh Patel and found his insights into how Anthropic plans its capex investments and path to profitability quite fascinating. They need to decide how much compute to build for the next 2 years in advance, based on current demand, because the data centers take 2 years to build. If they overestimate demand, they won’t have enough profit in the following years and will go bankrupt; if they underestimate it, they won’t be able to meet demand and will risk losing customers to competitors. This is what he calls their cone of uncertainty. This sentiment felt weird to me because OpenAI seems to be aggressively bullish on its capex investments. In fact, Sam Altman disclosed they will be spending $1 trillion on compute infra across Microsoft, Oracle, Nvidia and CoreWeave between 2025 and 2035 while also partnering with Cerebras. So why do these 2 AI companies have completely different capex investment strategies?
Last week, I built a multiplayer Minecraft-like game with my 8-year-old daughter. Not in months. Not in weeks. In under a day.
Recap of my short posts on LinkedIn in February AI Slop in Content Writing Dear bloggers, content writers, commentators and social medi...
Claude Opus 4.6 introduces agent teams for parallel coding tasks. But multi-agent coordination creates new security vulnerabilities the industry hasn't solved.
CI dropped from 37 minutes to 9, at 35% lower cost. What we learned about mixing Claude and Codex, clearing the hidden queue, and where the real multiplier came from.
47 posts tagged ‘codex’. OpenAI's coding agent tools: Codex CLI, Codex Desktop, Codex Cloud.
199 posts tagged ‘llm-release’. New releases of various LLMs.
This week: OpenAI vs Anthropic, just use Postgres and the enshittification of API tooling
On AI agents, side projects, and shipping
Went to FOSDEM 2025, saw talks, met people, drank beer. Again.
Opus 4.6 Finds Vulns the Way Human Testers Do, The SaaSpocalypse, Malicious OpenClaw Skills, New Urgency in Building, and more