A refreshed, more powerful Claude 3.5 Sonnet, Claude 3.5 Haiku, and a new experimental AI capability: computer use.
Chrome extension built on Holo3 model tests agents that can navigate websites and carry out tasks without integrations.
This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …
A critical point to help you understand and work with AI effectively
A summary of a new report: why it's time to start taking action now to prepare for potential AI sentience
Notes from Three Weeks in the Valley
Reasoning models were as big an improvement as the Transformer, at least on some benchmarks
Claude Code’s Head of Product Cat Wu shares how teams should rethink their workflows and roadmaps in the face of rapidly evolving model intelligence.
Imagine creating business dashboards by simply describing what you want to see. This is the promise of Generative Business Intelligence (GenBI). The key lies in the declarative BI stack where dashboards and metrics are defined as code rather than hidden behind graphical user interfaces. In this guest blog by Simon Späti, we explore the possibilities of GenBI today.
Earlier today, Amazon Q Developer announced support for inline chat. Inline chat combines the benefits of in-IDE chat with the ability to directly update code, allowing developers to describe issues or ideas directly in the code editor, and receive AI-generated responses that are seamlessly integrated into their codebase. In this post, I will introduce the […]
understand + work backwards from the root goal • don’t rely too much on permission or encouragement • make success inevitable • find your angle • think real hard • reflect on your thinking
Contribute to corbt/agent.exe development by creating an account on GitHub.
AI is coming for all markets
Anshuman Bhartiya - Staff Security Engineer, AppSec Tech Lead, and co-host of The Boring AppSec Podcast.
Principal AI Architect. Creator of open-strix, a harness for building agent teams. Writing about AI architecture, stateful agents, and what happens when you give AI memory.
Two big announcements from Anthropic today: a new Claude 3.5 Sonnet model and a new API mode that they are calling computer use. (They also pre-announced 3.5 Haiku, but that’s …
Anthropic Computer Use with Modal Sandboxes. Contribute to yasyf/anthropic-computer-use-modal development by creating an account on GitHub.
The development of AI that is more broadly capable than humans will create a new and serious threat: *AI-enabled coups*. An AI-enabled coup could be staged by a very small group, or just a single person, and could occur even in established democracies. Sufficiently advanced AI will introduce three novel dynamics that significantly increase coup risk. Firstly, military and government leaders could fully replace human personnel with AI systems that are *singularly loyal* to them, eliminating the need to gain human supporters for a coup. Secondly, leaders of AI projects could deliberately build AI systems that are *secretly loyal* to them, for example fully autonomous military robots that pass security tests but later execute a coup when deployed in military settings. Thirdly, senior officials within AI projects or the government could gain *exclusive access* to superhuman capabilities in weapons development, strategic planning, persuasion, and cyber offense, and use these to increase their power until they can stage a coup. To address these risks, AI projects should design and enforce rules against AI misuse, audit systems for secret loyalties, and share frontier AI systems with multiple stakeholders. Governments should establish principles for government use of advanced AI, increase oversight of frontier AI projects, and procure AI for critical systems from multiple independent providers.
Command-line access is the most powerful tool for LLMs
In May 2022, I made the decision to step out of the classroom and apply for a PhD, broadly focused on digital texts. I grabbed a few articles, like Bradley Robinson’s on automated writing tec…
A break in format - the quiet art of attention, conferences, Vercel and micro-frontends, and some recommendations.
Introducing PDF support, computer use, and an xAI Grok provider
I was watching some old UNIX videos the other day. AT&T must have made them as PR back in the day. They seem fresh and timely even today, forty years later, like one of those classic rock albums that never sounds stale. They were showing example after example of end-users – almost everyone at AT&T, even the non-technical staff – using the shell and writing shell scripts to compose complex functionality out of simple programs. The Unix philosophy in action!
Zhengdong Wang’s personal website
About that time I wrote and published an app to the Apple App Store without knowing how to code
When SWE-bench scores improved 50% in just 14 months—from Claude 3.5 Sonnet's 49% in October 2024 to Claude 4.5 Opus's 74.4% in January 2026—you'd think AI agents had conquered software engineering. Yet companies deploying these agents at scale tell a different story. Triple Whale's CEO described their production journey: "GPT-5.2 unlocked a complete architecture shift for us. We collapsed a fragile, multi-agent system into a single mega-agent with 20+ tools... The mega-agent is faster, smarter, and 100x easier to maintain."
These ten laws are a checklist for CMS developer experience in a world where humans and agents both need to build on top of your content system.
Behavioural economics, data science and artificial intelligence.
Agentic interfaces will revolutionise interaction with digital systems via intelligent, unobtrusive service delivery.
C-Level leaders can use generative AI for productivity and innovation, applying mental models to enhance business processes
If you give a computer a computer...
If GitHub themselves have a native code review bot, why not just use it?
Preparing for AI Progress
Apple Intelligence, so far.
A walk through what is possible with RL: drones beating world champions, robots balancing on yoga balls, AIs that paint, fusion reactors, and the ad you saw last Tuesday.
A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past …
Anthropic is one of the first to go beyond just screen vision.
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way. - cline/cline
This piece explains some of Penpot's relevant findings around AI and UI Design, what we’re building (and why) and what you should expect from us in the future.
Looking back at my predictions to see what I got right, wrong, and what's still playing out
A snapshot of the current AI tools & techniques I’ve found useful.
Anti-patterns observed while working extensively with LLMs — from redundant context to over‑engineering.
The new challenger in the Cline-assisted coding space
It's been a while since I did a tech roundup! A lot has been announced in the way of AI—let's dive in. Anthropic has released updates, including a "comput