Best practices and common patterns for effectively evaluating AI agents...
On the engineering challenges and lessons learned from building Claude's Research system
The project I have been working on for the past few months involves building enterprise agents to assist internal users with their workloads. For context, we...
Context engineering, semantic layers, and the evolution of retrieval for agentic AI
Any sufficiently advanced technology is indistinguishable from magic. — Arthur C. Clarke What are AI agents? Simon Willison crowdsourced a lot of definitions that focus on: 1) Using AI to take action on the user’s behalf in the real world (i.e. what the agent does) 2) Using AI to control a loop or complex flow (i.e. how the agent does it). An AI agent takes a sequence of actions based on an AI-determined control flow. Agents use prompts as the CPU of a Turing machine that can manage state, memory, I/O, and control flow. The agent can access the Internet and tools to perform compute tasks, retrieve info, take actions via APIs, and use the outputs to determine next steps in a loop or complex control flow. Maybe even control a browser or computer. In this post, we’ll try to develop a roadmap of agent concepts and patterns to learn, and resources to learn them.
I started this series because I’d been reading multi-agent papers for weeks and wanted a map I wished I’d had on day one. This is the last post. I want to close it by laying out what the field stil...
Managing your coding agent's context is super important - a bloated context window will erode the quality of your agent's work over time. Learn some new techniques for trimming irrelevant details from your conversation history while retaining what matters.
If 2024 was the year of AI experimentation, 2025 was the year of industrialization. The speculative boom around generative AI has rapidly matured into the fa...
Context rot is the measurable performance degradation LLMs experience as input length increases. Chroma tested 18 frontier models and found every one gets...
Looking at actual token demand growth, infrastructure utilization, and capacity constraints - the economics don't match the 2000s playbook like people assume
Software modularized nouns. AI modularizes verbs. Enabling is over. It's time to do.
Some reasons to be skeptical, and some reasons to be optimistic.
50% cost reduction with subagent architecture for AI coding. Capable models for planning, fast models for building. Real metrics from Goose.
I’ve been reading multi-agent systems papers for weeks trying to figure out where the field actually is, and the honest answer is that it moves fast enough that any single paper is a snapshot, not ...
Agent design patterns.
124 posts tagged ‘gpt’. The GPT series of Large Language Models from OpenAI.
190 posts tagged ‘prompt-engineering’. The subtle art and craft of effectively prompting and building software on top of LLMs.
Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don’t follow a pre-determined sequence of actions.
Context windows, compression, and "folding the dough"