Best practices and common patterns for effectively evaluating AI agents...
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
Long-running AI agents need durable channels: workflow IDs, event logs, resumable streams, typed signals, safe cancellation, and user-visible checkpoints.
A project-based course on designing the environments, state, verification, and control systems that make Codex and Claude Code reliable.
Forget agent swarms and complex orchestrators. The secret to overnight AI coding is a bash for-loop. I break down the Ralph Wiggum technique and show you how to actually ship working code with long-running agents.
Claude API Documentation
Part 4 of my gym-coding series: using a Ralph loop to migrate an entire SDK dependency in 81 minutes, what worked, what didn't, and where autonomous loops shine.
You ask Claude Code to refactor your authentication system from basic auth to OAuth2. It responds brilliantly: “I’ll approach this in 5 phases. First, extract interfaces to decouple the authentication provider. Second, implement the OAuth2Provider class. Third, update the user model for external provider IDs. Fourth, migrate existing sessions. Fifth, update API endpoints and remove deprecated code.” Excellent. Systematic thinking. Clear dependencies. You’re impressed. Phase 1 executes perfectly. The agent extracts clean interfaces, maintains backwards compatibility, writes comprehensive tests. You commit the changes. Phase 1 complete.
Type `ralph "prompt"` to start open code in a ralph loop. Also supports a prompt file & status check. Open Code, Claude Code, Codex, Copilot - Th0rgal/open-ralph-wiggum
Awesome list for AI agent harness engineering: tools, patterns, evals, memory, MCP, permissions, observability, and orchestration. - ai-boost/awesome-harness-engineering
Should you go all-in on agents or are they a hype that will pass? I created a survey about the agentic coder landscape to get a better understanding of how agents are affecting development work in the real world.
From the Zed Blog: Agents handle typing so we can focus on thinking.
The agent is the model plus the harness. The runtime is where the harness lives. As models get better, the structure we put around them turns from scaffoldin...
The firm is adding a second class of worker: the agentic employee. But intelligence alone is not the product—the harness is. The winners will redesign themselves into cognitive factories where humans steer, agents execute, and organizational architecture becomes the durable competitive edge.
Lessons from building infrastructure projects with long-running coding agents.
My tech friends tried AI coding tools and said they don't work. We use the same models. The difference is the harness. Here's my full setup.
Network speed tester including server discovery, latency measurement, download and upload speed testing. - FrankRay78/NetPace
The autonomous coding tool based on Claude Code. Contribute to elct9620/autonoe development by creating an account on GitHub.
On Tuesday I did a talk to our internal AI group, along with talks by excellent colleagues, on Context Engineering. The very next day, Anthropic released a blog post on Effective harnesses for long-running agents and updated their prompting best practices guide. Note to Anthropic: please can you release relevant material the day before I do a talk rather than afterwards! I talked about managing context size, how LLM accuracy declines as the context window fills past 50%, my mental model of LLMs as amnesiac pedants, and how this drives Claude Code features like its heavy use of todo lists, summarising context into markdown files, etc.
Recently, due to work requirements, I needed to evaluate AI Agents and also ended up comparing them with Agentic AI. If you’re also evaluating or adopting AI …
The Ralph Loop technique — a trick for making AI Coding Agents automatically repeat execution — was popular last year, but I didn’t trust whether it was safe enough …
Managed agents now handle sessions, sandboxes, tracing, and events. Keep local harness rules for taste, evidence, privacy, and publishing safely today.
Agent design patterns.
A mental model for building trust in coding agents through feedforward guides, feedback sensors, and iterative harness engineering.
Update, February 6: I've published an in-depth guide with advanced tips for secure credentials, memory management, automations, and proactive work with OpenClaw for our Club members here. For the past week or so, I’ve been working with a digital assistant that knows my name, my preferences for my morning routine, how I like to use
A long-running agent can keep making progress over hours, days, or weeks. It can do this across many context windows and sandboxes, recover from failure, lea...
On agent orchestration patterns, why design and critical thinking are the new bottlenecks, and whether we should let go of looking at code
Google Approach to Stopping Prompt Injection, Response to Cory Doctorow's Latest Book/Talk, AI Persuasion Risk, Screens Most Interesting Thing to Kids, and more...