GeistHaus

Effective harnesses for long-running agents

anthropic.com

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

39 pages link to this URL
Long-Running AI Agents Need Durable Channels

Long-running AI agents need durable channels: workflow IDs, event logs, resumable streams, typed signals, safe cancellation, and user-visible checkpoints.

0 inbound links · website · en · AI & Technology · aiagents, durable-execution, workflows, agent-runtime, webhooks, ai-engineering
The Process Storage Gap

You ask Claude Code to refactor your authentication system from basic auth to OAuth2. It responds brilliantly: “I’ll approach this in 5 phases. First, extract interfaces to decouple the authentication provider. Second, implement the OAuth2Provider class. Third, update the user model for external provider IDs. Fourth, migrate existing sessions. Fifth, update API endpoints and remove deprecated code.” Excellent. Systematic thinking. Clear dependencies. You’re impressed. Phase 1 executes perfectly. The agent extracts clean interfaces, maintains backwards compatibility, writes comprehensive tests. You commit the changes. Phase 1 complete.

Hidden Technical Debt of AI Systems: Agent Harness

The agent is the model plus the harness. The runtime is where the harness lives. As models get better, the structure we put around them turns from scaffoldin...

0 inbound links · article · en · blogs · AI Engineering, Agent Systems, Compound AI Systems, MLOps, Generative AI, LLM, Reinforcement Learning
Building the Cognitive Factory — Change Log

The firm is adding a second class of worker: the agentic employee. But intelligence alone is not the product—the harness is. The winners will redesign themselves into cognitive factories where humans steer, agents execute, and organizational architecture becomes the durable competitive edge.

0 inbound links article en
Context Engineering for Claude Code

On Tuesday I did a talk to our internal AI group, along with talks by excellent colleagues, on Context Engineering. The very next day, Anthropic released a blog post on Effective harnesses for long-running agents and updated their prompting best practices guide. Note to Anthropic: please can you release relevant material the day before I do a talk rather than afterwards! I talked about managing context size, how LLM accuracy declines as the context window fills past 50%, my mental model of LLMs as amnesiac pedants, and how this drives Claude Code features like its heavy use of todo lists, summarising context into markdown files, etc.

0 inbound links · article · en · posts · LLMs, Anthropic, Agents
From Agent to Agentic AI

Recently, due to work requirements, I needed to evaluate AI Agents and also ended up comparing them with Agentic AI. If you’re also evaluating or adopting AI …

0 inbound links · article · en · LLM, AI, Experience
When Claude Code Runs for Hours

The Ralph Loop technique — a trick for making AI Coding Agents automatically repeat execution — was popular last year, but I didn’t trust whether it was safe enough …

0 inbound links · article · en · LLM, AI, Experience, Claude Code, Agentic Engineering, Harness Engineering
Managed Agents vs Local Agent Harnesses: What to Keep

Managed agents now handle sessions, sandboxes, tracing, and events. Keep local harness rules for taste, evidence, privacy, and publishing safely today.

0 inbound links · website · en · AI & Technology · ai-agents, managed-agents, openai-agents-sdk, claude, codex, agent-harness, ai-development
OpenClaw Showed Me What the Future of Personal AI Assistants Looks Like

Update, February 6: I've published an in-depth guide with advanced tips for secure credentials, memory management, automations, and proactive work with OpenClaw for our Club members here. For the past week or so, I’ve been working with a digital assistant that knows my name, my preferences for my morning routine, how I like to use

14 inbound links · article · en · stories · AI, ai experiments, artificial intelligence, clawd, featured, LLMs
Long-running Agents

A long-running agent can keep making progress over hours, days, or weeks. It can do this across many context windows and sandboxes, recover from failure, lea...

Unsupervised Learning NO. 509

Google Approach to Stopping Prompt Injection, Response to Cory Doctorow's Latest Book/Talk, AI Persuasion Risk, Screens Most Interesting Thing to Kids, and more...

0 inbound links · website · en · artificial intelligence, cybersecurity, technology