Analysis of AI agent reliability using survival models. Re-examining METR's task data with Weibull distributions reveals insights about long-horizon AI autonomy.
Contrary to my earlier hypothesis , AI agents probably don't have a constant hazard rate / half-life. Instead their hazard rates systematically decline as the task goes on. This means that AI agents’ success rates on tasks beyond their 50%-horizon are better than my constant-hazard-rate model su