GeistHaus
79 pages link to this URL
America's $1T AI Gamble

US AI-Related Investment Keeps Breaking Records, With Total Software, Computer, & Data Center Spending Now Exceeding $1T Per Year

1 inbound link article en
2025: The year in LLMs

This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …

1 inbound link article en ai 2024, openai 419, generative-ai 1791, llms 1757, anthropic 282, gemini 185, ai-agents 111, pelican-riding-a-bicycle 113, vibe-coding 91, coding-agents 202, ai-in-china 95, conformance-suites 10
Tokenmaxxing

I burnt 250M tokens in a day. Tokenmaxxing is the deliberate practice of maximizing AI token consumption through parallelization.

0 inbound links article en [ai, productivity] [tokenmaxxing, AI productivity, AI agents, parallel workflows, METR research, AI token consumption, agent orchestration]
Reflections on 2025

The Compute Theory of Everything, grading the homework of a minor deity, and the acoustic preferences of Atlantic salmon

2 inbound links article en
Against the METR graph

METR’s benchmark has become a bellwether of AI capability growth, but its design isn’t up to the task, argues Nathan Witkin

2 inbound links article en
Agent Autonomy - Part 2: Going Beyond Algorithms

Educational demos, marketing materials, and creative work—problems without a mathematical harness. Part 2 of the Agent Autonomy series shows how orchestrated agent evolution can solve subjective problems through skill-based guidance and multiple independent evaluators.

0 inbound links article en Machine Learning, Product Management, Leadership, Software Engineering
From Behavior to Judgment: Designing Evaluation for Agentic Systems » { design@tive } information design

Learn how to evaluate agentic AI systems using dual evaluation, LLM-as-a-judge, and hybrid methods that go beyond observability.

0 inbound links article en Artificial Intelligence, Project Management, Talks & Workshops, User Experience · Agentic AI, Artificial Intelligence, Decision Making, Human-Computer Interaction, Human–Agent Centered Design, product developer, Product Management, Usability, User Experience, UX
How to React to a New Frontier Model

Gemini 3 is out. The benchmarks are genuinely incredible. But it’s hard to know what to do about it. 41% on HLE. 45% on ARC-AGI-2. These are colossal achievem...

0 inbound links website en tw-jekyll
Are we in a bubble?

Intro/Disclaimer: Trade at your own risk! Just like women think “I’m special” and “This time it’ll be different” when it comes to relationships, men think the same way when it comes to stock trading. Warren Buffett’s advice has remained the same for years: Most people should simply dump their money into an index fund that tracks the market and forget about it rather than actively trade and lose money. I too have recommended that same approach in one of my most-read posts, but given my actions, I think I need to add another oft-repeated piece of advice: “do as I say and not as I do.”

0 inbound links article en posts Thoughts, AI
AI Slow vs. Fast Takeoff, Defined

Cut through the rhetoric: what people mean by each, why benchmarks keep saturating, and how super-exponential autonomy reframes the next few years

1 inbound link article en
Towards end-to-end automation of AI research - Nature

An artificial intelligence system can produce research papers with minimal human involvement, even passing the first round of peer review for the workshop of a main machine learning conference.

2 inbound links article en Computer science, Mathematics and computing · Computer science, Mathematics and computing, Science, Humanities and Social Sciences, multidisciplinary · CC BY 4.0
Towards end-to-end automation of AI research - Nature

An artificial intelligence system can produce research papers with minimal human involvement, even passing the first round of peer review for the workshop of a main machine learning conference.

5 inbound links article en Computer science, Mathematics and computing · Computer science, Mathematics and computing, Science, Humanities and Social Sciences, multidisciplinary · CC BY 4.0
Designing for delegation | Ad Hoc

Agentic, delegation-based services could reshape how people access government, cutting administrative burden – if agencies start building the right design patterns now.

1 inbound link website en
The Assembly Language of Knowledge Work

The work most of us are doing right now—the clicking, the tabbing between windows, the copy-pasting, the endless typing interspersed with bursts of genuine cognition—will soon seem as archaic as programming in assembly language—the low-level instruction set for a machine that is about to be automated away.

The Atom of Work: The Read-Cognify-Write Loop

Break down any task performed by a knowledge worker, and you find the same atomic structure repeating itself:

Thoughts by a non-economist on AI and economics

Crossposted on LessWrong. Modern humans first emerged about 100,000 years ago. For the next 99,800 years or so, nothing happened. Well, not quite nothing. There were wars, political intrigue, the in…

2 inbound links article en Uncategorized
The state of AI safety in four fake graphs

Here is a quick overview of my intuitions on where we are with AI safety in early 2026: So far, we continue to see exponential improvements in capabilities. This is most visible in the famous “METR…

0 inbound links article en Uncategorized
Mathematics in the Library of Babel — Daniel Litt

Mathematics isn't only about saying true things. It's about asking the right questions, being confused, stumbling about, getting distracted, being wrong, recognizing when you're wrong, being stuck. Mostly being stuck. It's about clinging to a giant edifice and feeling it out until you understand som

4 inbound links article en
Productivity and AI: it's the tool, not the model

Every week, a new “SOTA” (State of the Art) model is announced, promising higher reasoning capabilities and (often) lower costs. We could be led to think that we are entering an era of infinite, frictionless productivity. But the reality is messier. While the models are getting smarter, the gap between “intelligence on tap” and “completing a task” is managed by our tools and right now, that tooling interface is becoming a major source of friction. As we will see, this isn’t just a developer’s dilemma in the context of new AI-assisted coding interfaces. It is a preview of the “retooling tax” that every professional domain must soon learn to navigate. The paradox is simple: as models improve, productivity bottlenecks increasingly shift away from intelligence itself and toward the tools that mediate access to it.

The race for better models

This is the popular meme reflecting the merry-go-round of weekly improvements of AI models:

(Source. Other versions of this meme do include Anthropic’s Claude, if you wonder.)

LLMs become more capable, cheaper, and available on tap, to the point that the new best-performing model can be indistinguishable from the previous one, simply because models are now so smart that the tasks we perform are not complex enough to clearly differentiate between “a great model” and an “even greater model”: both perform equally well on the tests. This is the experience of Simon Willison when testing a preview of Claude Opus 4.5 on November 24, 2025:

It’s clearly an excellent new model, but I did run into a catch. My preview expired at 8pm on Sunday when I still had a few remaining issues in the milestone for the alpha [of his coding project]. I switched back to Claude Sonnet 4.5 and… kept on working at the same pace I’d been achieving with the new model. With hindsight, production coding like this is a less effective way of evaluating the strengths of a new model than I had expected. I’m not saying the new model isn’t an improvement on Sonnet 4.5—but I

The macroeconomics of agentic AI

During the Industrial Revolution, machines displaced most Western agricultural workers, who later went on to cities and earned higher wages. However, some time later, the automobile displaced all horses, and they still haven’t reallocated to new professions. In the coming decades, are we the 19th century peasant, or the horse?

0 inbound links article en
AstroCoder

Contents: Key Features of AstroCoder, …

0 inbound links article en
AI vs Human Attention Spans

AI models were initially given human tests (like the LSAT or MCAT) or tests written for AIs like the MMLU. However, since they’ve mastered so many of these tests and the tests don’t always carry over to real-world abilities, new measures of progress are needed. One way to rank the difficulty of a task is by how long it would take a human to complete it. In March,

0 inbound links article en
Agentic Engineering: Building Without Writing — Bill de hÓra

tars is a personal AI assistant with CLI, Web UI, Email, and Telegram channels, persistent memory, hybrid search, and integration with tools I used all the time. About 35 features, 14kloc of Python and 600 tests all told. I didn't write any of it. The experience was different enough from traditional de

2 inbound links article en
2025: The year in LLMs

This is the third in my annual series reviewing everything that happened in the LLM space over the past 12 months. For previous years see Stuff we figured out about …

25 inbound links article en ai 2014, openai 418, generative-ai 1785, llms 1751, anthropic 282, gemini 185, ai-agents 110, pelican-riding-a-bicycle 113, vibe-coding 90, coding-agents 200, ai-in-china 95, conformance-suites 10