Back in October I wrote about small models being the future. The thesis: you don’t need a trillion parameters for most tasks. I want to get to a point where a collection of small models (30B parameters or fewer) running locally can handle my coding workflow.

But there’s a catch that hits every model, big or small: the context window. A 128K or 200K window sounds great on paper, but in practice you burn through that budget fast. Once you’re past ~40% of capacity, you’re in the dumb zone: the model starts drifting, hallucinating, and forgetting its own instructions. This isn’t a small-model problem. It’s a consequence of how attention works.

I’ve been digging into Recursive Language Models lately. It’s research, not production-ready, but I think the pattern might finally change this.
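To put rough numbers on the ~40% rule of thumb, here’s a minimal Python sketch. It assumes the dumb-zone threshold is a flat percentage of the advertised window (my heuristic, not a measured constant) and that you already have a token count from your model’s tokenizer. `effective_budget` and `in_dumb_zone` are hypothetical helpers for illustration, not part of any library.

```python
# A minimal sketch of the ~40% "dumb zone" heuristic.
# Assumption: the threshold is a flat percentage of the context window.
# Integer arithmetic avoids floating-point rounding on the budget.

DUMB_ZONE_PERCENT = 40  # assumption: rule of thumb, not a measured constant


def effective_budget(context_window: int, percent: int = DUMB_ZONE_PERCENT) -> int:
    """Tokens you can use before crossing the threshold."""
    return context_window * percent // 100


def in_dumb_zone(tokens_used: int, context_window: int,
                 percent: int = DUMB_ZONE_PERCENT) -> bool:
    """True once the session has used more than `percent` of the window."""
    return tokens_used > effective_budget(context_window, percent)


if __name__ == "__main__":
    for window in (128_000, 200_000):
        print(f"{window:,} window -> {effective_budget(window):,} usable tokens")
    # Hypothetical session that has already loaded 60K tokens of files and chat.
    print(in_dumb_zone(60_000, 128_000))
```

Under that assumption, a 128K window leaves about 51K usable tokens and a 200K window about 80K, far less than the headline numbers suggest.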