The Extreme Inefficiency of RL for Frontier Models — Toby Ord
The new scaling paradigm for AI reduces the amount of information a model could learn per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling.