How do we tell the difference between new model training runs vs improvements to existing models?

Some models require massive compute, time, and money for a new pretraining run, while others advance through optimization of an existing model. Since large improvements can come from either new training runs or reinforcement learning, researchers need to distinguish between these approaches to accurately forecast AI progress.

Why this distinction matters for forecasting

The relationship between training methodology and capability jumps is particularly important when trying to predict future AI development. Once I recognized that the GPT-4 to o1 jump is comparable in scale to the GPT-3.5 to GPT-4 jump, my answer changed from requiring three jumps to two jumps to achieve AGI (tentatively meaning an AI system capable of doing 99% of jobs).

I intuited this conclusion from how much more I outsourced to chatbots for editing, coding, and thinking in 2025 compared to 2023, not from how much I expect inference compute to scale. However, understanding whether such jumps come from new training runs or improvements to existing models would provide additional context for interpreting these patterns and making better predictions.

GPT-4.5 vs o1

Recent releases from OpenAI illustrate this issue. Their blog posts and system cards reveal that GPT-4.5 is primarily a new pretrained model focused on "scaling unsupervised learning," while o1 is explicitly an RL-based improvement targeted at enhancing reasoning.

What's striking is that GPT-4.5 doesn't feel as big of an improvement as I would expect from a new pretraining run, while o1 feels like a significant jump despite being primarily an RL improvement on an existing model. This observation makes me even more curious about the relationship between development approach and capability improvements. Understanding gains from pretraining vs RL refinement would make it much easier to predict future jumps in capabilities.

Helpful info for forecasting and safety evaluations

I thought of some other details that could help predict trajectories of different techniques. These include:

1. Resource Information

  • Split between foundation model pretraining and post-training methods (like reasoning-focused RL in o1)
  • When training occurred and how long it took
  • Number of training runs (including failed attempts)

2. Lineage Information

  • Whether progress came from new pretraining or from an RL-based improvement on an existing model
  • Which specific base model was built upon for improvements
  • Whether the base model was publicly released or internal

Not inadvertently increasing AI capabilities

For safety researchers, understanding whether major capability jumps typically come from new training runs or from RL improvements could help prioritize what to work on. I am uncertain how to balance releasing information that could help forecasting and evals against revealing where capability progress can be made. A compromise could be releasing this information a year after a model's release.

Tracking progress and future forecasting

Without clarity on whether capability leaps come from new pretraining or from optimization of existing models, our ability to make accurate predictions is limited. While we can observe capability jumps through benchmarks and real-world performance, understanding the underlying development methods could provide context that benchmarks alone do not.

If we knew, for instance, that o1's reasoning capabilities came primarily from RL techniques rather than massive new pretraining, this could suggest that future reasoning improvements might be achieved through similar post-pretraining optimization approaches rather than requiring entirely new foundation models. This kind of insight would be valuable for forecasting how and when we might see the next major capability jumps.