METR + LaAl Timeline

METR's Task-Completion Time Horizons benchmark has become an iconic tracker of AI progress and one of the most-watched releases in frontier model evaluation. It is updated after major launches and routinely cited in timeline debates among researchers, journalists, and policymakers. We follow each new update closely in this reading club.

The benchmark measures the length of task, as timed by human experts, that a frontier AI agent can complete autonomously. Each model is run against a suite of self-contained, automatically scorable tasks drawn from software engineering, machine learning R&D, and cybersecurity. METR then fits a logistic curve to the model's success rates as a function of human-expert task duration and reports the 50% time horizon: the duration at which the model is predicted to succeed half the time. Longer horizons indicate greater autonomous capability on this class of work.
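To make the 50% time horizon concrete, here is a minimal sketch of the idea in Python. It is not METR's code: the function name fit_time_horizon, the choice of log2(minutes) as the predictor, the plain Newton's-method fit, and the run data are all assumptions made for illustration, and the sketch omits any weighting, regularization, or uncertainty estimation that METR's own pipeline may apply.

```python
import math

def fit_time_horizon(runs, iterations=50):
    """Fit P(success) = sigmoid(a + b * log2(minutes)) by Newton's method.

    runs: list of (human_minutes, succeeded) pairs (hypothetical format).
    Returns (a, b, horizon_minutes), where horizon_minutes is the task
    duration at which the fitted success probability equals 0.5.
    """
    xs = [math.log2(minutes) for minutes, _ in runs]
    ys = [1.0 if ok else 0.0 for _, ok in runs]
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Gradient (g) and negative Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # degenerate data (e.g. a single duration); stop updating
        # Newton step: solve the 2x2 system h * delta = g.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    if b >= 0:
        raise ValueError("success does not decrease with task duration")
    # sigmoid(a + b * x) = 0.5 exactly when a + b * x = 0, so x = -a / b.
    return a, b, 2.0 ** (-a / b)

# Made-up runs for one model: (human-expert minutes, did the agent succeed?)
hypothetical_runs = [
    (8, True), (8, True),
    (16, True), (16, False),
    (32, True), (32, False),
    (64, True), (64, False),
    (128, False), (128, False),
]

a, b, horizon = fit_time_horizon(hypothetical_runs)
print(f"50% time horizon: {horizon:.1f} minutes")
```

The made-up outcomes are mirror-symmetric around the 32-minute tasks, so the fitted curve crosses 50% at about 32 minutes. The same fitted curve yields other thresholds too: solving a + b * x = logit(q) for x gives the duration at any target success rate q.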

The Timeline