METR's Task-Completion Time Horizons on a linear scale. Each point is an AI model; the curve is an exponential fit.
Reading club meetings are shown as diamonds along the top.
Legend: AI model (METR); reading club meeting; research sprint; exponential fit.
METR's Task-Completion Time Horizons has become one of the most-watched trackers of AI progress in frontier model evaluation. It is updated after major model launches and routinely cited in timeline debates among researchers, journalists, and policymakers. We follow each new update closely in this reading club.
The benchmark measures the length of task, as timed by human experts, that a frontier AI agent can complete autonomously. Each model is run against a suite of self-contained, automatically scorable tasks drawn from software engineering, machine learning R&D, and cybersecurity. METR then fits a logistic curve to the model's success rate as a function of human task duration and reports the 50% time horizon: the human-expert task duration at which the model is predicted to succeed half the time. Longer horizons indicate greater autonomous capability on this class of work.
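To make the 50% time horizon concrete, here is a minimal Python sketch of the idea, not METR's actual pipeline. It assumes success probability is modeled as a logistic function of the base-2 logarithm of human task duration, fits the two parameters with Newton's method, and solves for the duration at which predicted success is 50%. The function names (`fit_logistic`, `horizon_at`), the tiny ridge term, and the task data are all made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, iters=50, ridge=1e-6):
    """Fit p = sigmoid(a + b*x) by Newton's method and return (a, b).

    The small ridge term is an assumption added here only to keep the
    2x2 linear solve numerically stable; it is not part of the source.
    """
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            g_a += y - p            # gradient of the log-likelihood
            g_b += (y - p) * x
            h_aa += w               # entries of X^T W X
            h_ab += w * x
            h_bb += w * x * x
        g_a -= ridge * a
        g_b -= ridge * b
        h_aa += ridge
        h_bb += ridge
        det = h_aa * h_bb - h_ab * h_ab
        da = (h_bb * g_a - h_ab * g_b) / det
        db = (h_aa * g_b - h_ab * g_a) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def horizon_at(a, b, p=0.5):
    """Task duration (minutes) at which predicted success equals p."""
    logit = math.log(p / (1.0 - p))
    return 2.0 ** ((logit - a) / b)


# Hypothetical results for one model: (human-expert minutes, success 0/1).
runs = [(1, 1), (2, 1), (4, 1), (8, 1), (15, 0),
        (30, 1), (60, 0), (120, 1), (240, 0), (480, 0)]

xs = [math.log2(minutes) for minutes, _ in runs]
ys = [outcome for _, outcome in runs]

a, b = fit_logistic(xs, ys)
h50 = horizon_at(a, b)
print(f"50% time horizon: {h50:.0f} minutes")
```

The same fitted curve yields horizons at other success thresholds, for example `horizon_at(a, b, p=0.8)`, which is shorter than the 50% horizon whenever the fitted slope `b` is negative.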