Anthropic Published RSI Metrics. None of Them Are RSI.

This week, the Anthropic Institute published a progress log titled “When AI Builds Itself.” It opens with a definition: recursive self-improvement is “an AI system capable of fully autonomously designing and developing its own successor.” The document then presents its evidence (code authorship percentages, benchmark saturation rates, speedup multipliers, a dramatic $18,000 research experiment) in a sequence that reads like a progression toward that threshold. What it actually documents is something narrower and more interesting: AI is extremely useful for software development, Anthropic has been aggressive about using it, and the gap between “Claude writes our code” and “AI designs its own successor” is large enough to drive a data center through. The Institute page is also Anthropic’s first public accounting of how much of its engineering has shifted to AI, and the shift is substantial. But the accountant and the alarmist are the same person, and the two jobs pull the document in different directions.

What RSI Requires, and What the Numbers Show

Recursive self-improvement, by Anthropic’s own definition, requires full autonomy and the ability to produce a successor: not just completing tasks, but improving the system that completes them. Improving the training run. Designing the architecture. Selecting the data. That loop closing on itself is what makes “recursive” meaningful. None of the metrics in the document gets near that.

The 80%+ code authorship figure means Claude is the primary author of most of the software Anthropic engineers push to production as of May 2026. That’s a genuine milestone in AI-assisted development. It’s not RSI because engineers still determine what gets built, specify the problems, review the output, and decide what feeds the next training run. The code isn’t improving Claude’s own architecture. It’s maintaining product infrastructure. The 8x productivity multiplier (engineers merging 8 times as much code per day in Q2 2026 compared to 2024) describes the same thing from a different angle: Claude is a very effective coding tool.

The distinction matters because the document is structured to make the metrics look like steps toward the definition. They’re not. They’re measurements of how productive AI-assisted software development has become at one particular company. That’s a story worth telling. “When AI Builds Itself” is not the accurate title for it.

The task-completion horizon data is the most substantive thread in the document. Anthropic traces a progression: Claude Opus 3 handled tasks requiring approximately four minutes of human work in March 2024; Claude Sonnet 3.7 managed roughly 1.5-hour tasks by March 2025; Claude Opus 4.6 handled approximately 12-hour tasks by March 2026. That’s a real trend. The implied doubling time, roughly four months over the most recent year and closer to three over the full two years, is meaningful if it continues. But “handling 12-hour tasks” still describes a capable agent, not a self-modifying one. An AI that can complete a coding task that would take a human twelve hours is impressive. An AI designing its own training pipeline is a categorically different capability.
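The doubling-time arithmetic can be checked directly from the three data points quoted above. The sketch below is a back-of-the-envelope calculation, not Anthropic’s method: it assumes smooth exponential growth between any two points, and the function name `doubling_time_months` is made up for illustration.

```python
import math

# Task horizons quoted in the article, in minutes of human work.
OPUS_3_MAR_2024 = 4          # about four minutes
SONNET_37_MAR_2025 = 90      # about 1.5 hours
OPUS_46_MAR_2026 = 12 * 60   # about 12 hours


def doubling_time_months(start_minutes, end_minutes, months_elapsed):
    """Months per doubling, assuming smooth exponential growth between two points."""
    doublings = math.log2(end_minutes / start_minutes)
    return months_elapsed / doublings


first_year = doubling_time_months(OPUS_3_MAR_2024, SONNET_37_MAR_2025, 12)
second_year = doubling_time_months(SONNET_37_MAR_2025, OPUS_46_MAR_2026, 12)
full_span = doubling_time_months(OPUS_3_MAR_2024, OPUS_46_MAR_2026, 24)

print(f"first year:  {first_year:.1f} months per doubling")
print(f"second year: {second_year:.1f} months per doubling")
print(f"full span:   {full_span:.1f} months per doubling")
```

On these inputs the most recent year works out to four months per doubling, while the first year and the full span come out faster, at under three and a little over three months respectively. With only three points, any of these figures is a rough guide rather than a fitted trend.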

The Caveats That Don’t Make the Title

Anthropic knows the distinction. The caveats are in the footnotes, and the footnotes are careful.

The 80%+ code authorship figure comes with Footnote 3: the attribution pipeline “has gaps,” and 80% is described as the “more conservative” estimate compared to Anthropic leadership’s public claim of 90%+. The 8x productivity multiplier carries its own note: “Lines of code is an imperfect measure… 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain.” That’s a notable caveat for what is being used as headline evidence of progress toward self-improvement.

The 76% success rate on open-ended coding tasks in May 2026 — up 50 percentage points in six months — comes with a footnote acknowledging that “session success is determined by a Claude judge.” Claude evaluating Claude’s own success rates isn’t independent measurement. It’s self-assessment, and the circularity is explicit in the document.

The most dramatic claim — that AI agents recovered 97% of a performance gap on a research task where human researchers managed only 23% — carries three caveats embedded in the body text, not a footnote: “The result didn’t transfer cleanly to production-scale models, and humans still chose the problem and created the scoring rubric.” An experiment that doesn’t transfer to production, on a problem humans selected, scored against criteria humans defined, is a controlled demonstration. It is not evidence of autonomous research capability in any general sense.

The research direction judgment study deserves scrutiny on similar grounds. The document reports that Anthropic’s best model now beats human researchers’ next-step judgment 64% of the time, up from 51% with an older model. What doesn’t lead the comparison: the 129 sessions used were “deliberately picked moments… where we know the human’s choice had room for improvement.” On a separate set of 127 moments “where the human’s next move was already strong,” the model’s suggestions were judged better only about 20% of the time. The 64% figure measures performance on cases selected because the human’s choice left room to improve. The 20% figure measures performance on cases where it did not. The document includes both numbers. The title reflects one of them.
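One way to see how much the selection matters is to pool the two sets. The sketch below is purely illustrative arithmetic on the figures quoted above. It assumes the two sets are disjoint and that weighting them by their sample sizes says something about a realistic mix of research moments, neither of which the source states. The helper `pooled_rate` is a made-up name.

```python
# Figures quoted in the article; the 20% is described there as approximate.
PICKED = (129, 0.64)  # moments chosen because the human's move had room to improve
STRONG = (127, 0.20)  # moments where the human's move was already strong


def pooled_rate(groups):
    """Sample-size-weighted win rate across (count, rate) groups."""
    total = sum(count for count, _ in groups)
    wins = sum(count * rate for count, rate in groups)
    return wins / total


blended = pooled_rate([PICKED, STRONG])
print(f"pooled win rate: {blended:.0%}")
```

Under those assumptions the pooled rate lands in the low 40s, below the 50% mark that would mean the model’s suggestion is preferred as often as the human’s. The real mix of “room to improve” and “already strong” moments in day-to-day research is unknown, so the pooled number is a sanity check on the headline figure, not an estimate of anything the document measured.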

The Model Behind the Numbers

The most impressive metrics in the document — the 52x code optimization speedup, the 97% research gap recovery, the 16-hour task completion horizon from METR’s evaluation — don’t come from the Claude models in Anthropic’s commercial lineup. They come from Claude Mythos Preview.

Mythos Preview is not a product. Anthropic’s own model documentation describes it as “a research preview model for defensive cybersecurity workflows as part of Project Glasswing. Access is invitation-only and there is no self-serve sign-up.” Glasswing is a consortium of roughly 50 organizations — AWS, Apple, Google, Microsoft among the founding 11 — focused on AI-assisted defensive security. Anthropic has stated explicitly: “We do not plan to make Claude Mythos Preview generally available.” It is a held-back frontier model specifically developed for security research, not available to anyone outside the vetted partner list, and it is behind the document’s most striking data points.

The commercially available Claude lineup as of June 2026 — Opus 4.8, Sonnet 4.6, Haiku 4.5 — does not include Mythos Preview. Claude Opus 4.6 is the legacy model cited in the task horizon timeline (12-hour tasks, March 2026). Opus 4.8 is the current flagship. Neither has the capabilities Mythos Preview demonstrated in the document’s research sections.

The RSI progress page presents Mythos metrics alongside the commercial lineup’s productivity numbers without clearly distinguishing them. The result is a continuous-looking narrative — here’s what Claude does today, here’s where it’s headed — where the “today” and the “headed” describe different models with different access levels. The 52x optimization speedup comes with a qualifier that limits its scope regardless: “task-specific speedups depend on how much room for improvement the starting code leaves.” Starting from already-optimized production code would produce a much smaller multiplier. That’s not an invalidation; getting from bad code to very good code automatically is useful. But it’s a constraint on what the headline number implies about general capability.

What the Document Is Actually For

The Anthropic Institute page isn’t purely a capability report. Near the close, it proposes “two potential approaches to ensure that self-improving AI remains beneficial: an international verification framework enabling coordinated slowdowns, and domestic frameworks in leading nations.” The RSI framing is load-bearing for that policy argument. You need AI to seem near the RSI threshold to justify the urgency of coordinated international governance responses. “Productivity tools have become very productive” doesn’t motivate emergency frameworks the same way “AI is approaching the ability to design its own successor” does.

That doesn’t make the governance argument wrong. International coordination on AI capability thresholds is a reasonable position independent of where exactly the frontier is. But it does mean the document is doing two jobs simultaneously: reporting measurements and building a case for intervention. The metrics are being asked to carry more argumentative weight than their caveats allow.

What the Anthropic Institute has actually published is a detailed snapshot of where AI-assisted software development stands in mid-2026. The task horizon data is real and consequential. CORE-Bench going from 21% to saturation in fifteen months is meaningful evidence of genuine progress. Code quality reaching parity with human engineers at Anthropic, if the methodology holds, is a milestone. These are things worth documenting. None of them is recursive self-improvement as defined in the opening paragraph. No AI system at Anthropic is fully autonomously designing and developing its own successor. The engineers are still there, still choosing the problems, still shaping what goes into training. The loop is not closed.

Anthropic is the company most likely to tell you when something genuinely alarming is happening. That credibility is worth protecting. Applying the vocabulary of alarm — recursive self-improvement, “when AI builds itself” — to productivity statistics from coding tools, while burying the methodological caveats in footnotes, creates a frame that the evidence doesn’t support. The distance between where they are and where the title implies they are is still substantial. Publishing the data honestly while choosing an alarming frame is a choice, and it’s a choice the document’s own footnotes undercut.

AI-generated editorial illustration · TemperatureZero · June 6, 2026
