You Can’t Patch the API Against Distillation

By Maxim Starkweather

Between April 22 and June 5, 2026, operators linked to Alibaba’s Qwen AI research division ran 28.8 million exchanges with Claude through nearly 25,000 fraudulent accounts. They were not stress-testing the API or running benchmarks. They were building Qwen.

On June 10, Anthropic sent a letter to Senate Banking Committee Chair Tim Scott and Ranking Member Elizabeth Warren describing this as “the largest known distillation attack” in its history. The previous record was MiniMax with 13 million Claude interactions. Before that: Moonshot AI at 3.4 million, DeepSeek at 150,000. Alibaba’s campaign dwarfed all three combined. The letter called the campaign illicit. The more important word is “largest.” That superlative will not hold.
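
A quick check of that comparison, using only the exchange counts reported above:

```python
# Reported exchange counts for the earlier campaigns, as cited above.
earlier_campaigns = {
    "DeepSeek": 150_000,
    "Moonshot AI": 3_400_000,
    "MiniMax": 13_000_000,
}
alibaba = 28_800_000

prior_combined = sum(earlier_campaigns.values())
ratio = alibaba / prior_combined
print(f"Earlier three campaigns combined: {prior_combined:,}")
print(f"Alibaba campaign: {alibaba:,} ({ratio:.2f}x the combined total)")
```

The earlier three total 16,550,000 exchanges, so the Alibaba campaign was roughly 1.74 times their combined size.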

Anthropic went to Congress because it had run out of technical options. That is not a criticism — it is the honest description of where this industry stands. If you offer a public API, your model’s behavior is observable. If its behavior is observable at scale, it can be trained on. There is no patch for this. There is only politics.

What Model Distillation Actually Is

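Distillation, in the classical machine learning sense, means training a smaller model on the outputs of a larger one. In the original formulation those outputs are the larger model’s full probability distributions over possible answers. Distilling through a commercial API works with even less: the student never sees the teacher’s weights or pretraining data, only the text it returns. Given enough question-and-answer pairs, collected at scale and aimed at the right tasks, the student begins to approximate the teacher’s behavior on the task categories the training questions covered.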
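
Here is a toy analogy. It is not how production LLM distillation works. The sketch treats a hypothetical `teacher` function as a black box and queries it only on inputs between 0 and 0.5, which stand in for one narrow task category. It then fits a linear student to the answers by ordinary least squares. The function names and input ranges are illustrative assumptions. The point is the coverage effect described above: the student tracks the teacher closely where it was queried and diverges badly outside that range.

```python
import math
import random


def teacher(x: float) -> float:
    """Stand-in for the large model: the student may query it but never inspect it."""
    return math.sin(x)


def fit_student(queries):
    """Fit y = a*x + b by ordinary least squares on (query, teacher answer) pairs."""
    answers = [teacher(x) for x in queries]  # the only signal the student gets
    n = len(queries)
    mean_x = sum(queries) / n
    mean_y = sum(answers) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(queries, answers))
    var = sum((x - mean_x) ** 2 for x in queries)
    a = cov / var
    b = mean_y - a * mean_x
    return lambda x: a * x + b


random.seed(0)
# Every query falls in one narrow "task category": inputs between 0 and 0.5.
queries = [random.uniform(0.0, 0.5) for _ in range(1_000)]
student = fit_student(queries)

in_domain_error = abs(student(0.25) - teacher(0.25))
out_of_domain_error = abs(student(3.0) - teacher(3.0))
print(f"error inside the queried range:  {in_domain_error:.4f}")
print(f"error outside the queried range: {out_of_domain_error:.4f}")
```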

What the Qwen-affiliated operators allegedly ran was something more precise. Rather than sampling broadly from Claude’s general capabilities, they targeted its highest-value narrow band: software engineering tasks and agentic reasoning. These are the capabilities that separate Anthropic’s models from the commodity tier — the cornerstones, in Anthropic’s phrasing, of its Mythos model family. Queries were crafted to elicit Claude’s full reasoning on complex coding tasks, multi-step tool orchestration, and long-horizon autonomous execution. Twenty-eight million of them, over forty-four days.

The resulting dataset is extraordinarily dense in a specific way. You don’t need 28 million examples to establish that a model can write Python. You need that many to capture the behavioral surface of how Claude approaches software engineering problems: edge cases, planning heuristics, debugging sequences, test generation strategies. The variety has to be wide enough that a model trained on it converges toward Claude’s approach to that task class, rather than just its most common surface responses. In a post on X, OSTP Director Michael Kratsios described the effort as relying on “tens of thousands of proxies and jailbreaking techniques in coordinated campaigns to systematically extract American breakthroughs.” That’s the policy frame. The more accurate technical frame is structured capability harvesting: sustained, targeted prompt campaigns that map a model’s decision surface in the domains you care about.

Systematic capability extraction: mapping a model's behavioral surface one query at a time.

The selection of software engineering as the target is not random. Anthropic’s commercial case in the enterprise market rests almost entirely on agentic coding capabilities. Fable 5 migrated a 50-million-line Stripe codebase in a single day; Mythos runs autonomous research tasks across 35-hour windows without human intervention. Those capabilities, priced through Anthropic’s API, are what enterprise contracts are built on. A competitor could skip years of post-training by fine-tuning on a dense catalog of Claude’s actual responses to real software engineering tasks. If that lets it replicate those capabilities at a fraction of the development cost, it undercuts the entire commercial thesis. Targeting agentic coding is not just industrial espionage. It’s a direct attack on the margin structure of Anthropic’s business.

The Timeline Is Not a Coincidence

The extraction campaign opened on April 22. Qwen3.6-27B shipped the same day, already scoring 77.2 on SWE-Bench Verified, 3.7 points behind Claude Opus 4.5’s 80.9. The OSTP memo warning AI labs about exactly this kind of threat, designated NSTM-4, landed on April 23, the day after the campaign started. The campaign then ran continuously through May. On May 21, Qwen3.7-Max launched with a SWE-Bench Verified score of 80.4, within half a point of Anthropic’s flagship. It also posted 60.6 on SWE-Pro and 69.7 on Terminal-Bench 2.0. The extraction ended June 5. Anthropic’s letter to Congress arrived June 10.
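
The intervals and benchmark gaps in this timeline can be checked directly. The dates and scores below are the ones reported in this article. The day counts are elapsed days, so they do not count both endpoints.

```python
from datetime import date

campaign_start = date(2026, 4, 22)
ostp_memo = date(2026, 4, 23)
qwen37_launch = date(2026, 5, 21)
campaign_end = date(2026, 6, 5)
letter = date(2026, 6, 10)

print("Campaign start to OSTP memo:", (ostp_memo - campaign_start).days, "day")
print("Campaign start to Qwen3.7-Max launch:", (qwen37_launch - campaign_start).days, "days")
print("Campaign length:", (campaign_end - campaign_start).days, "days")
print("Campaign end to letter:", (letter - campaign_end).days, "days")

# SWE-Bench Verified scores as reported above.
opus = 80.9
qwen36 = 77.2
qwen37 = 80.4
print(f"Qwen3.6-27B gap to Opus 4.5: {opus - qwen36:.1f} points")
print(f"Qwen3.7-Max gap to Opus 4.5: {opus - qwen37:.1f} points")
```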

Alibaba’s published description of Qwen3.7’s training methodology emphasizes “environment scaling” — models trained across diverse agentic environments — and “cross-harness and cross-verifier RL training” where identical tasks are executed under varying tool configurations to build generalized tool-use. That description is technically coherent. It also says nothing about what was in the supervised fine-tuning data that preceded the RL phase. Training on 28.8 million Claude responses covering software engineering, then applying reinforcement learning in agentic environments to stabilize and generalize those capabilities, is not an alternative hypothesis to Alibaba’s description. It is entirely consistent with it. The two can both be true.

Alibaba has declined to comment. Its model cards identify no external data sources for Qwen3.7. Proving in any proceeding that a specific benchmark improvement derives from Claude distillation, rather than from Alibaba’s independent RL research, requires access to Alibaba’s training infrastructure. No US legal mechanism currently compels a Chinese company to provide that access. This is the other reason Anthropic wrote to the Senate Banking Committee instead of filing a lawsuit.

Why Congress Is the Only Available Defense

Practitioners in the Hacker News discussion made an observation that stings: “Crawl the whole Internet to build a gargantuan sized LLM and then complain you’re being copied.” It’s not a complete refutation, but it’s not nothing. The AI industry’s relationship with training data provenance is complicated across the board. Early fine-tunes of Meta’s LLaMA, such as Stanford’s Alpaca and Vicuna, were trained on outputs from OpenAI models, even though OpenAI’s terms of use barred using those outputs to develop competing models. Virtually every lab training below the frontier tier uses some volume of synthetic data generated by larger models. The line between “learning from observable model behavior” and “illicit extraction” is genuinely blurry at the margins.

Anthropic's appeal to Congress: the political defense when technical defenses run out.

The scale distinction is where Anthropic’s position hardens. DeepSeek ran 150,000 exchanges. Moonshot ran 3.4 million. MiniMax ran 13 million. Each of those could, in principle, be characterized as aggressive competitive research using publicly accessible infrastructure. Alibaba’s 28.8 million interactions, structured as a coordinated campaign through 25,000 fraudulent accounts targeting specific capability domains over six weeks, is a different category of act — organized trade secret extraction rather than competitive benchmarking. The fraudulent account structure is load-bearing for Anthropic’s legal position, because it moves the conduct from “using a public API in ways we don’t like” to “deliberate fraud to circumvent access controls.”

And yet the defensive problem remains intractable. Rate limits at the account level help at the margins. Identity verification catches unsophisticated actors. But 25,000 accounts operating at individually plausible usage rates over six weeks do not trip automated anomaly detection in most API security systems. Spread across that many accounts, 28.8 million exchanges works out to roughly 1,150 per account, or about 26 a day. Distinguishing a software engineering firm’s legitimate API usage from a distillation campaign requires behavioral analysis across accounts, temporal correlation, and semantic clustering of query content. That is a surveillance apparatus almost no API provider runs at this scale, and it raises its own legal questions about how far a provider may go in inspecting customer data. Anthropic did catch this campaign eventually, though it has not said publicly which detection mechanism flagged it. It cannot guarantee it will catch the next ten campaigns running simultaneously at lower per-account volume, which is the obvious adaptation. The OSTP memo commits the government to sharing distillation threat intelligence with US labs: a useful early-warning mechanism, but inherently reactive.
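
To make the cross-account problem concrete, here is a minimal sketch of the kind of semantic clustering the paragraph mentions. It rests on loudly simplified assumptions:

- Each account's "topic fingerprint" is just its ten most frequent words.
- Accounts are linked when their fingerprints have a Jaccard similarity of at least 0.6.
- A linked group is flagged when its combined volume crosses a threshold.

Every threshold, account name, and query string here is hypothetical, and a real system would need far richer signals. The synthetic coordinated accounts each send 26 requests a day, well under the hypothetical per-account cap. The per-account check therefore flags nothing, while the cluster check flags the group.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical thresholds, chosen for illustration only.
PER_ACCOUNT_DAILY_LIMIT = 100
SIMILARITY_THRESHOLD = 0.6
CLUSTER_VOLUME_THRESHOLD = 5_000


def fingerprint(queries):
    """Crude topical fingerprint: the ten words an account uses most often."""
    counts = Counter(word for q in queries for word in q.lower().split())
    return {word for word, _ in counts.most_common(10)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def detect(log):
    """log: list of (account_id, day, query_text) tuples."""
    # Check 1: a classic per-account daily rate limit.
    per_day = Counter((acct, day) for acct, day, _ in log)
    over_limit = {acct for (acct, _), n in per_day.items() if n > PER_ACCOUNT_DAILY_LIMIT}

    # Check 2: link accounts whose topical fingerprints are near-identical.
    queries = defaultdict(list)
    for acct, _, text in log:
        queries[acct].append(text)
    prints = {acct: fingerprint(qs) for acct, qs in queries.items()}
    parent = {acct: acct for acct in prints}
    for a, b in combinations(prints, 2):  # O(n^2) pairs: fine for a sketch
        if jaccard(prints[a], prints[b]) >= SIMILARITY_THRESHOLD:
            parent[find(parent, a)] = find(parent, b)

    clusters = defaultdict(set)
    for acct in prints:
        clusters[find(parent, acct)].add(acct)
    flagged = [
        members for members in clusters.values()
        if len(members) > 1
        and sum(len(queries[m]) for m in members) >= CLUSTER_VOLUME_THRESHOLD
    ]
    return over_limit, flagged


# Synthetic traffic: 50 coordinated accounts at 26 requests/day each,
# roughly the average rate implied by 28.8M exchanges / 25,000 accounts / 44 days.
log = []
for i in range(50):
    for day in range(10):
        for _ in range(26):
            log.append((f"sybil-{i}", day, "refactor this repo then write tests and debug agent loop"))

# A few ordinary customers with unrelated workloads.
topics = [
    "summarize quarterly sales report",
    "translate product docs to german",
    "draft marketing email copy",
    "classify support tickets by urgency",
    "explain sql join performance",
]
for i, topic in enumerate(topics):
    for day in range(10):
        for _ in range(30):
            log.append((f"customer-{i}", day, topic))

over_limit, flagged = detect(log)
print("accounts over the per-account cap:", len(over_limit))
print("flagged cluster sizes:", [len(c) for c in flagged])
```

The pairwise comparison grows quadratically with the number of accounts. That cost, and the need to read query content at all, is part of why this kind of monitoring is rare at production scale.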

A Just Security analysis of the US government response identifies the core enforcement problem: “new and existing authorities will not deter further distillation campaigns until deployed or credibly threatened.” The legal framework to impose meaningful penalties on Chinese entities for distillation attacks on US AI models does not yet clearly exist. Building it requires congressional action, which Anthropic’s letter is intended to accelerate. Industry groups have already flagged compliance concerns about legislation under discussion, particularly for cloud providers and API intermediaries with commercial exposure to Chinese AI companies.

Anthropic’s letter asks for three things: antitrust flexibility for AI labs to share distillation detection data across companies; continued chip export controls; and enforceable penalties against firms running industrial-scale extraction campaigns. The first two build on policy that already exists, at least in nascent form. The third is the novel and difficult ask. It requires the US to build a legal instrument capable of attributing specific capability improvements in a foreign model to specific stolen training data. That forensic standard may not be achievable against a sophisticated adversary who runs independent RL training on top of a distillation base.
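
Why is attribution so hard? One of the crudest forensic signals is verbatim overlap: the share of word n-grams in a suspect model's output that also appear in a known teacher output. The sketch below uses invented sentences and an arbitrary 4-gram window, both assumptions for illustration. A copied answer scores 1.0, but a paraphrase of the same advice scores 0.0. A model that has been distilled and then further trained may leave little of this kind of trace.

```python
def ngrams(text: str, n: int = 4) -> set:
    """Set of lowercase word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate: str, reference: str, n: int = 4) -> float:
    """Fraction of the candidate's n-grams that also appear in the reference."""
    cand = ngrams(candidate, n)
    return len(cand & ngrams(reference, n)) / len(cand) if cand else 0.0


# Invented example texts, not real model outputs.
reference = "first write a failing test then make the smallest change that passes it"
verbatim = "first write a failing test then make the smallest change that passes it"
paraphrase = "start with a test that fails and apply the minimal fix needed to pass"

print(f"verbatim:   {overlap(verbatim, reference):.2f}")
print(f"paraphrase: {overlap(paraphrase, reference):.2f}")
```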

What Anthropic’s letter ultimately reveals is that there is no technical defense against an adversary with sufficient API access, patient account management, and targeted query construction. Rate limits are a tax, not a barrier. Account verification catches noise, not signal. Behavioral detection is slower than behavioral adaptation. The only remaining lever is making the political and economic cost of running these campaigns high enough to outweigh their commercial benefit. Alibaba’s 28.8 million conversations demonstrate what the current cost calculus looks like: they ran the campaign for six weeks, their model shipped on schedule, and they declined to comment. The math is not currently discouraging anyone.
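
The "tax, not a barrier" point can be made with simple arithmetic. Assume a hypothetical per-account daily cap; the caps below are illustrative, not any provider's actual limits. The number of accounts needed to move 28.8 million exchanges in 44 days then scales inversely with the cap. Tightening the limit raises the operator's account-management cost without bounding total volume.

```python
import math


def accounts_needed(total_queries: int, per_account_daily_cap: int, days: int) -> int:
    """Fewest accounts that can send total_queries within the given number of days
    without any single account exceeding per_account_daily_cap on any day."""
    return math.ceil(total_queries / (per_account_daily_cap * days))


TOTAL_EXCHANGES = 28_800_000  # reported campaign size
CAMPAIGN_DAYS = 44            # April 22 to June 5, 2026

for cap in (50, 100, 500):  # hypothetical caps, not any provider's real limits
    print(f"cap {cap:>3}/day -> {accounts_needed(TOTAL_EXCHANGES, cap, CAMPAIGN_DAYS):,} accounts")
```

Under these assumptions, a 50-request daily cap means roughly 13,100 accounts, and a 500-request cap means about 1,310. Either way the campaign proceeds. Only the operator's overhead changes.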

AI-generated editorial illustration · TemperatureZero · June 25, 2026
