Explained: Trump’s ‘Industrial-Scale’ China AI Theft Claim

I was scanning a late-night memo when the phrase “industrial-scale” clapped like a gavel. You feel the stakes in that kind of language—sudden, official, accusatory. I want to walk you through what those words mean for the models, the code, and the companies that say their futures are being stolen.

The Trump administration—backed by a White House Office of Science and Technology Policy memo from Michael Kratsios—has publicly accused China of sweeping theft of AI intellectual property. The memo, shared with multiple agencies, instructed officials to flag attempts by foreign actors to access sensitive information. The accusation lands inside a politically charged debate: the same administration has argued that scraping vast troves of copyright-protected material to train AI can be fair use (read the policy note).

At a Washington desk, a memo asked agencies to alert AI firms about foreign access

I read the Financial Times coverage and the memo closely (FT report). The ask is blunt: if foreign actors probe or access proprietary models, tell the firms. That moves the government from public-policy adviser to quasi-sheriff—sharing threat intel with private labs so companies can respond.

That response matters because the alleged theft isn’t midnight hackers breaking in. The main tactic under scrutiny is distillation: training a smaller model on the outputs of a larger one so the smaller copy can mimic performance with far less compute. Anthropic flagged this tactic against several China-based labs, and OpenAI publicly accused DeepSeek of using similar methods to free-ride on U.S. work (Anthropic notice; LA Times write-up).
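To make the mechanics concrete, here is a toy sketch of distillation: a small "student" model is fit purely on the output distributions of a "teacher," never touching the teacher's weights or training data. Everything here—the linear teacher, the query count, the learning rate—is invented for illustration; no lab's actual pipeline works at this scale or simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical "teacher": a fixed classifier standing in for a large model
# reachable only through an API. The distiller never sees W_teacher directly.
W_teacher = rng.normal(size=(8, 3))

def teacher_predict(x):
    return softmax(x @ W_teacher)

# Step 1: automate queries and record the teacher's output distributions
# ("soft labels") — this is the behavior being distilled.
X = rng.normal(size=(2000, 8))
soft_labels = teacher_predict(X)

# Step 2: train a small student to match those distributions
# (cross-entropy against soft labels, plain gradient descent).
W_student = np.zeros((8, 3))
lr = 0.5
for _ in range(300):
    probs = softmax(X @ W_student)
    grad = X.T @ (probs - soft_labels) / len(X)  # d(cross-entropy)/dW
    W_student -= lr * grad

# How often do student and teacher now pick the same top class?
agreement = np.mean(
    softmax(X @ W_student).argmax(axis=1) == soft_labels.argmax(axis=1)
)
```

The point of the sketch is that no weights are ever exfiltrated: the student converges toward the teacher's behavior using nothing but queries and answers, which is why distillation sits awkwardly between "use of a public API" and "copying."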

What does “industrial-scale theft” mean?

It’s not a poetic turn. It suggests systematic, repeatable processes: scraping, probing models, automating queries to reconstruct weights or behavior. The phrase is meant to signal volume and intent—an accusation that goes beyond casual copying and into organized appropriation.

On GitHub, a leaked codebase survived takedowns after someone rewrote it into another language

I was surprised by how fast Anthropic moved when source code for Claude Code leaked. The company issued thousands of takedowns, trying to scrub reposts. One clever leak survived: an AI agent had translated the code into another programming language, and Anthropic reportedly judged that transformation sufficient to be noninfringing (NYT coverage).

That episode shows how messy these disputes are. Companies zealously guard model internals—weights, datasets, prompt templates—yet they also claim vast swaths of their output were produced by AI and thus aren’t copyrightable. The result feels like a legal Rorschach test: different actors see different protections depending on convenience.

Can distillation steal intellectual property?

Yes and no. Distillation can reproduce behavior and sometimes near-verbatim phrases that trace back to copyrighted training text. Ars Technica documented models spitting out long passages that mirrored training data (Ars report). But proving theft in court requires mapping outputs to specific proprietary inputs—often a technical and legal slog.

Think of distillation like a locksmith copying a master key: precise, mechanical, and capable of opening the same doors. You can see why companies sound both defensive and outraged.

In courtrooms and offices, copyright and “transformative” claims are colliding

I followed the Copyright Office ruling that purely AI-created works lack copyright protection (NYT explainer). At the same time, firms brag that massive portions of their code are AI-generated—Anthropic and OpenAI have each said much of their internal tooling is machine-produced (Fortune piece).

That dynamic creates a paradox: when companies want to bypass copyright (to justify training models on scraped books or articles), the material is framed as fair use or public domain; when they want to stop leaks or competitors, suddenly the same material is proprietary and sacred. Courts are wrestling with whether output is “transformative” enough to escape infringement claims—something companies hope will shield their own assets while exposing rivals.

How do companies spot and stop distillation attacks?

Firms use a mix of technical and legal tools: watermarking, behavior-based detection, legal takedowns, contractual access controls, and threat-sharing with agencies. Anthropic, OpenAI, and others also pursue copyright notices and DMCA requests; they may coordinate with platforms like GitHub to remove reposts. But these are partial defenses against automated scraping operations run at scale.
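One of those defenses—behavior-based detection—can be sketched as a crude heuristic: flag API clients that send large volumes of templated, near-identical prompts, a common signature of automated output harvesting. The query log, thresholds, and skeleton trick below are all invented for illustration; real detection systems are far more sophisticated and are not public.

```python
from collections import Counter

# Hypothetical query log of (client_id, prompt) pairs. In production this
# would come from API telemetry; here it is toy data.
query_log = (
    [("bulk-client", f"Translate sentence {i} to French") for i in range(500)]
    + [("casual-user", p) for p in ["hi", "write a poem", "fix my bug"]]
)

def flag_suspected_distillation(log, volume_threshold=100, template_ratio=0.5):
    """Flag clients sending many prompts that share one repeated template."""
    by_client = {}
    for client, prompt in log:
        by_client.setdefault(client, []).append(prompt)

    flagged = []
    for client, prompts in by_client.items():
        if len(prompts) < volume_threshold:
            continue  # low volume: ignore
        # Crude template detector: strip digits so "sentence 1" and
        # "sentence 2" collapse to the same skeleton, then count repeats.
        skeletons = Counter(
            "".join(ch for ch in p if not ch.isdigit()) for p in prompts
        )
        top_share = skeletons.most_common(1)[0][1] / len(prompts)
        if top_share >= template_ratio:
            flagged.append(client)
    return flagged
```

Even this toy version shows the limits the paragraph describes: a scraper that randomizes its prompts or spreads queries across many accounts slips past volume and template checks, which is why firms layer legal and contractual tools on top of the technical ones.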

There’s a political angle, too. When a government that defends broad scraping accuses another nation of industrial-scale theft, it sharpens global friction. The White House OSTP memo and the FT reporting push this from an internal platform dispute into an international incident.

Models that can reproduce books or code nearly verbatim feed public fear. Anthropic’s reported purchase, scanning, and destruction of books to build a dataset (Washington Post) reads like a confession designed to avoid lawsuits, even as it muddies the moral ground.

The scene is messy: companies want legal cover for large-scale scraping when it helps them, and full protection when it hurts them. You should be skeptical of anyone claiming a simple answer. The fight will be technical, legal, and political—played out across GitHub repos, federal memos, and courtroom briefs.

One more thought: the tools that power this fight are the very products under dispute—OpenAI models, Anthropic’s Claude, China-based labs like DeepSeek, platforms such as GitHub. They’re both weapon and prize.

So where does that leave you, me, and the broader public? If governments, corporations, and courts are racing to define what counts as theft, who gets to draw the line?