White House Halts AI-Testing Unit Reports Amid Rising Concerns

I was on a call when a note slid across the screen: the Center for AI Standards and Innovation had been instructed to stop publishing its model reviews. You could feel the pause spread through the safety community. For many, that sudden silence read like a flashing warning light.

At a West Wing briefing, officials argued over control of model reviews

Inside the loop, I kept hearing the same question: who gets to vet the most powerful AI systems before they reach millions of users? The Trump administration has told CAISI, the Commerce Department office that became the federal government’s primary AI-testing unit, to pause public reports while the new AI executive order is put into place.

CAISI had been doing three things at once: running hands-on tests with frontier models, coordinating findings with other agencies, and publishing those results so researchers and companies could act on risks. That public-facing role is now in question. The pause isn’t just a procedural freeze; it changes how transparency happens, and who controls the narrative about what these models can do.

What is CAISI?

CAISI is the Center for AI Standards and Innovation, a Commerce Department office that evolved from the Biden-era AI Safety Institute. It works directly with labs — including OpenAI and Anthropic — to test models before release and has been one of the few places where government testing results were visible to the public.

At an industry briefing, executives warned about dual-use risks

Companies privately flagged a real worry: advanced models can be repurposed for cyberattacks or to assist biological research in dangerous ways. Those conversations shaped the directive to stop publishing some CAISI reports, the Wall Street Journal reported, citing people familiar with the matter.

Anthropic’s handling of Mythos became a flashpoint. The company initially limited Mythos access to select partners to hunt for vulnerabilities; this week it released a public version with extra guardrails. The move shows how private labs are testing their own safety measures while government oversight shifts beneath them.

That sudden clampdown feels like yanking the spark plug from a running engine — it interrupts systems mid-cycle and forces everyone to relearn how to make decisions without shared data.

Why did the White House tell CAISI to pause its public reports?

The directive arrived as the administration prepares a new framework: labs would give the federal government voluntary access to frontier models up to 30 days before broad release to “strengthen the cybersecurity of critical infrastructure.” An earlier draft reportedly proposed a 90-day review window. Officials backing the order, including Treasury Secretary Scott Bessent and National Cyber Director Sean Cairncross, argue the new process centralizes national-security checks.

At an interagency meeting, people noticed the split in Washington

Not everyone in the administration is aligned. Some officials view the executive order as duplicative of CAISI’s existing work; others see it as a needed centralization of authority. That disagreement helps explain why CAISI may continue internal evaluations while its public reporting is paused.

OpenAI and Anthropic have long-running relationships with CAISI. Last week, OpenAI even urged that the office be strengthened, a sign that major labs value a government testing partner even as they bristle at new oversight requirements.

For researchers, the pause raises a practical question: will model risk information remain siloed inside government channels, or will that knowledge continue to be shared with the broader security and research communities?

How will President Trump’s AI executive order affect model releases?

The order asks labs to provide voluntary pre-release access for up to 30 days to give agencies time to check for threats to critical infrastructure. That window is shorter than the 90-day review an earlier draft reportedly proposed, but it still represents a formal mechanism for federal review. The policy is being sold as a cybersecurity measure, yet it also concentrates power over release timing and public disclosure, which changes the incentives labs face.

You’re left watching a tug-of-war between secrecy and sunlight — and the consequences matter: the models now moving out of labs shape how products, research, and even national defenses operate. CAISI’s silence may slow some disclosures, but it will not stop the models or the rush to put them into service.

The pause shifts authority without answering a harder question: who should safeguard the public’s right to know about AI risks? The companies building the systems, a single new federal gatekeeper, or both, acting in the open as partners in a high-stakes experiment?

For those of us tracking this, the real concern is timing. The administration is tightening control just as capabilities accelerate and private labs are moving faster than many agencies can follow — as if the lights in a monitoring room flicked out when they were needed most.

I will keep watching — and I want you to watch, too. If the government is going to claim authority over model reviews, how much of that work should remain behind closed doors?