Anthropic Unveils Claude Opus 4.8 – Self-Correcting AI, Mythos Next

I was on a deadline when Claude stopped, highlighted a line, and told me it wasn’t confident. You read that correctly: the model politely raised its hand. For a few tense seconds I realized I had been trusting the wrong kind of certainty.

I watched Anthropic roll out Claude Opus 4.8 on Thursday. It is available everywhere at the same price as Opus 4.7: $5 per million input tokens (€4.60) and $25 per million output tokens (€23.00). The headline claim is simple: this version is better at catching and flagging its own mistakes, which is the kind of behavior that changes how you use an assistant.

At my terminal, errors turned into prompts — why that matters

Opus 4.8 is designed to admit uncertainty and surface it for you. In Anthropic’s blog post they framed the problem plainly: models often jump to confident-sounding conclusions even when the evidence is thin. Opus 4.8 pushes back on that reflex.

The company shared early tester feedback, including a comment from Michael Ran of Bridgewater, who said the model “proactively flag[s] issues with the inputs and outputs of an analysis, something other models routinely missed.” That line is a credibility lever: praise from a senior investment associate is not the kind of endorsement you get for fluff.

What this means for you: fewer silent hallucinations, more explicit uncertainty. If you’re editing code in VS Code with GitHub Copilot or validating financial models, Claude’s new behavior should cut down the audit work you usually reserve for late-night panic.

What is Claude Opus 4.8?

Opus 4.8 is the incremental successor to Opus 4.7, tuned to be more candid about its limits and slightly better at “agentic” tasks — agentic coding and agentic computer use — where it coordinates actions or tools. Anthropic claims measurable improvements on benchmarks and a lower rate of misaligned or harmful outputs, detailed in the model’s system card.

In a sprint review, parallel sub-tasks started finishing faster

Anthropic slipped a new research preview into the release: dynamic workflows. This feature spins up hundreds of subagents to work on parts of a complex job simultaneously, so larger coding tasks feel less like serial grunt work and more like organized collaboration.
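To make the fan-out idea concrete, here is a minimal sketch of running many sub-tasks in parallel and collecting the results in order. It is not Anthropic's implementation or API: `run_subagent`, `run_workflow`, `orchestrate`, and the `max_parallel` cap are all hypothetical names, and the subagent is a stub that only echoes its task.

```python
import asyncio


async def run_subagent(task: str) -> str:
    """Hypothetical stand-in for one subagent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, standing in for network I/O
    return f"done: {task}"


async def run_workflow(tasks: list[str], max_parallel: int = 8) -> list[str]:
    """Fan tasks out to subagents, at most max_parallel at a time."""
    limiter = asyncio.Semaphore(max_parallel)

    async def guarded(task: str) -> str:
        # The semaphore caps how many subagents run concurrently.
        async with limiter:
            return await run_subagent(task)

    # gather() returns results in the same order as the input tasks.
    return await asyncio.gather(*(guarded(t) for t in tasks))


def orchestrate(tasks: list[str], max_parallel: int = 8) -> list[str]:
    """Synchronous entry point for the sketch above."""
    return asyncio.run(run_workflow(tasks, max_parallel))


if __name__ == "__main__":
    print(orchestrate(["lint", "unit tests", "docs"]))
```

The concurrency cap is the design choice worth noticing: spinning up hundreds of workers is only useful if something bounds how many run at once.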

There’s a control panel, too: an “effort control” setting lives in the model selector dropdown and defaults to Low. You can dial it to Medium, High, or Max, or re-enable adaptive thinking. That’s their answer to complaints that Opus 4.7 sometimes overthought small tasks and under-invested in large ones.

Opus 4.8 is a smoke alarm for its own mistakes; the effort control panel gives you the volume knob.

How much does Claude Opus 4.8 cost?

The public pricing matches Opus 4.7: $5 per million input tokens (€4.60) and $25 per million output tokens (€23.00). Anthropic says the update is a “modest but tangible” improvement, not a wholesale price or tier change.
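For a rough sense of what those rates mean in practice, the arithmetic can be sketched in a few lines. The two prices come from the figures above; the function name and the example workload are made up for illustration.

```python
# USD prices per million tokens, as quoted in the article.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload from its token counts."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return round(input_cost + output_cost, 4)


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 400k output tokens.
    print(estimate_cost_usd(2_000_000, 400_000))
```

Output tokens cost five times as much as input tokens, so a job that generates a lot of text is priced very differently from one that mostly reads.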

In a security lab, engineers argued about Mythos-class promises

Anthropic also teased a new family of models it calls “Mythos-class,” which it says are on par with the infamous Mythos system that stirred alarm in some corners of Silicon Valley. The company has not released the model publicly, citing both power and cybersecurity risk, and is testing safeguards before broader distribution.

The system card for Opus 4.8 also claims a “substantially lower” risk of certain misaligned behaviors, including generating harmful sexual content or “undermining liberal democracy.” That language nods toward regulators and enterprise customers worried about misuse.

There’s deliberate vagueness: Anthropic expects to make Mythos-class models available to customers “in the coming weeks,” which reads like a cautious PR timeline. Remember how GPT-5 chatter ballooned into AGI fever? Hype cycles tend to outrun reality — but they can also rewire expectations before the code ships.

Dynamic workflows are an orchestra of subagents, each playing its part while the conductor watches the score.

Are Mythos-class models safe?

Short answer: it is too early to say, and Anthropic is staking its reputation on the safety engineering. The company is running internal tests and building safeguards; the system cards and blog posts are meant to signal responsible release practices to enterprises, regulators, and the security community. Whether that will be enough is an open question that will be answered in logs, red-team reports, and real-world use.

I’ll be watching how Opus 4.8 behaves under real workloads — and how Mythos-class promises survive scrutiny. If a model can flag its own doubts, you change the workflow; if it can’t, you don’t. Which one are you ready to bet your next release on?