DeepSeek Unveils Low-Cost Models to Rival Silicon Valley


I loaded a million-token codebase into a preview model and watched the token counter climb without the usual stutter. The room went quiet; my laptop hummed, not panicked. I felt a small electric jolt: scale had a new price tag.

I’ve tracked AI releases long enough to know when a technical leap is marketing noise, and when it’s a market mover. DeepSeek’s latest previews are both.

At my desk I opened a single prompt that held an entire repository — The 1,000,000-token window and why it matters

The headline feature is simple: a one-million-token context window. That lets a model take in whole codebases, long legal files, or multi-hour transcripts without chopping them into fragments. For you, that means fewer prompts, fewer stitch-together errors, and a cleaner path from problem to solution.

V4 makes scale usable. It is a Swiss Army knife for large codebases, letting you treat an entire project as one living document rather than dozens of fragmented notes. If you’ve ever lost hours reassembling context across sessions, this will feel like reclaiming time.
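A quick back-of-envelope check makes the window concrete. The sketch below uses the common rough heuristic of about four characters per token for English text and code; actual tokenizer counts vary by model, and the repository sizes are hypothetical.

```python
# Rough check: will a codebase fit in a 1,000,000-token context window?
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # common heuristic for English/code; real tokenizers vary


def estimated_tokens(total_chars: int) -> int:
    """Crude token estimate from raw character count."""
    return total_chars // CHARS_PER_TOKEN


# Hypothetical repositories: ~3 MB and ~8 MB of source text.
for label, repo_chars in [("small repo", 3_000_000), ("large repo", 8_000_000)]:
    tokens = estimated_tokens(repo_chars)
    fits = tokens <= CONTEXT_WINDOW
    print(f"{label}: ~{tokens:,} tokens, fits in one window: {fits}")
```

By this heuristic a roughly 3 MB codebase fits comfortably in a single prompt, while an 8 MB one would still need splitting, which is why the window size, not just model quality, shapes the workflow.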

On an office whiteboard a VC sketched cost versus performance — MoE architecture and efficiency

DeepSeek’s V4 lineup — V4 Pro and the lighter V4 Flash — leans on a mixture-of-experts design. That means trillions of parameters exist, but only a slice activates per request, which keeps inference bills down.

That architecture behaves like a selective power grid: circuits fire only where needed, rather than running the whole plant for a single lightbulb. The practical outcome is lower token costs without a linear hit to performance.
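The routing idea behind that "selective power grid" can be sketched in a few lines of plain Python. This is an illustrative toy, not DeepSeek's actual architecture: a gating network scores a small pool of experts per token, and only the top-k experts run, so most parameters sit idle on any given request.

```python
import math
import random

random.seed(0)

NUM_EXPERTS = 8   # toy scale; production MoE models use far more experts
TOP_K = 2         # experts activated per token
DIM = 16          # toy hidden dimension


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


experts = [rand_matrix(DIM, DIM) for _ in range(NUM_EXPERTS)]  # expert weights
router = rand_matrix(NUM_EXPERTS, DIM)                         # gating network


def moe_forward(x):
    """Route a token vector to its top-k experts and mix their outputs."""
    logits = matvec(router, x)
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i])[-TOP_K:]
    weights = [math.exp(logits[i]) for i in top]
    total = sum(weights)
    weights = [w / total for w in weights]  # softmax over the chosen experts only
    out = [0.0] * DIM
    for w, i in zip(weights, top):
        for d, val in enumerate(matvec(experts[i], x)):
            out[d] += w * val
    return out, top


token = [random.gauss(0, 1) for _ in range(DIM)]
output, active = moe_forward(token)
print(f"active experts: {sorted(active)} of {NUM_EXPERTS}")
```

Only two of the eight expert matrices are multiplied for this token; scale that intuition up to trillions of parameters and the per-request compute savings are the whole story.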

How does DeepSeek compare to GPT-5.4 and Gemini?

DeepSeek’s own technical paper claims V4-Pro-Max outperforms GPT-5.2 and Gemini-3.0-Pro on standard reasoning tests, and sits a few months behind GPT-5.4 and Gemini-3.1-Pro in the firm’s benchmarks. That tracks with what independent testers have started to find: strong reasoning, good coding chops, and agentic improvements, with a slight gap at the very top of frontier systems.

At a coffee shop I heard an engineer laugh about their cloud bill — Price pressure on the incumbents

Money is why this matters. Simon Willison and others compared token pricing and found DeepSeek’s rates notably lower. For V4 Flash DeepSeek charges $0.14 per million input tokens (€0.13) and $0.28 per million output tokens (€0.26). By contrast, GPT-5.4 Nano is $0.20 per million input (€0.18) and $1.25 per million output (€1.15), while Anthropic’s Claude Haiku 4.5 sits at $1 and $5 per million input and output (≈€0.92 and €4.60).

For pro-grade throughput, DeepSeek lists $1.74 per million input (€1.60) and $3.48 per million output (€3.20) for V4 Pro. Gemini 3.1 Pro runs $2 per million input (€1.84) and $12 per million output (€11), while GPT-5.5 is $5 and $30 per million input and output (≈€4.60 and €28). Those spreads add up fast when you scale.
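The spreads are easiest to see with a concrete workload. The sketch below plugs the per-million-token prices quoted above into a hypothetical month of 50 million input and 10 million output tokens; the figures are a snapshot from the reporting, not live rates.

```python
# Per-million-token USD prices as quoted in the reporting above (snapshot, not live).
PRICES = {
    "DeepSeek V4 Flash": (0.14, 0.28),
    "GPT-5.4 Nano":      (0.20, 1.25),
    "Claude Haiku 4.5":  (1.00, 5.00),
    "DeepSeek V4 Pro":   (1.74, 3.48),
    "Gemini 3.1 Pro":    (2.00, 12.00),
    "GPT-5.5":           (5.00, 30.00),
}


def monthly_cost(input_m: float, output_m: float, model: str) -> float:
    """USD cost for input_m / output_m million tokens on a given model."""
    p_in, p_out = PRICES[model]
    return input_m * p_in + output_m * p_out


# Hypothetical workload: 50M input tokens, 10M output tokens per month.
for model in PRICES:
    print(f"{model:18s} ${monthly_cost(50, 10, model):,.2f}")
```

At that volume the cheapest tier comes to under $10 a month against hundreds for the frontier models, which is the kind of spread that shows up on procurement spreadsheets.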

Is DeepSeek cheaper than OpenAI and Google?

Yes, in the pricing bands Willison and others reported, DeepSeek undercuts the big cloud players on token costs. That’s before you factor in the open-weight distribution: V4’s MIT license means anyone with the hardware can run the model without per-token vendor fees — though electricity, GPUs, and ops still carry real cost.

In a trading room the news sent headlines scrolling across tickers — Market reaction and geopolitics

DeepSeek’s trajectory isn’t just a tech story; it rippled through markets last year. When the company first released the R1 reasoning model, reports tied the move to a sharp selloff that erased roughly $1 trillion in value (about €920 billion) across tech indices and knocked nearly $600 billion (≈€552 billion) off Nvidia’s market value in a single day. That kind of reaction signals one thing: incumbents watch every efficiency that threatens their margins.

Open-weight licensing and cheaper inference pose a strategic question for cloud vendors and chipmakers. If developer teams can pay far less per token or run models in-house, procurement and architecture choices will change.

Can I run V4 locally, and what does open-weight mean?

V4 is MIT-licensed and open-weight. Practically, that means the model files are available on Hugging Face and anyone with sufficient hardware and operational expertise can run the weights themselves. The “free” label is accurate in a licensing sense, but not in operational reality — electricity and GPU time still show up on your bill.

There are trade-offs: DeepSeek admits a small performance gap when stacked against the very top commercial releases, estimating a lead time of roughly three to six months to close that gap. For many businesses, the price-performance balance will be the decisive factor rather than benchmark supremacy.

I’ve watched platforms, researchers, and ops teams adapt to new cost curves before; when a cheaper, competent option arrives, adoption can be swift. Who benefits most—cloud vendors, startups, or end users—depends on how quickly each group adjusts supply chains and service models.

So where does this leave you, the engineer or buyer? If you need long-context reasoning and large-scale token throughput without the premium price, DeepSeek belongs on your shortlist. If absolute top-tier scores matter for critical tasks, the incumbents still hold a narrow lead. Which side will your next AI bet fall on?