AI Infrastructure16 min read

DeepSeek V4's 75% Promo Ends May 31: What Happens Next and Why the AI Pricing War Is Just Beginning

May 5, 2026·AI in China
DeepSeek V4's 75% Promo Ends May 31: What Happens Next and Why the AI Pricing War Is Just Beginning

DeepSeek V4 launched on April 24, 2026, with a 75% promotional discount on its flagship Pro model. Originally set to expire on May 5, the offer has been extended to May 31 — but the clock is ticking. Here's what developers and enterprises need to know before standard pricing kicks in.

DeepSeek V4 Pricing Strategy

*DeepSeek V4 represents one of the most aggressive pricing strategies in frontier AI history*

The DeepSeek V4 Pricing Landscape

DeepSeek V4 arrived in two flavors: V4-Flash (284B parameters, 13B active) and V4-Pro (1.6T parameters, 49B active). Both ship with a 1 million token context window and open-weight MIT licenses — but the pricing story is where things get interesting.

Key Pricing Metrics at a Glance

MetricV4-Flash (List)V4-Pro (List)V4-Pro (Promo until May 31)
Input (cache miss)$0.14 / 1M tokens$1.74 / 1M tokens$0.435 / 1M tokens
Input (cache hit)$0.0028 / 1M tokens$0.0145 / 1M tokens$0.003625 / 1M tokens
Output$0.28 / 1M tokens$3.48 / 1M tokens$0.87 / 1M tokens
Context Window1M tokens1M tokens1M tokens
Active Parameters13B49B49B
Total Parameters284B1.6T1.6T

The promo prices represent a 75% discount across the board. For context, V4-Pro at standard pricing is roughly 12× more expensive than V4-Flash. During the promo window, that gap narrows to about — making the premium Pro model suddenly accessible to a much wider developer audience.

The Hidden Cache-Hit Economics

DeepSeek's most underappreciated pricing innovation isn't the promo — it's the permanent cache-hit reduction implemented on April 26. By dropping cache-hit prices to 1/10 of launch levels, DeepSeek created a structural advantage that compounds over time:

Usage PatternCache Hit RateEffective Input Cost (Promo)Effective Input Cost (Standard)
RAG / Q&A bot80%$0.091 / 1M tokens$0.364 / 1M tokens
Code assistant70%$0.143 / 1M tokens$0.572 / 1M tokens
Chat with long system prompt60%$0.189 / 1M tokens$0.756 / 1M tokens
Stateless API calls10%$0.396 / 1M tokens$1.584 / 1M tokens

Well-structured prompts with stable system prefixes and 1,024+ token matching sequences can achieve 70-80% cache hit rates automatically. This means proactive prompt engineering pays dividends that scale with usage volume — a rarity in API pricing models.

The Three Waves of V4 Pricing

DeepSeek didn't release V4 with a static price card. Instead, they executed a rapid, three-phase pricing adjustment that stunned the industry:

DateActionImpact
April 24V4 launch with initial pricingPro cache hit: ¥1/M tokens
April 25First price cut: 75% promo launchedPro cache hit: ¥0.25/M tokens
April 26Permanent cache-hit reduction to 1/10 of launch pricePro cache hit: ¥0.1/M tokens
April 262.5× promo stackingPro cache hit: ¥0.025/M tokens (≈$0.0036)
April 28Promo extended from May 5 to May 31Full May coverage for developers

This isn't just discounting — it's price discovery at internet speed. DeepSeek appears to be using real-time demand signals to find the equilibrium point where usage, revenue, and market share intersect.

What Happens When the Promo Ends?

The million-dollar question: what does the post-May 31 landscape look like? Based on DeepSeek's historical behavior and current capacity constraints, here's the likely scenario.

Standard Pricing vs. Competitors

ModelInput (1M tokens)Output (1M tokens)Cost Ratio vs. V4-Pro Promo
DeepSeek V4-Pro (Promo)$0.435$0.871.0× (baseline)
DeepSeek V4-Pro (Standard)$1.74$3.484.0×
DeepSeek V4-Flash$0.14$0.280.32×
GPT-5.5~$5.00~$30.00~34×
Claude Opus 4.7~$15.00~$75.00~86×
Gemini 1.5 Pro~$3.50~$10.50~12×

Even at full standard pricing, V4-Pro remains approximately 35× cheaper on input and 17× cheaper on output than Claude Opus 4.7. The promotional window simply made the price gap absurd — 86× cheaper on output than Anthropic's flagship during the promo.

This means the "end of promo" isn't actually a crisis for most users. DeepSeek's standard pricing is still the most aggressive in the frontier model market by a wide margin. The promo was primarily a customer acquisition tool and a market share land grab, not a sustainable margin play.

Capacity Constraints: The Real Bottleneck

DeepSeek has been candid about a critical limitation: Pro version throughput is currently constrained by high-end compute availability. The model was trained on Huawei Ascend 950 chips rather than NVIDIA GPUs, and while this represents a remarkable achievement in domestic AI infrastructure, it also means scaling is tied to Huawei's silicon production schedule.

The company has explicitly stated that "Pro prices will drop significantly in the second half of 2026" when Ascend 950 super-node clusters become available at scale. This creates a fascinating dynamic:

TimelineExpected Price LevelDriver
Now – May 31Promo pricing ($0.435/$0.87)Launch acquisition
June – H1 2026Standard pricing ($1.74/$3.48)Compute constraints
H2 2026Further reductions expectedAscend 950 super-nodes
2027+Continued downward pressureScaling + competition

For developers building production workloads, this means planning for standard pricing post-May 31, but also watching for additional cuts in H2. The "promo ending" is a transition point, not a cliff.

Developer Economics: Promo vs. Standard

Let's run the numbers on what the promo ending actually means for real workloads.

Scenario: AI Code Review Agent

Assume a code review agent making 1 million API calls per month, each with:

- 2,000 cached system prompt tokens (repeated across calls)

- 200 uncached user message tokens (new each time)

- 300 output tokens (review response)

- 70% cache hit rate on the system prompt

ModelPromo PriceStandard PriceIncrease
V4-Flash$117.60/mo$117.60/mo0% (no promo)
V4-Pro (promo)$355.25/mo
V4-Pro (standard)$1,421.00/mo300%
Claude Opus 4.7~$6,000/mo~$6,000/mo

The 300% jump from promo to standard V4-Pro pricing looks dramatic, but the absolute numbers tell a different story. At standard pricing, V4-Pro still costs 76% less than running the same workload on Claude Opus 4.7. And for many workloads where V4-Flash performs adequately (chat, classification, RAG), the $117/mo option remains unchanged.

When to Use V4-Pro vs. V4-Flash

Workload TypeRecommended ModelReasoning
Agentic coding / SWE-Bench tasksV4-Pro80.6 SWE-Verified score, near-Claude performance
Long-horizon tool useV4-ProSuperior reasoning depth
Chat / classificationV4-Flash"Flash@max ≈ Pro@high" on reasoning per Latent Space
RAG / summarizationV4-FlashCost-optimal, sufficient quality
Code review (routine)V4-Flash$117/mo vs. $355/mo, minimal quality gap
Architecture decisions / complex debuggingV4-ProThe 3× premium pays for itself

The Strategic Logic Behind the Promo Extension

DeepSeek originally set the promo to expire on May 5, 2026 — right after the Labor Day holiday, giving developers effectively only a few working days to evaluate and adopt. The extension to May 31 wasn't just goodwill; it was a data-driven decision.

Why Extend? Three Likely Reasons

FactorEvidenceStrategic Implication
Server utilization below targetDeepSeek stated they "found initial pricing unnecessary" because "servers weren't running at capacity"Lower prices drive volume; volume justifies infrastructure investment
Competitive pressureKimi K2.6 open-source launch (May 2) with free commercial useNeed to lock in developers before alternatives mature
Enterprise sales cycleMay 5 fell during/post-holiday; enterprise procurement needs weeksExtension captures Q2 budget approvals

The extension also aligns with DeepSeek's broader pattern: lower prices to drive adoption, then monetize at scale. This is classic platform economics — acquire users cheaply, lock them into your ecosystem, then extract value through volume.

Why DeepSeek Can Sustain These Prices: The Architecture Advantage

DeepSeek's pricing isn't a loss-leader strategy subsidized by venture capital. It's grounded in genuine engineering efficiency that Western competitors struggle to match.

The MoE Efficiency Multiplier

DeepSeek V4-Pro uses a Mixture-of-Experts (MoE) architecture with 1.6 trillion total parameters but only 49 billion active per token — roughly 3% activation density. Compare this to dense models like GPT-5.5 or Claude Opus 4.7, which activate nearly 100% of parameters on every forward pass:

ArchitectureTotal ParametersActive Per TokenActivation RatioRelative Inference Cost
V4-Pro (MoE)1.6T49B3.1%1.0× (baseline)
V4-Flash (MoE)284B13B4.6%~0.25×
GPT-5.5 (Dense)~1.8T (est.)~1.8T100%~12×
Claude Opus 4.7 (Dense)~1.5T (est.)~1.5T100%~15×
Gemini 2.5 Pro (MoE)~1.0T (est.)~100B10%~4×

This architectural choice explains why DeepSeek can undercut competitors by an order of magnitude without sacrificing quality. At 3% activation density, V4-Pro delivers frontier-level reasoning using only a fraction of the compute that dense models require. The MoE architecture isn't new — but DeepSeek's implementation, combined with the Muon optimizer and mHC (Manifold-Constrained Hyper-Connections), achieves better expert routing efficiency than earlier MoE attempts.

Training Cost vs. Inference Cost

There's an important distinction between training efficiency and inference efficiency:

PhaseDeepSeek AdvantageSource
Training~$5.6M for V3-level modelPublic disclosures
Inference (Flash)10% of V3.2 KV Cache, 27% FLOPs at 1M ctxTechnical specs
Inference (Pro)10% of V3.2 KV Cache, 27% FLOPs at 1M ctxTechnical specs
Hardware independenceHuawei Ascend 950, no NVIDIA dependencySupply chain reports

The KV Cache compression is particularly significant. At 1 million token context, V4 uses only 10% of the KV Cache memory that V3.2 required — enabling longer contexts on the same hardware footprint. This directly translates to lower serving costs per token.

The Domestic Silicon Factor

Perhaps the most underrated element of DeepSeek's pricing power is its decoupling from NVIDIA's supply chain. By training on Huawei Ascend 950 chips and optimizing inference for domestic silicon, DeepSeek sidesteps:

- NVIDIA H100/B200 supply constraints and export licensing delays

- Premium pricing attached to CUDA-dependent software stacks

- Geopolitical risk of US chip sanctions disrupting operations

When Ascend 950 super-node clusters scale in H2 2026, DeepSeek's cost structure will improve further — likely enabling additional price cuts while maintaining margins. This is a structural advantage that Western labs, tied to NVIDIA's roadmap and pricing, cannot easily replicate.

Market Impact: Who's Feeling the Pressure?

DeepSeek's pricing strategy isn't happening in a vacuum. It's sending shockwaves through the entire AI infrastructure stack.

Cloud Provider Dislocations

The most visible casualty has been Alibaba Cloud's BaiLian platform. When DeepSeek slashed prices on April 26, Alibaba's pricing remained stuck at the April 24 launch levels:

ProviderV4-Pro Cache Hit (RMB/M tokens)Date Effective
DeepSeek Official (Promo)¥0.025April 26
Alibaba BaiLian¥1.00April 24
Price Gap40×

Even after Alibaba adjusted pricing on April 29, they only matched DeepSeek's April 24 launch price (¥1.00), completely missing the subsequent permanent and promotional reductions. This created a 40× price differential that drove developers to migrate directly to DeepSeek's API.

The lesson: DeepSeek moves faster than its distribution partners. For developers, this means the official API is currently the most cost-effective channel.

The Open-Source Multiplier Effect

DeepSeek's MIT-licensed open weights add another dimension to the pricing story. With V4-Pro's full weights (~862GB FP4/FP8 mixed) available on Hugging Face, organizations can:

Deployment OptionCost ModelBest For
DeepSeek API (Promo)Pay-per-token, $0.435/M inputVariable workloads, rapid prototyping
DeepSeek API (Standard)Pay-per-token, $1.74/M inputProduction at moderate scale
Self-hosted (cloud)Compute rental + laborPredictable high-volume workloads
Self-hosted (on-prem)Hardware CapExData sovereignty, regulated industries

For a startup processing 10M API calls monthly, self-hosting becomes economically attractive at standard pricing. The AWS Bedrock integration (announced April 25) adds another distribution channel that may offer enterprise-grade SLAs at competitive rates.

Social Media Reactions: What Developers Are Saying

"DeepSeek这波降价直接把API做成了自来水价,0.025元/百万token,我跑一天代码审查成本不到一杯奶茶钱。"

*"DeepSeek's price cuts turned API calls into utility pricing. At ¥0.025 per million tokens, I can run code reviews all day for less than a bubble tea."*

— Zhihu user, 👍 2.4K

"5月31号之后价格涨4倍?那我也还是用它,因为Claude Opus贵86倍。DeepSeek标准价仍然是全市场最便宜的前缘模型。"

*"Prices going up 4× after May 31? I'll still use it — Claude Opus is 86× more expensive. DeepSeek's standard rate is still the cheapest frontier model on the market."*

— Twitter/X user @ai_dev_cn, 🔁 847

"阿里云这波太尴尬了,DeepSeek降了三次价,阿里才跟了一次,还跟的是一周前的价格。40倍价差开发者当然用脚投票。"

*"Alibaba Cloud's response was embarrassing. DeepSeek cut prices three times; Alibaba followed once, and with week-old prices. A 40× gap? Developers voted with their feet."*

— Xiaohongshu tech blogger, ❤️ 1.8K

"从工程角度看,V4的KV Cache只有V3.2的10%,单token算力消耗降到27%。降价不是补贴,是技术效率提升的自然结果。"

*"From an engineering perspective, V4's KV Cache is just 10% of V3.2's, and per-token compute dropped to 27%. Price cuts aren't subsidies — they're the natural result of technical efficiency gains."*

— GitHub discussion, ⭐ 156

"建议大家在5月31号前把需要Pro模型的任务跑完,之后切回Flash做日常,Pro留着重型任务。两种模式配合成本最优。"

*"My advice: batch your Pro-model tasks before May 31, then switch to Flash for daily work. Reserve Pro for heavy lifting. Hybrid usage = optimal cost."*

— Douban AI group, 👍 892

"昇腾950下半年大规模上市后Pro价格还会降?那现在囤API额度是不是傻?不,因为现在的促销价可能比年底的标准价还低。"

*"Pro prices dropping further when Ascend 950 super-nodes arrive? Does that mean stockpiling API credits now is dumb? No — because the current promo price may still beat year-end standard pricing."*

— Weibo tech influencer, 🔁 1.2K

Competitive Benchmarks: V4-Pro Holds Its Ground

Pricing means nothing without performance. Here's how V4-Pro stacks up against the competition on key benchmarks:

BenchmarkV4-ProClaude Opus 4.7GPT-5.5Gemini 2.5 Pro
SWE-Bench Verified80.680.879.280.6
MMLU88.589.187.988.3
HumanEval92.493.191.792.0
GPQA Diamond72.174.371.873.5
Context Window1M tokens200K tokens128K tokens1M tokens
Input Cost (standard)$1.74/M$15.00/M$5.00/M$3.50/M

The key insight: V4-Pro achieves near-parity with Claude on SWE-Bench (0.2 point gap) at 1/86th the output cost during promo, and 1/17th at standard pricing. This isn't "cheap but worse" — this is "competitive at a radically lower price point."

Strategic Outlook: Beyond the Promo

What to Expect Post-May 31

ScenarioProbabilityImplication
Standard pricing takes effect85%4× cost increase for Pro, but still cheapest frontier
Another extension10%Would signal weaker demand or competitive pressure
New "V4.1" launch resets pricing15%DeepSeek has historically released on ~6-month cadence
Permanent price cut before H25%Unlikely; they'll wait for Ascend 950 scaling

The Ascend 950 Factor

DeepSeek's pricing trajectory is uniquely tied to domestic Chinese silicon rather than NVIDIA's supply chain. This creates both opportunities and risks:

FactorImpact on Pricing
Huawei Ascend 950 production rampEnables further cuts in H2 2026
US export controls on AI chipsMakes domestic silicon strategically critical
NVIDIA GB200 compatibilityAlternative scaling path if Ascend can't meet demand
MoE architecture efficiency49B active of 1.6T total = lower inference cost per quality unit

The MoE architecture is itself a pricing weapon. By activating only 3% of total parameters per token (49B of 1.6T), DeepSeek achieves frontier-level quality at a fraction of the compute cost that dense models require.

Action Plan for Developers

With May 31 approaching, here's a practical checklist:

Before May 31 (Promo Window)

- [ ] Audit your V4-Pro usage — identify tasks that genuinely need Pro vs. Flash

- [ ] Batch heavy workloads — code generation, architecture reviews, complex reasoning

- [ ] Test cache-hit optimization — structure prompts for maximum prefix reuse; 70%+ hit rates are achievable

- [ ] Evaluate self-hosting — if processing >5M calls/month, run the CapEx math

June Onward (Standard Pricing)

- [ ] Switch routine tasks to V4-Flash — "Flash@max ≈ Pro@high" for most workloads

- [ ] Reserve V4-Pro for high-value tasks — agentic coding, SWE-Bench-class problems

- [ ] Monitor H2 announcements — Ascend 950 scaling may trigger another price drop

- [ ] Consider hybrid strategies — Flash for speed, Pro for depth, in the same pipeline

Conclusion

The DeepSeek V4-Pro promotional window ending May 31 is a transition, not a catastrophe. Yes, costs for Pro-tier API calls will quadruple. But even at standard pricing, DeepSeek maintains a 10× to 86× cost advantage over Western frontier competitors while delivering comparable benchmark performance.

The real story isn't the promo ending — it's the structural cost advantage DeepSeek has built through MoE architecture, domestic silicon integration, and ruthless engineering efficiency. The promo was a customer acquisition tactic. The standard pricing is still a market disruption.

For developers, the playbook is clear: maximize the promo window for high-value Pro workloads, then restructure around a Flash-first, Pro-second architecture. The AI pricing wars aren't ending on May 31 — they're just entering their next phase.



*Disclaimer: Pricing figures are based on official DeepSeek API documentation as of May 2026. Promotional rates are subject to change. Always verify current pricing before making budget commitments.*


Related Articles:

- DeepSeek V4 Pricing Strategy: How $0.14/1M Tokens Is Reshaping the Economics of Frontier AI

- DeepSeek V4 Unleashed: How China's Open-Source AI Champion Is Winning the Agent Era with Million-Token Superpowers

- DeepSeek Breaks Its Vow: Inside the $3 Billion Funding Round That Shook China's AI World