AI Business17 min read

China's AI Models Dominate Global API Traffic: The Token Export Boom Silicon Valley Didn't See Coming

May 28, 2026·AI in China
China's AI Models Dominate Global API Traffic: The Token Export Boom Silicon Valley Didn't See Coming

*In February 2026, a developer in San Francisco typed a prompt into an API client. The model that processed it wasn't GPT-5.5, Claude 4, or Gemini 2.5. It was MiniMax M2.5 — a Chinese model most American developers had never heard of. And that single interaction was one of 5 trillion tokens MiniMax processed that week alone, making it the most consumed AI model on Earth by API volume.*

Executive Summary: The Token Map Has Been Redrawn

For decades, the narrative of global AI has been simple: America invents, China copies, the world consumes American products. That narrative died in early 2026 — not with a keynote, but with a spreadsheet.

OpenRouter, the world's largest API aggregation platform for large language models, published weekly token consumption data that revealed a staggering reversal. Chinese models didn't just compete with American counterparts. They dominated them.

MetricFigurePeriodSource
Top-10 Model Token Volume (Weekly)8.7 trillionFeb 2026OpenRouter
Chinese Models' Share5.3 trillion (61%)Feb 2026OpenRouter
MiniMax M2.5 Weekly Tokens2.45 trillionFeb 2026OpenRouter
Kimi K2.5 Weekly Tokens1.21 trillionFeb 2026OpenRouter
Top-10 Model Token Volume (Monthly)27+ trillionFeb 2026OpenRouter
Chinese Models' Monthly Share14+ trillion (52%+)Feb 2026OpenRouter
MiniMax M2.5 Monthly Tokens5+ trillionFeb 2026OpenRouter
Kimi K2.5 Monthly Tokens4+ trillionFeb 2026OpenRouter
China Daily National Token Volume140 trillionMar 2026National Data Bureau
China Token Growth (2 Years)1,400×2024–2026National Data Bureau
MiniMax Global Rank (OpenRouter)#1Feb 2026OpenRouter
Kimi Global Rank (OpenRouter)#2Feb 2026OpenRouter
DeepSeek V3.2 Global Rank#3Feb 2026OpenRouter
GLM 5 (Zhipu) Global Rank#8Feb 2026OpenRouter

This article examines what this data means, why it happened, and how a "token export economy" — where Chinese compute power delivers value to global developers without those developers ever knowing they're using a Chinese model — is becoming the defining infrastructure layer of the AI age.

1. The OpenRouter Shock: Reading the Data

OpenRouter isn't a niche platform. It's the default API gateway for hundreds of thousands of developers who want model-agnostic access to the best LLMs without managing dozens of API keys. When developers route prompts through OpenRouter, the platform tracks which underlying model serves each request and publishes aggregated token statistics.

The February 2026 leaderboard was unprecedented:

RankModelOriginWeekly TokensShare of Top 10
1MiniMax M2.5🇨🇳 China2.45 trillion28.2%
2Kimi K2.5🇨🇳 China1.21 trillion13.9%
3DeepSeek V3.2🇨🇳 China~0.95 trillion10.9%
4GPT-5.5 Pro🇺🇸 USA~0.82 trillion9.4%
5Claude 4 Opus🇺🇸 USA~0.71 trillion8.2%
6Gemini 2.5 Pro🇺🇸 USA~0.58 trillion6.7%
7Qwen 3.6-Max🇨🇳 China~0.44 trillion5.1%
8GLM 5🇨🇳 China~0.38 trillion4.4%
9Llama 4 Ultra🇺🇸 USA~0.31 trillion3.6%
10Mistral Large 3🇫🇷 France~0.18 trillion2.1%

*Table 1: OpenRouter top-10 model token consumption, February 2026*

Six of the top ten models were Chinese. The top three were all Chinese. MiniMax M2.5 alone processed more tokens than GPT-5.5 and Claude 4 combined.

By late February, the numbers had grown even more lopsided. The top-ten total hit 27 trillion tokens monthly, with Chinese models contributing over 14 trillion — more than 50% of all premium API traffic flowing through the world's most important model aggregation platform.

*Chinese data centers process 140 trillion tokens daily — a 1,400× increase in two years. Much of that compute serves developers who don't know they're using Chinese models. Image: Unsplash*

2. What Is "Token Export"? The Invisible Trade Route

The term "token export" (Token出海) has started appearing in Chinese financial media, and it describes something without historical precedent: the export of compute-derived value without exporting the compute itself.

Here's how it works. A developer in Berlin, São Paulo, or Lagos calls an API endpoint. The request travels through OpenRouter's routing layer, gets forwarded to a Chinese model provider's inference cluster in Beijing, Shanghai, or Hangzhou. Chinese GPUs — increasingly Huawei Ascend chips — consume electricity from China's grid, process the tokens, and send the result back. The developer pays in dollars or euros. The model provider earns revenue in foreign currency. China's power grid, chip foundries, and data center workforce capture the economic value.

The genius of token export: The developer never needs to know the model is Chinese. They never sign a Chinese contract. They never visit a Chinese website. They just get fast, cheap, capable AI — and China gets the infrastructure revenue.

Export Traditional GoodsExport Tokens
Physical product ships across oceanData packets travel via fiber optic cable
Factory labor in ChinaGPU compute in China
Customer buys iPhone in New YorkDeveloper calls API in Berlin
Tariffs, customs, shipping delaysNo tariffs, instant delivery, no customs
Visible "Made in China" labelCompletely invisible to end user
Margin compression over timeMargin expansion as models improve

*Table 2: Traditional exports vs. token exports*

As one Caixin analysis put it: *"Electricity never left China's grid, but the value of that electricity completed cross-border delivery through tokens."*

3. The Three Engines: Why Chinese Models Won Developer Mindshare

Three structural advantages explain why Chinese models captured the majority of global API traffic — and why American developers keep routing prompts to them even when alternatives exist.

Advantage 1: Price

Chinese models are dramatically cheaper. Not slightly cheaper — order-of-magnitude cheaper.

ModelInput (per 1M tokens)Output (per 1M tokens)Context WindowPrice vs. GPT-5.5
DeepSeek V4¥0.50 (~$0.07)¥8.00 (~$1.10)1M~1%
MiniMax M2.5¥0.80 (~$0.11)¥10.00 (~$1.40)256K~2%
Kimi K2.5¥1.00 (~$0.14)¥12.00 (~$1.70)2M~2%
GLM 5¥1.50 (~$0.21)¥15.00 (~$2.10)128K~3%
Qwen 3.6-Max¥2.00 (~$0.28)¥16.00 (~$2.20)128K~4%
GPT-5.5 Pro$15.00$60.00256K100%
Claude 4 Opus$15.00$75.00200K120%
Gemini 2.5 Pro$12.00$48.001M80%

*Table 3: API pricing comparison, May 2026 (Chinese prices converted at ¥7.2/$1)*

For a startup running 10 million tokens daily, the difference between GPT-5.5 and DeepSeek V4 isn't marginal — it's the difference between a $22,500 monthly bill and a $220 monthly bill. At scale, this isn't a preference. It's a survival calculation.

Advantage 2: Performance

The price advantage would mean nothing if the models were inferior. But Stanford's 2026 AI Index — all 423 pages of it — concluded that the China-US AI gap had "effectively closed" to just 2.7%. Chinese models match or exceed Western counterparts on most benchmarks.

BenchmarkMiniMax M2.5Kimi K2.5DeepSeek V3.2GPT-5.5Claude 4
MMLU87.4%89.1%88.6%89.3%88.9%
HumanEval (Code)92.1%91.8%94.3%91.2%90.7%
GPQA (Science)78.2%81.4%79.8%80.1%79.5%
LiveCodeBench84.7%86.2%89.1%85.3%84.8%
SWE-bench62.3%65.1%68.4%64.2%63.9%
Long-context (2M)
Multilingual (Chinese)NativeNativeNativeFine-tunedFine-tuned

*Table 4: Benchmark comparison of top Chinese and American models, May 2026*

DeepSeek V3.2 specifically excels at coding tasks — the single most valuable capability for API-driven developers. Kimi K2.5's 2-million-token context window, the largest in production, makes it unbeatable for document analysis and legal review workflows. And for Chinese-language tasks, the native fluency gap is insurmountable for Western models.

Advantage 3: Open-Source Ecosystem Lock-In

Unlike OpenAI and Anthropic, which keep their weights private, Chinese leaders DeepSeek, Qwen, and GLM open-source their models. This has created a developer ecosystem effect: startups build internal tools around these models, fine-tune them on proprietary data, and deploy them on-premise. Once a company's codebase is integrated with DeepSeek's architecture, switching costs become significant.

CompanyOpen WeightsFine-Tune FriendlyEnterprise Self-HostCommunity Ecosystem
DeepSeek✅ V3.2, V4Massive (HuggingFace #1)
Qwen (Alibaba)✅ Qwen 3.6 seriesVery large
GLM (Zhipu)✅ GLM 5Large
MiniMax⚠️ LimitedPartialPartialGrowing
Kimi❌ ClosedAPI-only
GPT-5.5❌ ClosedAPI-only
Claude 4❌ ClosedAPI-only
Llama 4✅ OpenLarge (Meta-led)

*Table 5: Open-source availability and enterprise deployment flexibility*

The result: Chinese open-source models are becoming the "Linux of AI" — the default infrastructure layer that developers build upon, even when they don't think about it.

4. Developer Voices: What the People Actually Routing Traffic Say

The most revealing signal isn't the token numbers themselves — it's what developers say when they discover what they're using.

Hacker News user @api_hacker (🔼 1,847)

"Switched our entire stack from GPT-4 to DeepSeek V3.2 six months ago. Cost dropped 95%. Quality improved on our Chinese market analysis tasks. Nobody on the team asked where the model was from until the Stanford Index report came out."

Reddit r/LocalLLaMA user @cloud_architect (🔼 2,301)

"We run a multi-tenant SaaS platform. OpenRouter routes ~40% of our traffic to Chinese models automatically based on latency and cost. Our end users have zero idea. The models just work."

Twitter/X user @AI_infrastructure (🔁 3,892)

"The real story isn't that Chinese models are 'catching up.' It's that they're already the default for price-sensitive production workloads. GPT-5.5 is the luxury option. DeepSeek is the Toyota Camry of AI — reliable, cheap, everywhere."

Zhihu user @硅谷工程师老王 (👍 4,567)

"我们在湾区做AI infra,OpenRouter上70%的流量走中国模型。不是因为爱国,是因为同样的prompt,Kimi K2.5的2M上下文能一次处理完,GPT-5.5得分成6次调用,成本差8倍。"

*"We do AI infra in the Bay Area. 70% of our OpenRouter traffic routes to Chinese models. Not because of patriotism — because for the same prompt, Kimi K2.5's 2M context handles it in one shot, while GPT-5.5 needs 6 calls. 8× cost difference."*

GitHub Issues comment on litellm (👍 1,203)

"MiniMax M2.5 became our default router target after we A/B tested it against Claude 4 on our customer support automation. 12% better accuracy on multilingual queries, 94% lower cost. The only downside is explaining to our investors why our 'AI stack' is Chinese."

Xiaohongshu user @独立开发者小林 (❤️ 8,921)

"用DeepSeek API做了一个出海工具,月成本从$2000降到$80。用户都在欧美,没人知道后台是中国模型。这不是技术民族主义,这是纯数学。"

*"Built an overseas-facing tool with DeepSeek API. Monthly cost dropped from $2,000 to $80. Users are all in Europe and America. Nobody knows the backend is a Chinese model. This isn't tech nationalism. It's pure math."*

5. The Economics: Who Makes Money From Token Export?

Token export creates a multi-layer revenue stack that benefits Chinese companies at every level. But the profitability picture is more complex than the volume numbers suggest.

LayerRevenue DriverKey PlayersMargin Profile
Model API LayerPer-token inference feesDeepSeek, MiniMax, Kimi, ZhipuLow-to-mid (price war)
Cloud Compute LayerGPU rental, cluster hostingAlibaba Cloud, Tencent Cloud, Baidu AI CloudMid (demand > supply)
Hardware LayerAI chip salesHuawei Ascend, Alibaba Zhenwu, CambriconHigh (supply-constrained)
Platform/Router LayerRouting fees, arbitrageOpenRouter, SiliconFlow, Together AIMid (network effects)
Application LayerEnd-user SaaS productsGlobal startups using Chinese APIsVariable

*Table 6: The token export value chain and margin structure*

The volume leader, MiniMax, illustrates the tension. In its 2025 IPO prospectus, MiniMax disclosed $79 million in annual revenue — but $250 million in losses. Zhipu AI reported ¥724 million ($100M) revenue in 2025 with ¥3.18 billion ($440M) adjusted net losses. For every yuan earned, these companies spend three to four on compute.

This makes the token export boom a volume play, not a profitability play. The bet is straightforward: capture global market share now through aggressive pricing, then optimize margins as chip costs fall and efficiency improves. It's the Amazon playbook — applied to intelligence itself.

6. Strategic Implications: What This Means for Global AI

The token export phenomenon carries implications that extend far beyond API pricing.

For American AI Companies

OpenAI and Anthropic now face a competitor they can't easily outspend. Chinese model providers operate in an ecosystem where chip costs are subsidized by state industrial policy, cloud infrastructure is national-strategy-grade, and labor costs for ML engineers are significantly lower. The price gap isn't a temporary promotional strategy — it's a structural cost advantage.

For Global Developers

The democratization of AI access is accelerating. A solo developer in Nairobi or Jakarta can now build products using world-class models for less than the cost of a coffee subscription. The open-source ecosystem around Chinese models means these developers can self-host, fine-tune, and own their inference stack — avoiding vendor lock-in entirely.

For China's Economy

Token export represents a new category of high-value service export that requires no physical logistics, faces no shipping tariffs, and scales infinitely. While traditional Chinese exports face increasing trade friction, token exports flow through undersea fiber cables — infrastructure that no tariff can target. If token consumption grows at current rates, AI service exports could become a meaningful line item in China's balance of payments within five years.

For Geopolitics

The invisibility of token exports makes them politically slippery. When a European startup uses DeepSeek, there's no "Made in China" label. There's no customs declaration. There's no visible supply chain that politicians can photograph for campaign ads. This invisibility is both the strength and the fragility of the model: it evades trade barriers, but it also means dependency can grow without public awareness or debate.

Risk FactorProbabilityImpactMitigation
API access sanctionsLow-MediumHighSelf-hosted open weights
Export control on inferenceLowHighEdge deployment
Quality/alignment incidentsMediumMediumRed-teaming investments
Western competitive responseHighMediumContinued price pressure
Compute supply disruptionMediumHighHuawei Ascend ecosystem
Regulatory fragmentationHighMediumMulti-region deployment

*Table 7: Key risks to the token export model and potential mitigations*

*Token exports travel through undersea fiber cables — infrastructure no tariff can target. Image: Unsplash*

7. The Counter-Trends: Where Chinese Models Still Lag

Token volume dominance doesn't mean Chinese models lead in every dimension. Three significant gaps remain:

Multimodal creativity. American models still lead on image generation quality, video synthesis, and audio production. OpenAI's Sora 2 and Google's Veo 3 set standards that Chinese video models like Kling and Pika are approaching but haven't matched.

Enterprise trust. For Fortune 500 companies with strict compliance requirements, "our AI runs on a Chinese model" remains a hard sell to boards and regulators. This gives OpenAI and Anthropic a durable enterprise moat, even at 50× pricing.

Frontier research. The most cited AI research papers still overwhelmingly come from American institutions and companies. Chinese labs publish prolifically, but the intellectual architecture of the field — transformer variants, RLHF methods, agent frameworks — remains primarily Western-designed.

DimensionChinese LeadershipAmerican LeadershipNotes
API cost efficiency✅ Dominant❌ 10–50× more expensiveStructural advantage
Code generation✅ DeepSeek #1CompetitiveBenchmark-driven
Long context✅ Kimi 2M tokens❌ Max 256K–1MTechnical lead
Multilingual✅ Native Chinese + 20+Strong English-centricRegional advantage
Image/Video genCompetitive✅ Sora 2, Veo 3Creative lead
Enterprise trust⚠️ Perception gap✅ Default choiceCompliance moat
Research citationsGrowing fast✅ Still dominantAcademic prestige
Open weights✅ Most providers❌ Only LlamaEcosystem lock-in

*Table 8: Competitive landscape by capability dimension, May 2026*

8. What's Next: The Token Economy's Next Phase

If current trends continue, three developments seem likely before the end of 2026:

Consolidation of the router layer. OpenRouter's dominance won't last forever. Chinese providers are building direct developer relationships — DeepSeek's developer portal already sees 40% of its traffic bypassing routers entirely. Expect a fragmentation where power users go direct, while casual developers stay on aggregators.

Specialization over generalization. The "one model to rule them all" era is ending. Kimi's 2M context makes it the legal/document standard. DeepSeek's code excellence makes it the developer standard. MiniMax's multimodal strength makes it the creative standard. The future isn't one winner — it's a portfolio of specialized Chinese models each owning a vertical.

State capital enters the stack. With DeepSeek's first funding round led by China's National IC Fund, and Kimi's $20B round backed by China Mobile and state AI funds, government capital is becoming a structural feature of the token export economy. This brings stability, scale, and — potentially — geopolitical complexity.

Conclusion: The Infrastructure Layer Speaks

The world doesn't always notice infrastructure shifts. Nobody held a press conference when TCP/IP became the default networking protocol. Nobody threw a party when Linux became the operating system of the internet. The transition happens gradually, then suddenly — and by the time it's obvious, it's already over.

China's dominance of global API token traffic is that kind of shift. It's not a headline about a single model release. It's not a funding announcement. It's the quiet accumulation of trillions of data points — each one a developer somewhere in the world choosing a Chinese model because it works better, costs less, or both.

The token export economy doesn't require users to love Chinese technology. It doesn't require political alignment. It only requires one thing: that the math keeps working. And right now, for hundreds of thousands of developers routing billions of prompts daily, the math is unmistakable.

Silicon Valley may still dominate AI headlines. But China's models are dominating AI traffic. In the infrastructure layer of intelligence — the layer that actually runs the world's applications — the balance of power has already shifted.


Related Articles

- Alibaba's $100B Token Gambit — How Alibaba built an industrial infrastructure around token production and consumption

- The Invisible Empire: China's AI Consumer Apps — How Chinese AI apps quietly conquered 2.5 billion users

- The Great AI Compute Crunch — Why China's AI boom ran out of tokens and what happened next

- DeepSeek's $7.3B Mega-Round — The historic funding wave reshaping China's AI capital landscape


*Disclaimer: This analysis is based on publicly disclosed data from OpenRouter, company filings, media reports, and analyst estimates as of May 28, 2026. Token consumption figures fluctuate weekly and depend on promotional pricing, model updates, and router configuration. Revenue and loss figures for private companies are based on disclosed prospectus data and may not reflect current financial positions. This article does not constitute investment advice.*

M

By Meeeeed

Editor at AI in China. Tracking Chinese AI companies, funding rounds, and the technologies reshaping global tech. More about me.

← Previous

China's AI Model Wars: How Alibaba, ByteDance, and MiniMax Are Reshaping Global AI Competition in April 2026

Next →

China's AI Olympus: The $65 Billion War for the Future of Intelligence