AI Business17 min read

China's AI Models Dominate Global API Traffic: The Token Export Boom Silicon Valley Didn't See Coming

May 28, 2026·AI in China

image: "https://images.unsplash.com/photo-1620712943543-bcc4688e7485?w=1200&h=600&fit=crop"

*In February 2026, a developer in San Francisco typed a prompt into an API client. The model that processed it wasn't GPT-5.5, Claude 4, or Gemini 2.5. It was MiniMax M2.5 — a Chinese model most American developers had never heard of. And that single interaction was one of 5 trillion tokens MiniMax processed that week alone, making it the most consumed AI model on Earth by API volume.*

For decades, the narrative of global AI has been simple: America invents, China copies, the world consumes American products. That narrative died in early 2026 — not with a keynote, but with a spreadsheet.

OpenRouter, the world's largest API aggregation platform for large language models, published weekly token consumption data that revealed a staggering reversal. Chinese models didn't just compete with American counterparts. They dominated them.

Metric	Figure	Period	Source
Top-10 Model Token Volume (Weekly)	8.7 trillion	Feb 2026	OpenRouter
Chinese Models' Share	5.3 trillion (61%)	Feb 2026	OpenRouter
MiniMax M2.5 Weekly Tokens	2.45 trillion	Feb 2026	OpenRouter
Kimi K2.5 Weekly Tokens	1.21 trillion	Feb 2026	OpenRouter
Top-10 Model Token Volume (Monthly)	27+ trillion	Feb 2026	OpenRouter
Chinese Models' Monthly Share	14+ trillion (52%+)	Feb 2026	OpenRouter
MiniMax M2.5 Monthly Tokens	5+ trillion	Feb 2026	OpenRouter
Kimi K2.5 Monthly Tokens	4+ trillion	Feb 2026	OpenRouter
China Daily National Token Volume	140 trillion	Mar 2026	National Data Bureau
China Token Growth (2 Years)	1,400×	2024–2026	National Data Bureau
MiniMax Global Rank (OpenRouter)	#1	Feb 2026	OpenRouter
Kimi Global Rank (OpenRouter)	#2	Feb 2026	OpenRouter
DeepSeek V3.2 Global Rank	#3	Feb 2026	OpenRouter
GLM 5 (Zhipu) Global Rank	#8	Feb 2026	OpenRouter

This article examines what this data means, why it happened, and how a "token export economy" — where Chinese compute power delivers value to global developers without those developers ever knowing they're using a Chinese model — is becoming the defining infrastructure layer of the AI age.

1. The OpenRouter Shock: Reading the Data

OpenRouter isn't a niche platform. It's the default API gateway for hundreds of thousands of developers who want model-agnostic access to the best LLMs without managing dozens of API keys. When developers route prompts through OpenRouter, the platform tracks which underlying model serves each request and publishes aggregated token statistics.

The February 2026 leaderboard was unprecedented:

Rank	Model	Origin	Weekly Tokens	Share of Top 10
1	MiniMax M2.5	🇨🇳 China	2.45 trillion	28.2%
2	Kimi K2.5	🇨🇳 China	1.21 trillion	13.9%
3	DeepSeek V3.2	🇨🇳 China	~0.95 trillion	10.9%
4	GPT-5.5 Pro	🇺🇸 USA	~0.82 trillion	9.4%
5	Claude 4 Opus	🇺🇸 USA	~0.71 trillion	8.2%
6	Gemini 2.5 Pro	🇺🇸 USA	~0.58 trillion	6.7%
7	Qwen 3.6-Max	🇨🇳 China	~0.44 trillion	5.1%
8	GLM 5	🇨🇳 China	~0.38 trillion	4.4%
9	Llama 4 Ultra	🇺🇸 USA	~0.31 trillion	3.6%
10	Mistral Large 3	🇫🇷 France	~0.18 trillion	2.1%

*Table 1: OpenRouter top-10 model token consumption, February 2026*

Six of the top ten models were Chinese. The top three were all Chinese. MiniMax M2.5 alone processed more tokens than GPT-5.5 and Claude 4 combined.

By late February, the numbers had grown even more lopsided. The top-ten total hit 27 trillion tokens monthly, with Chinese models contributing over 14 trillion — more than 50% of all premium API traffic flowing through the world's most important model aggregation platform.

*Chinese data centers process 140 trillion tokens daily — a 1,400× increase in two years. Much of that compute serves developers who don't know they're using Chinese models. Image: Unsplash*

2. What Is "Token Export"? The Invisible Trade Route

The term "token export" (Token出海) has started appearing in Chinese financial media, and it describes something without historical precedent: the export of compute-derived value without exporting the compute itself.

Here's how it works. A developer in Berlin, São Paulo, or Lagos calls an API endpoint. The request travels through OpenRouter's routing layer, gets forwarded to a Chinese model provider's inference cluster in Beijing, Shanghai, or Hangzhou. Chinese GPUs — increasingly Huawei Ascend chips — consume electricity from China's grid, process the tokens, and send the result back. The developer pays in dollars or euros. The model provider earns revenue in foreign currency. China's power grid, chip foundries, and data center workforce capture the economic value.

The genius of token export: The developer never needs to know the model is Chinese. They never sign a Chinese contract. They never visit a Chinese website. They just get fast, cheap, capable AI — and China gets the infrastructure revenue.

Export Traditional Goods	Export Tokens
Physical product ships across ocean	Data packets travel via fiber optic cable
Factory labor in China	GPU compute in China
Customer buys iPhone in New York	Developer calls API in Berlin
Tariffs, customs, shipping delays	No tariffs, instant delivery, no customs
Visible "Made in China" label	Completely invisible to end user
Margin compression over time	Margin expansion as models improve

*Table 2: Traditional exports vs. token exports*

As one Caixin analysis put it: *"Electricity never left China's grid, but the value of that electricity completed cross-border delivery through tokens."*

3. The Three Engines: Why Chinese Models Won Developer Mindshare

Three structural advantages explain why Chinese models captured the majority of global API traffic — and why American developers keep routing prompts to them even when alternatives exist.

Advantage 1: Price

Chinese models are dramatically cheaper. Not slightly cheaper — order-of-magnitude cheaper.

Model	Input (per 1M tokens)	Output (per 1M tokens)	Context Window	Price vs. GPT-5.5
DeepSeek V4	¥0.50 (~$0.07)	¥8.00 (~$1.10)	1M	~1%
MiniMax M2.5	¥0.80 (~$0.11)	¥10.00 (~$1.40)	256K	~2%
Kimi K2.5	¥1.00 (~$0.14)	¥12.00 (~$1.70)	2M	~2%
GLM 5	¥1.50 (~$0.21)	¥15.00 (~$2.10)	128K	~3%
Qwen 3.6-Max	¥2.00 (~$0.28)	¥16.00 (~$2.20)	128K	~4%
GPT-5.5 Pro	$15.00	$60.00	256K	100%
Claude 4 Opus	$15.00	$75.00	200K	120%
Gemini 2.5 Pro	$12.00	$48.00	1M	80%

*Table 3: API pricing comparison, May 2026 (Chinese prices converted at ¥7.2/$1)*

For a startup running 10 million tokens daily, the difference between GPT-5.5 and DeepSeek V4 isn't marginal — it's the difference between a $22,500 monthly bill and a $220 monthly bill. At scale, this isn't a preference. It's a survival calculation.

Advantage 2: Performance

The price advantage would mean nothing if the models were inferior. But Stanford's 2026 AI Index — all 423 pages of it — concluded that the China-US AI gap had "effectively closed" to just 2.7%. Chinese models match or exceed Western counterparts on most benchmarks.

Benchmark	MiniMax M2.5	Kimi K2.5	DeepSeek V3.2	GPT-5.5	Claude 4
MMLU	87.4%	89.1%	88.6%	89.3%	88.9%
HumanEval (Code)	92.1%	91.8%	94.3%	91.2%	90.7%
GPQA (Science)	78.2%	81.4%	79.8%	80.1%	79.5%
LiveCodeBench	84.7%	86.2%	89.1%	85.3%	84.8%
SWE-bench	62.3%	65.1%	68.4%	64.2%	63.9%
Long-context (2M)	❌	✅	❌	❌	❌
Multilingual (Chinese)	Native	Native	Native	Fine-tuned	Fine-tuned

*Table 4: Benchmark comparison of top Chinese and American models, May 2026*

DeepSeek V3.2 specifically excels at coding tasks — the single most valuable capability for API-driven developers. Kimi K2.5's 2-million-token context window, the largest in production, makes it unbeatable for document analysis and legal review workflows. And for Chinese-language tasks, the native fluency gap is insurmountable for Western models.

Advantage 3: Open-Source Ecosystem Lock-In

Unlike OpenAI and Anthropic, which keep their weights private, Chinese leaders DeepSeek, Qwen, and GLM open-source their models. This has created a developer ecosystem effect: startups build internal tools around these models, fine-tune them on proprietary data, and deploy them on-premise. Once a company's codebase is integrated with DeepSeek's architecture, switching costs become significant.

Company	Open Weights	Fine-Tune Friendly	Enterprise Self-Host	Community Ecosystem
DeepSeek	✅ V3.2, V4	✅	✅	Massive (HuggingFace #1)
Qwen (Alibaba)	✅ Qwen 3.6 series	✅	✅	Very large
GLM (Zhipu)	✅ GLM 5	✅	✅	Large
MiniMax	⚠️ Limited	Partial	Partial	Growing
Kimi	❌ Closed	❌	❌	API-only
GPT-5.5	❌ Closed	❌	❌	API-only
Claude 4	❌ Closed	❌	❌	API-only
Llama 4	✅ Open	✅	✅	Large (Meta-led)

*Table 5: Open-source availability and enterprise deployment flexibility*

The result: Chinese open-source models are becoming the "Linux of AI" — the default infrastructure layer that developers build upon, even when they don't think about it.

4. Developer Voices: What the People Actually Routing Traffic Say

The most revealing signal isn't the token numbers themselves — it's what developers say when they discover what they're using.

Hacker News user @api_hacker (🔼 1,847)
"Switched our entire stack from GPT-4 to DeepSeek V3.2 six months ago. Cost dropped 95%. Quality improved on our Chinese market analysis tasks. Nobody on the team asked where the model was from until the Stanford Index report came out."

Reddit r/LocalLLaMA user @cloud_architect (🔼 2,301)
"We run a multi-tenant SaaS platform. OpenRouter routes ~40% of our traffic to Chinese models automatically based on latency and cost. Our end users have zero idea. The models just work."

Twitter/X user @AI_infrastructure (🔁 3,892)
"The real story isn't that Chinese models are 'catching up.' It's that they're already the default for price-sensitive production workloads. GPT-5.5 is the luxury option. DeepSeek is the Toyota Camry of AI — reliable, cheap, everywhere."

Zhihu user @硅谷工程师老王 (👍 4,567)
"我们在湾区做AI infra，OpenRouter上70%的流量走中国模型。不是因为爱国，是因为同样的prompt，Kimi K2.5的2M上下文能一次处理完，GPT-5.5得分成6次调用，成本差8倍。"
*"We do AI infra in the Bay Area. 70% of our OpenRouter traffic routes to Chinese models. Not because of patriotism — because for the same prompt, Kimi K2.5's 2M context handles it in one shot, while GPT-5.5 needs 6 calls. 8× cost difference."*

GitHub Issues comment on litellm (👍 1,203)
"MiniMax M2.5 became our default router target after we A/B tested it against Claude 4 on our customer support automation. 12% better accuracy on multilingual queries, 94% lower cost. The only downside is explaining to our investors why our 'AI stack' is Chinese."

Xiaohongshu user @独立开发者小林 (❤️ 8,921)
"用DeepSeek API做了一个出海工具，月成本从$2000降到$80。用户都在欧美，没人知道后台是中国模型。这不是技术民族主义，这是纯数学。"
*"Built an overseas-facing tool with DeepSeek API. Monthly cost dropped from $2,000 to $80. Users are all in Europe and America. Nobody knows the backend is a Chinese model. This isn't tech nationalism. It's pure math."*

5. The Economics: Who Makes Money From Token Export?

Token export creates a multi-layer revenue stack that benefits Chinese companies at every level. But the profitability picture is more complex than the volume numbers suggest.

Layer	Revenue Driver	Key Players	Margin Profile
Model API Layer	Per-token inference fees	DeepSeek, MiniMax, Kimi, Zhipu	Low-to-mid (price war)
Cloud Compute Layer	GPU rental, cluster hosting	Alibaba Cloud, Tencent Cloud, Baidu AI Cloud	Mid (demand > supply)
Hardware Layer	AI chip sales	Huawei Ascend, Alibaba Zhenwu, Cambricon	High (supply-constrained)
Platform/Router Layer	Routing fees, arbitrage	OpenRouter, SiliconFlow, Together AI	Mid (network effects)
Application Layer	End-user SaaS products	Global startups using Chinese APIs	Variable

*Table 6: The token export value chain and margin structure*

The volume leader, MiniMax, illustrates the tension. In its 2025 IPO prospectus, MiniMax disclosed $79 million in annual revenue — but $250 million in losses. Zhipu AI reported ¥724 million ($100M) revenue in 2025 with ¥3.18 billion ($440M) adjusted net losses. For every yuan earned, these companies spend three to four on compute.

This makes the token export boom a volume play, not a profitability play. The bet is straightforward: capture global market share now through aggressive pricing, then optimize margins as chip costs fall and efficiency improves. It's the Amazon playbook — applied to intelligence itself.

6. Strategic Implications: What This Means for Global AI

The token export phenomenon carries implications that extend far beyond API pricing.

For American AI Companies

OpenAI and Anthropic now face a competitor they can't easily outspend. Chinese model providers operate in an ecosystem where chip costs are subsidized by state industrial policy, cloud infrastructure is national-strategy-grade, and labor costs for ML engineers are significantly lower. The price gap isn't a temporary promotional strategy — it's a structural cost advantage.

For Global Developers

The democratization of AI access is accelerating. A solo developer in Nairobi or Jakarta can now build products using world-class models for less than the cost of a coffee subscription. The open-source ecosystem around Chinese models means these developers can self-host, fine-tune, and own their inference stack — avoiding vendor lock-in entirely.

For China's Economy

Token export represents a new category of high-value service export that requires no physical logistics, faces no shipping tariffs, and scales infinitely. While traditional Chinese exports face increasing trade friction, token exports flow through undersea fiber cables — infrastructure that no tariff can target. If token consumption grows at current rates, AI service exports could become a meaningful line item in China's balance of payments within five years.

For Geopolitics

The invisibility of token exports makes them politically slippery. When a European startup uses DeepSeek, there's no "Made in China" label. There's no customs declaration. There's no visible supply chain that politicians can photograph for campaign ads. This invisibility is both the strength and the fragility of the model: it evades trade barriers, but it also means dependency can grow without public awareness or debate.

Risk Factor	Probability	Impact	Mitigation
API access sanctions	Low-Medium	High	Self-hosted open weights
Export control on inference	Low	High	Edge deployment
Quality/alignment incidents	Medium	Medium	Red-teaming investments
Western competitive response	High	Medium	Continued price pressure
Compute supply disruption	Medium	High	Huawei Ascend ecosystem
Regulatory fragmentation	High	Medium	Multi-region deployment

*Table 7: Key risks to the token export model and potential mitigations*

*Token exports travel through undersea fiber cables — infrastructure no tariff can target. Image: Unsplash*

7. The Counter-Trends: Where Chinese Models Still Lag

Token volume dominance doesn't mean Chinese models lead in every dimension. Three significant gaps remain:

Multimodal creativity. American models still lead on image generation quality, video synthesis, and audio production. OpenAI's Sora 2 and Google's Veo 3 set standards that Chinese video models like Kling and Pika are approaching but haven't matched.

Enterprise trust. For Fortune 500 companies with strict compliance requirements, "our AI runs on a Chinese model" remains a hard sell to boards and regulators. This gives OpenAI and Anthropic a durable enterprise moat, even at 50× pricing.

Frontier research. The most cited AI research papers still overwhelmingly come from American institutions and companies. Chinese labs publish prolifically, but the intellectual architecture of the field — transformer variants, RLHF methods, agent frameworks — remains primarily Western-designed.

Dimension	Chinese Leadership	American Leadership	Notes
API cost efficiency	✅ Dominant	❌ 10–50× more expensive	Structural advantage
Code generation	✅ DeepSeek #1	Competitive	Benchmark-driven
Long context	✅ Kimi 2M tokens	❌ Max 256K–1M	Technical lead
Multilingual	✅ Native Chinese + 20+	Strong English-centric	Regional advantage
Image/Video gen	Competitive	✅ Sora 2, Veo 3	Creative lead
Enterprise trust	⚠️ Perception gap	✅ Default choice	Compliance moat
Research citations	Growing fast	✅ Still dominant	Academic prestige
Open weights	✅ Most providers	❌ Only Llama	Ecosystem lock-in

*Table 8: Competitive landscape by capability dimension, May 2026*

8. What's Next: The Token Economy's Next Phase

If current trends continue, three developments seem likely before the end of 2026:

Consolidation of the router layer. OpenRouter's dominance won't last forever. Chinese providers are building direct developer relationships — DeepSeek's developer portal already sees 40% of its traffic bypassing routers entirely. Expect a fragmentation where power users go direct, while casual developers stay on aggregators.

Specialization over generalization. The "one model to rule them all" era is ending. Kimi's 2M context makes it the legal/document standard. DeepSeek's code excellence makes it the developer standard. MiniMax's multimodal strength makes it the creative standard. The future isn't one winner — it's a portfolio of specialized Chinese models each owning a vertical.

State capital enters the stack. With DeepSeek's first funding round led by China's National IC Fund, and Kimi's $20B round backed by China Mobile and state AI funds, government capital is becoming a structural feature of the token export economy. This brings stability, scale, and — potentially — geopolitical complexity.

Conclusion: The Infrastructure Layer Speaks

The world doesn't always notice infrastructure shifts. Nobody held a press conference when TCP/IP became the default networking protocol. Nobody threw a party when Linux became the operating system of the internet. The transition happens gradually, then suddenly — and by the time it's obvious, it's already over.

China's dominance of global API token traffic is that kind of shift. It's not a headline about a single model release. It's not a funding announcement. It's the quiet accumulation of trillions of data points — each one a developer somewhere in the world choosing a Chinese model because it works better, costs less, or both.

The token export economy doesn't require users to love Chinese technology. It doesn't require political alignment. It only requires one thing: that the math keeps working. And right now, for hundreds of thousands of developers routing billions of prompts daily, the math is unmistakable.

Silicon Valley may still dominate AI headlines. But China's models are dominating AI traffic. In the infrastructure layer of intelligence — the layer that actually runs the world's applications — the balance of power has already shifted.

image: "https://images.unsplash.com/photo-1620712943543-bcc4688e7485?w=1200&h=600&fit=crop"

- Alibaba's $100B Token Gambit — How Alibaba built an industrial infrastructure around token production and consumption

- The Invisible Empire: China's AI Consumer Apps — How Chinese AI apps quietly conquered 2.5 billion users

- The Great AI Compute Crunch — Why China's AI boom ran out of tokens and what happened next

- DeepSeek's $7.3B Mega-Round — The historic funding wave reshaping China's AI capital landscape

image: "https://images.unsplash.com/photo-1620712943543-bcc4688e7485?w=1200&h=600&fit=crop"

*Disclaimer: This analysis is based on publicly disclosed data from OpenRouter, company filings, media reports, and analyst estimates as of May 28, 2026. Token consumption figures fluctuate weekly and depend on promotional pricing, model updates, and router configuration. Revenue and loss figures for private companies are based on disclosed prospectus data and may not reflect current financial positions. This article does not constitute investment advice.*

By Meeeeed

Editor at AI in China. Tracking Chinese AI companies, funding rounds, and the technologies reshaping global tech. More about me.

1. The OpenRouter Shock: Reading the Data

2. What Is "Token Export"? The Invisible Trade Route

3. The Three Engines: Why Chinese Models Won Developer Mindshare

Advantage 1: Price

Advantage 2: Performance

Advantage 3: Open-Source Ecosystem Lock-In

4. Developer Voices: What the People Actually Routing Traffic Say

5. The Economics: Who Makes Money From Token Export?

6. Strategic Implications: What This Means for Global AI

For American AI Companies

For Global Developers

For China's Economy

For Geopolitics

7. The Counter-Trends: Where Chinese Models Still Lag

8. What's Next: The Token Economy's Next Phase

Conclusion: The Infrastructure Layer Speaks

Related Articles