The Great AI Benchmark War: How Chinese Models Caught the Frontier in 2026

June 28, 2026·AI in China

The Myth That Won't Die

At dinner tables in San Francisco, investment briefings in London, and tech podcasts streamed from New York, one assumption remains unchallenged: American AI models are years ahead of Chinese competitors. The frontier belongs to OpenAI, Anthropic, and Google. Everyone else is playing catch-up. It's a comfortable narrative, but it's also wrong.

The numbers from June 2026 tell a different story. On SuperCLUE, China's most comprehensive Chinese-language evaluation benchmark, the gap between the best Chinese model and the best Western model has collapsed to single-digit percentage points. On LMArena's open blind voting, DeepSeek V4-Pro sits 39 Elo points behind GPT-5.5, a distance that could close in a single model update. On SWE-bench coding tasks, Claude Opus 4.7 still leads, but its output tokens cost roughly 86 times as much as those of the open-source alternative, which reaches about 77% of its SWE-bench Pro score (55.4 versus 71.8).

The question is no longer whether Chinese models can compete. They are competing, aggressively, and they are doing it at a price point that is rewriting the economics of the entire industry. What remains open is whether the rest of the world has noticed.

Table 1: SuperCLUE June 2026 — Top 10 Chinese Models

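| Rank | Model | Provider | SuperCLUE | CMMLU | MMLU-Pro | Open-Source? |
|---|---|---|---|---|---|---|
| 1 | Qwen3.5-Plus | Alibaba | 88.5 | 89.2 | 85.2 | Partial |
| 2 | Doubao-Seed-2.0-pro | ByteDance | 87.8 | 88.4 | 84.9 | No |
| 3 | DeepSeek V4-Pro | DeepSeek | 86.5 | 88.8 | 87.5 | Yes (MIT) |
| 4 | GLM-5.1 | Zhipu AI | 85.4 | 87.1 | 82.3 | Yes |
| 5 | Kimi K2.6 | Moonshot | 84.6 | 86.9 | 83.7 | Yes |
| 6 | MiniMax M2.7 | MiniMax | 83.2 | 85.6 | 81.4 | No |
| 7 | ERNIE 5.1 | Baidu | 82.7 | 85.0 | 82.1 | Partial |
| 8 | Pangu-Ultra 718B | Huawei | 81.9 | 84.3 | 80.8 | No |
| 9 | Baichuan 5.0 | Baichuan AI | 79.4 | 82.1 | 77.5 | Yes |
| 10 | Yi-2.0 | 01.AI | 78.8 | 81.3 | 76.9 | Yes |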

*Source: SuperCLUE Mid-2026 Benchmark Report; CMMLU & MMLU-Pro scores from vendor reports where available.*

The Data Doesn't Care About Narrative

Benchmarks are imperfect. They can be gamed, cherry-picked, and misinterpreted. But when multiple independent benchmarks — from independent evaluators like LMArena, from standardized test sets like CMMLU and MMLU-Pro, and from industry coding leaderboards like SWE-bench — all point in the same direction, the signal becomes hard to ignore.

Consider the SuperCLUE ranking for June 2026. Qwen3.5-Plus from Alibaba sits at the top with a score of 88.5, followed by ByteDance's Doubao-Seed-2.0-pro at 87.8, and DeepSeek V4-Pro at 86.5. These are not minor players. Qwen has been downloaded more than one billion times across its various model sizes. Doubao is the most widely used AI assistant in China with 345 million monthly active users as of mid-2026. DeepSeek, which released V4 with open-source MIT weights, has become the go-to backbone for enterprise deployments across Chinese cloud platforms.

On CMMLU (Chinese Massive Multitask Language Understanding), the scores are even tighter. Qwen3.5-Plus leads at 89.2, with DeepSeek V4-Pro at 88.8 and Doubao at 88.4. The gap between first and third place is 0.8 percentage points. In the context of language model evaluation, that margin is statistical noise. These three models are effectively tied on Chinese-language reasoning, cultural knowledge, and commonsense inference.

The MMLU-Pro scores are where Western models still hold a modest lead, but the gap is shrinking. DeepSeek V4-Pro reports 87.5 on MMLU-Pro, trailing GPT-5.5's estimated 91.0 by 3.5 points. Claude Opus 4.7 sits at 89.6. On Humanity's Last Exam, the hardest reasoning benchmark currently deployed, GPT-5.5 leads at 55, Claude at 51, and DeepSeek V4-Pro at 48. The gap is real, but it is narrow — and the price gap is not.

Table 2: LMArena Elo Rankings (June 2026) — Independent Blind Testing

| Rank | Model | Elo Score | Provider | Open-Source? | Notes |
|---|---|---|---|---|---|
| 1 | GPT-5.5 | 1,506 | OpenAI | No | Highest Elo, premium API |
| 2 (tie) | Claude Opus 4.7 | 1,505 | Anthropic | No | Consistent multi-turn leader |
| 2 (tie) | Gemini 3.1 Pro | 1,505 | Google | No | 1M-10M context window |
| 4 | DeepSeek V4-Pro | 1,467 | DeepSeek | Yes (MIT) | 38-39 Elo gap to frontier |
| 5 | Qwen3.5-Max | 1,459 | Alibaba | Partial | Leading open Chinese model |
| 6 | Doubao-Seed-2.0-pro | 1,452 | ByteDance | No | 345M MAU, enterprise deployed |
| 7 | Kimi K2.6 | 1,448 | Moonshot | Yes | 1T param MoE architecture |
| 8 | Claude Sonnet 4.6 | 1,445 | Anthropic | No | Mid-tier at 1/5th Opus price |
| 9 | GLM-5.1 | 1,438 | Zhipu AI | Yes | Trained on Huawei Ascend chips |
| 10 | ERNIE 5.1 | 1,421 | Baidu | Partial | 6% of typical training cost |

*Source: LMArena Chatbot Arena, June 2026 independent scores. Elo ratings are statistical: under the standard Elo formula, a 30-point difference corresponds to an expected head-to-head win rate of roughly 54%.*
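To make the Elo gap concrete, the sketch below converts a rating difference into an expected head-to-head win rate. It assumes the textbook Elo logistic formula with a 400-point scale; LMArena's own fitting procedure may differ in detail, so treat the result as an approximation. The function name is illustrative, and the ratings are taken from Table 2.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo formula.

    A tie counts as half a win, so 0.5 means the two models are even.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings copied from Table 2.
gpt_55 = 1506
deepseek_v4_pro = 1467

p = elo_expected_score(gpt_55, deepseek_v4_pro)
print(f"Elo gap: {gpt_55 - deepseek_v4_pro} points")
print(f"Expected GPT-5.5 win rate vs DeepSeek V4-Pro: {p:.1%}")  # about 55.6%
```

Read this way, a 39-point lead means GPT-5.5 would be preferred in a little over 55 of every 100 blind comparisons against DeepSeek V4-Pro.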

The Coding Arena: Where the Real Money Lives

If general knowledge benchmarks are the beauty contest, coding benchmarks are the revenue engine. Software development accounts for the largest share of enterprise AI spending globally, and the models that can write, debug, and refactor code are the ones that command premium pricing.

On SWE-bench Verified, the gold standard for real-world software engineering, Claude Opus 4.7 leads at 77.2, followed by GPT-5.5 at 74.9 and Gemini 3.1 at 73.1. On SWE-bench Pro, Claude Opus 4.7 scores 71.8, while DeepSeek V4-Pro is further back at 55.4, just ahead of Gemini 3.1 at 54.2. This is the one benchmark family where the Western frontier, and Claude in particular, still holds a clear, measurable advantage.

But here's the critical caveat: SWE-bench measures capability, not value. Claude Opus 4.7 costs $15 per million input tokens and $75 per million output tokens. DeepSeek V4-Pro costs $0.87 per million output tokens. On output-token pricing alone, the cost of one Claude Opus coding session buys roughly 86 DeepSeek V4-Pro sessions of the same length, with comparable results on many routine tasks. When GPT-5.5 costs $5/M input and $25/M output, and Gemini 3.1 Pro costs $3.5/M input and $10.5/M output, the pricing asymmetry is not a footnote: it is the dominant feature of the market.
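The price multiples quoted here can be reproduced with a few lines of arithmetic. The sketch below uses only the output-token prices given in this article, since no input-token price is given for DeepSeek V4-Pro. The dictionary and function names are illustrative.

```python
# Output-token prices in USD per 1M tokens, copied from the article's figures.
OUTPUT_PRICE_USD_PER_M = {
    "Claude Opus 4.7": 75.00,
    "GPT-5.5": 25.00,
    "Gemini 3.1 Pro": 10.50,
    "DeepSeek V4-Pro": 0.87,
}


def price_ratio(model: str, baseline: str = "DeepSeek V4-Pro") -> float:
    """How many times more `model` costs per output token than `baseline`."""
    return OUTPUT_PRICE_USD_PER_M[model] / OUTPUT_PRICE_USD_PER_M[baseline]


for name in OUTPUT_PRICE_USD_PER_M:
    print(f"{name:16s} {price_ratio(name):5.1f}x")
```

On these figures the multiple is about 86x for Claude Opus 4.7, about 29x for GPT-5.5, and about 12x for Gemini 3.1 Pro. A real session also pays for input tokens, so actual bills will not scale by exactly these factors.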

On LiveCodeBench, the dynamic coding benchmark that updates weekly to prevent memorization, the picture is even more interesting. DeepSeek V4-Pro reports 93.5, exceeding GPT-5.5's 92.1 and Gemini 3.1's 91.7. On Terminal-Bench 2.0, which tests autonomous agentic coding workflows, GPT-5.5 is the clear outlier at 82.7, but the cluster behind it is tight: Claude Opus 4.7 at 69.4, Gemini 3.1 at 68.5, and DeepSeek V4-Pro at 67.9. The gap between the Chinese model and the two Western models just ahead of it is 1.5 points or less, within the margin of a single model update.

Table 3: Coding Benchmark Comparison (June 2026)

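| Benchmark | GPT-5.5 | Claude Opus 4.7 | Gemini 3.1 | DeepSeek V4-Pro | Qwen3.5-Plus | Doubao Seed 2.0 |
|---|---|---|---|---|---|---|
| SWE-bench Verified | 74.9 | 77.2 | 73.1 | 55.4 | ~64.0 | ~52.0 |
| SWE-bench Pro | N/A | 71.8 | 54.2 | 55.4 | ~52.0 | ~48.0 |
| LiveCodeBench | 92.1 | 91.3 | 91.7 | 93.5 | 89.2 | 88.1 |
| Terminal-Bench 2.0 | 82.7 | 69.4 | 68.5 | 67.9 | 62.3 | 61.8 |
| MMLU-Pro | 91.0 | 89.6 | 88.4 | 87.5 | 85.2 | 84.9 |
| Humanity's Last Exam | 55.0 | 51.0 | 49.5 | 48.0 | 42.1 | 41.3 |
| LMArena Elo | 1,506 | 1,505 | 1,505 | 1,467 | 1,459 | 1,452 |
| Price per 1M output tokens | $25.00 | $75.00 | $10.50 | $0.87 | $1.20 | $0.30 |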

*Source: LMArena, SWE-bench, LiveCodeBench, Terminal-Bench; vendor-reported and independent scores where available. Prices are estimated June 2026 API rates.*
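One way to read Table 3 is to compute, for each benchmark, how far DeepSeek V4-Pro sits from the best-scoring Western model. The sketch below does that for five rows of the table. SWE-bench Verified is left out because the text attributes DeepSeek's 55.4 to SWE-bench Pro. The names `SCORES`, `WESTERN`, and `gap_to_best_western` are illustrative.

```python
# Scores copied from Table 3; None marks a value the table lists as N/A.
WESTERN = ("GPT-5.5", "Claude Opus 4.7", "Gemini 3.1")
CHALLENGER = "DeepSeek V4-Pro"


def make_row(gpt, claude, gemini, deepseek):
    """Bundle one benchmark row into a model -> score mapping."""
    return dict(zip(WESTERN + (CHALLENGER,), (gpt, claude, gemini, deepseek)))


SCORES = {
    "SWE-bench Pro": make_row(None, 71.8, 54.2, 55.4),
    "LiveCodeBench": make_row(92.1, 91.3, 91.7, 93.5),
    "Terminal-Bench 2.0": make_row(82.7, 69.4, 68.5, 67.9),
    "MMLU-Pro": make_row(91.0, 89.6, 88.4, 87.5),
    "Humanity's Last Exam": make_row(55.0, 51.0, 49.5, 48.0),
}


def gap_to_best_western(row, challenger=CHALLENGER):
    """Return (best Western model, challenger score minus that model's score)."""
    available = {m: row[m] for m in WESTERN if row[m] is not None}
    best = max(available, key=available.get)
    return best, round(row[challenger] - available[best], 1)


for benchmark, row in SCORES.items():
    best, gap = gap_to_best_western(row)
    print(f"{benchmark:22s} best Western: {best:16s} gap: {gap:+.1f}")
```

A positive gap means DeepSeek V4-Pro leads. On these figures that happens only on LiveCodeBench (+1.4), while the largest deficits are on SWE-bench Pro (-16.4) and Terminal-Bench 2.0 (-14.8).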

The Price Revolution Nobody Talks About

In 2023, the AI API market was a two-tier world: OpenAI at the top charging premium prices, and everyone else fighting for scraps. In 2026, the pricing structure has been completely resegmented.

ByteDance's Doubao-Seed-2.0-pro costs $0.30 per million output tokens. DeepSeek V4-Pro costs $0.87. Qwen3.5-Plus costs roughly $1.20. These are not "discount" models. These are frontier models, ranking in the top 10 globally on LMArena, scoring in the top 5 on Chinese-language benchmarks, and capable of handling enterprise workloads at scale, priced at roughly 3% to 12% of Gemini 3.1 Pro's output rate and under 2% of Claude Opus 4.7's.

This is not a subsidy. ByteDance and DeepSeek are not losing money on API calls to gain market share. They are making money, because the cost of training and inference on domestic Chinese hardware has collapsed. Huawei's Ascend 910C chips, combined with training frameworks like MindSpore, have driven the cost of training a 1-trillion-parameter model down by approximately 60% compared to 2024 levels. DeepSeek's DeepSeekMoE architecture, which activates only 37 billion parameters out of 671 billion for any given forward pass, reduces inference costs by roughly 40% compared to dense models of the same quality.
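The sparse-activation idea behind these figures can be sketched in a few lines. The example below computes the active parameter share from the 37B and 671B numbers quoted above, then shows a generic top-k router. The router is a textbook simplification, not DeepSeek's actual routing code, and the expert count and logits are made-up illustration values.

```python
import math


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per forward pass (both in billions)."""
    return active_params_b / total_params_b


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Experts outside the top k get no weight and are never evaluated,
    which is where the compute saving of a mixture-of-experts layer comes from.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


print(f"Active share per token: {active_fraction(37, 671):.1%}")  # about 5.5%

# Hypothetical router scores for one token across 8 toy experts.
weights = top_k_route([0.1, 2.0, -1.0, 1.5, 0.3, -0.2, 0.9, 0.0], k=2)
print(weights)  # only experts 1 and 3 receive weight
```

The parameter share is not the same thing as the cost saving: memory, attention, and communication overheads do not shrink in proportion, which is consistent with the article's more modest 40% inference-cost figure.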

The result is a pricing model that makes Western API rates look like a legacy carrier plan. When a Chinese developer can run an entire month of coding assistance through DeepSeek V4-Pro for less than the cost of a single Claude Opus conversation, the "premium" narrative evaporates. The question stops being "which model is smartest?" and becomes "how much of the capability gap am I willing to pay 30x to 86x for?"

Table 4: API Pricing Comparison (per 1M output tokens, June 2026)

| Model Tier | Model | Price (USD/M output) | LMArena Elo | Cost per Elo Point |
|---|---|---|---|---|
| Premium Frontier | Claude Opus 4.7 | $75.00 | 1,505 | $0.0498 |
| Premium Frontier | GPT-5.5 | $25.00 | 1,506 | $0.0166 |
| Premium Frontier | Gemini 3.1 Pro | $10.50 | 1,505 | $0.0070 |
| Value Frontier | Claude Sonnet 4.6 | $3.00 | 1,445 | $0.0021 |
| Value Frontier | GPT-5.3-Codex | $2.00 | ~1,440 | $0.0014 |
| Chinese Frontier | DeepSeek V4-Pro | $0.87 | 1,467 | $0.0006 |
| Chinese Frontier | Qwen3.5-Plus | $1.20 | 1,459 | $0.0008 |
| Chinese Frontier | Doubao Seed 2.0 | $0.30 | 1,452 | $0.0002 |
| Open Source (local) | Llama 4 Maverick | $0.00* | ~1,420 | $0.0000 |
| Open Source (local) | GLM-5.1 | $0.00* | 1,438 | $0.0000 |

*Local inference costs are hardware-dependent; ~$0.00 refers to per-token API pricing only. Source: Vendor pricing pages, estimated June 2026.*
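The "Cost per Elo Point" column is simply the output price divided by the Elo score. The sketch below reproduces it for a few rows of Table 4. The list and function names are illustrative. Because Elo has no natural zero, the ratio is best read as a rough ranking aid rather than a true unit cost.

```python
# (model, USD per 1M output tokens, LMArena Elo), copied from Table 4.
MODELS = [
    ("Claude Opus 4.7", 75.00, 1505),
    ("GPT-5.5", 25.00, 1506),
    ("Gemini 3.1 Pro", 10.50, 1505),
    ("DeepSeek V4-Pro", 0.87, 1467),
    ("Doubao Seed 2.0", 0.30, 1452),
]


def cost_per_elo_point(price_usd_per_m: float, elo: int) -> float:
    """Output price per 1M tokens divided by the model's Elo score."""
    return price_usd_per_m / elo


# Print cheapest-per-point first.
for name, price, elo in sorted(MODELS, key=lambda m: cost_per_elo_point(m[1], m[2])):
    print(f"{name:16s} ${cost_per_elo_point(price, elo):.4f} per Elo point")
```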

The Multi-Modal Assault of May 2026

In a single week in May 2026, Chinese AI labs delivered a run of multi-modal releases that changed the global landscape. DeepSeek teased V4.1 with native image-and-audio understanding. ByteDance quietly released Mamoda2.5, an open-source 250B-parameter unified multi-modal model. Baidu's ERNIE 5.1 reportedly reached fourth place globally on LMArena using only 6% of typical training costs, though the June rankings in Table 2 list it tenth.

The multi-modal battle is the decisive one because enterprise applications overwhelmingly depend on images, charts, screenshots, and video. A text-only model cannot analyze a factory monitoring feed, diagnose a medical X-ray, or review a design prototype. ByteDance's Mamoda2.5, built on an MoE+DiT architecture, achieves 12x faster inference than Alibaba's Wan2.2 on a single device, with video editing latency of 9.2 seconds, in the same range as closed-source Sora (10.2 seconds) and Kuaishou's Kling (8.7 seconds). The critical distinction: enterprises can fine-tune and deploy it locally without API dependencies, which is essential for data-sensitive environments.

DeepSeek V4.1's architecture is even more significant. Rather than bolting image processing onto a text model, it uses native multi-modal fusion with a unified architecture handling text, images, and audio simultaneously, with shared context across modalities. The Deep MCP protocol integration enables V4.1 to function as an enterprise Agent core: analyzing a factory monitoring screenshot triggers not just a description but an actual workflow of creating tickets, notifying staff, and generating recommendations. The rumored $50 billion funding round for DeepSeek would directly fund this enterprise infrastructure push.

Table 5: Multi-Modal Model Comparison (May–June 2026)

| Model | Provider | Parameters | Open-Source? | Video Latency | Key Feature | LMArena Rank |
|---|---|---|---|---|---|---|
| Mamoda2.5 | ByteDance | 250B | Yes | 9.2 sec | MoE+DiT, 12x inference speed | ~6th |
| V4.1 (preview) | DeepSeek | ~1T | Partial (OSS) | 12.5 sec | Native multi-modal fusion, Deep MCP | ~4th |
| ERNIE 5.1 | Baidu | 220B | Partial | 14.1 sec | 6% training cost, 4th globally | 4th |
| Seed 2.0 Pro | ByteDance | ~600B | No | 11.3 sec | 6th on LMSYS Text, 3rd on Vision | 6th |
| Kling 3.0 | Kuaishou | 340B | No | 8.7 sec | Industry-leading video generation | N/A |
| Wan2.2 A14B | Alibaba | 14B (active) | Yes | 110 sec | Long-form video generation | ~15th |
| Sora (v2) | OpenAI | Unknown | No | 10.2 sec | Premium video quality | 1st (video) |

*Source: Vendor announcements, technical reports, LMArena rankings as of June 2026. Video latency measured for 5-second clip generation on single A100-equivalent GPU.*

The Open-Source Wedge

One of the most underappreciated dynamics of 2026 is the open-source divergence. Western frontier models are almost entirely closed. GPT-5.5, Claude Opus 4.7, and Gemini 3.1 are API-only, with no weights released. The closest Western open-source competitor is Meta's Llama 4 Maverick, which scores 67.9 on SWE-bench and ranks around 1,420 on LMArena — competitive but not frontier.

Chinese labs, by contrast, have released six frontier-grade open-weight models since January 2026: GLM-5 (Zhipu AI), Kimi K2.5 (Moonshot), DeepSeek V4 (DeepSeek), MiniMax M2.5, ByteDance Seed-OSS-36B, and Mamoda2.5. These are not small models. GLM-5 is 745 billion parameters trained on Huawei Ascend chips. Kimi K2.5 is a 1-trillion-parameter Mixture of Experts with open weights. DeepSeek V4's 671B-parameter model is available under MIT license.

The implications are profound. A startup in Lagos, a university lab in São Paulo, or an enterprise in Jakarta can download a model that ranks in the global top 10 and run it locally, without API dependency, without data leaving their premises, and without paying per-token fees. The open-source tier is not "almost as good as closed." It is the frontier for most of the world's developers. And it is overwhelmingly Chinese.

Table 6: Open-Source Model Landscape (June 2026)

| Model | Provider | Parameters | License | LMArena Elo | SWE-bench | Key Strength |
|---|---|---|---|---|---|---|
| DeepSeek V4 | DeepSeek | 671B MoE | MIT | 1,467 | 55.4 | Coding, reasoning, value |
| GLM-5.1 | Zhipu AI | 745B | Open | 1,438 | 52.1 | Chinese chips, long context |
| Kimi K2.6 | Moonshot | 1T MoE | Open | 1,448 | 54.8 | 1M context, agent swarm |
| MiniMax M2.5 | MiniMax | 456B | Open | 1,442 | 53.2 | Multi-SWE-Bench leader |
| Qwen3.5-72B | Alibaba | 72B | Open | 1,421 | 51.3 | 1B downloads, enterprise |
| Llama 4 Maverick | Meta | ~400B | Open | ~1,420 | 67.9 | Best Western open model |
| Mamoda2.5 | ByteDance | 250B | Open | ~1,435 | 48.2 | Multi-modal, video gen |
| Mistral Large 3 | Mistral | 675B | Open | 1,425 | 56.1 | #2 open-source on LMArena |

*Source: LMArena, vendor repositories, model cards. Elo ratings are approximate for open-weight models that may have been tested at different quantization levels.*

The User Data That Really Matters

Benchmarks are validated by users. The most telling metric is not a test score but a monthly active user count. As of mid-2026, Doubao has 345 million monthly active users in China, making it the most widely used AI assistant in the country by a wide margin. Alibaba's Qwen suite (across all model sizes and integrations) reaches 166 million MAU. Baidu's ERNIE family has approximately 220 million MAU across its various incarnations. DeepSeek's domestic user base, which peaked at 143 million in August 2025, has settled at roughly 89 million after the initial hype cycle.

The critical insight is not the absolute numbers but the migration patterns. When DeepSeek's 75% API price cut was made permanent in June 2026, the industry assumed users would simply flock to the cheapest option. But the data from Chai Analytics and QuestMobile shows something more nuanced. Of the 39.4% of DeepSeek's churned users who migrated to a competitor in May 2025, the single largest destination was Doubao. The reason was not that Doubao was cheaper but that it was better integrated into the Chinese digital ecosystem, with native Douyin integration, enterprise workflow tools, and ByteDance's massive content graph feeding its training data.

This is the moat that Western models cannot replicate: ecosystem integration. GPT-5.5 is the most capable model in the world on certain benchmarks, but it does not have native access to WeChat, Taobao, or Baidu's search index. It does not understand the 34,000+ characters of Chinese history, literature, and legal code that CMMLU tests for. It is not trained on the specific conversation patterns, meme formats, and social norms of Chinese internet culture. A model that scores 91.0 on MMLU-Pro but struggles with Chinese idioms and regulatory frameworks is a less useful product in China than a model that scores 87.5 but understands them natively.

Table 7: Chinese AI Assistant User Base (Mid-2026)

| Platform | MAU (millions) | Peak MAU | Churn Rate | Primary Moat | LMArena Rank |
|---|---|---|---|---|---|
| Doubao | 345 | 380 (Apr 2025) | 8.2% | Ecosystem integration, content graph | 6th |
| Baidu ERNIE | 220 | 245 (Aug 2025) | 12.1% | Search integration, enterprise | 10th |
| Qwen Suite | 166 | 180 (May 2025) | 9.3% | Open-source downloads, cloud | 5th |
| DeepSeek | 89 | 143 (Aug 2025) | 37.8% | API pricing, developer trust | 4th |
| Kimi | 67 | 78 (Mar 2025) | 11.5% | Long context, agent swarm | 7th |
| MiniMax Talkie | 42 | 45 (Feb 2025) | 8.9% | Companion AI, voice, emotional | N/A |
| Tencent Hunyuan | 38 | 42 (Jan 2025) | 10.7% | WeChat integration | ~12th |
| Others | 120 | N/A | N/A | Niche applications | |

*Source: QuestMobile Mid-2026 Report; Chai Analytics churn study; SuperCLUE usage surveys. MAU figures are estimates based on public disclosures and third-party analytics.*
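Table 7 does not say how its Churn Rate column is computed. One quantity that can be derived directly from the MAU and Peak MAU columns is the decline from peak, sketched below. It is offered as a cross-check only and is not necessarily the same metric as the reported churn rate. The names are illustrative.

```python
# (current MAU, peak MAU) in millions, copied from Table 7.
PLATFORMS = {
    "Doubao": (345, 380),
    "Baidu ERNIE": (220, 245),
    "Qwen Suite": (166, 180),
    "DeepSeek": (89, 143),
    "Kimi": (67, 78),
}


def decline_from_peak(current_mau: float, peak_mau: float) -> float:
    """Fraction of peak monthly active users lost since the peak."""
    return (peak_mau - current_mau) / peak_mau


for name, (current, peak) in PLATFORMS.items():
    print(f"{name:12s} {decline_from_peak(current, peak):.1%} below peak")
```

For DeepSeek the decline from peak comes out at about 37.8%, which matches its listed churn rate. For the other platforms the two numbers differ, so the churn column is presumably measured in some other way.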

The Regulatory Tailwind Nobody Predicted

China's AI regulatory framework, often characterized as restrictive by Western observers, has in fact created a competitive advantage for domestic model development. The National Medical Products Administration's (NMPA) AI device approval process has cleared more than 40 AI-assisted medical devices since 2025, with DeepSeek, Baidu, and iFLYTEK models receiving regulatory clearance for clinical use. This creates a market that US models are legally barred from entering — not because they are less capable, but because they lack the regulatory certification, Chinese-language training data, and local clinical partnerships that Chinese models possess.

The same dynamic applies to finance, education, and government. A language model deployed in a Chinese bank's compliance system must understand China's Anti-Money Laundering Law, the Cybersecurity Law, and the Personal Information Protection Law. It must be trained on Chinese court cases, regulatory circulars, and policy documents. Western models can be fine-tuned, but the fundamental training data gap means they start from behind. The regulatory moat is not a wall — it is a learning advantage that compounds with every deployment.

Table 8: NMPA-Cleared AI Medical Devices (2025–2026)

| Company | Product | Application | NMPA Class | Date Cleared | Model Base |
|---|---|---|---|---|---|
| DeepSeek | DeepSeek-Med V2 | Radiology diagnosis | Class III | Mar 2026 | DeepSeek V4 |
| Baidu | ERNIE-Med 5.0 | Pathology analysis | Class II | Jan 2026 | ERNIE 5.1 |
| iFLYTEK | iFLYTEK Health 3.0 | Clinical decision support | Class II | Nov 2025 | SparkDesk 4.0 |
| Tencent | WeChat Health AI | Symptom triage | Class I | Feb 2026 | Hunyuan 2.0 |
| Huawei | Pangu-Med Ultra | Drug discovery | Class II | Apr 2026 | Pangu 718B |
| Alibaba | Tongyi-Med | Oncology imaging | Class III | Dec 2025 | Qwen 3.5 |
| Yitu | Dr. Yitu 6.0 | Cardiology AI | Class III | Jan 2026 | Custom MoE |
| Infervision | InferRead 5.0 | Chest CT screening | Class II | Oct 2025 | Custom CNN+LLM |

*Source: NMPA Public Database, company announcements, Caixin Medical. Class III is the highest risk classification requiring clinical trials.*

The Contrarian View: Where the Gap Actually Is

None of this means Chinese models are universally superior. The honest analysis of June 2026 benchmarks reveals clear Western advantages in specific domains:

Autonomous agentic performance: GPT-5.5's Terminal-Bench 2.0 score of 82.7 is a genuine outlier. On multi-step autonomous workflows requiring tool use, planning, and error recovery, the OpenAI model is in a class of its own. The gap is not marginal — it is 13 points above Claude, and 15 points above DeepSeek. This matters because 2026 is the year AI stopped being a chatbot and started being a worker. Agents that can operate terminals, write code, run tests, and deploy applications are the future of enterprise AI, and OpenAI is leading this category decisively.

The hardest reasoning: On Humanity's Last Exam and GPQA Diamond, the frontier trio (GPT-5.5, Claude Opus 4.7, Gemini 3.1) still hold an advantage; on Humanity's Last Exam it ranges from 1.5 to 7 points over DeepSeek V4-Pro. For applications requiring frontier-level scientific reasoning, such as molecular modeling, advanced mathematics, and legal analysis at the Supreme Court level, the Western premium is justified.

Ecosystem lock-in outside China: GPT-5.5 and Claude are deeply embedded in Western enterprise workflows, SaaS platforms, and developer tools. GitHub Copilot uses GPT-5.3-Codex. Notion AI uses Claude. Salesforce Einstein uses GPT-5. These integrations are not easily displaced by a cheaper Chinese model, even one with comparable capability. The switching cost is the real moat, and Western models built it first.

The Three Scenarios for 2027

Where does this go? The benchmark data points to three possible futures, each with different implications for the global AI industry:

Scenario 1: Convergence (60% probability). The 38-to-39-point Elo gap on LMArena closes to 15 points or less by Q1 2027. DeepSeek or Qwen releases a model that matches GPT-5.5 on the hardest reasoning benchmarks while maintaining a 10x pricing advantage. The global market bifurcates: Western models dominate the premium enterprise tier, Chinese models dominate the volume tier, and open-source Chinese models capture the global developer base. This is the most likely outcome given the current trajectory.

Scenario 2: Divergence (25% probability). Export controls on AI hardware bound for China tighten further, and the compute gap widens. GPT-5.5's successors pull ahead on agentic performance and hardest reasoning by a margin that cannot be closed with clever architecture. Chinese models remain competitive on cost and Chinese-language tasks but lose ground on the global frontier. The industry becomes a two-tier system: frontier and near-frontier.

Scenario 3: Flip (15% probability). A Chinese lab — most likely DeepSeek, given its open-source momentum and $50B war chest — achieves a genuine breakthrough on agentic reasoning or autonomous research. The gap flips. Western models become the "expensive alternative" and Chinese models become the default. This would require a discontinuity in capability, not just incremental improvement, but the history of AI is full of such discontinuities.

Table 9: Three-Scenario Outlook for 2027

| Scenario | Probability | Trigger | Market Structure | Key Model |
|---|---|---|---|---|
| Convergence | 60% | Elo gap closes to 15 points or less; price advantage holds | Premium West / Volume China / OSS global | DeepSeek V5, Qwen 4.0 |
| Divergence | 25% | Export controls tighten; compute gap widens | Two-tier: frontier vs. near-frontier | GPT-5.7, Claude 5.0 |
| Flip | 15% | Breakthrough on agentic/reasoning | China becomes default | DeepSeek V5.5, GLM-6 |

*Source: Author analysis based on current benchmark trajectories, hardware availability projections, and funding announcements.*

What the Data Actually Says

The data is not ambiguous. In June 2026, Chinese models are in the same performance tier as Western frontier models on the majority of benchmarks that matter. They are not behind. They are not "catching up." They are competing — and in some cases, winning on cost-adjusted value, Chinese-language capability, multi-modal performance, and open-source availability.

The gap that remains is narrow, specific, and concentrated in the hardest reasoning and most autonomous agentic tasks. On general knowledge, coding, Chinese-language tasks, and real-world deployment economics, the playing field is level. On price, it is tilted dramatically toward China.

The narrative that Western AI leads by a wide margin is a comfort blanket for observers who have not looked at the numbers. The numbers are available. They are updated weekly. And they are telling a story that the conventional wisdom has not yet caught up with.

The frontier is no longer a Western monopoly. It is a shared frontier. And the side that understands this first will build the products, partnerships, and policies that define the next decade of AI.


Social Chatter

@quant_wang (Weibo): "SWE-bench 55.4 vs 77.2 looks like Claude wins, but multiply by price: $0.87 vs $75. That's 86x more sessions per dollar. For 90% of coding tasks, which one do you actually want?"

@liao_ai_vc (X/Twitter): "The open-source dominance is what Western observers miss. Six Chinese frontier-grade open-weight models in 2026 vs. one Western open model (Llama 4). The global developer default is becoming Chinese."

@deepseek_fan (Zhihu): "DeepSeek V4-Pro at 1,467 Elo vs GPT-5.5 at 1,506. 39 points. That's not a gap. That's a sprint. V3 to V4 was 200 points in 8 months."

@bytedance_engineer (Blind): "Doubao 345M MAU. That's bigger than ChatGPT's estimated US user base. And the integration with WeChat, TikTok, and enterprise tools means no Western model can replicate it without a complete ecosystem rebuild."

@european_policy (LinkedIn): "We're talking about export controls as if they stop Chinese AI. But the benchmarks show Chinese models are at frontier on domestic chips. The controls slowed them, but they didn't stop them. We need a different policy framework."

@alibaba_cloud (X/Twitter): "1 billion downloads for Qwen. Not 1 million. 1 billion. That's not a niche open-source project. That's the global default for developers who can't afford $25/M tokens."


*Methodology note: Benchmark scores are drawn from SuperCLUE June 2026, LMArena Chatbot Arena (independent), SWE-bench Verified, LiveCodeBench, Terminal-Bench 2.0, and vendor technical reports. Where independent scores were not available, vendor-reported scores are labeled inline. Pricing data is estimated from vendor API pages as of June 2026 and may vary by region and volume tier. Elo ratings are statistical: under the standard Elo formula, a 30-point difference corresponds to an expected head-to-head win rate of roughly 54%. Social media comments are translated and paraphrased from original Chinese sources.*

By Meeeeed

Editor at AI in China. Tracking Chinese AI companies, funding rounds, and the technologies reshaping global tech.
