AI Infrastructure17 min read

Alibaba's Zhenwu M890: The 3x AI Chip That Just Shipped 560,000 Units and Rewrote China's Silicon Playbook

May 30, 2026·AI in China

On May 20, 2026, at the Alibaba Cloud Summit in Hangzhou, Liu Weiguang — Senior Vice President of Alibaba Cloud — stood on stage and said something that would have sounded delusional just three years earlier: *"What we're building is China's AI factory."* Behind him, the numbers told the story. 560,000 AI chips shipped. 400+ enterprise customers across 20 industries. A 128-card supernode server with sub-150-nanosecond peer-to-peer latency. And a new chip, the Zhenwu M890, that delivers 3x the performance of its predecessor in a single generation.

For years, Western analysts dismissed Chinese AI chips as perpetual also-rans — capable of modest inference tasks, but nowhere near NVIDIA's league for training frontier models. The Zhenwu M890 doesn't just challenge that narrative. It buries it.

Abstract visualization of neural network data flows and chip architecture

*The Zhenwu M890 represents a generational leap in Chinese AI chip design — 144GB HBM, 800GB/s interconnect, and native FP4 precision support for the Agentic era.*

Here's what you need to know about the Zhenwu M890 in 60 seconds:

Metric	Zhenwu 810E (2024)	Zhenwu M890 (2026)	Change
Memory (HBM)	96GB HBM2e	144GB HBM	+50%
Interconnect Bandwidth	700GB/s	800GB/s	+14%
Performance	Baseline	3x baseline	+200%
Precision Support	Limited	FP32 → FP4 native	Full spectrum
Max Cluster Size	32 cards	128 cards (with ICN Switch)	4x scale
P2P Latency	~500ns	<150ns	-70%
Cumulative Shipments	200,000	560,000	+180%
Annual Revenue Run Rate	~40B RMB	~100B RMB	+150%

The M890 is not an incremental refresh. It is a fundamental architectural redesign built for the Agentic era — where thousands of AI agents run concurrently, each making dozens of model calls per task, demanding ultra-low latency and massive memory bandwidth. And it arrives at a moment when NVIDIA's China market share has collapsed from 95% in 2023 to a projected 8% by year-end 2026.

The Architecture: How T-Head Built a 3x Chip in One Generation

The Zhenwu M890 is built on a self-developed parallel computing architecture — not a licensed ARM or RISC-V design, but a fully custom instruction set and dataflow fabric optimized specifically for transformer workloads and mixture-of-experts (MoE) models.

Memory and Bandwidth: The Real Battleground

Modern AI training and inference are not limited by raw compute. They are limited by memory capacity and bandwidth. The M890's 144GB of HBM (High Bandwidth Memory) puts it in the same tier as NVIDIA's H100 (80GB) and H200 (141GB), and ahead of the H20 (96GB) that NVIDIA is currently allowed to sell in China under US export controls.

Specification	NVIDIA H100 SXM	NVIDIA H20 (China)	Zhenwu M890	Huawei Ascend 910C
Memory	80GB HBM3	96GB HBM2e	144GB HBM	64GB HBM2e
Memory Bandwidth	3.35TB/s	4.0TB/s	800GB/s*	1.6TB/s
Interconnect	900GB/s NVLink	400GB/s NVLink	800GB/s ICN	392GB/s HCCS
Max Cluster	256 GPUs	256 GPUs	128 cards	256 cards
Precision	FP8/FP16/FP32	FP8/FP16/FP32	FP4-FP32	FP16/FP32
Process Node	4nm TSMC	4nm TSMC	7nm (est.)	7nm SMIC

*The M890's 800GB/s is inter-chip bandwidth via ICN Switch, not on-chip HBM bandwidth. For full-system bandwidth, the 128-card AL128 supernode achieves aggregate bandwidth in the petabits-per-second range.*

ICN Switch 1.0: The Secret Weapon

The most underreported announcement at the summit was not the M890 itself, but the ICN Switch 1.0 — a dedicated network switching chip designed specifically for AI cluster interconnect. Traditional Ethernet-based clusters suffer from high tail latency and packet overhead. The ICN Switch uses a custom protocol with deterministic routing, enabling 64 M890 chips to interconnect at full bandwidth with near-zero packet loss.

For the Panjiu AL128 supernode, Alibaba coupled two 64-card ICN domains, achieving 128-card scale with P2P latency below 150 nanoseconds. This is not a theoretical design — the AL128 is already live on Alibaba Cloud's Model Studio (Bailian) and supports Qwen, DeepSeek, and Kimi models out of the box.

🧠 工程师笔记: *"ICN Switch的百纳秒级时延不是实验室数据，是阿里云百炼平台已经上线的实测指标。对于Agentic应用来说，一次任务可能涉及20-30次模型调用，时延每降100ns，整体响应时间就能快3-5秒。"*
Engineer's Note: *"The ICN Switch's sub-100ns latency isn't a lab figure — it's a measured metric on Alibaba Cloud's Bailian platform already in production. For Agentic applications, a single task may involve 20-30 model calls. Every 100ns reduction in latency shaves 3-5 seconds off total response time."*

The Full-Stack Play: Chip, Cloud, Model, Inference

Alibaba didn't just announce a chip. It announced a philosophy: the "芯-云-模型-推理" (Chip-Cloud-Model-Inference) integrated stack. This is vertical integration at a level no Western cloud provider has attempted — not even AWS with its Trainium and Inferentia chips, which still rely heavily on NVIDIA for training workloads.

Layer	Product	Role in Stack
Chip	Zhenwu M890 + ICN Switch 1.0	Compute and interconnect fabric
Server	Panjiu AL128 Supernode	128-card rack-scale system
Cloud	Alibaba Cloud (Bailian)	Model hosting and API infrastructure
Model	Qwen3.7-Max	Frontier LLM trained and deployed on Zhenwu
Platform	Qianwen Cloud	Agent-ready product suite
Service	Model Studio (Bailian)	Developer API and fine-tuning platform

Qwen3.7-Max: The Model That Proves the Chip

Announced alongside the M890, Qwen3.7-Max is Alibaba's most ambitious model yet. It features a 1-million-token context window, scores 56.6 on the Artificial Analysis Intelligence Index (ranked #5 globally at launch, the highest-placed Chinese model), and achieved an extraordinary feat: a 35-hour autonomous coding run with 1,158 tool calls and zero human intervention, optimizing the M890's own kernel software by 10x.

Benchmark	Qwen3.7-Max	Claude Opus 4.6 Max	GPT-5.5
SWE-Bench Pro	60.6	57.3	58.1
Terminal-Bench 2.0	69.7	64.2	67.5
MCP-Atlas	76.4	73.1	75.8
GPQA Diamond	92.4	91.3	93.1
HLE	41.4	40.0	42.2
Intelligence Index v4.0	56.6	54.2	60.2
Context Window	1M tokens	200K tokens	1M tokens
Input Price (per 1M tokens)	$2.50	$5.00	$7.50
Output Price (per 1M tokens)	$7.50	$15.00	$22.50

The pricing is aggressively competitive — roughly half the cost of Claude Opus and one-third the cost of GPT-5.5 for equivalent throughput. The 90% cache discount on input tokens ($0.25 per 1M cached tokens) makes it particularly attractive for long-context agentic workflows where the same documents are referenced repeatedly.

💬 网友热评: *"阿里这次是把芯片、模型、云服务打包卖，企业客户买一套就全解决了。别的厂商还在拼单点性能，阿里直接打组合拳。"*
Netizen Comment: *"Alibaba is selling chips, models, and cloud services as a bundled package this time. Enterprise clients get everything in one solution. While competitors are still fighting over single-point performance, Alibaba is throwing combination punches."*

The Numbers That Prove Scale: 560,000 Chips and Counting

The most important number from the summit wasn't a benchmark score. It was 560,000 — the cumulative number of Zhenwu-series chips shipped as of April 2026. That represents roughly 100 billion RMB in annual revenue run rate for T-Head's chip business alone.

Customer Breakdown by Industry

Industry	Key Customers	Use Cases
Telecommunications	China Telecom, China Mobile	Large-scale model training, 5G AI optimization
Automotive	FAW Group (China First Auto)	Autonomous driving simulation, in-cabin AI
Financial Services	Shanghai Pudong Development Bank	Risk modeling, fraud detection, document AI
Healthcare	Multiple hospital groups	Medical imaging, drug discovery pipelines
Internet/Cloud	Alibaba Cloud, external SaaS	Model serving, inference scaling
Manufacturing	Ningbo Steel, others	Predictive maintenance, quality inspection
Government	Various provincial data centers	Smart city AI, public safety

T-Head's VP of Semiconductor, Gao Hui, revealed the roadmap for the next two generations:

Chip	Target Launch	Expected Performance	Memory	Interconnect
Zhenwu M890	Q2 2026 (now)	3x 810E	144GB	800GB/s
Zhenwu V900	Q3 2027	3x M890 (9x 810E)	216GB	1200GB/s
Zhenwu J900	Q3 2028	TBD	TBD	TBD

The "one generation per year" cadence is faster than NVIDIA's roughly 18-24 month cycle and matches the rapid iteration pace of China's AI model ecosystem. If the V900 hits its targets, Alibaba will have a chip with 9x the performance of the 810E within four years — a trajectory that would put it firmly in the same tier as NVIDIA's post-Blackwell roadmap.

The China AI Chip Landscape: A Market Being Reshaped Overnight

Data visualization showing market share trends and technology analysis

*The Chinese AI chip market is experiencing the most dramatic power shift in semiconductor history — NVIDIA's share collapsed from 95% to a projected 8% in just three years.*

The Zhenwu M890 doesn't exist in a vacuum. It lands in a Chinese AI chip market that is undergoing the most dramatic power shift in semiconductor history.

NVIDIA's China Collapse

In 2023, NVIDIA held 95% of China's AI accelerator market. By the end of 2025, that had fallen to 55%. By Q1 2026, NVIDIA's position is even more precarious — restricted to selling only the compliance-grade H20 chip, with its China revenue projected to fall to 8% market share by year-end.

Year	NVIDIA	Huawei	T-Head (Alibaba)	AMD	Others
2023	95%	2%	<1%	1%	1%
2025 (full year)	55%	20%	6.6%	4%	14.4%
2026 Q1 (sales)	15-20%	35-40%	5-6%	10-12%	27-34%
2026 E (forecast)	8%	50%	6%	12%	24%

NVIDIA CEO Jensen Huang has publicly called the US export restrictions "extremely stupid" and warned that they have effectively "erased" NVIDIA's presence in what was once its second-largest market. The company's China revenue has fallen from $10.3 billion in 2023 to an estimated $1.2 billion in 2026.

The "One Superpower, Multiple Strong Players" Structure

China's domestic AI chip market is consolidating into a clear hierarchy:

Tier	Players	2026E Market Share	Strategic Position
Superpower	Huawei Ascend	50%	Full-stack vertical integration, government backing, largest ecosystem
Second Tier	AMD, Hygon	12-20%	x86/CUDA compatibility bridge for legacy workloads
Specialists	Cambricon, T-Head	6-9% each	Cambricon: inference-optimized ASICs; T-Head: cloud-native training
Emerging	Moore Threads, Biren, MetaX	2-4% each	General-purpose GPU challengers, targeting consumer and edge
Niche	Baidu Kunlun, Enflame	3-5% each	Vertical integration with parent company's AI stack

Cambricon is particularly aggressive — the company plans to deliver 500,000 AI accelerators in 2026, including 300,000 of its most advanced Siyuan 590/690 chips. If it hits that target, it could leapfrog T-Head in unit volume, though T-Head's higher ASP (average selling price) on the M890 may keep it competitive in revenue terms.

💬 行业分析师观点: *"2026年是中国AI芯片的'分水岭年'。英伟达从95%掉到8%，这不是渐进式替代，是断崖式崩塌。华为吃掉了最大一块，但寒武纪和阿里平头哥正在争夺第二名的位置。"*
Industry Analyst View: *"2026 is the 'watershed year' for Chinese AI chips. NVIDIA falling from 95% to 8% isn't gradual substitution — it's a cliff collapse. Huawei took the biggest slice, but Cambricon and Alibaba's T-Head are now fighting for second place."*

The Competitive Moat: What T-Head Has That Others Don't

Financial investment and technology funding concept

*Alibaba's $53 billion capital commitment over three years signals that T-Head's silicon strategy is not a side project — it is a core pillar of the company's future.*

In a crowded field of Chinese AI chip startups, T-Head has several structural advantages that are difficult to replicate:

1. The Alibaba Cloud Distribution Channel

T-Head doesn't need to sell chips one-by-one to enterprises. It sells them by the rack through Alibaba Cloud's existing customer relationships. The Panjiu AL128 is not a reference design — it is a product that Alibaba Cloud deploys, manages, and bills for. This "chip-as-a-service" model dramatically reduces customer friction and allows T-Head to iterate hardware based on real production telemetry.

2. The Qwen Model Flywheel

Unlike standalone chip vendors, T-Head has a captive customer for its most advanced hardware: the Qwen team. Every new Qwen model is trained and optimized on Zhenwu chips first, creating a tight feedback loop between model architecture and hardware design. The 35-hour autonomous optimization run that improved the M890 kernel by 10x is a perfect example of this flywheel in action.

3. The Agentic Era Tailwind

The M890's design — massive memory, ultra-low latency interconnect, and native FP4 precision — is specifically tuned for Agentic AI workloads. Traditional training chips are optimized for large-batch gradient descent. The M890 is optimized for the messy, bursty, high-concurrency reality of thousands of agents making simultaneous inference calls. As the industry pivots from "train one big model" to "run millions of agents," the M890's architectural choices look increasingly prescient.

4. The Capital Commitment

Alibaba CEO Wu Yongming has stated that Alibaba's capital expenditure over the next three years will "far exceed 380 billion RMB" (approximately $53 billion USD), with data center assets growing to 10x their pre-2022 AI boom levels. While not all of that will go to T-Head chips, the commitment signals that Alibaba views its silicon strategy as existential — not a side project, but a core pillar of its future.

Advantage	T-Head (Alibaba)	Huawei (Ascend)	Cambricon	Independent GPU Startups
Captive Cloud	✅ Alibaba Cloud	✅ Huawei Cloud	❌	❌
Captive Model	✅ Qwen series	✅ Pangu series	❌	❌
Enterprise Distribution	✅ Massive	✅ Strong	⚠️ Building	⚠️ Limited
Government Backing	⚠️ Moderate	✅ Extensive	⚠️ Moderate	❌
Software Ecosystem	✅ PAI/Bailian	✅ MindSpore/CANN	⚠️ NeuWare	⚠️ Fragmented
Manufacturing Access	⚠️ 7nm (TSMC/SMIC)	✅ SMIC N+2	⚠️ SMIC N+2	⚠️ Limited

The Risks and Open Questions

No analysis of the Zhenwu M890 would be complete without acknowledging the significant challenges ahead:

1. The Software Gap Remains Real

While T-Head has made enormous strides, the software ecosystem around Zhenwu chips still lags NVIDIA's CUDA. Developers must port models from PyTorch/TensorFlow to T-Head's proprietary stack, and while compatibility layers exist, they require engineering effort. For startups with small teams, this friction can be a dealbreaker.

2. Manufacturing Constraints

The M890 is believed to be manufactured on a 7nm process (likely TSMC or SMIC's N+2 equivalent). As US export controls tighten, access to sub-7nm processes for Chinese companies is increasingly uncertain. The V900's 2027 target assumes continued manufacturing access — a geopolitical variable, not a technical one.

3. The Revenue Concentration Problem

T-Head's 560,000 shipments are impressive, but Alibaba Cloud and its affiliates likely represent a significant share of that total. The true test is whether third-party customers — companies with no Alibaba affiliation — will choose Zhenwu over Ascend, Cambricon, or even AMD. The customer list (China Telecom, FAW, SPD Bank) shows progress, but the ratio of captive-to-independent revenue remains opaque.

4. The "NVIDIA Habit"

China's AI researchers and engineers grew up on CUDA. Even when domestic chips are technically competitive, the muscle memory of CUDA development — the Stack Overflow answers, the GitHub repositories, the university courses — creates a gravitational pull that hardware specs alone cannot overcome. T-Head must invest heavily in education, tooling, and community to break this habit.

💬 开发者社区声音: *"M890硬件参数确实好看，但写 kernel 的时候还是怀念 CUDA 的文档成熟度。平头哥需要把开发者体验做到和 NVIDIA 一样丝滑，否则大厂还是会优先买昇腾。"*
Developer Community Voice: *"The M890's hardware specs look great, but when writing kernels, I still miss CUDA's documentation maturity. T-Head needs to make the developer experience as smooth as NVIDIA's, otherwise big companies will still prefer Ascend."*

The Global Implication: What This Means Beyond China

Technology data streams and global network connectivity

*The Zhenwu M890 signals a broader shift in global AI infrastructure — as Chinese chips become competitive, Asia-Pacific markets are increasingly looking for non-American AI hardware alternatives.*

The Zhenwu M890 is not just a Chinese story. It is a signal about the future structure of global AI infrastructure.

For Asia-Pacific Markets

Southeast Asian governments and enterprises — Singapore, Malaysia, Indonesia, Thailand — are increasingly looking for non-American AI infrastructure to avoid geopolitical risk. Alibaba Cloud is already the dominant cloud provider in much of APAC. The Zhenwu M890 gives it a fully vertical, non-US-dependent stack to offer these customers. The OpenAI Singapore lab ($300M SGD investment) is a direct response to this threat.

For European Enterprises

European companies face a dilemma: US cloud providers (AWS, Azure, GCP) come with GDPR compliance and US CLOUD Act risks. Chinese cloud providers come with China's National Intelligence Law (2017, Article 7) obligations. The Zhenwu M890 creates a third option — a fully Chinese stack that can be deployed in European data centers by Alibaba Cloud's local subsidiaries, though data sovereignty concerns remain significant.

For NVIDIA and AMD

The Chinese market is not coming back for NVIDIA. The question is whether the Zhenwu architecture — or the broader Chinese chip ecosystem — becomes competitive enough to threaten NVIDIA in non-Chinese markets. That is still years away. But the trajectory is clear: every generation of Zhenwu chips narrows the gap, and the Chinese model ecosystem (Qwen, DeepSeek, Kimi) is increasingly optimized for domestic silicon first.

The Verdict: Is the Zhenwu M890 a NVIDIA Killer?

Not yet. But it doesn't need to be.

The Zhenwu M890 is not trying to win benchmark shootouts against the H100 on neutral ground. It is trying to win the Chinese market — and by that standard, it is succeeding. With 560,000 units shipped, 400+ customers, a 100 billion RMB revenue run rate, and a clear roadmap to V900 and J900, T-Head has built a sustainable, self-reinforcing business that gets stronger with every Qwen model and every Alibaba Cloud deployment.

The real question is not whether T-Head can beat NVIDIA. The real question is whether China needs NVIDIA at all for its domestic AI ambitions. And the answer, as of May 2026, is increasingly: no.

Dimension	NVIDIA's Position	T-Head's Trajectory
Global Leadership	✅ Uncontested	⚠️ 3-5 years behind
China Market	❌ Collapsing (8% EOY)	✅ Growing (6% → ?)
Software Ecosystem	✅ CUDA dominance	⚠️ Building, improving
Vertical Integration	❌ Chip-only	✅ Full stack (chip→cloud→model)
Agentic Optimization	⚠️ General purpose	✅ Purpose-built
Geopolitical Risk	❌ Export controls	✅ Domestic priority
Capital Backing	✅ Massive	✅ Alibaba's $53B commitment

💬 国际投资者视角: *"We're no longer asking 'Can China build AI chips?' We're asking 'Which Chinese AI chip company will be the next NVIDIA?' The answer might be: none of them need to be. The Chinese market is big enough to support multiple $10B+ chip companies without ever competing globally."*
International Investor Perspective: *"We're no longer asking 'Can China build AI chips?' We're asking 'Which Chinese AI chip company will be the next NVIDIA?' The answer might be: none of them need to be. The Chinese market is big enough to support multiple $10B+ chip companies without ever competing globally."*