Alibaba's Zhenwu M890: The 3x AI Chip That Just Shipped 560,000 Units and Rewrote China's Silicon Playbook
Alibaba's Zhenwu M890: The 3x AI Chip That Just Shipped 560,000 Units and Rewrote China's Silicon Playbook
*Published: May 30, 2026 | Reading time: 17 minutes | Trending: 🔥🔥🔥🔥🔥*
On May 20, 2026, at the Alibaba Cloud Summit in Hangzhou, Liu Weiguang — Senior Vice President of Alibaba Cloud — stood on stage and said something that would have sounded delusional just three years earlier: *"What we're building is China's AI factory."* Behind him, the numbers told the story. 560,000 AI chips shipped. 400+ enterprise customers across 20 industries. A 128-card supernode server with sub-150-nanosecond peer-to-peer latency. And a new chip, the Zhenwu M890, that delivers 3x the performance of its predecessor in a single generation.
For years, Western analysts dismissed Chinese AI chips as perpetual also-rans — capable of modest inference tasks, but nowhere near NVIDIA's league for training frontier models. The Zhenwu M890 doesn't just challenge that narrative. It buries it.
*The Zhenwu M890 represents a generational leap in Chinese AI chip design — 144GB HBM, 800GB/s interconnect, and native FP4 precision support for the Agentic era.*
Executive Summary: Why This Chip Matters Now
Here's what you need to know about the Zhenwu M890 in 60 seconds:
| Metric | Zhenwu 810E (2024) | Zhenwu M890 (2026) | Change |
|---|---|---|---|
| Memory (HBM) | 96GB HBM2e | 144GB HBM | +50% |
| Interconnect Bandwidth | 700GB/s | 800GB/s | +14% |
| Performance | Baseline | 3x baseline | +200% |
| Precision Support | Limited | FP32 → FP4 native | Full spectrum |
| Max Cluster Size | 32 cards | 128 cards (with ICN Switch) | 4x scale |
| P2P Latency | ~500ns | <150ns | -70% |
| Cumulative Shipments | 200,000 | 560,000 | +180% |
| Annual Revenue Run Rate | ~40B RMB | ~100B RMB | +150% |
The M890 is not an incremental refresh. It is a fundamental architectural redesign built for the Agentic era — where thousands of AI agents run concurrently, each making dozens of model calls per task, demanding ultra-low latency and massive memory bandwidth. And it arrives at a moment when NVIDIA's China market share has collapsed from 95% in 2023 to a projected 8% by year-end 2026.
The Architecture: How T-Head Built a 3x Chip in One Generation
The Zhenwu M890 is built on a self-developed parallel computing architecture — not a licensed ARM or RISC-V design, but a fully custom instruction set and dataflow fabric optimized specifically for transformer workloads and mixture-of-experts (MoE) models.
Memory and Bandwidth: The Real Battleground
Modern AI training and inference are not limited by raw compute. They are limited by memory capacity and bandwidth. The M890's 144GB of HBM (High Bandwidth Memory) puts it in the same tier as NVIDIA's H100 (80GB) and H200 (141GB), and ahead of the H20 (96GB) that NVIDIA is currently allowed to sell in China under US export controls.
| Specification | NVIDIA H100 SXM | NVIDIA H20 (China) | Zhenwu M890 | Huawei Ascend 910C |
|---|---|---|---|---|
| Memory | 80GB HBM3 | 96GB HBM2e | 144GB HBM | 64GB HBM2e |
| Memory Bandwidth | 3.35TB/s | 4.0TB/s | 800GB/s* | 1.6TB/s |
| Interconnect | 900GB/s NVLink | 400GB/s NVLink | 800GB/s ICN | 392GB/s HCCS |
| Max Cluster | 256 GPUs | 256 GPUs | 128 cards | 256 cards |
| Precision | FP8/FP16/FP32 | FP8/FP16/FP32 | FP4-FP32 | FP16/FP32 |
| Process Node | 4nm TSMC | 4nm TSMC | 7nm (est.) | 7nm SMIC |
*The M890's 800GB/s is inter-chip bandwidth via ICN Switch, not on-chip HBM bandwidth. For full-system bandwidth, the 128-card AL128 supernode achieves aggregate bandwidth in the petabits-per-second range.*
ICN Switch 1.0: The Secret Weapon
The most underreported announcement at the summit was not the M890 itself, but the ICN Switch 1.0 — a dedicated network switching chip designed specifically for AI cluster interconnect. Traditional Ethernet-based clusters suffer from high tail latency and packet overhead. The ICN Switch uses a custom protocol with deterministic routing, enabling 64 M890 chips to interconnect at full bandwidth with near-zero packet loss.
For the Panjiu AL128 supernode, Alibaba coupled two 64-card ICN domains, achieving 128-card scale with P2P latency below 150 nanoseconds. This is not a theoretical design — the AL128 is already live on Alibaba Cloud's Model Studio (Bailian) and supports Qwen, DeepSeek, and Kimi models out of the box.
🧠 工程师笔记: *"ICN Switch的百纳秒级时延不是实验室数据,是阿里云百炼平台已经上线的实测指标。对于Agentic应用来说,一次任务可能涉及20-30次模型调用,时延每降100ns,整体响应时间就能快3-5秒。"*
Engineer's Note: *"The ICN Switch's sub-100ns latency isn't a lab figure — it's a measured metric on Alibaba Cloud's Bailian platform already in production. For Agentic applications, a single task may involve 20-30 model calls. Every 100ns reduction in latency shaves 3-5 seconds off total response time."*
The Full-Stack Play: Chip, Cloud, Model, Inference
Alibaba didn't just announce a chip. It announced a philosophy: the "芯-云-模型-推理" (Chip-Cloud-Model-Inference) integrated stack. This is vertical integration at a level no Western cloud provider has attempted — not even AWS with its Trainium and Inferentia chips, which still rely heavily on NVIDIA for training workloads.
| Layer | Product | Role in Stack |
|---|---|---|
| Chip | Zhenwu M890 + ICN Switch 1.0 | Compute and interconnect fabric |
| Server | Panjiu AL128 Supernode | 128-card rack-scale system |
| Cloud | Alibaba Cloud (Bailian) | Model hosting and API infrastructure |
| Model | Qwen3.7-Max | Frontier LLM trained and deployed on Zhenwu |
| Platform | Qianwen Cloud | Agent-ready product suite |
| Service | Model Studio (Bailian) | Developer API and fine-tuning platform |
Qwen3.7-Max: The Model That Proves the Chip
Announced alongside the M890, Qwen3.7-Max is Alibaba's most ambitious model yet. It features a 1-million-token context window, scores 56.6 on the Artificial Analysis Intelligence Index (ranked #5 globally at launch, the highest-placed Chinese model), and achieved an extraordinary feat: a 35-hour autonomous coding run with 1,158 tool calls and zero human intervention, optimizing the M890's own kernel software by 10x.
| Benchmark | Qwen3.7-Max | Claude Opus 4.6 Max | GPT-5.5 |
|---|---|---|---|
| SWE-Bench Pro | 60.6 | 57.3 | 58.1 |
| Terminal-Bench 2.0 | 69.7 | 64.2 | 67.5 |
| MCP-Atlas | 76.4 | 73.1 | 75.8 |
| GPQA Diamond | 92.4 | 91.3 | 93.1 |
| HLE | 41.4 | 40.0 | 42.2 |
| Intelligence Index v4.0 | 56.6 | 54.2 | 60.2 |
| Context Window | 1M tokens | 200K tokens | 1M tokens |
| Input Price (per 1M tokens) | $2.50 | $5.00 | $7.50 |
| Output Price (per 1M tokens) | $7.50 | $15.00 | $22.50 |
The pricing is aggressively competitive — roughly half the cost of Claude Opus and one-third the cost of GPT-5.5 for equivalent throughput. The 90% cache discount on input tokens ($0.25 per 1M cached tokens) makes it particularly attractive for long-context agentic workflows where the same documents are referenced repeatedly.
💬 网友热评: *"阿里这次是把芯片、模型、云服务打包卖,企业客户买一套就全解决了。别的厂商还在拼单点性能,阿里直接打组合拳。"*
Netizen Comment: *"Alibaba is selling chips, models, and cloud services as a bundled package this time. Enterprise clients get everything in one solution. While competitors are still fighting over single-point performance, Alibaba is throwing combination punches."*
The Numbers That Prove Scale: 560,000 Chips and Counting
The most important number from the summit wasn't a benchmark score. It was 560,000 — the cumulative number of Zhenwu-series chips shipped as of April 2026. That represents roughly 100 billion RMB in annual revenue run rate for T-Head's chip business alone.
Customer Breakdown by Industry
| Industry | Key Customers | Use Cases |
|---|---|---|
| Telecommunications | China Telecom, China Mobile | Large-scale model training, 5G AI optimization |
| Automotive | FAW Group (China First Auto) | Autonomous driving simulation, in-cabin AI |
| Financial Services | Shanghai Pudong Development Bank | Risk modeling, fraud detection, document AI |
| Healthcare | Multiple hospital groups | Medical imaging, drug discovery pipelines |
| Internet/Cloud | Alibaba Cloud, external SaaS | Model serving, inference scaling |
| Manufacturing | Ningbo Steel, others | Predictive maintenance, quality inspection |
| Government | Various provincial data centers | Smart city AI, public safety |
T-Head's VP of Semiconductor, Gao Hui, revealed the roadmap for the next two generations:
| Chip | Target Launch | Expected Performance | Memory | Interconnect |
|---|---|---|---|---|
| Zhenwu M890 | Q2 2026 (now) | 3x 810E | 144GB | 800GB/s |
| Zhenwu V900 | Q3 2027 | 3x M890 (9x 810E) | 216GB | 1200GB/s |
| Zhenwu J900 | Q3 2028 | TBD | TBD | TBD |
The "one generation per year" cadence is faster than NVIDIA's roughly 18-24 month cycle and matches the rapid iteration pace of China's AI model ecosystem. If the V900 hits its targets, Alibaba will have a chip with 9x the performance of the 810E within four years — a trajectory that would put it firmly in the same tier as NVIDIA's post-Blackwell roadmap.
The China AI Chip Landscape: A Market Being Reshaped Overnight
*The Chinese AI chip market is experiencing the most dramatic power shift in semiconductor history — NVIDIA's share collapsed from 95% to a projected 8% in just three years.*
The Zhenwu M890 doesn't exist in a vacuum. It lands in a Chinese AI chip market that is undergoing the most dramatic power shift in semiconductor history.
NVIDIA's China Collapse
In 2023, NVIDIA held 95% of China's AI accelerator market. By the end of 2025, that had fallen to 55%. By Q1 2026, NVIDIA's position is even more precarious — restricted to selling only the compliance-grade H20 chip, with its China revenue projected to fall to 8% market share by year-end.
| Year | NVIDIA | Huawei | T-Head (Alibaba) | AMD | Others |
|---|---|---|---|---|---|
| 2023 | 95% | 2% | <1% | 1% | 1% |
| 2025 (full year) | 55% | 20% | 6.6% | 4% | 14.4% |
| 2026 Q1 (sales) | 15-20% | 35-40% | 5-6% | 10-12% | 27-34% |
| 2026 E (forecast) | 8% | 50% | 6% | 12% | 24% |
NVIDIA CEO Jensen Huang has publicly called the US export restrictions "extremely stupid" and warned that they have effectively "erased" NVIDIA's presence in what was once its second-largest market. The company's China revenue has fallen from $10.3 billion in 2023 to an estimated $1.2 billion in 2026.
The "One Superpower, Multiple Strong Players" Structure
China's domestic AI chip market is consolidating into a clear hierarchy:
| Tier | Players | 2026E Market Share | Strategic Position |
|---|---|---|---|
| Superpower | Huawei Ascend | 50% | Full-stack vertical integration, government backing, largest ecosystem |
| Second Tier | AMD, Hygon | 12-20% | x86/CUDA compatibility bridge for legacy workloads |
| Specialists | Cambricon, T-Head | 6-9% each | Cambricon: inference-optimized ASICs; T-Head: cloud-native training |
| Emerging | Moore Threads, Biren, MetaX | 2-4% each | General-purpose GPU challengers, targeting consumer and edge |
| Niche | Baidu Kunlun, Enflame | 3-5% each | Vertical integration with parent company's AI stack |
Cambricon is particularly aggressive — the company plans to deliver 500,000 AI accelerators in 2026, including 300,000 of its most advanced Siyuan 590/690 chips. If it hits that target, it could leapfrog T-Head in unit volume, though T-Head's higher ASP (average selling price) on the M890 may keep it competitive in revenue terms.
💬 行业分析师观点: *"2026年是中国AI芯片的'分水岭年'。英伟达从95%掉到8%,这不是渐进式替代,是断崖式崩塌。华为吃掉了最大一块,但寒武纪和阿里平头哥正在争夺第二名的位置。"*
Industry Analyst View: *"2026 is the 'watershed year' for Chinese AI chips. NVIDIA falling from 95% to 8% isn't gradual substitution — it's a cliff collapse. Huawei took the biggest slice, but Cambricon and Alibaba's T-Head are now fighting for second place."*
The Competitive Moat: What T-Head Has That Others Don't
*Alibaba's $53 billion capital commitment over three years signals that T-Head's silicon strategy is not a side project — it is a core pillar of the company's future.*
In a crowded field of Chinese AI chip startups, T-Head has several structural advantages that are difficult to replicate:
1. The Alibaba Cloud Distribution Channel
T-Head doesn't need to sell chips one-by-one to enterprises. It sells them by the rack through Alibaba Cloud's existing customer relationships. The Panjiu AL128 is not a reference design — it is a product that Alibaba Cloud deploys, manages, and bills for. This "chip-as-a-service" model dramatically reduces customer friction and allows T-Head to iterate hardware based on real production telemetry.
2. The Qwen Model Flywheel
Unlike standalone chip vendors, T-Head has a captive customer for its most advanced hardware: the Qwen team. Every new Qwen model is trained and optimized on Zhenwu chips first, creating a tight feedback loop between model architecture and hardware design. The 35-hour autonomous optimization run that improved the M890 kernel by 10x is a perfect example of this flywheel in action.
3. The Agentic Era Tailwind
The M890's design — massive memory, ultra-low latency interconnect, and native FP4 precision — is specifically tuned for Agentic AI workloads. Traditional training chips are optimized for large-batch gradient descent. The M890 is optimized for the messy, bursty, high-concurrency reality of thousands of agents making simultaneous inference calls. As the industry pivots from "train one big model" to "run millions of agents," the M890's architectural choices look increasingly prescient.
4. The Capital Commitment
Alibaba CEO Wu Yongming has stated that Alibaba's capital expenditure over the next three years will "far exceed 380 billion RMB" (approximately $53 billion USD), with data center assets growing to 10x their pre-2022 AI boom levels. While not all of that will go to T-Head chips, the commitment signals that Alibaba views its silicon strategy as existential — not a side project, but a core pillar of its future.
| Advantage | T-Head (Alibaba) | Huawei (Ascend) | Cambricon | Independent GPU Startups |
|---|---|---|---|---|
| Captive Cloud | ✅ Alibaba Cloud | ✅ Huawei Cloud | ❌ | ❌ |
| Captive Model | ✅ Qwen series | ✅ Pangu series | ❌ | ❌ |
| Enterprise Distribution | ✅ Massive | ✅ Strong | ⚠️ Building | ⚠️ Limited |
| Government Backing | ⚠️ Moderate | ✅ Extensive | ⚠️ Moderate | ❌ |
| Software Ecosystem | ✅ PAI/Bailian | ✅ MindSpore/CANN | ⚠️ NeuWare | ⚠️ Fragmented |
| Manufacturing Access | ⚠️ 7nm (TSMC/SMIC) | ✅ SMIC N+2 | ⚠️ SMIC N+2 | ⚠️ Limited |
The Risks and Open Questions
No analysis of the Zhenwu M890 would be complete without acknowledging the significant challenges ahead:
1. The Software Gap Remains Real
While T-Head has made enormous strides, the software ecosystem around Zhenwu chips still lags NVIDIA's CUDA. Developers must port models from PyTorch/TensorFlow to T-Head's proprietary stack, and while compatibility layers exist, they require engineering effort. For startups with small teams, this friction can be a dealbreaker.
2. Manufacturing Constraints
The M890 is believed to be manufactured on a 7nm process (likely TSMC or SMIC's N+2 equivalent). As US export controls tighten, access to sub-7nm processes for Chinese companies is increasingly uncertain. The V900's 2027 target assumes continued manufacturing access — a geopolitical variable, not a technical one.
3. The Revenue Concentration Problem
T-Head's 560,000 shipments are impressive, but Alibaba Cloud and its affiliates likely represent a significant share of that total. The true test is whether third-party customers — companies with no Alibaba affiliation — will choose Zhenwu over Ascend, Cambricon, or even AMD. The customer list (China Telecom, FAW, SPD Bank) shows progress, but the ratio of captive-to-independent revenue remains opaque.
4. The "NVIDIA Habit"
China's AI researchers and engineers grew up on CUDA. Even when domestic chips are technically competitive, the muscle memory of CUDA development — the Stack Overflow answers, the GitHub repositories, the university courses — creates a gravitational pull that hardware specs alone cannot overcome. T-Head must invest heavily in education, tooling, and community to break this habit.
💬 开发者社区声音: *"M890硬件参数确实好看,但写 kernel 的时候还是怀念 CUDA 的文档成熟度。平头哥需要把开发者体验做到和 NVIDIA 一样丝滑,否则大厂还是会优先买昇腾。"*
Developer Community Voice: *"The M890's hardware specs look great, but when writing kernels, I still miss CUDA's documentation maturity. T-Head needs to make the developer experience as smooth as NVIDIA's, otherwise big companies will still prefer Ascend."*
The Global Implication: What This Means Beyond China
*The Zhenwu M890 signals a broader shift in global AI infrastructure — as Chinese chips become competitive, Asia-Pacific markets are increasingly looking for non-American AI hardware alternatives.*
The Zhenwu M890 is not just a Chinese story. It is a signal about the future structure of global AI infrastructure.
For Asia-Pacific Markets
Southeast Asian governments and enterprises — Singapore, Malaysia, Indonesia, Thailand — are increasingly looking for non-American AI infrastructure to avoid geopolitical risk. Alibaba Cloud is already the dominant cloud provider in much of APAC. The Zhenwu M890 gives it a fully vertical, non-US-dependent stack to offer these customers. The OpenAI Singapore lab ($300M SGD investment) is a direct response to this threat.
For European Enterprises
European companies face a dilemma: US cloud providers (AWS, Azure, GCP) come with GDPR compliance and US CLOUD Act risks. Chinese cloud providers come with China's National Intelligence Law (2017, Article 7) obligations. The Zhenwu M890 creates a third option — a fully Chinese stack that can be deployed in European data centers by Alibaba Cloud's local subsidiaries, though data sovereignty concerns remain significant.
For NVIDIA and AMD
The Chinese market is not coming back for NVIDIA. The question is whether the Zhenwu architecture — or the broader Chinese chip ecosystem — becomes competitive enough to threaten NVIDIA in non-Chinese markets. That is still years away. But the trajectory is clear: every generation of Zhenwu chips narrows the gap, and the Chinese model ecosystem (Qwen, DeepSeek, Kimi) is increasingly optimized for domestic silicon first.
The Verdict: Is the Zhenwu M890 a NVIDIA Killer?
Not yet. But it doesn't need to be.
The Zhenwu M890 is not trying to win benchmark shootouts against the H100 on neutral ground. It is trying to win the Chinese market — and by that standard, it is succeeding. With 560,000 units shipped, 400+ customers, a 100 billion RMB revenue run rate, and a clear roadmap to V900 and J900, T-Head has built a sustainable, self-reinforcing business that gets stronger with every Qwen model and every Alibaba Cloud deployment.
The real question is not whether T-Head can beat NVIDIA. The real question is whether China needs NVIDIA at all for its domestic AI ambitions. And the answer, as of May 2026, is increasingly: no.
| Dimension | NVIDIA's Position | T-Head's Trajectory |
|---|---|---|
| Global Leadership | ✅ Uncontested | ⚠️ 3-5 years behind |
| China Market | ❌ Collapsing (8% EOY) | ✅ Growing (6% → ?) |
| Software Ecosystem | ✅ CUDA dominance | ⚠️ Building, improving |
| Vertical Integration | ❌ Chip-only | ✅ Full stack (chip→cloud→model) |
| Agentic Optimization | ⚠️ General purpose | ✅ Purpose-built |
| Geopolitical Risk | ❌ Export controls | ✅ Domestic priority |
| Capital Backing | ✅ Massive | ✅ Alibaba's $53B commitment |
💬 国际投资者视角: *"We're no longer asking 'Can China build AI chips?' We're asking 'Which Chinese AI chip company will be the next NVIDIA?' The answer might be: none of them need to be. The Chinese market is big enough to support multiple $10B+ chip companies without ever competing globally."*
International Investor Perspective: *"We're no longer asking 'Can China build AI chips?' We're asking 'Which Chinese AI chip company will be the next NVIDIA?' The answer might be: none of them need to be. The Chinese market is big enough to support multiple $10B+ chip companies without ever competing globally."*
Related Reading
- Huawei Pangu Ultra MoE: 718B Parameters on Chinese Chips
- DeepSeek V4: Million-Token Context and China's AI Sovereignty
- Doubao: ByteDance's 12 Trillion Token Explosion
- MiniMax Talkie: 212 Million Users and the AI Companion Empire
- Kimi K2.6: The Open-Source Coding Revolution
- TARS Embodied Intelligence: 455 Million and the Brain Club
*AIN China tracks the frontier of artificial intelligence in China — from model releases to chip breakthroughs to policy shifts. For daily briefings, subscribe to our newsletter or follow our RSS feed. Data current as of May 30, 2026. Market share projections sourced from Bernstein Research, AspenCore, and China Academy of Information and Communications Technology.*
Editor at AI in China. Tracking Chinese AI companies, funding rounds, and the technologies reshaping global tech. More about me.