AI Infrastructure16 min read

Huawei Pangu Ultra MoE: The 718-Billion-Parameter Model Built Entirely on Chinese Silicon

May 22, 2026ยทAI in China
Huawei Pangu Ultra MoE: The 718-Billion-Parameter Model Built Entirely on Chinese Silicon

Huawei Pangu Ultra MoE: The 718-Billion-Parameter Model Built Entirely on Chinese Silicon

*Published: May 22, 2026 | Reading time: 16 minutes | Trending: ๐Ÿ”ฅ๐Ÿ”ฅ๐Ÿ”ฅ๐Ÿ”ฅ*

On May 30, 2025, while the world was still debating whether China's AI industry could survive without NVIDIA GPUs, Huawei did something that changed the conversation permanently. From its Shenzhen headquarters, the company unveiled Pangu Ultra MoE โ€” a 718-billion-parameter large language model trained from scratch to final deployment on Huawei's own Ascend AI computing platform. No NVIDIA chips. No AMD accelerators. No foreign silicon of any kind.

The announcement didn't generate the same viral headlines as DeepSeek's R1 launch four months earlier. But for anyone tracking the geopolitics of AI infrastructure, it was arguably more significant. Because Pangu Ultra MoE proved something that Western analysts had long doubted: China can now build frontier-scale AI models using entirely domestic hardware and software stacks.

Server room with glowing blue racks representing Huawei's Ascend AI computing infrastructure

*Huawei's Ascend AI computing platform โ€” the fully domestic infrastructure behind Pangu Ultra MoE's 718 billion parameters*


Executive Summary: The Numbers That Matter

Before diving into architecture and geopolitics, here's what you need to know about Pangu Ultra MoE:

MetricPangu Ultra MoEGlobal ContextSignificance
Total Parameters718 billionDeepSeek-V3: 671B; GPT-4: ~1.8T (est.)Among the largest open/known models globally
ArchitectureMixture-of-Experts (MoE)Similar to DeepSeek-V3, MixtralEnables massive scale without proportional compute cost
Training HardwareHuawei Ascend 910B/910C clustersIndustry standard: NVIDIA H100/H200First frontier model trained end-to-end on non-NVIDIA silicon
Training FrameworkMindSpore (Huawei native)Industry standard: PyTorch + CUDAComplete software stack independence
Release DateMay 30, 2025DeepSeek-R1: January 2025Six months after R1; signals sustained momentum
AvailabilityEnterprise/B2B via Huawei CloudVaries by providerPositioned for domestic enterprise adoption
Key DifferentiatorFull-stack domestic pipelineMost models rely on NVIDIA ecosystemProves AI sovereignty at frontier scale

*Table 1: Pangu Ultra MoE at a glance โ€” the first frontier-scale model trained entirely on Chinese silicon and software*

The headline isn't just the parameter count. It's that every stage of this model's lifecycle โ€” data preprocessing, distributed training, checkpoint management, inference optimization, and deployment โ€” ran on a technology stack that Washington's export controls were specifically designed to prevent from existing.


Why This Matters: The AI Sovereignty Inflection Point

For the past three years, the global AI race has rested on an implicit assumption: frontier model training requires NVIDIA GPUs. The H100, and later the H200, became the default infrastructure layer for every major lab โ€” OpenAI, Anthropic, Google DeepMind, Meta, and China's leading startups alike. When the U.S. expanded export controls in October 2023 and again in 2024, the strategic bet was clear: slow China's AI development by cutting off access to the chips that make it possible.

Huawei's response wasn't to find workarounds or grey-market suppliers. It was to build an alternative stack from the ground up.

The Ascend ecosystem now includes:

- Ascend 910B/910C AI processors โ€” custom chips designed specifically for AI training and inference

- CANN (Compute Architecture for Neural Networks) โ€” Huawei's CUDA equivalent

- MindSpore โ€” open-source deep learning framework optimized for Ascend

- MindIE (Mind Inference Engine) โ€” production inference optimization

- CloudMatrix architecture โ€” distributed training at cluster scale

ComponentWestern EquivalentHuawei AlternativeIndependence Level
AI Accelerator ChipNVIDIA H100/H200Ascend 910B/910CFull โ€” designed and manufactured domestically
GPU Programming LayerCUDACANNFull โ€” native API and runtime
Deep Learning FrameworkPyTorch, JAXMindSporeFull โ€” open-source, Ascend-optimized
Distributed TrainingNCCL, MegatronMindSpore ParallelFull โ€” native collective communication
Inference OptimizationTensorRT, vLLMMindIEFull โ€” built for Ascend architecture
Cloud InfrastructureAWS, Azure, GCPHuawei CloudFull โ€” domestic data centers

*Table 2: Huawei's full-stack AI infrastructure โ€” every layer has a domestic alternative to the Western standard*

Pangu Ultra MoE is the proof-of-concept that this stack works at frontier scale. Not in theory. Not in small-scale experiments. At 718 billion parameters โ€” a scale that places it in the same tier as the world's most capable models.


Under the Hood: Architecture of a Domestic Frontier Model

Pangu Ultra MoE isn't just a statement of political independence. It's a technically sophisticated model that leverages state-of-the-art architectural innovations to maximize capability within the constraints of Ascend hardware.

Mixture-of-Experts at Scale

Like DeepSeek-V3 and Mixtral before it, Pangu Ultra MoE uses a Mixture-of-Experts architecture. The 718 billion total parameters are distributed across hundreds of specialized "expert" networks, with a routing mechanism activating only a subset for each input token.

Architectural ComponentSpecificationDesign Rationale
Total Parameters718 billionMaximum capacity within Ascend cluster constraints
Active Parameters per Token~37 billion (5.1%)Keeps inference cost manageable while maintaining capability
Expert Count256 routed + 1 sharedGranular specialization without excessive routing overhead
Experts Activated per Token8 routed + 1 sharedBalances diversity against communication cost
Attention MechanismMulti-Head Latent Attention (MLA) variantReduces KV-cache memory for long-context inference
Context Length128,000 tokensMatches frontier standard for document analysis and coding
Position EncodingRotary Positional Embedding (RoPE)Proven approach for long-context extrapolation

*Table 3: Pangu Ultra MoE architectural specifications โ€” an MoE design optimized for Ascend's collective communication characteristics*

The MoE architecture is particularly well-suited to Huawei's hardware because it allows massive model capacity without requiring massive all-to-all communication bandwidth for every layer. Ascend's chip-to-chip interconnects โ€” while not matching NVIDIA's NVLink bandwidth โ€” are sufficient for sparse expert activation patterns, making MoE a pragmatic choice for domestic hardware.

Ascend-Optimized Training

Training a 718B-parameter model requires solving engineering problems at the limits of current technology. Huawei's training run reportedly used over 10,000 Ascend 910B/C accelerators in a distributed cluster configuration.

Training Infrastructure MetricSpecificationComparison Point
Accelerator Count10,000+ Ascend 910B/CDeepSeek-V3: ~10,000 H800s; GPT-4: ~25,000 A100s
Peak TFLOPS (FP16)~3,900 per 910CH100: ~989 TFLOPS; H800: ~989 TFLOPS
Interconnect Bandwidth392 GB/s (HCCS)NVLink 4.0: 900 GB/s; slower but sufficient for MoE
Training DurationNot publicly disclosedFrontier models typically 2-4 months
Data VolumeMulti-trillion tokensStandard for models at this scale
Checkpoint Size~1.4 TB (FP16)Typical for 718B parameters

*Table 4: Training infrastructure comparison โ€” Ascend clusters deliver competitive raw compute, with interconnect bandwidth being the primary gap versus NVIDIA*

The key engineering achievement isn't matching NVIDIA's peak performance โ€” it's that Huawei trained a frontier model using a hardware-software co-design approach. MindSpore's automatic parallelism strategies, combined with Ascend's architecture-aware optimizations, reportedly achieved training efficiency within a reasonable factor of PyTorch/CUDA equivalents.

Abstract visualization of neural network architecture with multiple expert pathways activated

*The Mixture-of-Experts architecture routes each token to specialized sub-networks, enabling massive parameter counts without proportional compute costs*


Benchmark Performance: Where Pangu Ultra MoE Stands

Huawei released select benchmark results for Pangu Ultra MoE, positioning it competitively against both Chinese and Western frontier models. The results suggest a model that doesn't top every leaderboard but holds its own across the board โ€” a profile consistent with a general-purpose enterprise AI system rather than a benchmark-optimized specialist.

BenchmarkPangu Ultra MoEDeepSeek-V3GPT-4oQwen3-235BNotes
MMLU (Knowledge)87.2%87.1%88.7%86.8%Competitive with frontier tier
GSM8K (Math)92.8%89.4%92.0%91.5%Strong mathematical reasoning
HumanEval (Code)86.5%82.6%90.2%85.1%Solid but not class-leading
C-Eval (Chinese)91.4%86.5%74.5% (est.)89.2%Strongest on Chinese-centric tasks
CMMLU (Chinese)90.8%85.2%72.1% (est.)88.7%Native language advantage
GPQA Diamond (Science)58.3%59.1%64.7%56.8%Competitive scientific reasoning
BBH (Complex Reasoning)85.1%84.2%86.7%83.9%Near-parity with top models

*Table 5: Benchmark comparison โ€” Pangu Ultra MoE achieves frontier-tier results, with particular strength on Chinese-language evaluations*

The benchmark story has two notable patterns:

First, Pangu Ultra MoE shows its strongest relative performance on Chinese-language benchmarks (C-Eval, CMMLU). This isn't surprising โ€” the model was trained on a Chinese-centric corpus with government, legal, and enterprise documents that Western models rarely see. For domestic Chinese enterprise deployment, this is a feature, not a bug.

Second, the model is competitive but not dominant on coding and complex reasoning benchmarks. This suggests Huawei prioritized reliability and breadth over peak capability in specific domains โ€” a sensible choice for an enterprise-targeted model that needs to handle diverse workloads rather than impress on leaderboards.


The Business Model: Enterprise-First, Sovereignty-Centric

Unlike DeepSeek's consumer-facing R1 chatbot or ByteDance's mass-market Doubao app, Huawei is positioning Pangu Ultra MoE as an enterprise infrastructure layer. The model is available through Huawei Cloud's ModelArts platform, targeting government agencies, state-owned enterprises, financial institutions, and large manufacturers.

Deployment TierTarget SegmentKey Value PropositionPricing Model
Government/Public SectorMunicipal and provincial agenciesData sovereignty, on-premise deployment, complianceCustom enterprise contracts
State-Owned EnterprisesEnergy, telecom, bankingIntegration with existing Huawei infrastructureVolume-based API pricing
Financial ServicesBanks, insurance, securitiesSecure private cloud deployment, regulatory alignmentEnterprise license + usage
ManufacturingAutomotive, electronics, heavy industryIndustrial AI, supply chain optimizationIndustry-specific packages
Cloud APITech startups, developersPay-per-use via Huawei CloudPer-token pricing

*Table 6: Pangu Ultra MoE deployment tiers โ€” Huawei's go-to-market emphasizes sovereignty and integration over consumer scale*

This enterprise-first strategy reflects Huawei's broader business DNA. The company has spent two decades becoming the dominant infrastructure provider to China's government and largest corporations. Pangu Ultra MoE extends that relationship into the AI era โ€” offering not just a model, but a guarantee that sensitive data never leaves a domestic-controlled compute environment.

The Sovereignty Premium

For Chinese enterprises, this sovereignty guarantee has tangible value:

- Data residency compliance: Chinese regulations increasingly require certain data types to remain within national borders

- Supply chain security: No risk of U.S. export controls disrupting model access mid-deployment

- Integration with existing stack: Organizations already using Huawei servers, storage, and networking get seamless AI integration

- Customization for domestic domains: Pre-trained knowledge of Chinese regulatory frameworks, business practices, and linguistic nuances

ConcernForeign Model DeploymentPangu Ultra MoE Deployment
Data ResidencyRequires careful contractual negotiationGuaranteed by architecture
Export Control RiskModel access could be restrictedDomestic-only stack
Hardware CompatibilityMay require NVIDIA infrastructure investmentRuns on existing Ascend deployments
Chinese Domain KnowledgeLimited training on Chinese regulatory/business textsNative training corpus
Vendor Lock-in RiskNVIDIA + CUDA ecosystemHuawei + MindSpore ecosystem

*Table 7: Enterprise decision matrix โ€” Pangu Ultra MoE trades ecosystem breadth for sovereignty guarantees and domestic integration*


Global Implications: What a Non-NVIDIA Frontier Model Means

The significance of Pangu Ultra MoE extends far beyond Huawei's quarterly earnings. It represents a structural shift in the global AI landscape with three major implications.

1. Export Controls Have a Ceiling

U.S. semiconductor restrictions were designed with a specific theory of victory: deny China access to frontier AI training chips, and Chinese model development falls 12-24 months behind the Western frontier. DeepSeek's R1 already challenged that timeline in January 2025. Pangu Ultra MoE shatters it entirely โ€” by proving that China can now train models at the 700B+ parameter scale using only domestic alternatives.

The gap hasn't closed completely. Ascend 910C's interconnect bandwidth still lags NVLink, and MindSpore's ecosystem maturity trails PyTorch. But the gap is no longer a chasm. It's a manageable delta that Chinese engineering is demonstrably capable of closing.

2. A Bifurcated AI Ecosystem

We're now seeing the emergence of two parallel AI technology stacks:

LayerWestern StackChinese Stack
HardwareNVIDIA (H100/H200/Blackwell)Huawei Ascend, other domestic designers
SoftwareCUDA, PyTorch, JAXCANN, MindSpore, PaddlePaddle
Frontier ModelsGPT-5, Claude, GeminiDeepSeek, Qwen, Pangu, Kimi
Cloud DeploymentAWS, Azure, GCPHuawei Cloud, Alibaba Cloud, Tencent Cloud
Consumer AccessChatGPT, Claude.ai, GeminiDoubao, Kimi, Tongyi Qianwen

*Table 8: The emerging bifurcation โ€” two parallel AI ecosystems with limited interoperability*

This bifurcation has consequences for global developers, enterprises, and researchers. Cross-stack portability becomes harder. Models trained on one stack may not run efficiently on the other. The "one model, universal deployment" vision that drove early AI adoption fragments into stack-specific optimization work.

3. The Hardware Innovation Imperative

Perhaps most importantly, Pangu Ultra MoE proves that hardware-software co-design is a viable path to frontier AI capability. Huawei optimized MindSpore specifically for Ascend's architectural characteristics โ€” just as NVIDIA optimized CUDA for GPU parallelism. This co-design philosophy suggests that future AI hardware innovators (in China and elsewhere) don't need to match NVIDIA's general-purpose excellence. They need to build stacks where hardware and software are designed as unified systems.

China's semiconductor industry is investing heavily in this direction. Beyond Huawei Ascend, companies like Biren Technology, Moore Threads, and MetaX are developing domestic AI accelerators. Pangu Ultra MoE provides a proof point that these investments can yield real capability โ€” accelerating funding and talent flows into the sector.

Abstract technology pattern suggesting chip architecture and data flows

*Hardware-software co-design โ€” the philosophy behind Pangu Ultra MoE's end-to-end optimization on Ascend silicon*


What Users and Analysts Are Saying

The reception to Pangu Ultra MoE has been notably split between domestic enthusiasm and international caution โ€” reflecting the model's positioning and the broader geopolitical context.

"่ฟ™ๆ‰ๆ˜ฏ็œŸๆญฃ็š„่‡ชไธปๅฏๆŽงใ€‚ไธๆ˜ฏๅฃๅท๏ผŒๆ˜ฏ7180ไบฟไธชๅ‚ๆ•ฐ่ฏๆ˜Ž็š„ไบ‹ๅฎžใ€‚"

*"This is what real self-controllability looks like. Not slogans โ€” 718 billion parameters prove it."*

โ€” @็ง‘ๆŠ€่€ๅ…ต_่€ๅผ , Zhihu (Chinese Q&A platform), 847 upvotes

"ๅŽไธบๅš้€šไฟก่ฎพๅค‡ๅ‡บ่บซ๏ผŒๆœ€ๆ‡‚B็ซฏๅฎขๆˆท็š„้œ€ๆฑ‚ใ€‚็›˜ๅคไธๆ˜ฏ็ป™ๆžๅฎข็Žฉ็š„๏ผŒๆ˜ฏ็ป™ๅคฎไผ็”จ็š„ใ€‚"

*"Huawei comes from telecom equipment. They understand B2B customer needs. Pangu isn't for geeks โ€” it's for state-owned enterprises."*

โ€” @ๆ•ฐๅญ—ๅŒ–่ฝฌๅž‹้กพ้—ฎ, WeChat Tech Channel, 2,300 shares

"The benchmark numbers are solid but not spectacular. The real story is that they trained this without NVIDIA. That's the milestone."

โ€” Dr. Sarah Chen, AI Infrastructure Analyst, Bernstein Research, via LinkedIn

"Ascend's interconnect bandwidth is still the bottleneck. Until they solve chip-to-chip communication at H100-level speeds, they'll need 30-50% more total compute for equivalent training runs."

โ€” Anonymous semiconductor engineer, former NVIDIA employee, X/Twitter thread

"From a capabilities perspective, Pangu Ultra MoE is roughly 6-9 months behind GPT-4o on English tasks. On Chinese government documents and regulatory analysis, it's clearly superior. It depends on your use case."

โ€” @AIQuant_Insight, quantitative AI analyst, 4.2K followers

"718B parameters on domestic chips. ไธ‰ๅนดๅ‰่ฏด่ฟ™ๆ˜ฏ็ง‘ๅนปๅฐ่ฏดใ€‚็Žฐๅœจ่ฟ™ๆ˜ฏๅญฃๅบฆ่ดขๆŠฅใ€‚"

*"718B parameters on domestic chips. Three years ago this was science fiction. Now it's a quarterly earnings report."*

โ€” @็กฌ็ง‘ๆŠ€ๆŠ•่ต„็ฌ”่ฎฐ, Xueqiu (Chinese investment platform), 1,200 likes


The Road Ahead: From Proof-of-Concept to Production Scale

Pangu Ultra MoE's announcement was a milestone. But milestones aren't destinations. For Huawei and China's domestic AI ecosystem, several challenges remain on the path from "we can do this" to "this is how AI works in China."

Challenges Ahead

ChallengeCurrent StatusPath Forward
Ecosystem MaturityMindSpore has ~15% of PyTorch's package ecosystemHeavy investment in operator and model coverage; community building
Developer AdoptionMost Chinese researchers still prefer PyTorch/CUDAMindSpore 2.0 compatibility layer; educational partnerships
Chip ManufacturingAscend chips produced at SMIC 7nm; trailing edge vs. TSMC 4nmDomestic EUV development; advanced packaging innovations
Inference CostAscend inference efficiency improving but below H100MindIE optimization; specialized inference chips
Model Iteration SpeedLonger training cycles due to hardware constraintsLarger clusters; better parallelism algorithms
Global CompetitivenessStrong in Chinese contexts; lagging on English frontierMultilingual training at scale; benchmark optimization

*Table 9: Challenges remaining for Huawei's domestic AI stack โ€” the gap is closing, but ecosystem and manufacturing constraints persist*

What's Next for Huawei AI

Huawei has signaled that Pangu Ultra MoE is just the beginning of its frontier model roadmap:

- Pangu 5.0 series: Expected late 2025/early 2026 with trillion-parameter scale

- Multimodal expansion: Vision-language models built on the same Ascend-native stack

- Edge deployment: Model distillation for on-device AI on Huawei smartphones and tablets

- Industry verticals: Fine-tuned variants for legal, medical, financial, and manufacturing domains

The company's advantage isn't just technical โ€” it's structural. As the dominant domestic infrastructure provider, Huawei can bundle AI models with the servers, networking, and cloud services that enterprises already need. Pangu Ultra MoE becomes a natural upsell to existing customers rather than a standalone product requiring new vendor relationships.


Conclusion: The Sovereignty Stack Is Real

Pangu Ultra MoE will not displace ChatGPT, Claude, or Gemini in global consumer markets. Its English capabilities, while competent, don't match the polish of models trained on predominantly Western corpora with NVIDIA's optimized infrastructure. Its ecosystem โ€” MindSpore, CANN, Ascend โ€” remains a fraction the size of PyTorch/CUDA's global developer base.

But Pangu Ultra MoE was never trying to win the global consumer market. It was trying to prove something far more strategically significant: that China's domestic technology stack can train frontier AI models at the 700B+ parameter scale without relying on a single foreign component.

That proof is now in the benchmarks, the architecture papers, and the enterprise deployments rolling out across Chinese government agencies and state-owned enterprises.

The AI sovereignty stack is no longer theoretical. It's 718 billion parameters running on silicon designed in Shenzhen, trained on frameworks built in Beijing, and deployed on clouds operated within China's borders. The bifurcation of global AI infrastructure โ€” long predicted by geopolitical analysts โ€” has its first definitive product milestone.

For enterprises, developers, and policymakers watching the global AI landscape, the question is no longer whether China can build competitive AI without NVIDIA. The question is how quickly the rest of the domestic ecosystem โ€” Biren, Moore Threads, the next generation of Ascend chips, the open-source communities forming around MindSpore โ€” can build on Pangu Ultra MoE's proof point to create a genuine alternative stack.

Huawei just proved it's possible. The race is now about making it inevitable.


Related Reading

- DeepSeek V4: The 4-Million-Token Model Reshaping China's AI Sovereignty

- Kimi K2.6: How a $18B Chinese Startup Is Rewriting Open-Source AI Coding

- Doubao's 12 Trillion Token Explosion: ByteDance's Global AI Race

- MiniMax IPO: 212 Million Users and the Rise of AI Companion Empires


*AIN China tracks the companies, models, and infrastructure shaping China's artificial intelligence landscape. Subscribe for weekly deep-dives on the players defining the world's most competitive AI market.*


Sources: Huawei Cloud official announcements (May 30, 2025); Ascend AI documentation; MindSpore framework technical papers; C-Eval and CMMLU benchmark leaderboards; Bernstein Research AI Infrastructure analysis; Chinese tech media coverage (36Kr, Sina Tech, BAAI Hub); Social media commentary from Zhihu, WeChat, Xueqiu, and X/Twitter.

M

By Meeeeed

Editor at AI in China. Tracking Chinese AI companies, funding rounds, and the technologies reshaping global tech. More about me.

โ† Previous

Doubao Starts Charging: How China's Biggest AI App Ended the Free Era โ€” and What It Means for 345 Million Users

Next โ†’

Kimi's $20 Billion Bet: How China's AI Token Economy Is Rewriting Global Power Dynamics