China Mobile Hubei and Huawei Complete China's First Carrier Industry Validation of AI Inference Acceleration Solution

[Shanghai, China, June 24, 2026] At MWC Shanghai 2026, China Mobile Communications Group Hubei Co., Ltd. (China Mobile Hubei) and Huawei announced the successful live-network validation of Huawei's AI Inference Acceleration Solution, the first such validation in China's carrier industry. Powered by Huawei's OceanStor A800 storage, Ascend A3 SuperPoD, and Unified Cache Manager (UCM), the solution delivers up to a 372% improvement in token throughput for long-sequence artificial intelligence (AI) inference workloads. This milestone provides important technical support for carriers seeking to deploy AI computing services efficiently.

Technological Innovation: Huawei UCM Eliminates Long-Sequence Inference Bottlenecks

As AI applications rapidly pivot toward AI agents, long-sequence scenarios such as code generation and multi-turn dialogues are becoming increasingly common. However, the limited capacity of conventional on-chip memory and dynamic random-access memory (DRAM) restricts how much key-value (KV) cache can be retained, which significantly lowers KV cache hit ratios and caps overall performance.
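
Why long sequences strain on-chip memory and DRAM can be seen from the standard KV cache size estimate for a transformer using multi-head or grouped-query attention: each layer stores a key tensor and a value tensor, each sized (KV heads x head dimension x sequence length). The Python sketch below assumes a hypothetical model configuration (60 layers, 8 KV heads, head dimension 128, FP16 storage); these are not the published configurations of MiniMax M2.5 or GLM-5.1, and models using other attention schemes, such as multi-head latent attention (MLA), store less per token.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate the KV cache size of one sequence under standard MHA/GQA.

    The leading factor 2 accounts for storing both a key and a value tensor
    per layer; bytes_per_elem=2 assumes FP16/BF16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical model configuration, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128

for tokens in (8 * 1024, 64 * 1024, 128 * 1024):
    gib = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, tokens) / 2**30
    print(f"{tokens // 1024:>4}K tokens -> {gib:.2f} GiB per sequence")
```

Under these assumptions, the KV cache grows linearly with sequence length: a single 128K-token sequence needs 30 GiB, so only a handful of such sequences fit in accelerator memory before cached entries must be evicted.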

Huawei introduced UCM in 2025 to directly address this challenge. By extending KV cache to external high-performance storage, UCM overcomes the capacity limits of conventional on-chip memory and DRAM, enabling petabyte-scale KV cache capabilities. The solution implements full-lifecycle, hierarchical management and scheduling of KV cache, significantly expanding the context window for single-turn dialogues. For multi-turn dialogues, UCM reuses historical KV cache to eliminate redundant computation, improving the inference experience while lowering inference costs.
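
UCM's internals and API are not described here. As a hedged illustration of the general technique named above (hierarchical KV cache management with reuse of historical KV cache across dialogue turns), the following minimal Python sketch keeps KV blocks in a hypothetical three-tier LRU hierarchy (hbm, dram, storage), demotes evicted blocks to the next tier instead of discarding them, and lets a later turn skip recomputation for any prefix that is still cached. The tier names, capacities, block size, and the TieredKVCache class are assumptions for illustration, and placeholder strings stand in for real KV tensors; this is not Huawei's implementation.

```python
from collections import OrderedDict
from hashlib import sha256

BLOCK_TOKENS = 4  # hypothetical block size; production systems use larger blocks


def block_keys(tokens, block_tokens=BLOCK_TOKENS):
    """Chain-hash each full block so a key identifies the block AND its entire prefix."""
    keys, digest = [], b""
    full = len(tokens) - len(tokens) % block_tokens
    for start in range(0, full, block_tokens):
        chunk = repr(tokens[start:start + block_tokens]).encode()
        digest = sha256(digest + chunk).digest()
        keys.append(digest)
    return keys


class TieredKVCache:
    """Hypothetical LRU cache hierarchy, fastest/smallest tier first."""

    def __init__(self, capacities):
        self.names = list(capacities)            # e.g. ["hbm", "dram", "storage"]
        self.caps = dict(capacities)             # capacity of each tier, in blocks
        self.tiers = {n: OrderedDict() for n in self.names}

    def _put(self, level, key, value):
        """Insert at `level`; the LRU victim of a full tier is demoted one level down."""
        if level >= len(self.names):
            return                               # fell out of the last tier: discarded
        tier = self.tiers[self.names[level]]
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > self.caps[self.names[level]]:
            old_key, old_value = tier.popitem(last=False)
            self._put(level + 1, old_key, old_value)

    def _take(self, key):
        """Remove `key` from whichever tier holds it; return (tier_name, value)."""
        for name in self.names:
            if key in self.tiers[name]:
                return name, self.tiers[name].pop(key)
        return None, None

    def prefill(self, tokens):
        """Reuse the longest cached prefix; return (reused_tokens, computed_tokens)."""
        keys = block_keys(tokens)
        hits = 0
        for key in keys:
            tier, value = self._take(key)
            if tier is None:
                break                            # first miss ends the reusable prefix
            self._put(0, key, value)             # promote the hit to the fastest tier
            hits += 1
        for key in keys[hits:]:
            self._take(key)                      # drop any stale copy of a later block
            self._put(0, key, f"kv-{key.hex()[:8]}")  # placeholder for computed KV tensors
        reused = hits * BLOCK_TOKENS
        return reused, len(tokens) - reused


# Usage: a second dialogue turn that repeats the first turn's history.
cache = TieredKVCache({"hbm": 2, "dram": 4, "storage": 1_000})
turn1 = list(range(16))                          # first turn: 16 tokens
turn2 = turn1 + list(range(100, 108))            # second turn: history plus 8 new tokens
print(cache.prefill(turn1))
print(cache.prefill(turn2))
```

In this sketch the first turn computes all 16 tokens, while the second turn reuses those 16 tokens (some recovered from the dram tier after being demoted from hbm) and computes only the 8 new ones. The chained hash makes each key depend on all preceding tokens, so a block is reused only when its whole prefix matches, which is the condition under which cached keys and values remain valid for causal attention.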

Dramatic Performance Gains: Multi-Model Validation Shows Significant Improvements in Both TTFT and TPS

The validation deployed the vLLM-Ascend framework in China Mobile Hubei's live network environment and simulated long-sequence inputs, ranging from 8K to 190K tokens, across mainstream models such as MiniMax M2.5 and GLM-5.1. Key findings are as follows:

MiniMax M2.5: With UCM enabled, the time to first token (TTFT) improved by 26%–62%, alongside a substantial boost in tokens per second (TPS) per neural processing unit (NPU). Broken down by sequence length, TPS rose by 58% at 64K tokens and by 78% at 128K tokens.
GLM-5.1: TTFT improved by 51%–93%, and TPS rose by 56%–372%, including gains of 313% at 64K tokens and 372% at 128K tokens.

The test results indicate that the advantages of the AI Inference Acceleration Solution become more pronounced as context length increases, and that the solution effectively addresses the KV cache capacity bottleneck commonly encountered in long-sequence inference.

Value Amplified: Powering Mission-Critical Services in the Agentic Era

A representative from China Mobile Hubei noted: "Hubei is located in a core area, with only 10 milliseconds of latency to the nation's eight major computing power hubs. This test validates the necessity of storage-compute-network collaboration. In scenarios such as AI agent interaction and code generation, the AI Inference Acceleration Solution can increase throughput by over 50%, laying a solid foundation for the large-scale deployment of China Mobile Hubei's AI services."

Industry Outlook: Reshaping AI Data Infrastructure

Michael Qiu, President of the Huawei Global Data Storage Marketing & Solution Sales Department, remarked: "With major carriers launching token packages, the large-scale adoption of AI agents has clearly entered a new phase. Token consumption is expected to grow exponentially in the future. The AI Inference Acceleration Solution not only significantly reduces TTFT, but also helps slash token costs, enabling carriers to build efficient and green AI computing infrastructure."

The successful validation marks a major step forward in the collaborative optimization of AI computing infrastructure for carriers, providing a replicable technical model for the global AI industry.

MWC Shanghai 2026 will be held from June 24 to June 26 in Shanghai, China. During the event, Huawei will showcase its latest products and solutions in Hall N1 of the Shanghai New International Expo Center (SNIEC).

The ICT industry is rapidly moving towards an era of token monetization. Huawei is collaborating with global carriers and partners to explore 5G-A high uplink and experience monetization, as well as AI-powered business upgrades, through enhanced connectivity and compute. Together, we will seize the opportunities presented by token monetization.

For more information, please visit: https://carrier.huawei.com/minisite/mwcs2026/en/
