Overview of Domestic GPU Manufacturers
China's domestic GPU industry has formed a multi-tier competitive landscape, with domestic manufacturers capturing an overall market share of 15% in 2025. Leading enterprises and emerging players are jointly driving technological iteration and adoption across application scenarios. The industry is currently characterized by "leading enterprises dominating technological breakthroughs and emerging forces breaking through in segmented scenarios": leading companies are building ecosystem barriers through full-stack capabilities, while emerging players are scaling rapidly in vertical fields through differentiated technical routes.
Huawei Ascend, a leader in the domestic GPU field, has built a technological moat around the Da Vinci architecture and follows a dual-drive model of "computing power iteration + ecosystem expansion." In terms of technical route, the Ascend series chips have achieved "doubling computing power every generation." The Ascend 910C, mass-produced in 2025, uses SMIC's 7nm+ process, raises FP16 computing power to 400 TFLOPS, and supports self-developed HBM HiBL 1.0 memory with a bandwidth of 1.6TB/s, reaching 90% of international flagship levels. Its predecessor, the Ascend 910B (7nm process), outperformed the NVIDIA A100 by 20% in some tests, with a balanced configuration of 320 TFLOPS of FP16 and 512 TOPS of INT8 computing power. The core product matrix covers training-inference all-in-one machines (FusionCube A3000 DS version) and inference all-in-one machines (Atlas series). Huawei has also launched the Ascend 384 supernode distributed computing system, which achieves 1.6Tbps of inter-chip communication bandwidth through 3D packaging technology. In terms of market performance, Huawei plans to produce 450,000 Ascend 910 series chips in 2025 (including 100,000 910C and 300,000 910B) and has gathered over 1,200 partners across 20 industries, including finance and healthcare, with multiple intelligent computing center projects already implemented.
Biren Technology has achieved differentiated competition in the high-end AI training market through breakthroughs in Chiplet technology. Its flagship product, the BR100, uses 7nm+ Chiplet packaging and has reached mass production of 2.5D packaging with domestic silicon interposers. It delivers 2048 TOPS of INT8 computing power, carries 64GB of HBM2e memory with a bandwidth of 2.3TB/s, and supports the PCIe 5.0 interface. The product has passed adaptation verification for Alibaba's Tongyi large model, with a linear acceleration ratio exceeding 95% in thousand-card clusters, supports trillion-parameter model training, and has entered the supply chain of some data centers.
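
To make the "linear acceleration ratio" figure concrete, the sketch below reads it as scaling efficiency, that is, measured speedup divided by the number of cards. That reading, the function names, and the use of exactly 1,000 cards are assumptions made for illustration; the source does not state how the ratio was measured.

```python
def scaling_efficiency(speedup: float, num_cards: int) -> float:
    """Fraction of ideal linear speedup actually achieved (1.0 = perfectly linear)."""
    return speedup / num_cards


def effective_speedup(efficiency: float, num_cards: int) -> float:
    """Speedup over a single card implied by a given scaling efficiency."""
    return efficiency * num_cards


# Assumed example: a 1,000-card cluster at the 95% ratio quoted in the text.
cards = 1000
speedup = effective_speedup(0.95, cards)
print(f"{cards} cards at 95% efficiency behave like ~{speedup:.0f} ideal cards")
print(f"round-trip efficiency: {scaling_efficiency(speedup, cards):.2f}")
```

Under this reading, a ratio above 95% means that less than 5% of the cluster's aggregate compute is lost to communication and synchronization overhead.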
Emerging players have achieved technological breakthroughs and closed the commercial loop by focusing on vertical fields, effectively complementing the leading enterprises. Cambricon has reworked its computing architecture with the Siyuan (思元) series chips. The Siyuan (思元) 590 uses Chiplet technology to reach 560 TOPS of INT8 computing power. In Q1 2025, shipments increased by 4230% year-on-year and net profit reached 355 million yuan, with large-scale deployment in intelligent driving and smart city scenarios.
To comprehensively assess the technological gap between Chinese domestic GPUs and internationally leading products, this chapter builds a three-part analysis framework of "parameter comparison - measured performance - energy efficiency analysis," using multi-dimensional data to show where domestic GPUs stand in hardware specifications, actual performance, and ecosystem development.
A side-by-side comparison shows significant gaps in core indicators between a representative domestic product (Ascend 910B) and the NVIDIA H100. In process technology, the Ascend 910B uses a 7nm process, 1-2 generations behind the H100's 4nm, which directly affects chip density and power consumption control. In computing performance, the Ascend 910B's FP16 computing power of 320 TFLOPS is only 32.4% of the H100's 989 TFLOPS. The memory bandwidth gap is even more prominent: 392GB/s versus 3TB/s, an approximately 7.7x difference in data throughput capacity, which directly leads to insufficient parallel processing efficiency in large model training scenarios [10].
| Indicator | Domestic Representative (Ascend 910B) | NVIDIA H100 | Gap |
|---|---|---|---|
| Process Technology | 7nm | 4nm | 1-2 generations behind |
| FP16 Computing Power | 320 TFLOPS | 989 TFLOPS | ~3x gap |
| Memory Bandwidth | 392GB/s | 3TB/s | ~7.7x gap |
| Ecosystem Maturity | MindSpore/CANN | CUDA | 5+ year gap in software |
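
The percentage and multiple figures in the comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch: the dictionary layout and function name are illustrative, and 3TB/s is taken as 3000GB/s, which is an assumption about the unit convention.

```python
# Figures are copied from the comparison table; the structure is illustrative.
SPECS = {
    "fp16_tflops": {"ascend_910b": 320.0, "h100": 989.0},
    # Assumption: 3TB/s is treated as 3000GB/s (decimal units).
    "mem_bandwidth_gb_per_s": {"ascend_910b": 392.0, "h100": 3000.0},
}


def gap(metric: str) -> tuple[float, float]:
    """Return (Ascend 910B as a percentage of H100, H100 as a multiple of Ascend 910B)."""
    domestic = SPECS[metric]["ascend_910b"]
    reference = SPECS[metric]["h100"]
    return 100.0 * domestic / reference, reference / domestic


for metric in SPECS:
    share, multiple = gap(metric)
    print(f"{metric}: {share:.1f}% of H100, {multiple:.2f}x gap")
```

Expressing each gap both as a share and as a multiple makes it easier to see that the "~3x" and "32.4%" figures for FP16 describe the same ratio.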
In practical application scenarios, gaps in hardware parameters translate further into performance differences. At the architectural level, domestic GPUs compensate for some hardware disadvantages through operator optimization. For example, the Ascend 910B's fused operators designed for convolutional neural networks improve computational efficiency in specific scenarios, but overall architectural innovation remains insufficient. Ecosystem shortcomings are a more critical constraint: current domestic GPUs are less than 60% compatible with CUDA, so numerous AI systems that rely on CUDA acceleration libraries require additional adaptation [10].
The application expansion of China's domestic GPU industry is strongly scenario-specific, with each field following a progressive logic of "policy-driven - technical adaptation - commercial implementation." From government and enterprise IT innovation to AI computing, and from industrial optimization to consumer markets, differentiated breakthrough paths and market patterns have formed.
At the policy-driven level, the "Eastern Data and Western Computing" project has become the core driver. This project has promoted rapid expansion of western data center clusters, directly driving a surge in domestic GPU procurement demand. 2025 data shows that domestic chips account for 28% of GPU procurement in western data centers, an increase of 17 percentage points from the project's initial stage, forming a pattern of "computing power shifting westward, domestic priority."
In recent years, China's domestic GPU industry has been characterized by "accelerated technological breakthroughs, expanding market scale but prominent core bottlenecks." The following analysis covers three aspects: technological achievements, challenges at three levels (hardware, software, and supply chain), and breakthrough paths.
Domestic GPUs have made significant progress in applying advanced packaging technology and improving computing density. Innovations in Chiplet integration and 3D stacked packaging have effectively compensated for the generational gap in process technology. For example, one leading enterprise's GPU product, which integrates 8 chiplets, achieved performance equivalent to a 7nm process on SMIC's 14nm FinFET process platform, with computing density reaching 3.2 TFLOPS/mm², an increase of 58% over the previous generation, meeting basic computing power requirements for AI training scenarios.
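
As a quick sanity check on the density figure, the previous generation's value can be backed out from the stated 58% increase. This is a back-of-the-envelope sketch: the function name is illustrative, and the derived baseline is not a figure reported in the source.

```python
def previous_value(current: float, pct_increase: float) -> float:
    """Back out the baseline from a current value and a percentage increase over it."""
    return current / (1.0 + pct_increase / 100.0)


current_density = 3.2  # TFLOPS/mm², from the text
baseline = previous_value(current_density, 58.0)
print(f"implied previous-generation density: {baseline:.2f} TFLOPS/mm²")
```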
Hardware Level: Process technology still lags. SMIC's 14nm process has reached mass production with a 92% yield, but its transistor density is about one-fifth that of TSMC's 3nm process, leaving high-end GPUs 30-40% behind in energy efficiency.
Software Level: Ecosystem barriers are difficult to break. The CUDA ecosystem accounts for 85% of the global GPU software development market, and domestic GPUs require secondary adaptation for mainstream frameworks such as PyTorch and TensorFlow, increasing application migration costs by an average of 40%.
Supply Chain Level: U.S. export controls directly affect the supply of core components. HBM imports dropped 35% year-on-year in 2024, extending delivery cycles for high-end GPUs equipped with HBM3 to 18 weeks. Some enterprises have been forced to adopt GDDR6 alternatives, at the cost of a 50% loss in bandwidth.
Industrial policy and capital investment are combining to push breakthroughs forward. The third phase of the Large Fund has explicitly allocated 15 billion yuan in special investment, focusing on GPU chip design, advanced packaging, and research and development of HBM alternative materials, with 30% of the funds earmarked for industrializing Chiplet packaging technology. On the manufacturing side, SMIC's 14nm process yield has stabilized at 92%, with monthly capacity increased to 120,000 wafers, providing a reliable mass-production base for domestic GPUs. Based on the pace of technological iteration and the capacity ramp-up curve, domestic high-end GPUs are expected to gain further global market share in 2026-2028.