Five Trends Shaping Next-gen, Data-intensive Supercomputing
Supercomputing is a key technology that can transform society and open the door to humanity’s next stage of technological evolution. Many nations are prioritizing supercomputing R&D, including the US, Japan, the EU, Russia, and China. China, for example, has 188 entries on the June 2021 release of the TOP500 list of the most powerful supercomputers, with Tianhe-2A ranking 7th. Before that, Tianhe-2 and then Sunway TaihuLight topped the list for ten consecutive releases.
Nevertheless, newer technologies like cloud computing, big data, artificial intelligence, and blockchain have drawn attention away from supercomputing. Combined with a limited application ecosystem and a service model still largely confined to selling machine hours, this leaves China’s supercomputing sector with considerable room to develop.
Recognizing supercomputing’s broader socioeconomic value, an increasing number of Chinese provinces and cities are establishing supercomputing centers and deploying next-gen supercomputing systems. China currently runs 10 national-level supercomputing centers in major cities, including Tianjin, Shenzhen, Guangzhou, and Xi'an, with many more planned.
To transform supercomputing centers from computing service providers into integrated data value providers, China is prioritizing five supercomputing trends: diversified computing, all-optical networks, intensified data, containerized applications, and converged architecture.
Diversified computing is becoming mainstream. Traditional high-performance computing (HPC) systems use CPUs for double-precision floating-point computing, while emerging supercomputing systems combine CPUs, GPUs, and FPGAs for more powerful parallel computing. China’s industry is stepping up R&D and deployment of homegrown microprocessors and accelerators, improving the efficiency of both heterogeneous computing systems and the diversified hybrid applications that run on them.
Optical switching technology is maturing and networks are becoming all-optical. Remote Direct Memory Access over Converged Ethernet (RoCE) and lossless network technologies make it possible to converge a supercomputing center’s computing, storage, and management networks into a single network. An all-optical supercomputing internet linking supercomputing centers has also been proposed to facilitate resource sharing.
Data is becoming intensified. Traditional supercomputing applications, such as weather forecasting, energy exploration, and satellite remote sensing, will generate increasing amounts of data as precision improves. Moreover, more than 80% of emerging supercomputing applications, including autonomous vehicles, genetic testing, and brain science, generate data at the petabyte scale. Larger data volumes, more data types and concurrent tasks, and higher reliability requirements demand more from supercomputing storage: higher bandwidth, higher IOPS, greater reliability, and support for massive concurrent access.
Applications are becoming containerized. Containerization can encapsulate a supercomputing operating environment, decoupling supercomputing applications from the underlying hardware and making supercomputing easier to use for the majority of non-expert users. Mainstream containerization technology is open source, which makes ecosystem development more viable.
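As one illustration of what such an encapsulated environment looks like, the sketch below is a minimal Apptainer (formerly Singularity) definition file, an open-source container format widely used in HPC. The base image, packages, and application path are assumptions for the example, not details from the article.

```
Bootstrap: docker
From: ubuntu:22.04

%post
    # Install the MPI stack and application dependencies inside the image,
    # so the host cluster only needs a container runtime.
    apt-get update && apt-get install -y openmpi-bin libopenmpi-dev

%runscript
    # Hypothetical solver path; the same image file runs unchanged on any
    # cluster with Apptainer, regardless of host OS or installed libraries.
    exec /opt/app/solver "$@"
```

A user might then launch a job with something like `mpirun -np 64 apptainer run solver.sif input.dat`, letting the host MPI spawn containerized ranks while the image itself moves unchanged between centers.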
Supercomputing architecture is converging. Aligned with the first four trends, supercomputing will adopt a heterogeneous, multi-state composite architecture to converge siloed resources, data, and applications. This means a unified, converged heterogeneous system in which CPUs, GPUs, and other dedicated computing systems are scheduled on a unified service scheduling platform. Various supercomputing applications will be managed on a unified application platform, and data assets will be carried on a unified data foundation. Data silos will be broken down, and because the data foundation remains unchanged as computing evolves, no data needs to be migrated, optimizing TCO and boosting ROI.
Data-intensive supercomputing at the core
Of the five trends, data intensification is the most significant. Traditionally, supercomputing is mainly used to solve computing problems. Customers bring data on hard drives to a supercomputing center and copy the results back onto hard drives, leaving no data for long-term storage in the center. However, the evolution of supercomputing has led to both changes and new challenges.
First, the amount of data involved in computing has increased dramatically. For example, improvements in precision in applications like weather forecasting and satellite remote sensing have doubled data volumes. More types of data, both structured and unstructured, are involved in computing; in applications like brain science and cryogenic electron microscopy (cryo-EM), image data is used directly in computation.
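A rough sketch of why precision drives data volume: the toy calculation below (all numbers are illustrative assumptions, not figures from the article) shows that merely halving a model’s horizontal grid spacing quadruples the cells, and thus the bytes, in every output field.

```python
# Back-of-envelope sketch: output volume vs. grid resolution.
# Domain size, spacings, and level count are illustrative assumptions.

def grid_cells(domain_km: float, spacing_km: float, levels: int) -> int:
    """Cells in a regular grid over a square domain with vertical levels."""
    per_side = int(domain_km / spacing_km)
    return per_side * per_side * levels

BYTES_PER_CELL = 4  # one single-precision value per cell

coarse = grid_cells(domain_km=4000, spacing_km=2.0, levels=50)
fine = grid_cells(domain_km=4000, spacing_km=1.0, levels=50)

print(f"2 km grid: {coarse * BYTES_PER_CELL / 1e9:.1f} GB per output field")
print(f"1 km grid: {fine * BYTES_PER_CELL / 1e9:.1f} GB per output field")
print(f"growth from halving grid spacing: {fine // coarse}x")
```

In practice the growth is steeper still, since finer grids also require shorter timesteps and therefore more output snapshots per simulated day.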
Second, computing power has increased dramatically. Currently, few single tasks can consume all the computing power in a cluster, so multiple tasks run concurrently in most cases. The 100 PFLOPS HPC Center at Shanghai Jiao Tong University can run nearly 50 concurrent tasks, some of which require high bandwidth while others require high I/O performance. Therefore, more balanced storage capabilities are needed.
Third, higher reliability is required. When traditional supercomputing was applied to research projects, users could tolerate rerunning jobs iteratively until a reliable result was obtained. However, today's supercomputing is mostly applied to production systems, which place higher requirements on the reliability of both results and processes. Storage systems therefore need to be extraordinarily reliable.
Fourth, supercomputing centers and data centers need to converge. In recent years, supercomputing centers have been exploring more diversified services such as AI computing, big data analytics, virtualization, and disaster recovery. In the process, they have found data mobility to be their biggest challenge: storage is split across supercomputing file systems, virtualization block storage, machine-learning object storage, and big data HDFS, and moving data between these silos is costly.
These developments present both challenges and opportunities for the data storage industry, which is key to transforming supercomputing from computing-intensive to data-intensive.
Data-intensive supercomputing serves as a data-centric, high-performance data analytics platform with the analytics capabilities of traditional supercomputing, big data analytics, and AI. It supports end-to-end scientific computing services through application-driven, unified data sources. It also provides diversified computing power for both research and business, and provides high-level data value services leveraging accumulated knowledge about data.
Data-intensive supercomputing transitions computing centers into centers delivering diversified computing services. Ultimately, converged diversified computing and a unified storage foundation for massive data will enable high-performance data analytics, driving supercomputing from the computing-service era to the data-value era (Figure 1).
Data-intensive supercomputing delivers the following value:
Research: The architecture that converges HPC, AI, and big data technologies is an interdisciplinary innovation fueled by data-intensive research. It facilitates the evolution of research from the third paradigm (computational science) to the fourth paradigm (data science).
Business: The unified data foundation is converged, efficient, secure, and low-carbon, reducing the lifecycle management costs of massive structured and unstructured data and improving the data utilization efficiency of applications that converge scientific computing, big data, and AI.
Industry: Chinese-made high-performance data analytics (HPDA) software, parallel file systems, and data storage and management systems have boosted the development of China's supercomputing storage industry and application technology ecosystem.
Figure 1: Supercomputing entering the data-value era
Wide adoption of data-intensive supercomputing
Data-intensive supercomputing is widely used in scientific research, manufacturing, and business.
For example, in gene sequencing, MGI's DNBSEQ-T7 sequencer generates 4.5 TB of data every 24 hours; at full load, that totals about 1.7 PB a year. Bioinformatics analysis also typically creates intermediate files and results around five times the volume of the original data. West China Hospital adopted data-intensive supercomputing to improve gene sequencing efficiency, shortening a single gene sequencing task from 3 hours to just minutes.
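From these figures, a rough capacity sketch (assuming 1 PB = 1,000 TB and 365 days of full-load operation) shows why a single sequencer already pushes a center into multi-petabyte storage planning.

```python
# Capacity sketch for the sequencing workload described above, using the
# daily output rate and ~5x intermediate-data multiplier cited there.
# Assumes 1 PB = 1,000 TB and 365 days of full-load operation.

TB_PER_DAY = 4.5          # DNBSEQ-T7 daily output
INTERMEDIATE_FACTOR = 5   # intermediate files + results vs. raw data

raw_tb_per_year = TB_PER_DAY * 365
total_tb_per_year = raw_tb_per_year * (1 + INTERMEDIATE_FACTOR)

# Raw output alone comes to ~1.6 PB, the same order as the figure above;
# intermediate data dominates the total capacity requirement.
print(f"raw data per year:       {raw_tb_per_year / 1000:.2f} PB")
print(f"including intermediates: {total_tb_per_year / 1000:.2f} PB")
```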
Autonomous driving applications are highly complex, involving more than 10 steps, including data import, preprocessing, training, simulation, and results analysis, each requiring different access protocols such as object, NAS, and HDFS. Data silos are also a major issue, because copying data between them can take twice as long as processing and analyzing it. The automaker Geely-Volvo adopted data-intensive supercomputing, using a single data foundation to support multi-protocol interoperability across the entire service process, reducing data storage costs while improving data analysis efficiency.
In university supercomputing, the π supercomputer series at Shanghai Jiao Tong University and the Hanhai supercomputer series at the University of Science and Technology of China deliver more balanced data access by adopting data-intensive supercomputing, supporting more than 50 concurrent supercomputing tasks with different load profiles. When computing facilities are upgraded, legacy data can be retained long term without migration, providing greater continuity for research tasks.
Figure 2: Data-centric, data-intensive supercomputing
At the Ninth Supercomputing Innovation Alliance Conference held in China in September 2021, a data-intensive supercomputing working group was established, recognizing data as equally important as computing power. Data intensification was also a key topic at the Seventh China Scientific Data Conference held in Hohhot, Inner Mongolia a month later. And at CCF HPC China 2021, Huawei and the CCF HPC Profession Committee jointly released the Data-intensive Supercomputing Technology White Paper.
The industry consensus is that data-intensive computing can unlock a thriving supercomputing industry covering data collection, storage, computing, transmission, and utilization capabilities.