With the acceleration of digital transformation across global enterprises, applications represented by ChatGPT are penetrating ever deeper into production and daily life. Behind ChatGPT's surge in popularity, demand for the infrastructure that underpins AI content generation technologies is also rising. According to the 2022-2023 China Artificial Intelligence Computing Power Development Assessment Report, China's intelligent computing power reached 155.2 EFLOPS (FP16) in 2021 and is expected to reach 1,271.4 EFLOPS by 2026, a compound annual growth rate of 52.3% over the period. With the introduction of national policy initiatives such as the "East Data West Computing" project and new infrastructure programs, intelligent computing centers in China have seen a construction boom: more than 30 Chinese cities are building or planning intelligent computing centers, concentrated mainly in the eastern region and gradually expanding to the central and western regions. In terms of development foundation, guided by the twin ideas of AI industrialization and industrial AI, the artificial intelligence industry has initially formed an architecture with heterogeneous chips, computing facilities, algorithm models, and industrial applications at its core, giving intelligent computing centers a solid basis for construction.

Building a large-scale intelligent computing power base

Currently, ChatGPT's training is based on the general-purpose foundation model GPT-3. Training such a super-large foundation model requires the support of several key technologies: algorithms, computing power, and data.
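As a quick sanity check on the figures above, the 52.3% CAGR follows directly from the 2021 and 2026 capacity numbers (a minimal sketch; the EFLOPS values come from the report cited in the text):

```python
# Verify the compound annual growth rate implied by the report's figures.
eflops_2021 = 155.2    # FP16 intelligent computing power, 2021
eflops_2026 = 1271.4   # projected FP16 intelligent computing power, 2026
years = 5

cagr = (eflops_2026 / eflops_2021) ** (1 / years) - 1
print(f"CAGR: {cagr:.1%}")  # CAGR: 52.3%
```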
The algorithm relies on scaling up model parameters and optimizing the model itself, while computing power and data rely on GPU servers, storage, and networks reinforcing one another. Data shows that ChatGPT's total training compute is about 3,640 PF-days (that is, at one quadrillion operations per second, the computation would take 3,640 days), and supporting its operation requires 7 to 8 data centers, each with an investment of about 3 billion yuan and 500P of computing power. Based on ChatGPT's roughly 13 million daily visits, more than 30,000 GPUs are estimated to be needed. GPUs communicate frequently during training, through both P2P and collective communication. Within a node, the interconnect bandwidth between GPUs can reach 400GB/s. Between nodes, GPUs communicate over an RDMA network; with GDR (GPU Direct RDMA), the RDMA NIC can bypass the CPU and host memory and read data from remote nodes directly into GPU memory. At the network level of a computing power center, technologies such as the intelligent lossless storage network are needed to integrate and co-optimize the network with the application systems, while flow control and congestion control technologies improve overall network throughput and reduce latency. For H3C's intelligent lossless network, ultra-large-scale networking is a necessary step in building intelligent computing power. The significance of AIGC as represented by ChatGPT, and of the large models behind it, lies not only in the applications themselves; the scientific research value may be even greater. It is generally believed that the first industries to adopt it will include scientific research, education, and Internet-related industries.
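The PF-days figure quoted above can be converted into total floating-point operations; a minimal sketch under the stated definition (one PF-day = 10^15 operations per second sustained for one day):

```python
# Convert ChatGPT's quoted training compute from PF-days to total operations.
PFLOPS = 1e15            # operations per second at 1 PFLOP/s
SECONDS_PER_DAY = 86400

pf_days = 3640           # total compute quoted in the text
total_ops = pf_days * PFLOPS * SECONDS_PER_DAY
print(f"{total_ops:.2e} operations")  # 3.14e+23 operations
```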
Taking large-scale deployment in the Internet industry as an example, one Internet company, taking AI training workloads such as ChatGPT as an opportunity, built a cluster computing network supporting 4,000 200G ports in a single PoD. In intelligent computing centers oriented toward scientific research and education, the number of ports deployed per PoD is usually between 1,000 and 4,000. H3C therefore provides several high-performance network solutions to cover the different business scales of its users.

Box-box networking: mainstream GPU servers today use 100G/200G/400G network cards. Taking H3C's latest S9825/S9855 series in a three-tier ToR/Leaf/Spine architecture as an example, the Spine uses dual planes while the ToR uplink/downlink convergence ratio is kept at 1:1. At 400G server access, a single PoD supports 1,024 servers and a cluster can connect 2,048 400G servers; at 200G, a single PoD supports 2,048 servers and a cluster supports up to 32 PoDs, theoretically accommodating 65,000 servers; at 100G, the cluster can accommodate more than 100,000 servers.

Figure 1: Three-tier box architecture, 200G access network

For lossless networks of deterministic scale, H3C also provides a lightweight "lossless in one frame" deployment solution that meets the intelligent computing networking needs of most scenarios. Taking a fully configured S12516CR with 576 400G ports as an example, a single frame acting as the ToR can connect directly to server NICs with 1:1 convergence and support up to 576 400G QSFP-DD ports in a single PoD; with 200G QSFP56, up to 1,152 ports; and with 100G QSFP56, up to 1,536 ports.
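The PoD and cluster figures above follow the usual port-count arithmetic of a three-tier Clos (fat-tree) topology. Below is a simplified sketch assuming identical k-port switches at every tier and a strict 1:1 convergence ratio; the actual S9825/S9855 configurations (dual-plane Spine, breakout cabling) differ in detail, so treat this as illustrative only:

```python
def fat_tree_capacity(k: int) -> dict:
    """Capacity of an idealized non-blocking three-tier fat-tree of k-port switches."""
    servers_per_tor = k // 2   # half of each ToR's ports go down to servers (1:1)
    tors_per_pod = k // 2      # the other half go up, fixing k/2 ToRs per PoD
    pods = k                   # the spine tier supports up to k PoDs
    return {
        "servers_per_pod": servers_per_tor * tors_per_pod,
        "total_servers": servers_per_tor * tors_per_pod * pods,
    }

# With hypothetical 64-port switches, a PoD holds 1,024 servers and the
# cluster tops out at 65,536 -- in the same range as the figures in the text.
print(fat_tree_capacity(64))  # {'servers_per_pod': 1024, 'total_servers': 65536}
```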
Note that splitting 400G DR4 ports directly would yield more than 2,000 100G ports with DR1 encapsulation, but current mainstream network cards do not support DR1. The advantages of the single-frame lossless approach are clear: abandoning the traditional Leaf/Spine architecture reduces the number of devices and the number of forwarding hops, which effectively lowers forwarding latency. It also removes the need to work out convergence ratios and device counts across multiple tiers, greatly simplifying deployment and selection and improving networking efficiency. It is a new approach for deterministic-scale intelligent lossless networks.

Figure 2: "Lossless in one frame" 200G access networking

Frame-box networking: for larger-scale networking needs, the H3C data center network provides a frame-box lossless architecture. Taking GPU servers with 100G/200G/400G network cards as an example, if H3C's flagship data center frame product, the S12500CR series, is used to build a three-tier ToR/Leaf/Spine architecture with a single S12516CR as the Spine and a 1:1 ToR uplink/downlink convergence ratio, then at 400G server access a single PoD supports thousands of servers and the cluster can theoretically connect nearly 590,000 400G servers at maximum scale; at 200G, a single PoD supports about 2,000 servers and the cluster can connect nearly 1.18 million servers; at 100G, the cluster can connect more than 2 million servers at maximum scale.

Figure 3: Three-tier frame architecture, 200G access network
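The single-frame port counts above are straight breakout arithmetic on the chassis's 576 400G ports; a minimal sketch (the 4-way DR4 split is the case the text notes as blocked by the lack of DR1-capable NICs):

```python
FRAME_400G_PORTS = 576  # fully configured S12516CR, per the text

def breakout(ports_400g: int, lanes_per_port: int) -> int:
    """Number of lower-rate ports after splitting each 400G port into lanes."""
    return ports_400g * lanes_per_port

print(breakout(FRAME_400G_PORTS, 1))  # 576  x 400G, no split
print(breakout(FRAME_400G_PORTS, 2))  # 1152 x 200G
print(breakout(FRAME_400G_PORTS, 4))  # 2304 x 100G via DR4 -> DR1 (>2000 ports,
                                      # but mainstream NICs do not support DR1 yet)
```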
Combining large-scale networking with cell switching

For data center switches, whether traditional frame-type or box-type, the jump in port rates from 100G to 400G brings not only power consumption challenges but also the hash-precision and elephant/mice-flow problems of box-type networking. When building intelligent lossless computing power data center networks, H3C data center switches therefore give priority to DDC (Distributed Disaggregated Chassis) technology to keep pace with growing computing power demands. DDC disaggregates a large frame device: box-type switches act as the forwarding line cards and switching fabric cards and can be flexibly distributed across multiple cabinets, optimizing networking scale and spreading out power consumption. At the same time, the DDC box switches still use cell switching. The roles in a DDC system are:

NCP: Network Cloud Packet (the line card of the chassis)
NCF: Network Cloud Fabric (the fabric card of the chassis)
NCM: Network Cloud Management (the management card of the chassis)

Figure 4: DDC architecture

Figure 5: DDC architecture decoupling, 400G full-mesh interconnection

Taking the S12516CR as an example, a single device can support 2,304 100G servers at 1:1 convergence. The DDC solution decouples the control-plane elements, adopts 400G full-mesh interconnection between NCP and NCF, and supports cell forwarding, enabling non-blocking Leaf and Spine tiers in the data center and effectively improving packet forwarding efficiency. In testing, DDC showed clear advantages in the All-to-all scenario, improving completion time by 20-30%. Compared with traditional box-type networking, DDC hardware convergence performance also has obvious advantages: in port up/down tests, the convergence time with DDC is less than 1% of that of box-type networking.
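The advantage of cell switching over per-flow hashing can be illustrated with a toy load-balancing simulation (hypothetical flow sizes, not an H3C measurement): ECMP-style hashing pins each elephant flow to a single link, while cell switching slices every flow into cells sprayed evenly across all fabric links.

```python
import random

def ecmp_imbalance(flow_sizes, n_links, seed=42):
    """Per-flow hashing: each flow lands entirely on one randomly hashed link."""
    rng = random.Random(seed)
    load = [0.0] * n_links
    for size in flow_sizes:
        load[rng.randrange(n_links)] += size
    return max(load) / (sum(load) / n_links)  # 1.0 means perfectly balanced

def cell_spray_imbalance(flow_sizes, n_links):
    """Cell switching: every flow is spread evenly over all fabric links."""
    load = [0.0] * n_links
    for size in flow_sizes:
        for i in range(n_links):
            load[i] += size / n_links
    return max(load) / (sum(load) / n_links)

elephants = [100.0] * 4                    # four large flows over an 8-link fabric
print(cell_spray_imbalance(elephants, 8))  # 1.0 -- perfectly balanced
print(ecmp_imbalance(elephants, 8))        # > 1.0 -- some links hot, some idle
```

With only four elephant flows on eight links, per-flow hashing must leave some links idle while others carry a whole flow, which is exactly the hash-precision problem cell spraying avoids.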
Network intelligence + traffic visualization

The service model of the intelligent computing center has shifted from providing computing power to providing "algorithms + computing power", and intelligent lossless networks likewise need AI lossless algorithms. The traffic characteristics of each queue in a lossless network change dynamically over time, so a statically configured ECN threshold cannot keep up with real-time changes in traffic. H3C lossless network switches support an AI ECN function, which uses AI service components on the local device or on the analyzer to dynamically optimize the ECN threshold according to defined rules. The AI service component, a system process built into the network device or analyzer, is the key to dynamic ECN tuning. Its functional framework comprises three levels:
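As a rough illustration of what dynamic ECN tuning means (a hypothetical controller sketch with made-up parameter names, not H3C's actual AI ECN algorithm): the marking threshold is nudged down when queues build up and relaxed when they drain, within configured bounds.

```python
def tune_ecn_threshold(threshold_kb, queue_depth_kb, target_kb,
                       step_kb=8, lo_kb=32, hi_kb=512):
    """One tuning step: lower the ECN marking threshold under congestion,
    raise it when the queue is comfortably below target (to keep throughput)."""
    if queue_depth_kb > target_kb:
        threshold_kb -= step_kb        # mark earlier to tame congestion
    elif queue_depth_kb < target_kb / 2:
        threshold_kb += step_kb        # mark later to avoid hurting throughput
    return max(lo_kb, min(hi_kb, threshold_kb))

t = 256
t = tune_ecn_threshold(t, queue_depth_kb=400, target_kb=300)  # congested: 248
t = tune_ecn_threshold(t, queue_depth_kb=100, target_kb=300)  # draining: 256
```

A static threshold is a single fixed number; the point of AI ECN is that this kind of adjustment runs continuously against live queue telemetry instead.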
Figure 6: Schematic diagram of the AI ECN function

In addition, the H3C data center network provides AI ECN operations and maintenance visualization. Depending on where the AI service components sit in the network, the AI ECN function comes in two modes, centralized AI ECN and distributed AI ECN:
In both modes, SeerAnalyzer can be leveraged to give users a visual view of the effect of AI ECN parameter tuning.

Figure 7: PFC back-pressure frame rate before and after AI ECN tuning

To date, H3C has established in-depth cooperation with many leading companies in the field of intelligent lossless networks. Going forward, H3C data center networks will continue to evolve toward ultra-broadband, intelligent, integrated, and green, providing smarter, greener, and more powerful data center network products and solutions.

References:
1. Guangming Online: "The popularity of ChatGPT has driven up demand for computing power. Can China's computing power scale support it?"
2. Guidelines for the Innovation and Development of Intelligent Computing Centers
3. DDC Technology White Paper