How many optical modules does a GPU need?

1. Network card model

There are two main network cards: ConnectX-6 (200Gb/s, mainly used with A100), whose main optical module is the MMA1T00-HS (200G InfiniBand HDR QSFP56 SR4 PAM4 850nm 100m), and ConnectX-7 (400Gb/s, mainly used with H100). The next-generation ConnectX-8 (800Gb/s) has not yet been released.

2. Switch model

There are two main switch models: the QM9700 series (32 OSFP ports, each 2*400Gb/s, for a total of 64 channels at 400Gb/s and 51.2Tb/s of bidirectional throughput) and the QM8700 series (40 QSFP56 ports, for a total of 40 channels at 200Gb/s and 16Tb/s of bidirectional throughput).

3. Number of units (scalable unit, SU)

The number of units affects the level of the switching architecture. When the number of units is small, only a two-layer architecture is used, and when the number of units is large, a three-layer architecture is used.

H100 SuperPOD: Each unit consists of 32 nodes (DGX H100 servers), supports up to 4 units to form a cluster, and adopts a two-layer switching architecture.

A100 SuperPOD: Each unit contains 20 nodes (DGX A100 servers) and supports up to 7 units to form a cluster. More than 5 units require a three-layer switching architecture.

Summary:

(1) A100+ConnectX6+QM8700 three-layer network: 1:6 GPU-to-module ratio, all 200G QSFP56 optical modules

(2) A100+ConnectX6+QM9700 two-layer network: 1:0.75 for 800G OSFP optical modules + 1:1 for 200G QSFP56 optical modules

(3) H100+ConnectX7+QM9700 two-layer network: 1:1.5 for 800G OSFP optical modules + 1:1 for 400G OSFP optical modules

(4) H100+ConnectX8 (not yet released)+QM9700 three-layer network: 1:6 GPU-to-module ratio, all 800G OSFP optical modules

Assuming H100+A100 shipments of 300,000+900,000 in 2023, demand would be 3.15 million 200G QSFP56 + 300,000 400G OSFP + 787,500 800G OSFP optical modules, and the incremental AI market would be US$1.38 billion.

Assuming H100+A100 shipments of 1.5 million + 1.5 million in 2024, demand would be 750,000 200G QSFP56 + 750,000 400G OSFP + 6.75 million 800G OSFP optical modules. The incremental AI market would be US$4.97 billion, roughly equal to the size of the datacom optical module market in 2021.

Below is the detailed measurement process for each of the above scenarios.

Scenario 1: A100+ConnectX6+QM8700 three-layer network.

Each DGX A100 server has 8 compute network interfaces, 4 on the left and 4 on the right (as shown below). Currently, A100 systems mainly ship with ConnectX-6 for external communication, at an interface rate of 200Gb/s.

In the first-layer architecture, each node has 8 interfaces, each node is connected to 8 leaf switches, and every 20 nodes form a unit (SU). Therefore, the first layer requires 8*SU leaf switches, 8*SU*20 cables, and 2*8*SU*20 200G optical modules.

In the second layer, because the architecture is non-blocking, the upstream rate equals the downstream rate. The total unidirectional rate of the first layer is 200G times the number of cables. Since the second layer also runs 200G per cable, it needs the same number of cables as the first layer: 8*SU*20 cables and 2*8*SU*20 200G transceivers. With a single link between each leaf and each spine, the number of spine switches is the number of cables divided by the number of leaf switches, that is, (8*SU*20)/(8*SU)=20. When there are few leaf switches, however, two or more links can be set up between each leaf and each spine to reduce the number of spine switches (as long as the 40-port limit is not exceeded). As a result, for 1/2/4/5 units the number of spine switches required is 4/10/20/20 and the number of optical modules required is 320/640/1280/1600. The spine switch count does not grow in proportion to the number of units, but the optical module count does.
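The two-layer counts above can be reproduced with a short Python sketch. The constants come from the text; the rule for parallel leaf-to-spine links (as many as fit within 40 ports) and the function name `two_layer_counts` are assumptions made for illustration, chosen because they reproduce the 4/10/20/20 figures, not a documented sizing method.

```python
import math

NODES_PER_SU = 20    # DGX A100 servers per unit (from the text)
PORTS_PER_NODE = 8   # 200G compute interfaces per server (from the text)
SPINE_PORTS = 40     # QSFP56 ports on a QM8700-series switch (from the text)


def two_layer_counts(su: int) -> dict:
    """Count switches, cables and 200G modules for a two-layer fabric of `su` units."""
    leaves = PORTS_PER_NODE * su                            # 8*SU leaf switches
    cables_per_layer = PORTS_PER_NODE * su * NODES_PER_SU   # 8*SU*20 cables
    uplinks_per_leaf = cables_per_layer // leaves           # 20 uplinks per leaf
    # Assumed packing rule: use as many parallel links per leaf-spine pair
    # as fit within the 40-port limit of one spine switch.
    links_per_pair = SPINE_PORTS // leaves
    if links_per_pair < 1:
        raise ValueError("too many leaf switches for a two-layer fabric")
    spines = math.ceil(uplinks_per_leaf / links_per_pair)
    return {
        "leaf_switches": leaves,
        "spine_switches": spines,
        "cables_per_layer": cables_per_layer,
        "modules_per_layer": 2 * cables_per_layer,  # one module at each cable end
    }


if __name__ == "__main__":
    for su in (1, 2, 4, 5):
        print(su, two_layer_counts(su))
```

Under this rule, 7 units (56 leaf switches) cannot be served by a single 40-port spine tier, which is consistent with the text's move to a third layer.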

When the number of units exceeds 5 (up to the maximum of 7), a third layer is required. Because the architecture is non-blocking, the third layer needs the same number of cables as the second layer.

Recommended configuration of SuperPOD: 7 units are networked, which requires adding a third-layer architecture and adding core switches. The number of switches per layer and the number of connecting cables for different numbers of units are shown in the figure.

Across 140 servers there are 140*8=1120 A100 GPUs, 56+56+28=140 switches (QM8790), and 1120+1120+1120=3360 cables, which require 3360*2=6720 200G QSFP56 optical modules. The ratio of A100 GPUs to 200G QSFP56 optical modules is therefore 1120:6720=1:6.
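As a quick check of the 1:6 ratio, the following Python sketch repeats the arithmetic for the 7-unit, three-layer SuperPOD. The function name and parameter names are illustrative only; the values are the ones given in the text.

```python
def three_layer_module_ratio(units=7, servers_per_unit=20,
                             gpus_per_server=8, layers=3):
    """Repeat the cable and module arithmetic for a non-blocking fabric."""
    servers = units * servers_per_unit
    gpus = servers * gpus_per_server
    # Non-blocking: every layer carries one 200G cable per GPU-facing interface.
    cables = layers * gpus
    modules = 2 * cables  # one optical module at each end of every cable
    return {
        "servers": servers,
        "gpus": gpus,
        "cables": cables,
        "modules": modules,
        "modules_per_gpu": modules / gpus,
    }


if __name__ == "__main__":
    print(three_layer_module_ratio())
```

Because each layer adds one cable (two modules) per GPU, the ratio is simply 2 times the number of layers, independent of the number of units.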

Scenario 2: A100+ConnectX6+QM9700 two-layer network

This solution is not currently in the recommended configuration, but more A100 deployments may adopt QM9700 networking in the future. Doing so reduces the number of optical modules used but creates demand for 800G OSFP optical modules. The biggest difference is in the first layer: instead of 8 separate 200G cables, the server's QSFP ports connect to the switch's OSFP ports through breakout cables, with each 800G OSFP port (2*400G) fanning out 1-to-4 to 200G QSFP56 ports.

First layer: for 7 units, 140 servers have 140*8=1120 interfaces, connected externally through 1120/4=280 1-to-4 cables. This requires 280 800G OSFP and 1120 200G QSFP56 optical modules, and a total of 12 QM9700 switches.

Second layer: only 800G connections are used, requiring 280*2=560 800G OSFP transceivers and 9 QM9700 switches.

Therefore, 140 servers with 1120 A100 GPUs require 12+9=21 switches, 560+280=840 800G OSFP optical modules, and 1120 200G QSFP56 optical modules.

The ratio of A100 GPUs to 800G OSFP optical modules is 1120:840=1:0.75, and the ratio of A100 GPUs to 200G QSFP56 optical modules is 1:1.
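The Scenario 2 module counts can be re-derived with a few lines of Python. This is a sketch of the arithmetic in the text only; the function and variable names are made up for illustration, and the switch counts (12 and 9) are not modeled because the text does not state how they are derived.

```python
def scenario2_counts(servers=140, ports_per_server=8, fanout=4):
    """Count optical modules for A100 + ConnectX-6 behind QM9700 switches."""
    nic_ports = servers * ports_per_server   # 1120 200G interfaces
    breakout_cables = nic_ports // fanout    # 280 1-to-4 cables
    osfp_800g_layer1 = breakout_cables       # one 800G OSFP per breakout cable
    qsfp56_200g = nic_ports                  # one 200G QSFP56 per interface
    osfp_800g_layer2 = 2 * breakout_cables   # both ends of each 800G uplink
    osfp_800g = osfp_800g_layer1 + osfp_800g_layer2
    gpus = nic_ports                         # one interface per A100
    return {
        "gpus": gpus,
        "osfp_800g": osfp_800g,
        "qsfp56_200g": qsfp56_200g,
        "osfp_800g_per_gpu": osfp_800g / gpus,
        "qsfp56_200g_per_gpu": qsfp56_200g / gpus,
    }


if __name__ == "__main__":
    print(scenario2_counts())
```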

Scenario 3: H100+ConnectX7+QM9700 two-layer network

The H100 design is unusual in that, although each server pairs 8 GPUs with 8 400G network cards, the interfaces are merged into 4 800G interfaces. This creates substantial demand for 800G OSFP optical modules.

At the first layer, the recommended configuration plugs one 800G (2*400G) OSFP optical module into each server interface: MMA4Z00-NS (800Gb/s Twin-port OSFP 2x400G SR8 PAM4 850nm 100m DOM Dual MPO-12 MMF) or MMS4X00-NM (800Gb/s Dual-port OSFP 2x400G PAM4 1310nm 500m DOM Dual MTP/MPO-12 SMF). Two fiber optic cables (MPO) run from the module's dual ports, one to each of two switches.

In the first layer, one unit contains 32 servers, each server connects to 2*4=8 switches, and the SuperPOD consists of 4 units, so the first layer needs a total of 4*8=32 leaf switches.

It is recommended to reserve one node for management (UFM). Because this has little effect on the optical module count, the calculation below simply assumes 4 units with 128 servers in total.

The first layer has 4*128=512 800G OSFP optical modules and 2*4*128=1024 400G OSFP optical modules: MMA4Z00-NS400 (400G OSFP SR4 PAM4 850nm 30m on OM3/50m on OM4 MTP/MPO-12) or NVIDIA MMS4X00-NS400 (400G OSFP DR4 PAM4 1310nm MTP/MPO-12 500m).

In the second layer, switches are connected directly with 800G optical modules. Each leaf switch has a downstream unidirectional rate of 32*400G. To keep the upstream and downstream rates equal, each leaf switch needs 16*800G of upstream capacity, which requires 16 spine switches and a total of 4*8*16*2=1024 800G optical modules.

Therefore, under this architecture the two layers require a total of 512+1024=1536 800G OSFP optical modules and 1024 400G OSFP optical modules, for a total of 4*32*8=1024 H100 GPUs. The ratio of GPUs to 800G OSFP optical modules is therefore 1024:1536=1:1.5, and the ratio of GPUs to 400G OSFP optical modules is 1024:1024=1:1.
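The Scenario 3 totals can be checked the same way. The sketch below follows the text's own accounting (4 units of 32 servers, 4 merged 800G interfaces per server, 16 uplinks per leaf switch); the function and field names are illustrative assumptions, not part of any NVIDIA tool.

```python
def scenario3_counts(units=4, servers_per_unit=32, gpus_per_server=8,
                     osfp_ports_per_server=4, uplinks_per_leaf=16):
    """Count optical modules for H100 + ConnectX-7 behind QM9700 switches."""
    servers = units * servers_per_unit               # 128 servers
    gpus = servers * gpus_per_server                 # 1024 H100 GPUs
    # First layer: one 800G twin-port module per server interface,
    # each splitting into two 400G links.
    layer1_800g = osfp_ports_per_server * servers    # 512
    layer1_400g = 2 * osfp_ports_per_server * servers  # 1024
    leaf_switches = units * 8                        # 4*8 = 32
    # Second layer: 16 800G uplinks per leaf, one module at each end.
    layer2_800g = leaf_switches * uplinks_per_leaf * 2  # 1024
    total_800g = layer1_800g + layer2_800g
    return {
        "gpus": gpus,
        "osfp_800g": total_800g,
        "osfp_400g": layer1_400g,
        "osfp_800g_per_gpu": total_800g / gpus,
        "osfp_400g_per_gpu": layer1_400g / gpus,
    }


if __name__ == "__main__":
    print(scenario3_counts())
```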

Scenario 4: H100+ConnectX8 (not yet released)+QM9700 three-layer network

Assuming that H100 is upgraded to an 800G network card, the external interface should be upgraded from 4 OSFP interfaces to 8 OSFP interfaces. The connection between each layer uses 800G connection. The entire network architecture is similar to the first scenario, except that the 200G optical module is replaced with an 800G optical module. Therefore, the ratio of GPU to optical module in this architecture is also 1:6.

In summary, the four scenarios are organized into the following table.

Assuming that the shipment volume of H100+A100 in 2023 is 300,000+900,000, it will generate demand for 3.15 million 200G+300,000 400G+787,500 800G OSFP pieces.

Assuming that the shipment volume of H100+A100 in 2024 is 1.5 million + 1.5 million, it will generate demand for 750,000 200G + 750,000 400G + 6.75 million 800G OSFP.

*A100 uses half 200G switches and half 400G switches.

**H100 uses half 400G switches and half 800G switches.

The above A100 and H100 shipment figures are merely assumptions and do not represent future expectations.
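To show how the per-GPU ratios turn into module demand, here is a small Python sketch for the 2023 case. The ratios are those from the four scenarios. The split of GPUs across scenarios (A100 half in Scenario 1 and half in Scenario 2, all H100 in Scenario 3) is an inference that happens to reproduce the 2023 figures quoted above; the article does not spell it out, so treat it as an assumption.

```python
# Optical modules per GPU for each scenario (from the summary in the text).
RATIOS = {
    "scenario1": {"200G": 6.0},
    "scenario2": {"200G": 1.0, "800G": 0.75},
    "scenario3": {"400G": 1.0, "800G": 1.5},
    "scenario4": {"800G": 6.0},
}


def module_demand(gpus_by_scenario):
    """Sum optical module demand over a {scenario: GPU count} mapping."""
    totals = {"200G": 0.0, "400G": 0.0, "800G": 0.0}
    for scenario, gpus in gpus_by_scenario.items():
        for module, per_gpu in RATIOS[scenario].items():
            totals[module] += gpus * per_gpu
    return totals


# Assumed 2023 split (an inference, see above): 900,000 A100 divided evenly
# between scenarios 1 and 2, and all 300,000 H100 in scenario 3.
demand_2023 = module_demand({
    "scenario1": 450_000,
    "scenario2": 450_000,
    "scenario3": 300_000,
})

if __name__ == "__main__":
    print(demand_2023)
```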

Based on a simple calculation using an average price of USD 1 per Gb/s in 2023 and USD 0.85 per Gb/s in 2024, AI is expected to add USD 1.38 billion and USD 4.97 billion, respectively, to the optical module market.
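The market-size figures follow from multiplying each module's rate by its volume and by the price per unit of bandwidth. The Python sketch below does this with the volumes stated in the article, reading the price as dollars per Gb/s of module rate (the reading that reproduces the article's totals); the function name is illustrative.

```python
def market_size_usd(volumes, usd_per_gbps):
    """Total market value for a {module rate in Gb/s: units shipped} mapping."""
    total_gbps = sum(rate * units for rate, units in volumes.items())
    return total_gbps * usd_per_gbps


# Module volumes as stated in the article.
volumes_2023 = {200: 3_150_000, 400: 300_000, 800: 787_500}
volumes_2024 = {200: 750_000, 400: 750_000, 800: 6_750_000}

market_2023 = market_size_usd(volumes_2023, 1.00)
market_2024 = market_size_usd(volumes_2024, 0.85)

if __name__ == "__main__":
    print(f"2023: USD {market_2023 / 1e9:.2f} billion")
    print(f"2024: USD {market_2024 / 1e9:.2f} billion")
```

Most of the 2024 value comes from 800G modules: 6.75 million units at 800Gb/s account for 5.4 billion of the 5.85 billion total Gb/s.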
