NVIDIA Ethernet Networking Accelerates World's Largest AI Supercomputer, Built by xAI

Oct. 28, 2024—NVIDIA announced that xAI’s Colossus supercomputer cluster in Memphis, Tennessee, has reached a massive scale of 100,000 NVIDIA® Hopper GPUs. The cluster uses the NVIDIA Spectrum-X™ Ethernet networking platform, an RDMA (Remote Direct Memory Access) network designed to deliver exceptional performance for multi-tenant, hyperscale AI factories.

Colossus, the world’s largest AI supercomputer, is being used to train xAI’s Grok family of large language models, with chatbots offered as a feature for X Premium subscribers. xAI is in the process of doubling the size of Colossus to 200,000 NVIDIA Hopper GPUs.

xAI and NVIDIA built the supporting facility and the supercomputer itself in 122 days, whereas systems of this size typically take many months to years. Only 19 days passed between the first rack arriving on the floor and the start of training.

While training the very large Grok model, Colossus has achieved unprecedented network performance. Across all three tiers of the network fabric, the system has experienced no application latency degradation and no packet loss due to flow collisions. With Spectrum-X congestion control, it has maintained 95% data throughput.

This level of performance is simply unachievable at scale with traditional Ethernet, which can only deliver 60% of data throughput when thousands of flows collide.
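To make the gap between the two figures concrete, the short sketch below converts each throughput percentage into usable bandwidth on a single link. The 800 Gb/s line rate is an assumption chosen for illustration (it matches the maximum port speed quoted for the SN5600 switch, not a measured Colossus link rate), and the function and constant names are hypothetical.

```python
def effective_throughput_gbps(line_rate_gbps: float, utilization: float) -> float:
    """Return the usable bandwidth of a link given the fraction of line rate delivered."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return line_rate_gbps * utilization


# Assumption: an 800 Gb/s link, used purely as an illustrative line rate.
LINE_RATE_GBPS = 800.0
SPECTRUM_X_UTILIZATION = 0.95         # figure quoted in the article
STANDARD_ETHERNET_UTILIZATION = 0.60  # figure quoted in the article

spectrum_x_gbps = effective_throughput_gbps(LINE_RATE_GBPS, SPECTRUM_X_UTILIZATION)
standard_gbps = effective_throughput_gbps(LINE_RATE_GBPS, STANDARD_ETHERNET_UTILIZATION)
speedup = spectrum_x_gbps / standard_gbps

print(f"Spectrum-X:        {spectrum_x_gbps:.0f} Gb/s usable")
print(f"Standard Ethernet: {standard_gbps:.0f} Gb/s usable")
print(f"Ratio:             {speedup:.2f}x")
```

Under these assumptions the ratio depends only on the two percentages (0.95 / 0.60), so it holds for any line rate.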

“AI is becoming increasingly critical, placing greater demands on performance, security, scalability and cost-efficiency,” said Gilad Shainer, senior vice president of networking at NVIDIA. “The NVIDIA Spectrum-X Ethernet networking platform is purpose-built to enable innovators like xAI to process, analyze and execute AI workloads faster, accelerating the development, deployment and time to market of AI solutions.”

Elon Musk said on X: “Colossus is the most powerful training system in the world. Well done to the xAI team, NVIDIA, and our many partners and suppliers.”

“xAI has built the world’s largest and most powerful supercomputer,” said an xAI spokesperson. “With NVIDIA Hopper GPUs and Spectrum-X, we are able to push the boundaries of large-scale AI model training and build a super-accelerated, optimized AI factory based on the Ethernet standard.”

At the heart of the Spectrum-X platform is the Spectrum SN5600 Ethernet switch, which supports port speeds up to 800Gb/s and is powered by the Spectrum-4 switch ASIC. xAI uses an end-to-end solution that combines the Spectrum-X SN5600 switch with the NVIDIA BlueField-3® SuperNIC to achieve unprecedented performance.

Spectrum-X Ethernet networking for AI brings advanced features that deliver low latency and low tail latency together with efficient, scalable bandwidth, capabilities that were previously exclusive to InfiniBand. These features include adaptive routing with NVIDIA Direct Data Placement (DDP) technology, congestion control, and enhanced AI fabric visibility and performance isolation, all of which are key requirements for multi-tenant generative AI clouds and large enterprise environments.
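The effect of flow collisions, and why load-aware path selection helps, can be illustrated with a toy model. The sketch below is not NVIDIA's adaptive routing algorithm. It assumes equal-sized flows, a fixed set of parallel links, and two hypothetical placement policies: a static one that pins each flow to a pseudo-randomly chosen link (a stand-in for hash-based path selection), and a load-aware one that always picks the currently least-loaded link.

```python
import random


def static_hash_loads(num_flows: int, num_links: int, seed: int = 0) -> list[int]:
    """Pin each flow to a pseudo-random link, ignoring how busy that link already is."""
    rng = random.Random(seed)
    loads = [0] * num_links
    for _ in range(num_flows):
        loads[rng.randrange(num_links)] += 1
    return loads


def least_loaded_loads(num_flows: int, num_links: int) -> list[int]:
    """Place each flow on whichever link currently carries the fewest flows."""
    loads = [0] * num_links
    for _ in range(num_flows):
        loads[loads.index(min(loads))] += 1
    return loads


# Assumption: 64 equal-sized flows spread over 16 parallel links (illustrative numbers).
NUM_FLOWS, NUM_LINKS = 64, 16

static_loads = static_hash_loads(NUM_FLOWS, NUM_LINKS)
adaptive_loads = least_loaded_loads(NUM_FLOWS, NUM_LINKS)

# The busiest link bounds how quickly the slowest flow can finish.
print("static    busiest link carries", max(static_loads), "flows")
print("adaptive  busiest link carries", max(adaptive_loads), "flows")
```

In this model the load-aware policy spreads flows evenly, so its busiest link never carries more flows than the busiest link under static placement. Real fabrics add factors the sketch ignores, such as varying flow sizes, multiple switch tiers, and packet reordering.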
