Facebook: An innovative data center network topology


Aerial view of Facebook's data center in Altoona, Iowa

Facebook's data centers receive billions of user requests every day, and as the company adds members and launches new features, that number keeps climbing. This growth is good for the business, but it is a challenge for Facebook's network engineers: a data center topology that met requirements five months ago can be overwhelmed today.

So in addition to building large new data centers, like the one in Altoona, Iowa, Facebook engineers are constantly optimizing their data center network design. In fact, "tweaking" hardly describes what they implemented in Altoona; it is more accurate to say they rewrote the network design guidelines.

The old Facebook network

Before the Altoona data center was built, Facebook engineers arranged the server racks in the data center into clusters similar to the architecture shown in Figure A. In a real environment, Facebook would have hundreds of racks instead of just three. The figure also shows the top-of-rack (TOR) switches for each rack, which act as intermediaries between the servers and the upstream aggregation switches.

Figure A: Top of Rack (TOR) - Network Connection Architecture

This architecture worked well, but it presented several challenges for Facebook engineers. "First, the size of the cluster was limited by the port density of the cluster switches," explains Alexey Andreyev, a network engineer at Facebook. "To build the largest clusters, we needed the largest network equipment, which was available from a limited number of vendors. Also, needing so many ports in a device was inconsistent with the desire to provide the highest bandwidth infrastructure. Even more difficult was finding the optimal long-term balance between cluster size, rack bandwidth, and bandwidth outside the cluster."

Fabric: A new network topology

With those billions of requests per day as the incentive, engineers decided to move away from the complex, bandwidth-constrained top-down network hierarchy in favor of a new design called Fabric. The slide in Figure B depicts the new unit of server racks, called a pod. A single pod comprises 48 racks, with each top-of-rack switch connected to four fabric switches. "Each top-of-rack switch currently has four 40G uplinks, providing 160G of total bandwidth capacity to a rack of servers connected at 10G."

Figure B
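The pod bandwidth figures quoted above can be verified with a little arithmetic. The sketch below uses only the numbers stated in the article (48 racks per pod, four 40G uplinks per top-of-rack switch, 10G server ports); it is an illustration of the sizing, not Facebook's own tooling.

```python
# Per-pod bandwidth arithmetic using the figures quoted in the article.
RACKS_PER_POD = 48        # racks (and TOR switches) in one pod
UPLINKS_PER_TOR = 4       # 40G uplinks per top-of-rack switch
UPLINK_GBPS = 40          # speed of each uplink
SERVER_PORT_GBPS = 10     # servers attach to the TOR at 10G

# Each rack's total uplink capacity: 4 x 40G = 160G, as quoted.
tor_uplink_capacity = UPLINKS_PER_TOR * UPLINK_GBPS

# Aggregate uplink capacity for a whole pod of 48 racks.
pod_uplink_capacity = RACKS_PER_POD * tor_uplink_capacity

print(f"Uplink capacity per rack: {tor_uplink_capacity}G")  # 160G
print(f"Uplink capacity per pod:  {pod_uplink_capacity}G")  # 7680G
```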

This design approach has the following advantages:

• Pods of 48 nodes are easy to deploy

• Scaling is simplified and effectively unlimited

• Every pod is identical and wired the same way

The next step is to connect all the fabric switches together; the slide in Figure C shows how this is done. Andreyev says this step is relatively simple, at least compared with the old design.

Figure C

Andreyev explained that Facebook engineers adhered to this 48-node rule when adding spine switches. "To implement connectivity across the entire building, we created four separate 'planes' of spine switches, each of which can scale to up to 48 independent devices. Each fabric switch in each pod is connected to each spine switch in the local plane."

The numbers Andreyev mentioned next are staggeringly large. "Together, the pods and planes form a modular network topology capable of accommodating hundreds of thousands of servers connected with 10G, scaling to multiple petabits of bisection bandwidth, and providing non-oversubscribed rack-to-rack performance for our data center buildings."
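To get a feel for those numbers, the sketch below sizes the fabric-to-spine wiring from the figures Andreyev gives: four planes, up to 48 spine switches per plane, and one fabric switch per plane in each pod, with every fabric switch connected to every spine in its local plane. The 12-pod example is hypothetical, used only to show the scale of the connectivity.

```python
# Sizing sketch based on the plane/pod figures quoted in the article.
PLANES = 4                    # independent spine planes per building
MAX_SPINES_PER_PLANE = 48     # each plane scales to 48 spine switches
FABRIC_SWITCHES_PER_POD = 4   # one fabric switch per plane in each pod

def fabric_to_spine_links(num_pods: int) -> int:
    """Each fabric switch links to every spine switch in its plane,
    so a pod contributes (fabric switches x spines-per-plane) links."""
    links_per_pod = FABRIC_SWITCHES_PER_POD * MAX_SPINES_PER_PLANE
    return num_pods * links_per_pod

# A single pod at full plane capacity: 4 x 48 = 192 uplinks.
print(fabric_to_spine_links(1))   # 192
# A hypothetical building with 12 pods:
print(fabric_to_spine_links(12))  # 2304
```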

Network Operations

From the top-of-rack switches to the edge of the network, the Fabric design uses a unified "Layer 3" approach, supports both IPv4 and IPv6, and relies on equal-cost multi-path (ECMP) routing. Andreyev added: "To prevent occasional 'elephant flows' from taking over and degrading an end-to-end path, we've made the network multi-speed: 40G links between all switches, while servers connect through 10G ports on the top-of-rack switches. We also have server-side mechanisms to steer traffic around trouble spots when problems occur."
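The core idea of ECMP is that a flow's 5-tuple is hashed to pick one of several equal-cost next hops, so every packet of a flow follows the same path while different flows spread across all paths. The function and spine names below are illustrative, not Facebook's implementation:

```python
# Minimal illustration of hash-based ECMP path selection (assumed,
# generic scheme -- not Facebook's actual forwarding code).
import hashlib

def ecmp_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Pick a next hop by hashing the flow's 5-tuple, so all packets
    of one flow consistently take the same equal-cost path."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return next_hops[digest % len(next_hops)]

# Hypothetical equal-cost paths via the four spine planes.
paths = ["plane-1", "plane-2", "plane-3", "plane-4"]

# The same flow always hashes to the same path:
a = ecmp_next_hop("10.0.0.1", "10.0.1.2", 6, 52100, 443, paths)
b = ecmp_next_hop("10.0.0.1", "10.0.1.2", 6, 52100, 443, paths)
assert a == b and a in paths
```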

Physical layout

Andreyev wrote that the new building layout shown in Figure D is not much different from Facebook's previous design. One difference is that Fabric's new spine and edge switches sit on the first floor between Data Hall X and Data Hall Y, and the network's connection to the outside world (the minimum point of entry, or MPOE) runs through those spine and edge switches.

Figure D

Overcoming challenges

Facebook engineers appear to have overcome the challenges they faced. Hardware limitations are no longer an issue, and both the number of distinct components and the overall complexity have been reduced. Andreyev said the team followed the "KISS (keep it simple)" principle, adding at the end of his article: "Our new fabric is not an exception to this approach. Despite the large scale of the topology, it is a highly modular system with many repeated parts. It is easy to automate and deploy, and it is simpler to operate than a smaller collection of customized clusters."
