A brief history of router architecture

Over the past 50 years, we’ve made a lot of progress in the development of the Internet, from a tiny network of a few computers connected to each other to a global network with billions of nodes. In the process, we’ve learned a lot about how to build networks and the routers that connect them, and every mistake we’ve made has provided important lessons for those who come after us.

Initially, a router looked like a simple computer with multiple network interface cards (NICs).

Figure 1 - Network interface card connected to the bus

This works, up to a point. In this architecture, packets arrive at a NIC and are transferred from the NIC to memory by the CPU. The CPU makes the forwarding decision and then pushes the packet to the outbound NIC. The CPU and memory are centralized resources, so total throughput is constrained by what they can support. The bus adds a further constraint: its bandwidth must cover the aggregate bandwidth of all NICs at the same time.

The problem becomes apparent as soon as you try to scale. We can buy faster CPUs, but how do we expand the bus? Doubling the bus speed means doubling the bus interface speed on every NIC and CPU card, so the performance of an individual NIC improves little while its price rises sharply.
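The arithmetic behind this bottleneck can be sketched in a few lines. The bus and NIC speeds below are made-up illustrative numbers; the key assumption is that each packet crosses the shared bus twice (NIC to memory, then memory to NIC):

```python
# Illustrative only: hypothetical NIC and bus speeds showing why a shared
# bus, not the CPU, caps how many line-rate NICs a router can hold.

def max_nics(bus_gbps: float, nic_gbps: float) -> int:
    """Each packet crosses the bus twice (NIC -> memory -> NIC),
    so a NIC running at line rate consumes 2x its speed on the bus."""
    return int(bus_gbps // (2 * nic_gbps))

print(max_nics(10, 1))  # a 10 Gb/s bus supports only 5 NICs at 1 Gb/s
print(max_nics(10, 2))  # and only 2 NICs at 2 Gb/s
```

Doubling the NIC speed more than halves the NIC count the bus can carry, which is why faster cards alone could not save this design.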

1): The cost of a router should increase linearly with capacity.

To scale up, the only stopgap solution is to add additional buses and processors:

Figure 2 - The solution to scale up is to add another bus and another processor

The processing elements, labeled here as arithmetic logic units (ALUs), were digital signal processing (DSP) chips chosen for their excellent price/performance ratio. The additional buses increased bandwidth, but the architecture still did not scale: continuing to add more ALUs and more buses was not a feasible path to higher performance.

Since the ALU was still a significant limitation, the next step was to add an FPGA to the architecture to offload the workload of longest prefix match (LPM) lookups.


Figure 3 - The next step is to add the FPGA

This helps somewhat, but only to a point: the ALUs are still saturated. LPM lookups account for a large share of the workload, yet even with that part of the problem removed, the centralized architecture still cannot scale.
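To make the offloaded workload concrete, here is a minimal longest-prefix-match lookup. The routes and next-hop names are made-up examples, and a real FPGA or ASIC would use a trie or TCAM rather than a linear scan; the point is only what "most specific matching prefix" means:

```python
import ipaddress

# Hypothetical forwarding table: (prefix, outbound NIC).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"),  "nic1"),
    (ipaddress.ip_network("10.1.0.0/16"), "nic2"),
    (ipaddress.ip_network("0.0.0.0/0"),   "nic0"),  # default route
]

def lpm(dst: str) -> str:
    """Return the next hop for the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(lpm("10.1.2.3"))   # nic2 — the /16 wins over the /8
print(lpm("192.0.2.1"))  # nic0 — only the default route matches
```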

2): LPM can be implemented in custom chips and will not be a performance barrier.

The next step was to go in the other direction: replace the ALUs and FPGA with general-purpose processors, and try to scale by adding more CPUs and more buses. This required a great deal of investment, achieved only small incremental gains, and was still limited by the bandwidth of the centralized bus.

At this stage in the Internet’s development, larger forces came into play. As the Internet became more widespread, the Internet’s enormous potential became apparent. Telecommunications companies acquired the NSFnet regional network and began to deploy commercial backbones. Application-specific integrated circuits (ASICs) became a reliable technology, allowing more functions to be implemented directly on the chip. Demand for routers soared, and the need for major improvements in scalability eventually overwhelmed the original conservatism. To meet this demand, many startups emerged, offering a variety of potential solutions.

First up is the crossbar:


Figure 4 - As the demand for routers soars, an alternative is the crossbar

In this architecture, each NIC is both an input and an output. The processor on the NIC makes the forwarding decision, selects the output NIC, and sends a scheduling request to the crossbar. The scheduler collects all requests from the NICs, tries to find the best match, programs the crossbar, and signals the inputs to transmit.

The problem is that each output can only listen to one input at a time, and Internet traffic is bursty. If two packets need to go to the same output, one of them must wait. If the waiting packet forces other packets behind it on the same input to wait as well, the system suffers from head-of-line blocking (HOLB), and the router performs very poorly.
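A toy time-slot simulation makes the cost of HOLB visible. The queue contents ("A" and "B" are output ports) are made-up; the comparison is FIFO input queues against virtual output queues (VOQs), the standard fix:

```python
def slots_fifo(queues):
    """Crossbar with FIFO input queues: each slot, only the head of each
    queue may be granted, and each output accepts at most one packet."""
    slots = 0
    while any(queues):
        slots += 1
        granted = set()
        for q in queues:
            if q and q[0] not in granted:
                granted.add(q.pop(0))
            # else: the head is blocked, and everything behind it waits too
    return slots

def slots_voq(queues):
    """With virtual output queues, any queued packet may be granted
    (one per input and one per output each slot) — no HOLB."""
    slots = 0
    while any(queues):
        slots += 1
        granted = set()
        for q in queues:
            for i, dst in enumerate(q):
                if dst not in granted:
                    granted.add(q.pop(i))
                    break
    return slots

# Two inputs, each holding a packet for output A and then output B:
print(slots_fifo([["A", "B"], ["A", "B"]]))  # 3 slots — HOLB wastes one
print(slots_voq([["A", "B"], ["A", "B"]]))   # 2 slots — the achievable minimum
```

In the FIFO case, output B sits idle in the first slot because a packet destined for it is stuck behind a blocked head-of-line packet.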

3): The internal structure of the router needs to be non-blocking even under stress conditions.

Another approach is to arrange the network cards in a ring:

Figure 5 - Arranging the NICs in a Torus

This way, each NIC is connected to four neighboring NICs, and the input NIC must calculate a path through the fabric to reach the output line card. There is a problem here - the bandwidth is not uniform. The bandwidth in the north-south direction is greater than the bandwidth in the east-west direction. If the input traffic pattern requires east-west, there will be blocking.
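The path computation the input NIC must perform can be sketched as minimal hop counting on a k×k torus; the 4×4 size and coordinates below are hypothetical examples:

```python
def torus_hops(src, dst, k):
    """Minimal hop count between two NICs on a k x k torus,
    where wraparound links let traffic go either way around each ring."""
    (x1, y1), (x2, y2) = src, dst
    dx = min(abs(x1 - x2), k - abs(x1 - x2))  # shorter way around the row ring
    dy = min(abs(y1 - y2), k - abs(y1 - y2))  # shorter way around the column ring
    return dx + dy

print(torus_hops((0, 0), (2, 3), 4))  # 3 hops: 2 one way, 1 via wraparound
```

Every hop consumes capacity on the intermediate NICs it crosses, which is why traffic patterns that concentrate on one axis of the torus can saturate those links while others sit idle.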

4): The internal structure of the router must have a uniform bandwidth distribution because we cannot predict the distribution of traffic.

One approach is to create a full mesh of NIC-to-NIC links.

Figure 6 - Fully meshed network

Despite the lessons learned from previous efforts, new problems were exposed. Everything seemed to run fine in this architecture until someone needed to pull a card for repair. Because each packet was split into cells spread across the memory of every card in the system, removing one card meant in-flight packets could no longer be reconstructed, resulting in outages.

5): The router cannot have a single point of failure.

The next idea was centralized shared memory:

Figure 7 - All packets flow into the central memory and then to the output NIC

This works well, but scaling the memory is a challenge. You can add multiple memory controllers and more memory, but at some point the aggregate bandwidth is simply too great to design physically. Hitting these practical physical limits forced thinking in other directions.

The telephone network provided the inspiration. Charles Clos realized long ago that you could build scalable switches by building networks of smaller switches. It turns out that a Clos network is what we need:

Figure 8 — Clos Network

Clos Network:

  • Can scale capacity well.
  • No single point of failure.
  • Supports sufficient redundancy and has the ability to resist failures.
  • Handles congestion bursts by distributing the load across the entire fabric.
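Clos's classic result can be stated in a few lines of arithmetic. The port counts below are made-up examples; the formulas are the standard ones for a three-stage fabric with r ingress/egress switches of n ports each:

```python
def clos_middle_switches(n: int) -> int:
    """Clos's 1953 result: a 3-stage network whose ingress switches each
    have n inputs is strictly non-blocking with at least 2n - 1
    middle-stage switches."""
    return 2 * n - 1

def clos_crosspoints(n: int, r: int) -> int:
    """Total crosspoints in a strictly non-blocking 3-stage Clos fabric:
    r ingress switches (n x m), m middle switches (r x r),
    r egress switches (m x n)."""
    m = clos_middle_switches(n)
    return 2 * r * n * m + m * r * r

# Hypothetical 1024-port fabric built from 32 switches of 32 ports each:
n, r = 32, 32
N = n * r
print(clos_crosspoints(n, r), "crosspoints vs", N * N, "for one big crossbar")
```

At this size the Clos fabric needs roughly a fifth of the crosspoints of a single monolithic crossbar, and the gap widens as the port count grows, which is what makes the design scale.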

Because each NIC acts as both an input and an output, the input and output stages can be folded together, leading to the Folded Clos networks we use today in cluster routers.

Figure 9 - Folded Clos network

However, this architecture is not without problems. The proprietary cell format ties the fabric to one vendor's chips, making hardware upgrades more challenging: new cell switches must support the legacy links and cell formats to interoperate during a link upgrade.

Each cell must have an address indicating the output NIC it should flow to. This addressing is necessarily limited, resulting in an upper limit on scalability. Until now, control and management of the cluster has been completely proprietary, introducing another vendor lock-in issue to the software stack.

But we can solve these problems by changing the architecture. For the past 50 years, we have been trying to scale routers up. What we have learned from building large clouds is that scaling out is often more successful.

In a scale-out architecture, rather than trying to build a single giant, blazingly fast server, it may be more appropriate to divide and conquer. Racks full of smaller servers can do the same job while being more resilient, flexible, and economical.

When applied to routers, the idea is similar. Can we use a number of smaller routers and arrange them in a Clos topology so that we have similar architectural advantages but avoid the above problems? It turns out that this is indeed the case:

Figure 10 - Grouping switches, retaining Clos topology for easy scalability

By replacing the cell switches with packet switches (e.g., routers) and retaining the Clos topology, scalability can be preserved.

We can scale in two dimensions: by adding more ingress routers and packet switches in parallel with the existing layers, or by adding additional switch layers. Since the individual routers are now relatively generic, vendor lock-in is avoided. The links are all standard Ethernet, so there are no issues with interoperability.
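A back-of-the-envelope sizing of such a fabric is easy to write down. The switch sizes below are hypothetical, and the model assumes uniform port speeds with one uplink from every leaf to every spine:

```python
def fabric(leaf_ports: int, spine_count: int, spine_ports: int):
    """Toy sizing of a two-tier leaf-spine fabric.
    Each leaf dedicates one uplink per spine; each spine port serves one
    leaf, so the spine port count caps the number of leaves."""
    uplinks = spine_count
    host_ports = leaf_ports - uplinks      # ports left for attached routers
    max_leaves = spine_ports
    total_hosts = max_leaves * host_ports
    oversub = host_ports / uplinks         # oversubscription ratio
    return total_hosts, oversub

# Hypothetical 64-port leaves and 32 spines of 64 ports each:
print(fabric(64, 32, 64))  # (2048, 1.0) — 2048 ports, non-blocking
print(fabric(48, 4, 32))   # (1408, 11.0) — cheaper, but 11:1 oversubscribed
```

Adding spines (and uplinks) pushes the oversubscription ratio toward 1:1; adding leaves or a further switch layer grows the port count, which are exactly the two scaling dimensions described above.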

If a switch needs more links, then get a bigger switch. If a given link needs to be upgraded and both ends of the link have the expansion capability, then just upgrade the optics. Running heterogeneous link speeds within the fabric is not a problem because each router can act as a speed matching device.

This architecture is already very common in the data center field. Depending on the number of switch layers, it is called leaf-spine or super-spine architecture. It has also been proven to be very robust, stable and flexible.

This is a viable alternative architecture from a forwarding-plane perspective, leaving the remaining issues to the control and management planes. Scaling the control plane requires an order-of-magnitude improvement in the scalability of the control protocols. Additionally, we are developing management-plane abstractions that will allow the entire Clos fabric to be controlled as a single router. This work is being done as an open standard, so none of the technologies involved are proprietary.

Over the past 50 years, router architecture has evolved in fits and starts as technology trade-offs have constantly shifted. Clearly, the evolution is not yet complete: each iteration solves the problems of the previous generation while exposing new ones.

I hope that by carefully summarizing past and existing experiences, we can move forward with a more flexible and robust architecture!
