A brief history of router architecture

Over the past 50 years, we’ve made a lot of progress in the development of the Internet, from a tiny network of a few computers connected to each other to a global network with billions of nodes. In the process, we’ve learned a lot about how to build networks and the routers that connect them, and every mistake we’ve made has provided important lessons for those who come after us.

Initially, a router looked like a simple computer with multiple network interface cards (NICs).

Figure 1 - Network interface card connected to the bus

This works, up to a point. In this architecture, packets arrive at a NIC and are transferred from the NIC to memory by the CPU. The CPU makes the forwarding decision and then pushes the packet to the outbound NIC. The CPU and memory are centralized resources, so total throughput is constrained by what they can support. The bus adds a further constraint: its bandwidth must cover the aggregate bandwidth of all NICs at the same time.

The problem becomes apparent as soon as you try to scale. We can buy faster CPUs, but how do we expand the bus? Doubling the bus speed means doubling the bus interface speed on every NIC and CPU card, so the performance of an individual NIC improves little while its price rises sharply.
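The arithmetic behind this bottleneck can be sketched in a few lines. The bus and NIC speeds below are made-up illustrative numbers; the key assumption is that each packet crosses the shared bus twice (NIC to memory, then memory to NIC):

```python
# Illustrative only: hypothetical NIC and bus speeds showing why a shared
# bus, not the CPU, caps how many line-rate NICs a router can hold.

def max_nics(bus_gbps: float, nic_gbps: float) -> int:
    """Each packet crosses the bus twice (NIC -> memory -> NIC),
    so a NIC running at line rate consumes 2x its speed on the bus."""
    return int(bus_gbps // (2 * nic_gbps))

print(max_nics(10, 1))  # a 10 Gb/s bus supports only 5 NICs at 1 Gb/s
print(max_nics(10, 2))  # and only 2 NICs at 2 Gb/s
```

Doubling the NIC speed more than halves the NIC count the bus can carry, which is why faster cards alone could not save this design.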

1): The cost of a router should increase linearly with capacity.

To scale up, the only stopgap solution is to add additional buses and processors:

Figure 2 - The solution to scale up is to add another bus and another processor

The processing elements, labeled here as arithmetic logic units (ALUs), were digital signal processing (DSP) chips chosen for their excellent price/performance ratio. The additional buses increased bandwidth, but the architecture still did not scale: continuing to add more ALUs and more buses was not a feasible path to higher performance.

Since the ALU was still a significant limitation, the next step was to add an FPGA to the architecture to offload the workload of longest prefix match (LPM) lookups.


Figure 3 - The next step is to add the FPGA

This helps somewhat, but only to a point: the ALUs are still saturated. LPM lookups account for a large share of the workload, yet even with that part of the problem removed, the centralized architecture still cannot scale.
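To make the offloaded workload concrete, here is a minimal longest-prefix-match lookup. The routes and next-hop names are made-up examples, and a real FPGA or ASIC would use a trie or TCAM rather than a linear scan; the point is only what "most specific matching prefix" means:

```python
import ipaddress

# Hypothetical forwarding table: (prefix, outbound NIC).
ROUTES = [
    (ipaddress.ip_network("10.0.0.0/8"),  "nic1"),
    (ipaddress.ip_network("10.1.0.0/16"), "nic2"),
    (ipaddress.ip_network("0.0.0.0/0"),   "nic0"),  # default route
]

def lpm(dst: str) -> str:
    """Return the next hop for the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(matches, key=lambda m: m[0].prefixlen)
    return hop

print(lpm("10.1.2.3"))   # nic2 — the /16 wins over the /8
print(lpm("192.0.2.1"))  # nic0 — only the default route matches
```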

2): LPM can be implemented in custom chips and will not be a performance barrier.

The next step was to go in the other direction: replace the ALUs and FPGA with general-purpose processors, and try to scale by adding more CPUs and more buses. This required a great deal of investment, achieved only small incremental gains, and was still limited by the bandwidth of the centralized bus.

At this stage in the Internet’s development, larger forces came into play. As the Internet became more widespread, the Internet’s enormous potential became apparent. Telecommunications companies acquired the NSFnet regional network and began to deploy commercial backbones. Application-specific integrated circuits (ASICs) became a reliable technology, allowing more functions to be implemented directly on the chip. Demand for routers soared, and the need for major improvements in scalability eventually overwhelmed the original conservatism. To meet this demand, many startups emerged, offering a variety of potential solutions.

First up is the crossbar:


Figure 4 - As the demand for routers soars, an alternative is the crossbar

In this architecture, each NIC is both an input and an output. The processor on the NIC makes the forwarding decision, selects the output NIC, and sends a scheduling request to the crossbar. The scheduler collects all requests from the NICs, tries to find the best match, programs the crossbar, and signals the inputs to transmit.

The problem is that each output can only listen to one input at a time, and Internet traffic is bursty. If two packets need to go to the same output, one of them must wait. If the waiting packet forces other packets behind it on the same input to wait as well, the system suffers from head-of-line blocking (HOLB), and the router performs very poorly.
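A toy time-slot simulation makes the cost of HOLB visible. The queue contents ("A" and "B" are output ports) are made-up; the comparison is FIFO input queues against virtual output queues (VOQs), the standard fix:

```python
def slots_fifo(queues):
    """Crossbar with FIFO input queues: each slot, only the head of each
    queue may be granted, and each output accepts at most one packet."""
    slots = 0
    while any(queues):
        slots += 1
        granted = set()
        for q in queues:
            if q and q[0] not in granted:
                granted.add(q.pop(0))
            # else: the head is blocked, and everything behind it waits too
    return slots

def slots_voq(queues):
    """With virtual output queues, any queued packet may be granted
    (one per input and one per output each slot) — no HOLB."""
    slots = 0
    while any(queues):
        slots += 1
        granted = set()
        for q in queues:
            for i, dst in enumerate(q):
                if dst not in granted:
                    granted.add(q.pop(i))
                    break
    return slots

# Two inputs, each holding a packet for output A and then output B:
print(slots_fifo([["A", "B"], ["A", "B"]]))  # 3 slots — HOLB wastes one
print(slots_voq([["A", "B"], ["A", "B"]]))   # 2 slots — the achievable minimum
```

In the FIFO case, output B sits idle in the first slot because a packet destined for it is stuck behind a blocked head-of-line packet.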

3): The internal structure of the router needs to be non-blocking even under stress conditions.

Another approach is to arrange the network cards in a ring:

Figure 5 - Arranging the NICs in a Torus

This way, each NIC is connected to four neighboring NICs, and the input NIC must calculate a path through the fabric to reach the output line card. There is a problem here - the bandwidth is not uniform. The bandwidth in the north-south direction is greater than the bandwidth in the east-west direction. If the input traffic pattern requires east-west, there will be blocking.
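The path computation the input NIC must perform can be sketched as minimal hop counting on a k×k torus; the 4×4 size and coordinates below are hypothetical examples:

```python
def torus_hops(src, dst, k):
    """Minimal hop count between two NICs on a k x k torus,
    where wraparound links let traffic go either way around each ring."""
    (x1, y1), (x2, y2) = src, dst
    dx = min(abs(x1 - x2), k - abs(x1 - x2))  # shorter way around the row ring
    dy = min(abs(y1 - y2), k - abs(y1 - y2))  # shorter way around the column ring
    return dx + dy

print(torus_hops((0, 0), (2, 3), 4))  # 3 hops: 2 one way, 1 via wraparound
```

Every hop consumes capacity on the intermediate NICs it crosses, which is why traffic patterns that concentrate on one axis of the torus can saturate those links while others sit idle.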

4): The internal structure of the router must have a uniform bandwidth distribution because we cannot predict the distribution of traffic.

One approach is to create a full mesh of NIC-to-NIC links.

Figure 6 - Fully meshed network

Despite the lessons learned from previous efforts, new problems were exposed. Everything seemed to run fine in this architecture until someone needed to pull a card for repair. Because each packet was split into cells spread across the memory of every card in the system, removing one card meant in-flight packets could no longer be reconstructed, resulting in outages.

5): The router cannot have a single point of failure.

The next idea was centralized shared memory:

Figure 7 - All packets flow into the central memory and then to the output NIC

This works well, but scaling the memory is a challenge. You can add multiple memory controllers and more memory, but at some point the aggregate bandwidth is simply too great to design physically. Hitting these practical physical limits forced thinking in other directions.

The telephone network provided the inspiration. Charles Clos realized long ago that you could build scalable switches by building networks of smaller switches. It turns out that a Clos network is what we need:

Figure 8 — Clos Network

Clos Network:

  • Can scale capacity well.
  • No single point of failure.
  • Supports sufficient redundancy and has the ability to resist failures.
  • Handles congestion bursts by distributing the load across the entire fabric.
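Clos's classic result can be stated in a few lines of arithmetic. The port counts below are made-up examples; the formulas are the standard ones for a three-stage fabric with r ingress/egress switches of n ports each:

```python
def clos_middle_switches(n: int) -> int:
    """Clos's 1953 result: a 3-stage network whose ingress switches each
    have n inputs is strictly non-blocking with at least 2n - 1
    middle-stage switches."""
    return 2 * n - 1

def clos_crosspoints(n: int, r: int) -> int:
    """Total crosspoints in a strictly non-blocking 3-stage Clos fabric:
    r ingress switches (n x m), m middle switches (r x r),
    r egress switches (m x n)."""
    m = clos_middle_switches(n)
    return 2 * r * n * m + m * r * r

# Hypothetical 1024-port fabric built from 32 switches of 32 ports each:
n, r = 32, 32
N = n * r
print(clos_crosspoints(n, r), "crosspoints vs", N * N, "for one big crossbar")
```

At this size the Clos fabric needs roughly a fifth of the crosspoints of a single monolithic crossbar, and the gap widens as the port count grows, which is what makes the design scale.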

Because each NIC acts as both an input and an output, the input and output stages can be folded together, leading to the Folded Clos networks we use today in cluster routers.

Figure 9 - Folded Clos network

However, this architecture is not without problems. The proprietary cell format ties the fabric to one vendor's chips, making hardware upgrades more challenging: new cell switches must support the legacy links and cell formats to interoperate during a link upgrade.

Each cell must have an address indicating the output NIC it should flow to. This addressing is necessarily limited, resulting in an upper limit on scalability. Until now, control and management of the cluster has been completely proprietary, introducing another vendor lock-in issue to the software stack.

But we can solve these problems by changing the architecture. For the past 50 years, we have been trying to scale routers up. What we have learned from building large clouds is that scaling out is often more successful.

In a scale-out architecture, rather than trying to build a single giant, blazingly fast server, it may be more appropriate to divide and conquer. Racks full of smaller servers can do the same job while being more resilient, flexible, and economical.

When applied to routers, the idea is similar. Can we use a number of smaller routers and arrange them in a Clos topology so that we have similar architectural advantages but avoid the above problems? It turns out that this is indeed the case:

Figure 10 - Grouping switches, retaining Clos topology for easy scalability

By replacing the cell switches with packet switches (e.g., routers) and retaining the Clos topology, scalability can be preserved.

We can scale in two dimensions: by adding more ingress routers and packet switches in parallel with the existing layers, or by adding additional switch layers. Since the individual routers are now relatively generic, vendor lock-in is avoided. The links are all standard Ethernet, so there are no issues with interoperability.
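A back-of-the-envelope sizing of such a fabric is easy to write down. The switch sizes below are hypothetical, and the model assumes uniform port speeds with one uplink from every leaf to every spine:

```python
def fabric(leaf_ports: int, spine_count: int, spine_ports: int):
    """Toy sizing of a two-tier leaf-spine fabric.
    Each leaf dedicates one uplink per spine; each spine port serves one
    leaf, so the spine port count caps the number of leaves."""
    uplinks = spine_count
    host_ports = leaf_ports - uplinks      # ports left for attached routers
    max_leaves = spine_ports
    total_hosts = max_leaves * host_ports
    oversub = host_ports / uplinks         # oversubscription ratio
    return total_hosts, oversub

# Hypothetical 64-port leaves and 32 spines of 64 ports each:
print(fabric(64, 32, 64))  # (2048, 1.0) — 2048 ports, non-blocking
print(fabric(48, 4, 32))   # (1408, 11.0) — cheaper, but 11:1 oversubscribed
```

Adding spines (and uplinks) pushes the oversubscription ratio toward 1:1; adding leaves or a further switch layer grows the port count, which are exactly the two scaling dimensions described above.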

If a switch needs more links, then get a bigger switch. If a given link needs to be upgraded and both ends of the link have the expansion capability, then just upgrade the optics. Running heterogeneous link speeds within the fabric is not a problem because each router can act as a speed matching device.

This architecture is already very common in the data center field. Depending on the number of switch layers, it is called leaf-spine or super-spine architecture. It has also been proven to be very robust, stable and flexible.

This is a viable alternative architecture from a forwarding-plane perspective, leaving the remaining issues to the control and management planes. Scaling the control plane requires an order-of-magnitude improvement in the scalability of the control protocols. Additionally, we are developing management-plane abstractions that will allow the entire Clos fabric to be controlled as a single router. This work is being done as an open standard, so none of the technologies involved are proprietary.

Over the past 50 years, router architecture has evolved in fits and starts as technology trade-offs have constantly shifted. Clearly, the evolution is not yet complete: each iteration solves the problems of the previous generation while exposing new ones.

I hope that by carefully summarizing past and existing experiences, we can move forward with a more flexible and robust architecture!
