Understanding Deterministic Networks in Seconds: Playing with Queues (Part 1)

Queue scheduling is a core issue in computer networks. Over the past few decades, a large number of scheduling algorithms have been designed to provide different features and optimize different goals in scenarios such as industrial networks, data center networks, and wide area networks. Programmable packet scheduling has been a crown jewel of data center network research in recent years, and as applications keep raising their requirements for network service quality, the queue mechanisms in deterministic networks are continually being reinvented. This article shows you how to play with queues. It is organized around three themes: the concept of queues, the evolution of queues, and the enhancement of queue determinism, revealing the core ideas behind many mechanisms.

This article first explains the concept of a queue: starting from where queues sit in a switch, it introduces the enqueue, scheduling, and dequeue processes of a single queue, presents the two classic queue forms, First-In-First-Out (FIFO) and Push-In-First-Out (PIFO), and discusses the related scheduling algorithms. It then analyzes the evolution of the queue mechanism, extending from a single queue to multiple queues, from hardware queues to software queues, and across scheduling granularities such as per-packet, per-flow, per-class, and per-queue. Finally, it explains queue enhancement mechanisms in deterministic networks such as traffic shaping, gating, and frame preemption.

The concept of a queue

What is a queue?

As shown in Figure 1, a switch has multiple ports connected by a switch matrix. A data packet always enters through one port, passes through the switch matrix, and exits through another port. In full-duplex mode, each port acts as both an ingress port and an egress port. The switch also contains a storage resource called the buffer. The on-chip buffer of most switches is not large, typically a few MB to tens of MB; divided evenly among the ports, each port gets only a few hundred KB, enough to hold a few dozen packets when a burst of traffic arrives.
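
As a rough, illustrative calculation (all numbers below are assumptions made for the sake of the example, not figures from any specific switch), the per-port share of a shared on-chip buffer, and how many packets it can absorb, can be estimated as follows:

```python
# Back-of-the-envelope estimate of per-port buffer headroom (assumed numbers).
SHARED_BUFFER_BYTES = 12 * 1024 * 1024   # assume a 12 MB on-chip buffer
NUM_PORTS = 48                           # assume a 48-port switch
JUMBO_FRAME_BYTES = 9000                 # assume worst-case jumbo frames

per_port_bytes = SHARED_BUFFER_BYTES // NUM_PORTS      # 262,144 bytes, about 256 KB
burst_packets = per_port_bytes // JUMBO_FRAME_BYTES    # about 29 jumbo frames

print(f"Per-port share: {per_port_bytes // 1024} KB, "
      f"absorbs roughly {burst_packets} jumbo frames per burst")
```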

Although the bandwidth of a single port has grown from 1 Gbps to 400 Gbps in less than a decade, buffer capacity has not grown nearly as much. A larger buffer can reduce packet loss, but it requires more addressing time, which slows packet forwarding and increases equipment cost and network latency. A queue is a data structure in the buffer used to optimize the network's packet-loss and latency behavior for different applications.

Figure 1 Switch structure and queues

Why do we need queues?

Queues exist to handle contention. If the network were a linear topology with identical bandwidth everywhere, rates matched end to end, and total traffic never exceeding port bandwidth, no queues would be needed. As shown in Figure 2, real network topologies are complex, and traffic microbursts (such as bursts caused by TCP's sliding window) and rate mismatches between upstream and downstream ports make queue scheduling necessary.

While packets queue in the buffer, different applications pull in different directions: some require low latency, calling for a small buffer and priority scheduling; some require zero packet loss, so a larger buffer is better; some pursue network throughput and utilization, requiring bandwidth optimization; some pursue fairness, requiring queue resources to be shared as evenly as possible; and some care about flow completion time and need the long-tail latency of traffic to be reduced. Meeting several of these requirements at once makes queue scheduling extremely challenging.

Figure 2 Microburst and many-to-one scenario

Position of the queue

There are many different opinions on where the buffer queue should be placed. As shown in Figure 3, there are four main options: output-port buffering, input-port buffering with virtual output queues (VOQs), crosspoint buffering, and shared buffering [1].

  • Output-port buffering places the buffer at the output port. It offers better latency and throughput than input-port buffering, but each output port must be able to process packets at N times line speed to handle the case where N input ports all send to the same output port, which is often impractical.
  • The input-port buffer plus virtual output queue (VOQ) architecture was therefore developed; it needs only line-speed processing but gives up some throughput. It is currently the most commonly used buffering approach.
  • Crosspoint buffering can sustain high throughput, but for a switch with N input ports and N output ports it incurs O(N^2) memory cost. In this architecture each input-output pair has its own crosspoint buffer and the buffers cannot be shared, so when N is large a great deal of memory is wasted; in embedded systems with limited on-chip memory, such a cost is unacceptable.
  • Finally, there is the "switch-memory-switch" shared-buffer architecture, which uses at least 2N-1 buffers in the middle of the fabric to emulate output-buffered switching. These buffers are shared by all ports, so even when only a few ports are active, memory utilization can approach 100%. A quick comparison of the crosspoint and shared-buffer scaling follows this list.
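
The sketch below contrasts the two scaling behaviors stated above (N is the port count; the buffer counts follow the O(N^2) and 2N-1 figures, not any particular chip):

```python
# Buffer-count scaling: crosspoint buffering (one buffer per input-output pair)
# versus the switch-memory-switch shared-buffer architecture (at least 2N - 1).
for n_ports in (8, 32, 128):
    crosspoint_buffers = n_ports * n_ports
    shared_buffers = 2 * n_ports - 1
    print(f"N={n_ports:>3}: crosspoint needs {crosspoint_buffers:>5} buffers, "
          f"shared needs {shared_buffers:>3}")
```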

In general, no matter where the buffer is placed, when it comes to "queue scheduling", it is best to assume that it occurs at the switch egress port by default.

Figure 3 Four positions of the buffer queue

Single queue scheduling

Let's look first at the scheduling of a single queue, which consists of three stages: enqueue, scheduling, and dequeue. Enqueue performs admission control, identifying and classifying traffic and discarding traffic that does not meet the requirements. Scheduling selects which packet leaves the queue; for a single queue the two most common modes are First-In-First-Out (FIFO) and Push-In-First-Out (PIFO), and various scheduling-algorithm innovations can be built on top of either mode. Dequeue transmits the packet onto the link that leads to the ingress port of the next node; traffic shaping can be applied at this point (that is, controlling the rate at which traffic is sent out of the egress port), which reduces or avoids congestion at downstream nodes.
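
A minimal sketch of these three stages on a single FIFO buffer, assuming a simple tail-drop admission rule and a fixed shaping rate (the class and parameter names here are illustrative, not a real switch API):

```python
import collections


class SingleQueue:
    """Minimal single-queue model: admission control on enqueue,
    FIFO selection when scheduling, rate-limited (shaped) dequeue."""

    def __init__(self, max_len=64, shaping_rate_pps=1000):
        self.buf = collections.deque()
        self.max_len = max_len                   # admission threshold (packets)
        self.interval = 1.0 / shaping_rate_pps   # minimum spacing between departures (s)
        self.next_tx = 0.0                       # earliest time the next packet may leave

    def enqueue(self, pkt):
        # Enqueue stage: admission control -- tail-drop anything beyond capacity.
        if len(self.buf) >= self.max_len:
            return False
        self.buf.append(pkt)
        return True

    def dequeue(self, now):
        # Scheduling stage: pick the head-of-line packet (FIFO).
        # Dequeue stage: shaping -- respect the minimum departure interval.
        if not self.buf or now < self.next_tx:
            return None
        self.next_tx = now + self.interval
        return self.buf.popleft()
```

A real switch implements these stages in hardware at line rate; the model above only illustrates where each decision is made.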

First-in-first-out queue

The first-in-first-out queue is the most basic, bare-bones queue, and it is the baseline from which other queue designs are derived. As shown in Figure 4(a), the packet that arrives first leaves the queue first; the packet order is never changed, so there is effectively no scheduling algorithm at all. It can still apply admission control at enqueue and traffic shaping at dequeue.

Push-in first-out queue

As shown in Figure 4(b), each data packet carries a numeric tag as its rank; generally, the smaller the rank value, the higher the packet's priority and the earlier it should be transmitted [2]. On enqueue, the push-in first-out queue first discards packets that do not satisfy the rank-based admission condition, and then inserts the remaining packets in rank order, pushing packets with smaller ranks toward the head of the queue so that they are dequeued and transmitted first.
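
A minimal PIFO sketch, assuming each packet carries a numeric rank (smaller means higher priority); the admission policy and names are illustrative and follow the programmable-scheduling idea in [2] only in spirit:

```python
import bisect


class PIFOQueue:
    """Push-In First-Out: packets are inserted in rank order, and dequeue
    always takes the head, which holds the smallest (highest-priority) rank."""

    def __init__(self, max_len=64):
        self.ranks = []          # ranks kept sorted, parallel to self.pkts
        self.pkts = []
        self.max_len = max_len

    def push(self, pkt, rank):
        # Admission (illustrative policy): when full, an arriving packet only
        # enters if it outranks the current worst packet, which is then dropped.
        if len(self.pkts) >= self.max_len:
            if rank >= self.ranks[-1]:
                return False
            self.ranks.pop()
            self.pkts.pop()
        i = bisect.bisect_right(self.ranks, rank)   # "push in": keep rank order
        self.ranks.insert(i, rank)
        self.pkts.insert(i, pkt)
        return True

    def pop(self):
        # "First out": the head of the queue always has the smallest rank.
        if not self.pkts:
            return None
        self.ranks.pop(0)
        return self.pkts.pop(0)


q = PIFOQueue()
for name, rank in [("a", 5), ("b", 1), ("c", 3)]:
    q.push(name, rank)
print([q.pop() for _ in range(3)])   # ['b', 'c', 'a'] -- smallest rank first
```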

Queue scheduling algorithms

Based on the single-queue model alone, people have devised many queue scheduling algorithms. One example is active queue management, which sets a threshold on the queue length: if the queue grows beyond the threshold, subsequent packets are discarded so that queuing delay does not grow too large. Another example is ECN congestion notification, which builds on the same idea: when the queue length exceeds a marking threshold, packets are marked and the mark is echoed back to the sender, which then reduces its sending rate; when the queue length drops back below the threshold, the sender can increase its rate again.
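
A hedged sketch of the two threshold behaviors described above, assuming packets are modeled as dictionaries and that the threshold values are illustrative parameters rather than figures from any specific standard:

```python
from collections import deque


class ThresholdAQM:
    """Threshold-based active queue management on a single FIFO:
    above drop_threshold, arriving packets are dropped to bound queuing delay;
    above mark_threshold, arriving packets are ECN-marked so the sender backs
    off once the mark is echoed back; below it, the sender may speed up again."""

    def __init__(self, mark_threshold=20, drop_threshold=50):
        self.q = deque()
        self.mark_threshold = mark_threshold
        self.drop_threshold = drop_threshold

    def enqueue(self, pkt):
        if len(self.q) >= self.drop_threshold:
            return "dropped"                 # active queue management: tail drop
        if len(self.q) >= self.mark_threshold:
            pkt["ecn_ce"] = True             # mark "congestion experienced"
        self.q.append(pkt)
        return "queued"

    def dequeue(self):
        return self.q.popleft() if self.q else None
```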

In addition, the decision of who goes first does not have to rely on rank alone. Latency estimates can be used, for example giving priority to packets with tight deadlines; flow size can be used, so that when an elephant flow and a mice flow arrive together, the small mice flow is served first. In a deterministic network, an upper bound can be placed on the queue length so that traffic never exceeds it, which guarantees zero packet loss and a bounded, low latency.
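
As a small sketch of the bounded-queue idea, assume the maximum backlog B is guaranteed offline by admission control (how B is derived is outside this snippet, and all numbers below are illustrative assumptions); under that assumption, the per-hop queuing delay stays below B packet transmission times and no packet is dropped for lack of buffer space:

```python
# Assumed link and traffic parameters (illustrative, not from a standard).
LINE_RATE_BPS = 10_000_000_000    # 10 Gb/s egress port
MAX_PKT_BITS = 1500 * 8           # maximum packet size in bits
QUEUE_BOUND_B = 32                # backlog bound guaranteed by admission control

# If the backlog never exceeds B packets, per-hop queuing delay is bounded
# by B times the transmission time of a maximum-size packet.
max_pkt_time_s = MAX_PKT_BITS / LINE_RATE_BPS
delay_bound_us = QUEUE_BOUND_B * max_pkt_time_s * 1e6
print(f"Per-hop queuing delay bound: {delay_bound_us:.1f} us")   # 38.4 us
```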

The next part will introduce the evolution of the queue mechanism and the queue enhancement mechanisms used in deterministic networks.

References:

[1] Z. Li et al., "Time-triggered switch-memory-switch architecture for time-sensitive networking switches," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 1, pp. 185-198, 2020.

[2] Z. Yu et al., "Programmable packet scheduling with a single queue," in Proceedings of ACM SIGCOMM '21, New York, NY, USA, 2021, pp. 179-193.

About the author: Huang Yudong is a first-year doctoral student at the State Key Laboratory of Network and Switching, Beijing University of Posts and Telecommunications. His research interests include future network architecture and deterministic networks. Email address: [email protected].
