A brief analysis of RoCE network technology

In an era where data is king, the demands placed on the network keep growing. Traditional TCP/IP over Ethernet consumes a lot of CPU resources and requires additional data processing, so it can no longer meet the current demand for faster, more efficient, and more scalable networks. This is where RoCE (RDMA over Converged Ethernet) comes into view.

What is RDMA?

RDMA (Remote Direct Memory Access) was created to reduce the delay caused by server-side data processing during network transmission. It allows the memory of one host or server to be accessed directly from another host or server without involving the CPU in the transfer. This frees the CPU for its real work, such as running applications and processing large amounts of data. The result is higher bandwidth together with lower latency, jitter, and CPU consumption.

RDMA Technology

Put simply, RDMA uses dedicated hardware and network technologies so that the network card of server 1 can directly read and write the memory of server 2, achieving high bandwidth, low latency, and low resource consumption.

As shown in the figure below, the application does not need to participate in the data transfer process. It only needs to specify the memory read and write addresses, start the transfer and wait for the transfer to complete.
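
The three steps just described (specify the addresses, start the transfer, wait for completion) can be illustrated with a toy model. This is only a sketch: `ToyRdmaNic`, `WorkRequest`, and their methods are hypothetical names invented for illustration, the two "servers" are plain byte buffers in one process, and nothing here corresponds to a real driver or verbs API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    """Hypothetical descriptor: where to read locally and where to write remotely."""
    wr_id: int
    local_offset: int
    remote_offset: int
    length: int


class ToyRdmaNic:
    """Stands in for an RDMA-capable NIC; not a real driver or verbs API."""

    def __init__(self, local_mem: bytearray, remote_mem: bytearray):
        self.local_mem = local_mem
        self.remote_mem = remote_mem
        self.send_queue = deque()
        self.completion_queue = deque()

    def post_write(self, wr: WorkRequest) -> None:
        # Steps 1 and 2: the application names the addresses and starts the transfer.
        self.send_queue.append(wr)

    def process(self) -> None:
        # The "NIC" moves the bytes; application code takes no part in the copy.
        while self.send_queue:
            wr = self.send_queue.popleft()
            src = self.local_mem[wr.local_offset:wr.local_offset + wr.length]
            self.remote_mem[wr.remote_offset:wr.remote_offset + wr.length] = src
            self.completion_queue.append(wr.wr_id)

    def poll_completion(self):
        # Step 3: the application polls until a completion shows up.
        return self.completion_queue.popleft() if self.completion_queue else None


server1 = bytearray(b"hello rdma" + bytes(6))  # memory of server 1 (16 bytes)
server2 = bytearray(16)                        # memory of server 2 (16 bytes)

nic = ToyRdmaNic(server1, server2)
nic.post_write(WorkRequest(wr_id=1, local_offset=0, remote_offset=4, length=10))
nic.process()
done = nic.poll_completion()
```

After the run, `done` holds the identifier of the finished request and the ten bytes sit at offset 4 of `server2`. The application only described the transfer and collected the completion.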

Currently, there are roughly three types of RDMA networks: InfiniBand, RoCE, and iWARP. InfiniBand is a network designed specifically for RDMA and guarantees reliable transmission at the hardware level, while RoCE and iWARP are both Ethernet-based RDMA technologies. All three support the corresponding verbs interfaces.

What is RoCE?

As the name suggests, RoCE is a network protocol defined in the InfiniBand Trade Association (IBTA) standard that allows the use of RDMA over Ethernet networks. In short, it can be seen as the application of RDMA technology in hyper-converged data centers, clouds, storage, and virtualization environments.

Types of RoCE

There are two versions of the RoCE protocol, RoCE v1 and RoCE v2. Which one is available depends on the network adapter or network card used.

  • RoCE v1: RoCE v1 is an RDMA protocol implemented on top of the Ethernet link layer, allowing two hosts in the same Ethernet broadcast domain to communicate. The switches need to support flow control technologies such as PFC so that frames are not dropped on the link. The EtherType of the RoCE v1 protocol is 0x8915.
  • RoCE v2: RoCE v2 overcomes the limitation of RoCE v1 being bound to a single layer 2 domain. By changing the packet encapsulation to include IP and UDP headers, RoCE v2 can be used across both L2 and L3 networks.
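
The difference in encapsulation can be made concrete with a small frame classifier. The sketch below assumes raw Ethernet frames as `bytes`, uses the 0x8915 value quoted above for RoCE v1, and identifies RoCE v2 by UDP destination port 4791, the IANA-assigned port for RoCE v2 (a detail not stated in this article). The function name `classify_frame` is made up for illustration, and RoCE v2 over IPv6 is deliberately left out to keep the example short.

```python
import struct

ETHERTYPE_ROCE_V1 = 0x8915  # RoCE v1 value quoted in the text
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_VLAN = 0x8100     # 802.1Q tag
IPPROTO_UDP = 17
ROCE_V2_UDP_PORT = 4791     # assumption labelled in the lead-in: IANA port for RoCE v2


def classify_frame(frame: bytes) -> str:
    """Return 'RoCEv1', 'RoCEv2', or 'other' for a raw Ethernet frame."""
    if len(frame) < 14:
        return "other"
    offset = 12  # skip destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype == ETHERTYPE_VLAN:
        # Skip the 2-byte tag control field, then read the inner EtherType.
        if len(frame) < offset + 4:
            return "other"
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ETHERTYPE_ROCE_V1:
        return "RoCEv1"
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= offset + 20:
        ihl = (frame[offset] & 0x0F) * 4  # IPv4 header length in bytes
        protocol = frame[offset + 9]
        udp = offset + ihl
        if protocol == IPPROTO_UDP and ihl >= 20 and len(frame) >= udp + 8:
            (dst_port,) = struct.unpack_from("!H", frame, udp + 2)
            if dst_port == ROCE_V2_UDP_PORT:
                return "RoCEv2"
    return "other"


# Hand-built test frames: zeroed MACs and mostly zeroed headers.
macs = bytes(12)
ipv4_udp = bytes([0x45, 0]) + bytes(7) + bytes([IPPROTO_UDP]) + bytes(10)
udp_hdr = struct.pack("!HHHH", 49152, ROCE_V2_UDP_PORT, 8, 0)

frame_v1 = macs + struct.pack("!H", ETHERTYPE_ROCE_V1) + bytes(40)
frame_v2 = macs + struct.pack("!H", ETHERTYPE_IPV4) + ipv4_udp + udp_hdr
frame_arp = macs + struct.pack("!H", 0x0806) + bytes(28)
```

Because RoCE v2 is ordinary IP/UDP traffic from the switch's point of view, it can be recognised (and routed) by the same fields any L3 device already inspects, whereas RoCE v1 is visible only through its EtherType.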

RoCE v1 and RoCE v2 packet formats
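
As a rough companion to the packet formats, the per-packet header overhead of the two versions can be tallied. The sizes below are the standard fixed header lengths (Ethernet 14 bytes, IPv4 20, UDP 8, InfiniBand GRH 40, InfiniBand BTH 12, ICRC 4, Ethernet FCS 4). They are not taken from this article, and the sketch ignores VLAN tags, IPv6, and the operation-specific extended transport headers some RDMA operations add.

```python
# Fixed header and trailer sizes in bytes, in on-the-wire order.
ROCE_V1_LAYERS = {
    "Ethernet header": 14,
    "InfiniBand GRH": 40,
    "InfiniBand BTH": 12,
    "ICRC": 4,
    "Ethernet FCS": 4,
}

ROCE_V2_LAYERS = {
    "Ethernet header": 14,
    "IPv4 header": 20,
    "UDP header": 8,
    "InfiniBand BTH": 12,
    "ICRC": 4,
    "Ethernet FCS": 4,
}


def overhead(layers: dict) -> int:
    """Total bytes spent on headers and trailers per packet."""
    return sum(layers.values())


def efficiency(layers: dict, payload: int) -> float:
    """Fraction of the frame that carries payload."""
    return payload / (payload + overhead(layers))


for name, layers in (("RoCE v1", ROCE_V1_LAYERS), ("RoCE v2", ROCE_V2_LAYERS)):
    print(f"{name}: {overhead(layers)} bytes overhead, "
          f"{efficiency(layers, 1024):.1%} efficient at 1024-byte payload")
```

Under these assumptions RoCE v2 over IPv4 carries slightly less overhead than RoCE v1, because the IPv4 and UDP headers together are shorter than the InfiniBand GRH they replace.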

How to implement RoCE?

Typically, to implement RoCE, you install a network card that supports RoCE together with its driver. An ordinary Ethernet NIC is not enough: every participating host needs a RoCE-capable network adapter. RoCE drivers are available for Red Hat and other Linux distributions, Microsoft Windows, and other common operating systems. Two kinds of equipment are involved: on the network side, choose switches whose operating system supports PFC (Priority Flow Control); on rack servers or hosts, install a RoCE-capable network card.

Benefits of RoCE

  • Low CPU utilization: Accessing the memory of a remote server does not consume CPU cycles on that server, allowing full use of the available bandwidth and greater scalability.
  • Zero copy: Data is sent to and received from remote buffers directly, without intermediate copies.
  • Efficient: Because RoCE lowers latency and raises throughput, network performance improves considerably.
  • Cost savings: RoCE runs on the existing Ethernet infrastructure, so there is no need to purchase new network equipment or replace that infrastructure to handle large amounts of data, which greatly reduces companies' capital expenditure.

Frequently Asked Questions about RoCE

Listed below are some frequently asked questions about RoCE.

1. Technical comparison between RoCE, iWARP, and InfiniBand

RDMA was first implemented on the InfiniBand transport network, which is technically advanced but expensive. Later, vendors ported RDMA to traditional Ethernet, which lowered the cost of using RDMA and helped popularize the technology. On Ethernet, RDMA splits into two technologies, iWARP and RoCE, depending on how much of the protocol stack is shared with Ethernet and TCP/IP. RoCE in turn has two versions, RoCEv1 and RoCEv2 (the biggest improvement in RoCEv2 is support for IP routing). The comparison of various RDMA network protocol stacks is shown in the following figure.

  • InfiniBand is a new-generation network protocol that supports RDMA. Because it is a separate network technology, it requires NICs and switches that support it.
  • RoCE is a network protocol that allows RDMA over Ethernet. Its lower network header is an Ethernet header and its upper network header (including the data) is an InfiniBand header. This makes it possible to use RDMA over standard Ethernet infrastructure (switches). Only the NIC has to be special and support RoCE.
  • iWARP is a network protocol that allows RDMA over TCP. Some features present in IB and RoCE are not supported in iWARP. It also makes it possible to use RDMA over standard Ethernet infrastructure (switches). Only the NIC has to be special and support iWARP (if CPU offload is used); otherwise the whole iWARP stack can be implemented in software, at the cost of most of RDMA's performance benefits.

RoCE and iWARP differ in their transport: RoCE (in its v2 form) is based on the connectionless protocol UDP, while iWARP is based on a connection-oriented protocol (such as TCP). RoCEv1 is confined to a single layer 2 broadcast domain, while RoCEv2 and iWARP support layer 3 routing. Compared with RoCE, iWARP's large number of TCP connections in a large-scale network occupies a lot of memory and places higher demands on system specifications. In addition, RoCE supports multicast, while iWARP has no relevant standard definition.
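
The comparison in this paragraph can be condensed into a small lookup structure. This is only a sketch that encodes the statements made above: the class `EthernetRdma` and the helper `candidates` are hypothetical names, and InfiniBand is left out because the paragraph compares only the Ethernet-based options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EthernetRdma:
    name: str
    carried_over: str        # what the RDMA traffic rides on
    l3_routable: bool        # can cross layer 3 boundaries
    multicast_defined: bool  # multicast covered by the standard


# Values transcribed from the comparison in the text.
PROTOCOLS = (
    EthernetRdma("RoCE v1", "Ethernet link layer", False, True),
    EthernetRdma("RoCE v2", "UDP/IP", True, True),
    EthernetRdma("iWARP", "TCP/IP", True, False),
)


def candidates(need_l3_routing: bool = False, need_multicast: bool = False):
    """Names of the protocols that satisfy the stated requirements."""
    return [
        p.name
        for p in PROTOCOLS
        if (p.l3_routable or not need_l3_routing)
        and (p.multicast_defined or not need_multicast)
    ]


routed = candidates(need_l3_routing=True)
routed_multicast = candidates(need_l3_routing=True, need_multicast=True)
```

With both requirements set, only RoCE v2 remains, which matches the reasoning in the text: RoCE v1 cannot be routed and iWARP has no multicast definition.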

2. Can a RoCE adapter communicate with other adapter types (e.g. iWARP)?

RoCE adapters can only communicate with other RoCE adapters and may revert to traditional TCP/IP connections if mixed adapter types are configured, such as a RoCE adapter combined with an iWARP adapter.

Conclusion

Running RDMA in the data center reduces the burden of data movement and leaves more CPU resources available to applications. With RoCE, a data center can benefit from the capabilities of RDMA without changing its network infrastructure. By reducing Ethernet latency and CPU overhead, RoCE can improve the performance of search, storage, database, and high-transaction-rate applications. By improving CPU efficiency and application performance, RoCE can also reduce the number of servers required, thereby saving energy and shrinking the footprint of Ethernet-based data centers.
