User Datagram Protocol (UDP) in plain language

What is UDP?

UDP stands for User Datagram Protocol. It is a simple protocol, so simple that its specification, RFC 768, is only 3 pages long.

UDP is a transport layer protocol that works on top of the IP layer. UDP adds two main things to IP:

  • Port numbers, which allow IP datagrams to be demultiplexed to individual user processes.
  • A checksum, which detects data errors introduced during network transmission.

IP provides a best-effort, connectionless datagram delivery service. IP implements routing and packet forwarding based on IP addresses, and can transmit an IP datagram from one host to another on the network. The IP address determines which host the IP datagram will be sent to. Therefore, IP provides host-to-host datagram transmission services.

After the IP datagram arrives at the destination host, the IP module implemented in the kernel is responsible for receiving it from the network card. However, multiple processes usually run on the host at the same time. Which process should the IP datagram be handed over to? IP alone cannot tell.

The port number (located in the UDP header) determines which process on the host the datagram is handed over to. Therefore, UDP provides end-to-end service for applications running on the end host.
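The role of the port number can be observed with a small loopback experiment. This is only a sketch: the function name `demo_port_demultiplexing` and the payloads are made up for illustration, and it assumes the loopback address 127.0.0.1 is available.

```python
import socket

def demo_port_demultiplexing():
    """Two sockets on the same IP address, told apart only by port number."""
    sock_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for sock in (sock_a, sock_b):
            sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
            sock.settimeout(2.0)
        # Same destination IP, different destination ports.
        sender.sendto(b"for A", sock_a.getsockname())
        sender.sendto(b"for B", sock_b.getsockname())
        got_a, _ = sock_a.recvfrom(2048)
        got_b, _ = sock_b.recvfrom(2048)
        return got_a, got_b
    finally:
        sender.close()
        sock_a.close()
        sock_b.close()

got_a, got_b = demo_port_demultiplexing()
```

Each socket sees only the datagram addressed to its own port, even though both datagrams were sent to the same IP address.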

Characteristics of UDP

  • UDP is connectionless, and datagrams can be sent directly without establishing a connection before communication, while TCP is connection-oriented.
  • UDP does not provide error correction, but UDP provides error detection (end-to-end checksum).
  • UDP does not perform deduplication.
  • UDP does not perform flow control.
  • UDP does not perform congestion control, and there is no protocol mechanism to prevent high-speed UDP traffic from negatively affecting other network users.
  • UDP does not guarantee the order in which datagrams are delivered to the application.
  • UDP is unreliable. UDP is only responsible for sending the data passed by the application to the IP layer, and cannot guarantee that the datagram arrives at the destination. Reliable delivery needs to be implemented by the application.
  • UDP supports multicast delivery.
  • UDP is a transport layer protocol that preserves message boundaries.

Message Boundaries

Each time the application requests UDP output, exactly one UDP datagram is generated and sent in an IP datagram. Each time the receiving end requests UDP input, it receives one complete UDP datagram (if any is available). This is different from TCP, which is oriented toward a data stream.

Suppose host A sends data to host B twice: first the 4 bytes "abcd", then the 3 bytes "xyz". Host B receives twice and gets the two messages "abcd" and "xyz", or possibly "xyz" and "abcd" (the order is not the point here), but never something like "abcdx" and "yz". That is what preserving message boundaries means.

UDP is a transport layer protocol that preserves message boundaries. Each send operation by an application produces one UDP datagram, which is carried in one IP datagram (ignoring fragmentation). A datagram larger than the MTU (maximum transmission unit) has to be fragmented at the IP layer, so applications usually keep the amount of data in each send small enough to fit within the MTU. On the receiving end, each receive call returns the complete payload of one UDP datagram (given a large enough buffer), never half of one.

TCP is a stream protocol that does not preserve message boundaries. The number of send calls made by the sender and the size of each one bear no fixed relationship to the number of receive calls made by the receiver and the amount of data each one returns. Therefore, applications using TCP need to mark message boundaries themselves.
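The "abcd" / "xyz" example can be tried over the loopback interface. The following is a minimal sketch, assuming 127.0.0.1 is usable; the function name `demo_message_boundaries` is hypothetical.

```python
import socket

def demo_message_boundaries():
    """Send two datagrams and return what each recvfrom() call yields."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        addr = receiver.getsockname()
        sender.sendto(b"abcd", addr)  # first send: one datagram
        sender.sendto(b"xyz", addr)   # second send: another datagram
        received = []
        for _ in range(2):
            # The buffer is far larger than either message, yet each call
            # returns exactly one datagram's payload, never "abcdxyz".
            payload, _peer = receiver.recvfrom(2048)
            received.append(payload)
        return received
    finally:
        sender.close()
        receiver.close()

messages = demo_message_boundaries()
```

On loopback the two messages normally arrive in sending order, but as the text notes, the order is not what this example is about.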

UDP datagram encapsulation format

The IPv4 protocol field uses the value 17 to identify UDP. The UDP header is always 8 bytes long. The IPv4 header is followed by the UDP header, and then by the UDP data payload (if any).

Figure: IPv4 UDP datagram encapsulation format

The UDP header consists of four fields: source port number, destination port number, length, and checksum. Each field is 2 bytes.
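The four 2-byte fields map directly onto a fixed binary layout. As a rough sketch (the helper names `build_udp_header` and `parse_udp_header` and the port values are invented for illustration, and the checksum is simply left at 0 here), the header could be packed and unpacked like this:

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Return the 8-byte header for the given payload."""
    length = UDP_HEADER.size + len(payload)  # header plus data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, checksum)

def parse_udp_header(segment):
    """Split the first 8 bytes of a UDP segment into named fields."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(segment)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
    }

payload = b"hello"
header = build_udp_header(50000, 53, payload)
fields = parse_udp_header(header + payload)
```

Ordinary applications never build this header themselves; the kernel does it when a UDP socket sends data.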

(1) Port numbers. A port number is a purely abstract identifier that is not tied to any physical entity.

The port number is used to help the protocol distinguish between the sending and receiving processes. After the kernel layer at the receiving end receives the IP datagram from the network card and identifies the UDP datagram (IP datagram header protocol field value = 17), it will map it to the corresponding process based on the destination port number in the UDP header and hand the UDP datagram to the corresponding process for processing. This mapping relationship is managed and maintained by the system kernel.

Figure: UDP header and payload

The destination port number is required, but the source port number is optional. If the sender of the datagram does not need a reply from the other party, the source port number can be set to 0.

The IP layer first dispatches incoming IP datagrams to a specific transport protocol (TCP, UDP, etc.) based on the protocol field in the IP header; the transport protocol then dispatches the data to a process based on the port number. Port numbers are therefore scoped to each transport protocol, and the same port number used by different protocols does not cause any confusion in dispatching.

For example, two network service processes on a machine use the same IP address and port number, but one uses the TCP protocol and the other uses the UDP protocol. This is no problem.

(2) The length field is the total length of the UDP header and UDP data in bytes. Since the UDP header length is 8 and UDP datagrams with empty data are allowed, this means that the minimum value of the length field is 8. The UDP length value is redundant because it can be derived by subtracting the length of the IP header from the total length of the IP datagram.

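(3) The checksum covers the UDP header, the UDP data, and a pseudo-header (built from fields of the IP header, including the source and destination addresses; it is used only in the calculation and is not transmitted). The checksum is calculated by the original sender and checked by the final destination to detect whether the datagram was corrupted during network transmission, for example by a bit flipping from 1 to 0.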
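To make the checksum less abstract, here is a sketch of the calculation for IPv4, following the 16-bit one's complement sum described in RFC 768. The function names, addresses, and ports are assumptions chosen for illustration. The sketch also ignores the IPv4 special case in which a transmitted checksum of 0 means "no checksum computed".

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum, padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """Source address, destination address, zero byte, protocol 17, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)  # checksum = 0
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is transmitted as all ones

def verify_udp_checksum(src_ip, dst_ip, segment: bytes) -> bool:
    """An intact segment sums to 0xFFFF when its checksum field is included."""
    data = pseudo_header(src_ip, dst_ip, len(segment)) + segment
    return ones_complement_sum(data) == 0xFFFF

payload = b"abc"
csum = udp_checksum("192.0.2.1", "192.0.2.2", 50000, 53, payload)
segment = struct.pack("!HHHH", 50000, 53, 8 + len(payload), csum) + payload
corrupted = segment[:-1] + bytes([segment[-1] ^ 0x01])  # flip one bit
```

Here `verify_udp_checksum` accepts `segment` and rejects `corrupted`, which is the error detection (not correction) that UDP offers.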

How to achieve reliable transmission in applications using UDP

As we all know, UDP is unreliable and does not guarantee the order.

(1) What is unreliable? A sends a UDP datagram to B. The UDP datagram may not be correctly delivered to the receiving end B. Due to various reasons such as network quality, the packet may be lost. IP datagrams are delivered on a best-effort basis, leaving everything to chance.

Is there any way to guarantee that a UDP datagram, once sent, will reach its destination? No. Neither UDP nor IP can make that guarantee.

So what does the reliable transmission provided by TCP mean? It does not mean that no packets are lost, because TCP also relies on IP (which is unreliable) to deliver its data. The reliability of TCP means that lost packets are detected and retransmitted until they are acknowledged, and that data is handed to the application only once it has arrived correctly and in order.

So how does TCP achieve reliable transmission? The core idea is simple: ACKs plus retransmission of lost packets. If an application wants reliable transmission over UDP, it can borrow the same mechanism. The difference is that TCP implements it in the kernel, while a UDP-based application has to implement it at the application layer. ACKs and retransmission require some additional information, such as a packet sequence number, which can be carried in the payload. The two sides just need to agree on the layout of this additional information in the payload.

(2) What does it mean that the order is not guaranteed? A sends two UDP datagrams to B. The two UDP datagrams are encapsulated in two IP datagrams and transmitted by IP. Because the two IP datagrams are routed independently, which one arrives first? There is no telling; it depends on the mood of the network.

Is there any way to ensure that UDP datagrams arrive at the destination in the order in which they were sent? No, not at the UDP layer.

The ordering provided by TCP is therefore really just the receiving end putting the data back into sending order. An application using UDP obviously needs some additional information to support the same kind of reordering, and it can only carry that information in the payload, whereas TCP has fields in its own header for this purpose.
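Receiver-side reordering can be sketched without any networking at all. The layout below (a 4-byte sequence number in front of the data) and the names `encode`, `decode`, and `ReorderBuffer` are assumptions made up for this illustration, not part of UDP; sequence number wraparound is not handled.

```python
import struct

SEQ_HEADER = struct.Struct("!I")  # hypothetical layout: 4-byte sequence number

def encode(seq: int, data: bytes) -> bytes:
    """Prefix application data with its sequence number."""
    return SEQ_HEADER.pack(seq) + data

def decode(payload: bytes):
    (seq,) = SEQ_HEADER.unpack_from(payload)
    return seq, payload[SEQ_HEADER.size:]

class ReorderBuffer:
    """Hold back early arrivals until every earlier message has been seen."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.pending = {}

    def push(self, payload: bytes):
        """Accept one received payload; return the messages now deliverable in order."""
        seq, data = decode(payload)
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seq] = data
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready

# "xyz" (seq 1) arrives before "abcd" (seq 0), and then arrives again.
buffer = ReorderBuffer()
delivered = []
for packet in (encode(1, b"xyz"), encode(0, b"abcd"), encode(1, b"xyz")):
    delivered.extend(buffer.push(packet))
```

The application ends up with "abcd" followed by "xyz", each exactly once, even though the network delivered them out of order and with a duplicate.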

In summary, UDP provides only the simplest end-to-end service for applications on end hosts. If you need other features, borrow the ideas of TCP and implement them in the application.
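As one illustration of borrowing from TCP, the ACK plus retransmission idea can be sketched as a stop-and-wait exchange over loopback. Everything here is an assumption for demonstration purposes: the 4-byte sequence number layout, the function names, the timeout values, and the receiver that deliberately discards the first datagram to simulate loss. Real protocols add much more (windows, adaptive timeouts, connection teardown).

```python
import socket
import struct
import threading

SEQ = struct.Struct("!I")  # hypothetical layout: 4-byte sequence number, then data

def receiver_loop(sock, expected_count, delivered):
    """ACK each datagram and deliver each sequence number once.
    The very first datagram is discarded to simulate packet loss."""
    dropped_one = False
    next_seq = 0
    while next_seq < expected_count:
        packet, peer = sock.recvfrom(2048)
        if not dropped_one:
            dropped_one = True
            continue  # simulated loss: no delivery, no ACK
        (seq,) = SEQ.unpack_from(packet)
        if seq == next_seq:
            delivered.append(packet[SEQ.size:])
            next_seq += 1
        sock.sendto(SEQ.pack(seq), peer)  # ACK, also repeated for duplicates

def send_reliably(sock, peer, seq, data, timeout=0.2, max_tries=5):
    """Send one message and retransmit until its ACK arrives. Returns the try count."""
    sock.settimeout(timeout)
    packet = SEQ.pack(seq) + data
    for attempt in range(1, max_tries + 1):
        sock.sendto(packet, peer)
        try:
            while True:
                ack, _ = sock.recvfrom(16)
                if SEQ.unpack_from(ack)[0] == seq:
                    return attempt
                # otherwise a stale ACK for an earlier message: keep waiting
        except socket.timeout:
            continue  # no ACK in time: retransmit
    raise TimeoutError(f"no ACK for sequence number {seq}")

def run_demo():
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    rx.settimeout(5.0)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    delivered = []
    worker = threading.Thread(target=receiver_loop, args=(rx, 2, delivered), daemon=True)
    worker.start()
    try:
        attempts = [send_reliably(tx, rx.getsockname(), seq, data)
                    for seq, data in enumerate([b"abcd", b"xyz"])]
        worker.join(timeout=5.0)
    finally:
        tx.close()
        rx.close()
    return attempts, delivered

attempts, delivered = run_demo()
```

The first message needs more than one attempt because its first copy is dropped, yet the receiver still ends up with both messages exactly once.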

This simplicity has advantages: the overhead is very low. And in some application scenarios, packet loss and reordering can be tolerated, so UDP is a very good fit. A Porsche is nice, but a tractor is better for hauling bricks.

UDP Socket Programming

There are not many APIs for UDP Socket network programming. socket() is used to create a socket, close() is used to close the socket, sendto() is used to send data, and recvfrom() is used to receive data.

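bind() associates a socket with a local address and port. Both TCP and UDP sockets can be bound; for UDP, bind() tells the kernel which local port's datagrams should be delivered to this socket. connect() can also be called on a UDP socket. No handshake takes place; it is equivalent to telling the kernel that this socket is tied to one remote end of the network.

Before connect(), you can only use the sendto() interface (the destination is specified through parameters); after connect(), send() and recv() can be used as well. recv() and recvfrom() on a UDP socket return the data part (payload) of the UDP datagram, excluding the UDP header. This is because the fields in the UDP header are used for dispatching or verification and do not need to be passed through to the application.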
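The calls mentioned above can be put together in a small loopback example. This is a sketch rather than a template for production code: the uppercase "echo" behaviour, the function names, and the timeouts are all invented for illustration, and the server runs in a thread only so that the example fits in one file.

```python
import socket
import threading

def serve(server_sock, count):
    """Answer `count` datagrams with an uppercased copy of their payload."""
    for _ in range(count):
        data, peer = server_sock.recvfrom(2048)  # payload only, no UDP header
        server_sock.sendto(data.upper(), peer)   # reply to the sender's address

def run_echo_demo():
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # bind(): choose the local address and port
    server.settimeout(2.0)
    server_addr = server.getsockname()
    worker = threading.Thread(target=serve, args=(server, 2), daemon=True)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)
    try:
        # Unconnected socket: the destination is given on every sendto().
        client.sendto(b"ping", server_addr)
        first_reply, _ = client.recvfrom(2048)

        # connect() on UDP only records the default peer; no packets are exchanged.
        client.connect(server_addr)
        client.send(b"pong")
        second_reply = client.recv(2048)

        worker.join(timeout=2.0)
        return first_reply, second_reply
    finally:
        client.close()
        server.close()

first_reply, second_reply = run_echo_demo()
```

The client never calls bind(); the kernel assigns it an ephemeral local port on the first sendto(), and that is the source port the server replies to.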

Figure: UDP Socket Programming (network I/O operations and call flow of a server and a client built on UDP sockets)
