What is UDP?
UDP stands for User Datagram Protocol. It is a simple protocol, so simple that its specification, RFC 768, is only 3 pages long. UDP is a transport layer protocol that runs on top of the IP layer. It adds two main things to IP: port numbers, which identify the sending and receiving processes, and a checksum, which detects corruption of the datagram in transit.
IP provides a best-effort, connectionless datagram delivery service. It routes and forwards packets based on IP addresses, so it can carry an IP datagram from one host on the network to another. The IP address determines which host the datagram is sent to, which means IP provides host-to-host delivery. When the datagram arrives at the destination host, the IP module in the kernel receives it from the network card. But a host usually runs many processes at the same time. Which process should get the datagram? IP alone cannot tell. The destination port number in the UDP header determines which process on the host the datagram is handed to. In this way, UDP extends IP's host-to-host delivery into an end-to-end service between applications running on the end hosts.
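The role of the port number can be seen in a small experiment. The following is a minimal Python sketch, not a definitive implementation: it assumes a loopback interface at 127.0.0.1, and the two receiving sockets stand in for two separate processes on the destination host. The function name `demo_port_demux` is made up for this illustration.

```python
import socket


def demo_port_demux():
    """Send one datagram to each of two UDP ports on the same host."""
    # Two receiving sockets stand in for two processes on the destination host.
    recv_a = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    recv_b = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the kernel to pick any free port for each receiver.
        recv_a.bind(("127.0.0.1", 0))
        recv_b.bind(("127.0.0.1", 0))
        recv_a.settimeout(1.0)
        recv_b.settimeout(1.0)

        # Same destination IP address, different destination ports.
        sender.sendto(b"for A", recv_a.getsockname())
        sender.sendto(b"for B", recv_b.getsockname())

        # Each socket only sees the datagram addressed to its own port.
        data_a, peer_a = recv_a.recvfrom(1024)
        data_b, peer_b = recv_b.recvfrom(1024)
        return data_a, data_b, peer_a, peer_b
    finally:
        for s in (recv_a, recv_b, sender):
            s.close()


if __name__ == "__main__":
    data_a, data_b, peer_a, peer_b = demo_port_demux()
    print("socket A got", data_a, "from", peer_a)
    print("socket B got", data_b, "from", peer_b)
```

Both datagrams go to the same IP address, so IP alone could not separate them. Only the destination port decides which socket receives which payload.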
Characteristics of UDP
Message Boundaries
Each time the application asks UDP to send, one UDP datagram is generated and sent in one IP datagram (ignoring fragmentation). Each time the receiver asks UDP to receive, it gets one complete UDP datagram, if one is available. This is different from stream-oriented TCP.
Suppose host A sends data to host B twice, first the 4 bytes "abcd" and then the 3 bytes "xyz". If host B receives twice and gets the two messages "abcd" and "xyz", or "xyz" and "abcd" (the order does not matter here), then message boundaries have been preserved. UDP is a transport layer protocol that preserves message boundaries.
Because an application that uses UDP generates one IP datagram per send operation, each send should carry no more data than fits in the MTU (maximum transmission unit) after the IP and UDP headers are subtracted. Anything larger has to be fragmented at the IP layer. Each receive operation returns the complete payload of one UDP datagram, never half of one, provided the receive buffer is large enough to hold it.
TCP is a stream protocol and does not preserve message boundaries. The number of send calls and the size of each one on the sender side have no fixed relationship to the number of receive calls and the size of each one on the receiver side. Applications that use TCP therefore have to mark message boundaries themselves.
UDP datagram encapsulation format
The IPv4 Protocol field uses the value 17 to identify UDP. The UDP header is always 8 bytes long. The IPv4 header is followed by the UDP header and then by the UDP payload, if any.
[Figure: IPv4 UDP datagram encapsulation format]
The UDP header consists of four fields: source port number, destination port number, length, and checksum. Each field is 2 bytes.
1. The port number is a purely abstract identifier that is not tied to any physical entity.
The port number helps the protocol tell the sending and receiving processes apart. After the kernel on the receiving host gets the IP datagram from the network card and sees that it carries UDP (the Protocol field in the IP header is 17), it uses the destination port number in the UDP header to find the matching process and hands the datagram to it. The kernel manages and maintains this mapping between ports and processes.
[Figure: UDP header and payload]
The destination port number is required, but the source port number is optional. If the sender does not need a reply, it can set the source port number to 0.
The IP layer first dispatches an incoming IP datagram to a transport protocol (TCP, UDP, and so on) based on the Protocol field in the IP header. The transport protocol then dispatches the data to a process based on the port number. Port numbers are therefore scoped to each transport protocol, and the same port number used under different protocols causes no confusion. For example, two network services on one machine can use the same IP address and port number as long as one uses TCP and the other uses UDP.
2. The length field is the total length of the UDP header plus the UDP data, in bytes. Since the UDP header is 8 bytes and a datagram with no data is allowed, the minimum value of the length field is 8. With IPv4 the UDP length is redundant, because it can be derived by subtracting the IP header length from the total length of the IP datagram.
3. The checksum covers the UDP header, the UDP data, and a pseudo header. It is calculated by the original sender and checked by the final destination, and it is used to detect whether the datagram was corrupted in transit, for example by a bit changing from 1 to 0.
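The four header fields described above can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than a reference implementation: the function names (`build_udp`, `parse_udp`, `verify_udp` and the helpers) are invented here, the addresses are documentation addresses, and the pseudo header layout (source address, destination address, a zero byte, protocol 17, UDP length) follows RFC 768 for IPv4 only.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum; odd-length input is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                       # fold the carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo header: src addr, dst addr, zero, protocol (17), UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))


def build_udp(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    """Return UDP header + payload with the checksum filled in."""
    length = 8 + len(payload)                # header is always 8 bytes
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    summed = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    checksum = (~summed & 0xFFFF) or 0xFFFF  # a computed 0 is sent as all ones
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def parse_udp(segment: bytes):
    """Split a UDP datagram into its four header fields and the payload."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", segment[:8])
    return src_port, dst_port, length, checksum, segment[8:length]


def verify_udp(src_ip, dst_ip, segment: bytes) -> bool:
    """Check the checksum the way a receiver would."""
    if segment[6:8] == b"\x00\x00":          # IPv4 only: sender generated no checksum
        return True
    return ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment)) + segment) == 0xFFFF


if __name__ == "__main__":
    seg = build_udp("192.0.2.1", "192.0.2.2", 12345, 53, b"abcd")
    print(parse_udp(seg))
    print(verify_udp("192.0.2.1", "192.0.2.2", seg))
```

Flipping any single bit of the datagram makes `verify_udp` return False, which is the kind of transmission error the checksum exists to catch. An empty payload produces an 8-byte datagram, matching the minimum value of the length field.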
How to achieve reliable transmission in applications using UDP
As we all know, UDP is unreliable and does not guarantee ordering.
1. What does unreliable mean? A sends a UDP datagram to B. The datagram may never reach B: it can be lost along the way for many reasons, such as poor network quality. IP delivers datagrams on a best-effort basis and leaves the rest to fate. Is there any way for UDP to guarantee that a datagram it sends reaches the destination? Sorry, no. UDP cannot do that.
So what does the reliable transmission provided by TCP mean? It does not mean that packets are never lost, because TCP also relies on IP, which is unreliable, to deliver its data. It means that lost data is retransmitted until the receiver acknowledges it. How does TCP achieve this? The basic idea is simple: ACK plus retransmission of lost packets.
An application that wants reliable transmission over UDP can borrow the same mechanism. The difference is that TCP implements it in the kernel, while a UDP-based application implements it at the application layer. ACK plus retransmission needs some extra information, such as a packet sequence number, and this can be carried in the payload. The two sides only have to agree on the layout of that extra information within the payload.
2. What does it mean that ordering is not guaranteed? A sends two UDP datagrams to B. They are encapsulated in two IP datagrams and carried by IP. Because the two IP datagrams are routed independently, which one arrives first? There is no telling. It depends on the network's mood. Is there any way for UDP to ensure that datagrams arrive in the order in which they were sent? Sorry, there is not.
Therefore, the ordering that TCP provides is really just the receiver putting the data back into the order in which it was sent. UDP needs extra information to support reordering too, and it can only carry that information in the payload, unlike TCP, which has header fields (the sequence number) for this purpose.
In summary, UDP provides only the simplest end-to-end service for applications on end hosts. If you need other features, borrow TCP's ideas and implement them yourself. This has an advantage: because UDP is simple, its overhead is very low. And in some scenarios packet loss and reordering can be tolerated, so UDP is a very good fit. A Porsche is a fine car, but a tractor is better for hauling bricks.
UDP Socket Programming
There are not many APIs for UDP socket programming. socket() creates a socket, close() closes it, sendto() sends data, and recvfrom() receives data.
bind() assigns a local address and port to a socket. Both TCP and UDP sockets can be bound. connect() can also be called on a UDP socket. For UDP it sends no packets and is equivalent to telling the kernel: this socket is tied to one remote endpoint. Before connect(), you can only send with sendto(), which takes the destination as a parameter. After connect(), send() and recv() can be used as well.
recv() on a UDP socket returns the data part (payload) of the UDP datagram, without the UDP header. The header fields are used only for dispatching and verification, so they do not need to be passed through to the application.
The network I/O operations and flow of a server/client application built on UDP sockets are shown in the figure below:
[Figure: UDP Socket Programming]
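The socket calls listed above, combined with the ACK plus retransmission idea, can be sketched in Python roughly as follows. This is a minimal stop-and-wait sketch under several assumptions, not a production protocol: the 4-byte sequence number at the start of the payload is a layout invented for this example, the names `receiver`, `reliable_send` and `demo` are hypothetical, loss is faked by having the receiver ignore the first copy of one datagram, and everything runs over the loopback interface.

```python
import socket
import struct
import threading

# Hypothetical payload layout: 4-byte sequence number (network byte order) + data.
SEQ = struct.Struct("!I")


def receiver(sock, expected, drop_once, delivered):
    """ACK every datagram; ignore the first copy of each seq in drop_once to fake a loss."""
    dropped, received = set(), {}
    while len(received) < expected:
        data, addr = sock.recvfrom(2048)            # one call returns one whole datagram
        (seq,) = SEQ.unpack_from(data)
        if seq in drop_once and seq not in dropped:
            dropped.add(seq)                        # pretend the network lost it: no ACK
            continue
        received.setdefault(seq, data[SEQ.size:])   # duplicates are stored only once
        sock.sendto(SEQ.pack(seq), addr)            # the ACK echoes the sequence number
    delivered.extend(received[s] for s in sorted(received))  # reorder by sequence number


def reliable_send(sock, dest, seq, payload, timeout=0.2, max_tries=10):
    """Stop-and-wait: resend until the matching ACK arrives; return the number of sends."""
    sock.settimeout(timeout)
    packet = SEQ.pack(seq) + payload
    for attempt in range(1, max_tries + 1):
        sock.sendto(packet, dest)
        try:
            while True:                             # skip stale ACKs for older datagrams
                ack, _ = sock.recvfrom(2048)
                if len(ack) >= SEQ.size and SEQ.unpack_from(ack)[0] == seq:
                    return attempt
        except socket.timeout:
            continue                                # no ACK in time: retransmit
    raise TimeoutError(f"no ACK for seq {seq} after {max_tries} tries")


def demo():
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))                       # port 0: let the kernel pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    messages = [b"abcd", b"xyz", b"hello"]
    delivered = []
    worker = threading.Thread(target=receiver,
                              args=(rx, len(messages), {1}, delivered), daemon=True)
    worker.start()
    try:
        tries = [reliable_send(tx, rx.getsockname(), seq, msg)
                 for seq, msg in enumerate(messages)]
        worker.join(timeout=5)
    finally:
        tx.close()
        rx.close()
    return messages, delivered, tries


if __name__ == "__main__":
    sent, got, tries = demo()
    print("delivered:", got)
    print("sends per message:", tries)
```

The second message needs at least two sends because its first copy is deliberately ignored, yet the application still sees every message exactly once and in order. Stop-and-wait is the simplest possible scheme. TCP itself keeps many segments in flight at once, which this sketch does not attempt.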