UDP, you need to feed the mouse!

The transport layer is located between the application layer and the network layer. It is the fourth layer in the OSI layering system and is also an important part of the network architecture. The transport layer is mainly responsible for end-to-end communication on the network.

The transport layer plays a vital role in the communication between applications running on different hosts. Let's discuss the protocol part of the transport layer.

Transport Layer Overview

The transport layer of a computer network is very similar to a highway. A highway carries people or goods from one end to the other, while the transport layer carries messages from one end to the other. Each end here is an end system. In a computer network, any device that can exchange information can be called an end system, such as mobile phones, network media, computers, operators, etc.

When transporting messages, the transport layer follows certain protocol specifications, such as the data limit for one transmission and the choice of transport protocol. The transport layer provides logical communication between two otherwise unrelated hosts, so that the two hosts appear to be directly connected.

The transport layer protocol is implemented in the end systems, not in the routers. Routers are only responsible for identifying addresses and forwarding. This is like a courier delivering a package: the courier only gets it to the address, and it is up to the recipient at that address, that is, the person in Room xxx, Unit xxx, Building xxx, to decide what to do with it!

How does TCP determine which port it is?

Remember the structure of the data packet? Let's review it here

After the data packet passes through each layer, the protocol of that layer will attach a packet header to the data packet. A complete packet header diagram is shown above.

After the data is transferred to the transport layer, a TCP header is attached to it, which contains the source port number and the destination port number.

At the sending end, the transport layer converts the message received from the sending application process into transport layer packets, which are called segments in computer networks. The transport layer generally splits the application message into smaller pieces, adds a transport layer header to each piece to form a segment, and sends it to the destination.
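
The splitting step can be sketched roughly as follows. This is only an illustration: the 4-byte header that carries nothing but the two port numbers and the `MAX_PAYLOAD` limit are made up for the example, and real TCP and UDP headers contain more fields.

```python
import struct

MAX_PAYLOAD = 8  # hypothetical per-segment payload limit, kept tiny for illustration


def segment_message(message: bytes, src_port: int, dst_port: int) -> list:
    """Split a message into pieces and prepend a toy header to each piece."""
    # Two 16-bit port numbers in network byte order (assumed toy header layout).
    header = struct.pack("!HH", src_port, dst_port)
    return [
        header + message[i:i + MAX_PAYLOAD]
        for i in range(0, len(message), MAX_PAYLOAD)
    ]


segments = segment_message(b"hello transport layer", 10637, 45438)
for seg in segments:
    src, dst = struct.unpack("!HH", seg[:4])
    print(src, dst, seg[4:])
```

Each piece carries the same port numbers, so the receiving side can hand every piece to the same application.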

During the sending process, the optional transport layer protocols (that is, transportation tools) are mainly TCP and UDP. The selection and characteristics of these two transport protocols are also the focus of our discussion.

TCP and UDP prerequisites

In the TCP/IP protocol, the most representative ones that can realize the transport layer function are TCP and UDP. When talking about TCP and UDP, we must first talk about the definitions of these two protocols.

TCP stands for Transmission Control Protocol. From the name, we can roughly tell that TCP controls the transmission, and this controllability is what makes it reliable. TCP provides a reliable, connection-oriented service for the application layer, which can reliably deliver packets to the receiving end.

UDP stands for User Datagram Protocol. As the name suggests, UDP focuses on datagrams. It gives the application layer a way to send datagrams directly without establishing a connection.

Why are there so many terms in computer networks to describe a piece of data?

In computer networks, different layers have different descriptions. We mentioned above that the packets in the transport layer are called segments. In addition, the packets in TCP are also called segments. However, the packets in UDP are called datagrams, and the packets in the network layer are also called datagrams.

However, for the sake of uniformity, we generally refer to both TCP and UDP messages as segments in computer networks. This is just a convention, and there is no need to worry too much about the name.

Sockets

Before TCP or UDP sends a message, the message has to go through a door first. This door is the socket. The socket connects the application layer above to the transport layer below. An operating system provides interfaces (Application Programming Interfaces, or APIs) to applications and to hardware. In computer networks, the socket is also such an interface, and it comes with its own API.

When using TCP or UDP for communication, the socket API is widely used. This API is used to set the IP address and port number to send and receive data.

Note that sockets are not inherently tied to TCP/IP. Sockets simply make TCP/IP more convenient to use. How? By calling the Socket API directly, as described below.

Socket Type

There are three main types of sockets. Let's take a look at each one.

  • Datagram sockets: Datagram sockets provide a connectionless service and cannot guarantee the reliability of data transmission. Data may be lost or duplicated during transmission, and there is no guarantee that data will be received in order. Datagram sockets use UDP (User Datagram Protocol) for data transmission. Since datagram sockets cannot guarantee reliable delivery, the program itself needs to handle possible data loss.
  • Stream sockets: Stream sockets provide connection-oriented, reliable data transmission. They ensure both the reliability and the order of data. Stream sockets can provide this service because they use TCP (Transmission Control Protocol).
  • Raw sockets: Raw sockets allow IP packets to be sent and received directly without any protocol-specific transport layer format. Raw sockets can read and write IP packets that are not processed by the kernel.
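
As a rough illustration, here is how the three socket types might be created with Python's standard `socket` module. The variable names are made up for this sketch, and the raw socket is wrapped in a `try` because creating one usually requires administrator privileges.

```python
import socket

# Datagram socket: connectionless, carried over UDP.
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Stream socket: connection-oriented, carried over TCP.
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

kinds = (udp_sock.type, tcp_sock.type)
print(kinds)

# Raw socket: typically needs root/administrator rights, so creation may fail.
try:
    raw_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
    raw_sock.close()
except OSError as exc:
    print("raw socket not available:", exc)

udp_sock.close()
tcp_sock.close()
```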

Socket Processing

In a computer network, communication requires at least two end systems and at least one pair of sockets. The following is the communication process of the socket.

  • The socket API is used to create endpoints in a communication link. After the creation is completed, a socket descriptor describing the socket will be returned.

Just as file descriptors are used to access files, socket descriptors are used to access sockets.

  • Once an application has a socket descriptor, it can bind a unique name to the socket. The server must bind a name to be accessible on the network.
  • After the server has allocated a socket and bound a name to it using bind, it calls the listen API. listen indicates the server's willingness to accept client connections, and it must be called before the accept API.
  • The client application calls connect on a stream socket (based on TCP) to initiate a connection request with the server.
  • The server application uses the accept API to accept client connection requests. The server must successfully call bind and listen before calling the accept API.
  • After establishing a connection between stream sockets, the client and server can initiate read/write API calls.
  • When the server or client wants to stop the operation, it calls the close API to release all system resources acquired by the socket.
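
The steps above can be sketched with Python's `socket` module. This is a minimal loopback demo, not a production server: the helper name `run_echo_server`, the use of a thread, and the choice of port 0 (let the operating system pick a free port) are all assumptions made for the example.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one client, echo back what it sends, then close the connection."""
    conn, _client_addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


# Server side: socket -> bind -> listen (accept happens in the worker thread).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.settimeout(5)
server.bind(("127.0.0.1", 0))      # port 0: the OS chooses a free port
server.listen(1)
server_port = server.getsockname()[1]
worker = threading.Thread(target=run_echo_server, args=(server,))
worker.start()

# Client side: socket -> connect -> write/read -> close.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(5)
client.connect(("127.0.0.1", server_port))
client.sendall(b"ping")
reply = client.recv(1024)
client.close()

worker.join()
server.close()
print(reply)
```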

Although the sockets API is located in the communication model between the application layer and the transport layer, the sockets API is not part of the communication model. The sockets API allows applications to interact with the transport layer and the network layer.

Before we continue, let's take a short detour and talk briefly about IP.

Let’s talk about IP

IP is the abbreviation of Internet Protocol, which is the network layer protocol in the TCP/IP system. The original intention of designing IP was to solve two types of problems:

  • Improve network scalability: achieve large-scale network interconnection.
  • Decouple the application layer and the link layer so that they can develop independently.

IP is the core of the entire TCP/IP protocol suite and the foundation of the Internet. In order to achieve large-scale network interconnection, IP pays more attention to adaptability, simplicity and operability, and makes certain sacrifices in reliability. IP guarantees neither delivery time nor reliability, and the transmitted packets may be lost, duplicated, delayed or delivered out of order.

We know that the layer directly below TCP is IP. Since IP is unreliable, how can we make sure that data arrives correctly?

This involves the issue of TCP transmission mechanism, which we will discuss later when we talk about TCP.

Port Number

Before talking about port numbers, let's talk about file descriptors and the relationship between sockets and port numbers.

In order to facilitate the use of resources, improve the performance, utilization and stability of the machine, etc., our computers have a layer of software called an operating system, which is used to help us manage the resources that the computer can use. When our program wants to use a resource, it can apply to the operating system, and then the operating system allocates and manages the resource for our program. Usually when we want to access a kernel device or file, the program can call the system function, and the system will open the device or file for us, and then return a file descriptor fd (or ID, which is an integer). We can only access the device or file through this file descriptor. It can be considered that the number corresponds to the open file or device.

When our program wants to use the network, it needs the corresponding operating system kernel operations and network card devices, so it applies to the operating system, which creates a socket and returns the ID of this socket. From then on, whenever our program wants to use network resources, it only needs to operate on this socket ID. Every process that communicates over the network has at least one socket. Writing data to the socket ID is equivalent to sending data to the network, and reading data from the socket is equivalent to receiving data. Each of these sockets is in turn associated with an identifier: the port number.
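
In Python, for instance, the socket ID described here is visible through `fileno()`. The following is just a small sketch; the exact number printed depends on the operating system and on what the process already has open.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
fd = sock.fileno()            # integer descriptor handed out by the operating system
print("descriptor:", fd)

sock.close()
closed_fd = sock.fileno()     # Python reports -1 once the socket has been closed
print("after close:", closed_fd)
```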

The port number is a 16-bit non-negative integer ranging from 0 to 65535. This range is divided into three different port number segments and is allocated by the Internet Assigned Numbers Authority (IANA).

  • Well-known/standard port number, which ranges from 0 to 1023
  • Registered port number, range is 1024 - 49151
  • Private port number, ranging from 49152 to 65535
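
The three ranges can be expressed as a small helper. The function name `classify_port` and the returned labels are made up for this sketch.

```python
def classify_port(port: int) -> str:
    """Return which IANA range a 16-bit port number falls into."""
    if not 0 <= port <= 65535:
        raise ValueError("port must be between 0 and 65535")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "private"


for p in (80, 8080, 50000):
    print(p, classify_port(p))
```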

A computer can run multiple applications. When a segment arrives at the host, which application should it be transmitted to? How do you know that this segment is passed to the HTTP server instead of the SSH server?

The obvious answer is the port number: when a segment reaches the server, the destination port number is used to tell the applications apart.

But here is a counterexample. If two pieces of data arrive at the server and both were sent from port 80, how do you tell them apart? Or if two pieces of data arrive from the same port but use different protocols, how do you tell them apart?

Therefore, it is obviously not enough to identify a message only by the port number.

The source IP address, destination IP address, source port number, and destination port number are generally used to distinguish packets on the Internet. If any of these items are different, they are considered to be different message segments. These are also the basis for demultiplexing and multiplexing.

Determine the port number

Before actual communication, you need to determine the port number. There are two ways to determine the port number:

Standard port numbers

Standard port numbers are statically assigned. Each program has its own port number, and each port number has a different purpose. A port number is a 16-bit number between 0 and 65535, and the port numbers in the range 0 to 1023 are the statically assigned ones. For example, HTTP uses port 80, FTP uses port 21, and SSH uses port 22. This type of port number has a special name: the Well-Known Port Number.

Dynamically assigned port numbers

The second way to assign port numbers is a dynamic allocation method. In this method, the client application does not need to set the port number by itself. The operating system can allocate non-conflicting port numbers to each application. This mechanism of dynamically allocating port numbers can identify different connections even if the TCP connection is initiated by the same client.
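
One way to see dynamic allocation in action is to bind a socket to port 0, which asks the operating system to choose a free port. This is a sketch on the loopback address, and the helper name `open_with_dynamic_port` is hypothetical.

```python
import socket


def open_with_dynamic_port() -> socket.socket:
    """Create a UDP socket and let the operating system pick its port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))   # port 0 means "any free port"
    return sock


first = open_with_dynamic_port()
second = open_with_dynamic_port()
first_port = first.getsockname()[1]
second_port = second.getsockname()[1]
print(first_port, second_port)    # two different, non-conflicting ports

first.close()
second.close()
```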

Multiplexing and Demultiplexing

We have talked about how each socket on the host is assigned a port number. When a segment arrives at the host, the transport layer checks the destination port number in the segment and directs it to the corresponding socket. The data in the segment then enters the process to which it is connected through the socket. Let's talk about the concepts of multiplexing and demultiplexing.

There are two types of multiplexing and demultiplexing, namely connectionless multiplexing (demultiplexing) and connection-oriented multiplexing (demultiplexing)

Connectionless multiplexing and demultiplexing

Developers write code that determines whether a socket uses a well-known port number or a dynamically assigned one. Suppose a process on port 10637 of host A wants to send data to port 45438 on host B, and the transport layer uses UDP. After the data is generated in the application layer, the transport layer adds its header, and the network layer then encapsulates the result in an IP datagram. The IP datagram is delivered to host B through the link layer on a best-effort basis, and host B then checks the destination port number in the segment to determine which socket it belongs to. This series of processes is shown below.

A UDP socket is a two-tuple that contains the destination IP address and the destination port number.

Therefore, if two UDP segments have different source IP addresses and/or different source port numbers, but the same destination IP address and destination port number, both segments are directed to the same destination process through the same socket.
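
The idea can be modeled with a plain dictionary. This is a toy model, not how a kernel actually stores its sockets; the IP addresses, the table `udp_sockets`, and the process names are invented for the example.

```python
# Hypothetical demultiplexing table: (destination IP, destination port) -> process
udp_sockets = {
    ("10.0.0.2", 45438): "process P1",
    ("10.0.0.2", 53): "dns server",
}


def demux_udp(src_ip: str, src_port: int, dst_ip: str, dst_port: int):
    """Pick the receiving socket using only the destination two-tuple."""
    return udp_sockets.get((dst_ip, dst_port))


# Two segments from different senders, same destination IP and port.
from_a = demux_udp("10.0.0.1", 10637, "10.0.0.2", 45438)
from_c = demux_udp("10.0.0.9", 20000, "10.0.0.2", 45438)
print(from_a, from_c)
```

The source address and port are passed in but never consulted, which is exactly why both segments end up at the same process.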

Let's think about a question here. When host A sends a message to host B, why does it need to know the source port number? For example, if I tell a girl that I am interested in you, does she need to know which organ of mine sent this message? Isn't it enough to know that I am interested in you? In fact, it is necessary, because if a girl wants to express that she is interested in you, she might kiss you, so she needs to know where to kiss you, right?

That is, in the segment from A to B, the source port number serves as part of the return address. When B needs to send a segment back to A, B uses the source port number of the A-to-B segment as the destination port, as shown in the following figure
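
Here is a small loopback sketch of the return address at work, using Python's `socket` module. Both "hosts" live in the same process for simplicity, and the variable names are made up.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # plays host B
receiver.settimeout(5)
receiver.bind(("127.0.0.1", 0))

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)     # plays host A
sender.settimeout(5)
sender.bind(("127.0.0.1", 0))

sender.sendto(b"hello B", receiver.getsockname())

# recvfrom returns the payload plus the sender's (IP address, source port).
payload, return_addr = receiver.recvfrom(1024)

# B replies to the source port it just learned from A's segment.
receiver.sendto(b"hello A", return_addr)
reply, _ = sender.recvfrom(1024)

same_port = return_addr[1] == sender.getsockname()[1]
print(payload, reply, same_port)

sender.close()
receiver.close()
```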

Connection-oriented multiplexing and demultiplexing

If connectionless multiplexing and demultiplexing refer to UDP, then connection-oriented multiplexing and demultiplexing refer to TCP. The difference lies in how a socket is identified: a UDP socket is identified by a two-tuple, while a TCP socket is identified by the four-tuple mentioned above, namely source IP address, destination IP address, source port number, and destination port number. When a TCP segment arrives at a host from the network, the host uses all four values to demultiplex it to the corresponding socket.

The figure above shows the process of connection-oriented multiplexing and demultiplexing. In the figure, host C sends two HTTP requests to host B, and host A sends one HTTP request to host B. Hosts A, B, and C all have their own unique IP addresses. Host B can tell the two connections from host C apart because the two requests use different source port numbers. Host B can also tell host A's request apart from host C's, because the two hosts have different source IP addresses.
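
The same toy model used for UDP can be extended to the four-tuple. The IP addresses, port numbers, and connection names below are invented for illustration; only the idea of keying on all four values comes from the text.

```python
# Hypothetical connection table on host B (a web server listening on port 80).
HOST_A, HOST_B, HOST_C = "10.0.0.1", "10.0.0.2", "10.0.0.3"

tcp_connections = {
    # (source IP, source port, destination IP, destination port) -> connection
    (HOST_C, 7532, HOST_B, 80): "connection 1 (from C)",
    (HOST_C, 26145, HOST_B, 80): "connection 2 (from C)",
    (HOST_A, 26145, HOST_B, 80): "connection 3 (from A)",
}


def demux_tcp(src_ip: str, src_port: int, dst_ip: str, dst_port: int):
    """Pick the receiving socket using all four values of the segment."""
    return tcp_connections.get((src_ip, src_port, dst_ip, dst_port))


# Same source host, different source ports: two distinct connections.
print(demux_tcp(HOST_C, 7532, HOST_B, 80))
print(demux_tcp(HOST_C, 26145, HOST_B, 80))
# Same source port as above, different source host: a third connection.
print(demux_tcp(HOST_A, 26145, HOST_B, 80))
```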

UDP

We can finally start exploring the UDP protocol. Let's go!

UDP stands for User Datagram Protocol (UDP). UDP provides a way for applications to send encapsulated IP data packets without establishing a connection. If the application developer chooses UDP instead of TCP, then the application is equivalent to dealing directly with IP.

UDP attaches the source and destination port number fields used for multiplexing and demultiplexing, along with a few other fields, to the data passed down from the application. It then hands the resulting segment to the network layer, which encapsulates it in an IP datagram and delivers it to the target host on a best-effort basis. The most critical point is that when UDP passes a datagram to the target host, there is no handshake between the transport layer entities of the sender and the receiver. Because of this, UDP is called a connectionless protocol.

UDP Features

UDP is generally used as the transport layer protocol for streaming media applications, voice communication, and video conferencing. The DNS protocol that we all know also runs on top of UDP. The main reasons why these applications or protocols choose UDP are as follows:

  • Fast speed. With UDP, as soon as the application process hands data to UDP, UDP packages it into a UDP segment and immediately passes it to the network layer. TCP, by contrast, has congestion control: it assesses how congested the network is before sending, and if the network is heavily congested it throttles the TCP sender. The point of using UDP is real-time performance.
  • There is no need to establish a connection. TCP needs to go through a three-way handshake operation before data transmission, while UDP can transmit data without any preparation. Therefore, UDP has no delay in establishing a connection. If we use TCP and UDP to compare developers: TCP is the kind of engineer who has to design everything well, and will not develop without design. He needs to take all factors into consideration before starting to work! So he is very reliable; while UDP is the kind of engineer who just works hard and starts working as soon as he receives the project requirements, regardless of design or technology selection. This kind of developer is very unreliable, but suitable for rapid iterative development because he can get started right away!
  • No connection state. TCP needs to maintain the connection state in the end system. The connection state includes the receiving and sending buffers, congestion control parameters, and parameters of sequence numbers and acknowledgment numbers. UDP does not have these parameters, nor does it have sending buffers and receiving buffers. Therefore, some servers dedicated to a specific application can generally support more active users when the application runs on UDP.
  • The packet header overhead is small. Each TCP segment has a 20-byte header overhead, while UDP has only an 8-byte overhead.

Note that not every application that uses UDP is unreliable. An application can achieve reliable data transmission by itself by adding acknowledgment and retransmission mechanisms. The biggest attraction of UDP therefore remains its speed.
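
As a rough sketch of what such an acknowledgment and retransmission mechanism could look like, the following simulates a lossy channel in plain Python instead of using real sockets. Everything here (`make_lossy_channel`, `send_reliably`, the `b"ACK"` reply) is invented for illustration, and a real application would also need timeouts and sequence numbers.

```python
def make_lossy_channel(drop_first: int):
    """Return a send function that loses the first `drop_first` datagrams."""
    state = {"sent": 0}

    def send(datagram: bytes):
        state["sent"] += 1
        if state["sent"] <= drop_first:
            return None       # datagram lost, so no acknowledgment comes back
        return b"ACK"         # receiver got the datagram and acknowledged it

    return send


def send_reliably(send, datagram: bytes, max_attempts: int = 5) -> int:
    """Retransmit until an ACK arrives; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if send(datagram) == b"ACK":
            return attempt
    raise TimeoutError("no acknowledgment after %d attempts" % max_attempts)


attempts = send_reliably(make_lossy_channel(drop_first=2), b"important data")
print("delivered after", attempts, "attempts")
```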

UDP message structure

Let's take a look at the UDP message structure. Each UDP message is divided into two parts: the UDP header and the UDP data area. The header consists of four 16-bit (2-byte) fields, which respectively describe the source port, destination port, message length and checksum of the message.
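
To make the layout concrete, here is a small sketch that packs and unpacks the four header fields with Python's `struct` module. The helper names are made up, and the checksum is simply left at 0 in this example rather than computed.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes,
                       checksum: int = 0) -> bytes:
    """Prepend an 8-byte UDP-style header to the payload."""
    length = UDP_HEADER.size + len(payload)   # header plus data, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload


def parse_udp_datagram(datagram: bytes):
    """Split a datagram back into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack(
        datagram[:UDP_HEADER.size])
    return src_port, dst_port, length, checksum, datagram[UDP_HEADER.size:length]


datagram = build_udp_datagram(10637, 45438, b"hi")
fields = parse_udp_datagram(datagram)
print(fields)
```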

  • Source Port: This field occupies the first 16 bits of the UDP header and usually contains the UDP port used by the application sending the datagram. The receiving application uses the value of this field as the destination port when it sends a response. This field is optional. If the source port is not set, the field is 0, which is typical for communication that needs no reply.
  • Destination Port: Indicates the receiving port, the field length is 16 bits
  • Length: This field occupies 16 bits and indicates the length of the UDP datagram, including the UDP header and the UDP data length. Since the UDP header length is 8 bytes, the minimum value is 8 and the maximum length is 65535 bytes.
  • Checksum: UDP uses the checksum for error detection rather than security: it lets the receiver check whether bits in the segment were changed on the way from the source to the destination host. The sender's UDP adds up all the 16-bit words in the segment and then inverts the sum (the ones' complement). In the real protocol, any carry out of the top bit during the summation is wrapped around and added back into the lowest bit, whereas the simplified example that follows drops it. In the following example, three 16-bit numbers are added.

The sum of the first two of these 16-bit numbers is

Then add the above result to the third 16-bit number

The last addition overflows past 16 bits. This example simply discards the overflow bit 1 (a real UDP implementation would add it back into the lowest bit), and then the inverse operation is performed, which changes all 1s to 0s and all 0s to 1s. Therefore, the inverse of 1000 0100 1001 0101 is 0111 1011 0110 1010, which is the checksum. At the receiving end, all four 16-bit values, including the checksum, are added together. If no error was introduced, the result is 1111 1111 1111 1111. If the final result is not 1111 1111 1111 1111, the data was corrupted during transmission.
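
For comparison, here is a sketch of the calculation with the carry wrapped around, which is how the Internet checksum is normally computed. The three sample words are arbitrary values chosen for this sketch, not the numbers from the example above, and a real UDP checksum also covers a pseudo-header and the full payload, which this sketch leaves out.

```python
def ones_complement_sum(words) -> int:
    """Add 16-bit words, wrapping any carry back into the low-order bits."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # end-around carry
    return total


def checksum16(words) -> int:
    """Invert the ones' complement sum to obtain the 16-bit checksum."""
    return ~ones_complement_sum(words) & 0xFFFF


words = [0x6660, 0x5555, 0x8F0C]          # arbitrary sample 16-bit words
check = checksum16(words)
print(format(check, "016b"))

# Receiver side: data words plus checksum must add up to sixteen 1 bits.
verified = ones_complement_sum(words + [check]) == 0xFFFF
corrupted = ones_complement_sum([0x6661, 0x5555, 0x8F0C, check]) == 0xFFFF
print(verified, corrupted)
```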

Let's think about a question, why does UDP provide error detection function?

This is actually an end-to-end design principle, which states that the probability of various errors occurring during transmission should be reduced to an acceptable level.

When a file is transferred from host A to host B, that is, when hosts A and B communicate, there are three steps: first, host A reads the file from the disk and groups the data into packets, then the packets are transmitted to host B through the network connecting host A and host B, and finally host B receives the packets and writes them to the disk. In this seemingly simple but actually complex process, normal communication may be affected due to some reasons. For example, file read and write errors on the disk, buffer overflow, memory errors, network congestion, etc. These factors may cause errors or loss of data packets, which shows that the network used for communication is unreliable.

Since communication can be achieved through the above three links, we wonder whether we can add an error detection and correction mechanism to one of the links to check the information?

The network layer certainly cannot do this, because the main purpose of the network layer is to increase the data transmission rate. The network layer does not need to consider the integrity of the data. The integrity and correctness of the data can be left to the end system to detect. Therefore, in data transmission, the network layer can only be required to provide the best possible data transmission service, and it is impossible to expect the network layer to provide data integrity services.

The reason why UDP is unreliable is that although it provides error detection function, it has no ability to recover from errors and no retransmission mechanism.

This article is reprinted from the WeChat public account "Programmer cxuan". To reprint this article, please contact the Programmer cxuan public account.

