This article is reprinted from the WeChat public account "Linux Kernel Matters" and was written by songsong001. To reprint it, please contact the Linux Kernel Matters public account.

This article analyzes the Linux kernel's implementation of the TCP protocol. Because TCP is fairly complex, the analysis is split across several articles; this one covers the three-way handshake that TCP uses to establish a connection.

TCP is arguably the most complex protocol in the TCP/IP protocol stack, bar none. Its complexity comes from being connection-oriented and from guaranteeing reliable transmission.

As shown in the figure below, TCP sits at the fourth layer of the protocol stack (the transport layer) and is built on the IP protocol at the network layer. Because IP is connectionless and unreliable, TCP must maintain connection state for each client-server (CS) connection in order to provide connection-oriented, reliable transmission. A TCP connection is therefore only state maintained at both ends, not a physical link.

Since this article focuses on how the Linux kernel implements TCP, readers who are unfamiliar with the principles of the protocol itself can refer to the well-known book "TCP/IP Illustrated".

Three-way handshake process

TCP is built on the connectionless IP protocol. To provide connections, the two ends negotiate connection state through a procedure called the three-way handshake, which works as follows:

- The client sends a SYN packet to the server (carrying the client's initial sequence number) and sets its connection state to SYN_SENT.
- After receiving the client's SYN packet, the server replies with a SYN+ACK packet (carrying the server's initial sequence number) and sets its connection state to SYN_RCVD.
- After receiving the server's SYN+ACK packet, the client sets its connection state to ESTABLISHED (the connection is established from its side) and replies to the server with an ACK packet.
- After receiving the client's ACK packet, the server also sets its connection state to ESTABLISHED.
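The state changes in the steps above can be sketched as a small state machine. This is only an illustration in user-space C, not kernel code: the names hs_state, hs_event, hs_next() and hs_walkthrough() are hypothetical, and the HS_LISTEN starting state for the server is an assumption that the steps above do not mention.

```c
#include <assert.h>

/* Connection states during the handshake (names follow the text above). */
enum hs_state { HS_CLOSED, HS_LISTEN, HS_SYN_SENT, HS_SYN_RCVD, HS_ESTABLISHED };

/* Events: a local connect request, or a packet type arriving from the peer. */
enum hs_event { EV_ACTIVE_OPEN, EV_RECV_SYN, EV_RECV_SYN_ACK, EV_RECV_ACK };

/* Return the next state; an unexpected event leaves the state unchanged. */
enum hs_state hs_next(enum hs_state s, enum hs_event e)
{
    switch (s) {
    case HS_CLOSED:      /* client: send SYN */
        return e == EV_ACTIVE_OPEN ? HS_SYN_SENT : s;
    case HS_LISTEN:      /* server: reply with SYN+ACK */
        return e == EV_RECV_SYN ? HS_SYN_RCVD : s;
    case HS_SYN_SENT:    /* client: reply with ACK */
        return e == EV_RECV_SYN_ACK ? HS_ESTABLISHED : s;
    case HS_SYN_RCVD:    /* server: handshake complete */
        return e == EV_RECV_ACK ? HS_ESTABLISHED : s;
    default:
        return s;
    }
}

/* Walk both endpoints through the three packets of the handshake. */
void hs_walkthrough(void)
{
    enum hs_state client = HS_CLOSED, server = HS_LISTEN;

    client = hs_next(client, EV_ACTIVE_OPEN);   /* SYN leaves the client */
    server = hs_next(server, EV_RECV_SYN);      /* SYN+ACK leaves the server */
    assert(client == HS_SYN_SENT && server == HS_SYN_RCVD);

    client = hs_next(client, EV_RECV_SYN_ACK);  /* ACK leaves the client */
    server = hs_next(server, EV_RECV_ACK);
    assert(client == HS_ESTABLISHED && server == HS_ESTABLISHED);
}
```

Calling hs_walkthrough() replays the exchange for both endpoints; the client reaches ESTABLISHED one packet earlier than the server.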
After the above process is completed, a TCP connection is established.

TCP Header

To analyze the TCP protocol, we first need to understand the TCP header. The following figure shows its format:

The fields of the TCP header are as follows:

- Source port number: the port to which the sending (local) program is bound.
- Destination port number: the port to which the remote program is bound.
- Sequence number: the sequence number of the first byte of data carried in this packet.
- Acknowledgment number: acknowledges data received from the remote end; it holds the next sequence number the local end expects to receive (valid when the ACK flag is 1).
- Header length: the length of the TCP header, in units of 32-bit words.
- Flag bits: indicate the type of the TCP packet (for example SYN, ACK, FIN or RST).
- Window size: used for flow control; it advertises how much data the sender of this packet is able to receive.
- Checksum: used to verify whether the packet was damaged during transmission.
- Urgent pointer: rarely used; specifies the offset of urgent data (valid only when the URG flag is 1).
- Options: the optional part of the TCP header.
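To make the field layout concrete, here is a minimal user-space sketch that decodes the fixed 20-byte part of a TCP header from raw bytes. The struct tcp_fields type and the tcp_parse() function are hypothetical names introduced for this example; the byte offsets follow the standard TCP header layout, and all multi-byte fields are read in network (big-endian) byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order view of the fixed TCP header (hypothetical helper type). */
struct tcp_fields {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    unsigned header_len;   /* in bytes: the 4-bit header length field * 4 */
    uint8_t  flags;        /* low 6 bits: URG ACK PSH RST SYN FIN */
    uint16_t window, checksum, urgent;
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Decode a header from network-order bytes; 0 on success, -1 if malformed. */
int tcp_parse(const uint8_t *buf, size_t len, struct tcp_fields *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 20)
        return -1;                       /* shorter than the fixed header */
    out->src_port   = rd16(buf);
    out->dst_port   = rd16(buf + 2);
    out->seq        = rd32(buf + 4);
    out->ack        = rd32(buf + 8);
    out->header_len = (unsigned)(buf[12] >> 4) * 4;
    out->flags      = buf[13] & 0x3f;
    out->window     = rd16(buf + 14);
    out->checksum   = rd16(buf + 16);
    out->urgent     = rd16(buf + 18);
    if (out->header_len < 20 || out->header_len > len)
        return -1;                       /* header length field is inconsistent */
    return 0;
}
```

Reading the bytes with shifts rather than casting the buffer to a struct avoids any dependence on the host's byte order or alignment rules.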
Let's look at how the Linux kernel defines the TCP header structure:

    struct tcphdr {
        __u16 source;   // source port
        __u16 dest;     // destination port
        __u32 seq;      // sequence number
        __u32 ack_seq;  // acknowledgment number
        __u16 doff:4,   // header length, in 32-bit words
              res1:4,   // reserved
              res2:2,   // reserved
              urg:1,    // urgent pointer is valid
              ack:1,    // acknowledgment number is valid (ACK packet)
              psh:1,    // push the data to the receiver promptly
              rst:1,    // reset the connection
              syn:1,    // SYN packet
              fin:1;    // FIN packet
        __u16 window;   // window size
        __u16 check;    // checksum
        __u16 urg_ptr;  // urgent pointer
    };
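The order of C bit-fields within a word is implementation-defined, so portable user-space code often handles these two bytes with shifts and masks instead of bit-fields. The following is a sketch under that assumption; the FLAG_* constants and the tcp_pack_doff(), tcp_unpack_doff() and tcp_is_syn_ack() helpers are names made up for this example, and the mask values follow the standard TCP flag layout (byte 13 of the header).

```c
#include <assert.h>
#include <stdint.h>

/* Flag masks for byte 13 of the TCP header. */
enum {
    FLAG_FIN = 0x01, FLAG_SYN = 0x02, FLAG_RST = 0x04,
    FLAG_PSH = 0x08, FLAG_ACK = 0x10, FLAG_URG = 0x20
};

/* Byte 12 holds the header length (in 32-bit words) in its high 4 bits. */
uint8_t tcp_pack_doff(unsigned header_bytes)
{
    assert(header_bytes >= 20 && header_bytes <= 60 && header_bytes % 4 == 0);
    return (uint8_t)((header_bytes / 4) << 4);
}

/* Recover the header length in bytes from byte 12. */
unsigned tcp_unpack_doff(uint8_t byte12)
{
    return (unsigned)(byte12 >> 4) * 4;
}

/* True when both SYN and ACK are set (the second packet of the handshake). */
int tcp_is_syn_ack(uint8_t flags)
{
    return (flags & (FLAG_SYN | FLAG_ACK)) == (FLAG_SYN | FLAG_ACK);
}
```

For a header without options, tcp_pack_doff(20) yields 0x50: five 32-bit words stored in the upper four bits.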
As the definition shows, the fields of struct tcphdr correspond one-to-one to the fields of the TCP header.

Client connection process

A TCP connection is initiated by the client. When a client program calls the connect() system call, the kernel starts establishing a TCP connection with the server program. The prototype of the connect() system call is as follows:

    int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);

The parameters of the connect() system call are:

- sockfd: the file descriptor returned by the socket() system call.
- addr: the remote IP address and port to connect to.
- addrlen: the length of the addr parameter, in bytes.
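From user space, the three parameters are typically filled in as in the sketch below. The helper name tcp_connect_ipv4() is hypothetical; the socket(), inet_pton(), htons() and connect() calls are the standard POSIX interfaces.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP connection to an IPv4 address; returns a socket fd or -1. */
int tcp_connect_ipv4(const char *ip, uint16_t port)
{
    struct sockaddr_in addr;
    int fd;

    assert(ip != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);              /* port in network byte order */
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                            /* not a valid IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);     /* this becomes sockfd */
    if (fd < 0)
        return -1;

    /* addr and addrlen describe the remote endpoint */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a blocking socket, connect() returns only after the three-way handshake has finished or failed, which is the behavior the kernel code analyzed in this article implements.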
When the client calls connect(), the kernel invokes the sys_connect() function, which is implemented as follows:

    int sys_connect(int fd, struct sockaddr *uservaddr, int addrlen)
    {
        struct socket *sock;
        char address[MAX_SOCK_ADDR];
        int err;
        ...
        // Look up the socket object that corresponds to the file descriptor
        sock = sockfd_lookup(fd, &err);
        ...
        // Copy the remote IP address and port from user space
        err = move_addr_to_kernel(uservaddr, addrlen, address);
        ...
        // Call inet_stream_connect() to perform the connection
        err = sock->ops->connect(sock, (struct sockaddr *)address, addrlen,
                                 sock->file->f_flags);
        ...
        return err;
    }
The sys_connect() kernel function performs three main steps:

- Call sockfd_lookup() to obtain the socket object that corresponds to the file descriptor fd.
- Call move_addr_to_kernel() to copy the remote IP address and port from user space into the kernel.
- Call inet_stream_connect() (through sock->ops->connect) to perform the connection.
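The second step matters because the address comes from an untrusted caller and is copied into a fixed-size buffer. The article does not show the body of move_addr_to_kernel(), so the following is only a user-space sketch of the general idea: validate the length before copying. The name copy_addr_checked() and the MAX_ADDR_LEN value are assumptions, and memcpy() stands in for the kernel's user-space copy primitive.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_ADDR_LEN 128   /* assumed limit, standing in for MAX_SOCK_ADDR */

/*
 * Copy a caller-supplied address into a fixed-size buffer.
 * Returns 0 on success or a negative errno value.
 */
int copy_addr_checked(const void *src, int len, char dst[MAX_ADDR_LEN])
{
    assert(dst != NULL);
    if (len < 0 || len > MAX_ADDR_LEN)
        return -EINVAL;          /* reject lengths that would overflow dst */
    if (len == 0)
        return 0;                /* nothing to copy */
    if (src == NULL)
        return -EFAULT;          /* no source to copy from */
    memcpy(dst, src, (size_t)len);
    return 0;
}
```

Checking the length first is what keeps a hostile addrlen from overrunning the on-stack address buffer in sys_connect().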
Next, let's analyze the implementation of the inet_stream_connect() function:

    int inet_stream_connect(struct socket *sock, struct sockaddr *uaddr,
                            int addr_len, int flags)
    {
        struct sock *sk = sock->sk;
        int err;
        ...
        if (sock->state == SS_CONNECTING) {
            ...
        } else {
            // Try to bind a local port automatically
            if (inet_autobind(sk) != 0)
                return (-EAGAIN);
            ...
            // Call tcp_v4_connect() to start the connection
            err = sk->prot->connect(sk, uaddr, addr_len);
            if (err < 0)
                return (err);
            sock->state = SS_CONNECTING;
        }
        ...
        // Non-blocking socket whose connection is not yet established:
        // return -EINPROGRESS instead of waiting
        if (sk->state != TCP_ESTABLISHED && (flags & O_NONBLOCK))
            return (-EINPROGRESS);

        // Wait for the connection process to complete
        if (sk->state == TCP_SYN_SENT || sk->state == TCP_SYN_RECV) {
            inet_wait_for_connect(sk);
            if (signal_pending(current))
                return -ERESTARTSYS;
        }
        sock->state = SS_CONNECTED; // Mark the socket as connected
        ...
        return (0);
    }
The main operations of the inet_stream_connect() function are:

- Call inet_autobind() to try to bind the socket to a local port automatically.
- Call tcp_v4_connect() to start the TCP connection.
- If the socket is non-blocking and the connection has not yet been established, return the EINPROGRESS error.
- Otherwise, call inet_wait_for_connect() to wait until the connection to the server completes.
- Set the socket state to SS_CONNECTED, indicating that the connection has been established.
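The EINPROGRESS branch is what a user-space program sees when it connects on a non-blocking socket. A common way to handle it, sketched below with standard POSIX calls, is to wait for the socket to become writable and then read SO_ERROR to learn whether the handshake succeeded. The function name connect_with_timeout() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Start a connect on a non-blocking socket and wait up to timeout_ms for
 * the handshake to finish. Returns 0 on success, -1 on failure (errno set).
 */
int connect_with_timeout(int fd, const struct sockaddr *addr, socklen_t len,
                         int timeout_ms)
{
    struct pollfd pfd;
    int flags, n, err = 0;
    socklen_t errlen = sizeof(err);

    assert(addr != NULL);
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    if (connect(fd, addr, len) == 0)
        return 0;                    /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;                   /* a real failure, not "in progress" */

    /* Handshake in progress: the socket turns writable when it completes. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    n = poll(&pfd, 1, timeout_ms);
    if (n == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (n < 0)
        return -1;

    /* SO_ERROR reports the outcome of the asynchronous connect. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

A blocking socket needs none of this, because the kernel waits on the caller's behalf in inet_wait_for_connect().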
Of these steps, the most important is the call to tcp_v4_connect(). Let's analyze its implementation:

    int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
    {
        struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
        struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;
        struct sk_buff *buff;
        struct rtable *rt;
        u32 daddr, nexthop;
        int tmp;
        ...
        nexthop = daddr = usin->sin_addr.s_addr;
        ...
        // 1. Look up the route used to send data
        tmp = ip_route_connect(&rt, nexthop, sk->saddr,
                               RT_TOS(sk->ip_tos)|RTO_CONN|sk->localroute,
                               sk->bound_dev_if);
        ...
        dst_release(xchg(&sk->dst_cache, rt)); // 2. Cache the route in the socket

        // 3. Allocate an skb (packet buffer) object
        buff = sock_wmalloc(sk, (MAX_HEADER + sk->prot->max_header), 0, GFP_KERNEL);
        ...
        sk->dport = usin->sin_port; // 4. Set the destination port
        sk->daddr = rt->rt_dst;     // 5. Set the destination IP address
        ...
        // 6. If no source IP address was specified, use the one from the route
        if (!sk->saddr)
            sk->saddr = rt->rt_src;
        sk->rcv_saddr = sk->saddr;
        ...
        // 7. Choose the initial TCP sequence number
        tp->write_seq = secure_tcp_sequence_number(sk->saddr, sk->daddr, sk->sport,
                                                   usin->sin_port);
        ...
        // 8. Reset the TCP maximum segment size limit
        tp->mss_clamp = ~0;
        ...
        // 9. Call tcp_connect() to continue the connection
        tcp_connect(sk, buff, rt->u.dst.pmtu);
        return 0;
    }
The tcp_v4_connect() function only prepares for the connection, as follows:

- Call ip_route_connect() to look up the route used to send data, and save it in the socket object's route cache.
- Call sock_wmalloc() to allocate an skb packet object.
- Set the destination port and destination IP address.
- If no source IP address was specified, use the source IP address from the route.
- Call secure_tcp_sequence_number() to choose the initial TCP sequence number.
- Reset the TCP maximum segment size limit.
- Call tcp_connect() to send a SYN packet to the server program.
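The article does not show how secure_tcp_sequence_number() computes its result. Judging only from its arguments, the initial sequence number depends on the connection's addresses and ports. A common general scheme is to add a clock value to a keyed hash of those four values, so that the number is hard to guess yet keeps advancing over time. The toy below illustrates that idea only: toy_isn() and mix32() are made-up names, the mixing function is not cryptographic, and none of this is the kernel's actual algorithm.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit mixing step (NOT cryptographic; a stand-in for a keyed hash). */
static uint32_t mix32(uint32_t h, uint32_t v)
{
    h ^= v;
    h *= 0x9E3779B1u;
    h ^= h >> 15;
    return h;
}

/* Toy initial sequence number: hash(secret, 4-tuple) + clock. */
uint32_t toy_isn(uint32_t saddr, uint32_t daddr, uint16_t sport, uint16_t dport,
                 uint32_t secret, uint32_t clock_ticks)
{
    uint32_t h = secret;

    assert(sport != 0 && dport != 0);   /* port 0 is not a usable TCP port */
    h = mix32(h, saddr);
    h = mix32(h, daddr);
    h = mix32(h, ((uint32_t)sport << 16) | dport);
    return h + clock_ticks;             /* unsigned addition wraps modulo 2^32 */
}
```

With the same secret and clock value, the same connection always gets the same number, while a different port pair gets a different one.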
Since the first step of the TCP three-way handshake is for the client to send a SYN packet to the server, we focus on the implementation of the tcp_connect() function, whose code is as follows:

    void tcp_connect(struct sock *sk, struct sk_buff *buff, int mtu)
    {
        struct dst_entry *dst = sk->dst_cache;
        struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);

        // Reserve headroom for all protocol headers
        skb_reserve(buff, MAX_HEADER + sk->prot->max_header);

        tp->snd_wnd = 0;
        tp->snd_wl1 = 0;
        tp->snd_wl2 = tp->write_seq;
        tp->snd_una = tp->write_seq;
        tp->rcv_nxt = 0;
        sk->err = 0;
        // Set the TCP header length
        tp->tcp_header_len = sizeof(struct tcphdr) +
            (sysctl_tcp_timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0);
        ...
        tcp_sync_mss(sk, mtu); // Set the maximum TCP segment size
        ...
        TCP_SKB_CB(buff)->flags = TCPCB_FLAG_SYN; // Mark this packet as a SYN
        TCP_SKB_CB(buff)->sacked = 0;
        TCP_SKB_CB(buff)->urg_ptr = 0;
        buff->csum = 0;
        TCP_SKB_CB(buff)->seq = tp->write_seq++;   // Starting sequence number
        TCP_SKB_CB(buff)->end_seq = tp->write_seq; // Ending sequence number (the SYN consumes one)
        tp->snd_nxt = TCP_SKB_CB(buff)->end_seq;

        // Initialize the size of the sliding window
        tp->window_clamp = dst->window;
        tcp_select_initial_window(sock_rspace(sk)/2, tp->mss_clamp,
                                  &tp->rcv_wnd, &tp->window_clamp,
                                  sysctl_tcp_window_scaling, &tp->rcv_wscale);
        ...
        tcp_set_state(sk, TCP_SYN_SENT); // Set the socket state to SYN_SENT

        // Call tcp_v4_hash() to add the socket to the tcp_established_hash hash table
        sk->prot->hash(sk);

        tp->rto = dst->rtt;
        tcp_init_xmit_timers(sk); // Set up the retransmission timer
        ...
        // Add the skb to write_queue so that it can be retransmitted
        __skb_queue_tail(&sk->write_queue, buff);
        TCP_SKB_CB(buff)->when = jiffies;
        ...
        // Call tcp_transmit_skb() to build the SYN packet and send it to the server
        tcp_transmit_skb(sk, skb_clone(buff, GFP_KERNEL));
        ...
    }
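The sequence-number bookkeeping in tcp_connect() is easy to misread: the SYN carries no data, yet write_seq is incremented, so the SYN occupies exactly one sequence number and end_seq is seq + 1. The sketch below replays those assignments on a miniature structure. The struct seg and struct conn types and the queue_syn() and seq_before() functions are hypothetical, and the wrap-safe comparison is included as an assumption about how 32-bit sequence numbers are usually compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Miniature stand-ins for the per-packet and per-connection counters. */
struct seg  { uint32_t seq, end_seq; };
struct conn { uint32_t write_seq, snd_una, snd_nxt; };

/* Queue a SYN: no payload, but it still consumes one sequence number. */
struct seg queue_syn(struct conn *c)
{
    struct seg s;

    assert(c != NULL);
    c->snd_una = c->write_seq;     /* oldest unacknowledged sequence number */
    s.seq = c->write_seq++;        /* the SYN starts at the initial number */
    s.end_seq = c->write_seq;      /* and ends one past it */
    c->snd_nxt = s.end_seq;        /* next sequence number to send */
    return s;
}

/* Wrap-safe test for "a comes before b" in 32-bit sequence space. */
int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}
```

Starting from write_seq = 0xFFFFFFFF shows why a plain a < b comparison is not enough: end_seq wraps around to 0 but still comes after seq.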
Although the implementation of tcp_connect() is fairly long, its logic is simple: it initializes the connection's send state and the control information for the SYN packet, and then sends the packet to the server. The main work of tcp_connect() is listed below:

- Set the SYN flag for the packet (indicating that this is a SYN packet).
- Set the packet's starting and ending sequence numbers (the SYN consumes one sequence number).
- Initialize the sliding window size.
- Set the socket state to SYN_SENT (see the description of the three-way handshake above).
- Call tcp_v4_hash() to add the socket to the tcp_established_hash hash table, which is used to quickly find the socket object by IP address and port.
- Set up the retransmission timer.
- Add the skb to the write_queue queue so that it can be retransmitted on timeout.
- Call tcp_transmit_skb() to build the SYN packet and send it to the server program.
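The hash-table step in the list above can be illustrated with a small chained hash table keyed by the connection's addresses and ports. This is a user-space sketch, not the kernel's data structure: the struct conn_entry type, the table size, the hash4() mixing function and the conn_insert() and conn_lookup() helpers are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HTABLE_SIZE 256   /* assumed size; must be a power of two */

struct conn_entry {
    uint32_t laddr, raddr;      /* local and remote IPv4 addresses */
    uint16_t lport, rport;      /* local and remote ports */
    struct conn_entry *next;    /* chain of entries in the same bucket */
};

static struct conn_entry *table[HTABLE_SIZE];

/* Fold the addresses and ports into a bucket index. */
static unsigned hash4(uint32_t laddr, uint16_t lport,
                      uint32_t raddr, uint16_t rport)
{
    uint32_t h = laddr ^ raddr ^ (((uint32_t)lport << 16) | rport);

    h ^= h >> 16;
    h ^= h >> 8;
    return h & (HTABLE_SIZE - 1);
}

/* Insert at the head of the bucket's chain. */
void conn_insert(struct conn_entry *e)
{
    unsigned i;

    assert(e != NULL);
    i = hash4(e->laddr, e->lport, e->raddr, e->rport);
    e->next = table[i];
    table[i] = e;
}

/* Find the entry whose addresses and ports all match, or NULL. */
struct conn_entry *conn_lookup(uint32_t laddr, uint16_t lport,
                               uint32_t raddr, uint16_t rport)
{
    struct conn_entry *e = table[hash4(laddr, lport, raddr, rport)];

    for (; e != NULL; e = e->next)
        if (e->laddr == laddr && e->lport == lport &&
            e->raddr == raddr && e->rport == rport)
            return e;
    return NULL;
}
```

When a packet arrives, a lookup like this maps its addresses and ports to the owning socket without scanning every connection.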
Note: The Linux kernel uses the tcp_established_hash hash table to store the socket objects of all TCP connections. The key of the hash table is the connection's IP addresses and ports, so the socket for a connection can be found quickly from its IP addresses and ports, as shown in the following figure:

From the analysis above, building the SYN packet and sending it to the server is done by the tcp_transmit_skb() function, so let's analyze its implementation:

    void tcp_transmit_skb(struct sock *sk, struct sk_buff *skb)
    {
        if (skb != NULL) {
            struct tcp_opt *tp = &(sk->tp_pinfo.af_tcp);
            struct tcp_skb_cb *tcb = TCP_SKB_CB(skb);
            int tcp_header_size = tp->tcp_header_len;
            struct tcphdr *th;
            ...
            // Make room for the TCP header and get a pointer to it
            th = (struct tcphdr *)skb_push(skb, tcp_header_size);
            skb->h.th = th;

            skb_set_owner_w(skb, sk);

            // Build the TCP header
            th->source = sk->sport;                 // source port
            th->dest = sk->dport;                   // destination port
            th->seq = htonl(TCP_SKB_CB(skb)->seq);  // sequence number
            th->ack_seq = htonl(tp->rcv_nxt);       // acknowledgment number
            th->doff = (tcp_header_size >> 2);      // header length, in 32-bit words
            th->res1 = 0;
            *(((__u8 *)th) + 13) = tcb->flags;      // flag bits (byte 13 of the header)

            if (!(tcb->flags & TCPCB_FLAG_SYN))
                th->window = htons(tcp_select_window(sk)); // window size

            th->check = 0;                          // checksum, filled in below
            th->urg_ptr = ntohs(tcb->urg_ptr);      // urgent pointer
            ...
            // Calculate the TCP checksum
            tp->af_specific->send_check(sk, th, skb->len, skb);
            ...
            tp->af_specific->queue_xmit(skb); // Call ip_queue_xmit() to send the packet
        }
    }
The implementation of tcp_transmit_skb() is fairly simple: it builds the TCP header and then calls ip_queue_xmit() to hand the packet to the IP protocol for sending. At this point the client has sent its SYN packet to the server, which completes the first step of the TCP three-way handshake.
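The checksum step deserves a closer look. The article does not show the body of send_check(), so the sketch below is a plain user-space version of the standard Internet checksum as applied to TCP over IPv4: a 16-bit ones'-complement sum over a pseudo-header (source address, destination address, protocol number and TCP length) followed by the TCP header and data. The names sum16() and tcp_checksum() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum, read as big-endian 16-bit words. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(p[0] << 8);   /* pad a trailing odd byte with zero */
    return sum;
}

/*
 * TCP checksum over the IPv4 pseudo-header plus the TCP header and data.
 * saddr and daddr are in host byte order; seg holds the header and data in
 * network order with the checksum field set to zero. The result is in host
 * byte order.
 */
uint16_t tcp_checksum(uint32_t saddr, uint32_t daddr,
                      const uint8_t *seg, size_t len)
{
    uint32_t sum = 0;

    assert(seg != NULL && len <= 0xffff);
    sum += (saddr >> 16) + (saddr & 0xffff);
    sum += (daddr >> 16) + (daddr & 0xffff);
    sum += 6;                    /* IP protocol number of TCP */
    sum += (uint32_t)len;        /* TCP length: header plus data */
    sum = sum16(seg, len, sum);

    while (sum >> 16)            /* fold the carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A receiver can verify a packet by running the same computation with the received checksum left in place; an undamaged packet yields 0.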