Network programming - starting from establishing a TCP connection

Preface

Almost every programming language supports network programming. The APIs differ from one language to another, but the principles behind them are the same. This article therefore starts with how a TCP connection is established, and assumes that you have a basic understanding of computer networks.

What does network programming do?

Network applications are everywhere: WeChat lets you talk to friends abroad over the Internet, and online video services let you watch your favorite videos. All of these rely on network programming. Put simply, network programming is about exchanging or transmitting data between two or more hosts (applications).

TCP: Transmission Control Protocol

Data exchange needs to follow certain rules, and these rules are protocols. Only by following the agreed rules can the two parties exchange data correctly. TCP is one of these protocols, which provides a connection-oriented, reliable byte stream service.

  • Connection-oriented: Two applications using TCP must establish a TCP connection before exchanging data.
  • Reliable: TCP uses mechanisms such as acknowledgments, retransmission, and checksums so that data arrives intact and in order whenever possible.
  • Byte stream: TCP does not distinguish between ASCII characters and binary data; interpreting the data is left to the application layer.
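
To make these properties concrete, here is a minimal Python sketch over the loopback interface. The helper name demo_byte_stream and the OS-assigned port are assumptions made for illustration only. The client writes two separate messages and the server reads bytes until the connection closes; TCP preserves the order of the bytes, but not the boundary between the two writes.

```python
import socket
import threading


def demo_byte_stream() -> bytes:
    """Send two messages over one TCP connection and return what the server read."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    received = bytearray()

    def serve() -> None:
        conn, _ = server.accept()
        with conn:
            while True:
                chunk = conn.recv(4096)
                if not chunk:          # empty read: the peer closed the connection
                    break
                received.extend(chunk)

    worker = threading.Thread(target=serve)
    worker.start()

    # Connection-oriented: connect() must succeed before any data is exchanged.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(b"hello ")
        client.sendall(b"world")

    worker.join()
    server.close()
    return bytes(received)


print(demo_byte_stream())
```

The server cannot tell from the stream alone that the data was written in two calls, which is why application protocols define their own framing (for example a length prefix or a delimiter).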

Why understand TCP

In fact, you can write network code without understanding how TCP works. But when you run into a strange problem that the API documentation cannot explain, you will be glad that you spent some time learning TCP.

TCP connection establishment

You may have heard of TCP connection establishment and may even be able to recite the process by heart, but it is worth going through again. A TCP connection is established by the three-way handshake, which can be described as follows:

  • The server starts and waits in the LISTEN state.
  • The client initiates the connection by sending a SYN with sequence number seq=X and enters the SYN_SENT state.
  • On receiving it, the server responds with a SYN+ACK carrying ack=X+1 and its own seq=Y, and enters the SYN_RCVD state (shown as SYN_RECV by netstat on Linux). At this point the server knows that the client can send and that it can receive.
  • The client receives the SYN+ACK, replies with an ACK carrying ack=Y+1, and enters the ESTABLISHED state. The client now knows that both directions work.
  • The server receives the ACK and enters the ESTABLISHED state. The server now knows that the client can receive as well.
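
The steps above can be sketched as a small simulation. This is only a toy model of the state changes and the seq/ack arithmetic, not a real TCP stack; the names Segment, Endpoint, and three_way_handshake are invented for this illustration. The initial sequence numbers in the usage lines are the ones that appear in the packet capture later in this article.

```python
from dataclasses import dataclass

SEQ_SPACE = 2**32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str      # "SYN", "SYN+ACK" or "ACK"
    seq: int
    ack: int = 0


class Endpoint:
    def __init__(self, isn: int, state: str) -> None:
        self.isn = isn
        self.state = state
        self.history = [state]

    def move(self, state: str) -> None:
        self.state = state
        self.history.append(state)


def three_way_handshake(x: int, y: int):
    client = Endpoint(x, "CLOSED")
    server = Endpoint(y, "LISTEN")
    trace = []

    # Step 1: client -> server, SYN with seq=X
    syn = Segment("SYN", seq=client.isn)
    client.move("SYN_SENT")
    trace.append(syn)

    # Step 2: server -> client, SYN+ACK with seq=Y and ack=X+1
    syn_ack = Segment("SYN+ACK", seq=server.isn, ack=(syn.seq + 1) % SEQ_SPACE)
    server.move("SYN_RCVD")
    trace.append(syn_ack)

    # Step 3: client -> server, ACK with ack=Y+1
    ack = Segment("ACK", seq=syn_ack.ack, ack=(syn_ack.seq + 1) % SEQ_SPACE)
    client.move("ESTABLISHED")
    trace.append(ack)

    # The server accepts the ACK only if it acknowledges its own ISN + 1.
    if ack.ack == (server.isn + 1) % SEQ_SPACE:
        server.move("ESTABLISHED")
    return client, server, trace


client, server, trace = three_way_handshake(x=1310563628, y=1685196050)
for segment in trace:
    print(segment)
print(client.history, server.history)
```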

This completes the three-way handshake. Note that this is the handshake in the normal case. The states mentioned above can be viewed with the netstat or ss command, although some of them are so short-lived that they may not be observable.
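
On Linux, socket states are also exposed in /proc/net/tcp, where each state appears as a hexadecimal code. The sketch below decodes one such line. The sample line is hypothetical (it is not taken from this article's test machine), the helper names are invented, and the address decoding assumes a little-endian host such as x86.

```python
import socket
import struct

# State codes used by the Linux kernel in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def decode_addr(field: str) -> str:
    """Turn '0100007F:04D2' into '127.0.0.1:1234' (little-endian host assumed)."""
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return f"{ip}:{int(port_hex, 16)}"


def parse_proc_net_tcp_line(line: str):
    """Return (local address, remote address, state name) for one table row."""
    parts = line.split()
    state = TCP_STATES.get(parts[3].upper(), parts[3])
    return decode_addr(parts[1]), decode_addr(parts[2]), state


# Hypothetical row: a socket listening on 127.0.0.1:1234.
SAMPLE = "0: 0100007F:04D2 00000000:0000 0A 00000000:00000000 00:00000000 00000000 1000 0 12345"
print(parse_proc_net_tcp_line(SAMPLE))
```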

This raises some questions:

  • Why a three-way handshake?
  • What happens when you connect to a non-existent port?
  • What happens when you connect to a non-existent server host?
  • How does the initial sequence number change?
  • What is the semi-connection queue?
  • What is a SYN attack?

If you can answer all the above questions easily, you can skip the rest of this article.

Why a three-way handshake?

This question comes up in almost every interview. A TCP connection is full-duplex, meaning that data can be transmitted in both directions simultaneously. Establishing a connection must therefore confirm that both parties can send and receive.

Is a four-way handshake possible? Absolutely, but it is unnecessary. After receiving the SYN, the server could reply with an ACK first and then send its own SYN, but these two segments can be combined into one, so a fourth step adds nothing.

Is a two-way handshake possible? Imagine that a connection request from a client is delayed in the network for so long that it reaches the server only after the original connection has been established and closed again. With a two-way handshake, the server would treat this segment as a new connection request, establish a connection, and wait for the client to send data. The client, however, has no intention of opening a new connection and will ignore the server, so the server waits in vain and wastes resources.

Why does the server think that this late segment is a new connection request? Because with a two-way handshake, the server cannot tell from the SYN alone whether it is a delayed duplicate or a segment that arrived normally. With a three-way handshake, even if this happens, no real connection is established on the server: the client receives a SYN+ACK it did not ask for and answers with an RST, so the server discards the half-open connection.
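
The difference between the two designs can be sketched in a few lines. This is a deliberately simplified model, and the function name server_state_after_stale_syn and its parameters are invented for this illustration: it returns the server-side state after a delayed duplicate SYN arrives from a client that no longer wants a connection.

```python
def server_state_after_stale_syn(handshake_steps: int, client_reply: str = "RST") -> str:
    """Server state after a stale SYN, for a 2-step or a 3-step handshake."""
    if handshake_steps not in (2, 3):
        raise ValueError("only 2-step and 3-step handshakes are modeled")

    # In both designs the server answers the SYN with a SYN+ACK,
    # because a stale SYN looks exactly like a fresh one.
    if handshake_steps == 2:
        # Two-way: the connection counts as established once the reply is sent.
        return "ESTABLISHED"

    # Three-way: the server waits in SYN_RCVD for the client's answer.
    if client_reply == "ACK":
        return "ESTABLISHED"   # a client that really wants the connection
    if client_reply == "RST":
        return "CLOSED"        # the client rejects the unexpected SYN+ACK
    return "SYN_RCVD"          # no answer yet


print(server_state_after_stale_syn(2))
print(server_state_after_stale_syn(3))
```

With two steps the server ends up holding a connection nobody will use; with three steps the stale request is cleaned up.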

A normal three-way handshake

We use the tcpdump command and the nc command to observe a normal TCP connection establishment process. First, prepare to capture packets at terminal 1:

    $ tcpdump port 1234 -i any -v -n

Start listening on port 1234 in terminal 2:

    $ nc -l 1234

In terminal 3 connect:

    $ nc 127.0.0.1 1234

The following output is obtained in Terminal 1:

    tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
    21:00:50.794424 IP (tos 0x0, ttl 64, id 50542, offset 0, flags [DF], proto TCP (6), length 60)
        127.0.0.1.45848 > 127.0.0.1.1234: Flags [S], cksum 0xfe30 (incorrect -> 0x3163), seq 1310563628, win 43690, options [mss 65495,sackOK,TS val 3721786049 ecr 0,nop,wscale 7], length 0
    21:00:50.794437 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
        127.0.0.1.1234 > 127.0.0.1.45848: Flags [S.], cksum 0xfe30 (incorrect -> 0xef35), seq 1685196050, ack 1310563629, win 43690, options [mss 65495,sackOK,TS val 3721786049 ecr 3721786049,nop,wscale 7], length 0
    21:00:50.794449 IP (tos 0x0, ttl 64, id 50543, offset 0, flags [DF], proto TCP (6), length 52)
        127.0.0.1.45848 > 127.0.0.1.1234: Flags [.], cksum 0xfe28 (incorrect -> 0xc17a), ack 1, win 342, options [nop,nop,TS val 3721786049 ecr 3721786049], length 0

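The capture shows exactly three packets: the SYN sent by the client, the SYN+ACK with which the server responds, and the final ACK from the client. Note that the server acknowledges 1310563629, which is the client's initial sequence number plus one. The "ack 1" in the last packet is a relative value: once tcpdump has seen the handshake, it prints sequence and acknowledgment numbers relative to the initial ones by default.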
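
The same experiment can be approximated in Python. The sketch below is an illustration under a few assumptions (the function name handshake_without_accept is invented, and an OS-assigned port is used instead of 1234). It also shows that the handshake is carried out by the operating system: connect() returns once the handshake has completed, even though the server application has not yet called accept().

```python
import socket


def handshake_without_accept() -> bool:
    """Connect to a listening socket before the server calls accept()."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
    server.listen(1)                       # the socket is now in the LISTEN state
    port = server.getsockname()[1]

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(("127.0.0.1", port))    # returns after SYN, SYN+ACK, ACK

    # Only now does the application take the established connection off the queue.
    conn, peer = server.accept()
    same_peer = peer == client.getsockname()

    conn.close()
    client.close()
    server.close()
    return same_peer


print(handshake_without_accept())
```

Running tcpdump on the chosen port while this script executes should show the same three packets as the nc experiment.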

Connecting to a non-existent port

What happens if nothing is listening on the server port we connect to? We use the nc and tcpdump commands again to observe.

In a terminal window, use administrator privileges to execute the following command to capture packets and print relevant information:

    $ tcpdump port 1234 -i any -v -n

In another terminal, use the nc command to try to connect to the local port 1234:

    $ nc 127.0.0.1 1234 -v
    nc: connect to 127.0.0.1 port 1234 (tcp) failed: Connection refused

The TCP packet capture content is as follows:

    tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
    21:06:15.295407 IP (tos 0x0, ttl 64, id 29112, offset 0, flags [DF], proto TCP (6), length 60)
        127.0.0.1.46108 > 127.0.0.1.1234: Flags [S], cksum 0xfe30 (incorrect -> 0x7fef), seq 1175796450, win 43690, options [mss 65495,sackOK,TS val 2076405654 ecr 0,nop,wscale 7], length 0
    21:06:15.295462 IP (tos 0x0, ttl 64, id 58706, offset 0, flags [DF], proto TCP (6), length 40)
        127.0.0.1.1234 > 127.0.0.1.46108: Flags [R.], cksum 0x77e7 (correct), seq 0, ack 1175796451, win 0, length 0

From the capture, we can see that the nc client first sends a SYN (Flags [S]) with a seq of 1175796450. It then receives an RST together with an ACK (Flags [R.]) whose seq is 0 and whose ack is 1175796451, the client's seq plus one.

In other words, if you connect to a port on which nothing is listening, the operating system of the target host responds with an RST (reset) and the connection attempt ends immediately.
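
From a program's point of view, the RST surfaces as a "connection refused" error. The following Python sketch shows this; the helper names find_closed_port and try_connect are invented for the example, and the closed port is obtained by binding to an OS-chosen port and releasing it again, which assumes no other process grabs that port in between.

```python
import socket


def find_closed_port() -> int:
    """Bind to an OS-chosen port, then release it so nothing is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))
        return probe.getsockname()[1]


def try_connect(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and report how the attempt ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"       # the peer answered our SYN with an RST
    except socket.timeout:
        return "timed out"     # no answer arrived within the timeout


print(try_connect("127.0.0.1", find_closed_port()))
```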

The characters in the Flags field of the tcpdump output have the following meanings:

  • F : FIN - Finish; the sender has no more data to send and wants to end the session
  • S : SYN - Synchronize; a request to start a session
  • R : RST - Reset; abort the connection
  • P : PSH - Push; the receiver should pass the data to the application promptly
  • . : ACK - Acknowledge (tcpdump prints a dot rather than the letter A)
  • U : URG - Urgent
  • E : ECE - ECN-Echo
  • W : CWR - Congestion Window Reduced
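
Each flag is a single bit in the TCP header. As a small illustration, the sketch below maps a flags byte to the flag names and to the compact notation seen in the captures, where the ACK bit is printed as a dot (as in [S.] and [R.]). The function names are invented for this example.

```python
# (bit mask, flag name, character used in the tcpdump Flags field)
TCP_FLAGS = [
    (0x01, "FIN", "F"),
    (0x02, "SYN", "S"),
    (0x04, "RST", "R"),
    (0x08, "PSH", "P"),
    (0x10, "ACK", "."),
    (0x20, "URG", "U"),
    (0x40, "ECE", "E"),
    (0x80, "CWR", "W"),
]


def flag_names(bits: int) -> list:
    """List the names of all flags set in a TCP flags byte."""
    return [name for mask, name, _ in TCP_FLAGS if bits & mask]


def compact_flags(bits: int) -> str:
    """Render the flags in the compact one-character-per-flag notation."""
    return "".join(char for mask, _, char in TCP_FLAGS if bits & mask)


for bits in (0x02, 0x12, 0x10, 0x14):
    print(hex(bits), flag_names(bits), f"[{compact_flags(bits)}]")
```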

Connecting to a non-existent server

Again we use the nc and tcpdump commands. Start the capture first:

    $ tcpdump port 1234 -i any -v -n

In another window, use the nc command to connect to a server address that does not exist or cannot be connected:

    $ nc 121.11.12.31 1234 -v
    nc: connect to 121.11.12.31 port 1234 (tcp) failed: Connection timed out

The tcpdump output is as follows:

    tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
    21:13:04.259752 IP (tos 0x0, ttl 64, id 33411, offset 0, flags [DF], proto TCP (6), length 60)
        192.168.0.103.52402 > 121.11.12.31.1234: Flags [S], cksum 0xcdc0 (correct), seq 2648987704, win 29200, options [mss 1460,sackOK,TS val 75888078 ecr 0,nop,wscale 7], length 0
    21:13:05.269438 IP (tos 0x0, ttl 64, id 33412, offset 0, flags [DF], proto TCP (6), length 60)
        192.168.0.103.52402 > 121.11.12.31.1234: Flags [S], cksum 0xc9ce (correct), seq 2648987704, win 29200, options [mss 1460,sackOK,TS val 75889088 ecr 0,nop,wscale 7], length 0
    21:13:07.285415 IP (tos 0x0, ttl 64, id 33413, offset 0, flags [DF], proto TCP (6), length 60)
        192.168.0.103.52402 > 121.11.12.31.1234: Flags [S], cksum 0xc1ee (correct), seq 2648987704, win 29200, options [mss 1460,sackOK,TS val 75891104 ecr 0,nop,wscale 7], length 0
    21:13:11.445491 IP (tos 0x0, ttl 64, id 33414, offset 0, flags [DF], proto TCP (6), length 60)
        192.168.0.103.52402 > 121.11.12.31.1234: Flags [S], cksum 0xb1ae (correct), seq 2648987704, win 29200, options [mss 1460,sackOK,TS val 75895264 ecr 0,nop,wscale 7], length 0
    21:13:19.637403 IP (tos 0x0, ttl 64, id 33415, offset 0, flags [DF], proto TCP (6), length 60)
        192.168.0.103.52402 > 121.11.12.31.1234: Flags [S], cksum 0x91ae (correct), seq 2648987704, win 29200, options [mss 1460,sackOK,TS val 75903456 ecr 0,nop,wscale 7], length 0
    21:13:35.765417 IP (tos 0x0, ttl 64, id 33416, offset 0, flags [DF], proto TCP (6), length 60)
        192.168.0.103.52402 > 121.11.12.31.1234: Flags [S], cksum 0x52ae (correct), seq 2648987704, win 29200, options [mss 1460,sackOK,TS val 75919584 ecr 0,nop,wscale 7], length 0
    21:14:09.045497 IP (tos 0x0, ttl 64, id 33417, offset 0, flags [DF], proto TCP (6), length 60)
        192.168.0.103.52402 > 121.11.12.31.1234: Flags [S], cksum 0xd0ad (correct), seq 2648987704, win 29200, options [mss 1460,sackOK,TS val 75952864 ecr 0,nop,wscale 7], length 0

The capture shows that when the first SYN gets no response, the client sends it again; if there is still no response, it retransmits after an increasingly long wait, and finally the connection attempt times out. In this capture the SYN is retransmitted 6 times, at intervals of roughly 1s, 2s, 4s, 8s, 16s, and 32s, so the wait doubles each time. On Linux, the number of SYN retransmissions is controlled by the net.ipv4.tcp_syn_retries kernel parameter.
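
The intervals can be checked directly from the capture. The short Python sketch below takes the seven SYN timestamps from the tcpdump output above and computes the gaps between them; it then compares them with a simple doubling schedule. The function names are invented for this illustration, and the doubling schedule is a simplified model rather than the kernel's actual timer code.

```python
from datetime import datetime

# Timestamps of the seven SYN packets in the capture above.
SYN_TIMESTAMPS = [
    "21:13:04.259752",
    "21:13:05.269438",
    "21:13:07.285415",
    "21:13:11.445491",
    "21:13:19.637403",
    "21:13:35.765417",
    "21:14:09.045497",
]


def retry_intervals(timestamps):
    """Seconds elapsed between each pair of consecutive packets."""
    times = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]


def backoff_schedule(initial: float, retries: int):
    """Idealized exponential backoff: each wait is twice the previous one."""
    return [initial * 2**i for i in range(retries)]


observed = retry_intervals(SYN_TIMESTAMPS)
print([round(seconds) for seconds in observed])   # [1, 2, 4, 8, 16, 33]
print(backoff_schedule(1, len(observed)))         # [1, 2, 4, 8, 16, 32]
```

The observed values are slightly longer than the ideal ones because of timer granularity, but the doubling pattern is clear.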

How does the initial sequence number change?

From the captures above, we can see that the initial sequence number (ISN) carried by the first SYN is not fixed. Different systems may generate it differently, but the values must not repeat within a certain period of time; otherwise the server cannot tell whether a segment with the same seq is a retransmission or an old segment that was stranded in the network and arrived late. RFC 793 describes the ISN generator as a 32-bit counter that is incremented roughly every 4 microseconds (actual implementations vary, and for security reasons they add a random component). By the time the counter wraps around, several hours have passed, so any delayed segment in the network has long since disappeared.
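
As a rough illustration, the clock-driven scheme can be written in a few lines of Python. RFC 793 describes the counter as ticking roughly every 4 microseconds. The randomized variant follows the general idea of RFC 6528 (clock value plus a keyed hash of the connection's addresses and ports); SHA-256 and all function and parameter names here are assumptions chosen for the sketch, not what any particular operating system does.

```python
import hashlib

SEQ_SPACE = 2**32      # sequence numbers are 32 bits wide
TICK_US = 4            # the counter advances once every 4 microseconds


def clock_isn(now_us: int) -> int:
    """Clock-driven ISN: a 32-bit counter derived from a microsecond clock."""
    return (now_us // TICK_US) % SEQ_SPACE


def wrap_period_seconds() -> float:
    """Time it takes the 32-bit counter to wrap around once."""
    return SEQ_SPACE * TICK_US / 1_000_000


def randomized_isn(now_us: int, src: str, sport: int, dst: str, dport: int,
                   secret: bytes) -> int:
    """Clock value plus a keyed hash of the 4-tuple, so ISNs are hard to predict."""
    material = f"{src}:{sport}-{dst}:{dport}".encode() + secret
    offset = int.from_bytes(hashlib.sha256(material).digest()[:4], "big")
    return (clock_isn(now_us) + offset) % SEQ_SPACE


print(clock_isn(4_000_000))                   # ISN after one second of ticks
print(round(wrap_period_seconds() / 3600, 2)) # wrap-around period in hours
```

By this arithmetic the counter wraps a little under every five hours, far longer than a segment can survive in the network.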

Semi-connection queue

After the server receives the client's connection request and replies with a SYN+ACK, it is in the SYN_RECV state. A connection in this state is called a semi-connection (or half-open connection), and the server keeps it in a structure called the semi-connection queue, also known as the SYN queue, until the client's final ACK arrives.

SYN Attack

This opens the door to an attack. If someone maliciously sends a large number of SYN packets with forged source IP addresses, the server never receives the final ACK and keeps retransmitting its SYN+ACK. The semi-connection queue fills up quickly, so normal connection requests can no longer be processed and server resources may be exhausted.
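
The effect can be illustrated with a toy model of the semi-connection queue. This is a sketch under simplifying assumptions: the class HalfOpenQueue and the function simulate_flood are invented for the example, the addresses come from documentation ranges, and real kernels additionally expire old entries and retransmit the SYN+ACK, which the model ignores.

```python
class HalfOpenQueue:
    """Toy model of a server's semi-connection (SYN) queue with a fixed capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending = set()      # clients in SYN_RECV, waiting for the final ACK
        self.established = []
        self.dropped = 0

    def on_syn(self, client) -> bool:
        if len(self.pending) >= self.capacity:
            self.dropped += 1     # queue full: the SYN is discarded
            return False
        self.pending.add(client)  # SYN+ACK sent, entry waits for the ACK
        return True

    def on_ack(self, client) -> bool:
        if client not in self.pending:
            return False
        self.pending.remove(client)
        self.established.append(client)
        return True


def simulate_flood(capacity: int, spoofed_syns: int):
    queue = HalfOpenQueue(capacity)

    # A well-behaved client completes the handshake, which frees its slot.
    queue.on_syn(("192.0.2.10", 40000))
    queue.on_ack(("192.0.2.10", 40000))

    # Forged sources never answer the SYN+ACK, so their entries stay pending.
    for i in range(spoofed_syns):
        queue.on_syn(("198.51.100.1", 1024 + i))

    # A legitimate client that arrives during the flood.
    late_client_accepted = queue.on_syn(("192.0.2.11", 40001))
    return late_client_accepted, len(queue.pending), queue.dropped


print(simulate_flood(capacity=4, spoofed_syns=10))   # (False, 4, 7)
print(simulate_flood(capacity=4, spoofed_syns=2))    # (True, 3, 0)
```

With only two forged SYNs the late client still gets a slot; with ten, the queue is full and its request is dropped.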

How to deal with SYN attacks is another topic.

Summary

The normal TCP three-way handshake is easy to describe, but the finer details and the abnormal cases are often less familiar. This article has walked through how a TCP connection is established, which lays a foundation for later network programming. Note, however, that it is only a brief introduction to connection establishment and does not cover the topic in depth.
