Understanding HTTP and TCP protocols from an HTTP request


Seeing the principle of network layering from an HTTP request

There are many network devices between any two hosts, and data may be lost at any of them. Lost data must be retransmitted, which in turn can produce duplicates (a packet that was thought lost may merely have been delayed). The transmission media also vary: inside an intranet data travels over network cables, while connections to the public network go over optical fiber, so signals must be converted between different media, for example from optical pulses on fiber to the electrical or wireless signals a router uses. Over long distances there is also signal attenuation. Network transmission therefore has to solve many different problems. These problems are grouped and layered, with different problems solved at different levels, and standardized interfaces are defined between the levels so that data can flow between them.

(Figure: a complex network)

To tame this complexity, the different concerns of network communication are broken down into a multi-level structure in which each layer interacts only with the layer immediately above or below it. Because the network is layered, the software of any one layer can be modified or even replaced; as long as the interfaces between layers stay unchanged, the other layers are unaffected. Two layered models matter here:

  • OSI: the Open Systems Interconnection reference model
  • TCP/IP protocol suite

OSI seven-layer theoretical architecture

  1. Physical layer: solves the basic communication problem between two hosts - A sends a bit stream (0101...) to B, and B can receive it. It defines standards for physical equipment such as the type of network cable, the interface type of optical fiber, and the transmission rate of the medium.
  2. Data link layer: because the bit stream carried by the physical layer can be corrupted in transit, the data link layer defines how to format the data - it encapsulates the bit stream into frames and provides error detection.
  3. Network layer: as the number of nodes grows, point-to-point communication has to pass through many intermediate nodes, so finding the target node and choosing the best path become the primary needs. The network layer translates network addresses into the corresponding physical addresses and handles packet transmission and route selection. Its transmission unit is the datagram (packet). The protocol to pay attention to at this layer is IP from the TCP/IP suite.
  4. Transport layer: as demands keep growing, large amounts of data must be transferred and the network may be interrupted along the way. To transfer large files accurately, the data is split into segments that are sent one by one; the receiving end must reassemble the segments into the complete data and handle segments that are lost. The protocols to pay attention to are TCP and UDP.
  5. Session layer: establishes and manages sessions between users on different machines, ensuring that applications can exchange data in an organized dialogue.
  6. Presentation layer: handles the syntax and semantics of the information - encryption and decryption, conversion and translation, compression and decompression.
  7. Application layer: defines application-level conventions, for example that both parties use a fixed-length message header that records information such as the message length. The protocol to pay attention to here is HTTP from the TCP/IP suite.

TCP/IP four-layer model

The TCP/IP model is a practical implementation of the OSI model. It consists of the application layer, transport layer, internet layer, and network interface layer.

A hierarchical parsing process of an HTTP request

Suppose a static page is deployed on a server and exposed to the public Internet through nginx, and a browser accesses it by domain name. What actually happens after the user types the domain name and presses Enter?

    http://www.dumain.com

A server can only be reached by IP address, so the browser must resolve the domain name first. It checks whether its own DNS cache already has an entry for the domain; if so, it uses that IP address directly. If not, it checks the local hosts file for a matching entry. If that also fails, it issues a DNS request to obtain the server's IP address.
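
As a hedged illustration of that final resolution step, the operating system's resolver (hosts file, then DNS, as configured) can be queried through the standard socket API; the sketch below is illustrative and uses www.example.com as a placeholder domain.

    import socket

    # Ask the system resolver for the IP addresses behind a host name,
    # roughly what the browser does before it can connect.
    def resolve(host: str):
        infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        return sorted({info[4][0] for info in infos})

    if __name__ == "__main__":
        print(resolve("www.example.com"))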

A DNS server is itself just a server with its own IP address. The application layer constructs a DNS request message and calls the transport layer through the socket API; DNS uses UDP by default. The transport layer prepends a UDP header to the DNS request and hands the result to the network layer, which prepends an IP header. The network layer then hands the IP packet to the data link layer, which adds a MAC header containing the MAC address of the next machine on the path; that MAC address is found through the ARP protocol at the network layer, which broadcasts a request asking which machine owns the target IP address. Finally the frame is transmitted over the physical medium, usually to the local router.
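
To make the encapsulation concrete, here is a hedged sketch that hand-builds only the application-layer DNS message and pushes it through a UDP socket; the kernel then adds the UDP, IP and MAC headers described above. The resolver address 8.8.8.8 and the queried domain are assumptions for illustration.

    import socket
    import struct

    def build_dns_query(domain: str) -> bytes:
        # DNS header: ID, flags (standard query, recursion desired), QDCOUNT=1.
        header = struct.pack(">HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0)
        # Question section: each label is length-prefixed, terminated by a zero byte.
        qname = b"".join(bytes([len(p)]) + p.encode() for p in domain.split(".")) + b"\x00"
        question = qname + struct.pack(">HH", 1, 1)   # QTYPE=A, QCLASS=IN
        return header + question

    if __name__ == "__main__":
        query = build_dns_query("www.example.com")
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(3)
            s.sendto(query, ("8.8.8.8", 53))     # UDP/IP/MAC headers are added below us
            response, _ = s.recvfrom(512)
        print("received", len(response), "bytes, answer count =",
              struct.unpack(">H", response[6:8])[0])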

A router is a layer-three device, so it processes the frame from the bottom up. The physical layer hands the frame to the data link layer, which checks whether the destination MAC address is its own: if not, the frame is discarded; if so, it is unpacked and passed up to the network layer. The network layer forwards the packet toward the next router, and through the operator's access network it eventually reaches the operator's router. The operator runs its own DNS servers, and if one of them is the configured resolver, it looks up the requested domain and returns the corresponding IP address. The response travels back along the reverse path, each layer stripping its header, until the application layer finally has the IP address for the domain. Only then can the HTTP request itself be sent; this time the transport layer uses TCP, and again a header is added at every layer on the way down.

HTTP

What is HTTP?

The Hypertext Transfer Protocol (HTTP) is a stateless, request/response-based application-layer protocol, usually carried over TCP/IP. It is the most widely used protocol on the Internet, and all WWW files must comply with it. HTTP was originally designed to provide a way to publish and receive HTML pages.

HTTP Features

  1. Stateless: the protocol stores no state about the client and has no "memory" of previous transactions. For example, without extra mechanisms, visiting a website would require logging in again for every request.
  2. Connectionless: before HTTP/1.1, each request had to establish a fresh TCP connection to the server (a three-way handshake to open it, a four-way wave to close it). If a client requests the same resource several times in a short period, the server cannot tell that it has already answered, so every request is served from scratch, wasting time and traffic. HTTP/1.1 keep-alive connections (see the sketch after this list) address this.
  3. Request/response based: the client initiates the request and the server responds.
  4. Simple, fast and flexible.
  5. Communication is in plain text, the two parties do not verify each other's identity, and the integrity of the data is not protected.
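
A hedged sketch of connection reuse with Python's standard http.client: one TCP connection (one handshake) carries several requests; example.com is a placeholder host.

    import http.client

    # One TCP connection (one three-way handshake) serves several requests.
    conn = http.client.HTTPConnection("example.com", 80, timeout=5)
    for path in ("/", "/", "/"):
        conn.request("GET", path, headers={"Connection": "keep-alive"})
        resp = conn.getresponse()
        resp.read()                    # drain the body so the connection can be reused
        print(resp.status, resp.reason)
    conn.close()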

The HTTP protocol has evolved to version 3.0. For more on the protocol versions, see "Quickly Understand the Features and Differences of HTTP 1.0 / 1.1 / 2.0 / 3.0".

HTTP message format

The structure of HTTP request and response messages is basically the same, consisting of three parts:

  • Start line: describes the basic information of the request or response
  • Header field set (header): Use key-value format to describe the message in more detail
  • Message body (entity): The actual data transmitted, which is not necessarily plain text, can be binary data such as pictures and videos

The start line and the header fields together are referred to as the request header or response header; the message body is also called the entity, or simply the body. The HTTP protocol stipulates that every message must have a header, but the body may be omitted - header information is mandatory, entity information is optional. A blank line (CRLF) must separate the header from the body.

Request Line Message Format

  • Request method: such as GET/HEAD/PUT/POST, indicating the operation on the resource;
  • Request target: usually a URI that marks the resource that the request method is to operate on;
  • Version number: Indicates the HTTP protocol version used in the message.

Response Status Line Format

  • Version number: indicates the HTTP protocol version used by the message;
  • Status code: a three-digit number that indicates the result of the processing in the form of a code, such as 200 for success and 500 for server error;
  • Reason: As a supplement to the digital status code, it is a more detailed explanation text to help people understand the reason.

Comparison of request and response message formats
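
To see both formats in practice, here is a hedged sketch that writes a minimal request (start line, header fields, blank line) over a raw TCP socket and prints the status line and headers that come back; example.com is a placeholder host.

    import socket

    HOST = "example.com"                # placeholder host
    request = (
        f"GET / HTTP/1.1\r\n"           # start line: method, target, version
        f"Host: {HOST}\r\n"             # header fields, one per line
        f"Connection: close\r\n"
        f"\r\n"                         # blank line (CRLF) separates header from body
    )

    with socket.create_connection((HOST, 80), timeout=5) as s:
        s.sendall(request.encode())
        data = b""
        while chunk := s.recv(4096):
            data += chunk

    head, _, body = data.partition(b"\r\n\r\n")
    print(head.decode(errors="replace"))    # status line + response header fields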

HTTP Header Fields

A header field is a key-value pair: ":" separates the key from the value, and a CRLF marks the end of the field. For example, when the front end and back end negotiate the format of the data being exchanged, a typical field is "Content-Type: application/json", where the key is "Content-Type" and the value is "application/json". HTTP header fields are very flexible: besides standard headers such as Host and Connection, custom headers can be added at will, which gives the HTTP protocol practically unlimited extensibility.
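
As a hedged example of the Content-Type negotiation just mentioned, using Python's standard http.client; the host, path and custom header below are placeholders.

    import http.client
    import json

    payload = json.dumps({"user": "demo"})
    headers = {
        "Content-Type": "application/json",   # tell the server how to parse the body
        "X-Trace-Id": "abc-123",              # custom headers are allowed as well
    }

    conn = http.client.HTTPConnection("api.example.com", 80, timeout=5)
    conn.request("POST", "/login", body=payload, headers=headers)
    resp = conn.getresponse()
    print(resp.status, resp.getheader("Content-Type"))
    conn.close()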

Header Field Notes

  • Field names are case-insensitive and may not contain spaces. Hyphens "-" are allowed, but underscores "_" should be avoided (some servers will not parse header fields containing "_"). The field name must be followed immediately by ":" with no space before it, while any number of spaces may appear between the ":" and the field value;
  • The order of the fields is meaningless and can be arranged arbitrarily without affecting the semantics;
  • In principle, fields cannot be repeated unless the semantics of the field itself allows it, such as Set-Cookie.

There are many header fields in the HTTP protocol, but they can basically be divided into four categories: general headers, entity headers, request headers, and response headers.

For more information about HTTP header fields, please refer to "In-depth understanding of the basic concepts of the four HTTP headers"

TCP

TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport-layer communication protocol. It establishes a connection between two endpoints and moves data reliably between them. A TCP connection is set up through the three-way handshake, which starts and confirms the connection. Once the connection is established, data can be sent; when transmission is complete, the connection is torn down by closing the virtual circuit.

TCP Features

  • Connection-based: a connection must be established before data is transmitted
  • Full-duplex: data flows in both directions
  • Byte stream: no limit on data size; data is packed into segments, delivered in order, and duplicate segments are discarded automatically
  • Flow buffering: absorbs the mismatch between the processing capabilities of the two ends
  • Reliable transmission: delivery is guaranteed, with retransmission when packets are lost
  • Congestion control: prevents the network from becoming severely congested

TCP message format

  • 16-bit source port / 16-bit destination port: identify the sending and receiving applications
  • 32-bit sequence number / 32-bit acknowledgment number: used to manage segment ordering at the transport layer - TCP delivers data in order
  • 4-bit header length: in units of 4 bytes; the largest value 4 bits can hold is 15, so the maximum TCP header length is 15 * 4 = 60 bytes
  • 6 reserved bits
  • 6 flag bits:
  • URG - urgent pointer valid
  • ACK - acknowledgment valid
  • PSH - push: deliver the data to the application immediately
  • RST - reset the connection
  • SYN - connection establishment request
  • FIN - connection release request
  • 16-bit window size: the sliding window mechanism -> flow control -> tells the peer the maximum amount of data it may send
  • Checksum: ones'-complement sum -> verifies data integrity
  • Urgent pointer: indicates which data is urgent
  • Options: used, for example, to negotiate the MSS during the three-way handshake
A TCP connection is identified by a 4-tuple: [source address, source port, destination address, destination port].
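
As a hedged sketch of the layout above, the fields of an option-free 20-byte TCP header can be packed and unpacked with Python's struct module; the ports, sequence number and window value below are made up for illustration.

    import struct

    # 20-byte TCP header without options:
    # source port, destination port, sequence number, acknowledgment number,
    # data offset/reserved/flags, window, checksum, urgent pointer.
    TCP_HEADER = struct.Struct(">HHIIHHHH")

    def pack_header(src, dst, seq, ack_no, flags, window):
        offset = 5                           # header length in 32-bit words (5 * 4 = 20 bytes)
        off_flags = (offset << 12) | flags   # data offset in the top 4 bits, flags in the low bits
        return TCP_HEADER.pack(src, dst, seq, ack_no, off_flags, window, 0, 0)

    SYN, ACK = 0x02, 0x10                    # two of the six flag bits

    segment = pack_header(54321, 8000, 1000, 0, SYN, 65535)
    fields = TCP_HEADER.unpack(segment)
    print(len(segment), "bytes, data offset =", fields[4] >> 12, "words")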

TCP three-way handshake

The three-way handshake serves two purposes: it synchronizes the initial sequence numbers (ISN) of the two communicating parties, and it negotiates TCP communication parameters (MSS, window information, checksum options).

Before understanding the specific process, let's first understand a few concepts

Initially, the TCP processes at both ends are in the CLOSED state, A actively opens the connection, and B passively opens the connection.

A and B both start in the CLOSED state -> B enters LISTEN -> A enters SYN-SENT -> B enters SYN-RCVD -> finally A and B are both in ESTABLISHED.

B's TCP server process first creates a transmission control block (TCB) so that it is ready to accept a connection request from a client process. The server process then sits in the LISTEN state, waiting for a client's connection request and responding when one arrives.

  • SYN (Synchronize Sequence Numbers): the handshake signal used when establishing a TCP/IP connection, and the first segment sent when a client opens a connection to a server. When the client sends the SYN segment it chooses a random initial sequence number X.
  • SYN-ACK: after receiving the SYN, the server replies to the client with a SYN-ACK. Its acknowledgment number is set to one more than the received sequence number, that is X + 1, and the server chooses its own random sequence number Y for the segment.
  • ACK (acknowledge character): indicates that the data sent has been received correctly. Finally the client sends an ACK to the server, with the acknowledgment number set to Y + 1 and its own sequence number set to X + 1 (the value the server just acknowledged).
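
At the socket-API level, all three segments are produced by a single connect() call; the hedged sketch below uses the same address and port as the nc example that follows.

    import socket

    # connect() blocks while the kernel performs SYN -> SYN-ACK -> ACK;
    # when it returns, the connection is in the ESTABLISHED state.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(5)
        s.connect(("192.168.109.200", 8000))
        print("local endpoint:", s.getsockname(), "peer:", s.getpeername())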

Let's take a look at how the three-way handshake works through an example

Deploy a static page on an Nginx server (the port used here is 8000).

Use tcpdump on the server to capture packets on the chosen network interface:

    tcpdump -i en0 -S -c 3 port 8000

Use the nc network tool on the client to send a request

    nc 192.168.109.200 8000

The three-way handshake monitoring results are as follows:

Some of the things the kernel does in the three-way handshake are as follows:

Connection status check

    netstat -tpn        # -t: TCP sockets, -p: show the owning process, -n: numeric addresses and ports

    # check once per second
    netstat -tpn -c 1

TCP four-way wave (connection release)

  • A: sends a FIN segment, indicating that A will send no more data.
  • B: acknowledges the FIN immediately, so that A does not retransmit it (acknowledgment mechanism).
  • B: after finishing with its remaining data, closes its side of the connection and sends its own FIN.
  • A: acknowledges B's FIN with an ACK; B can then release the connection.

A releases the connection only after waiting 2MSL, for two reasons:

  1. To guard against the loss of the final ACK, which would cause B to retransmit its FIN repeatedly.
  2. To let stale segments from this connection expire in the network, so they cannot corrupt data on a newly established connection that reuses the same 4-tuple.
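
In socket terms, A's FIN corresponds to shutting down the write side of the socket and B's FIN to its own close; here is a hedged sketch of the active-close side, reusing the placeholder address from the earlier example.

    import socket

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect(("192.168.109.200", 8000))
        s.sendall(b"GET / HTTP/1.0\r\n\r\n")
        s.shutdown(socket.SHUT_WR)        # send our FIN: "no more data from this side"
        while data := s.recv(4096):       # B may still send data before its own FIN
            pass
    # leaving the with-block closes the socket; the kernel finishes the wave
    # and keeps the local endpoint in TIME_WAIT for 2MSL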

Byte stream protocol

TCP treats the data handed to it by the application as an unstructured stream of bytes; it does not know what the bytes mean. TCP also does not care how large a chunk the application writes into the TCP buffer at one time - it decides how many bytes each segment should carry based on the window advertised by the peer and the current level of network congestion.

    MSS (Maximum Segment Size): the largest amount of application data carried in one segment; the default is 536 bytes of payload
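
On Linux the value negotiated for a connection can be read back with the TCP_MAXSEG socket option; a hedged sketch follows (the target host is a placeholder and the option is not available on every platform).

    import socket

    with socket.create_connection(("example.com", 80), timeout=5) as s:
        # TCP_MAXSEG reports the maximum segment size in effect for this connection.
        mss = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
        print("negotiated MSS:", mss, "bytes")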

The following situations may occur during transmission:

  • If the sender does not receive an ACK within a certain time, it retransmits the segment
  • If a buffer is full, segments may be dropped or delayed and must be retransmitted
  • The receiver reorders segments according to the sequence number field and discards duplicates

Reliability of data transmission

The simplest scheme is the stop-and-wait protocol: the sender transmits one segment and waits for its acknowledgment before sending the next. It is reliable but inefficient.

The retransmission mechanism covers two loss cases. If the ACK is lost, the sender times out and retransmits the segment, and the receiver discards the duplicate. If the data segment itself is lost, the sender likewise times out and retransmits it.

Sliding Window Protocol and Cumulative Acknowledgment (Delayed ACK)

Because stop-and-wait is inefficient, TCP uses a sliding window protocol together with cumulative acknowledgment (delayed ACK).

The sliding window size is negotiated with the peer through the TCP three-way handshake and is affected by network conditions.

Rather than sending and acknowledging segments one at a time, the sender can transmit a whole batch of segments within the window, and the receiver does not need to acknowledge each one individually - a standalone ACK is itself at least a 20-byte TCP header plus a 20-byte IP header, so every separate acknowledgment costs at least 40 bytes.

With delayed, cumulative acknowledgment the receiver can acknowledge only the last segment of a batch (for example segment 5), which implicitly confirms everything before it. The drawback is that if segment 3 is lost, only segments 1 and 2 can be acknowledged: everything from segment 3 onward has to be retransmitted, while segments that have already been acknowledged are removed from the send buffer.
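
A toy sketch of this cumulative-acknowledgment behaviour (go-back-N style), not real TCP code: the receiver acknowledges only the highest in-order segment, so losing segment 3 forces everything from 3 onward to be resent.

    # Toy simulation of cumulative acknowledgment (go-back-N style), not real TCP.
    def highest_in_order(segments, lost):
        """Return the highest segment number the receiver can cumulatively ACK."""
        acked = 0
        for seq in segments:
            if seq in lost or seq != acked + 1:
                break              # a gap: later segments cannot be acknowledged yet
            acked = seq
        return acked

    window = [1, 2, 3, 4, 5]
    ack = highest_in_order(window, lost={3})
    print("cumulative ACK up to segment:", ack)           # -> 2
    print("retransmission starts at segment:", ack + 1)   # -> 3 (segments 3, 4, 5 are resent)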
