TCP/IP Appetizer: HTTP


This article is reprinted from the WeChat public account "sowhat1412", author sowhat1412. Please contact the public account sowhat1412 to reprint this article.

1 TCP/IP

1.1 TCP/IP Definition

The TCP/IP protocol suite is a collection of protocols, also known as the Internet protocol suite. Computers can only communicate if they follow these rules. TCP and IP are just two important protocols, so TCP/IP is used to name this Internet protocol suite. In fact, it roughly includes four layers of protocols.

1.2 TCP/IP Functionality

As mentioned above, TCP/IP is divided into four layers at a macro level. Next, let’s talk about the specific functions of the four layers.

1.2.1 Application Layer

The application layer directly provides users with network services through protocols such as HTTP, SMTP (email), and FTP. These protocols were created to meet different real-world needs. Most of the time, users also assemble and process data at this layer, typically through socket programming. How the data is actually transmitted over the network is the responsibility of the three layers below.

1.2.2 Transport Layer

The transport layer provides communication services for the application layer. It is the highest layer facing the communication part and the lowest layer in the user function. The transport layer provides logical communication for application processes that communicate with each other. It mainly includes TCP protocol and UDP protocol.

TCP provides connection-oriented data stream support, reliability, flow control, multiplexing and other services.

UDP does not provide complex control mechanisms.

The role of the transport layer:

  1. Segment and encapsulate data sent from the application layer.
  2. Provide end-to-end transmission services.
  3. Establish logical communication between the sending host and the receiving host.

1.2.3 Network Layer

The function of the network layer is to realize the routing and forwarding of data packets. Wide area networks usually use many hierarchical routers to connect scattered hosts or local area networks. Therefore, two communicating hosts are generally connected through multiple intermediate node routers. The task of the network layer is to select these intermediate nodes to determine the communication path between the two hosts. At the same time, the details of the network topology connection are hidden from the upper layer protocol, so that the two communicating parties are directly connected in the eyes of the transport layer and network applications.

The IP protocol sits at this layer. It provides addressing and routing so that two end systems can be interconnected and a path chosen between them. IP itself is a best-effort service; flow control and most congestion control are left to the transport layer (TCP).

1.2.4 Link Layer

The data link layer implements the network interface card driver and handles the transmission of data on the physical medium. Two protocols commonly associated with it are ARP (Address Resolution Protocol) and RARP (Reverse Address Resolution Protocol). ARP maps an IP address to a physical MAC address, and RARP maps a MAC address back to an IP address.

1.2.5 Data Transmission

  1. When using the TCP/IP protocol family for network communication, communication is carried out with the other party in a layered order. The sender goes down from the application layer, and the receiver goes up from the link layer.
  2. When the sender transmits data between layers, each time it passes through a layer, it will be marked with a header information belonging to that layer. Conversely, when the receiver transmits data between layers, each time it passes through a layer, it will delete the corresponding header.
  3. This practice of packaging data information is called encapsulation.

However, note that every link has a Maximum Transmission Unit (MTU), which caps the size of the IP packet it can carry. Similarly, TCP has a Maximum Segment Size (MSS), which caps the application data carried in one segment.

The MTU of Ethernet is 1500 bytes, the basic IP header is 20 bytes, and the basic TCP header is 20 bytes, so the MSS can reach at most 1500 - 20 - 20 = 1460 bytes (MSS counts only the application data, not the protocol headers).

Therefore, a large application layer message may be split into several segments and transmitted one by one. The receiver collects the application data from each segment and reassembles it; only then is the request considered fully received. This is why the Content-Length field matters: it tells the receiver how many bytes of body to expect.
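The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the header sizes assume basic IPv4 and TCP headers without options.

```python
import math

ETHERNET_MTU = 1500  # bytes, the Ethernet MTU quoted in the text
IP_HEADER = 20       # bytes, basic IP header without options
TCP_HEADER = 20      # bytes, basic TCP header without options


def max_segment_size(mtu=ETHERNET_MTU, ip_header=IP_HEADER, tcp_header=TCP_HEADER):
    """Largest amount of application data that fits in one TCP segment."""
    return mtu - ip_header - tcp_header


def segment_count(message_length, mss):
    """Number of segments needed to carry message_length bytes of data."""
    return math.ceil(message_length / mss)


mss = max_segment_size()
print(mss)                       # 1460
print(segment_count(4000, mss))  # 3
```

A 4000-byte message therefore travels as three segments (1460 + 1460 + 1080 bytes of application data).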

Data packet sending

1.3 OSI and TCP/IP

OSI

OSI, also known as the Open Systems Interconnection Communication Reference Model, is a conceptual model proposed by the International Organization for Standardization. It is a standard framework that attempts to interconnect various computers into a network worldwide. It focuses on what are the necessary functions of the communication protocol.

TCP/IP

TCP/IP is the protocol suite actually used for network communication in the real world. It focuses on what kind of program should be developed to implement the protocols on a computer.

Differences between OSI and TCP/IP

  1. OSI introduced the concepts of service, interface, protocol, and layering. TCP/IP borrowed these concepts from OSI to establish the TCP/IP model.
  2. OSI defined the model first and the protocols afterwards: the standard came before the practice.
  3. TCP/IP had protocols and applications first and proposed its model afterwards, with reference to the OSI model.
  4. OSI is a theoretical model, while TCP/IP has been widely used and has become the de facto standard for network interconnection.

After introducing the macroscopic TCP/IP protocol suite, let us now enter the world of the network from top to bottom.

2 Application Layer HTTP

2.1 A brief introduction to HTTP

2.1.1 HTTP Definition

HTTP stands for HyperText Transfer Protocol. It is the agreement and specification for transferring hypertext data such as text, pictures, audio, and video between any two points in the computer world.


2.1.2 URI, URN, URL

URI: Uniform Resource Identifier, which identifies any available resource on the web. URI is just a concept; how it is implemented does not matter, as long as it identifies a resource.

URN: Uniform Resource Name, which identifies a resource by a unique name or ID within a specific namespace.

URL: Uniform Resource Locator. A URL is a subset of URI: it not only identifies a resource but also tells you how to access it. A standard URL includes a protocol, host, port, and path (the port may be omitted when it is the protocol's default).

URL Templates

  1. protocol: the protocol both parties use to communicate, such as http, ftp, or file.
  2. host: the domain name or IP address of the server.
  3. port: the port on that host where the service is exposed.
  4. path: the location of the resource on the server, usually a file or directory.
  5. query: optional parameters in key=value format, separated by &.
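As a small illustration of these components, the sketch below splits a URL with Python's standard urllib.parse module. The URL itself is a made-up example (192.0.2.10 is a documentation-only address), and the dictionary keys simply mirror the list above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL used only for illustration.
url = "http://192.0.2.10:8080/docs/index.html?lang=en&page=2"

parts = urlsplit(url)
components = {
    "protocol": parts.scheme,         # "http"
    "host": parts.hostname,           # "192.0.2.10"
    "port": parts.port,               # 8080
    "path": parts.path,               # "/docs/index.html"
    "query": parse_qs(parts.query),   # key -> list of values
}
print(components)
```

Note that parse_qs returns a list for each key, because the same key may appear more than once in a query string.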

An example of the relationship between the three:

  1. You want to find a person. The person is the resource, and anything that identifies that person is a URI.
  2. If you search by ID number + name, that is a URN. ID number + name identifies the person but says nothing about where the person is.
  3. If you use an address such as "Resident of Room XX, Unit XX, District XX, City XX, Province XX", that is a URL, which not only identifies the person but also locates them.

2.2 HTTP Message Format

Both request and response messages consist of four parts: start line, header, blank line, and entity, but the start line is slightly different.

2.2.1 Request

Request message format

2.2.1.1 Request Line

The request line consists of three parts: request method, URL, and protocol version. They are separated by spaces, and the request line ends with a carriage return + a line feed.

Request method: indicates what operation you want to perform on the target resource. HTTP/1.1 defines eight request methods (GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, and TRACE), of which GET and POST are the most commonly used.

URL: specifies the target address for this visit.

Protocol version: specifies the HTTP version the client uses for this request. The HTTP versions in common use today are HTTP/1.1, HTTP/2, and HTTP/3. If the requester specifies HTTP/1.1, the responder also replies using HTTP/1.1.

2.2.1.2 Request Header

The request header informs the server of additional information about the request and about the client itself. Each request header is a key-value pair, with the key and value separated by a colon. Each request header occupies its own line, ending with a carriage return and a line feed. Among all the request headers, only Host is required in HTTP/1.1; the others are optional. Common request headers include Host, User-Agent, Accept, Accept-Encoding, Connection, Content-Type, Content-Length, and Cookie.

2.2.1.3 Blank lines

It contains only a carriage return and a line feed, and nothing else. This blank line is used to mark the end of the request header, and it is required.

2.2.1.4 Request Body

Generally, it is a user-defined information body, and the type can be specified through Content-Type in the message header.

2.2.1.5 Request Example

Request Sample
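As a rough illustration of the format described above, the following sketch assembles a request message by hand. The path /login, the host example.com, and the form body are made-up values, and build_request is a hypothetical helper written for this example, not part of any library.

```python
CRLF = "\r\n"


def build_request(method, target, headers, body=b""):
    """Assemble request line + headers + blank line + body into raw bytes."""
    request_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; the extra CRLF is the mandatory blank line.
    head = CRLF.join([request_line, *header_lines]) + CRLF + CRLF
    return head.encode("ascii") + body


body = b"name=sowhat&age=18"
raw = build_request(
    "POST",
    "/login",
    {
        "Host": "example.com",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),
    },
    body,
)
print(raw.decode("ascii"))
```

The printed text shows the four parts in order: the request line, one header per line, an empty line, and finally the body.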

2.2.2 Response

Response message format

2.2.2.1 Response Line

The response line specifies the HTTP version, the response status code, and a short reason phrase describing the result.

2.2.2.2 Response Header

Response headers have the same key-value format as request headers. The blank line and message body are also almost the same as in a request, and the message body type is specified by Content-Type.

2.2.2.3 Response Example

Sample response
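To make the response format concrete, here is a minimal parsing sketch. The raw response is a made-up example, and parse_response is a hypothetical helper that handles only the simple case of a complete message with a Content-Length header; it is not a full HTTP parser.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into its four parts."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Response line: version, status code, reason phrase.
    version, status, reason = lines[0].split(" ", 2)

    # Header lines: "Name: value", names are case-insensitive.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    # Content-Length says how many body bytes belong to this message.
    length = int(headers.get("content-length", len(rest)))
    return version, int(status), reason, headers, rest[:length]


raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
version, status, reason, headers, body = parse_response(raw_response)
print(version, status, reason, headers["content-type"], body)
```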

2.3 HTTP Header Fields

The HTTP protocol specifies a large number of header fields that can implement a variety of functions, but they can basically be divided into the following four categories:

  1. Common fields: can appear in both request headers and response headers.
  2. Request field: can only appear in the request header to further describe the request information or additional conditions.
  3. Response field: can only appear in the response header, and supplements the information of the response message.
  4. Entity fields: They are actually common fields, but they specifically describe the extra information of the body.

By setting HTTP header fields, HTTP provides the following important functions:

  1. Content negotiation: The client and the server agree on the content of the response resource, such as language, character set, encoding method, and compression type.
  2. Cache management: Based on resource characteristics, you can decide whether to cache resources to the client. Pay attention to the differences between max-age, no-cache, no-store, and must-revalidate.
  3. Entity type: Get the MIME type of the request and response by parsing Content-Type.
  4. Connection management: long and short connections are realized by reading configuration parameters.
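The cache directives mentioned in item 2 can be illustrated with a small sketch. This is a deliberately simplified model, assuming only the Cache-Control header is consulted; both function names are hypothetical, and real caches apply many more rules.

```python
def parse_cache_control(value: str) -> dict:
    """Turn 'max-age=3600, must-revalidate' into a directive dictionary."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument are stored with the value None.
        directives[name.strip()] = arg.strip().strip('"') or None
    return directives


def can_reuse_without_revalidation(directives: dict, age_seconds: int) -> bool:
    """Simplified freshness check for a stored response."""
    # no-store: never keep a copy. no-cache: a copy may be kept,
    # but it must be revalidated with the server before each reuse.
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    # max-age=N: the copy is fresh for N seconds after it was generated.
    return max_age is not None and age_seconds < int(max_age)


policy = parse_cache_control("max-age=3600, must-revalidate")
print(policy)
print(can_reuse_without_revalidation(policy, 10))    # True
print(can_reuse_without_revalidation(policy, 4000))  # False
```

In this model must-revalidate adds nothing, because a stale copy is never reused anyway; in a real cache it forbids serving a stale copy when the server cannot be reached.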

2.4 HTTPS and HTTP

HTTP is transmitted in plain text, which poses the following risks:

  1. Eavesdropping risk (confidentiality): the content of communications can be read by anyone on the communication link.
  2. Tampering risk (integrity): content can be modified in transit, for example by injecting spam ads.
  3. Impersonation risk (authentication): for example, a fake website posing as a shopping site such as Taobao.

2.4.1 SSL/TLS Overview


In order to ensure security, HTTPS came into being. HTTPS adds SSL/TLS encryption protocol between HTTP and TCP layers to solve the above three problems.

  1. The confidentiality of information is achieved through hybrid encryption.
  2. Integrity is achieved with a digest algorithm, which generates a unique fingerprint for the data.
  3. Putting the server public key into the digital certificate eliminates the risk of impersonation.

Please note that the default port for HTTP is 80, while the default port for HTTPS is 443.

2.4.2 Encryption Algorithm

Encryption algorithms are divided into symmetric encryption and asymmetric encryption.

  1. Symmetric encryption: encryption and decryption use the same key. It is fast, but the key must be kept secret, and there is no safe way to hand the key to the other party over an insecure channel. Common algorithms include AES, DES, RC4, and Blowfish.
  2. Asymmetric encryption: uses two keys, a public key and a private key. The public key can be distributed freely while the private key is kept secret. This solves the key exchange problem but is slow. Deriving the public key from the private key is a one-way process, which protects the private key. Common asymmetric algorithms include RSA, DSA, and Diffie-Hellman.

HTTPS uses symmetric encryption + asymmetric encryption = hybrid encryption:

  1. Asymmetric encryption is used to exchange keys before communication is established, and asymmetric encryption is no longer used afterwards.
  2. During the communication process, all plaintext data is encrypted using symmetric encryption session keys.

2.4.3 Digest Algorithm

The main feature of a digest algorithm is that it needs no key and that its output cannot be decrypted back into the original data. CRC32 is sometimes mentioned as a reversible exception, but it is a checksum for detecting accidental errors rather than a cryptographic digest. The same digest is produced only when the same input is passed through the same digest algorithm.

Message digest algorithms are mainly used in digital signatures, where the digest of the plaintext is what gets signed. Well-known digest algorithms include MD5 (designed by Ron Rivest of RSA), SHA-1, and their successors such as the SHA-2 family. MD5 and SHA-1 are no longer considered collision-resistant.

Verify integrity

  1. The client generates a digest of the plaintext data using the specified digest algorithm.
  2. The plaintext data + digest are encrypted with the public key and transmitted.
  3. After receiving the message, the server decrypts it with the private key to obtain the plaintext + digest.
  4. The server generates a digest of the plaintext using the same digest algorithm.
  5. The server compares the digest it computed with the one sent by the client. If they match, the data arrived intact.
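The digest comparison at the heart of these steps can be sketched with Python's standard hashlib module. The encryption in steps 2 and 3 is left out, SHA-256 is chosen here only as an example algorithm, and the message text and helper names are made up.

```python
import hashlib
import hmac


def make_digest(data: bytes) -> str:
    """Digest of the plaintext, as computed by the sender."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Receiver side: recompute the digest and compare the two values."""
    return hmac.compare_digest(make_digest(data), expected_digest)


message = b"transfer 100 yuan to Alice"
digest = make_digest(message)

print(is_intact(message, digest))                        # True
print(is_intact(b"transfer 900 yuan to Alice", digest))  # False
```

Changing even one byte of the message produces a different digest, which is how the receiver detects tampering.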

2.4.4 CA Certificate

In asymmetric encryption, the client holds the server's public key, so making sure that this key is genuine is a hard problem. If an attacker intercepts the server's public key and hands the client a substitute, neither the client nor the server can detect the third party during the entire data transmission, yet the information has already been leaked!

Asymmetric encryption information leakage

The key to the problem is how to ensure that the public key the client receives really belongs to the server. This is where the digital certificate comes in. It relies on the property described above: data encrypted (signed) with a private key can be decrypted with the matching public key, which verifies the signer's identity.

The CA ensures that the public key is transmitted correctly

  1. A CA (certificate authority) is a trusted certificate-issuing organization; only a handful of companies worldwide are widely trusted. The CA uses RSA to generate a public/private key pair.
  2. Server public key content + issuer ID + subject to whom the certificate is issued + validity period + other information = plain text content P.
  3. P is converted into H1 by a hash algorithm, and H1 is encrypted with the CA's private key using RSA to obtain the signature S.
  4. P + S = Digital Certificate.
  5. After the client obtains the digital certificate, it hashes P with the same hash algorithm to obtain H2.
  6. The client decrypts S with the CA public key to obtain H3.
  7. The client compares H2 and H3. If they are the same, the certificate is valid. If they differ, either P has been modified or the certificate was not issued by the CA.
  8. Once the certificate is verified, the client can trust the server public key contained in P. Done!
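The signing and checking steps above can be imitated with textbook RSA in plain Python. This is a toy sketch and must not be used for real security: the key is far too small, there is no padding, and the certificate content is a made-up byte string. The names H1, H2, H3, P, and S follow the list above; everything else is hypothetical.

```python
import hashlib

# Toy CA key pair built from two well-known Mersenne primes.
P_PRIME = 2**61 - 1
Q_PRIME = 2**89 - 1
N = P_PRIME * Q_PRIME                          # public modulus
E = 65537                                      # CA public exponent
D = pow(E, -1, (P_PRIME - 1) * (Q_PRIME - 1))  # CA private exponent (Python 3.8+)


def digest_as_int(content: bytes) -> int:
    """Hash the content and reduce it so it fits the toy modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def issue_certificate(content: bytes):
    """CA side: hash P to H1, then sign H1 with the private key to get S."""
    h1 = digest_as_int(content)
    signature = pow(h1, D, N)
    return content, signature                  # certificate = P + S


def verify_certificate(content: bytes, signature: int) -> bool:
    """Client side: compare H2 (own hash of P) with H3 (S opened with the public key)."""
    h2 = digest_as_int(content)
    h3 = pow(signature, E, N)
    return h2 == h3


p_content = b"server-public-key|issuer=DemoCA|subject=example.com|valid=1y"
cert_content, cert_signature = issue_certificate(p_content)

print(verify_certificate(cert_content, cert_signature))                # True
print(verify_certificate(cert_content + b"tampered", cert_signature))  # False
```

Only the holder of the CA private exponent can produce an S that opens to the right hash, which is why a matching H2 and H3 convinces the client that P is untouched.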

2.4.5 SSL/TLS establishment process

First, the TCP three-way handshake is performed, and then preparation for encrypted communication begins. Before encrypted communication starts, the client and server must exchange parameters and agree on keys. This process is called the SSL/TLS handshake. Its main workflow can be summarized in three stages: ClientHello, ServerHello, and Finished.

SSL/TLS establishment process

  • Client Request

The client initiates an encrypted communication request to the server: the client provides the SSL/TLS protocol version number + a random number Random1 generated by the client + the encryption method supported by the client.

  • Server Response

The server confirms whether it supports the SSL/TLS version, confirms the encryption algorithm to use, generates a random number Random2 (later used to generate the session key), and sends its server digital certificate.

  • Client Certificate Authentication
  1. The client verifies the server's digital certificate with the CA public key and extracts the server public key from it.
  2. The client generates a third random number, Random3 (the PreMaster Key), encrypts it with the server's public key, sends it to the server, and then notifies the server that it is switching to the agreed encryption algorithm.
  3. The server decrypts the message with its private key to recover Random3. Both sides now hold Random1, Random2, and Random3, and each derives the same Session Key from them with the agreed algorithm. This key is used for all subsequent encrypted communication.
  4. The client generates a digest of the preceding handshake messages and encrypts it with the Session Key. This is the first encrypted message sent by the client. If the server can decrypt it and the digest matches, the keys negotiated by both sides are consistent.
  • Server last response
  1. The server receives Random3 and the final encryption algorithm, and determines the Session Key.
  2. The server informs the client that it is switching encryption as well, and will use the Session Key to encrypt all later messages.
  3. The server also generates a digest of the handshake messages and encrypts it with the Session Key. This is the first encrypted message sent by the server. If the client can decrypt it and the digest matches, the negotiated key is consistent on both sides.
  • Send data normally

At this point, both parties have securely negotiated the same secret key, and the SSL/TLS handshake phase is complete. All application layer data will be encrypted with this secret key and then reliably transmitted over TCP.
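The idea that both sides compute the same Session Key from Random1, Random2, and Random3 can be sketched as follows. The derivation shown here (one HMAC-SHA256 call with a made-up label) is a simplified stand-in chosen for this example; it is not the key derivation function that real TLS uses.

```python
import hashlib
import hmac
import os


def derive_session_key(random1: bytes, random2: bytes, pre_master: bytes) -> bytes:
    """Simplified stand-in for the TLS key derivation (NOT the real TLS PRF)."""
    return hmac.new(pre_master, b"session key" + random1 + random2, hashlib.sha256).digest()


random1 = os.urandom(32)  # generated by the client, sent in the clear
random2 = os.urandom(32)  # generated by the server, sent in the clear
random3 = os.urandom(48)  # generated by the client, sent encrypted (PreMaster Key)

# Each side runs the same computation on the same three values.
client_key = derive_session_key(random1, random2, random3)
server_key = derive_session_key(random1, random2, random3)
print(client_key == server_key)  # True
```

An eavesdropper sees Random1 and Random2 but not Random3, so it cannot repeat the computation.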

2.4 HTTP Development History

Currently, HTTP versions are divided into three versions: HTTP/1.1, HTTP/2, and HTTP/3, and the first two are the mainstream ones.

HTTP Version Comparison

2.4.1 HTTP/1.1

HTTP/1.1 has the following advantages and disadvantages compared to the old version:

Advantages:

  1. Persistent (long-lived) connections replace short connections by default, avoiding the overhead of opening a new TCP connection for every request.
  2. Pipelining: several requests can be sent over one connection without waiting for each response. For example, when sending A, B, and C, the client does not need to wait for the response to A before sending B.

Disadvantages:

  1. Request and response headers are sent uncompressed; only the body can be compressed.
  2. Redundant header information is sent back and forth with every message.
  3. It suffers from head-of-line blocking.
  4. Responses are handled in FIFO order, with no concept of priority.
  5. Only the client can initiate a request; the server can only respond.

2.4.2 HTTP/2

HTTP/2 is, in practice, deployed on top of HTTPS. It keeps the semantics of HTTP/1.1 and adds the following optimizations.

  1. Header compression: the HPACK algorithm is introduced. The client and the server each maintain a header table in which header fields are stored. When a header is repeated in later messages, its original value is no longer sent; only its index number is sent.
  2. Binary transmission: the new version uses a more computer-friendly binary format, and data is transmitted in frames.
  3. Stream priority: the packets of different requests and responses are distinguished by stream, each stream has its own number, and a priority can be assigned to it.
  4. Multiplexing: multiple streams in one connection can send and receive request and response frames at the same time. The frames within each stream are transmitted and assembled in order. Streams are independent of each other, so whichever request is processed first can have its response sent first.
  5. Server push: the server can proactively push static resources such as JS and CSS that the client is likely to need.
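The indexing idea behind header compression (item 1) can be shown with a toy table. This is a heavily simplified sketch, not the real HPACK encoding: it has no static table, no Huffman coding, and no size limits, and the class name and header values are made up.

```python
class HeaderTable:
    """Toy header table: a field seen before is replaced by its index."""

    def __init__(self):
        self._index_of = {}  # (name, value) -> index
        self._entries = []   # index -> (name, value)

    def _remember(self, field):
        self._index_of[field] = len(self._entries)
        self._entries.append(field)

    def encode(self, headers):
        encoded = []
        for field in headers:
            if field in self._index_of:
                encoded.append(self._index_of[field])  # send only the index
            else:
                self._remember(field)
                encoded.append(field)                  # first time: send in full
        return encoded

    def decode(self, encoded):
        headers = []
        for item in encoded:
            if isinstance(item, int):
                headers.append(self._entries[item])    # look the index up
            else:
                self._remember(item)                   # keep tables in sync
                headers.append(item)
        return headers


sender, receiver = HeaderTable(), HeaderTable()
request_headers = [("host", "example.com"), ("user-agent", "demo-client")]

first = sender.encode(request_headers)   # full fields
second = sender.encode(request_headers)  # [0, 1]
print(first, second)
print(receiver.decode(first) == request_headers)   # True
print(receiver.decode(second) == request_headers)  # True
```

Because both ends update their tables in the same order, the second message needs only two small integers instead of the full header text.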

Disadvantage:

Head-of-line blocking: HTTP/2 frames are an application layer construct, and the data must ultimately be transmitted through TCP, which is a reliable connection that retransmits lost packets. If a packet is lost, all HTTP requests on the connection wait until the lost packet has been retransmitted.

2.4.3 HTTP/3

HTTP/3 replaces TCP with UDP as the underlying transport, because UDP itself does not enforce ordering or retransmit lost packets. On top of UDP, Google added TCP-like connection management, a congestion window, flow control, and other mechanisms. This protocol is called QUIC. In general, the optimization points of HTTP/3 are as follows:

  1. QUIC has its own mechanism to ensure reliable transmission. When a packet belonging to one stream is lost, only that stream is blocked, and other streams are not affected.
  2. TLS is upgraded from version 1.2 to 1.3, and the header compression algorithm is upgraded to QPACK.
  3. Before HTTP/3, establishing a connection required a three-step TCP handshake followed by a three-step TLS handshake. QUIC combines these six steps into three.
  4. QUIC can be seen as TCP + TLS + HTTP/2-style multiplexing implemented on top of UDP.

2.5 HTTP Features

  • Flexible expansion

The great thing about HTTP is that it only specifies the basic framework of header + body, and users can customize what is filled in it. At the same time, its underlying components are all pluggable, such as the addition of SSL/TLS, binary frame transmission, UDP replacing TCP, etc.

  • Reliable transmission

Both TCP and QUIC ensure the reliability of data transmission.

  • Request-Reply Pattern

HTTP implements data transmission based on a request-response model.

  • Stateless

Each HTTP request-response is stateless, so every message sent and received is completely independent. To carry state across requests, such as a login, you need the Session and Cookie mechanisms.

  • Application layer protocols

HTTP is just a transfer protocol defined at the application layer. Underneath, it relies on TCP (or, for HTTP/3, on QUIC over UDP) to transmit data.

2.6 Common HTTP Status Codes

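HTTP status codes fall into five classes, identified by their first digit: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error).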
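A short sketch using Python's standard http.HTTPStatus enum shows how a code maps to its class. The CLASSES dictionary and the describe function are helpers made up for this example.

```python
from http import HTTPStatus

# The first digit of a status code selects its class.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    status = HTTPStatus(code)
    return f"{code} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 301, 404, 500):
    print(describe(code))
```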

3 Appendix

Only the application layer and transport layer of the TCP/IP protocol were briefly explained. The network layer will be discussed in the next article for a more detailed version of the TCP/IP protocol.

