HTTP comes up frequently in technical interviews. I have collected five categories of frequently asked HTTP interview questions, and these five categories closely follow the development and evolution of HTTP.
Below, I will walk you through the HTTP protocol step by step, from the basics to the deeper details, using a question-and-answer format with diagrams:
HTTP Basic Concepts

What is HTTP? In one sentence: HTTP is the HyperText Transfer Protocol.

Can you explain the Hypertext Transfer Protocol in more detail? The name can be broken down into three parts: protocol, transmission, and hypertext.
① Protocol. We can see "protocols" everywhere in daily life as well. For example, a fresh graduate may sign a three-party employment agreement, and someone looking for a place to live may sign a tenancy agreement.
The protocols in daily life are essentially the same as those in computers. A "protocol" has two defining characteristics: it involves two or more participants, and it lays down conventions and norms for the participants' behavior.
Regarding the HTTP protocol, we can understand it this way: HTTP is a protocol used in the computer world. It uses a language that computers can understand to establish a standard for communication between computers (two or more participants), along with the related control and error-handling methods (behavioral conventions and specifications).

② Transmission. "Transmission" is easy to understand: it means moving something from point A to point B, or from point B to point A. Don't underestimate this simple action; it contains at least two important pieces of information.

First, the HTTP protocol is a two-way protocol. When we surf the Internet, the browser is requester A and, say, the Baidu website is responder B. The two parties agree to communicate using the HTTP protocol, so the browser sends request data to the website, the website returns data to the browser, and finally the browser renders it on the screen so that you can see text, pictures, and videos.

Second, although data is transmitted between A and B, intermediaries are allowed to forward or relay it. Just as a student in the first row passing a note to a student in the last row needs it to go through many students (middlemen), the transmission pattern changes from "A <-> B" to "A <-> N <-> M <-> B".
In HTTP, a middleman is only required to comply with the HTTP protocol; it may add anything extra as long as it does not disturb the basic data transmission. With transmission covered, we can refine our understanding: HTTP is a convention and specification used in the computer world to transmit data between two points.

③ Hypertext. The content transmitted by HTTP is "hypertext". Let's first look at "text". In the early days of the Internet, text meant simple characters, but the meaning of "text" has since expanded: pictures, videos, compressed archives, and so on are all "text" in the eyes of HTTP. Now for "hypertext": it is text that goes beyond ordinary text, a mixture of text, pictures, videos, and more. Most importantly, it contains hyperlinks, which let you jump from one piece of hypertext to another. HTML is the most common hypertext. It is just a plain text file, but it uses tags to define links to pictures, videos, and so on; after the browser interprets it, what we see is a web page with text and images.

OK, having explained these three terms in detail, we can give a more accurate and technical answer than the bare phrase "Hypertext Transfer Protocol": HTTP is a "convention and specification" in the computer world for "transmitting" "hypertext" data such as text, images, audio, and video between "two points".

Is it correct to say that HTTP is the protocol used to transfer hypertext from Internet servers to local browsers? This statement is incorrect. HTTP can also be used "server <-> server", so it is more accurate to describe it as transmission between two points.

What are the common HTTP status codes?

Five major categories of HTTP status codes

1xx: 1xx status codes are informational messages, an intermediate state in protocol processing. They are rarely used in practice.
2xx: 2xx status codes indicate that the server successfully processed the client's request; this is the status we most want to see.

"200 OK" is the most common success status code, indicating that everything is normal. For a non-HEAD request, the response returned by the server carries body data.

"204 No Content" is also a common success status code. It is basically the same as 200 OK, except that the response carries no body data.

"206 Partial Content" is used for chunked or resumable HTTP downloads. It indicates that the body returned in the response is not the entire resource but only a part of it; it is also a successful-processing status.

3xx: 3xx status codes indicate that the resource requested by the client has changed and the client needs to resend the request with a new URL, i.e., a redirect.

"301 Moved Permanently" means permanent redirection: the requested resource is no longer at this URL and must be accessed using a new one.

"302 Found" means temporary redirection: the requested resource still exists, but it temporarily needs to be accessed through another URL.

Both 301 and 302 use the Location field in the response header to indicate the URL to redirect to, and the browser will automatically jump to the new URL.

"304 Not Modified" does not indicate a jump; it means the resource has not been modified, and it redirects the client to its existing cached copy (hence also known as cache redirection). It is used for cache control.

4xx: 4xx status codes indicate that the message sent by the client is incorrect and the server cannot process it; these are client error codes.

"400 Bad Request" means that there is an error in the client's request message, but it is only a generic error.

"403 Forbidden" means that the server forbids access to the resource; it does not mean the client's request is malformed.
"404 Not Found" means that the requested resource does not exist on the server, so it cannot be provided to the client.

5xx: 5xx status codes indicate that the client's request was correct, but an internal error occurred while the server processed it; these are server error codes.

"500 Internal Server Error" is a generic error code, similar in spirit to 400: we don't know what error occurred on the server.

"501 Not Implemented" means that the functionality requested by the client is not yet supported, similar in meaning to "coming soon, stay tuned".

"502 Bad Gateway" is usually returned when the server acts as a gateway or proxy: the server itself is working normally, but an error occurred while it was accessing the backend server.

"503 Service Unavailable" means that the server is currently busy and temporarily unable to respond, similar to "the service is busy, please try again later".

What are the common fields in HTTP?

① Host field. When the client sends a request, this field specifies the server's domain name, for example:

Host: www.A.com
With the Host field, requests can be directed to different websites hosted on the "same" server.

② Content-Length field. When the server returns data, the Content-Length field indicates the length of the data in this response, for example:

Content-Length: 1000
As shown above, this tells the browser that the data of this response is 1000 bytes long, and any bytes after that belong to the next response.

③ Connection field. The Connection field is most often used by the client to ask the server to use a persistent TCP connection, so that the connection can be reused by other requests. HTTP/1.1 connections are persistent by default, but to stay compatible with older versions of HTTP, the Connection header field needs to be set to Keep-Alive:

Connection: Keep-Alive
A reusable TCP connection is then established, and it stays open until the client or server actively closes it. Note, however, that this is not a standard field.

④ Content-Type field. The Content-Type field is used by the server to tell the client the format of the data in the response, for example:

Content-Type: text/html; charset=utf-8
The type above indicates that a web page is being sent and that its encoding is UTF-8. When making a request, the client can use the Accept field to declare which data formats it can accept:

Accept: */*
Here the client declares that it can accept data in any format.

⑤ Content-Encoding field. The Content-Encoding field describes the data compression method; it indicates which compression format the server used for the returned data, for example:

Content-Encoding: gzip
This indicates that the data returned by the server is compressed with gzip, telling the client to decompress it accordingly. When making a request, the client uses the Accept-Encoding field to state which compression methods it accepts:

Accept-Encoding: gzip, deflate
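To tie the fields above together, here is a minimal, illustrative sketch in Python (not a real HTTP client or parser; the host name and values are made up) that composes a request carrying these fields and reads them back out of a raw response:

```python
# A minimal sketch: composing a request with the header fields discussed
# above, and reading the fields back from a raw HTTP response message.

def build_request(method, path, headers):
    """Serialize a request line plus header fields into an HTTP/1.1 message."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_headers(raw):
    """Split a raw HTTP message into (start line, dict of header fields)."""
    head, _, _ = raw.partition("\r\n\r\n")
    start, *field_lines = head.split("\r\n")
    fields = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return start, fields

request = build_request("GET", "/index.html", {
    "Host": "www.example.com",         # which site on the server we want
    "Connection": "Keep-Alive",        # ask to reuse the TCP connection
    "Accept": "*/*",                   # we accept any data format
    "Accept-Encoding": "gzip, deflate" # compression methods we understand
})

response = ("HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html; charset=utf-8\r\n"
            "Content-Length: 1000\r\n"
            "Content-Encoding: gzip\r\n"
            "\r\n")
status_line, fields = parse_headers(response)
print(status_line)               # HTTP/1.1 200 OK
print(fields["Content-Length"])  # 1000
```

A real client would of course use an existing HTTP library; the point here is only that an HTTP/1.1 message is plain "start line + key-value header fields + blank line + body" text.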
GET and POST

What is the difference between GET and POST? The GET method requests a resource from the server; the resource can be static text, a page, a picture, a video, and so on. For example, when you open this article, the browser sends a GET request to the server, and the server returns all the text and resources of the article.

GET request

The POST method is the opposite: it submits data to the resource specified by the URI, with the data placed in the body of the message. For example, if you type a comment at the bottom of this article and click "Submit", the browser performs a POST request, puts your comment text into the message body, assembles the POST request header, and sends it to the server over TCP.

POST request

Are the GET and POST methods safe and idempotent? Let's first define safety and idempotence: in HTTP semantics, a "safe" method does not modify resources on the server, and an "idempotent" method yields the same result no matter how many times it is executed.
It is then obvious that the GET method is safe and idempotent, because it is a "read-only" operation: no matter how many times it is performed, the data on the server is untouched and the result is the same every time. POST is an operation that "adds or submits data", which modifies resources on the server, so it is not safe; and submitting the same data multiple times creates multiple resources, so it is not idempotent.

HTTP Features

What advantages of HTTP/1.1 do you know, and how do they show? The most prominent advantages of HTTP are "simplicity, flexibility and easy extensibility, and wide, cross-platform adoption".

① Simple. The basic HTTP message format is header + body, and the header is simple key-value text, which is easy to understand and lowers the barrier to learning and use.

② Flexible and easy to extend. The components of the HTTP protocol, such as the request methods, URI/URL, status codes, and header fields, are not fixed; developers are allowed to customize and extend them. At the same time, since HTTP works at the application layer (OSI layer 7), the layers below it can change freely: HTTPS adds an SSL/TLS security layer between HTTP and TCP, and HTTP/3 even replaces the TCP layer with UDP-based QUIC.

③ Widely used and cross-platform. Over the development of the Internet, HTTP has come to be used everywhere, from desktop browsers to all kinds of mobile apps, from reading news and browsing forums to shopping, personal finance, and playing PUBG. HTTP applications are flourishing, and cross-platform support comes naturally.

What about its disadvantages? The HTTP protocol is a double-edged sword: "statelessness" and "plaintext transmission" each have both advantages and disadvantages, and HTTP also has the major drawback of being "insecure".
① Statelessness is a double-edged sword. The advantage is that since the server does not remember HTTP state, no extra resources are needed to record state information, which lightens the server's load and leaves more CPU and memory for serving requests. The downside is that since the server has no memory, completing related operations becomes troublesome. For example, login → add to cart → place order → checkout → pay: this whole series of operations needs to know the user's identity, but the server does not know these requests are related and has to ask for identity information every single time. If every operation requires re-verification, can the shopping experience still be pleasant?

There are many solutions to the statelessness problem; one of the simpler ones is Cookie technology. Cookies track client state by writing cookie information into request and response messages. It is as if, after the client's first request, the server hands out a "small sticker" carrying the client's information; on subsequent requests the client brings the "small sticker" along, and the server recognizes it.

Cookie Technology

② Plaintext transmission is a double-edged sword. Plaintext means the information in transit is human-readable: it can be viewed directly through the browser's F12 console or a Wireshark packet capture, which greatly helps debugging. But for the same reason, all HTTP information is exposed in broad daylight; during its long journey the content has no privacy at all and can easily be stolen. If your account and password are in there, your account can be compromised.
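The "small sticker" round trip can be sketched with Python's standard-library cookie parser (the cookie name and value here are made up for illustration):

```python
from http.cookies import SimpleCookie

# 1) First response: the server issues the "small sticker" via a
#    Set-Cookie header. "session_id" is a hypothetical session identifier.
issued = SimpleCookie()
issued["session_id"] = "abc123"
set_cookie_value = issued["session_id"].OutputString()  # "session_id=abc123"

# 2) Follow-up request: the client echoes the sticker back in a Cookie
#    header (a real browser does this automatically).
cookie_header = set_cookie_value

# 3) The server parses the Cookie header and recognizes the client,
#    compensating for HTTP's statelessness.
received = SimpleCookie()
received.load(cookie_header)
session_id = received["session_id"].value
print(session_id)  # abc123
```

In practice the server would look this identifier up in its own session store to recover the user's state; the protocol itself still remembers nothing.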
③ Insecure. The most serious disadvantages of HTTP are its security weaknesses: communication uses plaintext (no encryption), so content may be eavesdropped; the identities of the communicating parties are not verified, so you may end up talking to an impostor; and the integrity of messages cannot be proven, so they may be tampered with.
The security issues of HTTP can be addressed with HTTPS, that is, by introducing the SSL/TLS layer.

So what do you think of the performance of HTTP/1.1? The HTTP protocol is based on TCP/IP and uses the "request-response" communication model, so the key to performance lies in these two points.

Long connections: a big performance problem of early HTTP/1.0 was that each request required a new TCP connection (three-way handshake), and requests were serial, causing unnecessary TCP setup and teardown and increasing communication overhead. To solve this, HTTP/1.1 introduced long-connection communication, also called persistent connections. This reduces the extra overhead of repeatedly establishing and closing TCP connections and lightens the load on the server. The characteristic of a persistent connection is that the TCP connection stays open as long as neither end explicitly requests to disconnect.

Short connection and long connection

Pipelined transmission: the long-connection model of HTTP/1.1 makes pipelining possible. In the same TCP connection, the client can issue multiple requests: as soon as the first request is sent, it can send the second without waiting for the first response, which reduces overall response time. For example, suppose the client needs two resources. Previously it would send request A over the TCP connection, wait for the server's response, and only then send request B. Pipelining lets the browser send requests A and B together.

Pipeline network transmission

However, the server still responds to request A first, in order, and only responds to request B after A is complete.
If the earlier response is particularly slow, many requests will queue up behind it. This is known as "head-of-line blocking".

Head-of-line blocking: the "request-response" model exacerbates HTTP's performance problems, because when one request in a sequence of serially sent requests is blocked for any reason, all the requests queued behind it are blocked too, leaving the client unable to get its data. It is like a traffic jam on the way to work.

Head of Line Blocking

In short, HTTP/1.1's performance is mediocre, and the later HTTP/2 and HTTP/3 are aimed precisely at optimizing it.

HTTP vs HTTPS

What are the differences between HTTP and HTTPS?

1. HTTP messages are transmitted in plaintext, which poses security risks; HTTPS adds the SSL/TLS security protocol between HTTP and TCP so that messages are transmitted encrypted.
2. An HTTP connection is ready to exchange messages right after the TCP three-way handshake; HTTPS additionally requires an SSL/TLS handshake before encrypted communication can begin.
3. Their default ports differ: 80 for HTTP and 443 for HTTPS.
4. An HTTPS server must apply to a CA (Certificate Authority) for a digital certificate to prove that the server's identity is trustworthy.
What problems does HTTPS solve for HTTP? Because HTTP transmits in plaintext, it carries the following three security risks: eavesdropping, such as account information being captured on unencrypted links; tampering, such as junk ads being forcibly injected into pages in transit; and impersonation, such as fake shopping sites posing as the real ones.
HTTPS adds the SSL/TLS protocol between the HTTP and TCP layers.

HTTP vs HTTPS

The risks above can then be well addressed: hybrid encryption provides confidentiality and defeats eavesdropping; a digest algorithm gives each message a unique "fingerprint" for integrity checking and defeats tampering; and placing the server's public key in a digital certificate defeats impersonation.
It can be seen that, as long as the endpoints themselves do no "evil", the SSL/TLS protocol can ensure the security of communication. How does HTTPS solve the above three risks?
① Hybrid encryption. Confidentiality is guaranteed through hybrid encryption, eliminating the risk of eavesdropping.

Hybrid Encryption

HTTPS uses a "hybrid encryption" scheme combining symmetric and asymmetric encryption: asymmetric encryption is used before communication begins, to exchange the "session key"; after that, all communication is encrypted symmetrically with that session key.
The reasons for using "hybrid encryption": symmetric encryption uses a single key and is fast, but the key must somehow be shared securely; asymmetric encryption uses a public/private key pair, which solves the key-exchange problem, but it is slow. Combining the two gets the best of both.
② Digest algorithm. A digest algorithm provides integrity: it can generate a unique "fingerprint" for the data, which is used to verify that the data is intact and eliminates the risk of tampering.

Verify integrity

Before sending the plaintext, the client computes its "fingerprint" with a digest algorithm, then encrypts "fingerprint + plaintext" into ciphertext and sends it to the server. After decrypting, the server runs the same digest algorithm over the received plaintext and compares the freshly computed "fingerprint" with the one the client sent; if they match, the data is intact.

③ Digital certificate. Suppose the client requests the public key from the server and encrypts information with it, and the server decrypts the ciphertext with its private key. This raises a problem: how do we ensure the public key has not been tampered with and is trustworthy? We therefore rely on a third-party authority, a CA (Certificate Authority), and put the server's public key inside a digital certificate issued by the CA. As long as the certificate is trustworthy, the public key is trustworthy.

Digital Certificate Workflow

Digital certificates thus vouch for the identity behind the server's public key and eliminate the risk of impersonation.

How does HTTPS establish a connection? What happens during the interaction? The basic process of the SSL/TLS protocol: first, the client requests and verifies the server's public key; second, the two parties negotiate to generate a "session key"; third, the two parties communicate using encryption with that session key.
The first two steps constitute the establishment of SSL/TLS, i.e., the handshake phase. The SSL/TLS "handshake phase" involves four communications, as shown in the following figure:

HTTPS connection establishment process

The detailed process of establishing SSL/TLS:

① ClientHello. First, the client initiates an encrypted-communication request to the server, the ClientHello request. In this step, the client mainly sends the server: the SSL/TLS protocol version it supports; a client-generated random number (Client Random), used later in producing the "session key"; and a list of the cipher suites it supports, such as the RSA encryption algorithm.
② ServerHello. After receiving the client's request, the server sends a response to the client, the ServerHello. The server's response contains: confirmation of the SSL/TLS protocol version (if the client's version is not supported, the server closes the encrypted communication); a server-generated random number (Server Random), also used later in producing the "session key"; the confirmed cipher suite; and the server's digital certificate.
③ Client response. After receiving the server's response, the client first verifies the authenticity of the server's digital certificate using the CA public keys built into the browser or operating system. If the certificate checks out, the client extracts the server's public key from it, encrypts its message with that key, and sends the server: one more random number (the pre-master key), which the server can decrypt with its private key; a change-cipher-spec notice, indicating that subsequent messages will be encrypted with the "session key"; and a client-handshake-finished notice, which includes a digest of all previous handshake messages for the server to verify.
The random number in the first item above is the third random number of the whole handshake phase. With it, the server and the client both hold the same three random numbers, from which each side generates the "session key" for this communication using the encryption algorithm agreed upon by both parties.

④ Final response from the server. After receiving the third random number (the pre-master key) from the client, the server computes the "session key" for this communication using the negotiated algorithm, and then sends the client its final messages: a change-cipher-spec notice, indicating that subsequent messages will be encrypted with the "session key"; and a server-handshake-finished notice, which includes a digest of all previous handshake content for the client to verify.
At this point, the entire SSL/TLS handshake phase is over. The client and server then enter encrypted communication, using the ordinary HTTP protocol but encrypting the content with the "session key".

HTTP/1.1, HTTP/2, HTTP/3 evolution

What performance improvements does HTTP/1.1 have over HTTP/1.0? HTTP/1.1's improvements over HTTP/1.0 are: long (persistent) connections, which reduce the extra overhead of repeatedly establishing and tearing down TCP connections; and pipelined transmission, where the client may send the next request without waiting for the previous response to come back, reducing overall response time.
But HTTP/1.1 still has performance bottlenecks: request and response headers are sent uncompressed, and headers are often long and repetitive (only the body can be compressed); the server responds strictly in the order requests arrive, so one slow response blocks everything behind it (head-of-line blocking); there is no request prioritization; and communication can only be initiated by the client — the server cannot push.
What optimizations does HTTP/2 make to address the HTTP/1.1 bottlenecks above? The HTTP/2 protocol runs on top of HTTPS, so HTTP/2's security is also guaranteed. Its performance improvements over HTTP/1.1 are:

① Header compression. HTTP/2 compresses headers. If multiple requests are sent together with identical or similar headers, the protocol eliminates the duplicated parts for you. This is the HPACK algorithm: a header table is maintained on both the client and the server, fields are stored in this table with generated index numbers, and from then on the same field is never sent again; only its index number is sent, which increases speed.

② Binary format. HTTP/2 is no longer a plaintext message format like HTTP/1.1 but fully adopts binary: both headers and body are binary, collectively called frames (header frames and data frames).

Message Difference

Although this is unfriendly to humans, it is very friendly to computers, which only understand binary. A receiver no longer needs to convert a plaintext message into binary; it parses the binary message directly, which increases the efficiency of data transmission.

③ Data streams. HTTP/2 packets are not sent in strict order: consecutive packets in the same connection may belong to different responses, so each packet must be marked to indicate which response it belongs to. All the packets of one request or response are called a data stream. Each stream is marked with a unique number: streams initiated by the client get odd numbers, and streams initiated by the server get even numbers. The client can also assign priorities to data streams, and the server responds to the highest-priority requests first.
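The stream-numbering idea can be illustrated with a toy sketch (this is not HTTP/2's real binary framing; the frame contents are made up). Frames from different responses arrive interleaved on one connection, and the receiver regroups them by stream id:

```python
from collections import defaultdict

# Interleaved frames on a single connection, each tagged with its stream id.
# Client-initiated streams use odd ids (1, 3, ...); server-initiated streams
# would use even ids.
frames = [
    (1, "<html>"),    # part of response for stream 1
    (3, '{"ok":'),    # part of response for stream 3
    (1, "</html>"),   # rest of stream 1
    (3, " true}"),    # rest of stream 3
]

# Demultiplex: group each frame's payload by its stream id.
streams = defaultdict(str)
for stream_id, chunk in frames:
    streams[stream_id] += chunk

print(streams[1])  # <html></html>
print(streams[3])  # {"ok": true}
```

Because every frame names its stream, the two responses can share the connection and still be reassembled correctly, regardless of arrival order.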
HTTP/1 ~ HTTP/2

④ Multiplexing. HTTP/2 allows multiple requests and responses to be sent concurrently over a single connection rather than strictly in sequence. The serial requests of HTTP/1.1 are gone: there is no more queuing, so the "head-of-line blocking" problem at the HTTP level disappears, which lowers latency and greatly improves connection utilization. For example, over one TCP connection the server receives requests from clients A and B. If it finds that A's request is time-consuming, it responds with the part of A it has already processed, then answers B's request, and afterwards sends the remainder of A's response.

Multiplexing

⑤ Server push. HTTP/2 also improves the traditional "request-response" model to some extent: the server is no longer purely passive and can actively send messages to the client. For example, when the browser requests only the HTML, the server can proactively send static resources such as the JS and CSS files that will likely be needed, reducing waiting time. This is server push (Server Push, also called Cache Push).

What are the defects of HTTP/2? What does HTTP/3 optimize? The main problem with HTTP/2 is that multiple HTTP requests share one TCP connection, and the underlying TCP protocol has no idea how many HTTP requests are in flight. So once a packet is lost, TCP's retransmission mechanism kicks in and all HTTP requests on that TCP connection must wait for the lost packet to be retransmitted: in HTTP/1.1 pipelining, if one request is blocked, all requests queued behind it are blocked too; in HTTP/2, where many requests share one TCP connection, the loss of a single packet stalls every request on that connection.
These are all problems rooted in the TCP transport layer, so HTTP/3 replaces the TCP protocol underneath HTTP with UDP!

HTTP/1 ~ HTTP/3

UDP does not care about ordering or packet loss, so there is neither the head-of-line blocking of HTTP/1.1 nor the all-requests-wait-for-retransmission problem of HTTP/2. Everyone knows UDP is unreliable, but the UDP-based QUIC protocol achieves reliable transmission similar to TCP: QUIC implements its own retransmission and congestion-control mechanisms to guarantee reliability; its streams are independent, so a lost packet affects only its own stream instead of blocking the others; and it merges the transport and TLS handshakes, establishing connections faster.
TCP HTTPS (TLS 1.3) and QUIC HTTPS

So QUIC can be seen as a pseudo "TCP + TLS + HTTP/2 multiplexing" protocol built on top of UDP. QUIC is a new protocol, however, and many network devices do not know what QUIC is, treating it as plain UDP, which creates new problems. As a result, HTTP/3 adoption is still very slow, and whether UDP-based transport will overtake TCP remains to be seen.