HTTP protocol interview challenges

I am an atypical interviewer. For the first question about the HTTP protocol, most interviewers ask which status codes are commonly used. I don't ask that. My question is: what does HTTP stand for? Say it to me in English!

What is the full name of HTTP?

HyperText Transfer Protocol, and don't mispronounce the words. Hypertext is text with markup tags and links, and the term originally referred to HTML. Today HTTP carries much more than HTML: it also transmits forms, JSON, XML, and files.

What are the commonly used HTTP status codes?

Most candidates know 200, 404, 500, and 302. If you don't even know 404, the interviewer will not be impressed. Why is 500 so common? Because there are always bugs during development: an unhandled exception is thrown and the server returns 500 Internal Server Error. If it is not a bug, the usual cause is that the database is down.

If I ask about a few more status codes, many people don't know them, because most companies' services do not use the standard HTTP status codes consistently. Many status codes never show up in their work, so candidates naturally don't know them.

  • 400 Bad Request Used for parameter validation failures, such as a missing parameter or a parameter of the wrong type.
  • 502 Bad Gateway When the backend service is down or overloaded, Nginx cannot hand the request to the backend in time, and a 502 error occurs. This is also very common. It is the error you often see when sites such as Zhihu or Douban are down.
  • 304 Not Modified Very few people know this one, because most back-end developers have little experience with front-end JavaScript development. Open a frequently visited website in Chrome and look at the static resources in the Network panel, and you will see many 304 status codes. It means the copy cached by the browser is still valid, so the server does not need to send the resource body again.
  • 401 Unauthorized The request lacks valid credentials. Despite the name, this is about authentication: for example, the user is not logged in or the token has expired.
  • 403 Forbidden Insufficient permissions. The resource exists but you are not allowed to access it. This error will also occur if your IP is blacklisted.
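The codes mentioned so far can be looked up with Python's standard `http.HTTPStatus` enum. The sketch below is only an illustration: the `describe` helper and the `CLASSES` table are names made up for this example, and the grouping simply uses the first digit of the code.

```python
from http import HTTPStatus

# The first digit of a status code tells you its class.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code: int) -> str:
    """Return 'code phrase (class)' for a standard status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"

if __name__ == "__main__":
    for code in (200, 302, 304, 400, 401, 403, 404, 500, 502):
        print(describe(code))
```

Remembering the class of a code (4xx is the client's fault, 5xx is the server's) is often more useful in an interview than reciting every number.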

In fact, there are many more status codes, but I haven't studied them carefully because I don't really use them in my work. If you are interested, the list on Wikipedia is a good place to continue.

What are the methods in HTTP?

  • GET No explanation is needed. If you don't know it, IT is probably not the industry for you.
  • POST Generally used to create or modify resources. In the RESTful convention, POST is only used to create resources and returns a 201 Created status code to indicate successful creation. However, most websites do not follow RESTful conventions strictly, and it is very common to use POST to modify resources.
  • PUT Used to modify resources. Strictly speaking, PUT replaces the whole resource with the request payload; changing a single attribute is what PATCH is for, although many APIs use PUT for both.
  • DELETE Used to delete resources.
  • HEAD Not commonly used. It is similar to GET, but the response contains only the HTTP headers and no body. It is generally used to obtain resource metadata, such as length and modification time.
  • OPTIONS I have never used it.
  • TRACE I have never used it.
  • CONNECT I have never used it.

If you are interested in the last three, please read the RFC specification. I took a quick look and did not quite understand it. If you are up for it, give it a try.

What is the HTTP protocol format?

  • HTTP request and response messages share the same format, divided into three parts: the start line, the message headers, and the message body. The start line and each header end with CRLF, and an empty line (a second CRLF right after the last header) marks the end of the headers.

  • The first line of an HTTP request is called the request line, such as GET /index.html HTTP/1.1
  • The first line of an HTTP response is called the status line, such as HTTP/1.1 200 OK

The message headers are key-value pairs, one per line, each terminated by CRLF, for example Content-Encoding: gzip. The format also allows a message with no headers at all, although an HTTP/1.1 request must carry at least a Host header.

The message body is a sequence of bytes, and its length is normally given by the Content-Length header. A request that has neither a Content-Length nor a Transfer-Encoding: chunked header has no message body. For example, a GET request usually has no message body, while the body of a POST request is generally used to carry form data. The page content returned in response to a GET request is also placed in the message body, and so is the JSON returned by our everyday API calls.
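The three-part layout described above can be sketched as a small parser. This is a minimal illustration, not a complete HTTP parser: `parse_message` is a hypothetical helper, the sample request and `example.com` host are made up, and the sketch relies only on Content-Length (it ignores chunked bodies and repeated headers).

```python
def parse_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, headers, and body."""
    # The empty line (CRLF CRLF) separates the header section from the body.
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive

    # Simplification: the body length comes from Content-Length only.
    length = int(headers.get("content-length", 0))
    return start_line, headers, rest[:length]

raw = (
    b"POST /login HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 19\r\n"
    b"\r\n"
    b"user=alice&pwd=test"
)

if __name__ == "__main__":
    start_line, headers, body = parse_message(raw)
    print(start_line)
    print(headers)
    print(body)
```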

What is chunked transfer?

When a browser requests a dynamic resource, the server often cannot know the size of the resource in advance. In that case, chunked transfer can be used.

The server generates a chunk and sends it, then generates and sends the next one, and so on until the whole resource has been transferred.

Chunked transfer requires a special header, Transfer-Encoding: chunked, in the response headers, so that the receiver knows the message body is transmitted in chunks.

A chunked body is a series of chunks. Each chunk consists of a length line (the chunk size written in hexadecimal) followed by the chunk data. The last chunk has a length of 0, indicating the end.
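The chunk format can be sketched in a few lines. This is a simplified illustration under stated assumptions: `encode_chunked` and `decode_chunked` are hypothetical helper names, the size line is hexadecimal as in HTTP/1.1, and trailers after the last chunk are not handled.

```python
def encode_chunked(parts) -> bytes:
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for part in parts:
        if part:  # skip empty parts: a zero-length chunk would end the body early
            out += f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    return out + b"0\r\n\r\n"  # the zero-length chunk marks the end

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the original body from a chunked message body."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Ignore optional chunk extensions after ';' on the size line.
        size_text = data[pos:line_end].split(b";")[0].decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF

if __name__ == "__main__":
    wire = encode_chunked([b"Hello, ", b"world!"])
    print(wire)
    print(decode_chunked(wire))
```

Because each chunk announces its own length, the server can start sending before it knows the total size, which is the whole point of the mechanism.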

What is the mechanism of persistent connection?

In early versions of HTTP, each request opened its own connection. Besides its HTML, a web page also has many static resources and makes many API calls. If every request needed its own connection, each page load would create many connections to the server, which wastes server resources and slows down the client. Keep-Alive persistent connections were introduced as an extension to HTTP 1.0 and became the default in HTTP 1.1. They allow one connection to serve multiple requests in a row, saving resources and speeding up page loading.

Keeping connections open forever is not advisable either. Every connection occupies server resources, and if too many people open the page, the server will run short. Therefore, the server generally configures a KeepAlive Timeout parameter and a KeepAlive Requests parameter to limit how long a single connection may stay idle and how many requests it may serve.

If the timeout configured on the server is 0, the connection degenerates into a non-persistent one. For a non-persistent connection, the server adds the header Connection: close to the response to tell the client that the connection will be closed once the current response has been received.

Likewise, the browser will not keep the connection open forever just because the server has configured the KeepAlive Timeout to be infinite. Each browser has its own built-in limits, which vary from browser vendor to browser vendor.
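The rules above can be condensed into two small decision functions. This is a sketch with hypothetical names (`is_persistent`, `should_close`) and made-up default limits; real servers expose these limits as configuration options, and the header dictionary is assumed to use lowercase keys.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether a connection stays open after the current message."""
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if "close" in tokens:
        return False                  # either side may ask to close
    if version == "HTTP/1.1":
        return True                   # persistent by default
    return "keep-alive" in tokens     # HTTP/1.0 has to opt in

def should_close(requests_served: int, idle_seconds: float,
                 max_requests: int = 100, timeout_seconds: float = 5.0) -> bool:
    """Server-side limits: maximum requests per connection and idle timeout."""
    return requests_served >= max_requests or idle_seconds >= timeout_seconds

if __name__ == "__main__":
    print(is_persistent("HTTP/1.1", {}))
    print(is_persistent("HTTP/1.0", {"connection": "Keep-Alive"}))
    print(should_close(requests_served=100, idle_seconds=0.2))
```

With `timeout_seconds=0`, `should_close` is true after every request, which matches the degenerate non-persistent case described above.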

What is Pipeline?

HTTP1.0 does not support pipelining. Requests on the same connection are processed one by one, and each request costs one RTT, the round-trip time between the client and the server. Processing N requests costs N RTTs. When a page makes many requests, it loads very slowly.

HTTP 1.1 requires servers to support pipelining: the client can send several requests without waiting for the earlier responses, and then read the responses one by one. The principle is the same as Redis pipelining, and the responses must come back in the same order as the requests. In practice, most browsers leave HTTP 1.1 pipelining disabled.
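A back-of-the-envelope model shows why pipelining helps. The sketch below is a deliberate simplification: the function names and the numbers are made up, every request is assumed to cost the same server-side time, and bandwidth and connection setup are ignored.

```python
def sequential_ms(n: int, round_trip_ms: int, service_ms: int) -> int:
    """Each request waits for the previous response: N round trips in total."""
    return n * (round_trip_ms + service_ms)

def pipelined_ms(n: int, round_trip_ms: int, service_ms: int) -> int:
    """All requests are sent at once: one round trip, with responses
    still produced in order, one after another."""
    return round_trip_ms + n * service_ms

if __name__ == "__main__":
    n, round_trip, service = 10, 50, 5
    print("one by one:", sequential_ms(n, round_trip, service), "ms")
    print("pipelined: ", pipelined_ms(n, round_trip, service), "ms")
```

With 10 requests, a 50 ms round trip and 5 ms of processing per request, the model gives 550 ms one by one versus 100 ms pipelined. Because responses must stay in order, one slow response still delays everything behind it.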

How to understand the statelessness of HTTP protocol?

The statelessness of the HTTP protocol means that the protocol layer on the server does not need to establish any relationship between different requests. It refers specifically to the protocol layer. This does not mean that applications built on HTTP cannot keep state. The application layer can track the relationship between a user's requests through a session. The server binds a unique session ID to each session object, and the browser stores that session ID locally, in LocalStorage or in a cookie. Subsequent requests carry the session ID, so the server can find the corresponding session state for each request.
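The session mechanism can be sketched with the standard library. This is an illustration only: the in-memory `SESSIONS` dictionary, the cookie name `sid`, and the helper names are assumptions made for this example, and a real application would add expiry and persistent storage.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session ID -> per-user state, kept on the server

def start_session(user: str) -> str:
    """Create a session and return its unique, hard-to-guess ID."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user": user}
    return session_id

def set_cookie_header(session_id: str) -> str:
    """Header the server sends so the browser remembers the session ID."""
    return f"Set-Cookie: sid={session_id}; HttpOnly; Path=/"

def lookup_session(cookie_header: str):
    """Find the session state from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    return SESSIONS.get(morsel.value) if morsel else None

if __name__ == "__main__":
    sid = start_session("alice")
    print(set_cookie_header(sid))
    print(lookup_session(f"sid={sid}; theme=dark"))
```

The protocol itself stays stateless: each request is self-contained, and the state lives in the server-side table keyed by the ID that the request carries.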
