Essential HTTP knowledge for front-end developers! Just read this article!

HTTP Origin

HTTP was initiated by Tim Berners-Lee at the European Organization for Nuclear Research (CERN) in 1989.

Several specifications followed. The most famous is RFC 2616[1], published in June 1999, which defines HTTP 1.1, the version of the protocol that is still widely used today.

What is HTTP

Full name: HyperText Transfer Protocol.

Concept: HTTP is a communication protocol for fetching network resources such as HTML documents and images. It is the basis of data exchange on the web and is a client-server protocol.

"HTTP: the Internet's multimedia messenger" (from "HTTP: The Definitive Guide"). On the Internet, HTTP acts as a messenger that carries information back and forth between the client and the server, and the web cannot work without it. HTTP is an application-layer protocol and the protocol most closely tied to front-end development: the HTTP requests, HTTP caching, cookies, and cross-origin issues we deal with every day are all closely related to it.

Basic features of HTTP

  • Extensible protocol. The introduction of HTTP headers in HTTP 1.0 makes it easier to extend the protocol. As long as the server and client agree on the semantics of the headers, new features can be easily added.
  • HTTP is stateless, but not sessionless. Two requests carried out successively on the same connection have no relationship to each other. This creates a problem: users cannot interact continuously with the same website. For example, on an e-commerce website, a user adds a product to the shopping cart, switches to another page, and then adds another product. The two add-to-cart requests are unrelated, so the server cannot know which products the user ultimately selected. HTTP Cookies, an extension built on HTTP headers, solve this problem: adding cookies to the headers creates a session, so that every request shares the same context and state.
  • HTTP and connections. HTTP messages are sent over TCP, or over a TLS-encrypted TCP connection, although in theory any reliable transport protocol can be used. Connections are controlled at the transport layer and are fundamentally outside the scope of HTTP.

That is, HTTP relies on connection-oriented TCP for message delivery, but it does not strictly require a connection-oriented transport. It only requires the transport to be reliable, meaning that it does not lose messages (or at least reports an error when it does).

By default, HTTP/1.0 opens a separate TCP connection for each HTTP request/response pair. When several requests are issued in succession, this is less efficient than letting them share one TCP connection. For this reason, HTTP 1.1 introduced persistent connections, and the underlying TCP connection can be partly controlled through the Connection header. However, HTTP 1.1 is still not perfect in terms of connections, as we will see later.

HTTP-based component system

The component systems of HTTP include clients, web servers, and proxies.

Client: user-agent

The user-agent is any tool that acts on behalf of the user. In most cases this is the browser, but it can also be a program used by engineers and web developers to debug applications.

Web Server

The web server serves the documents requested by the client. Every request sent to the server is processed and answered with a message, which is the response.

Proxies

Between the browser and the server, many computers and other devices forward HTTP messages. Most of them operate at the transport, network, or physical layer and are transparent to the HTTP application layer. Those that operate at the application layer are generally called proxies.

Proxies can perform functions such as:

  • Caching
  • Filtering (like antivirus scanning, parental controls)
  • Load Balancing
  • Authentication (permission control for different resources)
  • Log Management

HTTP message composition

HTTP has two types of messages:

  • Request - sent by the client to trigger an action on the server.
  • Response – The answer from the server.

HTTP messages consist of multiple lines of text encoded in ASCII. In HTTP/1.1 and earlier, these messages are sent over the connection as plain text. In HTTP 2.0, messages are divided into multiple HTTP frames. Developers rarely write HTTP messages by hand: software generates them, and they are controlled through configuration files (for proxies or servers), APIs (for browsers), or other interfaces.

Typical HTTP Session

  • Establish a connection. In client-server protocols, the connection is established by the client. Opening a connection in HTTP means opening a connection at the underlying transport layer, usually TCP. With TCP, the default port for an HTTP server is 80, although 8000 and 8080 are also commonly used.
  • Send the client request.
  • The server responds to the request.

HTTP Requests and Responses

HTTP requests and responses both consist of a start line, HTTP headers, an empty line, and a body:

  • Start line. In a request, the start line contains the request method, the request path, and the HTTP version. In a response, it contains the HTTP version, the status code, and the status text.

The following is a detailed description of the request path. There are several types of request paths:

1) An absolute path, optionally followed by a '?' and a query string. This is the most common form, called the origin form, and is used with the GET, POST, HEAD, and OPTIONS methods.

 POST / HTTP/1.1
 GET /background.png HTTP/1.0
 HEAD /test.html?query=alibaba HTTP/1.1
 OPTIONS /anypage.html HTTP/1.0

2) A complete URL. Mainly used when connecting to a proxy using the GET method.

 GET http://developer.mozilla.org/en-US/docs/Web/HTTP/Messages HTTP/1.1

3) The authority component of the URL, consisting of a domain name and an optional port number (prefixed with ':'), is called the authority form. It is only used when establishing an HTTP tunnel using CONNECT.

 CONNECT developer.mozilla.org:80 HTTP/1.1

4) Asterisk form: A simple asterisk ('*') is used with the OPTIONS method to represent the entire server.

 OPTIONS * HTTP/1.1

  • Headers. Request headers or response headers; see HTTP Headers below for details. Each header is a case-insensitive name followed by a colon (':') and a value whose structure depends on the header.
  • Blank line. Many people tend to overlook this.
  • Body.
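
The four parts listed above can be made concrete with a small parser. The following is a minimal sketch, assuming a well-formed HTTP/1.x message held in a string; the function name parse_http_message and the sample request to example.com are made up for illustration and are not part of any library.

```python
def parse_http_message(raw: str) -> dict:
    """Split a raw HTTP/1.x message into start line, headers, and body."""
    # The first empty line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return {"start_line": start_line, "headers": headers, "body": body}


# Hypothetical request used only for this example.
request = (
    "POST /submit HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Content-Type: text/plain\r\n"
    "Content-Length: 5\r\n"
    "\r\n"
    "hello"
)
parsed = parse_http_message(request)
print(parsed["start_line"])
print(parsed["headers"])
print(parsed["body"])
```

Note how the empty line is what lets the parser tell where the headers stop and the body begins.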

Request Body: Some requests send data to the server in order to update data: a common case is a POST request (containing HTML form data). There are generally two types of request message bodies. One is a single-file body defined by Content-Type and Content-Length. The other is composed of multiple bodies, usually associated with HTML Forms. The difference between the two lies in the value of Content-Type.

1) Content-Type: application/x-www-form-urlencoded. Form content in application/x-www-form-urlencoded format has the following characteristics:

I. The data will be encoded into key-value pairs separated by &.

II. Characters are encoded in URL encoding.

 // Conversion process: {a: 1, b: 2} -> a=1&b=2 (this is the body that is sent)
 // Reserved characters inside a key or value are percent-encoded: {q: "a&b"} -> q=a%26b

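As a rough illustration of this encoding, the sketch below uses Python's standard urllib.parse module. The field names and values are made up for the example.

```python
from urllib.parse import parse_qs, urlencode

# Key-value pairs are joined with "=" and separated by "&".
body = urlencode({"a": 1, "b": 2})
print(body)

# Reserved characters inside a value are percent-encoded, and spaces become "+".
tricky = urlencode({"q": "a&b=c", "name": "Tom Lee"})
print(tricky)

# Decoding on the receiving side restores the original values.
decoded = parse_qs(tricky)
print(decoded)
```

The separators themselves are never encoded; only the characters inside each key and value are.
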
2) Content-Type: multipart/form-data.

The Content-Type field in the request header will contain the boundary, and the boundary value is specified by the browser by default. Example: Content-Type: multipart/form-data;boundary=----WebkitFormBoundaryRRJKeWfHPGrS4LKe.

The data is divided into multiple parts separated by a delimiter line, which is the boundary value prefixed with --. Each part is described by its own HTTP headers, such as Content-Type, followed by an empty line and the part's data. The final delimiter has an additional -- appended to mark the end.

 ------WebkitFormBoundaryRRJKeWfHPGrS4LKe
 Content-Disposition: form-data; name="data1"
 Content-Type: text/plain

 data1
 ------WebkitFormBoundaryRRJKeWfHPGrS4LKe
 Content-Disposition: form-data; name="data2"
 Content-Type: text/plain

 data2
 ------WebkitFormBoundaryRRJKeWfHPGrS4LKe--
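
To show how such a body is assembled, here is a minimal sketch that handles simple text fields only. The helper build_multipart is hypothetical; in practice the browser or an HTTP library builds this body and chooses the boundary.

```python
def build_multipart(fields: dict, boundary: str) -> str:
    """Build a multipart/form-data body for plain-text fields."""
    lines = []
    for name, value in fields.items():
        # Each part starts with "--" followed by the boundary value.
        lines.append("--" + boundary)
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("Content-Type: text/plain")
        lines.append("")  # empty line between the part headers and the data
        lines.append(value)
    # The closing delimiter carries an extra "--" at the end.
    lines.append("--" + boundary + "--")
    lines.append("")
    return "\r\n".join(lines)


boundary = "----WebkitFormBoundaryRRJKeWfHPGrS4LKe"
body = build_multipart({"data1": "data1", "data2": "data2"}, boundary)
print(body)
```

The matching request header would be Content-Type: multipart/form-data; boundary=----WebkitFormBoundaryRRJKeWfHPGrS4LKe.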

Response Body:

1) Consists of a single file of known length. This type of body is defined by two headers: Content-Type and Content-Length.

2) Consists of a single file of unknown length, sent with chunked encoding by setting Transfer-Encoding to chunked.

Content-Length, an important header added in HTTP 1.0, is discussed further in the HTTP 1.1 section below.
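
For the second case, each chunk on the wire is a line with the chunk size in hexadecimal, followed by that many bytes of data, and a chunk of size 0 ends the body. The sketch below decodes such a body under simplifying assumptions: the input is complete and well-formed, and chunk extensions and trailers are not handled. The function name decode_chunked is made up for this example.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body (no extensions or trailers)."""
    body = b""
    pos = 0
    while True:
        # Read the size line, a hexadecimal number terminated by CRLF.
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end], 16)
        if size == 0:
            break  # a zero-size chunk marks the end of the body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return body


encoded = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
decoded = decode_chunked(encoded)
print(decoded)
```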

HTTP Methods

Safe methods: HTTP defines a set of methods called safe methods. GET and HEAD are both considered safe, meaning that a GET or HEAD request should not cause any action on the server beyond retrieval: the request itself should not change anything on the server. This does not guarantee that nothing happens, though. Whether a method is really safe is ultimately up to the web developers who implement the server.

  • GET: Request the server to send a resource.
  • HEAD: Similar to the GET method, but the server only returns the header in the response. The body of the entity is not returned.
  • PUT: Writes a document to the server. Semantics: use the body of the request to create a new document named by the requested URL.
  • POST: Used to send input data to the server, typically form data. (POST sends data to the server for processing, whereas PUT stores data in a resource, such as a file, on the server.)
  • TRACE: Mainly used for diagnosis. It implements message loop-back testing along the path leading to the target resource and provides a practical debugging mechanism.
  • OPTIONS: Asks the web server to tell the client which capabilities it supports. You can ask which methods the server supports in general, or which methods it supports for a particular resource.
  • DELETE: Requests the server to delete the resource specified in the request URL.

Difference between GET and POST

First, we need to understand the concepts of side effects and idempotence. A side effect is a modification of a server-side resource. Idempotence means that sending the same request M times or N times (M and N different, both greater than 1) leaves the resources on the server in the same state. In typical usage, GET is side-effect-free and idempotent, while POST usually has side effects and is not idempotent.
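
The difference can be seen with a toy model. The sketch below is an illustration only: TinyServer is a hypothetical in-memory stand-in for a server, and its get and post methods simply mimic the typical semantics of the two HTTP methods.

```python
class TinyServer:
    """Hypothetical in-memory 'server' used only to illustrate idempotence."""

    def __init__(self):
        self.products = {"1": "book"}
        self.cart = []

    def get(self, product_id):
        # Reading a resource does not modify server state: no side effect.
        return self.products.get(product_id)

    def post(self, product_id):
        # Every call appends a new entry, so repeating it changes the state again.
        self.cart.append(product_id)


server = TinyServer()
for _ in range(3):
    server.get("1")
    server.post("1")

print(len(server.products))  # unchanged by the repeated GETs
print(len(server.cart))      # grows with every POST
```

Repeating the GET three times leaves the server exactly as it was, while repeating the POST three times leaves three cart entries.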

Technically there are the following distinctions:

  • Cache: GET responses can be cached, while POST responses generally are not.
  • Security: GET is less secure than POST in one respect: its parameters are part of the URL, which is saved in the browser history. POST data is placed in the request body, which is relatively safer.
  • Limitation: URLs have length limits, which constrain GET requests. The limit is imposed by the browser.
  • Encoding: GET parameters can only be URL-encoded and can only contain ASCII characters, while POST supports more encoding types and does not restrict the data type.
  • From the TCP perspective, a GET request is often described as sending the whole request message at once, while a POST is split into two TCP packets: the headers first, then the body once the server responds with 100 (Continue). Firefox is cited as an exception that sends only one TCP packet for a POST request. This is client implementation behavior rather than something the HTTP specification requires.

Status Code

  • 100~199: Informational status codes

101 Switching Protocols. When HTTP is upgraded to WebSocket, if the server agrees to the change, it will send status code 101.

  • 200~299: Success status codes

200 OK, indicating that the request sent by the client was processed correctly by the server.

204 No Content, indicating that the request succeeded, but the response message contains no entity body.

205 Reset Content, indicating that the request succeeded, but the response message contains no entity body. Unlike 204, it asks the requester to reset the content.

206 Partial Content, returned for a range request; the body contains only the requested range.

  • 300~399: Redirection status codes

301 Moved Permanently, a permanent redirect, indicating that the resource has been assigned a new URL.

302 Found, a temporary redirect, indicating that the resource has temporarily been assigned a new URL.

303 See Other, indicating that the resource exists at another URL and should be fetched with the GET method.

304 Not Modified, returned for a conditional request when the resource has not changed, so the client can keep using its cached copy.

307 Temporary Redirect, similar in meaning to 302, but the client is expected to keep the request method unchanged when it sends the request to the new address.

  • 400~499: Client error status codes

400 Bad Request, indicating that the request message contains a syntax error.

401 Unauthorized, indicating that the request requires authentication information supplied through HTTP authentication.

403 Forbidden, indicating that access to the requested resource is denied by the server.

404 Not Found, indicating that the requested resource was not found on the server.

  • 500~599: Server error status codes

500 Internal Server Error, indicating that an error occurred on the server while executing the request.

501 Not Implemented, indicating that the server does not support a function required by the current request.

503 Service Unavailable, indicating that the server is temporarily overloaded or down for maintenance and cannot process the request.
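
Because the first digit determines the class, client code often branches on the range before looking at the exact code. A minimal sketch follows; the function name status_class is made up for this example.

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to the class given by its first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return classes[code // 100]


for code in (101, 204, 304, 404, 503):
    print(code, status_class(code))
```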

HTTP Headers

1. General headers are applicable to both request and response messages, but are irrelevant to the data transmitted in the final message body, such as Date.

2. Request headers contain more information about the resource to be obtained or the client itself, such as User-Agent.

3. Response headers contain additional information about the response, such as Accept-Ranges.

4. Entity headers contain more information about the entity body, such as its length (Content-Length) or its MIME type (Content-Type).

For detailed headers, see HTTP Headers Collection [2].

HTTP: Past and Present

HTTP (HyperText Transfer Protocol) is the basic protocol of the World Wide Web. Dr. Tim Berners-Lee and his team created it, together with the first web browser and server, between 1989 and 1991.

HTTP version 0.9 was released in 1991, version 1.0 in 1996, and version 1.1 in 1997. Version 1.1 is still the most widely used version to date. Version 2.0 was released in 2015 and greatly improved on the performance and security of HTTP/1.1. Version 3.0, proposed in 2018, continues to optimize HTTP/2 and makes a radical change by replacing TCP with QUIC, a transport built on UDP. On September 26, 2019, Chrome, Firefox, and Cloudflare announced support for HTTP/3.

HTTP 0.9

Single-line protocol, the request consists of a single-line instruction. It starts with the only available method GET. It is followed by the path of the target resource.

 GET /mypage.html

Response: Includes only the response document itself.

 <HTML>
 This is a very simple HTML page
 </HTML>

  • No response headers, only HTML files are transferred
  • No status code

HTTP 1.0

RFC 1945[3] defined HTTP 1.0, which was designed for better extensibility.

  • Protocol version information is sent with every request.
  • A status code is sent at the start of every response.
  • The concept of HTTP headers was introduced for both requests and responses, allowing metadata to be transmitted and making the protocol more flexible and extensible.
  • The Content-Type header makes it possible to transmit document types other than plain HTML files. In a response, Content-Type tells the client the type of the content actually returned.

Media type is a standard used to indicate the nature and format of a document, file, or byte stream. Browsers usually use MIME (Multipurpose Internet Mail Extensions) types to determine how to handle URLs, so it is very important that the web server configures the correct MIME type in the response header. If it is not configured correctly, it may cause the website to not work properly. The structure of MIME is very simple; it consists of two strings, type and subtype, separated by '/'.

HTTP takes a part of the MIME type to mark the data type of the message body. These types are reflected in the Content-Type field. Of course, this is for the sender. If the receiver wants to receive a specific type of data, it can also use the Accept field.

The values of these two fields fall into the following categories:

 - text: text/html, text/plain, text/css, etc.
 - image: image/gif, image/jpeg, image/png, etc.
 - audio/video: audio/mpeg, video/mp4, etc.
 - application: application/json, application/javascript, application/pdf, application/octet-stream

In addition, so that both sides can agree on the compression method, language, and character set of the request and response data, the following headers were also introduced.

1. Compression method. Sender: Content-Encoding (the server tells the client how the entity body is encoded). Receiver: Accept-Encoding (the encodings the user agent supports). Possible values include gzip, the most popular compression format today; deflate, another well-known compression format; and br, a compression algorithm designed specifically for HTTP.

2. Supported languages. Sender: Content-Language. Receiver: Accept-Language (the set of natural languages the user agent supports).

3. Character set. Sender: specified by the charset parameter of Content-Type. Receiver: Accept-Charset (the character sets the user agent supports).

 // Sender
 Content-Encoding: gzip
 Content-Language: zh-CN, zh, en
 Content-Type: text/html; charset=utf-8

 // Receiver
 Accept-Encoding: gzip
 Accept-Language: zh-CN, zh, en
 Accept-Charset: utf-8
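
To illustrate the type/subtype structure and the charset parameter, the following sketch splits a Content-Type value by hand. It is a simplified parser for illustration only (the function name parse_content_type is made up, and quoted parameter values containing ';' are not handled).

```python
def parse_content_type(value: str):
    """Split a Content-Type value into (type, subtype, parameters)."""
    media_type, *params = [part.strip() for part in value.split(";")]
    # The MIME type is "type/subtype".
    type_, _, subtype = media_type.partition("/")
    parameters = {}
    for param in params:
        if not param:
            continue  # tolerate a trailing ";"
        key, _, val = param.partition("=")
        parameters[key.strip().lower()] = val.strip().strip('"')
    return type_.lower(), subtype.lower(), parameters


result = parse_content_type("text/html; charset=utf-8")
print(result)
```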

Although HTTP1.0 has made many improvements based on HTTP 0.9, it still has many shortcomings.

The main disadvantage of HTTP/1.0 is that only one request can be sent per TCP connection. After sending the data, the connection is closed. If you want to request other resources, you must create a new connection. The cost of creating a new TCP connection is very high because it requires a three-way handshake between the client and the server, and the sending rate is slow at the beginning (slow start).

The earliest model of HTTP, which is also the default model of HTTP/1.0, is the short-lived connection. Each HTTP request is completed on its own independent connection, which means that a TCP handshake takes place before every HTTP request, and these handshakes happen one after another.

HTTP 1.1

HTTP/1.1 was released in January 1997 as RFC 2068[4].

HTTP 1.1 eliminated a lot of ambiguity and introduced several technologies:

  • Connections can be reused. HTTP 1.1 supports persistent connections (Connection: keep-alive), which carry multiple HTTP requests and responses over one TCP connection and so reduce the cost and latency of opening and closing connections. Persistent connections are enabled by default in HTTP 1.1, which partly makes up for the HTTP 1.0 shortcoming that every request needs a new connection.
  • HTTP pipelining was added. It allows a second request to be sent before the response to the first has been fully received, in order to reduce communication latency. Even when multiple requests are pipelined on the same TCP connection, the server responds in the order the requests were received, so later requests have to wait in line until the responses to all earlier requests have been returned. This is called "head-of-line blocking".
  • Response chunking is supported through chunked transfer encoding (Transfer-Encoding: chunked). Content-Length declares the data length of a response. A keep-alive connection can carry several responses one after another, so Content-Length is what tells the receiver where one response ends and the next begins. Using Content-Length requires the server to know the length of the response before it starts sending. For time-consuming dynamic operations, this means the server has to wait until all the work is done before sending any data, which is inefficient. A better approach is to send each piece of data as soon as it is generated, that is, to use "stream mode" instead of "buffer mode". HTTP 1.1 therefore allows the Content-Length field to be omitted in favor of chunked transfer encoding. When a request or response carries the Transfer-Encoding: chunked header, the body consists of an undetermined number of data chunks. Each chunk is preceded by a line containing its length as a hexadecimal value, and a final chunk of size 0 indicates that the whole response has been sent.
  • Additional cache control mechanisms. HTTP 1.0 mainly used headers such as If-Modified-Since and Expires to decide whether a cached copy is valid. HTTP 1.1 adds more cache control options, such as entity tags (ETag), If-None-Match, and Cache-Control.
  • Host header. The same server IP address can be shared by several different domain names. Host is a request header added in HTTP 1.1, used mainly to implement virtual hosting.

Virtual hosting is also known as shared web hosting. It can use virtual technology to divide a complete server into several hosts, so that multiple websites or services can be run on a single host.

For example, there is a server with an IP address of 61.135.169.125, on which the websites of Google, Baidu, and Taobao are deployed. Why do we see the homepage of Google instead of the homepage of Baidu or Taobao when we visit https://www.google.com? The reason is that the Host request header determines which virtual host to visit.
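
The selection step can be sketched as a lookup keyed by the Host header. The host names (shop.example, blog.example) and the select_site function are hypothetical; real web servers do this through their virtual host configuration.

```python
# Several sites served from one IP address, keyed by host name (made-up names).
SITES = {
    "shop.example": "shop home page",
    "blog.example": "blog home page",
}


def select_site(headers: dict) -> str:
    """Pick the virtual host from the Host request header."""
    host = headers.get("Host", "")
    # Drop an optional ":port" suffix; host names are case-insensitive.
    hostname = host.split(":")[0].lower()
    return SITES.get(hostname, "default site")


print(select_site({"Host": "shop.example"}))
print(select_site({"Host": "BLOG.example:8080"}))
print(select_site({"Host": "unknown.example"}))
```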

HTTP 2.0

HTTP 2.0 was released in 2015 as RFC 7540[5].

  • HTTP/2 is a binary protocol rather than a text protocol. Let's look at a few concepts first:

Frame: The client and server communicate by exchanging frames, which is the smallest unit of communication based on this new protocol.

Message: refers to the logical HTTP message, such as request, response, etc., which consists of one or more frames.

Stream: A stream is a virtual channel in a connection that can carry bidirectional messages; each stream has a unique integer identifier.

HTTP 2.0 breaks HTTP/1.x messages into frames and embeds them in streams. Data frames and header frames are separated, which makes header compression possible. Multiple streams can be combined on one connection, a process called multiplexing, which makes more efficient use of the underlying TCP connection.

In other words, streams carry messages, and each message consists of one or more frames; the frame is the unit of data within a stream. Binary transmission further improves transmission performance.

HTTP framing is transparent to web developers. In HTTP/2, framing is an additional step between HTTP/1.1 semantics and the underlying transport protocol. Web developers do not need to change the APIs they use to benefit from it; HTTP/2 is enabled and used whenever both the browser and the server support it.

  • This is a multiplexing protocol. Parallel requests can be processed over the same connection, removing the ordering and blocking constraints of HTTP/1.x. Multiplexing allows multiple request-response messages to be initiated simultaneously over a single HTTP/2 connection.

As we mentioned before, although HTTP 1.1 has long connections and pipelining, it still has head-of-line blocking. HTTP 2.0 solves this problem. The new binary framing layer in HTTP/2 breaks through these limitations and implements full request and response multiplexing: clients and servers can break down HTTP messages into independent frames, send them interleaved, and finally reassemble them at the other end.

Consider a snapshot of several data streams running in parallel on the same connection. The client is transmitting a DATA frame (stream 5) to the server while, at the same time, the server is sending the client an interleaved series of frames from stream 1 and stream 3. There are therefore three parallel streams on one connection at the same time.

Breaking HTTP messages down into separate frames, sending them interleaved, and reassembling them at the other end is the most important enhancement in HTTP 2. In fact, this mechanism sets off a chain reaction throughout the entire web technology stack and yields huge performance improvements, allowing us to:

1. Send multiple requests in parallel and interleaved without affecting each other.
2. Send multiple responses in parallel and interleaved without interfering with each other.
3. Use a single connection to send multiple requests and responses in parallel.
4. Eliminate unnecessary delays and improve the utilization of existing network capacity, thereby reducing page load time.
5. Stop doing extra work to get around HTTP/1.x limitations (such as sprites).

Connection sharing. Each request is assigned an ID, so one connection can carry multiple requests, and frames belonging to different requests can be freely mixed on the connection. The receiver uses the ID to assign each frame back to the request it belongs to.
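
The reassembly step can be sketched as grouping frames by their stream ID. This is a toy model, not the real HTTP/2 frame format: each frame is assumed to be a (stream_id, payload) tuple, and the function name reassemble is made up.

```python
from collections import defaultdict


def reassemble(frames):
    """Group interleaved (stream_id, payload) frames into per-stream messages."""
    streams = defaultdict(bytes)
    for stream_id, payload in frames:
        # Frames of one stream arrive in order, so appending restores the message.
        streams[stream_id] += payload
    return dict(streams)


# Frames from stream 1 and stream 3 arrive interleaved on one connection.
frames = [(1, b"<html>"), (3, b"body{"), (1, b"</html>"), (3, b"}")]
messages = reassemble(frames)
print(messages)
```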

For a comparison between HTTP 1.1 and HTTP 2.0, please refer to this website demo [6].

  • Compressed headers. HTTP 1.x headers carry a lot of information and are sent again in full with every request, which hurts performance. To reduce this overhead, HTTP/2 compresses request and response header metadata with the HPACK format, which relies on two simple but powerful techniques. First, transmitted header fields can be encoded with a static Huffman code, which reduces the size of each transfer. Second, the client and server both maintain and update an indexed list of previously seen header fields (in other words, a shared compression context), which is then used as a reference to encode previously transmitted values efficiently.

  • Server push. Through a mechanism called server push, the server can populate the client cache ahead of time by sending resources that the client has not explicitly requested. Pushing necessary resources in advance reduces request latency. For example, the server can proactively push JS and CSS files to the client instead of waiting for the client to parse the HTML and then request them.

How to upgrade your HTTP version

Using HTTP/1.1 and HTTP/2 is transparent to sites and applications. It is enough to have an up-to-date server that interacts with new browsers. Only a small group of people need to make changes, and as older browsers and servers are updated, adoption will naturally increase without any effort on the part of web developers.

HTTPS

HTTPS also transmits information through the HTTP protocol, but uses the TLS protocol for encryption.

Symmetric and asymmetric encryption

Symmetric encryption means that both parties have the same secret key, and both parties know how to encrypt and decrypt the ciphertext. However, because data transmission is done over the network, if the secret key is transmitted over the network, once the secret key is intercepted, there is no point in encryption.

Asymmetric encryption

With asymmetric encryption, data is encrypted with a public key but can only be decrypted with the matching private key, which stays in the hands of the party that issued the public key. First, the server publishes its public key, so the client knows it. The client then creates a secret key, encrypts it with the public key, and sends it to the server. After receiving the ciphertext, the server uses its private key to recover the secret key.

TLS handshake process

The TLS handshake uses asymmetric encryption to agree on a shared key:

  • Client Hello: The client sends a random value (Random1) together with the protocol versions and encryption methods it supports.
  • Server Hello and Certificate: The server receives the client's random value, generates a random value of its own (Random2), selects a protocol and encryption method from those offered by the client, and sends its certificate. If the server requires the client's certificate to be verified, it says so at this point.
  • Certificate Verify: The client receives the server's certificate and checks that it is valid. If verification succeeds, the client generates another random value (Random3), encrypts it with the public key from the server's certificate, and sends it to the server. If the server asked for the client's certificate, the client attaches it.
  • Server generates secret: The server decrypts the encrypted value with its private key to obtain the third random value (Random3). At this point both sides hold the same three random values and use them to generate a key with the previously agreed method. All subsequent communication is encrypted and decrypted with this key.
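
The last step works because both sides feed the same three values into the same function. The sketch below only illustrates that idea: it uses HMAC-SHA256 as a stand-in and is not the key derivation function that TLS actually specifies. The name derive_session_key and the random value sizes are assumptions made for the example.

```python
import hashlib
import hmac
import os


def derive_session_key(random1: bytes, random2: bytes, random3: bytes) -> bytes:
    """Toy derivation: NOT the real TLS algorithm, only the same-inputs idea."""
    # Random3 travelled encrypted, so it acts as the secret;
    # Random1 and Random2 were sent in the clear.
    return hmac.new(random3, random1 + random2, hashlib.sha256).digest()


random1 = os.urandom(32)  # from Client Hello
random2 = os.urandom(32)  # from Server Hello
random3 = os.urandom(48)  # generated by the client, sent encrypted

client_key = derive_session_key(random1, random2, random3)
server_key = derive_session_key(random1, random2, random3)
print(client_key == server_key)
```

An eavesdropper who saw only Random1 and Random2 cannot compute the same key, because Random3 was never sent in the clear.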

HTTP Cache

Strong Cache

Strong caching is mainly controlled by two headers: Cache-Control and Expires.

The value of Expires is compared with the value of the Date header to determine whether the cache is still valid. Expires is a header field in the web server's response message. It tells the browser that, until the expiration time, it can read the data directly from the browser cache without requesting it again. One disadvantage of Expires is that the expiration time it returns is the server's time, which is an absolute time. If the client clock differs greatly from the server clock (for example, the clocks are not synchronized, or the time zones differ), the cache lifetime will be calculated incorrectly.

Cache-Control specifies the validity period of the current resource, and controls whether the browser directly retrieves data from the browser cache or re-requests the server to retrieve data. However, it sets a relative time.

Specify the expiration time: max-age is the number of seconds from the time the request was initiated. For example, the following means that the strong cache can be hit within 31536000 seconds from the time the request was initiated.

 Cache-Control: max-age=31536000

Do not cache at all.

 Cache-Control: no-store

Cache the response, but revalidate it with the server before each use.

 Cache-Control: no-cache

Private and public caches.

public means that the response can be cached by any intermediary (such as an intermediate proxy or a CDN), while private means that the response is intended for a single user: intermediaries must not cache it, and it can only be stored in the browser's private cache.

 Cache-Control: private
 Cache-Control: public

Revalidation: the following means that once a resource has expired (for example, its age exceeds max-age), the cache must not use it to answer subsequent requests until it has been successfully validated with the origin server.

 Cache-Control: must-revalidate

Cache-Control has a higher priority than Expires.

The following is a Cache-Control strong cache process:

  • The first request is obtained directly from the server, and max-age=100 is set.
  • The second request, age=10, is less than 100, so it hits the cache and is returned directly.
  • The third request, age=110, is greater than 100. The strong cache is no longer valid, and the resource must be requested from the server again.
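
The freshness check in this process can be sketched as follows. This is a simplified model that only looks at max-age, no-store, and no-cache; the function names parse_cache_control and is_fresh are made up, and real browser caches take more factors into account.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control value into a {directive: argument} dictionary."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (such as no-store) are stored as True.
        directives[name.lower()] = arg if arg else True
    return directives


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """Return True when the cached copy may be used without asking the server."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or max_age is True:
        return False
    return age_seconds < int(max_age)


print(is_fresh("max-age=100", 10))   # second request in the process above
print(is_fresh("max-age=100", 110))  # third request in the process above
```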

Negotiation Cache

  • If-Modified-Since——Last-Modified

Last-Modified indicates when the file was last modified. The browser adds If-Modified-Since (with the Last-Modified value returned last time) to the request headers to ask the server whether the resource has been updated since that date. If so, the new resource is sent back.

However, opening and saving a file can change its modification time even when the content is unchanged, so Last-Modified is not always reliable. This is why ETag was introduced in HTTP/1.1.

  • If-None-Match and ETag

ETag is like a fingerprint. Any change to the resource changes its ETag, regardless of the last modification time, so an ETag uniquely identifies a version of the resource. The If-None-Match header sends the most recently returned ETag to the server, asking whether the resource's ETag has changed. If it has, the new resource is sent back.

If-None-Match and ETag have higher priority than If-Modified-Since and Last-Modified.
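
To make the priority rule concrete, here is a minimal sketch of how a server might choose between 304 and 200. It is a simplification under stated assumptions: the function name conditional_status is hypothetical, dates are passed as plain timestamps in seconds rather than HTTP date strings, and weak ETags (the W/ prefix) are not handled.

```python
from typing import Optional


def conditional_status(current_etag: str,
                       current_last_modified: float,
                       if_none_match: Optional[str] = None,
                       if_modified_since: Optional[float] = None) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    # If-None-Match wins whenever it is present.
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or current_etag in candidates:
            return 304
        return 200
    # Fall back to the date comparison only when no ETag was sent.
    if if_modified_since is not None:
        return 304 if current_last_modified <= if_modified_since else 200
    # No validators at all: send the full resource.
    return 200
```

For example, a request carrying a stale ETag gets 200 even if its If-Modified-Since date alone would have produced 304.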

First request:

Second request to the same page:

With the negotiation cache, the server returns 304 if the resource has not changed, and 200 with the new resource if it has.

  • 200: the strong cache (Expires/Cache-Control) has expired, and the server returns a new copy of the resource.
  • 200 (from cache): Expires and/or Cache-Control is present and the resource has not expired (Cache-Control takes precedence over Expires when both exist), so the browser reads the resource from its local cache.
  • 304 (Not Modified): the negotiation cache validators (Last-Modified/ETag) show that the resource is unchanged, so the server returns status code 304.

Browser developer tools now show 200 (from cache) as either (disk cache) or (memory cache).

Revving Technology

The sections above cover how HTTP caching works, but sometimes a resource needs to be updated after it has gone live, while clients may still hold cached copies.

Web developers have developed a technique that Steve Souders calls revving. Files that are updated infrequently are named in a special way: a version number is appended to the URL (usually the file name).

Disadvantage: when the version number changes, every place that references the resource must be updated too.

In practice, web developers usually rely on automated build tools for this chore. When an infrequently updated resource (JS/CSS) changes, only its reference in the frequently updated entry file (HTML) needs to change.
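
Build tools commonly derive the "version" from a hash of the file content instead of a hand-maintained number, so the name changes exactly when the content does. This swaps a content hash in for the version number described above. A minimal sketch, where the function name revved_name and the 8-character digest length are assumptions chosen for illustration:

```python
import hashlib
from pathlib import PurePosixPath


def revved_name(filename: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    path = PurePosixPath(filename)
    # "static/app.js" becomes "static/app.<digest>.js"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_name = revved_name("static/app.js", b"console.log('v1');")
new_name = revved_name("static/app.js", b"console.log('v2');")
```

Because the two names differ, the HTML entry file only needs to point at new_name, and the old file can keep a long max-age without ever serving stale content.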

Cookies

HTTP Cookie (also called Web Cookie or Browser Cookie) is a small piece of data sent by the server to the user's browser and saved locally. It will be carried and sent to the server the next time the browser makes a request to the same server.

Creating cookies

Cookies are created by the server with the Set-Cookie response header and sent back by the browser in the Cookie request header.

 Set-Cookie: <cookie-name>=<cookie-value>

Session Cookies

Session cookies are the simplest cookies: they are automatically deleted after the browser is closed, which means they are only valid during the session. Session cookies do not need to specify an expiration time (Expires) or a validity period (Max-Age). It should be noted that some browsers provide a session recovery function. In this case, even if the browser is closed, the session cookie will be retained, as if the browser had never been closed.

Persistent Cookies

Unlike session cookies that expire when you close your browser, persistent cookies can specify a specific expiration time (Expires) or validity period (Max-Age).

 Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT;

Secure and HttpOnly Cookie Flags

Cookies marked as Secure should only be sent to the server through requests encrypted by the HTTPS protocol. However, even if the Secure tag is set, sensitive information should not be transmitted through Cookies, because Cookies are inherently insecure and the Secure tag cannot provide real security guarantees.

Cookies with the HttpOnly flag cannot be accessed through the JavaScript Document.cookie API. This helps mitigate cross-site scripting (XSS) attacks.

 Set-Cookie: id=a3fWa; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Secure; HttpOnly
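
On the server side, a header like the one above is usually generated by a library rather than assembled by hand. As one possible illustration, here is a minimal sketch using Python's standard http.cookies module. The cookie name and value are taken from the example above; the order in which the attributes appear in the output is decided by the library.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["id"] = "a3fWa"
cookie["id"]["expires"] = "Wed, 21 Oct 2015 07:28:00 GMT"
cookie["id"]["secure"] = True    # only send over HTTPS
cookie["id"]["httponly"] = True  # hide from Document.cookie

# The text that follows "Set-Cookie: " in the response.
header_value = cookie["id"].OutputString()
```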

Scope of Cookies

The Domain and Path attributes define the scope of the cookie, that is, which URLs the cookie should be sent to.

Domain specifies which hosts may receive the cookie. If it is not specified, it defaults to the current host, excluding subdomains. If Domain is specified, subdomains are generally included.

For example, if Domain=mozilla.org is set, the cookie is also sent to subdomains (such as developer.mozilla.org).
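
The Domain rule can be sketched as a small matching function. This is a simplified model rather than a full implementation: the function name domain_matches is hypothetical, and real browsers apply additional checks (for instance, rejecting cookies set for public suffixes) that are omitted here.

```python
from typing import Optional


def domain_matches(request_host: str,
                   setting_host: str,
                   cookie_domain: Optional[str]) -> bool:
    """Decide whether a cookie's Domain scope covers request_host.

    setting_host is the host that originally set the cookie.
    """
    request_host = request_host.lower()
    if cookie_domain is None:
        # No Domain attribute: only the exact host that set the cookie.
        return request_host == setting_host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    # Domain given: the domain itself and any of its subdomains.
    return (request_host == cookie_domain
            or request_host.endswith("." + cookie_domain))
```

The check for a leading dot before the domain matters: without it, a host such as evilmozilla.org would wrongly match Domain=mozilla.org.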

Path specifies which paths under the host may receive the cookie (the path must appear in the request URL). Subpaths also match, with the character %x2F ("/") treated as the path separator.

For example, if Path=/docs is set, the following paths all match:

 /docs
 /docs/Web/
 /docs/Web/HTTP
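
The Path rule is a prefix match that respects the "/" separator. A minimal sketch of that check (the function name path_matches is hypothetical):

```python
def path_matches(request_path: str, cookie_path: str) -> bool:
    """Return True if request_path falls under the cookie's Path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" boundary, so that
        # Path=/docs covers /docs/Web but not /docsets.
        return (cookie_path.endswith("/")
                or request_path[len(cookie_path)] == "/")
    return False


matches = [p for p in ("/docs", "/docs/Web/", "/docs/Web/HTTP", "/docsets", "/")
           if path_matches(p, "/docs")]
```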

SameSite Cookies

The SameSite attribute lets the server require that a cookie not be sent with cross-site requests, which helps prevent cross-site request forgery (CSRF) attacks. Its values are case-insensitive:

  • None: the browser sends the cookie with both same-site and cross-site requests. This was the default behavior in Chrome versions before Chrome 80.
  • Strict: the browser sends the cookie only with requests to the same site.
  • Lax: the cookie is withheld on cross-site subrequests, such as loading images or frames, but is sent when the user navigates to the URL from an external site, for example by following a link.

For example:

 Set-Cookie: key=value; SameSite=Strict

In newer browsers (Chrome 80 and later), the default value of SameSite is Lax. In other words, a cookie that does not set the SameSite attribute is treated as if SameSite=Lax were set, which means the cookie is no longer sent automatically on cross-site requests. To have a cookie sent on both same-site and cross-site requests, you must explicitly set SameSite=None. For this reason, existing systems should be checked for an explicit SameSite setting, and new systems should always set SameSite explicitly so that they behave consistently across old and new versions of Chrome.

For more on cookies, see an article I wrote earlier: Cookie knowledge summary for front-end instructions [7]
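
The SameSite rules can be summarized as a small decision function. This is a simplified model of browser behavior after Chrome 80, not an exact reproduction of it: the function name should_send_cookie and its parameters are hypothetical, and "navigation" here means a top-level GET navigation such as following a link.

```python
from typing import Optional


def should_send_cookie(same_site: Optional[str],
                       cross_site: bool,
                       top_level_get_navigation: bool = False) -> bool:
    """Decide whether a cookie is attached to a request."""
    # A missing attribute is treated as Lax (the post-Chrome 80 default).
    policy = same_site.lower() if same_site else "lax"
    if policy not in ("none", "lax", "strict"):
        raise ValueError(f"unknown SameSite value: {same_site!r}")
    if not cross_site:
        return True  # same-site requests always carry the cookie
    if policy == "none":
        return True
    if policy == "lax":
        # Withheld on subrequests, sent when the user navigates.
        return top_level_get_navigation
    return False  # strict: never sent cross-site
```

For example, a cookie with no SameSite attribute is not sent when another site loads an image from your domain, but it is sent when the user clicks a link to your site.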

HTTP Access Control (CORS)

Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell the browser that a web application running at one origin (domain) is allowed to access specified resources on a server at a different origin.

The CORS standard adds a set of HTTP header fields that let the server declare which origins are permitted to access which resources through the browser.

Simple request

A simple request (one that does not trigger a CORS preflight request) must meet all three of the following conditions:

  • The method is one of GET, HEAD, or POST.
  • The value of Content-Type is one of text/plain, multipart/form-data, or application/x-www-form-urlencoded.
  • The request sets no header fields other than the following: Accept, Accept-Language, Content-Language, Content-Type (subject to the restriction above), DPR, Downlink, Save-Data, Viewport-Width, Width.
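
The three conditions above can be checked mechanically. A minimal sketch, assuming the headers argument contains only the headers set by the page's own code (headers the browser adds itself, such as Origin, are not counted); the function name is_simple_request is hypothetical:

```python
SIMPLE_METHODS = {"GET", "HEAD", "POST"}
SIMPLE_CONTENT_TYPES = {
    "text/plain",
    "multipart/form-data",
    "application/x-www-form-urlencoded",
}
SIMPLE_HEADERS = {
    "accept", "accept-language", "content-language", "content-type",
    "dpr", "downlink", "save-data", "viewport-width", "width",
}


def is_simple_request(method: str, headers: dict) -> bool:
    """Return True if the request would not trigger a preflight."""
    if method.upper() not in SIMPLE_METHODS:
        return False
    for name, value in headers.items():
        lowered = name.lower()
        if lowered not in SIMPLE_HEADERS:
            return False
        if lowered == "content-type":
            # Ignore parameters such as "; charset=utf-8".
            media_type = value.split(";", 1)[0].strip().lower()
            if media_type not in SIMPLE_CONTENT_TYPES:
                return False
    return True
```

A POST with Content-Type: application/json fails the second condition, which is why JSON APIs called from another origin almost always involve a preflight.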

The following is a simple request and its response, reduced to the headers that matter for CORS:

 Origin: http://foo.example
 Access-Control-Allow-Origin: *

The request header field Origin indicates that the request comes from http://foo.example.

In this example, the Access-Control-Allow-Origin: * returned by the server indicates that the resource can be accessed from any external origin. If the server only allows access from http://foo.example, the header field is as follows:

 Access-Control-Allow-Origin: http://foo.example

The value of Access-Control-Allow-Origin must be either * or the origin given in the request's Origin header field.
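
A server that serves several trusted origins cannot list them all in one Access-Control-Allow-Origin header, so a common approach is to echo back the request's Origin when it appears in an allowlist. A minimal sketch of that idea; the function name and the allowlist are hypothetical, and the Vary: Origin header is added so that caches keep responses for different origins apart.

```python
from typing import Dict, Optional, Set


def allow_origin_headers(request_origin: Optional[str],
                         allowed_origins: Set[str]) -> Dict[str, str]:
    """Return the CORS headers to add for this request, if any."""
    if request_origin is not None and request_origin in allowed_origins:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Vary": "Origin",
        }
    # Unknown origin: add nothing, and the browser blocks the read.
    return {}


ALLOWED = {"http://foo.example"}  # hypothetical allowlist
granted = allow_origin_headers("http://foo.example", ALLOWED)
denied = allow_origin_headers("http://evil.example", ALLOWED)
```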

Preflight request

For HTTP request methods that may have side effects on server data, the specification requires the browser to first send a preflight request with the OPTIONS method to find out whether the server allows the cross-origin request.

The actual HTTP request is sent only after the server confirms that it is allowed. In the response to the preflight request, the server can also tell the client whether credentials (including cookies and HTTP authentication data) need to be sent.

The preflight request carries the following two header fields:

 Access-Control-Request-Method: POST
 Access-Control-Request-Headers: X-PINGOTHER, Content-Type

The header field Access-Control-Request-Method tells the server that the actual request will use the POST method. The header field Access-Control-Request-Headers tells the server that the actual request will carry two custom request header fields: X-PINGOTHER and Content-Type. The server decides whether the actual request is allowed.

The response to the preflight request includes the following fields:

 Access-Control-Allow-Origin: http://foo.example
 // Indicates that the server allows the client to send requests using the POST, GET and OPTIONS methods
 Access-Control-Allow-Methods: POST, GET, OPTIONS
 // Indicates that the server allows the X-PINGOTHER and Content-Type fields to be carried in the request
 Access-Control-Allow-Headers: X-PINGOTHER, Content-Type
 // Indicates that the response is valid for 86400 seconds, which is 24 hours. During this time, the browser does not need to send another preflight request for the same request.
 Access-Control-Max-Age: 86400
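
The exchange above can be sketched from the server's point of view: read the two Access-Control-Request-* fields, compare them against what the server permits, and answer with the matching Access-Control-Allow-* fields. This is an illustrative sketch, not a complete CORS implementation; the function name preflight_response and its parameters are hypothetical, and returning None stands for "send no CORS headers".

```python
from typing import Dict, List, Optional


def preflight_response(request_headers: Dict[str, str],
                       allowed_origin: str,
                       allowed_methods: List[str],
                       allowed_headers: List[str],
                       max_age: int = 86400) -> Optional[Dict[str, str]]:
    """Build the CORS headers for an OPTIONS preflight, or None to refuse."""
    origin = request_headers.get("Origin")
    method = request_headers.get("Access-Control-Request-Method", "")
    raw = request_headers.get("Access-Control-Request-Headers", "")
    requested = [h.strip().lower() for h in raw.split(",") if h.strip()]
    permitted = {h.lower() for h in allowed_headers}

    if origin != allowed_origin or method not in allowed_methods:
        return None
    if any(h not in permitted for h in requested):
        return None
    return {
        "Access-Control-Allow-Origin": allowed_origin,
        "Access-Control-Allow-Methods": ", ".join(allowed_methods),
        "Access-Control-Allow-Headers": ", ".join(allowed_headers),
        "Access-Control-Max-Age": str(max_age),
    }


preflight = {
    "Origin": "http://foo.example",
    "Access-Control-Request-Method": "POST",
    "Access-Control-Request-Headers": "X-PINGOTHER, Content-Type",
}
response = preflight_response(
    preflight,
    allowed_origin="http://foo.example",
    allowed_methods=["POST", "GET", "OPTIONS"],
    allowed_headers=["X-PINGOTHER", "Content-Type"],
)
```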

Requests with credentials

Generally speaking, the browser does not send credentials with cross-origin XMLHttpRequest or Fetch requests. To send credentials, you need to set a special flag on the request. For example, if the withCredentials flag of XMLHttpRequest is set to true, cookies are sent to the server.

For requests with credentials, the server must not set the value of Access-Control-Allow-Origin to "*". Because the request carries cookie information, the request fails if the value of Access-Control-Allow-Origin is "*". If the value is the requesting origin (for example, http://foo.example), the request succeeds.
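
The wildcard restriction can be expressed as a check the browser applies to the response. A minimal sketch under stated assumptions: the function name response_is_readable is hypothetical, and the header values are passed in as plain strings.

```python
from typing import Optional


def response_is_readable(request_origin: str,
                         with_credentials: bool,
                         allow_origin: Optional[str],
                         allow_credentials: Optional[str] = None) -> bool:
    """Decide whether page code may read a cross-origin response."""
    if not with_credentials:
        # Without credentials, the wildcard is acceptable.
        return allow_origin in ("*", request_origin)
    # With credentials, the origin must be named explicitly and
    # Access-Control-Allow-Credentials must be "true".
    return allow_origin == request_origin and allow_credentials == "true"
```

Under this model, a credentialed request from http://foo.example is rejected when the server answers with Access-Control-Allow-Origin: *, and accepted when it answers with Access-Control-Allow-Origin: http://foo.example together with Access-Control-Allow-Credentials: true.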

The request and response headers involved in CORS are as follows.

HTTP response header fields:

  • Access-Control-Allow-Origin specifies the external origin that is allowed to access the resource. For requests that do not require credentials, the server can set this field to the wildcard *, indicating that requests from all origins are allowed.
  • Access-Control-Expose-Headers lets the server whitelist the response headers that the browser is allowed to expose to scripts.
  • Access-Control-Max-Age specifies how long the result of the preflight request can be cached.
  • Access-Control-Allow-Credentials specifies whether the browser is allowed to expose the response to the page when the request's credentials flag is set to true.
  • Access-Control-Allow-Methods is used in the response to a preflight request. It specifies the HTTP methods allowed for the actual request.
  • Access-Control-Allow-Headers is used in the response to a preflight request. It specifies the header fields allowed in the actual request.

HTTP request header fields:

  • Origin indicates the origin of the preflight request or the actual request.
  • Access-Control-Request-Method is used in the preflight request. It tells the server which HTTP method the actual request will use.
  • Access-Control-Request-Headers is used in the preflight request. It tells the server which header fields the actual request will carry.

Further reading

  • MDN [8]
  • The development of HTTP [9]
  • HTTP Overview [10]
  • Introduction to HTTP/2 [11]
  • Caching (II): Browser caching mechanism: strong cache, negotiated cache [12]
  • (Suggested to read carefully) HTTP soul question to consolidate your HTTP knowledge system [13]

References

[1] RFC 2616: https://tools.ietf.org/html/rfc2616

[2] HTTP Headers collection: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Headers

[3] RFC 1945: https://tools.ietf.org/html/rfc1945

[4] RFC 2068: https://tools.ietf.org/html/rfc2068

[5] RFC 7540: https://httpwg.org/specs/rfc7540.html

[6] Website demo: https://http2.akamai.com/demo

[7] Cookie knowledge summary for front-end instructions: https://juejin.im/post/6844903841909964813

[8] MDN: https://developer.mozilla.org/zh-CN/docs/Web/HTTP

[9] The development of HTTP: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP

[10] HTTP Overview: https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Overview

[11] Introduction to HTTP/2: https://developers.google.com/web/fundamentals/performance/http2?hl=zh-cn

[12] Caching (II): Browser caching mechanism: strong cache, negotiated cache: https://github.com/amandakelake/blog/issues/41

[13] (Suggested to read carefully) HTTP soul question to consolidate your HTTP knowledge system: https://juejin.im/post/6844904100035821575#heading-62
