Frequently Asked in Interviews: Everything About HTTP Caching

Speed, speed, and speed. For a website to offer a good experience, the first requirement is that it loads as fast as possible. If MySQL queries are slow, you add a Redis layer as a cache. If website resources are slow to load, what do you do? Use the HTTP cache.

HTTP caching has existed since HTTP/1.0. Its purpose is to reduce the load on the server and speed up page responses.

Target of cache operation

In practice, HTTP caches mostly store responses to GET requests; responses to other request methods are rarely cacheable.

History of Cache Development

HTTP/1.0 proposed the concept of cache, namely strong cache Expires and negotiated cache Last-Modified. Later, HTTP/1.1 had a better solution, namely strong cache Cache-Control and negotiated cache ETag.

Why are Expires and Last-Modified not enough?

Expires is an expiration time, but it is an absolute point in time set by the server. If the client clock and the server clock differ, the result is inaccurate. Cache-Control: max-age is used instead. It gives the lifetime as a number of seconds relative to the time of the response, so a difference between the two clocks no longer matters.

Last-Modified is the last modification time, and its resolution is one second. If a file changes several times within one second, the later changes are not detected and the client keeps showing the earlier version. ETag therefore identifies the resource by its content to determine whether it has changed.

The following table is helpful for comparison and understanding:

Version     Strong cache     Negotiated cache
HTTP/1.0    Expires          Last-Modified
HTTP/1.1    Cache-Control    ETag

Comparison of the two major cache types

The previous section listed the cache types in each HTTP version and mentioned strong cache and negotiated cache without explaining them. This section covers both types.

Strong Cache

Cache-Control

  • HTTP/1.1.
  • The cache is controlled by a lifetime. There are many directives, such as max-age. For example, Cache-Control: max-age=3600 means the response can be served from the cache for 3600 seconds, after which it expires.
  • Cache request directives:
      • Cache-Control: max-age=<seconds>
      • Cache-Control: max-stale[=<seconds>]
      • Cache-Control: min-fresh=<seconds>
      • Cache-Control: no-cache
      • Cache-Control: no-store
      • Cache-Control: no-transform
      • Cache-Control: only-if-cached
  • Cache response directives:
      • Cache-Control: must-revalidate
      • Cache-Control: no-cache
      • Cache-Control: no-store
      • Cache-Control: no-transform
      • Cache-Control: public
      • Cache-Control: private
      • Cache-Control: proxy-revalidate
      • Cache-Control: max-age=<seconds>
      • Cache-Control: s-maxage=<seconds>
  • The key points are:
      • Cache-Control: no-cache: skip the strong cache and send an HTTP request (if there is a negotiated cache identifier, go straight to the negotiated cache stage). The effect of no-cache is similar to max-age=0: the strong cache is skipped and the cached copy is revalidated.
      • Cache-Control: no-store: do not use any cache, including the negotiated cache.
      • Cache-Control: public, max-age=31536000: generally used to cache static resources.
      • public: the response can be cached by intermediate proxies, CDNs, and so on.
      • private: the response is meant for a single user's private cache (the browser); intermediate proxies, CDNs, and so on must not cache it.
      • max-age: the unit is seconds.
  • For more directives, refer to the directive list.
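To make the directive syntax concrete, here is a minimal sketch of how a Cache-Control value could be split into its directives. The function name parseCacheControl is made up for this illustration, and the parser is deliberately simplified: it splits on commas, so it does not handle quoted values that themselves contain commas.

```javascript
// Parse a Cache-Control header value into a plain object.
// Directives without a value (e.g. "no-cache") map to true;
// purely numeric values (e.g. "max-age=3600") become numbers.
// Simplification: quoted values containing commas are not supported.
function parseCacheControl(headerValue) {
  const directives = {};
  for (const part of headerValue.split(',')) {
    const trimmed = part.trim();
    if (trimmed === '') continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true;
      continue;
    }
    const name = trimmed.slice(0, eq).trim().toLowerCase();
    const raw = trimmed.slice(eq + 1).trim().replace(/^"|"$/g, '');
    directives[name] = /^\d+$/.test(raw) ? Number(raw) : raw;
  }
  return directives;
}

console.log(parseCacheControl('public, max-age=31536000'));
console.log(parseCacheControl('no-cache'));
```

Directive names are lowercased because they are case-insensitive, which is why the mixed "Cache-control" and "Cache-Control" spellings in the lists above mean the same thing.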

Expires

  • HTTP/1.0.
  • Syntax: Expires: <http-date>.
  • The expiration time, carried in the response header returned by the server. Expires: Mon, 11 Apr 2022 06:57:18 GMT means that the resource expires at 06:57:18 GMT on April 11, 2022. After that time, a request is sent to the server.
  • If the max-age or s-maxage directive is set in the Cache-Control response header, the Expires header is ignored.
  • Disadvantage: The server time and browser time may not be consistent.
  • For more instructions, refer to the instruction list.

Cache-Control VS Expires

  • Cache-Control is more precise than Expires.
  • When both exist, Cache-Control takes precedence over Expires.
  • Expires comes from HTTP/1.0 and is understood by older clients; Cache-Control comes from HTTP/1.1. Both can be sent in the same response, and a client that does not support Cache-Control falls back to Expires.
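The precedence rule can be sketched as a small freshness check. This is an illustration under stated assumptions, not how any particular browser implements it: the cached record shape ({ storedAt, headers }) and the name isFresh are hypothetical, header names are assumed to be lowercased, and real caches also apply heuristics when neither header is present.

```javascript
// Decide whether a cached response is still fresh.
// `cached` is a hypothetical record: { storedAt: <ms timestamp>, headers: {...} }.
function isFresh(cached, now = Date.now()) {
  const cacheControl = cached.headers['cache-control'] || '';
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  if (match) {
    // max-age is a duration, so only the client clock is involved.
    const ageSeconds = (now - cached.storedAt) / 1000;
    return ageSeconds < Number(match[1]);
  }
  const expires = Date.parse(cached.headers['expires'] || '');
  if (!Number.isNaN(expires)) {
    // Expires is an absolute server-chosen time compared with the client clock.
    return now < expires;
  }
  return false; // no explicit freshness information (simplification)
}

const storedAt = Date.parse('Mon, 11 Apr 2022 06:00:00 GMT');
const cached = {
  storedAt,
  headers: {
    'cache-control': 'max-age=3600',
    expires: 'Mon, 11 Apr 2022 06:10:00 GMT',
  },
};
// 30 minutes later Expires has passed, but max-age=3600 takes precedence.
console.log(isFresh(cached, storedAt + 30 * 60 * 1000)); // true
```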

Negotiation Cache

Negotiated cache works together with strong cache: it comes into play only when the strong cache cannot be used, either because the cached copy has expired or because the server has told the browser not to use strong cache by setting Cache-Control: no-cache, Pragma: no-cache, or max-age=0.

Pragma is an HTTP/1.0 field for disabling caching. Its value no-cache has the same effect as no-cache in Cache-Control.

ETag/If-None-Match

  • HTTP/1.1.
  • A unique identifier is generated for the file and used to determine whether the cached copy is still current. The value changes whenever the content changes.
  • Used together with If-None-Match. The server returns the ETag, a unique identifier for the resource, in its response. The client (the browser) stores this identifier and sends its value in the If-None-Match request header on the next request. The server checks whether If-None-Match matches the ETag of the current resource. If they match, it returns 304 without a body and the browser uses its local cache; if they differ, it returns 200 with the latest resource and its new ETag.
  • For more instructions, refer to the instruction list.
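The exchange can be sketched on the server side as follows. This is only an illustration: the helper names makeETag and respondWithETag are made up, and hashing the body with SHA-1 is an assumption, since HTTP does not prescribe how an ETag is generated.

```javascript
const crypto = require('crypto');

// Derive an ETag from the response body. Hashing the content is one common
// choice (an assumption here); the value only has to change with the content.
function makeETag(body) {
  const hash = crypto.createHash('sha1').update(body).digest('hex');
  return `"${hash}"`;
}

// Return the status and headers a server could send for a GET,
// given the If-None-Match value (if any) from the request.
function respondWithETag(body, ifNoneMatch) {
  const etag = makeETag(body);
  if (ifNoneMatch !== undefined) {
    // If-None-Match may list several tags; a weak prefix (W/) is ignored
    // when comparing.
    const candidates = ifNoneMatch
      .split(',')
      .map((value) => value.trim().replace(/^W\//, ''));
    if (candidates.includes('*') || candidates.includes(etag)) {
      return { status: 304, headers: { ETag: etag }, body: null };
    }
  }
  return { status: 200, headers: { ETag: etag }, body };
}

const first = respondWithETag('hello');                      // first visit
const second = respondWithETag('hello', first.headers.ETag); // revalidation
console.log(first.status, second.status); // 200 304
```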

Last-Modified/If-Modified-Since

  • HTTP/1.0.
  • The last modification time; the cache is validated by comparing modification times. When the browser first requests the resource, the server adds this field to the response header.
  • Used together with If-Modified-Since. When the client requests a resource, the server puts Last-Modified, the time the resource was last modified on the server, in the response header. The client caches this value. On the next request for the resource, the browser sends the cached Last-Modified value in the If-Modified-Since request header. If the resource has not been modified since that time, the server returns 304 and the browser uses its local cache; otherwise it returns 200 with the latest resource and its new Last-Modified.
  • Disadvantages: if a file is saved again without its content changing, its modification time is still updated, so the cache is invalidated needlessly. Some files change more than once within a second, which a one-second resolution cannot record. Some servers cannot obtain a file's last modification time accurately.
  • For more instructions, refer to the instruction list.
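The one-second resolution is easy to see in a small sketch. The function respondWithLastModified and its arguments are hypothetical names for this illustration; it takes the file's modification time in milliseconds and the If-Modified-Since value from the request.

```javascript
// Decide between 304 and 200 for a resource whose last modification time
// is `mtimeMs` (milliseconds), given the request's If-Modified-Since value.
function respondWithLastModified(mtimeMs, ifModifiedSince) {
  // HTTP dates carry whole seconds only, so sub-second detail is dropped.
  const mtimeSeconds = Math.floor(mtimeMs / 1000);
  const lastModified = new Date(mtimeSeconds * 1000).toUTCString();
  const since = Date.parse(ifModifiedSince || '');
  if (!Number.isNaN(since) && mtimeSeconds * 1000 <= since) {
    return { status: 304, headers: { 'Last-Modified': lastModified } };
  }
  return { status: 200, headers: { 'Last-Modified': lastModified } };
}

const t = Date.parse('Mon, 11 Apr 2022 06:57:18 GMT');
const first = respondWithLastModified(t + 100); // first visit
const header = first.headers['Last-Modified'];
// A second edit 800 ms later, within the same second, goes unnoticed:
console.log(respondWithLastModified(t + 900, header).status);  // 304
// An edit in the following second is detected:
console.log(respondWithLastModified(t + 1500, header).status); // 200
```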

ETag VS Last-Modified

  • Precision: ETag > Last-Modified. ETag identifies the resource by its content, so it reliably detects changes. Last-Modified is inaccurate in some scenarios: if a file is edited but its content stays the same, the cache is invalidated anyway; and if a file changes several times within one second, the later changes are missed, because Last-Modified has a resolution of one second.
  • Performance: Last-Modified > ETag. Last-Modified only records a point in time, while ETag typically requires computing a hash over the file content.
  • If both are present, the server gives priority to ETag.

Negotiating cache conditional requests

As mentioned earlier, negotiated caching adds If-None-Match or If-Modified-Since to the request header. What are these request headers, and what are they for?

Strong caching decides by expiration time alone. This causes a problem: if a file is modified on the server, the browser keeps showing the old data until the strong cache expires. Strong caching is therefore unsuitable for data that changes frequently. Negotiated caching addresses this: before using a cached copy, the browser checks with the server whether it is still the latest version.

A naive way to verify would be for the browser to send two requests in succession:

  • First, a HEAD request to obtain the resource's modification time, hash value, and other metadata, and compare them with the cached data. If nothing has changed, the cache is used.
  • Otherwise, a second GET request to obtain the latest version.

However, the network cost of two requests is too high, so HTTP defines a series of conditional request fields starting with "If", designed specifically to check whether a cached resource is out of date. They merge the two requests into one and hand the responsibility for verification to the server.

  • If-Modified-Since: compared with Last-Modified; the condition holds if the resource has been modified since the given time.
  • If-None-Match: compared with ETag; the condition holds if none of the listed identifiers matches.
  • If-Unmodified-Since: compared with Last-Modified; the condition holds if the resource has not been modified since the given time.
  • If-Match: compared with ETag; the condition holds if one of the listed identifiers matches.
  • If-Range.

Among them, the most common are If-Modified-Since and If-None-Match, which correspond to Last-Modified and ETag respectively. The first response must supply Last-Modified and ETag; the second request can then carry those cached values to verify whether the resource is still the latest.

If the resource has not changed, the server responds with a 304 Not Modified message, indicating that the cache is still valid. The browser can then update the expiration date and use the cache.
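From the client's point of view, the round trip could look roughly like this. The cache entry shape ({ headers, body, storedAt }) and both function names are assumptions made for the sketch, and header names are assumed to be stored in lowercase. Unlike the either/or description in this article, the sketch sends both validators when both are available, which browsers commonly do.

```javascript
// Build the conditional request headers from a cached entry.
function buildConditionalHeaders(cached) {
  const headers = {};
  if (cached.headers['etag']) {
    headers['If-None-Match'] = cached.headers['etag'];
  }
  if (cached.headers['last-modified']) {
    headers['If-Modified-Since'] = cached.headers['last-modified'];
  }
  return headers;
}

// Merge the server's answer into the cache entry.
function applyRevalidation(cached, response, now = Date.now()) {
  if (response.status === 304) {
    // Reuse the stored body, refresh the metadata, restart the freshness clock.
    return {
      ...cached,
      headers: { ...cached.headers, ...response.headers },
      storedAt: now,
    };
  }
  if (response.status === 200) {
    // The resource changed: the response carries the new body and validators.
    return { headers: response.headers, body: response.body, storedAt: now };
  }
  return cached; // other statuses leave the entry untouched (simplification)
}

const entry = {
  headers: { etag: '"v1"', 'cache-control': 'max-age=60' },
  body: 'old content',
  storedAt: 0,
};
console.log(buildConditionalHeaders(entry)); // { 'If-None-Match': '"v1"' }
console.log(applyRevalidation(entry, { status: 304, headers: {} }, 1000).body); // old content
```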

Caching process

When to use strong caching and when to use negotiated caching?

First, strong cache takes priority over negotiated cache: negotiated cache is consulted only when the strong cache does not apply. Second, the HTTP/1.1 identifiers take priority over the HTTP/1.0 ones, so if Cache-Control exists it is used; if not, Expires is used. If the strong cache is disabled with Cache-Control: no-cache, Cache-Control: max-age=0, or Pragma: no-cache, the browser is told not to use strong cache, and negotiated cache is used.

Check whether the last response carried an ETag. If so, send a request with If-None-Match in the request header. If not, check whether the last response carried Last-Modified. If so, send a conditional request with If-Modified-Since in the request header. If neither exists, there is no negotiated cache, and an ordinary HTTP request is sent. For a request with either If-None-Match or If-Modified-Since, the server determines whether the resource has changed and returns a status code. A 304 means that the cached resource has not changed, and the local cache is used; a 200 means that the resource has changed, the response carries the new content, and the browser stores the new ETag/Last-Modified from the response header.
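The decision process described above can be written down as one function. This is a sketch that follows the article's description, not a browser's actual algorithm: the name decideCacheAction, the entry shape ({ storedAt, headers } with lowercase header names), and the returned action labels are all assumptions made for illustration.

```javascript
// Return the action a browser-like cache would take for a request.
// `cached` is a hypothetical entry { storedAt, headers } or undefined.
function decideCacheAction(cached, now = Date.now()) {
  if (!cached) return { action: 'fetch' };

  const cc = (cached.headers['cache-control'] || '').toLowerCase();
  const pragma = (cached.headers['pragma'] || '').toLowerCase();

  // Step 1: strong cache. Cache-Control wins over Expires.
  const disabled = /(?:^|,)\s*no-cache\b/.test(cc) || pragma.includes('no-cache');
  const maxAge = /(?:^|,)\s*max-age=(\d+)/.exec(cc);
  let fresh = false;
  if (!disabled) {
    if (maxAge) {
      fresh = (now - cached.storedAt) / 1000 < Number(maxAge[1]);
    } else if (cached.headers['expires']) {
      fresh = now < Date.parse(cached.headers['expires']);
    }
  }
  if (fresh) return { action: 'use-cache' };

  // Step 2: negotiated cache. ETag is checked before Last-Modified.
  if (cached.headers['etag']) {
    return { action: 'revalidate', header: 'If-None-Match', value: cached.headers['etag'] };
  }
  if (cached.headers['last-modified']) {
    return { action: 'revalidate', header: 'If-Modified-Since', value: cached.headers['last-modified'] };
  }

  // Step 3: nothing to validate with, so send an ordinary request.
  return { action: 'fetch' };
}

const entry = { storedAt: 0, headers: { 'cache-control': 'max-age=10', etag: '"abc"' } };
console.log(decideCacheAction(entry, 5000).action);  // use-cache
console.log(decideCacheAction(entry, 20000).action); // revalidate
```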

The general flow chart is as follows:

Cache judgment flow chart

So which resources should use strong caching and which resources should use negotiated caching?

Resources that rarely change, such as static assets, should use strong caching, while files that are modified often should use negotiated caching. With negotiated caching, if the resource has not changed, the cached copy is still used when the user visits a second time; if it has been modified, the visit triggers an HTTP request that fetches the latest version.

When visiting a website, you can observe this in the browser's developer tools (F12). As shown in the figure, my project "Five years of front-end experience and three years of interviews" is hosted on GitHub. In the Network panel, the response headers contain Cache-Control, Expires, ETag, and Last-Modified.

Five years of front-end experience and three years of interviews

Cache location

As mentioned repeatedly above, both strong cache and negotiated cache read the resource from the browser's local storage. Where is that storage located, and what kinds are there?

By location, the cache falls into four categories: Memory Cache, Disk Cache, Service Worker, and Push Cache.

Memory Cache

Because memory is limited, not every resource is cached in memory. Memory Cache mainly holds resources fetched through preloader-related instructions. For example, while js/css files are being parsed, the preloader can already request the next resource from the network.

Disk Cache

Cache on disk. Among all browser caches, disk cache has the largest coverage. It determines which resources need to be cached and which resources have expired and need to be requested from the server again based on the fields in the HTTP Header.

Service Worker

An independent thread, borrowing the idea of Web Workers: JavaScript runs outside the main thread. Because it is detached from the browser window, it cannot access the DOM directly, but it can still do many things, such as:

  • Offline cache, Service Worker Cache.
  • Message push.
  • Network proxy.
  • It is an important implementation mechanism of PWA.

Push Cache

Push Cache is the last line of defense in the browser. It is part of HTTP/2.

Priority: Service Worker-->Memory Cache-->Disk Cache-->Push Cache.
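The priority order amounts to trying each layer in turn and going to the network only when all of them miss. A toy sketch of that lookup, with each layer modeled as a plain Map (the layer objects and the function name lookup are illustrative assumptions, not browser APIs):

```javascript
// Look a URL up in cache layers, in priority order.
// Each layer is a hypothetical { name, store: Map } pair.
function lookup(url, layers) {
  for (const layer of layers) {
    if (layer.store.has(url)) {
      return { from: layer.name, value: layer.store.get(url) };
    }
  }
  return { from: 'network', value: undefined };
}

const layers = [
  { name: 'Service Worker', store: new Map() },
  { name: 'Memory Cache', store: new Map([['/app.js', 'in-memory copy']]) },
  { name: 'Disk Cache', store: new Map([['/app.js', 'on-disk copy'], ['/logo.png', 'on-disk image']]) },
  { name: 'Push Cache', store: new Map() },
];

console.log(lookup('/app.js', layers).from);   // Memory Cache
console.log(lookup('/logo.png', layers).from); // Disk Cache
console.log(lookup('/other.css', layers).from); // network
```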

Practice

After so much theory, it is easy to still feel lost when it comes to real work. What should you do?

Everything above is talk suited to interviews; only practice reveals the truth.

Today, most front-end projects are bundled with webpack or similar tools. Configuring hashes in webpack takes care of the front-end side of caching.

The effect we want to achieve is:

  • HTML: negotiated caching.
  • CSS, JS, images and other resources: strong caching, with a hash in the file name.

There are three types of hashes in webpack: hash, chunkHash, and contentHash.

  • hash: tied to the build of the whole project. Whenever any project file changes, the hash of the whole build changes.
  • chunkHash: tied to the chunks that webpack produces. Different entries generate different chunkHash values.
  • contentHash: computed from the file content. If the file content does not change, the contentHash does not change.

Here, CSS should use contentHash, and other resources should use chunkHash.
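As a rough sketch, a webpack configuration along these lines puts the chunk hash into the JS file names. The entry path, output directory, and the 8-character hash length are placeholders chosen for the example; in webpack's file name templates the placeholders are written in lowercase, as [chunkhash] and [contenthash].

```javascript
const path = require('path');

// Minimal webpack configuration sketch; paths are placeholders.
const config = {
  entry: './src/index.js',
  output: {
    path: path.resolve('dist'),
    // The file name changes only when the chunk changes, so the file
    // can safely be served with a long max-age.
    filename: '[name].[chunkhash:8].js',
  },
};

module.exports = config;
```

For CSS, the same idea applies through whatever plugin extracts the stylesheets: mini-css-extract-plugin, for instance, takes a filename option where '[name].[contenthash:8].css' can be used.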

Non-front-end engineering projects

These are traditional front-end pages, generally served from a static server. Modified files need explicit version control as the caching strategy, for example adding a version number (index-v2.min.js) or a timestamp (time=1626226) to the entry file index.js.
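The timestamp variant can be as simple as appending a query parameter when the page is generated. A small sketch, where the helper name withVersion is made up and the parameter name time follows the example above:

```javascript
// Append a version marker to a static asset URL so that a changed file
// gets a new URL and bypasses any previously cached copy.
function withVersion(url, version) {
  const separator = url.includes('?') ? '&' : '?';
  return `${url}${separator}time=${encodeURIComponent(version)}`;
}

console.log(withVersion('/js/index.js', '1626226')); // /js/index.js?time=1626226
```

Because the cache key is the full URL, changing the parameter is enough for the browser to treat the file as a new resource.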

Backend caching practice

The caching policy is actually set on the backend, which tells the browser whether and how to cache. Here are demos of strong caching and negotiated caching to experiment with.

Strong caching solution

The code is as follows:

const express = require('express');
const app = express();
const options = {
  etag: false, // disable negotiation cache
  lastModified: false, // disable negotiation cache
  setHeaders: (res, path, stat) => {
    res.set('Cache-Control', 'max-age=10'); // Strong cache timeout is 10 seconds
  },
};
app.use(express.static(__dirname + '/public', options));
app.listen(3008);

PS: The code comes from: Illustrated HTTP Cache. When testing, note that strong caching cannot be observed by refreshing the page; it only takes effect when you load the page again by pressing Enter in the address bar.

Strong cache effect

Negotiating caching solutions

The code is as follows:

const express = require('express');
const app = express();
const options = {
  etag: true, // Enable negotiation cache
  lastModified: true, // Enable negotiation cache
  setHeaders: (res, path, stat) => {
    res.set({
      'Cache-Control': 'max-age=0', // The browser does not use strong cache
      'Pragma': 'no-cache', // HTTP/1.0 counterpart: do not use strong cache
    });
  },
};
app.use(express.static(__dirname + '/public', options));
app.listen(3001);

The effect is as follows:

Negotiating cache effects

Summary

Why does HTTP need caching? To take load off the server and to make pages load faster.

What mechanisms are there? HTTP strong cache and negotiated cache. Strong cache suits resources that rarely change (such as imported libraries, js, css), and negotiated cache suits frequently updated files (such as html).

What is strong cache? In HTTP/1.0 it is based on Expires, which is inaccurate. HTTP/1.1 introduced a new identifier, Cache-Control, to replace it. The two can coexist, and Cache-Control takes priority.

What is negotiated cache? In HTTP/1.0 it is based on Last-Modified, the last modification time, which is also inaccurate. HTTP/1.1 introduced a new identifier, ETag, to replace it. The two can coexist, and ETag takes priority.

Both Expires and Last-Modified are based on points in time. In theory that should work, but in practice problems arose (clock differences and one-second resolution), so new solutions were proposed.

When the strong cache is valid, the browser serves the resource from it. When the strong cache has expired or is disabled, the browser falls back to negotiated cache.

The above is what I understand about HTTP caching.
