Don’t abuse HTTP cache anymore! Here’s a recommended best practice for cache settings!

When configuring caching, most of us think about performance first. But a careless or improper setup can also hurt the security of our website and the privacy of our users.

Straight to the point

As usual, I'll first state the recommended configuration, and then I'll go into detail later:

  • To keep intermediaries from caching a response, set: Cache-Control: private
  • Set an appropriate secondary cache key: if a response depends on the cookie sent with the request, set: Vary: Cookie

So why are these two configurations recommended? What risks will our website face if they are not configured? Let me explain below.

Review of HTTP caching

When it comes to caching, you probably think of two caching methods and their corresponding HTTP headers right away. Let's quickly review them.

Normally, our browser client will initiate a request to the server, and then the server will return the data response to the client.

However, a server may have to respond to requests from thousands of clients, many of which are duplicate requests, which puts a lot of pressure on the server.

Therefore, we usually put a cache between the client and the server. For a repeated request, if the previous response has been stored in the cache and certain conditions are met, the response is served directly from the cache and the request never reaches the server.

HTTP caching is generally divided into two types, strong cache and negotiated cache:

Strong Cache

Strong cache: if the cached data has not expired, the client can use it directly without contacting the server.

Whether a cached response has expired mainly depends on two HTTP headers:

  • Expires: An absolute expiration time for the response. If the next request is made before this time, the cached data is used directly.
  • Cache-Control: Its max-age directive specifies, in seconds, how long the cached content stays valid. When both headers are present, max-age takes precedence over Expires.
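The freshness check described by these two headers can be sketched roughly as follows. This is a simplified illustration, not how any particular browser implements it: the function name is_fresh and the stored_at parameter are made up for this example, header names are assumed to be spelled exactly as shown, and directives such as no-cache or the Age header are ignored.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh(headers: dict, stored_at: datetime, now: datetime) -> bool:
    """Return True if a stored response may be reused without contacting the server.

    Simplified: only max-age and Expires are considered.
    """
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        # max-age is a lifetime in seconds, counted here from the time of storage.
        return now - stored_at < timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        # Expires is an absolute HTTP date, used only when max-age is absent.
        return now < parsedate_to_datetime(expires)
    return False


stored = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_fresh({"Cache-Control": "max-age=60"}, stored, stored + timedelta(seconds=30)))
print(is_fresh({"Cache-Control": "max-age=60"}, stored, stored + timedelta(seconds=90)))
```

The first call falls inside the 60-second lifetime and the second falls outside it, so only the first response would be served straight from the cache.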

Negotiated Cache

Negotiated caching, as the name implies, requires a negotiation with the server. When the browser makes its first request, the server returns a cache identifier along with the data, and the client stores both in its cache.

When requesting the data again, the client sends the stored cache identifier to the server. The server checks the identifier, and if the resource has not changed, it returns a 304 (Not Modified) status code with no body to tell the client that the cached data can still be used.

Two pairs of HTTP headers are mainly used for this check:

  • Last-Modified: A response header in which the server tells the browser when the resource was last modified.
  • If-Modified-Since: A request header. When the client requests the resource again, it uses this field to send back the last modification time that the server returned with the previous response.

The server will compare the received If-Modified-Since with the last modification time of the resource to determine whether to use the cache.

  • ETag: A response header that uniquely identifies the version of the resource returned by the server.
  • If-None-Match: A request header. When the client requests the resource again, it uses this field to send the server the unique identifier of the data it has cached.

The server compares the received If-None-Match with the unique identifier of the resource to determine whether to use the cache.
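The ETag comparison on the server side might look like the sketch below. The helpers make_etag and respond are hypothetical names for this example, and deriving the ETag from a hash of the body is just one common choice that is assumed here; real servers may also handle weak validators and lists of ETags, which this sketch does not.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (an assumption for this sketch)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: Dict[str, str], body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for the current version of a resource."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's cached copy matches: answer 304 with an empty body.
        return 304, {"ETag": etag}, b""
    # First request, or the resource has changed: send the full response.
    return 200, {"ETag": etag}, body


status, headers, _ = respond({}, b"hello")
revalidated, _, _ = respond({"If-None-Match": headers["ETag"]}, b"hello")
print(status, revalidated)  # 200 304
```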

Common Misconceptions About Caching

The knowledge above is probably what everyone can recite from memory, but have you ever seriously asked yourself this question: is the cached data we receive always cached in the browser?

In fact, it is not. There are usually several levels of cache. Some serve a single user and some serve many users. Some are controlled by the server, some by the user, and some by an intermediary layer.

  • Browser cache: Generally dedicated to a single user and implemented in the browser client. It improves performance by avoiding fetching the same response multiple times.
  • Local proxy: It may be installed by the user or managed by an intermediary, such as the company's network layer or the network provider. A local proxy usually caches a single response for multiple users, which makes it a "public" cache.
  • Origin server cache/CDN: Controlled by the server. The goal of an origin server cache is to reduce the load on the origin server by caching the same response for multiple users. The goal of a CDN is similar, but its caches are distributed across the globe and serve each response from the location closest to the user to reduce latency.

In addition, the local proxies we often configure can cache even HTTPS resources once their certificate has been installed as trusted.

Spectre Vulnerability

So how can cache pose a threat to the security of our website and the privacy of our users? Let’s look at a very famous vulnerability: Spectre.

Attackers can exploit the Spectre vulnerability to read memory in the process that runs their code, including memory they should not be able to reach, which means they can access cross-origin data without authorization.

The risk is highest with APIs that can act as high-precision timers or expose memory information:

  • SharedArrayBuffer (required for WebAssembly Threads)
  • performance.measureMemory()
  • JS Self-Profiling API

For this reason, browsers at one point disabled high-risk APIs such as SharedArrayBuffer.

Many readers are curious about how the attack actually works: how can a few JavaScript APIs be used to read data without permission? I will write a dedicated article about this next time.

How does caching affect Spectre?

So what does Spectre have to do with cache? We can understand it simply like this:

Normally, if we open a page that is subject to cross-origin restrictions, we cannot read its data. However, if its Cache-Control is set to public, the data may be stored in a public cache (such as our local proxy cache).

Although we have no permission to access this data, it now sits in the cache. Once it has been stored there, an attacker can use the Spectre vulnerability to read it.

So why can Spectre be used to gain unauthorized access to cached data? Let's take a simple example:

For example, suppose the login password of our website is conardli. If the attacker knows only that it consists of 8 lowercase letters, brute force takes up to 26 to the 8th power (about 209 billion) guesses. This is a very large number, and cracking the password this way is almost impossible.

Suppose that our password is stored in a region of memory that the attacker has no access to. The attacker then sets up a separate region of memory that holds all 26 English letters and makes sure that none of it is currently in the cache.

Now the attacker reads out of bounds into the area where our password is stored and touches the letter c. Because of the permission check, the read is rejected and the attacker never sees the value.

However, even though it cannot be accessed, the letter c will be cached.

At this time, the attacker goes back to traverse the memory of his 26 letters and finds that the access speed of c has become faster...

So, the first character of our password is c...
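To make the idea concrete, here is a toy simulation of the timing trick. It is not a real Spectre exploit: the ToyCache class, the FAST and SLOW latencies, and the victim_touch helper are all invented for this illustration, and real attacks measure actual CPU cache timing rather than a Python set.

```python
import string

FAST, SLOW = 1, 100  # made-up latencies for a cached vs. an uncached access


class ToyCache:
    """A pretend CPU cache that only remembers which letters were touched."""

    def __init__(self) -> None:
        self.lines = set()

    def flush(self) -> None:
        self.lines.clear()

    def access(self, letter: str) -> int:
        """Return the simulated latency and leave the letter cached."""
        latency = FAST if letter in self.lines else SLOW
        self.lines.add(letter)
        return latency


def victim_touch(cache: ToyCache, secret: str, index: int) -> None:
    # Stands in for the rejected out-of-bounds read: the attacker never sees
    # the value, but the letter it selects still ends up in the cache.
    cache.access(secret[index])


def recover(cache: ToyCache, touch, length: int) -> str:
    """Recover a secret one character at a time by timing 26 probes per position."""
    recovered = ""
    for index in range(length):
        cache.flush()   # make sure no letter is cached
        touch(index)    # trigger the rejected read
        timings = {letter: cache.access(letter) for letter in string.ascii_lowercase}
        recovered += min(timings, key=timings.get)  # the fast letter leaks
    return recovered


cache = ToyCache()
secret = "conardli"
print(recover(cache, lambda i: victim_touch(cache, secret, i), len(secret)))  # conardli
print(26 ** len(secret), 26 * len(secret))  # 208827064576 208
```

The last line shows why this matters: instead of up to 208,827,064,576 brute-force guesses, the simulated attacker needs only 26 timed probes per character, 208 in total.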

I will only touch on it briefly here. In the next article I will cover the Spectre vulnerability in detail; it is quite clever... If you are interested, please let me know in the comment section.

Recommended configuration for the website

Because of the above problems, we recommend the following two configurations for all important website data:

Disable Public Cache

Set Cache-Control: private. This keeps the response out of all public caches (such as proxies), which reduces the chance that an attacker can reach it across boundaries through a shared cache.

Note that private does not have to be used on its own. For example, it can be combined with max-age, and for the individual user its performance is not much different from public. You can see this combination in the response headers of Google's website.
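As a rough sketch of how private and max-age fit together, the two helpers below build such a header value and decide whether a shared cache may store a response. The names cache_control and shared_cache_may_store are hypothetical, and the check is deliberately simplified: it looks only at the private and no-store directives and ignores everything else a real shared cache would consider.

```python
def cache_control(max_age: int, sensitive: bool) -> str:
    """Build a Cache-Control value; sensitive responses are marked private."""
    visibility = "private" if sensitive else "public"
    return f"{visibility}, max-age={max_age}"


def shared_cache_may_store(cache_control_value: str) -> bool:
    """Simplified rule for a proxy or CDN: never store private or no-store responses."""
    directives = {
        part.strip().split("=")[0].lower()
        for part in cache_control_value.split(",")
    }
    return not ({"private", "no-store"} & directives)


value = cache_control(3600, sensitive=True)
print(value)                          # private, max-age=3600
print(shared_cache_may_store(value))  # False
```

The browser can still reuse the response for an hour, so the single user loses nothing, while the proxy in the middle is told to keep its hands off.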

Set an appropriate secondary cache key

By default, the browser cache uses the URL and the request method as the cache key.

This means that if a website requires login, responses for different users share one cache entry, because their request URLs and methods are the same.

This is obviously a problem, and we can avoid it by setting Vary: Cookie.

When the user's identity information changes, a different cache entry is used.
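The effect of Vary on the cache key can be sketched like this. The cache_key function is a hypothetical helper for illustration only: it assumes the Vary value comes from the previously stored response, and it compares header values as plain strings, which is stricter than what a real cache may do.

```python
from typing import Dict, Tuple


def cache_key(method: str, url: str, request_headers: Dict[str, str], vary: str = "") -> Tuple:
    """Build a cache key from the method and URL plus the request headers named in Vary."""
    primary = (method.upper(), url)
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    # The secondary key: the value of every request header listed in Vary.
    secondary = tuple((name, lowered.get(name, "")) for name in names)
    return primary + (secondary,)


alice = {"Cookie": "session=alice"}
bob = {"Cookie": "session=bob"}
url = "https://example.com/profile"  # placeholder URL

# Without Vary, both users map to the same cache entry.
print(cache_key("GET", url, alice) == cache_key("GET", url, bob))  # True
# With Vary: Cookie, each user gets a separate entry.
print(cache_key("GET", url, alice, "Cookie") == cache_key("GET", url, bob, "Cookie"))  # False
```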

Of course, if your resource is a public CDN resource that everyone can access, you can cache it freely. If your data is sensitive, it is recommended to use the two settings above.
