20 pictures to thoroughly understand the principles of HTTPS!

Preface

In recent years, major companies have paid more and more attention to the secure transmission of information and have gradually moved their websites to HTTPS. But do you know how HTTPS works? How does it keep information secure in transit? There are many introductions to HTTPS online, but I find that most of them leave gaps or skip steps. Today I will try to explain HTTPS step by step, from the basics to the details, and I believe that after reading this article you will have a solid grasp of how it works. The outline of this article is as follows:

1. Why HTTP is not secure

2. Four principles of secure communication

3. Brief description of HTTPS communication principle

  • Symmetric encryption
  • Asymmetric encryption
  • Digital certificates
  • Digital signatures

4. Other HTTPS related issues

Why HTTP is not secure

HTTP transmits data in plain text, which creates three main risks:

1. Eavesdropping risk

A man-in-the-middle can read the communication. Because the content is plain text, anything sensitive in it is exposed.

2. Tampering risk

A man-in-the-middle can modify a message before forwarding it to the other party, which is very dangerous.

3. Impersonation risk

For example, you think you are talking to Taobao, but you are actually talking to a phishing website.

HTTPS exists to eliminate these three risks. Next, let's look at what secure communication actually requires.

Four principles of secure communication

Having read the previous section, it is easy to guess that HTTPS was designed to address these three risks. Secure communication is generally considered to require four properties: confidentiality, integrity, authentication, and non-repudiation.

  • Confidentiality: the data is encrypted, which eliminates the risk of eavesdropping: even if a middleman intercepts it, he cannot read the plain text.
  • Integrity: the data is not tampered with in transit; it arrives exactly as it was sent, with nothing added or removed. If even a single punctuation mark is changed along the way, the receiver can detect it and will reject the message as illegitimate.
  • Identity authentication: confirming the other party's true identity (the proverbial "proving that your mother is your mother"), which solves the risk of impersonation. Users no longer have to worry that they think they are visiting Taobao but are actually talking to a phishing website.
  • Non-repudiation: an action that has taken place cannot later be denied. For example, if Xiao Ming borrows 1,000 yuan from Xiao Hong but writes no IOU, or writes one but does not sign it, he can deny the loan and Xiao Hong loses her money.

Next, let's see how HTTPS meets these four principles of secure communication, step by step.

A brief introduction to HTTPS communication principles

Symmetric encryption: the ultimate form of encryption for HTTPS

Since HTTP transmits in plain text, why not simply encrypt the messages? Encryption requires the two parties to agree on a key. One option is for both sides to use the same key to encrypt and decrypt messages, which is called symmetric encryption.

As shown in the figure: both parties using symmetric encryption use the same key for encryption and decryption.
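To make "the same key encrypts and decrypts" concrete, here is a minimal Python sketch. It is a toy, not what HTTPS uses: the keystream is built by hashing the key, a nonce and a counter with SHA-256 and XOR-ing it with the data, so applying the same function twice with the same key restores the original. Real HTTPS uses ciphers such as AES-GCM or ChaCha20-Poly1305, which also detect tampering. The function names `keystream` and `xor_cipher` are made up for illustration.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: concatenated SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts: XOR with the same keystream undoes itself."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))

shared_key = secrets.token_bytes(32)   # both parties must already hold this key
nonce = secrets.token_bytes(12)        # must never be reused with the same key

ciphertext = xor_cipher(shared_key, nonce, b"GET /account HTTP/1.1")
plaintext = xor_cipher(shared_key, nonce, ciphertext)
print(plaintext)   # b'GET /account HTTP/1.1'
```

Decrypting with any other key yields garbage, which is exactly why both sides must somehow end up holding the same key.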

Symmetric encryption is fast and efficient, and it is also the form of encryption HTTPS ultimately uses for the data itself. But there is a key problem here: both parties must use the same key, so how is that key agreed upon? If the key is simply sent in a message, the subsequent communication is effectively still in plain text, because a middleman can intercept the key and use it to decrypt every message, or even replace it with his own key and tamper with messages at will.

Some people suggest encrypting the key itself. But then the other party needs a second key to decrypt it, and that key also has to be transmitted, where it can be intercepted just the same. Sending the key directly always leads to this "Russian doll" problem, one more layer of keys each time, so it is not a workable approach.

Asymmetric encryption: solving one-way transmission of the symmetric key

From the analysis above, the key cannot be sent directly from either end. Let's look at another kind of encryption: asymmetric encryption.

Asymmetric encryption means the two sides use different keys: a public key, which can be shared with anyone, and a private key, which must be kept secret. Ciphertext produced with the public key can only be decrypted with the private key, and content encrypted with the private key can only be decrypted with the public key.

Note: the phrase "private key encryption" is not rigorous. It is more accurately called signing with the private key: since the public key is available to anyone, anyone can decrypt it, so it hides nothing. Decrypting with the public key to check such a signature is called signature verification.

With this, the server only needs to keep its private key safe and publish its public key to clients. A client encrypts the symmetric key with the server's public key and sends it to the server. Because only the private key can decrypt what the public key encrypted, and only the server holds the private key, the transfer from client to server is secure. The server decrypts the message to obtain the symmetric key, and from then on both sides communicate using that symmetric key. (This is the classic RSA key exchange used by older TLS versions; TLS 1.3 negotiates the key with ephemeral Diffie-Hellman instead, but it still relies on the certificate-based authentication described below.)
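The key exchange described above can be sketched with textbook RSA in plain Python. This is a toy under loud assumptions: the key pair is derived from two fixed, publicly known Mersenne primes so the example stays short and deterministic (anyone can factor this modulus, so it offers no security), and no padding such as OAEP is applied. The names `SERVER_PUBLIC_KEY`, `rsa_encrypt` and `rsa_decrypt` are hypothetical.

```python
import secrets

# Toy RSA key pair from two fixed, publicly known Mersenne primes.
# Anyone can factor N, and no padding is used: illustration only.
P, Q = 2**127 - 1, 2**521 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)

SERVER_PUBLIC_KEY = (N, E)    # published to every client
SERVER_PRIVATE_KEY = (N, D)   # never leaves the server

def rsa_encrypt(public_key, m: int) -> int:
    n, e = public_key
    if not 0 <= m < n:
        raise ValueError("message must be smaller than the modulus")
    return pow(m, e, n)

def rsa_decrypt(private_key, c: int) -> int:
    n, d = private_key
    return pow(c, d, n)

# Client: generate a random 256-bit symmetric key and encrypt it with the public key.
session_key = secrets.token_bytes(32)
encrypted = rsa_encrypt(SERVER_PUBLIC_KEY, int.from_bytes(session_key, "big"))

# Server: decrypt with the private key to recover the same symmetric key.
recovered = rsa_decrypt(SERVER_PRIVATE_KEY, encrypted).to_bytes(32, "big")
print(recovered == session_key)   # True
```

A middleman who sees `N`, `E` and `encrypted` cannot recover the session key without `D` (assuming real, secret primes rather than these toy ones).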

But this raises another question: how does the server get its public key to the client securely? If the public key is sent directly, a middleman can replace it just as easily.

Digital certificates solve the trust problem of public key transmission

How do we solve the public key transmission problem? Let's look for an answer in everyday life. When employees join a company, the company usually asks them for proof of their academic qualifications. Obviously, not just any piece of paper counts: the diploma must be issued by an authoritative third party, in China the Ministry of Education. Likewise, the server can apply to a third-party authority, a Certificate Authority (CA), for a certificate that contains its public key, and then send that certificate to the client. When the site administrator applies for the certificate, they submit information such as the site's DNS host name, and the CA generates the certificate from that information.

When the client receives the certificate, it takes the public key from it, uses that key to encrypt the symmetric key, and sends the result to the server. This looks perfect, but two issues remain:

Question 1: How does the client verify that the certificate is genuine and has not been tampered with?

Go back to the diploma analogy. How can a company tell whether the diploma you provide is genuine? By the diploma's certificate number: the company can look it up on the China Higher Education Student Information and Career Center (CHSI) website to confirm the diploma is authentic. This number plays the role of what we call a digital signature, and it prevents forged certificates.

Back to HTTPS, how is the digital signature of the certificate generated? A picture is worth a thousand words.

The steps are as follows: 1. First, the CA uses a digest (hash) algorithm such as SHA-256 to compute a digest of the certificate plaintext (the serial number, DNS host name, and so on), and then encrypts (signs) that digest with its own private key. (Older certificates used MD5, but MD5 is now considered broken and is no longer accepted for certificate signatures.)

A message digest algorithm compresses input of any length into a pseudo-random output of fixed length. No matter how long the input message is, the digest always has the same length. As long as the content differs, the digest will almost certainly differ (for a secure algorithm the chance of two different inputs producing the same digest is negligible), so comparing digests reveals whether the content has been tampered with.
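Python's standard `hashlib` module shows both properties: the digest length is fixed regardless of input size, and a one-character change produces an unrelated digest. SHA-256 is used here instead of MD5 because MD5 is no longer considered collision-resistant; the sample messages are made up.

```python
import hashlib

digest_short = hashlib.sha256(b"hi").hexdigest()
digest_long = hashlib.sha256(b"x" * 1_000_000).hexdigest()
print(len(digest_short), len(digest_long))  # 64 64 (32 bytes, hex-encoded)

original = hashlib.sha256(b"Xiao Ming owes Xiao Hong 1000 yuan.").hexdigest()
tampered = hashlib.sha256(b"Xiao Ming owes Xiao Hong 1000 yuan!").hexdigest()
print(original == tampered)  # False: one changed character, completely different digest
```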

Why do we need to generate a digest first and then encrypt it? Why can't we encrypt it directly?

Because asymmetric encryption is slow, and RSA can only process a block smaller than its key size at once. If the whole certificate were encrypted to produce the signature, the client would have to decrypt all of it to verify the signature, and since the certificate plaintext is long, verification would take a long time. With a digest, the long plaintext is compressed into a short fixed-length string, so signing and verification are much faster.

2. After the client receives the certificate, it computes a digest of the certificate plaintext with the same digest algorithm and compares it with the digest in the signature; if they match, the certificate has not been tampered with. Why must the digest be encrypted with the CA's private key? Because the digest algorithm is public: a middleman could replace the certificate plaintext, recompute the digest with the same algorithm, and replace the digest on the certificate as well. The client would then compute a matching digest, wrongly conclude that the certificate is legitimate, and fall into the trap. Therefore the CA signs the digest with its private key, and the client must use the CA's public key to decrypt the signature. Because only the CA holds the private key, the digest recovered this way is the genuine, untampered one (private key signs, public key verifies).

After the server passes the certificate to the client, the client's signature verification process is as follows

Because only a signature made with the CA's private key will check out under the CA's public key, a forged certificate fails verification. And if the client receives a genuine certificate whose contents have been altered, the digests will not match, so the client also rejects the certificate.

Careful readers will have spotted the next problem: how does the CA's public key reach the client securely? If the server sent it, the risk of the key being swapped would remain. In fact, this public key sits in the CA's own certificate (the root CA certificate), which the operating system trusts and ships with, so it never needs to be transmitted. On a Mac, you can open the Keychain Access app to see the many built-in trusted certificates.

The server sends the certificate issued by the CA. When the client receives it, it uses the public key from the built-in CA certificate to verify the signature. This removes the risk of the public key being swapped in transit.
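Steps 1 and 2, together with the built-in trust store, can be put together in a short Python sketch. Assumptions, stated loudly: the CA key is a textbook-RSA toy built from fixed, publicly known Mersenne primes (no security whatsoever, and no padding scheme such as PKCS#1), the certificate is a simplified hypothetical `Certificate` dataclass rather than a real X.509 structure, and `TRUSTED_ROOTS` stands in for the root certificates shipped with the operating system.

```python
import hashlib
from dataclasses import dataclass

# Toy CA key pair: textbook RSA from fixed, publicly known Mersenne primes.
# Anyone can factor this modulus, so it is for illustration only.
P, Q = 2**127 - 1, 2**521 - 1
CA_N, CA_E = P * Q, 65537
CA_D = pow(CA_E, -1, (P - 1) * (Q - 1))   # CA private key, never leaves the CA

# Hypothetical trust store standing in for root certificates built into the OS.
TRUSTED_ROOTS = {"Example Root CA": (CA_N, CA_E)}

@dataclass
class Certificate:
    subject: str       # DNS host name, e.g. "www.example.com"
    public_key: str    # the server's public key (opaque text in this sketch)
    issuer: str        # name of the CA that signed it
    signature: int = 0

    def tbs_bytes(self) -> bytes:
        """The certificate plaintext that is hashed and signed."""
        return f"{self.subject}|{self.public_key}|{self.issuer}".encode()

def digest_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def ca_sign(cert: Certificate) -> None:
    # Step 1: digest the plaintext, then "encrypt" (sign) the digest with the CA private key.
    cert.signature = pow(digest_int(cert.tbs_bytes()), CA_D, CA_N)

def client_verify(cert: Certificate) -> bool:
    # The CA public key comes from the local trust store, not from the network.
    key = TRUSTED_ROOTS.get(cert.issuer)
    if key is None:
        return False
    n, e = key
    signed_digest = pow(cert.signature, e, n)        # Step 2a: "decrypt" the signature
    computed_digest = digest_int(cert.tbs_bytes())   # Step 2b: recompute the digest locally
    return signed_digest == computed_digest

cert = Certificate("www.example.com", "server-public-key", "Example Root CA")
ca_sign(cert)
print(client_verify(cert))   # True

cert.public_key = "middleman-public-key"   # swapped in transit, but cannot be re-signed
print(client_verify(cert))   # False
```

The middleman can change the certificate text, but without the CA's private key he cannot produce a signature that matches the new digest, so verification fails.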

Question 2: How to prevent certificates from being swapped

In fact, any site can apply for a certificate from a third-party authority, and middlemen are no exception.

Both legitimate sites and middlemen can obtain certificates from a CA, and since those certificates are all issued by the CA, they are all valid. So can the middleman swap the certificate the legitimate site sends to the client for his own during transmission? As shown below

The answer is no. Besides checking the certificate's signature, the client also checks that the domain name on the certificate matches the domain name it requested. The middleman's certificate is genuinely issued by a CA, but its domain name does not match the one the client requested, so the client rejects it.
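A simplified version of that domain check might look like this. It is an illustrative sketch, not the complete matching rules of RFC 6125: it accepts an exact match or a single leftmost wildcard label such as `*.example.com`. In practice you would not write this yourself; Python's `ssl` module performs host name checking automatically when `check_hostname` is enabled, as it is for `ssl.create_default_context()`. The function name `hostname_matches` is hypothetical.

```python
def hostname_matches(cert_name: str, requested_host: str) -> bool:
    """Simplified check: exact match, or one leftmost wildcard label like '*.example.com'."""
    name = cert_name.lower().rstrip(".")
    host = requested_host.lower().rstrip(".")
    if name == host:
        return True
    if name.startswith("*."):
        suffix = name[1:]                      # e.g. ".example.com"
        if host.endswith(suffix):
            label = host[: -len(suffix)]       # the part the wildcard must cover
            return label != "" and "." not in label
    return False

print(hostname_matches("www.example.com", "www.example.com"))       # True
print(hostname_matches("attacker.example.net", "www.example.com"))  # False: valid cert, wrong domain
print(hostname_matches("*.example.com", "shop.example.com"))        # True
print(hostname_matches("*.example.com", "a.b.example.com"))         # False: '*' covers one label
```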

But this certificate swap suggests an idea. Since HTTPS is encrypted, how can a "middleman" like Charles capture plaintext packets? It uses exactly this kind of certificate swap. What do we have to do before using Charles to capture HTTPS traffic? Install Charles's certificate, of course.

That certificate is Charles's own root CA certificate. Once you install and trust it, Charles can replace the certificate the server sends with a certificate for the same domain that Charles generates and signs on the fly. The client verifies it against the Charles root you installed, the signature and domain name checks both pass, and the client then encrypts the symmetric key with the public key from Charles's certificate. The whole process is as follows

This shows that a middleman like Charles can only capture HTTPS traffic if the client trusts its CA certificate; it then deceives the client by swapping certificates. So never casually trust third-party root certificates, or you open yourself to exactly this risk.

Other HTTPS related issues

What is two-way authentication?

So far, only the client verifies the certificate sent by the server. How does the server verify the client? Again with certificates. When we transfer money through online banking, we first plug in the USB security key (the "U-shield") issued by the bank. That is because the U-shield contains a certificate: during the connection it is sent to the server, and communication proceeds only after the server has verified it.

Side note: identity authentication is only one of the U-shield's functions. Operations such as encryption and decryption are also performed inside the U-shield, so the private key never appears in the computer's memory.
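For a feel of how two-way authentication (often called mutual TLS) is switched on, here is a configuration sketch using Python's standard `ssl` module. The server context sets `verify_mode = ssl.CERT_REQUIRED`, so the handshake fails unless the client presents a certificate signed by a CA the server trusts; the client context loads its own certificate and key so it can present them. The function names and certificate file paths are hypothetical placeholders, and files are loaded only if paths are supplied.

```python
import ssl
from typing import Optional

def server_context_requiring_client_cert(cert_file: Optional[str] = None,
                                         key_file: Optional[str] = None,
                                         client_ca_file: Optional[str] = None) -> ssl.SSLContext:
    # Purpose.CLIENT_AUTH creates a server-side context (one that authenticates clients).
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.verify_mode = ssl.CERT_REQUIRED   # reject clients that present no valid certificate
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)   # the server's own certificate
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)            # CA that issues client certificates
    return ctx

def client_context_with_cert(cert_file: Optional[str] = None,
                             key_file: Optional[str] = None) -> ssl.SSLContext:
    # Default client context: verifies the server's certificate and host name.
    ctx = ssl.create_default_context()
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)   # presented when the server asks
    return ctx

server_ctx = server_context_requiring_client_cert()
client_ctx = client_context_with_cert()
print(server_ctx.verify_mode == ssl.CERT_REQUIRED, client_ctx.check_hostname)   # True True
```

These contexts would then be passed to `wrap_socket`, with `server_side=True` on the server and `server_hostname=...` on the client.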

What is a certificate trust chain?

As mentioned above, we can apply for certificates from a CA, but there are only a few top-level CAs (root CAs) in the world. Far too many people apply for certificates every day for them to handle. What can they do? Imagine everyone in a company going to the CEO for everything; he would go crazy. His solution is to delegate: he hands authority to the CTO, CFO, and so on, so people only need to go to the CTO. If the CTO is too busy, he delegates further.

Similarly, a busy root CA can authorize next-level CAs, which can in turn authorize CAs below them, so we only need to apply for a certificate from a first-, second-, or third-level CA. How do we prove these certificates are ultimately authorized by the root CA? Each smaller CA has its certificate signed by a larger one: the first-level CA's certificate is signed by the root CA, and the second-level CA's by the first-level CA. Nobody signs for the root CA, so it can only vouch for itself; its certificate is called a "self-signed certificate" or "root certificate". We have to trust it, otherwise the certificate trust chain cannot work. (As mentioned earlier, root certificates are built into the operating system.)

Certificate trust chain

Now let's see how the client verifies the certificate when a site's certificate is issued by a second-level CA. The server sends not only its own certificate but also the rest of the certificate trust chain (usually without the root certificate, which the client already has). The client then verifies it as follows:

1. The browser uses the public key of its built-in, trusted root certificate to verify the signature on the first-level CA certificate. If it checks out, the first-level certificate's public key can be trusted.

2. It uses the first-level certificate's public key to verify the signature on the second-level CA certificate, and so can trust the second-level public key.

3. Finally, it uses the second-level certificate's public key to verify the signature on the server's certificate. If that passes, the server's public key is trusted and verification is complete.
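The three steps can be expressed as a loop that walks down the chain, each newly verified public key becoming the key that checks the next certificate. As before, this is a toy: textbook RSA with fixed, public Mersenne primes (insecure), a hypothetical `Cert` dataclass instead of X.509, one intermediate CA instead of two, and no checks of validity dates, CA constraints or revocation, all of which real clients also perform. The leaf's public key `(12345, 65537)` is an arbitrary placeholder.

```python
import hashlib
from dataclasses import dataclass, replace

def make_toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair from two primes. Toy only: fixed public primes, no padding."""
    n = p * q
    return (n, e), (n, pow(e, -1, (p - 1) * (q - 1)))

ROOT_PUB, ROOT_PRIV = make_toy_keypair(2**127 - 1, 2**521 - 1)
INTER_PUB, INTER_PRIV = make_toy_keypair(2**89 - 1, 2**607 - 1)

@dataclass
class Cert:
    subject: str        # who the certificate is for
    issuer: str         # who signed it
    public_key: tuple   # subject's (n, e)
    signature: int = 0

    def tbs(self) -> bytes:
        """Certificate plaintext covered by the signature."""
        return f"{self.subject}|{self.issuer}|{self.public_key}".encode()

def digest(cert: Cert) -> int:
    return int.from_bytes(hashlib.sha256(cert.tbs()).digest(), "big")

def sign(cert: Cert, issuer_private_key) -> Cert:
    n, d = issuer_private_key
    cert.signature = pow(digest(cert), d, n)
    return cert

# Built into the OS/browser: the root CA's public key, never sent by the server.
TRUST_STORE = {"Root CA": ROOT_PUB}

# Sent by the server: its own certificate, then the intermediate (no root).
leaf = sign(Cert("www.example.com", "Intermediate CA", (12345, 65537)), INTER_PRIV)
intermediate = sign(Cert("Intermediate CA", "Root CA", INTER_PUB), ROOT_PRIV)
chain = [leaf, intermediate]

def verify_chain(chain) -> bool:
    issuer_name = chain[-1].issuer
    if issuer_name not in TRUST_STORE:
        return False                      # chain does not end at a trusted root
    issuer_key = TRUST_STORE[issuer_name]
    for cert in reversed(chain):          # intermediate first, server certificate last
        n, e = issuer_key
        if cert.issuer != issuer_name or pow(cert.signature, e, n) != digest(cert):
            return False
        # This certificate is now trusted, so its key verifies the next level down.
        issuer_name, issuer_key = cert.subject, cert.public_key
    return True

print(verify_chain(chain))                             # True
forged = replace(leaf, subject="www.evil.example")     # altered, same signature
print(verify_chain([forged, intermediate]))            # False
```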

Summary

I believe that after reading this article you have a clear picture of how HTTPS works. HTTPS is simply HTTP + SSL/TLS.

At its core, SSL/TLS is about how to negotiate a secure symmetric key and then use it for the rest of the communication. With that question in mind, the two easily confused concepts of digital certificates and digital signatures become easy to understand. And once you understand them, you will also see why HTTPS is encrypted and yet tools like Charles can still capture plaintext packets.

This article is reprinted from the WeChat public account "Ma Hai". To reprint this article, please contact the Ma Hai public account.
