Let's set HTTP and HTTPS aside for a moment and start with a chat program. We want A to be able to send a "hello" message to B:

In implementing this chat program, this article considers only security. The goal: even if the packet carrying A's hello message to B is intercepted by a middleman, the middleman cannot learn its content.

How to be truly safe?

Faced with this question, many people immediately think of encryption algorithms: symmetric encryption, asymmetric encryption, DES, RSA, and so on. But encryption algorithms are only solutions. The first thing to do is understand the problem domain: what is security?

My personal understanding is: the content of the communication between A and B can be seen only by A and B.

OK, the problem domain is defined (in reality there is, of course, more than one definition). As for the solution, encrypting the message comes to mind easily.

Off topic: is this the only way? I don't think so. Perhaps some future material will break today's assumptions about communication and achieve true confidentiality.

For a simple communication model like A and B, the choice is easy:

This is a symmetric encryption algorithm, in which the key S in the figure is used for both encryption and decryption. The details are beyond the scope of this article. As long as the key S is sufficiently strong and is never disclosed to a third party, we have solved the problem we set at the beginning, because only A and B know how to encrypt and decrypt the messages between them.

However, in the WWW environment, the communication model of a Web server is not so simple:

If the server uses the same symmetric key for communication with every client, that is equivalent to no encryption, because every client, including a malicious one, holds the key. So what should we do? Can we use symmetric encryption without disclosing the key?
Please think about it for 21 seconds.

The answer is: the Web server uses a different symmetric key with each client:

How to determine the symmetric encryption algorithm

Wait, another question comes up. How does the server tell the client which symmetric encryption algorithm and key to use? Through negotiation, of course.

However, the negotiation itself is not encrypted and can still be intercepted by the middleman. We could encrypt the negotiation symmetrically, but then the key for that encryption would have to be agreed on in the clear. Encrypt that step as well? That is a chicken-and-egg problem.

How to encrypt the negotiation process

So a new question arises: how do we encrypt the negotiation process? Cryptography offers "asymmetric encryption", which uses a key pair: ciphertext produced with the private key can be decrypted with the public key, and ciphertext produced with the public key can be decrypted only with the private key. Only one party holds the private key, while the public key can be handed out to everyone.

The direction from the server to A, B, and so on is still unsafe, because anyone holding the public key can decrypt what the server encrypts with its private key. But at least the direction from A and B to the server is safe: only the server can decrypt what they encrypt with the public key.

So we have solved the problem of how to negotiate: use an asymmetric encryption algorithm to protect the negotiation of the symmetric encryption. Now do you understand why HTTPS requires both symmetric and asymmetric encryption algorithms?

Negotiate what encryption algorithm

How can the Web server use a different symmetric key for each client while keeping that key unknown to any third party?

By using random numbers: the symmetric key is generated from random numbers.
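As a rough illustration of generating the symmetric key from random numbers, here is a minimal Python sketch. The function names `derive_session_key` and `new_session`, the value sizes, and the use of HMAC-SHA256 are all assumptions made for this example; real TLS uses a more elaborate key-derivation scheme.

```python
import hashlib
import hmac
import secrets


def derive_session_key(shared_secret: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Mix a shared secret with both sides' random values into a 32-byte key.

    Hypothetical helper for illustration only, not the real TLS derivation.
    """
    return hmac.new(shared_secret, client_random + server_random, hashlib.sha256).digest()


def new_session() -> bytes:
    """Simulate one negotiation: fresh random values, hence a fresh key."""
    client_random = secrets.token_bytes(32)  # random value chosen by the client
    server_random = secrets.token_bytes(32)  # random value chosen by the server
    # Assumed to travel from client to server encrypted with the server's public key,
    # so a middleman who sees both random values still cannot derive the key.
    shared_secret = secrets.token_bytes(48)
    return derive_session_key(shared_secret, client_random, server_random)


key_for_a = new_session()
key_for_b = new_session()
```

Because fresh random values enter every derivation, `key_for_a` and `key_for_b` are unrelated, and neither exists before its negotiation takes place.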
In this way, each interaction between the server and the client can use a new key, and that key is determined only at the time of the interaction. Now you understand why there are so many random numbers in the handshake phase of the HTTPS protocol.

How to get the public key?

Careful readers may have noticed that if asymmetric encryption is used, clients A and B must hold the public key from the start; otherwise they cannot encrypt anything. So we have a new problem: how can clients A and B obtain the public key securely?

The only options I can think of are:

Solution 1: The server sends the public key to each client.
Solution 2: The server puts the public key on a remote server, and the client requests it from there.

We choose solution 1, because solution 2 costs one more request and also raises the question of where to host the public key.

What if the public key is swapped? Is this another chicken-and-egg problem?

But solution 1 has a problem: what if the server sends the public key to the client and a middleman swaps it on the way? I drew a picture to help you understand:

Obviously, it is unrealistic to have every browser on every client store the public keys of all websites by default.

Using a third-party public key to solve the chicken-and-egg problem

The public key can be swapped because the client cannot tell whether the party returning the public key is the middleman or the real server. This is the identity authentication problem of cryptography.

If you were asked to solve it, how would you? If you know HTTPS, you know that digital certificates solve it. But have you ever thought about what a certificate essentially is? Please put aside your existing knowledge of HTTPS and try to find a solution yourself.

This is how I solved it.
Since the server has to pass the public key to the client and that step itself is not secure, why not encrypt that step too? But with symmetric or asymmetric encryption? Now it feels like the chicken-and-egg problem again.

The difficulty is that if we pass the public key directly to the client, we can never rule out a middleman swapping it. So we must not pass the server's public key to the client directly. Instead, a third-party organization encrypts our public key with its own private key before it is passed to the client, and the client decrypts it with the third-party organization's public key.

The following figure is the first version of the "digital certificate" we designed. The certificate contains only the public key that the server handed over to the third-party organization, and that public key is encrypted with the third-party organization's private key:

If the client can decrypt it, the public key has not been swapped by the middleman: anything the middleman encrypts with his own private key cannot be decrypted with the third party's public key.

At this point, I thought the problem was solved. But real HTTPS also has the concept of a digital signature, and I could not understand why it was needed.

It turned out that I had missed a scenario: the third-party organization does not issue certificates only to your company. It may also issue one to a company with bad intentions, such as the middleman. The middleman then has the opportunity to swap your certificate for his own, and the client cannot tell whether it received your certificate or the middleman's, because both can be decrypted with the third-party organization's public key.
Like this:

The third-party organization issues certificates to multiple companies:

The client can decrypt every certificate issued by the same third party:

As a result, a middleman holding a certificate from the same third-party organization can perform the swap:

Digital signatures solve the problem of tampering among certificates issued by the same organization

To solve this problem, we must first answer one question: who should be responsible for telling apart different certificates issued by the same organization?

It can only be the client. That means that once the client has obtained a certificate, it must be able to determine whether the certificate has been tampered with. How can it do that?

Let's look to real life for inspiration. Suppose you work in HR and are holding a candidate's academic certificate. It lists the certificate holder, the issuing institution, the date of issue, and so on. It also lists the most important thing: the certificate number! How do we check that the certificate is authentic? Take the certificate number to the relevant institution and look it up. If the registered holder is the candidate in front of you and the certificate number matches, the certificate is authentic.

Can our client adopt this mechanism? Like this:

But where is this "third party"? Is it a remote service? That won't do: a remote service would slow down the whole interaction. So the third party's verification function can only live locally on the client.

How does the client verify the certificate locally?

The answer is that the certificate itself tells the client how to verify it. That is, the certificate states how to generate a certificate number from the content of the certificate.
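The idea of a certificate that names its own number-generation method can be sketched in Python as follows. The field names, the serialization, and the helper `certificate_number` are assumptions made for illustration, and MD5 is chosen here purely as an example method, not as a recommendation.

```python
import hashlib

# Hypothetical certificate content; the field names are illustrative only.
certificate = {
    "holder": "example.com",
    "issuer": "Example Third Party",
    "public_key": "server-public-key-bytes",
    "number_method": "md5",
}


def certificate_number(cert: dict) -> str:
    """Compute the certificate number with the method the certificate names."""
    # Serialize the fields in a fixed order so that everyone hashes identical bytes.
    content = "|".join(f"{key}={cert[key]}" for key in sorted(cert))
    return hashlib.new(cert["number_method"], content.encode("utf-8")).hexdigest()


original_number = certificate_number(certificate)

# Swapping the public key changes the content, so the computed number changes too.
tampered = dict(certificate, public_key="middleman-public-key-bytes")
tampered_number = certificate_number(tampered)
```

Anyone holding the certificate can recompute the number without contacting a remote service, which is what makes a purely local check possible.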
After the client receives the certificate, it generates a certificate number using the method stated on the certificate. If the generated number equals the certificate number carried on the certificate, the certificate is authentic. In addition, to prevent the certificate number itself from being replaced, it is encrypted with the third party's private key.

This is a bit abstract, so let's look at a picture:

The figure shows how the certificate is produced. The "number generation method: MD5" field in the certificate tells the client: compute MD5 over the content of the certificate to obtain a certificate number. (MD5 serves only as an illustration here; it is no longer considered secure, and real certificates use stronger hashes such as SHA-256.)

After the client gets the certificate, it verifies the content. It decrypts the certificate number on the certificate with the third party's public key; if that number equals the one the client computed itself, verification succeeds:

But how did the third-party organization's public key end up on the client's machine? There are so many machines in the world.

In reality, browsers and operating systems maintain a list of authoritative third-party organizations (including their public keys). Because the certificate names its issuing authority, the client looks up the corresponding public key locally by that name.

Off topic: if the browser's and the operating system's line of defense is breached, there is nothing you can do. Thinking back to the unofficial XP system I installed back then, I am scared.

By now you have probably recognized what all of this is: the certificate is the digital certificate in HTTPS, the certificate number is the digital signature, and the third-party organization is the certificate authority (CA).

How does CA issue digital certificates to servers?
When I first heard this question, I mistakenly thought that our server had to send a network request to the CA's server to fetch the certificate. Was that a comprehension problem on my part, or...

In fact, the question should be how the CA issues the certificate to our website administrator, and how the administrator installs it on our server.

How do we apply to a CA? The process is similar at every CA. I found one online:

After getting the certificate, we can configure it on our server. How to configure it is a detail we can leave to Google.

Maybe we need to sort out our thoughts

We have tried to reconstruct the design process of HTTPS by reasoning it through step by step. This lets us understand why HTTPS involves so many more interactions than HTTP, why HTTPS performs worse than HTTP, and where its performance can be optimized.

All of the work above exists so that the client and the server can securely negotiate a symmetric key. That is the main job of the SSL/TLS protocol in HTTPS. After that, both parties simply use this symmetric key to encrypt and decrypt their communication.

The following is a real interaction diagram of the HTTPS protocol (copied from the Internet; I forgot where I got it from, so please let me know if there is any infringement):

Can you summarize HTTPS in one sentence?

The answer is no, because HTTPS itself is too complicated. But I will still try to summarize HTTPS in one sentence:

HTTPS uses symmetric encryption to secure the communication between the client and the server. However, negotiating the symmetric encryption requires asymmetric encryption to keep the negotiation secure. However, using asymmetric encryption directly is not safe either, because a middleman may tamper with the public key.
Therefore, the client and the server do not use the public key directly, but rely on a certificate issued by a certificate authority to secure the asymmetric encryption itself. Through these mechanisms a symmetric key is negotiated, and both parties use it for encryption and decryption. This solves the communication security problem between the client and the server.

It's a very long sentence.

Postscript

The above is a line of reasoning I constructed for myself in order to understand HTTPS. At most, it counts as a popular science article about HTTPS. If there are any errors, please point them out. Thank you very much.

So why do I think HTTPS is easier to understand this way? My personal answer: once you have cooked for your family yourself, you understand how hard it is for your mother to cook every day.