I had read some material on how HTTPS works, but I was still vague about the overall process and some of the details. I also had several questions: "How is the certificate verified?", "What does the TLS handshake look like?", "How is the symmetric key calculated?", "How many random numbers are used to calculate the pre-master key?", and so on. It took me some time to work through these questions, and based on what I learned I have written a series of articles on HTTPS, hoping to help readers who have the same questions.
This article is the first in the series. Guided by these questions, we will work out what concepts such as symmetric encryption, asymmetric encryption, digital certificates, and key agreement are and what they do, unveiling them layer by layer.

Potential Problems with Using HTTP

In HTTP, data is transmitted over the network in plain text, so it can easily be eavesdropped on and tampered with by a man in the middle. Forged data can be sent to the server, and after receiving it the server has no way to tell whether the data really came from its claimed source. If you ask why we should use HTTPS, the short answer is that HTTP is not secure: it cannot guarantee the confidentiality, authenticity, and integrity of data.

What is HTTPS

HTTPS is not a new protocol. It is HTTP running on top of the SSL/TLS security protocol, that is, HTTPS = HTTP + SSL/TLS. It protects the integrity and confidentiality of data exchanged between the user's computer and the website's server. As the OSI model diagram shows, an extra SSL/TLS layer sits between the application layer and the transport layer. SSL/TLS is the core of HTTPS and the focus of this study: as a secure encryption protocol, it provides a secure communication channel on top of an insecure infrastructure.

The name SSL/TLS is sometimes confusing. Nowadays, when people say SSL/TLS, they generally mean the TLS protocol. Let's look at its history.

SSL/TLS Development History

SSL stands for Secure Sockets Layer. It was first developed by Netscape, and the first version of the protocol was never released. Work on the second version, SSL 2, began in November 1994, essentially without consulting security experts outside Netscape. This version was found to have serious flaws and ultimately failed.
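The HTTPS = HTTP + SSL/TLS layering described above can be seen in a short sketch using Python's standard socket and ssl modules: the HTTP request bytes are unchanged, and TLS is simply wrapped around the TCP connection. The helper names are hypothetical, and the TLS 1.2 minimum is an assumed policy that rejects the SSL and early TLS versions discussed in this history.

```python
import socket
import ssl


def make_tls_context() -> ssl.SSLContext:
    # Default context: verifies the server certificate chain and host name
    ctx = ssl.create_default_context()
    # Assumed policy: refuse SSL 3.0, TLS 1.0 and TLS 1.1
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def build_http_request(host: str, path: str = "/") -> bytes:
    # Ordinary HTTP/1.1 text; HTTPS does not change the HTTP message itself
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def https_get(host: str, path: str = "/", port: int = 443) -> bytes:
    ctx = make_tls_context()
    # Transport layer: a plain TCP connection
    with socket.create_connection((host, port), timeout=10) as tcp:
        # SSL/TLS layer: wrap_socket performs the TLS handshake over TCP
        with ctx.wrap_socket(tcp, server_hostname=host) as tls:
            # Application layer: the same HTTP bytes, now encrypted in transit
            tls.sendall(build_http_request(host, path))
            chunks = []
            while chunk := tls.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling `https_get("www.nodejs.red")` needs network access; the default context performs the certificate-chain and host-name checks discussed later in this article.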
After the failure of SSL 2, Netscape focused on SSL 3 and completely redesigned the protocol, releasing it in 1995. SSL 3 later became the basis of TLS 1.0, which many people may not know. In May 1996, the TLS working group was formed and began moving SSL from Netscape to the IETF. Because of the struggle between Netscape and Microsoft for dominance of the Web, the migration took a long time. In January 1999, the IETF standardized the protocol and released TLS 1.0, whose predecessor was SSL 3. TLS stands for Transport Layer Security.

In April 2006, TLS 1.1 was released. It fixed several key security issues and added protection against CBC attacks (the implicit IV was replaced with an explicit IV, and the handling of padding errors in block cipher mode was changed).

In August 2008, TLS 1.2 was released. Its main additions were the SHA-2 family of hash functions, AEAD encryption algorithms, the definition of TLS extensions, and AES cipher suites.

In August 2018, TLS 1.3 was released, with many changes to strengthen security and improve performance. On the security side, insecure or outdated algorithms such as MD5 and SHA-1 were removed, and only a small set of algorithms such as ECDHE and SHA-2 were kept. On the performance side, the full handshake was reduced from 2-RTT to 1-RTT, and 0-RTT was supported for the first time.

Choosing the Right Encryption Algorithm

When we talk about HTTPS, we all know it is secure because data is encrypted in transit. First, let's look at which kind of encryption it uses to achieve this.

Symmetric Encryption

Symmetric encryption uses a shared key: the client and the server encrypt and decrypt the transmitted data with the same key. As long as the key is held only by the two communicating parties and never leaks, the data stays secure. In the real world, this clearly cannot be taken for granted.
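The shared-key model can be sketched with the standard library alone. Because Python's standard library has no AES, this sketch assumes a toy stream cipher built from HMAC-SHA256 in counter mode; that is a stand-in of mine, not what TLS uses (real TLS uses ciphers such as AES-GCM or ChaCha20-Poly1305). The helper names `keystream` and `xor_cipher` are hypothetical. The point is only that one key both encrypts and decrypts, so whoever obtains the key can read everything.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream: HMAC-SHA256(key, nonce || counter)
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Encrypting and decrypting are the same XOR with the same shared key
    return bytes(d ^ k for d, k in zip(data, keystream(key, nonce, len(data))))


shared_key = secrets.token_bytes(32)  # both sides must already hold this key
nonce = secrets.token_bytes(12)       # must never repeat for the same key

ciphertext = xor_cipher(shared_key, nonce, b"GET /account HTTP/1.1")
plaintext = xor_cipher(shared_key, nonce, ciphertext)
```

The sketch says nothing about how `shared_key` reached both parties, which is exactly the problem raised next.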
For example, when a browser talks to a server, suppose the server sends the shared key to the browser. How can we make sure the key is not intercepted or tampered with in transit?

Asymmetric Encryption

To solve this, "asymmetric encryption" was introduced. Also known as "public key encryption", it uses a pair of different keys. Data encrypted with the public key can be decrypted only with the matching private key; conversely, data encrypted with the private key can be decrypted only with the matching public key. The private key is kept secret by its owner, while the public key is published.

Asymmetric encryption removes the need to share a secret key in advance, but it is far more computationally expensive than symmetric encryption, so it is unsuitable for large amounts of data or for connections where throughput matters. For this reason TLS does not rely on it alone.

Hybrid Encryption

As the saying goes, strengths should make up for weaknesses: TLS combines asymmetric and symmetric encryption, which we call "hybrid encryption". Asymmetric encryption is used once, for identity authentication and for negotiating the shared key; the symmetric key then encrypts the data for the rest of the communication.

However, the exchange of public keys between the client and the server can still be intercepted. The classic example is the man-in-the-middle attack. Because the public key travels in the open, a middleman can pose as the server to the client and as the client to the server, and can still tamper with the data, while the server cannot tell whether the source is trustworthy. The problem remains. Here is an example:
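A minimal sketch of the hybrid idea, assuming textbook RSA without padding and key pairs built from well-known Mersenne primes, so it is deliberately insecure and only shows the mechanics (real TLS uses padded RSA or, today, ECDHE with properly generated keys). The function names are hypothetical. The last lines illustrate the man-in-the-middle problem: the math works just as well for whoever supplied the public key.

```python
import secrets

E = 65537


def make_keypair(p: int, q: int, e: int = E):
    # Textbook RSA key pair; d is the inverse of e modulo (p-1)(q-1)
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (n, e), (n, d)  # (public key, private key)


def rsa_encrypt(public_key, m: int) -> int:
    n, e = public_key
    assert 0 <= m < n, "message must be smaller than the modulus"
    return pow(m, e, n)


def rsa_decrypt(private_key, c: int) -> int:
    n, d = private_key
    return pow(c, d, n)


# Toy keys from known Mersenne primes: public knowledge, so NOT secure
server_public, server_private = make_keypair(2**521 - 1, 2**607 - 1)

# Client: create a fresh symmetric key and wrap it with the server's public key
session_key = secrets.token_bytes(32)
wrapped = rsa_encrypt(server_public, int.from_bytes(session_key, "big"))

# Server: unwrap it with the private key; both sides now share session_key
unwrapped = rsa_decrypt(server_private, wrapped).to_bytes(32, "big")

# Man in the middle: if the client is handed the attacker's public key instead,
# the attacker can unwrap the session key just as easily
attacker_public, attacker_private = make_keypair(2**127 - 1, 2**1279 - 1)
intercepted = rsa_encrypt(attacker_public, int.from_bytes(session_key, "big"))
stolen = rsa_decrypt(attacker_private, intercepted).to_bytes(32, "big")
```

The expensive RSA operation runs once per connection; everything afterwards would use the cheap symmetric `session_key`.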
So what can we do? "Hybrid encryption" strikes a balance between security and performance: using asymmetric encryption to exchange the symmetric key gives us the confidentiality we need. The next question is: how can the browser be sure that the public key it receives is genuine?

Digital Certificates Solve the Trust Problem

Take a real-world example: when you go to a bank counter and say you are Zhang San, the clerk first asks for your ID to prove that you really are Zhang San. What proves your identity is your ID card, a document issued by an authority everyone recognizes (in the real world, the Public Security Bureau).

The "Public Security Bureau" of the Internet

The Internet's equivalent of the Public Security Bureau is the CA (Certificate Authority), from which we apply for a digital certificate for our website. A certificate is a file containing the version, serial number, signature algorithm, issuer, validity period, public key, and other fields. Before enabling HTTPS, a website applies to a CA for a digital certificate and installs it on its server. When the browser sends a request, the server returns this certificate to the browser.

How do we ensure that the certificate is not modified along the way? The Public Security Bureau uses anti-counterfeiting technology when issuing ID cards. Similarly, a CA digitally signs each certificate it issues to guarantee the certificate's integrity.

Digest Algorithm

A digest algorithm, also called a "hash algorithm", is a one-way function. It needs no key, and its output cannot be reversed to recover the input. It maps data of any size, such as a large file, to a short fixed-length value, much like extracting a summary from an article.
However, if the original text changes, even by a single added or deleted punctuation mark, the resulting digest is completely different. Some commonly used digest algorithms (MD5, SHA-1) are now considered insecure and have been removed in TLS 1.3; SHA-2 algorithms such as SHA-256 are recommended instead. The CA runs the digest algorithm over the certificate's plaintext content to produce an irreversible hash value. This hash value cannot simply be sent in the clear, because a middleman could modify the certificate and then recompute the digest to match.

Digital Signature

The term digital signature also comes from the real world. If I give you a certificate, the most effective way to prove it came from me is to sign it or press my fingerprint on it, which cannot be forged. A CA also has a public/private key pair. It applies its private key to the hash value produced by the digest algorithm above to create a digital signature, which can be verified only with the corresponding CA public key.

Digital Certificates

The CA combines the digital signature with the information we submitted (server name, public key, host name, name and details of the issuing authority, and so on) into a digital certificate and issues it to the server. Below is a screenshot of the certificate for the domain www.nodejs.red. With a digital certificate, the client and server can use asymmetric keys to negotiate the symmetric key used to encrypt data during the session.

Negotiating the Symmetric Encryption Key

Certificate Verification

When we open an HTTPS website in the browser, the TLS handshake starts after the TCP connection is established, and the server returns a series of messages, including the certificate message. Certificate verification involves a chain of trust: the certificates we obtain from a CA are usually issued by an intermediate certificate authority.
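A minimal sketch of "hash, then sign with the CA private key", assuming the same kind of toy textbook RSA key (Mersenne primes, no padding) as a stand-in; real CAs sign a DER-encoded certificate with padded RSA (PKCS#1 v1.5 or PSS) or ECDSA. The certificate body string and the function names are hypothetical.

```python
import hashlib

# Toy CA key pair (textbook RSA, Mersenne primes, no padding): NOT secure
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # CA private exponent, kept secret


def digest(data: bytes) -> int:
    # SHA-256 digest as an integer; 256 bits, well below the modulus N
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def ca_sign(cert_body: bytes) -> int:
    # Signing: apply the CA private key to the digest of the certificate body
    return pow(digest(cert_body), D, N)


def verify(cert_body: bytes, signature: int) -> bool:
    # Recover the digest with the CA public key and compare with a fresh digest
    return pow(signature, E, N) == digest(cert_body)


body = b"subject=www.nodejs.red;issuer=R3"
signature = ca_sign(body)
tampered = b"subject=www.evil.example;issuer=R3"

# A one-character change gives a completely different digest
print(hashlib.sha256(b"Hello, world").hexdigest())
print(hashlib.sha256(b"Hello, world!").hexdigest())
```

`verify(body, signature)` succeeds, while `verify(tampered, signature)` fails: changing the body changes its digest, and producing a matching signature would require `D`.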
For example, for the domain www.nodejs.red, you will see that its certificate issuer is "R3". It is a free certificate launched by Let's Encrypt on November 20, 2020. Looking up R3, we find that its issuer is "ISRG Root X1". "ISRG Root X1" has no issuer above it and is a root certificate. The figure below shows the certificate chain for www.nodejs.red.

Certificates of trusted authorities are pre-installed in our operating system, and the browser trusts these root certificates. If the root certificate is found locally, the public key of the root certificate "ISRG Root X1" is used to verify that the intermediate certificate "R3" is trustworthy. If that check passes, the public key of "R3" is used to verify that the end-entity certificate "www.nodejs.red" is trustworthy. If this also passes, the certificate "www.nodejs.red" is trusted. Certificate verification always follows this pattern: a locally installed root certificate must ultimately be found, and verification then proceeds step by step down the chain to confirm that the website's certificate issuer is trustworthy, as shown in the figure below.

Once the certificate returned by the server has been validated against the chain, the browser takes the certificate's plaintext and signature and performs the following checks: it applies the issuer's public key to the signature to recover the digest, recomputes the digest of the certificate's plaintext with the same hash algorithm, and compares the two values. If they match, the certificate has not been altered.
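A simplified sketch of the chain walk, assuming a hypothetical `Cert` type that holds only subject, issuer, public key, and signature, toy textbook RSA keys, and exact-match host names. Real X.509 validation also checks validity periods, Subject Alternative Names, key usage, revocation, and more. Each certificate is checked against its parent's public key, and the top of the chain must be signed by a root in the local trust store. The loop runs bottom-up, but each individual check is the same one described above.

```python
import hashlib
from dataclasses import dataclass

E = 65537


def make_keypair(p: int, q: int):
    # Toy textbook RSA from known Mersenne primes: NOT secure
    n = p * q
    return (n, E), (n, pow(E, -1, (p - 1) * (q - 1)))


def h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


@dataclass
class Cert:
    subject: str
    issuer: str
    public_key: tuple
    signature: int = 0

    def tbs(self) -> bytes:
        # "To-be-signed" part: every field except the signature
        n, e = self.public_key
        return f"{self.subject}|{self.issuer}|{n}|{e}".encode()


def issue(subject: str, issuer: str, subject_pub: tuple, issuer_priv: tuple) -> Cert:
    cert = Cert(subject, issuer, subject_pub)
    n, d = issuer_priv
    cert.signature = pow(h(cert.tbs()), d, n)
    return cert


def signed_by(cert: Cert, issuer_pub: tuple) -> bool:
    n, e = issuer_pub
    return pow(cert.signature, e, n) == h(cert.tbs())


def verify_chain(chain: list, trust_store: dict, hostname: str) -> bool:
    # chain[0] is the server's certificate; each following entry issued the previous one
    if chain[0].subject != hostname:
        return False  # a valid certificate for another domain is rejected
    for cert, parent in zip(chain, chain[1:]):
        if cert.issuer != parent.subject or not signed_by(cert, parent.public_key):
            return False
    # The top of the chain must be signed by a locally installed root
    anchor = trust_store.get(chain[-1].issuer)
    return anchor is not None and signed_by(chain[-1], anchor.public_key)


root_pub, root_priv = make_keypair(2**521 - 1, 2**607 - 1)
r3_pub, r3_priv = make_keypair(2**127 - 1, 2**1279 - 1)
site_pub, _ = make_keypair(2**107 - 1, 2**521 - 1)

root = issue("ISRG Root X1", "ISRG Root X1", root_pub, root_priv)  # self-signed
r3 = issue("R3", "ISRG Root X1", r3_pub, root_priv)
site = issue("www.nodejs.red", "R3", site_pub, r3_priv)

trust_store = {root.subject: root}  # pre-installed root certificates
```

In this sketch the server would send `[site, r3]`; the root comes from the local trust store, which is why a forged certificate signed by any other key fails.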
If the certificate information is tampered with, the attacker cannot produce a matching signature without the CA's private key. When the client compares the digest recovered from the signature with the digest it computes from the certificate contents, the mismatch reveals the tampering.

Another hypothetical question: "What if a hacker replaces our certificate with a legitimate certificate of their own?" The domain name and other information in a certificate cannot be tampered with. Even if a hacker substitutes their own legitimate certificate, its domain name differs from the one the browser requested, so the browser detects the problem by comparing them.

There is no absolute security. If a hacker installs their own root certificate on your computer, they can issue fake certificates for any domain. So do not casually install untrusted files, to keep your root certificates secure.

Calculating the Encryption Key

In the exchange above, where the browser sends a request and the server returns its certificate, the two sides also exchange two parameters: the client random number and the server random number, which are used to generate the master key. However, generating the master key also requires a pre-master key, and different key exchange algorithms produce the pre-master key in different ways.

One key exchange algorithm is RSA, whose process is very simple. The client generates a 48-byte pre-master key: the 2-byte client protocol version followed by 46 random bytes. It encrypts this value with the server's public key and sends it to the server in the key exchange message (ClientKeyExchange). The server then decrypts the pre-master key with its private key. RSA key exchange is considered to have a serious weakness: anyone who gains access to the server's private key (for example, through political pressure, bribery, or a break-in) can recover the pre-master key and rebuild the same master key.
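A minimal sketch of the pre-master key layout used by RSA key exchange in TLS 1.2 (RFC 5246, section 7.4.7.1). It assumes toy textbook RSA; real TLS pads the value with PKCS#1 v1.5 before encrypting it, and the variable names here are hypothetical.

```python
import secrets

# Toy server RSA key pair (textbook RSA, Mersenne primes, no padding): NOT secure
P, Q, E = 2**521 - 1, 2**607 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

CLIENT_VERSION = bytes([3, 3])  # TLS 1.2 is encoded as version 3.3


def make_pre_master_secret() -> bytes:
    # 2-byte client_version followed by 46 random bytes: 48 bytes in total
    return CLIENT_VERSION + secrets.token_bytes(46)


# Client: encrypt with the server's public key and send it in ClientKeyExchange
pre_master = make_pre_master_secret()
encrypted = pow(int.from_bytes(pre_master, "big"), E, N)

# Server: decrypt with its private key
recovered = pow(encrypted, D, N).to_bytes(48, "big")

# Anyone who recorded `encrypted` and later obtains D can do exactly the same,
# which is why RSA key exchange lacks forward secrecy
```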
Once the private key leaks, all previously recorded traffic can be decrypted. This key exchange algorithm is therefore being replaced by algorithms that support forward secrecy. For example, ECDHE generates a fresh key pair, and therefore independent keys, for each connection, so a compromise affects only the current session and cannot be used to retroactively decrypt any other traffic.

ECDHE is the Elliptic Curve Diffie-Hellman Ephemeral key exchange algorithm. The client and server exchange two pieces of information, Server Params and Client Params, and a new temporary public/private key pair is generated for every connection. Using ECDHE, the client and server can each compute the premaster secret independently. At this point both sides hold three values: Client Random, Server Random, and the Premaster Secret. In TLS 1.2 the master secret is computed with a pseudo-random function (PRF) and truncated to 48 bytes:

master_secret = PRF(pre_master_secret, "master secret", ClientHello.random + ServerHello.random)[0..47]

The master secret is not yet the final session key. The session keys are generated by feeding the master secret, the server random, and the client random into the PRF again (note that the server random comes first here):

key_block = PRF(master_secret, "key expansion", ServerHello.random + ClientHello.random)
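A minimal sketch of ECDHE followed by the TLS 1.2 master-secret derivation. It assumes X25519 as the curve (one of the curves TLS can negotiate; the text does not say which curve is used), implemented by hand from the RFC 7748 Montgomery ladder because Python's standard library has no elliptic-curve API. This implementation is not constant-time and is for illustration only. `prf` and `p_sha256` follow the TLS 1.2 PRF definition in RFC 5246 for SHA-256 cipher suites; the other names are hypothetical.

```python
import hashlib
import hmac
import secrets

P = 2**255 - 19
A24 = 121665
BASE_POINT = (9).to_bytes(32, "little")


def x25519(scalar: bytes, u_bytes: bytes) -> bytes:
    # Montgomery ladder from RFC 7748 (not constant-time in Python)
    k = bytearray(scalar)
    k[0] &= 248
    k[31] &= 127
    k[31] |= 64
    k_int = int.from_bytes(k, "little")
    x1 = int.from_bytes(u_bytes, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (k_int >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    # P_SHA256 from RFC 5246: A(0) = seed, A(i) = HMAC(secret, A(i-1))
    out, a = b"", seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    # TLS 1.2 PRF for SHA-256 based cipher suites
    return p_sha256(secret, label + seed, length)


# Each side generates a fresh (ephemeral) key pair for this connection only
client_private = secrets.token_bytes(32)
server_private = secrets.token_bytes(32)
client_params = x25519(client_private, BASE_POINT)  # client's public key
server_params = x25519(server_private, BASE_POINT)  # server's public key

# Each side combines its own private key with the peer's public key
client_pre_master = x25519(client_private, server_params)
server_pre_master = x25519(server_private, client_params)

client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)
master_secret = prf(client_pre_master, b"master secret", client_random + server_random, 48)
```

Only the public values travel over the network; because the private keys are discarded after the connection, a later leak of the server's long-term key does not reveal this pre-master secret.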
The resulting session keys include, for each direction (client write and server write): the symmetric encryption key, the message authentication code key (MAC key), and the initialization vector (IV, generated only when the cipher needs it). All of this happens in the TLS handshake protocol. After the handshake completes, the client and server have a secure communication tunnel and can send application data.

The Complete HTTPS Process

Negotiating the symmetric encryption key is handled by the TLS handshake protocol. The process is quite complicated, and much of it is not covered in detail in this article. The figure below is a handshake interaction diagram drawn by the author. In the next article, we will use Wireshark to capture and analyze network packets, giving a hands-on explanation for a deeper understanding of how HTTPS works.
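A sketch of the final key expansion step in TLS 1.2 (RFC 5246, section 6.3), assuming SHA-256 based cipher suites. The `SUITES` table and `derive_keys` are hypothetical helpers; the lengths shown are those of AES-128-CBC with HMAC-SHA256 and of AES-128-GCM. It shows why the IV is "generated only when necessary": the CBC suite uses explicit per-record IVs and gets none from the key block, while the GCM suite gets a 4-byte implicit IV part and no MAC key.

```python
import hashlib
import hmac
import secrets


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    # TLS 1.2 PRF with SHA-256: P_SHA256(secret, label + seed)
    seed = label + seed
    out, a = b"", seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


# (mac_key_length, enc_key_length, fixed_iv_length) for two example suites
SUITES = {
    "AES_128_CBC_SHA256": (32, 16, 0),  # HMAC-SHA256 MAC, explicit per-record IVs
    "AES_128_GCM_SHA256": (0, 16, 4),   # AEAD: no MAC key, 4-byte implicit IV part
}


def derive_keys(master_secret: bytes, client_random: bytes, server_random: bytes, suite: str) -> dict:
    mac_len, key_len, iv_len = SUITES[suite]
    total = 2 * (mac_len + key_len + iv_len)
    # Note the order: server_random comes first for key expansion
    block = prf(master_secret, b"key expansion", server_random + client_random, total)
    layout = [
        ("client_write_MAC_key", mac_len), ("server_write_MAC_key", mac_len),
        ("client_write_key", key_len), ("server_write_key", key_len),
        ("client_write_IV", iv_len), ("server_write_IV", iv_len),
    ]
    keys, offset = {}, 0
    for name, size in layout:
        keys[name] = block[offset:offset + size]
        offset += size
    return keys


master = secrets.token_bytes(48)  # stand-in for a real master secret
cr, sr = secrets.token_bytes(32), secrets.token_bytes(32)
cbc_keys = derive_keys(master, cr, sr, "AES_128_CBC_SHA256")
gcm_keys = derive_keys(master, cr, sr, "AES_128_GCM_SHA256")
```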