For a long time, our laboratory, and indeed the whole company, has emphasized "security and privacy". Over the past six months, while using GoLang to develop network communication projects for intelligent edge computing devices, I was often asked to pay attention to "communication security and privacy". During this period I worked with multiple partners: some required "preventing domain name hijacking on the public network", some required that "client reports must carry 'certificates', which is safer", and some required that, in addition to HTTPS, the business logic perform a second round of hashing, digesting, encryption and so on. Since my understanding of HTTPS stopped at "HTTPS is more secure than HTTP", I kept running into communication-related problems during development and was often at a loss. So I sat down, studied HTTPS seriously, and recorded what I learned in this article.

1. Three Questions about HTTPS

When it comes to HTTPS, everyone in IT, and even many people outside it, knows that "HTTPS is more secure than HTTP." As a result, any project that involves network transmission comes with a requirement to "use HTTPS."

1. What is HTTPS?

Wikipedia explains HTTPS as follows:
The key point is: HTTPS = HTTP over SSL/TLS, that is, HTTPS has an extra layer of SSL/TLS between the transport layer TCP and the application layer HTTP. It can be seen that TLS/SSL is the core of HTTPS! So, what is this TLS/SSL? The article How to use SSL/TLS to Secure Your Communications: The Basics points out:
Borrowing the picture from that article, you can see intuitively that SSL and TLS are both encryption protocols. SSL (Secure Sockets Layer) was first proposed by Netscape in 1994 as version 1.0. TLS (Transport Layer Security) was published in 1999 as an improvement on SSL 3.0. The official recommendation is to abandon SSL and adopt TLS, but for historical reasons SSL still lingers and many people are used to the term, so the two are now simply written together as SSL/TLS.

2. Why HTTPS?

Many readers will answer without thinking: "Of course, HTTP is not secure and HTTPS is, so choose HTTPS!" So what exactly does HTTPS do better than HTTP? Wikipedia explains HTTP as follows:
The HTTP protocol is an application layer protocol for transmitting web hypertext (text, images, multimedia resources); it defines how clients request resources from servers and how servers respond. HTTP grew out of the 1989 World Wide Web proposal, with version 0.9 documented in 1991, and HTTP 1.1, whose widely cited specification (RFC 2616) dates from 1999, is still in wide use today (2020). However, HTTP 1.1 has a big problem: plaintext transmission (Plaintext/Clear Text). This problem is fatal in today's Internet age. Once the data is intercepted by a third party on the public network, the communication content can easily be read. HTTPS was created in response, and its three recognized advantages are:
To elaborate a little: point 1 directly solves HTTP's plaintext problem; points 2 and 3 are not specific to HTTP, since server impersonation and data tampering are common problems for any application layer protocol. As an aside: HTTP 2, launched in 2015, is a major improvement over HTTP 1.1, and in practice mainstream browsers only support it over TLS, that is, over HTTPS.

3. How to do HTTPS?

This article explains the three advantages of HTTPS in turn, namely:
Finally, for the completeness of the entire article, the following contents will be added:
2. Data encryption: symmetric encryption and asymmetric encryption of HTTPS

Many readers will say: "What is there to discuss about symmetric and asymmetric encryption? The former uses a single key for both encryption and decryption; the latter uses a key pair, a public key and a private key, where data encrypted with one can only be decrypted with the other. The public key is given to the other party and the private key is kept to oneself. HTTPS uses both. Okay, this chapter can end." Three questions:
Okay, let’s take these questions one at a time. Question 1: Why does HTTPS have both symmetric and asymmetric encryption? It is assumed that you already know about symmetric encryption and asymmetric encryption (just understand the basic principles). For those who are not clear about it, I recommend reading the Zhihu article "Symmetric Encryption and Asymmetric Encryption". The article finally points out the advantages and disadvantages of these two encryption methods. The original text is as follows:
So is there a solution? Yes, the article then says:
Indeed, that’s what HTTPS did (in the beginning)! The idea is roughly as follows:
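As a rough illustration of this hybrid idea, the Go sketch below uses the server's RSA public key only to transport a random symmetric key SK, and then protects the actual message with that key. The helper names (`wrapKey`, `unwrapKey`, `seal`, `open`, `hybridRoundTrip`) are hypothetical, and RSA-OAEP plus AES-GCM are assumptions chosen for a compact example; real TLS uses its own handshake messages and key derivation.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"errors"
	"fmt"
)

// wrapKey encrypts the symmetric key sk with the receiver's RSA public key.
func wrapKey(pub *rsa.PublicKey, sk []byte) ([]byte, error) {
	return rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, sk, nil)
}

// unwrapKey recovers sk with the receiver's RSA private key.
func unwrapKey(priv *rsa.PrivateKey, wrapped []byte) ([]byte, error) {
	return rsa.DecryptOAEP(sha256.New(), rand.Reader, priv, wrapped, nil)
}

// newGCM builds an AES-GCM cipher from a 16, 24 or 32 byte key.
func newGCM(sk []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(sk)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts msg under sk; the random nonce is prepended to the result.
func seal(sk, msg []byte) ([]byte, error) {
	gcm, err := newGCM(sk)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, msg, nil), nil
}

// open reverses seal and fails if the data was modified.
func open(sk, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(sk)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// hybridRoundTrip plays both sides: the client wraps SK for the server,
// then the message travels under SK and the server decrypts it.
func hybridRoundTrip(msg []byte) ([]byte, error) {
	serverKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return nil, err
	}
	sk := make([]byte, 32) // client-side random symmetric key
	if _, err := rand.Read(sk); err != nil {
		return nil, err
	}
	wrapped, err := wrapKey(&serverKey.PublicKey, sk) // slow step, done once
	if err != nil {
		return nil, err
	}
	serverSK, err := unwrapKey(serverKey, wrapped)
	if err != nil {
		return nil, err
	}
	sealed, err := seal(sk, msg) // fast step, done for every message
	if err != nil {
		return nil, err
	}
	return open(serverSK, sealed)
}

func main() {
	out, err := hybridRoundTrip([]byte("hello over a hybrid channel"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The expensive asymmetric operation runs once per connection, while every message afterwards pays only the cost of symmetric encryption.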
This design of HTTPS takes into account both security and efficiency. I admire the wisdom of the pioneers! Question 2: How is the HTTPS symmetric encryption key SK generated and transmitted? Through the first question, we know that HTTPS is divided into 2 processes:
Process 2, the data communication stage: the sender symmetrically encrypts the communication content with the key SK and transmits it over the network; the receiver decrypts the data with SK to recover the content. The author has one question here that had not been verified at this point: during data transmission, is the data also hashed to ensure that it has not been tampered with? Let's note it down for now.

Process 1, the TLS handshake phase: negotiate the key SK. There are many articles on this topic online, but some are outdated and some are incomplete. For the negotiation of SK, I recommend "Learning about HTTPS and SSL/TLS Protocols [3]: Key Exchange (Key Negotiation) Algorithms and Principles" and "Key Negotiation Mechanism". This article goes straight to the conclusion: HTTPS has several ways to negotiate the symmetric encryption key SK. Here are three common methods:
(1) Method 1: Based on an asymmetric encryption algorithm

Wikipedia's explanation of asymmetric encryption algorithms: Public-key cryptography, also called asymmetric cryptography, is a class of cryptographic algorithms that requires two keys, a public key and a private key; one is used for encryption and the other for decryption. Ciphertext produced with one of the keys can only be decrypted with the other key of the pair; even the key that was used for encryption cannot decrypt it. Because encryption and decryption use two different keys, it is called asymmetric encryption, in contrast to symmetric encryption, which uses the same key for both. Although the two keys are mathematically related, the private key cannot feasibly be computed from the public key; the public key can therefore be released freely, while the private key must be kept strictly confidential by its owner and never disclosed to anyone, not even to a trusted communication partner.

RSA key negotiation, based on asymmetric encryption, is the earliest method used by HTTPS (strictly speaking, by the SSL/TLS protocol). The process is as follows:
(2) Method 2: Based on a dedicated key exchange algorithm

Method 1 is the one most people know, but it has a problem: if the server's private key PrK is ever leaked, anyone who captured the handshake can recover SK, and the encryption performed by HTTPS is no longer secure. Hence the dedicated key exchange algorithms (some call this a keyless method). I have not studied the principle in depth (my mathematical knowledge is limited, and pages of formulas and proofs still give me a headache...). The general process of the DH and ECDH key negotiation algorithms is as follows:

In some materials the values A and B in the ECDH algorithm are called the PreMaster-Secret, and the finally negotiated key SK is called the Master Secret or Session Key. (Strictly speaking, in TLS the pre-master secret is the shared value both sides compute from the exchange, and the session keys are derived from the master secret.) The conclusion: ECDH is faster than DH (some say 10 times faster) and, at comparable key sizes, harder to break, which makes it the more practical choice.

(3) Method 3: Based on a shared secret

This approach presets symmetric encryption keys on the client and server, so that only a key ID needs to be passed during the handshake. The representative algorithm is PSK.

Question 3: How many sets of asymmetric encryption does HTTPS have? What are they for? Can they be omitted?

Direct answer: two sets. The first set is used to negotiate the symmetric encryption key SK (the subject of Question 2); the second set is used to sign digital certificates (CA certificates are discussed in detail in the next chapter). The difference lies in who holds the keys: the first key pair is generated by the server, which keeps the private key (with two-way authentication the client has its own key pair as well); the second is generated by the CA organization, which keeps the private key. Neither set can be omitted. (This statement is not rigorous, but in practice omitting either is indeed not recommended.)
3. Authentication: Certificates for HTTPS

The author believes that for most programmers, 80% to 90% of the HTTPS-related problems encountered at work are related to certificates. Understanding certificates is therefore very important!

1. What is a certificate?

Before answering, let's look at a few key terms: CA, CA organization, digital certificate, digital signature, (certificate) fingerprint, (CA) certificate, HTTPS certificate, SSL/TLS certificate. Let's sort out the relationships between them:
Think Tank Encyclopedia explains digital certificates as follows:
Key point: Digital certificates are used for subject identity authentication. First of all, digital certificate = subject information + digital signature. In Windows, we can click the "lock" icon in the address bar on the Chrome browser to display a drop-down box, then click "Certificate" to see the digital certificate when accessing the server via HTTPS. The specific operations are as follows:
After looking at the certificate on the Chrome browser, let's take a look at the digital certificate through Wireshark capture: Comparing the digital certificate seen on Chrome and the digital certificate captured by Wireshark, we can see that the contents of the certificates presented by both are consistent. In summary, a complete digital certificate includes:
The abstraction is as follows:

2. Why a digital certificate?

HTTPS already encrypts the communication data, so why verify identity as well? What happened to "basic trust between people"? The answer is that attackers keep inventing new attacks. In the well-known man-in-the-middle attack (MITM attack), a "middleman" eavesdrops on, or even tampers with, the traffic between the client and the server without either side noticing. The process is shown in the figure below (quoted from "HTTPS Man-in-the-Middle Attack Practice (Principle and Practice)"):

During the HTTPS handshake, one end sends a request and the other end returns its public key. If the first end goes on to negotiate the key without verifying the identity and the public key of the other end, the "man in the middle" can exploit this gap: he intercepts the real public key and substitutes his own. This one step of "taking the wrong public key", or "trusting the wrong end", wastes all the effort HTTPS puts into encryption (asymmetric encryption for key negotiation and symmetric encryption for communication data). In HTTPS, then, "ensuring the identity of the other end is correct" means "ensuring the public key is correct", and the so-called "identity" in network communication generally refers to the domain name, IP address or even MAC address of one end of the communication. That is why a digital certificate contains both the identity information and the public key of one end of the communication.

However, the digital certificate itself is transmitted over the network (from the end whose identity is being verified to the other end), which means the certificate too may be intercepted and tampered with.
At this time, the authoritative CA organization stepped in and came up with a solution: adding an "anti-counterfeiting mark" - a digital signature. The specific steps are as follows:
A note here: the digital signature is generated by first hashing the original text, which maps variable-length text to a fixed-length value, and then encrypting that fixed-length value with the CA's private key. Signing the short hash instead of the full text greatly improves computing efficiency.

3. How do certificates work?

To understand how certificates perform "authentication", that is, "anti-impersonation", we need to look at it from two perspectives:
Please note the prerequisite: the certificate must be issued by an authoritative CA organization and still be valid, or it must be a private certificate that has been explicitly trusted.

(1) Apply for a certificate

This article does not discuss the classification of CA organizations and certificates; it only covers certificates issued by formal, authoritative CA organizations. Whether a certificate is DV, OV or EV only changes how thoroughly the applicant is validated; the working principle is the same. To summarize the application process: the user submits his or her information (such as the domain name) and public key (the asymmetric encryption public key generated by the user, used to negotiate keys with the other end during the TLS handshake phase) to the CA, and the CA generates a digital certificate, as shown in the following figure:

(2) Verify the certificate

After receiving the certificate from the other end, the receiver performs the "reverse process" of the certificate application, as summarized in the following figure:

The receiving end first hashes every part of the certificate except the signature, using the hash algorithm specified in the certificate, and records the result as H1. It then uses the CA organization's public key to decrypt the digital signature and obtain the hash computed by the CA organization, recorded as H2. If H1 and H2 are exactly equal, the certificate has not been tampered with and is valid; otherwise its content has been tampered with and the certificate is invalid. If the certificate is valid, the receiving end then verifies the identity of the other party (that is, the domain name).
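The hash-then-sign and compare steps above can be sketched in Go as follows. This is a simplified illustration rather than real X.509 handling: the "certificate body" is a made-up byte string, `sign`, `verify` and `demo` are hypothetical helper names, and RSA PKCS#1 v1.5 with SHA-256 is only one possible signature scheme. `rsa.VerifyPKCS1v15` performs the "recover H2 with the public key and compare it with H1" step internally.

```go
package main

import (
	"crypto"
	"crypto/rand"
	"crypto/rsa"
	"crypto/sha256"
	"fmt"
)

// sign plays the CA: hash the certificate body, then sign the hash
// with the CA private key.
func sign(caKey *rsa.PrivateKey, body []byte) ([]byte, error) {
	h := sha256.Sum256(body)
	return rsa.SignPKCS1v15(rand.Reader, caKey, crypto.SHA256, h[:])
}

// verify plays the receiver: recompute the hash (H1) and check it against
// the hash protected by the signature (H2) using the CA public key.
func verify(caPub *rsa.PublicKey, body, sig []byte) bool {
	h1 := sha256.Sum256(body)
	return rsa.VerifyPKCS1v15(caPub, crypto.SHA256, h1[:], sig) == nil
}

// demo signs one body and checks both the original and a tampered copy.
func demo() (originalOK, tamperedOK bool, err error) {
	caKey, err := rsa.GenerateKey(rand.Reader, 2048)
	if err != nil {
		return false, false, err
	}
	body := []byte("subject=www.example.com;pubkey=demo")
	sig, err := sign(caKey, body)
	if err != nil {
		return false, false, err
	}
	tampered := []byte("subject=evil.example.com;pubkey=demo")
	return verify(&caKey.PublicKey, body, sig),
		verify(&caKey.PublicKey, tampered, sig), nil
}

func main() {
	ok, bad, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("original accepted:", ok)
	fmt.Println("tampered accepted:", bad)
}
```

Changing even one byte of the body changes H1, so the signature no longer matches and the tampered copy is rejected.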
If the identity check succeeds, the receiving end uses the public key on the certificate (which is also the asymmetric encryption public key generated by the other party itself) to encrypt the key material for the rest of the TLS handshake (in RSA key negotiation, the pre-master secret) and sends it to the other party.

This raises a question: how does the receiver obtain the public key of the CA organization? Answer: it is built in ahead of time. Operating systems and browsers place a set of certificates in specific locations when they are installed. For example, on Windows the root certificates are managed under certmgr:

These certificates have one thing in common: each is a root certificate issued by an authoritative CA organization. A root certificate has several features:
These root certificates are installed on the user's device along with much software, including the operating system and the browser. Even if they are not pre-installed, they can be obtained from the official website of the CA organization. Currently, the world's largest and most authoritative CA organizations include Symantec, GeoTrust, Comodo, and RapidSSL, and the SSL digital certificates issued by these organizations have a very high market share. (Excerpt from "What are the SSL Certificate Authorities")

With so many root certificates built into the local machine, how do I know which one to use to verify a given certificate? Answer: the certificate trust chain.

A trust chain contains three types of certificates: the root certificate, intermediate certificates and the user certificate. The root certificate was explained above. The user certificate is the certificate sent by the other end, that is, the one in which an authoritative CA organization binds the user's identity (mainly the domain name) to the user's public key. An intermediate certificate can be understood as a digital certificate issued to an intermediate authority appointed by the authoritative CA organization; "What is an intermediate certificate?" is recommended reading. Intermediate certificates, and the intermediate authorities behind them, exist to protect the private key of the root certificate. Careful readers will notice a category called "Intermediate Certificate Authority" in certmgr; this is where intermediate certificates are stored.

Most user certificates are issued by the intermediate authorities of an authoritative CA organization. Doesn't that make it harder to find the root certificate that corresponds to the user certificate sent by the other end? To answer my own question: this is simply a search from a leaf node up to the root node of a tree data structure.
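Wouldn't the most basic depth-first search (DFS) be enough? In practice the search is simpler still, because each certificate names its issuer, so the verifier follows the issuer links upward until it reaches a trusted root. For example, as shown in the following figure (quoted from Wikipedia-Chain of trust):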
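In Go, this walk from the user certificate up to a trusted root is handled by `crypto/x509`. The sketch below builds a throwaway three-level chain in memory and verifies it; the names `newTemplate`, `issue` and `verifyChain`, the demo subjects and the one-day validity are all assumptions for illustration, and a real client would receive the certificates from the server rather than generate them.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newTemplate describes a demo certificate. CA templates may sign others;
// leaf templates carry the domain name checked during verification.
func newTemplate(serial int64, name string, isCA bool) *x509.Certificate {
	t := &x509.Certificate{
		SerialNumber:          big.NewInt(serial),
		Subject:               pkix.Name{CommonName: name},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(24 * time.Hour),
		BasicConstraintsValid: true,
		IsCA:                  isCA,
		KeyUsage:              x509.KeyUsageDigitalSignature,
	}
	if isCA {
		t.KeyUsage = x509.KeyUsageCertSign
	} else {
		t.ExtKeyUsage = []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth}
		t.DNSNames = []string{name}
	}
	return t
}

// issue generates a key pair for tmpl and has parent sign the certificate.
// A nil parent produces a self-signed (root) certificate.
func issue(tmpl, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// verifyChain builds root -> intermediate -> user certificate and verifies
// the user certificate for host. withIntermediate=false simulates a client
// that lacks the intermediate certificate.
func verifyChain(host string, withIntermediate bool) error {
	root, rootKey, err := issue(newTemplate(1, "Demo Root CA", true), nil, nil)
	if err != nil {
		return err
	}
	inter, interKey, err := issue(newTemplate(2, "Demo Intermediate CA", true), root, rootKey)
	if err != nil {
		return err
	}
	leaf, _, err := issue(newTemplate(3, "www.example.com", false), inter, interKey)
	if err != nil {
		return err
	}
	roots, inters := x509.NewCertPool(), x509.NewCertPool()
	roots.AddCert(root)
	if withIntermediate {
		inters.AddCert(inter)
	}
	_, err = leaf.Verify(x509.VerifyOptions{Roots: roots, Intermediates: inters, DNSName: host})
	return err
}

func main() {
	fmt.Println("full chain:", verifyChain("www.example.com", true))
	fmt.Println("wrong host:", verifyChain("other.example.com", true))
	fmt.Println("no intermediate:", verifyChain("www.example.com", false))
}
```

Only the first call is expected to succeed: the second fails the domain name check, and the third cannot link the user certificate to any trusted root.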
For more information about the trust chain, we recommend reading What is the SSL Certificate Chain?

4. What about the certificate?

Many readers have come across certificate files such as .pem, .crt, .cer and .key, which raises the question: "Why are there so many certificate files with different suffixes? How are they related, and how do they differ?" To answer this question, we need to look at three levels:
(1) Certificate Standards The format of digital certificates generally adopts the X.509 international standard. Wikipedia explains X.509 as follows:
(2) Certificate encoding format X.509 standard certificate files have different encoding formats: PEM and DER.
PEM, short for Privacy Enhanced Mail, is stored in text format: the file begins with -----BEGIN XXX----- and ends with -----END XXX-----, and the content in between is Base64-encoded data. The text content is roughly as follows:
Usually, the PEM format can store data such as public keys, private keys, and certificate signing requests. To view the information of a PEM format certificate, you generally use the following command:
Apache and Nginx servers prefer this encoding format.
DER, short for Distinguished Encoding Rules, is stored in binary format, so the file cannot be read directly as text and has to be viewed with the following command:
The DER format can also store data such as public keys, private keys, and certificate signing requests. Java and Windows applications tend to use this encoding format. Of course, different encodings of the same X.509 certificate can be converted to each other:
(3) File extension The different extensions can be divided into the following categories:
4. Integrity verification: HTTPS hash

A hash function maps data from one space to another, typically input of arbitrary length to a fixed-length value. It is a very useful tool, and its shadow can be found everywhere: hash tables for key-value storage, consistent hashing for load balancing, cryptographic hashes (SHA, MD5, etc.) for data verification, GeoHash for two-dimensional spatial positioning, SimHash for object similarity, and so on. HTTPS uses hashes in two places:

1. Digital signature of the certificate

The method was described in the certificate chapter above and is not repeated here. The main purpose of hashing here is to reduce the overhead of running the asymmetric encryption algorithm RSA over long texts.

2. Message digest of the symmetrically encrypted data

During the data communication phase, SSL/TLS hashes the original message to obtain a summary of the message, called a message digest. After receiving the data, the other end decrypts it with the negotiated symmetric encryption key to recover the original message, runs the same hash algorithm to compute its own digest, and compares that with the digest that was sent to determine whether the communication data has been tampered with. In TLS the digest is keyed (a MAC, or the authentication tag of an AEAD cipher), so an attacker cannot simply recompute it after modifying the data.

5. HTTPS Communication Process

At this point, the key issues involved in HTTPS have largely been covered. This chapter summarizes the entire HTTPS communication process:

A few additional points:

1. Negotiated key: Client/Server random number, Client/Server Key

The ECDH introduced in the encryption chapter was at the level of principle. In practice, in addition to the PreMaster-Secret (i.e. Client/Server Key), key negotiation also involves random numbers from the client and the server. See the article "Https: TLS Handshake Protocol"; the figure quoted from it shows an actual ECDH key negotiation:

2. Change Cipher Spec

Change Cipher Spec tells the other party that subsequent messages will be protected with the newly negotiated encryption parameters.
The article "TLS Change Cipher Spec Protocol" points out:
The SSL Change Cipher Spec Protocol is one of the three SSL-specific protocols that use the SSL Record Protocol, and it is also the simplest. The protocol consists of a single message that contains a single byte with the value 1. The only function of this message is to cause the pending state to be copied into the current state, which updates the cipher suite used for the current connection. In order to ensure the security of the SSL transmission process, both parties should change the encryption specifications at regular intervals.

3. Encrypted Handshake Message

The Encrypted Handshake Message confirms that the negotiated symmetric encryption key SK is correct. After the client and server have negotiated SK, each sends the other a message encrypted with SK. If the receiver decrypts and verifies the message successfully, SK is correct.

4. One-way and two-way authentication

Everything discussed in this chapter is based on one-way authentication: the client requests a certificate from the server and verifies the server's identity. Some real scenarios have higher security requirements, and the server also needs to verify the client's identity; this is two-way authentication. Two-way authentication builds on one-way authentication and adds one step: after the server sends its certificate, it sends a "certificate request" to the client and then verifies the client's identity. Refer to the figure below (the source of the picture was not checked):

6. HTTPS practical problem record

1. Question: Does HTTPS need extra protection against domain name hijacking?

No. The reason is as follows: during certificate verification, the client not only checks the validity of the certificate by comparing the digital signature, but also checks whether the domain name on the certificate matches the domain name it wants to access.
Therefore, as long as the server's certificate is trustworthy and the client does not skip the "certificate verification" step, HTTPS can prevent domain name hijacking.

The author has done work on preventing domain name hijacking in practice. The approach: first, the client asks a trusted domain name server for the IP address corresponding to the domain name; then the client replaces the domain name in the URL with that IP and makes the network request. This is called HTTPS IP direct connection. In practice it causes a problem: the identity check fails, because the client now compares the domain name on the certificate with an IP address. The fundamental fixes are: 1) when verifying the certificate, verify against a hostname you specify; or 2) before verifying the certificate, replace the IP in the URL with the domain name again. In many language implementations the fix is even simpler: add a Host field to the request header and fill in the domain name as the value.

2. Question: "x509: certificate signed by unknown authority"

This error means that the client tried to authenticate the server's certificate but found the certificate trust chain broken: it could not trace the certificate back to a root certificate. To put it bluntly, the client lacks the root certificate or intermediate certificate that issued the user certificate. The practical solutions are: 1) install what is missing, whether root certificate or intermediate certificate; 2) skip the certificate authentication step, which gives up the protection described above and should be limited to testing. Implementation of skipping authentication in GoLang:
3. Question: In an HTTPS project, how does the server manage certificates and RSA keys?

It depends. In an established company there is usually a unified access layer that handles this, and backend developers only need to focus on their own business logic. Operations colleagues, or the colleagues who develop the access layer, regularly renew the certificates and manage the HTTPS RSA keys. These certificates are all issued by formal, authoritative CA organizations, and the corresponding root certificates are pre-installed on common client devices. Individuals often do not want to bear the cost of a paid HTTPS certificate, so they can issue a self-signed certificate themselves or apply for one from a free certificate authority. However, with a self-signed certificate, or with any issuer whose root is not already on the client, the corresponding root certificate must also be installed on the client.

4. Question: In an HTTPS project, does the client's root certificate need to be installed in advance?

As mentioned in the previous question:
However, when developing on edge computing devices, I found that the "tiny OS" on devices such as cameras is a stripped-down version of Linux with no root certificates installed. To use HTTPS on such devices, you must install the root certificate in advance.
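When the root certificate cannot be installed into the device's system store, one option in Go is to ship it with the application and load it into the client's own trust pool. This is a sketch under that assumption; `clientWithRoots` is a hypothetical helper, and where the PEM data comes from (an embedded file, a config directory) is left to the application.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// clientWithRoots builds an HTTPS client that trusts only the CA
// certificates contained in rootPEM (one or more PEM blocks).
func clientWithRoots(rootPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(rootPEM) {
		return nil, errors.New("no usable certificate found in PEM data")
	}
	return &http.Client{
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool},
		},
	}, nil
}

func main() {
	// On a real device rootPEM would be read from the bundled CA file.
	_, err := clientWithRoots([]byte("not a certificate"))
	fmt.Println("error for bad input:", err)
}
```

Certificate verification stays enabled with this approach, so it corresponds to solution 1) for the "unknown authority" error rather than to skipping authentication.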