10,000-word article on HTTPS, no more panic in interviews!

The HTTP protocol only establishes a standard for transmission over the Internet, sparing applications the difficulty of communicating directly over the TCP protocol.

The less-is-more philosophy is good in itself, but being this simple also has consequences:

① Communication is in plaintext, so the content can be eavesdropped on.

HTTP itself has no encryption capability, so it cannot encrypt communications (the contents of requests and responses). In other words, HTTP messages are sent in plaintext, i.e., unencrypted.

② Message integrity cannot be proved, so content may be tampered with.

The HTTP protocol cannot prove the integrity of a message. Even if the content of a request or response is tampered with between being sent and being received, neither party has any way to know.

In other words, there is no way to confirm that the request/response sent and the request/response received are identical.

③ The sender's identity is not verified, so it can be impersonated.

Requests and responses in the HTTP protocol do not identify the communicating party. Since there is no step that confirms who you are talking to, anyone can initiate a request.

Moreover, as long as a server receives a request, it will return a response no matter who sent it (provided the sender's IP address and port are not blocked by the Web server).

The HTTP protocol cannot verify the identity of the communicating parties. Anyone can forge a fake server to deceive users and carry out phishing fraud without users being able to detect it.

Because of these problems, and because attackers' technical capabilities keep growing, unencrypted HTTP traffic easily leads to network security incidents, and the demand for stronger security has become increasingly urgent.

What is security

Since HTTP is not secure, what kind of communication process is secure?

It is generally believed that if the communication process has four characteristics, it can be considered secure:

  • Confidentiality: the data is secret, accessible only to trusted parties and invisible to everyone else. Put simply, people who shouldn't see something can't see it.
  • Integrity (also called consistency): the data has not been tampered with in transit and arrives completely intact, with nothing added and nothing missing.
  • Authentication: confirming the true identity of the other party, proving that you really are you, so that messages are sent only to trusted parties. If the other end is a fake website, confidentiality is useless: an attacker behind a fake identity can extract all kinds of information, and encrypted traffic is no better than unencrypted.
  • Non-repudiation (also called undeniability): one cannot deny an action that has already taken place, cannot go back on one's word.

The first three features solve most problems of secure communication. But without non-repudiation, the authenticity of a transaction cannot be guaranteed: either party could later deny it.

For example, Xiao Ming borrowed 1,000 yuan from Xiao Hong, but did not write a receipt. The next day, he denied it. Xiao Hong could not produce any evidence of borrowing the money, so she could only admit that she had bad luck.

In the opposite situation, Xiao Ming repays the money but does not get a receipt. Xiao Hong then denies ever receiving it and demands another thousand yuan from Xiao Ming.

Therefore, only when the four characteristics of confidentiality, integrity, authentication, and non-repudiation are simultaneously possessed can the interests of both parties in communication be protected and can it be considered truly secure.

What Problems Does HTTPS Solve?

HTTPS adds the four major security features just mentioned to HTTP.

HTTPS is actually a very simple protocol: its RFC (RFC 2818) is only seven pages long and does little more than specify the new scheme name https and the default port 443.

As for other requests, response modes, message structures, request methods, URIs, header fields, connection management, etc., they all follow HTTP completely, with nothing new.

That is to say, apart from the protocol name (https instead of http) and the default port (443 instead of 80), HTTPS is exactly the same as HTTP in syntax and semantics, and inherits all of its strengths and weaknesses (except, of course, plaintext transmission and its insecurity).

How can HTTPS achieve security features such as confidentiality and integrity?

The secret lies in the S of HTTPS. It changes HTTP's underlying transport from bare TCP/IP to SSL/TLS, turning HTTP over TCP/IP into HTTP over SSL/TLS. HTTP now runs on the secure SSL/TLS layer, and messages are sent and received not through the plain Socket API but through a dedicated security interface.

HTTPS itself does not have any earth-shattering capabilities. It relies entirely on the support of SSL/TLS. Once you learn SSL/TLS, HTTPS will be easy to use.

The main functions of HTTPS rest on the TLS/SSL protocol, whose implementation relies on three kinds of basic algorithms: hash functions, symmetric encryption, and asymmetric encryption. Asymmetric encryption implements identity authentication and key negotiation, a symmetric algorithm encrypts the data with the negotiated key, and hash functions verify message integrity.

What is SSL/TLS

Now let’s take a look at SSL/TLS and what it is.

SSL stands for Secure Sockets Layer and sits at layer 5 (the session layer) in the OSI model. It was invented by Netscape in 1994 and has two public versions, v2 and v3 (v1 was never released because of serious flaws).

By the time SSL developed to v3, it had proven itself to be a very good secure communication protocol, so the Internet Engineering Task Force (IETF) renamed it TLS (Transport Layer Security) in 1999, formally standardized it, and recalculated the version number from 1.0, so TLS1.0 is actually SSLv3.1.

To date, TLS has gone through three further versions: 1.1 in 2006, 1.2 in 2008, and 1.3 in 2018. Each new version keeps pace with advances in cryptography and the state of the Internet, steadily strengthening security and performance, and TLS has become the authoritative standard in information security.

The most widely used TLS at present is 1.2, and the previous protocols (TLS1.1/1.0, SSLv3/v2) are already considered unsafe, and major browsers will stop supporting them around 2020, so the following explanations will focus on TLS1.2.

TLS consists of several sub-protocols, including the record protocol, handshake protocol, alert protocol, change cipher spec protocol, and extension protocol. It combines symmetric encryption, asymmetric encryption, identity authentication and many other cutting-edge cryptographic techniques.

When using TLS to establish a connection, browsers and servers need to select an appropriate set of encryption algorithms to achieve secure communication. The combination of these algorithms is called a cipher suite (also called an encryption suite).

The naming of TLS cipher suites is highly standardized and the format is fixed. The basic form is: key exchange algorithm + signature algorithm + symmetric encryption algorithm + digest algorithm.

For example, the cipher suite TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 means: use the ECDHE algorithm for key exchange during the handshake, use RSA for signatures and identity authentication, use the AES symmetric algorithm with a 256-bit key in GCM mode for communication after the handshake, and use the SHA384 digest algorithm for message authentication and random number generation.
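As a quick illustration of that naming convention, the sketch below splits a TLS 1.2 style suite name into its four fields. TLS 1.3 suite names use a shorter format this parser does not handle, and `parse_suite` is a hypothetical helper of ours, not part of any library:

```python
# Illustrative parser for the TLS 1.2 cipher-suite naming convention
# described above; handles names of the form TLS_<kx>_<sig>_WITH_<cipher>_<digest>.
def parse_suite(name: str) -> dict:
    kx_auth, _, cipher_mac = name.partition("_WITH_")
    parts = kx_auth.split("_")           # e.g. ["TLS", "ECDHE", "RSA"]
    cipher, digest = cipher_mac.rsplit("_", 1)
    return {
        "key_exchange": parts[1],
        "signature": parts[2],
        "cipher": cipher,                # algorithm + key length + mode
        "digest": digest,
    }

suite = parse_suite("TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384")
print(suite)
```

Each field of the result maps onto one part of the explanation above: ECDHE for key exchange, RSA for signatures, AES_256_GCM for bulk encryption, SHA384 for the digest.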

OpenSSL

When talking about TLS, we have to mention OpenSSL, which is a well-known open source cryptography library and toolkit that supports almost all public encryption algorithms and protocols. It has become a de facto standard. Many application software use it as the underlying library to implement TLS functions, including commonly used web servers Apache, Nginx, etc.

OpenSSL was developed from another open source library SSLeay. It was once considered to be named OpenTLS, but at that time (1998), TLS had not yet been officially established, and SSL was already well known, so the name OpenSSL was finally used.

OpenSSL currently has three major branches. Both 1.0.2 and 1.1.0 will no longer be maintained at the end of this year (2019). The latest long-term support version is 1.1.1.

Since OpenSSL is open source, it also has some code branches, such as Google's BoringSSL and OpenBSD's LibreSSL.

These branches have deleted some old codes and added some new features based on OpenSSL. Although they have big sponsors behind them, they are still far from replacing OpenSSL.

Confidentiality Implementation: Encryption

The most common way to achieve confidentiality is encryption: converting a message into gibberish that no one can read, such that only someone holding a special key can recover the original text.

The key here is called the secret key. The message before encryption is called the plaintext (or clear text), the encrypted gibberish is called the ciphertext, and the process of using the key to restore the plaintext is called decryption.

All encryption algorithms are public and can be analyzed and studied by anyone; only the keys used with the algorithms must be kept secret (this is known as Kerckhoffs's principle).

What is this crucial key? Since HTTPS and TLS run on computers, the key is just a long string of bits, and by convention its length is measured in bits, not bytes.

For example, a 128-bit key is a 16-byte binary string, and a 1024-bit key is a 128-byte binary string.

Based on how the keys are used, encryption can be divided into two categories:

  • Symmetric encryption
  • Asymmetric encryption

Symmetric encryption

Symmetric encryption is easy to understand: the same key is used for encryption and decryption, hence symmetric. As long as the key is kept secure, the whole communication can be considered confidential.

For example, if you want to log in to a website, you only need to agree with it in advance to use a symmetric password. During the communication process, all the ciphertext transmitted is encrypted with the key, and only you and the website can decrypt it.

Even if a hacker is able to eavesdrop, all he sees is garbled text, because he cannot decipher the plaintext without the key, so confidentiality is achieved.
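To make the "one key, both directions" idea concrete, here is a deliberately toy sketch: a keystream derived from the shared key via SHA-256 is XORed with the data, so the very same call encrypts and decrypts. This is NOT a real cipher (no nonce, no authentication); it stands in for AES/ChaCha20 purely for illustration:

```python
import hashlib

# Toy illustration of the symmetric principle: the SAME key encrypts and
# decrypts. Do not use this for real data -- TLS uses AES or ChaCha20.
def keystream(key: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so one function does both directions
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

key = b"a shared secret key"
ciphertext = xor_crypt(key, b"hello, TLS")
assert xor_crypt(key, ciphertext) == b"hello, TLS"   # same key decrypts
```

An eavesdropper who sees only `ciphertext` but not `key` sees exactly the kind of gibberish described above.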

There are many symmetric encryption algorithms to choose from in TLS, such as RC4, DES, 3DES, AES, ChaCha20, etc., but the first three algorithms are considered unsafe and are usually prohibited from use. Currently, the commonly used ones are AES-128, AES-192, AES-256 and ChaCha20.

The full name of DES is Data Encryption Standard, which is a symmetric key algorithm used for digital data encryption.

Although its short key length of 56 bits makes it too insecure for modern applications, it has been highly influential in the development of cryptography.

AES means Advanced Encryption Standard. AES-128, AES-192 and AES-256 all belong to AES.

The key length can be 128, 192 or 256. It is a replacement for the DES algorithm, with high security strength and good performance, and some hardware will be specially optimized, so it is very popular and the most widely used symmetric encryption algorithm.

ChaCha20 is an encryption algorithm designed by Daniel J. Bernstein and promoted by Google. Its key length is fixed at 256 bits. Its pure-software performance exceeds that of AES, so it was once popular on mobile clients, but ARMv8 later added AES hardware acceleration, so it no longer has a clear advantage.

Modes of Operation

Symmetric algorithms also have the concept of a mode of operation (sometimes translated literally as grouping mode), which lets the algorithm encrypt plaintext of any length with a fixed-length key, turning a small secret (the key) into a large secret (the ciphertext).

There were several grouping modes at first, such as ECB, CBC, CFB, OFB, etc., but they were all found to have security vulnerabilities one after another, so they are basically not used now.

The latest grouping mode is called AEAD (Authenticated Encryption with Associated Data), which adds authentication function while encrypting. Commonly used ones are GCM, CCM and Poly1305.

Combining the above, we get the symmetric encryption algorithms defined in the TLS cipher suite.

For example, AES128-GCM means AES with a 128-bit key in GCM mode, and ChaCha20-Poly1305 means the ChaCha20 cipher paired with the Poly1305 authenticator.

Asymmetric encryption

Symmetric encryption seems to achieve confidentiality perfectly, but there is a big problem: how to securely transmit the key to the other party, which is called key exchange.

Because in symmetric encryption algorithms, as long as you have the key, you can decrypt it. If the key you agreed upon with the website is stolen by a hacker during transmission, he can decrypt the sent and received data at will afterwards, and the communication process will have no confidentiality at all.

How can this be solved? Unless the two parties have privately agreed on the key beforehand, a fresh per-session key cannot be delivered to the other side during the communication itself without protection, which leaves you in the awkward position of needing to encrypt the key before you can send it. This is why asymmetric encryption (also called public-key cryptography) appeared.

It has two keys, one is called the public key and the other is called the private key. The two keys are different and asymmetric. The public key can be made public to anyone, while the private key must be kept strictly confidential.

The public key and private key have a special one-way property. Although both can be used for encryption and decryption, encryption with the public key can only be decrypted with the private key, and vice versa, encryption with the private key can only be decrypted with the public key.

Asymmetric encryption can solve the problem of key exchange. The website keeps the private key in secret and distributes the public key arbitrarily on the Internet. If you want to log in to the website, you only need to encrypt it with the public key. The ciphertext can only be decrypted by the private key holder.

Since hackers do not have the private key, they cannot crack the ciphertext.

The design of asymmetric encryption algorithms is much more difficult than that of symmetric algorithms. There are only a few of them in TLS, such as DH, DSA, RSA, ECC, etc.

RSA is probably the most famous one among them. It can almost be said to be synonymous with asymmetric encryption. Its security is based on the mathematical problem of integer factorization. It uses the product of two very large prime numbers as the material for generating keys. It is very difficult to deduce the private key from the public key.

Ten years ago, the recommended length of RSA keys was 1024 bits, but with the improvement of computer computing power, 1024 bits is no longer safe, and it is generally believed that at least 2048 bits is required.
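The RSA math can be seen end to end with the classic textbook numbers (p=61, q=53). Real keys use primes hundreds of digits long, so this is a sketch of the arithmetic, not usable cryptography:

```python
# Textbook-RSA sketch with tiny primes, purely to show the math.
p, q = 61, 53
n = p * q                    # 3233, the public modulus
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

message = 65
ciphertext = pow(message, e, n)    # encrypt with the public key (e, n)
plaintext = pow(ciphertext, d, n)  # decrypt with the private key (d, n)
assert d == 2753 and ciphertext == 2790 and plaintext == 65
```

The security claim above corresponds to the fact that recovering `d` from `(n, e)` requires factoring `n`, which is infeasible when p and q are large.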

ECC (Elliptic Curve Cryptography) is a rising star in asymmetric encryption. It is based on the mathematical problem of elliptic curve discrete logarithm and uses specific curve equations and base points to generate public and private keys. The sub-algorithm ECDHE is used for key exchange and ECDSA is used for digital signatures.

ECDHE is the DH algorithm carried out over elliptic curves (ECC). Its advantage is that it achieves a security level comparable to RSA with far smaller numbers (e.g., 256 bits).

The disadvantages are that the algorithm is complex to implement, has a short history of being used for key exchange, and has not been subjected to long-term security attack tests.

The two most commonly used curves are P-256 (secp256r1, called prime256v1 in OpenSSL) and x25519.

P-256 is the curve recommended by NIST (National Institute of Standards and Technology) and NSA (National Security Agency), while x25519 is considered the most secure and fastest curve.

Compared with RSA, ECC has obvious advantages in security strength and performance. 160-bit ECC is equivalent to 1024-bit RSA, and 224-bit ECC is equivalent to 2048-bit RSA.

Because the key is short, the corresponding amount of calculation, memory consumption and bandwidth are less, and the encryption and decryption performance is improved, which is very attractive for today's mobile Internet.
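The key-exchange idea behind (EC)DHE can be sketched with classic finite-field Diffie-Hellman: each side publishes g^x mod p, and both derive the same shared secret without ever transmitting it. ECDHE performs the analogous exchange on an elliptic curve with far larger parameters; the prime below is insecurely small and for illustration only:

```python
import secrets

# Toy finite-field Diffie-Hellman; NOT ECDHE and NOT secure parameters.
p = 2**32 - 5   # a small prime modulus (toy)
g = 5           # generator (toy)

a = secrets.randbelow(p - 2) + 2   # Alice's private value, never sent
b = secrets.randbelow(p - 2) + 2   # Bob's private value, never sent

A = pow(g, a, p)   # Alice sends A in the clear
B = pow(g, b, p)   # Bob sends B in the clear

# Each side combines its own private value with the other's public value
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob
```

An eavesdropper sees p, g, A and B, but recovering the shared secret from them is the discrete-logarithm problem, which is what makes the exchange safe at real key sizes.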

Hybrid Encryption

Do you think that we can abandon symmetric encryption and only use asymmetric encryption to achieve confidentiality?

Unfortunately, although asymmetric encryption does not have the problem of key exchange, because they are all based on complex mathematical problems, the calculation speed is very slow. Even ECC is several orders of magnitude worse than AES.

If only asymmetric encryption were used, security would be guaranteed, but communication would crawl along at a snail's pace and be useless in practice.

Is it possible to combine symmetric encryption and asymmetric encryption so that they can complement each other and achieve both efficient encryption and decryption and secure key exchange?

This is the hybrid encryption method currently used in TLS. In fact, it is very simple: use asymmetric algorithms, such as RSA and ECDHE, at the beginning of communication to solve the problem of key exchange first.

Then a random number is used to generate the session key for the symmetric algorithm, and the session key is encrypted with the public key. Because the session key is very short, usually only 16 or 32 bytes, the slowness of asymmetric encryption doesn't matter here.

After receiving the ciphertext, the other party decrypts it with the private key and obtains the session key. The two parties have now exchanged the symmetric key securely; from then on asymmetric encryption is no longer used, and everything is encrypted symmetrically.

This hybrid encryption solves the key exchange problem of the symmetric encryption algorithm, and takes both security and performance into account, perfectly achieving confidentiality.
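Putting the two halves together, a minimal hybrid-encryption sketch (toy textbook RSA for the key exchange, a toy XOR keystream standing in for AES) looks like this:

```python
import hashlib
import secrets

# Hybrid-encryption sketch: a random session key travels under (toy) textbook
# RSA, and the bulk data is encrypted symmetrically with that key. Both the
# RSA numbers and the XOR "cipher" are illustrative toys; TLS uses real
# RSA/ECDHE plus AES or ChaCha20.
n, e, d = 3233, 17, 2753                  # textbook RSA key pair (insecure)

def xor_crypt(key: bytes, data: bytes) -> bytes:
    stream = hashlib.sha256(key).digest() * (len(data) // 32 + 1)
    return bytes(x ^ s for x, s in zip(data, stream))

# Client: pick a session key and send it encrypted with the server's public key
session_key = secrets.randbelow(n - 2) + 2
key_ciphertext = pow(session_key, e, n)
msg_ciphertext = xor_crypt(str(session_key).encode(), b"GET / HTTP/1.1")

# Server: recover the session key with the private key, then decrypt the data
recovered = pow(key_ciphertext, d, n)
assert recovered == session_key
assert xor_crypt(str(recovered).encode(), msg_ciphertext) == b"GET / HTTP/1.1"
```

The slow asymmetric step touches only the tiny session key; all bulk traffic goes through the fast symmetric step, which is exactly the trade-off described above.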

Integrity

Digest Algorithm

The main means of achieving integrity is the digest algorithm (Digest Algorithm), commonly known as a hash function.

You can roughly understand the digest algorithm as a special compression algorithm that can compress data of any length into a fixed-length, unique digest string, just like generating a digital fingerprint for the data.

From another perspective, the digest algorithm can also be understood as a special one-way encryption algorithm. It only has an algorithm but no key. The encrypted data cannot be decrypted, and the original text cannot be reversed from the digest.

The digest algorithm actually maps data from a large space into a small one, so collisions are possible: just as with real fingerprints, two different originals may produce the same digest.

A good digest algorithm must be collision-resistant, minimizing the chance of such conflicts. Because a digest algorithm is one-way and exhibits an avalanche effect (a slight difference in the input leads to a drastic change in the output), TLS also uses it to generate pseudorandom numbers (PRF, pseudorandom function).

You must have heard of or used MD5 (Message-Digest 5) and SHA-1 (Secure Hash Algorithm 1) in your daily work. They are the two most commonly used digest algorithms that can generate digital digests of 16 bytes and 20 bytes in length.

However, the security strength of these two algorithms is relatively low and not secure enough, so they have been banned in TLS. Currently, TLS recommends using SHA-2, the successor of SHA-1.

SHA-2 is actually a general term for a series of digest algorithms, a total of 6 types, the commonly used ones are SHA224, SHA256, and SHA384, which can generate 28-byte, 32-byte, and 48-byte digests respectively.
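The digest lengths quoted above, and the avalanche effect, can be checked directly with Python's standard hashlib:

```python
import hashlib

# The SHA-2 digest lengths quoted above, checked with hashlib
assert hashlib.sha224(b"data").digest_size == 28
assert hashlib.sha256(b"data").digest_size == 32
assert hashlib.sha384(b"data").digest_size == 48

# Avalanche effect: one changed character yields a completely different digest
d1 = hashlib.sha256(b"There is an insider, terminate the transaction").hexdigest()
d2 = hashlib.sha256(b"There is an insider, terminate the transaction!").hexdigest()
assert d1 != d2
```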

Because the digest is, in practice, uniquely tied to the original text, appending the digest to the message allows the recipient to check the data's integrity.

For example, you send a message: There is an insider, terminate the transaction, and then add a SHA-2 digest. After receiving it, the website also calculates the message digest and compares the two fingerprints. If they are consistent, it means that the message is complete and credible and has not been modified.

If a hacker changes even one punctuation mark in the middle, the summary will be completely different. The website will find out through calculation and comparison that the message has been tampered with and is unreliable.

However, the digest algorithm is not confidential. If it is transmitted in plain text, hackers can modify the message and the digest as well, and the website still cannot identify the integrity.

Therefore, true integrity must be based on confidentiality, and the session key must be used to encrypt messages and summaries in a hybrid encryption system so that hackers cannot know the plaintext and cannot tamper with it.

There is a term for this: HMAC (Hash-based Message Authentication Code).
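Python's standard library implements HMAC directly. A short sketch, assuming a hypothetical already-negotiated session key:

```python
import hashlib
import hmac

# HMAC with the standard library. The tag is keyed with the session key
# (a placeholder value here), so an attacker who alters the message cannot
# recompute a valid tag without knowing the key.
session_key = b"negotiated session key"       # placeholder for illustration
message = b"There is an insider, terminate the transaction"

tag = hmac.new(session_key, message, hashlib.sha256).digest()

# Receiver recomputes the tag and compares in constant time
ok = hmac.compare_digest(
    tag, hmac.new(session_key, message, hashlib.sha256).digest())
tampered_ok = hmac.compare_digest(
    tag, hmac.new(session_key, message + b"!", hashlib.sha256).digest())
assert ok and not tampered_ok
```

Note the constant-time comparison: `hmac.compare_digest` avoids leaking where the two tags first differ.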

Digital Signature

The encryption algorithm combined with the digest algorithm makes our communication process relatively safe. However, there is still a loophole, which is the two endpoints of the communication.

As mentioned at the beginning, hackers can pretend to be websites to steal information. In turn, they can also pretend to be you and send payment, transfer and other messages to the website. The website has no way to confirm your identity, and the money may be stolen.

In real life, the means to solve identity authentication are signatures and seals. As long as you write your signature or stamp a seal on a piece of paper, you can prove that the document was indeed issued by you and not someone else.

Recall the introduction above: is there something in TLS, very much like a real-world signature or seal, that only its owner holds and nobody else has? That is exactly what can prove your identity in the digital world.

That's right, this thing is the private key in asymmetric encryption. Using the private key plus the digest algorithm, you can implement digital signatures, and at the same time achieve identity authentication and non-repudiation.

The principle of digital signatures is actually very simple: reverse the usage of the public and private keys. Previously we encrypted with the public key and decrypted with the private key; now we encrypt (sign) with the private key and decrypt (verify) with the public key.

However, because asymmetric encryption is too inefficient, the private key only encrypts the summary of the original text. This way, the amount of computation is much smaller, and the resulting digital signature is also very small, making it easier to store and transmit.

The signature, like the public key, is completely public and can be obtained by anyone. But it can only be decrypted with the public key corresponding to the signer's private key. The recovered digest can then be compared with the original text to verify integrity, and it proves that the message really came from you, just like a signature on a document.

The two actions just mentioned also have special terms, called signing and signature verification.

As long as you and the website exchange public keys, you can use signatures and signature verification to confirm the authenticity of the message. Because the private key is kept confidential, hackers cannot forge signatures, and the identities of both parties in the communication can be guaranteed.

For example, you use your own private key to sign a message "Ma Dongmei, don't run away". After receiving it, the website uses your public key to verify the signature and confirm that there is no problem with your identity. Then it also uses its private key to sign the message "Ten Years of Reunion", breaking up a couple.

After you receive it, you can verify it with its public key. This way, both you and the website know that the other party is not fake, and you can use hybrid encryption for secure communication later.
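The sign/verify flow can be sketched with the same textbook-RSA numbers: sign by "encrypting" the message digest with the private key, verify by "decrypting" with the public key. Real signature schemes add padding such as PSS; this shows only the flow:

```python
import hashlib

# Sign/verify sketch on top of textbook RSA (tiny, insecure numbers).
n, e, d = 3233, 17, 2753

def digest_int(msg: bytes) -> int:
    # Reduce the digest modulo n so the toy RSA can operate on it
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

def sign(msg: bytes) -> int:
    return pow(digest_int(msg), d, n)         # "encrypt" digest with private key

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, e, n) == digest_int(msg)  # recover digest with public key

sig = sign(b"Ma Dongmei, don't run away")
assert verify(b"Ma Dongmei, don't run away", sig)
# A tampered message would, with overwhelming probability, fail verification
```

Because only the digest is signed, the expensive private-key operation runs on a few bytes, matching the efficiency argument above.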

Digital Certificates and CA: Identity Authentication

So far, by using symmetric encryption, asymmetric encryption and digest algorithms, we have achieved the four major security features. Is it perfect?

No, there is also a public key trust issue. Because anyone can publish a public key, we still lack the means to prevent hackers from forging public keys. In other words, how can we determine whether this public key is yours?

We could try an approach similar to key exchange to authenticate the public key, signing the public key with some other private key. But obviously this just recurses into a Russian nesting doll.

But this time there is really no other way. To end this vicious circle, we must introduce external forces and find a recognized and trusted third party to serve as the starting point of trust and the recursive end point to build a trust chain for the public key.

This third party is what we often call CA (Certificate Authority). It is like the Public Security Bureau, Ministry of Education, and Notary Center in the online world. It has a very high credibility. It signs each public key and uses its own reputation to ensure that the public key cannot be forged and is trustworthy.

The CA's signature on a public key also follows a format. It is not simply a binding of the public key to the holder's identity: it also includes a serial number, purpose, issuer, validity period, and so on, all packaged together and then signed, fully attesting to all information associated with the public key. The result is a digital certificate.

There are only a few well-known CAs in the world, such as DigiCert, VeriSign, Entrust, Let's Encrypt, etc. The certificates they issue are divided into three types: DV, OV, and EV, and the difference lies in the degree of trustworthiness.

DV is the lowest, which is only credible at the domain level, and no one knows who is behind it. EV is the highest, which has been strictly verified by laws and audits and can prove the identity of the website owner (the company name will be displayed in the browser address bar, such as Apple and GitHub websites).

However, how does CA prove itself? This is still a question of trust chain.

Smaller CAs can ask larger CAs to sign and authenticate their certificates, but the end of the chain, the "Root CA", can only prove itself. This is called a "Self-Signed Certificate" or "Root Certificate".

You have to believe it, otherwise the entire certificate trust chain will not work.

With this certificate system, the operating system and browser have built-in root certificates of major CAs. When you go online, as long as the server sends its certificate, you can verify the signature in the certificate.

By verifying layer by layer along the certificate chain until the root certificate is found, it can be determined that the certificate is trustworthy, and thus the public key inside is also trustworthy.
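The chain-walking idea can be modeled with toy certificates: each one binds a subject to a public key and carries its issuer's signature, and the root signs itself. All names, key sizes, and structures here are illustrative toys, not X.509:

```python
import hashlib

def keypair(p, q, e=17):
    # Toy textbook-RSA key generation; returns (n, e, d). Insecure sizes.
    n, phi = p * q, (p - 1) * (q - 1)
    return n, e, pow(e, -1, phi)

def digest_int(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def tbs(cert) -> bytes:          # the "to be signed" portion of a certificate
    return f"{cert['subject']}|{cert['pubkey']}".encode()

def make_cert(subject, key, issuer_name, issuer_key):
    cert = {"subject": subject, "pubkey": key[:2], "issuer": issuer_name}
    n, _, d = issuer_key
    cert["sig"] = pow(digest_int(tbs(cert), n), d, n)   # issuer signs the cert
    return cert

root, inter, leaf = keypair(61, 53), keypair(47, 59), keypair(41, 71)
root_cert = make_cert("RootCA", root, "RootCA", root)              # self-signed
inter_cert = make_cert("IntermediateCA", inter, "RootCA", root)
leaf_cert = make_cert("example.com", leaf, "IntermediateCA", inter)

def verified_by(cert, issuer_cert) -> bool:
    n, e = issuer_cert["pubkey"]
    return pow(cert["sig"], e, n) == digest_int(tbs(cert), n)

# Walk the chain from the leaf up to the self-signed root
assert verified_by(leaf_cert, inter_cert)
assert verified_by(inter_cert, root_cert)
assert verified_by(root_cert, root_cert)   # the root vouches for itself
```

The final assertion is exactly the leap of faith described above: the root certificate can only be checked against itself, so it must be trusted a priori (which is why OSes and browsers ship root certificates built in).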

Weaknesses of the Certificate System

Although the certificate system (PKI, Public Key Infrastructure) is the security infrastructure of the entire network world, absolute security does not exist. It also has weaknesses, and the key word is trust.

If the CA makes a mistake or is deceived and issues the wrong certificate, the certificate will be genuine, but the website it represents will be fake.

There is also a more dangerous situation: the CA itself is hacked or acts maliciously. Since it (via its root certificate) is the source of trust, every certificate in the entire chain becomes unreliable.

These two things are not sensational, they have actually happened before. So we need to patch up the certificate system.

For the first case, CRL (Certificate Revocation List) and OCSP (Online Certificate Status Protocol) were developed to revoke problematic certificates promptly.

For the second type, because there are too many certificates involved, the operating system or browser can only take drastic measures from the root, revoke the trust in the CA, and blacklist it, so that all certificates it issues will be considered unsafe.

Let's look at what GitHub's digital certificate looks like.

The information on the certificate includes: the type (EV, the highest level of validation), the expiration date, the organization the certificate belongs to, and the issuing authority.

TLS protocol components

What happens when you type a URI beginning with HTTPS in the browser address bar and press Enter? The browser first extracts the protocol name and domain name from the URI.

Because the protocol name is HTTPS, the browser knows that the port number is the default 443. It then uses DNS to resolve the domain name, obtains the target IP address, and then can use the three-way handshake to establish a TCP connection with the website.

In the HTTP protocol, after the connection is established, the browser will immediately send a request message. But now it is the HTTPS protocol, which requires another handshake process to establish a secure connection on TCP before sending and receiving HTTP messages.

This handshake process is somewhat similar to TCP and is the most important and core part of the HTTPS and TLS protocols.

Before talking about the TLS handshake, let's briefly introduce the composition of the TLS protocol.

TLS contains several sub-protocols, which you can also think of as modules with different responsibilities. The most commonly used are the record protocol, alert protocol, handshake protocol, change cipher spec protocol, etc.

The Record Protocol defines the basic unit of TLS data transmission and reception: record.

It is a bit like the segment in TCP, and all other subprotocols need to be sent through the record protocol.

However, multiple record data can be sent at once in one TCP packet, and there is no need to return ACK like TCP.

The responsibility of the Alert Protocol is to send alert information to the other party, which is a bit like the status code in the HTTP protocol.

For example, protocol_version means that the old version is not supported, and bad_certificate means that there is a problem with the certificate. After receiving the alert, the other party can choose to continue or terminate the connection immediately.

The Handshake Protocol is the most complex sub-protocol in TLS, much more complex than TCP's SYN/ACK. During the handshake process, the browser and server will negotiate the TLS version number, random number, cipher suite and other information, and then exchange certificates and key parameters. Finally, the two parties negotiate to obtain the session key for the subsequent hybrid encryption system.

Change Cipher Spec Protocol is very simple, it is a notification to tell the other party that the subsequent data will be protected by encryption. Conversely, before it, the data is in plain text.

The following figure briefly describes the TLS handshake process, where each box is a record, and multiple records are combined into a TCP packet to be sent.

Therefore, the handshake can be completed after a maximum of two message round trips (4 messages), and then HTTP messages can be sent in a secure communication environment to implement the HTTPS protocol.

After TCP completes the three-way handshake to establish a connection, HTTPS begins the encryption authentication handshake process. The TLS handshake process is as follows:

We can observe the above process with the Wireshark packet capture tool.

After TCP establishes the connection, the browser first sends a Client Hello message, which contains the client's version number, supported cipher suites, and a random number (Client Random) used later to generate the session key:

You can see that the client offers the server the 16 cipher suites it supports, and that the client is using TLS 1.2.

After receiving the Client Hello, the server will return a Server Hello message, check the version number, and also give a random number (Server Random).

Then select a cipher suite from the client's list as the cipher suite to be used for this communication. Here it selects Cipher Suite: TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (0xc02f).

Then, in order to prove its identity, the server sends the certificate to the client (Server Certificate).

The next step is a critical operation. Because the server selects the ECDHE algorithm, it will send a Server Key Exchange message after the certificate, which contains the elliptic curve public key (Server Params) used to implement the key exchange algorithm, plus its own private key signature authentication.

  Handshake Protocol: Server Key Exchange
      EC Diffie-Hellman Server Params
          Curve Type: named_curve (0x03)
          Named Curve: x25519 (0x001d)
          Pubkey: 3b39deaf00217894e...
          Signature Algorithm: rsa_pkcs1_sha512 (0x0601)
          Signature: 37141adac38ea4...

This is roughly the server saying: the cipher suite I just picked is a bit involved, so here is an extra algorithm parameter; it is as important as the random numbers, don't lose it. And to keep anyone from impersonating me, I have signed it with my private key.

This is followed by the Server Hello Done message, where the server says: That's all my information, hello done.

This completes the first message round trip (two TCP packets). As a result, the client and server share three pieces of information in plain text: Client Random, Server Random, and Server Params.

The client has also obtained the server's certificate. Is this certificate authentic and valid? The client will then verify the authenticity of the certificate. The verification process is as follows:

① First, read the certificate's owner, validity period, and other fields, then look up the issuing CA in the client's built-in list of trusted certificate authorities to verify that the certificate was issued by a legitimate authority.

This step will do the following:

  • Trust chain: verify the certificate chain (trusted certification path) to see whether the CA that issued the server certificate is trustworthy.
  • Revocation status: there are two revocation mechanisms, offline CRL and online OCSP; different clients behave differently here.
  • Expiry date: whether the certificate is within its validity period.
  • Domain name: whether the certificate's domain matches the domain being accessed (the matching rules are discussed later).
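The domain-matching check can be illustrated with a simplified sketch. This only covers the common case; real clients follow RFC 6125, which has more rules, and the function name here is made up for illustration:

```python
def cert_hostname_matches(pattern: str, hostname: str) -> bool:
    """Simplified certificate hostname matching (illustration only).

    A wildcard is honoured only as the entire leftmost label and matches
    exactly one label, so *.example.com matches www.example.com but not
    a.b.example.com or the bare example.com.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().split(".")
    if len(p_labels) != len(h_labels):
        return False
    return all(p == "*" or p == h for p, h in zip(p_labels, h_labels))

print(cert_hostname_matches("*.example.com", "www.example.com"))   # True
print(cert_hostname_matches("*.example.com", "a.b.example.com"))   # False
```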

② After the first step of verification passes, the key exchange continues. In the classic RSA mode, the client would generate a random Pre-Master Secret, encrypt it with the public key from the certificate, and send it to the server. With the ECDHE suite negotiated in this capture, the Client Key Exchange message instead carries the client's elliptic-curve public key (Client Params), and each side combines the other's public parameters with its own private key to compute the same Pre-Master Secret.

The client sends the Client Key Exchange to the server, and then the client and the server each send Change Cipher Spec and Encrypted Handshake Message to the other:

With the Pre-Master Secret in hand, plus Client Random and Server Random, both sides derive the session key. The server then notifies the client: the handshake phase is over, and all subsequent communication will use symmetric encryption.
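The derivation of keying material from the shared secrets can be sketched with the TLS 1.2 PRF (RFC 5246, section 5), implementable with only the standard library. The input values below are made up just to show the shapes involved:

```python
import hashlib
import hmac

def tls12_prf_sha256(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF based on P_SHA256 (RFC 5246, section 5).

    For the master secret: PRF(pre_master, b"master secret",
    client_random + server_random), truncated to 48 bytes.
    """
    full_seed = label + seed
    out, a = b"", full_seed            # a is A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()                # A(i)
        out += hmac.new(secret, a + full_seed, hashlib.sha256).digest()
    return out[:length]

# Hypothetical inputs, purely illustrative:
pre_master = b"\x01" * 48
randoms = b"\x02" * 32 + b"\x03" * 32   # client_random + server_random
master = tls12_prf_sha256(pre_master, b"master secret", randoms, 48)
```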

Two-way authentication

The TLS handshake has been discussed above. From the above process, it is not difficult to see that only the client authenticates the identity of the server, while the server does not authenticate the identity of the client. We call it one-way authentication.

Usually, one-way authentication is enough to establish secure communication, after which the user's real identity can be confirmed by simple means such as a username and password.

However, to guard against stolen accounts and passwords, client certificates are sometimes issued to users (for example, via the USB security keys used by online banking) to achieve two-way authentication, which is safer.

The process of two-way authentication has not changed much. The only difference is that after Server Hello Done and before Client Key Exchange, the client needs to send a Client Certificate message. After receiving the message, the server also goes through the certificate chain to verify the identity of the client.

However, TLS 1.2 is an old protocol, dating back more than a decade (2008). Although it has stood the test of time, it can no longer keep up with today's Internet in terms of security and performance.

After four years and nearly 30 drafts, TLS1.3 was launched in 2018, once again establishing a new standard in the field of information security.

Maximize compatibility

Since versions 1.1 and 1.2 have been around for many years, many applications and middleboxes (intermediary devices such as proxies and gateways) only recognize the old record protocol format, and updating them is difficult or even impossible (protocol ossification).

In early experiments, it was found that once the version number in the record header field was changed from 0x303 (TLS1.2) to 0x304 (TLS1.3), a large number of proxy servers and gateways could not handle it correctly, eventually causing the TLS handshake to fail.

In order to ensure that these widely deployed old devices can continue to be used and avoid the impact of the new protocol, TLS1.3 has to make compromises, keep the existing record format unchanged, and achieve compatibility through disguise, making TLS1.3 look like TLS1.2.

So, how to distinguish 1.2 from 1.3?

This is where the extension mechanism comes in, which works a bit like supplementary clauses in a contract: new functionality is added through a series of extension fields appended to the end of the Hello messages. Old TLS implementations that do not recognize an extension simply ignore it, which achieves backward compatibility.

Since the Version field of the record header is pinned for compatibility, every TLS 1.3 handshake Hello message must carry the supported_versions extension, which states the real TLS version and is used to distinguish the new protocol from the old.
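The wire layout of supported_versions in a ClientHello is simple enough to parse by hand. A sketch assuming the RFC 8446 layout (the sample bytes are constructed for illustration):

```python
import struct

def parse_supported_versions(ext: bytes) -> list:
    """Parse a ClientHello supported_versions extension (type 0x002b).

    Wire layout (RFC 8446): 2-byte extension type, 2-byte extension
    length, 1-byte version-list length, then 2-byte version codes.
    """
    ext_type, ext_len = struct.unpack("!HH", ext[:4])
    assert ext_type == 0x002B, "not a supported_versions extension"
    list_len = ext[4]
    body = ext[5:5 + list_len]
    return [struct.unpack("!H", body[i:i + 2])[0]
            for i in range(0, list_len, 2)]

# A client offering TLS 1.3 (0x0304) and TLS 1.2 (0x0303):
raw = bytes.fromhex("002b00050403040303")
print([hex(v) for v in parse_supported_versions(raw)])  # ['0x304', '0x303']
```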

You can see these extensions after the Client Hello message in the earlier capture; because the server there does not support 1.3, the connection is downgraded to 1.2 for backward compatibility.

TLS1.3 uses extensions to implement many important functions, such as supported_groups, key_share, signature_algorithms, server_name, etc., which will be discussed later.

Strengthen security

TLS 1.2 accumulated a great deal of experience in more than ten years of deployment, during which many vulnerabilities and weak encryption algorithms were discovered, so TLS 1.3 removed these unsafe elements from the protocol.

For example:

  • The pseudorandom function is upgraded from PRF to HKDF (HMAC-based Extract-and-Expand Key Derivation Function).
  • Compression in the record protocol is explicitly prohibited.
  • The RC4 and DES symmetric encryption algorithms are abolished.
  • Traditional block cipher modes such as ECB and CBC are abolished.
  • The MD5, SHA-1, and SHA-224 digest algorithms are abolished.
  • The RSA and DH key exchange algorithms and many named curves are abolished.

After this slimming down, TLS 1.3 retains only the AES and ChaCha20 symmetric encryption algorithms, and only the AEAD modes GCM, CCM, and Poly1305 can be used.

Only SHA-256 and SHA-384 remain as digest algorithms, the key exchange algorithms are limited to ECDHE and DHE, and the elliptic curves are cut down to just 5, including P-256 and x25519.

The streamlining also brought a welcome benefit: the old profusion of algorithm and parameter combinations made cipher suites complex and hard to choose, but TLS 1.3 has only 5 suites, so neither clients nor servers suffer from choice paralysis any longer.
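One of the upgrades listed above, HKDF (RFC 5869), is easy to sketch with the standard library. This is a minimal illustration assuming SHA-256; the input values are made up:

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input keying material into a
    fixed-size pseudorandom key (PRK)."""
    return hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the PRK into `length` output bytes,
    bound to the context string `info`."""
    out, t, counter = b"", b"", 1
    while len(out) < length:
        t = hmac.new(prk, t + info + bytes([counter]), hashlib.sha256).digest()
        out += t
        counter += 1
    return out[:length]

# Hypothetical inputs, just to show the extract-then-expand flow:
prk = hkdf_extract(b"some salt", b"input keying material")
key = hkdf_expand(prk, b"handshake key", 32)
```

The extract step distills whatever entropy the key exchange produced; the expand step then derives as many independent keys as needed, each labelled by its own `info` string.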

Here it is worth explaining why the RSA and DH key exchange algorithms were abolished. In RSA key exchange, the shared key is generated by the client, which encrypts it with the server's public key (extracted from the certificate) and sends it to the server.

The DH algorithm was invented by Diffie and Hellman in 1976, the so-called Diffie-Hellman key exchange.

In Diffie-Hellman, the client and the server each generate a DH key pair and send the public half to the other side.

When each party receives the other's public value, it combines it with its own private key and computes the same value: the pre-master secret. The server additionally uses a digital signature to ensure the exchange has not been tampered with.

If the client and server generate fresh DH key pairs for every key exchange, the exchange is called Ephemeral Diffie-Hellman (DHE).

DH is a powerful tool, but not all DH parameters are safe to use. The security of DH rests on the mathematical difficulty of the discrete logarithm problem.

If the discrete logarithm problem for a given set of parameters can be solved, the private key can be recovered and the protocol's security broken.

Generally speaking, the larger the numbers used, the harder the discrete logarithm problem is to solve; choosing small DH parameters therefore leaves the exchange open to attack.
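The intuition above can be seen in a toy sketch with deliberately tiny, insecure parameters (real deployments use 2048-bit-plus groups or elliptic curves):

```python
# Toy Diffie-Hellman: all parameters are INSECURE, chosen only so the
# discrete logarithm can be brute-forced by hand.
p, g = 23, 5                     # public modulus and generator

a, b = 6, 15                     # each side's private key
A = pow(g, a, p)                 # Alice's public value
B = pow(g, b, p)                 # Bob's public value

# Each side combines the other's public value with its own private key
# and arrives at the same shared secret:
shared_alice = pow(B, a, p)
shared_bob = pow(A, b, p)
assert shared_alice == shared_bob

# With p this small, an eavesdropper who saw only g, p, and A can
# brute-force the discrete log and recover the private key:
recovered_a = next(x for x in range(1, p) if pow(g, x, p) == A)
print(shared_alice, recovered_a)  # 2 6
```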

Both modes give the client and server a shared secret, but the RSA mode has a serious drawback: it provides no Forward Secrecy.

Suppose there is such a patient hacker who has been collecting all messages sent and received by hybrid encryption systems for a long time.

If the system uses the RSA key in the server certificate for key exchange, then once that private key leaks or is cracked (through social engineering or brute force, for example), the hacker can use it to decrypt the Pre-Master in every recorded session, compute the session keys, and read all of the ciphertext.

The ECDHE algorithm generates a fresh, temporary key pair on every handshake, so every session's keys are different: one key per session. Even if a hacker goes to great lengths to crack one session key, only that one session is compromised; all previous traffic remains safe.

Therefore, mainstream servers and browsers no longer use RSA during the handshake stage, but instead use ECDHE. TLS1.3 clearly abolishes RSA and DH in the protocol to ensure forward security at the standard level.
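A client can enforce this policy at the API level. A minimal Python `ssl` sketch that refuses anything below TLS 1.3, so only forward-secret (EC)DHE key exchange and AEAD suites remain possible (requires a Python linked against OpenSSL 1.1.1+):

```python
import ssl

# Build a client context that will only negotiate TLS 1.3.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

# create_default_context() already enables certificate verification
# and hostname checking:
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

Such a context would then be used with `ctx.wrap_socket(sock, server_hostname=...)` to open a connection; any server that only speaks TLS 1.2 or below is rejected during the handshake.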

RSA key exchange has been considered problematic for a while, and not only because it lacks forward secrecy: it is also genuinely hard to implement correctly.

In 1998, Daniel Bleichenbacher discovered a vulnerability in SSL's use of RSA encryption and created what is called the "million-message attack."

It allows an attacker to decrypt a message by sending millions of crafted messages to the server and deducing the key material from the different error codes in the server's responses.

Over the years this attack has been refined; in some cases only thousands of messages are needed, making it feasible even on a laptop.

Many large websites (including facebook.com) were later found to be affected by a Bleichenbacher variant, the ROBOT attack of 2017.

To reduce the risks posed by non-forward encrypted connections and Bleichenbacher vulnerabilities, RSA encryption has been removed from TLS 1.3, with Diffie-Hellman Ephemeral as the only key exchange mechanism.

Improve performance

Establishing an HTTPS connection requires a TLS handshake on top of the TCP handshake. In TLS 1.2 this costs two extra message round trips (2-RTT), which can add tens or even hundreds of milliseconds of delay; on mobile networks the delay is even worse.

Now that the cipher suite is greatly simplified, there is no need to go through the complex negotiation process as before.

TLS 1.3 compresses the old Hello negotiation, removes the Key Exchange messages, and cuts the handshake down to 1-RTT, doubling its efficiency.

How is this done? Once again, through extensions.

In the Client Hello message, the client directly uses supported_groups to bring the supported curves, such as P-256 and x25519, uses key_share to bring the client public key parameters corresponding to the curve, and uses signature_algorithms to bring the signature algorithm.

After receiving these, the server picks a curve and parameters from the extensions and returns its own public key parameters in its key_share extension, so a single round trip completes the key exchange. The subsequent process is basically the same as in 1.2.

In addition to the standard 1-RTT handshake, TLS 1.3 also introduces a 0-RTT handshake: using the pre_shared_key and early_data extensions, a secure connection can be established and encrypted messages sent immediately after the TCP connection. This has some prerequisites, which are not covered here for reasons of space.

HTTPS usage cost

By now, most large and medium-sized enterprises in China have adopted HTTPS.

Generally speaking, before using HTTPS, you may pay close attention to the following issues:

  • Certificate cost and maintenance: the cost is mainly certificate issuance, and free certificate authorities now exist; the well-known Let's Encrypt project (backed by Mozilla among others) supports free certificate installation and automatic renewal.
  • Access speed: HTTPS does slow things down to some degree, but with reasonable optimization and deployment its impact on speed is entirely acceptable.

In many scenarios HTTPS is no slower than HTTP, and with SPDY, HTTPS can even be faster than HTTP.

Author: rickiyang

Editor: Tao Jialong

Source: Reprinted from WeChat official account rickiyang
