Twelve networking questions: do you know the answers?

Preface

Countdown to Chinese New Year~

Today's article is the last one on networking. Networking is also a frequently tested topic in interviews, so you need a solid foundation.

Here are twelve networking questions for everyone.

Can you answer these questions?

I have summarized some networking questions below. Take a look; if you can answer them all, you can skip this article.

  • What is the process of network communication, and which protocols are used along the way?
  • The TCP connection process: the three-way handshake and the four-way wave, and why?
  • Commonly used status codes
  • Talk about the differences between TCP and UDP and their use cases
  • Socket and WebSocket
  • The HTTPS connection establishment process
  • Explain why digital signatures are authentic and reliable
  • The certificate chain security mechanism
  • Establishing an HTTPS connection is time-consuming; how can we optimize it?
  • Talk about the difference between HTTP and HTTPS
  • What are the ways to transfer pictures via HTTP?
  • How to implement chunked transfer and resumable downloads?

The process of network communication and what protocols are used in the middle

I made an animation specifically for this question in a previous article; you can go back and watch it:

This is how network data is transmitted (combined with animation analysis)

Let’s briefly summarize:

Client:

  • 1. Enter the URL in the browser.
  • 2. The browser parses the URL and generates an HTTP request message.
  • 3. The browser calls the system resolver, which sends a query to the DNS server to find the IP address corresponding to the domain name.
  • 4. With the IP address in hand, the browser hands the request message, together with the IP address, to the TCP module of the operating system's protocol stack.
  • 5. The TCP module splits the data into individual packets and adds a TCP header to each one, forming TCP packets.
  • 6. The TCP header includes the sender's port number, the receiver's port number, the sequence number, and the ACK (acknowledgment) number.
  • 7. The TCP packet is then handed to the IP module.
  • 8. The IP module adds an IP header and a MAC header.
  • 9. The IP header contains IP addresses, which the IP module uses for routing; the MAC header contains MAC addresses, which the data link layer uses.
  • 10. The IP module hands the whole packet to the network hardware, that is, the data link layer, such as Ethernet or Wi-Fi.
  • 11. The network card converts the packets into electrical or optical signals and sends them out over network cables or optical fibers; forwarding devices such as routers then deliver them to the recipient.

Server side:

  • 1. The signal arrives at the server's network hardware (for example, Ethernet), where it is converted back into a digital packet and handed over to the IP module.
  • 2. The IP module strips the MAC header and the IP header and passes what follows, the TCP packet, to the TCP module.
  • 3. The TCP module parses the TCP header and replies to the client to acknowledge that the packet has been received.
  • 4. Once all packets of the message have arrived, the TCP module reassembles the message and passes it up to the application layer, that is, the HTTP layer.
  • 5. The HTTP layer (the web server) processes the request and returns a response, such as HTML data, which travels back to the client through the same layers. The browser then parses the HTML and finally draws the page.
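To make client step 6 and server step 3 concrete, here is a minimal Python sketch that packs the fixed 20-byte TCP header with the standard `struct` module and parses it back. In reality the operating system's TCP module does this, not application code. The sketch assumes no TCP options, leaves the checksum at 0, and uses an arbitrary window (65535) and client port (51000); the IP and MAC headers from steps 8 and 9 are not shown.

```python
import struct

# Fixed 20-byte TCP header (no options), network byte order:
# src port, dst port, sequence number, acknowledgment number,
# data offset (upper 4 bits), flags, window, checksum, urgent pointer
TCP_HEADER = struct.Struct("!HHIIBBHHH")

# A few bits of the flags byte
FIN, SYN, ACK = 0x01, 0x02, 0x10


def build_segment(src_port, dst_port, seq, ack, flags, payload=b""):
    """Client side: put a TCP header in front of the application data."""
    data_offset = (TCP_HEADER.size // 4) << 4   # header length in 32-bit words
    header = TCP_HEADER.pack(src_port, dst_port, seq, ack,
                             data_offset, flags, 65535, 0, 0)  # checksum left at 0
    return header + payload


def parse_segment(segment):
    """Server side: read the header fields back and strip the header off."""
    src, dst, seq, ack, offset, flags, _window, _checksum, _urgent = \
        TCP_HEADER.unpack_from(segment)
    header_len = (offset >> 4) * 4
    return {"src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
            "flags": flags, "payload": segment[header_len:]}


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
segment = build_segment(51000, 80, seq=101, ack=201, flags=ACK, payload=request)
fields = parse_segment(segment)
print(len(segment) - len(request), fields["dst_port"], fields["seq"])  # 20 80 101
```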

TCP connection process: three-way handshake and four-way wave, and why?

Connection phase (three-way handshake):

  • Create a socket. The server creates one when it starts, and the client creates one when it needs to access the server.
  • The client then initiates the connection, which is the socket's connect method.
  • At this point, the client generates a TCP packet. Its TCP header carries three important pieces of information: SYN, SEQ, and ACK.

SYN (synchronize sequence numbers) is the handshake flag used when TCP establishes a connection. If this flag is 1, the packet is a connection-setup message.

SEQ (sequence number) numbers the data being sent; it identifies the first byte of data in the packet.

ACK (acknowledgment number) is the sequence number of the next byte the sender expects to receive, which confirms that everything before it has arrived.

  • So the client generates such a packet with SYN set to 1, indicating a connection request. SEQ is set to a random number, the initial sequence number, for example 100. ACK is not set, because this is the first packet and there is nothing to acknowledge yet.
  • The server receives this packet and learns that the client wants to connect (SYN=1) and what the client's initial sequence number is (SEQ=100).
  • The server then generates its own packet for the client. Its TCP header contains three values: SYN=1, meaning "I also want to connect to you"; ACK=SEQ+1=101, confirming that the client's packet was received; and a sequence number randomly generated by the server (for example, SEQ=200).
  • Finally, when the client receives this packet, it knows that the path from the client to the server works. It sends one more packet to confirm that it has received the server's packet; the header of this packet mainly carries the ACK value (ACK=SEQ+1=201).
  • At this point, the connection is established and the three-way handshake is complete. Data is then transmitted normally, and every packet carries SEQ and ACK values in its TCP header.
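The handshake above can be traced with a small Python simulation. It only models the header values and is not a TCP implementation: 100 and 200 are the example initial sequence numbers from the text (real stacks pick them randomly), the ACK flag bit is folded into whether `ack` is set, and the third segment's SEQ of 101 reflects that a SYN consumes one sequence number.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    sender: str
    syn: int             # 1 on the two connection-setup segments
    seq: int
    ack: Optional[int]   # None: no acknowledgment number yet


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    # 1. Client -> server: SYN=1 with the client's initial sequence number
    syn = Segment("client", syn=1, seq=client_isn, ack=None)
    # 2. Server -> client: SYN=1, its own initial sequence number, ACK = client SEQ + 1
    #    (a SYN consumes one sequence number even though it carries no data)
    syn_ack = Segment("server", syn=1, seq=server_isn, ack=syn.seq + 1)
    # 3. Client -> server: ACK = server SEQ + 1; the client's SEQ has moved past its SYN
    ack = Segment("client", syn=0, seq=syn.seq + 1, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=200):
    print(segment)
```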

This raises a question: why is a three-way handshake needed?

The main reason is that both parties need to confirm that their messages actually reach the other side.

A sends a message to B, and B replies to acknowledge it. This round trip confirms that A's messages reach B. B sends a message to A, and A replies to acknowledge it. This round trip confirms that B's messages reach A.

In other words, four messages are enough to confirm both directions. B's acknowledgment of A and B's own connection request can be merged into a single message, which leaves three: the three-way handshake.

Data transmission phase:

One change in the data transmission phase is that the acknowledgment number is no longer SEQ+1, but SEQ plus the data length. For example:

  • Data packet sent from A to B (SEQ=100, length=1000 bytes)
  • B returns the data packet to A (ACK=100+1000=1100)

ACK tells the sender which byte the next packet should start from, so it equals the SEQ plus the length of the packet being acknowledged. Correspondingly, the SEQ of a packet equals the ACK value carried by the previous packet from the other side.

Of course, TCP communication is bidirectional, so every packet of actual data carries both SEQ and ACK:

  • Data packet sent from A to B (ACK=200, SEQ=100, length=1000 bytes)
  • B returns the data packet to A (ACK=100+1000=1100, SEQ=ACK of the previous data packet=200, length=500 bytes)
  • A sends a data packet to B (SEQ=1100, ACK=200+500=700)
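The arithmetic in this three-step example can be checked with a short Python sketch. It assumes no loss or reordering, so each side always acknowledges every byte the peer has sent so far; `exchange` and its arguments are illustrative names, not part of any library.

```python
def exchange(a_seq: int, b_seq: int, messages: list) -> list:
    """Simulate data packets between A and B.

    messages is a list of (sender, payload_length) pairs. Returns a log of
    (sender, SEQ, ACK, length) tuples, one per packet.
    """
    next_seq = {"A": a_seq, "B": b_seq}   # next byte each side will send
    log = []
    for sender, length in messages:
        receiver = "B" if sender == "A" else "A"
        seq = next_seq[sender]
        # ACK = next byte expected from the peer, i.e. all its bytes so far
        ack = next_seq[receiver]
        log.append((sender, seq, ack, length))
        next_seq[sender] = seq + length    # the peer will acknowledge SEQ + length
    return log


for sender, seq, ack, length in exchange(100, 200, [("A", 1000), ("B", 500), ("A", 0)]):
    print(f"{sender}: SEQ={seq} ACK={ack} length={length}")
```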

Disconnection phase (four waves):

Just as SYN is used in the connection phase, the TCP header has a flag called FIN that is used specifically to close the connection.

  • When the client is ready to close the connection, it sends a TCP packet with FIN=1 in the header, meaning it wants to disconnect.
  • The server receives it and replies with a packet carrying the ACK number. However, the server may not have finished its normal work yet, so it still needs to process data before closing.
  • The client receives the acknowledgment.
  • The server continues to process data.
  • When the server has finished processing and is ready to close, it sends a TCP packet with FIN=1 to the client.
  • The client receives it and replies with a packet whose header carries the ACK number.
  • The server receives this acknowledgment and closes the connection.
  • After waiting for a period of time (2MSL), the client closes the connection as well.

MSL (Maximum Segment Lifetime) is the longest time any packet can exist on the network; a packet that exceeds it is discarded.

This raises another question: why does closing take four waves?

A sends a disconnect message to B, and B replies to acknowledge it. This confirms that A's side of the connection is closed. B sends a disconnect message to A, and A replies to acknowledge it. This confirms that B's side is closed.

The difference from the connection phase is that B's acknowledgment and B's disconnect message generally cannot be merged. When A wants to disconnect, B may still have data to process and send, so B has to finish its normal work before sending its own disconnect message.
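The reason B's FIN can lag behind its ACK is TCP's half-close: after one side sends FIN, the other side can still send data. The Python sketch below shows this on the loopback interface with the standard `socket` module (Python 3.8+ for `create_server` and `:=`). `shutdown(socket.SHUT_WR)` makes the kernel send A's FIN; B reads until end of stream, still sends its reply, and only then closes. No packet capture is shown; the comments map each call to the wave it normally triggers.

```python
import socket


def half_close_demo() -> bytes:
    """A closes its sending side first; B still replies before closing."""
    with socket.create_server(("127.0.0.1", 0)) as listener:   # port 0: any free port
        port = listener.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port)) as a:
            b, _ = listener.accept()
            with b:
                a.sendall(b"last request")
                a.shutdown(socket.SHUT_WR)    # wave 1: A sends FIN

                received = b""
                while chunk := b.recv(1024):   # b"" means A's FIN has arrived
                    received += chunk
                # By now B's TCP stack has ACKed the FIN (wave 2), but B can still send.
                b.sendall(b"response to " + received)
            # Closing B sends wave 3 (B's FIN); A's stack answers with wave 4 (ACK).

            reply = b""
            while chunk := a.recv(1024):
                reply += chunk
            return reply


print(half_close_demo())
```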

Common status codes

  • 1XX - Informational. The server has received the request, and the requester should continue.
  • 2XX - Success. The request was successfully received, understood, and processed.
  • 3XX - Redirection. Further action is required to complete the request.
  • 4XX - Client error. The request contains a syntax error or cannot be fulfilled.
  • 5XX - Server error. An error occurred while the server was processing the request.

Common status codes:

  • 200 OK - The client's request succeeded.
  • 301 Moved Permanently - The resource (web page, etc.) has permanently moved to another URL.
  • 302 Found - Temporary redirect.
  • 400 Bad Request - The request has a syntax error and cannot be understood by the server.
  • 404 Not Found - The requested resource does not exist, for example because the URL is wrong.
  • 500 Internal Server Error - An unexpected error occurred inside the server.
  • 503 Service Unavailable - The server cannot currently handle the request but may recover after a while.
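Because the first digit carries the class, a status code can be classified with integer division. The Python sketch below does that and looks up the standard reason phrase with the standard library's `http.HTTPStatus` enum; the class labels follow the names used in the HTTP specification, and `describe` is a hypothetical helper.

```python
from http import HTTPStatus

# Class names as used in the HTTP specification
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return '<code> <reason phrase> [<class>]' for an HTTP status code."""
    category = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:            # not a code known to http.HTTPStatus
        phrase = "(unregistered)"
    return f"{code} {phrase} [{code // 100}XX {category}]"


for code in (200, 301, 302, 400, 404, 500, 503):
    print(describe(code))
```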

Talk about the differences between TCP and UDP and their use cases

Let me start with two scenarios; they should make the differences easier to understand.

1) The first scenario is browsing the web. (TCP scenario)

When we visit a web page, all of its data must be displayed correctly. If a packet is lost along the way, it must be retransmitted; displaying only part of the page is not acceptable. (Data correctness is guaranteed.)

Likewise, the content of a web page must arrive in order. In a lottery, for example, I can't hand you the prize before you draw. (Data order is guaranteed.)

Because the data requirements are this strict, the two parties need a reliable connection: they go through the three-way handshake before transmitting data, and every packet must be acknowledged. (Connection-oriented)

Data on such a connection is transmitted as a byte stream, like a pipe between the two ends. You can write data into the pipe in pieces of any size and read it out in pieces of any size; TCP only guarantees that the bytes come out in the order they went in.

Therefore, TCP is needed in scenarios that require accurate data, correct order, and stability and reliability.

2) The second scenario is playing games. (UDP scenario)

The most important thing in a game is real-time response. If I cast a skill and you aren't hit until much later, the game is unplayable.

  • Therefore, UDP favors timeliness and does not guarantee that every packet arrives correctly. If a packet is lost, UDP does not detect or retransmit it, because what matters is showing the current data at the current moment. (Data correctness and order are not guaranteed, and packets may be lost.)
  • Similarly, for the sake of timeliness, UDP does not establish a connection. There is no three-way handshake and no acknowledgment of each packet: whether or not you received the last one, I just send each packet to you as quickly as possible. (Connectionless)
  • Because it is connectionless, UDP does not use a byte stream. Each send is one self-contained datagram, and the receiver gets exactly one whole datagram per receive, never a mix of data from different datagrams or senders. (Datagram-based)

If you are still a little confused, this answer explains it with a vivid Adam-and-Eve metaphor: https://www.zhihu.com/question/51388497?sort=created
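The datagram behavior described above can be observed on the loopback interface with Python's standard `socket` module. In this sketch the sender fires two datagrams with no handshake, and each `recvfrom()` call returns one complete datagram even though the 4096-byte buffer could hold both. The two-second timeout is an arbitrary choice, since UDP itself promises no delivery.

```python
import socket


def udp_boundaries_demo() -> list:
    """Send two datagrams over loopback; each recvfrom() returns exactly one."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with receiver, sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # UDP promises no delivery, so don't wait forever
        address = receiver.getsockname()
        # No handshake and no ACKs: each sendto() just fires one datagram.
        sender.sendto(b"player moved left", address)
        sender.sendto(b"player fired", address)
        # The buffer could hold both, but each call returns a single datagram.
        return [receiver.recvfrom(4096)[0] for _ in range(2)]


print(udp_boundaries_demo())
```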

Socket and WebSocket

Although the two names are similar, they are actually not at the same level.

  • A socket is a programming interface. As mentioned above, establishing a TCP connection means calling the socket API to set up the connection channel, so a socket is just an interface (in many languages, a class), not a protocol.
  • WebSocket is at the same level as HTTP: it is an application layer protocol. It was introduced alongside HTML5 to support long-lived communication, and it is a full-duplex protocol built on TCP. Underneath, it also needs a TCP connection, and therefore a socket.

A bit of background: after the TCP connection is established, WebSocket performs a handshake over HTTP. The client sends an HTTP GET request whose headers (such as Upgrade: websocket and Connection: Upgrade) tell the server that it wants to establish a WebSocket connection. The server responds with 101 Switching Protocols, meaning "understood", and from then on the connection uses the WebSocket protocol as a long-lived connection.

If the two are related at all, it is that the WebSocket protocol runs over TCP connections, and TCP connections are created through the socket API.
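The header exchange in that handshake can be sketched in Python. The header names, the fixed GUID, and the SHA-1/Base64 derivation of `Sec-WebSocket-Accept` come from RFC 6455 rather than from this article, and the sample key is the example value used in that RFC; `websocket_accept` is a hypothetical helper name. The accept value proves to the client that the server really understood the WebSocket upgrade request.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value the server returns."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Client: "I want to establish a WebSocket connection" (key from the RFC example)
request = (
    "GET /chat HTTP/1.1\r\n"
    "Host: server.example.com\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Sec-WebSocket-Version: 13\r\n\r\n"
)

# Server: "understood", switch this TCP connection to WebSocket
response = (
    "HTTP/1.1 101 Switching Protocols\r\n"
    "Upgrade: websocket\r\n"
    "Connection: Upgrade\r\n"
    f"Sec-WebSocket-Accept: {websocket_accept('dGhlIHNhbXBsZSBub25jZQ==')}\r\n\r\n"
)
print(response)
```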

HTTPS connection establishment process

After talking about HTTP and TCP/IP, let’s talk about HTTPS.

The previous article talked about how HTTPS ensures secure data transmission, link: https://mp.weixin.qq.com/s/dbmwBVxHkvQ0fzWaSdtPYg

The key ingredient is the digital certificate.

Now let's look at the complete HTTPS connection setup (also called the TLS handshake):

  • 1. The client sends a Client Hello message.

The message contains a random number (randomC), the list of cipher suites the client supports (each suite combines a key exchange algorithm, i.e. an asymmetric encryption algorithm, a symmetric encryption algorithm, and a hash algorithm), and a Session ID (used for session resumption).

To establish communication, the client sends this first message, the Client Hello, right after the TCP handshake. The cipher suite list tells the server which algorithms the client supports; the server compares it with the algorithms it supports and picks the best combination that both parties support.

  • 2. The server replies with three data packets: Server Hello, Certificate, and Server Hello Done.

The Server Hello message contains a random number (randomS), the cipher suite selected after the comparison, and the Session ID (used for resuming the session).

At this point, both parties have two random numbers; we will see what they are used for shortly. As mentioned earlier, the server selects the cipher suite, one combination of the three kinds of algorithms, and sends it back to the client.

The Certificate message carries the server's digital certificate. I will not go into details here.

The Server Hello Done message marks the end of this step, indicating that the server has sent everything it needs to send for now.

  • 3. Symmetric key generation process

1) First, the client verifies the certificate it received: its digital signature, certificate chain, validity period, status, and so on.
2) Once the certificate is verified, the client generates another random number, the pre-master secret, encrypts it with the server's public key from the certificate, and sends it. The server decrypts it with its own private key.
3) At this point, the client and the server both hold three random numbers: randomC, randomS, and the pre-master secret.
4) The client and the server then use these three random numbers to generate the symmetric key with the same fixed algorithm.
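The "fixed algorithm" in step 4 depends on the TLS version. As a sketch, assuming TLS 1.2 with RSA key exchange and a SHA-256 cipher suite (the case that matches the steps above), the PRF defined in RFC 5246 turns the three random numbers into a 48-byte master secret and then into session keys. Only `hmac`, `hashlib`, and `os` from the standard library are used. The random values are generated locally instead of being exchanged, and the key-block split assumes an AES-128-GCM suite (no MAC keys, 16-byte keys) and skips the IVs. TLS 1.3 derives keys differently, with HKDF.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_SHA256 expansion from RFC 5246, section 5."""
    result = b""
    a = seed                                               # A(0) = seed
    while len(result) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()   # A(i) = HMAC(secret, A(i-1))
        result += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return result[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    """TLS 1.2 PRF(secret, label, seed) = P_SHA256(secret, label + seed)."""
    return p_sha256(secret, label + seed, length)


# The three random numbers (generated locally here instead of being exchanged)
random_c = os.urandom(32)     # from Client Hello
random_s = os.urandom(32)     # from Server Hello
pre_master = os.urandom(48)   # stand-in; with RSA key exchange its first 2 bytes are the client version

# Both sides run the same function on the same inputs, so they get the same secret
master_secret = prf(pre_master, b"master secret", random_c + random_s, 48)

# Session keys come from the master secret (note the server random goes first here)
key_block = prf(master_secret, b"key expansion", random_s + random_c, 32)
client_write_key, server_write_key = key_block[:16], key_block[16:32]
print(len(master_secret), len(client_write_key), len(server_write_key))
```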

  • 4. Generate Session ID

This step corresponds to the Session ID in the first two hello messages.

A Session ID is generated for the session. If the connection is later dropped, the session can be resumed with this Session ID, without sending the certificate or redoing the key exchange.

  • 5. Transmitting data with symmetric keys

After obtaining the symmetric key, both parties can use the symmetric key to encrypt and decrypt data and communicate normally.

Extension: why use an asymmetric encryption algorithm to negotiate the symmetric key?

First, network data transfer needs to be fast. Symmetric encryption is much faster than the more time-consuming asymmetric algorithms, so, with security ensured, the data itself is encrypted symmetrically. Second, once symmetric encryption is chosen for the data, delivering the symmetric key securely becomes the problem. An asymmetric algorithm solves it, because the key material can travel safely without any secret shared in advance, and the certificate chain mechanism is added to ensure that the key-related data really goes to the right server.

Explain why digital signatures are authentic and reliable

Digital signatures (also called electronic signatures) came up above; here is a brief review.

A digital signature is an application of asymmetric encryption.

It works like this:

A encrypts the hash of the data with its private key. The resulting ciphertext is called the signature, and A sends it to B together with the data itself.

When B receives them, B decrypts the signature with A's public key and compares the result with the hash of the received data. If they match, the signature was indeed made by A: only A could have made it, because only A holds the private key.

In HTTPS, it works like this:

The data to be protected is the server's public key. Its hash is signed with another private key (the CA's, as the next section explains), and the signature is sent together with the public key. The client decrypts the signature with the matching public key. If the result matches the hash of the received public key, the source is proven correct and the key has not been forged.

  • The source is reliable. Only the holder of the private key can produce the signature, so a valid signature proves where the data came from.
  • The data is reliable. The same data always produces the same hash. If the decrypted signature matches the hash of the received data, the data has not been modified.
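The sign-then-verify flow can be sketched in Python with "textbook" RSA, meaning raw modular exponentiation with no padding, on a toy key built from the Mersenne primes 2^61 - 1 and 2^89 - 1. This is for illustration only: real signatures use 2048-bit or larger keys and a padding scheme such as PKCS#1 v1.5 or PSS, through a vetted library. The byte strings standing in for public keys are placeholders.

```python
import hashlib

# Toy key pair from two Mersenne primes; far too small to be secure.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (Python 3.8+)


def digest(data: bytes) -> int:
    """SHA-256 of the data, reduced into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Signer: 'encrypt' the hash with the private key D."""
    return pow(digest(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Verifier: 'decrypt' with the public key E and compare with a fresh hash."""
    return pow(signature, E, N) == digest(data)


server_public_key = b"placeholder for the server's public key"
signature = sign(server_public_key)
print(verify(server_public_key, signature))             # untouched data
print(verify(b"an attacker's public key", signature))   # swapped data
```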

Certificate chain security mechanism

A certificate authority (CA) is an authoritative organization responsible for issuing and managing digital certificates. As a trusted third party in e-commerce and other online transactions, it vouches for the legitimacy of public keys in a public key system.

In practice, the server submits its public key and some information about itself to the CA, and the CA returns a digital certificate to the server that includes:

  • The server's public key
  • The signature algorithm
  • Server information, including the host name
  • A signature over the certificate, made with the CA's own private key

The server then passes this certificate to the client during the connection phase. How does the client verify it?

As attentive readers may know, every client, whether a computer or a phone, comes with a set of system root certificates, which include the authority that issued the server's certificate. The client uses the public key in that root certificate to decrypt the signature on the server's certificate and compares the result with the hash of the certificate data. If they match, the source is correct and the data has not been modified.

Of course, a man-in-the-middle can also obtain a certificate from a CA, but every certificate contains the host name it was issued for. The client checks that this host name (domain name or IP) matches the host it is connecting to, so a certificate issued for the attacker's own domain is rejected.

To expand:

In fact, there is another layer between the server certificate and the root certificate: the intermediate certificate. Open any web page and click the padlock icon at the left of the address bar to see the certificate details:

You can see that a complete SSL/TLS certificate generally has three layers:

  • The first layer: root certificates. These come preinstalled on the client. Root certificates are self-signed, meaning each one is signed with its own private key and verified with its own public key.
  • The second layer: intermediate certificates. Root certificates generally do not issue server certificates directly, because that is riskier: if an issuance turns out to be wrong, fixing it at the root level is very troublesome. Instead, the root certificate signs an intermediate certificate, and the intermediate certificate signs the server certificate, one layer after another.
  • The third layer: server certificate. This is the certificate related to our server.
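Chain verification can be sketched in Python with toy, unpadded ("textbook") RSA keys built from Mersenne primes. Everything here is a simplified, hypothetical model: the `Certificate` fields, the `|`-joined "to-be-signed" bytes, and the CA names are made up, and real X.509 validation also checks validity periods, revocation, key usage, and much more. The sketch checks three things from the text: the host name, the intermediate's signature on the server certificate, and the pre-installed root's signature on the intermediate.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by all toy keys


@dataclass
class KeyPair:
    n: int  # public modulus (the public key is (n, E))
    d: int  # private exponent

    @staticmethod
    def from_primes(p: int, q: int) -> "KeyPair":
        return KeyPair(p * q, pow(E, -1, (p - 1) * (q - 1)))  # Python 3.8+


@dataclass
class Certificate:
    subject: str      # host name for a server cert, CA name for a CA cert
    issuer: str       # name of the CA that signed this cert
    public_n: int     # the subject's public key
    signature: int = 0

    def tbs(self) -> bytes:
        """'To-be-signed' bytes: every field except the signature."""
        return f"{self.subject}|{self.issuer}|{self.public_n}".encode()


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def issue(subject: str, subject_n: int, issuer: str, issuer_key: KeyPair) -> Certificate:
    """The issuer signs the hash of the certificate with its private key."""
    cert = Certificate(subject, issuer, subject_n)
    cert.signature = pow(digest(cert.tbs(), issuer_key.n), issuer_key.d, issuer_key.n)
    return cert


def signed_by(cert: Certificate, issuer_n: int) -> bool:
    """Verify the signature with the issuer's public key."""
    return pow(cert.signature, E, issuer_n) == digest(cert.tbs(), issuer_n)


def verify_chain(server: Certificate, intermediate: Certificate,
                 trust_store: dict, hostname: str) -> bool:
    root_n = trust_store.get(intermediate.issuer)   # root must be pre-installed
    return (
        server.subject == hostname                  # issued for this host?
        and server.issuer == intermediate.subject
        and signed_by(server, intermediate.public_n)
        and root_n is not None
        and signed_by(intermediate, root_n)
    )


# Toy keys from Mersenne primes: far too small for real use.
root_key = KeyPair.from_primes(2**89 - 1, 2**107 - 1)
inter_key = KeyPair.from_primes(2**61 - 1, 2**127 - 1)
server_n = (2**31 - 1) * (2**61 - 1)   # only the server's public part matters here

root_cert = issue("Demo Root CA", root_key.n, "Demo Root CA", root_key)   # self-signed
inter_cert = issue("Demo Intermediate CA", inter_key.n, "Demo Root CA", root_key)
server_cert = issue("www.example.com", server_n, "Demo Intermediate CA", inter_key)

trust_store = {"Demo Root CA": root_cert.public_n}   # shipped with the OS or browser
print(verify_chain(server_cert, inter_cert, trust_store, "www.example.com"))
print(verify_chain(server_cert, inter_cert, trust_store, "evil.example.net"))
```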

Establishing an HTTPS connection is time-consuming; how can we optimize it?

  • 1. Upgrade to HTTP/2

HTTP/2 was first tested for interoperability in August 2013. In practice, browsers only use HTTP/2 for https:// URLs on the open Internet, while http:// URLs continue to use HTTP/1.x. The goal is to increase the use of encryption on the open Internet and provide strong protection against active attacks.

HTTP/2 has the following main features:

Binary framing. Data is transmitted in binary format, which is easier to parse and optimize than text.

Multiplexing. All communication with the same domain name goes over a single connection, and a single connection can carry any number of bidirectional streams. Because requests share one connection, the TLS handshake only has to be done once per domain instead of once per connection.

Header compression. HTTP/2 uses HPACK (a compression format designed specifically for HTTP/2 headers) to compress message headers, saving the network traffic they would otherwise consume.

  • 2. Use the Session ID

As mentioned before, to avoid repeating the full handshake when a client disconnects and reconnects, the server records each session under its Session ID. When the client presents that ID again, the server locates the session and reuses it, which skips sending the certificate and the full key exchange.
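Server-side session caching can be sketched as a dictionary keyed by Session ID. `ToyTlsServer` is a hypothetical class, and random bytes stand in for the certificate exchange and key derivation. In real TLS 1.2 resumption the cached master secret is combined with fresh random numbers from the new hellos to derive new session keys, and a real cache also expires entries; this sketch models neither.

```python
import os


class ToyTlsServer:
    """Hypothetical server that caches handshake results by Session ID."""

    def __init__(self):
        self.sessions = {}   # session_id -> master secret

    def handshake(self, session_id: bytes = b"") -> tuple:
        """Return (session_id, master_secret, resumed)."""
        if session_id in self.sessions:
            # Abbreviated handshake: no Certificate message, no new key exchange
            return session_id, self.sessions[session_id], True
        # Full handshake: certificate + key exchange would happen here
        new_id = os.urandom(32)
        master_secret = os.urandom(48)   # stand-in for the real key derivation
        self.sessions[new_id] = master_secret
        return new_id, master_secret, False


server = ToyTlsServer()
sid, secret, resumed = server.handshake()          # first visit
sid2, secret2, resumed2 = server.handshake(sid)    # reconnect with the saved ID
print(resumed, resumed2, secret == secret2)        # False True True
```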

  • 3. TLS False Start

This is the optimization solution proposed by Google. The specific approach is:

In the second stage of the TLS handshake, that is, when the client has verified the certificate and sends the pre-master secret, it sends application data (such as the request for the web page) right along with it.

After receiving the pre-master secret, the server generates the symmetric key, decrypts the application data with it directly, and responds to the client.

In effect, two steps are merged into one. The client does not wait for the server's confirmation before sending application data; it sends the data together with the pre-master secret in the second stage, which shortens the handshake and saves time.

  • 4. OCSP Stapling

OCSP (Online Certificate Status Protocol) is an online query service for checking whether a certificate has been revoked, that is, whether it is still legitimate.

One step of certificate verification is checking that the certificate is still legitimate. The server can query OCSP for its own certificate in advance and send the result to the client together with the certificate, so the client does not have to check the certificate's status separately. This speeds up the TLS handshake and is called OCSP Stapling.

Extensions:

If we ignore the establishment process and consider the entire HTTPS transmission process, what are the optimization points?

You can take a look at this article: https://www.cnblogs.com/evan-blog/p/9898046.html

Talk about the difference between HTTP and HTTPS

After the above long explanation, the difference between the two should be very clear:

  • HTTP is the Hypertext Transfer Protocol, and information is transmitted in plain text. HTTPS adds an SSL/TLS encryption layer beneath HTTP and requires a certificate issued by a CA.
  • HTTP has no identity authentication, so the client cannot know the true identity of the other party. HTTPS uses the CA certificate to confirm the other party's identity.
  • The default port for HTTP is 80, and for HTTPS it is 443.
  • HTTP transmits in plain text, so it is easily eavesdropped on, tampered with, or hijacked; HTTPS encrypts the traffic to prevent this.

How to implement chunked transfer and resumable downloads?

Chunked transfer

With HTTP/1.0, the server by default closes the connection after sending all the data.

To keep the connection open, the Connection header is set to keep-alive, which means the connection stays open until a message arrives whose Connection header is close. (HTTP/1.1 keeps connections alive by default.)

On a connection that stays open, the client must be able to tell where each response body ends. Besides declaring a Content-Length, the server can send the body in chunks.

Chunked transfer means the server can split the data it sends to the client into multiple parts.

How it works:

  • Set the response header Transfer-Encoding: chunked
  • Each chunk starts with its length in hexadecimal, followed by the chunk data
  • The body ends with a chunk of length 0
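The chunk format above can be sketched in Python. In HTTP/1.1 each chunk is the length in hexadecimal, CRLF, the data, and CRLF, and the body ends with a zero-length chunk followed by an empty line. `encode_chunked` and `decode_chunked` are hypothetical helpers; the decoder ignores chunk extensions and trailers and does no error handling, so treat it as an illustration rather than a parser for untrusted input.

```python
def encode_chunked(chunks) -> bytes:
    """Encode an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early
            out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (length 0), no trailers
    return bytes(out)


def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked body (ignores chunk extensions and trailers)."""
    data, pos = bytearray(), 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size = int(body[pos:line_end].split(b";")[0], 16)  # hex chunk length
        pos = line_end + 2
        if size == 0:
            return bytes(data)
        data += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


body = encode_chunked([b"Hello, ", b"world"])
print(body)                  # b'7\r\nHello, \r\n5\r\nworld\r\n0\r\n\r\n'
print(decode_chunked(body))  # b'Hello, world'
```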

Purpose:

The server can start sending before it knows the full size of the response, so the client can start processing sooner and waits less; the connection can also stay open for further requests.

However, chunked transfer exists only in HTTP/1.1. HTTP/2 splits messages into frames and multiplexes any number of bidirectional streams over a single connection, so it no longer needs (and does not allow) chunked transfer encoding.

Resumable downloads

This means the client can continue downloading or uploading a file from the point where it was last interrupted. Even if a network problem interrupts the transfer, the work already done is not lost, which ensures a good user experience.

How it works:

  • The client adds a Range field to the request header to indicate the byte at which the download starts and the byte at which it ends (Range: bytes=0-499)
  • The server adds Content-Range to the response header to indicate the range of data being sent and the total file size (Content-Range: bytes 0-499/22400)
  • The ETag field is a unique identifier for a particular version of the file.

Actual use process:

  • On the first download request, the server returns the file content and its ETag with status code 200.
  • When the client later resumes the download, it sends two headers: Range: bytes=200-499 and If-Range: <the ETag>.
  • The server then checks whether the ETag still matches. If it does, the server returns just that portion (Content-Range: bytes 200-499/22400) with status code 206 (Partial Content), meaning this is the part of the data you requested. Otherwise, it returns the entire file with status code 200.
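The server-side decision can be sketched in Python. `serve` is a hypothetical handler that supports only a single `bytes=start-end` or `bytes=start-` range, compares ETags as exact strings, and answers a start position past the end of the file with 416 (Range Not Satisfiable), as the HTTP specification recommends. The 22400-byte file and the `"v1"` ETag are made-up test values.

```python
import re

# Only a single "bytes=start-end" or "bytes=start-" range is supported here
RANGE_RE = re.compile(r"bytes=(\d+)-(\d*)")


def serve(data: bytes, etag: str, request_headers: dict) -> tuple:
    """Return (status, response_headers, body) for a GET that may carry Range/If-Range."""
    total = len(data)
    headers = {"ETag": etag, "Accept-Ranges": "bytes"}
    range_value = request_headers.get("Range")
    if_range = request_headers.get("If-Range")

    # If-Range: honour the Range only if the client's copy is still current
    if range_value and (if_range is None or if_range == etag):
        match = RANGE_RE.fullmatch(range_value)
        if match:
            start = int(match.group(1))
            if start >= total:
                headers["Content-Range"] = f"bytes */{total}"
                return 416, headers, b""
            end = int(match.group(2)) if match.group(2) else total - 1
            end = min(end, total - 1)
            if start <= end:
                headers["Content-Range"] = f"bytes {start}-{end}/{total}"
                return 206, headers, data[start:end + 1]

    # No Range, ETag changed, or an unsupported range: send the whole file
    return 200, headers, data


file_data = bytes(22400)   # placeholder file content
etag = '"v1"'              # hypothetical ETag value
status, headers, body = serve(file_data, etag,
                              {"Range": "bytes=200-499", "If-Range": etag})
print(status, headers["Content-Range"], len(body))   # 206 bytes 200-499/22400 300
```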

What are the ways to transfer pictures via Http?

This question is really about understanding Content-Type. There are three methods:

  • multipart/form-data

A form-style file upload. Set Content-Type to multipart/form-data to send files in binary form. It supports uploading multiple files along with text parameters.

This is the most common practice.
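A multipart/form-data body is a series of parts separated by a boundary string that is declared in the Content-Type header. The Python sketch below builds one by hand to show the layout. `build_multipart` is a hypothetical helper, the random UUID boundary is one common way to pick a string that will not occur in the data, and the image bytes are placeholders; in real code an HTTP client library would usually build this for you.

```python
import uuid


def build_multipart(fields: dict, files: list) -> tuple:
    """Return (Content-Type header value, body) for a multipart/form-data upload.

    fields: {name: text}; files: [(field_name, filename, content_type, data)]
    """
    boundary = uuid.uuid4().hex          # must never appear inside the parts
    parts = []
    for name, value in fields.items():
        head = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(head.encode() + value.encode() + b"\r\n")
    for name, filename, content_type, data in files:
        head = (f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
                f"Content-Type: {content_type}\r\n\r\n")
        parts.append(head.encode() + data + b"\r\n")   # raw binary, no encoding
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()   # closing boundary
    return f"multipart/form-data; boundary={boundary}", body


content_type, body = build_multipart(
    {"album": "holiday"},
    [("photos", "a.png", "image/png", b"<png bytes>"),     # placeholder data
     ("photos", "b.jpg", "image/jpeg", b"<jpeg bytes>")],
)
print(content_type)
print(body.decode(errors="replace"))
```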

  • image/png, image/jpeg

In this method, the image is sent directly as a binary stream in the request body, and the server reads the data from the stream and saves it as an image.

The drawback is that only one image can be uploaded per request.

  • application/x-www-form-urlencoded, text/plain

Another way is to convert the image into a Base64 string and transmit it like an ordinary text parameter, with a Content-Type such as application/x-www-form-urlencoded or text/plain.
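The Base64 approach can be sketched with Python's standard `base64` and `urllib.parse` modules. The 768 placeholder bytes stand in for a real image, and the field names are made up. Base64 turns every 3 bytes into 4 characters, so the payload grows by about a third, and URL-encoding the `+`, `/`, and `=` characters adds a little more on top.

```python
import base64
from urllib.parse import parse_qs, urlencode

image_bytes = bytes(range(256)) * 3        # 768 placeholder bytes, not a real image

# Client: Base64-encode the image and send it as an ordinary form field
encoded = base64.b64encode(image_bytes).decode("ascii")
form_body = urlencode({"filename": "avatar.png", "image": encoded})

# Server: parse the form and decode the field back into bytes
decoded = base64.b64decode(parse_qs(form_body)["image"][0])
print(len(image_bytes), len(encoded), len(form_body))
print(decoded == image_bytes)
```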

This article is reprinted from the WeChat public account "Ma Shang Ji Mu". To reprint this article, please contact the WeChat public account "Ma Shang Ji Mu".
