Usually, when we open a web page, say the shopping site Taobao, we click a product in the list and jump to a page showing the product details.

From the perspective of the HTTP protocol, clicking a button on a web page makes the front end send an HTTP request, and the website returns an HTTP response. This pattern, where the client actively requests and the server responds, covers the needs of most web pages.

But have you noticed that in this pattern the server never actively sends a message to the client? Just like the girl you like will never take the initiative to look for you.

But suppose that, while you are browsing, a small advertisement suddenly pops up in the lower right corner, reminding you that [you can only play secretly when you are alone at home]. The thirst for knowledge, the love of learning, the diligence engraved in your DNA all spring into action. You click to find out. Gu, who looks ordinary, reminds you that "the Taoist priest has 9 dogs and they are walking sideways all over the server." The movie star Mr. Hui tells you, "If you are a brother, come and chop me up."

Since you are here anyway, you choose a character and enter the game. A small monster walks toward you from a distance and then starts hitting you madly with a wooden stick. You did not click the mouse at all during the whole process; the server sent you the monster's movement data and attack data on its own. This... is so heartwarming.

While I was still feeling moved, a question came up: in a scenario like this, where the server appears to actively send messages to the client, how is it done?

Before answering, let's cover some background knowledge.

Continuous polling using HTTP

In fact, the crux of the problem is how to let the web page receive messages and update itself without the user taking any action.
The most common solution is for the page's front-end code to send HTTP requests to the server at regular intervals, and for the server to respond to each one. This is really a kind of pseudo server push: the server is not actively sending messages to the client; the client keeps quietly asking the server, and the user simply does not notice.

Many scenarios use this method, the most common being login by scanning a QR code. On the WeChat official account platform, for example, once the QR code appears on the login page, the front end has no way of knowing whether the user has scanned it, so it keeps asking the back-end server whether anyone has. It sends these requests roughly every 1 to 2 seconds, so that the user gets feedback within 1 to 2 seconds of scanning the code and does not have to wait too long.

[Figure: Using HTTP timed polling]

But this approach has two obvious problems.
First, the page keeps sending HTTP requests whether or not anything has changed, which consumes bandwidth and adds load on the server. Second, the user can notice a delay: after the QR code appears, you scan it with your phone and tap confirm, and in the worst case there is still a pause of 1 to 2 seconds before the page jumps.

[Figure: Keep polling to see if there is a scan code]

So is there a better solution? Yes, and at very low cost.

Long polling

After an HTTP request is sent, the server is generally given a certain amount of time to respond, such as 3 seconds; if no response arrives within that time, the request is considered timed out. If we set the HTTP request timeout to a large value, such as 30 seconds, the server can hold the request open and respond to the client the moment it learns, within those 30 seconds, that the code has been scanned. If the request times out, the client immediately issues the next one. This reduces the number of HTTP requests, and because users usually scan the code within the 30-second window, the response is also timely.

[Figure: Long polling]

Baidu Cloud Disk, for example, does this. Once you scan the code and tap confirm on your phone, the web page on your computer jumps almost instantly, which is a great experience.

[Figure: Long polling as an alternative]

It really kills two birds with one stone.

This mechanism of issuing a request and waiting a long time for the server's response is called long polling. RocketMQ, a commonly used message queue, also uses it when consumers fetch data.

[Figure: RocketMQ consumers get data through long polling]

Technology like this, in which data reaches the browser from the server without the user being aware of it, is called server push. It also has a seemingly unrelated English name, Comet, which is worth having heard of at least once.

In essence, both solutions above still rely on the client actively fetching data. That is fine for simple scenarios such as logging in by scanning a code.
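The long polling loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FakeServer`, `wait_for_scan`, and `long_poll` are hypothetical names, and an in-process queue stands in for the real HTTP request that a page would hold open with a long timeout.

```python
import queue
import threading


class FakeServer:
    """Hypothetical in-process stand-in for the login server."""

    def __init__(self):
        self._events = queue.Queue()

    def scan(self, user):
        # Called when the phone confirms the QR code.
        self._events.put(user)

    def wait_for_scan(self, timeout):
        # Hold the "request" open until a scan arrives or the timeout expires.
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return None  # plays the role of a timed-out HTTP request


def long_poll(server, timeout=0.1, max_rounds=20):
    """Re-issue the held request until the server has something to report."""
    for round_no in range(1, max_rounds + 1):
        result = server.wait_for_scan(timeout)
        if result is not None:
            return result, round_no
    return None, max_rounds


server = FakeServer()
# Simulate the user scanning the code 0.3 s after the page loads.
threading.Timer(0.3, server.scan, args=("alice",)).start()
user, rounds = long_poll(server)
print(user, "logged in after", rounds, "held request(s)")
```

The client is told about the scan as soon as it happens rather than at the next polling tick, and it issues only a handful of requests while waiting.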
But a web game usually has a large amount of data that the server needs to push to the client on its own initiative. This is where WebSocket comes in.

What is WebSocket

We know that both ends of a TCP connection can actively send data to each other at the same time. This is called full-duplex.

The most widely used HTTP/1.1 is also built on TCP, yet at any given moment only one of the client and the server can be sending data. This is the so-called half-duplex. In other words, HTTP uses the full-duplex TCP connection in a half-duplex way.

Why? Because HTTP was originally designed for viewing web page text, where it is enough for the client to send a request and the server to respond. It did not anticipate scenarios such as web games, in which the client and server both need to actively send each other large amounts of data.

To better support such scenarios, another protocol on top of TCP was needed, and so a new application-layer protocol, WebSocket, was designed.

Don't be misled by the name. Although it contains "socket", WebSocket has about as much to do with sockets as Leifeng has to do with Leifeng Pagoda, which is to say nothing.

[Figure: The position of WebSocket in the four-layer network protocol]

How to establish a WebSocket connection

We usually browse the web in a browser. Sometimes we look at pictures and text, which uses HTTP; sometimes we open a web game, which has to switch to the newly introduced WebSocket protocol.

To accommodate both cases, the browser always communicates once over HTTP after the TCP three-way handshake has established the connection.
If a WebSocket connection is wanted, that HTTP request carries some special headers:

    Connection: Upgrade
    Upgrade: websocket
    Sec-WebSocket-Key: (a randomly generated Base64 string)

These headers mean that the browser wants to upgrade the protocol (Connection: Upgrade) and that the target protocol is WebSocket (Upgrade: websocket). Along with them it sends a randomly generated Base64-encoded string (Sec-WebSocket-Key).

If the server supports upgrading to WebSocket, it goes through the WebSocket handshake: it uses a public algorithm to turn the client's Base64 string into another string, puts the result in the Sec-WebSocket-Accept header of the HTTP response, and sends it back to the browser with status code 101.

    HTTP/1.1 101 Switching Protocols

We have all seen plenty of responses with HTTP status code 200 (a normal response). 101 is much less common; it means the protocol is being switched.

[Figure: Convert base64 to a new string]

The browser then applies the same public algorithm to the Base64 string it sent. If the result matches the string the server sent back, verification succeeds.

[Figure: Compare the strings generated by the client and the server]

After these two HTTP handshake messages, one request and one response, the WebSocket connection is established, and the two sides can then communicate using the WebSocket data format.

[Figure: Establish a WebSocket connection]

WebSocket packet capture

We can capture packets with Wireshark to see what actually goes over the wire.

[Figure: The client requests to upgrade to WebSocket]

In the figure above, note message 2445, marked with a red frame. It is the first handshake message of WebSocket: an HTTP request carrying the special headers.

[Figure: The server agrees to upgrade to the WebSocket protocol]

Message 4714, marked with a red frame in the figure above, is the second handshake message: the server's response to the first. It is also an HTTP message, and its status code is 101.
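The Sec-WebSocket-Accept value in a 101 response is not arbitrary. The "public algorithm" is specified in RFC 6455: concatenate the client's Sec-WebSocket-Key with a fixed GUID, take the SHA-1 digest, and Base64-encode the result. A minimal Python sketch follows; the function names `compute_accept` and `verify_accept` are illustrative, and the sample key is the one used in the RFC's own example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from a Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_accept(sent_key: str, received_accept: str) -> bool:
    """Client-side check: recompute the value and compare with the server's."""
    return compute_accept(sent_key) == received_accept


sample_key = "dGhlIHNhbXBsZSBub25jZQ=="  # example key from RFC 6455
print(compute_accept(sample_key))  # -> s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Because both sides run the same computation, the browser can tell that the server actually understood the WebSocket upgrade request rather than echoing back a cached or unrelated HTTP response.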
The response headers also carry WebSocket-related information, such as Sec-WebSocket-Accept.

[Figure: After the two HTTP handshake messages, WebSocket communication is officially used]

The figure above shows the whole exchange. From the annotations on the screenshot we can see that WebSocket, like HTTP, is a protocol built on TCP: after the TCP three-way handshake, HTTP is used to upgrade the connection to the WebSocket protocol.

You may have seen the claim online that "WebSocket is a new protocol based on HTTP". That is not really true: WebSocket only uses HTTP while establishing the connection, and once the upgrade is complete it has nothing to do with HTTP.

It's like the girl you like asking you for your college roommate's WeChat, after which the two of them start chatting. Can you say she is communicating with your roommate through you? No. You are just a tool, and so is HTTP.

It is a bit like "borrowing a shell to lay an egg".

[Figure: The relationship between HTTP and WebSocket]

WebSocket message format

As mentioned above, once the protocol upgrade is complete, both ends communicate using the WebSocket data format.

In WebSocket, data packets are called frames. Let's look at what the format looks like.

[Figure: WebSocket message format]

There are many fields, but we only need to focus on the following ones.

opcode field: indicates what type of frame this is. 1 means a text (string) frame. 2 means a binary ([]byte) frame. 8 means a signal to close the connection.

payload length field: stores the length, in bytes, of the data we actually want to transmit. For example, if the data to send is the string "111", its length is 3.

Notice that there are several fields for storing the payload length: we can use just the first 7 bits, or 7+16 bits, or 7+64 bits.

So here is the question.
At the data level everything is just a binary stream of 0s and 1s. How do I know when to read 7 bits and when to read 7+16 bits?

WebSocket uses the first 7 bits as a flag. However large the data that follows, it reads the first 7 bits first and decides from their value whether to go on and read 16 more bits or 64 more bits.

If the first 7 bits hold a value from 0 to 125, that value is the payload length itself, and nothing more needs to be read.
If the value is 126, the payload length is between 126 and 65535 and is stored in the following 16 bits.
If the value is 127, the payload length is greater than or equal to 65536 and is stored in the following 64 bits.

payload data field: this is where the actual data to be transmitted is stored. Once the payload length is known from the fields above, the corresponding bytes can be cut out of the stream.

Have you noticed a small detail? The WebSocket data format is also of the form header (containing the payload length) + payload data.

As mentioned in the previous article "Since there is HTTP protocol, why do we need RPC", TCP itself is full-duplex, but sending data over bare TCP runs into the "sticky packet" problem. To solve it, upper-layer protocols generally wrap the data to be sent in a message header + message body format, where the header usually contains the body's length, and that length is used to cut out the actual body.

[Figure: Message boundary length flag]

HTTP, most RPC protocols, and the WebSocket protocol introduced today are all designed this way.

Use scenarios of WebSocket

WebSocket fully inherits TCP's full-duplex capability and also provides a solution to the sticky packet problem.

It suits most scenarios that require frequent interaction between the server and the client (browser), such as web or mini-program games, web chat rooms, and web-based collaborative office software like Feishu.

Back to the question at the beginning of the article: in a web game that uses WebSocket, monster movement and attacks on the player are generated by server-side logic. Data such as the damage dealt to the player has to be actively sent by the server to the client, which displays the corresponding effect after receiving it.

[Figure: Use scenarios of WebSocket]
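To make the frame layout from the message format section concrete, here is a minimal Python sketch of the payload-length rule. The function name `read_frame_header` is illustrative, and the sketch deliberately ignores everything except the opcode and the length fields. One detail not covered above: the 7-bit length shares its byte with a mask flag in the top bit (per RFC 6455), so the code masks that bit off before reading the length.

```python
import struct


def read_frame_header(frame: bytes):
    """Return (opcode, payload_length, offset) for the start of a frame.

    `offset` is the index of the first byte after the length fields.
    """
    opcode = frame[0] & 0x0F   # low 4 bits of the first byte
    length = frame[1] & 0x7F   # low 7 bits of the second byte
    offset = 2
    if length == 126:
        # The real length is in the next 16 bits (big-endian).
        (length,) = struct.unpack_from("!H", frame, offset)
        offset += 2
    elif length == 127:
        # The real length is in the next 64 bits (big-endian).
        (length,) = struct.unpack_from("!Q", frame, offset)
        offset += 8
    return opcode, length, offset


# An unmasked text frame (opcode 1) carrying the string "111".
frame = b"\x81\x03" + b"111"
opcode, length, offset = read_frame_header(frame)
payload = frame[offset:offset + length]
print(opcode, length, payload)  # -> 1 3 b'111'
```

Reading the length first and then cutting out exactly that many bytes is the same header + body idea that solves the sticky packet problem.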