A Brief Discussion of the WebSocket Protocol (RFC 6455)

Labs Guide

Before WebSocket appeared, Web applications that needed two-way data exchange between client and server (instant messaging, multi-person collaboration) typically relied on short polling, long polling, or SSE (Server-Sent Events). These approaches waste network bandwidth and add latency: polling repeats full HTTP requests whether or not there is new data, and SSE only carries data from the server to the client. The WebSocket protocol was created to address these problems by providing simple two-way data transmission.

Part 01: Introduction

WebSocket is a network communication protocol that provides full-duplex communication over a single TCP connection. It emerged in 2009, was standardized by the IETF (Internet Engineering Task Force) in 2011 as RFC 6455, a Standards Track document, and was supplemented by RFC 7936 in 2016. The WebSocket API used by browsers was standardized separately by the W3C.

The WebSocket protocol was designed to replace existing bidirectional communication techniques that use HTTP as a transport layer, because, as RFC 6202 notes, HTTP was not originally designed for bidirectional communication. WebSocket does not abandon HTTP entirely: it achieves two-way communication within the existing infrastructure by building on basic HTTP services (the connection starts as an HTTP Upgrade request and uses HTTP's ports 80 and 443). As stated in RFC 6455, WebSocket is designed around minimal framing. The only framing it adds makes the protocol frame-based rather than stream-based and distinguishes Unicode text frames from binary frames.

Part 02: Handshake

The WebSocket protocol consists of three parts: the opening (connection) handshake, data transfer, and the closing (disconnection) handshake. The overall process is shown in the figure below.

2.1 Connection handshake - client

To stay compatible with HTTP server software and intermediaries, the client's opening handshake (including one made through a proxy or a TLS-encrypted tunnel) is a valid HTTP Upgrade request as defined in RFC 2616. The request header fields of the client handshake are shown in the figure below. Once the client has sent the opening handshake, it must wait for the server's response before sending any further data.

- Request URI

Format: ws-URI = "ws:" "//" host [ ":" port ] path [ "?" query ] or wss-URI = "wss:" "//" host [ ":" port ] path [ "?" query ]; any invalid value will cause the connection to fail.

- Request Line

The method must be GET, and the HTTP version must be at least 1.1.

- Upgrade

The value must be "websocket" (ASCII, case-insensitive).

- Connection

The value must contain "Upgrade" (ASCII, case-insensitive).

- Sec-WebSocket-Key

The base64 encoding of a 16-byte value chosen at random by the client for this connection (a 24-character string)

- Origin

The origin of the script that initiates the connection; required for browser clients, optional for non-browser clients

- Sec-WebSocket-Protocol

One or more comma-separated subprotocols supported by the client, in order of preference

- Sec-WebSocket-Version

The protocol version the client intends to use; it must be 13. Values 9, 10, 11, and 12 (draft versions) are reserved and are not valid values

- Sec-WebSocket-Extensions

The protocol extensions the client intends to use. The HyBi Working Group worked on a multiplexing extension and compression extensions. The multiplexing extension lets several logical channels share one underlying TCP connection, but it remained an Internet-Draft; the compression extensions add compression to the WebSocket protocol, for example the early x-webkit-deflate-frame and the standardized permessage-deflate (RFC 7692)
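The client-side fields above can be assembled into a request as follows. This is a minimal Python sketch, not a full client: `make_client_handshake` is a hypothetical helper, it only builds the request text (no socket I/O, no URI parsing), and Sec-WebSocket-Extensions is omitted.

```python
import base64
import os
from typing import List, Optional, Tuple


def make_client_handshake(host: str, path: str = "/",
                          origin: Optional[str] = None,
                          subprotocols: Optional[List[str]] = None) -> Tuple[str, str]:
    """Build an RFC 6455 opening-handshake request; return (request_text, key)."""
    # Sec-WebSocket-Key: base64 of 16 random bytes (always 24 characters).
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",  # method must be GET, version at least 1.1
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if origin is not None:  # required when the client is a browser
        lines.append(f"Origin: {origin}")
    if subprotocols:  # comma-separated, most preferred first
        lines.append("Sec-WebSocket-Protocol: " + ", ".join(subprotocols))
    # Header lines are CRLF-terminated and the block ends with an empty line.
    return "\r\n".join(lines) + "\r\n\r\n", key


request, key = make_client_handshake("server.example.com", "/chat",
                                     origin="http://example.com",
                                     subprotocols=["chat", "superchat"])
print(request)
```

The client keeps `key` so that it can later verify the server's Sec-WebSocket-Accept value.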

2.2 Connection Handshake - Server

When a client establishes a WebSocket connection with a server, the server must respond to the client's connection handshake request. The fields in the handshake response header are shown in the figure below.

- Status Line

HTTP/1.1 101 Switching Protocols indicates that the server accepts the client's connection. If the server does not want to complete the handshake, it can instead return an ordinary HTTP response with an appropriate error status, such as 401 Unauthorized or 403 Forbidden.

- Upgrade

The value must be "websocket"

- Connection

Value must contain "Upgrade"

- Sec-WebSocket-Accept

Generated by the server when it accepts the connection: concatenate the client's Sec-WebSocket-Key value with the fixed GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" (defined in RFC 6455 and written in the RFC 4122 string format), compute the SHA-1 hash of the result, and base64-encode the 20-byte digest.

- Sec-WebSocket-Protocol

The subprotocol the server selects, chosen from the list the client sent in Sec-WebSocket-Protocol. If the server supports none of them, it omits this header field.

- Sec-WebSocket-Extensions

The protocol extensions the server agrees to use, selected from those the client requested
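The Sec-WebSocket-Accept derivation can be checked against the worked example in RFC 6455 itself, where the key `dGhlIHNhbXBsZSBub25jZQ==` yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. The Python sketch below uses only `hashlib` and `base64`; `compute_accept` and `make_server_handshake` are hypothetical helper names, and the response omits Sec-WebSocket-Extensions.

```python
import base64
import hashlib
from typing import Optional

# Fixed GUID from RFC 6455 (written in RFC 4122 string form).
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Sec-WebSocket-Accept = base64(SHA-1(key + GUID))."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def make_server_handshake(sec_websocket_key: str,
                          subprotocol: Optional[str] = None) -> str:
    """Build a 101 response that accepts the connection."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}",
    ]
    # Echo a subprotocol only if the server selected one of the client's offers.
    if subprotocol is not None:
        lines.append(f"Sec-WebSocket-Protocol: {subprotocol}")
    return "\r\n".join(lines) + "\r\n\r\n"


print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client performs the same computation on the key it sent and fails the connection if the two values differ.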

2.3 Disconnect Handshake

Either the client or the server can start the closing handshake by sending a Close control frame (opcode 0x8), which may carry a status code and a reason. When an endpoint receives a Close frame and has not already sent one, it responds with a Close frame of its own and then closes the connection. The closing handshake complements the TCP closing handshake (FIN/ACK), because the TCP closing handshake is not always reliable end to end, especially in the presence of intercepting proxies and other intermediaries.
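The bookkeeping behind the closing handshake can be sketched as a small state tracker. This Python class is hypothetical and deliberately leaves out frame I/O and timeouts; it only records whether a Close frame has been sent and received, and it follows the RFC 6455 recommendation that the server, rather than the client, closes the TCP connection first.

```python
from dataclasses import dataclass


@dataclass
class CloseHandshake:
    """One endpoint's view of the RFC 6455 closing handshake (hypothetical helper)."""
    is_server: bool
    close_sent: bool = False
    close_received: bool = False

    def start_close(self) -> bool:
        """Return True if a Close frame should be sent now; it is sent at most once."""
        if self.close_sent:
            return False
        self.close_sent = True
        return True

    def on_close_frame(self) -> bool:
        """Record an incoming Close frame; return True if a Close reply must be sent."""
        self.close_received = True
        # Reply with Close unless this endpoint already sent one.
        return self.start_close()

    def handshake_complete(self) -> bool:
        return self.close_sent and self.close_received

    def should_close_tcp_now(self) -> bool:
        # RFC 6455 recommends that the server close TCP first; a client normally waits.
        return self.handshake_complete() and self.is_server


server = CloseHandshake(is_server=True)
print(server.on_close_frame(), server.should_close_tcp_now())  # True True
```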

Part 03: Data Transmission

Once the client and server have completed the handshake, data transfer can begin over a two-way channel in which either side can send data independently at any time. RFC 6455 transmits data as "messages": a message consists of one or more frames, and these frames do not necessarily correspond to any particular network-layer framing. The WebSocket frame format is shown in the figure below.

3.1 Frame Structure

- FIN

1 bit, indicating whether this frame is the final fragment of a message.

- RSV1, RSV2, RSV3

1 bit each. These must be 0 unless a negotiated extension defines a meaning for nonzero values.

- Opcode

4 bits, defining how the "Payload data" is interpreted.

  • 0 (decimal): continuation frame
  • 1: text frame
  • 2: binary frame
  • 3-7: reserved for further non-control frames
  • 8: connection close frame
  • 9: ping (heartbeat) frame
  • 10: pong (heartbeat) frame
  • 11-15: reserved for further control frames

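- MASK

1 bit, indicating whether the "Payload data" is masked: 1 means masked, 0 means not masked. Every frame sent from client to server must be masked, and frames sent from server to client must not be masked.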

- Payload length

7 bits, 7+16 bits, or 7+64 bits, giving the length of the Payload data in bytes. If the 7-bit value is 0 to 125, it is the payload length itself; if it is 126, the following 16 bits (an unsigned integer in network byte order) give the length; if it is 127, the following 64 bits (unsigned, with the most significant bit 0) give the length.

- Masking-key

32 bits, present only when MASK is 1; it holds the masking key chosen by the client. To prevent attacks that poison proxy caches, RFC 6455 requires the masking key to be derived from a strong source of entropy so that it cannot be predicted. Masking walks through the payload byte by byte: for the i-th byte, compute j = i mod 4; the masked byte is the original i-th byte XORed with the j-th byte of the Masking-key. Applying the same operation again unmasks the data.

- Payload data

Payload data consists of extension data followed by application data. The extension data is 0 bytes long unless an extension was negotiated during the handshake, in which case that extension defines its length and use.
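The header layout, the three payload-length encodings, and the masking rule above can be expressed as a short encoder and decoder. This is a minimal Python sketch assuming no extensions are negotiated (RSV1-3 stay 0) and that `decode_frame` receives a buffer starting at a frame boundary; the function names are hypothetical.

```python
import os
import struct
from typing import Tuple


def apply_mask(data: bytes, masking_key: bytes) -> bytes:
    """XOR byte i with masking_key[i % 4]; the same call masks and unmasks."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(data))


def encode_frame(opcode: int, payload: bytes, fin: bool = True, mask: bool = True) -> bytes:
    """Build one frame with RSV1-3 = 0. Clients must send with mask=True."""
    first = (0x80 if fin else 0x00) | (opcode & 0x0F)
    mask_bit = 0x80 if mask else 0x00
    n = len(payload)
    if n <= 125:                       # length fits in the 7-bit field
        header = struct.pack("!BB", first, mask_bit | n)
    elif n <= 0xFFFF:                  # 126 + 16-bit length, network byte order
        header = struct.pack("!BBH", first, mask_bit | 126, n)
    else:                              # 127 + 64-bit length, network byte order
        header = struct.pack("!BBQ", first, mask_bit | 127, n)
    if mask:
        key = os.urandom(4)            # unpredictable 32-bit masking key
        return header + key + apply_mask(payload, key)
    return header + payload


def decode_frame(buf: bytes) -> Tuple[bool, int, bytes, int]:
    """Parse one frame; return (fin, opcode, unmasked payload, bytes consumed)."""
    if len(buf) < 2:
        raise ValueError("incomplete frame header")
    fin, opcode = bool(buf[0] & 0x80), buf[0] & 0x0F
    masked, length = bool(buf[1] & 0x80), buf[1] & 0x7F
    ext = 2 if length == 126 else 8 if length == 127 else 0
    header_len = 2 + ext + (4 if masked else 0)
    if len(buf) < header_len:
        raise ValueError("incomplete frame header")
    if ext == 2:
        length = struct.unpack_from("!H", buf, 2)[0]
    elif ext == 8:
        length = struct.unpack_from("!Q", buf, 2)[0]
    if len(buf) < header_len + length:
        raise ValueError("incomplete frame payload")
    payload = buf[header_len:header_len + length]
    if masked:
        payload = apply_mask(payload, buf[2 + ext:header_len])
    return fin, opcode, payload, header_len + length
```

As a check, decoding the masked example from RFC 6455 §5.7, the bytes `81 85 37 fa 21 3d 7f 9f 4d 51 58`, yields a final text frame whose payload is `Hello`.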

3.2 Control Frame

Control frames are identified by their opcode: all opcodes whose most significant bit is 1 (0x8 to 0xF) denote control frames. The control frames currently defined by the protocol are 0x8 (Close), 0x9 (Ping), and 0xA (Pong). A control frame must have a payload length of at most 125 bytes and must not be fragmented. In a Close frame, the first 2 bytes of the payload (if present) are an unsigned status code in network byte order, and the remaining bytes are a UTF-8 encoded reason for closing, as shown in the following figure.
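The Close payload layout can be built and parsed with Python's `struct` module. In this sketch the helper names are hypothetical; status code 1000 in the usage line is the RFC 6455 code for a normal closure.

```python
import struct
from typing import Optional, Tuple


def build_close_payload(code: int, reason: str = "") -> bytes:
    """Close payload: 2-byte status code (network byte order) + UTF-8 reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        # Control frames may carry at most 125 bytes of payload.
        raise ValueError("close payload exceeds 125 bytes")
    return payload


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Return (status code, reason); an empty payload carries no status code."""
    if not payload:
        return None, ""
    if len(payload) < 2:
        raise ValueError("a non-empty close payload needs a 2-byte status code")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


print(parse_close_payload(build_close_payload(1000, "bye")))  # (1000, 'bye')
```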

3.3 Message Fragmentation

Message fragmentation means sending one conceptual "message" as multiple data frames. It allows a message of unknown size to be sent without buffering the entire message first. Combined with a multiplexing extension, fragmentation also lets large messages be split into smaller pieces so that several logical channels can share the outgoing connection.

In the protocol, the first frame of a fragmented message has FIN = 0 and a nonzero opcode (text or binary) that gives the type of the whole message. Intermediate frames have FIN = 0 and opcode 0 (continuation). The final frame has FIN = 1 and opcode 0. The protocol requires fragments to be delivered to the recipient in the order in which the sender sent them; control frames may be injected between the fragments of a message, but fragments of different messages must not be interleaved unless a negotiated extension allows it.
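These fragmentation rules can be enforced on the receiving side by a small reassembler. In this Python sketch, `MessageAssembler` is a hypothetical class; it assumes frames have already been decoded and unmasked, and that the caller handles control frames (which may arrive between fragments) before passing data frames to `feed`.

```python
from typing import List, Optional, Tuple


class MessageAssembler:
    """Reassembles fragmented data frames into whole messages (hypothetical helper)."""

    def __init__(self) -> None:
        self._opcode: Optional[int] = None  # opcode of the first fragment (1=text, 2=binary)
        self._parts: List[bytes] = []

    def feed(self, fin: bool, opcode: int, payload: bytes) -> Optional[Tuple[int, bytes]]:
        """Feed one data frame in arrival order; return (opcode, message) when complete."""
        if opcode >= 0x8:
            raise ValueError("control frames must be handled before calling feed")
        if opcode == 0x0:
            # Continuation frame: only valid inside a fragmented message.
            if self._opcode is None:
                raise ValueError("continuation frame without a start frame")
        else:
            # Start of a new message: the previous one must be finished.
            if self._opcode is not None:
                raise ValueError("new message started before the previous one finished")
            self._opcode = opcode
        self._parts.append(payload)
        if not fin:
            return None
        message = (self._opcode, b"".join(self._parts))
        self._opcode, self._parts = None, []
        return message
```

An unfragmented message is simply the case where the first frame already has FIN = 1, so `feed` returns it immediately.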

Part 04: Summary

Because WebSocket adds its own length-prefixed framing on top of TCP, applications do not need to track data lengths or deal with the packet sticking and splitting that arise with a raw TCP byte stream. WebSocket can also run over HTTP/2 through the extended CONNECT method (RFC 8441), sharing one multiplexed connection to make better use of bandwidth. Application code on both the server and the client mainly needs to handle fragmented messages, reassembling their frames in order.
