The past and present of ultra-low latency live broadcast technology


Foreword:

According to the "Statistical Report on the Development of China's Internet" released by the China Internet Network Information Center (CNNIC), China had 716 million online live broadcast users as of June 2022, or 68.1% of all internet users. A major driver was the 2020 epidemic: with far more people working and seeking entertainment at home, interactive live broadcast became one of the most important forms of leisure for internet users.

As the live broadcast industry chain has expanded and matured, the division of labor in each of its links has become clearer and the number of participants in each link has grown, creating new kinds of jobs and driving up related employment. Live broadcast has also empowered traditional industries to upgrade and transform: combined with technological innovation, it has reshaped their business models, as in live-commerce sales and the new-media transformation of advertising.

Rich content such as traditional culture, news, competitive sports, law, and knowledge sharing can be displayed and disseminated far more efficiently through interactive live broadcast on mobile devices. High-quality live content can spread explosively, while users gain more opportunities to experience, learn from, and actively participate in live interactions, a win for both the content supply side and the demand side.

Ultra-low latency live broadcast technology is clearly on a new development path. InfoQ, together with the Volcano Engine video live broadcast team, is launching the "Ultra-Low Latency Live Broadcast Technology Evolution Path" series, which explores the evolution of ultra-low latency live broadcast technology, the challenges and breakthroughs behind it, and its impact on the future of the live broadcast industry.

In this first article, we look at the past and present of ultra-low latency live broadcast technology~

Network infrastructure upgrades, iteration of audio and video transmission technology, and the open-sourcing of WebRTC have steadily driven down the latency of audio and video services, making ultra-low latency live broadcast a hot research direction. Real-time audio and video services are booming in the consumer internet and are accelerating their penetration into the industrial internet. After the industry's first wave of explosive growth, China's real-time audio and video industry has deepened its scenario coverage and entered a stage of rational growth.

The choice of latency target depends largely on how tightly users and content producers interact, and the scenarios are rich and varied.

In the most demanding scenarios, users want latency as small as possible. A low-latency mode close to real-time communication maximizes the sense of participation: viewers interact seamlessly with content producers, and "what you see is what you get" keeps them engaged. For example, in key moments of an anchor show, such as PK battles, gift giving, guild rankings, and reward campaigns, the big spenders on both sides want to see their anchor's reaction the moment the gift leaderboard updates, giving the backend operations team first-hand feedback for decision-making and follow-up campaign strategy.

The figure below summarizes the role of low-latency live broadcast technology from the three perspectives of technology, product, and operations, and considers, from both external and internal viewpoints, how technological change drives a positive cycle across the whole ecosystem.

1. Limitations of traditional standard live broadcast technology

1. Latency issues with RTMP protocol

RTMP is the most traditional live broadcast protocol. The host pushes H.264/H.265 and AAC-encoded audio and video to a cloud vendor's CDN via RTMP; the CDN remuxes and distributes it, and end-to-end latency is generally held at 3 to 7 seconds. The problem is that RTMP scales poorly, and pushing latency lower runs into real technical difficulty: with RTMP, reducing latency below about 2 seconds requires compressing the player's download buffer, which causes significant stuttering and an uncomfortable playback experience.
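To make the end-to-end budget concrete, here is a minimal sketch with purely illustrative stage delays (every number below is an assumption for illustration, not a measurement), showing why the player's download buffer dominates the total:

```python
# Hypothetical latency budget for an RTMP/FLV pipeline.
# All stage values are illustrative assumptions, not measurements.
stages_ms = {
    "capture_and_encode": 300,       # encoder buffering / GOP structure
    "rtmp_push_over_tcp": 200,       # upload from host to CDN edge
    "cdn_transmux_and_cache": 500,   # repackaging and edge distribution
    "player_download_buffer": 3000,  # dominant term: anti-stutter buffer
    "decode_and_render": 100,
}

total_ms = sum(stages_ms.values())
print(f"end-to-end: {total_ms / 1000:.1f}s")  # within the typical 3-7s range

# Cutting latency below ~2s means shrinking the one large term, the
# player buffer, which removes the headroom that absorbs throughput
# dips -- hence the stuttering described above.
```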

2. The shortcomings of traditional live broadcast technology in real-time interactive scenarios

  • There is a significant gap between video latency and the latency of barrage (chat) interaction, so the chat conversation is out of sync with what the video is showing.

  • Interaction between audience and anchor is one-sided: content flows in only one direction, not two (a limitation that could not be meaningfully resolved before the introduction of RTC technology).
  • The first limitation of one-way transmission is that playback on the audience side cannot adapt to network conditions; users receive media at a fixed bit rate with no dynamic adjustment. When network conditions change in real time (weak networks, mobile base station handover, and so on), fixed-bit-rate one-way transmission is very likely to cause frame loss and other problems that hurt the viewing experience; conversely, when the network improves, it cannot raise the bit rate to deliver the higher picture quality a better connection allows.
  • In interactive scenarios where ordinary live broadcast and co-streaming coexist, an anchor pushing via traditional RTMP who enters a live PK faces a switch among direct streaming, local (client-side) mixing, and server-side mixing. Each such switch causes a momentary stall on the audience side. An ultra-low latency solution based on WebRTC handles this push/mix switching much more gracefully: only the server's forward/subscribe stream-channel distribution logic changes, with no bypass rescheduling of the media data stream.

3. The difference between ultra-low latency live broadcast and standard live broadcast

  • Ultra-low latency live broadcast is a new class of application that has emerged in recent years. Scenarios such as e-commerce live broadcast and event live broadcast combine high concurrency with low latency: the 3-20s latency of traditional live broadcast cannot meet their needs, yet their interactivity requirements are not as strict as typical real-time audio and video applications like video conferencing, so there is no need to push latency below 400ms. Ultra-low latency live broadcast therefore combines the architectures of traditional live broadcast and real-time audio and video, drawing on the strengths of each to land end-to-end latency between the two. Although vendors follow no single standard technical path, the work can be roughly summarized as changes in three areas: the pull (playback) protocol, the network architecture, and the push protocol. In practice, vendors weigh cost against performance targets and choose among the different protocols and architectures accordingly.
  • Differences at the transport layer (reliability optimizations built on UDP, the foundation for weak-network countermeasures)
  • Traditional FLV/RTMP live broadcast runs over TCP (or QUIC). TCP is a reliable transport that sacrifices real-time performance for data integrity; in a weak network, its behavior, starting with the three-way handshake before any data flows, adds substantial delay. UDP, an unreliable transport, offers the best real-time performance but guarantees neither arrival nor ordering. Real-time audio and video products (such as RTM ultra-low latency live broadcast) therefore typically use UDP and add protocol-layer and algorithm-layer optimizations on top to restore transmission reliability.
  • UDP protocol optimization:
  • In practice, UDP almost always appears together with RTP/RTCP. RTP carries the data: fields in its header such as the sequence number, payload type, and timestamp provide the logical basis for grouping, reassembling, and reordering packets. RTCP, RTP's companion control protocol, feeds back statistics on RTP transmission quality and supplies control parameters for weak-network countermeasures.
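As a concrete illustration of the fields just mentioned, here is a Python sketch that unpacks the 12-byte fixed RTP header defined in RFC 3550 (the sample packet is hand-built for the demo):

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet too short for an RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "has_extension": bool(b0 & 0x10),  # X bit: an RFC 5285 extension follows
        "marker": bool(b1 & 0x80),         # often flags the last packet of a frame
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,            # basis for reordering / loss detection
        "timestamp": ts,                   # basis for jitter stats and sync
        "ssrc": ssrc,
    }

# Hand-built sample: V=2, X=1, PT=96, seq=7, ts=90000, ssrc=0x1234
pkt = struct.pack("!BBHII", 0x90, 96, 7, 90000, 0x1234)
print(parse_rtp_header(pkt))
```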

2. The evolution of ultra-low latency live broadcast technology

  • The evolution of live broadcast technology driven by business scenarios (latency as the main thread)
  • The evolution of the RTM protocol itself
  • a=extmap:18 http://www.webrtc.org/experiments/rtp-hdrext/decoding-timestamp
  • a=extmap:19 uri:webrtc:rtc:rtp-hdrext:video:CompositionTime
  • a=extmap:21 uri:webrtc:rtc:rtp-hdrext:video:frame-seq-range
  • a=extmap:22 uri:webrtc:rtc:rtp-hdrext:video:frame-type
  • a=extmap:23 uri:webrtc:rtc:rtp-hdrext:video:reference-frame-timestamp
  • a=extmap:27 uri:webrtc:rtc:rtp-hdrext:audio:aac-config
  • RTM uses private RTP extension headers to carry DTS/CTS values. Every RTP packet of a frame carries the frame's DTS in an RFC 5285 header extension; the first RTP packet of each frame and the VPS/SPS/PPS packets additionally carry the frame's CTS. The presentation timestamp is computed as PTS = DTS + CTS, and the player's broadcast-control logic uses it for fast startup synchronization and accurate audio-video sync.
  • The extension header carries the frame's start/end sequence numbers: if the first packets of the first frame are lost, retransmission can be requested immediately based on the start sequence number, speeding up first-frame rendering; if the last packets of the current frame are lost, retransmission can be requested based on the end sequence number, reducing latency and stuttering.
  • The extension header carries the frame type: when the correct frame type is carried and parsed, the client does not need to parse metadata to learn it; moreover, under weak-network conditions the client can skip B-frames and decode P-frames directly, speeding up frame output and reducing potential stutters.
  • The extension header carries the P-frame's reference-frame information: under weak-network conditions, the client can skip B-frame decoding according to the reference relationships and corresponding timestamps specified in the extension header, reducing stutter.
  • To speed up signaling, the CDN can, under certain conditions, skip querying media information and directly return its supported audio and video capabilities to the client. In that case the SDP media description contains no concrete audio/video configuration details; on the audio side, the answer SDP lacks the header information needed for AAC decoding. We therefore carry AAC-Config in an RTP extension header so that the client can parse it when RTP packets arrive and configure decoding then, which shortens signaling interaction time and raises the stream-pull success rate.
  • MiniSDP signaling standard implementation (TikTok)
  • Asynchronous CDN signaling back-to-origin
  • RTP extension header components
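The DTS/CTS extensions above reduce to one relation, PTS = DTS + CTS. A minimal sketch (the frame timestamps are illustrative values, not taken from a real stream) of how a player recovers display order for a stream containing B-frames:

```python
def presentation_ts(dts_ms: int, cts_ms: int) -> int:
    # PTS = DTS + CTS, with DTS/CTS read from the RTP extension headers
    return dts_ms + cts_ms

# Decode order is I P B B; the CTS offset pushes the P frame's display
# time past the B frames that reference it, giving display order I B B P.
frames = [("I", 0, 0), ("P", 40, 120), ("B", 80, 0), ("B", 120, 0)]
for name, dts, cts in frames:
    print(name, "pts =", presentation_ts(dts, cts))
```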
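Similarly, the frame-seq-range extension lets a receiver request retransmission the moment a frame is incomplete, without waiting to infer gaps from later packets. A sketch (ignoring 16-bit sequence-number wraparound, which a real implementation must handle):

```python
def missing_packets(frame_start, frame_end, received):
    """Given a frame's start/end RTP sequence numbers (from the
    frame-seq-range extension) and the set of sequence numbers already
    received, return the packets to NACK immediately."""
    return [s for s in range(frame_start, frame_end + 1) if s not in received]

# Frame spans seq 100..104; only packets 100 and 103 have arrived.
print(missing_packets(100, 104, {100, 103}))  # -> [101, 102, 104]
```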
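And the aac-config extension ultimately delivers an AudioSpecificConfig blob (ISO/IEC 14496-3), which the client parses in place of the configuration missing from the answer SDP. A sketch of decoding the common 2-byte form:

```python
# ISO/IEC 14496-3 sampling-frequency-index table
SAMPLE_RATES = [96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050,
                16000, 12000, 11025, 8000, 7350]

def parse_aac_config(asc: bytes) -> dict:
    """Decode a 2-byte AAC AudioSpecificConfig:
    5 bits object type, 4 bits frequency index, 4 bits channel config."""
    bits = int.from_bytes(asc[:2], "big")
    object_type = bits >> 11           # e.g. 2 = AAC-LC
    freq_index = (bits >> 7) & 0xF
    channels = (bits >> 3) & 0xF
    return {"object_type": object_type,
            "sample_rate": SAMPLE_RATES[freq_index],
            "channels": channels}

# 0x1210 encodes AAC-LC, 44.1 kHz, stereo
print(parse_aac_config(bytes([0x12, 0x10])))
```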

1. Porting the WebRTC protocol into the live broadcast player

  • RTM low-latency live broadcast is derived from WebRTC technology. Building point-to-point transmission on the WebRTC standard generally involves the following steps:
  • The two communicating parties conduct media negotiation, exchanging Session Description Protocol (SDP) messages;
  • Then interactive network-address negotiation (discovering the peer's reachable address) prepares the media transmission channel;
  • Once both are ready, peer-to-peer media data transmission begins.
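The three steps can be sketched as a tiny driver with stub signaling callbacks (`signal_send`/`signal_recv` are hypothetical placeholders for a real signaling channel; the SDP strings are illustrative):

```python
def connect(local_sdp, signal_send, signal_recv):
    # 1. Media negotiation: exchange SDP offer/answer.
    signal_send({"type": "offer", "sdp": local_sdp})
    answer = signal_recv()
    # 2. Network-address negotiation: exchange candidate addresses
    #    (discovered via STUN/ICE in a real stack) to find a route.
    signal_send({"type": "candidate", "candidate": "host/srflx addr"})
    signal_recv()
    # 3. SDP and route agreed: peer-to-peer media transmission can start.
    return answer["sdp"]

# Demo with in-memory queues standing in for the signaling channel.
sent = []
incoming = iter([{"type": "answer", "sdp": "v=0 (remote answer)"},
                 {"type": "candidate", "candidate": "remote addr"}])
remote_sdp = connect("v=0 (local offer)", sent.append, lambda: next(incoming))
print(remote_sdp)
```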

  • The client-server signaling part was developed in-house on the standard SDP message model; the media transport part uses the open-source WebRTC framework together with ByteDance's self-developed real-time audio and video media engine.

2. Transformation and upgrade of the RTC signaling protocol (MiniSDP compression)

https://github.com/zhzane/mini_sdp

  • A standard SDP is relatively long (roughly 5-10KB), which hampers fast, efficient transmission and, in live broadcast scenarios, especially hurts first-frame time. MiniSDP compresses the standard SDP text protocol efficiently, converting the native SDP into a much smaller binary format that fits in a single UDP packet.
  • This shortens signaling interaction time, improves network transmission efficiency, reduces first-frame rendering time of the live stream, and improves QoS statistics such as the stream-open rate and pull success rate.
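The real codec lives in the repository linked above; the sketch below only illustrates the core idea with generic zlib compression in place of MiniSDP's binary encoding: shrink the SDP until the whole signaling message fits one UDP datagram.

```python
import zlib

def pack_sdp(sdp: str, mtu: int = 1200) -> bytes:
    """Illustrative stand-in for MiniSDP (not its actual format):
    compress the SDP text so signaling fits in a single UDP packet."""
    packed = zlib.compress(sdp.encode(), 9)
    if len(packed) > mtu:
        raise ValueError("SDP still too large for one datagram")
    return packed

# SDP text is highly repetitive line-oriented ASCII, so it compresses well.
sdp = "v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\n" + "a=rtpmap:96 H264/90000\r\n" * 100
packed = pack_sdp(sdp)
print(len(sdp.encode()), "->", len(packed))
```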

| Playback protocol | RTM (HTTP signaling) | RTM (MiniSDP signaling) | FLV |
| --- | --- | --- | --- |
| First frame time (preview) | 600ms | 510ms | 350ms |
| Stream pull success rate (preview) | 97.50% | 98.00% | 98.70% |

3. CDN asynchronous back-to-origin optimization for RTM signaling

  • Shorten RTM signaling interaction time and reduce first-frame rendering time when pulling RTM streams.
  • In the original flow, on a server cache miss the server had to wait for the back-to-origin fetch before returning the answer SDP with its AacConfig information; the client sent STUN only after receiving the answer SDP, and the server could start sending data only after receiving STUN (left figure below). With asynchronous back-to-origin, the server returns the answer SDP immediately without waiting for the origin fetch, so the fetch and the WebRTC connection setup proceed in parallel; once the connection is established and the origin data arrives, RTP data is sent at once (right figure below).
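The parallelism is the whole trick, and can be sketched with asyncio (the sleep durations are stand-ins for real network operations):

```python
import asyncio

async def back_to_origin():
    await asyncio.sleep(0.05)   # cache miss: fetch the stream from origin
    return b"rtp-data"

async def establish_connection():
    await asyncio.sleep(0.03)   # STUN exchange + WebRTC connection setup
    return "connected"

async def serve():
    # Old flow: await back_to_origin() BEFORE answering, then connect --
    # total wait is the sum of the two. New flow: answer the SDP at once
    # and run both concurrently -- total wait is the max of the two.
    data, state = await asyncio.gather(back_to_origin(), establish_connection())
    return data if state == "connected" else None

print(asyncio.run(serve()))
```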

4. Optimization of video rendering stutter (stutter time per 100 seconds reduced by an average of 4 seconds)

  • To improve average viewing time per user, the RTC engine's framing/decoding strategy was changed: frame dropping is disabled in low-latency mode, improving video-rendering stutter in live broadcasts.

| Experimental group | Video rendering stutter per 100s (live broadcast scene) |
| --- | --- |
| RTM default JitterBuffer strategy | 8.3s |
| RTM improved JitterBuffer no-frame-drop strategy | 3.6s |

  • Traditional RTC scenarios prioritize latency, so the whole link triggers various kinds of frame dropping (in the decoding module, the network module, and elsewhere). FLV live broadcast scenarios instead prioritize viewing experience (no frame drops, good audio-video sync). For RTM to reduce stutter and gain QoE benefits, the broadcast-control strategy must be customized. The customized logic changes are:
  • Ensure the JitterBuffer is never blocked by time-consuming software decoding or by hardware-decoder API calls such as dequeueInputBuffer; the kernel layer keeps a layer of mandatory audio-video synchronization logic to protect the playback experience.
  • Meanwhile, the upper layer monitors the cache lengths of the network module and the decoding module, with corresponding fallback logic:
  1. If hardware decoding genuinely fails and dec_cache_frames grows too large, report an error and downgrade to software decoding.
  2. If the JitterBuffer misbehaves and too many frames accumulate in frame_lists, trigger the player's exception logic, report an error, and re-pull the stream.
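The two fallback rules can be sketched as a small watchdog (the threshold values are assumptions for illustration, not the production configuration):

```python
MAX_DEC_CACHE = 30    # assumed threshold for decoder cache frames
MAX_JB_FRAMES = 100   # assumed threshold for JitterBuffer frame_lists

def watchdog(hw_decode_ok, dec_cache_frames, jb_frames):
    """Mirror the two fallback rules above: downgrade to software
    decoding on hardware-decode trouble, re-pull on JitterBuffer overflow."""
    if not hw_decode_ok and dec_cache_frames > MAX_DEC_CACHE:
        return "fallback_to_software_decoder"
    if jb_frames > MAX_JB_FRAMES:
        return "report_error_and_repull_stream"
    return "ok"

print(watchdog(hw_decode_ok=False, dec_cache_frames=45, jb_frames=10))
```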

5. Optimization of RTM broadcast control logic

  • To improve mobile playback penetration: the unified RTC-kernel solution has an inherent drawback (MediaCodec hardware decoder initialization is slow). The RTM video decoding module was migrated from the RTC kernel into the TTMP playback kernel, reusing the FLV video decoding module (so MediaCodec avoids re-initialization). This significantly reduces first-frame rendering time on Android and improves the stream-pull success rate.
  • General logic of the RTC kernel

  • Improved broadcast-control logic of the RTM kernel


That concludes the "Evolution" part of our look at ultra-low latency live broadcast technology. In the second part, "Practice", we will focus on how to deploy ultra-low latency live broadcast technology at scale. Stay tuned~
