Preface: With ever-increasing bandwidth, widespread WiFi, the popularity of smartphones, and continuous breakthroughs in live-streaming technology, the barriers to broadcasting keep falling, and the era of live streaming for everyone has arrived. Live streaming has also spread into industries such as online education and finance, gradually becoming a standard capability across sectors. Since its founding, Yunfan Accelerator has focused on enterprise streaming-media services, especially live streaming. It offers live-streaming cloud solutions for different scenarios that save customers R&D cost while preserving the end-user experience. Whether for a traditional enterprise in transition or a startup, Yunfan Accelerator provides targeted live-streaming solutions. To date it has established cooperative relationships with, and provides services to, more than 50 top-tier customers in the streaming-media field.
According to the reporter's understanding, a technical analysis of a live-streaming solution hinges on a few key points: whether latency is low enough to guarantee real-time interaction, whether it is compatible with all mobile terminals so that every user is covered, and the choice of protocol.

1. What is Lianmai (co-hosting)?

To understand the architecture of mobile live streaming with co-hosting, we first define the participating roles. On the client side (Figure 1), the user roles in a co-hosted broadcast are: the anchor, the co-host (a fan), and the audience.

The anchor is the user currently broadcasting, equivalent to the show's host. The anchor can actively invite users to co-host, approve a viewer's co-host request, and disconnect any co-host; the anchor's video is generally displayed full screen.

A co-host (fan) is a viewer who joins the current broadcast. Co-hosts can apply to the anchor to join, or accept the anchor's invitation, and then participate by audio or video; they can disconnect at any time. A co-host's video is usually shown only in a small area on the right so as not to obscure the anchor's video.

The audience are the viewers of the mobile broadcast.

Next we introduce the structure of the mobile live-video cloud platform. To simplify the model, data storage and the various server clusters are not considered.
Only the simplest server types required for mobile co-hosted live streaming are described, as shown in Figure 2:

The server cluster manages the online sessions between anchors, and between anchors and co-hosts, and provides the scheduling and computing capabilities of the audio/video cloud; it includes signaling servers, streaming-media server clusters, and so on. The CDN network receives media data pushed by the anchor and the co-hosts, provides buffering, storage, and forwarding, and distributes the live content to the audience.

Compared with one-way broadcasting by a single anchor, co-hosting is technically much harder. Its characteristics are:

Audio mixing: the anchor mixes his own voice with the co-host's voice.
Video mixing: the anchor composites his own video with the co-host's video.
Noise reduction: remove noise and howling from the broadcast environment.
Echo cancellation: eliminate near-end echo between speaker and microphone.
Low-latency interaction: with delay jitter of 500 ms~800 ms, real-time audio/video interaction between participants is guaranteed.

2. Live interactive scenes between anchors and fans

The architecture of mobile co-hosted live streaming involves four roles: the anchor, the co-host, the audience, and the server. Analyzing these roles, there are two types of co-hosting:

2.1 Co-hosting between an anchor and fans

An anchor can interact live with one or more fans while other fans watch the interaction.
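The audio-mixing characteristic above can be sketched as summing two PCM streams with clipping. This is a minimal illustration only, assuming 16-bit signed PCM samples; the function name and shapes are hypothetical, and a real mixer would also resample, align timestamps, and apply gain control:

```python
def mix_pcm(anchor_samples, cohost_samples):
    """Mix two 16-bit PCM sample lists by summing and clipping.

    A minimal sketch of audio mixing: each output sample is the sum
    of the anchor's and co-host's samples, clipped to the 16-bit
    signed range to avoid wrap-around distortion.
    """
    n = max(len(anchor_samples), len(cohost_samples))
    mixed = []
    for i in range(n):
        a = anchor_samples[i] if i < len(anchor_samples) else 0
        b = cohost_samples[i] if i < len(cohost_samples) else 0
        s = a + b
        mixed.append(max(-32768, min(32767, s)))  # clip to int16 range
    return mixed

print(mix_pcm([1000, 30000], [500, 10000]))  # [1500, 32767]
```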
This feature immediately increases ordinary users' sense of participation and enjoyment on the platform and improves user stickiness.

2.2 Co-hosting between anchors

Anchors connect with other anchors to interact, raise their popularity, grow each other's fan bases, drive traffic between them, and achieve a win-win economy.

In theory, any of these four roles could be responsible for audio and video mixing, i.e., the compositing needed so that every viewer sees the combined video and hears the combined audio after co-hosting begins. Mixing on the server side causes large delays on the viewer side and is relatively costly, so it has no advantage from a cost perspective. We therefore discuss only two solutions: mixing on the anchor side and mixing on the audience side.

3. Interactive solutions for anchor-fan co-hosting

3.1 Stream mixing on the anchor side

In this implementation the anchor composites his own video with the co-hosting fan's video, then pushes the composited video stream, his own audio, and the co-host's audio to the CDN network, which distributes them to all viewers. The anchor's phone therefore carries a heavier load, and the requirements on phone performance and network performance are higher than for an ordinary broadcast. The basic flow of anchor-side mixing is shown in Figure 3:

After the anchor and the fan establish a co-hosting session, both push their original audio/video streams to the CDN network, and each pulls the other's media data from the CDN network.
After pulling the co-host's audio and video from the CDN network, the anchor performs the stream-mixing work locally: the result is used both for his own display and playback and is pushed to the CDN network for the audience to pull and watch. The co-hosting fan pulls the anchor's video and audio, performs echo cancellation and noise reduction, and uses the result for local display and playback.

The anchor composites his own video with the co-host's video, replacing the video image of his original stream, and pushes his own audio plus the co-host's audio to the CDN network for the audience. The mixing work done by the anchor includes picture compositing, echo cancellation, noise reduction, and audio mixing.

The audience pulls 1 channel of mixed video and 2 channels of audio and watches the composited picture: A in the large window and B in the small window.

Advantages and disadvantages of anchor-side stream mixing for anchor-fan co-hosting:

Disadvantages: the anchor side is under heavy pressure; it must mix video, which is computationally expensive; the requirements on phone and network performance are higher than for ordinary broadcasts; it is not well suited to many simultaneous co-hosts.
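The picture-compositing step (anchor A full screen, co-host B in a small corner window) can be sketched as a frame overlay. This is a toy illustration over frames represented as 2-D pixel grids; all names are hypothetical, and a real mixer works per frame on scaled YUV/RGB buffers:

```python
def composite(large, small, x, y):
    """Overlay `small` onto `large` with its top-left corner at (x, y).

    Frames are lists of rows of pixel values. The anchor's frame is the
    large (background) window; the co-host's frame is the small window.
    """
    out = [row[:] for row in large]      # copy the background frame
    for dy, row in enumerate(small):
        for dx, px in enumerate(row):
            out[y + dy][x + dx] = px     # overwrite background pixels
    return out

# 4x4 anchor frame of 'A' pixels, 2x2 co-host frame of 'B' pixels
A = [['A'] * 4 for _ in range(4)]
B = [['B'] * 2 for _ in range(2)]
mixed = composite(A, B, x=2, y=0)        # small window at the top right
print(mixed[0])  # ['A', 'A', 'B', 'B']
```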
Advantages: it removes the upstream bandwidth bottleneck, since only one video stream is pushed from the anchor side; it lowers cost in two respects, computing resources and network bandwidth.

3.2 Stream mixing on the audience side

In this implementation the audience pulls the anchor's and the co-host's audio/video data separately and mixes the streams locally. The basic flow is shown in Figure 4:

After the anchor and the fan establish a co-hosting session, both push their original audio/video streams to the CDN network. The anchor and the co-host each pull the other's media data from the CDN network, perform echo cancellation and noise reduction locally for their own display and playback, and push their own audio/video to the CDN network so the audience can pull the streams and mix them.

The audience pulls 2 channels of video and 2 channels of audio and performs the stream mixing, which includes picture compositing, echo cancellation, noise reduction, and audio mixing. After mixing, the audience sees A in the large window and B in the small window.

Advantages and disadvantages of audience-side stream mixing for anchor-fan co-hosting:

Disadvantages: the audience side is under heavy pressure; viewers must pull multiple streams, which strains their downlink bandwidth; decoding multiple streams also carries high overhead.
Advantages: simple and easy to implement, can be built quickly; supports multiple co-hosts.

4. Interactive solutions for anchor-anchor co-hosting

4.1 Stream mixing on the audience side

In this implementation the audience pulls the audio/video data of all connected anchors and mixes the streams locally. The basic flow is shown in Figure 6:

After the anchors establish a co-hosting session with each other, each pushes his original audio/video stream to the CDN network. Each anchor pulls the other's media data from the CDN network, performs echo cancellation and noise reduction locally for his own display and playback, and pushes his own audio/video to the CDN network so the audience can pull the streams and mix them.

The audience pulls 2 channels of video and 2 channels of audio and performs the stream mixing, which includes picture compositing, echo cancellation, noise reduction, and audio mixing, and then watches the composited image. If, before anchors A and B connect, C1 is A's audience and C2 is B's audience, then after the connection C1 sees A in the large window and B in the small window, while C2 sees B in the large window and A in the small window.
Advantages and disadvantages of audience-side mixing for anchor-anchor co-hosting:

Disadvantages: the audience side is under heavy pressure; viewers must pull multiple streams, which strains their downlink bandwidth; decoding multiple streams also carries high overhead.

Advantages: simple and easy to implement, can be built quickly; supports multiple co-hosts.

5. Advantages of Yunfan Accelerator's interactive live-streaming solution

5.1 Anchor-side stream mixing over a self-developed private UDP protocol, solving the anchor-side bandwidth bottleneck

Yunfan Accelerator's co-hosting solution, built on a private UDP protocol, retains the cost advantage of anchor-side mixing and, through technical innovation and extensive testing in practice, adds further technical strengths:

The upstream bandwidth bottleneck of anchor-side mixing is solved. The mainstream anchor-side mixing solution is implemented with 2 RTMP streams: before co-hosting, the anchor pushes 1 audio/video stream to the CDN network; after co-hosting begins, the anchor pushes 2 video streams and 2 audio streams (his own audio/video plus the co-host's audio/video) to the CDN network. The streams are mixed on the anchor side, and the upstream bandwidth consumed roughly doubles compared with before co-hosting. Yunfan Accelerator's anchor-side mixing solution instead pushes a single video stream over the private UDP protocol.
The differences from the mainstream solution are: the entire co-hosting interaction runs over a self-developed private UDP protocol; after the anchor mixes the streams, the composited video image replaces the video image of the anchor's original stream, so only one video stream is pushed to the CDN network; and after co-hosting begins, the only added uplink consumption is the co-host's audio.

Under typical network conditions in China, a 100 Mbps downlink usually comes with about 1 Mbps of uplink, or up to 4 Mbps in better cases. In Yunfan Accelerator's experience, the average bitrate of a 360p HD mobile audio/video stream is 864 kbps, of which video accounts for 800 kbps on average. The mainstream anchor-side solution pushes two streams after co-hosting begins (the anchor's audio/video stream and the co-host's audio/video stream), so its total push bitrate is about 1.73 Mbps. Yunfan Accelerator's anchor-side mixing solution always pushes one video stream and adds only the co-host's audio after co-hosting begins, for a total bitrate of about 0.93 Mbps, eliminating the uplink bandwidth bottleneck of the mainstream anchor-side mixing solution.

Reliable UDP transport: the anchor's upstream no longer relies on the TCP-based RTMP protocol but uses Yunfan's self-developed high-performance private protocol over UDP, with smarter and more efficient transport-layer QoS guarantees.

Adaptive bitrate and frame rate: an audio/video bitrate-adaptation algorithm driven by network state lowers or raises the bitrate and frame rate according to current packet loss and delay.
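The bandwidth arithmetic above can be checked directly. Assuming the stated averages (864 kbps total per stream, 800 kbps of it video, hence 64 kbps audio):

```python
# Average bitrates for a 360p mobile stream, as stated above (kbps).
STREAM_TOTAL = 864              # audio + video
VIDEO = 800
AUDIO = STREAM_TOTAL - VIDEO    # 64 kbps of audio

# Mainstream anchor-side mixing: push two full audio/video streams.
mainstream = 2 * STREAM_TOTAL   # 1728 kbps ~= 1.73 Mbps

# Yunfan's scheme: one mixed video stream plus the co-host's audio only.
yunfan = STREAM_TOTAL + AUDIO   # 928 kbps ~= 0.93 Mbps

print(round(mainstream / 1000, 2), round(yunfan / 1000, 2))  # 1.73 0.93
```

Both figures match the totals quoted above, and the Yunfan uplink stays comfortably below a typical 1 Mbps home uplink.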
This reduces network congestion and improves call quality.

Support for more anchor interactions: currently 2-person video, extensible to multiple anchors, multi-person video co-hosting, and multi-person audio-only co-hosting.

Large/small window switching: the anchor side supports customized switching between large and small streams and can freely choose which video occupies the large window.

5.2 Full-pipeline optimization details

Co-hosting technology spans four major technical areas: network, video, audio, and device adaptation. We have optimized all four.

5.2.1 Network optimization

Transport protocol: the solution supports TCP, the self-developed private UDP protocol, and private UDP + RTMP. We recommend the UDP or UDP+RTMP options. TCP works well when network conditions are controllable, but in large-scale, cross-network deployments with unstable terminal networks it leads to stutter, delay, and high disconnection rates.

Self-developed private UDP protocol: the entire UDP transport layer applies forward error correction (FEC) with intelligent protection to maximize real-time audio/video quality. In our actual tests, with the QoS strategies described here, audio/video calls tolerate 20% packet loss and 800 ms of network jitter.

HTTPDNS resolution optimization: resolution results for the domains used for publishing and playback are cached locally and pre-resolved, so a full DNS lookup is not needed for every publish or play. This saves tens to hundreds of milliseconds of startup delay.
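The FEC protection mentioned above can be illustrated with the simplest possible scheme: an XOR parity packet over a group of media packets, which lets the receiver rebuild any single lost packet without retransmission. The actual private protocol is unspecified; this is only a toy sketch with hypothetical names:

```python
from functools import reduce

def xor_parity(packets):
    """Build one XOR parity packet over equal-length media packets."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                        packets))

def recover(group, parity, lost_index):
    """Rebuild the single lost packet from the survivors plus parity."""
    survivors = [p for i, p in enumerate(group) if i != lost_index]
    # XOR of all survivors and the parity yields the missing packet.
    return xor_parity(survivors + [parity])

group = [b'\x01\x02', b'\x04\x08', b'\x10\x20']
parity = xor_parity(group)                    # sent alongside the group
rebuilt = recover(group, parity, lost_index=1)
print(rebuilt == group[1])  # True
```

Real FEC schemes (e.g. Reed-Solomon over interleaved groups) tolerate multiple losses per group, at the cost of extra redundancy bandwidth.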
Intelligent QoS guarantees: the publishing side packetizes and encodes audio/video according to current uplink conditions and selects transmission strategies to match the network. For example, when the network is very poor, it prioritizes audio so users can still hear the sound, and sends key-frame data at fixed intervals so users still see the picture change after each interval.

Network status feedback: the publisher's network status is monitored and reported back in real time.

Adaptive bitrate and frame rate: an audio/video bitrate-adaptation algorithm driven by network state lowers or raises the bitrate and frame rate according to current packet loss and delay, reducing congestion and improving call quality.

Weak-network publishing optimization: customized UDP packet headers enable transmission prediction, proactive packet-loss protection, and selective retransmission; the system decides whether to retransmit based on network conditions and packet type. The self-developed UDP congestion control raises or lowers the bitrate dynamically based on real-time packet feedback, keeping publishing smooth even on weak networks. If QoS on the publishing side degrades, the system detects it in real time and adapts the bitrate and frame rate, which can also be adjusted manually.

5.2.2 Video optimization

H.264 encoding with temporal layering: after extensive testing and tuning, it adapts better to network jitter.
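The adaptive bitrate strategy described above can be sketched as a step controller driven by packet loss and round-trip time. The thresholds and step sizes here are illustrative assumptions only, not the actual proprietary algorithm:

```python
def adapt_bitrate(current_kbps, loss_rate, rtt_ms,
                  min_kbps=200, max_kbps=1200):
    """One adaptation step: back off on loss/delay, probe upward when clean.

    Illustrative thresholds only; real controllers also smooth the
    measurements and adapt frame rate alongside bitrate.
    """
    if loss_rate > 0.10 or rtt_ms > 500:
        # Heavy loss or delay: cut bitrate sharply to relieve congestion.
        return max(min_kbps, int(current_kbps * 0.6))
    if loss_rate > 0.02:
        # Mild loss: back off gently.
        return max(min_kbps, int(current_kbps * 0.85))
    # Network is clean: probe upward slowly.
    return min(max_kbps, current_kbps + 50)

rate = 800
for loss, rtt in [(0.0, 80), (0.15, 600), (0.01, 90)]:
    rate = adapt_bitrate(rate, loss, rtt)
    print(rate)  # 850, then 510, then 560
```

The asymmetry (sharp decrease, slow increase) is the standard way to avoid re-triggering congestion right after recovering from it.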
Ultra-fast startup: millisecond-level first frame, no waiting to watch. Before audio/video decoding, the decoder type is preset to skip format detection, providing a smooth viewing experience even on poor networks.

5.2.3 Audio optimization

Cloud-based device and network adaptation: before publishing and playback, the terminal reports its device model, network status, and IP information via the protocol, and the device codec-adaptation library is continuously iterated and improved.

Hardware codec adaptation per device: through continuous improvement of the cloud device database, the cloud returns the most suitable codec strategy configuration.

5.3 Server deployment and intelligent scheduling

5.3.1 Global node deployment

· Multiple BGP and triple-line data centers in China, with 100% coverage of major Chinese cities;
· Overseas data-center nodes;
· Transnational dedicated fiber lines.

5.3.2 Intelligent node allocation

· Analysis of the user's region, carrier, and ISP for intelligent allocation;
· Real-time load monitoring of service nodes;
· Real-time network-status monitoring of service nodes.

5.3.3 Service availability and stability

· High availability: 99% high-availability architecture deployment;
· High-performance, stable physical machines;
· Automatic failure recovery and policy switching.

5.3.4 Self-developed live CDN system plus third-party CDN support

Yunfan Accelerator's interactive live-streaming solution is fully compatible with its self-developed live CDN system and also supports access to third-party CDN networks.

Summary: Since the second half of 2016, the live-streaming industry has changed greatly, and interactive co-hosting has become a standard feature of the industry.
As a services company in the streaming-media field, Yunfan Accelerator adheres to the principle of delivering greater value to customers, reducing the burden on enterprises, and providing one-stop live-streaming solutions for every industry.