Three o'clock in the morning, staying up late, backing up data, nerves on edge, checking every step, service interruption, testing and verification, emergency rollback, the big boss present in person... This is a cutover scene that every telecom engineer knows well.
Every software version upgrade, especially in the core network, affects the entire network. A single mistake can bring the whole network down, which makes upgrades one of the most painful tasks in network operation and maintenance. Why is upgrading software so hard for the telecommunications industry when the Internet giants make it look easy? Amazon deploys 10,000 software packages every hour, Google changes half of its source code every month, and WeChat has been upgraded N times. How can they upgrade so frequently without users noticing?

In the 5G era, a wave of new applications is emerging, such as high-definition live streaming, VR/AR, cloud gaming, industrial control, drones, autonomous driving, and telemedicine. The volume of service demands and the pace at which they change exceed anything seen in the 2G, 3G, or 4G eras. Network functions must be upgraded quickly as service needs change, and services must be able to go through rapid trial and error. Faced with these demands, can the telecommunications industry upgrade software as easily and as often as the Internet giants?

Yes, with grayscale upgrade. Grayscale upgrade, also known as grayscale release, is a release method that transitions smoothly between black and white. It allows A/B testing: some users continue to use product feature A while others start using product feature B. If users have no objection to B, the scope is gradually expanded until all users have been migrated to B. Grayscale release keeps the overall system stable, because problems can be discovered and corrected in the initial grayscale stage, which limits their impact. The period from the beginning to the end of a grayscale release is called the grayscale period. By reducing the risk of service changes through small-scale, rapid trial and error, grayscale release has become one of the basic capabilities for releasing online services in the Internet industry.
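To make the idea of gradually expanding the scope concrete, here is a minimal Python sketch of one common way to split users between feature A and feature B. It is an illustration only, not a description of any vendor's implementation: the function names `bucket`, `choose_version`, and `users_on_b`, the 100-bucket scheme, and the use of a SHA-256 hash are all assumptions made for this example.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in the range [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_version(user_id: str, rollout_percent: int) -> str:
    """Return "B" for users inside the grayscale share, otherwise "A"."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    return "B" if bucket(user_id) < rollout_percent else "A"


def users_on_b(user_ids, rollout_percent):
    """Collect the users that a given rollout percentage sends to B."""
    return {u for u in user_ids if choose_version(u, rollout_percent) == "B"}
```

Because each user always lands in the same bucket, raising `rollout_percent` only adds users to B and never moves anyone back to A, which matches the "gradually expand the scope" behavior described above. Setting the percentage back to 0 returns everyone to A.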
So how can grayscale release be achieved in the telecommunications industry? As the core network evolves toward 5G, Huawei has built on concepts such as Cloud Native to combine the characteristics of telecommunications services with advanced IT practices, creating what it describes as the industry's only 5G core network grayscale upgrade solution. The solution helps 5G services go online quickly and reliably, and it has achieved the industry's first commercial deployment. Let's take Huawei's grayscale upgrade solution as an example and see how it is implemented.

Microservice software architecture makes releases and upgrades quick and easy

Before turning to microservices, let's take a brief look at the evolution of network cloudification. As shown in the figure above, network cloudification has evolved from network function virtualization (NFV) to Cloud Native, which is broadly a process of "decomposition" followed by "further decomposition".

NFV decouples traditional dedicated network devices, in which software and hardware are integrated, into software (VNFs) and general-purpose hardware. Network function software is no longer tied to dedicated hardware and can be deployed flexibly on general-purpose hardware, so operators can launch new functions and new services through software upgrades.

But this is not enough. A VNF separated from dedicated hardware is still a coarse-grained telecommunications software package, very large and complex, with millions of lines of code. As a result, the whole process from development through release and testing is extremely labor-intensive and is estimated to take about a year, which makes it impossible to respond quickly to the rapidly changing needs of 5G services. What should we do?
Based on cloud-native design principles, a coarse-grained VNF can be further decomposed into multiple fine-grained microservices, such as access management, session management, database management, and interface management microservices. Microservices are not only small but also have independent lifecycle management, which allows software to be developed, released, tested, and upgraded at a finer granularity. This improves operational agility, accelerates innovation and the launch of new services, and helps operators keep up with fast-changing markets and services.

If traditional telecommunications software is like woodblock printing, where a single wrong character means the entire page must be scrapped at great cost in time and labor, then microservices overturn that architecture completely. They are like movable type printing: an individual error no longer forces the whole page to be redone, and efficiency increases many times over.

Huawei's grayscale upgrade solution is based on a microservice software architecture. It decomposes large software packages into independent small software modules, and versions are developed, tested, and released microservice by microservice. Each microservice has its own version number. During an upgrade, the version numbers of each microservice in the source version and the target version are compared automatically, and only the microservices that have changed are upgraded. This makes small, fast, and flexible incremental software releases possible.

Peripheral equipment is unaware of the upgrade, and upgrades can be performed even in broad daylight

Microservices solve the problem of telecommunications software being too large and complex, but to make network functions more resilient and robust, stateless design and three-layer decoupling of the software are also required.
Note that the three-layer decoupling here does not refer to the NFV "three-layer decoupling" of the hardware layer, virtualization layer, and VNF layer. It refers to dividing the software itself into a stateless three-layer architecture.

Traditional telecom network elements always store the context information of the UEs they serve in order to keep connections uninterrupted. This is called a stateful design, and it hinders the flexible migration of telecom software between virtual machines or containers. To address this, the industry has borrowed the stateless design used in IT software: context information is separated from the service software and placed in an independent state database layer, so that the service processing software (the VNF component) becomes an agile, elastic, stateless service processing layer. In addition, an independent load balancing software module at the interface layer distributes load quickly and effectively across the service processing software, supporting high throughput for the system as a whole. Telecommunications software is thus divided into three layers: the data layer, the service processing layer, and the load balancing layer.

Huawei's grayscale upgrade solution is built on this stateless design and three-layer software decoupling.

Traditionally, telecom software version releases are exclusive: only one software version can exist at a time. In addition, to keep service lossless during an upgrade, a great deal of time must first be spent migrating the users on the devices to be upgraded to other devices in the pool. This requires evaluating the software and hardware resources, service models, and other characteristics of those online devices, and making coordinated configuration changes on surrounding wireless, data communication, and other equipment.
If users cannot be migrated within the pool, a direct upgrade must be used. It has to be performed at night, interrupts service for more than 30 minutes, and cannot be lossless.

Huawei's grayscale upgrade solution breaks the black-or-white rule of software version releases. Based on three-layer software decoupling and the stateless design principle, it makes session data compatible across versions, lets service or microservice instances of multiple versions coexist, and uses load balancing and intelligent service distribution to keep external network devices unaware of the change. Two software versions can exist in the system at the same time, and the system moves to the final target version through gradual rolling upgrades. There is no need to prepare equipment for user migration in advance, and no need to make coordinated changes on wireless, data communication, or other equipment. Upgrades can be performed at any time of day with zero service interruption.

User migration is divided into batches, and service changes are low-risk

After a traditional telecom software upgrade, all users are exposed to the new version at once. If a problem occurs, every user's service is affected and the loss is immeasurable. The migration of users between versions should therefore not be done all at once.

In a grayscale upgrade, the system supports service dial tests on the new version, which reduces or removes the need for testbed testing. Test users are used to exercise both existing and new features. If a dial test reveals a problem, the upgrade can be rolled back. The rollback only removes the dial test users, and other users are not affected.

Small-scale trial migration and batch migration are both supported. A small number of users can be migrated in the first batch to verify that the migration process is correct.
If there is a problem, a rollback can be performed, and only those users are affected. Between later batches, the old and new versions can coexist for an extended period while the service is observed, and the next batch is migrated only if no problem appears, which further improves the reliability of the upgrade process.

In this way, grayscale release solves the difficulties of traditional upgrades and cutovers and reduces the risk of service changes. It enables the network to respond quickly and robustly to diverse and rapidly changing 5G services, further promotes the digitalization of the 5G industry, and lets the industry move toward 5G in small, fast steps.
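The batch migration with rollback described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Huawei's implementation: the function `migrate_in_batches`, the `is_healthy` callback, and the `"old"`/`"new"` labels are hypothetical names, and the choice to stop after the first failed batch while leaving earlier batches on the new version is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Sequence


def migrate_in_batches(
    users: Sequence[str],
    batch_sizes: Sequence[int],
    is_healthy: Callable[[List[str]], bool],
) -> Dict[str, str]:
    """Move users from "old" to "new" one batch at a time.

    After each batch is moved, is_healthy(batch) is consulted (a stand-in
    for the observation period). If it returns False, only that batch is
    rolled back to "old" and the migration stops. Earlier batches stay on
    "new", and users not yet migrated stay on "old".
    """
    assignment = {u: "old" for u in users}
    start = 0
    for size in batch_sizes:
        batch = list(users[start:start + size])
        if not batch:
            break  # no users left to migrate
        for u in batch:
            assignment[u] = "new"
        if not is_healthy(batch):
            # Roll back only the users in the failing batch.
            for u in batch:
                assignment[u] = "old"
            break
        start += size
    return assignment
```

For example, with ten users and batch sizes of 1, 3, and 6, a health check that fails on the second batch leaves only the first user on the new version. The three users in the failing batch are returned to the old version, and the remaining six are never touched.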