Failed cutover knocks out 3G/4G/5G network communications

An equipment cutover by Japanese operator NTT DoCoMo caused a large-scale failure across its nationwide network. The outage angered many users, and even Japan's Minister of Internal Affairs and Communications had to step in and address it publicly.

This was reportedly meant to be a routine upgrade and replacement. The equipment being replaced was the server that stores user and location information for IoT terminal devices. Problems occurred while migrating the location information of approximately 200,000 IoT terminals from the old equipment to the new equipment.

The operator therefore initiated a rollback and reverted to the old equipment. The rollback itself was the crux of the problem: it caused a large number of IoT terminals to re-send location registration requests to the old server, and the resulting "signaling storm" quickly congested the network and paralyzed the 3G/4G/5G core network.
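To get a feel for why a mass re-registration is so damaging, here is a minimal sketch that compares all terminals retrying in the same second with the same terminals spreading their retries over a randomized window. The 200,000 figure comes from the report above; the server capacity, the ten-minute window, and the function name `peak_load` are assumptions made up for illustration, not details of DoCoMo's network.

```python
import random
from collections import Counter

def peak_load(num_terminals, jitter_window_s, seed=0):
    """Return the largest number of registration requests landing in any one second.

    Each terminal picks a whole-second delay uniformly from [0, jitter_window_s).
    A window of 1 means every terminal retries in the same second (no jitter).
    """
    rng = random.Random(seed)
    arrivals = Counter(rng.randrange(jitter_window_s) for _ in range(num_terminals))
    return max(arrivals.values())

TERMINALS = 200_000        # approximate number of IoT terminals in the report
ASSUMED_CAPACITY = 5_000   # hypothetical requests per second the server can absorb

storm = peak_load(TERMINALS, jitter_window_s=1)     # everyone at once
spread = peak_load(TERMINALS, jitter_window_s=600)  # spread over ten minutes

print(f"no jitter:     peak {storm}/s, overloaded={storm > ASSUMED_CAPACITY}")
print(f"10 min jitter: peak {spread}/s, overloaded={spread > ASSUMED_CAPACITY}")
```

Without jitter, all 200,000 requests land in the same second. Spread over ten minutes, the average drops to roughly 333 per second. Real terminals and core networks are more complicated (timeouts and repeated retries can amplify the load further), so treat this only as an intuition for the scale of the problem.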

What is hard to understand is that this upgrade, cutover, and rollback took place during the late-afternoon peak on a weekday. (I'm surprised that Japan doesn't require cutovers to be done at night.)

Starting at around 5 p.m. on October 14, 2021, a network failure made DoCoMo's voice call and data communication services difficult to use.

At 7:57 p.m. on October 14, 2021, the operator carried out emergency network operations and service began to recover gradually, but because of network congestion some customers were still unable to connect to the network.

At 5:05 a.m. on October 15, 2021, the 5G and 4G networks returned to normal, but the 3G network remained difficult to use in some areas while restoration work continued. The operator advised users who had 4G plans but were stuck on a 3G signal to restart their phones so they could reconnect to the 4G network and regain normal service.

On the afternoon of October 15, 2021, the vice president of NTT DoCoMo said at a press conference that "no clear time can be given" for restoring the 3G network, explaining that the outlook was unclear.

NTT DoCoMo's management publicly apologized, expressed deep regret for the inconvenience the incident caused to customers and many others, and said it would work hard to prevent a recurrence.

Well, in Japan, there is no problem that cannot be solved by bowing. If there is, have three people bow together!

After the incident, Japan's Minister of Internal Affairs and Communications said at a press conference following the cabinet meeting:

It is regrettable that a large-scale failure occurred in the mobile network, an important piece of infrastructure that affects people's daily lives. The Ministry of Internal Affairs and Communications takes this matter very seriously and has asked NTT DoCoMo to promptly investigate and report the cause and extent of the incident so as to provide users with a full explanation. We hope that NTT DoCoMo will fulfill its social responsibility and take all possible measures to prevent similar incidents from happening again.

A slap on the wrist, the proverbial "three cups of wine as punishment", and that's that!

Lessons:

Although this incident happened in neighboring Japan, there are still lessons to learn from it. Today's mobile network is infrastructure on a par with water, electricity, and gas. Especially in the 5G era, when it serves the industrial Internet, coal mines, hospitals, and more, a network failure is by no means a trivial matter.

1. Upgrades and cutovers must not be performed during busy hours.

A busy-hour cutover is almost unthinkable in our country. Doing the work late at night has been an iron rule in the communications industry for the past 20 years. Thanks to our "communication night walkers" for their hard work.

2. The network must have sufficient redundancy and backup mechanisms.

Network conditions are never fully predictable. The most reliable way to keep the network free of problems is redundancy and backup, from A/B backup pairs to clustered pools, applied throughout the core network, transmission network, and access network. This inevitably increases investment, but it is necessary for a quality network.

3. The core network is of paramount importance.

Other failures generally affect local areas, while a core network failure affects the entire network. In addition to ensuring redundancy and backup, the network architecture should be upgraded as soon as possible. The service-based architecture (SBA) of the 5G SA core network can help keep the network running safely while saving as much investment as possible.
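The first lesson is the easiest one to enforce mechanically. As a rough sketch, a change-management script could refuse to start a cutover outside an agreed low-traffic window. The midnight to 5 a.m. window and the function name `in_maintenance_window` below are hypothetical; a real operator would derive the window from its own traffic data.

```python
from datetime import datetime, time

# Hypothetical low-traffic window, chosen only for illustration.
WINDOW_START = time(0, 0)
WINDOW_END = time(5, 0)

def in_maintenance_window(ts, start=WINDOW_START, end=WINDOW_END):
    """Return True if ts falls inside [start, end), including windows that cross midnight."""
    t = ts.time()
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, for example 23:00 to 05:00.
    return t >= start or t < end

# The incident began at around 5 p.m. on a weekday.
print(in_maintenance_window(datetime(2021, 10, 14, 17, 0)))  # False
# A 2:30 a.m. start would have been inside the assumed window.
print(in_maintenance_window(datetime(2021, 10, 15, 2, 30)))  # True
```

A check like this does not make a cutover safe on its own, but it keeps a failed change, and any rollback that follows it, away from the hours when the network is busiest.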
