Faced with accidents like these, is your operations and maintenance team ready?

With a loud bang, the data center went down

The data center of a Romanian bank was knocked out for about 10 hours. During a test of the fire protection system, the cylinders holding the inert fire-suppression agent were opened, and the agent was sprayed evenly into a confined space through hoses and nozzles. The gas was released at excessive pressure, producing an abnormally loud noise of more than 130 decibels. The resulting vibration affected the servers and data storage equipment, damaging internal components and paralyzing the bank's business. Data center equipment is in fact very sensitive to noise, and a sudden blast like this can easily cause internal electronic components to fail. (Expert comment: adding sound-dampening facilities to the data center provides a relatively quiet environment, which helps extend the service life of the equipment.)

Tragedy caused by an SUV

In November 2007, the Rackspace data center in Dallas, USA, suffered an unexpected disaster. The driver of a large SUV briefly lost consciousness due to diabetes while driving. The vehicle surged forward from a T-junction and struck a roadside embankment, which launched it into the air; it landed on the Rackspace data center building, hitting the power supply equipment. After a burst of fire and arcing, power to the data center was cut, paralyzing its business for several hours. Rackspace paid $3.5 million in compensation to customers for the accident and also faced an increased risk of losing customers. (Expert comment: data centers need a certain degree of earthquake, impact, and fire resistance to avoid such unexpected failures.)

Hurricane knocks out data center generators

In October 2012, the entire power supply system of a data center in Manhattan, New York, USA, failed when Hurricane Sandy tore through the borough. Multiple backup generators had been placed on the 18th floor of the building so they could provide continuous power without being affected by flooding. When the storm hit, however, it flooded the basement of the building and destroyed the fuel pumping system for the emergency generators; the circuits soaked in seawater immediately stopped working, and the backup power system failed. The data center normally ran on city power, but when the hurricane struck, the power grid for all of Manhattan went down. With both the main and backup supplies gone, the data center lost power and none of its application systems could operate.

Solar flare events

In 1989, a solar flare struck the Hydro-Québec power grid in Canada, causing voltage fluctuations that tripped the grid's protective equipment; the generator step-up transformer at a nuclear power plant was damaged beyond repair and could no longer provide service. Solar flares are the most violent form of solar activity, occurring on a cycle of roughly 11 years. The charged particles ejected at high speed as intense magnetic fields form are a devastating hazard for data centers and power grids. It is a low-probability event, but once it occurs, it is a fatal blow to a data center.

In the event of a natural disaster, are data center personnel helpless, or is there something they can do? When a fault strikes without warning, can the operations and maintenance staff really complete fault handling in the shortest possible time? Can systems really recover as quickly as planned? These are practical tests of a data center's fault-handling capability. Good training, comprehensive emergency plans, and regular drills all help in dealing with unexpected events.

Disaster recovery drills

Take as an example the business-level disaster recovery drill of the data center information system in which the entire Agricultural Bank of China participated. It comprised five stages: incident response, early-warning preparation, system recovery, business verification, and summary and rollback. Disaster recovery personnel at the head office and branches were assembled within 10 minutes. Within 75 minutes, nine core businesses were restored, including corporate banking, internal accounting, passbook, bank card, customer information, interbank, cash management, vault cash, and off-balance-sheet business, and business verification at 36 branch outlets across the country passed, with a verification success rate of 99.94%.

Be prepared for a rainy day

After Hurricane Sandy, it became clear that many companies had not paid enough attention to the fuel supply chain. Data center disaster plans put backup generators at the top of the list, and all the engineering, technology, and systems performed well, right up until the diesel fuel tragically ran out.

Even if companies can get support from fuel suppliers, they must also plan for transportation being paralyzed by the disaster just when deliveries are needed. The same problems can arise in other situations, such as earthquakes, hurricanes, and tornadoes that cause major damage to civilian infrastructure. A key lesson from Hurricane Sandy is the need to pay more attention to fuel supply chain redundancy, geography, and alternate transportation routes.

Staff training

When natural disasters occur, redundancy is necessary so that no single person becomes the key to business operations. However, when a once-in-a-century disaster strikes, arranging N+1 staffing in advance may not be enough. For disasters like Sandy, broader cross-training helps the data center work through major problems.
