The majority of downtime incidents over the past year had known causes and were preventable through strong design and processes.
According to findings published by the Uptime Institute in summer 2018, nearly a third of data centers experienced an outage in the past year, up from 25% in 2017. But the increase wasn't caused by some deadly new malware. Instead, the top three causes of downtime were power outages (33%), network failures (30%), and IT or software errors (28%). Most importantly, 80% of data center managers say these downtime events were preventable.

There is no way to prevent a lightning strike (such as the one that knocked a Microsoft Azure data center in San Antonio offline in September 2018) or a zero-day malware attack. However, with proper planning and data center design, downtime due to unexpected weather events, attacks, routine human error, or unscheduled system failures can be minimized.

Getting a data center back up and running quickly after an outage is equally important. According to a report this year by Information Technology Intelligence Consulting, one hour of downtime costs data center operators an average of $260,000, while five minutes of downtime costs only $2,600.

Infrastructure redundancy still works

At the most basic level, data center systems need to be backed up: the power supply, the main cooling system, the data, or even the entire data center. Uptime Institute says that many enterprises require data centers with 2N cooling and power architectures, in other words, a fully redundant, mirrored system. Among those users, 22% had an outage in the past year. That is one-third fewer than among users of the cheaper, less redundant "N+1" approach, 33% of whom reported downtime incidents.

Backing up the entire data center can provide even higher reliability. According to Uptime survey data, 40% of data center managers said they replicate workloads and data across two or more data centers.

"If you have one data center and there's a lightning strike, you're going to go down," said Markku Rossi, CTO of SSH Communications Security.
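A rough way to see why extra redundancy pays off is a simple k-out-of-n model. The sketch below is an illustrative assumption, not Uptime Institute's methodology: it treats all units as interchangeable and failing independently, so it captures only individual unit failures and ignores the separate distribution paths and shared dependencies (such as a common power feed) that matter in real 2N designs. The unit counts and failure probabilities are hypothetical.

```python
from math import comb


def p_capacity_shortfall(needed: int, installed: int, p_fail: float) -> float:
    """Probability that fewer than `needed` of `installed` units are working.

    Simplified k-out-of-n model: every unit is interchangeable and fails
    independently with probability `p_fail`.
    """
    spare = installed - needed
    # A shortfall occurs only when more units fail than there are spares.
    return sum(
        comb(installed, k) * p_fail**k * (1 - p_fail) ** (installed - k)
        for k in range(spare + 1, installed + 1)
    )


def p_all_sites_down(p_site_down: float, sites: int) -> float:
    """Probability that every site is down at the same time.

    Assumes the sites fail independently, which physical separation and
    separate power sources are meant to approximate.
    """
    return p_site_down**sites


if __name__ == "__main__":
    needed = 4     # hypothetical: units required to carry the full load
    p_fail = 0.01  # hypothetical per-unit failure probability
    for label, installed in [("N", needed), ("N+1", needed + 1), ("2N", 2 * needed)]:
        shortfall = p_capacity_shortfall(needed, installed, p_fail)
        print(f"{label:>3}: tolerates {installed - needed} failed unit(s), "
              f"P(shortfall) = {shortfall:.2e}")
    # Hypothetical 1% chance that any single site is down: one site vs. two.
    for sites in (1, 2):
        print(f"{sites} site(s): P(all down) = {p_all_sites_down(0.01, sites):.2e}")
```

The two-site figure holds only if the sites really fail independently; a shared power source or other common dependency can take both down at once, which is exactly what the simple product p^sites cannot capture.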
"You should have a secondary data center where there's physical separation between them, so they're not dependent on the same power source," Rossi said. He added that no data center is immune to the problem, citing the lightning strike at Microsoft Corp.'s South Central U.S. data center. "If you have a second data center, you can fail over immediately," he said.

Rossi added that planning and testing are key regardless of where backup systems are located, and that planning needs to take into account the complexity of today's data centers, where one problem can trigger others. He gave the example of a recent outage that occurred during maintenance at GitHub's data center: the physical problem was fixed in minutes, but it took 24 hours for the data to sync correctly. Data center managers need to pinpoint potential problem areas and then have tools and processes ready for when something happens. "Focus on building processes and a mindset that allows you to prepare for failure," Rossi said.

Strengthen security beyond the perimeter

One of the biggest lessons data center managers should take away from recent malware-related outages is that a hardened perimeter is no longer enough. Attackers will get in. Healthcare companies, government agencies, educational institutions, and major manufacturers were all hit in 2018, even though everyone should have been on high alert after last year's record-breaking breaches.

Obviously, it is critical to keep defenses up to date to keep malware out in the first place. But data center managers must also be prepared for perimeter defenses to fail, and have secondary protections in place. These include mechanisms for detecting malicious traffic, network defenses such as segmentation, and least-privilege access and communication policies.
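The secondary protections listed above can be illustrated with a toy policy check. This is a minimal sketch, not any vendor's product or API: the host names, segment labels, and allowed ports are hypothetical, and real segmentation is enforced in firewalls, hypervisors, or host agents rather than in application code. It shows a default-deny allowlist between segments that blocks server-to-server SMB (TCP port 445), plus a simple detector that flags any host whose blocked SMB attempts fan out to several other servers.

```python
from collections import defaultdict
from dataclasses import dataclass

SMB_PORT = 445  # SMB over TCP


@dataclass(frozen=True)
class Flow:
    src: str   # source host
    dst: str   # destination host
    port: int  # destination TCP port


# Hypothetical host-to-segment mapping; a real deployment would pull this
# from an asset inventory or orchestration system.
SEGMENTS = {
    "web-1": "web", "web-2": "web",
    "app-1": "app",
    "db-1": "db",
    "fs-1": "fileserver",
}

# Least-privilege allowlist of (source segment, destination segment, port).
# Anything not listed is denied, including server-to-server SMB.
ALLOWED = {
    ("web", "app", 8443),
    ("app", "db", 5432),
}


def is_allowed(flow: Flow) -> bool:
    """Default-deny check: permit a flow only if its segment pair and port are listed."""
    src_seg = SEGMENTS.get(flow.src)
    dst_seg = SEGMENTS.get(flow.dst)
    if src_seg is None or dst_seg is None:
        return False  # unknown hosts get no access
    return (src_seg, dst_seg, flow.port) in ALLOWED


def flag_smb_fanout(flows, threshold: int = 3) -> set:
    """Return sources whose denied SMB attempts reached at least `threshold` distinct hosts.

    Worm-like lateral movement often shows up as one host probing many
    others on the same port, so fan-out is a cheap first signal.
    """
    targets = defaultdict(set)
    for f in flows:
        if f.port == SMB_PORT and not is_allowed(f):
            targets[f.src].add(f.dst)
    return {src for src, dsts in targets.items() if len(dsts) >= threshold}


if __name__ == "__main__":
    observed = [
        Flow("web-1", "app-1", 8443),  # permitted by the allowlist
        Flow("app-1", "db-1", 5432),   # permitted by the allowlist
        Flow("web-2", "app-1", 445),   # denied: SMB between servers
        Flow("web-2", "db-1", 445),
        Flow("web-2", "fs-1", 445),
        Flow("web-2", "web-1", 445),
    ]
    for flow in observed:
        print(flow, "ALLOW" if is_allowed(flow) else "DENY")
    print("Possible lateral movement from:", flag_smb_fanout(observed))
```

If some server genuinely needs SMB, such as a file server, it would get its own narrow entry like ("app", "fileserver", 445) rather than a broad exception; flows permitted that way are not counted by the fan-out detector, which only looks at blocked attempts.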
Such secondary protections could help prevent malware from spreading once it enters a network, or at least slow it down enough to give security teams a chance to respond, said Igor Livshitz, director of product management at Israel-based cybersecurity company GuardiCore. WannaCry, for example, exploited a vulnerability in the Server Message Block (SMB) protocol, and Livshitz said data centers should do more to reduce lateral communication between servers.

"In many cases, like the WannaCry ransomware over the past year, the primary driver of the attack's widespread impact was the ease with which these worms could spread once they gained a foothold within a data center," Livshitz said. "In fact, SMB traffic between servers is not necessary at all. If it had been blocked, the spread of the attack and the damage to the data center could have been greatly reduced, and the attack could have been detected at an earlier stage, before it caused so much damage."

The lesson from the breaches of 2018 is that, whatever new threats data center managers must confront, they need to get back to basics. Nearly all data center outages are the result of poor planning and investment decisions, combined with poor processes or a failure to follow them, Andy Lawrence, executive director of research at Uptime Institute, wrote in the June 2018 survey report. "Almost all outages reported or studied by Uptime Institute have occurred before and are often well documented."

Lightning strikes and new types of malware may dominate industry headlines, but when it comes to resiliency, the security of your data center infrastructure remains paramount.