Effective Risk Management in Data Centers

Data center managers are in a constant battle with risk. As well as squeezing the most computing capacity out of limited space, power, and cooling, they must ensure that those computing resources run without interruption. That means identifying and managing risks from many sources.

A standards-based risk management approach can help address this challenge. It can help data center managers prioritize their most significant risks and prepare for an audit of the data center or critical environment. So where do you start?

Understand the different types of risks

Before you can manage data center risk, it is important to understand the different categories of operational threats. Kevin Read is GIO UK Senior Delivery Center Manager at Capgemini, the French multinational IT consultancy, which owns and operates its own data center facilities to serve clients. He is primarily responsible for managing data center risk in his organization, and he points to several risk categories that data center managers worry about.

He warned: “The number one risk category for mission-critical data centers is power outages. This risk is present for every data center and the risk management framework incorporates it. Like many other data centers, Capgemini uses a rating scale to rate risk, which helps to reveal disruptive risks such as this.”

"Capgemini designed and built the facility to Tier 3 standards, using 'N+1' and 'N+N' redundant UPS systems to provide customers with uninterruptible power to rack equipment and cooling systems," said Read. "In addition, providing power to the data center from different sources can prevent local power failures, and the use of backup generators is a technical protection of last resort."

The second risk is fire, which can be caused by IT equipment failure inside the data center and can lead to service interruptions. He added that the company has deployed inert gas fire suppression systems in every room of all its data centers to extinguish fires before they spread.

"The third risk category covers flooding (from rising rivers and extreme weather such as lightning and rainstorms), aircraft, infectious diseases, and air pollution," he continued. "Data centers should not be built under flight paths, in flood-risk areas, or near factories that pollute or may contain explosive chemicals."

Finally, Read points to security as the fourth risk category. This includes the risk of physical security and logical security breaches (hacking). The company even includes terrorist threats in this risk category.

Like other categories of risk, security naturally breaks down into many subcategories, and these categories can be further differentiated. For example, within logical security, managers may consider employee access to applications as a specific risk area and mobile device access as another risk area.

Some risks come to the fore with the advent of new technologies. For example, virtualized applications pose a particular security risk, says Paul Ferron, director of security solutions at CA Technologies. What is often described as a management and resource risk can also affect data security, he adds.

"Virtual machines can be easily copied without proper security privileges," he warned. "They may not be shut down when the user is done using them."

In this case, as in many others, designing security processes for specific operations helps standardize the use of virtualization technology and reduces the risk of vulnerabilities being introduced. Using IT service management tools to codify and automate these processes reduces the risk further.

Matt Lovell, chief technology officer at cloud hosting company Pulsant, adds health and safety risks to the mix.

He warned that these risks are multifaceted: workers face challenges ranging from electrical work practices and machinery operation to environmental and noise control and working in confined spaces.

"It requires a lot of work measuring compliance and safety to ensure there is minimal risk to all those working in the environment," he said.

Risk Management Approach

These risks are not all created equal. Some are more threatening than others, and some have greater potential impact. So understanding what the priorities are from a budget perspective is an important part of this process.

Ferron recommends that data center managers use an updated version of the traditional risk management matrix, which assesses both the probability of a risk occurring and its potential business impact. "It will be a three-dimensional graph," he adds, with the third dimension indicating the estimated expenditure needed to mitigate each risk.
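The three dimensions Ferron describes (probability, business impact, and mitigation cost) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1-to-5 scales, the probability-times-impact score, the tie-breaking rule, and every risk entry and cost figure are hypothetical and are not taken from Ferron, Capgemini, or any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    probability: int         # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int              # assumed scale: 1 (minor) to 5 (severe)
    mitigation_cost: float   # estimated spend to mitigate, in any one currency

    def __post_init__(self):
        # Reject ratings outside the assumed 1-5 scale.
        for field_name in ("probability", "impact"):
            value = getattr(self, field_name)
            if not 1 <= value <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5, got {value}")

    @property
    def score(self) -> int:
        # Classic two-dimensional matrix score: probability multiplied by impact.
        return self.probability * self.impact


def prioritize(risks):
    """Order risks by score (highest first); cheaper mitigations win ties."""
    return sorted(risks, key=lambda r: (-r.score, r.mitigation_cost))


# Hypothetical register entries, for illustration only.
register = [
    Risk("Power outage", probability=2, impact=5, mitigation_cost=250_000),
    Risk("Fire", probability=1, impact=5, mitigation_cost=80_000),
    Risk("Unpatched systems", probability=4, impact=3, mitigation_cost=15_000),
    Risk("Flooding", probability=1, impact=4, mitigation_cost=40_000),
]

ranked = prioritize(register)
for risk in ranked:
    print(f"{risk.name}: score {risk.score}, mitigation cost {risk.mitigation_cost:,.0f}")
```

Keeping the mitigation cost alongside the score, rather than folding it into a single number, lets a budget holder see both how serious a risk is and what it would cost to address.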

Read's operation takes a similar approach, aimed at identifying and quantifying risks and their potential mitigation costs. Notably, his risk management system is designed as a document that evolves over time.

"At Capgemini, we have a monthly risk management system in place where all risks and issues are documented in containment and action plans. But this requires changes to the investment budget," he said.

While data centers face their own unique risks, the methods used to manage those risks are not specific to this environment. General risk management methods are as appropriate for describing and addressing risks in data centers as they are in other areas.

Lovell said that one such general risk management standard is ISO 31000:2009. This standard sets out general principles and guidelines for risk management and is designed to be tailored to whatever type of risk each user deems appropriate. It is primarily a risk management framework, but Lovell said it can also be used to audit risk prevention within the data center.

"The audit process must seek to determine whether the correct response procedures are in place and that these are rehearsed and understood by employees. This will change over time and therefore must be continually updated," he said.

Data centers do not function in isolation. They sit within a wider continuum of technology and business objectives, so data center risk management will form part of a wider risk management approach. Large companies in particular will examine a variety of risks, from financial to regulatory and organizational.

How data center risk fits into that wider picture varies from company to company. In Capgemini's case, the data center manager is responsible for the security of the facility and manages the monthly risk and issue process. The data center manager, along with the company's UK data center director, meets monthly with the CFO's team to anticipate any significant risk-related expenditure.

Data center compliance teams typically report to the board in some form, Pulsant's Lovell said. "This team has management and reporting responsibilities and accountability to board members. This may be different from other IT governance programs that may report through various projects or organizational structures," he said.

Lovell added that ideally, responsibility should be shared when it comes to managing risk and reporting on findings. "The advice is always to manage risk appropriately, and this should involve an independent level of management and validation outside of the operations team that monitors and delivers the data center services. This could be an independent internal or external governance team."

Choosing an audit method

The key word here is validation. Quantifying, prioritizing and mitigating risk are part of the risk management challenge, but measuring a data center’s performance in these areas is an important part of the process. Auditing risk will help internal staff and potential customers (if necessary) understand how various sources of risk are controlled in a data center’s operations.

Before selecting an audit to cover data center risk, managers must understand what they want to achieve. Is the risk audit customer-driven? If so, what specific criteria is the customer looking for? Are there specific data center risk management metrics that the customer wants to see met?

Organizations that help mitigate data center risk may also conduct audits. For example, Capgemini's data centers are regularly audited by its own group and by government customers, as well as by Capgemini's insurers, Read said.

Auditing Standards

One of the biggest challenges with risk audits is the diversity of risk categories involved. It is difficult to audit all of these risks under one standard, which means data center managers may need to apply a variety of standards when conducting their audits.

When considering security, the ISO 27002 standard provides a code of practice for information security management. It covers a variety of different aspects, including human resource security, physical and environmental security, and access control.

Information security is also covered by the Payment Card Industry Data Security Standard (PCI-DSS), a highly prescriptive standard that focuses on the handling and retention of credit card data in data centers. It covers the construction and maintenance of secure networks, the management of vulnerabilities, and network and system monitoring, among other things.

For commercial operators that handle government department information, additional audits may be required. In the UK, List X is the commonly understood security clearance system for contractors handling government data, while in the US, Facility Clearance Levels are the counterpart.

“From a health and safety perspective, many data center operators are working towards compliance with OHSAS 18001, the internationally recognised standard for health and safety management and related systems,” Lovell added.

Environmental audits often fall under the ISO 14001 standard. Data centers may wish to consider this audit standard alongside their environmental risks, and may adopt it if, for example, they need to store large amounts of diesel on site to run generators.
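Because no single standard covers every risk category, it can help to keep a simple lookup of which standards apply to which risk areas when scoping an audit. The sketch below uses only the standards named in this article; the grouping keys and the `standards_for` helper are illustrative assumptions, not an official taxonomy.

```python
# Standards named in this article, grouped by the risk area they address.
# The area labels are assumptions chosen for illustration.
STANDARDS_BY_RISK_AREA = {
    "risk management": ["ISO 31000:2009"],
    "information security": ["ISO 27002", "PCI-DSS"],  # PCI-DSS only if card data is held
    "government data": ["List X (UK)", "Facility Clearance Level (US)"],
    "health and safety": ["OHSAS 18001"],
    "environmental": ["ISO 14001"],
}


def standards_for(risk_areas):
    """Return the standards to consider for the given areas, without duplicates."""
    selected = []
    for area in risk_areas:
        # Unknown areas contribute nothing rather than raising an error.
        for standard in STANDARDS_BY_RISK_AREA.get(area.lower(), []):
            if standard not in selected:
                selected.append(standard)
    return selected


audit_scope = standards_for(["Information security", "Environmental"])
print(audit_scope)
```

A lookup like this is only a starting point for discussion; whether a given standard actually applies depends on the operator's customers, jurisdiction, and the data it handles.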

Stakeholders

There are often multiple stakeholders involved in defining and mitigating risk, said Gavin Millard, technical director at Tenable Cybersecurity, which sells software designed to scan networks for security threats. He breaks stakeholders down into three main groups: security teams, operations teams, and business units.

The problem is that they do not all share the same agenda, he warned. "As many organizations have discovered, the goals and needs of each group are often in conflict when it comes to defining the actions required to reduce risk for each specific group," he said.

What do these conflicts look like? One example involves software patching, which is one of the most effective ways to reduce security risks to an organization. In July 2013, Australia's security agency released a series of strategies to mitigate cyber intrusions. Patching operating systems is one of those measures, and patching applications is another. Together with application whitelisting and minimizing administrative privileges, these measures would prevent 85% of intrusions, the agency said.

The problem is that the IT security group is focused on eliminating holes in the system that attackers could break into, thereby reducing the risk of a data breach. This requires it to patch critical vulnerabilities quickly. Conversely, the IT operations team needs to minimize the risk of downtime, which means any changes to the system must be structured, planned, and controlled. This may lead the operations team to request a less frequent patching schedule to reduce availability risks.

Business managers in an enterprise have their own separate agenda: maintaining the bottom line and hitting their performance targets. So they only want patches deployed if the benefits to the bottom line outweigh the cost of getting the job done.
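The tension between the security and operations teams described above can be made concrete with a small scheduling sketch. It assumes a hypothetical policy in which security sets a maximum patch delay per severity level and operations only deploys in planned maintenance windows. The delay limits, the dates, and the `plan_patch` helper are all illustrative assumptions, not recommendations from Millard or any standard.

```python
from datetime import date, timedelta

# Hypothetical limits: the longest delay (in days) the security team will accept.
MAX_DAYS_BY_SEVERITY = {"critical": 7, "high": 30, "medium": 90}


def plan_patch(released, severity, windows):
    """Return (deployment_date, needs_emergency_change).

    The patch goes into the first planned maintenance window that falls on or
    before the security deadline. If no window qualifies, the deadline itself
    is returned and the patch is flagged as needing an emergency change.
    """
    deadline = released + timedelta(days=MAX_DAYS_BY_SEVERITY[severity])
    for window in sorted(w for w in windows if w >= released):
        if window <= deadline:
            return window, False
    return deadline, True


# Hypothetical maintenance windows agreed with the operations team.
maintenance_windows = [date(2024, 1, 20), date(2024, 2, 17)]
released = date(2024, 1, 10)

print(plan_patch(released, "critical", maintenance_windows))
print(plan_patch(released, "high", maintenance_windows))
```

In this sketch, the emergency flag is the point at which the two agendas collide: it shows when a patch cannot wait for the next planned window, so the exception can be agreed in advance rather than argued case by case.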

"Conflicting objectives can be difficult to resolve, but one of the most effective ways to do so is to have an efficient process that continually identifies where risks lie," Millard said. "Users also need a predictable, reliable way to update systems without impacting the organization's overall business goals."

Managing risk effectively, then, is not just about assessing threats to the data center. It also depends on the willingness of teams to collaborate so that all agendas can be accommodated. In some cases, this may create opportunities for new working practices.

Introducing the DevOps (development/operations) discipline to streamline the workflow between development, testing, and deployment could help ease tensions such as the one Millard describes.

As with most things in IT, effective risk management is as much a people-focused process as it is a technology-focused one. Using standardized methodologies and audits can help quantify the risks facing the data center and can inform future budgets. It always helps to measure what the data center must manage.
