What basic principles should be followed to improve data center operations planning?

Enterprises need an effective, adaptable plan to operate data centers successfully, along with specific principles that guide IT staff to think through their operational goals and how to achieve them. In many enterprises, however, most of the planning effort goes into structural design and development, and once those initial goals are met, the ultimate operational goals are often forgotten.

Data center operations are receiving more and more attention. To improve operations planning, keep the following five basic principles in mind:

Principle 1: Experience is the best teacher

As with many important things in life, staff should reflect on the operational mistakes they have made in their careers and how to avoid repeating them, and let these lessons lay the foundation for future operations. In some cases, the mistake may be that the company's staff did not operate the equipment effectively, or that a vendor's service did not meet the company's requirements. Either way, experience in supporting mission-critical environments shows that data center operational excellence is a comprehensive, ongoing process that rests on the following:

  • Efficient facility design
  • Effective post-handover and ongoing training
  • Use of the right tools

Principle 2: Design from the perspective of operators

Effective operations planning begins with the operator's point of view, or more simply with the question, "What does success look like?" The question seems to call for a simple answer, but the answer usually turns out to be a compilation of answers to a number of supporting questions.

Clearly, the facility itself needs to be optimized for effective maintenance and troubleshooting; concurrent maintainability, for example, is essential for Tier III data centers. Procedures should be simple and clear from the perspective of the operations staff, not the engineers. Poka-yoke (error prevention), a term introduced by Japanese quality management experts, describes the ultimate goal of developing processes and procedures well. It is a method of using automatic actions, alarms, reminders, and similar mechanisms to prevent operators from making mistakes through negligence or misoperation, so that human error is minimized. With more than 70% of outages still traceable to operator error, there is a long way to go in simplifying procedures.

Continuous change should be embraced and adapted to. If for no other reason than that the average data center undergoes a hardware refresh every 3-5 years, the data center is a dynamic environment, and “we have always done it this way” does not serve the goal of continuous improvement. Feedback loops are an effective mechanism for eliminating unnecessary steps and identifying more efficient ways to perform operations.

Principle 3: Flexibility and Control

Although flexibility and control may sound broad, the concept is simple. In particular, a supplier's staffing must be aligned with the rhythm of the business, and operational requirements must be built around the specific needs of the business. The same principle applies to operations personnel and to safe staffing levels.

Principle 4: Training and Certification

Developing talent is a goal of continuous improvement. Continuously raising the level of professional knowledge not only motivates staff but also improves their overall skill level and ensures reliable operations.

Developing a more confident, capable, and effective operations and maintenance workforce requires a role-based training program that includes:

  • Formal courses
  • Objective measurement of understanding
  • An ongoing process of constant updating and improvement

The objective of this program should be to build a foundation of "subject matter experts" with increasing levels of certification, based on:

  • Difficulty of the process
  • Importance
  • Performance

Principle 5: Focus on Eliminating Errors

In the past, technicians held a flashlight in one hand and a technical manual in the other while trying to diagnose and repair equipment problems. This approach is not conducive to solving problems quickly and effectively, yet it remains the standard operation and maintenance mode in many existing data centers. Obviously, it leaves countless opportunities for human error.

There are many ways to reduce these errors. One is to use technology solutions that convert all procedures into digital checklists. Accessed via tablets and phones, these checklists include alerts about dangerous steps and give access to videos, images, and documentation for on-site reference. Technicians must confirm completion of each step before moving on to the next, which greatly reduces the possibility of human error.
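As a rough illustration of how such a checklist can enforce step order, here is a minimal sketch in Python. It is not taken from any particular product: the `Step` and `Checklist` classes, the `acknowledge_hazard` flag, and the sample procedure are all hypothetical names chosen for this example. The idea is simply that a step cannot be confirmed out of order, and a dangerous step cannot be confirmed without an explicit acknowledgement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    description: str
    hazardous: bool = False  # hazardous steps need an explicit acknowledgement


class OutOfOrderError(Exception):
    """Raised when a step is confirmed out of sequence."""


@dataclass
class Checklist:
    name: str
    steps: List[Step]
    completed: int = 0  # number of steps confirmed so far

    @property
    def done(self) -> bool:
        return self.completed == len(self.steps)

    def next_step(self) -> Optional[Step]:
        """Return the step awaiting confirmation, or None when finished."""
        return None if self.done else self.steps[self.completed]

    def confirm(self, index: int, acknowledge_hazard: bool = False) -> None:
        """Confirm step `index`; only the current step may be confirmed."""
        if self.done or index != self.completed:
            raise OutOfOrderError(
                f"expected step {self.completed}, got {index}"
            )
        step = self.steps[index]
        if step.hazardous and not acknowledge_hazard:
            raise PermissionError(
                f"hazard not acknowledged: {step.description}"
            )
        self.completed += 1


# Hypothetical procedure, for illustration only.
procedure = Checklist(
    "Example maintenance procedure",
    [
        Step("Verify the load is on the alternate power path"),
        Step("Open the output breaker", hazardous=True),
        Step("Confirm the unit reports maintenance mode"),
    ],
)
procedure.confirm(0)
procedure.confirm(1, acknowledge_hazard=True)
procedure.confirm(2)
```

Raising an exception rather than silently ignoring a skipped step follows the error-prevention idea: the tool makes the wrong action impossible instead of relying on the technician to notice it.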

Operations planning is a critical and often overlooked element of running a data center. Effective business processes and procedures do not come from strict adherence to past operating models. Developing an effective and adaptable plan for successful data center operations requires specific principles that guide IT departments to fully consider their operational goals and the effort required to achieve them.
