In the early days of the Internet, data centers were small and simple: a large e-commerce operation might deploy all of its servers, storage, and network equipment in just a few 19-inch racks. Today's hyperscale data centers house tens of thousands of hardware devices across thousands of racks, and are typically built either near large population centers or in remote areas with cheap electricity. As data center operations become more automated, public cloud providers such as AWS and Microsoft Azure employ fewer and fewer senior data center engineers, who are often outnumbered by security staff and general technical workers. Fewer people managing more servers means that monitoring power and cooling infrastructure depends increasingly on sensors, now commonly packaged as IoT hardware. This hardware can identify problems to a degree, but in many cases sensors cannot replace experienced facility engineers, who can, for example, tell from the sound of running equipment which fan is about to fail, or locate a leak by the sound of dripping water.
[Image: A rack of servers powered by Google's custom Tensor Processing Units (TPUs) for machine learning.]

Data center managers need more sensors to monitor modern data center infrastructure, and a new generation of applications aims to fill this gap by applying machine learning to IoT sensor networks. The idea is to turn operational experience into rules that help sensors distinguish between sounds and images, adding a new automated management layer that can predict and prevent infrastructure failures. "Faster recovery times and more efficient capacity provisioning can also reduce data center risk," said Rhonda Ascierto, an analyst at 451 Research.

Combining DCIM and diverse data

The first step is to bring predictive analytics to data center infrastructure management (DCIM) software. Take, for example, software from Vigilent, a company based in Oakland, California. "The control system is based on machine learning software that determines relationships between variables, such as rack temperature, cooling unit settings, cooling capacity, cooling redundancy, power consumption, and risk of failure. It regulates the cooling units by turning them on and off, adjusting variable frequency drives (VFDs) up and down, and adjusting the units' temperature setpoints," Ascierto said. The system uses wireless temperature sensors and predicts what will happen if the operator takes a given action, such as shutting down a cooling unit or raising a setpoint temperature.

Another example is Infinite, from Oneserve, a UK company based in Exeter, which combines sensor readings with other data points, such as weather conditions, to provide what it calls "predictive field service management." The aim is to predict maintenance requirements, avoid failures, and minimize downtime. Chris Proctor, CEO of Oneserve, said that by applying these technologies, strategic planning and procurement can be handled at the same time.
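The "what-if" prediction Ascierto describes can be sketched in miniature: fit a simple model of rack inlet temperature against cooling setpoint from historical sensor readings, then check whether a proposed setpoint change would push a rack past a safe limit. This is an illustrative sketch only, not Vigilent's actual system; the sample data, function names, and the 27 °C limit are all made up for the example.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (pure stdlib)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

# Historical (cooling setpoint °C, rack inlet temperature °C) samples.
history = [(18, 22.1), (19, 23.0), (20, 24.2), (21, 25.1), (22, 26.0)]
a, b = fit_linear([s for s, _ in history], [t for _, t in history])

def predict_rack_temp(setpoint):
    """Predicted rack inlet temperature if the setpoint were changed."""
    return a * setpoint + b

# What happens if the operator raises the setpoint to 24 °C?
SAFE_LIMIT = 27.0  # illustrative inlet-temperature limit
predicted = predict_rack_temp(24)
print(f"predicted inlet temp at setpoint 24: {predicted:.1f} C")
print("within limit" if predicted <= SAFE_LIMIT else "RISK: limit exceeded")
```

On this toy data the model predicts about 28.0 °C at a 24 °C setpoint, so the change would be flagged as risky before any unit is touched; a production system would of course model many more variables (cooling capacity, redundancy, power draw) jointly.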
"Data centers will be able to manage assets and resources more accurately and efficiently." (This capability is not yet available in any data center.) Oneserve is more concerned about maintenance issues, tracking past maintenance issues and allowing users to detail where the problem occurred each time. At present, this is still a very time-consuming and laborious manual operation method, but in the future, staff will use this data to train machine learning systems. Mining human knowledge An example of combining sensor data with operational experience is LitBit in San Jose. According to Scott Noteboom, the company's founder and CEO, they once provided data center strategy for Yahoo and Apple. LitBit's data center artificial intelligence or DAC (digital-to-analog converter) allows operators to train and adjust machines, learn from staff, and gain the ability to respond to events in the data center, thereby alerting operators or eventually performing operations automatically. The key to LitBit's approach is to use a form of assisted learning. When the system detects a new abnormal event, the system will alert the operator, and the operator will then develop a set of rules for responding to these events in the future. To collect data, LitBit has a mobile application that can accept video and then convert it into thousands of images for training. The startup offers a managed cloud service that can leverage anonymized data from many users to build more complex and accurate models. Some customers will keep their training models confidential, while others may sell them as an additional revenue source. As Ascierto noted, "The value of data center management data is multiplied when it is aggregated and analyzed at scale. By applying algorithms to large data sets aggregated from many customers, including different types of data centers and different locations, providers can predict when equipment will fail and when cooling thresholds will be reached." 
When knowledgeable, experienced operators are not on site, this captured tacit knowledge can help the system identify operational problems and respond faster. Data center AI may never completely replace data center staff, but it can keep sharpening its skills and help operators solve problems. The field is still immature but developing rapidly, and machine learning on sensor data is already being applied across industries: Microsoft Research, for example, has worked with Sierra Systems on machine-learning-based audio analysis of oil and gas pipeline defects, using its Cognitive Toolkit to help classify the anomalies that occur.

AI-based data center management services are emerging technologies that are still in development and require a great deal of training. Ascierto also points out that your DCIM software may need more sensors. "If you want to use AI for end-to-end chiller-to-rack decisions, your equipment will need acoustic and vibration sensors installed, as well as environmental sensors and electrical instrumentation. If the goal is to optimize and automate the setpoint temperature of the cooling unit, you may need multiple environmental sensors per rack (top, middle, bottom)." It will take time for AI systems to become fully effective, just as it takes time for new data center staff to come up to speed, but machine learning tools like these will eventually help you run your data center.
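The per-rack sensor layout Ascierto describes (top, middle, bottom) can be put to immediate use even without AI: a large top-to-bottom temperature spread typically indicates hot-air recirculation, so such racks should be flagged before any setpoint is raised. The sketch below is illustrative only; the rack names, readings, and the 5 °C threshold are assumptions, not figures from the article.

```python
RECIRCULATION_SPREAD = 5.0  # degrees C; illustrative threshold

# Three environmental sensors per rack: top, middle, bottom (in C).
racks = {
    "rack-a01": {"top": 29.5, "middle": 25.0, "bottom": 23.0},
    "rack-a02": {"top": 24.5, "middle": 24.0, "bottom": 23.5},
}

def recirculation_risks(racks, spread=RECIRCULATION_SPREAD):
    """Return racks whose top-to-bottom temperature delta exceeds `spread`."""
    return sorted(
        name
        for name, temps in racks.items()
        if temps["top"] - temps["bottom"] > spread
    )

print(recirculation_risks(racks))  # -> ['rack-a01']
```

Here `rack-a01` shows a 6.5 °C spread and is flagged, while the evenly cooled `rack-a02` is not; an AI-driven setpoint optimizer would consume exactly this kind of per-rack signal before automating any cooling changes.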