As a data center network grows, more network devices must be added and cascaded in multiple layers. Today's data centers are usually built as a tree: a few devices with large forwarding capacity sit at the core, and multiple layers of devices hang below them (several layers may be needed because each device has a limited number of ports). Dozens or even hundreds of network devices end up cascaded together. When a fault occurs, quickly finding the faulty device is a persistent headache for network operations and maintenance staff. Data center network equipment is deployed redundantly, so once the faulty device is found and isolated, service can be restored and the root cause can be investigated afterwards without time pressure. Finding the specific faulty device among hundreds, however, is not easy. The first report of a network fault usually comes from the application side, and troubleshooting starts only then. Application staff typically describe only a symptom, such as an application that cannot be reached. They do not say which addresses cannot reach which other addresses, and sometimes the information they give is wrong, which greatly delays fault location. Most of the time spent locating a problem goes into sorting out the symptoms. So how can a data center network be troubleshot quickly? This article offers an answer.
Starting the analysis from the symptoms reported by the application side is already too late, and it is easy to be misled. Application staff report only what they see, which is often a local symptom that does not reflect the state of the whole network. Network staff therefore have to rely on their own monitoring: detect problems through monitoring, quickly find the faulty device, and then isolate it or fix the fault. Early network monitoring mainly watched device logs and port traffic. This information was often insufficient, so problems were not detected in time. Many network equipment vendors claim that their device logs are very complete, but in practice some extreme cases or software bugs still produce no log output when a fault occurs. In that situation the fault has to be located with traffic statistics. Network staff ask the application staff about the symptoms, identify on site some IP addresses that are losing packets or are unreachable, and then collect traffic statistics for that traffic on every device it passes through in order to find the faulty one. Because the network is a tree, there are many devices at each layer, and the volume of statistics to collect is considerable. Moreover, not every device supports statistics on every traffic characteristic. If some devices on the path do not, the statistics are inaccurate, which makes the faulty device harder to find. This is how network operations and maintenance has gotten by over the years. The older troubleshooting methods work, but they are inefficient, take a long time to locate faults, and have a large impact on the business.
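The hop-by-hop traffic statistics described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `HopCounter` record, the device names, and the idea that every device on the path exposes in/out packet counters for the flow are all hypothetical, and real devices may not support such counters at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HopCounter:
    """Packet counters for one flow on one device (hypothetical record layout)."""
    device: str
    packets_in: int
    packets_out: int


def first_lossy_hop(path: list[HopCounter], tolerance: int = 0) -> Optional[str]:
    """Walk the path in forwarding order and return where the flow first loses packets.

    Two cases are checked at every hop:
      * loss on the link from the previous device (upstream out > this device's in)
      * loss inside the device itself (in > out)
    Returns None when no loss above `tolerance` is seen anywhere on the path.
    """
    previous: Optional[HopCounter] = None
    for hop in path:
        if previous is not None and previous.packets_out - hop.packets_in > tolerance:
            return f"link {previous.device} -> {hop.device}"
        if hop.packets_in - hop.packets_out > tolerance:
            return hop.device
        previous = hop
    return None


# Example with made-up counters: the aggregation device drops most of the flow.
path = [
    HopCounter("access-1", 1000, 1000),
    HopCounter("agg-2", 1000, 400),
    HopCounter("core-1", 400, 400),
]
suspect = first_lossy_hop(path)
```

With these made-up counters the function points at `agg-2`. In practice the counters have to be collected by hand from each device, which is exactly why this method is slow on a large tree.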
Today's network monitoring centers on data flows: specific flows in the network are monitored so that, once a flow is interrupted, the location of the fault can be found immediately. Several emerging network monitoring methods, also known as network visualization technologies, deserve mention here; they are among the most effective means of rapid troubleshooting.
With the monitoring methods above, detecting a fault as soon as it occurs is not difficult, and the response can be fully automated. When a fault is detected, the monitoring server automatically sends an isolation command to isolate the faulty device, and service recovers automatically. In this way the fault can be located, the faulty device isolated, and the business restored before the application side even reports a problem. This greatly shortens fault analysis time and keeps the business impact small; the business may not notice the fault at all. How well network monitoring technologies such as INT and ERSPAN work in real deployments is still unknown. They have only recently come into discussion and still need to be proven in practice. sFlow and NetStream are relatively mature, but they are not yet widely used for network troubleshooting and deserve more promotion for this purpose. With these monitoring technologies, network faults can be resolved quickly, which matters greatly for data center operations and maintenance and substantially improves its efficiency.
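The automated detect-and-isolate step described above might look something like the sketch below. Everything in it is an assumption made for illustration: the monitoring server is assumed to know the ordered path of a monitored flow and the last time each device was seen forwarding it, and `isolate` is a hypothetical hook standing in for whatever isolation command the real network uses.

```python
from typing import Callable, Optional


def find_suspect(path: list[str], last_forwarded: dict[str, float],
                 now: float, timeout_s: float) -> Optional[str]:
    """Return the first device on the path that has stopped forwarding the flow.

    Devices downstream of a fault also go quiet, so only the first stale
    device in forwarding order is treated as the suspect.
    """
    for device in path:
        seen = last_forwarded.get(device)
        if seen is None or now - seen > timeout_s:
            return device
    return None


def check_and_isolate(path: list[str], last_forwarded: dict[str, float],
                      now: float, timeout_s: float,
                      isolate: Callable[[str], None]) -> Optional[str]:
    """Isolate the suspect device, if any, through the caller-supplied hook."""
    suspect = find_suspect(path, last_forwarded, now, timeout_s)
    if suspect is not None:
        isolate(suspect)  # hypothetical hook, e.g. shut the device's ports
    return suspect


# Example with made-up timestamps (seconds): agg-2 and core-1 went quiet.
isolated: list[str] = []
result = check_and_isolate(
    path=["access-1", "agg-2", "core-1"],
    last_forwarded={"access-1": 100.0, "agg-2": 90.0, "core-1": 90.0},
    now=101.0,
    timeout_s=5.0,
    isolate=isolated.append,
)
```

Here only `agg-2` is isolated, even though `core-1` has also stopped seeing the flow. Picking the first stale device is a deliberate simplification; a production system would likely want confirmation from more than one flow before isolating anything.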