Background

The client is a retail company with about 200 employees. Its network is a classic full-gigabit three-tier architecture: an iKuai software router at the egress, with a core switch and access switches on the intranet. The general topology and network segment plan are shown below.
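As a rough sketch of the figures (only VLAN20 monitoring and VLAN30 office are named explicitly in this article; the example subnets are assumptions, and the interconnect addressing is omitted):

    Internet
        |
      WAN port
    iKuai software router (egress)
      LAN port
        |
      GE0/0/1 (uplink)
    Layer 3 core switch ---- default route points at the iKuai LAN
        |-- VLAN20  monitoring network (e.g. 172.16.20.0/24, assumed)
        |-- VLAN30  office PCs         (e.g. 172.16.30.0/24, assumed)
        '-- other intranet VLANs, all inside 172.16.0.0/16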
The current problem: the company's IT staff keep finding the CPU of the iKuai egress router spiking above 90%. Web pages load extremely slowly, the terminals behind it suffer lag and high latency when accessing the Internet, and the whole network effectively collapses.

Existing analysis
The tests already done suggested that the router, the switches, and the other devices themselves had no major problems. When a network shows this kind of abnormally high throughput, the usual suspicion is a loop: a broadcast storm that consumes all the bandwidth and wrecks the MAC address table.

Troubleshooting analysis

Step 1: Confirm the problem

From a PC in the VLAN30 office area we ping a Baidu IP: the packet loss is severe.

Step 2: Confirm whether there is a loop in the network

The most common loops are Layer 2 loops, which mainly show up in two ways: broadcast and multicast frames flooded out of every port in the VLAN, consuming the bandwidth; and MAC address table flapping, as the same source MAC is learned on one port after another.

As shown in the topology above, only the WAN and LAN ports of the router are in use, and the earlier traffic statistics showed "no large traffic on the WAN port, but abnormally large traffic on the LAN port". So we only need to check the packet statistics on the core switch's uplink interface to confirm whether this large traffic is broadcast and multicast.
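On the Huawei core switch this can be checked as follows (a sketch; GE0/0/1 is the uplink interface named in the statistics, and the exact output fields vary by model and software version):

    # Clear the counters on the uplink, wait about 5 seconds, then read them back
    <Core> reset counters interface GigabitEthernet 0/0/1
    <Core> display interface GigabitEthernet 0/0/1
    # In the Input/Output sections of the output, compare the counters:
    #   Unicast: ...   Multicast: ...   Broadcast: ...
    # A Layer 2 loop would show the broadcast/multicast counters exploding.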
When the problem recurs, the counters show that the multicast and broadcast packets on the core switch uplink GE0/0/1 grow very slowly over the 5-second window, which basically rules out a Layer 2 loop and flooding. But look closely at the unicast counters: about 35,000 unicast packets are sent and received within 5 seconds, roughly 7,000 packets per second across both directions. That volume is astonishing: with the whole network paralyzed, where can so much unicast throughput come from? The next step is to capture the traffic and analyze it.

Step 3: Intranet trunk data flow analysis

On the core switch we mirror the uplink interface facing the egress router and capture the traffic. The capture shows a UDP flow from a source in the VLAN20 monitoring network to destinations in 172.16.40.0/24. According to the capture statistics, this flow alone runs at close to 1 Gbps: enough to fill the gigabit link, and the reason the router is saturated with forwarding and its CPU stays pinned high.

So why does this "abnormal flow" appear between the core switch and the router, and why does it flood? Compare the MAC addresses of the captured packets in chronological order: between the first packet and the second packet of the same flow, the source and destination MACs swap between the core switch and the router, and the TTL drops by one. The same UDP packet is circulating between the switch and the router, SW -> R -> SW -> R ... until TTL reaches 0 and the packet is finally discarded.

Conclusion: there is a routing loop between the router and the switch. The destination segment the monitoring network is trying to reach is 172.16.40.0/24. But strangely, this segment does not exist anywhere on the intranet. Even if a monitoring device tries to reach it, the traffic should simply be sent out of the router's WAN port; why does it bounce back? It must be related to the configuration.

Step 4: Check the routing tables of the core switch and the router

The core switch's routing table and configuration are standard: the directly connected VLAN segments plus a default route pointing at the router. The iKuai routing table is another story. For convenience, the IT staff configured the return route on iKuai as a single summary, 172.16.0.0/16 (covering all intranet segments), with the core switch as the next hop. This is a disaster: as soon as the intranet accesses an address inside 172.16.0.0/16 that does not actually exist, such as 172.16.1.X, 172.16.2.X or 172.16.200.X, the packets loop on the trunk between the router and the core switch and the link bandwidth is blown out.
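To make the loop concrete, here is a minimal sketch of the two routes involved (the next-hop addresses are placeholders, not values from the original screenshots):

    # Core switch (Huawei VRP), standard default route toward the egress:
    ip route-static 0.0.0.0 0.0.0.0 <iKuai-LAN-IP>

    # iKuai static return route, as misconfigured (set in the web UI, shown as text):
    #   destination 172.16.0.0/16  ->  next hop <core-switch-uplink-IP>
    #
    # Forwarding a packet for 172.16.40.1, a segment that exists nowhere:
    #   core switch: no connected or specific route -> default -> router
    #   iKuai:       172.16.40.1 matches 172.16.0.0/16 -> core switch
    #   ...ping-pong, TTL decremented at each hop, dropped once TTL = 0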
Summary and solutions

A Layer 2 broadcast storm was never the problem. The over-broad 172.16.0.0/16 return route on iKuai, combined with the core switch's default route, bounced traffic for nonexistent intranet segments back and forth across the trunk until the gigabit link was saturated and the router's CPU was pinned.

Solution: the return routes must be specific. Do not use one large summary segment; configure a return route for each real intranet segment, as follows:
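A minimal sketch of the corrected iKuai return routes (the subnets shown are assumptions standing in for the company's real segment plan; only VLAN20 and VLAN30 are named in this article):

    # iKuai static routes: one specific entry per real intranet segment
    172.16.20.0/24  ->  next hop <core-switch-uplink-IP>   # VLAN20 monitoring (assumed)
    172.16.30.0/24  ->  next hop <core-switch-uplink-IP>   # VLAN30 office (assumed)
    # ...one entry for each remaining real segment; no 172.16.0.0/16 summary.
    # Traffic to any other 172.16.x.x address now simply leaves via the WAN
    # port (or is dropped upstream) instead of looping back onto the trunk.

An alternative with the same effect is to keep the /16 return route but add a blackhole (null) route for 172.16.0.0/16 on the core switch, so that destinations with no more specific route are discarded there instead of following the default route back to iKuai.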