Background
Party A is a manufacturer of ship machinery parts. Its network uses a redundant spanning tree (STP) architecture built entirely on Cisco switches. The root bridge is a core-layer switch that carries LAN traffic only. The access-layer devices are industrial cameras and collectors, whose data is transmitted back to the central control console. The entire network topology is as follows:
Network topology description:
Problem Description
Recently, IT staff found that accessing the industrial camera in workshop B from the console computer was very slow. The ping delay to the industrial camera and to the switch it is attached to was generally around 20 ms:
Note: The camera IP is 192.168.1.153, and the IP of switch No. 3 is 192.168.10.3.
For more than half a year the problem had not existed, and the ping delay was stable at ≤1 ms. The fault then appeared suddenly. The delay occurred at:
The problem looks tricky, so let's see how to analyze it.

Troubleshooting Analysis
Step 1: Check whether key configurations have changed
In the design topology, the STP backup link is the wireless bridge backhaul link. Wireless latency is higher and less stable than wired latency. Could traffic have switched over to the wireless bridge? Check the configuration of switches 3 and 4:
Because the STP root bridge is in the core layer, all the workshop switches are non-root bridges, so each ring of switches will elect a blocking port. Here switches 3 and 4 form a ring, and in the configuration port 12 on switches 3 and 4 is slightly preferred over port 11. The blocking port can therefore only appear on port 11 of switches 3 and 4; that is, the blocked backup link is the wireless bridge link. The configuration is as expected.

Step 2: Confirm whether the spanning tree topology meets expectations
With the switch configuration confirmed correct, the next step is to check STP topology convergence. The focus is the equipment at the problem point, "processing workshop B"; other areas can be set aside for now.
Command:
View the status of the relevant Cisco switch ports:
You can see:
This indicates that the switching network's topology has converged as expected, which rules out the possibility that data is being forwarded over the wireless bridge and picking up latency there. The next question is whether the delay is high because traffic passes through the core layer network. To find out, connect directly to access switch No. 3 for testing.

Step 3: Confirm the delay at switch No. 3, where the industrial camera is directly connected
Connect the PC directly to switch No. 3 and ping the IP addresses of the switch and the industrial camera at the same time:
The delay is present even on a direct connection, and the camera's response delay matches the switch's. The switch itself is probably having trouble and introducing a "forwarding delay". To verify this, the next step is to capture packets and look at the ICMP exchange.

Step 4: Capture packets on the PC interface
Open Wireshark on the PC and capture packets. The network is flooded with a large number of "UDP unicast messages", at a packet rate of nearly 10,000 packets/second and a throughput of 100 Mbps:
This is very strange. The PC's own IP address is not 192.168.1.102, and the switch is not configured with port mirroring. How can the PC receive the unicast stream sent by the industrial camera 192.168.1.153 to 192.168.1.102?
After checking with the site: 192.168.1.102 is the collector. The industrial camera transmits its video both to the central control console and to the collector.
Given the above, there is only one root cause for this UDP unicast flooding: collector 102 is no longer on the network, but industrial camera 153 has a fixed destination IP and MAC for its stream. The camera keeps streaming even if the target does not exist. Since the switch has no MAC table entry for the missing collector, this UDP stream consists of "unknown unicast frames"!
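As a rough sanity check on the Step 4 capture figures, the average frame size and the share of a port's bandwidth that the flood would consume can be estimated from the quoted rates. The sketch below is only illustrative arithmetic: the function names are made up, and the 100 Mbps and 1 Gbps port speeds are assumptions, since the article does not state the switch's port speeds.

```python
def average_frame_bytes(throughput_bps: float, packets_per_second: float) -> float:
    """Average bytes per packet implied by a throughput and a packet rate."""
    return throughput_bps / packets_per_second / 8


def port_utilization(throughput_bps: float, link_speed_bps: float) -> float:
    """Fraction of a port's bandwidth consumed by the flooded stream."""
    return throughput_bps / link_speed_bps


# Figures quoted in Step 4 (both are approximate in the article).
FLOOD_BPS = 100_000_000   # about 100 Mbps
FLOOD_PPS = 10_000        # about 10,000 packets/second

# Assumed port speeds; the article does not say which applies.
ASSUMED_LINK_SPEEDS_BPS = {"100 Mbps port": 100_000_000, "1 Gbps port": 1_000_000_000}

print(f"average frame size: {average_frame_bytes(FLOOD_BPS, FLOOD_PPS):.0f} bytes")
for name, speed in ASSUMED_LINK_SPEEDS_BPS.items():
    print(f"{name}: {port_utilization(FLOOD_BPS, speed):.0%} consumed by the flood")
```

Under these assumptions the packets average about 1,250 bytes, which is consistent with a video stream, and a flood of this size would fill a 100 Mbps port completely.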
The switch floods unknown unicast frames out of every other port in the VLAN, much like a broadcast!

Step 5: Confirm whether the collector is online
The suspected cause is that collector 102 is offline, which turns the industrial camera's unicast stream into a flood of "unknown unicast frames". Ping the collector from the PC to check connectivity:
Check the MAC table of switch No. 3:
The collector is not present on the network. The cable may be loose, or the RJ45 connector may have aged.

Solution
Cause: The Cisco switch floods a large volume of unknown unicast frames, which increases its own forwarding delay.
Solution: Fix the collector's network cable and RJ45 connector to restore its network connection.
After the collector is back online, switch No. 3 learns its MAC address entry:
The "unknown unicast frame" becomes a known unicast frame, and the switch forwards the traffic as unicast to the collector's port only. The network returns to normal and the latency decreases:
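The difference between the faulty and the repaired state can be illustrated with a toy model of MAC learning. This is a minimal sketch of a generic single-VLAN learning switch, not of any particular Cisco platform: the `LearningSwitch` class, the port numbers, and the MAC addresses are all hypothetical, and entry aging is not modeled.

```python
class LearningSwitch:
    """Toy single-VLAN learning switch (no aging, no STP)."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port where it was last seen

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown unicast: flood to every port except the ingress port.
            return sorted(self.ports - {in_port})
        if out_port == in_port:
            # Destination is on the ingress segment: nothing to forward.
            return []
        # Known unicast: forward to the learned port only.
        return [out_port]


# Hypothetical port layout and MAC addresses, for illustration only.
CAMERA_MAC, COLLECTOR_MAC = "aa:aa:aa:00:01:53", "aa:aa:aa:00:01:02"
CAMERA_PORT, COLLECTOR_PORT, PC_PORT, UPLINK_PORT = 1, 2, 3, 12

switch = LearningSwitch([CAMERA_PORT, COLLECTOR_PORT, PC_PORT, UPLINK_PORT])

# Collector offline: its MAC was never learned, so the camera stream is flooded.
offline = switch.receive(CAMERA_PORT, CAMERA_MAC, COLLECTOR_MAC)
print("collector offline, stream sent to ports:", offline)

# Collector back online: it sends a frame, and the switch learns its port.
switch.receive(COLLECTOR_PORT, COLLECTOR_MAC, CAMERA_MAC)
online = switch.receive(CAMERA_PORT, CAMERA_MAC, COLLECTOR_MAC)
print("collector online, stream sent to ports:", online)
```

In the offline case the stream reaches the PC port and the uplink as well, which matches what the Wireshark capture showed. Once the collector's MAC is in the table, only the collector's port receives the stream.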