Practice: Ping latency suddenly spikes? In a spanning tree network, even the Cisco switches that network engineers praise fall prey to an old problem!

Background

Party A is a marine machinery parts manufacturer whose network uses a redundant spanning tree architecture built entirely on Cisco switches.

The root bridge is a core-layer switch, and the network carries LAN traffic only. The access-side devices are industrial cameras and data collectors, which send their data back to the central control console. The overall network topology is as follows:

Network topology description:

The enterprise has multiple processing workshops; each workshop network is in a different VLAN and is logically isolated;
The workshop switching networks are aggregated and uplinked to the core switching network;
The workshop switches are interconnected in rings with STP enabled, and the blocked port is the switch interface on the wireless bridge link;
The core layer switching network also runs STP, with multiple redundant interconnecting links;
The interfaces connected to terminals such as industrial cameras and collectors are configured as STP edge ports, so their link changes are not counted as topology changes.

Problem Description

Recently, IT staff found that accessing the industrial camera in workshop B from the console computer had become very slow. Ping latency to the industrial camera and to the switch it connects to was generally around 20 ms:

Note: The camera IP is 192.168.1.153, and the IP of switch No. 3 is 192.168.10.3.

The problem had not occurred in more than half a year of operation, during which ping latency was stable at ≤1 ms. The fault appeared suddenly and only recently. The delay appeared as follows:

The problem looks tricky, so let's see how to analyze it!

Troubleshooting Analysis

Step 1: Check if key configurations have changed

The design topology shows that the STP backup link is the wireless bridge backhaul link. Wireless links generally have higher and less stable latency than wired links, so could traffic have been switched onto the wireless bridge? First, check the configuration of switches 3 and 4:

Because the STP root bridge is in the core layer, every workshop switch is a non-root bridge, and STP blocks one port in each ring. Here, switches 3 and 4 form a ring, and port 12 on each of them is configured with a slightly better (more preferred) priority than port 11, so the blocked port can only be port 11 on switch 3 or 4. In other words, the backup (blocked) link is the wireless bridge link, and the configuration is as expected.
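The choice of blocked port comes from STP comparing priority vectors field by field, with the lowest value winning. The Python sketch below illustrates that comparison under a simplified model: the `CandidatePort` class, its field names, and every cost, priority and MAC value are hypothetical and not taken from the article's configuration, and the sketch only decides forwarding versus blocking among ports that lead toward the same root rather than computing full RSTP port roles.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidatePort:
    """A port that receives BPDUs for the same root bridge (hypothetical model)."""
    port: int                           # local port number, e.g. 11 or 12
    root_path_cost: int                 # root path cost advertised in the received BPDU
    port_cost: int                      # STP path cost of this local port
    sender_bridge_id: tuple[int, str]   # (bridge priority, MAC) of the neighbour
    sender_port_id: tuple[int, int]     # (port priority, port number) on the neighbour
    local_port_id: tuple[int, int]      # (port priority, port number) on this switch

def priority_vector(c: CandidatePort) -> tuple:
    """Fields compared left to right, lowest wins, as in STP root-port selection."""
    return (c.root_path_cost + c.port_cost, c.sender_bridge_id,
            c.sender_port_id, c.local_port_id)

def port_states(candidates: list[CandidatePort]) -> dict[int, str]:
    """Simplification: the best candidate forwards, all others are blocked."""
    best = min(candidates, key=priority_vector)
    return {c.port: ("forwarding" if c.port == best.port else "blocking")
            for c in candidates}

# Illustrative values only: the wireless port is given a higher (worse) path cost.
neighbour = (32768, "0000.0000.0004")
wired = CandidatePort(12, 4, 4, neighbour, (128, 12), (128, 12))
wireless = CandidatePort(11, 4, 19, neighbour, (128, 11), (128, 11))
print(port_states([wired, wireless]))  # {12: 'forwarding', 11: 'blocking'}
```

On Cisco switches a numerically lower priority value is the preferred one, which is why lower tuples win here. Because the tuple is compared left to right, port priority only decides the outcome when path cost and bridge ID tie.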

Step 2: Confirm whether the spanning tree topology meets expectations

With the switch configuration confirmed as correct, the next step is to check how the STP topology has converged. We focus on the devices in processing workshop B, where the problem occurs; other areas can be set aside for now. Command:

 show spanning-tree interface

View the status of the relevant Cisco switch ports:

You can see:

On switch No. 3, the access switch for the terminal, port 11 is an alternate port (AP) in the blocking state, and port 12 is a designated port (DP) in the forwarding state;
On the upstream core, the designated ports 11 and 12 are in the forwarding state.

This shows that the switching network has converged as expected, which rules out the possibility that traffic is being forwarded over the wireless bridge and causing the high latency. The next question is whether the latency comes from traffic passing through the core layer network, so we connect directly to access switch No. 3 to test.

Step 3: Check the latency at switch No. 3, where the industrial camera is directly connected

Connect the PC directly to switch No. 3 and ping the IP addresses of the switch and the industrial camera at the same time:

The latency persists even over a direct connection, and the camera and the switch respond with similar delays. This suggests a problem on the switch itself that introduces a "forwarding delay". To investigate further, the next step is to capture packets and examine the ICMP exchange.

Step 4: Capture packets on the PC's interface

Running Wireshark on the PC shows that the network is flooded with a large number of "UDP unicast messages", at a rate of nearly 10,000 packets/second and a throughput of about 100 Mbps:

This is very strange: the PC's own IP address is not 192.168.1.102, and no port mirroring is configured on the switch, so why does the PC receive the unicast stream that industrial camera 192.168.1.153 sends to 192.168.1.102? On-site staff confirmed that 192.168.1.102 is the collector: the industrial camera streams video both to the central control console and to the collector.
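The capture figures can be cross-checked with simple arithmetic. The sketch below derives the average frame size implied by roughly 10,000 packets/second at roughly 100 Mbps, and the share of an access port that such a flood would occupy. The two port speeds are assumptions for illustration; the article does not say what speed the PC's port negotiated, and the function names are hypothetical.

```python
def average_frame_bytes(packets_per_second: float, bits_per_second: float) -> float:
    """Average frame size implied by a packet rate and a throughput."""
    return bits_per_second / 8 / packets_per_second

def link_utilization(bits_per_second: float, link_bps: float) -> float:
    """Fraction of a link's capacity consumed by a traffic stream."""
    return bits_per_second / link_bps

flood_pps = 10_000        # "nearly 10,000 packets/second" from the capture
flood_bps = 100_000_000   # "about 100 Mbps" from the capture

print(average_frame_bytes(flood_pps, flood_bps))        # 1250.0
print(link_utilization(flood_bps, 100_000_000))         # 1.0 on an assumed 100 Mbps port
print(link_utilization(flood_bps, 1_000_000_000))       # 0.1 on an assumed 1 Gbps port
```

An average of about 1,250 bytes per frame points to large datagrams rather than small control traffic. If the PC's port had negotiated at 100 Mbps, this flood alone would saturate it.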

From the above, the most plausible explanation for the UDP unicast flooding is that collector 192.168.1.102 has dropped off the network, while industrial camera 192.168.1.153 keeps sending to a fixed destination IP and MAC. The camera keeps streaming whether or not the destination exists, so this UDP stream becomes "unknown unicast" traffic: the switch has no MAC table entry for the destination and floods the frames out of every port in the VLAN except the one they arrived on, just as it would a broadcast!
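This flooding behaviour can be reproduced with a toy model of a transparent (learning) bridge. The Python sketch below is a simplification, assuming a single VLAN, no STP and no link-state tracking; the class name, port layout and MAC strings are hypothetical. It learns each frame's source MAC, forwards to the learned port when the destination is known, and floods to every port except the ingress port when it is not.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    """Toy learning bridge for one VLAN (hypothetical model, not Cisco's implementation)."""

    def __init__(self, ports):
        self.ports = frozenset(ports)
        self.mac_table = {}  # MAC address -> port the address was last seen on

    def receive(self, in_port, src_mac, dst_mac):
        """Learn the source MAC, then return the set of egress ports for the frame."""
        self.mac_table[src_mac] = in_port          # learning uses source addresses only
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            # Broadcast or unknown unicast: flood to every port except the ingress one.
            return set(self.ports - {in_port})
        if out_port == in_port:
            return set()  # destination is on the ingress segment: filter the frame
        return {out_port}

# Hypothetical layout: camera on port 1, collector on port 2, PC on port 3.
CAMERA = "00:00:00:00:01:53"     # illustrative MAC strings, not real addresses
COLLECTOR = "00:00:00:00:01:02"

sw = LearningSwitch(ports=[1, 2, 3, 4])
print(sorted(sw.receive(1, CAMERA, COLLECTOR)))  # [2, 3, 4]: flooded, so the PC sees it
sw.receive(2, COLLECTOR, CAMERA)                 # collector back online and transmitting
print(sorted(sw.receive(1, CAMERA, COLLECTOR)))  # [2]: known unicast
```

The first camera frame is flooded and therefore reaches the PC on port 3; once the collector sends any frame, its MAC is learned and the camera's stream goes only to port 2, which mirrors what happens when the collector comes back online.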

Step 5: Confirm whether the collector is online

The suspected cause is that collector 192.168.1.102 is offline, which turns the industrial camera's unicast stream into a flood of unknown unicast frames. To confirm, ping the collector from the PC:

Check the MAC table of switch No. 3:

The collector is not present on the network. Its cable may be loose, or its RJ45 connector may be aging.

Solution

Cause: The Cisco switch floods a large volume of unknown unicast frames, which increases its own forwarding delay.

The collector in processing workshop B went offline because of a loose network cable and an aging RJ45 connector;
The industrial camera in processing workshop B keeps sending a UDP unicast stream to the collector's IP and MAC addresses, at nearly 10,000 packets/second and about 100 Mbps;
Because the MAC address table on Cisco switch No. 3 has no entry for the stream's destination MAC, the stream is "unknown unicast" traffic and is flooded like a broadcast;
Flooding this much unicast traffic introduces a "forwarding delay" on the Cisco switch, possibly because of performance limits or other unidentified factors, so both the switch itself and the terminal behind it show high latency when pinged from the PC.
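The camera's stream never refreshes the collector's MAC entry, because switches learn only from source addresses. Once the collector goes silent, its dynamic entry disappears when it ages out (the default MAC aging time on Cisco Catalyst switches is 300 seconds), and switches typically also flush it when the port's link goes down; from then on every frame addressed to the collector is unknown unicast. The Python sketch below models only the aging part, assuming lazy expiry on lookup (real switches sweep the table periodically); the class name and its injectable clock are hypothetical.

```python
import time

class AgingMacTable:
    """Dynamic MAC table whose entries expire if not refreshed (hypothetical model)."""

    def __init__(self, aging_time: float = 300.0, clock=time.monotonic):
        self.aging_time = aging_time
        self.clock = clock
        self._entries = {}  # MAC -> (port, timestamp the MAC was last seen as a source)

    def learn(self, mac: str, port: int) -> None:
        """Record or refresh an entry; called whenever mac appears as a source address."""
        self._entries[mac] = (port, self.clock())

    def lookup(self, mac: str):
        """Return the port for mac, or None (unknown unicast) if absent or aged out."""
        entry = self._entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if self.clock() - last_seen > self.aging_time:
            del self._entries[mac]  # expired: later frames to this MAC get flooded
            return None
        return port

# Fake clock so the example does not have to wait five minutes.
now = [0.0]
table = AgingMacTable(aging_time=300.0, clock=lambda: now[0])
table.learn("collector-mac", 2)
now[0] = 299.0
print(table.lookup("collector-mac"))  # 2
now[0] = 301.0  # collector silent for more than 300 s
print(table.lookup("collector-mac"))  # None
```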

Solution: Fix the collector's network cable and RJ45 connector to restore its network connection

Once the collector is back online, switch No. 3 learns its MAC address entry again:

The "unknown unicast frames" become known unicast frames, which the switch forwards only out of the collector's port. The network returns to normal and the latency drops:
