Hands-on case: Can you believe it? Two VRRP hot standby gateway groups were deployed in a carrier's central equipment room at the same time, fought each other, and brought down the entire network!

The case shared in this issue is related to VRRP hot standby.

Background

This was a big order: a project to add egress gateway equipment to the central equipment room of a large carrier. The existing egress gateway in the room was a chassis router from vendor W, with two routing boards running in dual hot standby and serving as the network egress for area A. The core switch was also from vendor W.

Recently, due to business needs, a second chassis router was purchased from vendor C. It also has two routing cards running in dual hot standby and is intended to be the network egress for area B.

Network topology description:

  • The core switch runs MUX VLAN: VLAN 10, which carries the upstream egress routers, is the primary VLAN, and areas A and B are subordinate (slave) VLANs 100 and 200 respectively, both configured as intercommunicating subordinate VLANs;
  • The core switch connects to the egress gateways over 10G optical links and to the aggregation switches in areas A and B over Gigabit Ethernet;
  • The two routing cards of the W chassis connect to core ports G0/1 and G0/2 and form one hot standby group, serving as the gateway for area A;
  • The two newly added routing boards of the C chassis connect to core ports G0/3 and G0/4 and form a second hot standby group, serving as the gateway for area B.

Note: In MUX VLAN, each subordinate VLAN can communicate with the primary VLAN, and terminals within an intercommunicating subordinate VLAN can reach each other.

Problem Description

As soon as the newly added C gateway router was connected to the core, both area A and area B went down! The two hot standby groups were in the same LAN, but their gateway IPs were different and did not conflict. So how could the entire internal network be paralyzed? Were the two brothers fighting?

When the C chassis hot standby group was disconnected, the network returned to normal.

Because the project was so important, there was little time for troubleshooting: Party A required the problem to be located and resolved within 1 hour. OK, let's walk through the case together!

Troubleshooting Analysis

Step 1: Confirm Ping Packet Interaction

To quickly establish how packets were being forwarded, we captured traffic on the core switch's area A downlink port and on its uplink port to the W chassis.

The capture shows that the PC's ICMP requests are sent with the correct source and destination MAC addresses, but most of them are never forwarded to the core uplink port connected to the W chassis. Only one request reaches the W chassis router and gets a reply.

This step shows that the egress chassis sends and receives packets correctly, and so do the terminals in area A. That points to the core switch: it is probably forwarding the packets out of the wrong interface, or not forwarding them at all.

PS: Experienced engineers can probably already guess at this step why the two hot standby groups fight once both are connected.

Step 2: Check the MAC address table of the core switch

To confirm whether the switch is forwarding out of the correct port, first check whether its MAC address table holds the right entry for the destination MAC. In this topology the gateway is a hot standby group, which is essentially VRRP, so the gateway is a virtual IP plus a virtual MAC. The packet capture above shows that the gateway's virtual MAC is the VRRP address 0000-5e00-0101. Run the following command on the core switch several times:

 <CenterSwitch> dis mac-address 0000-5e00-0101

The entry does not stay on one port between runs.

The hot standby gateway's MAC address drifts frequently, so access from the terminals in areas A and B to their gateways keeps fluctuating. In other words, the VRRP virtual MAC address 0000-5e00-0101 is learned on both the G0/0/1 interface and the G0/0/3 interface: the two hot standby egress gateways use the same virtual MAC address, and they conflict! What is going on?
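The drift described above can be spotted mechanically. The following is a minimal sketch, assuming the results of repeated MAC lookups have already been collected as (MAC, port) pairs; the function name `find_flapping_macs` and the sample data are hypothetical and only mirror the ports named in this case.

```python
from collections import defaultdict


def find_flapping_macs(observations):
    """Return {mac: [ports]} for every MAC that was seen on more than one port.

    `observations` is an iterable of (mac, port) pairs, for example one pair
    per run of the MAC address lookup on the core switch.
    """
    ports_by_mac = defaultdict(list)
    for mac, port in observations:
        key = mac.lower()
        if port not in ports_by_mac[key]:
            ports_by_mac[key].append(port)  # keep ports in first-seen order
    return {mac: ports for mac, ports in ports_by_mac.items() if len(ports) > 1}


# Hypothetical samples: the same virtual MAC alternates between two ports.
samples = [
    ("0000-5e00-0101", "G0/0/1"),
    ("0000-5e00-0101", "G0/0/3"),
    ("0000-5e00-0101", "G0/0/1"),
    ("0000-5e00-0101", "G0/0/3"),
]

print(find_flapping_macs(samples))
# {'0000-5e00-0101': ['G0/0/1', 'G0/0/3']}
```

A MAC address that legitimately belongs to a single device should map to one port, so any entry in the result is worth a closer look.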

Step 3: View VRRP related parameter information

We analyzed the packets captured on the PC side and found two streams of VRRP advertisements, sent by the two hot standby groups with source IPs 172.16.10.1 and 10.10.10.1. Note, however, that their source MAC addresses are identical: both are 0000-5e00-0101.
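To read the VRID straight out of a captured advertisement, the leading bytes of the VRRP payload are enough. The sketch below assumes an IPv4 VRRP advertisement (the fields it reads sit at the same offsets in VRRPv2 and VRRPv3) and does not verify the checksum. The function name `parse_vrrp_advert`, the hand-built payload, and the address 192.0.2.1 are illustrative assumptions, not data from the on-site capture.

```python
import socket
import struct


def parse_vrrp_advert(payload: bytes) -> dict:
    """Parse the fields shared by VRRPv2 and VRRPv3 IPv4 advertisements."""
    if len(payload) < 8:
        raise ValueError("a VRRP header is at least 8 bytes long")
    # Byte 0: version (high nibble) and type (low nibble); then VRID,
    # priority, and the number of IPv4 addresses that follow the header.
    ver_type, vrid, priority, count = struct.unpack("!BBBB", payload[:4])
    addrs_end = 8 + 4 * count
    if len(payload) < addrs_end:
        raise ValueError("payload too short for the advertised address count")
    addresses = [
        socket.inet_ntoa(payload[i:i + 4]) for i in range(8, addrs_end, 4)
    ]
    return {
        "version": ver_type >> 4,
        "type": ver_type & 0x0F,
        "vrid": vrid,
        "priority": priority,
        "addresses": addresses,
    }


# Hand-built VRRPv2 advertisement: VRID 1, priority 100, one virtual IP.
# Bytes 4-7 (auth type, advertisement interval, checksum) are dummy values.
example = bytes([0x21, 1, 100, 1, 0, 1, 0, 0]) + socket.inet_aton("192.0.2.1")

info = parse_vrrp_advert(example)
print(info["vrid"], info["addresses"])
# 1 ['192.0.2.1']
```

Running this over the advertisements of both groups would show the same `vrid` value in each, which is the detail the next paragraphs build on.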

Why are the generated VRRP virtual MACs the same?

Because the VRRP protocol derives the virtual MAC address from the VRRP backup group number (VRID). For IPv4, the virtual MAC address has the format 00-00-5E-00-01-[VRID], where [VRID] is the backup group number written in hexadecimal. For example, a backup group with VRID 5 gets the virtual MAC address 00-00-5E-00-01-05.

The VRRP RFC states this directly: RFC 5798, Section 7.3 gives the IPv4 virtual router MAC address as 00-00-5E-00-01-{VRID}.
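The rule is simple enough to express in a few lines. This is a minimal sketch of the IPv4 case only; the helper name `vrrp_virtual_mac` is an assumption made for illustration.

```python
def vrrp_virtual_mac(vrid: int, sep: str = "-") -> str:
    """Return the IPv4 VRRP virtual MAC address for a given VRID (1-255)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in the range 1-255")
    # Fixed prefix 00-00-5E-00-01, with the VRID as the last octet.
    octets = [0x00, 0x00, 0x5E, 0x00, 0x01, vrid]
    return sep.join(f"{octet:02X}" for octet in octets)


print(vrrp_virtual_mac(1))  # 00-00-5E-00-01-01
print(vrrp_virtual_mac(5))  # 00-00-5E-00-01-05
```

Nothing vendor-specific or IP-specific enters the calculation, so two groups with the same VRID always end up with the same virtual MAC, whatever their virtual IPs are.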

Comprehensive analysis

The hot standby groups on W and C are both configured with VRID=1, so they generate the same virtual MAC address. The core switch's MAC address table for the primary VLAN keeps flapping between the two, forwarding goes wrong, and the entire network is paralyzed. The relevant VRRP group configuration is as follows:

The W dual-card hot standby chassis router:

The C dual-card hot standby chassis router:

Note: No screenshots were taken on site, so the situation is reproduced here with the eNSP simulator.

Solution

Change the VRID of the newly added C hot standby group to 2. The generated virtual MAC address becomes 0000-5E00-0102, which is unique and no longer conflicts.

With both hot standby groups connected to the network at the same time, areas A and B can both access the Internet and communicate normally.
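A check along these lines, run before a new hot standby group is connected, could catch this kind of conflict in advance. This is only a sketch: the inventory format, the group names, and the function `find_vrid_conflicts` are hypothetical, and it assumes each group is tagged with the VLAN (broadcast domain) in which its gateway interfaces sit.

```python
from collections import defaultdict


def find_vrid_conflicts(groups):
    """Return {(vlan, vrid): [group names]} for VRIDs reused within one VLAN."""
    names_by_key = defaultdict(list)
    for group in groups:
        names_by_key[(group["vlan"], group["vrid"])].append(group["name"])
    return {key: names for key, names in names_by_key.items() if len(names) > 1}


# Hypothetical inventory mirroring this case: both groups sit in VLAN 10.
groups = [
    {"name": "W-chassis", "vlan": 10, "vrid": 1},
    {"name": "C-chassis", "vlan": 10, "vrid": 1},
]
print(find_vrid_conflicts(groups))
# {(10, 1): ['W-chassis', 'C-chassis']}

# After changing the C group to VRID 2 the conflict disappears.
groups[1]["vrid"] = 2
print(find_vrid_conflicts(groups))
# {}
```

The check keys on the pair (VLAN, VRID) rather than on the VRID alone, because reusing a VRID is harmless when the groups are in different broadcast domains.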
