How to explain network engineering technologies such as STP, HSRP, etc. in a simple and understandable way?

In an interview, for example, I was asked about the HSRP active/standby switchover time and how long STP stays in each state. I knew these things existed, but I don't use them often at work, so I can never remember them. I think my foundation may not be solid enough and my knowledge not broad enough. How can I remember technical details like these, which I rarely use at work?

Fully and thoroughly understanding a protocol is the key to remembering it!

Spanning tree (802.1D) convergence process

Many heroes emerge (the switches power on), and each one assumes that he is the boss (root bridge), but they are all fairly humble. They remain silent for 15 seconds (the listening state, during which a port cannot forward end-user traffic) to gather intelligence (BPDUs from other switches). The mountain king who entered the silent state first (the switch that booted first) grows impatient (his listening timer expires) because he has heard nothing, so he starts handing out leaflets (his own BPDUs) every 2 seconds (the BPDU hello interval) and brazenly declares himself emperor (root bridge).

The bandits elsewhere were outranked (their BPDUs carried a worse priority), so they had no choice but to submit (become non-root bridges). They only dared to obediently receive the emperor's edict (Configuration BPDU) and relay it to the county kings downstream...

The big boss (the switch whose BPDU carries the best priority, that is, the numerically lowest bridge ID) woke up late (powered on late) and was not happy with what he saw. A small shrimp was pretending to be a big wolf, so he wrote his own challenge letter (BPDU) and sent it out from all of his city gates (ports), meaning: Little bastard, get off the stage quickly!

The princes (switches) who received the challenge realized that a new emperor had arrived, so they changed sides and re-selected their root ports with the new emperor at the center. The old emperor, on receiving the challenge, obeyed the rules of the game and submitted as well. At this point the situation in the world was settled: the great hero became emperor (root bridge), and all the gates of the imperial city were selected as designated ports.

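Each county chooses the gate closest to the imperial city as its pilgrimage gate (root port). If two gates are at the same distance (path cost), the tie is broken by the upstream neighbor: the gate facing the neighbor with the better (numerically lower) bridge ID wins, and if both gates face the same neighbor, the gate attached to the neighbor's port with the better (lower) port priority becomes the pilgrimage gate (root port).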

On a link between two counties, the gates are chosen by distance from the imperial city (root path cost). The gate of the closer county becomes the designated port, and the gate of the losing county becomes a non-designated port, which is blocked, unless it is that county's root port.

Thus, a spanning tree centered on the emperor was generated...
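The election described above can be sketched in a few lines of Python. This is a simplified illustration, not a full 802.1D implementation: the class names `BridgeID` and `Offer`, the port names, and the sample priorities and MAC addresses are all made up for the example. The only rule it encodes is "lower wins": lowest bridge ID for the emperor, then lowest path cost, sender bridge ID, and sender port ID for the pilgrimage gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class BridgeID:
    priority: int  # configured bridge priority; lower is better
    mac: str       # 12 lowercase hex digits; tie-breaker, lower is better


@dataclass(frozen=True)
class Offer:
    """What one local gate (port) hears from its upstream neighbor."""
    local_port: str
    root_path_cost: int      # total cost to the root through this port
    sender_bridge: BridgeID  # bridge ID of the upstream neighbor
    sender_port_id: int      # neighbor's port priority and number combined


def elect_root(bridges):
    """The emperor is the bridge with the lowest (priority, MAC) pair."""
    return min(bridges)


def pick_root_port(offers):
    """Lowest path cost wins; ties fall to sender bridge ID, then sender port ID."""
    best = min(
        offers,
        key=lambda o: (o.root_path_cost, o.sender_bridge, o.sender_port_id),
    )
    return best.local_port


# Hypothetical switches: the early riser has the default priority,
# the big boss was configured with a better (lower) one.
early_riser = BridgeID(32768, "00aa00000001")
big_boss = BridgeID(4096, "00aa00000009")
print(elect_root([early_riser, big_boss]))

# Two equal-cost gates facing the same neighbor: the lower sender port ID wins.
offers = [
    Offer("Gi0/1", 8, early_riser, 0x8001),
    Offer("Gi0/2", 8, early_riser, 0x8002),
]
print(pick_root_port(offers))
```

Note that the late-waking big boss wins even though his MAC address is higher, because priority is compared before the MAC address.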

After the role of each city gate has been decided, the gates first register civilian (host/end user) address information (MAC address to port number). This is the learning state, in which a port still cannot forward end-user traffic, and it lasts 15 seconds. When it completes, the world is at peace (the network has converged).

Every gate that is not a non-designated gate (that is, every root port and designated port) allows free bi-directional end-user traffic. Non-designated gates stay blocked.

In the imperial city, all city gates can issue imperial edicts (Configuration BPDUs) and receive memorials from subordinate counties and kingdoms (Topology Change Notification BPDUs).

The Emperor's edict includes:

The emperor's country name (Root Bridge ID = Priority + MAC)
The emperor's death waiting period: 20 seconds (max-age)
The interval at which the emperor's edict is issued: 2 seconds (BPDU hello)
The gate silence time: 15 seconds (forward-delay)
The time to register civilian addresses: 15 seconds (forward-delay)

When the emperor dies and the waiting period runs out (the Root Bridge crashes, restarts, or freezes, or its uplink goes down), that is, when a prince has not received an imperial edict (Root Bridge BPDU) for 20 seconds, the subordinate princes may choose a new emperor (root bridge) among themselves.
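These timers are exactly the numbers interviewers tend to ask about, and they are easier to remember once you add them up. The sketch below uses the 802.1D default values quoted above; the function names are invented for illustration.

```python
# 802.1D default timers, in seconds.
HELLO = 2
MAX_AGE = 20
FORWARD_DELAY = 15


def port_forwarding_delay(forward_delay=FORWARD_DELAY):
    """Listening plus learning: how long a newly active port waits before forwarding."""
    return 2 * forward_delay


def indirect_failure_convergence(max_age=MAX_AGE, forward_delay=FORWARD_DELAY):
    """Edicts silently stop arriving: wait max-age, then listening plus learning."""
    return max_age + 2 * forward_delay


def hellos_missed_before_max_age(max_age=MAX_AGE, hello=HELLO):
    """How many consecutive edicts must go missing before the emperor is presumed dead."""
    return max_age // hello


print(port_forwarding_delay())         # 30
print(indirect_failure_convergence())  # 50
print(hellos_missed_before_max_age())  # 10
```

This is where the familiar "30 to 50 seconds" figure for classic spanning tree convergence comes from.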

All gates of other counties and kingdoms can receive imperial edicts, but they will only relay imperial edicts received from the pilgrimage gate (root port) to downstream counties and kingdoms.

If there is a disturbance in a county (a topology change such as a link going down; every MAC address entry downstream of that link must age out quickly, otherwise traffic is black-holed for a long time), it must be reported to the emperor promptly. The report (TCN BPDU, Topology Change Notification) leaves through the pilgrimage gate (root port) and travels up level by level. Each upstream county confirms receipt on its own (TCA, Topology Change Acknowledgment), and when the report reaches the emperor, he issues an edict declaring a state of emergency (Configuration BPDU with the TC bit set). County kings who receive this edict shorten the timeout of their civilian (host) address database (MAC address table) from the default 5 minutes to 15 seconds (forward-delay).
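To see why shortening the aging time matters, here is a toy MAC address table. It is only a sketch: the `MacTable` class and its methods are hypothetical, time is passed in explicitly as seconds, and the fast aging value assumes the 802.1D behavior of falling back to forward-delay (15 seconds by default) while the TC bit is set.

```python
class MacTable:
    NORMAL_AGING = 300  # default aging time: 5 minutes
    FAST_AGING = 15     # assumed fast aging during a topology change (forward-delay)

    def __init__(self):
        self.aging = self.NORMAL_AGING
        self.entries = {}  # MAC address -> (port, time last seen)

    def learn(self, mac, port, now):
        """Record which gate a civilian was last seen behind."""
        self.entries[mac] = (port, now)

    def topology_change(self, active):
        """Switch to fast aging while the emperor's edict carries the TC bit."""
        self.aging = self.FAST_AGING if active else self.NORMAL_AGING

    def lookup(self, mac, now):
        """Return the port for a MAC, or None if unknown or aged out."""
        entry = self.entries.get(mac)
        if entry is None:
            return None
        port, last_seen = entry
        if now - last_seen > self.aging:
            del self.entries[mac]  # stale entry: forget it so traffic is flooded again
            return None
        return port


table = MacTable()
table.learn("00aa00000001", "Gi0/1", now=0)
print(table.lookup("00aa00000001", now=100))  # still valid under normal aging
table.topology_change(active=True)
print(table.lookup("00aa00000001", now=100))  # aged out under fast aging
```

Without the fast aging, the stale entry would keep pointing at a dead path for up to five minutes.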

Remember that the imperial edict (Configuration BPDU) originates only from the emperor (root bridge); the princes merely relay it. It is a one-way flow used to govern the country and tell the princes (non-root bridges) how to stay synchronized. It can be called the upstream -> downstream control flow.

TCN BPDUs are sent by the princes to the emperor to inform him of local riots (topology changes). This can be called the downstream -> upstream reporting flow.

Data transmission process

1. Exchanges between civilians in the same county (same access switch)

When civilians (hosts) visit relatives and friends (end-user traffic) within the same county (access switch), their movement (bi-directional traffic) stays inside the county. This is the most resource-efficient case and does not disturb the upstream counties (aggregate switches) or the emperor (core switch).

2. There are two types of communication between civilians from different counties (different access switches):

2.1 Same upstream counties (same aggregate switch)

The traffic needs to pass through three counties: the source county (access switch), the upstream county (aggregate switch), and the destination county (access switch), which consumes more bandwidth resources.

2.2 Different upstream counties (different aggregate switch)

The traffic needs to pass through five counties, the source county (access switch), the upstream county (aggregate switch), the core switch, the upstream county (aggregate switch) of another branch, and the destination county (access switch), which requires more bandwidth resources and the hardware processing resources of five switches.
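The three cases can be summarized in a small function. It is an illustrative sketch under simple assumptions: a strict three-tier tree with a single core, and each host located by a hypothetical (aggregate switch, access switch) pair with made-up names.

```python
def switches_traversed(src, dst):
    """Return the switches a frame crosses between two hosts.

    src and dst are (aggregate_switch, access_switch) pairs naming where each host lives.
    """
    src_agg, src_acc = src
    dst_agg, dst_acc = dst
    if src == dst:
        return [src_acc]                                 # case 1: same county
    if src_agg == dst_agg:
        return [src_acc, src_agg, dst_acc]               # case 2.1: same upstream county
    return [src_acc, src_agg, "core", dst_agg, dst_acc]  # case 2.2: via the imperial city


print(switches_traversed(("agg1", "acc1"), ("agg1", "acc1")))
print(switches_traversed(("agg1", "acc1"), ("agg1", "acc2")))
print(switches_traversed(("agg1", "acc1"), ("agg2", "acc3")))
```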

Since the emperor always sat facing south, we call traffic that passes through the imperial city north-south traffic, and traffic that does not pass through it east-west traffic.

Over the past 10 years, most enterprise traffic has been access to company servers and the Internet, all of which has to pass through the imperial city. North-south traffic through the imperial city (Root Bridge/Core Switch) therefore made up the majority, and the three-tier architecture (access/aggregate/core) suited it well because it is easy to expand.

With the growing popularity of virtualized computing, more and more traffic is east-west and does not need to pass through the imperial city. It is carried across the aggregate switches by large Layer 2 technologies such as VXLAN. The share of north-south traffic through the core switch keeps shrinking, and network architecture is slowly evolving into a flat design built on a large Layer 2 domain.

The bottom part of the above picture is the Access switch, with a new name: Leaf, which is used to connect the host and server.

The upper part is the Aggregate switch (VTEP, MP-BGP EVPN), with a new name: Spine. It is used for local layer 2 switching, remote layer 2 switching, and is also responsible for layer 3 routing.

Readers may have noticed a pattern: I always treat the Root Bridge and the Core Switch as one and the same. Why must the Core Switch be the Root Bridge? Or conversely, why must the Root Bridge be the Core Switch?

In ancient times, Chang'an was often chosen as the imperial capital because the country could be governed well from there and it was conveniently located. If an edge area (edge switch) such as Hainan Island or Heilongjiang had been chosen, it would have hampered economic and transportation exchanges.

The Core Switch is usually located at the center of the network, not too far away from anyone (2 hops), and has a very strong data throughput capacity. It is usually not a bottleneck for traffic, so it is the most reasonable choice to use it as the Root Bridge.

As a network designer, you need to prevent any non-core switch from making itself the Root Bridge. Generally, you set the bridge priority of the core switch to a low value such as 4096 (the smaller the value, the better). Isn't 4095 smaller? Why not set it to 4095?

That is because the priority occupies only the upper 4 bits of the 16-bit field (the lower 12 bits carry the VLAN ID as the extended system ID), so it can only be a multiple of 4096. The smallest nonzero value is 0001 0000 0000 0000 in binary, which is 4096. A value of 0 is also allowed and beats every other priority.
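The bit layout is easy to check in code. The sketch below assumes the extended system ID format, in which the 16-bit priority field holds a 4-bit priority on top of a 12-bit VLAN ID; the function name `bridge_priority_field` is made up for the example. Under that format the configurable priority runs from 0 to 61440 in steps of 4096.

```python
def bridge_priority_field(priority, vlan_id=0):
    """Build the 16-bit priority field: upper 4 bits priority, lower 12 bits VLAN ID."""
    if not 0 <= priority <= 61440 or priority % 4096 != 0:
        raise ValueError("priority must be a multiple of 4096 between 0 and 61440")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("vlan_id must fit in 12 bits (0 to 4095)")
    return priority | vlan_id


print(format(bridge_priority_field(4096), "016b"))  # 0001000000000000
print(bridge_priority_field(4096, vlan_id=10))      # 4106

try:
    bridge_priority_field(4095)
except ValueError as err:
    print("rejected:", err)
```

A priority of 4095 would need bits in the lower 12 positions, which belong to the VLAN ID, so it is rejected.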

In addition, the access switches should be configured with PortFast and BPDU Guard. PortFast lets a host-facing port go straight to the forwarding state without sitting through the painful and lengthy listening and learning stages. BPDU Guard shuts the port down as soon as a BPDU arrives on it, which stops rogue software or an unauthorized switch from sending high-priority BPDUs. This matters for two reasons. On the one hand, a PortFast port skips the usual loop-prevention wait, so a switch plugged into it could cause a loop. On the other hand, if the emperor's role of Root Bridge is seized, it is a disaster: all the traffic that used to pass through the imperial city (Root Bridge) is drawn toward the user's computer, which silently discards it, causing a traffic black hole...
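The interplay of the two features can be pictured as a tiny state model. This is a deliberately simplified sketch: the `AccessPort` class is hypothetical, the state names follow common Cisco terminology, and what a port without BPDU Guard does with a received BPDU (normal spanning tree processing) is not modeled.

```python
class AccessPort:
    def __init__(self, portfast=True, bpdu_guard=True):
        self.bpdu_guard = bpdu_guard
        # PortFast skips listening and learning; otherwise the port starts in listening.
        self.state = "forwarding" if portfast else "listening"

    def receive_bpdu(self):
        """A BPDU shows up on a port that should only ever see hosts."""
        if self.bpdu_guard:
            self.state = "err-disabled"  # shut the gate before the impostor can be crowned
        return self.state


guarded = AccessPort(portfast=True, bpdu_guard=True)
print(guarded.state)           # forwarding
print(guarded.receive_bpdu())  # err-disabled
```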

HSRP/VRRP

HSRP: Hot Standby Router Protocol

VRRP: Virtual Router Redundancy Protocol

Both belong to the FHRP family, which stands for First Hop Redundancy Protocol. HSRP is a Cisco proprietary protocol. The IETF later liked the idea and created an industry-standard protocol, VRRP, so that products from different vendors can exchange messages with each other.

For user computers and servers, the default gateway is the first-hop router. In other words, all non-local traffic has to be forwarded by the default gateway. If the default gateway fails, are all hosts and servers in that network segment cut off from the Internet? Because the default gateway is so important, a second router is needed as a backup. The primary default gateway and the backup (secondary) router need a message mechanism (a keepalive, sent every 3 seconds by default in HSRP) to discover each other, and a priority value to decide who is primary and who is secondary. Ideally there is also an authentication mechanism so that they can verify each other.

Primary Router Crash / LAN Link Down

The primary router can no longer send keepalive messages, which normally go out every 3 seconds. If the backup router misses three in a row (HSRP's default hold time is 10 seconds), it concludes that it should take over as the primary router.

There are two key words here:

VIP: Virtual IP

VMAC: Virtual MAC

Whoever acts as the primary router will have his network interface bound to VMAC/VIP. Assuming VIP=10.1.1.1/24, VMAC=xxxxxx.000001, then VIP is the default gateway for the 10.1.1.0/24 network segment. If a host ARP requests "What is the MAC address of 10.1.1.1?", the primary router will use VMAC=xxxxxx.000001 to reply.

So when the primary router crashes or its LAN interface goes down and the keepalives stop, the backup router binds the same VIP and VMAC to its own interface as it takes over. The hosts keep using the same default gateway address and the same MAC address and do not need to change anything.
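The switchover time asked about in interviews follows directly from these timers. The sketch below assumes HSRP's default values (hello 3 seconds, hold time 10 seconds); the function names are invented for illustration, and time is given as plain seconds.

```python
# HSRP default timers, in seconds (both are configurable on real devices).
HELLO_INTERVAL = 3
HOLD_TIME = 10


def standby_takes_over(last_hello_at, now, hold_time=HOLD_TIME):
    """The backup declares the primary dead once no hello has arrived for hold_time."""
    return now - last_hello_at >= hold_time


def hellos_missed(last_hello_at, now, hello_interval=HELLO_INTERVAL):
    """How many scheduled hellos have failed to arrive since the last one was heard."""
    return int((now - last_hello_at) // hello_interval)


print(standby_takes_over(last_hello_at=0, now=9))   # False: still inside the hold time
print(standby_takes_over(last_hello_at=0, now=10))  # True: the backup takes over
print(hellos_missed(last_hello_at=0, now=10))       # 3
```

With the defaults, the worst-case gap before the backup router acts is therefore about 10 seconds, or roughly three missed hellos.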

Primary Router WAN Link Down

The primary router loses its Internet access and proactively lowers its own priority below that of the backup router. The backup router, provided preemption is enabled on it, then sees that it has the higher priority and takes over as the primary router.

If the original primary router's WAN link comes back up and stays stable for a while, its priority returns to normal and, with preemption enabled, it reclaims the primary role.
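The priority game can be modeled in a few lines. This is a sketch under stated assumptions: the `Router` class, the priorities 110 and 100, and the decrement of 20 are hypothetical values chosen for the example, and the tie-break that real protocols apply when priorities are equal is left out.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int = 100       # configured priority; higher wins
    track_decrement: int = 0  # how much to subtract while the WAN link is down
    wan_up: bool = True
    preempt: bool = True      # may this router take over from a live primary?

    @property
    def effective_priority(self):
        return self.priority if self.wan_up else self.priority - self.track_decrement


def choose_primary(current, challenger):
    """The challenger takes over only if it preempts and has a strictly higher priority."""
    if challenger.preempt and challenger.effective_priority > current.effective_priority:
        return challenger
    return current


r1 = Router("R1", priority=110, track_decrement=20)
r2 = Router("R2", priority=100)

r1.wan_up = False                  # WAN link down: R1 drops to 90
print(choose_primary(r1, r2).name)  # R2

r1.wan_up = True                   # WAN link restored: R1 is back at 110
print(choose_primary(r2, r1).name)  # R1
```

The decrement has to be larger than the gap between the two priorities, otherwise the primary router keeps its role even with a dead WAN link.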
