A brief introduction to the ZAB protocol in Zookeeper

The full name of the ZAB protocol is Zookeeper Atomic Broadcast Protocol.

Function: The ZAB protocol synchronizes data between the Leader and the other nodes in a cluster to ensure data consistency.

Before explaining the ZAB protocol, we must understand the role of each Zookeeper node.

The role of each Zookeeper node

Leader:

Processes read requests and write (transaction) requests sent by clients. A "transaction request" here is a request that is handled with the ACID properties of a transaction.
Replicates transaction requests to the other nodes and guarantees the order of transactions.
Its status is LEADING.

Follower:

Processes read requests sent by clients.
Forwards write transaction requests to the Leader.
Participates in the Leader election.
Its status is FOLLOWING.

Observer:

Same as a Follower, except that it does not participate in the Leader election and its status is OBSERVING.
Can be used to scale read QPS without adding voting members.

How to choose a Leader during the startup phase?

When Zookeeper is first started, multiple nodes need to find a leader. How do they find one? By voting.

For example, there are two nodes in the cluster, A and B, and the schematic diagram is as follows:

Node A votes for itself first. Its vote contains the node id (SID) and a ZXID, for example (1, 0). The SID is set in the configuration and is unique per node; the ZXID is a unique, increasing transaction id.
Node B also votes for itself first, and its vote is (2, 0).
Nodes A and B then send their votes to all nodes in the cluster.
After receiving the vote from node B, node A checks whether the vote belongs to the current round of voting and whether node B is in the LOOKING state.
Vote comparison: Node A compares its own vote with the votes it receives. If the ZXID in a received vote is larger, node A replaces its vote with the received one. If the ZXIDs are equal, the SIDs are compared and the larger SID wins. Here the ZXIDs of nodes A and B are the same and the SID of node B is larger, so node A updates its vote to (2, 0) and sends it out again. Node B does not need to update its vote, but it still sends its vote out again in the next round.

At this time, the voting information of node A is (2, 0), as shown in the following figure:

Counting votes: After each round of voting, every node counts the votes it has received and checks whether more than half of the nodes in the cluster cast the same vote. Here both node A and node B hold the vote (2, 0), which is more than half of the nodes, so node B is elected Leader.
Update node status: Node A becomes a Follower and updates its status to FOLLOWING; node B becomes the Leader and updates its status to LEADING.
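
The comparison and counting rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names Vote, better, run_round and winner are made up for this example, every node is assumed to receive every vote within a single round, and the round check and the LOOKING state check are left out.

```python
from collections import Counter
from typing import Dict, NamedTuple, Optional


class Vote(NamedTuple):
    sid: int   # id of the node being voted for
    zxid: int  # ZXID carried in the vote


def better(a: Vote, b: Vote) -> Vote:
    """Larger ZXID wins; if the ZXIDs are equal, the larger SID wins."""
    return a if (a.zxid, a.sid) >= (b.zxid, b.sid) else b


def run_round(votes: Dict[int, Vote]) -> Dict[int, Vote]:
    """Each node compares its own vote with every vote it receives."""
    updated = {}
    for node, own in votes.items():
        best = own
        for received in votes.values():
            best = better(best, received)
        updated[node] = best
    return updated


def winner(votes: Dict[int, Vote], cluster_size: int) -> Optional[int]:
    """Return the SID backed by more than half of the nodes, if any."""
    sid, count = Counter(v.sid for v in votes.values()).most_common(1)[0]
    return sid if count > cluster_size // 2 else None


# Nodes A (SID 1) and B (SID 2) each start by voting for themselves.
votes = {1: Vote(1, 0), 2: Vote(2, 0)}
votes = run_round(votes)
leader = winner(votes, cluster_size=2)
```

With the two starting votes (1, 0) and (2, 0), node A switches to (2, 0) and the leader variable ends up holding SID 2, which matches the walkthrough above.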

What happens if the Leader crashes during operation?

While Zookeeper is running, the Leader stays in the LEADING state until it crashes. When that happens, a new Leader must be elected, and the election process is basically the same as in the startup phase.

Points to note:

The remaining Followers conduct elections, and Observers do not participate in the election.
The zxid in a vote comes from the log file on the node's local disk. The node with the largest zxid is elected Leader. If the zxids of the Followers are the same, the Follower with the larger node id is elected Leader.

How to synchronize data between nodes?

Different clients can connect to the Leader or to a Follower.

When a client sends a read or write request, it does not know whether it is connected to the Leader or to a Follower. If the client is connected to the Leader and sends a write request, the Leader runs 2PC (two-phase commit protocol) to synchronize with the Followers and Observers. If the client is connected to a Follower and sends a write request, the Follower forwards the write request to the Leader, and the Leader then runs the same 2PC to synchronize the data to the Followers.

Two-phase commit protocol:

Phase 1: The Leader sends a proposal to the Followers, and each Follower sends an ack response to the Leader. Once acks from more than half of the nodes are received, the next phase begins.
Phase 2: The Leader loads the data from its disk log file into memory and sends a commit message to the Followers; each Follower then loads the data into memory.

Let's take a look at the process of Leader synchronizing data:

① The client sends a write transaction request.
② After receiving the write request, the Leader converts it into a transaction proposal "proposal01:zxid1" and saves it to its disk log file.
③ The Leader sends the proposal to the Followers.
④ After receiving the proposal, each Follower writes it to its disk log file.

Next, let's see what happens after the Follower has received the proposal sent by the Leader:

⑤ The Follower returns an ack to the Leader.
⑥ The Leader receives acks from more than half of the nodes and proceeds to the next stage.
⑦ The Leader loads the proposal from its disk log file into the in-memory znode data structure.
⑧ The Leader sends a commit message to all Followers and Observers.
⑨ After receiving the commit message, each Follower loads the data on disk into its in-memory znode data structure.

Now the data on the Leader and the Followers is in memory and consistent, so a client reads the same data whether it reads from the Leader or from a Follower.
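
Steps ① to ⑨ can be sketched as a small in-process simulation. This is a toy model rather than Zookeeper's implementation: the Node and Leader classes are hypothetical, a Python list stands in for the disk log, a dict stands in for the in-memory znode data, there is no networking, and counting the Leader's own log write as one ack is an assumption of this sketch.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.log = []     # stands in for the disk log file
        self.memory = {}  # stands in for the in-memory znode data

    def on_proposal(self, zxid, path, value):
        """Phase 1: write the proposal to the log and return an ack."""
        self.log.append((zxid, path, value))
        return True

    def on_commit(self, zxid):
        """Phase 2: load the logged proposal into memory."""
        for logged_zxid, path, value in self.log:
            if logged_zxid == zxid:
                self.memory[path] = value


class Leader(Node):
    def __init__(self, name, followers):
        super().__init__(name)
        self.followers = followers
        self.next_zxid = 1

    def write(self, path, value):
        zxid = self.next_zxid
        self.next_zxid += 1
        # Step 2: the Leader logs the proposal (counted as its own ack here).
        acks = 1 if self.on_proposal(zxid, path, value) else 0
        # Steps 3 to 5: send the proposal and collect acks.
        for follower in self.followers:
            if follower.on_proposal(zxid, path, value):
                acks += 1
        # Step 6: continue only with acks from more than half of the nodes.
        cluster_size = len(self.followers) + 1
        if acks <= cluster_size // 2:
            return False
        # Steps 7 to 9: commit on the Leader, then on every Follower.
        self.on_commit(zxid)
        for follower in self.followers:
            follower.on_commit(zxid)
        return True


f1, f2 = Node("follower-1"), Node("follower-2")
leader_node = Leader("leader", [f1, f2])
ok = leader_node.write("/config", "v1")
```

After the call to write, the Leader and both Followers hold the same value for "/config" in memory, which is the consistent state described above.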

How does ZAB achieve sequential consistency?

When the Leader sends a proposal, it actually creates a queue for each Follower and sends the proposal to their respective queues.

The following figure shows the message broadcast process of Zookeeper:

The client sent three write transaction requests, and the corresponding proposals are:

 proposal01 : zxid1
 proposal02 : zxid2
 proposal03 : zxid3

After receiving the requests, the Leader puts the proposals into each Follower's queue one by one in zxid order, and each Follower takes them from its queue one by one in the same order, which preserves the order of the data.
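
The queue-per-Follower idea can be illustrated with a minimal sketch. The Broadcaster class and its methods are hypothetical names for this example, and a deque stands in for whatever ordered channel connects the Leader to each Follower.

```python
from collections import deque


class Broadcaster:
    """Keeps one FIFO queue per Follower, as described above."""

    def __init__(self, follower_names):
        self.queues = {name: deque() for name in follower_names}

    def broadcast(self, proposal):
        # The same proposal is appended to every Follower's queue.
        for queue in self.queues.values():
            queue.append(proposal)

    def drain(self, follower_name):
        # A Follower consumes from the head, so arrival order is preserved.
        queue = self.queues[follower_name]
        received = []
        while queue:
            received.append(queue.popleft())
        return received


broadcaster = Broadcaster(["follower-1", "follower-2"])
proposals = ["proposal01:zxid1", "proposal02:zxid2", "proposal03:zxid3"]
for proposal in proposals:
    broadcaster.broadcast(proposal)

seen_by_f1 = broadcaster.drain("follower-1")
seen_by_f2 = broadcaster.drain("follower-2")
```

Both Followers see the three proposals in the order the Leader enqueued them, even if one Follower drains its queue later than the other.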

Is Zookeeper strongly consistent?

Official definition: sequential consistency.

Why is strong consistency not guaranteed?

Because after the Leader sends the commit message to all Followers and Observers, they do not all complete the commit at the same moment.

For example, because of network delays some nodes receive the commit later than others and therefore apply it later, so for a short time different nodes may hold different data. Once all nodes have committed, the data is consistent again.

That said, a client can get a stronger guarantee when it needs one: by calling the sync method before a read, it asks the server it is connected to to catch up with the Leader first, so the read reflects the latest committed data.

Here is a question: If a node fails to commit, will the leader retry? How to ensure data consistency? Welcome to discuss.

Leader downtime data loss issue

First case:

Assume that the Leader has written the message to the local disk but has not yet sent a proposal to the Follower. At this time, the Leader crashes.

A new Leader must then be elected. When the new Leader sends proposals, their zxids follow this rule:

The upper 32 bits of the zxid are incremented by 1 once when a new Leader is elected; they represent the version number (epoch) of the Leader.
The lower 32 bits of the zxid are a counter that restarts under the new Leader and increases by 1 with each proposal.

When the old Leader recovers, it becomes a Follower. When the new Leader sends it the latest proposal, it finds that the upper 32 bits of the zxid of the proposal on its local disk are smaller than those of the proposal sent by the new Leader, so it discards its own proposal.
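
The zxid layout can be made concrete with some bit arithmetic. The helper names below are made up for this sketch, and restarting the lower 32 bits from 0 in a new epoch is an assumption of the example.

```python
COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def make_zxid(epoch, counter):
    """Pack the Leader epoch into the upper 32 bits, the counter into the lower 32."""
    return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK)


def epoch_of(zxid):
    return zxid >> COUNTER_BITS


def counter_of(zxid):
    return zxid & COUNTER_MASK


def next_zxid(zxid):
    """Next proposal under the same Leader: only the counter grows."""
    return zxid + 1


def new_epoch_zxid(zxid):
    """First zxid after a new Leader is elected (counter restart is assumed)."""
    return make_zxid(epoch_of(zxid) + 1, 0)


# The old Leader logged a proposal in epoch 1 and then crashed.
stale = make_zxid(1, 8)
# The new Leader starts epoch 2 and issues its first proposal.
fresh = next_zxid(new_epoch_zxid(make_zxid(1, 7)))

# The recovered node discards its proposal because its epoch is older.
should_discard = epoch_of(stale) < epoch_of(fresh)
```

Because the epoch sits in the upper bits, any zxid from the new Leader compares as larger than any zxid from the old one, regardless of how far the old counter had advanced.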

Second case:

If the Leader successfully sends a commit message to the Follower, but all or some of the Followers have not had time to commit the proposal, that is, load the proposal from the disk into the memory, then the Leader crashes.

In this case, the Follower with the largest zxid in its disk log is elected as the new Leader, because its log contains the most recent proposals. If the zxids are the same, the node IDs are compared and the Follower with the larger node ID becomes the Leader.

This article tries to use plain language + drawings to explain, hoping to inspire everyone.
