Can you tell me about Zookeeper's ZAB protocol? Sorry, I have a stomachache!

This article is reproduced from the WeChat public account "Third Prince Ao Bing". Please contact the Third Prince Ao Bing public account for reprinting this article.

Preface

ZAB (ZooKeeper Atomic Broadcast) is an atomic broadcast protocol with crash recovery, designed specifically for ZooKeeper. It keeps the data in a ZooKeeper cluster consistent and guarantees a global order of commands.

Concept Introduction

Before introducing the ZAB protocol, it helps to know several ZooKeeper concepts first.

  • Cluster roles
  1. Leader: A cluster has only one leader at a time. It serves both reads and writes for clients and is responsible for synchronizing data to every node;
  2. Follower: Serves reads for clients and forwards write requests to the Leader. It takes part in Leader election when the Leader crashes or becomes unreachable;
  3. Observer: Unlike a Follower, an Observer does not take part in Leader election.
  • Server states
  1. LOOKING: When a server believes the cluster has no leader, it enters the LOOKING state to find or elect one;
  2. FOLLOWING: the follower role;
  3. LEADING: the leader role;
  4. OBSERVING: the observer role.

Each ZooKeeper server thus determines its role from its own state and performs the corresponding tasks.

  • ZAB state

ZooKeeper also defines four states for ZAB, reflecting the four steps from leader election to serving clients. The state enumeration is defined as follows:

    public enum ZabState {
        ELECTION,
        DISCOVERY,
        SYNCHRONIZATION,
        BROADCAST
    }

  1. ELECTION: The cluster is in the election state, during which one node is chosen as the leader;
  2. DISCOVERY: Nodes connect to the leader, respond to its heartbeat, and check whether the leader's role has changed. Only after this step can the elected leader actually act as leader;
  3. SYNCHRONIZATION: Once the whole cluster has confirmed the leader, the leader's data is synchronized to every node so that the cluster's data is consistent;
  4. BROADCAST: The cluster moves to the broadcast state and starts serving clients.
  • ZXID

The zxid is an extremely important concept. It is a 64-bit long integer split into two parts: the epoch in the high 32 bits and the counter in the low 32 bits. Zxids are globally ordered.

The epoch identifies the term of the current leader. A leader election is like a change of dynasty: the sword of the previous dynasty cannot execute the officials of the current one. In other words, the epoch tells whether a command is still valid under the current leader. The counter is an increasing number.
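To make the two-part layout concrete, here is a minimal Java sketch. It assumes the epoch occupies the high 32 bits and the counter the low 32 bits. The class `ZxidSketch` and its method names are hypothetical and exist only for illustration; they are not ZooKeeper's own API.

```java
// Hypothetical helper for illustration only; not a ZooKeeper class.
final class ZxidSketch {
    private ZxidSketch() {}

    // Pack an epoch (high 32 bits) and a counter (low 32 bits) into one zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Extract the epoch part.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // Extract the counter part.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        long lastOfOldLeader = makeZxid(1, 500); // 500th transaction of epoch 1
        long firstOfNewLeader = makeZxid(2, 1);  // first transaction of epoch 2
        // Any zxid from a newer epoch compares greater than every zxid
        // from an older epoch, whatever the counters are.
        System.out.println(firstOfNewLeader > lastOfOldLeader); // prints true
    }
}
```

Because the epoch sits in the high bits, a plain numeric comparison of two zxids orders them first by epoch and then by counter.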

Election

Now that we have introduced the basic concepts, let’s start by explaining how the Zab protocol supports leader election.

There are three questions about leader election: When is it held? What are the election rules? What is the election process?

I will answer these three questions one by one below:

1. When the election occurs

Leader election happens on two occasions. The first is at service startup: if the cluster has no leader, the node enters the election state. If a leader already exists, the node is told who the leader is and connects to it directly, so the cluster does not need to enter the election state.

The second is during operation, when various failures can occur. If the leader crashes, loses power, or suffers very high network latency, it can no longer serve clients. When the other nodes detect through heartbeats that the leader is lost, the cluster enters the election state.

2. Election rules

Once voting starts, how is a leader chosen? Put differently, what rules make other nodes elect you as the leader?

The ZAB protocol screens votes with a few comparison rules. If your vote is better than mine, I update my own vote and vote for you as the leader.

The following code is the zookeeper voting comparison rule:

    /*
     * We return true if one of the following three cases hold:
     * 1- New epoch is higher
     * 2- New epoch is the same as current epoch, but new zxid is higher
     * 3- New epoch is the same as current epoch, new zxid is the same
     *  as current zxid, but server id is higher.
     */

    return ((newEpoch > curEpoch)
            || ((newEpoch == curEpoch)
                && ((newZxid > curZxid)
                    || ((newZxid == curZxid)
                        && (newId > curId)))));

If another node's epoch is higher than a node's own, the node votes for that other node. If the epochs are equal, the zxids are compared and the node with the larger zxid wins. Here the zxid is the largest transaction ID the node has submitted, so the larger the zxid, the more complete that node's data.

Finally, if both epoch and zxid are equal, the serverIds are compared. This ID is set in the ZooKeeper cluster configuration, so when configuring a cluster we can give the higher-performance machines larger serverIds, which lets a well-performing machine take the leader role.
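To make the three rules concrete, here is a minimal, self-contained Java sketch. The `Vote` class and the `isBetter` method are hypothetical names chosen for illustration, not ZooKeeper's API; only the comparison itself follows the snippet quoted above.

```java
// Hypothetical types for illustration; only the comparison mirrors the rule above.
final class VoteCompareSketch {
    private VoteCompareSketch() {}

    static final class Vote {
        final long epoch;
        final long zxid;
        final long serverId;

        Vote(long epoch, long zxid, long serverId) {
            this.epoch = epoch;
            this.zxid = zxid;
            this.serverId = serverId;
        }
    }

    // Returns true when the received vote should replace the current one:
    // higher epoch wins, then higher zxid, then higher server id.
    static boolean isBetter(Vote received, Vote current) {
        return (received.epoch > current.epoch)
            || (received.epoch == current.epoch
                && (received.zxid > current.zxid
                    || (received.zxid == current.zxid
                        && received.serverId > current.serverId)));
    }

    public static void main(String[] args) {
        Vote mine = new Vote(1, 100, 3);
        System.out.println(isBetter(new Vote(2, 1, 1), mine));   // true: higher epoch
        System.out.println(isBetter(new Vote(1, 101, 1), mine)); // true: same epoch, higher zxid
        System.out.println(isBetter(new Vote(1, 100, 2), mine)); // false: tie broken by lower server id
    }
}
```

Note that an identical vote is not "better", so a node never changes its vote when it receives a copy of what it already proposes.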

Election Process

Now that we have the timing and rules, here is the leader election process:

  • Every node first votes for itself as the leader and broadcasts its vote;
  • Each node takes incoming votes from its receive queue;
  • It decides by the rules above whether to change its own vote and, if it does, broadcasts the changed vote;
  • It checks whether more than half of the votes are for the same node. If so, the election ends and each node sets its server state according to the result. Otherwise, voting continues.
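The four steps above can be simulated in a few lines of Java. This is a deliberately simplified sketch, not ZooKeeper's implementation: it assumes every node can reach every other node, and it exchanges votes in synchronous rounds instead of through asynchronous queues. The class `ElectionSketch` and the `{epoch, zxid, serverId}` array layout are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, synchronous simulation of the voting rounds; hypothetical names.
final class ElectionSketch {
    private ElectionSketch() {}

    // Each vote is {epoch, zxid, serverId}; compare in that order.
    static boolean better(long[] a, long[] b) {
        return a[0] > b[0]
            || (a[0] == b[0] && (a[1] > b[1]
                || (a[1] == b[1] && a[2] > b[2])));
    }

    // Returns the elected serverId, or -1 if no candidate reaches a quorum.
    static long elect(long[][] nodes) {
        int n = nodes.length;
        long[][] votes = new long[n][];
        for (int i = 0; i < n; i++) {
            votes[i] = nodes[i].clone(); // step 1: every node votes for itself
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            long[][] inbox = votes.clone(); // step 2: votes broadcast this round
            for (int i = 0; i < n; i++) {
                for (long[] received : inbox) {
                    if (better(received, votes[i])) { // step 3: adopt a better vote
                        votes[i] = received;
                        changed = true;               // and rebroadcast next round
                    }
                }
            }
        }
        // Step 4: the winner needs more than half of all votes.
        Map<Long, Integer> tally = new HashMap<>();
        for (long[] v : votes) {
            tally.merge(v[2], 1, Integer::sum);
        }
        for (Map.Entry<Long, Integer> e : tally.entrySet()) {
            if (e.getValue() > n / 2) {
                return e.getKey();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        long[][] nodes = { {1, 100, 1}, {1, 102, 2}, {1, 102, 3} };
        System.out.println(elect(nodes)); // prints 3
    }
}
```

In the usage example, servers 2 and 3 tie on epoch and zxid, so the larger serverId decides and server 3 is elected.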

Example

The above picture comes from "ZooKeeper: Detailed Explanation of Distributed Process Collaboration Technology". The overall process is relatively simple, so I will not analyze it in detail here.

Broadcast

After the leader election, the cluster goes through two more steps: connecting to the leader and synchronizing. We will not analyze these two steps in detail here. Instead, we focus on how the cluster keeps the data on every node consistent while serving clients.

ZAB guarantees the following properties in the broadcast state:

  • Reliable delivery: If a message m is delivered by one server, then it will eventually be delivered by all servers.
  • Globally ordered: If one server delivers message a before message b, then every server that delivers both a and b delivers a before b.
  • Causally ordered: If message a causally precedes message b and both are delivered, then a must be ordered before b.

Ordering is a very important property that the ZAB protocol must guarantee, because ZooKeeper stores data in a structure similar to a directory tree, where nodes have to be created in order.

For example, suppose operation a creates /test and operation b then creates /test/123. If the order is not guaranteed and b is applied before a, operation b fails because the parent node does not exist.
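The /test example can be reproduced with a toy in-memory tree. This is only a sketch under the assumption that a node may be created only when its parent already exists; the class `ZnodeOrderSketch` and its `create` method are hypothetical and are not the ZooKeeper client API.

```java
import java.util.HashSet;
import java.util.Set;

// Toy path store for illustration; not the ZooKeeper data tree.
final class ZnodeOrderSketch {
    private final Set<String> paths = new HashSet<>();

    ZnodeOrderSketch() {
        paths.add("/"); // the root always exists
    }

    // Creates the path only if its parent exists; returns whether it succeeded.
    boolean create(String path) {
        int slash = path.lastIndexOf('/');
        String parent = (slash == 0) ? "/" : path.substring(0, slash);
        if (!paths.contains(parent)) {
            return false; // parent missing: the create fails
        }
        return paths.add(path);
    }

    public static void main(String[] args) {
        ZnodeOrderSketch inOrder = new ZnodeOrderSketch();
        System.out.println(inOrder.create("/test"));        // true
        System.out.println(inOrder.create("/test/123"));    // true

        ZnodeOrderSketch outOfOrder = new ZnodeOrderSketch();
        System.out.println(outOfOrder.create("/test/123")); // false: /test does not exist yet
    }
}
```

Applying the same two operations in the opposite order gives a different result, which is why every server has to apply them in the same order.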

As shown in the figure above, the entire write request is similar to a two-phase commit.

When a write request is received from a client, the following steps are performed:

  1. The Leader receives the client's write request and generates a transaction (Proposal) that carries a zxid;
  2. The Leader broadcasts the Proposal. Note that communication between nodes goes through FIFO queues;
  3. On receiving the Proposal, each Follower writes it to its local disk and returns an ACK to the Leader once the write succeeds;
  4. Once the Leader has received ACKs from more than half of the nodes, it commits the transaction and broadcasts a commit message;
  5. The Followers commit the transaction.

From the above process, we can see that ZooKeeper keeps the cluster's data consistent through a two-phase commit. Because a transaction is committed as soon as more than half of the nodes have ACKed it, rather than all of them, some nodes may briefly lag behind, so the data in ZooKeeper is not strongly consistent.

The ZAB protocol ensures ordering in several ways. First, servers communicate over TCP, which preserves order in network transmission. Second, FIFO queues are maintained between nodes to ensure global ordering. Third, causal ordering is ensured through the globally increasing zxid.
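The "more than half" rule in step 4 can be sketched as a small ACK tracker on the leader side. This is an illustrative sketch only: the class `ProposalSketch`, its methods, and the choice to count the leader's own ACK are assumptions, not ZooKeeper's actual classes.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical leader-side bookkeeping for one proposal.
final class ProposalSketch {
    private final long zxid;
    private final int clusterSize;
    private final Set<Long> acks = new HashSet<>();
    private boolean committed;

    ProposalSketch(long zxid, int clusterSize) {
        this.zxid = zxid;
        this.clusterSize = clusterSize;
    }

    long zxid() {
        return zxid;
    }

    boolean isCommitted() {
        return committed;
    }

    // Records an ACK from serverId. Returns true only for the ACK that
    // pushes the proposal past half of the cluster, which is the moment
    // the leader would broadcast the commit message.
    boolean onAck(long serverId) {
        acks.add(serverId); // a repeated ACK from the same server counts once
        if (!committed && acks.size() > clusterSize / 2) {
            committed = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        ProposalSketch p = new ProposalSketch(0x100000001L, 5);
        System.out.println(p.onAck(1)); // false: 1 of 5
        System.out.println(p.onAck(2)); // false: 2 of 5
        System.out.println(p.onAck(3)); // true: 3 of 5 is a majority, commit
        System.out.println(p.onAck(4)); // false: already committed
    }
}
```

In a five-node cluster the commit happens after three ACKs, so two nodes may not have written the transaction yet when clients already see it as committed.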

State Transition

As mentioned above, there are four types of Zookeeper service states and four types of ZAB states. Here we will briefly introduce the state transitions between them, which can help you better understand the role of the ZAB protocol in the Zookeeper workflow.

  1. After a server starts, or after it loses its connection to the leader, its server state changes to LOOKING;
  2. If no leader exists, a leader election is held; if a leader already exists, the server connects to it directly. At this point the ZAB state is ELECTION;
  3. If more than half of the votes choose the same server, the leader election ends. The elected server's state becomes LEADING, and the other servers' states become FOLLOWING or OBSERVING;
  4. All servers connect to the leader, and the ZAB state is DISCOVERY;
  5. The leader synchronizes data to the learners so that every follower node's data matches the leader's. At this point the ZAB state is SYNCHRONIZATION;
  6. After more than half of the servers are synchronized, the cluster starts serving clients, and the ZAB state is BROADCAST.

It can be seen that the workflow of the entire Zookeeper service is similar to the transition of a state machine, and the Zab protocol is the key to driving the flow of service states. If you understand Zab, you will understand the key principles of Zookeeper's work.
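The state-machine view can be written down as a small Java sketch. The enum repeats the four ZAB states listed earlier; the wrapper class `ZabStateSketch` and the methods `onSuccess` and `onLeaderLost` are hypothetical names for illustration, and the sketch assumes that losing the leader at any point sends a server back to election.

```java
// Hypothetical transition table for illustration; not ZooKeeper code.
final class ZabStateSketch {
    private ZabStateSketch() {}

    enum ZabState { ELECTION, DISCOVERY, SYNCHRONIZATION, BROADCAST }

    // Next state when the current step completes successfully.
    // BROADCAST is the steady state, so it maps to itself.
    static ZabState onSuccess(ZabState state) {
        switch (state) {
            case ELECTION:
                return ZabState.DISCOVERY;
            case DISCOVERY:
                return ZabState.SYNCHRONIZATION;
            default: // SYNCHRONIZATION or BROADCAST
                return ZabState.BROADCAST;
        }
    }

    // Assumption: losing the leader in any state restarts the election.
    static ZabState onLeaderLost(ZabState state) {
        return ZabState.ELECTION;
    }

    public static void main(String[] args) {
        ZabState s = ZabState.ELECTION;
        for (int i = 0; i < 3; i++) {
            s = onSuccess(s);
            System.out.println(s); // DISCOVERY, SYNCHRONIZATION, BROADCAST
        }
        System.out.println(onLeaderLost(s)); // ELECTION
    }
}
```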

Summary

This article briefly introduced the role of the ZAB protocol in ZooKeeper's workflow. I hope it helps you understand and learn ZooKeeper.

I’m Ao Bing, a tool who makes a living on the Internet.
