Interview: 23 ZooKeeper questions, see if you can answer them

1. What is ZooKeeper?

ZooKeeper is an open-source coordination service for distributed applications; it is an open-source implementation of Google's Chubby. It acts as the manager of a cluster, monitoring the status of each node and performing the next reasonable operation based on the feedback the nodes submit. On top of that, it provides users with a simple, easy-to-use interface and a high-performance, stable system.

A client's read request can be processed by any machine in the cluster, and if the read request registers a listener on a node, that listener is also handled by the ZooKeeper machine the client is connected to. A write request, by contrast, is sent to the other ZooKeeper machines and returns success only after a quorum reaches consensus. Therefore, as the number of machines in a ZooKeeper cluster increases, read throughput rises while write throughput falls.

Ordering is a very important feature of ZooKeeper: all updates are globally ordered, and each update carries a unique identifier called a zxid (ZooKeeper Transaction Id). Read requests are only ordered relative to updates; that is, the result of a read request carries the latest zxid known to the serving ZooKeeper server.

2. What does ZooKeeper provide?

1. File system
2. Notification mechanism

3. Zookeeper File System

Zookeeper provides a multi-level node namespace (each node is called a znode). Unlike a file system, every node can carry associated data, whereas in a file system only file nodes store data and directory nodes do not. To guarantee high throughput and low latency, Zookeeper keeps this tree-like directory structure in memory. This makes Zookeeper unsuitable for storing large amounts of data; each node can store at most 1 MB.

4. Four types of znodes

1. PERSISTENT: persistent directory node
After the client disconnects from Zookeeper, the node still exists.

2. PERSISTENT_SEQUENTIAL: persistent sequential directory node
After the client disconnects from Zookeeper, the node still exists, and Zookeeper appends a monotonically increasing sequence number to the node name.

3. EPHEMERAL: ephemeral directory node
After the client's session with Zookeeper ends, the node is deleted.

4. EPHEMERAL_SEQUENTIAL: ephemeral sequential directory node
After the client's session with Zookeeper ends, the node is deleted, and Zookeeper appends a sequence number to the node name.
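A minimal sketch of creating a node in each of the four modes with the standard Java client; the connection string, paths, and payloads are illustrative assumptions.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeModes {
    public static void main(String[] args) throws Exception {
        // Assumes a ZooKeeper server reachable at localhost:2181.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, event -> {});

        // PERSISTENT: survives client disconnect.
        zk.create("/config", "v1".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // PERSISTENT_SEQUENTIAL: persistent, and the server appends an
        // increasing suffix, e.g. /config/item-0000000001.
        zk.create("/config/item-", "a".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);

        // EPHEMERAL: deleted automatically when this client's session ends.
        zk.create("/worker", "up".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        // EPHEMERAL_SEQUENTIAL: ephemeral plus the sequence suffix; the
        // building block for locks, queues, and leader election.
        zk.create("/config/lease-", "b".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        zk.close();
    }
}
```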

5. Zookeeper Notification Mechanism

A client registers a watcher on a znode it cares about. When the znode changes, the client receives a notification from zk and can then adjust its business logic according to the change.
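A minimal sketch of this mechanism, assuming a server at localhost:2181 and an existing /config znode (both assumptions): a one-shot watcher is registered while reading the data, and it fires when another client modifies the node.

```java
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class WatchDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch changed = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, e -> {});

        // getData registers a one-shot watcher on /config while reading it.
        byte[] data = zk.getData("/config", event -> {
            if (event.getType() == Watcher.Event.EventType.NodeDataChanged) {
                System.out.println("znode changed: " + event.getPath());
                changed.countDown();
            }
        }, null);
        System.out.println("current value: " + new String(data));

        changed.await();  // unblocks when another client calls setData on /config
        zk.close();
    }
}
```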

6. What does Zookeeper do?

1. Naming Service
2. Configuration Management
3. Cluster management
4. Distributed Lock
5. Queue Management

7. ZK naming service (file system)

A naming service means obtaining the address of a resource or service by a specified name. With ZK you can create a global, i.e. unique, path, and this path can serve as the name: it may point to a cluster, the address of a provided service, a remote object, and so on.
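A sketch of the idea under stated assumptions: the /services parent, the service path, and the address are all hypothetical.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NamingDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, e -> {});

        // The unique path is the name; the node's data is what it resolves to.
        zk.create("/services", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        zk.create("/services/order-service", "10.0.0.7:8080".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // A consumer resolves the name to an address.
        byte[] addr = zk.getData("/services/order-service", false, null);
        System.out.println("order-service is at " + new String(addr));
        zk.close();
    }
}
```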

8. zk configuration management (file system, notification mechanism)

The program is deployed across different machines, and its configuration information is placed in a znode in zk. When the configuration changes, that is, when the znode changes, the new configuration is distributed by changing the content of that directory node in zk and notifying each client through its watcher.

9. Zookeeper cluster management (file system, notification mechanism)

The so-called cluster management focuses on two points: whether machines join or leave, and electing a master.

For the first point: all machines agree to create ephemeral nodes under a parent directory and then listen for child-node change events on that parent. Once a machine goes down, its connection to Zookeeper is broken, the ephemeral node it created is deleted, and all other machines receive a notification that a sibling node has been removed; thus everyone knows that the machine has left.

The same is true when a new machine joins: all machines receive a notification that a new sibling node has been added, and the current membership is known again. For the second point, we make a slight change: all machines create ephemeral sequentially numbered nodes, and each time the machine with the smallest number is elected master.
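A minimal membership sketch under stated assumptions: a persistent /cluster parent already exists, and the paths and payloads are illustrative.

```java
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class Membership {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15000, e -> {});

        // Register this machine; the node vanishes if this session dies.
        zk.create("/cluster/member-", "host-a".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // Watch the parent: fires once when a sibling joins or leaves.
        List<String> members = zk.getChildren("/cluster", event -> {
            if (event.getType() == EventType.NodeChildrenChanged) {
                System.out.println("membership changed; re-list the children");
            }
        });
        System.out.println("current members: " + members);

        // Master election variant: the member whose node carries the
        // smallest sequence number treats itself as the master.
        Thread.sleep(Long.MAX_VALUE);  // keep the session (and node) alive
    }
}
```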

10. Zookeeper distributed lock (file system, notification mechanism)

With Zookeeper's consistent file system, the lock problem becomes easy. Lock services can be divided into two categories: one maintains exclusivity, the other controls ordering.

For the first type, we regard a znode on Zookeeper as a lock, implemented by creating a znode. All clients try to create the /distribute_lock node, and the client that succeeds owns the lock. When finished, it deletes the /distribute_lock node it created to release the lock.

For the second type, /distribute_lock already exists, and all clients create ephemeral sequentially numbered nodes under it. As in master election, the client with the smallest number obtains the lock and deletes its node when done. This conveniently yields a global ordering.

11. Process of obtaining distributed locks

When acquiring the distributed lock, the client calls the createNode method to create an ephemeral sequential node under the locker node; that node is deleted when the lock is released.

It then calls getChildren("locker") to obtain all child nodes under locker; note that no Watcher needs to be set at this point. If the client finds that the node it created has the smallest sequence number among all children, it considers itself to have obtained the lock.

If the client finds that the node it created is not the smallest among all children of locker, it has not yet obtained the lock. It must then find the node just smaller than its own, call the exists() method on it, and register an event listener. Later, when that watched node is deleted, the client's Watcher receives the notification, and the client again checks whether its own node now has the smallest sequence number among the children of locker. If so, it has obtained the lock; if not, it repeats the steps above, finding the next smaller node and registering a listener. As you can see, this process involves quite a bit of logical judgment.

The code implementation is essentially that of a mutex: the key logic of acquiring the distributed lock lies in BaseDistributedLock, which implements the details of a Zookeeper-based distributed lock.
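A self-contained sketch of the acquisition steps just described, under assumptions: a persistent /locker parent exists, and the class and helper names are illustrative, not the BaseDistributedLock mentioned above.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class LockSketch {
    private final ZooKeeper zk;
    private String myNode;  // e.g. /locker/lock-0000000007

    LockSketch(ZooKeeper zk) { this.zk = zk; }

    void lock() throws Exception {
        // Step 1: create an ephemeral sequential node under /locker.
        myNode = zk.create("/locker/lock-", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        while (true) {
            // Step 2: list all children; no Watcher is needed here.
            List<String> children = zk.getChildren("/locker", false);
            Collections.sort(children);
            int idx = children.indexOf(myNode.substring("/locker/".length()));

            // Step 3: smallest sequence number means we hold the lock.
            if (idx == 0) return;
            if (idx < 0) throw new IllegalStateException("our node vanished");

            // Step 4: watch only the node immediately smaller than ours.
            CountDownLatch gone = new CountDownLatch(1);
            String prev = "/locker/" + children.get(idx - 1);
            if (zk.exists(prev, event -> gone.countDown()) == null) {
                continue;  // predecessor already deleted; re-check
            }
            gone.await();  // wake when the predecessor changes, then loop
        }
    }

    void unlock() throws Exception {
        zk.delete(myNode, -1);  // release by deleting our node
    }
}
```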

12. Zookeeper queue management (file system, notification mechanism)

Two types of queues:

  • A synchronous queue takes effect only when all members of the queue have gathered; otherwise it keeps waiting for all members to arrive.
  • A FIFO queue performs enqueue and dequeue operations in first-in, first-out order.

The first type is implemented by creating ephemeral nodes under an agreed directory and monitoring whether the number of child nodes has reached the required count.

The second type follows the same basic principle as the ordering scenario in the distributed lock service: enqueueing assigns a number, and dequeueing removes by number. A PERSISTENT_SEQUENTIAL node is created in a specific directory; when creation succeeds, the Watcher notifies the waiting queue, and the queue deletes the node with the smallest sequence number for consumption. In this scenario, Zookeeper's znodes serve as message storage: the data stored in a znode is the message content, and the SEQUENTIAL suffix is the message number, so messages can be taken out in order. Since the created nodes are persistent, there is no need to worry about losing queued messages.
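A minimal single-consumer FIFO sketch under assumptions: a persistent /queue parent exists, and the paths are illustrative. With multiple consumers, the delete below would need to handle the race where another consumer takes the head first.

```java
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class FifoQueue {
    private final ZooKeeper zk;
    FifoQueue(ZooKeeper zk) { this.zk = zk; }

    // Enqueue: PERSISTENT_SEQUENTIAL, so the message survives crashes
    // and the sequence suffix serves as the message number.
    void offer(byte[] message) throws Exception {
        zk.create("/queue/msg-", message,
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
    }

    // Dequeue: read and delete the child with the smallest sequence number.
    byte[] poll() throws Exception {
        List<String> children = zk.getChildren("/queue", false);
        if (children.isEmpty()) return null;
        Collections.sort(children);
        String head = "/queue/" + children.get(0);
        byte[] data = zk.getData(head, false, null);
        zk.delete(head, -1);
        return data;
    }
}
```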

13. Zookeeper Data Replication

As a cluster, Zookeeper provides a consistent data service, so it naturally needs to replicate data across all of its machines. The benefits of data replication:

  • Fault tolerance: If a node fails, the entire system will not stop working, and other nodes can take over its work;
  • Improve the system's scalability: distribute the load to multiple nodes, or add nodes to increase the system's load capacity;
  • Improve performance: Allow clients to access nearby nodes locally to increase user access speed.

From the perspective of client read and write access transparency, data replication cluster systems are divided into the following two types:

  • Write Master: data modifications are submitted to a designated node; reads are unrestricted and can go to any node. In this case the client needs to distinguish reads from writes, commonly known as read-write separation;
  • Write Any: Data modifications can be submitted to any node, just like reads. In this case, the client is transparent to the roles and changes of cluster nodes.

For ZooKeeper, the approach adopted is Write Any: a client may submit a write to any node (internally the request is forwarded to the leader). By adding machines, its read throughput and responsiveness scale very well, while write throughput necessarily decreases as machines are added (which is also why observers were introduced); responsiveness depends on the concrete implementation, whether replication is delayed to maintain eventual consistency or immediate for fast response.

14. How Zookeeper works

The core of Zookeeper is atomic broadcast, which keeps the servers in sync. The protocol that implements this mechanism is called the Zab protocol. Zab has two modes: recovery mode (leader election) and broadcast mode (synchronization). When the service starts or after the leader crashes, Zab enters recovery mode; recovery mode ends once a leader has been elected and a majority of servers have completed state synchronization with the leader. State synchronization ensures that the leader and the servers have the same system state.

15. How does ZooKeeper ensure the sequential consistency of transactions?

Zookeeper uses an incrementing transaction id, the zxid, to identify transactions; every proposal is stamped with a zxid when it is issued. The zxid is a 64-bit number: the high 32 bits are the epoch, which identifies whether the leader has changed (whenever a new leader is elected, the epoch is incremented), and the low 32 bits are an incrementing counter. When a new proposal is generated, it is first sent to the other servers as a transaction execution request, following a two-phase process as in databases; if more than half of the machines can execute it successfully, execution begins.
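A tiny illustration of that 64-bit layout; the sample value is hypothetical.

```java
// Decompose a zxid into its epoch (high 32 bits) and counter (low 32 bits).
public class Zxid {
    static long epoch(long zxid)   { return zxid >>> 32; }
    static long counter(long zxid) { return zxid & 0xFFFFFFFFL; }

    public static void main(String[] args) {
        long zxid = 0x0000000500000007L;  // hypothetical: epoch 5, counter 7
        System.out.println("epoch=" + epoch(zxid)
                + " counter=" + counter(zxid));
    }
}
```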

16. Server working states in Zookeeper

Each Server has three states during its working process:

  • LOOKING: The current server does not know who the leader is and is searching
  • LEADING: The current server is the elected leader
  • FOLLOWING: The leader has been elected and the current server is synchronized with it.

17. How does Zookeeper elect the leader?

When the leader crashes, or the leader loses a majority of its followers, zk enters recovery mode, in which a new leader must be elected so that all servers are restored to a correct state. There are two zk election algorithms: one based on basic paxos and one based on fast paxos. The system's default election algorithm is fast paxos.

1. Zookeeper master election process (basic paxos)

1. The election thread is initiated by the current server. Its main function is to count the voting results and select the recommended server.

2. The election thread first sends a query to all servers (including itself);

3. After receiving a reply, the election thread verifies whether it corresponds to the query it sent (by checking that the zxid is consistent), obtains the other party's id (myid) and stores it in the list of currently queried servers, and finally obtains the leader information (id, zxid) proposed by the other party and stores it in the voting record table of the current election;

4. After receiving replies from all servers, it determines the server with the largest zxid and sets that server's information as the server to vote for next time;

5. The thread sets the server with the largest zxid as the leader recommended by the current server. If at this point the recommended server receives n/2 + 1 votes, it is declared the winner, and each server sets its own state according to the winner's information. Otherwise, this process continues until a leader is elected.

From this analysis we can conclude that for a leader to obtain majority support, the total number of servers should be an odd number 2n+1, and the number of surviving servers must be no fewer than n+1. The above process is repeated after every server start. In recovery mode, a server that has just recovered from a crash or has just started also restores data and session information from disk snapshots; zk records transaction logs and takes snapshots periodically to facilitate state recovery.

2. Zookeeper master election process (fast paxos)

In the fast paxos process, a server first proposes to all servers that it wants to become the leader. When the other servers receive the proposal, they resolve conflicts between epoch and zxid, accept the proposal, and then reply to the proposer confirming acceptance. This process repeats until a leader is finally elected.

18. Zookeeper synchronization process

After selecting the Leader, zk enters the state synchronization process.

  1. Leader waits for server connection;
  2. Follower connects to the leader and sends the largest zxid to the leader;
  3. The leader determines the synchronization point based on the follower’s zxid;
  4. After the synchronization is completed, the follower is notified that the status has become uptodate;
  5. After receiving the uptodate message, the Follower can accept the client's request and provide service again.

19. Distributed Notification and Coordination

For system scheduling: an operator sends a notification by changing the state of a node through the console, and zk then sends the change to every client that registered a watcher on that node.

For execution status reporting: each worker process creates an ephemeral node in a certain directory and carries its progress data in it, so that a summarizing process can monitor changes to the directory's child nodes and obtain a real-time global view of work progress.

20. Why does the cluster need a leader?

In a distributed environment, some business logic only needs to be executed by one machine in the cluster, and other machines can share the results. This can greatly reduce repeated calculations and improve performance, so leader election is required.

21. How to deal with a zk node crash?

Zookeeper itself is also a cluster; it is recommended to configure no fewer than 3 servers, and Zookeeper must itself ensure that when one node goes down, the other nodes continue to provide service.

If a follower fails, the other two servers (in a three-node cluster) still provide access, because Zookeeper keeps multiple replicas of the data and no data is lost;

If a leader fails, Zookeeper will elect a new leader.

The mechanism of the ZK cluster is that as long as more than half of the nodes are healthy, the cluster can provide service normally. The cluster fails only when so many nodes have failed that just half or fewer are still working.

So:

  • A cluster of 3 nodes can tolerate the loss of 1 node (the leader can still get 2 votes > 1.5)
  • A cluster of 2 nodes cannot tolerate the failure of either node (the surviving node gets only 1 vote, which is not more than half)
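A tiny helper making that arithmetic concrete: an ensemble of n servers tolerates the failure of (n - 1) / 2 of them.

```java
// Majority rule: the cluster survives while more than n/2 nodes remain.
public class Quorum {
    static int tolerated(int n) { return (n - 1) / 2; }

    public static void main(String[] args) {
        for (int n = 2; n <= 7; n++)
            System.out.println(n + " nodes tolerate "
                    + tolerated(n) + " failure(s)");
        // 2->0, 3->1, 4->1, 5->2, 6->2, 7->3: an even size adds no extra
        // tolerance, which is why odd-sized ensembles are recommended.
    }
}
```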

22. Differences between zookeeper load balancing and nginx load balancing

zk's load balancing logic can be fully customized, while nginx can only adjust weights; any other control requires writing your own plug-in. However, nginx's throughput is far greater than zk's, so which approach to use depends on the business.

23. Zookeeper watch mechanism

Official description of the watch mechanism: a watch event is a one-time trigger. When the data being watched changes, the server sends the change to the clients that set the watch, to notify them.

Features of the Zookeeper watch mechanism:

1. One-time trigger: when the watched data changes, a watcher event is sent to the client, but the client receives such a notification only once; to keep being notified, the watch must be registered again.

2. Watcher events are sent asynchronously from the server to the client. This poses a problem: clients and servers communicate over sockets, so due to network delays or other factors, different clients may observe an event at different times. Zookeeper itself only provides an ordering guarantee: a client will not perceive a change to a znode it watches before it has received the corresponding event. Therefore we cannot expect to observe every single change of a node when using Zookeeper; Zookeeper can only guarantee eventual consistency, not strong consistency.

3. Data watches and child watches: getData() and exists() set data watches, while getChildren() sets child watches.

4. Operations that register watches: getData, exists, getChildren.

5. Operations that trigger watches: create, delete, setData.

6. A successful setData() triggers the data watch set on the znode. A successful create() triggers the data watch on the newly created znode, as well as the child watch on its parent node. A successful delete() triggers both the data watch and the child watch of the znode (since it can no longer have children), and also triggers the child watch of its parent node.

7. When a client connects to a new server, its watches are triggered along with any session event. While the connection to a server is lost, watch notifications cannot be received; when the client reconnects, previously registered watches are re-registered and triggered if necessary. Usually all of this is completely transparent. There is only one special case in which a watch may be lost: an exists watch set on a znode that has not yet been created will be missed if the znode is created and then deleted while the client is disconnected.

8. A watch is lightweight; in fact it is a callback in the local JVM, and the server only stores a boolean indicating whether a watcher is set.
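A sketch of the standard response to the one-time-trigger property from point 1: re-register the watch inside the callback so that notifications keep arriving. The path and class name are illustrative assumptions.

```java
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

public class ContinuousWatch {
    private final ZooKeeper zk;
    ContinuousWatch(ZooKeeper zk) { this.zk = zk; }

    void watchForever(String path) throws Exception {
        zk.getData(path, event -> {
            if (event.getType() == EventType.NodeDataChanged) {
                try {
                    // Handle the change, then immediately set the watch again;
                    // without this call we would see at most one notification.
                    watchForever(path);
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, null);
    }
}
```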
