Hello everyone, I am Brother Shu. In complex distributed systems, it is often necessary to uniquely identify large amounts of data and messages, such as the ID primary key of a sharded database or table, the request ID of distributed tracing, and so on. Therefore, designing a "distributed ID issuer" has become a very common system design problem. Today I will show you how to design a distributed ID issuer. Article mind map System RequirementsFor business systems, there are generally the following requirements for globally unique IDs:
For the above two requirements, the first one is required by all systems. The second one is not required by all systems. For example, the request ID of distributed tracing does not need to be monotonically increasing. For those scenarios that need to be stored in the database as an ID, it may be necessary to ensure that the globally unique ID is monotonically increasing. In addition, we may also need to consider security issues. If a globally unique ID is incremented sequentially, it may cause business information leakage. For example, if the order ID is incremented by 1 each time, then competitors can directly know the number of orders we place every day through the order ID, which is unacceptable for business. There are many unique ID solutions on the market to meet the above demands. The most common solutions are as follows:
UUIDUUID stands for Universally Unique Identifier, which is a native API in Java. A standard UUID contains 32 hexadecimal digits, divided into 5 segments with a hyphen as a separator. The length of each segment is 8 characters, 4 characters, 4 characters, 4 characters, and 12 characters, respectively, with a total of 36 characters, as shown in the figure below. A simple UUID example: 630e4100-e29b-33d4-a635-246652140000. UUID composition diagram The advantage of UUID, a unique ID solution, is that it has no external dependencies and is purely locally generated, so its performance is very high. However, its disadvantages are also very obvious: The field is very long, wasting storage space. UUID is generally a 36-character string. If it is stored as a database primary key, it will greatly increase the storage space of the index. Non-auto-incremental, which reduces database write performance. UUID is not auto-incremental. If it is used as a database primary key, sequential writes cannot be implemented, which will reduce database write performance. No business meaning. UUID has no business meaning, and we cannot derive any meaning from it. Therefore, UUID is more suitable for non-database ID storage, such as generating a local distributed tracing request ID. Snowflake Algorithm SnowFlake is an open-source distributed ID generation algorithm developed by Twitter. The idea is to use 64 bits to represent an ID and divide the 64 bits into 4 parts, as shown in the following figure. Snowflake algorithm unique ID composition diagram
The advantages of the snowflake algorithm are:
The snowflake algorithm is almost perfect, but it has a fatal flaw: it is strongly dependent on the machine time. If the system time on the machine is dialed back, that is, the time is slower than the normal time, then duplicate numbers may appear. In this case, we can maintain a file locally, write the last timestamp, and then compare it with the current timestamp. If the current timestamp is less than the last timestamp, it means that there is a problem with the system time and it should be handled in time. In general, the snowflake algorithm is not only shorter in length but also has business significance. It can also improve write performance in database storage scenarios. Therefore, the snowflake algorithm for generating distributed unique IDs is popular among people. At present, many domestic large companies have implemented open source number generators based on the snowflake algorithm, such as Baidu's open source UidGenerator, Meituan's open source Leaf, etc. The core of these snowflake-like algorithms is to divide 64 bits more reasonably, so that they are more suitable for their own scenarios. Database auto-increment primary keySpeaking of unique ID, we naturally think of the auto-increment primary key of the database, because it is unique. In the case of low concurrency, we can directly deploy one machine, insert a piece of data into the database table each time an ID is obtained, and then return the primary key ID. The advantage of this method is that it is very simple and has low implementation cost. In addition, the generated unique ID is also monotonically increasing, which can meet the requirements of database write performance. But its disadvantage is also very obvious, that is, it is highly dependent on the database. When the database is abnormal, the entire system will be unavailable. Even if a high-availability switch is made, if the data synchronization is inconsistent during the master-slave switch, it may still cause duplicate numbers. In addition, since it is a stand-alone deployment, its performance bottleneck is limited to the read and write performance of a single MySQL machine, and it is destined to be unable to bear high-concurrency business scenarios. We can solve the performance problem mentioned above through cluster deployment. We can solve the ID conflict problem after cluster deployment by setting the increment step. For example, if we have 3 machines, we set the increment step to 3, and the ID generation strategy for each machine is:
This method solves the problems of cluster deployment and ID conflicts, and can improve the capacity of concurrent access to a certain extent. However, its disadvantages are also obvious: We can only rely on stacking machines to improve performance. When the number of requests increases again, we can only stack machines infinitely, which seems to be a physical defense. Horizontal expansion is difficult. When we need to add a machine, the process is very troublesome. First, we need to deploy the new server and set a new step size. The starting value must be set to an impossible value. After the new server is deployed, we need to deal with the old servers one by one. This process is really painful and can be said to be manual operation and maintenance. Redis atomic incrementSince Redis is an in-memory database, its powerful performance is very suitable for implementing high-concurrency distributed ID generation. To implement auto-increment ID based on Redis, the main method is to use the INCR command in Redis. This command can increment a number and return the result at the same time, and this operation is an atomic operation. The distributed ID function is implemented through Redis. Its mode is similar to the self-increment ID through the database, except that the storage medium is changed from hard disk to memory. When a single Redis cannot support concurrent requests, Redis can also be solved by cluster deployment and step size setting. However, the same problems that database auto-increment primary keys have, Redis auto-increment ID also has, that is, it can only stack machines and has difficulty in horizontal expansion. In addition, compared with the persistence of database storage, Redis is based on memory storage, and the persistence problem needs to be considered, which is also a headache. SummarizeAfter looking at so many distributed ID solutions, which one should we choose? When we make decisions, we should determine the dimensions of the decision. For this issue, the dimensions we should focus on are: R&D cost, concurrency, performance, and operation and maintenance cost. First of all, UUID is actually inferior to the snowflake algorithm in all aspects. Its only advantage is that JDK comes with its own API. Therefore, if you just want to use it very simply, you can just use UUID directly. After all, you still need to write some implementation code for the snowflake algorithm. Secondly, the snowflake algorithm is undoubtedly a very good implementation. Compared with UUID, it not only has the advantages of UUID being generated locally and not relying on third-party systems, but also has business significance, can improve writing performance, and solve security issues. However, its disadvantage is that the code of the snowflake algorithm must be implemented, so its R&D cost is slightly higher than that of UUID. Finally, for the two methods of database self-increment ID and Redis atomic self-increment. The advantage of the database self-increment ID method is also simple and convenient, and does not require too high R&D costs. But its disadvantage is that the supported concurrency is too low, and the subsequent operation and maintenance costs are too high. Therefore, the database self-increment ID method should be suitable for small-scale usage scenarios. The Redis atomic self-increment method is preferred to support high-concurrency scenarios. But the disadvantage is that you need to handle the persistence problem yourself, and the operation and maintenance costs may be relatively high. I prefer the database auto-increment method. The two methods are very similar, the only difference is the storage medium. Redis atomic auto-increment method is very fast, and a single machine may be several times faster than the database method. However, if you want to consider the issue of persistence, it is too complicated for Redis. We sort out the above four implementation methods and summarize them into the following comparison table:
In general, if you consider long-term use, then operation and maintenance costs and high concurrency must be considered. Under this basic condition, the snowflake algorithm and the database auto-increment ID may be relatively good choices. References
|
<<: If the TCP protocol is used, will there be no packet loss?
>>: Easy-to-understand illustrations of online interview knowledge - Part 1
[51CTO.com original article] Veeam ON Forum hoste...
Yecaoyun is a Chinese hosting company founded in ...
If a cabling project is to be successful, you fir...
[[386593]] The coronavirus outbreak broke out in ...
A local area network (LAN) is a computer network ...
Normal 0 7.8 磅 0 2 false false false EN-US ZH-CN ...
HostYun has launched a new product, this time wit...
[51CTO.com original article] On December 22, 2016...
With the changes in traffic flows used in modern ...
[[386748]] HTTP/0.9HTTP/0.9 was proposed in 1991 ...
We know that the direct connection between two co...
1. The main responsibilities of TCP/IP protocol ●...
DiyVM is a Chinese hosting company founded in 200...
Aruba, a subsidiary of Hewlett Packard Enterprise...
Beijing time on the 24th, Russian Deputy Minister...