Seven distributed global ID generation strategies, which one do you prefer?

Seven distributed global ID generation strategies, which one do you prefer?

[[415300]]

After using microservices, many problems that were originally simple have become complicated, such as the global ID issue!

I just happened to use this content in my work recently, so I investigated several common global ID generation strategies on the market and made a brief comparison for your reference.

After the database is divided into different libraries and tables, the original primary key auto-increment method is no longer convenient to use, and a new suitable solution needs to be found. Song Ge's demand was raised under such circumstances.

Next, let’s take a look at it together.

1. Two approaches

Generally speaking, there are two different approaches to this problem:

  • Let the database handle it itself
  • Java code to handle the primary key, and then directly insert it into the database

These two ideas correspond to different solutions. Let’s look at them one by one.

2. The database is done by itself

The database handles it by itself, which means that when I insert data, I still don't consider the primary key issue and hope to continue using the database's primary key auto-increment function. However, it is obvious that the original default primary key auto-increment function cannot be used now, and we must have a new solution.

2.1 Modify database configuration

The structure of the database after splitting is as follows (assuming that MyCat is used as the database middleware):

At this time, if the original db1, db2, and db3 continue to increase their primary keys, then for MyCat, the primary key will not be self-incrementing, the primary key will be repeated, and the primary key of the data queried by the user from MyCat will have problems.

Find the cause of the problem, and the rest will be easy to solve.

We can directly modify the starting value and step size of the MySQL database primary key auto-increment.

First, we can view the values ​​of the two related variables through the following SQL:

  1. SHOW VARIABLES LIKE   'auto_increment%'  

It can be seen that the starting value and step size of the primary key auto-increment are both 1.

The starting value is easy to change. It can be set when defining the table. The step size can be achieved by modifying this configuration:

  1. set @@auto_increment_increment=9;

After the modification, check the corresponding variable value and find that it has changed:

Now when we insert data again, the primary key will not increase by 1 each time, but by 9 each time.

As for the auto-increment starting value, it is actually very easy to set. You can set it when creating the table.

  1. create   table test01(id integer   PRIMARY   KEY auto_increment,username varchar (255)) auto_increment=8;

Since MySQL can modify the starting value of auto-increment and the step size of each increase, now assuming that I have db1, db2 and db3, I can set the starting value of auto-increment for the tables in these three databases to 1, 2, and 3 respectively, and the step size of auto-increment is 3, so that auto-increment can be achieved.

But it is obvious that this method is not elegant enough, and it is troublesome to handle and inconvenient for future expansion, so it is not recommended.

2.2 MySQL+MyCat+ZooKeeper

If you happen to use MyCat as your database and table sharding tool, then combining it with Zookeeper can also achieve global auto-increment of the primary key.

MyCat, as a distributed database middleware, shields the operation of the database cluster, allowing us to operate the database cluster just like a stand-alone database. It has its own solution for primary key auto-increment:

  • Implemented via local files
  • Implemented through database
  • Implemented via local timestamp
  • Implemented through a distributed ZK ID generator
  • Implemented via ZK incremental approach

Here we mainly look at solution 4.

The configuration steps are as follows:

  • First, change the primary key auto-increment mode to 4, 4 means using zookeeper to implement primary key auto-increment.

server.xml

  • Configure the table to increment automatically and set the primary key

schema.xml

Set the primary key to auto-increment and set the primary key to id.

  • Configure Zookeeper information

Configure zookeeper information in myid.properties:

  • Configure the table to be incremented

sequence_conf.properties

Note that the table name should be capitalized.

  1. TABLE.MINID The minimum value in the current interval of a thread
  2. TABLE.MAXID The maximum value in the current interval of a thread
  3. TABLE.CURID Current value in the current interval of a thread
  4. The MAXID and MINID of the file configuration determine the interval each time it is obtained. This is valid for each thread or process.
  5. The three property configurations in the file are only valid for the first thread of the first process. Other threads and processes will dynamically read ZK
  • Restart MyCat Test

Finally, restart MyCat, delete the previously created table, and then create a new table for testing.

This method is more convenient and has strong scalability. If you choose MyCat as a database and table sharding tool, this is the best solution.

The two methods introduced above both handle primary key auto-increment at the database or database middleware level, and our Java code does not require additional work.

Next, let's look at several solutions that need to be handled in Java code.

3. Java code processing

3.1 UUID

The most obvious one is UUID (Universally Unique Identifier). The standard format of UUID contains 32 hexadecimal digits, divided into five segments by hyphens, and has 36 characters in the form of 8-4-4-4-12. This is built-in Java and is easy to use. The biggest advantage is that it is generated locally and does not consume network resources. However, any developer in a company knows that this thing is not used much in company projects. The reasons are as follows:

  1. The string is too long for MySQL to index.
  2. The randomness of UUID is very unfriendly to I/O-intensive applications! It will make the insertion of clustered indexes completely random, making the data have no clustering characteristics.
  3. Information insecurity: The algorithm for generating UUID based on MAC address may cause MAC address leakage. This vulnerability was used to find the location of the creator of the Melissa virus.

Therefore, UUID is not the best solution.

3.2 SNOWFLAKE

The Snowflake algorithm is a distributed primary key generation algorithm published by Twitter. It can ensure the non-repetitiveness of primary keys of different processes and the orderliness of primary keys of the same process. In the same process, it first ensures non-repetitiveness through the time bit, and if the time is the same, it ensures it through the sequence bit.

At the same time, since the time bit is monotonically increasing and if the servers are roughly synchronized in time, the generated primary key can be considered to be generally ordered in a distributed environment, which ensures the efficiency of inserting index fields.

For example, the primary key of MySQL's Innodb storage engine. The primary key generated by the snowflake algorithm has four parts in binary representation, from high to low: 1-bit sign bit, 41-bit timestamp bit, 10-bit work process bit, and 12-bit sequence number bit.

  • Sign bit (1 bit)

The reserved sign bit is always zero.

  • Timestamp bit (41 bits)

The number of milliseconds that a 41-bit timestamp can hold is 2 to the 41th power, and the number of milliseconds used in a year is: 365 * 24 * 60 * 60 * 1000. By calculation, we know: Math.pow(2, 41) / (365 * 24 * 60 * 60 * 1000L); the result is approximately equal to 69.73 years.

The time epoch of ShardingSphere's snowflake algorithm starts at 0:00 on November 1, 2016, and can be used until 2086. It is believed that it can meet the requirements of most systems.

  • Work process bit (10bit)

This flag is unique within a Java process. If it is a distributed application deployment, the id of each working process should be different. The default value is 0 and can be set through properties.

  • Serial number bit (12 bits)

This sequence is used to generate different IDs within the same millisecond. If the number generated within this millisecond exceeds 4096 (2 to the power of 12), the generator will wait until the next millisecond to continue generating.

Note: This algorithm has a clock dialback problem. Server clock dialback will cause duplicate sequences. Therefore, the default distributed primary key generator provides a maximum tolerated clock dialback milliseconds. If the clock dialback time exceeds the maximum tolerated milliseconds threshold, the program will report an error; if it is within the tolerable range, the default distributed primary key generator will wait for the clock to synchronize to the time of the last primary key generation before continuing to work. The default value of the maximum tolerated clock dialback milliseconds is 0, which can be set through properties.

Below Song Ge gives a tool class for the snowflake algorithm, you can refer to:

  1. public class IdWorker {
  2. // The time starting point is used as a benchmark. Generally, the most recent time of the system is used (once determined, it cannot be changed)
  3. private final static long twepoch = 1288834974657L;
  4. // Number of machine identification digits
  5. private final static long workerIdBits = 5L;
  6. // Number of data center identification bits
  7. private final static long datacenterIdBits = 5L;
  8. // Maximum value of machine ID
  9. private final static long maxWorkerId = -1L ^ (-1L << workerIdBits);
  10. // Maximum value of data center ID
  11. private final static long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
  12. // Increment bit within milliseconds
  13. private final static long sequenceBits = 12L;
  14. // Shift the machine ID 12 bits to the left
  15. private final static long workerIdShift = sequenceBits;
  16. // Data center ID shifted left by 17 bits
  17. private final static long datacenterIdShift = sequenceBits + workerIdBits;
  18. // Time milliseconds shifted left 22 bits
  19. private final static long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
  20.  
  21. private final static long sequenceMask = -1L ^ (-1L << sequenceBits);
  22. /* Last production id timestamp */
  23. private static long lastTimestamp = -1L;
  24. // 0, concurrency control
  25. private long sequence = 0L;
  26.  
  27. private final long workerId;
  28. //Data identification id part
  29. private final long datacenterId;
  30.  
  31. public IdWorker(){
  32. this.datacenterId = getDatacenterId(maxDatacenterId);
  33. this.workerId = getMaxWorkerId(datacenterId, maxWorkerId);
  34. }
  35.  
  36. /**
  37. * @param workerId
  38. * Work machine ID
  39. * @param datacenterId
  40. * Serial Number
  41. */
  42. public IdWorker(long workerId, long datacenterId) {
  43. if (workerId > maxWorkerId || workerId < 0) {
  44. throw new IllegalArgumentException(String.format( "worker Id can't be greater than %d or less than 0" , maxWorkerId));
  45. }
  46. if (datacenterId > maxDatacenterId || datacenterId < 0) {
  47. throw new IllegalArgumentException(String.format( "datacenter Id can't be greater than %d or less than 0" , maxDatacenterId));
  48. }
  49. this.workerId = workerId;
  50. this.datacenterId = datacenterId;
  51. }
  52.  
  53. /**
  54. * Get the next ID
  55. *
  56. * @return  
  57. */
  58. public synchronized long nextId() {
  59. long timestamp = timeGen();
  60. if ( timestamp < lastTimestamp) {
  61. throw new RuntimeException(String.format( "Clock moved backwards. Refusing to generate id for %d milliseconds" , lastTimestamp - timestamp ));
  62. }
  63.  
  64. if (lastTimestamp == timestamp ) {
  65. // Within the current millisecond, +1
  66. sequence = ( sequence + 1) & sequenceMask;
  67. if ( sequence == 0) {
  68. // If the current millisecond count is full, wait for the next second
  69. timestamp = tilNextMillis(lastTimestamp);
  70. }
  71. } else {
  72. sequence = 0L;
  73. }
  74. lastTimestamp = timestamp ;
  75. // ID offset combination generates the final ID and returns the ID
  76. long nextId = (( timestamp - twepoch) << timestampLeftShift)
  77. | (datacenterId << datacenterIdShift)
  78. | (workerId << workerIdShift) | sequence ;
  79.  
  80. return nextId;
  81. }
  82.  
  83. private long tilNextMillis(final long lastTimestamp) {
  84. long timestamp = this.timeGen();
  85. while ( timestamp <= lastTimestamp) {
  86. timestamp = this.timeGen();
  87. }
  88. return   timestamp ;
  89. }
  90.  
  91. private long timeGen() {
  92. return System.currentTimeMillis();
  93. }
  94.  
  95. /**
  96. * <p>
  97. * Get maxWorkerId
  98. * </p>
  99. */
  100. protected static long getMaxWorkerId(long datacenterId, long maxWorkerId) {
  101. StringBuffer mpid = new StringBuffer();
  102. mpid.append(datacenterId);
  103. String name = ManagementFactory.getRuntimeMXBean().getName();
  104. if (! name .isEmpty()) {
  105. /*
  106. * GET jvmPid
  107. */
  108. mpid.append( name .split( "@" )[0]);
  109. }
  110. /*
  111. * Get the 16 low bits of the hashcode of MAC + PID
  112. */
  113. return (mpid.toString().hashCode() & 0xffff) % (maxWorkerId + 1);
  114. }
  115.  
  116. /**
  117. * <p>
  118. * Data identification id part
  119. * </p>
  120. */
  121. protected static long getDatacenterId(long maxDatacenterId) {
  122. long id = 0L;
  123. try {
  124. InetAddress ip = InetAddress.getLocalHost();
  125. NetworkInterface network = NetworkInterface.getByInetAddress(ip);
  126. if (network == null ) {
  127. id = 1L;
  128. } else {
  129. byte[] mac = network.getHardwareAddress();
  130. id = ((0x000000FF & (long) mac[mac.length - 1])
  131. | (0x0000FF00 & (((long) mac[mac.length - 2]) << 8))) >> 6;
  132. id = id % (maxDatacenterId + 1);
  133. }
  134. } catch (Exception e) {
  135. System. out .println( " getDatacenterId: " + e.getMessage());
  136. }
  137. return id;
  138. }
  139. }

Usage is as follows:

  1. IdWorker idWorker = new IdWorker(0, 0);
  2. for ( int i = 0; i < 1000; i++) {
  3. System. out .println(idWorker.nextId());
  4. }

3.3 LEAF

Leaf is Meituan's open source distributed ID generation system. The earliest demand was the order ID generation demand of each business line. In the early days of Meituan, some businesses directly generated IDs through DB auto-increment, some businesses generated IDs through Redis cache, and some businesses directly used UUID to generate IDs. The above methods each have their own problems, so Meituan decided to implement a set of distributed ID generation services to meet the needs. Currently, Leaf covers many business lines such as Meituan Dianping's internal finance, catering, takeaway, hotel tourism, Maoyan Movie, etc. Based on 4C8G VM, through the company's RPC method call, the QPS stress test result is nearly 5w/s, TP999 1ms (TP=Top Percentile, Top percentage is a term in statistics, which is the same as the average and median. Indicators such as TP50, TP90 and TP99 are often used in system performance monitoring scenarios, referring to situations above 50%, 90%, 99% and other percentiles).

Currently, there are two different ways to use LEAF, the segment mode and the SNOWFLAKE mode. You can enable both modes at the same time, or specify a certain mode to enable (both modes are disabled by default).

After we clone LEAF from GitHub, its configuration file is in leaf-server/src/main/resources/leaf.properties. The meaning of each configuration is as follows:

As you can see, if the number segment mode is used, database support is required; if the SNOWFLAKE mode is used, Zookeeper support is required.

3.3.1 Number segment mode

The number segment mode is still based on the database, but the idea has changed a bit, as follows:

  • Use the proxy server to obtain IDs from the database in batches, obtain the value of a segment (the step determines its size) each time, and then go to the database to obtain a new segment after use, which can greatly reduce the pressure on the database.
  • The different number issuance requirements of each business are distinguished by the biz_tag field. The ID of each biz-tag is obtained in isolation and does not affect each other.
  • If a new business requires an expanded zone ID, you only need to add a table record.

If we use the number segment mode, we first need to create a data table. The script is as follows:

  1. CREATE   DATABASE leaf
  2. CREATE   TABLE `leaf_alloc` (
  3. `biz_tag` varchar (128) NOT   NULL   DEFAULT   '' ,
  4. `max_id` bigint (20) NOT   NULL   DEFAULT   '1' ,
  5. `step` int (11) NOT   NULL ,
  6. `description` varchar (256) DEFAULT   NULL ,
  7. `update_time` timestamp   NOT   NULL   DEFAULT   CURRENT_TIMESTAMP   ON   UPDATE   CURRENT_TIMESTAMP ,
  8. PRIMARY   KEY (`biz_tag`)
  9. )ENGINE=InnoDB;
  10.  
  11. insert   into leaf_alloc(biz_tag, max_id, step, description) values ​​( 'leaf-segment-test' , 1, 2000, 'Test leaf Segment Mode Get Id' )

The meanings of the fields in this table are as follows:

  • biz_tag: business tag (different businesses can have different number segment sequences)
  • max_id: the maximum id in the current number segment
  • step: the step length of each number segment
  • description: description information
  • update_time: update time

After the configuration is complete, start the project and access the http://localhost:8080/api/segment/get/leaf-segment-test path (the leaf-segment-test at the end of the path is a business tag) to get the ID.

You can access the monitoring page of the segment mode through the following address: http://localhost:8080/cache.

Advantages and disadvantages of number segment mode:

advantage

  • Leaf services can be easily expanded linearly, and their performance is fully capable of supporting most business scenarios.
  • The ID number is an 8-byte 64-bit number that increases in trend, meeting the primary key requirements of the above database storage.
  • High disaster tolerance: Leaf service has internal segment cache. Even if the DB goes down, Leaf can still provide normal services to the outside world in a short period of time.
  • The max_id size can be customized, which is very convenient for migrating businesses from the original ID method.

shortcoming

  • The ID number is not random enough and can leak information about the number of numbers issued, which is not very secure.
  • A DB failure will cause the entire system to be unavailable.

3.3.2 SNOWFLAKE mode

SNOWFLAKE mode needs to be used with Zookeeper, but SNOWFLAKE's dependency on Zookeeper is weak. After starting Zookeeper, we can configure Zookeeper information in SNOWFLAKE as follows:

  1. leaf.snowflake.enable = true  
  2. leaf.snowflake.zk.address=192.168.91.130
  3. leaf.snowflake.port=2183

Then restart the project. After successful startup, the ID can be accessed through the following address:

  1. http://localhost:8080/api/snowflake/get/test

3.4 Redis Generation

This is mainly achieved by using Redis's incrby, which I don't think there is much to say.

3.5 Zookeeper Processing

Zookeeper can also do this, but it is more troublesome and not recommended.

4. Summary

In summary, if MyCat happens to be used in the project, you can use MyCat+Zookeeper, otherwise it is recommended to use LEAF, both modes are acceptable.

This article is reprinted from the WeChat public account "江南一点雨", which can be followed through the following QR code. To reprint this article, please contact the Jiangnan一点雨 public account.

<<:  A table to understand the difference between 5G and Wi-Fi 6

>>:  How to promote 5G packages in small and medium-sized cities

Recommend

Cloud empowers new life and Wind River IoT genes are upgraded again

There is a wind power plant abroad that mainly us...

Maxthon Hosting: 56 yuan/month Hong Kong CN2-2GB/40G SSD/300GB/15M Bandwidth

Aoyo Zhuji is one of the foreign hosting services...

5G Massive MIMO Says Goodbye to Power-hungry 5G Base Stations

The attacks on the large-scale construction of 5G...

Let’s talk about what is 5G CPE?

[[350048]] This article is reprinted from the WeC...

Understand 3GPP 5G versions and the features of each version

While for years cellular technology has been prim...

Transition technology from IPv4 to IPv6

As IPv4 addresses are about to be exhausted, the ...

An article on learning Go network library Gnet analysis

Introduction We analyzed the Go native network mo...

Why the coronavirus pandemic makes 5G more important than ever

While 2020 has brought unprecedented challenges, ...

How do these countries plan their 5G breakthrough amid the COVID-19 crisis?

5G is a new technology field that all countries a...