Seven distributed global ID generation strategies, which one do you prefer?


After moving to microservices, many problems that used to be simple have become complicated, and global ID generation is one of them!

I needed this at work recently, so I investigated several common global ID generation strategies and put together a brief comparison for your reference.

Once a database is sharded into multiple databases and tables, the original auto-increment primary key is no longer convenient to use, and a suitable replacement has to be found. That is the situation in which Song Ge's requirement came up.

Next, let’s take a look at it together.

1. Two approaches

Generally speaking, there are two different approaches to this problem:

Let the database handle it by itself
Generate the primary key in Java code, then insert it into the database directly

These two ideas correspond to different solutions. Let’s look at them one by one.

2. Let the database handle it

Letting the database handle it means that when I insert data I still do not think about the primary key and keep relying on the database's auto-increment feature. Obviously, though, the default auto-increment behavior can no longer be used as-is, so we need a new solution.

2.1 Modify database configuration

The structure of the database after splitting is as follows (assuming that MyCat is used as the database middleware):

At this point, if db1, db2, and db3 each keep auto-incrementing their own primary keys, then from MyCat's point of view the primary key is no longer a single increasing sequence: values are repeated across the databases, and the primary keys of the data that users query through MyCat will be wrong.

Once the cause of the problem is clear, the rest is easy to solve.

We can directly modify the starting value and step size of the MySQL database primary key auto-increment.

First, we can view the values of the two related variables through the following SQL:

 SHOW VARIABLES LIKE 'auto_increment%';

It can be seen that the starting value and step size of the primary key auto-increment are both 1.

The starting value is easy to change: it can be set when the table is defined. The step size is changed through this variable:

 set @@auto_increment_increment=9;

After the modification, check the corresponding variable value and find that it has changed:

Now when we insert data again, the primary key will not increase by 1 each time, but by 9 each time.

As for the auto-increment starting value, it can be set when creating the table:

 create table test01(id integer PRIMARY KEY auto_increment, username varchar(255)) auto_increment=8;

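Since MySQL lets us change both the starting value and the step size of auto-increment, suppose I have db1, db2, and db3. I can set the auto-increment starting value for the tables in these three databases to 1, 2, and 3 respectively and set the step size to 3. db1 then generates 1, 4, 7, ..., db2 generates 2, 5, 8, ..., and db3 generates 3, 6, 9, ..., so the primary keys never collide. (Note that in MySQL the per-server starting point of the series is governed by the auto_increment_offset variable, the second variable listed by the SHOW VARIABLES statement above, so it should be set to match.)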
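To make the interleaving concrete, here is a minimal Java sketch that only simulates the arithmetic of a start value plus a shared step. It does not talk to MySQL, and the class name ShardedAutoIncrement and the method nextIds are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: each shard has its own start value, all shards share one step.
class ShardedAutoIncrement {

    // Returns the first `count` IDs a shard would produce for the given start and step.
    static List<Long> nextIds(long start, long step, int count) {
        List<Long> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(start + (long) i * step);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("db1: " + nextIds(1, 3, 4)); // db1: [1, 4, 7, 10]
        System.out.println("db2: " + nextIds(2, 3, 4)); // db2: [2, 5, 8, 11]
        System.out.println("db3: " + nextIds(3, 3, 4)); // db3: [3, 6, 9, 12]
    }
}
```

The sketch also shows the expansion problem: adding a fourth database means changing the step on every existing shard.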

But it is obvious that this method is not elegant enough, and it is troublesome to handle and inconvenient for future expansion, so it is not recommended.

2.2 MySQL+MyCat+ZooKeeper

If you happen to use MyCat as your database and table sharding tool, then combining it with Zookeeper can also achieve global auto-increment of the primary key.

MyCat, as a distributed database middleware, shields the operation of the database cluster, allowing us to operate the database cluster just like a stand-alone database. It has its own solution for primary key auto-increment:

0: Implemented via local files
1: Implemented through the database
2: Implemented via local timestamp
3: Implemented through a distributed ZK ID generator
4: Implemented via the ZK incremental approach

Here we mainly look at mode 4.

The configuration steps are as follows:

First, change the primary key auto-increment mode to 4, which means using Zookeeper to implement primary key auto-increment.

server.xml

Configure the table to increment automatically and set the primary key

schema.xml

Set the primary key to auto-increment and set the primary key to id.

Configure Zookeeper information

Configure zookeeper information in myid.properties:

Configure the table to be incremented

sequence_conf.properties

Note that the table name should be capitalized.

TABLE.MINID: the minimum value of the current interval for a thread
TABLE.MAXID: the maximum value of the current interval for a thread
TABLE.CURID: the current value within the current interval for a thread
The MAXID and MINID configured in the file determine the size of the interval fetched each time, and this applies to every thread or process.
The three properties in the file only take effect for the first thread of the first process; other threads and processes read the values dynamically from ZK.

Restart MyCat and test

Finally, restart MyCat, delete the previously created table, and then create a new table for testing.

This method is more convenient and has strong scalability. If you choose MyCat as a database and table sharding tool, this is the best solution.

The two methods introduced above both handle primary key auto-increment at the database or database middleware level, and our Java code does not require additional work.

Next, let's look at several solutions that need to be handled in Java code.

3. Java code processing

3.1 UUID

The most obvious option is UUID (Universally Unique Identifier). The standard UUID format contains 32 hexadecimal digits split into five groups by hyphens, 36 characters in total, in the form 8-4-4-4-12. It is built into Java and easy to use. Its biggest advantage is that it is generated locally and consumes no network resources. However, anyone who has worked on company projects knows that it is rarely used as a primary key there. The reasons are as follows:

The string is too long, which is unfriendly to MySQL indexes.
The randomness of UUID is very unfriendly to I/O-intensive applications! Inserts into the clustered index become completely random, so the data loses its clustering characteristics.
Information insecurity: the algorithm that generates a UUID from the MAC address can leak the MAC address. This weakness was used to track down the creator of the Melissa virus.

Therefore, UUID is not the best solution.
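For reference, a minimal sketch of generating a UUID with the JDK. UUID.randomUUID() produces a random (version 4) UUID rather than a MAC-based one; the class name UuidDemo and its helper methods are hypothetical.

```java
import java.util.UUID;

// Hypothetical demo class: generates a random UUID and a hyphen-free variant.
class UuidDemo {

    // 36 characters in the form 8-4-4-4-12.
    static String newId() {
        return UUID.randomUUID().toString();
    }

    // 32 hexadecimal digits with the hyphens removed.
    static String compact(String uuid) {
        return uuid.replace("-", "");
    }

    public static void main(String[] args) {
        String id = newId();
        System.out.println(id + " (" + id.length() + " characters)");
        System.out.println(compact(id) + " (" + compact(id).length() + " characters)");
    }
}
```

Even the compact form needs 32 characters, which is four times the 8 bytes of a BIGINT primary key.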

3.2 SNOWFLAKE

The Snowflake algorithm is a distributed primary key generation algorithm published by Twitter. It can ensure the non-repetitiveness of primary keys of different processes and the orderliness of primary keys of the same process. In the same process, it first ensures non-repetitiveness through the time bit, and if the time is the same, it ensures it through the sequence bit.

At the same time, because the time bits increase monotonically, the generated primary keys can be considered roughly ordered across a distributed environment as long as the servers' clocks are approximately synchronized. This keeps inserts into indexed fields efficient, for example the primary key of MySQL's InnoDB storage engine.

In binary form, a primary key generated by the snowflake algorithm has four parts, from high to low: a 1-bit sign bit, a 41-bit timestamp, a 10-bit work process ID, and a 12-bit sequence number.

Sign bit (1 bit)

The reserved sign bit is always zero.

Timestamp bit (41 bits)

A 41-bit timestamp can hold 2 to the 41st power milliseconds, and one year contains 365 * 24 * 60 * 60 * 1000 milliseconds. Computing Math.pow(2, 41) / (365 * 24 * 60 * 60 * 1000L) gives approximately 69.73 years.
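The calculation can be checked with a few lines of Java. This is only a worked version of the arithmetic above; the class name TimestampCapacity and the method yearsCovered are hypothetical.

```java
// Hypothetical helper: how many 365-day years fit into a millisecond timestamp of the given width.
class TimestampCapacity {

    static double yearsCovered(int timestampBits) {
        double millisPerYear = 365L * 24 * 60 * 60 * 1000; // 31,536,000,000 ms
        return Math.pow(2, timestampBits) / millisPerYear;
    }

    public static void main(String[] args) {
        System.out.println("41 bits cover about " + yearsCovered(41) + " years");
    }
}
```

Every extra timestamp bit doubles the usable lifetime, which is why the epoch is usually set close to the system's launch date instead of 1970.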

The time epoch of ShardingSphere's snowflake algorithm starts at 0:00 on November 1, 2016, and can be used until 2086. It is believed that it can meet the requirements of most systems.

Work process bits (10 bits)

This flag is unique within a Java process. If it is a distributed application deployment, the id of each working process should be different. The default value is 0 and can be set through properties.

Sequence number bits (12 bits)

This sequence is used to generate different IDs within the same millisecond. If the number generated within this millisecond exceeds 4096 (2 to the power of 12), the generator will wait until the next millisecond to continue generating.
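The four parts described above can be packed into and unpacked from a single long with shifts and masks. The following is a minimal sketch of the 1/41/10/12 layout only, not a full generator; the class name SnowflakeLayout and its methods are hypothetical, and the timestamp argument is assumed to be the number of milliseconds since the chosen epoch.

```java
// Hypothetical helper for the 1 + 41 + 10 + 12 bit layout described above.
class SnowflakeLayout {
    static final int TIMESTAMP_BITS = 41;
    static final int WORKER_BITS = 10;
    static final int SEQUENCE_BITS = 12;

    // Packs the three fields; the sign bit stays 0 as long as the timestamp fits in 41 bits.
    static long compose(long millisSinceEpoch, long workerId, long sequence) {
        return (millisSinceEpoch << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }

    static long timestampOf(long id) {
        return (id >>> (WORKER_BITS + SEQUENCE_BITS)) & ((1L << TIMESTAMP_BITS) - 1);
    }

    static long workerOf(long id) {
        return (id >>> SEQUENCE_BITS) & ((1L << WORKER_BITS) - 1);
    }

    static long sequenceOf(long id) {
        return id & ((1L << SEQUENCE_BITS) - 1);
    }

    public static void main(String[] args) {
        long id = compose(1, 1, 1);
        System.out.println(id); // 4198401 = (1 << 22) | (1 << 12) | 1
        System.out.println(timestampOf(id) + " " + workerOf(id) + " " + sequenceOf(id)); // 1 1 1
    }
}
```

Because the timestamp occupies the highest bits, comparing two IDs numerically compares their generation times first, which is where the rough ordering comes from.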

Note: this algorithm is vulnerable to clock rollback. If the server clock is moved backwards, duplicate IDs can be generated. For this reason, the default distributed primary key generator provides a maximum tolerated clock rollback, in milliseconds. If the rollback exceeds this threshold, the program reports an error; if it is within the tolerated range, the generator waits until the clock catches up with the time of the last generated primary key before continuing. The default value of the maximum tolerated clock rollback is 0 milliseconds, and it can be set through properties.

Below, Song Ge provides a utility class for the snowflake algorithm for your reference:
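The tolerance rule in the note can be sketched as follows. This is a simplified illustration under stated assumptions, not ShardingSphere's actual code: the class name ClockRollbackGuard is hypothetical, the clock is injected as a LongSupplier so that it can be faked, and the wait is a simple busy loop.

```java
import java.util.function.LongSupplier;

// Hypothetical guard: tolerate small clock rollbacks by waiting, reject large ones.
class ClockRollbackGuard {
    private final LongSupplier clock;      // e.g. System::currentTimeMillis
    private final long maxToleratedMillis; // 0 means no rollback is tolerated

    ClockRollbackGuard(LongSupplier clock, long maxToleratedMillis) {
        this.clock = clock;
        this.maxToleratedMillis = maxToleratedMillis;
    }

    // Returns a timestamp that is never earlier than lastTimestamp.
    long currentMillis(long lastTimestamp) {
        long now = clock.getAsLong();
        if (now >= lastTimestamp) {
            return now;
        }
        long rollback = lastTimestamp - now;
        if (rollback > maxToleratedMillis) {
            throw new IllegalStateException("Clock moved backwards by " + rollback
                    + " ms, tolerated maximum is " + maxToleratedMillis + " ms");
        }
        // Small rollback: keep re-reading the clock until it catches up.
        while (now < lastTimestamp) {
            now = clock.getAsLong();
        }
        return now;
    }

    public static void main(String[] args) {
        ClockRollbackGuard guard = new ClockRollbackGuard(System::currentTimeMillis, 10);
        long last = System.currentTimeMillis() + 5; // pretend the last ID is 5 ms in the future
        System.out.println(guard.currentMillis(last) >= last); // true
    }
}
```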

 import java.lang.management.ManagementFactory;
 import java.net.InetAddress;
 import java.net.NetworkInterface;

 public class IdWorker {
     // Epoch used as the baseline for timestamps (once chosen, it must never change)
     private final static long twepoch = 1288834974657L;
     // Number of bits for the worker (machine) ID
     private final static long workerIdBits = 5L;
     // Number of bits for the data center ID
     private final static long datacenterIdBits = 5L;
     // Maximum worker ID (31)
     private final static long maxWorkerId = -1L ^ (-1L << workerIdBits);
     // Maximum data center ID (31)
     private final static long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);
     // Number of bits for the sequence within one millisecond
     private final static long sequenceBits = 12L;
     // The worker ID is shifted left by 12 bits
     private final static long workerIdShift = sequenceBits;
     // The data center ID is shifted left by 17 bits
     private final static long datacenterIdShift = sequenceBits + workerIdBits;
     // The timestamp is shifted left by 22 bits
     private final static long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;
     // Mask that keeps the sequence within 12 bits (4095)
     private final static long sequenceMask = -1L ^ (-1L << sequenceBits);

     // Timestamp of the last generated ID; an instance field (not static) so that
     // it stays consistent with the per-instance sequence, guarded by nextId()'s lock
     private long lastTimestamp = -1L;
     // Sequence within the current millisecond
     private long sequence = 0L;

     // Worker (machine) ID part
     private final long workerId;
     // Data center ID part
     private final long datacenterId;

     public IdWorker() {
         this.datacenterId = getDatacenterId(maxDatacenterId);
         this.workerId = getMaxWorkerId(datacenterId, maxWorkerId);
     }

     /**
      * @param workerId     worker (machine) ID
      * @param datacenterId data center ID
      */
     public IdWorker(long workerId, long datacenterId) {
         if (workerId > maxWorkerId || workerId < 0) {
             throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
         }
         if (datacenterId > maxDatacenterId || datacenterId < 0) {
             throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
         }
         this.workerId = workerId;
         this.datacenterId = datacenterId;
     }

     /**
      * Get the next ID
      *
      * @return the next unique ID
      */
     public synchronized long nextId() {
         long timestamp = timeGen();
         if (timestamp < lastTimestamp) {
             throw new RuntimeException(String.format("Clock moved backwards. Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
         }

         if (lastTimestamp == timestamp) {
             // Same millisecond: increment the sequence
             sequence = (sequence + 1) & sequenceMask;
             if (sequence == 0) {
                 // The sequence for this millisecond is exhausted, wait for the next millisecond
                 timestamp = tilNextMillis(lastTimestamp);
             }
         } else {
             sequence = 0L;
         }
         lastTimestamp = timestamp;
         // Shift each part into place and combine them into the final ID
         long nextId = ((timestamp - twepoch) << timestampLeftShift)
                 | (datacenterId << datacenterIdShift)
                 | (workerId << workerIdShift) | sequence;

         return nextId;
     }

     // Busy-wait until the clock moves past lastTimestamp
     private long tilNextMillis(final long lastTimestamp) {
         long timestamp = this.timeGen();
         while (timestamp <= lastTimestamp) {
             timestamp = this.timeGen();
         }
         return timestamp;
     }

     private long timeGen() {
         return System.currentTimeMillis();
     }

     /**
      * Derive a worker ID from the data center ID and the JVM process ID
      */
     protected static long getMaxWorkerId(long datacenterId, long maxWorkerId) {
         StringBuilder mpid = new StringBuilder();
         mpid.append(datacenterId);
         String name = ManagementFactory.getRuntimeMXBean().getName();
         if (!name.isEmpty()) {
             // The runtime name usually looks like "pid@hostname"; take the part before "@"
             mpid.append(name.split("@")[0]);
         }
         // Take the low 16 bits of the hash code of datacenterId + PID
         return (mpid.toString().hashCode() & 0xffff) % (maxWorkerId + 1);
     }

     /**
      * Derive a data center ID from the MAC address of the local network interface
      */
     protected static long getDatacenterId(long maxDatacenterId) {
         long id = 0L;
         try {
             InetAddress ip = InetAddress.getLocalHost();
             NetworkInterface network = NetworkInterface.getByInetAddress(ip);
             if (network == null) {
                 id = 1L;
             } else {
                 byte[] mac = network.getHardwareAddress();
                 // getHardwareAddress() may return null (for example on a loopback interface)
                 if (mac != null) {
                     id = ((0x000000FF & (long) mac[mac.length - 1])
                             | (0x0000FF00 & (((long) mac[mac.length - 2]) << 8))) >> 6;
                     id = id % (maxDatacenterId + 1);
                 }
             }
         } catch (Exception e) {
             System.out.println(" getDatacenterId: " + e.getMessage());
         }
         return id;
     }
 }

Usage is as follows:

 IdWorker idWorker = new IdWorker(0, 0);
 for (int i = 0; i < 1000; i++) {
     System.out.println(idWorker.nextId());
 }

3.3 LEAF

Leaf is Meituan's open source distributed ID generation system. It grew out of the need to generate order IDs across Meituan's business lines. In the early days, some businesses generated IDs directly through DB auto-increment, some through a Redis cache, and some simply used UUID. Each of these methods has its own problems, so Meituan decided to build a dedicated distributed ID generation service. Leaf now covers many business lines within Meituan Dianping, such as finance, catering, takeaway, hotel and travel, and Maoyan Movie. On a 4C8G VM, called through the company's RPC framework, stress tests show a QPS of nearly 50,000 with a TP999 of 1 ms.

(TP stands for Top Percentile, a statistical term in the same family as the mean and the median. Indicators such as TP50, TP90, and TP99 are common in system performance monitoring and refer to the 50th, 90th, and 99th percentiles, and so on.)

Currently, LEAF can be used in two different ways: the number segment mode and the SNOWFLAKE mode. You can enable both modes at the same time or only one of them (both are disabled by default).

After we clone LEAF from GitHub, its configuration file is in leaf-server/src/main/resources/leaf.properties. The meaning of each configuration is as follows:

As you can see, if the number segment mode is used, database support is required; if the SNOWFLAKE mode is used, Zookeeper support is required.

3.3.1 Number segment mode

The number segment mode is still based on the database, but the idea has changed a bit, as follows:

A proxy server fetches IDs from the database in batches. Each fetch obtains one segment (its size is determined by step), and a new segment is fetched from the database only after the current one is used up, which greatly reduces the pressure on the database.
The ID requirements of different businesses are distinguished by the biz_tag field. The IDs of each biz_tag are obtained in isolation and do not affect each other.
If a new business needs its own ID sequence, you only need to add a table record.
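The batching idea can be sketched in a few lines of Java. This is a simplified illustration, not Leaf's actual implementation: the class name SegmentIdGenerator is hypothetical, and an AtomicLong stands in for the max_id column of one biz_tag row, which a real service would advance with a database update.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the number segment idea for a single biz_tag.
class SegmentIdGenerator {
    private final AtomicLong dbMaxId; // stand-in for the max_id column in the database
    private final int step;           // size of each segment
    private long current = 0;         // next ID to hand out
    private long segmentEnd = 0;      // exclusive upper bound of the cached segment
    private int dbRoundTrips = 0;     // how often the "database" was touched

    SegmentIdGenerator(long initialMaxId, int step) {
        this.dbMaxId = new AtomicLong(initialMaxId);
        this.step = step;
    }

    synchronized long nextId() {
        if (current >= segmentEnd) {
            // Segment used up: advance max_id by step and cache [max_id - step, max_id)
            long newMax = dbMaxId.addAndGet(step);
            current = newMax - step;
            segmentEnd = newMax;
            dbRoundTrips++;
        }
        return current++;
    }

    synchronized int dbRoundTrips() {
        return dbRoundTrips;
    }

    public static void main(String[] args) {
        SegmentIdGenerator generator = new SegmentIdGenerator(1, 2000);
        for (int i = 0; i < 5000; i++) {
            generator.nextId();
        }
        System.out.println(generator.dbRoundTrips()); // 3 round trips for 5000 IDs
    }
}
```

With a step of 2000, the database is touched once per 2000 IDs instead of once per ID, and IDs already cached in memory can still be served while the database is briefly unavailable.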

If we use the number segment mode, we first need to create a data table. The script is as follows:

 CREATE DATABASE leaf;
 USE leaf;
 CREATE TABLE `leaf_alloc` (
   `biz_tag` varchar(128) NOT NULL DEFAULT '',
   `max_id` bigint(20) NOT NULL DEFAULT '1',
   `step` int(11) NOT NULL,
   `description` varchar(256) DEFAULT NULL,
   `update_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
   PRIMARY KEY (`biz_tag`)
 ) ENGINE=InnoDB;

 INSERT INTO leaf_alloc(biz_tag, max_id, step, description) VALUES ('leaf-segment-test', 1, 2000, 'Test leaf Segment Mode Get Id');

The meanings of the fields in this table are as follows:

biz_tag: business tag (different businesses can have different number segment sequences)
max_id: the maximum id in the current number segment
step: the step length of each number segment
description: description information
update_time: update time

After the configuration is complete, start the project and access the http://localhost:8080/api/segment/get/leaf-segment-test path (the leaf-segment-test at the end of the path is a business tag) to get the ID.

You can access the monitoring page of the number segment mode at the following address: http://localhost:8080/cache.

Advantages and disadvantages of the number segment mode:

Advantages

Leaf services can easily be scaled out linearly, and their performance is sufficient for most business scenarios.
The ID is an 8-byte, 64-bit number that trends upward, which meets the requirements for a database primary key described above.
High disaster tolerance: the Leaf service caches segments internally, so even if the DB goes down, Leaf can still serve requests normally for a short period of time.
The initial max_id can be customized, which makes it very convenient to migrate businesses from their original ID scheme.

Disadvantages

The IDs are not random enough and can leak information about how many IDs have been issued, which is not very secure.
Once the cached segments are used up, a DB outage makes the entire system unavailable.

3.3.2 SNOWFLAKE mode

SNOWFLAKE mode needs to be used with Zookeeper, but SNOWFLAKE's dependency on Zookeeper is weak. After starting Zookeeper, we can configure Zookeeper information in SNOWFLAKE as follows:

 leaf.snowflake.enable=true
 leaf.snowflake.zk.address=192.168.91.130
 leaf.snowflake.port=2183

Then restart the project. After it starts successfully, an ID can be obtained from the following address:

 http://localhost:8080/api/snowflake/get/test

3.4 Redis Generation

This is mainly done with Redis's INCRBY command, and I don't think there is much more to say about it.

3.5 Zookeeper Processing

Zookeeper can also do this, but it is more troublesome and not recommended.

4. Summary

In summary, if your project already uses MyCat, MyCat+Zookeeper is a good fit; otherwise LEAF is recommended, and either of its two modes is acceptable.

This article is reprinted from the WeChat public account "江南一点雨". To reprint this article, please contact the 江南一点雨 public account.
