Distributed ID Solution Detailed Explanation

Distributed ID Solution Detailed Explanation

In a distributed system, it is very important to generate a globally unique ID, because in a distributed system, multiple nodes generating IDs at the same time may cause ID conflicts. The following are several commonly used distributed ID solutions.

UUID

UUID (Universally Unique Identifier) ​​is an identifier consisting of 128 bits, which can guarantee uniqueness worldwide because its generation algorithm is based on factors such as timestamp, node ID, etc. UUID can be generated using Java's own UUID class, as shown below:

 javaCopy code
import java .util .UUID ;


public class UuidGenerator {
public static void main ( String [ ] args ) {
UUID uuid = UUID .randomUUID ( ) ;
System .out .println ( uuid .toString ( ) ) ;
}
}

The advantage of UUID is that it is simple and easy to use, does not require additional configuration and management, and can be directly generated using the UUID class that comes with Java. However, UUID is long (128 bits), which is not suitable as the primary key of a database table, and is not easy to sort and index.

Snowflake

Snowflake is an open-source distributed ID generation algorithm from Twitter. It can generate a 64-bit unique ID that contains information such as timestamp, data center ID, and machine ID. The Java code for the Snowflake algorithm is as follows:

Java code for Snowflake algorithm:

 javaCopy code
public class SnowflakeGenerator {
private final static long START_STMP = 1480166465631 L ;
private final static long SEQUENCE_BIT = 12 ;
private final static long MACHINE_BIT = 5 ;
private final static long DATACENTER_BIT = 5 ;
private final static long MAX_DATACENTER_NUM = - 1 L ^ ( - 1 L << DATACENTER_BIT ) ;
private final static long MAX_MACHINE_NUM = - 1 L ^ ( - 1 L << MACHINE_BIT ) ;
private final static long MAX_SEQUENCE = - 1 L ^ ( - 1 L << SEQUENCE_BIT ) ;
private final static long MACHINE_LEFT = SEQUENCE_BIT ;
private final static long DATACENTER_LEFT = SEQUENCE_BIT + MACHINE_BIT ;
private final static long TIMESTMP_LEFT = DATACENTER_LEFT + DATACENTER_BIT ;
private long datacenterId ;
private long machineId ;
private long sequence = 0 L ;
private long lastStmp = - 1 L ;
public SnowflakeGenerator ( long datacenterId , long machineId ) {
if ( datacenterId > MAX_DATACENTER_NUM || datacenterId < 0 ) {
throw new IllegalArgumentException ( "datacenterId can't be greater than MAX_DATACENTER_NUM or less than 0" ) ;
}
if ( machineId > MAX_MACHINE_NUM || machineId < 0 ) {
throw new IllegalArgumentException ( "machineId can't be greater than MAX_MACHINE_NUM or less than 0" ) ;
}
this .datacenterId = datacenterId ;
this .machineId = machineId ;
}
public synchronized long nextId ( ) {
long currStmp = getNewstmp ( ) ;
if ( currStmp < lastStmp ) {
throw new RuntimeException ( "Clock moved backwards. Refusing to generate id" ) ;
}
if ( currStmp == lastStmp ) {
sequence = ( sequence + 1 ) & MAX_SEQUENCE ;
if ( sequence == 0 L ) {
currStmp = getNextMill ( ) ;
}
} else {
sequence = 0 L ;
}
lastStmp = currStmp ;
return ( currStmp - START_STMP ) << TIMESTMP_LEFT
| datacenterId << DATACENTER_LEFT
| machineId << MACHINE_LEFT
| sequence ;
}
private long getNextMill ( ) {
long mill = getNewstmp ( ) ;
while ( mill <= lastStmp ) {
mill = getNewstmp ( ) ;
}
return mill ;
}
private long getNewstmp ( ) {
return System .currentTimeMillis ( ) ;
}
}

The advantages of the Snowflake algorithm are high performance in generating IDs and a short ID length (64 bits), which can be used as the primary key of a database table and is easy to sort and index. However, it should be noted that if the number of nodes in the cluster exceeds the number of bits occupied by the machine ID, or the cluster is large and the number of timestamp bits is insufficient, then other distributed ID generation algorithms need to be considered.

Leaf

Leaf is a distributed ID generation algorithm open sourced by Meituan Dianping, which can generate a globally unique 64-bit ID. The Java code of the Leaf algorithm is as follows:

Java code for Leaf algorithm:

 javaCopy code
public class LeafGenerator {
private static final Logger logger = LoggerFactory .getLogger ( LeafGenerator .class ) ;
private static final String WORKER_ID_KEY = "leaf.worker.id" ;
private static final String PORT_KEY = "leaf.port" ;
private static final int DEFAULT_PORT = 8080 ;
private static final int DEFAULT_WORKER_ID = 0 ;
private static final int WORKER_ID_BITS = 10 ;
private static final int SEQUENCE_BITS = 12 ;
private static final int MAX_WORKER_ID = ( 1 << WORKER_ID_BITS ) - 1 ;
private static final int MAX_SEQUENCE = ( 1 << SEQUENCE_BITS ) - 1 ;
private static final long EPOCH = 1514736000000 L ;
private final SnowflakeIdWorker idWorker ;
public LeafGenerator ( ) {
int workerId = SystemPropertyUtil .getInt ( WORKER_ID_KEY , DEFAULT_WORKER_ID ) ;
int port = SystemPropertyUtil .getInt ( PORT_KEY , DEFAULT_PORT ) ;
this .idWorker = new SnowflakeIdWorker ( workerId , port ) ;
logger .info ( "Initialized LeafGenerator with workerId={}, port={}" , workerId , port ) ;
}
public long nextId ( ) {
return idWorker .nextId ( ) ;
}
private static class SnowflakeIdWorker {
private final long workerId ;
private final long port ;
private long sequence = 0 L ;
private long lastTimestamp = - 1 L ;
SnowflakeIdWorker ( long workerId , long port ) {
if ( workerId < 0 || workerId > MAX_WORKER_ID ) {
throw new IllegalArgumentException ( String .format ( "workerId must be between %d and %d" , 0 , MAX_WORKER_ID ) ) ;
}
this .workerId = workerId ;
this .port = port ;
}
synchronized long nextId ( ) {
long timestamp = System .currentTimeMillis ( ) ;
if ( timestamp < lastTimestamp ) {
throw new RuntimeException ( "Clock moved backwards. Refusing to generate id" ) ;
}
if ( timestamp == lastTimestamp ) {
sequence = ( sequence + 1 ) & MAX_SEQUENCE ;
if ( sequence == 0 L ) {
timestamp = tilNextMillis ( lastTimestamp ) ;
}
} else {
sequence = 0 L ;
}
lastTimestamp = timestamp ;
return ( ( timestamp - EPOCH ) << ( WORKER_ID_BITS + SEQUENCE_BITS ) )
| ( workerId << SEQUENCE_BITS )
| sequence ;
}
private long tilNextMillis ( long lastTimestamp ) {
long timestamp = System .currentTimeMillis ( ) ;
while ( timestamp <= lastTimestamp ) {
timestamp = System .currentTimeMillis ( ) ;
}
return timestamp ;
}
}
}

The Leaf algorithm is characterized by generating IDs at a slightly slower speed than the Snowflake algorithm, but it can support more Worker nodes. The ID generated by the Leaf algorithm consists of three parts: timestamp, Worker ID, and sequence number, of which the timestamp occupies 42 bits, the Worker ID occupies 10 bits, and the sequence number occupies 12 bits, for a total of 64 bits.

The above are common distributed ID generation algorithms. Of course, there are other solutions, such as MongoDB ID, UUID, Twitter Snowflake, etc. Different solutions are suitable for different business scenarios, and the specific implementation details and performance are also different. You need to choose the appropriate solution according to the actual situation.

In addition to the distributed ID generation algorithms introduced above, some new distributed ID generation solutions are constantly emerging, such as Flicker's distributed ID generation algorithm, which uses a similar idea to Snowflake, but adopts a different way of allocating bits. It is more flexible than Snowflake and can dynamically adjust the number of bits occupied by each part as needed. In addition, Facebook has also launched the ID Generation Service (IGS) solution, which separates the generation and storage of IDs, providing a more flexible and scalable solution, but requires more complex architecture design and implementation.

According to different business needs, you can design multiple sets of distributed ID generation solutions. Here are some of my personal suggestions:

  1. Based on database auto-increment ID generation: Using database auto-increment ID as the global unique ID can well ensure the uniqueness of the ID and is simple to implement, but it may cause performance bottlenecks when the concurrency is high. Therefore, it is not recommended in high-concurrency scenarios.
  2. Based on UUID generation: Using UUID as a globally unique ID can well guarantee the uniqueness of the ID, but the ID length is long (128 bits), which is not convenient for storage and transmission, and the probability of duplicate IDs is very small but not zero. Therefore, it is recommended to consider the length of the ID and the cost of storage and transmission when using it in a distributed system.
  3. Redis-based generation: Using Redis' atomic operations can ensure the uniqueness of the ID, and the speed of generating the ID is very fast, which can be applied to high-concurrency scenarios. However, it should be noted that if Redis crashes or has insufficient performance, it may affect the efficiency and availability of ID generation.
  4. Generation based on ZooKeeper: Using the ZooKeeper serial number generator can ensure the uniqueness of the ID and is relatively simple to implement, but it requires the introduction of additional dependencies and resources, and there may be performance bottlenecks.

To choose a distributed ID generation solution that suits your business scenario, you need to consider multiple factors, including ID uniqueness, generation speed, length, storage cost, scalability, availability, etc. At the same time, it should be noted that the implementation details and performance of different solutions are also different, and you need to weigh and choose according to the actual situation.

The following is a detailed code demo for each solution:

Generate auto-increment ID based on database

 javaCopy code
public class IdGenerator {
private static final String JDBC_URL = "jdbc:mysql://localhost:3306/test" ;
private static final String JDBC_USER = "root" ;
private static final String JDBC_PASSWORD = "password" ;

public long generateId ( ) {
Connection conn = null ;
PreparedStatement pstmt = null ;
ResultSet rs = null ;
try {
Class .forName ( "com.mysql.jdbc.Driver" ) ;
conn = DriverManager .getConnection ( JDBC_URL , JDBC_USER , JDBC_PASSWORD ) ;
pstmt = conn .prepareStatement ( "INSERT INTO id_generator (stub) VALUES (null)" , Statement .RETURN_GENERATED_KEYS ) ;
pstmt.executeUpdate ( ) ;
rs = pstmt .getGeneratedKeys ( ) ;
if ( rs .next ( ) ) {
return rs .getLong ( 1 ) ;
}
} catch ( Exception e ) {
e .printStackTrace ( ) ;
finally
try {
if ( rs != null ) {
rs .close ( ) ;
}
if ( pstmt != null ) {
pstmt .close ( ) ;
}
if ( conn != null ) {
conn .close ( ) ;
}
} catch ( Exception e ) {
e .printStackTrace ( ) ;
}
}
return 0 L ;
}
}

Generated based on UUID

 javaCopy code
import java .util .UUID ;


public class IdGenerator {
public String generateId ( ) {
return UUID .randomUUID ( ) .toString ( ) .replace ( "-" , "" ) ;
}
}

Generated based on Redis

 javaCopy code
import redis .clients .jedis .Jedis ;


public class IdGenerator {
private static final String REDIS_HOST = "localhost" ;
private static final int REDIS_PORT = 6379 ;
private static final String REDIS_PASSWORD = "password" ;
private static final int ID_GENERATOR_EXPIRE_SECONDS = 3600 ;
private static final String ID_GENERATOR_KEY = "id_generator" ;

public long generateId ( ) {
Jedis jedis = null ;
try {
jedis = new Jedis ( REDIS_HOST , REDIS_PORT ) ;
jedis .auth ( REDIS_PASSWORD ) ;
long id = jedis .incr ( ID_GENERATOR_KEY ) ;
jedis .expire ( ID_GENERATOR_KEY , ID_GENERATOR_EXPIRE_SECONDS ) ;
return id ;
} catch ( Exception e ) {
e .printStackTrace ( ) ;
finally
if ( jedis != null ) {
jedis .close ( ) ;
}
}
return 0 L ;
}
}

Generated based on ZooKeeper

 javaCopy code
import java .util .concurrent .CountDownLatch ;
import org .apache .zookeeper .CreateMode ;
import org .apache .zookeeper .WatchedEvent ;
import org .apache .zookeeper .Watcher ;
import org .apache .zookeeper .ZooDefs .Ids ;
import org .apache .zookeeper .ZooKeeper ;


public class IdGenerator implements Watcher {
private static final String ZK_HOST = "localhost" ;
private static final int ZK_PORT = 2181 ;
private static final int SESSION_TIMEOUT = 5000 ;
private static final String ID_GENERATOR_NODE = "/id_generator" ;
private static final int ID_GENERATOR_EXPIRE_SECONDS = 3600 ;
private long workerId = 0 ;

public IdGenerator ( ) {
try {
ZooKeeper zk = new ZooKeeper ( ZK_HOST + ":" + ZK_PORT , SESSION_TIMEOUT , this ) ;
CountDownLatch latch = new CountDownLatch ( 1 ) ;
latch .await ( ) ;
if ( zk .exists ( ID_GENERATOR_NODE , false ) == null ) {
zk .create ( ID_GENERATOR_NODE , null , Ids .OPEN_ACL_UNSAFE , CreateMode .PERSISTENT ) ;
}
workerId = zk .getChildren ( ID_GENERATOR_NODE , false ) .size ( ) ;
zk .create ( ID_GENERATOR_NODE + "/worker_" + workerId , null , Ids .OPEN_ACL_UNSAFE , CreateMode .EPHEMERAL ) ;
} catch ( Exception e ) {
e .printStackTrace ( ) ;
}
}

public long generateId ( ) {
ZooKeeper zk = null ;
try {
zk = new ZooKeeper ( ZK_HOST + ":" + ZK_PORT , SESSION_TIMEOUT , null ) ;
CountDownLatch latch = new CountDownLatch ( 1 ) ;
latch .await ( ) ;
zk .create ( ID_GENERATOR_NODE + "/id_" , null , Ids .OPEN_ACL_UNSAFE , CreateMode .EPHEMERAL_SEQUENTIAL , ( rc , path , ctx , name ) -> { } , null ) ;
byte [ ] data = zk .getData ( ID_GENERATOR_NODE + "/worker_" + workerId , false , null ) ;
long id = Long .parseLong ( new String ( data ) ) * 10000 + zk .getChildren ( ID_GENERATOR_NODE , false ) .size ( ) ;
return id ;
} catch ( Exception e ) {
e .printStackTrace ( ) ;
finally
if ( zk != null ) {
try {
zk .close ( ) ;
} catch ( Exception e ) {
e .printStackTrace ( ) ;
}
}
}
return 0 L ;
}


@Override
public void process ( WatchedEvent event ) {
if ( event .getState ( ) == Event .KeeperState .SyncConnected ) {
System .out .println ( "Connected to ZooKeeper" ) ;
CountDownLatch latch = new CountDownLatch ( 1 ) ;
latch .countDown ( ) ;
}
}
}

Note that ZooKeeper's temporary node is used here to coordinate the various working nodes. If a working node hangs up, its temporary node will also be deleted. This ensures that the ID obtained by each working node is unique.

The above are detailed code demos of various distributed ID generation solutions. In fact, each solution has its advantages and disadvantages, and the appropriate solution should be selected according to the specific business scenario and system architecture.

<<:  Improving time efficiency and accuracy: Carrier routing network mining

>>:  How is the ETag value in the HTTP response header generated?

Recommend

When WiFi Master Key quietly takes away your password, do you really not mind?

Many Internet companies or products , in the name...

Competition in the fixed broadband market enters the "second half"

China Telecom leads strongly, China Mobile overta...

5G and cybersecurity risks in 2023

The rollout of 5G networks has been alarmingly sl...

The Wireless Network Alliance praises Wi-Fi 6E, and the future is promising

After Wi-Fi 6, wireless networks have also ushere...

Industry insiders look at this: The history of 5G at the two sessions

[[327682]] A 5G+ holographic remote same-screen i...

I would like to say a few more words about this communication failure...

​These days, everyone is paying attention to the ...

The SD-WAN track has changed. When will the dragon trainer appear?

In an environment where cloud computing, mobile a...

US operators confirm that only premium users can enjoy C-band 5G signals

According to foreign media reports, sources have ...