Part 01 What is UUID

UUID stands for Universally Unique Identifier, a 128-bit value used to identify objects or events in a network. Because of the way UUIDs are generated, the chance of two of them colliding is negligible in practice, so they can be treated as globally unique without a central coordinator. UUIDs are widely used wherever unique identification is needed, such as database primary keys, system instance IDs, and identifiers for Bluetooth profiles and short-lived objects.

The term UUID is closely related to GUID (Globally Unique Identifier). The GUID, originally introduced by Microsoft, is essentially an implementation of the UUID, and RFC 4122 defines the two terms as synonyms. UUIDs were standardized by the Open Software Foundation (OSF) as part of its Distributed Computing Environment (DCE), which made them an important component of distributed computing. The UUID versions described below all follow the RFC 4122 specification.

Part 02 How UUID is constructed

A UUID is usually generated by a specific algorithm, for example one based on a timestamp or a network address. Its canonical text form consists of 32 hexadecimal digits (0 to 9 and a to f) split by 4 hyphens into groups of 8-4-4-4-12 characters, 36 characters in total: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx. The M digit indicates the version, and the leading bits of the N digit indicate the variant. Since a UUID is simply a 128-bit number, it can also be written in decimal or binary form.

Traditional UUIDs fall roughly into three variants:

➤ Variant 0: Reserved for backward compatibility with the Apollo Network Computing System (NCS) of the 1980s; its structure is similar to the version 1 UUIDs in use today.
➤ Variant 1: The most commonly used variant, defined by the IETF in RFC 4122 and also known as the RFC 4122/DCE 1.1 UUID or the Leach-Salz UUID. Microsoft's current GUIDs, for example, are variant 1 UUIDs.
➤ Variant 2: Reserved for backward compatibility with Microsoft. Although current Microsoft GUIDs are variant 1, early GUIDs on the Windows platform used variant 2.
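Python's standard uuid module can be used to check the layout just described. The sketch below takes an arbitrary example UUID, pulls out the M (version) and N (variant) digits from its canonical string, and shows the same value as a decimal and a binary number. The helper name describe is hypothetical; uuid.UUID, its variant and int attributes, and the constants uuid.RFC_4122 and uuid.RESERVED_MICROSOFT are part of the standard library.

```python
import uuid


def describe(text: str) -> dict:
    """Break a UUID string into the parts described above (teaching sketch)."""
    u = uuid.UUID(text)                    # raises ValueError on malformed input
    s = str(u)                             # canonical lowercase 8-4-4-4-12 form
    n_digit = int(s[19], 16)               # N: first hex digit of the 4th group
    return {
        "group_lengths": [len(g) for g in s.split("-")],
        "version_digit": s[14],            # M: first hex digit of the 3rd group
        "n_bits": format(n_digit, "04b"),  # leading bits select the variant
        "variant": u.variant,              # e.g. uuid.RFC_4122
        "decimal": u.int,                  # the same 128-bit value in decimal
        "binary": format(u.int, "0128b"),  # and as 128 binary digits
    }


if __name__ == "__main__":
    d = describe("123e4567-e89b-12d3-a456-426614174000")
    print(d["group_lengths"], d["version_digit"], d["n_bits"], d["variant"])
    # Same value except N = 'c' (binary 1100): the Microsoft-reserved variant.
    print(describe("123e4567-e89b-12d3-c456-426614174000")["variant"])
```

Changing only the N digit from a (binary 1010) to c (binary 1100) moves the otherwise identical value from the RFC 4122 variant to the Microsoft-reserved one.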
Variant 1 and variant 2 differ in how many leading bits of the N digit they use: variant 1 uses 2 bits (binary 10, so N is 8, 9, a or b), while variant 2 uses 3 bits (binary 110, so N is c or d).

Part 03 Current versions of UUID

Current UUIDs are mainly based on variant 1, which defines 5 versions. Each version is generated differently:

➢ Version 1: Generated from a timestamp and a node ID, combining the current time with the computer's MAC address. It is designed to be unique across space and time and embeds the generation time (although the low-order time bits come first, so version 1 strings do not sort chronologically as text). Because it exposes the MAC address and the creation time, it can raise security and privacy concerns.
➢ Version 2: Generated from a DCE Security identifier. It replaces part of the timestamp with a local identifier, such as a POSIX user or group ID, so that it can represent users, groups and ACLs (access control lists). This version is rare and not widely supported.
➢ Version 3: Generated from a name and a namespace, which are hashed with MD5. The same namespace and name always yield the same UUID, so this version suits identifying objects within a namespace, such as URLs or domain names.
➢ Version 4: Generated from random numbers. 122 of its 128 bits are random (the remaining 6 encode the version and variant), so collisions are extremely unlikely. It is currently the most commonly used version and is used in many fields.
➢ Version 5: Generated from a name and a namespace like version 3, but hashed with SHA-1, which is stronger than MD5. It can likewise be used to identify objects in a namespace.

Part 04 Existing UID generation strategies

4.1 ID generation in MySQL

MySQL generates IDs with an auto_increment primary key. The step between consecutive IDs is fixed, but it can be customized.
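In MySQL the step and the starting value are controlled by the auto_increment_increment and auto_increment_offset system variables. A common way to run several MySQL servers without ID collisions is to give them all the same step and different offsets. The sketch below only models that arithmetic in Python; the function name is hypothetical, it ignores details such as rows inserted with explicit IDs, and it does not talk to a database.

```python
def auto_increment_ids(offset: int, step: int, count: int) -> list[int]:
    """First `count` IDs an AUTO_INCREMENT column would issue on an empty table,
    given offset (auto_increment_offset) and step (auto_increment_increment).
    Simplified model: assumes 1 <= offset <= step and no explicitly inserted IDs."""
    if step < 1 or not 1 <= offset <= step:
        raise ValueError("expected 1 <= offset <= step")
    return [offset + n * step for n in range(count)]


if __name__ == "__main__":
    # Three hypothetical MySQL servers sharing step 3, each with its own offset.
    for offset, server in enumerate(["db-a", "db-b", "db-c"], start=1):
        print(server, auto_increment_ids(offset, step=3, count=4))
```

Each server's sequence stays disjoint from the others, but adding a fourth server later means changing the step on every server, which is one of the scalability problems of this approach.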
The auto_increment approach is simple to use and guarantees that IDs are increasing and unique, but the database is a single point of failure, data consistency becomes an issue once several instances are involved, and the approach is hard to scale.

4.2 ID generation in MongoDB

The ObjectId that MongoDB generates is a 12-byte value, usually shown as 24 hexadecimal characters. In its classic layout it consists of a 4-byte timestamp in seconds, a 3-byte machine identifier, a 2-byte process ID, and a 3-byte counter that starts from a random value (newer drivers replace the machine and process fields with a 5-byte random value). At 96 bits it is shorter than a traditional 128-bit UUID but longer than a MySQL auto-increment field stored as a 64-bit BIGINT.

4.3 ID generation in Redis

Redis usually generates IDs with the atomic INCR or INCRBY commands. In a Redis cluster, each node can be given its own initial value and a step size so that the nodes never hand out the same ID. Because Redis executes commands on a single thread, each increment is atomic and the generated IDs are guaranteed to be unique.

4.4 ID generation in ZooKeeper

ZooKeeper mainly generates IDs from the data version of a znode, which is usually a 32-bit or 64-bit value that the client can use as a UID. Because this approach depends heavily on ZooKeeper, it is rarely considered for high-concurrency scenarios.

Part 05 ID generation case studies

5.1 Twitter's ID generation

Twitter runs the Snowflake algorithm as a dedicated service that generates 64-bit unique identifiers for objects in its distributed systems, such as tweets, direct messages and lists. These IDs are time-based unique 64-bit integers. A complete ID consists of the following parts:

➤ 1 bit: unused sign bit, always 0
➤ 41 bits: timestamp in milliseconds since a custom epoch
➤ 10 bits: machine (worker) ID
➤ 12 bits: sequence number within the same millisecond
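The layout above can be sketched as a small Python generator. This is a minimal single-process illustration, not Twitter's implementation: the epoch value, the class and function names, and the choice to simply raise an error when the clock moves backwards are all assumptions made for the example.

```python
import threading
import time

# Layout from the list above: 1 unused sign bit + 41 + 10 + 12 = 64 bits.
WORKER_BITS, SEQUENCE_BITS = 10, 12
MAX_WORKER = (1 << WORKER_BITS) - 1        # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1    # 4095
EPOCH_MS = 1_600_000_000_000               # hypothetical custom epoch (ms since 1970)


def now_ms() -> int:
    return time.time_ns() // 1_000_000


class Snowflake:
    def __init__(self, worker_id: int, clock=now_ms):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock                 # injectable so tests can fake time
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                # Clock moved backwards: refuse rather than risk duplicate IDs.
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:     # all 4096 IDs of this ms are used
                    while now <= self.last_ms:
                        now = self.clock()  # spin until the next millisecond
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


def split(snowflake_id: int) -> tuple[int, int, int]:
    """Return (timestamp_ms, worker_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    worker_id = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER
    timestamp_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return timestamp_ms, worker_id, sequence


if __name__ == "__main__":
    gen = Snowflake(worker_id=1)
    first, second = gen.next_id(), gen.next_id()
    print(first, second, split(second))
```

Because the worker ID has its own field, two nodes with different worker IDs can never produce the same value, and within one node the (timestamp, sequence) pair never repeats as long as the clock does not run backwards.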
IDs built this way can be generated independently on each node without coordination, which improves availability, and because the timestamp occupies the leading bits they can be sorted by time. By default the Snowflake algorithm produces a 64-bit long integer of up to 19 decimal digits. A given project may not need IDs that long, in which case the bit layout can be adapted to its needs.

5.2 Baidu's ID generation

Baidu's generator builds on the Snowflake algorithm but redistributes the bits. The timestamp field is shortened to 28 bits and holds the difference, in seconds, between the current time and a manually configurable start time. 22 bits hold the worker node ID, which is assigned when an instance starts: the instance writes a row to a database table and uses the auto-incremented value as its ID. The remaining 13 bits are a sequence number: if the current second is the same as the previous one, the sequence is incremented, and if it exceeds its maximum the generator spins until the next second.

5.3 Meituan's ID generation

Meituan also improved on the Snowflake algorithm, composing its UIDs as "1+41+5+5+12": on top of the original layout, 5 bits represent the machine ID, 5 bits the machine room (data center) ID, and 12 bits the auto-incrementing sequence. Because the cluster is large, machine IDs are assigned through ZooKeeper. To handle clock rollback, a threshold of 5 milliseconds is set: if the clock moves back by less than 5 milliseconds, the generator waits for the clock to catch up and then generates a new UID; if the rollback exceeds the threshold, an exception is thrown.

Most of the UID generation schemes used by other large Internet companies are also based on Twitter's Snowflake algorithm.
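Because these Snowflake variants differ mainly in how they divide the 63 usable bits, a quick calculation shows what each split buys. The sketch below uses the bit widths as this article describes them (Meituan's 5-bit machine and 5-bit machine room fields are combined into one 10-bit node field); the Layout class is a hypothetical helper, and years are counted as 365.25 days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    time_bits: int         # width of the timestamp field
    ticks_per_second: int  # 1000 for a millisecond timestamp, 1 for seconds
    node_bits: int         # all machine / worker / machine-room bits together
    seq_bits: int          # width of the per-tick sequence

    def __post_init__(self) -> None:
        # 1 unused sign bit + the three fields must fill exactly 64 bits
        if 1 + self.time_bits + self.node_bits + self.seq_bits != 64:
            raise ValueError(f"{self.name}: fields do not add up to 64 bits")

    @property
    def lifetime_years(self) -> float:
        # time until the timestamp field overflows, counted from the start time
        seconds = (1 << self.time_bits) / self.ticks_per_second
        return seconds / (365.25 * 24 * 3600)

    @property
    def max_nodes(self) -> int:
        return 1 << self.node_bits

    @property
    def ids_per_second_per_node(self) -> int:
        return (1 << self.seq_bits) * self.ticks_per_second


LAYOUTS = [
    Layout("Twitter Snowflake", 41, 1000, 10, 12),
    Layout("Baidu", 28, 1, 22, 13),
    Layout("Meituan (5+5 node bits)", 41, 1000, 5 + 5, 12),
]

if __name__ == "__main__":
    for lay in LAYOUTS:
        print(f"{lay.name:26} {lay.lifetime_years:6.1f} years  "
              f"{lay.max_nodes:>9,} nodes  "
              f"{lay.ids_per_second_per_node:>11,} IDs/s per node")
```

The trade-off is visible directly: a per-second timestamp gives up lifetime and per-node throughput in exchange for room for millions of worker IDs, which fits handing out a fresh worker ID from the database on every instance start.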
Didi generates UIDs from "timestamp + starting number + license plate number", and Taobao generates order IDs from "timestamp + user ID". Didi's scheme, building on Meituan's approach, loads ID segments into memory and supports multiple master nodes. WeChat's UID generation is bound to each user's sequence number and is implemented with step-wise persistence and segmented shared storage.

Part 06 Application practice on a provincial broadband TV membership management platform

The provincial broadband TV membership management platform manages broadband TV members in each province. It is a large distributed platform with many servers spread across provinces, so clock rollback is a real problem. The platform therefore extends the Snowflake algorithm with the concept of timelines: several timelines can run in parallel, which handles clock rollback well. The scheme works as follows: 2 bits of the ID represent the timeline, allowing up to 4 timelines; all timelines share the same start time; one timeline is designated as the unified (default) timeline, on which IDs are generated as time advances. When a machine's clock is set back and the rollback exceeds a configured threshold, the generator switches to another timeline and continues generating IDs there.
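The timeline scheme above can be sketched in Python as follows. The article does not say which bits the 2 timeline bits are taken from, so this sketch assumes they come out of Snowflake's 10-bit worker field (leaving 8 bits for the worker ID). It also assumes that rollbacks within the threshold are simply waited out, and that a larger rollback switches to any timeline whose last used timestamp is earlier than the current time. The 5 ms threshold, the epoch, and all names are hypothetical.

```python
import threading
import time

# Assumed layout: 1 sign + 41 timestamp (ms) + 2 timeline + 8 worker + 12 sequence
TIMELINE_BITS, WORKER_BITS, SEQUENCE_BITS = 2, 8, 12
TIMELINES = 1 << TIMELINE_BITS            # up to 4 timelines
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1_600_000_000_000              # hypothetical start time shared by all timelines
ROLLBACK_THRESHOLD_MS = 5                 # hypothetical threshold


def now_ms() -> int:
    return time.time_ns() // 1_000_000


class TimelineGenerator:
    def __init__(self, worker_id: int, clock=now_ms):
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock
        self.timeline = 0                 # timeline 0 is the unified (default) one
        self.last_ms = [-1] * TIMELINES   # last timestamp issued on each timeline
        self.sequence = 0
        self.lock = threading.Lock()

    def _free_timeline(self, now: int) -> int:
        # A timeline whose last timestamp is earlier than `now` can resume safely.
        for t in range(TIMELINES):
            if self.last_ms[t] < now:
                return t
        raise RuntimeError("clock rolled back on every timeline")

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            last = self.last_ms[self.timeline]
            if now < last:
                if last - now <= ROLLBACK_THRESHOLD_MS:
                    while now < last:     # small rollback: wait for the clock
                        now = self.clock()
                else:                     # large rollback: switch timeline
                    self.timeline = self._free_timeline(now)
                    last = self.last_ms[self.timeline]
            if now == last:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:    # sequence exhausted for this ms
                    while now <= last:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms[self.timeline] = now
            return (((now - EPOCH_MS) << (TIMELINE_BITS + WORKER_BITS + SEQUENCE_BITS))
                    | (self.timeline << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


def split(id_: int) -> tuple[int, int, int, int]:
    """Return (timestamp_ms, timeline, worker_id, sequence)."""
    sequence = id_ & MAX_SEQUENCE
    worker = (id_ >> SEQUENCE_BITS) & MAX_WORKER
    timeline = (id_ >> (WORKER_BITS + SEQUENCE_BITS)) & (TIMELINES - 1)
    ms = (id_ >> (TIMELINE_BITS + WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    return ms, timeline, worker, sequence
```

IDs issued after a switch stay unique because the timeline bits differ, but they may be numerically smaller than earlier IDs, so strict time ordering is traded for continued availability.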