Interview question overview:
Interviewer: What are the underlying data structures of Redis? Which underlying structures are used to build the five basic data types of Redis?

The underlying data structures of Redis mainly include the following; they are used to implement the five main data types that Redis exposes (String, Hash, List, Set, Zset).

1. Simple Dynamic String (SDS)
Purpose: mainly used to store String values.
Features: SDS is a string structure built by Redis. Compared with the traditional C string (terminated by the null character '\0'), SDS offers automatic expansion, a cached length, and binary safety. SDS records the length of the string and the size of the allocated space in a header structure, which speeds up common string operations.

2. ZipList (compressed list)
Purpose: used for Hash, List, and Zset values when the number of elements is small and the elements themselves are small.
Features: a ziplist is a series of contiguous memory blocks designed to save space. It contains multiple entry nodes, each of which records the length of the previous node and its own content. When the amount of data grows or elements become larger, the ziplist is converted to another structure (such as a hash table or a linked list).

3. Hashtable
Purpose: mainly used for Hash, Set, and Zset values when the number of elements is large or the elements are large.
Features: the hash table keeps an array of buckets internally and determines an element's position by computing the hash value of its key. It supports fast lookup, insertion, and deletion.

4. LinkedList and QuickList
Purpose: doubly linked lists are used to store List values. Before Redis 3.2, the underlying implementation of List was a doubly linked list or a ziplist; Redis 3.2 and later introduced QuickList, a combination of the two, to optimize the List type.
Features: each node in a doubly linked list holds references to the previous and next nodes, supporting operations at both ends. The quicklist is a doubly linked list whose nodes are ziplists, so it keeps the space advantage of the ziplist while retaining the operational flexibility of the linked list.

5. IntSet (integer set)
Purpose: used to store Set values when all elements are integers and the number of elements is small.
Features: an intset stores its elements in a sorted array and chooses an encoding based on the element type (16-bit, 32-bit, or 64-bit integers). When a newly added element exceeds the range of the current encoding, the set is upgraded to a wider encoding; when capacity is exceeded, the array is expanded.

6. SkipList
Purpose: mainly used to store Zset values and keep them ordered by score.
Features: a skip list is a multi-level linked list in which every level is an ordered list. By combining the levels, a skip list can perform search, insertion, and deletion in O(log N) time. In the Zset implementation, a hash table is used alongside the skip list to store the mapping from members to scores, providing fast member lookup.

Redis's five basic data types (String, List, Set, Hash, Zset) are each built from one or more of these underlying structures to meet storage and operation requirements in different scenarios.
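For illustration, here is a minimal sketch of an SDS header in the style of early Redis versions (a simplification; Redis 3.2 and later use several sdshdr variants with length fields of different widths):

struct sdshdr {
    int len;     /* number of bytes actually used in buf */
    int free;    /* number of unused bytes still available in buf */
    char buf[];  /* character data; binary safe, but still '\0'-terminated */
};

Because the length is stored in the header, asking for a string's length is O(1), and the free field lets Redis append to a string without reallocating on every operation.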
The following is the correspondence between these basic data types and the underlying data structures:
(1) String. Underlying structure: simple dynamic string (SDS).
(2) List. Underlying structures: doubly linked list, ZipList (and QuickList from Redis 3.2 onward). Note: when a list has few, small elements, Redis uses a ziplist to save memory; when it has more elements, a linked-list-based structure is used to support fast insertion and deletion at both ends.
(3) Set. Underlying structures: hash table, IntSet. Note: when all elements in the set are integers and the number of elements is small, Redis uses an integer set to optimize memory usage; when the number of elements is large or the set contains non-integer elements, a hash table is used for fast addition, deletion, and lookup.
(4) Hash. Underlying structures: ZipList, hash table. Note: when the hash object stores a small number of key-value pairs and both keys and values are shorter than 64 bytes, a ziplist is used as the underlying implementation to save memory; when there are many key-value pairs or the keys and values are long, a hash table is used for fast insertion, lookup, and deletion.
(5) Zset (sorted set). Underlying structures: SkipList, hash table. Note: a skip list is an ordered, multi-level linked list that provides fast insertion, deletion, and lookup. By combining a skip list with a hash table, a sorted set can stay ordered while supporting fast range queries and rank calculations by score; the hash table stores the mapping from members to scores.

Interviewer: Can you explain in detail how the Redis hash table is implemented and how it expands?

1. Basic structure
The hash table in Redis is represented by a dict structure, which contains two hash tables (dictht), ht[0] and ht[1]; normally only ht[0] is used, and ht[1] comes into play during rehashing. The dict structure contains the following key fields:
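A simplified sketch, based on the dict definitions in Redis's dict.h before version 7.0 (field names and types are abbreviated for illustration; newer versions reorganize this structure):

typedef struct dictEntry {
    void *key;
    void *val;
    struct dictEntry *next;   /* chains entries whose keys hash to the same bucket */
} dictEntry;

typedef struct dictht {
    dictEntry **table;        /* array of buckets */
    unsigned long size;       /* number of buckets (always a power of two) */
    unsigned long sizemask;   /* size - 1, used to mask hash values into an index */
    unsigned long used;       /* number of key-value pairs currently stored */
} dictht;

typedef struct dict {
    dictht ht[2];             /* ht[0] is the live table; ht[1] is only used while rehashing */
    long rehashidx;           /* -1 when no rehash is in progress */
    /* type and private-data fields omitted for brevity */
} dict;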
2. Key-value pair storage
Each dictEntry holds one key-value pair and a pointer to the next dictEntry, forming a linked list per bucket. When a hash collision occurs, the new key-value pair is simply linked into the bucket's list (Redis inserts new entries at the head of the list for speed), so collisions are handled without any immediate rehashing.

3. Hash function
Redis uses the MurmurHash2 algorithm as its hash function (newer versions, from Redis 4.0 on, use SipHash). It is a non-cryptographic hash function known for its efficiency and low collision rate, and it maps keys of arbitrary length to fixed-length hash values that determine a key's bucket in the table.

4. Load factor and rehash
The load factor measures how full the hash table is: load factor = number of used nodes / hash table size. When the load factor becomes too high (in Redis, expansion is normally triggered when the load factor reaches 1, and is forced once it reaches 5 even while a child process such as BGSAVE or an AOF rewrite is running), Redis triggers a rehash to enlarge the table and reduce the probability of collisions. A rehash involves the following steps:
(1) Allocate a new hash table whose size is at least twice the number of entries currently stored, rounded up to a power of two.
(2) Traverse all key-value pairs in the old table, recompute each key's bucket index against the new table size, and insert the pairs into the new table.
(3) Replace the old table with the new one.
It is worth noting that Redis uses a progressive rehash strategy to avoid the performance hit of migrating all the data in one go. During a progressive rehash, every add, delete, update, or lookup on the dictionary also migrates a small portion of the data from the old table to the new one, until the migration is complete. The specific process of a progressive rehash is:
(1) Allocate ht[1] and set the dict's rehashidx field to 0, marking that a rehash is in progress.
(2) On every dictionary access (add, delete, update, or lookup), in addition to performing the requested operation, Redis moves all entries in the ht[0] bucket at index rehashidx over to ht[1] and then increments rehashidx.
(3) When every entry in ht[0] has been migrated, ht[0] is released, ht[1] becomes the new ht[0], and rehashidx is set back to -1 to mark the rehash as finished.
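As an illustration, here is a simplified sketch of one incremental migration step, loosely modeled on dictRehash() in Redis's dict.c and using the struct sketch shown earlier. The hash() helper is a placeholder, and the real code additionally skips long runs of empty buckets and bounds the work done per call:

#include <stdlib.h>
#include <string.h>

unsigned long hash(const void *key);  /* placeholder: Redis uses MurmurHash2/SipHash */

/* Move every entry in the ht[0] bucket at rehashidx over to ht[1]. */
int dict_rehash_step(dict *d) {
    if (d->rehashidx == -1) return 0;             /* no rehash in progress */
    dictEntry *entry = d->ht[0].table[d->rehashidx];
    while (entry) {
        dictEntry *next = entry->next;
        /* recompute the bucket index against the new table's sizemask */
        unsigned long idx = hash(entry->key) & d->ht[1].sizemask;
        entry->next = d->ht[1].table[idx];        /* head-insert into ht[1] */
        d->ht[1].table[idx] = entry;
        d->ht[0].used--;
        d->ht[1].used++;
        entry = next;
    }
    d->ht[0].table[d->rehashidx] = NULL;
    d->rehashidx++;
    if (d->ht[0].used == 0) {                     /* every bucket has been migrated */
        free(d->ht[0].table);
        d->ht[0] = d->ht[1];
        memset(&d->ht[1], 0, sizeof(d->ht[1]));
        d->rehashidx = -1;                        /* mark rehash as finished */
    }
    return 1;
}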
Progressive rehashing uses a divide-and-conquer approach, spreading the rehash work across individual dictionary accesses and thus avoiding the large burst of computation that a one-shot rehash would cause. Note that while a progressive rehash is in progress, lookup, update, and delete operations check ht[0] first and then ht[1], while newly added key-value pairs go only into ht[1]; this guarantees that ht[0] only ever shrinks and eventually becomes empty. The advantage of progressive rehashing is that the hash table can be expanded or shrunk gradually without blocking the main thread from serving requests, greatly reducing the impact on system performance. The drawback is some extra memory overhead, because the old and new hash tables have to be maintained at the same time while the rehash is in progress.

Interviewer: Why is Redis so fast? What QPS can a single Redis server handle?

Redis is very fast due to several key factors:
(1) Memory-based data storage: Redis keeps data in memory, which eliminates most disk I/O. Compared with traditional databases that read and write data on disk, Redis can read and write data extremely quickly, achieving a very high operation rate.
(2) Efficient data structures: Redis supports a variety of data structures, such as strings, hashes, lists, sets, and sorted sets. These structures are carefully designed and optimized to minimize the time complexity of storage and access. For example, Redis uses simple dynamic strings (SDS) for string handling; compared with traditional C strings, SDS is more efficient at obtaining the string length, modifying strings, and allocating memory.
(3) Reasonable data encoding: Redis automatically selects the optimal encoding based on the type and size of the data. For example, for string values, Redis chooses int encoding or raw encoding depending on the length of the string and whether its content is numeric. This encoding selection lets Redis stay fast across data of different types and sizes.
(4) Single-threaded model: Redis uses a single thread to process client commands, which avoids the overhead of context switching and lock contention between threads. At the same time, Redis uses I/O multiplexing to handle connections and requests from many clients concurrently, further improving throughput.
(5) Non-blocking I/O: Redis uses an event-driven, non-blocking I/O model for network requests, so it can continue doing other work while waiting for I/O to become ready, improving overall throughput and responsiveness.

The QPS (queries per second) of a single Redis server depends on many factors, including the Redis version, hardware, operating system, network conditions, and the complexity of the operations. Generally speaking, a single Redis server can support tens of thousands to over a hundred thousand QPS. To go much beyond 100,000 QPS, a single server comes under heavy pressure, and a Redis cluster or another distributed architecture is usually needed to share the load. Benchmarks can be run using the redis-benchmark command:
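A typical invocation looks like this (host, port, connection count, and request count are illustrative values to adjust for your environment):

redis-benchmark -h 127.0.0.1 -p 6379 -c 50 -n 100000 -t get,set -q

Here -c sets the number of parallel client connections, -n the total number of requests, -t restricts the run to the listed commands, and -q prints only the requests-per-second summary for each test.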
Interviewer: You just mentioned that Redis uses a single-threaded model. Is Redis really a single-threaded application? If so, why does it use a single-threaded model?

From the perspective of core operations, Redis does use a single thread to execute commands: receiving client requests, parsing them, reading and writing data, and returning results to the client are all done by one main thread. This is why Redis is called a single-threaded database. However, from the perspective of the overall implementation, Redis is not strictly single-threaded. In versions prior to Redis 6.0, although most work was done by the main thread, background threads or child processes handled tasks such as cleaning up dirty data, generating RDB snapshots, and rewriting the AOF; these background tasks exist to avoid blocking the main thread and to improve overall performance. In Redis 6.0 and later, Redis introduced a multi-threaded model for network I/O: the extra threads are used only for reading and writing network data and for protocol parsing, while command execution remains single-threaded. This design makes fuller use of multi-core CPUs and improves Redis's network I/O performance. So Redis is single-threaded where commands are executed, but not completely single-threaded overall; it achieves high performance by combining a single command-execution thread with background and I/O threads, in-memory data, and non-blocking I/O.

The reasons for using a single-threaded model for command execution are as follows:

1. Avoid context-switching overhead
Scheduling multiple threads inevitably requires switching thread contexts on the CPU, and a context switch involves replacing registers such as the program counter, stack pointer, and program status word, resetting the program stack, and even flushing CPU caches and the TLB. Switching between threads of the same process is cheaper than switching between processes, because threads share the process address space, so a thread context is much smaller than a process context; but with a single thread, even this intra-process switching overhead is avoided, because the program always runs on one thread.

2. Avoid the overhead of synchronization mechanisms
If Redis chose a multi-threaded model, then, being a database, it would inevitably have to synchronize access to the underlying data, which means introducing mechanisms such as locks. Redis provides not only simple key-value strings but also richer structures such as lists, sets, and hashes; different structures would need different locking granularities, which could add a great deal of locking and unlocking overhead, increasing program complexity while reducing performance.
3. Simplicity and maintainability
The Redis author's original design intent was simplicity and maintainability, and introducing multithreading inevitably increases code complexity and reduces maintainability. First, multithreading breaks the serial logic of the code: the execution order becomes unpredictable, and carelessness leads to all kinds of concurrency bugs. Second, multithreading makes debugging more complicated and troublesome. If Redis used multiple threads for data access, all the underlying data structures would have to be made thread-safe, which would make the implementation considerably more complex.

In summary, Redis's choice of a single thread for the main scenario of command execution is a deliberate trade-off: it provides more than enough performance while keeping the code simple and maintainable.

Interviewer: Can you explain in detail why Redis 6.0 introduced multithreading? Where does Redis use multithreading? How is Redis multithreading implemented?

Redis initially chose a single-threaded network model because the CPU is usually not the bottleneck; the bottlenecks are memory and the network, so a single thread was sufficient. Redis later introduced multithreading because its network I/O bottleneck has become increasingly obvious: with the rapid growth of the Internet, business systems have to handle more and more online traffic, and Redis's single-threaded mode causes a large amount of CPU time to be spent on network I/O, which limits throughput. There are two ways to improve the performance of Redis: (1) optimize the network I/O module; (2) improve the speed of the machine's memory reads and writes.
The latter depends on the development of hardware and has no practical solution for now, so we can only start from the former. The optimization of network I/O can in turn go in two directions: (1) zero-copy techniques or kernel-bypass techniques such as DPDK; (2) exploiting multiple CPU cores with multithreading.
Zero-copy technology has its limitations and cannot fully cover complex network I/O scenarios such as Redis's, while DPDK, which bypasses the kernel network stack to handle packets in user space, is too complex and requires special kernel and even hardware support. Therefore, a multi-threaded model that takes advantage of multiple cores has become the most cost-effective way to optimize network I/O. In Redis 6.0, multithreading is used mainly for network I/O (reading and writing socket data and parsing the protocol), while command execution is still completed by a single thread. This takes advantage of multi-core CPUs while avoiding the performance losses caused by locks and context switches. Next, Redis's multithreaded implementation mechanism:
(1) The main thread is responsible for command execution: the Redis main thread still executes client commands, including all data reads and writes.
(2) Multithreaded network I/O processing: Redis starts a configurable number of I/O threads; they are responsible for reading request data from client sockets and parsing the protocol, and for writing response data back to client sockets.
(3) Task distribution mechanism: when clients become readable or writable, the main thread places them in a queue and distributes them to the I/O threads (the main thread also takes a share of the work) in a round-robin fashion, then waits until every I/O thread has finished processing its share.
(4) Command execution process: once all I/O threads have finished reading and parsing the requests, the main thread executes the parsed commands one by one, still single-threaded; the replies are then written back to the clients, again with the help of the I/O threads.
In Redis 6.0 and later versions, multithreading is disabled by default. To enable it, you need to set the number of I/O threads with io-threads in redis.conf, and set io-threads-do-reads yes if you also want the I/O threads to handle reads and protocol parsing; if the number of threads is not set, multithreading does not take effect (a sample configuration fragment is shown at the end of this answer). Regarding the number of threads, the official recommendation is 2 or 3 threads for a 4-core machine and 6 threads for an 8-core machine, and the number of threads should always be smaller than the number of cores. Note that more threads is not always better; the Redis authors consider more than 8 I/O threads basically pointless. In practice, enabling multithreading only makes sense on machines with at least 4 cores, and is recommended only when the Redis instance is already consuming a considerable share of CPU; otherwise it brings little benefit, and it is estimated that the large majority of business workloads can be served normally without it.

In summary, Redis's multithreading distributes network I/O across multiple worker threads while command execution remains on a single thread. This design exploits multi-core CPUs for network I/O while avoiding the overhead of thread switching and contention on shared data.
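For example, a redis.conf fragment that turns on the I/O threads might look like this (4 is an illustrative value for a machine with enough spare cores; both directives are documented in the sample redis.conf shipped with Redis):

io-threads 4
io-threads-do-reads yes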