Interviewer: What are the underlying data types of Redis? Why is Redis so fast? Why did Redis introduce multithreading? What is the implementation mechanism of Redis multithreading?

Interview question overview:

  • What are the underlying data types of Redis? What underlying data types are used to make up the five basic data types of Redis?
  • Can you explain in detail how to implement Redis hash table and how to expand its capacity?
  • Why is Redis so fast? What is the QPS of a single Redis server?
  • Is Redis really a single-threaded application? If so, please explain why it uses a single-threaded model?
  • Why did Redis 6.0 introduce multithreading? Where does Redis use multithreading? What is the implementation mechanism of Redis multithreading?
  • Is multi-threading enabled by default in Redis 6.0? If multi-threading is enabled, how many threads should be set for Redis?

Interviewer: What are the underlying data types of Redis? What underlying data types are used to make up the five basic data types of Redis?

The underlying data types of Redis mainly include the following, which are used to implement and support the five main data types provided by Redis (String, Hash, List, Set, Zset):

1. Simple Dynamic String (SDS)

Purpose: Mainly used to store String type values.

Features: SDS is a string data structure built by Redis. Compared with the traditional C string (a character array terminated by the null character '\0'), SDS offers automatic expansion, a cached length, and binary safety. The SDS header records the string's used length and allocated space, which improves the performance of string operations.
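
As a rough illustration, the SDS header can be sketched as the following C structure (a simplified sketch modeled on the sdshdr definitions in sds.h; the real code defines several header sizes such as sdshdr8/16/32/64):

  #include <stdint.h>

  /* Simplified sketch of an SDS header. */
  struct sdshdr {
      uint32_t len;    /* bytes in use, so getting the length is O(1) */
      uint32_t alloc;  /* bytes allocated, enabling pre-allocation on append */
      char buf[];      /* the bytes themselves; may contain '\0', hence binary safe */
  };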

2. ZipList

Purpose: Used as the underlying encoding for Hash, List, Zset, and similar types when they contain only a few, small elements.

Features: A compressed list is a sequence of contiguous memory blocks designed to save space. It holds multiple entry nodes; each entry records the length of the previous node, its own encoding, and its content. When the amount of data grows or the elements become larger, the compressed list is converted to another structure (such as a hash table or a doubly linked list).
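
Conceptually the layout can be sketched like this (a simplified view; the real ziplist.c packs every field into variable-length byte sequences rather than fixed struct fields, and the names below are illustrative):

  /* Overall layout of a ziplist: one contiguous block of memory.
   *
   *   <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
   *
   * Logical contents of each entry: */
  struct ziplist_entry_sketch {
      unsigned int  prevlen;    /* length of the previous entry, enabling backward traversal */
      unsigned char encoding;   /* how the content is stored (small integer vs. string) */
      unsigned char content[];  /* the element itself */
  };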

3. Hashtable

Purpose: Mainly used to store Hash, Set, and Zset data when the number of elements is large or the elements themselves are large.

Features: The hash table maintains an array structure internally and determines the position of the element in the array by calculating the hash value of the key. The hash table supports fast search, insertion, and deletion operations.

4. LinkedList and QuickList

Purpose: Doubly linked lists are used to store List type data. Before Redis 3.2, the underlying implementation of List was a doubly linked list or a compressed list; Redis 3.2 introduced QuickList, a combination of the two, to optimize the performance of the List type.

Features: Each node in a doubly linked list holds pointers to the previous and next nodes, supporting operations at both ends. A quicklist is a doubly linked list whose nodes are compressed lists, which retains the space advantage of the compressed list while keeping the operational flexibility of the doubly linked list.

5. Integer Set (IntSet)

Purpose: Used to store Set type data when all elements in the Set are integers and the number is small.

Features: An integer set stores its elements in a sorted array and chooses an encoding based on the element type (16-bit, 32-bit, or 64-bit integers). When a newly added element does not fit the range of the current encoding, the set is upgraded to a wider encoding, and the array grows as elements are added.
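
A sketch close to the definition in intset.h looks like this (field names follow the source):

  #include <stdint.h>

  /* Sketch of Redis's integer set (cf. intset.h). */
  typedef struct intset {
      uint32_t encoding;   /* INTSET_ENC_INT16 / INT32 / INT64: width of each element */
      uint32_t length;     /* number of elements */
      int8_t contents[];   /* sorted array of integers, interpreted according to 'encoding' */
  } intset;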

6. SkipList

Purpose: Mainly used to store Zset type data to achieve orderliness.

Features: A skip list is a multi-level linked list in which each level is an ordered list. By combining several levels, a skip list can perform search, insertion, and deletion in O(log N) time. For Zset, Redis pairs the skip list with a hash table that stores the member-to-score mapping, so individual members can also be looked up quickly.
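
A skip-list node can be sketched as follows (a simplified version of zskiplistNode in server.h; the sds typedef below is a stand-in for the dynamic string type described earlier):

  typedef char *sds;                       /* stand-in for the real SDS type */

  /* Simplified sketch of a skip-list node (cf. zskiplistNode in server.h). */
  typedef struct zskiplistNode {
      sds ele;                             /* the member */
      double score;                        /* the ordering key */
      struct zskiplistNode *backward;      /* level-0 back pointer for reverse iteration */
      struct zskiplistLevel {
          struct zskiplistNode *forward;   /* next node on this level */
          unsigned long span;              /* number of nodes skipped, used for rank queries */
      } level[];
  } zskiplistNode;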

Redis's five basic data types (String, List, Set, Hash, Zset) are composed of multiple underlying data structures to meet data storage and operation requirements in different scenarios.

The following is the correspondence between these basic data types and the underlying data structures:

(1) String

Underlying data structure: Simple Dynamic String (SDS)

(2) List

Underlying data structures: doubly linked list (LinkedList), compressed list (ZipList); QuickList since Redis 3.2

Note: When a list has few, small elements, Redis uses a compressed list to save memory; with more elements it uses a doubly linked list to support fast insertion and deletion. In Redis 3.2 and later, the List type is implemented with QuickList, which combines the two.

(3) Set

Underlying data structures: Hash Table, IntSet

Note: When the elements in the set are all integers and the number is small, Redis uses integer sets to optimize memory usage. When the number of elements is large or contains non-integer elements, a hash table is used to implement fast addition, deletion, and query operations.

(4) Hash

Underlying data structures: ZipList, Hash Table

Note: When the hash object stores a small number of key-value pairs and the length of the key and value strings is less than 64 bytes, a compressed list is used as the underlying implementation to save memory. When there are many key-value pairs or the key and value strings are long, a hash table is used to implement fast insertion, search, and deletion operations.

(5) Zset (ordered set)

Underlying data structures: SkipList, Hash Table

Description: A skip list is an ordered, multi-level linked list that provides fast insertion, deletion, and search. By combining a skip list with a hash table, an ordered set stays sorted by score while supporting fast range queries and rank calculations; the hash table stores the mapping from members to scores.
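
How the two structures are combined can be sketched like this (modeled on the zset struct in server.h; dict and zskiplist are the dictionary and skip list discussed in this article, forward-declared here so the sketch stands alone):

  typedef struct dict dict;             /* dictionary, sketched in the next answer */
  typedef struct zskiplist zskiplist;   /* skip list built from zskiplistNode */

  /* Sketch of a Zset object: the same elements indexed in two ways. */
  typedef struct zset {
      dict *dict;      /* member -> score mapping, O(1) score lookup */
      zskiplist *zsl;  /* nodes ordered by score, for range and rank queries */
  } zset;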

Interviewer: Can you tell me in detail how to implement Redis hash table and how to expand its capacity?

1. Basic structure

The hash table in Redis is represented by a dict structure, which contains two dictht (hash table) objects, ht[0] and ht[1]; the second one is used only during rehashing. Each dictht contains the following key fields:

  • table: A pointer array, each element is a dictEntry object used to store key-value pairs.
  • size: The size of the hash table, that is, the length of the table array. In Redis, the size of the hash table is always 2 to the power of n, which helps optimize the handling of hash conflicts.
  • sizemask: mask value, used to calculate the index value. It is always equal to size-1, and the index position of the hash value in the array can be quickly obtained through bit operations.
  • used: The number of nodes used in the hash table, that is, the number of key-value pairs stored.

2. Key-value pair storage

Each dictEntry object contains a key-value pair and a pointer to the next dictEntry, forming a linked list structure. When a hash conflict occurs, the new key-value pair is added to the end of the linked list at the conflicting position. This design enables Redis to efficiently handle hash conflicts without complex rehashing operations.
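
Putting the pieces together, the structures can be sketched as follows (a simplified sketch of dict.h as it looked up to Redis 6.x; the type-specific callback table and a few bookkeeping fields are omitted):

  /* Simplified sketch of the dictionary structures (cf. dict.h, Redis <= 6.x). */
  typedef struct dictEntry {
      void *key;
      void *val;
      struct dictEntry *next;   /* chains entries that hash to the same bucket */
  } dictEntry;

  typedef struct dictht {
      dictEntry **table;        /* bucket array */
      unsigned long size;       /* number of buckets, always a power of 2 */
      unsigned long sizemask;   /* size - 1; bucket index = hash & sizemask */
      unsigned long used;       /* number of stored key-value pairs */
  } dictht;

  typedef struct dict {
      dictht ht[2];             /* ht[1] is populated only while rehashing */
      long rehashidx;           /* -1 when no rehash is in progress */
  } dict;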


3. Hash Function

Redis has historically used the MurmurHash2 algorithm as its hash function (since Redis 4.0 the default is SipHash). It is a non-cryptographic hash function known for high efficiency and a low collision rate. With it, Redis maps keys of arbitrary length to fixed-length hash values, which determine a key's position in the hash table.

4. Load factor and rehash

The load factor measures how full the hash table is; it is calculated as the number of used nodes / hash table size. When the load factor exceeds the threshold (by default Redis expands once it reaches 1, and forces an expansion at 5 even while a background save or AOF rewrite is running), Redis triggers a rehash to enlarge the hash table and reduce the probability of collisions.

The rehash operation involves the following steps:

(1) Allocate a new hash table whose size is at least twice the current number of used nodes, rounded up to the next power of 2.

(2) Traverse all key-value pairs in the current hash table, recalculate the hash value based on the new hash table size, and then insert the key-value pairs into the new hash table.

(3) Replace the old hash table with the new hash table to complete the rehash operation.

It is worth noting that Redis uses a progressive rehash strategy to avoid the performance hit of migrating a large amount of data at once. During a progressive rehash, each time an add, delete, modify, or query operation is performed on the hash table, a portion of the data is migrated from the old table to the new table, until the migration is complete.

The following is the specific process of rehash:

  • When the load factor of the Redis hash table exceeds the threshold, the dictionary holds both the ht[0] and ht[1] hash tables.
  • Redis maintains a variable rehashidx to record the rehash progress. When rehashing starts, rehashidx is set to 0, meaning migration begins at bucket 0 of ht[0].
  • During rehashing, every time an add, delete, modify, or query operation is performed on the dictionary, all key-value pairs in the bucket at index rehashidx of ht[0] are rehashed to ht[1], and rehashidx is then incremented by 1.
  • As dictionary operations continue, all key-value pairs of ht[0] are eventually rehashed to ht[1]. At that point rehashidx is set to -1, indicating that the rehash is complete.

Progressive rehashing uses a divide-and-conquer approach, distributing the rehashing operation among each access, thus avoiding the huge amount of computation caused by centralized rehashing.
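
Using the struct sketch from the previous answer, a single rehash step can be sketched roughly as follows (modeled on dictRehash in dict.c, with error handling and the empty-bucket visit limit omitted; hash_key is a hypothetical stand-in for the real hash function):

  unsigned long hash_key(dict *d, void *key);              /* hypothetical stand-in */

  /* Sketch: migrate one bucket of ht[0] to ht[1], then advance rehashidx. */
  void rehash_one_step(dict *d) {
      if (d->rehashidx == -1) return;                       /* no rehash in progress */
      /* Skip empty buckets (the real code limits how many may be visited per step). */
      while ((unsigned long)d->rehashidx < d->ht[0].size &&
             d->ht[0].table[d->rehashidx] == NULL)
          d->rehashidx++;
      if ((unsigned long)d->rehashidx < d->ht[0].size) {
          dictEntry *de = d->ht[0].table[d->rehashidx];
          while (de) {                                      /* move the whole collision chain */
              dictEntry *next = de->next;
              unsigned long idx = hash_key(d, de->key) & d->ht[1].sizemask;
              de->next = d->ht[1].table[idx];               /* head-insert into ht[1] */
              d->ht[1].table[idx] = de;
              d->ht[0].used--;
              d->ht[1].used++;
              de = next;
          }
          d->ht[0].table[d->rehashidx] = NULL;
          d->rehashidx++;
      }
      if (d->ht[0].used == 0) {                             /* migration finished */
          /* free ht[0].table, make ht[1] the new ht[0], reset ht[1], then: */
          d->rehashidx = -1;
      }
  }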

It should also be noted that during a progressive rehash, lookups, updates, and deletions consult ht[0] for buckets whose index is greater than or equal to rehashidx (not yet migrated) and ht[1] for buckets below rehashidx (already migrated), while newly added key-value pairs are inserted only into ht[1], so ht[0] only ever shrinks until it is empty.

The advantage of progressive rehashing is that it can gradually expand or shrink the hash table without affecting the main thread service request, greatly reducing the impact on system performance. However, it also brings some additional memory space overhead because the new and old hash tables need to be maintained simultaneously during the rehashing process.

Interviewer: Why is Redis so fast? What is the QPS of a single Redis server?

Redis is very fast due to several key factors:

(1) Memory-based data storage: Redis stores data in memory, which greatly reduces the overhead of disk I/O operations. Compared with traditional databases that need to read and write data from disk, Redis can read and write data very quickly, thus achieving an extremely high operation rate.

(2) Efficient data structures: Redis supports a variety of data structures, such as strings, hashes, lists, sets, and ordered sets. These data structures have been carefully designed and optimized to minimize the time complexity of data storage and access. For example, Redis uses simple dynamic strings (SDS) to process strings. Compared with traditional string processing methods in C language, SDS is more efficient in obtaining string length, modifying strings, and allocating memory.

(3) Reasonable data encoding: Redis automatically selects the most suitable encoding based on the type and size of the data. For example, for string data, Redis chooses int, embstr, or raw encoding depending on the string's length and whether its content is numeric. This encoding selection allows Redis to maintain high performance when processing data of different types and sizes.

(4) Single-threaded model: Redis uses a single-threaded model to process client requests. This model avoids the overhead of context switching and lock contention between multiple threads, thereby increasing the speed of processing requests. At the same time, Redis uses I/O multiplexing technology to simultaneously process connections and requests from multiple clients, further improving concurrency performance.

(5) Asynchronous non-blocking I/O: Redis uses an asynchronous non-blocking I/O model to process network requests. This means that Redis can continue to perform other tasks while waiting for I/O operations to complete, thereby improving overall throughput and responsiveness.
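
The combination of a single thread with I/O multiplexing can be illustrated by a minimal epoll loop (a sketch only: Redis's ae.c wraps epoll/kqueue/select behind its own event-loop API, and handle_new_connection / handle_client_request are hypothetical helpers standing in for accepting clients and for the read-parse-execute-reply cycle):

  #include <sys/epoll.h>

  void handle_new_connection(int epfd, int listen_fd);  /* accept + register the client fd */
  void handle_client_request(int fd);                   /* read, parse, execute, queue reply */

  /* One thread watches many connections: the essence of I/O multiplexing. */
  void event_loop(int epfd, int listen_fd) {
      struct epoll_event events[1024];
      for (;;) {
          int n = epoll_wait(epfd, events, 1024, -1);   /* block until some fd is ready */
          for (int i = 0; i < n; i++) {
              if (events[i].data.fd == listen_fd)
                  handle_new_connection(epfd, listen_fd);
              else
                  handle_client_request(events[i].data.fd);
          }
      }
  }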

The QPS (queries per second) of a single Redis server depends on many factors, including the Redis version, hardware configuration, operating system, network conditions, and the complexity of the operations. Generally speaking, a single Redis server can sustain tens of thousands to hundreds of thousands of QPS. To go beyond roughly 100,000 QPS, a single server comes under considerable pressure, and it is usually necessary to use a Redis cluster or another distributed architecture to share the load.

Benchmarks can be run using the redis-benchmark command:

 redis-benchmark -h 127.0.0.1 -p 6379 -c 50 -n 10000
  • -h: specifies the address of the Redis server, the default is 127.0.0.1.
  • -p: specifies the port of the Redis server, the default is 6379.
  • -c: The number of concurrent connections, that is, how many clients are testing at the same time.
  • -n: The total number of requests, that is, the total number of requests to be executed during the test.

Interviewer: You just mentioned that Redis uses a single-threaded model. Is Redis really a single-threaded application? If so, please explain why it uses a single-threaded model?

From the perspective of core operations, Redis does use a single thread to execute commands, including receiving client requests, parsing requests, data reading and writing, and returning results to the client. These processes are all completed by a main thread. This is why Redis is called a single-threaded database.

However, from the perspective of overall functionality and implementation, Redis is not strictly single-threaded. In versions prior to Redis 6.0, although most operations were completed by the main thread, there were also some background threads or subprocesses processing tasks, such as cleaning dirty data, generating snapshots, and rewriting AOF. These background tasks exist to avoid blocking the main thread and improve the overall performance of Redis.

In Redis 6.0 and later versions, Redis introduced a multi-threaded model to handle network I/O tasks. This multi-threaded model is only used to handle network data reading and writing and protocol parsing, while the execution of read and write commands is still a single thread. This design is to make full use of the multi-core resources of the server CPU and improve the network I/O performance of Redis.

Therefore, it can be said that Redis adopts a single-threaded model when executing commands, but from the perspective of overall implementation and functionality, it is not completely single-threaded.

Redis achieves high performance and efficiency by combining the advantages of single thread and multi-thread, and utilizing memory and non-blocking I/O technology.

The reasons for using a single-threaded model when executing commands are as follows:

1. Avoid excessive context switching overhead

Scheduling multiple threads inevitably requires switching thread contexts on the CPU, and a context switch involves replacing a series of registers such as the program counter, stack pointer, and program status word, resetting the program stack, and even replacing CPU cache and TLB entries. Switching between threads within the same process is cheaper, because the threads share the process address space, so a thread context is much smaller than a process context; cross-process scheduling has to switch the entire process address space.

With a single thread, this frequent thread-switching overhead inside the process is avoided altogether, because the program always runs in a single thread and there is simply no switching to do.

2. Avoid the overhead of synchronization mechanisms

If Redis had chosen a multi-threaded model, then as a database it would inevitably face the problem of synchronizing access to the underlying data, which would require synchronization mechanisms such as locks. Redis provides not only simple key-value strings but also rich data structures such as lists, sets, and hashes, and different structures need different locking granularities for synchronized access. The resulting locking and unlocking would add significant overhead and program complexity while reducing performance.

3. Simple and maintainable

The Redis author's original design goal was simplicity and maintainability, and introducing multithreading inevitably increases code complexity and reduces maintainability.

First, multithreading means the code no longer executes in a simple sequential order: the order of execution becomes unpredictable and, without great care, all kinds of concurrency bugs creep in. Second, multithreading makes program debugging considerably more complicated and troublesome.

If Redis used a multi-threaded mode, all of its underlying data structures would have to be implemented in a thread-safe way, which would undoubtedly make the implementation of Redis more complicated.

In summary, Redis's choice of a single thread for the core scenario of command execution is a deliberate trade-off: it delivers sufficient performance while keeping the code simple and maintainable.

Interviewer: Can you tell me in detail why Redis 6.0 introduced multithreading? Where does Redis use multithreading? What is the implementation mechanism of Redis multithreading?

Redis initially chose a single-threaded network model because the CPU is usually not the performance bottleneck; memory and the network are, so a single thread was sufficient. Redis later introduced multithreading because its network I/O bottleneck has become increasingly obvious.

With the rapid development of the Internet, the online traffic that Internet business systems need to handle is increasing. The single-threaded mode of Redis causes the system to consume a lot of CPU time on network I/O, thereby reducing throughput. There are two ways to improve the performance of Redis:

  • Optimize the network I/O module
  • Improve the speed of machine memory reads and writes

The latter depends on hardware progress and offers no near-term solution, so the only option is the former. Optimizing network I/O can go in two directions:

  • Zero copy technology or DPDK technology
  • Taking advantage of multiple cores

Zero-copy technology has its limitations and cannot fully cover complex network I/O scenarios like Redis's; DPDK, which bypasses the kernel protocol stack by driving the network card directly from user space, is too complicated and requires kernel and even hardware support.

Therefore, the multi-threaded model that takes advantage of multiple cores has become the most cost-effective solution for optimizing network I/O.

In Redis 6.0, multithreading is mainly used to process network IO operations, and command parsing and execution are still completed by a single thread. This can not only take advantage of multi-core CPUs, but also avoid performance losses caused by locks and context switches.

Next, let's talk about Redis's multi-threaded implementation mechanism:

(1) The main thread is responsible for command execution:

The main thread of Redis is still responsible for processing the execution of client commands, including data reading and writing operations.

(2) Multithreaded network I/O processing:

  • In the multi-threaded I/O model, network I/O operations such as reading client requests and writing responses are distributed to multiple worker threads for processing.
  • These worker threads are only responsible for network I/O reading and writing and protocol parsing, and are not responsible for the specific execution of commands.

(3) Task distribution mechanism:

  • Redis uses a global read queue (clients_pending_read) and a global write queue (clients_pending_write) to store pending network I/O tasks.
  • The main thread is responsible for distributing tasks from the global queue to the queue corresponding to each thread (io_threads_list).
  • When distributing tasks, the main thread uses a round-robin approach to ensure that tasks can be evenly distributed to each thread.

(4) Command execution process:

  • When the client sends a request, the main thread is responsible for receiving the request and putting it into the global read queue.
  • The main thread distributes tasks to the queues corresponding to each thread and sets corresponding marks.
  • The child thread polls to check whether there are tasks in its queue. If so, it handles network I/O reading and writing and protocol parsing.
  • After parsing is completed, the child thread returns the parsing results to the main thread.
  • The main thread executes the corresponding commands according to the parsing results and puts the results into the global write queue.
  • The main thread then distributes the writing tasks to the queues corresponding to each thread, and the child thread is responsible for writing the results back to the client.
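
The fan-out step above can be sketched roughly as follows (modeled on the logic of handleClientsWithPendingReadsUsingThreads in networking.c; the types and helper names here are simplified stand-ins, not the exact source):

  typedef struct client client;                /* opaque stand-in for a Redis client */
  void queue_push(int thread_id, client *c);   /* hypothetical: append c to io_threads_list[thread_id] */

  /* Main thread: distribute pending clients to the I/O threads round-robin. */
  void distribute_pending_reads(client **pending, int count, int io_threads_num) {
      for (int i = 0; i < count; i++) {
          int target = i % io_threads_num;     /* round robin keeps the threads evenly loaded */
          queue_push(target, pending[i]);
      }
      /* The I/O threads then read and parse their assigned clients in parallel;
       * the main thread waits for them to finish and executes the parsed commands itself. */
  }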

In Redis 6.0 and later versions, I/O multithreading is disabled by default. To enable it, set io-threads in the redis.conf configuration file to a value greater than 1; by default the extra threads are used only for writing responses, and you must additionally set io-threads-do-reads yes if you want them to handle reads and protocol parsing as well. If io-threads is not set (or is left at 1), multithreading does not take effect.
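
For example, on a machine with enough cores the relevant redis.conf settings would look like this (illustrative values):

  io-threads 4              # number of I/O threads; 1 means multithreading is disabled
  io-threads-do-reads yes   # also use the I/O threads for reads and protocol parsing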

Regarding the number of threads, the official recommendation is 2 or 3 I/O threads for a 4-core machine and 6 threads for an 8-core machine; the number of threads should be smaller than the number of cores. Note also that more threads is not always better: the official view is that going beyond 8 threads brings essentially no benefit.

In practice, enabling multithreading only makes sense on a machine with at least 4 cores, and it is recommended only when the Redis instance is already consuming a significant share of CPU; otherwise it brings no benefit. The vast majority of workloads can be served perfectly well without it.

In summary, the implementation mechanism of Redis multithreading is to distribute network I/O operations to multiple worker threads for processing, while command execution is still completed by a single thread. This design fully utilizes the performance of multi-core CPUs and avoids the overhead caused by multi-thread switching and shared resource competition.
