Purpose of splitting a table

As a project develops, the database accumulates more and more data, and eventually a single table holds too many rows. Queries slow down, and because of the table-locking mechanism, application operations are also seriously affected, creating a database performance bottleneck. When this happens, we can consider splitting the table: dividing a single table into multiple tables and, when a user accesses the data, routing that access to one of the tables according to some algorithm. The data is thus spread across multiple tables, reducing the access pressure on any single table and improving database performance.

An example

Take our most common table, the user table.
We usually query user information by user_id, but as the business grows this table keeps getting larger, possibly reaching hundreds of millions of rows, which seriously degrades query performance. So we split it into multiple tables to reduce the query pressure.

Table partitioning strategy

Take 10 tables as an example (the actual number should be estimated from your real workload). First we create 10 tables: user1, user2, user3, ..., user10. In general, we take the index field (user_id) modulo the number of tables we want to split into, 10 in this case.
According to this modulus formula, a row with a given user_id is stored in the table whose number equals user_id % 10 (with a remainder of 0 mapping to user10).
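The routing rule above can be sketched as a small helper. The function name `route_table` is an assumption for illustration; the remainder-0 → user10 mapping follows from the naming scheme user1–user10:

```python
def route_table(user_id: int, num_tables: int = 10) -> str:
    """Map a user_id to its shard table name (user1..user10).

    A remainder of 0 maps to the last table, user10, so that all
    10 tables are used under the user1..user10 naming scheme.
    """
    return f"user{user_id % num_tables or num_tables}"

# Example: user_id=123 -> 123 % 10 = 3 -> table "user3"
```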
Every CRUD operation then follows the routing strategy above; that part is straightforward, so we won't dwell on it.

What about a table that is already live?

Everyone should already understand the method above, but there is a problem: what do we do with a table that is already in production? Its data is constantly being read and written online. How can we split it smoothly, without users noticing anything?

Method 1: deploy directly, with a pre-written script

Write a script in advance that synchronizes the data from the old user table into user1 through user10, and run it the moment the new code goes live. This is clearly not feasible, mainly for these reasons. What if the script fails partway through, do we roll back all the code? And how long does the script take to run? If it takes an hour, every online operation that touches this table is broken during that window. The impact on production would be huge.

Method 2: synchronize first, then deploy

First run a script that synchronizes the old user table into user1 through user10, and deploy once the script finishes. This seems friendlier, but it also has a problem: there is a time gap between the script finishing and the new code going live, and during that gap the online table may receive further changes. What do we do with those changes?

Since neither method works, we need a different approach. Let's go straight to the conclusion.

Step 1: launch dual write

First we deploy dual writing. What does that mean?
For example, for user_id=123, every insert, delete, and update is applied both to the old user table and to the corresponding shard, user3 (since 123 % 10 = 3).
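A minimal in-memory sketch of the dual-write step, assuming a simple dict-per-table model; the `tables` structure and helper names are illustrative, not the author's actual code:

```python
# Old table plus the 10 shards, modeled as dicts keyed by user_id.
tables = {"user": {}}
for i in range(1, 11):
    tables[f"user{i}"] = {}

def shard_for(user_id: int) -> str:
    # Remainder 0 maps to user10 under the user1..user10 naming scheme.
    return f"user{user_id % 10 or 10}"

def upsert_user(user_id: int, row: dict) -> None:
    # Dual write: apply the change to the old table AND the matching shard.
    # Reads still go to the old "user" table, so users see no difference.
    tables["user"][user_id] = row
    tables[shard_for(user_id)][user_id] = row

def delete_user(user_id: int) -> None:
    tables["user"].pop(user_id, None)
    tables[shard_for(user_id)].pop(user_id, None)
```

In a real system the two writes would go to the database inside one code path (ideally one transaction), so the old table and the shard cannot drift apart.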
Because queries still go to the user table, these write operations have no effect on online users.

Step 2: full synchronization

Write a script that fully synchronizes the user table into user1 through user10. It is best to run it during a low-traffic period so it does not affect queries against the user table. After this step, because dual write is already live (see step 1), the data in the user table and in user1 through user10 is completely consistent.

Step 3: query the new tables

Switch the query path over to user1 through user10. Because the previous two steps guaranteed full consistency between the user table and every shard, switching the reads directly is safe. If you follow these steps, there is no impact on online data. This is how we performed the migration in production; after many rounds of practice it has proven reliable, and you can use it with confidence.
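The step-2 backfill script can be sketched as a batched copy. This is an illustrative sketch using sqlite3 with assumed table and column names (`user_id`, `name`); a real run would throttle between batches and use the production database driver. `INSERT OR REPLACE` keeps rows consistent with anything dual write has already placed in the shards:

```python
import sqlite3

# Set up an in-memory stand-in for the production schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT)")
for i in range(1, 11):
    conn.execute(f"CREATE TABLE user{i} (user_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO user VALUES (?, ?)",
                 [(uid, f"name{uid}") for uid in range(1, 101)])

BATCH = 20  # small batches keep each transaction short during the low-peak run
last_id = 0
while True:
    rows = conn.execute(
        "SELECT user_id, name FROM user WHERE user_id > ? "
        "ORDER BY user_id LIMIT ?", (last_id, BATCH)).fetchall()
    if not rows:
        break
    for uid, name in rows:
        shard = uid % 10 or 10  # remainder 0 goes to user10
        # REPLACE rather than plain INSERT: dual write may already have
        # written this row to the shard, and the versions must not conflict.
        conn.execute(f"INSERT OR REPLACE INTO user{shard} VALUES (?, ?)",
                     (uid, name))
    last_id = rows[-1][0]
conn.commit()
```

Keyset pagination on the primary key (`WHERE user_id > ?`) avoids the slow `OFFSET` scans a naive paging loop would incur on a very large table.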