How to deploy multiple computer rooms? How to synchronize data?

Author: Guo Guanhua, unit: China Mobile Smart Home Operation Center

Labs Guide

Service availability is the proportion of time a service runs without failure. For a platform, it is one of the most important indicators. The availability target in common use today is four nines, that is, 99.99%, which allows no more than about 52 minutes of downtime per year.
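The 52-minute figure follows directly from the availability percentage. As a minimal sketch (the function name is ours, and a 365-day year is assumed), the yearly downtime budget can be computed like this:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Maximum downtime per year, in minutes, for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    budget = downtime_budget_minutes(availability)
    print(f"{nines} nines: {budget:.2f} minutes per year")
```

Each additional nine cuts the allowed downtime by a factor of ten, which is why higher targets quickly rule out any recovery procedure that depends on manual steps.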

Part 01 Problems Solved by Active-Active and Multi-Active

Single-node failures can be handled through load balancing and similar methods, but a low-probability force majeure event (a natural disaster, a power outage, a severed optical cable, etc.) can still take an entire computer room offline. In recent years, Alipay, Weibo, Bilibili and others have all experienced computer-room-level failures, so it has become necessary to keep one or more computer rooms as backups that traffic can be switched to quickly. A secondary benefit of multi-active is that the computer rooms can be deployed in different locations, which improves response speed by shortening the physical distance to users. Spreading the deployment across computer rooms also greatly reduces the resources required in any single computer room.

Part 02 Same-city backup or same-city active-active?

The biggest advantage of deploying services in the same city is that the distance between computer rooms is small. Through dedicated lines, the latency between computer rooms can be stabilized to less than 3ms, so services can access data across computer rooms.

Therefore, the simplest approach is the following, which we can call same-city backup: the database is placed in computer room A, and its data is periodically synchronized to computer room B. The advantage is that services can be scaled horizontally in a simple and convenient way, and if a computer-room-level disaster occurs, the data can be restored from the copy in computer room B as soon as possible. But the problem also lies here: computer room B holds only a backup copy, not a running database, so it cannot fully take over the role of computer room A.

Figure 1 Backup in the same city

To solve this problem, a slave database can be placed in computer room B, with the data in computer room A synchronized to it in real time: the database in computer room A is the master, and the one in computer room B is the slave. If computer room A fails, traffic is switched to computer room B, real-time synchronization is stopped, and the database in computer room B is promoted to master.

Figure 2 Same-city synchronization

The main problem with this architecture is that it requires a lot of manual intervention: in addition to switching the database and DNS, a large number of database configurations need to be modified. Overall, however, this architecture can restore services in a relatively short time, which is sufficient for multiple computer rooms in the same city.
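The switch-over described in this part can be written down as an ordered procedure. The sketch below is only an in-memory model: the class and function names are hypothetical, no real database or DNS API is called, and the step order simply follows the text. A real procedure would also need to confirm that the slave has caught up before accepting writes.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    role: str                  # "master", "slave" or "offline"
    replicating: bool = False  # True while receiving real-time sync


@dataclass
class Deployment:
    databases: dict            # room name -> Database
    dns_target: str            # room that user traffic is currently sent to
    log: list = field(default_factory=list)


def fail_over(dep: Deployment, failed: str, standby: str) -> None:
    """Switch traffic away from a failed room and promote the standby."""
    db = dep.databases[standby]
    if db.role != "slave":
        raise ValueError(f"room {standby} does not hold a slave database")

    dep.dns_target = standby                 # 1. switch traffic (DNS)
    dep.log.append(f"traffic switched to room {standby}")

    db.replicating = False                   # 2. stop real-time synchronization
    dep.log.append(f"synchronization to room {standby} stopped")

    db.role = "master"                       # 3. promote the slave to master
    dep.databases[failed].role = "offline"
    dep.log.append(f"database in room {standby} promoted to master")


dep = Deployment(
    databases={"A": Database("master"), "B": Database("slave", replicating=True)},
    dns_target="A",
)
fail_over(dep, failed="A", standby="B")
print(dep.dns_target, dep.databases["B"].role)
for entry in dep.log:
    print("-", entry)
```

Keeping the steps in one function makes the manual work explicit, and it is the natural starting point for automating the switch.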

Part 03 New Problems with Remote Multi-Active

Deploying in different locations is different from deploying in the same city. First of all, even with a dedicated line between computer rooms, the latency problem cannot be solved, so accessing a database in a remote computer room is almost impractical.
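A rough calculation shows why latency matters so much here. The 3 ms value is the same-city bound quoted in Part 02; the remote round-trip time and the number of sequential queries per request are hypothetical values chosen only for illustration.

```python
def added_latency_ms(sequential_queries: int, round_trip_ms: float) -> float:
    """Extra time a request spends waiting on cross-room database round trips."""
    return sequential_queries * round_trip_ms


QUERIES_PER_REQUEST = 10   # hypothetical: sequential database calls per request
SAME_CITY_RTT_MS = 3.0     # same-city bound quoted in Part 02
REMOTE_RTT_MS = 30.0       # hypothetical round trip between distant cities

for label, rtt in [("same city", SAME_CITY_RTT_MS), ("remote", REMOTE_RTT_MS)]:
    extra = added_latency_ms(QUERIES_PER_REQUEST, rtt)
    print(f"{label}: +{extra:.0f} ms per request")
```

Because the round trips add up per query, a delay that is tolerable within one city becomes visible to users once every query has to cross a long-distance link.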

The first question we asked was whether data synchronization can be avoided as much as possible. That is the method shown in the figure below: requests are split among multiple computer rooms at the DNS layer by geography, user ID hash, device ID hash, etc. Each computer room handles a fixed set of users, so both the data synchronization between computer rooms and the amount of data held in each computer room are kept to a minimum. However, whatever synchronization remains has to be implemented in the business layer and cannot be handled by database tools.
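Splitting by user ID hash can be sketched as a pure function from a user ID to a computer room. The room names and the choice of SHA-256 are assumptions made for this example; the article does not specify a hash function.

```python
import hashlib

ROOMS = ["room-a", "room-b"]  # hypothetical computer room names


def route_by_user(user_id: str, rooms: list) -> str:
    """Pick a computer room deterministically from the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(rooms)
    return rooms[index]


for user_id in ["alice", "bob", "carol"]:
    print(user_id, "->", route_by_user(user_id, ROOMS))
```

A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings and would send the same user to different rooms after a restart.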

The divide-and-conquer approach may seem ideal, but it has problems of its own. For example, if traffic is split by geography (i.e., by IP), how is the data synchronized when a user changes location and their requests are sent to another computer room? If traffic is split by user ID hash, how is the existing data handled when horizontal expansion is required? When a failure occurs, how can the other computer rooms obtain the failed shard's data in a short time? And so on.
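The horizontal expansion problem can be made concrete. The sketch below assumes simple modulo hashing with SHA-256 (our choice, not stated in the article) and counts how many users would be routed to a different computer room when a third room is added.

```python
import hashlib


def room_index(user_id: str, room_count: int) -> int:
    """Map a user ID to a room index with plain modulo hashing."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % room_count


def moved_fraction(user_ids: list, old_count: int, new_count: int) -> float:
    """Fraction of users whose room changes when the room count changes."""
    moved = sum(
        1 for u in user_ids
        if room_index(u, old_count) != room_index(u, new_count)
    )
    return moved / len(user_ids)


users = [f"user-{i}" for i in range(10_000)]  # synthetic IDs for illustration
print(f"{moved_fraction(users, 2, 3):.0%} of users change rooms")
```

With a uniform hash, roughly two thirds of users change rooms when going from two rooms to three, so most of the old data has to be migrated. Schemes such as consistent hashing reduce this movement, but they do not remove the need to move data.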

In short, sharding is not a perfect solution.

Figure 3 Remote active-active

Part 04 Remote Data Synchronization

At this point, we have found that the data in multiple computer rooms must be synchronized after all. This is the only way to make multiple computer rooms in different locations work.

Data synchronization can be done in the business layer or through database middleware. The method shown in the figure below is direct synchronization between computer rooms. This method does increase business complexity, and as the number of computer rooms grows, synchronization between them becomes more and more complicated.

Figure 4 Mesh synchronization

In the above architecture, if a central data node is set up and all computer rooms are synchronized through the central data node, the data flow will change from "mesh" to "star", which greatly reduces the synchronization work and complexity. However, this solution is actually a deviation from the distributed deployment concept we pursue.

Figure 5 Star synchronization
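The difference between the two topologies can be counted directly. The sketch below assumes bidirectional links and a central data node that is separate from the computer rooms; the function names are ours.

```python
def mesh_links(rooms: int) -> int:
    """Every pair of computer rooms synchronizes directly."""
    return rooms * (rooms - 1) // 2


def star_links(rooms: int) -> int:
    """Every computer room synchronizes only with the central data node."""
    return rooms


for n in range(2, 7):
    print(f"{n} rooms: mesh={mesh_links(n)} links, star={star_links(n)} links")
```

The mesh grows quadratically with the number of computer rooms while the star grows linearly, at the price of making the central node a single point that every room depends on.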

Part 05 Is distributed database a silver bullet?

This is not only a concern for multi-computer-room deployment. As data volumes have grown dramatically, developers have gained extensive practical experience with sharding MySQL through JDBC Proxy and DB Proxy approaches, and have increasingly found that sharding needs to be supported by the database itself. On this basis, many distributed databases such as CockroachDB and TiDB have emerged. Distributed databases naturally support multi-cluster and multi-computer-room deployment, which matches the demand for multi-active in different locations.

Through a distributed database, we can not only achieve horizontal expansion of data, but also reduce the complexity of business data synchronization, which can be said to be killing two birds with one stone.

However, a distributed database has high resource requirements. A three-cluster TiDB deployment requires at least 9 physical machines, 10 10-Gigabit network cards and 3 dedicated lines between the clusters.

Another disadvantage is that this architecture is relatively demanding on the latency between the database clusters, which prevents it from being extended indefinitely over a wider area.

Figure 6 Distributing the database in multiple computer rooms

Part 06 Summary

Compared with a single computer room, multi-computer-room deployment is substantially more difficult and costs considerably more in development effort and resources. Therefore, the specific architecture needs to be selected according to actual needs. This article introduces several types of multi-computer-room deployment as a starting point for your own thinking. An actual deployment may differ from the forms above, but the ideas are generally the same.
