How to deploy across multiple data centers? How to synchronize data?

Author: Guo Guanhua, affiliation: China Mobile Smart Home Operation Center

Labs Guide

Service availability is the proportion of time a service runs without failure, and it is one of the most important indicators for a platform. A common target today is four nines, that is, 99.99%, which allows no more than about 52 minutes of downtime per year.
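The 52-minute figure follows directly from the availability target. The sketch below shows the arithmetic; the function name and the choice of a 365-day year are assumptions made for illustration, not something the article specifies.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_budget_minutes(availability: float) -> float:
    """Return the downtime allowed per year, in minutes, for an availability target."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, target in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    print(f"{label}: {downtime_budget_minutes(target):.2f} minutes per year")
```

For four nines this comes to roughly 52.56 minutes, and each additional nine cuts the budget by a factor of ten.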

Part 01 Problems Solved by Active-Active and Multi-Active

Single-node failures can be handled through load balancing and similar methods, but a low-probability force majeure event (a natural disaster, power outage, severed optical cable, etc.) can still make an entire data center unavailable. In recent years Alipay, Weibo, Bilibili and others have all experienced data-center-level failures, so having one or more backup data centers that traffic can be switched to quickly has become a necessity. A secondary benefit of multi-active is that data centers can be deployed in different locations, improving response speed by reducing the physical distance to users. Splitting the deployment across data centers also greatly reduces the resources required in any single one.

Part 02 Same-city backup or same-city active-active?

The biggest advantage of same-city deployment is the short distance between data centers. Over dedicated lines, the latency between them can be kept stably below 3 ms, so services can access data across data centers.

The simplest approach, which we can call same-city backup, is therefore as follows: the database is placed in data center A, and its data is periodically synchronized to data center B. The advantage is simple and convenient horizontal scaling of services, and if a data-center-level disaster occurs, the data can be restored from data center B as quickly as possible. The problem lies here too: data center B holds only a copy of the data and has no database, so it cannot fully take over from data center A.

Figure 1 Same-city backup

To solve this problem, a slave database can be placed in data center B and the data in data center A synchronized to it in real time: the database in data center A is the master, and the one in data center B is the slave. If data center A fails, traffic is switched to data center B, real-time synchronization is stopped, and the database in data center B is promoted to master.

Figure 2 Same-city synchronization
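The failover sequence for same-city synchronization can be sketched as a small in-memory model. Everything here (the `Room` and `Cluster` classes, the `fail_over` function, the field names) is hypothetical and only mirrors the three steps described in the text; it does not call any real database or DNS API, and a real runbook may order or guard the steps differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Room:
    name: str
    db_role: str       # "master" or "slave"
    replicating: bool  # True while this room's database follows the master
    healthy: bool = True


@dataclass
class Cluster:
    room_a: Room
    room_b: Room
    traffic_to: str  # name of the room that currently receives requests
    log: List[str] = field(default_factory=list)


def fail_over(cluster: Cluster) -> None:
    """Apply the three steps from the text when data center A is down."""
    if cluster.room_a.healthy:
        return  # A is still serving, nothing to do

    # Step 1: switch traffic to B (in practice a DNS or load balancer change).
    cluster.traffic_to = cluster.room_b.name
    cluster.log.append("switched traffic to B")

    # Step 2: stop real-time synchronization from A.
    cluster.room_b.replicating = False
    cluster.log.append("stopped synchronization")

    # Step 3: promote B's database to master.
    cluster.room_b.db_role = "master"
    cluster.log.append("promoted B to master")


cluster = Cluster(
    room_a=Room("A", db_role="master", replicating=False, healthy=False),
    room_b=Room("B", db_role="slave", replicating=True),
    traffic_to="A",
)
fail_over(cluster)
print(cluster.traffic_to, cluster.room_b.db_role, cluster.log)
```

Each step in the model corresponds to one manual action in a real deployment.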

The main problem with this architecture is that it requires a lot of manual intervention: besides switching the database and DNS, a large amount of database configuration has to be modified. Overall, though, it can restore service in a relatively short time, which is sufficient for multiple data centers in the same city.

Part 03 New Problems with Remote Multi-Active

Remote deployment differs from same-city deployment. First of all, even with dedicated lines between data centers, the latency problem cannot be solved, so accessing a database in a remote data center is almost impractical.
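A rough calculation shows why distance matters. The sketch below estimates a lower bound on round-trip time from distance alone. The fiber propagation speed of about 200,000 km/s (roughly two thirds of the speed of light), the example distances, and the number of sequential queries per request are all assumptions for illustration; real latency is higher because of routing, queuing, and equipment delays.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumed propagation speed in optical fiber


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring routing and queuing."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


def request_overhead_ms(distance_km: float, sequential_queries: int) -> float:
    """Network time added to one request that issues its queries one after another."""
    return min_round_trip_ms(distance_km) * sequential_queries


# Hypothetical distances: 50 km within a city, 1,500 km between cities.
for label, km in [("same city", 50), ("remote", 1500)]:
    rtt = min_round_trip_ms(km)
    total = request_overhead_ms(km, sequential_queries=20)
    print(f"{label}: at least {rtt:.1f} ms per round trip, {total:.0f} ms for 20 queries")
```

Under these assumptions a single remote round trip costs at least about 15 ms, and a request that makes 20 sequential database calls spends at least about 300 ms on the network alone, which no dedicated line can reduce.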

The first question we asked was whether synchronization could be avoided as far as possible. That is the method shown in Figure 3: requests are split across data centers at the DNS layer by geography, user hash, device ID hash, and so on. Each data center serves a fixed set of users, so both the data synchronized between data centers and the data held in each one are kept to a minimum. However, whatever synchronization remains has to be implemented in business logic; database tools cannot do it.
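Splitting by user hash amounts to a deterministic mapping from user ID to data center. The following is a minimal sketch of such a mapping only; the data center names and the `route` function are hypothetical, and the article does not say which hash function is used or how the result is applied at the DNS layer. SHA-256 is chosen here simply because it gives the same result on every machine, unlike Python's built-in `hash` for strings.

```python
import hashlib
from typing import Sequence

DATA_CENTERS = ["dc-a", "dc-b"]  # hypothetical data center names


def route(user_id: str, data_centers: Sequence[str] = DATA_CENTERS) -> str:
    """Map a user ID to one data center using a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(data_centers)
    return data_centers[index]


for uid in ["user-1001", "user-1002", "user-1003"]:
    print(uid, "->", route(uid))
```

Because the mapping depends only on the user ID, the same user always lands in the same data center, which is what lets each data center keep only its own users' data.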

The divide-and-conquer approach may seem ideal, but it has problems of its own. For example, if users are split by geography (that is, by IP), how is data synchronized when a user's location changes and their requests go to another data center? If users are split by user ID hash, how is existing data handled when the system needs to scale horizontally? When a failure occurs, how can the other data centers obtain the failed shard's data in a short time? And so on.
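The horizontal expansion problem can be made concrete by counting how many users change data center when one more is added. This sketch assumes simple modulo hashing over synthetic user IDs; the function names are hypothetical and the article does not state which hashing scheme is in use.

```python
import hashlib
from typing import Iterable


def shard_index(user_id: str, shard_count: int) -> int:
    """Stable modulo hash of a user ID onto one of shard_count data centers."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def moved_fraction(user_ids: Iterable[str], old_count: int, new_count: int) -> float:
    """Fraction of users whose data center changes when the count changes."""
    ids = list(user_ids)
    moved = sum(1 for u in ids if shard_index(u, old_count) != shard_index(u, new_count))
    return moved / len(ids)


users = [f"user-{i}" for i in range(10_000)]  # synthetic IDs for illustration
fraction = moved_fraction(users, old_count=2, new_count=3)
print(f"{fraction:.1%} of users move when going from 2 to 3 data centers")
```

With modulo hashing, roughly two thirds of users are reassigned when growing from two to three data centers, so most of the existing data would have to be migrated. Schemes such as consistent hashing reduce this movement but do not remove the need to migrate data.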

In short, sharding is not a perfect solution.

Figure 3 Remote active-active

Part 04 Remote Data Synchronization

At this point it is clear that the data in multiple data centers must be synchronized; this is the only way to run multiple data centers in different locations.

Data can be synchronized either in business logic or through database middleware. The method shown in Figure 4 synchronizes each data center directly with the others. It does increase business complexity, and as the number of data centers grows, synchronization between them becomes more complicated.

Figure 4 Mesh synchronization

In the architecture above, if a central data node is set up and all data centers synchronize through it, the data flow changes from a "mesh" to a "star", which greatly reduces synchronization work and complexity. However, this solution runs counter to the distributed deployment we are pursuing.

Figure 5 Star synchronization
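The difference between mesh and star synchronization can be seen by counting synchronization links. The sketch below assumes that each pair of nodes that synchronize counts as one link and that the central data node is separate from the data centers; both are assumptions for illustration.

```python
def mesh_links(data_centers: int) -> int:
    """Links needed when every data center synchronizes with every other one."""
    return data_centers * (data_centers - 1) // 2


def star_links(data_centers: int) -> int:
    """Links needed when every data center synchronizes only with a central node."""
    return data_centers


for n in [2, 3, 5, 10]:
    print(f"{n} data centers: mesh={mesh_links(n)} links, star={star_links(n)} links")
```

Mesh links grow quadratically with the number of data centers while star links grow linearly, which is why the star layout is simpler to operate even though it concentrates everything on one central node.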

Part 05 Is a distributed database a silver bullet?

This is not only a multi-data-center concern. As data volumes have grown dramatically, developers have done extensive practical work on MySQL sharding with JDBC Proxy and DB Proxy approaches. It has become increasingly clear that sharding needs to be handled at the database level, and on this basis many distributed databases, such as CockroachDB and TiDB, have emerged. Distributed databases natively support multi-cluster and multi-data-center deployment, which matches the needs of remote multi-active.

With a distributed database, we can both scale data horizontally and reduce the complexity of synchronizing data in business logic, killing two birds with one stone.

However, such a database has high resource requirements. A three-cluster TiDB deployment requires at least 9 physical machines, ten 10-Gigabit network cards, and 3 dedicated lines between the clusters.

Another disadvantage is that this architecture is relatively demanding on the latency between database clusters, which prevents it from being extended indefinitely over a wider area.

Figure 6 Distributed database across multiple data centers

Part 06 Summary

Compared with a single data center, multi-data-center deployment significantly increases difficulty, development effort, and resource cost, so the architecture must be chosen according to actual needs. This article has introduced several types of multi-data-center deployment to broaden readers' thinking. Real deployments may differ from the ones above, but the ideas are broadly the same.
