Active-active data centers are key to high-availability application resiliency

Enterprises that rely on high-availability applications should adopt active-active data center designs to ensure reliability and resiliency. Any enterprise running high-availability applications must answer the following fundamental question: How can you create a resilient application architecture when the underlying communications infrastructure is no longer reliable?

Consider one consulting engagement as an example. The client's main business application has high-availability requirements. The client endpoint sends transactions to an application server in the primary data center and buffers each transaction until it receives confirmation. The client has configured its two data centers as a primary and a backup, respectively.

In terms of reliability, the customer experienced network-related outages several times a year. In addition, failover from the primary data center to the backup data center was a manual process that took hours to execute. As a result, the network problem was usually resolved before the failover process could be completed. It was clear that the customer needed a more reliable data center failover mechanism that would keep its high-availability applications accessible to users.

One option is to make the network and data centers themselves highly reliable, so that data center downtime becomes very rare. However, infrastructure engineered for very high reliability is often fragile: small changes can cause downtime and outages that are difficult to diagnose and correct.

Resilient Application Architecture

To avoid making the system vulnerable, a better way to achieve resilient applications is to deploy an active-active data center architecture that does not rely on a single path or function. The term active-active refers to operating at least two data centers, both of which can serve applications at any time, so each data center acts as a site for active applications. Customers can perform transactions in any of the data centers, and the design and operation of each data center is much simpler than creating a single super-reliable data center.

Note that resilience should be built into the application, not the network and IT infrastructure. This means that even if part of the network or server fails unexpectedly, the application will continue to be accessible. At the heart of this approach is that a high-availability application architecture needs to include reliable data exchange. Implicit in this architecture is that the databases in each data center need to update each other when executing client transactions.

The characteristics of the customer's application are well suited to an active-active architecture, where either data center can execute a full transaction. Customer transactions are sent to the data center application, which updates the central database and then sends a confirmation to the customer endpoint. This mechanism guarantees the delivery of the transaction. Since the high-availability application was developed in-house, subsequent modifications can be made in-house.
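The flow described above (apply the transaction, update the other site, then confirm to the client) can be sketched roughly as follows. This is only a toy model under stated assumptions: the `DataCenter` class, its method names, and the use of a transaction ID to ignore duplicates are hypothetical and not taken from the customer's system, and the sketch ignores peer outages and conflict resolution.

```python
from typing import Dict, List


class DataCenter:
    """Toy site: applies a transaction locally, replicates it, then acknowledges."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.database: Dict[str, str] = {}   # transaction id -> payload
        self.peers: List["DataCenter"] = []  # other active sites (hypothetical)

    def handle_client(self, txn_id: str, payload: str) -> str:
        """Execute a full client transaction at this site."""
        self._apply(txn_id, payload)
        for peer in self.peers:
            peer.replicate(txn_id, payload)  # keep the other databases in sync
        return f"ack:{txn_id}"               # confirmation for the client endpoint

    def replicate(self, txn_id: str, payload: str) -> None:
        """Receive an update that another site has already accepted."""
        self._apply(txn_id, payload)

    def _apply(self, txn_id: str, payload: str) -> None:
        # setdefault keeps the first copy, so a retransmitted or replicated
        # duplicate of the same transaction id changes nothing.
        self.database.setdefault(txn_id, payload)


east, west = DataCenter("east"), DataCenter("west")
east.peers, west.peers = [west], [east]
print(east.handle_client("t1", "debit 10"))
```

Because either site can accept the transaction and a duplicate is harmless in this model, a client that retries at the other data center does not apply the same transaction twice.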

TCP for data transfer?

TCP is a network mechanism designed to ensure reliable data transmission. Although TCP can retransmit dropped packets, it cannot guarantee delivery when one of the endpoints fails. A TCP session is established between the interfaces of two endpoints. If one of the endpoints (the server or its interface) fails, the TCP session will terminate. For that reason, TCP alone cannot guarantee that a transaction was delivered; the application has to confirm delivery itself and retry elsewhere when no confirmation arrives.
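To make the limitation concrete, here is a minimal sketch of a sender that treats a transaction as delivered only when the server returns an application-level acknowledgment. The function name `send_with_ack`, the newline-terminated message, and the `ACK` reply are assumptions made for illustration, not a protocol described in this article.

```python
import socket


def send_with_ack(host: str, port: int, payload: bytes, timeout: float = 2.0) -> bool:
    """Return True only if the server replies with an application-level ACK."""
    try:
        # create_connection raises OSError if the endpoint is down or unreachable.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload + b"\n")
            # recv raises a timeout error (an OSError) if no reply arrives in
            # time, and returns b"" if the server closes the session early.
            reply = conn.recv(16)
            return reply.startswith(b"ACK")
    except OSError:
        # The TCP session could not be set up or was lost: the transaction is
        # unconfirmed, so the caller should keep it buffered.
        return False
```

A `False` result does not say whether the server processed the transaction before failing, only that no confirmation arrived, which is why the caller keeps the transaction and tries again.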

Lessons from Unicorns

The IT systems of unicorn companies such as Facebook, Google, Microsoft, Netflix, and Amazon are designed to keep customers connected to their data centers. If a part of the data center fails, transactions that attempt to use that component are automatically redistributed to other parts of the IT infrastructure. These industry giants cannot afford to have a failed component take the service down, so they build more resilience into the applications themselves.

Resilient architecture for other companies

If your organization is not a unicorn, what can you do? You can learn from the unicorns and modify your IT systems to operate in a similar manner. This works best for high-availability applications built in-house.

For example, a client can use a transaction retransmission timer together with a circular list of data center addresses learned through the Domain Name System (DNS), an approach often called global server load balancing. The client buffers each transaction until it receives an acknowledgment from an accessible data center. Database synchronization distributes updates to the other instances, so any database can process these transactions. This architecture allows organizations to deploy multiple application database systems. The approach can even be extended to database instances in cloud computing infrastructures such as Amazon Web Services and Microsoft Azure.
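A rough sketch of such a client follows. It is a simplified illustration, not a definitive implementation: the class name `ResilientClient`, the `send` callable (assumed to return `True` when the data center acknowledges and `False` when the retransmission timer expires or the connection fails), and the example addresses are all hypothetical.

```python
from collections import deque
from itertools import cycle
from typing import Callable, Deque, Iterable


class ResilientClient:
    """Buffers transactions and retries them across data centers until acknowledged."""

    def __init__(self, addresses: Iterable[str],
                 send: Callable[[str, str], bool],
                 max_attempts: int = 6) -> None:
        self._addresses = cycle(list(addresses))  # circular list of site addresses
        self._current = next(self._addresses)
        self._send = send                         # hypothetical transport, True on ack
        self._max_attempts = max_attempts
        self.pending: Deque[str] = deque()        # transactions not yet acknowledged

    def submit(self, transaction: str) -> None:
        self.pending.append(transaction)
        self.flush()

    def flush(self) -> None:
        """Deliver buffered transactions in order; keep them buffered on failure."""
        while self.pending:
            if not self._deliver(self.pending[0]):
                return                            # a later flush() will retry
            self.pending.popleft()

    def _deliver(self, transaction: str) -> bool:
        for _ in range(self._max_attempts):
            if self._send(self._current, transaction):
                return True
            # No acknowledgment: move on to the next data center in the list.
            self._current = next(self._addresses)
        return False


def demo_send(address: str, transaction: str) -> bool:
    # Stand-in transport for the demo: only the second site is reachable.
    return address == "dc-b.example"


client = ResilientClient(["dc-a.example", "dc-b.example"], demo_send)
client.submit("order-1001")
print(list(client.pending))
```

In a real deployment the address list might be populated from a DNS lookup, for instance with `socket.getaddrinfo`, and `send` would wrap the actual network call and its timeout. The client stays on the last data center that answered, so it does not fail back until that site also stops responding.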

Adopting third-party applications, such as electronic health record applications, is more challenging. Software vendors can be asked to design resilient systems that can operate using active-active data centers. If the client side of the application is carefully examined, the enterprise may find opportunities to add a small software module that can monitor the data center connection. If the connection fails, the software module can automatically switch the application to another data center.
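The small monitoring module mentioned above might look something like the following sketch. It assumes a hypothetical `is_reachable` probe supplied by the enterprise (for example a periodic connection test) and hypothetical site names; how the third-party application is actually pointed at the new site depends on the product and is not shown.

```python
from typing import Callable, Sequence


class FailoverMonitor:
    """Tracks which data center the application should currently use."""

    def __init__(self, sites: Sequence[str],
                 is_reachable: Callable[[str], bool]) -> None:
        if not sites:
            raise ValueError("at least one data center is required")
        self._sites = list(sites)
        self._is_reachable = is_reachable  # hypothetical health probe
        self._index = 0

    @property
    def active(self) -> str:
        return self._sites[self._index]

    def check(self) -> str:
        """Probe the active site; on failure, switch to the next reachable one."""
        for offset in range(len(self._sites)):
            candidate = (self._index + offset) % len(self._sites)
            if self._is_reachable(self._sites[candidate]):
                self._index = candidate
                return self.active
        return self.active                 # nothing reachable: keep the current site


health = {"primary": True, "secondary": True}
monitor = FailoverMonitor(["primary", "secondary"], lambda site: health[site])
health["primary"] = False                  # simulate losing the primary connection
print(monitor.check())
```

The monitor deliberately stays on the site it switched to until that site fails, which avoids flapping between data centers when the original connection is unstable.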

Another option is to consider technologies such as software-defined WAN, which increases path diversity by using multiple links from different providers. This approach also works for third-party applications.

With the widespread adoption of cloud computing, it is tempting to design systems to use one on-premises data center and one cloud-based data center.

Lessons learned from high-availability applications

There are many examples of how to make IT systems and applications highly available. Improving applications that an organization does not control may take some ingenuity, but the good news is that many technologies are available to help organizations improve the resiliency of their applications.
