How is Instagram expanding its infrastructure across the ocean?

[51CTO.com quick translation] In 2014, two years after Instagram joined Facebook, Instagram's engineering team migrated the company's infrastructure from Amazon Web Services (AWS) servers to Facebook's data centers. Facebook has multiple data centers in Europe and the United States, but until recently Instagram only used the data centers in the United States.

The main reason Instagram is expanding its infrastructure across the ocean is simple: we are running out of capacity in the United States. As the service continues to grow, Instagram has reached the point where we need to leverage Facebook's data centers in Europe. There is an added benefit: local data centers mean lower latency for European users, which should make for a better Instagram experience.

In 2015, Instagram expanded its infrastructure from one data center to three to provide much-needed resiliency: our engineering team didn't want to repeat the AWS disaster of 2012, when a major storm in Virginia brought down nearly half of our instances. Scaling from three data centers to five was easy; we simply increased the replication factor and copied the data into the new regions. Scaling became much harder when the next data center was an ocean away, on another continent.

Understanding infrastructure

Infrastructure can generally be divided into two types:

  • Stateless services are usually used for computing and scale with user traffic (on-demand scaling). The Django web server is one example.
  • Stateful services are usually used for storage and must maintain consistency across data centers. Cassandra and TAO are examples.

Everyone loves stateless services: they are easy to deploy and scale, and they can be started whenever and wherever needed. But we also need stateful services like Cassandra to store user data. Running Cassandra with too many replicas not only makes the database harder to maintain, it also wastes capacity, to say nothing of how slow quorum requests become when they must cross the ocean.

Instagram also uses TAO, a distributed data store for the social graph, as a storage system. We run TAO with a single master per shard, and no slave handles write requests for its shards; all writes are forwarded to the shard's master region. Because all writes happen in master regions located in the United States, write latency from Europe is unbearable. As you may have noticed, our problem ultimately comes down to the speed of light.

Potential Solutions

Can we reduce the time a request spends crossing the ocean, or even eliminate the round trip entirely? There are two ways to go about this.

1. Partitioning Cassandra

To prevent quorum requests from crossing the ocean, we are considering splitting the dataset in two: Cassandra_EU and Cassandra_US. If European users' data lives in the Cassandra_EU partition and US users' data lives in the Cassandra_US partition, users' requests no longer have to travel long distances to fetch data.
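The routing idea can be sketched in a few lines. This is a hypothetical illustration, not Instagram's actual implementation; the country set and function name are made up for the example.

```python
# Hypothetical routing sketch: pick a Cassandra partition based on the
# user's home region, so quorum traffic stays on a single continent.
EU_COUNTRIES = {"DE", "FR", "GB", "IT", "ES", "NL"}  # illustrative subset

def partition_for(user_country: str) -> str:
    """Return the partition that holds this user's data."""
    return "Cassandra_EU" if user_country in EU_COUNTRIES else "Cassandra_US"

# European users hit the EU partition; everyone else stays in the US.
assert partition_for("FR") == "Cassandra_EU"
assert partition_for("US") == "Cassandra_US"
```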

For example, let's say there are five data centers in the US and three in the EU. If we deploy Cassandra in Europe by simply extending the current cluster, the replication factor becomes 8, and every quorum request must contact 5 of the 8 replicas.

But if we can split the data into two groups, we get a Cassandra_US partition with a replication factor of 5 and a Cassandra_EU partition with a replication factor of 3. Each partition operates independently of the other, and each partition's quorum requests stay on the same continent, which solves the round-trip latency problem.
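The arithmetic behind these numbers is Cassandra's QUORUM consistency level, which requires a strict majority of replicas. A minimal sketch of the calculation:

```python
def quorum(replication_factor: int) -> int:
    """Cassandra QUORUM consistency: a strict majority of replicas."""
    return replication_factor // 2 + 1

# One combined cluster spanning both continents: RF = 8, so every
# quorum read/write must reach 5 of the 8 replicas -- and at least
# one of those replicas may sit on the other side of the ocean.
assert quorum(8) == 5

# Two independent partitions: each quorum stays on one continent.
assert quorum(5) == 3  # Cassandra_US (5 US data centers)
assert quorum(3) == 2  # Cassandra_EU (3 EU data centers)
```

Note the capacity saving as well: the combined cluster stores 8 copies of everything, while the partitioned layout stores each user's data only 5 or 3 times.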

2. Restricting TAO writes to the local region

To reduce TAO write latency, we can restrict all EU writes to the local region. To the end user this looks almost the same. When we send a write to TAO, it updates locally and no longer blocks while the write is sent synchronously to the master database; instead, the write is queued in the local region. In the region where the write originated, the data is available from TAO immediately; in other regions, it becomes available once it propagates out from that region. This is similar to regular writes today, where data propagates from the master region.
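The apply-locally-then-queue pattern described above can be sketched as follows. This is a toy model with invented names, not TAO's real API; it only illustrates why the local region sees a write immediately while the master sees it after asynchronous propagation.

```python
import collections

class Region:
    """Toy model of a TAO-style region: writes apply locally at once
    and are queued for later propagation instead of blocking on a
    synchronous cross-ocean round trip."""
    def __init__(self, name: str):
        self.name = name
        self.store = {}                    # data visible in this region
        self.outbox = collections.deque()  # writes awaiting propagation

    def write_local(self, key, value):
        self.store[key] = value            # immediately visible locally
        self.outbox.append((key, value))   # sent to the master later

    def drain_to(self, master: "Region"):
        while self.outbox:                 # async propagation step
            key, value = self.outbox.popleft()
            master.store[key] = value      # master replicates onward

eu = Region("EU")
us_master = Region("US-master")

eu.write_local("post:1", "hello")
# EU readers see the write right away; the master does not yet.
assert "post:1" in eu.store
assert "post:1" not in us_master.store

eu.drain_to(us_master)                     # propagation catches up
assert us_master.store["post:1"] == "hello"
```

The trade-off, as with today's cross-region replication, is a window of time in which other regions have not yet seen the write.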

Different services may have different bottlenecks, but if we stay focused on reducing or eliminating cross-ocean traffic, we can address them one by one.

Lessons Learned

As with every infrastructure project, we learned some important lessons along the way. Here are a few of the main ones.

  • Don't rush into new projects. Before you start provisioning servers in a new data center, make sure you understand why you need to deploy into the new region, what the dependencies are, and how the system will behave once the new region goes live. Also, don't forget to review your disaster recovery plan and make any necessary changes.
  • Don't underestimate complexity. Always leave enough room in your schedule to make mistakes, hit unexpected blockers, and discover dependencies you didn't know existed. You may find yourself inadvertently reinventing the way you build infrastructure.
  • Understand the trade-offs. Success always comes at a price. When we partitioned our Cassandra database, lowering the replication factor saved a lot of storage. However, to keep each partition ready for disasters, we needed more front-end Django capacity to absorb traffic from a failed region, because the partitions could no longer share capacity with each other.
  • Be patient. I can't count how many times we said "Oh, shit!" while bringing up the European data center, but it always worked out in the end. It may take longer than you expect, so be patient and keep the whole team pulling together; it's a genuinely fun process.

Original title: How Instagram is scaling its infrastructure across the ocean, author: Sherry Xiao

[Translated by 51CTO. Please indicate the original translator and source as 51CTO.com when reprinting on partner sites]
