11 reasons why YouTube supports 100 million video views per day with just 9 engineers

Author | NK

Planning | Yan Zheng

February 2005, California, USA. PayPal, the well-known online payment company, had been in business for 6 years and 2 months. Three of its early employees set out to find an opportunity of their own, as if they had discovered the secret to attracting traffic on the Internet.

In the end, they decided to build a platform for sharing videos. That platform, born in a garage, later became the famous YouTube.

At first they had limited financial resources and could only fund YouTube through credit card debt and infrastructure loans. Those financial constraints, however, also forced them to develop an excellent set of scalability techniques.

In the second year, the daily video playback volume of their platform reached 100 million. What’s more surprising is that they achieved this with only 9 engineers.

How did YouTube do it? Here are the key design points of that year. (P.S.: At first glance, it looks plain and simple.)

1. Magic Flywheel

They use a "flywheel" approach: they collect and analyze system data, and use it to guide their scaling work. The workflow is a continuous cycle: identify bottlenecks → fix bottlenecks → drink water → sleep → repeat. The benefit is that they need neither high-end hardware nor large up-front deployments, which reduces hardware costs.
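The "identify bottlenecks" step of this cycle can be sketched in a few lines. This is only an illustration: the function name `find_bottleneck`, the component names, and the latency figures are all hypothetical and do not come from YouTube.

```python
from statistics import mean


def find_bottleneck(samples):
    """Return the component with the highest mean latency.

    `samples` maps a component name to a list of latency measurements
    in milliseconds. Names and numbers here are hypothetical.
    """
    return max(samples, key=lambda name: mean(samples[name]))


# Made-up measurements, for illustration only.
samples = {
    "web": [12, 15, 11],
    "database": [80, 95, 110],
    "cache": [1, 2, 1],
}

slowest = find_bottleneck(samples)
```

With these made-up numbers the database has the highest mean latency, so it would be the component to fix in this pass of the cycle.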

Scalability loop

2. A seemingly boring but ingenious technology stack

They keep their tech stack simple and use proven technologies. Their tech stack is definitely beyond your imagination:

YouTube Technology Stack

  • MySQL stores metadata: video titles, tags, descriptions, and user data. They chose it because problems in MySQL are easy to fix.
  • Videos are served by the Lighttpd web server.
  • Linux is the operating system. They used the following Linux tools to examine system behavior: strace, ssh, rsync, vmstat, and tcpdump.
  • Python runs on the application servers. It provides many reusable libraries, so they did not have to reinvent the wheel, and it allows fast and flexible development. According to their measurements, Python was never a bottleneck. Notably, they used a Python-to-C compiler and C extensions to run CPU-intensive tasks.

3. Keep it simple

They believed that software architecture is the root of scalability, and they did not blindly chase buzzwords in order to scale. They kept the architecture simple, which made code review easier and let them re-architect quickly to meet changing requirements. For example, they pivoted from a dating site to a video sharing site.

Additionally, they keep network paths simple because network devices have scalability limitations.

Hardware Cost

They also used commodity hardware. It enabled them to reduce power consumption and maintenance expenses and keep costs low.

Furthermore, they keep scale-aware code relatively independent from application development.

4. Choose your main battlefield

They outsourced many problems that were not central, so that they could focus on the important things. They had neither the time nor the resources to build their own infrastructure for serving popular videos, so they put popular videos on a third-party CDN. Benefits:

  • Low latency, because content reaches users over fewer network hops;
  • High performance, because videos are served from memory;
  • High availability, because of automatic replication.

They serve less popular videos from co-located data centers, use software RAID to improve performance by accessing multiple disks in parallel, and tune their servers to prevent cache thrashing.

They keep their infrastructure in co-located data centers for two reasons: First, they can easily adjust the servers to meet their needs. Second, it facilitates their contract negotiations.

Choose your battleground; outsource issues to free up resources

Each video has 4 thumbnails, so they had to serve a very large number of small objects, which meant many disk seeks and file system limitations. They therefore put the thumbnails into Bigtable, a distributed data store with several advantages: it avoids the small-file problem by clustering files together, improves performance, offers low latency through a multi-level cache, and is easy to configure.
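The idea of "clustering" many small files into one larger store can be illustrated with a toy example. This sketch is not Bigtable and does not use its API. The class `PackedBlobStore` and the thumbnail keys are hypothetical, and the store lives in memory purely to keep the example self-contained.

```python
import io


class PackedBlobStore:
    """Pack many small blobs into one buffer and locate them by index.

    A toy illustration of clustering small files; not Bigtable.
    """

    def __init__(self):
        self._data = io.BytesIO()  # stands in for one large file
        self._index = {}           # key -> (offset, length)

    def put(self, key, blob):
        # Append the blob at the end and remember where it lives.
        offset = self._data.seek(0, io.SEEK_END)
        self._data.write(blob)
        self._index[key] = (offset, len(blob))

    def get(self, key):
        # One index lookup plus one contiguous read.
        offset, length = self._index[key]
        self._data.seek(offset)
        return self._data.read(length)


store = PackedBlobStore()
store.put("video-1/thumb-0", b"abc")
store.put("video-1/thumb-1", b"defgh")
```

Reading a thumbnail then costs one index lookup and one contiguous read, instead of opening a separate file for every small object.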

They also fake (approximate) some data to avoid expensive transactions. For example, they show an approximate video view count and update the real counter asynchronously. A popular technique for approximate correctness today is the Bloom filter, which is a probabilistic data structure.
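The view counter idea can be sketched as a counter that buffers views in memory and writes them to the store in batches. This is a simplified illustration, not YouTube's implementation: the class `BufferedViewCounter`, the `flush_every` parameter, and the dictionary standing in for the database are assumptions. In a real system the flush would run off the request path, for example in a background job.

```python
from collections import Counter


class BufferedViewCounter:
    """Accumulate views in memory and write them to the store in batches."""

    def __init__(self, store, flush_every=100):
        self.store = store              # dict standing in for a database table
        self.flush_every = flush_every  # number of pending views that triggers a flush
        self.pending = Counter()
        self.pending_total = 0

    def record_view(self, video_id):
        self.pending[video_id] += 1
        self.pending_total += 1
        if self.pending_total >= self.flush_every:
            self.flush()

    def flush(self):
        # One write per video instead of one write per view.
        for video_id, count in self.pending.items():
            self.store[video_id] = self.store.get(video_id, 0) + count
        self.pending.clear()
        self.pending_total = 0

    def displayed_count(self, video_id):
        # May lag behind the true count until the next flush.
        return self.store.get(video_id, 0)
```

The displayed number is slightly stale between flushes, which is the trade being made: approximate correctness in exchange for far fewer writes.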

5. Three pillars of scalability

YouTube relies on three pillars of scalability: statelessness, replication, and partitioning.

The 3 Pillars of Scalability

They keep their web servers stateless and scale them out through replication.

They replicated database servers for read scalability and high availability, and load-balanced traffic between the replicas. But this approach caused problems: replication lag and write scalability issues.
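A minimal sketch of this read/write split, assuming writes go to a single primary and reads rotate across replicas in round-robin order. The class `ReplicatedDatabase`, the server names, and the crude `SELECT` check are hypothetical and only illustrate the routing idea.

```python
import itertools


class ReplicatedDatabase:
    """Route writes to the primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replica_cycle = itertools.cycle(replicas)  # round-robin order

    def server_for(self, query):
        # Crude classification: anything starting with SELECT is a read.
        is_read = query.lstrip().upper().startswith("SELECT")
        return next(self._replica_cycle) if is_read else self.primary


db = ReplicatedDatabase("primary", ["replica-1", "replica-2"])
```

Every write still lands on the one primary, and replicas apply changes after the fact, which is where the write scalability limit and the replication lag come from.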

Replication and Partitioning

So they partitioned the database to improve write scalability, cache locality, and performance. It also reduced hardware costs by 30%.

In addition, they studied data access patterns to determine the partitioning level. For example, they studied popular queries, joins, and transaction consistency and chose user as the partitioning level.
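Partitioning at the user level can be sketched as a function that maps a user ID to a shard. The hash-based scheme, the shard count, and the name `shard_for_user` are assumptions made for illustration. The source only says that the user was chosen as the partitioning level, not how users were assigned to partitions.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_for_user(user_id, num_shards=NUM_SHARDS):
    """Map a user ID to a shard number in a stable, evenly spread way."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because all of one user's rows land on the same shard, queries and transactions scoped to a single user never have to cross shards.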

6. Solid engineering team

A knowledgeable team is a great asset for scalability.

Interdisciplinary Team

They keep the team small to improve communication: just 9 engineers. The team members also have strong cross-disciplinary skills.

7. Don’t repeat yourself

They use caching to avoid repeating expensive operations, which enables them to scale page views.

Multi-level cache can be expanded

They also implemented caching at multiple levels, which reduced latency.
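A two-level cache can be sketched as follows. This assumes a per-process local cache in front of a shared cache, with the data source consulted only on a miss at both levels. The class `MultiLevelCache` and its attribute names are hypothetical, and plain dictionaries stand in for the real cache tiers.

```python
class MultiLevelCache:
    """Look up a key in a local cache, then a shared cache, then the source."""

    def __init__(self, load_from_source):
        self.local = {}    # fastest tier, private to this process
        self.shared = {}   # stands in for a cache shared between servers
        self.load_from_source = load_from_source
        self.source_loads = 0  # counts the expensive operations

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            # Promote to the local tier for the next lookup.
            self.local[key] = self.shared[key]
            return self.local[key]
        value = self.load_from_source(key)
        self.source_loads += 1
        self.shared[key] = value
        self.local[key] = value
        return value
```

Even if the local tier is lost, for example on a process restart, the shared tier still absorbs the request and the expensive operation is not repeated.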

8. Rank: prioritize the traffic that matters

Rank important traffic; 80/20 rule (Pareto principle)

They prioritize video viewing traffic over all other traffic. Therefore, they reserve a dedicated resource cluster for video viewing traffic. This provides high availability.

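9. Preventing the "thundering herd"

The thundering herd problem occurs when many concurrent clients query the server at the same moment, for example when a popular cache entry expires and every request falls through to the backend at once. It degrades performance.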

The Thundering Herd Problem

They use jitter to prevent the thundering herd problem. For example, they added jitter to the cache expiration time of popular videos.
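Jittered expiration can be sketched as a function that randomizes each entry's time-to-live around a base value. The function name `ttl_with_jitter`, the 10% default spread, and the 300-second example are assumptions for illustration.

```python
import random


def ttl_with_jitter(base_ttl, jitter_fraction=0.1, rng=random):
    """Return base_ttl shifted by a random amount of up to +/- jitter_fraction."""
    spread = base_ttl * jitter_fraction
    return base_ttl + rng.uniform(-spread, spread)


# Example: entries expire somewhere between 270 and 330 seconds.
ttl = ttl_with_jitter(300)
```

Because each entry gets a slightly different expiry time, popular entries cached at the same moment do not all expire together and send their traffic to the backend at once.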

10. Fight a protracted war

They focus on macro-level things: algorithms and scalability. They use quick hacks to buy time to build long-term solutions. For example, they stubbed out bad APIs in Python to head off short-term problems.

Risk and Reward

They tolerate defects in components. When a component becomes a bottleneck, they either rewrite it or remove it.

They trade efficiency for scalability. Here are four examples:

  • They chose Python instead of C;
  • They maintain clear boundaries between components, which allows horizontal scaling and makes the system tolerant of latency;
  • They optimized the software to be fast enough, but they were not obsessed with machine efficiency;
  • They serve videos from server locations chosen by bandwidth availability, not by latency.

11. Adaptive evolution

They adapted the system to suit their needs. Examples:

  • Key components use RPC instead of HTTP REST, which improves performance;
  • Custom BSON as the data serialization format, for high performance;
  • Eventual consistency in some parts of the application to achieve scalability. For example, a read-your-writes consistency model for user comments;
  • Studying Python closely to avoid its common pitfalls, backed by analysis of where problems actually come from;
  • Customized open source software;
  • Optimize database queries;
  • Make non-critical real-time tasks asynchronous.
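The last item, moving non-critical work off the request path, can be sketched with a queue and a background worker. The function names, the thumbnail task, and the use of a single thread are hypothetical and chosen only to keep the example self-contained.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    """Run queued tasks in the background until a None sentinel arrives."""
    while True:
        task = tasks.get()
        if task is None:
            tasks.task_done()
            break
        results.append(task())
        tasks.task_done()


def handle_upload(video_id):
    # Defer the non-critical work and respond immediately.
    tasks.put(lambda: f"thumbnail generated for {video_id}")
    return f"{video_id} accepted"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_upload("video-1")  # returns without waiting for the task
tasks.join()                         # demo only: wait until the queue is drained
tasks.put(None)                      # tell the worker to stop
thread.join()
```

The caller gets its response as soon as the task is queued. The `tasks.join()` call is there only so the example finishes deterministically.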

Coding principles

They didn’t waste time writing code to restrict people. Instead, they adopted good engineering practices, such as coding conventions, to improve the structure of their code.

--postscript--

In November 2006, Google acquired YouTube for $1.65 billion and operates it as a subsidiary. Today, it remains the leader in the video sharing market, with 5 billion video views per day.

According to Forbes, the net worth of YouTube's founder exceeds $100 million. YouTube became the leader in video search only 20 months after its creation, which can be said to have created a Silicon Valley miracle.

Reference link: https://newsletter.systemdesign.one/p/youtube-scalability
