Three ways to send large amounts of data over HTTP

In the early days of the web, people sent files that were just a few KB in size. Fast forward to 2023, and we enjoy high-resolution MB-sized images and watch 4K (soon to be 8K) videos that are several GB in size.

Even with a good internet connection, downloading a 5GB file can still take some time. If you own an Xbox or PlayStation, you know the feeling.

There are three ways we can reduce the time it takes to send large amounts of data over HTTP:

  • Compressing Data
  • Sending chunked data
  • Requesting data in a selected range

They are not mutually exclusive. You can use all methods together depending on your use case.

Compressing Data

To compress data, we need a compression algorithm.

When sending a request, the browser includes an Accept-Encoding header that lists the compression algorithms it supports, such as gzip, compress, deflate, and br (Brotli).

Next, the server selects an algorithm it supports from the list and sets the algorithm name in the Content-Encoding header.

When the browser receives the response, the Content-Encoding header tells it which algorithm to use to decompress the body.
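The negotiation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SERVER_ENCODERS, choose_encoding, and encode_body are hypothetical names, the handling of q-values is reduced to "q=0 means refused", and Brotli is left out because it is not part of the Python standard library.

```python
import gzip
import zlib

# Encodings this hypothetical server supports, in its own order of preference.
SERVER_ENCODERS = {
    "gzip": gzip.compress,
    "deflate": zlib.compress,  # HTTP "deflate" means the zlib container format
}

def choose_encoding(accept_encoding: str):
    """Return the first server-supported encoding the client also accepts, else None."""
    offered = set()
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        params = params.strip().lower()
        refused = False
        if params.startswith("q="):
            try:
                refused = float(params[2:]) == 0.0  # q=0 means "do not use"
            except ValueError:
                refused = False
        if name and not refused:
            offered.add(name.strip().lower())
    for name in SERVER_ENCODERS:
        if name in offered:
            return name
    return None

def encode_body(body: bytes, accept_encoding: str):
    """Return (extra_headers, body); Content-Encoding is set only when compressing."""
    encoding = choose_encoding(accept_encoding)
    if encoding is None:
        return {}, body  # no common algorithm: send the body uncompressed
    return {"Content-Encoding": encoding}, SERVER_ENCODERS[encoding](body)

headers, payload = encode_body(b"<html>" + b"hello " * 100 + b"</html>", "gzip, deflate, br")
print(headers, len(payload))
```

A real server would also weigh q-values against each other and typically skip compression for very small bodies; the sketch ignores both.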

Among these algorithms, the most popular is GZIP. It is an excellent choice for compressing text data such as HTML, CSS, and JavaScript.

Brotli is another algorithm worth mentioning. It performs even better than GZIP in compressing HTML.

These efficient algorithms have some limitations.

They compress text well, but they do little for images or videos. After all, media formats such as JPEG and MP4 are already compressed.

Try compressing a video file on your computer. You will see hardly any difference in size before and after compression.
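The same effect can be reproduced without a video file. The sketch below compares gzip on repetitive markup with gzip on random bytes, which are used here as a stand-in for already-compressed media (an assumption: real media is not random, but it is similarly hard to shrink further).

```python
import gzip
import os

# Repetitive markup, typical of HTML.
text = b"<div class='item'>Hello, world!</div>\n" * 1000
# Random bytes of the same size, standing in for already-compressed media.
noise = os.urandom(len(text))

def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(data)) / len(data)

text_ratio = ratio(text)
noise_ratio = ratio(noise)
print(f"text:  {text_ratio:.3f}")
print(f"noise: {noise_ratio:.3f}")
```

The text shrinks to a small fraction of its size, while the random data stays at roughly its original size.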

Furthermore, it is almost impossible to compress a 5GB video to a few KB without losing quality.

Compression helps, but large files need something more: sending the file in chunks and assembling the partial data on the client side.

Sending chunked data

HTTP/1.1 introduced chunked transfer encoding to handle large data, including data whose total size is not known in advance.

When sending the response, the server adds a header Transfer-Encoding: chunked to let the browser know that the data is transferred in chunks.

Each chunk has the following components:

  • A length line: the size of the chunk data in bytes, written in hexadecimal and followed by CRLF
  • The chunk data itself
  • A CRLF delimiter at the end of each chunk

Want to know what CRLF is?

CR (carriage return, \r, 0x0D) moves the cursor to the beginning of the line, and LF (line feed, \n, 0x0A) moves it down to the next line. Together, CRLF (\r\n, or 0x0D0A) starts a new line. Here, you can simply think of it as a delimiter.

The server continues to stream chunks to the browser. When it reaches the end of the data stream, it appends a terminating chunk consisting of the following:

  • A length block, number 0, and CRLF at the end
  • An extra CRLF
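The framing described above can be sketched as a small encoder. encode_chunked and its chunk_size parameter are hypothetical names for illustration; real servers choose chunk sizes based on what their buffers happen to hold.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Wrap body in HTTP/1.1 chunked framing, chunk_size bytes per chunk."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += f"{len(chunk):X}".encode("ascii") + b"\r\n"  # length in hex + CRLF
        out += chunk + b"\r\n"                              # data + CRLF
    out += b"0\r\n\r\n"  # terminating chunk: length 0, CRLF, extra CRLF
    return bytes(out)

wire = encode_chunked(b"Hello, chunked world!", chunk_size=8)
print(wire)
# b'8\r\nHello, c\r\n8\r\nhunked w\r\n5\r\norld!\r\n0\r\n\r\n'
```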

On the browser side, chunks are read until the end marker is reached. The browser removes the chunk encoding, including the CRLF and length information.

Next, it combines the chunk data into a whole. Therefore, in Chrome DevTools, you can only see the assembled data, not the individual chunks.

In the end, you receive the entire data as a single piece.
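What the receiving side does can be sketched as a small decoder. decode_chunked is a hypothetical helper written for this article; it assumes the whole response is already in memory and ignores optional trailer headers, which a real client would also have to handle.

```python
def decode_chunked(wire: bytes) -> bytes:
    """Strip chunked framing and return the reassembled body."""
    body = bytearray()
    pos = 0
    while True:
        line_end = wire.index(b"\r\n", pos)
        # The length line may carry extensions after ';'; only the hex size matters here.
        size_text = wire[pos:line_end].split(b";")[0].strip().decode("ascii")
        size = int(size_text, 16)
        pos = line_end + 2
        if size == 0:
            break  # terminating chunk reached
        body += wire[pos:pos + size]
        if wire[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data is not followed by CRLF")
        pos += size + 2
    return bytes(body)

wire = b"8\r\nHello, c\r\n8\r\nhunked w\r\n5\r\norld!\r\n0\r\n\r\n"
print(decode_chunked(wire))
# b'Hello, chunked world!'
```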

Chunking the data is useful. However, for a 5GB video, it still takes some time for the complete data to arrive.

Can we fetch selected chunks of data and request other chunks when needed?

HTTP says yes.

Requesting data in a selected range

Open a video on YouTube and you'll see a gray progress bar moving forward.

What you just saw is YouTube requesting data for the selected range.

This feature allows you to jump anywhere in the timeline. When you click somewhere on the progress bar, the browser requests a specific range of video data.

Implementing range requests on the server is optional. If implemented, you can see Accept-Ranges: bytes in the response header.

For example, YouTube returns this header in the responses to its "playback" requests.

A range request header looks like `Range: bytes=0-80`. Byte positions are indexed starting from 0, and both ends of the range are inclusive.

This header is cleverly designed and very flexible.

Assume that a resource has a total of 100 bytes.

  • Range: bytes=20- requests a range starting from byte 20 to the end, which is equal to Range: bytes=20-99.
  • Range: bytes=-20 requests the last 20 bytes of data, which is equal to Range: bytes=80-99.
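The way these forms resolve to concrete byte positions can be sketched as follows. resolve_range is a hypothetical helper that handles a single range only; it is an illustration, not a complete parser for the Range header.

```python
def resolve_range(header: str, total: int):
    """Resolve a single-range value such as 'bytes=20-' to an inclusive (start, end)."""
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("this sketch handles a single bytes range only")
    first, sep, last = spec.strip().partition("-")
    if not sep or (first == "" and last == ""):
        raise ValueError("malformed range")
    if first == "":
        # Suffix form: the last N bytes of the resource.
        length = int(last)
        if length == 0:
            raise ValueError("range not satisfiable")
        start, end = max(total - length, 0), total - 1
    else:
        start = int(first)
        # Open-ended form runs to the last byte; an oversized end is clamped.
        end = min(int(last), total - 1) if last else total - 1
    if start > end or start >= total:
        raise ValueError("range not satisfiable")
    return start, end

print(resolve_range("bytes=20-", 100))   # (20, 99)
print(resolve_range("bytes=-20", 100))   # (80, 99)
print(resolve_range("bytes=0-80", 100))  # (0, 80)
```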

If the requested range is valid, the server responds with status 206 Partial Content and a Content-Range header confirming the returned range and the total length, for example Content-Range: bytes 70-80/100.
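A minimal sketch of the server side, assuming the whole resource is held in memory as bytes. partial_response is a hypothetical function; the 416 branch reflects the standard reply for a range that cannot be satisfied.

```python
def partial_response(data: bytes, start: int, end: int):
    """Return (status, headers, body) for an inclusive byte range of data."""
    total = len(data)
    if start < 0 or start >= total or start > end:
        # Range Not Satisfiable: report only the total length.
        return 416, {"Content-Range": f"bytes */{total}"}, b""
    end = min(end, total - 1)  # clamp an oversized end to the last byte
    body = data[start:end + 1]
    headers = {
        "Content-Range": f"bytes {start}-{end}/{total}",
        "Content-Length": str(len(body)),
    }
    return 206, headers, body

data = bytes(range(100))
status, headers, body = partial_response(data, 70, 80)
print(status, headers["Content-Range"], len(body))
# 206 bytes 70-80/100 11
```

Because both ends are inclusive, the range 70-80 carries 11 bytes, not 10.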

Range requests are widely used in video streaming and file download services.

Have you ever continued a file download after an internet outage? That's a range request.
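A resumed download can be sketched like this, with the network left out. resume_headers and save_response are hypothetical helpers, and the "server" is simulated by slicing a byte string; the point is only how the size of the partial file turns into a Range header.

```python
import os
import tempfile

def resume_headers(partial_path: str) -> dict:
    """Ask only for the bytes that are not on disk yet."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={have}-"} if have > 0 else {}

def save_response(partial_path: str, status: int, body: bytes) -> None:
    """Append on 206 (range honored); start over on 200 (range ignored)."""
    mode = "ab" if status == 206 else "wb"
    with open(partial_path, mode) as f:
        f.write(body)

full = bytes(range(100))  # the complete file as the server has it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "video.part")
    save_response(path, 200, full[:40])   # first attempt, cut off after 40 bytes
    headers = resume_headers(path)        # {'Range': 'bytes=40-'}
    save_response(path, 206, full[40:])   # simulated 206 reply with the remainder
    with open(path, "rb") as f:
        restored = f.read()

print(headers, restored == full)
```

A real client would also need to check that the file has not changed on the server between attempts, which this sketch does not do.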

Additionally, range requests support multiple ranges.

For example, you can request two ranges from a file, like Range: bytes=20-45, 70-80.

A multi-range response has Content-Type: multipart/byteranges, and its body looks similar to chunked data. Each part has the following components:

  • A boundary line, which starts with -- followed by the boundary string and ends with CRLF
  • Two headers, Content-Type and Content-Range, which describe the corresponding data block; each ends with CRLF
  • An extra CRLF to tell the client that the actual data follows
  • Finally, the data block, terminated by CRLF

The boundary is just a random string that looks like 3d6b6a416f9b5, marking the boundaries between different chunks of data.

Finally, the body ends with a closing boundary line: the boundary string with -- on both sides, followed by CRLF. This tells the browser that the multipart body has ended.

Let's put it all together. The response body is structured as follows.
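A sketch that assembles such a body for the two ranges used earlier, 20-45 and 70-80 of a 100-byte resource. multipart_byteranges is a hypothetical helper, the text/plain content type is an assumption, and the boundary reuses the example string from above. The response itself would carry a header like Content-Type: multipart/byteranges; boundary=3d6b6a416f9b5.

```python
def multipart_byteranges(data: bytes, ranges, boundary: str,
                         content_type: str = "text/plain") -> bytes:
    """Build a multipart/byteranges body for inclusive (start, end) ranges."""
    total = len(data)
    out = bytearray()
    for start, end in ranges:
        out += f"--{boundary}\r\n".encode("ascii")                            # boundary line
        out += f"Content-Type: {content_type}\r\n".encode("ascii")            # part header
        out += f"Content-Range: bytes {start}-{end}/{total}\r\n".encode("ascii")
        out += b"\r\n"                                                        # end of part headers
        out += data[start:end + 1] + b"\r\n"                                  # data block
    out += f"--{boundary}--\r\n".encode("ascii")                              # closing boundary
    return bytes(out)

data = b"0123456789" * 10  # a 100-byte resource
body = multipart_byteranges(data, [(20, 45), (70, 80)], "3d6b6a416f9b5")
print(body.decode("ascii"))
```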

Summary

HTTP helps us transfer large amounts of data through compression, chunked transfer, and range requests.

The idea here is to send the data we need when we need it, and then send other data when needed. You can try the same idea when you encounter problems in designing similar systems.

By combining these three methods, we can send data that is compressed, chunked, and limited to the requested range.
