Let’s talk about protocols and hard drives in the Web3 world: IPFS

In the Web 2.0 world, the transfer protocol is usually HTTP, resource acceleration is usually handled by a CDN (content delivery network), and files are usually kept in an OSS (object storage service).

In the Web 3.0 world, many technologies can replace these three and do the job better. One of the most promising is IPFS, which integrates the protocol, resource acceleration, and storage in a single technology.

This article will introduce what IPFS is and how it works.

The IPFS white paper was released in July 2014 and proposed many technical solutions. Its concepts are completely different from the HTTP, CDN, and OSS of traditional Web 2.0.

Before introducing IPFS, let's review the advantages and disadvantages of HTTP, CDN, and OSS.

Why do technologies like HTTP, CDN, and OSS no longer work in the Web3 world?

Advantages and Disadvantages of HTTP

The HTTP referred to here is HTTP/1.1 and HTTP/2.
Its strength is transferring small files.
However, data transmission on the modern Internet faces five challenges:

  1. Hosting and distributing petabyte-scale data.
  2. Computing on big data across organizations.
  3. Distributing massive amounts of high-definition video.
  4. Linking and versioning massive data sets.
  5. Preventing important files from being lost.

To sum up the above: massive data, everywhere.
All of these are difficult for HTTP to handle. So as the Internet continues to develop, HTTP will eventually withdraw from the stage of history.

Advantages and Disadvantages of CDN

The main purpose of a CDN is to speed up access to static resources, which is also its main advantage. It can also absorb DDoS attacks and is relatively easy to maintain.
But the disadvantages are obvious. Building your own CDN is complicated and costly, so people usually buy CDN service from a provider, and that is not cheap either.

Advantages and Disadvantages of OSS

The OSS referred to here is the object storage service offered by cloud providers.
Its advantages are high reliability, easy scalability, high speed, and edge computing support. On top of that, there is often a series of advanced features such as mirroring, backup, security, and data masking. It can be summed up in two words: worry-free.
The disadvantage is just as obvious: it is expensive, even though every OSS provider advertises low cost as one of its advantages. The cost is not noticeable when the amount of data is small, but once the data grows sharply, the price becomes hard to accept. I have learned this first-hand.
Of course, the real reason OSS is expensive is not the storage cost but the bandwidth cost.

The Problem with the Modern Internet

The problems above are all technical. Beyond them there is a more fundamental problem: the model of the modern Internet itself.

Technical issues, cost issues, efficiency issues

The modern Internet follows a centralized model. This model is extremely costly because it requires building large centralized server clusters. It is also very inefficient: services are prone to congestion and delay at peak hours, while large amounts of capacity sit idle and wasted during off-peak hours. The model relies heavily on the storage and bandwidth of centralized service providers. Technologies such as elastic computing can ease the situation, but they cannot fix the underlying model.
IPFS lets nodes share storage and bandwidth, which uses resources more efficiently and reduces costs.
IPFS can also deduplicate files effectively, eliminating redundancy.

Data ownership issues

It is easy to lose data on the modern Internet. Because data is stored on centralized server clusters, service providers have the right to manage it. Although large providers have backup and disaster recovery plans, accidents still happen occasionally. Most importantly, service providers can delete our data for any reason, for example by claiming that the data is illegal or that we violated platform rules. An ordinary person can hardly fight a service provider, and some important data disappears without our even knowing.
IPFS aims to store data permanently: content stays retrievable for as long as at least one node keeps hosting it.

Heavily dependent on backbone network

The modern Internet relies heavily on backbone networks. Once a backbone network fails, large-scale service interruptions and delays will occur.
IPFS does not rely on the backbone network, and even in areas with underdeveloped networks, IPFS still performs well.

Censorship issues

Since modern Internet applications are centralized, authorities can prevent people from accessing a particular website or app. In China, this practice is commonly called "the wall".
IPFS is peer-to-peer and distributed, making it almost impossible to block.

Ecological issues

Service providers spend large sums building server clusters and providing us with products, but as the saying goes, the wool comes from the sheep: they recover that money from us through membership fees, advertising fees, and other means. In pursuit of profit, some abandon their bottom line: they abuse user privacy, run false advertisements and malicious pop-up ads, throttle network speed, sell our data, and even charge public relations fees to delete posts and ban accounts.
In addition to the above problems, Internet security is also a headache, for example the various CAPTCHAs that seem designed to defeat humans.
In the past, we had no choice. However unbearable it was, we could only swallow our anger and use the Internet to curse the Internet on the Internet. Now things are different and we have better choices. In the IPFS world, these problems no longer exist.
iQIYI previously released a skit, not an especially funny one, that satirizes the modern Internet. Someone posted it to Zhihu; if you are interested, you can check it out: www.zhihu.com/zvideo/1433….

What is IPFS?

There are many definitions of IPFS.
As defined in its white paper, IPFS is a content-addressed, versioned, peer-to-peer file system.
From a technical perspective, IPFS is a forest of Merkle trees.
From a business perspective, IPFS is a peer-to-peer hypertext protocol.
Its goal is to make the Internet faster, more secure, and more open.

Introduction to IPFS

IPFS is an abbreviation; the full name is InterPlanetary File System.
It is a peer-to-peer file system of interplanetary scope. Seen this way, it targets the entire Internet rather than one particular protocol or file storage system. It can be thought of as a single BitTorrent swarm exchanging objects within one Git repository.

IPFS Author Introduction

The author is Juan Benet, referred to below simply as Juan. He is an American, born in 1988, a Stanford University graduate, and a top-tier technologist.
Juan is the founder of Protocol Labs, IPFS, and Filecoin.
He founded Protocol Labs in 2014 and launched the IPFS project in the same year.
Protocol Labs is the organization behind IPFS and Filecoin, and its goal is to build the next generation of the Internet.
Four years later, in 2018, he was named to Fortune magazine's 40 under 40 list.

What technologies is IPFS based on?

Core technologies of IPFS

A core principle of IPFS is to model all data as part of the same Merkle Directed Acyclic Graph (Merkle DAG).
It uses but is not limited to the following technologies:

  • Content addressing based on a distributed hash table (DHT).
  • Object management based on the Git model.
  • Object linking based on Merkle structures.
  • Data transfer based on peer-to-peer technology.
  • Naming based on the global namespace IPNS.

Together, these technologies address problems such as massive data volumes, high concurrency, high throughput, and file loss.
To sum up, IPFS does only three things: it specifies how to upload files, how to find files, and how to download files.
You may ask: aren't most of these technologies borrowed from the earlier P2P field? Yes. IPFS is a synthesis of P2P ideas. It did not create many technologies or concepts out of thin air, but stands on the shoulders of giants.

Why peer-to-peer?

On the modern Internet, every resource is fetched through an HTTP address, which is why our browser bookmarks store so many URLs. This is location-based addressing.
But if you think about it carefully, we only care whether the content is what we want, not where it is stored.
A resource we need has very likely been needed by others too. If someone nearby already has it on their computer, we can download it directly from that computer without going back to the original source. This is content-based addressing.
Peer-to-peer transfer is what makes content-based addressing possible.
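
The difference between the two addressing modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ContentStore class and its put/get methods are hypothetical names, and a plain SHA-256 hex digest stands in for a real IPFS CID, which uses a different encoding.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the bytes themselves."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # The address doubles as a checksum, so it does not matter who served the data.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its address")
        return data


origin = ContentStore()      # the original publisher
neighbour = ContentStore()   # a nearby peer that already downloaded the page

page = b"<h1>hello web3</h1>"
address = origin.put(page)
neighbour.put(page)

# The same address resolves on either peer, because it names the content, not a location.
fetched = neighbour.get(address)
```

Because the address is computed from the content, the reader can accept the page from any peer and still check that it is exactly what was asked for.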

How IPFS works

Uploading a file to IPFS and retrieving it again involves the following steps.

  1. IPFS first splits the file into small data blocks of 256 KiB each and computes a hash fingerprint for each block. The fingerprint is a string that, for practical purposes, uniquely identifies the block.
  2. IPFS then hashes the hashes of the blocks together, for example two at a time, to get a new hash, and repeats this until all the block hashes have been combined into one. The final value is the root hash, which is what the file's CID (Content Identifier) encodes. This is the process of building the Merkle DAG, the core principle of IPFS.
  3. IPFS deduplicates content. Because the hash is derived from the content, two blocks or files with the same hash are duplicates and are stored only once on a node, although different nodes can each keep their own copy as a backup.
  4. Each IPFS node stores the data it needs and uses the DHT to record which node stores what. The DHT maps content IDs (CIDs) to node IDs (PeerIDs). When we add a file, IPFS announces it to other nodes in the network, but what it sends is not the file itself, only a small record containing the CID and our PeerID, with which those nodes update their hash tables. That is why this step is very fast. It is like copying a variable's address in a program instead of copying the memory it points to.
  5. When we need a file, we use its hash to find out whose computers hold it and then download it from them. If the file is stored on 100 computers that are all online, we can download different blocks from all 100 at the same time and then assemble them into the complete file. In theory, this can be up to 100 times faster than downloading from a single peer. This is the main working logic of IPFS.
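
The five steps above can be sketched roughly as follows. This is a simplified model, not the real IPFS implementation: SHA-256 hex digests stand in for CIDs, the pairwise hashing follows the simplified description in step 2, an in-memory dictionary stands in for the DHT, and the function names and peer names are hypothetical.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 256 * 1024  # 256 KiB, the block size quoted above


def sha(data):
    """Hex SHA-256 digest, standing in for a real CID."""
    return hashlib.sha256(data).hexdigest()


def chunk(data, size=CHUNK_SIZE):
    """Step 1: split the file into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def merkle_root(hashes):
    """Step 2: hash neighbouring pairs until a single hash remains."""
    level = list(hashes)
    while len(level) > 1:
        # An unpaired last hash is simply re-hashed on its own.
        level = [sha("".join(level[i:i + 2]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


providers = defaultdict(set)  # step 4: root hash -> PeerIDs that hold it


def announce(root, peer_id):
    """Record that a peer holds the content; only the small record is shared."""
    providers[root].add(peer_id)


def plan_download(root, block_count):
    """Step 5: spread block indexes round-robin over the known providers."""
    peers = sorted(providers.get(root, ()))
    if not peers:
        raise LookupError("no online provider for " + root)
    plan = {peer: [] for peer in peers}
    for index in range(block_count):
        plan[peers[index % len(peers)]].append(index)
    return plan


data = bytes(600 * 1024)               # 600 KiB of zero bytes
blocks = chunk(data)                   # 256 + 256 + 88 KiB -> 3 blocks
unique = {sha(b) for b in blocks}      # step 3: identical blocks share one hash
root = merkle_root([sha(b) for b in blocks])
announce(root, "peer-A")               # hypothetical peer names
announce(root, "peer-B")
plan = plan_download(root, len(blocks))
```

With two providers, the three blocks are split between them, which is the idea behind the parallel download speed-up in step 5. When nobody has announced the root hash, plan_download raises an error instead of returning a plan.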

Each data block has the following structure:
data: no more than 256 KiB of data.
links: references to other data blocks.
If a file is very large, its content is first split into N data blocks, and then a parent block is created above them whose links point to all of those blocks.
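
A minimal sketch of this data/links structure is shown below. It is an illustration only: the function names are hypothetical, JSON plus SHA-256 stands in for the real block encoding and CID format, and a tiny chunk size is used in the example so the result is easy to inspect.

```python
import hashlib
import json

CHUNK_SIZE = 256 * 1024  # 256 KiB, as in the structure described above


def put_node(store, data, links):
    """Store one block and return its hash; identical blocks are stored once."""
    payload = json.dumps({"data": data.hex(), "links": links}).encode()
    cid = hashlib.sha256(payload).hexdigest()
    store[cid] = {"data": data, "links": links}
    return cid


def add_file(store, content, chunk_size=CHUNK_SIZE):
    """Split content into leaf blocks, then link them from one parent block."""
    leaves = [put_node(store, content[i:i + chunk_size], [])
              for i in range(0, len(content), chunk_size)]
    if len(leaves) == 1:
        return leaves[0]               # a small file fits in a single block
    return put_node(store, b"", leaves)


def read_file(store, cid):
    """Follow the links depth-first and concatenate the data."""
    node = store[cid]
    return node["data"] + b"".join(read_file(store, c) for c in node["links"])


store = {}
content = b"AAAA" + b"AAAA" + b"BBBB"
root = add_file(store, content, chunk_size=4)
restored = read_file(store, root)
```

The two identical "AAAA" chunks hash to the same value, so the store ends up holding two leaf blocks and one parent block rather than four blocks. This is the deduplication effect at block level.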

Why does IPFS need Git?

Once the content of a file changes, its hash changes too, so the old hash no longer matches. In other words, content stored in IPFS is immutable.
But what if we need to update the contents of the file?
To track file updates, IPFS introduces a version control model that is basically consistent with Git.
When we upload a file to IPFS for the first time, IPFS creates a Commit object. Its structure is roughly as follows:

  • parent: points to the previous commit; for the first commit it is empty.
  • object: the file content.

To update the file content, we first upload the new file to IPFS.
IPFS creates a new Commit object for us, and its parent points to the previous Commit object.
This way we can track changes to the file contents.
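
The parent/object chain can be sketched like this. It is a toy model under stated assumptions: the commit and history functions are hypothetical names, the object field holds a SHA-256 digest of the content rather than the content itself, and the commit ID is simply the hash of the JSON-encoded record.

```python
import hashlib
import json


def commit(store, content, parent=None):
    """Create a commit record pointing at the content hash and the previous commit."""
    record = {"parent": parent, "object": hashlib.sha256(content).hexdigest()}
    encoded = json.dumps(record, sort_keys=True).encode()
    commit_id = hashlib.sha256(encoded).hexdigest()
    store[commit_id] = record
    return commit_id


def history(store, commit_id):
    """Walk the parent pointers from the newest commit back to the first one."""
    versions = []
    while commit_id is not None:
        record = store[commit_id]
        versions.append(record["object"])
        commit_id = record["parent"]
    return versions


store = {}
v1 = commit(store, b"hello")
v2 = commit(store, b"hello, world", parent=v1)
versions = history(store, v2)   # newest first
```

Old content is never overwritten. An update just adds a new record whose parent is the previous one, so every earlier version stays reachable through the chain.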

IPFS cannot ensure that resources are always available

If all the nodes that hold a certain resource are offline, the resource cannot be downloaded until one of them comes back, just like a BitTorrent download with no seeders.
This problem needs a solution of its own.
IPFS offers two.

  1. An incentive mechanism that encourages nodes to store more files and stay online sharing them for a long time.
  2. Proactive distribution of files so that there is always a copy online.

The incentive mechanism is Filecoin, which we will talk about later.
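
Why more online copies help can be shown with a rough back-of-the-envelope calculation. The sketch below assumes that each node holding the file is online independently with the same probability, which real networks only approximate. The function names are hypothetical.

```python
def availability(p_online, replicas):
    """Chance that at least one of `replicas` independent nodes is online."""
    return 1 - (1 - p_online) ** replicas


def replicas_needed(p_online, target):
    """Smallest number of replicas whose availability reaches `target`."""
    if not 0 < p_online <= 1:
        raise ValueError("p_online must be in (0, 1]")
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    count = 1
    while availability(p_online, count) < target:
        count += 1
    return count


single = availability(0.5, 1)        # one node that is online half the time
needed = replicas_needed(0.5, 0.999)
```

Under these assumptions, a file held by a single node that is online half the time is reachable half the time, while ten such nodes are enough to pass 99.9%. Both solutions listed above work by pushing up either the number of copies or the time each copy spends online.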

What is the relationship between IPFS and blockchain?

Strictly speaking, the two are not related: IPFS does not use any blockchain technology. However, people in the blockchain industry tend to know about IPFS because Filecoin, another product from the IPFS team, is built on blockchain.

The relationship between IPFS and Filecoin

Filecoin is a blockchain application that can be loosely understood as a digital currency.
There is no direct dependency between the two; they are simply two products from one team.
However, IPFS provides the underlying support for Filecoin, and Filecoin in turn injects more vitality into IPFS.
As mentioned in the section on how IPFS works, we ultimately download files from other users' computers, which costs those users bandwidth. To encourage everyone to share, IPFS's block exchange protocol, BitSwap, keeps track of what peers send and receive so that nodes which share are favored.
If our computer has a lot of free storage space, we can offer it through Filecoin to store resources and share them with others, and we receive Filecoin rewards in return.

Why is IPFS called IPFS?

IPFS stands for InterPlanetary File System. The name was not chosen just because it sounds cool, futuristic, or sci-fi. The design really is suited to transmitting data between planets.
Juan's ambition goes beyond replacing today's Internet. Elon Musk is trying to find a way to take humans to Mars, but he certainly will not move everyone on Earth to Mars at once. In the future, it is very likely that some people will live on Earth and some on Mars.
If you want to fetch a web page from Earth while on Mars, the delay is very long: a signal takes 4-24 minutes one way, so a round trip takes 8-48 minutes. That delay is unbearable. However, once anyone on Mars has fetched the page, other Martians can avoid the delay by getting it from that person's computer.
IPFS therefore lives up to its name as an interplanetary transmission protocol.

How to use IPFS?

The official website of IPFS is ipfs.tech/.
There are two ways to install IPFS: the desktop client and the CLI.

We chose to download the desktop client.
We mainly participate in the IPFS network through this client.

The client has five menus on the left. I will introduce their functions one by one.

Status

This page shows overview data, including hosted data size, online nodes, node ID, agent version, UI version, and real-time bandwidth.

Files

Files can be uploaded and deleted here.

Click the import button to select the file or folder to upload.
The upload is very fast because the file is not actually sent anywhere. The client just hashes the file locally and announces the CID and our PeerID to the network.
After the upload is complete, we get a CID. Click the three dots on the right side of the file and select Copy CID to share this CID with other IPFS users.

After other users get the CID, they can search for it in the search box at the top of the IPFS client.

If the content can be found, they can preview or download it.
In addition, the IPFS protocol can also be used in the browser: simply enter ipfs://{cid} in the address bar to open the file directly. However, this requires the IPFS client to be running locally.
If IPFS develops smoothly, this function may be built into browsers in the future, removing the need to start a separate client.
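
For scripts rather than browsers, the same content is usually reached through an HTTP gateway using a /ipfs/{cid} path. The helper below is a small sketch: the function names and the example CID are hypothetical placeholders, and the gateway address http://127.0.0.1:8080 is an assumption about the local client's default configuration, so check your own settings before relying on it.

```python
from urllib.parse import quote

# Assumed address of the gateway exposed by a locally running IPFS client.
LOCAL_GATEWAY = "http://127.0.0.1:8080"


def native_url(cid, path=""):
    """Build an ipfs:// URL, as typed into a browser address bar."""
    return "ipfs://" + cid + ("/" + quote(path) if path else "")


def gateway_url(cid, path="", gateway=LOCAL_GATEWAY):
    """Build the equivalent HTTP URL served by a path-style gateway."""
    suffix = "/" + quote(path) if path else ""
    return gateway.rstrip("/") + "/ipfs/" + cid + suffix


example_cid = "bafyEXAMPLE"  # placeholder, not a real CID
browser_link = native_url(example_cid, "docs/read me.txt")
http_link = gateway_url(example_cid, "docs/read me.txt")
```

The resulting HTTP URL can be opened with any ordinary HTTP client as long as a gateway is listening at that address.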

Explore

Here you can perform CID searches to obtain addressable IPLD nodes, file objects, CID information, etc.

Peers

Here you can see information about other nodes, including each node's location, latency, node ID, transport protocol, and connection protocol.
The more nodes we are connected to, the better our experience will be.

Settings

Here you can modify the gateway used for shared links, pinning services, language, the IPFS JSON configuration, and so on.

Disadvantages of IPFS and Pin Services

After so many advantages, IPFS may seem like a technology steadily approaching perfection.
But is it really perfect? No. IPFS shares a shortcoming with every P2P application: if all the nodes holding a certain file are offline, we can no longer download the file.
This problem is easy to solve in principle. If we want a certain file to be available for download forever, we can store it on our local IPFS node and keep that node online all the time.

But obviously, a computer that stays on 24 hours a day is no different from a server.
Isn't this exactly what renting a cloud service does in Web 2.0? People offer the same thing in Web3.
This kind of service is called a Pin service (pinning service); to pin a file means to fix it in place.

The function of a Pin service is to keep a file pinned in the IPFS network so that it stays available.
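
What pinning means for a single node can be illustrated with a toy model. The sketch below assumes that a node treats unpinned blocks as a cache that may be cleaned up, while pinned blocks are kept. The Node class, its method names, and the block identifiers are all hypothetical.

```python
class Node:
    """Toy IPFS-like node with a block cache and a set of pinned CIDs."""

    def __init__(self):
        self.blocks = {}      # cid -> bytes held locally
        self.pinned = set()   # cids protected from cleanup

    def add(self, cid, data):
        self.blocks[cid] = data

    def pin(self, cid):
        if cid not in self.blocks:
            raise KeyError("cannot pin a block that is not stored: " + cid)
        self.pinned.add(cid)

    def unpin(self, cid):
        self.pinned.discard(cid)

    def collect_garbage(self):
        """Drop every block that is not pinned and return the removed CIDs."""
        removed = sorted(cid for cid in self.blocks if cid not in self.pinned)
        for cid in removed:
            del self.blocks[cid]
        return removed


node = Node()
node.add("cid-cat", b"cat picture")   # hypothetical identifiers
node.add("cid-dog", b"dog picture")
node.pin("cid-cat")
removed = node.collect_garbage()
```

After the cleanup only the pinned block remains. A Pin service is essentially someone else running such an always-online node and keeping our files in its pinned set.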

Pinata is a platform that specializes in providing IPFS Pin services for NFTs. It is also one of the largest Pin service platforms in the IPFS ecosystem.

But Pinata is not free, and neither are similar platforms. They charge according to certain rules, a bit like online storage in the Web 2.0 era.

Some people think this is not a viable long-term solution, because when the pinned files are very large, Pinata becomes very expensive. At the time of writing, the highest price tier is $1,000 per month.

In fact, it is not hard to conclude that today's Internet offers no storage that is 100% free, fully decentralized, and guaranteed to be permanent.

For this vision to be realized, everyone would need a network storage device that is powered on and online 24/7. It is a bit like what some science fiction novels imagine, where humans have a device implanted at birth. Perhaps humans could even rewrite their genes so that babies develop this device in the womb, like any other organ, except that it can store data and connect to other people.

I am convinced that this era will eventually come; how soon depends on progress in a series of fields such as energy, networking, storage, and computing.
