This article analyzes the characteristics of IPFS and compares it with other distributed file systems and with the Hypertext Transfer Protocol (HTTP).
Introduction

The Internet is a collection of a vast number of computing machines connected by protocols and physical devices. Most ecosystems within the Internet are based on the client-server (request-response) model, but this model is not fault-proof, and parts of the network fail from time to time. Peer-to-peer (P2P) systems, with or without a central coordinator, can distribute massive amounts of data efficiently. In 2014, Juan Benet proposed integrating existing, proven technologies (distributed hash tables (DHT), BitTorrent-like protocols, Git-based data models, etc.) into a new protocol and file system in which everyone has an equal right to share data, which is in line with the early ideas of the Internet. IPFS offers efficient data storage and distribution, data retention, an offline mode, and decentralized management.

Introduction to IPFS Theory

IPFS is a purely peer-to-peer distributed file system that focuses on removing central points from the architecture and providing the same functionality to every interconnected node in the network. All shared data and computing resources are stored at the edge of the network, and nodes run autonomously and share the required data with their peers. They communicate autonomously, distribute data, locate other nodes and the files they need, and use the same set of protocols. IPFS originated from Juan Benet's idea of a distributed, decentralized, shared Internet. It has existed since 2014 and is still under development. There are already several implementations (such as Go and JavaScript), as well as a set of tools, libraries, and APIs (application programming interfaces) in various programming languages. Some of the main features of IPFS are data persistence, a peer-to-peer foundation, complete decentralization, no central point of failure, and local connectivity without an Internet uplink.
The official white paper released by Juan Benet lays out his view of the protocol's architecture and modules. In general, IPFS draws on a number of existing technologies and shapes them into a single, modular protocol, leveraging their ideas and accumulated experience. We briefly introduce these technologies to better understand IPFS.

Starting at the bottom of the IPFS stack, the network layer transmits data and exchanges control information between nodes. The transmission itself can be implemented securely and reliably over various protocols (TCP, uTP, WebSocket, WebRTC, etc.); IPFS is not bound to any specific transport.

Moving up to the routing layer, a distributed hash table (DHT) stores and manages metadata within the system. This metadata describes which nodes are connected at a given point in time and provides a mechanism for finding data quickly and efficiently. Kademlia is central to the routing layer: it provides an efficient way to locate metadata in large networks with low coordination overhead. Coral DSHT improves scalability by querying nearby nodes that are able to store the data and by allowing data to be stored on more distant nodes as well. S/Kademlia further hardens the system against malicious attacks by requiring nodes to create PKI (Public Key Infrastructure) key pairs, which are used to generate identities and sign messages. Nodes on the same local network find each other via multicast DNS (Domain Name System).

The exchange layer ensures the transfer of blocks between nodes.

Further up the stack, the Merkle Directed Acyclic Graph (Merkle DAG) is the protocol's primary data model, largely inspired by Git's data structures. The nodes of the data graph are objects cryptographically addressed by their content, and the links between them are hash references to other objects. Each piece of data is uniquely identified by its immutable hash reference (and is therefore stored only once, i.e. deduplicated), and the system can detect corrupted data through checksums.

The last layer of the stack is the naming layer. A unique identifier for each node is generated cryptographically on the node itself from a PKI key pair. The InterPlanetary Name System (IPNS) is a scheme for naming mutable content: blocks of data have immutable hash references, so whenever their content changes, the hash reference changes as well. IPNS follows the self-certifying file system approach, so a node can publish data under its own unique node identifier. If the data changes, its hash reference changes, but the node can republish the new reference under the same node identifier. IPNS can also be combined with DNS to provide human-readable addresses.

At the top is the application layer, where developers can design and implement new distributed, decentralized technologies on top of the capabilities provided by the stack.
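To make content addressing in the Merkle DAG more concrete, the sketch below (in Go, not taken from any real IPFS implementation) builds a toy object store: each object is identified by the SHA-256 hash of its bytes, links are simply hashes of other objects, identical content deduplicates automatically, and corrupted data fails its checksum on retrieval. The node, put, and get names are invented for this illustration; real IPFS objects use canonical IPLD encodings and multihash-based CIDs rather than bare SHA-256 hex digests.

// Toy content-addressed object store illustrating the Merkle DAG idea
// (a simplified sketch, not the IPFS data model).
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// node is a simplified Merkle DAG object: raw data plus hash links to children.
type node struct {
	Data  []byte
	Links []string // hex-encoded hashes of child objects
}

// store maps a content hash to the object it identifies.
var store = map[string]node{}

// hashNode derives the object's identifier from its content and links.
func hashNode(n node) string {
	h := sha256.New()
	h.Write(n.Data)
	for _, l := range n.Links {
		h.Write([]byte(l))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// put stores an object under the hash of its content. Storing identical
// content twice yields the same key, so data is deduplicated.
func put(n node) string {
	id := hashNode(n)
	store[id] = n
	return id
}

// get retrieves an object and re-checks its hash, so corrupted data is
// detected by the checksum instead of being trusted blindly.
func get(id string) (node, bool) {
	n, ok := store[id]
	if !ok || hashNode(n) != id {
		return node{}, false
	}
	return n, true
}

func main() {
	// Two leaf blocks and a directory-like parent that links to them.
	a := put(node{Data: []byte("hello")})
	b := put(node{Data: []byte("world")})
	root := put(node{Links: []string{a, b}})

	fmt.Println("leaf a:", a)
	fmt.Println("leaf b:", b)
	fmt.Println("root  :", root)

	if n, ok := get(root); ok {
		fmt.Println("root links to", len(n.Links), "objects")
	}

	// Adding identical content again produces the identical identifier.
	fmt.Println("deduplicated:", put(node{Data: []byte("hello")}) == a)
}

Because the root object stores only the hashes of its children, changing any leaf changes every hash up to the root, which is how a Merkle DAG keeps references immutable and makes tampering detectable.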
IPFS vs. other DFS

This section discusses various aspects of distributed file systems (DFS) and of the Hypertext Transfer Protocol (HTTP).

NFS (Network File System) is an open protocol based on the RPC (Remote Procedure Call) protocol, developed by Sun Microsystems in 1984 and originally built on UDP/IP. Its main characteristic is that it has a central control point. NFS allows a common file system to be shared among multiple users and offers the advantages of data centralization, minimizing the required storage space. Comparing early NFS with IPFS, NFS uses servers and idempotent, stateless operations to keep data in the system synchronized, whereas the IPFS architecture does not rely on servers, because data, addressed by its hash references, can be shared directly among the protocol's users. IPFS handles write operations synchronously or asynchronously within each autonomous node, and its users can share data across the network through metadata exchange and lookup as long as they know the data identifiers.

AFS (Andrew File System) is a distributed file system designed by Carnegie Mellon University and IBM in the 1980s. Its main purpose is to manage files distributed across different nodes of a network. It uses a group of trusted servers to present clients with a homogeneous, location-transparent file name space. The main goal is scalability, with particular attention paid to the design of the protocol between client and server. Files are stored and cached as a whole on the local disk: when a client wants to access a file, it fetches the file from the server and caches it locally, and the server registers a callback so it can notify the client if the file is modified later. IPFS mechanisms can be used to implement similar callback and caching systems while remaining decentralized, with no single point of failure.

GFS (Google File System) is a proprietary distributed file system designed by Google to store massive amounts of search data, focusing on scalability, baseline performance, and low-cost hardware. Google favored appending data rather than rewriting it, building a self-sustaining file system with supervised recovery and a master/chunkserver architecture that stores large amounts of data replicated across multiple servers. The design of GFS has some similarities to IPFS: it uses multiple chunk servers, data blocks, and replica machines to cope with crashes. Compared to IPFS, however, GFS's metadata is still managed centrally and coordinated by the master server, whereas IPFS's data is stored throughout the network itself.

HTTP (Hypertext Transfer Protocol) is one of the most widely used protocols in the world for data exchange on the Web. It follows the classic client-server architecture, where the server is a machine reachable over the Internet and the client is typically a browser. The entire mechanism relies on request (data) and response (data/status) interactions between client and server. It is characterized by being simple, scalable, and stateless, and by having a central point of control. HTTP still works well, but problems gradually arise: what happens if a resource is deleted, corrupted, or taken down by its provider?

All of the technologies used in the file systems and protocols above bring innovative mechanisms for data distribution: AFS's callbacks, NFS's idempotence and simple retries after crashes, GFS's scalability and low-cost hardware design, HTTP's simplicity and longevity. Yet they all rely on the same kind of central control point.
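To illustrate the recurring difference between these systems and IPFS, the following Go sketch contrasts location addressing (a URL pointing at one host) with content addressing (an identifier that is the hash of the data itself). The URL, peer list, and in-memory maps are hypothetical stand-ins rather than real services or IPFS APIs; the point is only that a content hash can be served by any peer that holds a copy and verified by the requester, while a deleted or altered location simply breaks.

// Location addressing vs. content addressing (illustrative sketch only;
// the "hosts" and "peers" here are plain in-memory maps).
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func hashOf(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// Location addressing: a name points at a single host; the client gets
// whatever that host currently serves, or nothing at all.
var hostContent = map[string][]byte{
	"https://example.com/article": []byte("original article text"),
}

// Content addressing: the name is the hash of the data, so any peer that
// still holds a matching copy can serve it, and the client can verify it.
var peers = []map[string][]byte{
	{}, // this peer dropped the data (e.g. the original publisher went offline)
	{hashOf([]byte("original article text")): []byte("original article text")},
}

func fetchByLocation(url string) ([]byte, bool) {
	data, ok := hostContent[url]
	return data, ok
}

func fetchByContent(id string) ([]byte, bool) {
	for _, p := range peers {
		if data, ok := p[id]; ok && hashOf(data) == id {
			return data, true
		}
	}
	return nil, false
}

func main() {
	// The provider removes the resource: the location-based link goes dead.
	delete(hostContent, "https://example.com/article")
	if _, ok := fetchByLocation("https://example.com/article"); !ok {
		fmt.Println("location addressing: resource is gone")
	}

	// The same bytes are still reachable from any peer that kept a copy,
	// and the hash check confirms they were not altered.
	id := hashOf([]byte("original article text"))
	if data, ok := fetchByContent(id); ok {
		fmt.Println("content addressing:", string(data))
	}
}

This is the property behind the question raised above: as long as some node in the network still holds the blocks, a content-addressed reference keeps working even after the original provider disappears.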
Summary

In practice today, most technologies are still based on the classic client-server model. This model has existed since the birth of the Internet and still largely meets users' needs. Developers and engineers need to focus on optimizing applications to minimize computation time and response latency and to improve our current Internet systems across the board.

IPFS attempts to solve the Internet's problems by changing the entire perspective on how data is distributed, stored, and managed, while keeping an open interface to other protocols that may be used alongside it. IPFS still has plenty of room for improvement, and whether it can become the next generation of Internet protocol remains to be seen.