Decentralized storage systems share storage responsibilities among many independent operators forming a single storage network.
Introduction to Decentralized Storage
Modern computing is highly centralized. Over the last decade, a few massive cloud companies have made enormous fortunes carving up traditional computer systems into compartmentalized, cloud-based offerings. The modern web reflects that centralization: when one of these providers has an outage, it’s a major internet event. (If you don’t believe us, we’d like to remind you of the Amazon Web Services S3 outage in 2017, GitHub’s extended interruption in June 2020, or Microsoft’s multi-week cloud service problems in October 2020.)
The content we host on these services is no better, stashed behind brittle links that break all too often. This has profound implications for the computer systems we build, and for the societies that increasingly rely upon them.
Centralized architectures have been successful in part because they are easier to build.
In order to push back against consolidation, developers need foundational new building blocks that are just as easy to compose. Decentralized storage is one such cornerstone, serving as a precondition for a more distributed web.
Fundamental Characteristics of Decentralized Storage
There are many different ways to design a decentralized storage system. In general, they share a common emphasis on resiliency and efficiency.
The modern internet is frighteningly fragile. Web content today sits behind URLs, each of which belongs to a single definitive server at any given moment in time. If that provider drops off the network for any reason, the content it pointed to becomes inaccessible. Centralization increases this effect, creating single points of failure and providing convenient opportunities for censorship.
As a result, in today’s internet, link rot (which is just what it sounds like: links that break or whose content becomes permanently unavailable) is pervasive, state-level censorship is straightforward, and distributed denial-of-service attacks can disrupt access to almost any file.
In an ideal decentralized system, the loss of an operator should not prevent access to the content previously stored and served. By spreading responsibilities across many nodes of a network, decentralized systems also have a natural resistance to censorship and other attempts at denial-of-service because there is no centralized target against which attackers can mass resources.
A go-to example of how centralized storage systems can be susceptible to censorship is what happened when Catalonia (one of Spain’s 17 autonomous communities) held an independence referendum. The Spanish government, which opposed the independence plans, blocked websites with voting information at the ISP level. By severing these critical links, the government effectively prevented many individuals from accessing that information.
However, many of these websites were also mirrored using the Interplanetary File System (IPFS), a peer-to-peer storage network. Anybody running an IPFS node could download the censored information from other nodes on the network and begin to share it themselves. The decentralized nature of IPFS countered the Spanish government’s attempts to block access to these documents — as soon as one node was blocked, another could easily take its place. In general, decentralized storage systems make network-level blocking much harder.
All computing architectures have strengths and weaknesses, and no single solution fits every use case. The modern web’s emphasis on centralization is no exception.
Today, a few centralized data centers in a small number of cities around the world store most content. If two users on the same network want to send messages to each other, for example, those messages will usually go to one of those data centers first. If one hundred users are in a room watching the same video on their devices, they’ll each hit a central server and download one hundred copies in parallel, as opposed to downloading a single copy and sharing it over the local network.
In the simplest terms, decentralized storage makes it easier to share files without sending requests bouncing across the internet to a select few data centers. Instead, nodes establish connections with one another using as few intermediaries as possible. Connecting to nodes in other countries, for example, will still require several hops, but nodes on the same network can share files directly. The end goal for decentralized storage systems is to have so many nodes that everyone can find relatively local peers holding the information they are looking for.
Decentralized storage solutions can introduce fundamental new efficiencies into such activities. By bypassing sparse data centers, a distributed system can place nodes far closer to end-consumers than even modern content delivery networks, resulting in significantly faster file retrieval. Peer-to-peer file sharing over local networks can also save precious bandwidth, particularly in areas with limited access to the broader internet.
Desirable Characteristics of Decentralized Storage
While resiliency and efficiency are hallmarks of decentralized storage, there are a number of additional characteristics that an ideal storage system might offer:
An ideal distributed system should be accessible. Participation in the network should be easy, allowing as many nodes as possible to store and distribute files on behalf of the network.
If you’re reading this and wondering — can I be a node? The answer is: it depends.
With Filecoin, any relatively tech-savvy individual should be able to run a client node to interact with the network. As for running storage miner nodes (see below for more information), it’s not something that everyone and their mom can do — you need to have hardware that meets certain specifications.
In the case of IPFS, nodes have lower hardware requirements, which means it is possible for many more users to contribute to the network by running a node (perhaps by running a web browser that came with one built in).
Cloud service providers have made cheap and reliable storage easier than ever to work with. One major aspect of their success is the ability to provision and manage storage through code via APIs. Any competing system should be able to offer the same level of convenience.
As discussed, URLs embody some inherent design tradeoffs. They describe the location of data, rather than its content.
To explain how centralized systems can make it hard to find a piece of data, imagine that you want to download a picture of a fluffy kitty. Consider these two URLs:

http://example1.com/cat.jpeg
http://example2.com/cat.jpeg
Each of these URLs references a file called cat.jpeg, but there’s no guarantee that these two files are the same. If example1.com goes offline, you can’t be sure that example2.com has what you’re looking for — its cat.jpeg could be entirely different. In fact, it could even be a picture of a dog! There’s no inherent relationship between a URL and the content it references.
As a result, there’s no way for you to ask the internet of today, “Does anyone out there have this file?” because you don’t know anything about the file other than its location.
When you share files using a URL, things can go wrong. The server could start serving a different file from that URL, or someone could perform a (surprisingly not that rare) man-in-the-middle attack and alter the file. It’s very difficult to verify that everyone accessing the URL receives the file they wanted.
Content addressing, by contrast, finds files based on content identifiers (CIDs), which serve as files' digital fingerprints. Addressing files in this manner solves many of the issues with location addressing. When a client wants a file, instead of requesting it from the one server behind a URL, they ask nodes in the network for a file with a particular CID. Once the client downloads the file, they fingerprint it themselves to verify they received the right one.
To revisit our previous example, it would be as if every website shared an understanding of which file to deliver when asked for cat.jpeg. No single node is guaranteed to have that particular cat.jpeg, but any node that does can prove it, because the client checks the downloaded file’s fingerprint against the CID it asked for.
Fingerprinting files by hand would demand more technical savvy than the average person wants to deal with, but Filecoin and IPFS clients automate the process. This lets the client guarantee that they received the file they asked for, and it makes finding alternate providers of a piece of data trivial.
The main takeaway: CIDs mean that you can find content that would otherwise be missing in a centralized system, and CIDs can also prevent man-in-the-middle attacks or a server suddenly changing a file at a particular URL.
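The fetch-then-verify loop described above can be sketched in a few lines of Python. This is a simplified illustration, not how IPFS or Filecoin actually derive CIDs: a bare SHA-256 digest stands in for a real CID (which also encodes multihash and codec metadata), and plain callables stand in for network peers.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Fingerprint the content itself. A bare SHA-256 hex digest
    stands in for a real CID here (illustration only)."""
    return hashlib.sha256(data).hexdigest()

def fetch_verified(cid: str, providers) -> bytes:
    """Ask any provider for the content, then verify it locally."""
    for provider in providers:
        data = provider(cid)  # any node on the network may answer
        if data is not None and content_id(data) == cid:
            return data       # fingerprint matches: content is authentic
    raise LookupError("no provider returned matching content")

# Two independent "nodes" holding the same bytes are interchangeable:
kitty = b"...jpeg bytes of a fluffy kitty..."
cid = content_id(kitty)
node_a = lambda c: kitty if c == cid else None
node_b = lambda c: kitty if c == cid else None
assert fetch_verified(cid, [node_a, node_b]) == kitty
```

Because the client verifies the fingerprint itself, a provider serving a dog picture under the kitty's CID is simply skipped; there is no server to trust.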
A trustless system enables cooperation between two parties without them having to know one another or look to a third party. Rather, the incentives of the system push actors towards the behavior necessary for the network to function.
An ideal storage system should make it easy to continuously prove that nodes are storing the exact data they have promised. This type of auditability is key in achieving trustlessness. If you can always establish that data is being stored correctly, you have less need to trust the party providing the storage.
Finally, an ideal distributed storage system is open: its code is open-source and auditable. Furthermore, the storage system should not be monolithic. Instead, it should expose an open protocol that anybody can implement and build upon, rather than encouraging lock-in.
Case Study: How Filecoin Embodies These Characteristics
The Filecoin project is a decentralized storage system designed to satisfy these properties. First described in 2014, the Filecoin protocol was originally developed as an incentive layer for the Interplanetary File System (IPFS), a peer-to-peer storage network. Like IPFS, Filecoin is an open protocol, and it builds on the properties of its older sibling, leveraging the same underlying peer-to-peer and content-addressing functionality.
A network of Filecoin nodes gives rise to a decentralized storage marketplace for the retrieval and storage of files. The network is backed by a novel blockchain that records commitments made by the network’s participants. Users make transactions on the network using the blockchain’s native cryptocurrency, FIL (⨎).
In the retrieval market, nodes known as retrieval miners compete to serve files to clients as quickly as possible. Retrieval miners earn rewards through small FIL fees. This gives nodes in key locations for content delivery an incentive to join the network, and promotes the rapid distribution of files. It also encourages a robust network that replicates and preserves files that are in high demand.
In Filecoin’s storage market, nodes called storage miners compete on characteristics such as price and location for contracts to take custody of clients' files for a specified length of time. Before accepting a contract, a storage miner has to put up FIL as collateral; this collateral is used to automatically reimburse the client in the event that the miner fails to meet its obligations.
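The collateral mechanism can be modeled with a toy settlement function. This is a sketch of the incentive logic only; real Filecoin deals are on-chain actors with far richer state, and the class, names, and amounts below are all hypothetical. The miner's collateral is assumed to be already escrowed (deducted from its balance) when the deal is struck.

```python
from dataclasses import dataclass

@dataclass
class StorageDeal:
    """Toy model of a Filecoin-style storage deal (illustrative only)."""
    client: str
    miner: str
    price_fil: float       # deal fee the client pays on success
    collateral_fil: float  # FIL the miner locked up front (already escrowed)
    balances: dict

    def settle(self, miner_met_obligations: bool) -> None:
        if miner_met_obligations:
            # Miner earns the deal fee and recovers its collateral.
            self.balances[self.miner] += self.price_fil + self.collateral_fil
            self.balances[self.client] -= self.price_fil
        else:
            # Collateral is forfeited to reimburse the client automatically.
            self.balances[self.client] += self.collateral_fil

balances = {"alice": 100.0, "miner1": 0.0}
deal = StorageDeal("alice", "miner1", price_fil=5.0,
                   collateral_fil=20.0, balances=balances)
deal.settle(miner_met_obligations=False)
assert balances["alice"] == 120.0  # client reimbursed from collateral
```

The point of the design is that neither party needs to trust the other: the miner's losses from breaking a deal are locked in before the data ever changes hands.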
When a storage miner and a client reach a deal, the client transfers the data to the storage miner, who adds it to a sector, the fundamental unit of storage in Filecoin. The miner then performs a computationally intensive operation known as sealing to create a unique copy of that sector’s data.
If a client wants to store multiple unique copies of their data, the sealing process ensures that each copy will have a unique fingerprint, and the computational effort needed to derive it will prevent a node from cheating by regenerating it from the base data. The sealed data is ultimately used to publish a proof-of-replication to the Filecoin blockchain.
For the duration of the storage deal, the storage miner is periodically required to submit what is called a proof-of-spacetime to the blockchain. The miner derives these proofs using randomness (provided by the blockchain itself), the sealed sector, and the proof-of-replication published to the blockchain. The proofs give the client a strong probabilistic argument that the storage miner still possesses a complete, unique copy of the data. This is a very strong guarantee, one that even modern cloud storage providers don’t offer their clients.
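The challenge-response shape of this auditing can be shown with a toy sketch. To be clear about what is simplified: real sealing is a slow, specialized encoding rather than a single hash, real proofs are succinct so the verifier needs only a small on-chain commitment rather than the sealed replica itself, and every function name below is hypothetical. The sketch captures only the core idea, which is that fresh randomness forces the prover to touch its unique sealed copy on every challenge.

```python
import hashlib
import secrets

def seal(sector: bytes, miner_id: bytes) -> bytes:
    # Stand-in for sealing: bind the sector's data to this miner so
    # the resulting "replica" is unique to them. (Real sealing is a
    # computationally expensive encoding, not a cheap hash.)
    return hashlib.sha256(miner_id + sector).digest() + sector

def prove(sealed: bytes, challenge: bytes) -> bytes:
    # The prover must consult the sealed replica to answer.
    return hashlib.sha256(challenge + sealed).digest()

def audit(sealed_copy: bytes, prover) -> bool:
    # Fresh randomness (the chain supplies this in Filecoin) means
    # answers cannot be precomputed and replayed.
    challenge = secrets.token_bytes(32)
    return prover(challenge) == prove(sealed_copy, challenge)

sector = b"client data" * 100
sealed = seal(sector, b"miner1")
honest = lambda ch: prove(sealed, ch)
assert audit(sealed, honest)
```

A miner that discarded the sealed replica, or that tried to answer from someone else's copy, would fail the next random challenge with overwhelming probability, which is what gives repeated audits their probabilistic force.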
Clients reward Filecoin storage miners with FIL paid as deal fees. Storage miners are also rewarded with the opportunity to mine blocks for the blockchain, which entails both a FIL reward and the ability to collect transaction fees from others who wish to include a message in mined blocks.
Filecoin's proof system means that miners need some additional hardware, but requirements are still low enough for tech-savvy individuals to join. The hardware requirements for participating in the network as a client are modest. Filecoin nodes also expose an API for programmatic interaction with the network, allowing third-party services to build on top of the core network functionality.
Decentralized storage offers a compelling alternative to its traditional, centralized counterpart. It gives developers the chance to explore entirely new regions of the design tradeoff space, emphasizing the robustness and efficiency of content storage and delivery. Filecoin shows that these systems are capable of providing a competitive storage product with several highly desirable properties, affording more people than ever the opportunity to serve as custodians of our digital heritage, while making the web more resilient and accessible to people all over the world.