We talk to W3C board vice-chair Robin Berjon about the InterPlanetary File System

The decentralized web is alive and well despite Web3 financial scheming

Interview The InterPlanetary File System (IPFS) debuted nine years ago with the hope of changing the way people interact with content online. It remains an ongoing project.

IPFS is part of what's known as the distributed web, a set of decentralized technologies sometimes referred to as Web3 until the NFT-cryptocoin clown car crash demanded a change of jargon.

But IPFS is just a technology, not a predatory financial gambit. It is a set of peer-to-peer protocols for finding content on a decentralized network. It relies on a Content Identifier (CID) rather than a location (URL), which means the focus is on the identity of the content (a hash) rather than a server where it's stored.
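The gist of content addressing can be sketched in a few lines of Python. This is only a loose illustration (a real CID wraps a multihash and a codec in a multibase string, not a bare SHA-256 hex digest), but the core property is the same: the identifier is derived from the content itself, not from wherever it happens to be stored.

    # Loose illustration only: real IPFS CIDs encode a multihash plus codec
    # in a multibase string, but the core idea holds -- the identifier is
    # computed from the content, not from the server holding it.
    import hashlib

    content = b"Hello, decentralized web"
    content_id = hashlib.sha256(content).hexdigest()

    # Anyone holding the same bytes derives the same identifier,
    # so any peer that has the data can serve it verifiably.
    print(content_id)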

IPFS focuses on representing and addressing data, routing it, and transferring it. It's not a storage service, though storage is necessary to use it. It's been adopted by Cloudflare and implemented in Brave and Opera, among others, and work is underway to support it in Chromium.

IPFS traffic is public unless encrypted, which is why there are rival decentralized projects, such as Veilid, that strive for stronger built-in privacy protection.

Notionally decentralized, IPFS is in practice rather centralized. A 2023 research paper from academics in France, Germany, and the UK observed "that almost 80 percent of the IPFS DHT servers are hosted in the cloud with the top three cloud providers hosting 51.9 percent of the servers." It further claims that AWS handles 96 percent of all content resolution requests.

Still, the idea remains an intriguing one. So The Register spoke with Robin Berjon, IPFS ecosystem lead at Protocol Labs and board vice-chair of the World Wide Web Consortium (W3C), to learn more about how IPFS is doing.

In numeric terms, that's hard to say. Brian Bondy, Brave CTO and co-founder, told The Register in an email that since Brave introduced native IPFS integration in 2021, the browser maker has seen local nodes grow by a factor of five.

"Our verified content creators have also grown from 1 million to 1.8 million since we integrated IPFS, and we want to be able to provide them with the option to share their content in a decentralized manner, without high bandwidth costs," said Bondy.

When it comes to Protocol Labs, Berjon said his firm's network probes show 300,000 to 400,000 nodes – IPFS instances running on a computer – which he characterized as pretty solid. But he said he didn't have insight into the total amount of traffic being handled. He did, however, have quite a bit to say in more general terms.

The Register: How do you separate IPFS from the cryptocoin ecosystem, which has taken something of a reputational hit in recent years?

Berjon: I think it's unfortunate that reputationally there's some intermixing. But I do get a sense that there's a greater nuance developing in how people are perceiving these different things.

So, it used to be that if you were anywhere near blockchain, people would go like, "Oh, terrible, go away." Now people are starting to realize that what has been lumped [together], even just on the blockchain end of things, contains all sorts of different communities. Sure, you have like shitcoins and all that stuff, but you also have people experimenting with [decentralized autonomous organizations] and people trying to really build new government systems and new ways of sharing and new ways of doing coops.

And so I think people are starting to perceive that there's a richness of modifications there. And also that there's a lot of technical bits and pieces that can be reused and applied independently from what you might perceive as the Bitcoin ideology or however that wants to be called.

The Register: How is IPFS being used right now? And how would someone start using it?

Berjon: As with every interesting technology, the answer is it depends. Some browsers are starting to support IPFS natively. That is the case with Brave for instance, which has had native IPFS support for a while. There's also work integrating IPFS into Chromium, which is the engine underlying Chrome amongst others.

There's also IPFS support in cURL, the command-line transfer tool, and there are extensions that will support IPFS in the browser. On top of that, there's a series of IPFS gateways, [which are] systems that are purely HTTP in how they expose IPFS, [meaning] you can take any vanilla off-the-shelf browser and point it at one of these.
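Because gateways speak plain HTTP, a minimal client needs nothing beyond a stock HTTP library. The Python sketch below is illustrative only (the gateway address is one public option among many, and the CID is a placeholder to swap for a real identifier):

    # Minimal sketch: fetch content by CID through a public HTTP gateway.
    # The gateway URL is one public option; the CID below is a placeholder.
    import urllib.request

    GATEWAY = "https://ipfs.io/ipfs/"       # or a self-hosted gateway
    cid = "<your-content-identifier-here>"  # e.g. a CID printed by `ipfs add`

    with urllib.request.urlopen(GATEWAY + cid) as response:
        data = response.read()

    print(len(data), "bytes retrieved for", cid)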

In terms of usage, I tend to think of IPFS not necessarily as a single technology. It's more like a family of related technologies. There's content identifiers, there's multiformats, there's IPLD, which is a data format that underlies it. And these tend to be used in different places for different use cases.

The idea really is this is all plumbing. These are all relatively low-level, relatively infrastructural pieces. The goal is for people to say, 'I want to do this thing' or whatever, and for IPFS or some part of the IPFS ecosystem to be the right tool for the job.

We're starting to see a lot of usage here and there with these bits and pieces. For instance, Bluesky is pretty popular these days. And in the AT Protocol they use quite a few of these pieces. They use CIDs (content identifiers), they use IPLD, and I think that's something that we're leaning into quite a lot because that is popular with developers.
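The common thread in those pieces is hash-linking: records refer to each other by the hash of their content rather than by where they live. A rough sketch of that idea, with bare JSON and SHA-256 standing in for the real codecs (IPLD actually uses formats such as DAG-CBOR and proper CIDs):

    # Rough sketch of hash-linking, the idea underlying IPLD and Merkle DAGs.
    # Real IPLD encodes nodes with codecs like DAG-CBOR and links via CIDs;
    # bare JSON plus SHA-256 is used here purely for illustration.
    import hashlib
    import json

    def put(store, node):
        """Store a node under an identifier derived from its own bytes."""
        data = json.dumps(node, sort_keys=True).encode()
        node_id = hashlib.sha256(data).hexdigest()
        store[node_id] = data
        return node_id

    store = {}
    post_id = put(store, {"type": "post", "text": "hello from the DAG"})
    # The parent links to the child by content hash, not by URL or location.
    feed_id = put(store, {"type": "feed", "items": [post_id]})

    print(feed_id, "->", json.loads(store[feed_id]))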

The Register: How is IPFS in terms of latency these days? Traditionally peer-to-peer systems haven't been all that speedy.

Berjon: If you're using relatively naive approaches, it can be slow. But there's another way of answering that: It depends on how the client is implemented. If a client just does block-by-block fetching and only approaches the network in a very basic manner, it is likely to be slow.

As with all other peer-to-peer technologies, you have to contact a whole bunch of things on the network and find your way to the content. However, there's been work on making clients fast by parallelizing these requests.
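To see why parallelism matters, consider racing several sources for the same CID and keeping whichever answers first. The sketch below does this crudely over HTTP gateways (the gateway list and CID are illustrative assumptions; real clients parallelize at the peer and block level rather than per whole request):

    # Crude sketch of parallel retrieval: race several sources for one CID
    # and take the first successful answer. Gateways and CID are assumptions;
    # real clients parallelize at the peer/block level, not whole requests.
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor, as_completed

    GATEWAYS = [
        "https://ipfs.io/ipfs/",
        "https://dweb.link/ipfs/",
        "https://cloudflare-ipfs.com/ipfs/",
    ]
    cid = "<your-content-identifier-here>"

    def fetch(base):
        with urllib.request.urlopen(base + cid, timeout=10) as resp:
            return base, resp.read()

    with ThreadPoolExecutor(max_workers=len(GATEWAYS)) as pool:
        futures = [pool.submit(fetch, g) for g in GATEWAYS]
        for future in as_completed(futures):
            try:
                winner, data = future.result()
            except Exception:
                continue  # that source failed; wait for the next
            print(winner, "answered first with", len(data), "bytes")
            break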

Also, there's been very interesting work from a company called Number Zero on their IPFS implementation, which is called Iroh. They've rewired and rethought how the network layer works in a way that makes it extremely fast.

The Register: The internet seems to be at an inflection point. AI hype and regulatory intervention have encouraged efforts to challenge Google's dominance and its ad-based business model. How do you foresee IPFS fitting into, or shaping, networked communication in the near term?

Berjon: If I had a surefire answer for that, I'd be working on getting very rich, but I can at least share how I think it could work. [IPFS] is all about the content. That's why it's content-addressable retrieval, versus what we have today, which is authority-centric or location-centric. You're getting something and you have to get all of it from that same authority.

With content-centric addressing, you can get bits and pieces from different sources. That shifts the power from this unique server that becomes a throttling point – someplace where you can control everything – to something that can be much more client-oriented.

If you think of how search or social or all those ad-supported models work today, the idea really is that you have to go to this central authority and get not just one service but actually a bundle of services. You're getting a list of social media posts, you're getting a list of search engine results, but bundled with that, you're also getting the relevance decided for you – they have been rated and they have been ranked. They have been ordered for you. And you're also getting the advertising because it basically has to all come as a single bundle in order to work.

If you shift that power to the client, which a content-centric approach enables, you can start having much more interesting models. For instance, the browser, or whatever is working for you as your agent client-side, makes the decisions in terms of how things get composed.

So as a user, you could say, "Sure, I'm fine with ads, so long as they're ads that fit certain privacy parameters or certain preferences that I have and I'm comfortable with that, you know, making more or less money, depending on things I've decided."

But basically my browser is picking the ads for me, right? My browser knows me. It's going to respect my privacy, it's going to select the ads. So you have the browser selecting the ads and getting the money for those ads. And then the browser can, when I'm using, say, a social media service or search service, pay for that using the ad money I'm getting. And that means I can pay for the content that I'm getting from this service. And I can pay maybe for the ranking or the recommendations algorithm from this other service, and you can start to see things blend together in a way that gives us a lot more user choice and user agency, and is nevertheless as powerful as what we have today, but with a lot more choice.

And so I think any economic system that shifts power towards users is one that matches IPFS principles and the IPFS architecture really well. And I'm working on bringing about the world in which that's the case.

The Register: At this particular moment, when misinformation is rampant, it seems like authority of some sort has more value than ever as a filter for harmful content. Can you have the safety mechanism of authority in a content-centric system without the downsides of gatekeeping?

Berjon: I think that's a very good question and I would say that's the right way of thinking about it. But essentially, if you think of how – I'm sorry, I can't help myself but I'm gonna go to a tiny bit of philosophy – but if you think of how we conceive of truth, we tend to have this very individualistic approach to truth.

We tend to think that we basically decide things for ourselves, and we know that for ourselves, and we've made that decision for ourselves. But most of the time, that's not true, right?

Overwhelmingly what we know to be true is stuff that we know to be true because we trust a very dense network of institutions, formal or informal, to have produced that knowledge.

For instance, if you trust one source of news more than others – say you're a Washington Post reader, and that's what you like – you also implicitly trust that if they come up with something that's wrong, they will be caught by the others who are more peripheral to how you think about this.

So this dense network of institutional arrangements that supports trusting truth is actually undermined by an excessive authority system, because too much of that trust, in load-bearing terms, relies on that single authority.

And in fact, if you start shifting things towards a more content-centric approach, then you can actually get multiple sources of authority, collaborating on the verification of that source of truth.

Starling Labs did this really interesting partner project [with] Reuters, which was called the 78 Days, where they basically documented the Trump-to-Biden transition process photographically. And the entire system is built around this idea that you can have verified pictures, and that you want this chain of authenticity from the capture of the picture all the way to when it's presented on your screen.

And if you look at the companies that were involved, you can see it was extremely collaborative. For the capture, they worked with, like, Canon, they worked with HTC to get access to encryption on the camera that provided some verification of authenticity of that capture. And then they worked with a bunch of other projects, like the Guardian Project, to sort out what truth would be. And then they distributed those images and the hashes of those images, all the IPFS-related stuff, working with IBM, with Filecoin and IPFS as well. They worked on Hyperledger, using sometimes chain storage, sometimes distributed storage, all kinds of systems that all work with this kind of verified hashed data.
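That chain of authenticity rests on a simple property of content addressing: wherever the bytes came from, they either match the expected hash or they don't. A minimal sketch of the check (the real 78 Days pipeline layers signed capture metadata and full CIDs on top of this, not a bare SHA-256):

    # Minimal sketch of verification: however the bytes arrived, they either
    # match the published digest or they don't. Real systems verify full
    # CIDs/multihashes and signed capture metadata, not a bare SHA-256.
    import hashlib

    def verify(data: bytes, expected_sha256: str) -> bool:
        return hashlib.sha256(data).hexdigest() == expected_sha256

    original = b"photo bytes as captured"
    expected = hashlib.sha256(original).hexdigest()  # published with the image

    tampered = b"photo bytes, subtly edited"
    print(verify(original, expected))  # True
    print(verify(tampered, expected))  # False: any change breaks the chain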

They also worked with Adobe and the Content Authenticity Initiative to have all this metadata embedded in the objects and tracked.

This is to say that with something that's content-centric, all these companies could come together in very loose coordination. … Because it was content-centric and verifiable, they could get all these people to collaborate, using standard, mathematically provable systems. And this creates a much more powerful, spread-out authority behind the content that they produce. ®
