Decentralized Syndication – The Missing Internet Protocol

96 comments

·January 10, 2025

glenstein

While everyone is waiting for Atproto to proto, ActivityPub is already here. This is giving me "Sumerians look on in confusion as god creates world" vibes.

https://theonion.com/sumerians-look-on-in-confusion-as-god-c...

echelon

These are still too centralized. The protocol should look more like BitTorrent.

- You don't need domain names for identity. Signatures are enough. An optional extension could contain emails and social handles in the payload if desired.

- You don't need terabytes of storage. All content can be ephemeral. Nodes can have different retention policies, and third party archival services and client-side behavior can provide durable storage, bookmarking/favoriting, etc.

- The protocols should be P2P-first rather than federated. This prevents centralization and rule by federated cabal. Users can choose their own filtering, clustering, and prioritization.

viraptor

> Nodes can have different retention policies, and third party archival services and client-side behavior can provide durable storage, bookmarking/favoriting, etc.

That's completely achievable in AP. Most current servers use reasonable retention, extended for boosted posts.

MichaelZuo

Then it is a bit strange why it wasn’t designed to be ‘BitTorrent-like’ from the beginning as the parent suggests.

immibis

There's no known way to make this work well yet, but feel free to invent that. Until that happens, federated is mostly the best we have, because most people don't want to be responsible for their own servers.

P.S. ActivityPub is a euphemism for Mastodon's protocol, which isn't just ActivityPub.

RobotToaster

Isn't this ipfs?

FireInsight

Isn't this Nostr?

thwarted

Isn't this nntp?

remram

I would love to have an RSS interface where I can republish articles to a number of my own feeds (selectively or automatically). Then I can follow some of my friends' republished feeds.

I feel like the "one feed" approach of most social platforms is not here to benefit users but to encourage doom-scrolling with FOMO. It would be a lot harder for them to get so much of users' time and tolerance for ads if it were actually organized. But it seems to me that there might not be that much work needed to turn an RSS reader into a very productive social platform for sharing news and articles.

James_K

This interface already exists. It's called RSS. Simply make a feed titled "reposts" and add entries linking to other websites. I already have such a thing on my own website with the precise hope that others will copy it.
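A minimal sketch of such a "reposts" feed, assuming plain RSS 2.0 built with Python's standard library. The function name `build_reposts_feed`, the site URL, and the sample entry are all placeholders for illustration, not anything from the commenter's site.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_reposts_feed(site_url, reposts):
    """Build a minimal RSS 2.0 document whose items link to other sites.

    `reposts` is a list of (title, link, datetime) tuples; the values used
    below are placeholders, not real posts.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "reposts"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = "Articles I recommend"
    for title, link, when in reposts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        # RSS 2.0 uses RFC 822 style dates in pubDate.
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_reposts_feed(
    "https://example.org/",
    [("An interesting article", "https://example.com/article",
      datetime(2025, 1, 10, tzinfo=timezone.utc))],
)
```

Serving `feed_xml` as a static file would be enough for any feed reader to subscribe to it.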

remram

At some level yes, but I would like to be able to de-duplicate if multiple people/feeds repost the same article, and it would need a lot more on the discovery side (so I can find friends-of-friends, more feeds from same friend I follow, etc). Like a web-of-trust type of construct which I see as necessary with the accelerating rise of bots on all platforms.

James_K

Deduping can be done on the reader end. As for a web of trust, you can put a friends list on your website.
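As a rough illustration of reader-side deduplication, the sketch below merges several repost feeds and keeps one entry per article. The normalization rules (lowercase host, drop query string and fragment, ignore a trailing slash) are assumptions chosen for the example; dropping the query string would wrongly merge sites that identify articles by query parameter.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_link(url):
    """Reduce a URL to a comparison key (assumed rules, not a standard)."""
    parts = urlsplit(url.strip())
    # Lowercase scheme and host, drop query and fragment, ignore trailing slash.
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def dedupe_reposts(feeds):
    """Merge entries from several feeds, keeping one entry per article.

    `feeds` maps a feed name to a list of article URLs. The result maps each
    normalized URL to the list of feeds that reposted it.
    """
    seen = {}
    for feed_name, links in feeds.items():
        for link in links:
            sharers = seen.setdefault(normalize_link(link), [])
            if feed_name not in sharers:
                sharers.append(feed_name)
    return seen


# Hypothetical feeds from two friends who reposted the same article.
merged = dedupe_reposts({
    "alice": ["https://Example.com/post/1/", "https://example.com/post/2"],
    "bob": ["https://example.com/post/1?utm_source=rss"],
})
```

Keeping the list of sharers per article also gives the reader a simple signal of how many followed feeds reposted it.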

fabrice_d

That looks close to custom feeds in the ATProto / BlueSky world.

edhelas

XMPP XEP-0060 Pubsub is doing that :)

I wrote a specific XEP for the social part https://xmpp.org/extensions/xep-0472.html

And it's implemented in Movim https://movim.eu/

AndrewDucker

This is pretty-much exactly what I use Pinboard for.

openrisk

It's not obvious to me that what is missing here is another technical protocol rather than more effective 'social protocols'. If you haven't noticed, the major issues of today are not the scaling of message passing per se but the moderation of content and violations of the boundary between public and private. These issues are socially defined and cannot be delegated to (possibly algorithmic) protocols.

In other words what is missing is rules, regulations and incentives that are adapted to the way people use the digital domain and enforce the decentralized exchange of digital information to stay within a consensus "desired" envelope.

Providing capabilities in code and network design is of course a great enabler, but drifting into technosolutionism of the bitcoin type is a dead end. Society is not a static user of technical protocols. If left without matching social protocols any technical protocol will be exploited and fail.

The example of abusive hyperscale social media should be a warning: they emerged as a behavior, they were not specified anywhere in the underlying web design. Facebook is just one website after all. Tim Berners-Lee probably did not anticipate that one endpoint would successfully fake being the entire universe.

The deeper question is, do we want the shape of digital networks to reflect the observed concentration of real current social and economic networks or do we want to use the leverage of this new technology to shape things in a different (hopefully better) direction?

The mess we are in today is not so much failure of technology as it is digital illiteracy, from the casual user all the way to the most influential legal and political roles.

miohtama

> The deeper question is, do we want the shape of digital networks to reflect the observed concentration of real current social and economic networks or do we want to use the leverage of this new technology to shape things in a different (hopefully better) direction?

Here is a book on the topic - Compliance Industrial Complex;

https://www.amazon.com/Compliance-Industrial-Complex-Operati...

It's about anti-policies (anti hate, anti money laundering, etc.), securitization of governance (private companies create and enforce what should be law) and pre-crime, using technology to do this instead of addressing underlying social problems.

pessimizer

> If you haven't noticed, the major issues of today are not the scaling of message passing per se but the moderation of content and violations of the boundary between public and private.

Are those the major issues of today? Those are the major issues for censors, not for communicators.

pluto_modadic

yes, moderation is an issue that doesn't scale. therefore, many technologists ignore it in favor of "oh, fancy serverless architecture". priority should be on building moderation and tools like reply controls (e.g. only mutuals), shared inboxes (for friends to assist cleaning out hate mail), mod appeals and the like. It's a thorny issue that involves /listening/ to community organizers, who go through pains with poorly written software to try to keep a community civil.

openrisk

Are spammers and scammers "communicators"? How about organized misinformation campaigns? In what kind of deeply sick ideological la-la-land is any kind of control of information flow "censorship".

nunobrito

NOSTR has solved most of these topics in a simple way. Anyone can generate a private/public key without emails or password, and anyone can send messages that you can verify as truly belonging to the person with that signature.

They have hundreds of servers run today by volunteers, and there is little cost of entry since even cellphones can be used as servers (nodes) to keep your private notes or keep the notes from people you follow.

There is now a file sharing service called "Blossom" which is decentralized in the same simple manner. I don't think I've seen a way there to specify custom domains; for the moment people can only use the public key to host simple web pages without a server behind them.

Many of the topics on your page match what has been implemented there, so it might be a good match for you to improve it further.

brisky

Can NOSTR handle 100 million daily active users?

nunobrito

Your question rephrased: "Can EMAIL handle 100 million daily users?".

The answer is yes.

NOSTR is similar to email. Both depend on providers (nostr or email) but not on any single one of them; what exists is a common agreement (protocol). The overwhelming majority of those providers are free and you can also run your own from a cellphone.

Some providers might become commercial like gmail, still many others will still provide access for free. Email is doing just fine nowadays, NOSTR will do fine as well.

Groxx

This is all necessarily true of any "protocol". It is absolutely not true that every protocol scales efficiently to 100 million active users all interacting though, so it is basically a meaningless claim.

E.g. ActivityPub has exactly the same claims, and it's currently handling several million, essentially all interactable. Some parts are working fine, and some parts are DDoSing every link shared on any normally-connected instance.

wmf

1. Domain names: good.

2. Proof of work time IDs as timestamps: This doesn't work. It's trivial to backdate posts just by picking an earlier ID. (I don't care about this topic personally but people are concerned about backdating not forward-dating.)

N. Decentralized instances should be able to host partial data: This is where I got lost. If everybody is hosting their own data, why is anything else needed?

evbogue

If the data is a signed hash, why does it need the domain name requirement? One can host self-authenticating content in many places.

And one can host many signing keys at a single domain.

catlifeonmars

In the article, the main motivation for requiring a domain name is to raise the barrier to entry above “free” to mitigate spamming/abuse.

uzyn

A 1-time fixed cost will not deter spam, it only encourages more spamming to lower the averaged per-spam cost. Email spamming requires some system set up, that's a 1-time fixed cost above $10/year but it does not stop spam.

wmf

One person per domain is essentially proof of $10.

hinkley

There was a psychological study that decided that community moderation tends to be self healing if, and only if, punishing others for a perceived infraction comes at a cost to the punisher.

I believe I have the timeline right that this study happened not too long before StackOverflow got the idea that getting upvoted gives you ten points and downvoting someone costs you two. As long as you’re saying something useful occasionally instead of disagreeing with everyone else, your karma continues to rise.

macawfish

Domain names are fine but they shouldn't be forced onto anyone. Nothing about DID or any other flexible and open decentralized naming/identity protocol will prevent anyone from using domain names if they want to.

hinkley

Time services can help with these sorts of things. They aren’t notarizing the message. You don’t trust the service to validate who wrote it or who sent it, you just trust that it saw these bytes at this time.

catlifeonmars

Something that maintains a mapping between a signature+domain and the earliest seen timestamp for that combination? I think at that point the time service becomes a viable aggregated index that readers can use to look for updates. I think this also solves the problem of lowering the cost of participation… since the index would only store a small amount of data per post, and since indexes can be composed by the reader, it could scale cost-effectively.
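The mapping described here could look roughly like the sketch below. It is only an in-memory illustration: the class name `EarliestSeenIndex`, the SHA-256 key, and the integer timestamps are assumptions, and nothing is verified about who produced the signature, in line with the point that the service only attests it saw these bytes at this time.

```python
import hashlib


class EarliestSeenIndex:
    """Maps a (domain, signature) pair to the first time this service saw it."""

    def __init__(self):
        self._first_seen = {}

    @staticmethod
    def _key(domain, signature):
        # A NUL separator keeps ("ab", b"c") distinct from ("a", b"bc").
        return hashlib.sha256(domain.encode() + b"\x00" + signature).hexdigest()

    def record(self, domain, signature, now):
        """Store `now` only on first sight; always return the stored time."""
        return self._first_seen.setdefault(self._key(domain, signature), now)

    def lookup(self, domain, signature):
        """Return the earliest seen time, or None if the pair is unknown."""
        return self._first_seen.get(self._key(domain, signature))


index = EarliestSeenIndex()
first = index.record("example.org", b"sig-bytes", now=1_736_500_000)
# A later resubmission cannot move the timestamp.
again = index.record("example.org", b"sig-bytes", now=1_736_600_000)
```

Because a resubmission never overwrites the stored value, the index can only be used to show that a post existed no later than the recorded time.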

hinkley

I’ve only briefly worked with these but got a rundown from someone more broadly experienced with them. Essentially you treat trust as a checklist. I accept this message (and any subsequent transactions implied by its existence) if it comes from the right person, was emitted during the right time frame (whether I saw it during a separate time frame), and <insert other criteria here>. If I miss the message due to transmission errors or partitioning, I can still honor it later even though it now changes the consequences of some later message I can now determine to have arrived out of order.
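The "trust as a checklist" idea can be sketched as follows. This is a hypothetical illustration: the `Message` fields and the `accept` function are invented for the example, and the sender check is a plain string comparison standing in for real signature verification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    emitted_at: int   # time attested by a time service (unix seconds)
    received_at: int  # when this node actually saw the message


def accept(message, expected_sender, window_start, window_end):
    """Run the checklist; arrival time is deliberately not one of the checks."""
    checks = [
        message.sender == expected_sender,
        window_start <= message.emitted_at <= window_end,
        # Further criteria would be appended here.
    ]
    return all(checks)


# Emitted inside the window but received much later, e.g. after a partition.
late = Message("alice.example", emitted_at=150, received_at=900)
late_is_accepted = accept(late, "alice.example", 100, 200)
```

Since `received_at` is not part of the checklist, a message delayed by a partition is still honored once it finally arrives.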

arccy

that's too much tech for a trust problem it can't solve. just use a TimeStamp Authority like https://freetsa.org/index_en.php or https://knowledge.digicert.com/general-information/rfc3161-c...

brisky

Hi, author here. Regarding backdating, it is a valid concern. I did not mention it in the article, but in my proposed architecture users could post links of others (consider that a retweet). For links that have reposts, additional security checks could be implemented to verify the post time.

Regarding hosting partial data: there should be an option to host just recent data for the past month or other time frames and not the full DB of URLs. This would make decentralization better as each instance could have lower storage requirements, but the total information would still be present on the network.
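One way to picture an instance that hosts only recent data is a retention window applied on pruning. The sketch below is an assumption about how that might look, not the author's design; `PartialInstance`, `Reference`, and the 30-day window are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Reference:
    url: str
    posted_at: int  # unix seconds


class PartialInstance:
    """An instance that only keeps references newer than its retention window."""

    def __init__(self, retention_days):
        self.retention_seconds = retention_days * SECONDS_PER_DAY
        self.references = []

    def add(self, reference):
        self.references.append(reference)

    def prune(self, now):
        """Drop references older than the window; return how many were dropped."""
        cutoff = now - self.retention_seconds
        kept = [r for r in self.references if r.posted_at >= cutoff]
        dropped = len(self.references) - len(kept)
        self.references = kept
        return dropped


now = 100 * SECONDS_PER_DAY
recent_only = PartialInstance(retention_days=30)
recent_only.add(Reference("https://example.com/old", posted_at=10 * SECONDS_PER_DAY))
recent_only.add(Reference("https://example.com/new", posted_at=95 * SECONDS_PER_DAY))
dropped = recent_only.prune(now)
```

Instances with different `retention_days` values could coexist, with longer-lived ones acting as the archive for older references.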

imglorp

Recent events also taught us that proof of work is a serious problem for the biosphere when serious money is involved and everybody scales up. Instead, it seems proof of stake is more what is required.

wmf

Yeah, a verifiable delay function is probably better for timestamping.

hkt

https://en.wikipedia.org/wiki/Syndie was a decent attempt at this which is, I gather, still somewhat alive.

defanor

AIUI, the "Decentralized" added to RSS here stands for:

- Propagation (via asynchronous notifications). Making it more like NNTP. Though perhaps that is not very different functionally from feed (RSS and Atom) aggregators: those just rely on pulling more than on pushing.

- A domain name per user. This can be problematic: you have to be a relatively tech-savvy person with a stable income and living in an accommodating enough country (no disconnection of financial systems, blocking of registrar websites, etc) to reliably maintain a personal domain name.

- Mandatory signatures. I would prefer OpenPGP over a fixed algorithm though: otherwise it lacks cryptographic agility, and reinvents parts of it (including key distribution). And perhaps to make that optional.

- Bitcoin blockchain.

I do not quite see how those help with decentralization, though propagation may help with discovery, which indeed tends to be problematic in decentralized and distributed systems. But that can be achieved with NNTP or aggregators. While the rest seems to hurt the "Simple" part of RSS.

James_K

A number of countries actually offer free domain names to citizens. I agree with the rest though. I don't see what this adds to RSS, which already has most of these things given it's served over HTTPS in most cases.

pluto_modadic

cryptographic agility is a recipe for JWT shooting you in the foot. Age or Minisign strike good balances by making the cryptography decision for you.

convolvatron

a lot of the use cases for this would have been covered by protocol designs suggested by Floyd, Jacobson and Zhang in https://www.icir.org/floyd/papers/adapt-web.pdf

but it came right at a time when the industry had kind of just stopped listening to that whole group, and it was built on multicast, which was a dying horse.

but if we had that facility as a widely implemented open standard, things would be much different and arguably much better today.

rapnie

> built on multicast, which was a dying horse.

There's a fascinating research project Librecast [0], funded by the EU via NLnet, that may boost multicast right into modern tech stacks again.

[0] https://www.librecast.net/about.html

nunobrito

What is that used for? I was looking at the documentation but I still don't understand the use case they are trying to address.

Isn't multicasting something already available with UDP or Point-to-Point connections without a single network involved?

convolvatron

by 'multicast' here one really means a facility that's provided by layer 3. So yes, we can build our own multicast overlays. But a generic facility had two big benefits. One is that the spanning distribution tree can be built with a knowledge of the actual topology, and copies can be made in the backbone where they belong (copies in the overlay often mean that the data can traverse the same link more than once).

The other big one is access. If we all agree on multicast semantics and addressing, and it's built into everyone's operating system, then we can all use that as an equal access facility to effectively publish to everyone, not just people who happen to be part of this particular club and are running this particular flavor of multicast.

teddyh

Is he reinventing USENET netnews?

bb88

Yes and no. I think the issue primarily is that I could never just generate a new newsgroup back when usenet was popular and get it to syndicate with other servers.

The other issue is who's going to host it? I need a port somehow (CGNAT be damned!).

hinkley

Spam started on Usenet. As did Internet censorship. You can’t just reinvent Usenet. Or we could all just use Usenet.

stackghost

>Or we could all just use Usenet.

Usenet doesn't scale. The Eternal September taught us that.

To bring Usenet back into the mainstream would require a major protocol upgrade, to say nothing of the seismic social shift.

hinkley

That’s also my feeling. There’s a space for something that has some of the same goals as Usenet while also learning from the past.

I don’t think it’s a fruitful or useful comment to say something is “like Usenet” as a dismissal. So what if it is? It was useful as hell when it wasn’t terrible.

fiatjaf

Nostr is kind of what you're looking for.

doomroot

My thought as well.

ps When is your SC podcast coming back?

cyberax

That is a really great list of requirements.

One area that is overlooked is commercialization. I believe that the decentralized protocol needs to support some kind of paid subscription and/or micropayments.

WebMonetization ( https://webmonetization.org/docs/ ) is a good start, but they're not tackling the actual payment infrastructure setup.

jasode

The blog mentions the "discovery problem" 7 times but this project's particular technology architecture for syndication doesn't seem to actually address that.

The project's main differentiating factor seems to be that it does not propagate the actual content to the nodes but instead saves disk space by only distributing hashes of content.

However, having a "p2p" decentralized network of hashes doesn't solve the "discovery" problem. The blog lists the following bullet points of metadata but that's not enough to facilitate "content discovery":

>However it could be possible to build a scalable and fast decentralized infrastructure if instances only kept references to hosted content.

>Let’s define what could be the absolute minimum structure of decentralized content unit:

>- Reference to your content — a URL

>- User ID — A way to identify who posted the content (domain name)

>- Signature — A way to verify that the user is the actual owner

>- Content hash — A way to identify if content was changed after publishing

>- Post time — A way to know when the post was submitted to the platform

>It is not unreasonable to expect that all this information could fit into roughly 100 bytes.

Those minimal 5 fields of metadata (url+userid+sig+hash+time) are not enough to facilitate content discovery.
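To get a feel for the size of such a five-field unit, here is a rough packing sketch. The quoted text does not specify an encoding, so everything below is an assumption: a 64-byte signature (Ed25519-sized), a SHA-256 content hash, a 64-bit timestamp, and one length byte in front of each string (which limits each string to 255 bytes). The signature is a zero-filled placeholder, not a real one.

```python
import hashlib
import struct

SIGNATURE_BYTES = 64  # assumption: an Ed25519-sized signature
HASH_BYTES = 32       # assumption: SHA-256 content hash
TIME_BYTES = 8        # assumption: 64-bit unix timestamp


def pack_unit(url, domain, signature, content, posted_at):
    """Serialize the five fields with one length byte before each string."""
    url_b, domain_b = url.encode(), domain.encode()
    digest = hashlib.sha256(content).digest()
    return (
        struct.pack("!B", len(url_b)) + url_b
        + struct.pack("!B", len(domain_b)) + domain_b
        + signature
        + digest
        + struct.pack("!Q", posted_at)
    )


unit = pack_unit(
    "https://example.org/posts/1",
    "example.org",
    bytes(SIGNATURE_BYTES),  # placeholder, not a real signature
    b"hello world",
    1_736_500_000,
)
fixed_part = SIGNATURE_BYTES + HASH_BYTES + TIME_BYTES
```

Under these particular assumptions the fixed-size fields alone come to 104 bytes and this sample unit to 144 bytes, so a figure near 100 bytes would presumably rely on shorter hashes or signatures, or on deriving some fields implicitly.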

Content discovery, meaning reducing the infinite internet down to a manageable subset, requires a lot more metadata. That extra metadata requires scanning the actual content instead of the hashes. This extra metadata based on actual content (e.g. Google's "search index", Twitter's tweets & hashtags, etc) -- is one of the factors that acts as unescapable gravity pulling users towards centralization.

To the author, what algorithm did you have in mind for decentralized content discovery?


brisky

Thanks for the comment, these concerns are valid. At the core the protocol supports only basic discovery: you can see who is posting right now and the history of everyone who has ever posted. Regarding rich context discovery, where content could be found by specific tags and keywords, this would be implemented by reader platforms that crawl the index.
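A reader platform that crawls the index and fetches the referenced content could build keyword discovery on top with something like an inverted index. The sketch below is a generic illustration under that assumption; the function names and sample documents are hypothetical and the fetching step is left out.

```python
import re
from collections import defaultdict


def build_keyword_index(documents):
    """Map each lowercase word to the set of URLs whose content contains it.

    `documents` maps a URL to the text a reader platform fetched from it.
    """
    index = defaultdict(set)
    for url, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Placeholder documents standing in for crawled content.
keyword_index = build_keyword_index({
    "https://example.org/a": "Notes on RSS and #syndication",
    "https://example.net/b": "Why multicast matters for syndication",
})
rss_hits = search(keyword_index, "rss syndication")
```

This keeps the core protocol limited to references while each reader platform decides how much content-derived metadata it wants to store.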
