
Postel's Law and the Three Ring Circus

zamfi

As a counterpoint, Postel's Law as implemented in other domains has been spectacularly successful.

One classic example is in transistor networks: each node in a network (think interconnected logic gates, but at the analog level) accepts a wider range of voltages as "high" and "low" (i.e., 1- and 0-valued) than it is specified to output. In 5V logic, for example, transistors might output 5V and 0V to within 5%, but accept anything above 1.2V as "high" and below that as "low". (This is sometimes called the "static discipline" and used as an example of the "robustness principle"—the other name for Postel's Law.)
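
A minimal sketch of that asymmetry, using the 5V figures above (simplified; real logic families define separate input/output thresholds for both levels):

    # Static discipline sketch: outputs are held to a tight spec,
    # but inputs are classified with a much wider tolerance.
    V_IN_THRESHOLD = 1.2   # inputs above this read as "high"

    def read_input(voltage: float) -> int:
        """Classify a (possibly noisy) input voltage as a logic level."""
        return 1 if voltage > V_IN_THRESHOLD else 0

    def write_output(bit: int) -> float:
        """Emit a clean, in-spec voltage: the per-node "clean up" step."""
        return 5.0 if bit else 0.0

    # A noisy 3.9V input is still read as 1 and re-emitted as a clean 5.0V,
    # so the noise does not propagate to the next node.
    print(write_output(read_input(3.9)))   # 5.0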

This is critical in these networks, but not because transistor manufacturers don't read or fully implement the spec: it's because some noise is invariably introduced into the system, and one way to handle that is for every node to "clean up" its input to the degree that it can.

It's one thing to rely on this type of clean-up to make your systems work in the face of external noise. But when you start rearchitecting your systems to operate close to this boundary—that is, you're no longer trying to meet spec, because you know some other node will clean up your mess for you—you're cooked. Because the inevitable noise will now push you outside the range of what your spec's liberal input regime can tolerate, and you'll get errors.

The problem isn't Postel's law. It's adverse selection / moral hazard / whatever you want to call the incentive to exploit a system's tolerance for error to improve short-term outcomes for the exploiter but at long-term cost to system stability.

kurthr

Complex systems are by definition high dimensional. We often build them with fault tolerance and soft "failure modes" to prevent catastrophic events. But the dimensionality and complexity mean that almost every sufficiently complex system is ALWAYS running in a "degraded" mode. Once this is normalized, the failure, when it occurs, is usually catastrophic, and determining the proximate cause (work often grouped under "root cause"), much less fixing it, becomes even more difficult.

The rapid exponential growth in complexity seen in semiconductors over many decades has created guardrails in modeling and verification (born of horrific yield/field failures) that prevent a lot of problems. I do worry that as Moore's Law slows (multi-chip modules are not Dennard scaling) we will lose some of this anti-fragility.

Of course the other side of this is Muntzing (the removal of any part that doesn't cause immediate failure):

https://en.wikipedia.org/wiki/Muntzing

phkahler

I think there's a related phenomenon that we see all the time in automotive circles. A vehicle manufacturer (OEM) needs a new thing, so a supplier develops it in cooperation with them. Great! The requirements were vague and all that, but you arrive at a solution that works. The supplier documents the performance in a product specification stating what it can do and under what conditions. Great! Next, the OEM asks for quotes on the next gen, and also goes to other suppliers to see if they can provide it at lower cost. The OEM uses your product spec as a requirements document, and this is where it all goes to shit (a product specification is not a requirements document). They no longer know what they need, but they know what they have, and they want "same or better" next time. Over time this can lead to absurd specs and increased complexity to meet them. Sometimes a competitor comes in with a cheaper version, the OEM gets excited based on price, and all the actual requirements might get revisited because the cheaper part is out of spec (to save cost).

This seems similar to the Postel's Law thing. If you produce to spec but accept things out of spec, the ecosystem will unofficially change the spec to what's actually out there. Then people will re-apply Postel's Law with a wider spec on what you produce and even wider on what you accept. Eventually you end up with a complex mess, but in this case it's a widening spec where in the automotive example it's usually a narrowing spec.

ChrisMarshallNY

> I encourage maintainers to reject user feedback that is much more properly directed to some commercial software vendor who has not implemented the specification correctly.

Words to live by.

cratermoon

If only we'd had that attitude in the 90s when html parsers proliferated. Instead we got tag soup and well, whatever this is: https://github.com/WebKit/WebKit/blob/main/Source/WebCore/pa...

ChrisMarshallNY

Ah, good ol' Quirks Mode.

Memories...

kayo_20211030

Perhaps the specification is irrelevant, and it would be better to look at the actors' goals and incentives.

Using the OP's example:

As a first approximation, it's reasonable to say that the goal of the closed source software's producer is to make money, and the goal of the open source software's provider is to have people use it.

The closed source software's author has no incentive to continue to follow the specification unless it advances the goal of making money; and the open source software's authors have no incentive to reject deviations from the specification if the result is a drop in usage.

Postel's law still holds even in this asymmetric situation because that's where the incentives are.

andrewstuart

A very wordy post; not really sure of the point of it.

What I can say though is that Postel's Law is a pile of rubbish that sounds sage and wise and leads to software problems.

Make software clearly defined in what you output and what you accept. Clear and strongly typed data interfaces help everyone get along with minimal bugs. Being “liberal in what you accept” is a recipe for errors.

Ref:

The Harmful Consequences of the Robustness Principle

https://datatracker.ietf.org/doc/draft-thomson-postel-was-wr...

chrismorgan

> https://datatracker.ietf.org/doc/draft-thomson-postel-was-wr...

The original draft had an excellent name, and expressed a strong position. It was picked up and the consensus was published as RFC 9413 in mid-2023.

prepend

I think this may be a scissor statement for developers [0] as people seem to really be on one side or the other.

I remember years ago I wrote a Java package to sit on top of DOMs and let you query and manipulate them in memory. People at my company liked it, and so maybe 20-30 people used it as the in-memory store for passing info around this workflow engine. Anyway, I assumed you would always have a DOM or some Nodes, and if you didn’t, I allowed null pointer exceptions to be thrown in what I thought were the right places.

Some people liked starting with null and populating later and my logic would blow them up. My response was “fix your code, don’t do that.” My package was clearly documented for what to do and when.

But one day another senior dev came to me to plead the case for some mids who were writing all this extra code to prehandle those nulls. I still didn’t change my package, because I thought it was more efficient to use it properly than to add extra logic that sort of broke the interface. I did tell them that I welcomed a pull request to add the functionality.

Looking back, I wish I had been more liberal in what I accepted, because the situation was causing a lot of emotional pain, with people sort of lobbying others.

“Minimal bugs” depends on where in the stack you look.

[0] https://slatestarcodex.com/2018/10/30/sort-by-controversial/

ipython

I think there’s a temporal aspect to this. Postel’s law was the right idea at the right time: it allowed for wide experimentation in the early growth curve while at the same time exploding the realistic user base (because these implementations could, in large part, talk to each other).

roenxi

> Ecosystems that adhere to Postel’s Law therefore experience a one way ratchet: consumers must accept more and more deviations from the specifications, and because consumers accept the deviations, producers are never forced (or incentivized) to themselves become stricter in following the specifications

He's not wrong, but this isn't exactly true - it is sometimes the case that protocols will incentivize conformity even while accepting liberally.

As a basic example, take not failing on an unknown tag: say we have a system with emoji, and if we see an emoji tag we don't recognise, we print it as text. This is liberal in what it accepts, because if someone typos an emoji it is still accepted, but they are nonetheless incentivised to get the spelling right because they were trying to get an emoji to display. The same principle means that most comments on HN are spelled correctly, or close to it.

So the truly inspired protocol will attach subtle costs to non-conforming emitters, but nonetheless tolerate them and allow them to participate in a network.
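
A hedged sketch of such a protocol (the tag names and fallback syntax here are made up for illustration):

    # Liberal acceptance with a visible cost for non-conformance:
    # unknown emoji tags render as literal text instead of failing.
    EMOJI = {"smile": "🙂", "heart": "❤️"}   # hypothetical tag table

    def render(tag: str) -> str:
        # Never reject an unknown tag, but fall back to plain text,
        # which nudges the author to fix the spelling.
        return EMOJI.get(tag, f":{tag}:")

    print(render("smile"))   # 🙂
    print(render("smiel"))   # :smiel:  (accepted, but the typo is visible)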

akoboldfrying

Agree, but I don't think the incentives align as claimed -- with bad (lax) producers being usually commercial, and good (strict) consumers being usually open source. I think anyone who has already made a thing in program X and then finds that program Y won't load that thing will tend to complain about program Y first -- both unconsciously, and also consciously (because if the bug is in X, they might need to recreate their thing, and nobody wants that). So even open source producer programs will tend to be favoured over open source consumers.

The ability to complain at open source consumers is real, but I think this form of pain probably reached its apex in commercial software: backwards compatibility in Windows. Raymond Chen's blog, The Old New Thing, details the absolutely insane lengths Windows engineers went to in order to keep buggy software running on new versions of Windows.

bonoboTP

I immediately thought about LibreOffice and MS Office. The compatibility has improved a lot, but things do break. And it is a showstopper for many users who receive documents made in MS Office by other people, or are asked for MS Office files. I don't think the compatibility has to be 100% perfect, but visible breakage is an absolute showstopper for many users. I've recommended using LibreOffice to a small family business, but when existing documents showed up visually misaligned, they had to switch back. In this case it really doesn't matter who is to blame. For normal people who have things to do, the only thing that matters is whether they get the results they need or not. 99% won't complain to either MS or LibreOffice; they will just throw out the open product, pay for the commercial one, and move on with their business.

Similarly, maybe your new browser fully and faithfully implements all the web standards down to the letter. But if it doesn't load Instagram, or Reddit, or whatever popular site, then users won't care whether the blame lies with the website or the browser. They don't even have the granularity in their mental model required to distinguish these things. They see: when using Chrome, things work; when using the alternative browser, things are broken. Conclusion: use Chrome. It's like this in many cases. You can stomp your foot and stand up for rigor or whatever, but reality won't change merely because you insist that it should.

robinhouston

I would find this argument more compelling if it were supported by examples. Although it’s easy to imagine this sort of dynamic unfolding, offhand I can’t think of any good examples of it happening with internet protocols in quite this way.

On the other hand, I can immediately think of an example to the contrary: I would at least partly attribute the failure of XHTML to the fact that conforming renderers were forbidden to be liberal in what they accept, and viewers of XHTML pages would often be faced with an error message in place of the page they wanted to see, as a consequence of some minor mistake.

dale_glass

HTML is a great example in my opinion. XHTML failed because it was trying to close the barn door after the horse had bolted a decade ago.

XHTML was a reaction to an existing issue: HTML is absolutely hellish to parse, to the point that in the 90s half the websites out there officially endorsed either IE or Netscape as the browser to use for that particular site. Not to speak of anyone actually trying to do something clever with the contents of web pages.

Had XHTML worked it'd have been a huge quality of life improvement. The problem was that it was too late and that it was introduced into an ecosystem not prepared for it because everyone had long built on the assumption that you can just haphazardly generate chunks of HTML and insert them into the middle of other chunks of HTML.

Also, XHTML's error messages would probably be appreciated more today. We now understand that allowing an HN poster to just drop random tags into a post is a terrible idea, so the fact that in XHTML a user leaving an unclosed tag makes the entire page fail to render is a good thing. It's a sign that you're handling user data terribly wrong and it needs fixing post-haste. But back when XHTML showed up, security was such an afterthought that barely anyone cared, and they just wanted things to work, dammit.

In the end we solved it first by a bigger push on standardization because people finally got sick of it, and later by Chrome just eating the browser market.

JimDabell

XHTML failed for exactly one reason: Internet Explorer, with overwhelming market share, did not support it. If you served it as text/html then IE would parse it, but not apply any of the XHTML parsing rules; it would just treat it as weird HTML.

The history of the web is littered with copious examples of what the article talks about. Here’s one example: HTTP has a Content-Type header that indicates the media type being transferred. Sometimes, web developers would screw up and serve things with the wrong media type. For instance, serving CSS files as text/plain instead of text/css.

Now, if no action were taken by browser vendors, the resolution would be simple: the developer would see that the styling on their site was completely broken, and fix it. But Internet Explorer decided to be “liberal in what it accepted”, and instead of respecting the Content-Type header, they used content sniffing to guess. This meant that when a web developer screwed up, and they only tested on the 90%+ market share Internet Explorer, they would think everything was fine but it would be broken in every browser but Internet Explorer.

After a while, so many sites were broken in Firefox that Firefox had to follow the non-standard behaviour as well. Leaving other minority browsers broken. At this point, the behaviour a browser needs to implement to render a website successfully is not what is written in the specifications, but copying unpredictable guessing from proprietary code. You can see how that’s harmful to interoperability, right? Even though interoperability is the entire purpose of Postel’s Law?

It gets worse though, because this content sniffing was happening all over, not just to CSS. So if a website said that something was a PNG image, if that image’s metadata contained the wrong sequence of bytes, it would get interpreted as an HTML document. This is important because PNGs don’t contain any kind of scripting, while HTML documents do.

So this opened up an attack vector: make a polyglot file that contains malicious JavaScript, and upload it as an image to a website. Then link to that image, and Internet Explorer users will execute the JavaScript in the security context of that website, resulting in a successful XSS attack. Even though the website served it as an image, not HTML, Internet Explorer would “be liberal” and do its best to interpret it as badly-broken HTML.

At some point, Internet Explorer came up with an authoritative: true parameter that opted out of this behaviour. Sometimes. Then they tried again with X-Content-Type-Options: nosniff. Meanwhile, web developers who just wanted to serve user-provided images without opening themselves up to attack vectors needed to follow all these twists and turns instead of just reading the spec. Ever wonder why opening an image URL on so many sites triggers a download? It’s because for many years it was the only safe way to avoid an image being interpreted as HTML.
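
For illustration, the defensive response headers a site serving user uploads ended up needing look roughly like this (a sketch, not any particular framework's API):

    # Serving a user-uploaded image defensively, per the history above.
    headers = {
        "Content-Type": "image/png",
        # Opt out of content sniffing, so a polyglot file can't be
        # reinterpreted as HTML and its embedded script executed:
        "X-Content-Type-Options": "nosniff",
        # Force a download instead of inline rendering - for years the
        # only reliable way to keep an "image" from running as a page:
        "Content-Disposition": 'attachment; filename="upload.png"',
    }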

This has happened over and over and over and over again on the web. Did you know that Netscape Navigator would interpret Unicode characters that merely look like angle brackets as if they were actually angle brackets? Consider the impact on security that one had. Remember the “samy is my hero” MySpace worm? That was down to browsers “helpfully” ignoring newlines in the javascript pseudo-protocol.

Postel’s law has been downright poisonous for security and interoperability. We would all be much better off with strict parsers.

prepend

> Now, if no action were taken by browser vendors, the resolution would be simple: the developer would see that the styling on their site was completely broken, and fix it

This assumes the developer sees the output. I remember having a dozen browser builds in the 90s and still getting error reports from users with stuff I’d never heard of. And back then we didn’t have VMs, so I had to keep a bunch of PCs, Macs, etc. to test things out.

I think the reason browsers were so lax/supportive is that there were many browsers, servers, etc.

It would be so easy if you could just dev and test your site. But that didn’t work, and in fact it was kind of a bad approach: you ended up really tightly wound to a single browser, with a “works for me” mentality that made users with different configs not use your site. Or, for intranets, it made people hate the author for forcing a config and devving in a dumb way (looking at you, person who still makes me run IE11 to submit “telework agreements” at my org; you never should have built this to work with IE11 so many years ago).

It seemed like successful sites had to triangulate what the de facto standards were, and code in a way that made testing really tough.

JimDabell

> This assumes the developer sees the output.

That’s why doing the opposite of Postel’s Law is so effective. If your protocol or file format mandates fatal errors instead of being lenient in what you accept, you surface the error immediately during development and stop it from getting into production.

How many HTML or CSS parse errors do you think make it into production?

How many Python, Ruby, Java, Go, or PHP parse errors do you think make it into production?
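
The contrast is easy to demonstrate with Python's own standard library (lenient HTML parsing vs. a strict JSON parse):

    import json
    from html.parser import HTMLParser

    # Lenient: the stdlib HTML parser swallows broken markup silently.
    HTMLParser().feed("<p>unclosed <b>tags everywhere")   # no error raised

    # Strict: a malformed document fails immediately, during development.
    json.loads('{"key": "value",}')   # raises json.JSONDecodeError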

> It seemed like successful sites had to triangulate what the de facto standards are and code in a way that made testing really tough.

Yes, and this is the consequence of Postel’s Law that the article is arguing against.

floren

As someone who has dealt with a lot of syslog over the years... There are RFCs which specify the overall structure and the correct timestamp format, but many tools (open and closed source alike) follow this "standard" instead:

"Just print a timestamp in some format, then maybe the hostname if you feel like it, and then your message"

austin-cheney

In the best case, design your code with failure as the default, properly describing failure states for the user's benefit. The success state should be a narrow exception, reached only when the input is formatted exactly as the specification describes.

This greatly narrows the acceptance criteria to test for, and instead forces a greater focus on negative tests and error conditions.
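
A sketch of that shape, with made-up field names: every rejection is a described failure state, and success is the one narrow path left.

    # Failure-first validation: enumerate the failure states explicitly;
    # success is the narrow exception where nothing disqualified the input.
    def parse_record(raw: dict) -> tuple[dict | None, list[str]]:
        errors = []
        if not isinstance(raw.get("id"), int):
            errors.append("id: must be an integer")
        if not raw.get("name"):
            errors.append("name: must be a non-empty string")
        if errors:
            return None, errors   # rich failure description for the user
        return {"id": raw["id"], "name": raw["name"]}, []

    record, errors = parse_record({"name": ""})
    print(errors)   # ['id: must be an integer', 'name: must be a non-empty string']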

mhb

This is an example of the broader explanation of why we can't have nice things. In a high trust society, condiment packets are free with the expectation that customers will be reasonable in their use. Then customers take advantage of this to use the free ketchup to fill a wading pool.

gpvos

I think a fair amount of leniency with input is a good thing. The problem begins when you start to increase your leniency in response to someone else's out-of-spec output.

A nice thing about open source projects is that they are freer to call out other software that produces out-of-spec output, and I would encourage them to do so... liberally.

shadowgovt

Good essay.

Counter-argument: the Python setuptools outage this week (https://github.com/pypa/setuptools/issues/4919) was more-or-less an exercise in ignoring Postel's Law to the peril of the project. Python (as an ecosystem) had historically treated hyphens and underscores interchangeably in `setup.cfg` keys. The setuptools project deprecated this behavior years ago. This week, they tried to act on the deprecation and blew up multiple high-use dependencies.

The best way to avoid this issue would have been to never deprecate it in the first place; treating hyphen and underscore as the same had minimal impact on the codebase and made setuptools more compatible with other Python tooling.
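
The tolerant behaviour really is cheap to keep; here's a hypothetical sketch of the normalization (not setuptools' actual code):

    # Treat "author-email" and "author_email" in setup.cfg as the same key.
    def normalize_key(key: str) -> str:
        return key.replace("-", "_").lower()

    assert normalize_key("author-email") == normalize_key("author_email")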

bonoboTP

Great example of minor cosmetic changes that pointlessly break stuff. I see this a lot in Python libraries; it seems to be a culture thing. Renaming arguments, moving stuff into different subpackages, just so much churn. Yeah, I get it, it may be slightly more consistent or feel cleaner, but in many cases it's quite subjective, yet library authors seem quite prone to deprecate something, then deprecate the new version again soon after... It's like they don't realize that there's code out there that uses it. This is why code "rots". When I complain about software churn and the waste of effort required to keep existing code functional, people point to security vulnerabilities etc., but it should be possible to fix those without a bunch of pointless renames and refactors.

Case in point: the `collections.MutableMapping` alias was removed in Python 3.10 (the class had lived in `collections.abc` since Python 3.3), breaking a bunch of nice libraries that provide useful data structures. And why? Because it looked more tidy and conceptually comforting to some Python devs.

https://stackoverflow.com/questions/70870041/cannot-import-n...
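
For reference, the usual defensive shim for that particular break (many affected libraries ended up shipping some variant of it):

    # Keep old code importable across the move/removal:
    try:
        from collections.abc import MutableMapping   # Python 3.3+
    except ImportError:
        from collections import MutableMapping       # alias removed in 3.10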

This can be super demotivating. Someone at some point in their lives made a useful tool, but then moved on, maybe they were a grad student at the time, and now they have a family, etc. And we are cut off from their contributions because they can't keep up with slightly adjusting the code to the whims of language and library authors every month or two.

Some other languages are much better about not breaking existing code, e.g. Java.

shadowgovt

This is actually what keeps me out of the Ruby on Rails ecosystem unless I'm forced into it.

I was excited about Rails... until I put a project down for a year, picked it up again, and discovered it wouldn't build because Rails had switched out its entire package management engine.

I simply don't have time for that much churn.