Skip to content(if available)orjump to list(if available)

SQL nulls are weird

SQL nulls are weird

61 comments

·January 9, 2025

hiAndrewQuinn

SQL NULLs aren't weird, they're just based off of Kleene's TRUE-FALSE-UNKNOWN logic!

If you want you can read NULL as UNKNOWN and suddenly a whole bunch of operations involving them become a lot more intuitive:

1. TRUE OR UNKNOWN = TRUE, because you know you have at least one TRUE already.

2. TRUE AND UNKNOWN = UNKNOWN, because you don't know whether you have two TRUEs or not. It's just out there.

3. UNKNOWN XOR UNKNOWN = UNKNOWN, because it could darn near be anything: TRUE XOR TRUE, TRUE XOR FALSE, FALSE XOR FALSE, FALSE XOR TRUE... Internalizing this is where SQL's use of NULL / UNKNOWN really becomes intuitive.

4. (TRUE AND FALSE) XOR (TRUE OR UNKNOWN) = (FALSE) XOR (TRUE) per #1 = TRUE. See, it's consistent, you just need to keep in mind that if you have a lot of known UNKNOWNs they're quite parasitic and your final outcome is likely to be, itself, an UNKNOWN. Just like in real life!

mrkeen

It's unfortunate to namesquat on 'boolean' if your elements have three possible values. Just call it 'ternary logic' and let individuals decide which system to use.

foobarchu

Who's name squatting boolean? The bool column is exactly what it claims to be, you just have the option of introducing unknowability if you define it allow nulls.

bunderbunder

The distinction is that not all formal logic systems are Boolean. Meaning that it is nonsensical and confusing to use "Boolean" as a generic synonym for "truth value" in the same way that it's nonsensical to use "Pantone" as a generic synonym for "color value", including when the specific kind of color value you're talking about is CMYK or HSV and definitely not Pantone.

chuckadams

Maybe GP was edited, but it doesn't use the word "boolean" anywhere.

hiAndrewQuinn

Correct, I edited "boolean" out prior to ^^P's comment. My apologies.

null

[deleted]

tzury

For Postgres specific approach, you may refer to

https://blog.rustprooflabs.com/2022/07/postgres-15-unique-im...

Practically speaking, I go with not null, and always set default value.

demurgos

> select null = null; returns NULL, because each NULL is basically a placeholder representing any “unknown value”. Two unknown values are not necessarily the same value; we can’t say that they are equal, because we don’t know the value of either of them.

Agreed with all of this, it would probably have been better if they were named `unknown` instead of reusing the `null` keyword.

Note also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.

0: https://www.postgresql.org/docs/15/sql-createtable.html

magicalhippo

The result of comparisons involving NULL values can result[1][2] in UNKNOWN, and in PostgreSQL for example you can test[3] for this using IS UNKNOWN.

That said, as someone self-taught in SQL, I agree NULL was not a good choice.

Replacing NULL with UNKNOWN and the third boolean value as INDETERMINATE for example would have been better.

[1]: https://stackoverflow.com/a/79270181

[2]: https://learn.microsoft.com/en-us/sql/t-sql/language-element...

[3]: https://www.postgresql.org/docs/current/functions-comparison...

otteromkram

Also self-taught SQLer and I don't have an issue with NULL.

I also don't use UNIQUE constraints, so maybe that has something to do with it.

layer8

SQL NULL doesn’t behave like “unknown” in all contexts. That’s one issue of NULL, that it doesn’t really have consistent semantics.

masklinn

Furthermore if null only means unknown then we need a value for “known absent”, there’s a reason why null is so often used as that.

thaumasiotes

Do you actually need that in a Boolean context? It would only be useful for evaluating self-referent claims like "this sentence is false".

duncan-donuts

Introducing “unknown” feels like another kind of hell like undefined in JavaScript.

demurgos

Just to clarify, I'm not advocating to introduce a new `unknown` keyword. I'm saying that the existing `null` in SQL was not named properly and that the name `unknown` would have been more fitting. SQL's `null` already has the semantics of `unknown` as explained in the part of the article that I quoted.

wvenable

SQL's use of "null" is probably one of the oldest instances of that concept in computing. It's exactly equivalent to unknown. That is its definition.

NoMoreNicksLeft

>also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.

Why would anyone want to use another database?

stronglikedan

Simplicity. PG is often overkill for simple apps, where MySQL/Maria/et al is easier to maintain, or even SQLite for very simple apps where zero DB maintenance is preferable.

homebrewer

Also the reverse — MySQL et al support much more complex replication topologies out of the box, including multi master for the very rare use case when you need it.

It's also much easier to tune, most database instances require setting innodb_buffer_pool_size, and that's basically it. Newer versions can even set it automatically if you're fine with consuming all memory on that machine, thus requiring no tuning at all.

lcnPylGDnU4H9OF

Why would you say MySQL/Maria/et al are easier to maintain for simple apps than PG?

poincaredisk

* legacy applications or vendor lock-in

* use of a cloud provider that favours another database (like SQL server)

* some people claim mysql/maria is faster for them, simpler, or has a better support for replication

* use of sqlite for lightweight or portable apps

* domain specific databases that still use sql as their query language

* someone may want to use another database for fun or to learn something new

password4321

Replication

Recursing

Surprised that this doesn't mention "IS DISTINCT FROM" ( https://modern-sql.com/caniuse/is-distinct-from )

(Although in rare cases that is even weirder: https://stackoverflow.com/a/58998043 )

irrational

I expected the article to mention how in Oracle NULLs are equal to empty strings. Now that is weird.

hyperman1

Oh man. I had a talk with a DBA about how oracle could not deal with an adress with no street name - literally a tiny village with 10 houses on 1 nameless town square. It was unsearchable in parts of the app because street='' was interpreted as street is null. DBA kept claiming oracle was right and the town should adapt their naming to our software.

This attitude was so prevalent at the time, I sometimes wonder if the rise of noSQL was simply people sick of dealing with Oracle DBAs

svieira

Domain-embedded nulls are the bane of my existence.

datadrivenangel

SQL NULLs are not weird once you consider how you want relational logic to work when they is a record with non-existent values.

grahamlee

Exactly this. SQL is based on the relational algebra and that's well-defined, NULL along with other features of SQL work in an entirely regular and predictable way. The only time it's weird is when a developer decides that it should work the way Javascript (or whatever) NULLs work because that's the last time they saw the same word used in a programming language, in which case it's the assumption that's weird.

tzs

That's not the only time it is weird. There's even a whole book by one of the pioneers of the relational DB model, Date's "Database Technology: Nulls Considered Harmful" [1], covering many of the ways it is weird.

[1] https://www.amazon.com/Database-Technology-Nulls-Considered-...

setr

The part that’s weird with nulls is that it’s a trinary logic stuffed into a boolean algebra. The use of x = NULL instead of x IS NULL is pretty much always a mistake.

More importantly, x = value instead of (x = value and x IS NOT NULL) is almost always a mistake, and a stupidly subtle one at that. And for this curse, we get… nothing particularly useful from these semantics.

Also the x != NULL case is completely cursed

whstl

Agreed.

I will die on the hill that regular C-like nulls are the actual thing that's weird.

The real billion dollar mistake [1] was the damage it made on the minds of developers.

[1] https://en.wikipedia.org/wiki/Tony_Hoare

chuckadams

Even null in programming languages isn't so bad if it's a distinct type. The problem with null in languages like Java is that null is part of every reference type (C's pointers are another world of broken, null being basically just another unsafe invalid address).

Most languages nowadays do get nulls right, even PHP of all things.

poincaredisk

We should start adjusting that billion for inflation.

bunderbunder

> ... and this is even less obvious if you’re used to using ORMs.

Which is why I continue to be such an ORM skeptic. I agree that they're convenient. But I do worry that we've now got an entire generation of engineers who regularly interact with relational databases, but have largely been spared the effort of learning how they actually work.

As another commenter pointed out, if you've learned basic relational algebra then the way SQL nulls behave seems obvious and logically consistent. The logic is the same as the logic behind the comparison rules for NaN in IEEE floats. It's the behavior of C-style nulls that is, always and forever, a billion-dollar mistake.

thrance

My experience with ORMs is that most of the time you end up needing to write some queries by hand, in raw SQL. Usually these are the most complex, that you can't express in your ORM's DSL. My point being, I don't think using an ORM really shields you from having to learn how it works behind the scenes.

osigurdson

Weirder still are floating point numbers in SQL.

kurtbuilds

If you want equality testing with nulls, you want to use `is (not) distinct from` instead of `=` and `<>` / `!=`.

`1 is not distinct from NULL` => false

`NULL is not distinct from NULL` => true

`0 is not distinct from 1` => false

blast

Having that is much better than not having it, but man is it verbose and confusing.

zokier

SQL nulls in some ways behave in similar to floating point nans. Of course nans are also weird in their own way, but it is a bit comforting that its not so completely singularly weird.

dunham

Also similar to the bottom value in haskell and exceptions in other languages.

giraffe_lady

NaN is cool because it's almost like a type that constrains uncertainty. What do we know about this entity? not much! but it's definitely not a number. Calling it anything else would have been a mistake.

Null is more confusing because it means different things in different languages. Sometimes it's a more constrained uncertainty, eg this definitely doesn't exist. But in sql it's a less constrained uncertainty, like "undefined" in math. The value of this thing couldn't make sense in this context, but we can't make assertions about its existence.

mplanchard

There's another comment in here that talks about thinking of NULL as UNKNOWN, and I quite like that. It makes a lot more sense, and I think it would've been a better choice to standardize on. UNDEFINED would also be an improvement.

adhamsalama

Except it is of type number, at least in JavaScript.

al2o3cr

FWIW, you can explicitly change this behavior in Postgres as of version 15 - include "NULLS NOT DISTINCT" when creating the unique index.

jmyeet

NULL is the absence of a value. If you try and treat it as a value, you're going to have a bad time. So an attempted UNIQUE(email_address, deleted_at) constraint is fundamentally flawed. If you treated NULL as a value that could be unique, you're going to break foreign keys.

But let's continue the logic of deleted_at being NULL indicating an active account, which seems to the intent here. You end up doing things like:

    SELECT /* ... */
    FROM accounts
    WHERE email_address = '...'
    AND deleted_at IS NOT NULL
Depending on your database, that may or may not index well. More problematic, you may end up with privacy leaks if someone forgets the last conditional.

If anything, you want to reverse this so someone has to go out of their way to explicitly select deleted accounts. There are multiple strategies for this eg using an active_accounts view or table.

Lastly, there are lots of potential reasons for an account to be disabled or otherwise not visible/accessible. Takedowns, court orders, site safety, hacked accounts and so on.

Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.

chuckadams

> Overloading deleted_at to have a semantic meaning for an active account is just fundamentally bad design.

Then don't do that. It's kind of a leap to say soft deletes are categorically bad because someone might confuse "deleted" with "inactive". My users table does the super-advanced thing of having both columns. The ORM also doesn't forget to add the not-null criterion. There's also zero databases in active use where it poses a problem to indexing.

Soft deletes suck in their own way, but none of the alternatives are perfect either.

indeed30

That's interesting - I believe this is exactly how Sequelize implements soft-deletion.

giraffe_lady

You put the "is not null" on the index itself and then simply don't use it for the much rarer queries that are on deleted accounts. Or just use a view for active accounts.

Overloading timestamps to carry a boolean on null is awesome as long as you decide that's what you're doing and use one of the several standard techniques to dodge the easily avoided potential downside.

This isn't a valid security concern, more than any other incorrect sql query would be anyway. A dev can always write a bad Q, you need another way to address that it's not more likely here because of the null.