SQL nulls are weird
292 comments
·January 9, 2025hiAndrewQuinn
crazygringo
That doesn't address anything in the second half of the post though, starting with this pull quote:
> The fact that NULLs are distinct for UNIQUE columns but are indistinct for SELECT DISTINCT and UNION continues to be puzzling. It seems that NULLs should be either distinct everywhere or nowhere. And the SQL standards documents suggest that NULLs should be distinct everywhere. Yet as of this writing, no SQL engine tested treats NULLs as distinct in a SELECT DISTINCT statement or in a UNION.
sitharus
This is confusing when you know that NULLs are not comparable, but it makes some sense if you consider the result of distinct/union as the output of a GROUP BY. You can consider everything that's NULL to be part of the same group, all the values are unknown.
So NULLs are not comparable but they are part of the same set.
adammarples
If nulls are distinct then group by should not group them together, this just ignores the problem. Why does group by treat them as equal?
GuB-42
That's because "different" and "distinct" don't mean the same thing.
Two unknown values are assumed to be different, but they are not distinct from each other.
For example, take two boxes, in each box is a die, the value of the box is the value shown on the die inside. You don't know the value since you don't see the die, it may even change as you manipulate the box, so it is unknown, NULL in SQL. Because of that, you assume the two boxes have different values. They are, however, indistinguable, so, not distinct. All you need to know is that you hold two boxes with dices in it, which one you hold doesn't matter, and that's what "SELECT DISTINCT" tells you: that you have two boxes and it doesn't matter which is which, even though they have different values.
SigmundA
>That's because "different" and "distinct" don't mean the same thing.
The literal definition distinct is:
>recognizably different in nature from something else of a similar type.
If you want to get down to it nothing is "equal" or the same.
Is a temperature measurement 25C the same as another of 25C? No these measurements are an approximation of the actual values which are actually not equal to each other they are distinct they have just been lumped into the same 25C group due to the resolution of measurement yet equality works just fine on that value in sql.
I have used SQL for a long time null handling is weird and inconsistent and a waste of time. For all the language bugs due to the existence of null at least I can count on null=null and not write garbage like value=param or (param is null and value is null)
Aloisius
MS SQL Server treats NULLs as indistinct for UNIQUE constraints, SELECT DISTINCT and for UNION.
Indeed, the sqlite page the pull quote is from says as much.
hans_castorp
> MS SQL Server treats NULLs as indistinct for UNIQUE constraints
Postgres lets you control that behaviour when creating the constraint (or index)
marcosdumay
Yes, ergonomics dictates some weird behavior for nulls.
Luckily, Postgres nowadays lets you declare the behavior of each null in unique constraints, like it should be. We can expect this to creep down to the other DBMSs with time.
Making nulls distinct on a "select distinct" or a "group by" would be completely useless.
Pxtl
If you're compromising on your high-minded and unorthodox purist concept for ergonomics, you may as well finish the job of ergonomics and just use normal nulls where X=X is true.
AlienRobot
It's possible that this is due to the underlying implementation.
In a unique column normally you'll have an index, so NULL becomes a special value in an index, but in SELECT DISTINCT you probably won't have an index, which means a full scan is performed, then every row has to be compared with every other row.
SoftTalker
If you're including possibly NULL columns in a distinct or group by and you want to treat them in a particular way, use the COALESCE() or NVL() or whatever similar function to give a real value to the NULL for that purpose.
tessierashpool9
there is a pattern starting to emerge here on hackernews of highly voted posts by people who present themselves as experts and thought leaders who shamelessly put their lack of understanding at display. it's frightening.
daswerth
The idea that someone should refrain from publishing a blog post about _anything_ unless they are a certifiable expert is not reasonable. Many people (correctly) write to learn, and even if they are publishing just to "present themselves as experts", it's on the reader to determine value.
izacus
In a world filled with false bullshit, crating more false unchecked writing instead of educating yourself is not a benefit to anyone.
hobs
I think that's more a pattern of your understanding growing over time.
Most technical writing is actually at the start of the technical journey, there's more people there and its more interesting to talk about strategy and ideas when its not "messy" with internal details and trade offs you make.
I encourage folks at any technical level to write, and I encourage folks to check authors work with a critical eye no matter how long the source has been in the industry, no amount of experience can protect you from being wrong.
ozim
I see it the other way around.
People think if someone wrote blog post with technical details and it got upvoted - somehow it has to be an expert.
tessierashpool9
go the extra mile and click on about and then check out the linkedin profile.
i quote:
"I graduated top of my class with a BSc in Computer Science [...]. I have a strong background in software engineering and technical leadership"
finnthehuman
> starting to emerge here on hackernews
It's not getting worse, you're getting better.
HN has for a long time been where I go for whatever you call the tech equivalent of watching stoners think they're having a deep conversation.
Pxtl
Or maybe people understand but still think it's dumb and hideously inconvenient?
Ergonomics matter.
pwdisswordfishz
Starting?
lynguist
Like on Reddit etc, which I deliberately avoid for this reason. The hiding of the vote count and the heavy moderation still help a lot that HN is still a massively better platform than any of its alternatives.
noduerme
eh. This probably shouldn't have gotten so many votes, but it's a little interesting from a logic standpoint. It falls somewhere in the region of a StackOverflow question that makes some people scratch their heads and functions as nerdbait for everyone who knows the answer. These things don't rank for that long on HN, (and I agree that the self-important "expert" blog posture is silly), but I do find them to be a better daily checkin for my brain than actually going on S.O. anymore...
gigatexal
This is the correct way of thinking about things. Null is one of the hardest things for traditional software engineers in my experience as a guy who came up as a data admin.
drpossum
Null in not-SQL (which is most things) usually isn't this tortured and isn't hard.
mathw
That's because null in not-SQL is a rather different concept, and while it's pretty easy to understand it's absolutely is hard to actually work with. Hoare didn't call it a "billion dollar mistake" on a whim.
swiftcoder
I don't know about that. null/undefined in javascript gives it a pretty good run for it's money
rad_gruchalski
Because they're two orthogonal problems. It's not like you do select distinct from program variables group by scope in your frontend programming language.
gigatexal
Yeah the 3 valued logic of SQL trips people up, me too from time to time
Pxtl
> Null is one of the hardest things for traditional software engineers
Making them harder is not better.
gigatexal
Never said it was
maest
The Maybe monad is really well understood at this point.
mrkeen
It's unfortunate to namesquat on 'boolean' if your elements have three possible values. Just call it 'ternary logic' and let individuals decide which system to use.
foobarchu
Who's name squatting boolean? The bool column is exactly what it claims to be, you just have the option of introducing unknowability if you define it allow nulls.
bunderbunder
The distinction is that not all formal logic systems are Boolean. Meaning that it is nonsensical and confusing to use "Boolean" as a generic synonym for "truth value" in the same way that it's nonsensical to use "Pantone" as a generic synonym for "color value", including when the specific kind of color value you're talking about is CMYK or HSV and definitely not Pantone.
chuckadams
Maybe GP was edited, but it doesn't use the word "boolean" anywhere.
hiAndrewQuinn
Correct, I edited "boolean" out prior to ^^P's comment. My apologies.
null
null
dominicrose
A null boolean good very well mean true, if the default value is true (in the code if not in the DB).
nilamo
Is this just being pedantic wrt uninitialized values?
thayne
If only it had a name that was more indicative of that, like UNKNOWN, or UNDEFINED or INDERTIMINATE or something.
dominicrose
Javascript has both null and undefined and I'm not sure that's a good idea. At least in SQL we only have one of them, but it can mean unknown or it can mean N/A or even false. It's like a joker, what it means depends on how you use it.
hobs
No, it's not those other things, that's just using the tool incorrectly. A NULL is definitely "we dont know", not false, not N/A, especially not any known value.
hun3
Or VBA, which has Empty, Null, and Nothing:
https://excelbaby.com/learn/the-difference-between-empty-nul...
(and sometimes Missing)
salawat
Null is shorter, and given the nightmarish queries I've had to read/write, I'll take any mercy that comes my way.
atombender
E. F. Codd originally suggested two types of values: "unknown" and "missing". Somehow we got NULL, which represents both.
ANSI SQL:1991 provides an optional feature that introduces a special value UNKNOWN to boolean expressions [1]. But few databases support it. MSSQL [2] is one of the few that do. As I understand it, it's not a data type that be used in tables, but is only a potential result of boolean operations.
[1] https://modern-sql.com/concept/three-valued-logic
[2] https://learn.microsoft.com/en-us/sql/t-sql/language-element...
Pxtl
Honestly, at this point I just wish SQL servers supported proper discriminated union types and nullable columns were implemented as some kind of MaybeKnown<T> and I could just have a normal Maybe<T> with normal equality semantics if I wanted.
SQL needs to join 21st century type systems... or needs to get replaced altogether. SQL is the FORTRAN of relational programming languages, but hangs around because every time somebody tries to replace it they throw the relational-theory baby out with the bath-water.
johnthescott
> SQL is the FORTRAN of relational programming languages
and what is an alternative to sql ... quel?
Pxtl
> SQL NULLs aren't weird, they're just based off of Kleene's TRUE-FALSE-UNKNOWN logic!
Kleene's TRUE-FALSE-UNKNOWN logic is weird.
SQL nulls effectively violate the reflexive property of equality, because X=X does not result in a value of TRUE.
And in many contexts in SQL, NULL is treated as equivalent to false, such as within a WHERE clause.
So that means that X=X is effectively FALSE in SQL*.
That is a clown language.
*(yes, I have the same opinion about ANSI NaN, but that wouldn't come up so often if Javascript didn't love NaN for some stupid reason.)
dragonwriter
> And in many contexts in SQL, NULL is treated as equivalent to false, such as within a WHERE clause.
NULL is not equivalent to FALSE, it is neither FALSE nor TRUE. It has the same effect as FALSE as the final result of evaluating a WHERE clause condition only because WHERE clause conditions allow a row to be included only when they evaluate strictly to TRUE. But if NULL were equivalent to FALSE in a WHERE clause, than a WHERE clause condition which would evaluate to NULL that was instead negated would be equivalent to TRUE but instead it remains NULL which remains not TRUE.
Pxtl
No, because NOT NULL/UNKNOWN is still NULL/UNKNOWN.
I realized earlier I was using the term NULL, but going forwards let's use the ANSI SQL concept where the null state of a Boolean is called UNKNOWN. You'll have to forgive me for using the term NULL this far, but in my defense the concept of NULL column-values and UNKNOWN boolean expression results are pretty intertwingled.
SELECT * FROM foo WHERE NOT (1 = NULL)
resolves to SELECT * FROM foo WHERE UNKNOWN
which is the same value that SELECT * FROM foo WHERE 1 = NULL
resolves to.So the WHERE clause is treating UNKNOWN/NULL as equivalent to false. The rest of the Boolean algebra is not.
SQL likes to have it both ways. Sometimes UNKNOWN/NULL is equivalent to false sometimes it's not.
It does this because UNKNOWN/NULL Booleans are incredibly inconvenient and tedious and sometimes they'd rather not think about them.
I'd like to do that as well, but this hoary old language won't give me the same permission it gives itself.
tpmoney
> And in many contexts in SQL, NULL is treated as equivalent to false, such as within a WHERE clause.
I don't think any databases treat `NULL` as `FALSE` in the WHERE clause. `SELECT * FROM foo WHERE bar = NULL` doesn't return rows with a NULL in the bar column. `SELECT * FROM foo WHERE bar != NULL` doesn't return rows without NULL in the bar column. `SELECT * FROM foo WHERE (bar = 'a') = NULL;` doesn't return rows where bar is not equal to `a`[1]. As far as I know every DB treats NULL as what it is, an unknown value.
It also doesn't to my mind violate the reflexive property because NULL is not equal to anything. It is a marker for an unknown value, not a value in and of itself. If you have a database of every person in a room and what color shirt they're wearing, and in your database, Alice and Bob both have NULL in their "shirt_color" column, that does not mean that Alice and Bob have the same color shirt. Nor does it mean that they don't have the same color shirt. Nor does it mean that someone with a green colored shirt has the same color shirt as Bob or Alice. It doesn't mean they don't have a shirt either. It means you don't have a record of/don't know what color their shirts are. You can't violate the reflexive property because you can't say what color shirt they have. You're not doing `X = X -> false`, you're doing `X = UNKNOWN -> UNKNOWN`
Pxtl
It treats the NULL/unknown value of the boolean as false
1 <> NULL => Boolean UNKNOWN,
so SELECT * FROM foo WHERE 1 <> NULL returns nothing.
1 = NULL => Boolean UNKNOWN,
so SELECT * FROM foo WHERE 1 = NULL returns nothing.
That's the thing that's being treated as FALSE. That UNKNOWN. Not the value of NULL itself.
> You're not doing `X = X -> false`, you're doing `X = UNKNOWN -> UNKNOWN`
That's not how "=" works. If you want a relationship for testing equality than handles unknown, don't call it equality.
Basic properties of equality, from Wikipedia
https://en.wikipedia.org/wiki/Equality_(mathematics)
- Reflexivity: for every a, one has a = a.
- Symmetry: for every a and b, if a = b, then b = a.
- Transitivity: for every a, b, and c, if a = b and b = c, then a = c.
edit:
We can also see the incoherence of this concept when we look at set theory.
Because UNKONWN booleans are neither true or false, if you use them in a WHERE clause you get the ugly result that the set of
X ⋃ Xᶜ
is not everything.zigzag312
They are weird because they are inconsistent compared to nearly everything else.
Being based on someone's logic is not sufficient. Most weird things are based on some (weird) logic.
tshaddox
Even simpler, note that the first two examples are essentially just two of the most basic boolean algebra identities. Replace UNKNOWN with a variable and change the notation to the more conventional boolean algebra notation and you've got:
1. x ∧ 1 = 1 (identity law for conjunction)
2. x ∨ 0 = 1 (identity law for disjunction)
demurgos
> select null = null; returns NULL, because each NULL is basically a placeholder representing any “unknown value”. Two unknown values are not necessarily the same value; we can’t say that they are equal, because we don’t know the value of either of them.
Agreed with all of this, it would probably have been better if they were named `unknown` instead of reusing the `null` keyword.
Note also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.
layer8
SQL NULL doesn’t behave like “unknown” in all contexts. That’s one issue of NULL, that it doesn’t really have consistent semantics.
masklinn
Furthermore if null only means unknown then we need a value for “known absent”, there’s a reason why null is so often used as that.
int_19h
From a purely relational perspective, if some piece of data can be absent, it's a 1:N relation where N<=1, and ought to be encoded as such.
(Of course, this is rather awkward in practice, and when NULL is there, it's inevitably going to be used for that instead.)
tpmoney
But the "known absent" value is going to be different for different domains. For example, in EEOC databases the "known absent" value for a race would be "declined to answer". In a database of test scores, it might be "Didn't complete", but it could also be "was absent from class on exam day" so SQL can't specify what that is. On the other hand "this value is unknown" can use the same marker in all domains, and SQL chose NULL as that marker. To be completely strict about it, "have a value/don't have a value" is one piece of data if that's something you care about and "what is that value" is another one. So in an ideal system, you should have a column for "value is [present | known absent | unknown]" and a separate column for the actual value when "value is present"
Most of the time it's not that important and people can and do shortcut "null" to mean "not present" but then the issues with using null in equality statements is a result of taking the short cut, not necessarily with the logic around null.
thaumasiotes
Do you actually need that in a Boolean context? It would only be useful for evaluating self-referent claims like "this sentence is false".
marcosdumay
We could really use some kind of "polymorphic table" that can represent a sum type in a convenient way.
cm2187
The problem is that in practice in a database NULL is a placeholder for a missing value, not an unknown value.
SoftTalker
Best way to think of NULL is "no value." Not "unknown value," as that implies that it is a value you just don't know what it is. Not "missing" value as that even the notion of a value being "missing" tells you something. NULL is no value. It's like a black hole, it consumes anything it touches. Any expression involving a NULL term becomes NULL.
bballer
And to further apply semantics to this just to be snide, the value is only "missing" if it could possibly ever be defined for that tuple. There are cases where you expect the value to be "missing", and it not being "missing" would be considered a data integrity issue. Fun.
cm2187
Yes I should have rather written “an absence of data”. But still not “unknown”.
magicalhippo
The result of comparisons involving NULL values can result[1][2] in UNKNOWN, and in PostgreSQL for example you can test[3] for this using IS UNKNOWN.
That said, as someone self-taught in SQL, I agree NULL was not a good choice.
Replacing NULL with UNKNOWN and the third boolean value as INDETERMINATE for example would have been better.
[1]: https://stackoverflow.com/a/79270181
[2]: https://learn.microsoft.com/en-us/sql/t-sql/language-element...
[3]: https://www.postgresql.org/docs/current/functions-comparison...
labster
SQL was developed in the 1970s, there’s no way they’d waste all those bytes to spell out UNKNOWN and INDETERMINATE.
jandrese
Have you ever seen SQL? Blowing lots of bytes on a query was not a concern. They could have made a much more compact syntax with little effort, but it wouldn't have looked kind of like English the way SQL does.
You could imagine a world where instead of:
SELECT ( email, name, outstanding_balance ) FROM accounts WHERE outstanding_balance > 0 AND last_payment > 60 ORDER BY name
the queries looked more like:
accounts: outstanding_balance > 0 & last_payment > 60 => email, ^name, outstanding_balance
There were plenty of contemporary languages that looked like the latter, but few survive to this day. SQL's relative verbosity was not seen as a problem then and is definitely not one today.
Besides, if the verbosity was a problem they could easily shorten it to UNK. That would have been perfectly normal.
ChoHag
[dead]
otteromkram
Also self-taught SQLer and I don't have an issue with NULL.
I also don't use UNIQUE constraints, so maybe that has something to do with it.
magicalhippo
I don't have an issue as such, I was a fairly experienced developer first time I had to dabble with SQL, but sometimes it can still surprise.
For example I learned the hard way that the DB we use at work does not index NULL values.
And once in a while if I'm tired or stressed I might forget about UNKNOWN and thus that "Col <> 42" does not return rows where Col is NULL.
Not that better naming would prevent such surprises, but I still think the current naming is less than optimal from a pedagogical perspective. At least I see this at times when teaching our support folks SQL (many have domain background and not a technical background).
null
chongli
UNKNOWN isn’t always correct though. Let’s say your data is input by users filling out a form and some of the fields are allowed to be left blank. NULL captures both the case where the user intentionally left the field blank but also the case where they accidentally skipped that field.
So NULL can capture multiple distinct concepts: unknown values (say, as a result of a calculation), not applicable (where the data has been deliberately left out), and missing data (possibly due to instrumentation or user error).
jandrese
Is it really desirable to combine "unknown" and "no value" into the same identifier? They seems like very distinct concepts. Using your form example you might have a name field that includes a spot for a suffix (John Doe III) for example that many people would leave blank because they don't have a suffix. In that case you should encode it as a NULL, but if they fail to fill in their family name then that's UNKNOWN.
If you do it this way you can avoid some confusion as to what NULL represents.
John Doe NULL IS EQUAL John Doe NULL
John Doe NULL NOT EQUAL John Doe UNKNOWN
John Doe UNKNOWN NOT EQUAL John Doe UNKNOWN
Determining if any particular input is NULL or UNKNOWN is a tricky problem, but at least this gets the programmer thinking about it up front and structuring their code to behave in the sanest possible manner.
NoMoreNicksLeft
>also that since Postgresql 15, you can use `NULLS NOT DISTINCT` when creating a unique index [0]. I'm less familiar with other databases.
Why would anyone want to use another database?
stronglikedan
Simplicity. PG is often overkill for simple apps, where MySQL/Maria/et al is easier to maintain, or even SQLite for very simple apps where zero DB maintenance is preferable.
lcnPylGDnU4H9OF
Why would you say MySQL/Maria/et al are easier to maintain for simple apps than PG?
homebrewer
Also the reverse — MySQL et al support much more complex replication topologies out of the box, including multi master for the very rare use case when you need it.
It's also much easier to tune, most database instances require setting innodb_buffer_pool_size, and that's basically it. Newer versions can even set it automatically if you're fine with consuming all memory on that machine, thus requiring no tuning at all.
poincaredisk
* legacy applications or vendor lock-in
* use of a cloud provider that favours another database (like SQL server)
* some people claim mysql/maria is faster for them, simpler, or has a better support for replication
* use of sqlite for lightweight or portable apps
* domain specific databases that still use sql as their query language
* someone may want to use another database for fun or to learn something new
hot_gril
SQLite has its own use cases. And MySQL was all-around better than Postgres in the past, so it still has more usage in some areas. Nowadays I'll take Postgres over MySQL by default, but it's not a huge difference.
password4321
Replication
duncan-donuts
Introducing “unknown” feels like another kind of hell like undefined in JavaScript.
jampekka
JavaScript's undefined is great. It's sort of similar to a maybe monad. Or IEEE 754 NaN. JS could have nicer mechanisms to handle undefined though.
demurgos
Just to clarify, I'm not advocating to introduce a new `unknown` keyword. I'm saying that the existing `null` in SQL was not named properly and that the name `unknown` would have been more fitting. SQL's `null` already has the semantics of `unknown` as explained in the part of the article that I quoted.
wvenable
SQL's use of "null" is probably one of the oldest instances of that concept in computing. It's exactly equivalent to unknown. That is its definition.
bunderbunder
> ... and this is even less obvious if you’re used to using ORMs.
Which is why I continue to be such an ORM skeptic. I agree that they're convenient. But I do worry that we've now got an entire generation of engineers who regularly interact with relational databases, but have largely been spared the effort of learning how they actually work.
As another commenter pointed out, if you've learned basic relational algebra then the way SQL nulls behave seems obvious and logically consistent. The logic is the same as the logic behind the comparison rules for NaN in IEEE floats. It's the behavior of C-style nulls that is, always and forever, a billion-dollar mistake.
feoren
Stop thinking of ORMs as trying to hide the details of SQL and you'll stop hating them. Instead think of them as a way to compose relational queries dynamically, with the full power of your primary language, instead of inside of database stored procedures in a language totally devoid of any support for basic software engineering best practices. They shouldn't be hiding SQL from your primary language, they should be exposing the relational model to it! SQL is not the only possible implementation of the relational model, and it's not even a particularly good one. Even SQL's founders don't think it implements EF Codd's relational model very faithfully. Good ORMs act as a domain-specific language for the relational model embedded inside the parent language.
rqtwteye
"compose relational queries dynamically"
That's an important one. It would be super nice to have a SQL dialect that works more like LINQ where you can compose your queries easily. I always hate it when I have to write SQL directly. It's super powerful but the syntax just isn't designed well. To me it feels like a throwback to the good old FORTRAN or COBOL days: you can get stuff done but modern languages are so much better.
fauigerzigerk
>They shouldn't be hiding SQL from your primary language, they should be exposing the relational model to it!
But this has never been their primary purpose and it's not what they are good at. ORMs are supposed to map the relational model into an object oriented model so that you can work with objects rather than sets of tuples. And that's exactly how people use them.
ORMs incentivise people to replace simple and declarative set operations with complex procedural code operating on individual objects.
ORMs are just a terrible idea - conceptually messy, hard to debug and optimise, full of needless complexity.
feoren
Exposing the relational model to be manipulated dynamically within the parent language is exactly what LINQ in C# is. That was its primary purpose. True, LINQ is not itself an ORM -- it was built to support other ORMs like LINQ to SQL and Entity Framework, which aren't as "pure" on this subject. I don't actually like the LINQ syntax that much since it's not as extensible, but its existence is proof that the C# team did in fact intend to expose the relational model to C#.
Entity Framework did try to cater to the "SQL is scary, let me use objects" crowd, and that is the majority of how it's used, and that is a mistake in my opinion. But it is also very good at supporting relational algebra within C# and composing queries dynamically; ironically, it's best at it if you disable or avoid many of its features (dynamic subclassing, linked objects). Pass IQueryables around and compose them together and life is good. Updating needs work, but updates have always been a weakness for SQL too.
zzzeek
> ORMs are just a terrible idea - conceptually messy, hard to debug and optimise, full of needless complexity.
and that's why ORMs are so unpopular and entirely absent from successful production applications for the past 30 years
thrance
My experience with ORMs is that most of the time you end up needing to write some queries by hand, in raw SQL. Usually these are the most complex, that you can't express in your ORM's DSL. My point being, I don't think using an ORM really shields you from having to learn how it works behind the scenes.
globular-toast
It's not even about having to write SQL by hand. In an ORM like Django that's exceedingly rare. But you still need to understand what's going on underneath. In other words, it's the most leaky abstraction there is. I think the popularity is mostly aesthetic and convenience. Most people into ORMs like Django don't really know about layered architecture and that you can keep all your SQL in one place in the data access layer. They just scatter that stuff everywhere in the codebase.
feoren
I don't know Django specifically but I'm always floored by how people talk about ORMs. They're only a leaky abstraction if you believe their point is to shield terrified junior devs of the inner workings of the scary relational database. That's an awful way to use ORMs, and the source of most of the flak they get. To be fair, some are designed that way, or at least strongly push you toward it.
Stop thinking of ORMs as trying to hide the details of SQL and you'll stop hating them. Instead think of them as a way to compose SQL dynamically, with the full power of your language. SQL is an awful language to write application logic in, because it has horrible support for abstraction, composition, encapsulation, dependency injection, etc. The ORM gives you a way to produce SQL in an environment that actually supports basic software engineering principles. Scattering ORM logic everywhere in the codebase is the point: putting all your SQL in one data access layer is like putting all your arithmetic in one calculation layer. Why would you ever do that? What's wrong with seeing a plus sign in more than one file? What's wrong with seeing language-encoded relational logic in more than one file?
I can guarantee you the popularity is not "aesthetic". And convenience is a real thing that actually does reduce costs. People complain about ORMs, but have you seen the absolute horse-shit-level code that people jam into SQL functions and procedures to do the utterly most basic things? The standard for what ends up in SQL Stored procedures is the most unmaintainable garbage in the entire software engineering ecosystem.
hot_gril
If you can avoid learning SQL or the underlying DBMS then great, I have no problem with people ignoring things that are ignore-able. Problem is you can't. You will need to learn the DBMS, and the ORM will end up just getting in the way of what you really want to do.
DangitBobby
I've used ORMs extensively in my career, and I've seen it trotted out as a Boogeyman here a million times. Why do I still prefer writing queries with a good ORM over awkwardly composing SQL by conditionally concatenating strings? Is buggy, error prone string concatenation with a bunch of if statements the true way? What am I missing?
hot_gril
Conditional concat should be pretty rare. When are you doing that?
zzzeek
I was actually going to comment on how breathtakingly ignorant the blog post author was for making this statement. What on earth ORM would make a UNIQUE constraint against multiple columns without an explicit instruction, much less where some of the columns are NULLable and then leave the user entirely unaware that anything like that was happening? ORMs do nothing like this and the developer would always have built this constraint explicitly and therefore is working at the DDL layer. ORMs have nothing to do with whether or not someone knows SQL just like cars have nothing to do with whether or not someone knows how to walk.
irrational
I expected the article to mention how in Oracle NULLs are equal to empty strings. Now that is weird.
hyperman1
Oh man. I had a talk with a DBA about how oracle could not deal with an adress with no street name - literally a tiny village with 10 houses on 1 nameless town square. It was unsearchable in parts of the app because street='' was interpreted as street is null. DBA kept claiming oracle was right and the town should adapt their naming to our software.
This attitude was so prevalent at the time, I sometimes wonder if the rise of noSQL was simply people sick of dealing with Oracle DBAs
aidenn0
> This attitude was so prevalent at the time, I sometimes wonder if the rise of noSQL was simply people sick of dealing with Oracle DBAs
That was definitely one part; another part was sharp corners in MySQL (at least as of 20 years ago; I would be surprised if many of them haven't been rounded off in the meantime). The last part was places with no DBA with developers unaware of how to handle schema migrations.
zo1
It's weirder. If you insert an empty string into a VARCHAR field in Oracle, it returns Null back to you when you query that same field. At the very least, I'd expect a software system to behave in a deterministic way. I.e. either throw an error because you're not doing something right (whatever Oracle deems right in this case), or give you back what you gave it, especially for database software who's entire role of existence is to persist data without side-effects.
svieira
Domain-embedded nulls are the bane of my existence.
datadrivenangel
SQL NULLs are not weird once you consider how you want relational logic to work when they is a record with non-existent values.
grahamlee
Exactly this. SQL is based on the relational algebra and that's well-defined, NULL along with other features of SQL work in an entirely regular and predictable way. The only time it's weird is when a developer decides that it should work the way Javascript (or whatever) NULLs work because that's the last time they saw the same word used in a programming language, in which case it's the assumption that's weird.
tzs
That's not the only time it is weird. There's even a whole book by one of the pioneers of the relational DB model, Date's "Database Technology: Nulls Considered Harmful" [1], covering many of the ways it is weird.
[1] https://www.amazon.com/Database-Technology-Nulls-Considered-...
setr
The part that’s weird with nulls is that it’s a trinary logic stuffed into a boolean algebra. The use of x = NULL instead of x IS NULL is pretty much always a mistake.
More importantly, x = value instead of (x = value and x IS NOT NULL) is almost always a mistake, and a stupidly subtle one at that. And for this curse, we get… nothing particularly useful from these semantics.
Also the x != NULL case is completely cursed
grahamlee
> The part that’s weird with nulls is that it’s a trinary logic stuffed into a boolean algebra.
It's a three-valued logic (though not trinary, which would use a base-3 number system) in a three-valued algebra: specifically, the relational algebra. The outcome of a logical test has three values: true, false, or NULL; this is distinct from Boole's algebra where outcomes have a continuous value between 0 and 1 inclusive.
dataflow
> SQL NULLs are not weird once you consider how you want relational logic to work when they is a record with non-existent values.
Could you explain how this makes sense then?
SELECT ...
WHERE NULL
If NULL is just "unknown" then shouldn't this be a type error?Moreover, could you explain why the heck this ought to be empty?
WITH T AS (SELECT 1 AS C1,
NULL AS C2)
SELECT C1, C2
FROM T
INNER JOIN T
USING (C1, C2);
As a human this looks insane to me, "relational algebra" be damned. You find a row, then you look it up again, and it's not there? What the hell?whstl
Agreed.
I will die on the hill that regular C-like nulls are the actual thing that's weird.
The real billion dollar mistake [1] was the damage it made on the minds of developers.
chuckadams
Even null in programming languages isn't so bad if it's a distinct type. The problem with null in languages like Java is that null is part of every reference type (C's pointers are another world of broken, null being basically just another unsafe invalid address).
Most languages nowadays do get nulls right, even PHP of all things.
PhilipRoman
>unsafe invalid address
Ironically NULL is probably the safest pointer value in C, as any dereferences from it (and thousands of pages that follow it) are guaranteed to crash.
poincaredisk
We should start adjusting that billion for inflation.
iefbr14
When the null concept was introduced to me in the seventies, the only thing I could say was that it would be causing a lot of unnecessary confusion in the future. If you have missing values in your datarecord then that datarecord belongs in an exception-queue. And now some 45 years later people are still discussing it like we did then..
acuozzo
> If you have missing values in your datarecord then that datarecord belongs in an exception-queue.
This depends on the context, no? I doubt there exists someone with a contact list on their phone which has every single field for every single contact populated.
There needs to be some way to codify that a field in a datarecord is unpopulated. Using the "zero value" for the type of the field (e.g., the empty string) is reasonable, but is this necessarily better than NULL? I reckon an argument can be made that this approach is just as likely to lead to bugs.
I'm not necessarily in favor of NULL, for what it's worth, but I can't think of an adequate replacement which doesn't reduce to "NULL in sheep's clothing".
thfuran
I did last week, before I added the first contact.
DangitBobby
Missing values are not always an exception. There's a reason modern languages almost universally include an Option type which may contain a Null and allow you to pass it around as a first class value. Good data representations allow you to express missing values.
zokier
SQL nulls in some ways behave in similar to floating point nans. Of course nans are also weird in their own way, but it is a bit comforting that its not so completely singularly weird.
giraffe_lady
NaN is cool because it's almost like a type that constrains uncertainty. What do we know about this entity? not much! but it's definitely not a number. Calling it anything else would have been a mistake.
Null is more confusing because it means different things in different languages. Sometimes it's a more constrained uncertainty, eg this definitely doesn't exist. But in sql it's a less constrained uncertainty, like "undefined" in math. The value of this thing couldn't make sense in this context, but we can't make assertions about its existence.
mplanchard
There's another comment in here that talks about thinking of NULL as UNKNOWN, and I quite like that. It makes a lot more sense, and I think it would've been a better choice to standardize on. UNDEFINED would also be an improvement.
reshlo
UNDEFINED would not be accurate. If your signup form has an optional field for a full name which I don’t fill in, I still have a name. Just because a value is not known by your database doesn’t mean it isn’t defined.
E. F. Codd thought about this issue.[0]
> Codd indicated in his 1990 book The Relational Model for Database Management, Version 2 that the single Null mandated by the SQL standard was inadequate, and should be replaced by two separate Null-type markers to indicate why data is missing. In Codd's book, these two Null-type markers are referred to as 'A-Values' and 'I-Values', representing 'Missing But Applicable' and 'Missing But Inapplicable', respectively. Codd's recommendation would have required SQL's logic system be expanded to accommodate a four-valued logic system. Because of this additional complexity, the idea of multiple Nulls with different definitions has not gained widespread acceptance in the database practitioners' domain. It remains an active field of research though, with numerous papers still being published.
adhamsalama
Except it is of type number, at least in JavaScript.
dunham
Also similar to the bottom value in haskell and exceptions in other languages.
galaxyLogic
In Object Oriented Context "null" is useful to indicate that some object doesn't have value for that property.
What's interesting is, do we mean that in our data that attribute has no value? Or do we mean the real-world object represented by the data does not have that attribute?
Does null mean
a) We don't know the value of this attribute for this object, or
b) We do know that there is no value for this attribute in the real-world object represented by our data.
In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.
EXAMPLE: The attribute 'spouse'. Some people have a spouse some don't. So what does it mean if the value of the field 'spouse' is null? That we know there is no spouse, or that we don't know who the spouse is if any.
In practical terms we can say null means "We don't know" which includes the case that there is no spouse.
Izkata
> In JavaScript because there is both null and undefined it is easy to assume that undefined means we don't know the value and null means we do know it has no value.
Javascript objects have two kinds of undefined that are both represented by the same value. You have to use another method to see which it is, and I've seen "foo" in this example used for the same thing as "null" in your example:
>> z = {foo: undefined}
Object { foo: undefined }
>> z.foo
undefined
>> z.bar
undefined
>> z.hasOwnProperty('foo')
true
>> z.hasOwnProperty('bar')
false
This is something you have to account for because the key is still there if you try to remove a value by just setting it to undefined: >> Object.keys(z)
Array [ "foo" ]
>> for (let k in z) { console.info(k); }
foo
This is the right way to remove the key: >> delete z.foo
true
>> z
Object { }
>> Object.keys(z)
mnsc
But in a _relational_ database lack of spouse would not be modeled with a nullable column "spouse" but rather an absence of a spouse row/relation. Which is very real-world-like.
afiori
As a sort of challenge I had an idea of building an app using SQL as a pseudo ECS system where every table was either a 1 column table with only an id or a 2 column table with an id and a value.
DangitBobby
And yet when you do a join because you need to actually use that data, the resulting table will have a column with nulls in it. Any way you squeeze it, you need a way to represent empty values for your database to be useful.
mnsc
An inner join? Then there wouldn't be any nulls.
andai
I remember from my databases course at university that NULL means that the database doesn't contain that data, and empty string means that it is known to be empty.
zo1
Let's also all be reminded about how Oracle DB doesn't let you insert empty strings, and instead treats them as NULLS even if you gave it an empty string initially.
https://stackoverflow.com/questions/203493/why-does-oracle-9...
That was a fun bug to find out, after having dealt with quite a few other DBs over the years. It was one of those "No, but surely" and "This can't be! This is Oracle!" moments. Found it while porting some old code that needed to store an empty string as being distinct from a NULL in that same column.
feoren
That's your professor's opinion, and probably one that does not come from industry experience. Look in 4 different databases and you'll see 9 different conventions. A common one is to have all strings non-null with a default value of empty string. And not all columns are strings; there is no "obviously empty" integer or boolean.
null
niij
What is the type is something other than a string?
age: null? married: null?
bogeholm
How about Option<bool> for that? In SQL you could have a nullable Boolean column
null
kopirgan
Recall this really funny dialogue in one of the Blackadder episodes.
The princess eyes are as blue as the stone of Galveston
Have you seen the princess eyes?
No!
Have you seen the blue stone of Galveston?
No!
So you're comparing something you've never seen with something else you've never seen!
That's NULL comparison
ludwik
Even if we set three-value logic aside for a moment, this behavior of NULL still makes sense intuitively.
The value of NULL in a particular table cell is simply a way to indicate 'no value'. If you want the values in a column to be unique, cases where there are no values shouldn't be considered.
This plays out similarly in practice. For example, you may want to allow users to optionally reserve a username, and if they do, those usernames should be unique. It's hard to imagine a use case where by wanting a field to be both optional (nullable) and unique, you mean that the field should be optional for a single record (!) and required for all the rest. Of course, you mean that IF there is a value, THEN it should be unique.
Recursing
Surprised that this doesn't mention "IS DISTINCT FROM" ( https://modern-sql.com/caniuse/is-distinct-from )
(Although in rare cases that is even weirder: https://stackoverflow.com/a/58998043 )
ziml77
I'm glad SQL Server finally got this, but I wish the syntax was nicer. It's a multi-word infix operator that gets tough to read. I've been using Snowflake SQL recently and I like that they just made it a function called EQUAL_NULL
lolpanda
I actually like how NULLs behave in SQL. They mean "I don't know" In the modern programming language we all care about Null safety. But no matter how you model your data, you will always run into the situations when you don't know everything. So I believe NOT NULL is not very practical. NULLs in SQL handle these case very well - when the input is unknown your output is unknown
int_19h
Except they don't consistently behave that way. If NULL means "unknown", why do they show up in outer joins, or when you SUM an empty table?
cglace
The most annoying is having to order by DESC NULLS LAST to get the largest value from an aggregation.
dalton_zk
I feel like the same, Null equal null is null is totally right
afiori
I feel like a select for:
- col1 = 1 should not return NULLS
- !(col1 = 1) should return NULLS
- col1 <> 1 should not return NULLS
kijin
Agreed. If SQL didn't have NULL, we'd have other special values meaning "I don't know" or "no data" all over the place.
Too many newbies hear that NULL is bad, so they declare all columns as NOT NULL and end up inserting ad hoc values like 0, -1, '', or {} when they inevitably come across cases where they don't have data. Which is even worse than NULL.
SQL NULLs aren't weird, they're just based off of Kleene's TRUE-FALSE-UNKNOWN logic!
If you want you can read NULL as UNKNOWN and suddenly a whole bunch of operations involving them become a lot more intuitive:
1. TRUE OR UNKNOWN = TRUE, because you know you have at least one TRUE already.
2. TRUE AND UNKNOWN = UNKNOWN, because you don't know whether you have two TRUEs or not. It's just out there.
3. UNKNOWN XOR UNKNOWN = UNKNOWN, because it could darn near be anything: TRUE XOR TRUE, TRUE XOR FALSE, FALSE XOR FALSE, FALSE XOR TRUE... Internalizing this is where SQL's use of NULL / UNKNOWN really becomes intuitive.
4. (TRUE AND FALSE) XOR (TRUE OR UNKNOWN) = (FALSE) XOR (TRUE) per #1 = TRUE. See, it's consistent, you just need to keep in mind that if you have a lot of known UNKNOWNs they're quite parasitic and your final outcome is likely to be, itself, an UNKNOWN. Just like in real life!