Immutability Changes Everything (2016) [pdf]
43 comments
January 25, 2025 · LeftHandPath
Immutability is a fantastic tool, especially when working with enterprise data. It's relatively easy to implement your own temporal tables on most existing databases, no special libraries or tools required. It seems really trivial/obvious, but I'll admit I first stumbled into the concept using the AS400 at work. If you make a mistake on payroll in IBM's old MAPICS program, you don't overwrite or delete it. You introduce a new "backout record" to nullify it, then (maybe) insert another record with the correct data. It seems obvious once you've seen the pattern.
I've made a few non-technical eyes go wide by explaining A) that this is done and B) how it is done. The non-tech crypto/blockchain enthusiasts I've met get really excited when they learn you can make a set of data immutable without blockchain / merkle trees. Actually, explaining that is a good way to introduce the concept of a merkle tree / distributed ledger, and why "blockchain" is specifically for systems without a central authority.
(Bi)temporal and immutable tables are especially useful for things like HR, PTO, employee clock activity, etc. They help keep things auditable and correct.
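A minimal sketch of the backout-record pattern described above (Python, with invented payroll rows; real MAPICS records obviously look different):

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class PayrollRecord:
    employee: str
    amount: Decimal          # negative amounts are backout records
    backs_out: int | None    # index of the record this one nullifies

ledger: list[PayrollRecord] = []

def post(employee: str, amount: Decimal) -> int:
    """Append a record; nothing is ever updated or deleted."""
    ledger.append(PayrollRecord(employee, amount, None))
    return len(ledger) - 1

def back_out(index: int) -> int:
    """Nullify a mistaken record by appending its negation."""
    orig = ledger[index]
    ledger.append(PayrollRecord(orig.employee, -orig.amount, index))
    return len(ledger) - 1

# Mistaken payment, its backout, then the corrected entry:
i = post("alice", Decimal("1000.00"))   # oops, should have been 100.00
back_out(i)
post("alice", Decimal("100.00"))

balance = sum(r.amount for r in ledger if r.employee == "alice")
assert balance == Decimal("100.00")     # history preserved, total correct
```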
layer8
Without specific support from the RDBMS, bitemporal schemas are difficult with regard to cross-table references, such as foreign keys. Rows that need to be consistent between tables aren’t necessarily 1:1 anymore, but instead each row in one table needs to be consistent with all corresponding rows in the other table having an intersecting time interval. You then run into problems with transaction isolation and visibility.
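To make the interval problem concrete, here's a toy check (Python, with hypothetical department validity intervals) of what a temporal foreign key has to verify: every instant of the referencing row's validity must be covered by some version of the referenced row:

```python
from datetime import date

# (value, valid_from, valid_to) — half-open intervals, hypothetical data
dept_versions = [
    ("Sales", date(2020, 1, 1), date(2022, 1, 1)),
    ("Sales", date(2022, 6, 1), date(9999, 1, 1)),  # gap: dept closed for 5 months
]

def covers(parent_rows, child_from, child_to):
    """True if the union of parent validity intervals covers [child_from, child_to)."""
    intervals = sorted((f, t) for _, f, t in parent_rows)
    cursor = child_from
    for f, t in intervals:
        if f > cursor:
            return False          # uncovered gap before this parent version
        cursor = max(cursor, t)
        if cursor >= child_to:
            return True
    return cursor >= child_to

# An employee row referencing "Sales" across the gap violates the temporal FK:
print(covers(dept_versions, date(2021, 1, 1), date(2021, 6, 1)))  # True
print(covers(dept_versions, date(2021, 1, 1), date(2023, 1, 1)))  # False
```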
pyrale
> bitemporal schemas are difficult with regard to cross-table references
Who needs more than one table? >:)
More complex models can be built and stored separately. The great benefit of this method is that, once you're unhappy with your table model, you can trash it and rebuild it from scratch without worrying about data migration.
layer8
Your last sentence sounds more like event sourcing than bitemporal databases, which are quite different concepts. I don’t see how bitemporal schemas simplify schema migration.
hobs
Pretty much, you want triggers to store things in a schemaless fashion in an audit format so that you are free to migrate tables.
This does require either knowing the schema at the point in time or recording enough information to do a schema on read.
The other option, of course, is to run a table like an API: always adding, never removing.
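A sketch of the schema-on-read half of that (Python, with invented audit rows; a real system would write these from triggers):

```python
import json

# Append-only audit rows: each carries the schema version it was written under.
audit_log = [
    {"schema": 1, "data": json.dumps({"name": "Ada Lovelace"})},
    {"schema": 2, "data": json.dumps({"first": "Ada", "last": "Lovelace"})},
]

def read_name(row):
    """Schema-on-read: interpret each row per the schema it was written with."""
    d = json.loads(row["data"])
    if row["schema"] == 1:
        first, _, last = d["name"].partition(" ")
        return first, last
    return d["first"], d["last"]

assert [read_name(r) for r in audit_log] == [("Ada", "Lovelace"), ("Ada", "Lovelace")]
```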
refset
> It's relatively easy to implement your own temporal tables on most existing databases
It gets tricky when you need to change the schema without breaking historical data or queries. SQL databases could do a lot more to make immutability easier and widespread.
jiggawatts
One fundamental issue I’ve noticed is that typical SQL databases have a single schema per table defining both the logical and physical aspects, usually with a strong correlation between the two.
Databases could treat the columns as the fundamental unit with tables being not much more than a view of a bunch of columns that can change over both space (partitioning) and time (history).
bobnamob
That’s effectively how datomic works. Datoms are the fundamental unit, with attributes being analogous to a column name and views being the 4 indexes that datomic keeps
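A toy illustration (Python; loosely modeled on Datomic's datoms, not its actual API): facts are (entity, attribute, value, tx) tuples where attributes play the role of column names, and both "tables" and as-of snapshots are just views over them:

```python
# Facts as (entity, attribute, value, transaction) tuples, loosely after Datomic.
datoms = [
    (1, "name",  "alice", 100),
    (1, "title", "engineer", 100),
    (1, "title", "manager", 200),   # a later tx supersedes the attribute
]

def as_of(tx):
    """Latest value per (entity, attribute) visible at transaction tx."""
    view = {}
    for e, a, v, t in sorted(datoms, key=lambda d: d[3]):
        if t <= tx:
            view[(e, a)] = v
    return view

assert as_of(100)[(1, "title")] == "engineer"
assert as_of(200)[(1, "title")] == "manager"   # nothing was overwritten
```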
teleforce
>Actually, explaining that is a good way to introduce the concept of a merkle tree / distributed ledger, and why "blockchain" is specifically for systems without a central authority
This is a very important point: whatever systems or solutions you build, don't overengineer, and always remember that premature optimization is the root of all evil.
It used to be blockchain, and now ML/AI seems to be the new fad. Most likely the majority of solutions being designed with ML/AI today don't need it, and using it anyway just makes them expensive/slow/complex/non-deterministic/etc.
People need to wake up and smell the coffee: ultimately ML/AI is just one tool among many in the toolbox.
gatane
My main gripe with immutability is that updating data requires building a full copy of the data with the changes. Sure, you could have zippers to aid in the updating process by acting as a kind of cursor/pointer, but raw access to data beats them any time (even if you optimize for cache).
So if you had to optimize for raw speed, why not choose mutable data?
dsQTbR7Y5mRHnZv
> My main gripe with immutability is that updating data requires building a full copy of the data with the changes.
Conceptually yes, but the implementation doesn't always necessarily need to work that way under the hood: https://www.roc-lang.org/functional#opportunistic-mutation
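A rough sketch of that opportunistic-mutation idea (Python, with a hand-rolled reference count standing in for the runtime's; Roc's real analysis is more subtle):

```python
class Ref:
    """A list plus an explicit reference count, standing in for the runtime's."""
    def __init__(self, items, refs=1):
        self.items, self.refs = items, refs

def set_item(r: Ref, i: int, value) -> Ref:
    """Semantically a pure update; physically in-place when nobody else can see r."""
    if r.refs == 1:
        r.items[i] = value               # opportunistic in-place mutation, O(1)
        return r
    r.refs -= 1                          # shared: leave the original intact
    return Ref(r.items[:i] + [value] + r.items[i + 1:])

unique = Ref([1, 2, 3])
assert set_item(unique, 0, 9) is unique  # mutated in place, no copy

shared = Ref([1, 2, 3], refs=2)
copy = set_item(shared, 0, 9)
assert shared.items == [1, 2, 3] and copy.items == [9, 2, 3]
```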
munchler
> My main gripe with immutability is that updating data requires building a full copy of the data with the changes.
That is not true in general. There are plenty of data structures that can be updated without forcing a full copy. Lists, trees, sets, maps, etc. All of these are common in functional programming. This is discussed in the article (e.g. "Append-Only Computing").
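For example, a persistent binary tree copies only the path from the root to the changed node and shares everything else (a minimal Python sketch):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(node: Optional[Node], key: int) -> Node:
    """Returns a new tree; only the root-to-leaf path is copied, O(log n) nodes."""
    if node is None:
        return Node(key)
    if key < node.key:
        return Node(node.key, insert(node.left, key), node.right)
    return Node(node.key, node.left, insert(node.right, key))

v1 = insert(insert(insert(None, 5), 3), 8)
v2 = insert(v1, 4)           # v1 is untouched and still fully usable
assert v2.right is v1.right  # the right subtree is shared, not copied
```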
sarchertech
If you really care about performance, iterating over any of those is going to be much, much slower than iterating over an array.
munchler
If you really care about multi-threading, mutating array elements is going to be much buggier than using an immutable data structure.
KingMob
> My main gripe with immutability is that updating data requires building a full copy of the data with the changes.
That's not generally true. Many immutable-by-default languages use "persistent" data structures, where "persist" here means that much of the original structure persists in the new one.
For more, see:
- Purely Functional Data Structures by Okasaki: https://www.cs.cmu.edu/~rwh/students/okasaki.pdf
- Phil Bagwell's research, e.g. https://infoscience.epfl.ch/record/64398/files/idealhashtree...
mrkeen
Someone should try it with postgres. Make a raw speed branch that gets rid of the overhead of mvcc:
> while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data
https://www.postgresql.org/docs/7.1/mvcc.html
ahoka
That’s not exactly how PostgreSQL works. This is true only at certain isolation levels.
dang
Related:
Immutability Changes Everything (2016) - https://news.ycombinator.com/item?id=27640308 - June 2021 (94 comments)
Immutability Changes Everything - https://news.ycombinator.com/item?id=10953645 - Jan 2016 (4 comments)
Immutability Changes Everything [pdf] - https://news.ycombinator.com/item?id=8955130 - Jan 2015 (25 comments)
(Reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)
gleenn
I love the quote "accountants don't use erasers". So many things should be modeled over time and keep track of change right out of the gate. Little things like Ruby on Rails always adding timestamps to model tables were super helpful, but also a little code smell: if this is obvious enough to be useful everywhere, what is the next level? One more reason Datomic is so cool: nothing is overwritten. It is overlaid with a newer record, so you can always look back, and you can always take a slice of the DB at a specific time and get a complete and consistent view of the universe at that moment. Immutability!
cowsandmilk
The “right to be forgotten” has caused a lot of conflicts with certain immutable data stores. If I can reconstruct a snapshot with a user’s data, have I actually “forgotten” them? Having a deadline by which merges fully occur and old data is rendered inaccessible is sometimes a legal necessity.
hcarvalhoalves
You can always "redact" previous data. You can treat the sensitive entries themselves as mutable, without it breaking the system design around immutable data.
I have also seen a scheme where you store the hash and have a separate lookup table for the sensitive data, which you can redact more easily without messing with the log.
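A sketch of that hash-plus-lookup scheme (Python; all names invented): the append-only log keeps only digests, while the side table holds the sensitive values and is the one place where deletion happens:

```python
import hashlib

log = []        # append-only, safe to keep forever
pii = {}        # hash -> sensitive value; the only mutable, deletable store

def record(event: str, sensitive_value: str) -> None:
    # Real systems use a salted/keyed hash so low-entropy values (like emails)
    # can't be brute-forced back from the digest.
    digest = hashlib.sha256(sensitive_value.encode()).hexdigest()
    pii[digest] = sensitive_value
    log.append((event, digest))          # the log never stores the raw value

def forget(sensitive_value: str) -> None:
    """Right-to-be-forgotten: drop the value; log entries stay intact."""
    pii.pop(hashlib.sha256(sensitive_value.encode()).hexdigest(), None)

record("signup", "alice@example.com")
forget("alice@example.com")
event, digest = log[0]
assert digest not in pii                 # value gone, history's shape preserved
```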
mrkeen
Likewise with database backups.
prydt
One of my favorite papers! This reminds me of Martin Kleppmann's work on Apache Samza and the idea of "turning the database inside out" by hosting the write-ahead log on something like Kafka and then having many different materialized views consume that log.
Seems like a very powerful architecture that is both simple and decouples many concerns.
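A toy version of that architecture (Python, with a plain list and functions standing in for Kafka and the view maintainers): one append-only log, many independently derived views:

```python
events = [                       # the log, the single source of truth
    {"type": "deposit",  "account": "a1", "amount": 100},
    {"type": "withdraw", "account": "a1", "amount": 30},
    {"type": "deposit",  "account": "a2", "amount": 50},
]

def balances(log):
    """One materialized view: current balance per account."""
    view = {}
    for e in log:
        sign = 1 if e["type"] == "deposit" else -1
        view[e["account"]] = view.get(e["account"], 0) + sign * e["amount"]
    return view

def activity_count(log):
    """Another view over the same log, maintained independently."""
    view = {}
    for e in log:
        view[e["account"]] = view.get(e["account"], 0) + 1
    return view

assert balances(events) == {"a1": 70, "a2": 50}
assert activity_count(events) == {"a1": 2, "a2": 1}
```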
082349872349872
In their 1992 Transaction Processing book*, Gray and Reuter extrapolate h/w and s/w trends forward and predict that the DBMS of their far future would look like a tape robot for backing store with materialised views in main memory.
Substitute streams for tape i/o, and this description of Samza sounds like it could be very similar to that vision.
* as far as I know, their exposition of the WAL and tradeoffs in its implementation has aged well. Any counter opinions?
skybrian
Editors and form validation are where this gets tricky. The user isn't just reporting new, independent observations to append to a log; they're looking at existing state and deciding how to react to it. Sometimes they also need to avoid constraint violations against other state they're not looking at.
It often works out, but if you're not looking at the right version, then you're risking a merge conflict.
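One common mitigation is to make the version the user was looking at part of the write, a compare-and-set sketch (Python; the Conflict error and Document class are invented for illustration):

```python
class Conflict(Exception):
    pass

class Document:
    def __init__(self, body: str):
        self.body, self.version = body, 1

    def save(self, new_body: str, based_on_version: int) -> None:
        """Reject the edit if it was based on a stale version of the state."""
        if based_on_version != self.version:
            raise Conflict(f"you edited v{based_on_version}, server is at v{self.version}")
        self.body, self.version = new_body, self.version + 1

doc = Document("draft")
seen = doc.version          # what the user's editor loaded
doc.save("edit by someone else", based_on_version=seen)   # bumps to v2
try:
    doc.save("my edit", based_on_version=seen)            # still based on v1
except Conflict as e:
    print(e)                # surfaced as a merge conflict, not silent data loss
```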
niuzeta
Semi-related, but are there any repositories that collect technical white papers like this? I'm fascinated by these papers whenever they show up in my feed and I gorge on them, and I'd love more. I can't be the only one who thinks this way.
ahoka
I can recommend Adrian Colyer's excellent The Morning Paper blog: https://blog.acolyer.org/
lbj
I have to say, I really love the title :)
cacozen
I guess “Immutability changes nothing” wouldn’t have the same impact
sstanfie
Needs more exclamation points!