Avoid UUIDv4 Primary Keys

93 comments

·December 15, 2025

vintermann

A prime example of premature optimization.

Permanent identifiers should not carry data. This is like the cardinal sin of data management. You always run into situations where the thing you thought, "surely this never changes, so it's safe to squeeze into the ID to save a lookup". Then people suddenly find out they have a new gender identity, and they need a last final digit in their ID numbers too.

Even if nothing changes, you can run into trouble. Norwegian PNs have your birth date (in DDMMYY format) as the first six digits. Surely that doesn't change, right? Well, wrong, since although the date doesn't change, your knowledge of it might. Immigrants who didn't know their exact date of birth got assigned 1. Jan by default... And then people with actual birthdays on 1 Jan got told, "sorry, you can't have that as birth date, we've run out of numbers in that series!"

Librarians in the analog age can be forgiven for cramming data into their identifiers, to save a lookup. When the lookup is in a physical card catalog, that's somewhat understandable (although you bet they could run into trouble over it too). But when you have a powerful database at your fingertips, use it! Don't make decisions you will regret just to shave off a couple of milliseconds!

mkleczek

This is actually a very deep and interesting topic. Stripping information from an identifier disconnects a piece of data from the real world which means we no longer can match them. But such connection is the sole purpose of keeping the data in the first place. So, what happens next is that the real world tries to adjust and the "data-less" identifier becomes a real world artifact. The situation becomes the same but worse (eg. you don't exist if you don't remember your social security id). In extreme cases people are tattooed with their numbers.

The solution is not to come up with yet another artificial identifier but to come up with better means of identification taking into account the fact that things change.

hyperpape

Your comment is sufficiently generic that it’s impossible to tell what specific part of the article you’re agreeing with, disagreeing with, or expanding upon.

oncallthrow

It sounds to me like you’re just arguing for premature optimization of another kind (specifically, prematurely changing your entire architecture for edge cases that probably won’t ever happen to you).

tacone

Fantastic real life example. Italian PNs carry also the gender, which something you can change surgically, and you'll eventually run into the issue when operating at scale.

I don't agree with the absolute statement, though. Permanent identifiers should not generally carry data. There are situations where you want to have a way to reconciliate, you have space or speed constraints, so you may accept the trade off, md5 your data and store it in a primary index as a UUID. Your index will fragment and thus you will vacuum, but life will still be good overall.

mckirk

I'm not sure whether that was intended, but 'operating at scale' actually made me laugh out loud :D

benterix

The article sums up some valid arguments against UUIDv4 as PKs but the solution the author provides on how to obfuscate integers is probably not something I'd use in production. UUIDv7 still seems like a reasonable compromise for small-to-medium databases.

mort96

I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

Now this doesn't work if you actually have enough data that the randomness of the UUIDv4 keys is a practical database performance issue, but I think you really have to think long and hard about every single use of identifiers in your application before concluding that v7 is the solution. Maybe v7 works well for some things (e.g identifiers for resources where creation times are visible to all with access to the resource) but not others (such as users or orgs which are publicly visible but without publicly visible creation times).

throw0101a

> I tend to avoid UUIDv7 and use UUIDv4 because I don't want to leak the creation times of everything.

See perhaps "UUIDv47 — UUIDv7-in / UUIDv4-out (SipHash‑masked timestamp)":

* https://github.com/stateless-me/uuidv47

* Sept 2025: https://news.ycombinator.com/item?id=45275973

nbadg

I'm also not a huge fan of leaking server-side information; I suspect UUIDv7 could still be used in statistical analysis of the keyspace (in a similar fashion to the german tank problem for integer IDs). Also, leaking data about user activity times (from your other comment) is a *really* good point that I hadn't considered.

I've read people suggest using a UUIDv7 as the primary key and a UUIDv4 as a user-visible one as a remedy.

My first thought when reading the suggestion was, "well but you'll still need an index on the v4 IDs, so what does this actually get you?" But the answer is that it makes joins less expensive; you only require the index once, when constructing the query from the user-supplied data, and everything else operates with the better-for-performance v7 IDs.

To be clear, in a practical sense, this is a bit of a micro-optimization; as far as I understand it, this really only helps you by improving the data locality of temporally-related items. So, for example, if you had an "order items" table, containing rows of a bunch of items in an order, it would speed up retrieval times because you wouldn't need to do as many index traversals to access all of the items in a particular order. But on, say, a users table (where you're unlikely to be querying for two different users who happen to have been created at approximately the same time), it's not going to help you much. Of course the exact same critique is applicable to integer IDs in those situations.

Although, come to think of it, another advantage of a user-visible v4 with v7 Pk is that you could use a different index type on the v4 ID. Specifically, I would think that a hash index for the user-visible v4 might be a halfway-decent way to go.

I'm still not sure either way if I like the idea, but it's certainly not the craziest thing I've ever heard.

cdmckay

Out of curiosity, why is it an issue if you leak creation time?

bruce511

The issue will be very context specific. In other words to (reasonably) answer the question, we'd have to judge each application individually.

For one example, say you were making voting-booth software. You really don't want a (hidden) timestamp attached to each vote (much less an incrementing id) because that would break voter confidentiality.

More generally, it's more a underlying principle of data management. Not leaking ancillary data is easier to justify than "sure we leak the date and time of the record creation, but we can't think of a reason why that matters."

Personally I think the biggest issue are "clever" programmers who treat the uuid as data and start displaying the date and time. This leads to complications ("that which is displayed, the customer wants to change"). It's only a matter of time before someone declares the date "wrong" and it must be "fixed". Not to mention time zone or daylight savings conversions.

mort96

Well you're leaking user data. I'm sure you can imagine situations where "the defendant created an account on this site on this date" could come up. And the user could have created that account not knowing that the creation date is public, because it's not listed anywhere in the publicly viewable part of the profile other than the UUID in the URL.

saaspirant

There was a HN comment about competitors tracking how many new signups are happening and increasing the discounts/sales push based on that. Something like this.

kreetx

E.g, if your service users have timestamp as part of the key and this data is visible to other users, you would know when that account was created. This could be an issue.

robertlagrant

Depends on the data. If you use a primary key in data about a person that shouldn't include their age (e.g. to remove age-based discrimination) then you are leaking an imperfect proxy to their age.

Bombthecat

Admins, early users, founders, CEOs etc etc would have althe lowest creation time...

formerly_proven

If all you want is to obfuscate the fact that your social media site only has 200 users and 80 posts, simply use a permutation over the autoincrement primary key. E.g. IDEA or CAST-128, then encode in base64. If someone steps on your toes because somewhere in your codebase you're using a forbidden legacy cipher, just use AES-128. (This is sort of the degenerate/tautological base case of format-preserving encryption)

(What do you think Youtube video IDs are?)

pdimitar

> What do you think Youtube video IDs are?

I actually haven no idea. What are they?

(Also what is the format of their `si=...` thing?)

dotancohen

From the fine article:

  > Random values don’t have natural sorting like integers or lexicographic (dictionary) sorting like character strings. UUID v4s do have "byte ordering," but this has no useful meaning for how they’re accessed.

Might the author mean that random values are not sequential, so ordering them is inefficient? Of course random values can be ordered - and ordering by what he calls "byte ordering" is exactly how all integer ordering is done. And naive string ordering too, like we would do in the days before Unicode.

dev_l1x_be

Why would you need to order by UUID? I am missing something here. Most of the time we use UUID keys for being able to create a new key without coordination and most of the time we do not want to order by primary key.

kreetx

Using an UUIDv4 as primary key is a trade-off: you use it when you need to generate unique keys in a distributed manner. Yes, these are not datetime ordered and yes, they take 128 bits of space. If you can't live with this, then sure, you need to consider alternatives. I wonder if "Avoid UUIDv4 Primary Keys" is a rule of thumb though.

dotancohen

If one needs timestamp ordering, then UUIDv7 is a good alternative.

But the author does not say timestamp ordering, he says ordering. I think he actually means and believes that there is some problem ordering UUIDv4.

kreetx

Yup. There are alternatives depending on what the situation is: with non-distributed, you could just use a sufficiently sized int (which can be rather small when the table is for e.g humans). You could add a separate timestamp column if that is important.

But if you need UUID-based lookup, then you might as well have it as a primary key, as that will save you an extra index on the actual primary key. If you also need a date and the remaining bits in UUIDv7 suffice for randomness, then that is a good option too (though this does essentially amount to having a composite column made up of datetime and randomness).

torginus

I do not understand why 128 bits is considered too big - you clearly can't have less, as on 64 bits the collision probability on real world workloads is just too high, for all but the smallest databases.

Auto-incrementing keys can work, but what happens when you run out of integers? Also, distributed dbs probably make this hard, and they can't generate a key on client.

There must be something in Postgres that wants to store the records in PK order, which while could be an okay default, I'm pretty sure you can this behavior, as this isn't great for write-heavy workloads.

dagss

The point is how closely located data you access often is. If data is roughly sorted by creation time then data you access close to one another in time is stored close to one another on disk. And typically access to data is correlated with creation time. Not for all tables but for many.

Accessing data in totally random locations can be a performance issue.

Depends on lots of things ofc but this is the concern when people talk about UUID for primary keys being an issue.

K0nserv

Isn't part of this that inserting into a btree index is more performant when the keys are increasing rather than being random? A random id will cause more re-balancing operations than always inserting at the end. Increasing ids are also more cache friendly

torginus

To be polite, I don't think this article rests on sound technical foundations.

crest

Any fixed sized bitstring has an obvious natural ordering, but since they're allocated randomly they lack the density and locality of sequential allocation.

ivan_gammel

The is article is about a solution in search of a problem, a classic premature optimization issue. UUIDv4 is perfectly fine for many use cases, including small databases. Performance argument must be considered when there’s a problem with performance on the horizon. Other considerations may be and very often superior to that.

xandrius

To summarise the article: in PG, prefer using UUIDv7 over UUIDv4 as they have slightly better performance.

If you're using latest version of PG, there is a plugin for it.

That's it.

sbuttgereit

You might have missed the big H2 section in the article:

"Recommendation: Stick with sequences, integers, and big integers"

After that then, yes, UUIDv7 over UUIDv4.

This article is a little older. PostgreSQL didn't have native support so, yeah, you needed an extension. Today, PostgreSQL 18 is released with UUIDv7 support... so the extension isn't necessary, though the extension does make the claim:

"[!NOTE] As of Postgres 18, there is a built in uuidv7() function, however it does not include all of the functionality below."

What those features are and if this extension adds more cruft in PostgreSQL 18 than value, I can't tell. But I expect that the vast majority of users just won't need it any more.

tmountain

Sticking with sequences and other integer types will cause problems if you need to shard later.

zwnow

Especially in larger systems, how does one solve the issue of reaching the max value of an integer in their database? Sure for unsigned bigint thats hard to achieve but regular ints? Apps quickly outgrow that.

grugdev42

A much simpler solution is to keep your tables as they are (with an integer primary key), but add a non sequential public identifier too.

id => 123, public_id => 202cb962ac59075b964b07152d234b70

There are many ways to generate the public_id. A simple MD5 with a salt works quite well for extremely low effort.

Add a unique constraint on that column (which also indexes it), and you'll be safe and performant for hundreds of millions of rows!

Why do we developers like to overcomplicate things? ;)

Denvercoder9

This misses the point. The reason not to use UUIDv4 is that having an index on random values is slow(er), because sequential inserts into the underlying B-tree are faster than random inserts. You're hitting the same problem with your `public_id` column, that it's not the primary key doesn't change that.

K0nserv

An additional thing I learned when I worked on a ulid alternative over the weekend[0] is: Postgres's internal Datum type is at most 64 bits which means every uuid requires heap allocation[1] (at least until we get 128 bit machines).

0: https://bsky.app/profile/hugotunius.se/post/3m7wvfokrus2g

1: https://github.com/postgres/postgres/blob/master/src/backend...

BartjeD

Personally my approach has been to start with big-ints and add a GUID code field if it becomes necessary. And then provide imports where you can match objects based on their code, if you ever need to import/export between tenants, with complex object relationships.

But that also adds complexity.

dimitrisnl

Noob question, but why no use ints for PK, and UUIDs for a public_id field?

alerighi

If you put an index on the UUID field (because you have an API where you can retrieve objects with UUID) you have kind of the same problem, at least in Postgres where a primary key index or a secondary index are more or less the same (to the point is perfectly valid in pgsql to not have any primary key defined for the table, because storage on disk is done trough an internal ID and the indexes, being primary or not, just reference to the rowId in memory). Plus the waste of space of having 2 indexes for the same table.

Of course this is not always the case that is bad, for example if you have a lot of relations you can have only one table where you have the UUID field (and thus expensive index), and then the relations could use the more efficient int key for relations (for example you have an user entity with both int and uuid keys, and user attribute references the user with the int key, of course at the expense of a join if you need to retrieve one user attribute when retrieving the user is not needed).

torginus

You can create hash indexes in Postgres, so the secondary index uuid seems workable:

https://www.postgresql.org/docs/current/hash-index.html

dsego

I also think we can use a combination of a PID - persistent ID (I always thought it was public) and an auto-increment integer ID. Having a unique key helps when migrating data between systems or referencing a piece of data in a different system. Also, using serial IDs in URLs and APIs can reveal sensitive information, e.g. how many items there are in the database.

edding4500

*edit: sorry, misread that. My answer is not valid to your question.

original answer: because if you dont come up with these ints randomly they are sequential which can cause many unwanted situations where people can guess valid IDs and deduce things from that data. See https://en.wikipedia.org/wiki/German_tank_problem

javawizard

Hence the presumed implication behind the public_id field in GP's comment: anywhere identifiers are exposed, you use the public_id field, thereby preventing ID guessing while still retaining the benefits of ordered IDs where internal lookups are concerned.

Edit: just saw your edit, sounds like we're on the same page!

javaunsafe2019

So We make things hard in the backend because of leaky abstractions? Doesn't make sense imo.

jcims

Decades of security vulnerabilities and compromises because of sequential/guessable PKs is (only!) part of the reason we're here. Miss an authorization check anywhere in the application and you're spoon-feeding entire tables to anyone with the inclination to ask for it.

null

[deleted]

grim_io

The article mentions microservices, which can increase the likelihood of collisions in sequential incremental keys.

One more reason to stay away from microservices, if possible.

mrkeen

The 'collision' is two service classes both trying to use one db.

If you separate them (i.e. microservices) the they no longer try to use one db.

grim_io

There is nothing stopping multiple microservices from using the same DB, so of course this will happen in practice.

Sometimes it might even be for a good reason.

bardsore

Always try to avoid having two services using the same DB. Only way I'd ever consider sharing a DB is if only one service will ever modify it and all others only read.

grim_io

Good luck enforcing that :)

hk1337

My biggest thing for UUIDs is don’t UUID everything. Most things should be okay with just regular integers as PKs.

reactordev

I fun trick I did was generate UUID-like ids. We all can identify a UUIDv4 most of the time by looking at one. "Ah, a uuid" we say to ourselves. A little over a decade ago I was working on a massive cloud platform and rather than generate string keys like the author above suggested (int -> binary -> base62 str) we opted for a more "clever" approach.

The UUID is 128bits. The first 64bits are a java long. The last 64bits are a java long. Let's just combine the Tenant ID long with a Resource ID long to generate a unique id for this on our platform. (worked until it didn't).

nrhrjrjrjtntbt

Atlassian settles for longer "ARIs" for this (e.g.https://developer.atlassian.com/cloud/guard-detect/developer...) composed of guids which allow a scheme like the Amazon ARN to pass around.

mcny

Postgresql 18 released in September and has uuidv7

https://www.postgresql.org/docs/current/functions-uuid.html