Visualising how close random GUIDs come to being the same

twiss

> The chances of generating two GUIDs that are the same is astronomically small.

> The odds are 1 in 2^122 — that’s approximately 1 in 5,000,000,000,000,000,000,000,000,000,000,000,000.

This is true if you only generate two GUIDs, but if you generate very many GUIDs, the chance of generating two identical ones between any of them increases. E.g. if you generate 2^61 GUIDs, you have about a 1 in 2 chance of a collision, due to the birthday paradox.

2^61 is still a very large number of course, but much more feasible to reach than 2^122 when doing a collision attack. This is the reason that cryptographic hashes are typically 256 bits or more (to make the cost of collision attacks >= 2^128).
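For anyone who wants to check these figures, here is a minimal sketch using the standard birthday approximation p ≈ 1 - exp(-n^2 / (2N)), assuming N = 2^122 equally likely GUIDs. The function name is made up for this example. Under this approximation, 2^61 GUIDs give roughly a 39% chance of a collision, and 50% is reached slightly higher, near 2^61.24.

```python
import math


def collision_probability(log2_n: float, random_bits: int = 122) -> float:
    """Approximate chance of at least one collision among 2**log2_n random IDs.

    Hypothetical helper: uses the birthday approximation
    p ~= 1 - exp(-n^2 / (2 * N)) with N = 2**random_bits.
    """
    # n^2 / (2N) expressed as a power of two to avoid huge integers.
    exponent = 2.0 ** (2 * log2_n - random_bits - 1)
    # -expm1(-x) == 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    for log2_n in (40, 61, 61.24, 62.6):
        print(f"2^{log2_n}: p ~= {collision_probability(log2_n):.4g}")
```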

Retr0id

2^61 isn't even that large, well within the compute budget of mere mortals.

vlovich123

Depends on what “isn’t even that large” means. A modern 6 GHz machine would probably need 12 years of 24/7 operation to count that high. To me that seems like a lot.
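A back-of-the-envelope sketch of that estimate, assuming one increment per clock cycle (a generous assumption) and a 365.25-day year. The `machines` parameter is hypothetical and only shows how the figure scales if the work is split evenly.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_count(log2_n: float, hz: float = 6e9, machines: int = 1) -> float:
    """Years needed to count to 2**log2_n at `hz` increments per second.

    Assumes one increment per clock cycle and perfect scaling across machines.
    """
    seconds = 2.0 ** log2_n / (hz * machines)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"1 machine:    {years_to_count(61):.1f} years")
    print(f"300 machines: {years_to_count(61, machines=300) * 365.25:.1f} days")
```

With these assumptions a single machine needs about 12.2 years, and 300 of them would still need roughly two weeks.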

dgrin91

Yeah, but a nation state server farm can probably cut that down to minutes because their budget can buy a lot of processors. You only need a few hundred to really shrink it down to manageable numbers. And it turns out that nation states aren't the only ones that have this budget.

PaulHoule

I think you might have trouble if you tried to assign one to every iron atom in an iron filing.

NoahZuniga

* not the birthday paradox, but the birthday bound.

8organicbits

Note that this only considers UUIDv4, the random UUID. Other forms can generate UUIDs that are much closer together. For UUIDv7, UUIDs generated within the same millisecond will have identical 48 bit prefixes (or up to 60 when the monotonic counter from section 6.2 is used).

https://www.rfc-editor.org/rfc/rfc9562.html#monotonicity_cou...
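To illustrate the shared prefix, here is a minimal sketch that packs a UUIDv7 by hand following the RFC 9562 field layout (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits). The function `uuid7_sketch` is hypothetical and does not implement the optional monotonic counter.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7-shaped value by hand (illustrative, no monotonic counter)."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64: rand_a
    value |= 0b10 << 62                           # bits 63..62: variant
    value |= rand_b                               # bits 61..0: rand_b
    return uuid.UUID(int=value)


if __name__ == "__main__":
    ms = 1_700_000_000_000  # arbitrary fixed millisecond for the demo
    a, b = uuid7_sketch(ms), uuid7_sketch(ms)
    print(a)
    print(b)
    # The first 12 hex digits (48 bits) are the timestamp and always match.
    print(a.hex[:12] == b.hex[:12])
```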

e1g

You need to be generating >100M of them within the same millisecond before even remembering that collisions can theoretically happen.

charcircuit

>You

The entire universe. Else it's not universally unique.

8organicbits

I like UUIDv7s as database IDs since they sort chronologically, are unique, and are efficient to generate. My system chooses the UUIDs; I don't allow externally generated IDs in. If I did, then an attacker could easily force a collision. As such, I only care about how fast I create IDs. This is a common pattern.

If your system does need to worry about UUIDv7s generated by the rest of the universe, you likely also need to worry about maliciously created IDs, software bugs, clocks that reset to unix epoch, etc. I worry about those more than a bona fide collision.

webstrand

This is the chance that, given a specific GUID, you'll find a collision for it. Utterly minuscule chance. However, the birthday paradox governs here: if you generate 2^62.60 GUIDs, the chance that you've generated a collision is around 99%. That is still an enormous number, but way smaller than 2^122.

At a rate of comparing 400,000 guids per second, you have a 99% chance of seeing a collision within the next 553,750 years.
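Both numbers can be reproduced by inverting the birthday approximation, n ≈ sqrt(2N · ln(1 / (1 - p))), with N = 2^122. This is a rough sketch; the helper names and the 365.25-day year are assumptions for the example.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log2_count_for_probability(p: float, random_bits: int = 122) -> float:
    """log2 of how many random IDs give collision probability p (approximate)."""
    # log2(n) = 0.5 * (log2(2N) + log2(ln(1 / (1 - p))))
    return 0.5 * (random_bits + 1 + math.log2(-math.log1p(-p)))


def years_at_rate(log2_n: float, per_second: float) -> float:
    """Years needed to process 2**log2_n IDs at a fixed rate."""
    return 2.0 ** log2_n / per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    log2_n = log2_count_for_probability(0.99)
    print(f"99% collision chance near 2^{log2_n:.2f} GUIDs")
    print(f"about {years_at_rate(62.6, 400_000):,.0f} years at 400,000 per second")
```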

jonathrg

You would need a little more memory to see/detect that collision.

amingilani

Instead of picking a target UUID and evaluating new UUIDs against it, a better experiment would be finding duplicates in all the UUIDs you have generated.

This plays nicely with the birthday paradox.
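A rough sketch of that experiment. Keeping every full UUID in memory until one repeats is not feasible, so this version compares only the leading `bits` of each UUIDv4 (an assumption made so that duplicates show up quickly). The function name is hypothetical.

```python
import uuid
from typing import Optional


def first_duplicate_index(bits: int, limit: int = 1_000_000) -> Optional[int]:
    """Index of the first UUIDv4 whose leading `bits` repeat an earlier one.

    `bits` must be at most 48: the first 48 bits of a UUIDv4 are all random,
    while later bits include the fixed version and variant fields.
    """
    if not 1 <= bits <= 48:
        raise ValueError("bits must be between 1 and 48")
    seen = set()
    for i in range(limit):
        prefix = uuid.uuid4().int >> (128 - bits)
        if prefix in seen:
            return i
        seen.add(prefix)
    return None  # no duplicate within `limit` draws


if __name__ == "__main__":
    for bits in (16, 24, 32):
        # The birthday bound suggests a duplicate after roughly 2**(bits / 2) draws.
        print(bits, first_duplicate_index(bits))
```

The set of seen prefixes is also the memory cost mentioned above: it grows with every UUID generated.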

RS-232

UUID > GUID.

Microsoft’s GUID standard is garbage.

lionkor

Oh, why?

w-ll

not OP but i already have fields for time ts and what model it is. i want my uuids random.

kaoD

I think the current Microsoft GUID is just UUIDv7.

https://learn.microsoft.com/en-us/dotnet/api/system.guid?vie...

I don't think there's a "Microsoft standard"; they just use different versions of UUID in different products over time. No idea why they call it GUID instead of UUID, but it's easier to say out loud so I'm not against it.

v7 has a timestamp indeed, but isn't the time making it more collision resistant? You'd have to generate tons of UUIDv7s in the same millisecond, while v4 is more likely to collide due to not being time-constrained and the birthday paradox.

I think both have their uses though. You might need pure random if you want your UUID not to convey any time information and you're not generating tons of them (e.g. a random user id).

What do you mean "model"? Are you referring to UUIDv1 which has time and MAC address?

nesk_

Nice experiment. Is the code available somewhere?