Cryptography - Signal & Syntax: Practical AI Coding with Tom Archer

The Birthday Paradox in Production: When Random IDs Collide

Nov 28, 2025 • Category: Applied Modeling and Simulation, • Tags: #cryptography, #databases, #distributed-systems, #probability, #Python

You generate a UUID. It’s 128 bits total, with 122 bits of randomness. That’s 340 undecillion possible values. Collision-proof, right? Your system generates a million IDs per second. Still safe? What about a billion?

As I like to say, common sense and intuition are the enemies of science. Common sense tells you that with 340,000,000,000,000,000,000,000,000,000,000,000,000 possible values, you’d need to generate at least trillions before worrying about duplicates. Maybe fill 1% of the space? 10%?

Math shows us the uncomfortable truth: You’ll hit a 50% collision probability after generating just $2.7 \times 10^{18}$ IDs. That’s 0.0000000000000000008% of your total space. At a billion IDs per second, you’ve got 86 years. Comfortable, but not infinite. Drop to 64-bit IDs? Now you’ve got 1.4 hours. Just enough time to duck out for long lunch and return to a disaster. And 32-bit? 77 microseconds. Faster than you can blink.

You might know that the birthday paradox proves that just 23 people have more than a 50% probability of sharing a birthday. What you may not know is that this isn’t just a party trick; it’s the same mathematics that determines when your “guaranteed unique” database IDs collide, why hash tables need careful sizing, and when your distributed system’s assumptions break.

“In a room of 23 people, there’s a greater than 50% chance two share a birthday. In your database, collisions arrive far sooner than intuition suggests.”

Hash Collisions: Why Your 'Unique' Fingerprints Aren't (And Why That's Usually OK)

Nov 17, 2025 • Category: Essays and Perspectives, • Tags: #blockchain, #cryptography, #hashing, #passwords, #security

In 2017, Google researchers generated two different PDF files with identical SHA-1 hashes, finally proving what cryptographers had warned about for years: hash functions don’t create truly unique fingerprints ( Stevens et al., 2017 ). This “SHAttered” attack required 9 quintillion SHA-1 computations, which is the equivalent to 6,500 years of single-CPU computation. The attack cost approximately $45,000 in cloud computing resources, making it accessible to well-funded adversaries but not casual attackers.

Yet despite this proof, we still trust hash functions for everything from Git commits to blockchain transactions to password storage. The reason is simple: while collisions are mathematically inevitable, meaningful collisions remain virtually impossible. The full story of hash collisions is more nuanced than “unique” versus “not unique.”

“In cryptography, ‘secure’ has always meant ‘secure for now’.”