How Large Language Models (LLMs) Know Things They Were Never Taught

Ask an LLM without web search enabled a question like “What happened in the news this morning?” and it will tell you it doesn’t have access to current events and suggest you check a live news source such as Reuters or Google News.

Conversely, ask an LLM with web search enabled the same question, and you receive a detailed rundown of breaking stories, political controversies, and sports news from the past 24 hours.

Identical question. Same underlying technology. Completely different answers. The difference isn’t that one model is smarter or more current than the other. The difference is whether web search was triggered.

But why does that matter? Both models were trained months ago. Their internal knowledge stopped updating the moment training ended. So how does flipping a switch allow one model to suddenly “know” what happened this morning? The answer reveals a fundamental distinction most users never consider: the difference between what a model learned and what a model read.

LLMs don’t update their weights (the billions of numerical parameters that encode everything learned during training) when you chat with them. They don’t learn from your conversations. But they can access external information and reason over it within their context window. This isn’t learning; it’s reading. And understanding that difference changes how you think about what these systems can and cannot do.
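The distinction can be sketched in a few lines. Everything below is hypothetical (`search_fn` and `llm_fn` stand in for a real search tool and a real model API); the point is that retrieval never touches the weights, it only adds text to the context window:

```python
def answer_with_search(question, search_fn, llm_fn):
    """Hypothetical retrieval-augmented call. The model's weights never
    change; fresh knowledge enters only as text in the prompt."""
    snippets = search_fn(question)  # e.g. today's headlines, as plain text
    prompt = (
        "Answer using only the sources below.\n\n"
        + "\n\n".join(snippets)
        + f"\n\nQuestion: {question}"
    )
    return llm_fn(prompt)  # same frozen model, richer context
```

Swap in a real search backend and model client and the shape stays the same: the model “knows” this morning’s news only for the duration of that one prompt.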


“A model with web search doesn’t know more. It can see more. The knowledge lives in the retrieved text, not in the weights.”


Read more →

Temperature and Top-P: The Creativity Knobs

Every API call to ChatGPT, Claude, or any other LLM includes two parameters most people either ignore or tweak randomly: temperature and top-p. The defaults work fine for casual use, so why bother understanding them? Because these two numbers fundamentally control how your model thinks.

The temperature value determines whether the model plays it safe or takes creative risks while the top-p value decides how many options the model even considers. Together, these values shape the personality of every response you receive.

I’ve watched developers cargo-cult settings from others without understanding what they do. “Set temperature to 0.7 for creative writing” becomes tribal knowledge, passed down without explanation. Let’s fix that by opening the hood and examining the mathematics that makes these knobs work.
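As a preview of what’s under the hood, here is what the two knobs do, sketched in plain Python over toy logits (no real model involved):

```python
import math

def softmax(logits, temperature=1.0):
    # Divide logits by temperature before exponentiating:
    # T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    # Keep the smallest set of tokens whose cumulative probability
    # reaches p, then renormalize; everything else gets zero.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    out = [0.0] * len(probs)
    for i in kept:
        out[i] = probs[i] / total
    return out
```

Lower the temperature and the top token’s probability climbs toward certainty; raise it and the distribution flattens toward uniform. Top-p then discards the long tail entirely before the sampler ever sees it.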


“Temperature doesn’t make the model smarter or dumber. It changes how much the model trusts its own first instinct.”


Read more →

The Wreck of the Edmund Fitzgerald: Modeling Decomposition in Extreme Environments

Originally appearing on his 1976 album Summertime Dream, “The Wreck of the Edmund Fitzgerald” is a powerful ballad written and performed by folk singer Gordon Lightfoot. In 1976, the song hit No. 1 in Canada on the RPM chart and No. 2 in the United States on the Billboard Hot 100. The lyrics are a masterpiece, but one line always stood out to me: “The lake, it is said, never gives up her dead.” Following Lightfoot’s death in 2023, the song reconnected with older fans and reached new generations of listeners, climbing to No. 15 on Billboard’s Hot Rock & Alternative Songs chart.

Listening to it again after all these years, I was inspired to research what that line meant and if there was any truth to it. What I discovered was very illuminating: Lake Superior really doesn’t give up her dead, and the science behind it is as haunting as the song itself.

It turns out that line is not poetic license. It’s physics.

When 29 souls went down with the Edmund Fitzgerald on November 10, 1975, they stayed down. Not because of some mystical property of the Great Lakes, but because of a perfect storm of temperature, pressure, and biology that we can model mathematically.
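As a taste of that modeling, here is a toy sketch of the temperature term alone, using the Q10 rule of thumb for biological reaction rates. The Q10 value of 2.5 and the reference temperatures are illustrative assumptions, not measurements from the wreck:

```python
def q10_rate(rate_ref, t_ref, t, q10=2.5):
    """Scale a reference decomposition rate to temperature t (deg C)
    using the Q10 rule: biological rates roughly double-to-triple
    for every 10 degree rise."""
    return rate_ref * q10 ** ((t - t_ref) / 10.0)

# Relative bacterial decomposition rate in Lake Superior's deep,
# near-4-degree water vs. a warm 20-degree surface (toy numbers).
surface = q10_rate(1.0, 20.0, 20.0)  # reference rate at 20 C
deep = q10_rate(1.0, 20.0, 4.0)      # roughly 4 C near the wreck
print(deep / surface)  # decomposition runs at a small fraction of the rate
```

Under these assumptions, decomposition near the bottom proceeds at roughly a quarter of its warm-water pace: too slow for bacteria to produce the gases that would otherwise refloat a body.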


Note: This post contains scientific discussion of decomposition and forensic pathology in the context of maritime disasters.


“The lake, it is said, never gives up her dead / When the skies of November turn gloomy”


Read more →

The Birthday Paradox in Production: When Random IDs Collide

You generate a UUID. It’s 128 bits total, with 122 bits of randomness. That’s about 5.3 undecillion possible values. Collision-proof, right? Your system generates a million IDs per second. Still safe? What about a billion?

As I like to say, common sense and intuition are the enemies of science. Common sense tells you that with 5,300,000,000,000,000,000,000,000,000,000,000,000 possible values, you’d need to generate at least trillions before worrying about duplicates. Maybe fill 1% of the space? 10%?

Math shows us the uncomfortable truth: You’ll hit a 50% collision probability after generating just \(2.7 \times 10^{18}\) IDs. That’s about 0.00000000000000005% of your total space. At a billion IDs per second, you’ve got 86 years. Comfortable, but not infinite. Drop to 64-bit IDs? Now you’ve got about five seconds. And 32-bit? 77 microseconds. Faster than you can blink.
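Those thresholds all come from the birthday bound, \(n \approx \sqrt{2N \ln 2}\) for an ID space of size \(N\). A minimal sketch (the billion-IDs-per-second rate is the scenario above):

```python
import math

def ids_for_collision_probability(bits, p=0.5):
    """Approximate number of random IDs you can generate before the
    probability of at least one collision reaches p, via the birthday
    bound: n ~ sqrt(2 * N * ln(1/(1-p))) for an ID space N = 2**bits."""
    space = 2.0 ** bits
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))

rate = 1e9  # IDs generated per second
for bits in (122, 64, 32):
    n = ids_for_collision_probability(bits)
    print(f"{bits}-bit IDs: 50% collision after {n:.2e} IDs "
          f"(~{n / rate:.2e} seconds at 1e9 IDs/sec)")
```

Run it and the three regimes fall out directly: on the order of \(10^{18}\) IDs for a 122-bit space, billions for 64 bits, and tens of thousands for 32 bits.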

You might know that the birthday paradox shows that in a group of just 23 people, there’s a greater than 50% chance two share a birthday. What you may not know is that this isn’t just a party trick; it’s the same mathematics that determines when your “guaranteed unique” database IDs collide, why hash tables need careful sizing, and when your distributed system’s assumptions break.
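The 23-person figure isn’t an approximation; it falls out of a short exact computation:

```python
def birthday_collision_probability(people, days=365):
    """Exact probability that at least two of `people` share a birthday,
    assuming birthdays are uniformly distributed over `days`."""
    p_unique = 1.0
    for i in range(people):
        p_unique *= (days - i) / days  # i-th person avoids all earlier ones
    return 1.0 - p_unique

print(birthday_collision_probability(23))  # ~ 0.507, just past the coin flip
```

At 22 people the probability is still below one half; adding the 23rd person tips it over.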


“In a room of 23 people, there’s a greater than 50% chance two share a birthday. In your database, collisions arrive far sooner than intuition suggests.”


Read more →

Hash Collisions: Why Your 'Unique' Fingerprints Aren't (And Why That's Usually OK)

In 2017, Google researchers generated two different PDF files with identical SHA-1 hashes, finally proving what cryptographers had warned about for years: hash functions don’t create truly unique fingerprints (Stevens et al., 2017). This “SHAttered” attack required 9 quintillion SHA-1 computations, the equivalent of 6,500 years of single-CPU computation. The attack cost approximately $45,000 in cloud computing resources, making it accessible to well-funded adversaries but not casual attackers.

Yet despite this proof, we still trust hash functions for everything from Git commits to blockchain transactions to password storage. The reason is simple: while collisions are mathematically inevitable, meaningful collisions remain virtually impossible. The full story of hash collisions is more nuanced than “unique” versus “not unique.”
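You can watch that inevitability in miniature by truncating a strong hash. This sketch brute-forces two inputs whose SHA-256 digests share their first 16 bits; by the birthday bound, that takes only a few hundred attempts on average:

```python
import hashlib

def find_truncated_collision(hex_chars=4):
    """Find two distinct inputs whose SHA-256 digests share the same
    leading hex_chars hex characters: a collision in a truncated hash.
    Expected work is about sqrt(16**hex_chars) tries (birthday bound)."""
    seen = {}  # prefix -> first input that produced it
    i = 0
    while True:
        prefix = hashlib.sha256(str(i).encode()).hexdigest()[:hex_chars]
        if prefix in seen:
            return seen[prefix], i, prefix
        seen[prefix] = i
        i += 1

a, b, prefix = find_truncated_collision()
print(f"inputs {a!r} and {b!r} share SHA-256 prefix {prefix}")
```

The same arithmetic says a full 256-bit collision needs on the order of \(2^{128}\) work, which is why truncating digests, rather than the hash function itself, is usually what gets people into trouble.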


“In cryptography, ‘secure’ has always meant ‘secure for now’.”


Read more →