7 "LLM" Posts

How Large Language Models (LLMs) Know Things They Were Never Taught

When you ask an LLM without web search enabled a question like “What happened in the news this morning?”, the LLM will respond that it doesn’t have access to current events and suggest you check a current news source such as Reuters or Google News.

Conversely, ask an LLM with web search enabled the same question, and you receive a detailed rundown of breaking stories, political controversies, and sports news from the past 24 hours.

Identical question. Same underlying technology. Completely different answers. The difference isn’t that one model is smarter or more current than the other. The difference is whether web search was triggered.

But why does that matter? Both models were trained months ago. Their internal knowledge stopped updating the moment training ended. So how does flipping a switch allow one model to suddenly “know” what happened this morning? The answer reveals a fundamental distinction most users never consider: the difference between what a model learned and what a model read.

LLMs don’t update their weights (the billions of numerical parameters that encode everything learned during training) when you chat with them. They don’t learn from your conversations. But they can access external information and reason over it within their context window. This isn’t learning; it’s reading. And understanding that difference changes how you think about what these systems can and cannot do.
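To make that concrete, here’s a minimal sketch of the retrieval pattern. The `llm` and `search_web` callables are hypothetical placeholders rather than any specific vendor’s API; the point is that fresh information enters as prompt text, never as weight updates.

```python
# Minimal sketch of "reading, not learning". Both `llm` and `search_web`
# are hypothetical placeholders (not a real vendor SDK) passed in as callables.
def answer_with_retrieval(question: str, llm, search_web) -> str:
    # 1. Fetch fresh text from outside the model; nothing here touches weights.
    snippets = search_web(question)  # e.g. top news stories as plain strings

    # 2. Paste the retrieved text into the prompt. The model "reads" it in
    #    its context window exactly the way it reads your question.
    prompt = (
        "Answer using only the sources below.\n\n"
        + "\n\n".join(snippets)
        + f"\n\nQuestion: {question}"
    )

    # 3. Same weights with or without retrieval; only the input text changed.
    return llm(prompt)
```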


“A model with web search doesn’t know more. It can see more. The knowledge lives in the retrieved text, not in the weights.”


Read more →

Temperature and Top-P: The Creativity Knobs

Every API call to ChatGPT, Claude, or any other LLM includes two parameters most people either ignore or tweak randomly: temperature and top-p. The defaults work fine for casual use, so why bother understanding them? Because these two numbers fundamentally control how your model thinks.

The temperature value determines whether the model plays it safe or takes creative risks, while the top-p value decides how many options the model even considers. Together, these values shape the personality of every response you receive.

I’ve watched developers cargo-cult settings from others without understanding what they do. “Set temperature to 0.7 for creative writing” becomes tribal knowledge, passed down without explanation. Let’s fix that by opening the hood and examining the mathematics that makes these knobs work.
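As a first look under that hood, here’s a toy sampler (not any production implementation) showing what the two knobs do to a single next-token distribution: temperature rescales the logits before the softmax, and top-p trims the candidate pool before sampling.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0,
                      top_p: float = 1.0) -> int:
    """Toy next-token sampler: temperature reshapes, top-p trims."""
    # Temperature rescales logits before the softmax: values < 1 sharpen the
    # distribution toward the top choice, values > 1 flatten it out.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()

    # Top-p (nucleus) sampling: keep the smallest set of tokens whose
    # cumulative probability reaches top_p, discard the rest, renormalize.
    order = np.argsort(probs)[::-1]  # token indices, most likely first
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    nucleus = np.zeros_like(probs)
    nucleus[order[:cutoff]] = probs[order[:cutoff]]
    nucleus /= nucleus.sum()

    return int(np.random.choice(len(probs), p=nucleus))

# Four pretend vocabulary tokens: low temperature plus a tight top-p almost
# always picks index 0; a high temperature spreads the choices out.
logits = np.array([2.0, 1.0, 0.5, -1.0])
print(sample_next_token(logits, temperature=0.7, top_p=0.9))
```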


“Temperature doesn’t make the model smarter or dumber. It changes how much the model trusts its own first instinct.”


Read more →

How Large Language Models (LLMs) Tokenize Text: Why Words Aren't What You Think

When you type “I love programming” into ChatGPT, you might assume the model reads three words. It doesn’t. It reads somewhere between three and seven tokens, depending on how the text is split.

When you ask Claude to count the letters in the word “strawberry,” it often gets it wrong. The reason is simple. Claude never saw the word “strawberry” as a complete unit. It saw tokens like "str", "aw", "berry" and tried to reason about letters it couldn’t directly access.
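You can inspect these splits yourself with tiktoken, OpenAI’s open-source tokenizer. Exact token boundaries vary by tokenizer and version, so treat the output as illustrative rather than canonical:

```python
import tiktoken  # OpenAI's open-source BPE tokenizer: pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models

for text in ("I love programming", "strawberry"):
    ids = enc.encode(text)
    pieces = [enc.decode_single_token_bytes(i).decode("utf-8", "replace")
              for i in ids]
    print(f"{text!r} -> {len(ids)} tokens: {pieces}")
```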

And when early GPT-3 users discovered that typing “SolidGoldMagikarp” caused the model to behave erratically (generating nonsense, refusing requests, or producing bizarre outputs), the culprit wasn’t the model’s training. It was a glitch token: a tokenization artifact that never appeared in training data, leaving the model with no learned representation for how to handle it (Rumbelow & Watkins, 2023).


“To a language model, text isn’t a stream of words. It’s a sequence of tokens. The way those tokens are created determines what the model can and cannot understand.”


Read more →

How Large Language Models (LLMs) Handle Context Windows: The Memory That Isn't Memory

When you have a long conversation with a large language model (LLM) such as ChatGPT or Claude, it feels like the model remembers everything you’ve discussed. It references earlier points, maintains consistent context, and seems to “know” what you talked about pages ago.

But here’s the uncomfortable truth: the model doesn’t remember anything. It’s not storing your conversation in memory the way a database would. Instead, it’s rereading the entire conversation from the beginning every single time you send a message.
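A schematic chat loop makes this concrete. The `llm` callable below is a hypothetical stand-in for any chat API; what matters is that the full transcript is resent on every turn, because the model holds no state between calls:

```python
# Schematic chat loop, not any vendor's SDK; `llm` is a hypothetical callable
# that maps a full message list to one reply string.
history = []  # the transcript lives here, in *your* code, not in the model

def send(user_message: str, llm) -> str:
    history.append({"role": "user", "content": user_message})

    # The model "remembers" only because the entire transcript is replayed
    # on every single turn; it keeps nothing between calls.
    reply = llm(history)

    history.append({"role": "assistant", "content": reply})
    return reply
```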


“A context window isn’t memory. It’s a performance where the model rereads its lines before every response.”


Read more →

How Large Language Models (LLMs) Learn: Calculus and the Search for Understanding

When you interact with a large language model (LLM) such as ChatGPT or Claude, the model seems to respond almost instantly, no matter how difficult the question. What’s easy to forget is that every word it predicts comes from a long history of learning, in which billions of gradient steps have slowly sculpted its understanding of language.

Large language models don’t memorize text. They optimize against it, adjusting their parameters to predict it better. Behind that optimization lies calculus. I’m not referring to the calculus you did with pencil and paper. I’m talking about a sprawling, automated version that computes millions of derivatives per second.

At its heart, every LLM is a feedback system. It starts with random guesses, measures how wrong it was, and then adjusts itself to be slightly less wrong. The word “slightly” in this context is the essence of calculus.
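Here’s that feedback loop in miniature: a single-parameter toy example. Real training does the same thing across billions of parameters with automatic differentiation, but the guess-measure-adjust cycle is identical.

```python
# A one-parameter caricature of training: guess, measure error, adjust.
w = 0.0              # the model's initial (wrong) guess
target = 3.0         # the answer the training data pushes it toward
learning_rate = 0.1  # how "slightly" each correction is applied

for step in range(25):
    loss = (w - target) ** 2   # how wrong the current guess is
    grad = 2 * (w - target)    # derivative of loss w.r.t. w: the calculus
    w -= learning_rate * grad  # nudge w to be slightly less wrong

print(f"learned w = {w:.4f}")  # converges toward 3.0
```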


“Each gradient step represents a measurable reduction in error, guiding the model toward a more stable understanding of language.”


Read more →