2 "Transformers" Posts

How Large Language Models (LLMs) Tokenize Text: Why Words Aren't What You Think

When you type “I love programming” into ChatGPT, you might assume the model reads three words. It doesn’t. It reads somewhere between three and seven tokens, depending on how the tokenizer splits the text.

When you ask Claude to count the letters in the word “strawberry,” it often gets it wrong. The reason is simple. Claude never saw the word “strawberry” as a complete unit. It saw tokens like “str”, “aw”, “berry” and tried to reason about letters it couldn’t directly access.
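
You can see the split for yourself with a tokenizer library. Here’s a minimal sketch using OpenAI’s open-source tiktoken package and its cl100k_base encoding as an example; the exact pieces vary by model and tokenizer, so treat the output as illustrative rather than definitive.

```python
# Illustrative only: uses the open-source `tiktoken` package (pip install tiktoken).
# Different models use different encodings, so the splits below may differ from
# what ChatGPT or Claude actually see.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["I love programming", "strawberry"]:
    token_ids = enc.encode(text)                      # text -> list of integer token IDs
    pieces = [enc.decode([tid]) for tid in token_ids]  # decode each ID back to its text piece
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```

Try it on a few words of your own and you’ll quickly find that token boundaries rarely line up with the letters or syllables a human would pick.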

And when early GPT-3 users discovered that typing “SolidGoldMagikarp” caused the model to behave erratically (generating nonsense, refusing requests, or producing bizarre outputs), the culprit wasn’t the model’s training. It was a glitch token: a token that sits in the tokenizer’s vocabulary but rarely or never appeared in the model’s training data, leaving the model with no learned representation for how to handle it (Rumbelow & Watkins, 2023).


“To a language model, text isn’t a stream of words. It’s a sequence of tokens. The way those tokens are created determines what the model can and cannot understand.”


Read more →

How Large Language Models (LLMs) Handle Context Windows: The Memory That Isn't Memory

When you have a long conversation with a large language model (LLM) such as ChatGPT or Claude, it feels like the model remembers everything you’ve discussed. It references earlier points, maintains consistent context, and seems to “know” what you talked about pages ago.

But here’s the uncomfortable truth: the model doesn’t remember anything. It’s not storing your conversation in memory the way a database would. Instead, it’s rereading the entire conversation from the beginning every single time you send a message.
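
One way to picture it: the “memory” lives entirely on the client side, which appends each turn to a transcript and sends the whole transcript back with every request. The sketch below is a generic illustration; send_to_model is a hypothetical stand-in for a real chat-completion API call, not any particular vendor’s client.

```python
# Minimal sketch of how chat clients typically work: the full transcript is
# resent on every turn. `send_to_model` is a hypothetical placeholder for a
# real chat-completion API call; no vendor-specific client is assumed.

def send_to_model(messages: list[dict]) -> str:
    """Pretend model call: in reality the model reads *all* of `messages`."""
    return f"(reply after rereading {len(messages)} messages)"

conversation = []  # the "memory" lives here, on the client side

for user_turn in ["Explain tokenization.", "Now summarize what you just said."]:
    conversation.append({"role": "user", "content": user_turn})
    # Every request includes the entire history, not just the latest message.
    reply = send_to_model(conversation)
    conversation.append({"role": "assistant", "content": reply})
    print(reply)
```

This is also why long conversations get slower and more expensive: every new message means reprocessing everything that came before it.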


“A context window isn’t memory. It’s a performance where the model rereads its lines before every response.”


Read more →