Temperature and Top-P: The Creativity Knobs

Every API call to ChatGPT, Claude, or any other LLM includes two parameters most people either ignore or tweak randomly: temperature and top-p. The defaults work fine for casual use, so why bother understanding them? Because these two numbers fundamentally control how your model thinks.
The temperature value determines whether the model plays it safe or takes creative risks, while the top-p value decides how many options the model even considers. Together, these values shape the personality of every response you receive.
I’ve watched developers cargo-cult settings from others without understanding what they do. “Set temperature to 0.7 for creative writing” becomes tribal knowledge, passed down without explanation. Let’s fix that by opening the hood and examining the mathematics that makes these knobs work.
“Temperature doesn’t make the model smarter or dumber. It changes how much the model trusts its own first instinct.”
This post explores the mathematical foundations of token sampling in large language models, showing exactly how temperature and top-p transform probability distributions into actual text. You’ll see the equations, run the code, and develop intuition for when to reach for which parameter.
TL;DR:
- LLMs don’t output text directly; they output probability distributions over tokens
- Temperature divides the logits before softmax, reshaping the probability curve
- Lower temperature makes the model more deterministic; temperature near zero always picks the top choice
- Higher temperature spreads probability across more tokens, approaching uniform randomness at extreme values
- Top-p (nucleus sampling) dynamically truncates the distribution, keeping only tokens that sum to probability \(p\)
- Temperature affects the shape of probabilities; top-p affects how many tokens remain in consideration
- For factual tasks: low temperature (0.1–0.3), top-p around 0.9
- For creative tasks: higher temperature (0.7–1.0), top-p around 0.95
- Using both simultaneously can produce unexpected interactions; understand them individually first
- Temperature 0 isn’t truly deterministic in all implementations due to floating-point issues
How LLMs Generate Text
Before we touch the knobs, we need to understand what they’re adjusting. Large language models don’t generate text directly. They generate probability distributions over vocabularies (typically 50,000 to 100,000 tokens) and then sample from those distributions to select the next token (Jurafsky & Martin, 2023).
Here’s the pipeline:
- Input processing: Your prompt gets tokenized into a sequence of integers
- Forward pass: The transformer produces a vector of raw scores (logits) for every token in the vocabulary
- Softmax: Logits get converted to probabilities that sum to 1
- Sampling: A token is selected based on those probabilities
- Repeat: The selected token gets appended, and we go back to step 2
Steps 3 and 4 are where temperature and top-p operate. They don’t change what the model “knows”; they change how it decides.
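To make the pipeline concrete, here is a toy sketch of that loop. The fake_model function is a stand-in for the real forward pass (it just returns random logits over a made-up six-token vocabulary), so the text it produces is gibberish; the point is the shape of steps 2 through 5, not the output.
# generation_loop_sketch.py
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["the", "robot", "sunset", "felt", "warm", "<eos>"]  # toy vocabulary

def fake_model(token_ids):
    """Stand-in for step 2: one raw score (logit) per vocabulary token.
    A real transformer would condition on token_ids; this toy ignores them."""
    return rng.normal(size=len(VOCAB))

def sample_next(logits):
    """Steps 3 and 4: softmax the logits, then sample one token id."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(VOCAB), p=probs)

token_ids = [0]                          # step 1: pretend the prompt tokenized to ["the"]
for _ in range(5):                       # step 5: append the chosen token and repeat
    token_ids.append(sample_next(fake_model(token_ids)))

print(" ".join(VOCAB[i] for i in token_ids))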
The Softmax Function: Probabilities from Scores
The transformer’s final layer outputs logits, which are unbounded real numbers where higher means “more likely.” To convert these to probabilities, we apply the softmax function (Goodfellow et al., 2016):
\[ P(token_i) = \frac{e^{z_i}}{\sum_{j=1}^{V} e^{z_j}} \]
Where \(z_i\) is the logit for token \(i\) and \(V\) is the vocabulary size.
The exponential function amplifies differences: if token \(A\) has logit 5.0 and token \(B\) has logit 4.0, their probabilities won’t be in a 5:4 ratio. The exponentials make \(A\) roughly 2.7 times more likely than \(B\).
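Written as a ratio, that claim is just:
\[ \frac{P(A)}{P(B)} = \frac{e^{5.0}}{e^{4.0}} = e^{1.0} \approx 2.72 \]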
# softmax.py
import numpy as np

def softmax(logits):
    """Standard softmax: convert logits to probabilities."""
    # Subtract max for numerical stability
    shifted = logits - np.max(logits)
    exp_logits = np.exp(shifted)
    return exp_logits / np.sum(exp_logits)

def main():
    # Example logits for 5 tokens
    logits = np.array([2.0, 1.5, 1.0, 0.5, 0.0])
    probs = softmax(logits)
    print(f"Probabilities: {probs.round(3)}")
    # Output: [0.429 0.26  0.158 0.096 0.058]

if __name__ == "__main__":
    main()
Notice how a logit of 2.0 doesn’t give twice the probability of logit 1.0. It gives nearly three times the probability. This exponential amplification is exactly what temperature modifies.
Temperature: Reshaping Confidence
Temperature is mathematically simple: divide the logits by a scalar before applying softmax (Hinton et al., 2015):
\[ P(token_i | T) = \frac{e^{z_i / T}}{\sum_{j=1}^{V} e^{z_j / T}} \]
That’s it. One division. But watch what happens:
When T < 1 (low temperature): Dividing by a fraction amplifies the differences between logits. A gap of 1.0 between two logits becomes 2.0 after dividing by T=0.5, or 5.0 after dividing by T=0.2. This makes the highest-probability token dominate even further.
When T = 1: Standard softmax. No modification.
When T > 1 (high temperature): Dividing by a number greater than 1 compresses differences. Logit gaps shrink. The distribution flattens toward uniform.
When T → 0: The highest logit wins with probability approaching 1. Deterministic greedy decoding.
When T → ∞: All tokens approach equal probability. Pure randomness.
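All of these regimes follow from one identity: under temperature scaling, the ratio between any two token probabilities depends only on their logit gap divided by \(T\):
\[ \frac{P(token_A \mid T)}{P(token_B \mid T)} = e^{(z_A - z_B)/T} \]
A gap of 1.0 therefore yields a ratio of roughly 2.7 at \(T = 1\), about 7.4 at \(T = 0.5\), and only about 1.6 at \(T = 2\).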
# softmax_with_temperature.py
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Apply temperature scaling before softmax."""
    if temperature == 0:
        # Greedy: return one-hot for max logit
        result = np.zeros_like(logits)
        result[np.argmax(logits)] = 1.0
        return result
    scaled = logits / temperature
    shifted = scaled - np.max(scaled)
    exp_logits = np.exp(shifted)
    return exp_logits / np.sum(exp_logits)

def main():
    # Same logits, different temperatures
    logits = np.array([2.0, 1.5, 1.0, 0.5, 0.0])
    temperatures = [0.1, 0.5, 1.0, 1.5, 2.0]

    print("Token probabilities at different temperatures:")
    print("-" * 50)
    for T in temperatures:
        probs = softmax_with_temperature(logits, T)
        entropy = -np.sum(probs * np.log(probs + 1e-10))
        print(f"T={T:.1f}: {probs.round(3)} | Entropy: {entropy:.2f}")

if __name__ == "__main__":
    main()
Output:
Token probabilities at different temperatures:
--------------------------------------------------
T=0.1: [0.993 0.007 0. 0. 0. ] | Entropy: 0.04
T=0.5: [0.636 0.234 0.086 0.032 0.012] | Entropy: 1.00
T=1.0: [0.429 0.26 0.158 0.096 0.058] | Entropy: 1.39
T=1.5: [0.349 0.25 0.179 0.129 0.092] | Entropy: 1.51
T=2.0: [0.31 0.241 0.188 0.146 0.114] | Entropy: 1.55
The entropy column, a measure of probability distribution spread, quantifies what we’re seeing: low temperature concentrates probability (low entropy), high temperature spreads it (high entropy, approaching the maximum of \(\ln(5) \approx 1.61\) for 5 tokens).
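For reference, the entropy printed by the script is the Shannon entropy of the distribution, measured in nats:
\[ H(P) = -\sum_{i=1}^{V} P(token_i) \ln P(token_i) \]
It reaches its maximum, \(\ln V\), only when every token is equally likely.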
Visualizing Temperature Effects
Let’s see this graphically with a realistic vocabulary slice:
Python:
# visualize_temperature_effects.py
import numpy as np
import matplotlib.pyplot as plt

from softmax_with_temperature import softmax_with_temperature

def visualize_temperature_effects():
    """Show how temperature reshapes probability distributions."""
    # Simulate logits for 20 tokens (sorted for visualization)
    np.random.seed(42)
    logits = np.sort(np.random.randn(20) * 2)[::-1]
    temperatures = [0.3, 0.7, 1.0, 1.5]

    fig, axes = plt.subplots(2, 2, figsize=(12, 10))
    axes = axes.flatten()
    for ax, T in zip(axes, temperatures):
        probs = softmax_with_temperature(logits, T)
        bars = ax.bar(range(20), probs, color='steelblue', alpha=0.7)
        # Highlight top token
        bars[0].set_color('darkred')
        ax.set_title(f'Temperature = {T}', fontsize=14)
        ax.set_xlabel('Token rank')
        ax.set_ylabel('Probability')
        ax.set_ylim(0, max(probs) * 1.1)
        # Annotate top probability
        ax.annotate(f'{probs[0]:.1%}', xy=(0, probs[0]),
                    xytext=(2, probs[0]), fontsize=10)

    plt.tight_layout()
    plt.savefig('temperature_comparison.png', dpi=150)
    plt.show()

if __name__ == "__main__":
    visualize_temperature_effects()
Output: the saved figure, temperature_comparison.png, shows four bar charts of token probability by token rank, one per temperature.
At T=0.3, the top token claims nearly all the probability mass, so the model will almost always choose its first instinct. At T=1.5, probability spreads across many tokens, introducing genuine variety (and risk).

Top-P (Nucleus Sampling): Dynamic Truncation
While temperature reshapes the entire distribution, top-p takes a different approach: it truncates the distribution dynamically, keeping only the tokens needed to reach cumulative probability \(p\) (Holtzman et al., 2020).
The algorithm:
- Sort tokens by probability (descending)
- Compute cumulative sum
- Find the smallest set where cumulative probability ≥ \(p\)
- Zero out everything else
- Renormalize
# top_p_sampling.py
import numpy as np

from softmax_with_temperature import softmax_with_temperature

def top_p_sampling(logits, p, temperature=1.0):
    """Apply nucleus (top-p) sampling."""
    # First apply temperature
    probs = softmax_with_temperature(logits, temperature)
    # Sort by probability (descending)
    sorted_indices = np.argsort(probs)[::-1]
    sorted_probs = probs[sorted_indices]
    # Find cumulative sum
    cumsum = np.cumsum(sorted_probs)
    # Find cutoff index (first index where cumsum >= p)
    cutoff_idx = np.searchsorted(cumsum, p) + 1
    # Create mask
    mask = np.zeros_like(probs)
    mask[sorted_indices[:cutoff_idx]] = 1
    # Apply mask and renormalize
    masked_probs = probs * mask
    return masked_probs / np.sum(masked_probs)

def main():
    # Example
    logits = np.array([3.0, 2.5, 2.0, 1.0, 0.5, 0.0, -0.5, -1.0])
    tokens = ['the', 'a', 'one', 'some', 'that', 'this', 'an', 'my']

    print("Original probabilities:")
    orig_probs = softmax_with_temperature(logits, 1.0)
    for tok, prob in zip(tokens, orig_probs):
        print(f"  {tok}: {prob:.3f}")

    print("\nAfter top-p=0.9:")
    nucleus_probs = top_p_sampling(logits, p=0.9, temperature=1.0)
    for tok, prob in zip(tokens, nucleus_probs):
        if prob > 0:
            print(f"  {tok}: {prob:.3f}")

if __name__ == "__main__":
    main()
Output:
Original probabilities:
the: 0.437
a: 0.265
one: 0.161
some: 0.059
that: 0.036
this: 0.022
an: 0.013
my: 0.008
After top-p=0.9:
the: 0.474
a: 0.287
one: 0.174
some: 0.064
Notice that top-p=0.9 kept 4 of 8 tokens here, but the key insight is that this cutoff is dynamic. If the model is confident (one token has 95% probability), top-p=0.9 might keep only that one token. If the model is uncertain, it might keep dozens.
This is the crucial difference from top-k sampling, which always keeps exactly \(k\) tokens regardless of the distribution shape.
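A quick way to see that dynamic behaviour, reusing top_p_sampling from above (the import assumes you saved it as top_p_sampling.py; the logit vectors are made up for illustration):
# dynamic_cutoff_demo.py
import numpy as np

from top_p_sampling import top_p_sampling

# One dominant token vs. several equally plausible ones (illustrative logits)
confident = np.array([5.0, 2.0, 1.0, 0.0])
uncertain = np.array([1.0, 0.9, 0.8, 0.7, 0.6, 0.5])

for name, logits in [("confident", confident), ("uncertain", uncertain)]:
    surviving = np.count_nonzero(top_p_sampling(logits, p=0.9))
    print(f"{name}: {surviving} of {len(logits)} tokens survive top-p=0.9")
With these made-up numbers, the same p keeps a single token in the first case and all six in the second.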
Why Nucleus Sampling Beats Top-K
The original top-p paper (Holtzman et al., 2020) demonstrated a key failure mode of fixed top-k sampling. Consider two scenarios:
Scenario A: Model is very confident
- Token 1: 92%
- Token 2: 5%
- Token 3: 2%
- Tokens 4-10: < 1% combined
With top-k=10, you’re including 7+ tokens that together contribute less than 1% probability. They’ll rarely be selected, but when they are, you get incoherent outputs.
Scenario B: Model is genuinely uncertain
- Tokens 1-5: 15% each
- Tokens 6-10: 5% each
With top-k=5, you’re excluding tokens 6-10 that together represent 25% of the model’s considered probability mass. You’re artificially constraining legitimate options.
Top-p handles both gracefully:
- In Scenario A, p=0.9 keeps only tokens 1-2
- In Scenario B, p=0.9 keeps tokens 1-8
# compare_topk_topp.py
import numpy as np

from softmax_with_temperature import softmax_with_temperature

def compare_topk_topp():
    """Demonstrate when top-p outperforms top-k."""
    # Scenario A: Confident model
    confident_logits = np.array([5.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0])
    # Scenario B: Uncertain model
    uncertain_logits = np.array([1.0, 0.95, 0.9, 0.85, 0.8, 0.3, 0.25, 0.2])

    for name, logits in [("Confident", confident_logits),
                         ("Uncertain", uncertain_logits)]:
        probs = softmax_with_temperature(logits, 1.0)
        sorted_idx = np.argsort(probs)[::-1]
        sorted_probs = probs[sorted_idx]

        # Top-k=3
        topk_included = 3
        topk_mass = np.sum(sorted_probs[:topk_included])

        # Top-p=0.9
        cumsum = np.cumsum(sorted_probs)
        topp_included = np.searchsorted(cumsum, 0.9) + 1

        print(f"{name} model:")
        print(f"  Top-k=3: includes {topk_included} tokens, "
              f"captures {topk_mass:.1%} of mass")
        print(f"  Top-p=0.9: includes {topp_included} tokens, "
              f"captures 90% of mass")
        print()

if __name__ == "__main__":
    compare_topk_topp()
Output:
Confident model:
Top-k=3: includes 3 tokens, captures 99.0% of mass
Top-p=0.9: includes 1 tokens, captures 90% of mass
Uncertain model:
Top-k=3: includes 3 tokens, captures 48.0% of mass
Top-p=0.9: includes 7 tokens, captures 90% of mass
The Interaction Between Temperature and Top-P
Here’s where things get subtle. When you use both parameters together, temperature applies first, then top-p filters the result. This means:
- High temperature + top-p: Temperature flattens the distribution, so more tokens survive the top-p cutoff
- Low temperature + top-p: Temperature sharpens the distribution, so fewer tokens survive
# temp_topp_interaction.py
import numpy as np

from softmax_with_temperature import softmax_with_temperature

def temp_topp_interaction():
    """Show how temperature affects top-p token counts."""
    logits = np.random.randn(100) * 2    # 100 tokens
    logits = np.sort(logits)[::-1]       # Sort descending

    temperatures = [0.3, 0.5, 0.7, 1.0, 1.3]
    p = 0.9
    print(f"Tokens included in top-p={p} nucleus at different temperatures:")
    print("-" * 50)
    for T in temperatures:
        probs = softmax_with_temperature(logits, T)
        sorted_probs = np.sort(probs)[::-1]
        cumsum = np.cumsum(sorted_probs)
        tokens_included = np.searchsorted(cumsum, p) + 1
        print(f"  T={T}: {tokens_included} tokens")

if __name__ == "__main__":
    temp_topp_interaction()
Output:
Tokens included in top-p=0.9 nucleus at different temperatures:
--------------------------------------------------
T=0.3: 8 tokens
T=0.5: 14 tokens
T=0.7: 19 tokens
T=1.0: 29 tokens
T=1.3: 41 tokens
This interaction is why many practitioners recommend adjusting one parameter at a time. The OpenAI documentation suggests setting one to its default and only tuning the other (OpenAI, 2023).
Practical Experiments with Real APIs
Let’s see these parameters in action with actual API calls. The following experiments use both OpenAI and Anthropic APIs to demonstrate behavior across providers.
API Temperature Ranges Differ
OpenAI accepts temperature values from 0.0 to 2.0, while Anthropic’s API restricts temperature to 0.0–1.0. This means temperature=1.0 represents “maximum creativity” for Claude, whereas GPT models can go twice as high. The experiments below account for this by testing Anthropic at values chosen within its own range (and clamping anything higher).
OpenAI Python Sample
Python:
# openai_generate.py
from openai import OpenAI
import os

# Initialize client
openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def openai_generate(prompt, temperature=1.0, top_p=1.0, n=5):
    """Generate n completions with OpenAI."""
    responses = []
    for _ in range(n):
        response = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            top_p=top_p,
            max_tokens=50
        )
        responses.append(response.choices[0].message.content)
    return responses

def main():
    # Experiment: Same prompt, different temperatures
    prompt = (
        "Complete this sentence creatively: "
        "The robot looked at the sunset and felt"
    )
    print("=" * 60)
    print("TEMPERATURE EXPERIMENT (OpenAI)")
    print("=" * 60)
    for temp in [0.0, 0.5, 1.0, 1.5]:
        print(f"\nTemperature = {temp}")
        print("-" * 40)
        responses = openai_generate(prompt, temperature=temp, n=3)
        for i, r in enumerate(responses, 1):
            print(f"  {i}. {r[:70]}...")

if __name__ == "__main__":
    main()
Output:
============================================================
TEMPERATURE EXPERIMENT (OpenAI)
============================================================
Temperature = 0.0
----------------------------------------
1. a surge of colors dance within its circuits, as if the vibrant hues of...
2. a surge of colors swirling within its circuits, as if the vibrant hues...
3. a surge of electric wonder, as if the vibrant hues of orange and pink ...
Temperature = 0.5
----------------------------------------
1. The robot looked at the sunset and felt a strange flicker of something...
2. a strange flicker of longing, as if the vibrant hues of orange and pin...
3. a surge of emotions it could not compute, as the vibrant hues of orang...
Temperature = 1.0
----------------------------------------
1. The robot looked at the sunset and felt an unexpected surge of warmth ...
2. a surge of electric nostalgia, as if the vibrant hues of orange and pu...
3. a strange flicker of warmth in its circuits, as if the vibrant hues of...
Temperature = 1.5
----------------------------------------
1. a strange resonance within its circuits, as if the fading hues of oran...
2. The robot looked at the sunset and felt a surge of iridescent data cou...
3. a wistful yearning, sparking echoes of distant memories encoded in its...
Anthropic Python Sample
Python:
# anthropic_generate.py
from anthropic import Anthropic
import os

# Initialize client
anthropic_client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def anthropic_generate(prompt, temperature=1.0, top_p=1.0, n=5):
    """Generate n completions with Anthropic.

    Note: Anthropic's API accepts temperature in range [0, 1],
    unlike OpenAI's [0, 2]. Values are clamped accordingly.
    """
    # Clamp temperature to Anthropic's valid range
    temperature = max(0.0, min(1.0, temperature))
    responses = []
    for _ in range(n):
        response = anthropic_client.messages.create(
            model="claude-3-5-haiku-20241022",
            system=(
                "You are a creative writing assistant. When asked to "
                "complete a sentence, respond with ONLY the completion "
                "- no preamble, no alternatives, no explanation. Just "
                "continue the text naturally."
            ),
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            top_p=top_p,
            max_tokens=50
        )
        responses.append(response.content[0].text)
    return responses

def main():
    # Experiment: Same prompt, different temperatures
    prompt = (
        "Complete this sentence with a single continuation: "
        "The robot looked at the sunset and felt"
    )
    print("=" * 60)
    print("TEMPERATURE EXPERIMENT (Anthropic Claude)")
    print("=" * 60)
    for temp in [0.0, 0.3, 0.7, 1.0]:
        print(f"\nTemperature = {temp}")
        print("-" * 40)
        responses = anthropic_generate(prompt, temperature=temp, n=3)
        for i, r in enumerate(responses, 1):
            print(f"  {i}. {r[:80]}...")

if __name__ == "__main__":
    main()
Output (note: since Anthropic’s temperature range is 0.0–1.0, we test at 0.0, 0.3, 0.7, and 1.0):
============================================================
TEMPERATURE EXPERIMENT (Anthropic Claude)
============================================================
Temperature = 0.0
----------------------------------------
1. a strange, inexplicable longing for something it could not understand, a whisper...
2. a strange, inexplicable longing for something it could not understand, a whisper...
3. a strange, inexplicable longing for something it could not understand, a whisper...
Temperature = 0.3
----------------------------------------
1. a strange, inexplicable longing for something it could not understand, a whisper...
2. a strange, unexpected warmth spreading through its circuits, as if something bey...
3. a strange, inexplicable longing for something it could not understand, a whisper...
Temperature = 0.7
----------------------------------------
1. a strange, inexplicable longing for something it could not understand, a whisper...
2. a strange, unexpected warmth spreading through its circuits, almost like what hu...
3. a strange, unexpected warmth spreading through its circuitry, almost like what h...
Temperature = 1.0
----------------------------------------
1. a strange warmth spreading through its circuits, wondering if this was what huma...
2. a strange, unfamiliar warmth spreading through its circuits, as if something bey...
3. a strange, inexplicable longing for something it could not name, a whisper of em...
What to observe:
- At temperature 0, responses should be nearly identical (deterministic); the Anthropic run above is, while the OpenAI run still shows slight drift (more on that below)
- At temperature 0.5, minor variations appear but core structure persists
- At temperature 1.0, genuine creativity emerges
- At temperature 1.5+ (OpenAI only), outputs become increasingly unpredictable
- Anthropic’s maximum (1.0) produces roughly comparable variety to OpenAI’s mid-range (~0.7–0.8)
When to Use Which: A Decision Framework
Based on both the mathematics and empirical testing, here’s a practical guide:
Use Low Temperature (0.1–0.3) When:
- Factual retrieval: “What year was the Treaty of Westphalia signed?”
- Code generation: Syntax errors become more likely at high temperatures
- Classification tasks: You want the model’s highest-confidence answer
- Structured output: JSON, XML, or other formats where deviation breaks parsing
Use Medium Temperature (0.5–0.8) When:
- Professional writing: Emails, reports, documentation
- Summarization: Faithful but not robotic
- Translation: Accuracy matters but natural phrasing helps
- Explanations: Clear and engaging without hallucination risk
Use High Temperature (0.9–1.0 for Anthropic, 0.9–1.5 for OpenAI) When:
- Creative writing: Fiction, poetry, brainstorming
- Ideation: Generating diverse options to choose from
- Dialogue: Conversational responses that feel natural
- Exploration: When you want to see what’s possible
Top-P Guidelines:
- Start with 0.9: A sensible default for most tasks
- Reduce to 0.5–0.7: For more focused outputs when using higher temperatures
- Keep at 1.0: When you want temperature to be the only control
- Avoid 0: with the implementation above (and most real ones), p=0 keeps only the single top token, collapsing to greedy decoding
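If you find yourself repeating these choices across projects, one way to encode them is a small table of presets. The task names and exact values below are illustrative starting points drawn from the guidelines above, not official defaults from any provider:
# sampling_presets.py
SAMPLING_PRESETS = {
    "factual":  {"temperature": 0.2, "top_p": 0.9},
    "writing":  {"temperature": 0.7, "top_p": 0.9},
    "creative": {"temperature": 1.0, "top_p": 0.95},
}

def sampling_params(task):
    """Return keyword arguments suitable for a chat-completion call."""
    return dict(SAMPLING_PRESETS[task])

# Usage (client setup omitted):
# client.chat.completions.create(model=..., messages=..., **sampling_params("factual"))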
The Temperature 0 Myth
A common misconception: “Temperature 0 is deterministic.” In theory, yes. The argmax of the logits always wins. In practice, floating-point arithmetic introduces subtle variations, and different implementations handle the edge case differently (Peng et al., 2023).
Some providers implement “almost zero” (like 1e-8) rather than true zero. Some use a separate greedy decoding path. OpenAI’s API accepts temperature=0 but may still show occasional variation in long outputs.
# test_determinism.py
from openai_generate import openai_generate

def test_determinism(prompt, n_trials=10):
    """Test whether temperature=0 produces identical outputs."""
    responses = openai_generate(prompt, temperature=0, n=n_trials)
    unique_responses = set(responses)

    print(f"Unique responses at T=0: {len(unique_responses)} / {n_trials}")
    if len(unique_responses) == 1:
        print("Deterministic behavior confirmed!")
        print(f"  Response: {responses[0][:80]}...")
    else:
        print("Non-determinism detected!")
        for r in unique_responses:
            print(f"  - {r[:60]}...")

if __name__ == "__main__":
    # Simple prompt - likely deterministic
    print("Test 1: Simple math question")
    print("-" * 40)
    test_determinism("What is 2 + 2?")

    # Longer, more creative prompt - more likely to show variation
    print("\nTest 2: Creative prompt (longer output)")
    print("-" * 40)
    test_determinism(
        "Write a paragraph about a robot discovering emotions.",
        n_trials=5
    )
Output:
Test 1: Simple math question
----------------------------------------
Unique responses at T=0: 1 / 10
Deterministic behavior confirmed!
Response: 2 + 2 equals 4....
Test 2: Creative prompt (longer output)
----------------------------------------
Unique responses at T=0: 2 / 5
Non-determinism detected!
- In a dimly lit laboratory, a robot named AURA, designed for ...
- In a dimly lit laboratory, a robot named AURA, designed for ...
For truly deterministic behavior, some APIs offer a separate seed parameter. Always check your provider’s documentation.
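As one example, OpenAI’s chat completions endpoint exposes a seed parameter for best-effort reproducibility. The sketch below assumes the same client setup as the earlier experiments; treat the seed as a request rather than a guarantee, and check whether your provider supports it at all:
# seeded_generation.py
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def seeded_generate(prompt, seed=12345):
    """Request reproducible output via temperature=0 plus a fixed seed."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        seed=seed,          # best-effort determinism; identical inputs usually repeat
        max_tokens=50,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(seeded_generate("What is 2 + 2?"))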
Closing Thoughts
Temperature and top-p aren’t magic. They’re straightforward mathematical transformations with predictable effects. Temperature exponentially reshapes the probability distribution; top-p dynamically truncates it. Together, they give you fine-grained control over the exploration-exploitation tradeoff that underlies all language generation.
The key insight is that these parameters don’t change what the model knows. They change how the model decides. A low-temperature model isn’t smarter; it’s more committed to its first instinct. A high-temperature model isn’t more creative; it’s more willing to take risks.
Understanding this distinction helps you debug unexpected outputs. If your model keeps repeating itself, temperature might be too low. If it’s generating nonsense, temperature might be too high. If it’s ignoring plausible alternatives, top-p might be too restrictive. The mathematics tells you exactly where to look.
Try It Yourself
Download the full code on GitHub
Further Reading
These resources provide deeper technical details on sampling methods in language models.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. https://mitpress.mit.edu/9780262035613/deep-learning/
Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. https://arxiv.org/abs/1503.02531
Holtzman, A., Buys, J., Du, L., Forbes, M., & Choi, Y. (2020). The curious case of neural text degeneration. Proceedings of the International Conference on Learning Representations (ICLR). https://arxiv.org/abs/1904.09751
Jurafsky, D., & Martin, J. H. (2023). Speech and language processing (3rd ed. draft). Stanford University. https://web.stanford.edu/~jurafsky/slp3/
OpenAI. (2023). API reference: Chat completions. https://platform.openai.com/docs/api-reference/chat
Peng, B., Galley, M., He, P., Cheng, H., Xie, Y., Hu, Y., … & Gao, J. (2023). Check your facts and try again: Improving large language models with external knowledge and automated feedback. arXiv preprint arXiv:2302.12813. https://arxiv.org/abs/2302.12813