Sampling Parameters Guide
Every parameter that controls how LLMs generate text — temperature, top-p, top-k, penalties, and more.
Quick Presets
Parameter Reference
Temperature divides the logits (raw model output scores) by a constant before softmax converts them into a probability distribution. It is the most fundamental sampling parameter.
Controls randomness and creativity. Low values make output deterministic and focused; high values make output diverse and creative.
Near 0: nearly deterministic; the model almost always picks the highest-probability token. Good for code, facts, structured data.
Near 2: very random; the distribution flattens, so many tokens become almost as likely as the top one. Risk of incoherence or hallucination.
- 0.0–0.3: factual Q&A, code generation, classification, data extraction
- 0.5–0.8: balanced conversations, summarisation, translation
- 0.9–1.2: creative writing, brainstorming, varied outputs
- 1.5–2.0: poetry, experimental text; monitor for quality degradation
- Temperature and top-p interact: reduce one if you increase the other
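To make the effect of temperature concrete, here is a minimal sketch in plain Python. The function name `softmax_with_temperature` and the three example logits are hypothetical, chosen only for illustration. Treating a temperature of 0 as greedy decoding is an assumption about how implementations commonly avoid dividing by zero, not a description of any particular provider.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        # Division by zero is undefined, so this sketch treats 0 as greedy
        # decoding: all probability mass goes to the highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating, for stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it shows the top token's share of the probability mass shrinking as the temperature rises, which is the flattening described above.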
About
This reference covers every sampling parameter that controls how LLMs generate text:
- Temperature rescales the probability distribution over the vocabulary (higher = more random).
- Top-p (nucleus sampling) truncates to the smallest set of tokens whose cumulative probability reaches p.
- Top-k limits sampling to the k most probable tokens.
- Repetition Penalty reduces the probability of tokens that have already appeared.
- Frequency Penalty is OpenAI's penalty, subtracted from a token's logit in proportion to how often the token has appeared.
- Presence Penalty is a flat penalty for any token that has appeared at all.
- Min-P is a newer alternative that sets a minimum probability relative to the top token.
- Mirostat dynamically adjusts sampling to keep the output near a target perplexity.
Six quick presets (Precise, Balanced, Creative, Code, Chat, Story) let you jump to sensible configurations.
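The three penalties described above differ mainly in their arithmetic, which a short sketch can show. The function names are hypothetical, token IDs are assumed to be plain list indices, and the formulas follow the commonly documented Hugging Face style (multiplicative) and OpenAI style (additive) behaviour rather than any specific library's source.

```python
from collections import Counter

def apply_repetition_penalty(logits, generated_ids, penalty):
    """Multiplicative penalty: rescale the logit of every token already seen."""
    out = list(logits)
    for tok in set(generated_ids):
        # Positive logits are divided and negative logits multiplied, so the
        # token becomes less likely in both cases when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

def apply_frequency_presence(logits, generated_ids, frequency_penalty, presence_penalty):
    """Additive penalties: subtract from the logit of every token already seen."""
    out = list(logits)
    for tok, count in Counter(generated_ids).items():
        out[tok] -= count * frequency_penalty  # grows with every repeat
        out[tok] -= presence_penalty           # flat, applied once per seen token
    return out

logits = [3.0, -1.0, 2.0, 0.5]  # hypothetical scores for a 4-token vocabulary
history = [0, 0, 1]             # token 0 generated twice, token 1 once
print(apply_repetition_penalty(logits, history, 1.5))
print(apply_frequency_presence(logits, history, 0.5, 0.4))
```

Tokens 2 and 3 never appeared in `history`, so both functions leave their logits untouched.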
How to use
1. Click a Quick Preset to see recommended settings for your use case.
2. Expand any parameter to read its detailed description, effect, and tips.
3. Use the Low / High value boxes to understand each end of the parameter's range.
4. Note which providers support each parameter (OpenAI, Anthropic, Google, Ollama, etc.).
- Should I use temperature or top-p?
- Typically you adjust one or the other, not both. Temperature rescales the whole distribution; top-p cuts off the low-probability tail. OpenAI recommends changing temperature or top-p, but not both at once. For deterministic outputs (code, data extraction), a low temperature (0.0–0.3) is more reliable. For creative tasks, a higher temperature (0.7–1.0) produces more variety.
- What temperature should I use for coding tasks?
- For code generation and factual Q&A, use temperature 0.0–0.2 and top-p 0.9. Low temperature makes the model consistently pick high-probability (and usually correct) tokens. A temperature of 0.0 is effectively greedy decoding, which helps with reproducible outputs, although hosted APIs may still show small run-to-run differences.
- What is the difference between repetition, frequency, and presence penalties?
- Repetition Penalty (Hugging Face / Ollama): rescales the logit of any token that has already appeared. In the Hugging Face implementation, positive logits are divided by the penalty factor and negative logits are multiplied by it (multiplicative). Frequency Penalty (OpenAI): subtracts a value proportional to how many times the token has appeared (additive, linear). Presence Penalty (OpenAI): subtracts a flat value for any token that has appeared at all (additive, binary).
- What is top-p vs top-k vs min-p?
- Top-p keeps only the smallest set of tokens whose cumulative probability reaches p, so the number of tokens kept varies (a dynamic k). Top-k always keeps exactly k tokens regardless of probability gaps. Min-p keeps tokens whose probability ≥ p × (probability of the top token), so it adapts to the model's confidence: pruning is more aggressive when the model is certain.
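The difference between the three truncation rules is easiest to see on two contrasting distributions. The sketch below is illustrative only: the function names and the two example distributions are hypothetical, and real implementations differ in edge cases such as whether the token that crosses the top-p threshold is kept.

```python
def top_k_filter(probs, k):
    """Keep the indices of the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

def min_p_filter(probs, min_p):
    """Keep tokens at least min_p times as probable as the top token."""
    threshold = min_p * max(probs)
    return {i for i, q in enumerate(probs) if q >= threshold}

def renormalize(probs, kept):
    """Zero out dropped tokens and rescale the survivors to sum to 1."""
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

confident = [0.90, 0.05, 0.03, 0.02]  # model is sure of the first token
uncertain = [0.30, 0.25, 0.25, 0.20]  # model is torn between all four

for name, probs in (("confident", confident), ("uncertain", uncertain)):
    print(name,
          "top-k:", len(top_k_filter(probs, 2)),
          "top-p:", len(top_p_filter(probs, 0.85)),
          "min-p:", len(min_p_filter(probs, 0.1)))
```

Top-k keeps two tokens in both cases, while top-p and min-p keep a single token for the confident distribution and all four for the uncertain one.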