Should I use temperature or top-p?

Typically you adjust one or the other, not both. Temperature rescales the logits before softmax, so it reshapes the whole distribution; top-p truncates the low-probability tail. OpenAI's API documentation likewise recommends altering temperature or top-p, but not both. For outputs that should be near-deterministic (code, data extraction), a low temperature (0.0–0.3) is more reliable. For creative tasks, a higher temperature (0.7–1.0) produces more variety.

What temperature should I use for coding tasks?

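For code generation and factual Q&A, use temperature 0.0–0.2 with top-p around 0.9. Low temperature makes the model choose its highest-probability tokens consistently, which tends to suit tasks with a single right answer. At 0.0, decoding is effectively greedy (always the top token), which is useful for reproducible outputs, although hosted APIs may still show small run-to-run differences.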

What is the difference between repetition, frequency, and presence penalties?

Repetition penalty (Hugging Face / Ollama) is multiplicative: if a token has already appeared, a positive logit is divided by the penalty factor and a negative logit is multiplied by it, so the token becomes less likely either way. Frequency penalty (OpenAI) is additive and linear: it subtracts a value proportional to how many times the token has appeared. Presence penalty (OpenAI) is additive and binary: it subtracts a flat value from any token that has appeared at all.
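The three penalties can be compared side by side in a few lines of Python. This is a minimal sketch rather than any library's actual code: the function names and the plain-list representation of logits are assumptions made for illustration, and the arithmetic follows the multiplicative and additive rules described above.

```python
from collections import Counter


def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Multiplicative penalty applied once to every token already generated."""
    out = list(logits)
    for tok in set(generated_ids):
        # Dividing a positive logit or multiplying a negative one both
        # lower the token's probability when penalty > 1.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def apply_additive_penalties(logits, generated_ids,
                             frequency_penalty=0.0, presence_penalty=0.0):
    """Additive penalties: frequency scales with count, presence is flat."""
    out = list(logits)
    for tok, count in Counter(generated_ids).items():
        out[tok] -= count * frequency_penalty  # linear in occurrences
        out[tok] -= presence_penalty           # same for 1 or 100 occurrences
    return out


# Hypothetical 4-token vocabulary; token 0 appeared twice, token 1 once.
example_logits = [2.0, -1.0, 0.5, 3.0]
history = [0, 1, 0]
repetition_result = apply_repetition_penalty(example_logits, history, penalty=2.0)
additive_result = apply_additive_penalties(
    example_logits, history, frequency_penalty=0.5, presence_penalty=0.25
)
```

Tokens that never appeared (2 and 3 here) keep their original logits under all three penalties.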

What is top-p vs top-k vs min-p?

Top-p keeps the smallest set of most-likely tokens whose cumulative probability reaches p, so the number of kept tokens varies (a dynamic k). Top-k always keeps the k most likely tokens, regardless of probability gaps. Min-p keeps tokens whose probability is at least p × (probability of the top token), so it adapts to the model's confidence and prunes more aggressively when the model is certain.
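To make the difference concrete, the sketch below applies each filter to a confident and an uncertain distribution. The function names and the two example distributions are hypothetical, chosen only for illustration; each function returns the indices of the tokens that survive filtering.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens, whatever their probabilities are."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(order[:k])


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, total = [], 0.0
    for i in order:
        kept.append(i)
        total += probs[i]
        if total >= p:
            break
    return sorted(kept)


def min_p_filter(probs, min_p):
    """Keep tokens with probability >= min_p * (probability of the top token)."""
    threshold = min_p * max(probs)
    return sorted(i for i, q in enumerate(probs) if q >= threshold)


confident = [0.90, 0.05, 0.03, 0.02]   # model is nearly sure of token 0
uncertain = [0.30, 0.25, 0.25, 0.20]   # model is split across all tokens

for name, probs in [("confident", confident), ("uncertain", uncertain)]:
    print(name,
          "top-k:", top_k_filter(probs, 2),
          "top-p:", top_p_filter(probs, 0.92),
          "min-p:", min_p_filter(probs, 0.1))
```

With these inputs, top-k keeps two tokens in both cases, while top-p and min-p keep fewer tokens for the confident distribution and all four for the uncertain one.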

Sampling Parameters Guide

Every parameter that controls how LLMs generate text — temperature, top-p, top-k, penalties, and more.

Quick Presets

Parameter Reference

Temperature: divides the logits (raw model output scores) by the temperature value before softmax converts them into a probability distribution. The most fundamental sampling parameter.

Controls randomness and creativity. Low values make output deterministic and focused; high values make output diverse and creative.

Low value

Near 0: deterministic — always picks the highest-probability token. Good for code, facts, structured data.

High value

Near 2: very random. The distribution flattens, so many tokens end up with similar probability. Risk of incoherence or hallucination.

Tips

→0.0–0.3: factual Q&A, code generation, classification, data extraction
→0.5–0.8: balanced conversations, summarisation, translation
→0.9–1.2: creative writing, brainstorming, varied outputs
→1.5–2.0: poetry, experimental text — monitor for quality degradation
→Temperature and top-p interact: reduce one if you increase the other
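The effect of temperature on the distribution can be checked numerically. The following is a minimal sketch, assuming the standard formulation in which logits are divided by the temperature before softmax; the function name and the example logits are hypothetical. A temperature of exactly 0 cannot be divided by, so implementations commonly treat it as "always pick the top token"; this sketch simply rejects it.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; treat 0 as greedy argmax instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max to avoid overflow
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.0]            # hypothetical 3-token vocabulary
low = softmax_with_temperature(example_logits, 0.1)
base = softmax_with_temperature(example_logits, 1.0)
high = softmax_with_temperature(example_logits, 2.0)

for label, dist in [("T=0.1", low), ("T=1.0", base), ("T=2.0", high)]:
    print(label, [round(q, 3) for q in dist])
```

At T=0.1 almost all of the probability mass sits on the top token, while at T=2.0 the three probabilities move closer together, which is why high temperatures produce more varied output.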
