Definition
Temperature is a sampling hyperparameter that controls how deterministic or random an LLM's token selection is at inference time. Technically, the logits (raw prediction scores) are divided by the temperature before the softmax that produces a probability distribution over candidate next tokens. At temperature 0, sampling reduces to greedy decoding: the model always selects the highest-probability token, producing near-deterministic outputs. At higher temperatures (0.7-1.0), the probability distribution is flattened, making lower-probability tokens more likely to be selected and producing more varied, creative outputs.
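The scaling described above can be sketched in a few lines. This is a minimal illustration using NumPy, not any specific provider's implementation; the function name and the treatment of temperature 0 as greedy decoding are choices made here for clarity.

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng=None):
    """Sample a token index from logits after temperature scaling."""
    rng = rng or np.random.default_rng()
    if temperature == 0:
        # Greedy decoding: always pick the highest-logit token.
        return int(np.argmax(logits))
    # Dividing by temperature < 1 sharpens the distribution;
    # temperature > 1 flattens it.
    scaled = np.asarray(logits, dtype=float) / temperature
    # Softmax with max subtraction for numerical stability.
    exp = np.exp(scaled - np.max(scaled))
    probs = exp / exp.sum()
    return int(rng.choice(len(probs), p=probs))
```

At very low temperatures the softmax output concentrates almost all probability mass on the argmax token, so sampling becomes effectively deterministic even without the explicit greedy branch.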
Engineering Context
For production AI agents, use temperature 0 (or a low value such as 0.1) for analytical tasks, data extraction, classification, JSON generation, and any step requiring output format consistency. Use temperature 0.3 to 0.7 for creative generation such as summaries, drafts, and explanations. Never use temperature above 0.7 in automated workflows where output format consistency is required: high temperature can cause the model to deviate from expected JSON schemas or structured formats. Temperature also interacts with other sampling parameters (top_p, top_k) that affect output diversity, so tune them together rather than in isolation. In production, document the temperature used for each agent node alongside the prompt version for reproducibility.
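To make the interaction with other sampling parameters concrete, here is a sketch of top-k sampling combined with temperature: candidates are first truncated to the k highest logits, and temperature then reshapes the distribution over that reduced set. The function name and parameterization are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def top_k_temperature_sample(logits, k, temperature, rng=None):
    """Keep only the k highest-logit tokens, then temperature-sample."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=float)
    # Truncate the candidate set: indices of the k largest logits.
    top_idx = np.argsort(logits)[-k:]
    # Temperature reshapes the distribution over the survivors only.
    scaled = logits[top_idx] / max(temperature, 1e-8)
    exp = np.exp(scaled - scaled.max())
    probs = exp / exp.sum()
    return int(top_idx[rng.choice(k, p=probs)])
```

Note that with k=1 the output is always the argmax regardless of temperature, which is why these parameters should be documented and tuned together: either one can dominate the other.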