Parameter Tuning
Temperature and Top-p
Autoregressive models select each next token from a probability distribution produced by applying a softmax function to the logits. When a model defaults to brief outputs, it is because the tokens associated with truncation and summarization have been assigned the highest probabilities, typically through RLHF alignment.
Temperature
Adjusting the temperature parameter changes how the softmax function distributes probability mass across candidate tokens.
- Low temperature (0.0 - 0.5): Amplifies differences between high and low-probability tokens. The model becomes highly deterministic, consistently selecting the highest-confidence continuation. Optimal for code generation, data extraction, and structured output.
- Default temperature (1.0): Retains the original probability distribution from training.
- High temperature (1.5+): Flattens the distribution, introducing more randomness. Useful for creative tasks but increases the risk of incoherent outputs.
Example probability distribution shift for a single token position:
| Token Candidate | Probability at Temp 1.5 | Probability at Temp 0.2 | Raw Logit |
|---|---|---|---|
| lazy | 0.4875 | 0.9933 | 2.0 |
| quick | 0.2503 | 0.0067 | 1.0 |
| tired | 0.1285 | 0.0000 | 0.0 |
| slow | 0.0660 | 0.0000 | -1.0 |
| clumsy | 0.0339 | 0.0000 | -2.0 |
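The effect of temperature on the softmax can be reproduced with a few lines of Python. This is a minimal sketch, not any provider's implementation: the function name `softmax_with_temperature` and the sampled temperatures (0.2, 1.0, 1.5) are chosen here for illustration. It normalizes over only the five candidates listed, so the absolute values can differ slightly from a table computed over a larger vocabulary, while the ratios between candidates are unaffected.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax probabilities for `logits` divided by `temperature`."""
    if temperature <= 0:
        # Softmax is undefined at 0; samplers usually fall back to greedy argmax.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["lazy", "quick", "tired", "slow", "clumsy"]
logits = [2.0, 1.0, 0.0, -1.0, -2.0]

for t in (0.2, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    row = ", ".join(f"{tok}={p:.4f}" for tok, p in zip(tokens, probs))
    print(f"T={t}: {row}")
```

Lowering the temperature concentrates mass on `lazy`; raising it moves mass toward the lower-logit candidates.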
Top-p (Nucleus Sampling)
Top-p truncates the probability distribution by considering only the smallest set of highest-probability tokens whose cumulative probability reaches the threshold p, then renormalizing over that set. A Top-p of 0.0 to 0.6 combined with low temperature forces the model into a narrow, near-deterministic execution path, reducing the entropy that enables creative refusals and unnecessary summarization.
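The truncation step can be sketched as follows. This is an illustrative implementation under stated assumptions: the helper name `top_p_filter` and the example probabilities are hypothetical, and real inference stacks apply the same idea to full-vocabulary tensors rather than a small dict.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p.

    `probs` maps token -> probability. The kept tokens are renormalized
    so that they sum to 1 again.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Hypothetical distribution for one token position.
probs = {"lazy": 0.5, "quick": 0.25, "tired": 0.15, "slow": 0.07, "clumsy": 0.03}
nucleus = top_p_filter(probs, 0.6)
print(nucleus)
```

With p = 0.6 only `lazy` and `quick` survive, so the sampler can never pick the three low-probability candidates regardless of temperature.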
Gemini Thinking Level Configuration
Google Gemini 3 models replaced the legacy thinking_budget (a hard token count cap on internal reasoning) with a thinking_level parameter that provides relative guidance on computational depth.
| Setting | Flash Support | Pro Support | Use Case |
|---|---|---|---|
| `minimal` | Yes | No | High-throughput, low-latency tasks |
| `low` | Yes | Yes | Simple instruction following, data extraction |
| `medium` | Yes | Yes (3.1 Pro) | Moderate complexity tasks |
| `high` | Yes (Default) | Yes (Default) | Complex analysis, code generation, mathematics |
Important constraints:
- `thinking_level` and `thinking_budget` are mutually exclusive. Using both in one API call triggers an HTTP 400 error.
- Even at `low`, Gemini Pro models perform mandatory minimum internal deliberation for safety and alignment.
- For code generation and complex analysis, set to `medium` or `high` for quality scores consistently exceeding 92-95% compared to baseline.
- Avoid combining extremely low temperature with `high` thinking level, as this can occasionally induce internal reasoning loops.
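The first constraint and the support table can be enforced client-side before a request is sent. The sketch below is a hypothetical pre-flight check, not part of any Google SDK: the helper `build_thinking_config`, the `family` argument, and the returned dict shape are assumptions for illustration, and the support matrix is transcribed from the table above rather than queried from the API.

```python
# Support matrix copied from the table above. "medium" on Pro is listed
# for 3.1 Pro only; this sketch does not distinguish Pro versions.
SUPPORTED_LEVELS = {
    "flash": {"minimal", "low", "medium", "high"},
    "pro": {"low", "medium", "high"},
}

def build_thinking_config(family, thinking_level=None, thinking_budget=None):
    """Hypothetical helper: validate thinking settings before calling the API."""
    if thinking_level is not None and thinking_budget is not None:
        # Sending both is described above as an HTTP 400, so fail early.
        raise ValueError("thinking_level and thinking_budget are mutually exclusive")
    if family not in SUPPORTED_LEVELS:
        raise ValueError(f"unknown model family: {family!r}")
    if thinking_level is not None:
        if thinking_level not in SUPPORTED_LEVELS[family]:
            raise ValueError(f"{thinking_level!r} is not supported on {family}")
        return {"thinking_level": thinking_level}
    if thinking_budget is not None:
        return {"thinking_budget": thinking_budget}
    return {}  # omit both and let the model use its default level

print(build_thinking_config("flash", thinking_level="minimal"))
print(build_thinking_config("pro", thinking_level="high"))
```

Failing locally gives a clearer error message than a rejected request and keeps invalid combinations out of retry loops.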