autojanet/skills/taste-skill/research/laziness/remediation/parameter-tuning.md
Zoë cfec11bb46
2026-05-30 15:44:44 -07:00


Parameter Tuning

Temperature and Top-p

Autoregressive models select each next token from a probability distribution generated by a softmax function applied to logit values. When a model defaults to brief outputs, the tokens associated with truncation and summarization have been assigned the highest probabilities through RLHF alignment.

Temperature

Adjusting the temperature parameter changes how the softmax function distributes probability mass across candidate tokens.

  • Low temperature (0.0 - 0.5): Amplifies differences between high and low-probability tokens. The model becomes highly deterministic, consistently selecting the highest-confidence continuation. Optimal for code generation, data extraction, and structured output.
  • Default temperature (1.0): Retains the original probability distribution from training.
  • High temperature (1.5+): Flattens the distribution, introducing more randomness. Useful for creative tasks but increases the risk of incoherent outputs.
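
The scaling described above can be written out as a worked formula. This is the standard temperature-scaled softmax; the symbols z_i (logit of candidate i) and T (temperature) are notation introduced here, not taken from the original text.

```latex
p_i(T) = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)},
\qquad
\frac{p_i(T)}{p_k(T)} = \exp\!\left(\frac{z_i - z_k}{T}\right)
```

The ratio on the right shows the effect directly: for two candidates whose logits differ by 1, the more likely one is favored by a factor of about 1.95 at T = 1.5 and about 148 at T = 0.2. As T approaches 0 the ratio grows without bound, which is why very low temperatures behave like always picking the top token.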

Example probability distribution shift for a single token position:

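| Token Candidate | Probability at Temp 1.5 | Probability at Temp ~0.0 | Raw Logit |
|-----------------|-------------------------|--------------------------|-----------|
| lazy            | 0.4875                  | 0.9933                   | 2.0       |
| quick           | 0.2503                  | 0.0067                   | 1.0       |
| tired           | 0.1285                  | 0.0000                   | 0.0       |
| slow            | 0.0660                  | 0.0000                   | -1.0      |
| clumsy          | 0.0339                  | 0.0000                   | -2.0      |

Two notes on reading the table: the Temp ~0.0 column matches a softmax at T = 0.2 (at exactly 0.0, sampling reduces to always picking the top token), and the Temp 1.5 column sums to about 0.97 over the five rows shown, so the remaining mass presumably falls on candidates that are not listed.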
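
To make the shift concrete, here is a minimal sketch that applies a temperature-scaled softmax to the five logits in the table. The function name `softmax_with_temperature` is a hypothetical helper, not part of any model API. Because it normalizes over only the five listed candidates, its T = 1.5 probabilities come out slightly higher than the table's, whose values sum to about 0.97.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Return softmax probabilities for logits divided by the temperature."""
    if temperature <= 0:
        # T = 0 is not a valid divisor; samplers treat it as greedy argmax.
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Candidates and raw logits taken from the table above.
tokens = ["lazy", "quick", "tired", "slow", "clumsy"]
logits = [2.0, 1.0, 0.0, -1.0, -2.0]

for t in (1.5, 1.0, 0.2):
    probs = softmax_with_temperature(logits, t)
    row = ", ".join(f"{tok}={p:.4f}" for tok, p in zip(tokens, probs))
    print(f"T={t}: {row}")
```

Running it shows the top candidate's share growing as T falls, which is the mechanism the low-temperature bullet describes.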

Top-p (Nucleus Sampling)

Top-p truncates the probability distribution to the smallest set of tokens whose cumulative probability reaches the threshold p, then renormalizes the surviving probabilities and samples only from that set. A Top-p of 0.0 to 0.6 combined with low temperature forces the model onto a narrow, near-deterministic execution path, reducing the entropy that enables creative refusals and unnecessary summarization.
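
The truncation step can be sketched in a few lines. This is a minimal illustration of nucleus filtering, not any provider's implementation; `top_p_filter` and the example probabilities are hypothetical.

```python
def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose mass reaches p, renormalized."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete once the threshold is reached
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

# Illustrative distribution (made-up values, not model output).
example_probs = {"lazy": 0.50, "quick": 0.26, "tired": 0.13, "slow": 0.07, "clumsy": 0.04}

narrow = top_p_filter(example_probs, 0.6)  # keeps "lazy" and "quick"
wide = top_p_filter(example_probs, 0.9)    # keeps the top four candidates
print(narrow)
print(wide)
```

A lower p shrinks the candidate set, so sampling has fewer low-probability continuations to wander into.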

Gemini Thinking Level Configuration

Google Gemini 3 models replaced the legacy thinking_budget (a hard token count cap on internal reasoning) with a thinking_level parameter that provides relative guidance on computational depth.

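| Setting | Flash Support | Pro Support    | Use Case                                       |
|---------|---------------|----------------|------------------------------------------------|
| minimal | Yes           | No             | High-throughput, low-latency tasks             |
| low     | Yes           | Yes            | Simple instruction following, data extraction  |
| medium  | Yes           | Yes (3.1 Pro)  | Moderate complexity tasks                      |
| high    | Yes (Default) | Yes (Default)  | Complex analysis, code generation, mathematics |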

Important constraints:

  • thinking_level and thinking_budget are mutually exclusive. Using both in one API call triggers an HTTP 400 error.
  • Even at low, Gemini Pro models perform mandatory minimum internal deliberation for safety and alignment.
  • For code generation and complex analysis, set thinking_level to medium or high; quality scores at these levels are reported to reach 92-95% or more relative to baseline.
  • Avoid combining extremely low temperature with high thinking level, as this can occasionally induce internal reasoning loops.
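
The support table and the mutual-exclusion rule can be checked client-side before a request is sent. The sketch below is a hypothetical pre-flight helper, not the Gemini SDK: `build_thinking_config`, the family labels in `SUPPORTED_LEVELS`, and the returned dictionary shape are all assumptions made for illustration. Only the field names thinking_level and thinking_budget and the support matrix come from the text above.

```python
# Support matrix transcribed from the table above; the family keys are hypothetical labels.
SUPPORTED_LEVELS = {
    "flash": {"minimal", "low", "medium", "high"},
    "pro": {"low", "high"},
    "pro-3.1": {"low", "medium", "high"},
}

def build_thinking_config(family, thinking_level=None, thinking_budget=None):
    """Return a thinking config dict, rejecting combinations the text says are invalid."""
    if thinking_level is not None and thinking_budget is not None:
        # Per the constraints above, sending both is rejected with HTTP 400.
        raise ValueError("thinking_level and thinking_budget are mutually exclusive")
    if thinking_budget is not None:
        return {"thinking_budget": thinking_budget}
    level = thinking_level if thinking_level is not None else "high"  # documented default
    if level not in SUPPORTED_LEVELS[family]:
        raise ValueError(f"{level!r} is not supported for family {family!r}")
    return {"thinking_level": level}

print(build_thinking_config("flash", thinking_level="minimal"))
print(build_thinking_config("pro"))
```

Failing fast locally avoids spending a round trip on a request that would be rejected anyway.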