autojanet/skills/taste-skill/research/laziness/root-causes/output-limits.md
Zoë cfec11bb46
Some checks failed
ci/woodpecker/push/woodpecker Pipeline failed
fix: convert skill submodules to plain directories
stop-slop, taste-skill, terrashark had embedded .git dirs causing
Woodpecker clone to fail on submodule update.
2026-05-30 15:44:44 -07:00

39 lines
2.5 KiB
Markdown

# Output Limits and Consumer Truncation
## Context Window Asymmetry
Models like Gemini have massive input context windows (up to 2 million tokens) but strictly capped output limits (typically 8,000 tokens). When the model estimates that a complete response would exceed its output budget, it preemptively compresses or summarizes the output rather than risking an abrupt cutoff.
This creates a paradox: the model can read extensive inputs but cannot respond proportionally, leading to systematic information loss on complex tasks.
## The Consumer Middleware Problem
Consumer-facing applications (gemini.google.com, standard ChatGPT tiers) apply additional software-level truncation on top of the model's inherent limits. This middleware silently truncates conversation history and uploaded files to reduce compute costs for free and low-tier users.
Key mechanisms:
- **History capping:** Many consumer interfaces cap active conversation history at approximately 32,000 tokens, regardless of the model's actual capacity.
- **Context pruning:** Large system instructions or saved personal context consume tokens that would otherwise be available for the conversation, effectively shrinking the working window.
- **Retrieval-based recall:** Consumer apps often use retrieval mechanisms to selectively inject saved context, meaning the model frequently drops instructions it was given earlier in the session.
## Developer Platform Differences
Direct API access and developer platforms (Google AI Studio, OpenAI API Playground) bypass consumer middleware entirely. These environments provide:
- Full context window access without hidden truncation
- Complete control over generation parameters
- No dynamic throttling based on user tier
- Processing of complex prompt structures without middleware interference
The practical difference is significant: the same model that produces truncated outputs through a consumer interface will generate complete, unabridged responses when accessed through direct API endpoints.
## Terminal and CLI Integration
Purpose-built CLI tools (Gemini CLI, Claude Code, third-party wrappers) offer additional advantages for avoiding truncation:
| Access Method | Context Handling | Truncation Risk | Parameter Control |
|:---|:---|:---|:---|
| Consumer web app | Aggressive pruning, 32K cap | High | Limited |
| Developer platform (AI Studio) | Full context, no hidden slicing | Low | Full |
| Direct API | Full context, raw access | Minimal | Full |
| CLI tools with local models | No corporate alignment filters | None | Full |