autojanet/skills/taste-skill/research/laziness/root-causes/output-limits.md

# Output Limits and Consumer Truncation

## Context Window Asymmetry

Models like Gemini have massive input context windows (up to 2 million tokens) but strictly capped output limits (typically 8,000 tokens). When the model estimates that a complete response would exceed its output budget, it preemptively compresses or summarizes the output rather than risking an abrupt cutoff.

This creates a paradox: the model can read extensive inputs but cannot respond proportionally, leading to systematic information loss on complex tasks.

## The Consumer Middleware Problem

Consumer-facing applications (gemini.google.com, standard ChatGPT tiers) apply additional software-level truncation on top of the model's inherent limits. This middleware silently truncates conversation history and uploaded files to reduce compute costs for free and low-tier users.

Key mechanisms:

- **History capping:** Many consumer interfaces cap active conversation history at approximately 32,000 tokens, regardless of the model's actual capacity.
- **Context pruning:** Large system instructions or saved personal context consume tokens that would otherwise be available for the conversation, effectively shrinking the working window.
- **Retrieval-based recall:** Consumer apps often use retrieval mechanisms to selectively inject saved context, meaning the model frequently drops instructions it was given earlier in the session.

## Developer Platform Differences

Direct API access and developer platforms (Google AI Studio, OpenAI API Playground) bypass consumer middleware entirely. These environments provide:

- Full context window access without hidden truncation
- Complete control over generation parameters
- No dynamic throttling based on user tier
- Processing of complex prompt structures without middleware interference

The practical difference is significant: the same model that produces truncated outputs through a consumer interface will generate complete, unabridged responses when accessed through direct API endpoints.

## Terminal and CLI Integration

Purpose-built CLI tools (Gemini CLI, Claude Code, third-party wrappers) offer additional advantages for avoiding truncation:

| Access Method | Context Handling | Truncation Risk | Parameter Control |
|:---|:---|:---|:---|
| Consumer web app | Aggressive pruning, 32K cap | High | Limited |
| Developer platform (AI Studio) | Full context, no hidden slicing | Low | Full |
| Direct API | Full context, raw access | Minimal | Full |
| CLI tools with local models | No corporate alignment filters | None | Full |