Context Window: 200K vs 1M
When to switch to the extended context window and at what cost
The Two Windows
| Parameter | 200K standard | 1M GA |
|---|---|---|
| Availability | All plans | GA on Claude Code Max/Team/Enterprise plans (v2.1.75) |
| Header required | None | None on Claude Code plans · anthropic-beta: context-1m-2025-08-07 for direct API access |
| Price (Opus input) | $5/MTok | $10/MTok (requests above 200K input tokens) |
| Max output | 128K tokens | 128K tokens |
Once a request exceeds 200K input tokens, every input token in it is billed at the premium rate, not just the excess. Cost therefore jumps at the threshold instead of rising linearly.
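A minimal sketch of the threshold effect, using only the Opus input rates from the table ($5/MTok up to 200K, $10/MTok beyond) and ignoring output tokens. It assumes 200K means 200,000 tokens and that the premium applies once input is strictly greater than that; the function name is illustrative.

```python
# Opus input rates from the table, in USD per million tokens.
STANDARD_RATE = 5.00
PREMIUM_RATE = 10.00
THRESHOLD = 200_000  # assumption: premium applies when input exceeds 200,000 tokens


def opus_input_cost(input_tokens: int) -> float:
    """Input-only cost in USD. Past the threshold, *every* token gets the premium rate."""
    rate = PREMIUM_RATE if input_tokens > THRESHOLD else STANDARD_RATE
    return input_tokens * rate / 1_000_000


for tokens in (190_000, 210_000):
    print(f"{tokens:>7,} input tokens -> ${opus_input_cost(tokens):.2f}")
```

Going from 190K to 210K input tokens (10% more) more than doubles the input bill, from $0.95 to $2.10.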
Precision at Scale (MRCR v2)
| Model | At 256K | At 1M |
|---|---|---|
| Opus 4.6 | 93% | 76% |
| Sonnet 4.5 | n/a | 18.5% |
Opus 4.6 remains usable at 1M (76% precision), but the drop from 93% at 256K is measurable. Sonnet 4.5 collapses to 18.5% at 1M and is not recommended beyond 200K for precision-sensitive tasks.
Cost per Session (Approximate)
| Session type | Tokens in | Sonnet 4.6 | Opus 4.6 |
|---|---|---|---|
| PR review (≤200K) | 50K | ~$0.23 | ~$0.38 |
| Refactoring (≤200K) | 150K | ~$0.75 | ~$1.25 |
| Service analysis (>200K) | 500K | ~$4.13 | ~$6.88 |
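The table's figures can be reproduced with a simple per-request estimator. This is a sketch under assumptions the table does not state: output prices of $15/MTok (Sonnet 4.6) and $25/MTok (Opus 4.6) at standard rates, a premium tier of 2x input and 1.5x output above 200K input tokens ($6/$22.50 and $10/$37.50 per MTok), and roughly 5K, 20K and 50K output tokens for the three session types. It treats each session as a single request. Check rates against current Anthropic pricing before relying on them.

```python
from dataclasses import dataclass

THRESHOLD = 200_000  # input tokens above which the premium tier applies


@dataclass(frozen=True)
class Rates:
    """USD per million tokens (assumed values, see lead-in)."""
    input_std: float
    output_std: float
    input_long: float
    output_long: float


# Assumed rates; only the Opus input rates ($5 / $10) come from the table.
SONNET_4_6 = Rates(3.00, 15.00, 6.00, 22.50)
OPUS_4_6 = Rates(5.00, 25.00, 10.00, 37.50)


def session_cost(rates: Rates, input_tokens: int, output_tokens: int) -> float:
    """Approximate cost of one request; the tier is chosen by input size alone."""
    if input_tokens > THRESHOLD:
        in_rate, out_rate = rates.input_long, rates.output_long
    else:
        in_rate, out_rate = rates.input_std, rates.output_std
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# (session type, input tokens, assumed output tokens)
SESSIONS = [
    ("PR review", 50_000, 5_000),
    ("Refactoring", 150_000, 20_000),
    ("Service analysis", 500_000, 50_000),
]

for name, tokens_in, tokens_out in SESSIONS:
    sonnet = session_cost(SONNET_4_6, tokens_in, tokens_out)
    opus = session_cost(OPUS_4_6, tokens_in, tokens_out)
    print(f"{name:<17} Sonnet ~${sonnet:.2f}  Opus ~${opus:.2f}")
```

Swapping in your own output estimates shows how much of a long-context session's cost comes from the premium output rate rather than from input alone.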
When to Use 1M
The common community guideline: use 200K with RAG (retrieval-augmented generation) by default, and reserve 1M on Opus for cases where loading everything at once is genuinely necessary.
Justified:
- Full codebase audit in a single pass
- Massive documentation analysis with no chunking possible
- Agent Teams on a complex multi-service architecture
Not justified:
- Day-to-day development (even on large projects)
- Tasks with fast feedback loops (tests, debugging)
- Cases where /compact + sequential sessions work fine
Activation (API)
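```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,  # required by the Messages API; value is illustrative
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},  # opt in to the 1M window
    messages=[...],  # your conversation; total input may exceed 200K tokens
)
```

The header is needed for direct API access only. Claude Code Max/Team/Enterprise plans have 1M enabled automatically, so no header is needed there. On direct API calls without this header, requests exceeding 200K input tokens return an error, even on tier 4 accounts.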
Recommended Pattern
Work at 200K and manage context proactively rather than enabling 1M by default: run /compact at around 70% context usage, or open a new session at around 70-75%. Staying below that level keeps output quality higher and cost predictable.
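One way to encode this rule of thumb, as a sketch: compact at 70% usage and start fresh at 75%. The helper name and the split between the two thresholds are hypothetical interpretations of the guideline; the function does not talk to Claude Code, where /compact is a command typed in the session.

```python
def context_action(used_tokens: int, window: int = 200_000,
                   compact_at: float = 0.70, new_session_at: float = 0.75) -> str:
    """Hypothetical helper: suggest what to do given current context usage."""
    if window <= 0:
        raise ValueError("window must be positive")
    usage = used_tokens / window
    if usage >= new_session_at:
        return "start a new session"
    if usage >= compact_at:
        return "run /compact"
    return "continue"


print(context_action(145_000))  # 72.5% of a 200K window
```

Keeping the thresholds as parameters makes it easy to apply the same policy to the 1M window if you do enable it.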
For RAG over large documents, Gemini 1.5 Pro offers a 2M context at $3.50/$10.50 per MTok (input/output), roughly 2-3x cheaper than Opus above 200K for pure retrieval that does not need Opus-level reasoning.