When /compact goes wrong: Compaction fires when the model has the most accumulated context, meaning it is also at its most distracted point. If the model cannot predict where the work is heading (e.g., auto-compact fires mid-debugging and your next message is “now fix that warning in bar.ts”), it may drop future-relevant info from the summary. Mitigate by compacting proactively and with context: /compact focus on the auth refactor, drop the test debugging guides the summary toward what matters next. (Source: Anthropic internal guidance)
Option 2: Clear (/clear)

- Starts fresh
- Loses all context
- Use when changing topics
“One Task, One Chat” — mixing unrelated topics across turns degrades model accuracy by ~39%. Context accumulates noise (“context rot”) that distorts judgment even when total token usage stays low. Use /clear aggressively between distinct tasks, not just when the context bar turns red.
Option 3: Summarize from here (v2.1.32+)

- Use /rewind (or Esc + Esc) to open the checkpoint list
- Select a checkpoint and choose “Summarize from here”
- Claude summarizes everything from that point forward, keeping earlier context intact
- Frees space while keeping critical context
- More precise than full /compact
Option 4: Targeted Approach

- Be specific in queries
- Avoid “read the entire file”
- Use symbol references: “read the calculateTotal function”
When approaching the red zone (75%+), /compact alone may not be enough. You need to actively decide what information to preserve before compacting.
Priority: Keep

| Keep | Why |
| --- | --- |
| CLAUDE.md content | Core instructions must persist |
| Files being actively edited | Current work context |
| Tests for the current component | Validation context |
| Critical decisions made | Architectural choices |
| Error messages being debugged | Problem context |
Priority: Evacuate

| Evacuate | Why |
| --- | --- |
| Files read but no longer relevant | One-time lookups |
| Debug output from resolved issues | Historical clutter |
| Long conversation history | Summarized by /compact |
| Files from completed tasks | No longer needed |
| Large config files | Can be re-read if needed |
Pre-Compact Checklist:

1. Document critical decisions in CLAUDE.md or a session note
2. Commit pending changes to git (creates restore point)
3. Note the current task explicitly (“We’re implementing X”)
4. Run /compact to summarize and free space
Pro tip: If you know you’ll need specific information post-compact, tell Claude explicitly: “Before we compact, remember that we decided to use Strategy A for authentication because of X.” Claude will include this in the summary.
- Explicitly saved with write_memory("key", "value")
- Survives across sessions
- Ideal for: architectural decisions, API patterns, coding conventions
Pattern: End-of-Session Save
# Before ending a productive session:
"Save our authentication decision to memory:
- Chose JWT over sessions for scalability
- Token expiry: 15min access, 7d refresh
- Store refresh tokens in httpOnly cookies"
# Claude calls: write_memory("auth_decisions", "...")
# Next session:
"What did we decide about authentication?"
# Claude calls: read_memory("auth_decisions")
When to use which:

- Session memory: Active problem-solving, debugging, exploration
- Auto-memory: Decisions and context you want Claude to rediscover next session without manual effort (v2.1.59+)
- Persistent memory (Serena): Structured key-value store for architectural decisions across many projects
- CLAUDE.md: Team conventions, project structure (versioned with git)
Auto-compact and PostToolUse memory capture — a conflict to know about:
Claude Code auto-compacts the conversation when the remaining context drops below a fixed buffer threshold (roughly the last 6-7% of the context window, or about 13K tokens from the effective limit). In practice, this triggers somewhere in the 90-95% usage range depending on the model’s context window and reserved output tokens. Before full compaction runs, Claude Code also applies micro-compaction — a lighter pass that selectively compresses older tool results (file reads, bash outputs, search results) to free space incrementally without summarizing the whole conversation. If auto-compact fails (e.g., due to a rate limit), it retries up to 3 consecutive times before giving up for that session.
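That trigger arithmetic can be sketched in a few lines. The 32K reserved-output figure below is an assumed example, not a documented value; the ~13K buffer comes from the description above.

```python
# Back-of-envelope estimate of where auto-compact fires: it triggers when the
# remaining context drops below a ~13K-token buffer, measured against the
# effective limit (context window minus reserved output tokens).
def autocompact_trigger_pct(context_window: int, reserved_output: int,
                            buffer: int = 13_000) -> float:
    """Usage percentage of the effective input budget at which auto-compact fires."""
    effective = context_window - reserved_output
    return 100 * (effective - buffer) / effective

# 200K window, assumed 32K reserved for output:
pct = autocompact_trigger_pct(200_000, 32_000)
# lands in the 90-95% range described above
```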
If you use a hook-based memory capture tool (like claude-mem) that saves session history via PostToolUse, auto-compact can fire and discard conversation history before the save pipeline has a chance to capture it.
Two ways to handle this:
// Option 1: disable auto-compact in your project settings.json
// (you manage compaction manually via /compact)
{
"autoCompactEnabled": false
}
# Option 2: keep auto-compact on, but set your tool's save threshold
# to trigger well below 80% (e.g., at 60% context usage)
# — check your memory plugin's cooldowns/threshold config
Option 1 gives full control but requires discipline. Option 2 is safer if you forget to compact manually. The general guide advice (use /compact proactively at 75%) still applies — auto-compact disabled just means you own the timing.
Research shows LLM performance degrades significantly with accumulated context:
- 20-30% performance gap between focused and polluted prompts (Chroma, 2025)
- Degradation starts at ~16K tokens for older Claude models (Chroma, 2025); Anthropic reports noticeable degradation around 300-400K tokens on the 1M context window (task-dependent, not a fixed threshold)
- Failed attempts, error traces, and iteration history dilute attention
Instead of managing context within a session, you can restart with a fresh session per task while persisting state externally.
Naming note: “Ralph Loop” is used in two distinct ways in the community. Geoffrey Huntley’s original pattern (above) is about context rotation — spawning fresh sessions to avoid context rot. A separate usage, popularized by Addy Osmani and others in 2026, applies the same term to atomic task iteration in multi-agent teams: pick task → implement → validate → commit → reset context → repeat. Both share the same core mechanic (stateless loop with external state), but the scope differs. When the term appears without attribution, clarify which variant is meant.
State persists via:
TASK.md — Current task definition with acceptance criteria
A lightweight alternative for interactive sessions (no loop required): after each user correction, Claude updates tasks/lessons.md with the rule to avoid the same mistake. Reviewed at the start of each new session.
tasks/
├── todo.md # Current plan (checkable items)
└── lessons.md # Rules accumulated from corrections
The difference from PROGRESS.md: lessons.md captures behavioral rules (“always diff before marking done”, “never mock without asking”) rather than task state. It compounds over time — the mistake rate drops as the ruleset grows.
Good fit:

- Tasks with clear success criteria (tests pass, build succeeds)

Poor fit:

- Interactive exploration
- Design without clear spec
- Tasks with slow/ambiguous feedback loops
Variant: Session-per-Concern Pipeline

Instead of looping the same task, dedicate a fresh session to each quality dimension:

1. Plan session — Architecture, scope, acceptance criteria
2. Test session — Write unit, integration, and E2E tests first (TDD)
3. Implement session — Code until all linters and tests pass
4. Review sessions — Separate sessions for security audit, performance, code review
5. Repeat — Iterate with scope adjustments as needed
This combines Fresh Context (clean 200K per phase) with OpusPlan (Opus for review/strategy sessions, Sonnet for implementation). Each session generates progress artifacts that feed the next.
The default model depends on your subscription: Max/Team Premium subscribers get Opus 4.7 by default, while Pro/Team Standard subscribers get Sonnet 4.6. If Opus usage hits the plan threshold, it auto-falls back to Sonnet.
Model lineup (April 2026): Claude Opus 4.7 is the standard production Opus model (claude-opus-4-7). Claude Mythos Preview is more capable but remains in limited release. Opus 4.7 is the recommended upgrade path from Opus 4.6.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Notes |
| --- | --- | --- | --- | --- |
| Sonnet 4.6 | $3.00 | $15.00 | 200K tokens | Default (Pro/Team Standard) |
| Sonnet 4.5 | $3.00 | $15.00 | 200K tokens | Legacy |
| Opus 4.7 | $5.00 | $25.00 | 200K tokens | Released April 2026; default for Max/Team Premium |
| Opus 4.7 (1M context) | $5.00 | $25.00 | 1M tokens | GA for Max/Team/Enterprise; API requires tier 4 |
| Opus 4.6 (standard) | $5.00 | $25.00 | 200K tokens | Previous generation |
| Opus 4.6 (1M context) | $5.00 | $25.00 | 1M tokens | Previous generation |
| Opus 4.6 (fast mode) | $30.00 | $150.00 | 200K tokens | 2.5× faster, 6× price |
| Haiku 4.5 | $0.80 | $4.00 | 200K tokens | Budget option |
Opus 4.7 tokenizer: A new tokenizer means the same input can map to roughly 1.0–1.35× more tokens depending on content type. At higher effort levels, Opus 4.7 also produces more output tokens (more reasoning). Measure your real traffic when migrating from Opus 4.6; use the effort parameter to control spend.
Reality check: A typical 1-hour session costs $0.10 - $0.50 depending on usage patterns.
Model retirement (April 2026): claude-3-haiku-20240307 (Claude 3 Haiku) was retired on April 20, 2026. If your CLAUDE.md, agent definitions, or scripts still hardcode this model ID, migrate to claude-haiku-4-5-20251001 (Haiku 4.5) immediately. Source: platform.claude.com/docs/model-deprecations
The 1M context window (GA for Max/Team/Enterprise plans; API tier 4 still required for direct API use) is a significant capability jump — but community feedback consistently frames it as a niche premium tool, not a default.
Retrieval accuracy at scale (MRCR v2 8-needle 1M variant)
The benchmark is the “8-needle 1M variant” — finding 8 specific facts in a 1M-token document. Opus 4.6 drops from 93% to 76% when scaling from 256K to 1M; Sonnet 4.5 collapses to 18.5%. Community validation: a developer loaded ~733K tokens (4 Harry Potter books) and Opus 4.6 retrieved 49/50 documented spells in a single prompt (HN, Feb 2026). Sonnet 4.6 MRCR not yet published, but community reports suggest it “struggles with following specific instructions and retrieving precise information” at full 1M context.
Cost per session (approximate)
Above 200K input tokens on direct API, all tokens in the request are charged at premium rates — not just the excess. Note: on Max/Team/Enterprise Claude Code plans, Opus 4.6 1M is the default at standard rates (no premium) as of v2.1.75 (March 2026).
| Session type | ~Tokens in | ~Tokens out | Sonnet 4.6 | Opus 4.6 |
| --- | --- | --- | --- | --- |
| Bug fix / PR review (≤200K) | 50K | 5K | ~$0.23 | ~$0.38 |
| Module refactoring (≤200K) | 150K | 20K | ~$0.75 | ~$1.25 |
| Full service analysis (>200K, 1M context) | 500K | 50K | ~$4.13 | ~$6.88 |
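The table rows follow directly from the listed per-MTok prices. A small sketch of that arithmetic, including the whole-request premium rule for requests above 200K input tokens:

```python
# Reproducing the session-cost table from the listed API prices ($/MTok),
# including the long-context rule: above 200K input tokens, every token in the
# request bills at the doubled premium rate, not just the excess.
PRICES = {
    "sonnet-4.6": {"in": 3.00, "out": 15.00, "in_prem": 6.00, "out_prem": 22.50},
    "opus-4.6":   {"in": 5.00, "out": 25.00, "in_prem": 10.00, "out_prem": 37.50},
}

def session_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    p = PRICES[model]
    if tokens_in > 200_000:  # premium applies to the whole request
        return (tokens_in * p["in_prem"] + tokens_out * p["out_prem"]) / 1e6
    return (tokens_in * p["in"] + tokens_out * p["out"]) / 1e6

print(round(session_cost("sonnet-4.6", 50_000, 5_000), 2))   # bug-fix row
print(round(session_cost("opus-4.6", 500_000, 50_000), 2))   # full-analysis row
```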
For comparison: Gemini 1.5 Pro offers a 2M context window at $3.50/$10.50/MTok — significantly cheaper for pure long-context RAG. Community advice: use Gemini for large-document RAG, Claude for reasoning quality and agentic workflows.
When to use which
| Scenario | Recommendation |
| --- | --- |
| Bug fix, PR review, daily coding | Sonnet 4.6 @ 200K — fast and cheap |
| Full-repo audit, entire codebase load | Opus 4.7 @ 1M — worth the cost for precision |
| Cross-module refactoring | Sonnet 4.6 @ 1M — but weigh cost vs. chunking + RAG |
| Architecture analysis, Agent Teams | Opus 4.7 @ 1M — strongest retrieval at scale |
| Large-document RAG (PDFs, legal, books) | Consider Gemini 1.5 Pro — cheaper at this scale |
Key facts

- Opus 4.7 max output: 128K tokens (same as Opus 4.6); Sonnet 4.6 max output: 64K tokens
- 1M context ≈ 30,000 lines of code / 750,000 words
- 1M context is GA for Max/Team/Enterprise Claude Code plans (v2.1.75, March 2026) — API direct use still requires tier 4 or custom rate limits
- API direct use above 200K input tokens: Sonnet 4.6 doubles to $6/$22.50/MTok; Opus 4.6 doubles to $10/$37.50/MTok (standard rate applies for Claude Code Max/Team/Enterprise plans)
- If input stays ≤200K, standard pricing applies even with the beta flag enabled
- Practical workaround: check context at ~70% and open a new session rather than hitting compaction (HN pattern)
- Community consensus: 200K + RAG is the default; 1M Opus is reserved for cases where loading everything at once is genuinely necessary
Claude Code manages prompt caching without any configuration on your part. Understanding the mechanics helps you make decisions that keep cache hit rates high and costs low.
Cache prefix hierarchy
Every API call Claude Code makes structures content in this fixed order: tools → system → messages. Cache matching always starts from the beginning of this prefix. A stable tool list + stable CLAUDE.md + growing conversation history means the first two layers are almost always cache hits, while only new message turns require fresh computation.
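The three-layer prefix can be sketched as a raw Messages API payload with explicit ephemeral cache breakpoints. Claude Code constructs this for you; the tool definition, model ID, and prompt text below are illustrative placeholders, shown only to make the layering concrete.

```python
# tools -> system -> messages, in the shape the Messages API accepts.
# Cache matching starts at "tools" and stops at the first changed block,
# so the stable early layers stay cached while new turns are recomputed.
payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "tools": [  # layer 1: stable tool definitions
        {
            "name": "read_file",
            "description": "Read a file from the workspace",
            "input_schema": {"type": "object", "properties": {}},
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "system": [  # layer 2: stable system prompt (CLAUDE.md etc.)
        {
            "type": "text",
            "text": "<CLAUDE.md contents>",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [  # layer 3: the growing conversation
        {"role": "user", "content": "Fix the failing test"},
    ],
}
```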
The 20-block lookback — the long-session trap
Cache matching uses a bounded lookback of approximately 20 blocks. In a long session with many tool calls and exchanges, blocks from early in the conversation fall outside this window and become cache misses. Practical consequence: very long sessions gradually lose cache efficiency at the message layer. The fix is /compact — it compresses the conversation history into a single summary block, resetting the lookback window and restoring high hit rates.
Minimum token thresholds by model
A block must meet a minimum size to be eligible for caching. Blocks smaller than the threshold are never cached, regardless of how stable they are:
| Model family | Minimum tokens |
| --- | --- |
| Claude Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5 | 4,096 |
| Claude Sonnet 4.6 | 2,048 |
| Claude Sonnet 4.5, Sonnet 4, Sonnet 3.7, Opus 4.1, Opus 4 | 1,024 |
| Claude Haiku 3.5, Haiku 3 | 2,048 |
Short CLAUDE.md files (under ~1,000 tokens) may not be cached at all on Sonnet models. If cost optimization matters, make sure your system prompt crosses the threshold for your target model.
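A quick sanity check against the thresholds above can be done with a rough character-per-token heuristic. The ~4 characters/token ratio is an assumption, not a measured value; measure real token counts before relying on it.

```python
# Rough check that a CLAUDE.md clears a model's caching floor.
# Assumption: ~4 characters per token (a common heuristic for English/code).
THRESHOLDS = {  # minimum cacheable block size, from the table above
    "opus-4.7": 4096,
    "sonnet-4.6": 2048,
    "sonnet-4.5": 1024,
    "haiku-4.5": 4096,
}

def likely_cacheable(text: str, model: str) -> bool:
    estimated_tokens = len(text) / 4
    return estimated_tokens >= THRESHOLDS[model]

claude_md = "# Conventions\n" * 800   # ~11K chars, ~2,800 estimated tokens
# clears Sonnet 4.6's 2,048-token floor, but not Opus 4.7's 4,096
```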
Tool result size and cache economics
Tool results land in the message history and stay there for the rest of the session. Every subsequent API call re-reads that history — at cache read price (0.1x), but still proportional to size. A git status output of 500 tokens costs 500 × 0.1x to read on every turn that follows. The same output at 50 tokens (filtered by a tool like RTK) costs 50 × 0.1x — 90% less, compounding across every turn in the session. Compact tool outputs are not just faster to process; they make the entire cached prefix cheaper to maintain.
The same logic applies to cache writes: a smaller history prefix means cheaper initial writes (1.25x × fewer tokens).
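The compounding described above is easy to quantify. A minimal sketch, using Sonnet input pricing as the example rate:

```python
# Cumulative cost of carrying one tool result in cached history: it is re-read
# (at the 0.1x cache-read multiplier) on every later turn. Prices in $/MTok.
def carry_cost(result_tokens: int, later_turns: int,
               price_per_mtok: float = 3.00, read_multiplier: float = 0.1) -> float:
    per_turn = result_tokens * price_per_mtok * read_multiplier / 1e6
    return per_turn * later_turns

verbose = carry_cost(500, 40)   # raw `git status`, carried over 40 turns
compact = carry_cost(50, 40)    # filtered to 50 tokens: 90% cheaper
```

The absolute numbers are small per result, but every tool output in a long session pays this tax on every turn, which is why compact outputs compound.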
Monitoring cache performance in your own pipelines
When building agents or pipelines on top of the Anthropic API, the response usage object exposes cache metrics directly:
response = client.messages.create(...)
print(response.usage.cache_creation_input_tokens) # Tokens written to cache this request
print(response.usage.cache_read_input_tokens) # Tokens read from cache (hits)
Calculate your hit rate as cache_read / (cache_read + cache_creation) across requests. A ratio above 0.8 means your prompt structure is working well. Low ratios usually mean content in the stable prefix is changing between requests — check for timestamps, random IDs, or dynamic content embedded in your system prompt.
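That hit-rate calculation can be aggregated across a batch of requests directly from the usage fields. The numbers below are illustrative only:

```python
# Aggregating cache usage fields into a hit rate across several requests.
def cache_hit_rate(usages: list[dict]) -> float:
    read = sum(u["cache_read_input_tokens"] for u in usages)
    created = sum(u["cache_creation_input_tokens"] for u in usages)
    total = read + created
    return read / total if total else 0.0

# Illustrative numbers: the first request writes the prefix, later ones read it.
usages = [
    {"cache_creation_input_tokens": 12_000, "cache_read_input_tokens": 0},
    {"cache_creation_input_tokens": 800,    "cache_read_input_tokens": 12_000},
    {"cache_creation_input_tokens": 900,    "cache_read_input_tokens": 12_800},
]
rate = cache_hit_rate(usages)   # ~0.64 here; above 0.8 means a stable prefix
```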
No dedicated monitoring tool exists specifically for Claude Code session cache metrics. Cost tracking via ccusage covers overall spend but does not break out cache hit rates. For cache-specific visibility in custom pipelines, parse the response fields above.
Practical rules

- Keep CLAUDE.md stable between sessions — edits invalidate the system cache one-shot, then it re-warms on the next request
- Run /compact before the conversation gets very long, not after performance degrades
- Avoid dynamic content in stable sections (dates, random values, per-request context)
- Larger CLAUDE.md = more expensive cache write, but also more tokens saved per read — profitable after ~2 hits
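The “profitable after ~2 hits” break-even follows from the multipliers: a cache write costs 1.25× the base input price once, each later hit reads at 0.1×, versus 1.0× per request uncached. A minimal sketch:

```python
# Relative cost of serving the same prefix N times, in units of the base
# input price for that prefix.
def cached_relative_cost(requests: int) -> float:
    return 1.25 + 0.1 * (requests - 1)   # one write, then cache reads

def uncached_relative_cost(requests: int) -> float:
    return 1.0 * requests                # full price every request

# 1 request:  1.25 vs 1.00 -> caching loses
# 2 requests: 1.35 vs 2.00 -> caching already wins
```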
Known cache bugs (v2.1.69+)
Two active bugs silently break caching on v2.1.69+. Apply these workarounds immediately:
- --resume / --continue causes a full cache rebuild (0% hit ratio) on every resume because the session JSONL strips deferred tool records before write. Workaround: avoid --resume until fixed.
- A per-session billing header injects a unique hash as the first system prompt block, causing a cold miss on every session start and subagent call. Workaround: set "CLAUDE_CODE_ATTRIBUTION_HEADER": "false" in ~/.claude/settings.json.
Perspective on costs: If Claude Code saves you meaningful time on a task, the API cost is usually negligible compared to your hourly rate. Don’t over-optimize for token costs at the expense of productivity.
When to optimize:
✅ You’re on a tight budget (student, hobbyist)
✅ High-volume usage (>4 hours/day)
✅ Team usage (5+ developers)
When NOT to optimize:
❌ Your time is more expensive than API costs
❌ You’re spending more time optimizing than the savings
❌ Optimization hurts productivity (being too restrictive)
Note: Anthropic’s plans evolve frequently. Always verify current pricing and limits at claude.com/pricing.
How Subscription Limits Work
Unlike API usage (pay-per-token), subscriptions use a hybrid model that’s deliberately opaque:
| Concept | Description |
| --- | --- |
| 5-hour rolling window | Primary limit; resets when you send your next message after the 5-hour window lapses |
| Weekly aggregate cap | Secondary limit; resets every 7 days. Both limits apply simultaneously |
| Hybrid counting | Advertised as “messages,” but actual capacity is token-based, varying with code complexity, file size, and context |
| Model weighting | Opus consumes 8-10× more quota than Sonnet for equivalent work |
Approximate Token Budgets by Plan (Jan 2026, community-verified)
| Plan | 5-Hour Token Budget | Claude Code prompts/5h | Weekly Sonnet Hours | Weekly Opus Hours | Claude Code Access |
| --- | --- | --- | --- | --- | --- |
| Free | 0 | 0 | 0 | 0 | ❌ None |
| Pro ($20/mo) | ~44,000 tokens | ~10-40 prompts | 40-80 hours | N/A (Sonnet only) | ✅ Limited |
| Max 5x ($100/mo) | ~88,000-220,000 tokens | ~50-200 prompts | 140-280 hours | 15-35 hours | ✅ Full |
| Max 20x ($200/mo) | ~220,000+ tokens | ~200-800 prompts | 240-480 hours | 24-40 hours | ✅ Full |
Warning: These are community-measured estimates. Anthropic does not publish exact token limits, and limits have been reduced without announcement (notably Oct 2025). The 8-10× Opus/Sonnet ratio means Max 20x users get only ~24-40 Opus hours weekly despite paying $200/month. “Prompts/5h” is a rough practical translation of the token budget — actual capacity varies significantly with task complexity, context size, and sub-agent usage. Monthly cap: ~50 active 5-hour windows across all plans.
Why “Hours” Are Misleading
The term “hours of Sonnet 4” refers to elapsed wall-clock time during active processing, not calendar hours. This is not directly convertible to tokens without knowing:
- Tool usage (Bash execution adds ~245 input tokens per call; text editor adds ~700)
- Context re-reads and caching misses
Tier-Specific Strategies
| If you have… | Recommended approach |
| --- | --- |
| Pro plan | Sonnet only; batch sessions, avoid context bloat |
| Limited Opus quota | OpusPlan essential: Opus for planning, Sonnet for execution |
| Max 5x | Sonnet default, Opus only for architecture/complex debugging |
| Max 20x | More Opus freedom, but still monitor weekly usage (24-40h goes fast) |
The Pro User Pattern (validated by community):
1. Opus → Create detailed plan (high-quality thinking)
2. Sonnet/Haiku → Execute the plan (cost-effective implementation)
3. Result: Best reasoning where it matters, lower cost overall
This is exactly what OpusPlan mode does automatically (see Section 2.3).
Monitoring Your Usage
/status   # Shows current session: cost, context %, model
Anthropic provides no in-app real-time usage metrics. Community tools like ccusage help track token consumption across sessions.
For subscription usage history: Check your Anthropic Console or Claude.ai settings.
Historical Note: In October 2025, users reported significant undocumented limit reductions coinciding with Sonnet 4.5’s release. Pro users who previously sustained 40-80 Sonnet hours weekly reported hitting limits after only 6-8 hours. Anthropic acknowledged the limits but did not explain the discrepancy.
You: "Create a session handoff document for what we accomplished today"
Claude will analyze git status, conversation history, and generate a structured handoff.
Handoff Triad Pattern: For teams or multi-session workflows, a three-command protocol adds explicit merge semantics on top of the basic handoff:
| Command | Job |
| --- | --- |
| /handoff:create | Generates the structured document from current session context |
| /handoff:resume | Loads a handoff document, confirms understanding, and waits for approval before starting |
| /handoff:update | Updates an existing handoff with section-specific merge rules (see below) |
The critical addition is per-section merge rules in update:
| Section | Merge Rule |
| --- | --- |
| Task, Scope | Keep or refine |
| Files | Merge — combine original with new files touched |
| Discoveries | Append — add new findings, never remove prior ones |
| Work Done | Append only — add new entries, never delete history, include commit hashes |
| Status | Replace — write current state |
| Next Steps | Replace — write updated checklist |
The append-only Work Done section creates an audit trail across sessions. Even if earlier work was revised, the revision appears as a new entry rather than an overwrite.
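The merge rules amount to a small, mechanical function. A sketch over handoff documents represented as dicts; the keys are illustrative, not the template’s exact schema:

```python
# Per-section merge semantics: keep/refine, merge, append-only, replace.
def merge_handoff(original: dict, update: dict) -> dict:
    merged = dict(original)
    for key in ("task", "scope"):                       # keep or refine
        if key in update:
            merged[key] = update[key]
    merged["files"] = sorted(                           # merge file lists
        set(original.get("files", [])) | set(update.get("files", []))
    )
    for key in ("discoveries", "work_done"):            # append, never remove
        merged[key] = original.get(key, []) + update.get(key, [])
    for key in ("status", "next_steps"):                # replace outright
        if key in update:
            merged[key] = update[key]
    return merged

merged = merge_handoff(
    {"task": "Auth refactor", "files": ["auth.ts"],
     "work_done": ["extracted JWT helper"], "status": "in progress"},
    {"files": ["auth.test.ts"], "work_done": ["added tests (abc1234)"],
     "status": "tests passing"},
)
```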
Fork-ready templates at examples/commands/handoff/ in this repo.
Recommended frequency: Boris Cherny (Head of Claude Code at Anthropic) starts approximately 80% of tasks in Plan Mode — letting Claude plan before writing a single line of code. Once the plan is approved, execution is almost always correct on the first try.
— Lenny’s Newsletter, February 19, 2026
Press Shift+Tab to toggle back to Normal Mode (Act Mode). You can also type a message and Claude will ask: “Ready to implement this plan?”
Note: Shift+Tab toggles between Plan Mode and Normal Mode during a session. Use Shift+Tab twice from Normal Mode to enter Plan Mode, once from Plan Mode to return.
Claude Code supports six model aliases via /model (each always resolves to the latest version):
| Alias | Resolves To | Use Case |
| --- | --- | --- |
| default | Latest model for your plan tier | Standard usage |
| sonnet | Claude Sonnet 4.6 | Fast, cost-efficient |
| opus | Claude Opus 4.7 | Deep reasoning |
| haiku | Claude Haiku 4.5 | Budget, high-volume |
| sonnet[1m] | Sonnet with 1M context | Large codebases |
| opusplan | Opus (plan) + Sonnet (act) | Hybrid intelligence |
Model can also be set via claude --model <alias>, ANTHROPIC_MODEL env var, or "model" in settings.json. Priority: /model > --model flag > ANTHROPIC_MODEL > settings.json.
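That precedence order can be sketched as a tiny resolver, first non-empty source wins (the function and argument names are illustrative, not part of Claude Code):

```python
# Documented precedence: /model > --model flag > ANTHROPIC_MODEL > settings.json.
def resolve_model(slash_command=None, cli_flag=None, env_var=None,
                  settings=None, fallback="default"):
    for source in (slash_command, cli_flag, env_var, settings):
        if source:
            return source
    return fallback

assert resolve_model(cli_flag="opus", settings="sonnet") == "opus"
assert resolve_model(settings="haiku") == "haiku"
```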
Knowledge cutoffs (what each model knows about):
| Model | Knowledge Cutoff |
| --- | --- |
| Claude Opus 4.7 | Not yet published |
| Claude Sonnet 4.6 | August 2025 |
| Claude Opus 4.6 | May 2025 |
| Claude Haiku 4.5 | February 2025 |
Claude Code injects the cutoff date for the active model into the system prompt at the start of each session. You can ask Claude directly — “what’s your knowledge cutoff?” — to confirm which date applies to your current session.
Concept: Use Opus for planning (superior reasoning) and Sonnet for implementation (cost-efficient).
Why OpusPlan?

- Cost optimization: Opus tokens cost more than Sonnet
- Best of both worlds: Opus-quality planning + Sonnet-speed execution
- Token savings: Planning is typically shorter than implementation
Activation:
/model opusplan
Or in ~/.claude/settings.json:
{
"model": "opusplan"
}
How It Works:

- In Plan Mode (/plan or Shift+Tab twice) → Uses Opus
- In Act Mode (normal execution) → Uses Sonnet
- Automatic switching based on mode
Recommended Workflow:
1. /model opusplan → Enable OpusPlan
2. Shift+Tab × 2 → Enter Plan Mode (Opus)
3. Describe your task → Get Opus-quality planning
4. Shift+Tab → Exit to Act Mode (Sonnet)
5. Execute the plan → Sonnet implements efficiently
Alternative Approach with Subagents:
You can also control model usage per agent:
.claude/agents/planner.md
---
name: planner
model: opus
tools: Read, Grep, Glob
---
# Strategic Planning Agent
.claude/agents/implementer.md
---
name: implementer
model: haiku
tools: Write, Edit, Bash
---
# Fast Implementation Agent
Pro Users Note: OpusPlan is particularly valuable for Pro subscribers with limited Opus tokens. It lets you leverage Opus reasoning for critical planning while preserving tokens for more sessions.
Budget Variant: SonnetPlan (Community Hack)
opusplan is hardcoded to Opus+Sonnet — there’s no native sonnetplan alias. But you can remap what the opus and sonnet aliases resolve to via environment variables, effectively creating a Sonnet→Haiku hybrid:
Caveat: The model’s self-report (what model are you?) is unreliable — models don’t always know their own identity. Trust the status bar (Model: Sonnet 4.6 in plan mode) or verify via billing dashboard. GitHub issue #9749 tracks native support.
Unfamiliar domain where first instincts are often wrong
Pattern:
## Round 1: Initial analysis
User: /plan
User: Analyze the current auth system. What are the key components,
dependencies, and potential risks of migrating to OAuth2?
Claude: [Initial analysis]
## Round 2: Deep challenge
User: Now use extended thinking. Challenge your own analysis:
- What assumptions did you make?
- What failure modes did you miss?
- What would a senior security engineer flag?
Claude: [Deeper analysis with self-correction]
## Round 3: Final plan
User: Based on both rounds, write the definitive migration plan.
Include rollback strategy and risk mitigation for each step.
Claude: [Refined plan incorporating both rounds]
## Execute
User: /execute
User: Implement the plan from round 3.
Why it works: Each round forces Claude to reconsider assumptions. Round 2 typically catches 30-40% of issues that round 1 missed. Round 3 synthesizes into a more robust plan.
📊 Empirical backing — Anthropic AI Fluency Index (Feb 2026)
An Anthropic study analyzing 9,830 Claude conversations quantifies exactly why plan review works: users who iterate and question the AI’s reasoning are 5.6× more likely to catch missing context and errors compared to users who accept the first output. A second round of review makes you 4× more likely to identify what was left out.
The Rev the Engine pattern operationalizes this finding: each round of deep challenge triggers the questioning behavior that produces measurably better plans.
Status: Research preview — requires Claude Code v2.1.91+ and a Claude Code on the web account.
Concept: Offload planning to Anthropic’s cloud while your terminal stays free. Claude drafts the plan remotely using multiple Opus 4.6 agents in parallel; you review it in your browser with inline comments, then choose whether to execute in the cloud or teleport the plan back to your terminal.
This solves the core friction of local Plan Mode: on complex tasks, the terminal blocks for minutes while planning runs. Ultraplan runs asynchronously — you keep working, check back when ready.
How It Works
1. CLI launches a cloud session → terminal shows a live status indicator
2. Multiple Opus 4.6 agents explore the codebase in parallel (planning windows up to 30 minutes)
3. Browser opens the plan with outline sidebar, inline commenting, and emoji reactions
4. You iterate on the plan — comment on specific sections, request revisions
5. Choose where to execute: cloud (opens a PR) or terminal (teleports the plan back)
Activation (3 methods)
# 1. Dedicated command
/ultraplan migrate the auth service from sessions to JWTs
# 2. Keyword anywhere in a prompt
Plan with ultraplan a full refactor of the payments module
# 3. From a local plan approval dialog
# → choose "No, refine with Ultraplan on Claude Code on the web"
The command and keyword paths show a confirmation dialog first. The local plan path skips it.
Terminal Status Indicators
| Status | Meaning |
| --- | --- |
| ◇ ultraplan | Claude is researching and drafting |
| ◇ ultraplan needs your input | Clarification needed — open the browser link |
| ◆ ultraplan ready | Plan is ready to review |
Run /tasks to see the session link, agent activity, and a Stop ultraplan action.
Browser Review Interface
- Outline sidebar: navigate between sections without scrolling
- Inline comments: highlight any passage, leave targeted feedback
- Emoji reactions: signal approval or concern on a section without writing a comment
- Revision cycles: ask Claude to address your comments; it presents an updated draft — iterate as many times as needed
Execution: Two Choices
Once the plan looks right, choose in the browser:
| Option | What happens |
| --- | --- |
| Approve and start coding | Cloud session implements the plan, creates a PR; terminal clears |
| Approve and teleport back | Plan sent to your terminal with 3 sub-options |
Teleport sub-options:
1. Implement here — inject plan into current conversation, proceed immediately
2. Start new session — fresh session with plan as context (prints claude --resume to return to current session)
3. Cancel — saves plan to a file, prints the path
Requirements and Constraints
| Requirement | Detail |
| --- | --- |
| Claude Code version | v2.1.91+ |
| Account | Pro, Max, Team, or Enterprise (not free tier) |
| Repository | GitHub only (no GitLab, Bitbucket) |
| Providers | Anthropic API only — not available on Bedrock, Vertex, Foundry |
| Conflict | Incompatible with Remote Control (both use claude.ai/code) |
Ultraplan vs. OpusPlan vs. Plan Mode
| Feature | Plan Mode | OpusPlan | Ultraplan |
| --- | --- | --- | --- |
| Execution | Local | Local | Cloud |
| Terminal blocked? | Yes | Yes | No |
| Models | Active model | Opus (plan) + Sonnet (act) | Opus 4.7 (multi-agent) |
| Review surface | Terminal scrollback | Terminal scrollback | Browser with inline comments |
| Requires GitHub | No | No | Yes |
| Token accounting | Counts locally | Counts locally | Cloud planning free from local quota |
When to Use Ultraplan
Best fit:

- Complex architectural changes touching many files (service migrations, large refactors)
- Tasks where you want to keep working while planning runs
- Situations where stakeholders need to review the plan before implementation

Skip it for:

- Simple, focused changes where local Plan Mode takes under a minute
- Environments without internet or not on GitHub
- Sessions using Remote Control
Token Note: Early tests show cloud planning consuming ~37% fewer tokens than equivalent local plans (82K vs 131K for a ~55 min migration task). Cloud planning tokens don’t count against your local quota; only implementation tokens do.
See also: §9.16 Session Teleportation for the broader web ↔ terminal workflow. Ultraplan uses the same cloud infrastructure with planning-specific review capabilities.
Cloud-based parallel multi-agent code review. Where Ultraplan handles planning, Ultrareview handles review: multiple Opus 4.7 agents read through your changes simultaneously and surface bugs and design issues that careful reviewers would catch.
Activation:
/ultrareview        # Review current branch (diff from base)
/ultrareview <PR#>  # Review a specific GitHub PR
Ultrareview operates on diffs, not the full codebase — it reviews what changed on the current branch, or the changes in a given PR. The cloud session dispatches parallel agents to analyse the diff; results arrive in the browser and can optionally be teleported back to the terminal.
Launch offer: Pro and Max subscribers receive three free ultrareviews to try the feature.
Note on costs: Estimates based on API pricing (Haiku $0.80/$4.00 per MTok, Sonnet $3/$15, Opus $5/$25). Pro/Max subscribers pay a flat rate, so prioritize quality over cost. See Section 2.2 for full pricing breakdown.
Budget modifier (Teams Standard/Pro): downgrade one tier per phase — use Sonnet where the table says Opus, Haiku where it says Sonnet for mechanical implementation tasks. Community pattern: Sonnet for Plan → Haiku for Implementation on a $25/mo Teams Standard plan.
The effort parameter (Opus 4.6 API) controls the model’s overall computational budget — not just thinking tokens, but tool calls, verbosity, and analysis depth. Low effort = fewer tool calls, no preamble. High effort = more explanations, detailed analysis.
Calibrated gradient — one real prompt per level:
low — Mechanical, no design decisions needed
"Rename getUserById to findUserById across src/" — Find-replace scope, zero reasoning required.
medium — Clear pattern, defined scope, one concern
"Convert fetchUser() in api/users.ts from callbacks to async/await" — Pattern is known, scope bounded.
high — Design decisions, edge cases, multiple concerns
"Redesign error handling in the payment module: add retry logic, partial failure recovery, and idempotency guarantees" — Architectural choices, not just pattern application.
xhigh (Opus 4.7+, v2.1.114+) — extra-high effort between high and max; the default for Claude Code (all plans) with Opus 4.7
"Debug this race condition in the distributed job queue with concurrent writes and partial reads" — More reasoning depth than high, faster than max.
max (Opus 4.7+ only; returns an error on other models) — cross-system reasoning, irreversible decisions
"Analyze the microservices event pipeline for race conditions across order-service, inventory-service, and notification-service" — Multi-service hypothesis testing, adversarial thinking.
Skills can declare their own effort level in frontmatter. The skill’s value overrides the session setting for the duration of that skill’s execution, then reverts. This eliminates the need to manually toggle effort between mechanical and analytical tasks.
```yaml
# Mechanical skill — always fast, never wastes reasoning budget
---
name: release-notes        # illustrative name
description: Read git history and format release notes
effort: low
---

# Analytical skill — always deep, regardless of session setting
---
name: architecture-review
description: Full architectural analysis with trade-off evaluation
effort: high
---
```
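A sketch of the override-then-revert behavior; the class and method names here are illustrative, not Claude Code internals:

```python
class EffortContext:
    """Session effort setting that a skill's frontmatter can temporarily override."""
    def __init__(self, session_effort: str = "high"):
        self.effort = session_effort
        self._session_effort = session_effort

    def run_skill(self, frontmatter: dict, body):
        override = frontmatter.get("effort")
        if override:
            self.effort = override            # skill's value wins for the duration
        try:
            return body(self.effort)
        finally:
            self.effort = self._session_effort  # revert to the session setting

ctx = EffortContext(session_effort="high")
result = ctx.run_skill({"name": "release-notes", "effort": "low"},
                       lambda e: f"ran at effort={e}")
# result == "ran at effort=low", and ctx.effort is back to "high" afterwards
```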
Decision table for common skill types:

| Skill type | Recommended effort | Reasoning |
| --- | --- | --- |
| Commit, push, sync | low | Sequential steps, no design decisions |
| Changelog, release notes | low | Reads git + formats, mechanical |
| Scaffolding, boilerplate | low | Template instantiation |
| Code review (single PR) | medium | Pattern recognition, bounded scope |
| Issue triage, backlog | medium | Categorization + some analysis |
| Security audit | high | Threat modeling, adversarial thinking |
| Architecture review | high | Design decisions, cross-component reasoning |
| Multi-agent orchestration | high | Coordination + planning |
Cost model: low effort means fewer tool calls, no preamble, direct output. high effort means more tool calls with explanations, detailed summaries, deeper exploration. Match effort to where analysis adds value — not to “effort = quality” uniformly.
```yaml
---
description: Mechanical execution agent. Scope must be defined explicitly in the task.
model: haiku
tools: Write, Edit, Bash, Read, Grep, Glob
---
```
Note: Haiku is for mechanical tasks only. If the implementation requires design decisions or complex business logic, use Sonnet — state this in the task prompt.
External Services: Claude can’t access your databases directly
Your Intent: Claude needs clear instructions
Hidden Files: Claude respects .gitignore by default
⚠️ Pattern Amplification: Claude mirrors the patterns it finds. In well-structured codebases, it produces consistent, idiomatic code. In messy codebases without clear abstractions, it perpetuates the mess. If your code lacks good patterns, provide them explicitly in CLAUDE.md or use semantic anchors (Section 2.9).
The most common mistake is treating Claude Code like a chatbot — typing ad-hoc requests and hoping for good output. What separates casual usage from production workflows is a shift in thinking:
Chatbot mode: You write good prompts. Context system: You build structured context that makes every prompt better.
“Stop treating it like a chatbot. Give it structured context. CLAUDE.md, hooks, skills, project memory. Changes everything.”
— Robin Lorenz, AI Engineer (comment)
Claude Code has four layers of persistent context that compound over time:
XML-structured prompts provide semantic organization for complex requests, helping Claude distinguish between different aspects of your task for clearer understanding and better results.
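A minimal sketch of what this can look like (the tag names are illustrative; Claude does not require a fixed schema):

```xml
<task>Migrate auth/session.ts from session cookies to JWT</task>
<constraints>
  Keep the public API of AuthService unchanged.
  Do not touch the password-reset flow.
</constraints>
<verify>npm test -- auth</verify>
```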
The Claude Code team internally treats prompts as challenges to a peer, not instructions to an assistant. This subtle shift produces higher-quality outputs because it forces Claude to prove its reasoning rather than simply comply.
Three challenge patterns from the team:
1. The Gatekeeper — Force Claude to defend its work before shipping:
"Grill me on these changes and don't make a PR until I pass your test"
Claude reviews your diff, asks pointed questions about edge cases, and only proceeds when satisfied. This catches issues that passive review misses.
2. The Proof Demand — Require evidence, not assertions:
"Prove to me this works — show me the diff in behavior between main and this branch"
Claude runs both branches, compares outputs, and presents concrete evidence. Eliminates the “trust me, it works” failure mode.
3. The Reset — After a mediocre first attempt, invoke full-context rewrite:
"Knowing everything you know now, scrap this and implement the elegant solution"
This forces a substantive second attempt with accumulated context rather than incremental patches on a weak foundation. The key insight: Claude’s second attempt with full context consistently outperforms iterative fixes.
Why this works: Provocation triggers deeper reasoning paths than polite requests. When Claude must convince rather than comply, it activates more thorough analysis and catches its own shortcuts.
LLMs are statistical pattern matchers trained on massive text corpora. Using precise technical vocabulary helps Claude activate the right patterns in its training data, leading to higher-quality outputs.
When you say “clean code”, Claude might generate any of dozens of interpretations. But when you say “SOLID principles with dependency injection following Clean Architecture layers”, you anchor Claude to a specific, well-documented pattern from its training.
Key insight: Technical terms act as GPS coordinates into Claude’s knowledge. The more precise, the better the navigation.
💡 Pro tip: When Claude produces generic code, try adding more specific anchors. “Use clean code” → “Apply Martin Fowler’s Refactoring catalog, specifically Extract Method and Replace Conditional with Polymorphism.”
Important: Everything you share with Claude Code is sent to Anthropic servers. Understanding this data flow is critical for protecting sensitive information.
2. Never connect production databases to MCP servers. Use dev/staging with anonymized data.
3. Use security hooks to block reading of sensitive files (see Section 7.4).
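As a sketch, one shape such a hook can take in `.claude/settings.json`, assuming a PreToolUse hook that routes Read calls through a checker script (the script path is hypothetical; Section 7.4 has the canonical setup):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Read",
        "hooks": [
          { "type": "command", "command": "python .claude/hooks/block_sensitive_reads.py" }
        ]
      }
    ]
  }
}
```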
Full guide: For complete privacy documentation including known risks, community incidents, and enterprise considerations, see Data Privacy & Retention Guide.
Reading time: 5 minutes
Goal: Understand the core architecture that powers Claude Code
This section provides a summary of Claude Code’s internal mechanisms. For the complete technical deep-dive with diagrams and source citations, see the Architecture & Internals Guide.
How tool execution works: Claude Code can start executing tools marked as concurrency-safe (read-only operations like Read, Grep, Glob) while the model is still generating its response, reducing total turn time. Non-concurrent tools (writes, bash commands) wait for the response to complete and run serially. When multiple read-only tools appear in a single response, they run in parallel — up to 10 concurrent by default.
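The parallel/serial split can be sketched as follows; this is an illustrative model, not Claude Code's actual scheduler, with the tool names and the concurrency cap taken from the paragraph above:

```python
import asyncio

CONCURRENCY_SAFE = {"Read", "Grep", "Glob"}  # read-only tools per the text
MAX_PARALLEL = 10                            # default concurrency cap per the text

async def run_tools(calls):
    """Run concurrency-safe tools in parallel (capped); run the rest serially."""
    sem = asyncio.Semaphore(MAX_PARALLEL)

    async def run_safe(name, arg):
        async with sem:
            await asyncio.sleep(0)  # stand-in for real file I/O
            return f"{name}({arg})"

    safe = [c for c in calls if c[0] in CONCURRENCY_SAFE]
    unsafe = [c for c in calls if c[0] not in CONCURRENCY_SAFE]

    results = list(await asyncio.gather(*(run_safe(n, a) for n, a in safe)))
    for name, arg in unsafe:  # writes and bash commands run one at a time
        results.append(f"{name}({arg})")
    return results

out = asyncio.run(run_tools([("Read", "a.ts"), ("Grep", "foo"), ("Edit", "b.ts")]))
# Read and Grep ran in parallel; Edit ran serially afterwards
```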
⚠️ Important: Use repository-specific task list IDs to avoid cross-project contamination. Tasks with the same ID are shared across all sessions using that ID.
Task schema example:
```json
{
  "id": "task-auth-login",
  "title": "Implement login endpoint",
  "description": "POST /auth/login with JWT token generation"
}
```
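One hypothetical scheme for deriving a repository-specific task list ID (the naming convention is an assumption, not Claude Code's):

```python
import hashlib
import pathlib

def task_list_id(repo_path: str) -> str:
    """Derive a stable, repository-specific task list ID (illustrative scheme)."""
    name = pathlib.Path(repo_path).name
    digest = hashlib.sha256(repo_path.encode()).hexdigest()[:8]
    return f"{name}-{digest}"

task_list_id("/home/me/shop")  # stable for a given path, distinct across repos
```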
The Diagnostic Principle: When Claude’s task list doesn’t match your intent, the problem isn’t Claude—it’s your instructions.
Task lists act as a mirror for instruction clarity. If you ask Claude to plan a feature and the resulting tasks surprise you, that divergence is diagnostic information:
Your instruction: "Refactor the auth system"
Claude's task list:
- [ ] Read all auth-related files
- [ ] Identify code duplication
- [ ] Extract shared utilities
- [ ] Update imports
- [ ] Run tests
Your reaction: "That's not what I meant—I wanted to switch from session to JWT"
Diagnosis: Your instruction was ambiguous. "Refactor" ≠ "replace".
Divergence patterns and what they reveal:
| Divergence Type | What It Means | Fix |
| --- | --- | --- |
| Tasks too broad | Instructions lack specificity | Add WHAT, WHERE, HOW, VERIFY |
| Tasks too narrow | Instructions too detailed, missing big picture | State the goal, not just the steps |
| Wrong priorities | Context missing about what matters | Add constraints and priorities |
| Missing tasks | Implicit knowledge not shared | Make assumptions explicit in prompt |
| Extra tasks | Claude inferred requirements you didn't intend | Add explicit scope boundaries |
Using task divergence as a workflow:
```
## Step 1: Seed with loose instruction
User: "Improve the checkout flow"

## Step 2: Review Claude's task list (don't execute yet)
Claude generates: [task list]

## Step 3: Compare against your mental model
- Missing: payment retry logic? → Add to instructions

User: "Actually, here's what I need: [refined instruction with specifics]"
```
Pro tip: Run TaskList after initial planning as a sanity check before execution. If more than 30% of tasks surprise you, your prompt needs work. Iterate on the prompt, not the tasks.
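The 30% heuristic can be made concrete; this is a hand-rolled illustration of the rule above, not a built-in feature:

```python
def prompt_needs_work(tasks, surprising):
    """Per the heuristic above: if more than 30% of generated tasks
    surprise you, iterate on the prompt before executing."""
    return len(surprising) / len(tasks) > 0.30

tasks = ["read auth files", "dedupe", "extract utils", "update imports", "run tests"]
surprising = ["dedupe", "extract utils"]   # you wanted a JWT migration, not cleanup
prompt_needs_work(tasks, surprising)       # 2/5 = 40% → True: refine the prompt
```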
Claude Code operates within a 200K token context window (1M beta available via API; see the 200K vs 1M comparison):
| Component | Approximate Size |
| --- | --- |
| System prompt | 5-15K tokens |
| CLAUDE.md files | 1-10K tokens |
| Conversation history | Variable |
| Tool results | Variable |
| Reserved for response | 40-45K tokens |
When context fills up (~75% in VS Code, ~95% in CLI), older content is automatically summarized. However, research shows this degrades quality (50-70% performance drop on complex tasks). Use /compact proactively at logical breakpoints, or trigger session handoffs at 85% to preserve intent over compressed history. See Session Handoffs and Auto-Compaction Research.
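A sketch of these thresholds as a decision helper (the 75%/85% cutoffs come from the guidance above; the function itself is illustrative):

```python
def context_action(used_tokens: int, window: int = 200_000) -> str:
    """Decide when to compact or hand off, per the thresholds above."""
    usage = used_tokens / window
    if usage >= 0.85:
        return "handoff"   # preserve intent over compressed history
    if usage >= 0.75:
        return "compact"   # proactive /compact at a logical breakpoint
    return "continue"

context_action(100_000)  # "continue" — 50% used, keep working
context_action(155_000)  # "compact"  — 77.5%, compact at the next breakpoint
context_action(180_000)  # "handoff"  — 90%, start a session handoff
```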
Status: Partially feature-flagged, progressive rollout in progress.
TeammateTool enables multi-agent orchestration with persistent communication between agents. Unlike standard sub-agents that work in isolation, teammates can coordinate through structured messaging.
Core Capabilities:
| Operation | Purpose |
| --- | --- |
| spawnTeam | Create a named team of agents |
| discoverTeams | List available teams |
| requestJoin | Agent requests to join a team |
| approveJoin | Team leader approves join requests |
| Messaging | JSON-based inter-agent communication |
Execution Backends (auto-detected):
In-process: Async tasks in same Node.js process (fastest)
⚠️ Note: This is an experimental feature. Capabilities may change or be removed in future releases. Always verify current behavior with official documentation.
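A toy in-process model of the messaging pattern, assuming nothing about TeammateTool's real API beyond what the operations table above describes:

```python
import json
import queue

class Team:
    """Minimal in-process sketch of team-style JSON messaging (illustrative only)."""
    def __init__(self, name: str):
        self.name = name
        self.members = {}          # agent name -> inbox queue

    def join(self, agent: str):
        self.members[agent] = queue.Queue()

    def send(self, sender: str, recipient: str, payload: str):
        msg = json.dumps({"from": sender, "body": payload})
        self.members[recipient].put(msg)   # JSON-based inter-agent message

    def receive(self, agent: str) -> dict:
        return json.loads(self.members[agent].get_nowait())

team = Team("refactor-squad")
team.join("planner")
team.join("implementer")
team.send("planner", "implementer", "extract shared utils from auth/")
msg = team.receive("implementer")  # {"from": "planner", "body": "extract ..."}
```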
“Do more with less. Smart architecture choices, better training efficiency, and focused problem-solving can compete with raw scale.”
— Daniela Amodei, Anthropic President
Claude Code trusts the model’s reasoning instead of building complex orchestration systems. This means: