9. Advanced Patterns
📌 Section 9 TL;DR (3 minutes)
What you’ll learn: Production-grade workflows that combine multiple Claude Code features.
Pattern Categories:
🎯 The Trinity (9.1) — Ultimate workflow: Plan Mode → Extended Thinking → Sequential MCP
- When: Architecture decisions, complex refactoring, critical systems
- Why: Maximum reasoning power + safe exploration
🔄 Integration Patterns (9.2-9.4)
- Composition: Agents + Skills + Hooks working together
- CI/CD: GitHub Actions, automated reviews, quality gates
- IDE: VS Code + Claude Code = seamless flow
⚡ Productivity Patterns (9.5-9.8)
- Tight feedback loops: Test-driven with instant validation
- Todo as mirrors: Keep context aligned with reality
- Vibe coding: Skeleton → iterate → production
🎨 Quality Patterns (9.9-9.11)
- Batch operations: Process multiple files efficiently
- Continuous improvement: Refine over multiple sessions
- Common pitfalls: Learn from mistakes (Do/Don’t lists)
When to Use This Section:
- ✅ You’re productive with basics and want mastery
- ✅ You’re setting up team workflows or CI/CD
- ✅ You hit limits of simple “ask Claude” approach
- ❌ You’re still learning basics (finish Sections 1-8 first)
Reading time: 20 minutes · Skill level: Month 1+ · Goal: Master power-user techniques
🌍 Industry Context: 2026 Agentic Coding Trends
Source: Anthropic “2026 Agentic Coding Trends Report” (Feb 2026)
The patterns in this section reflect the industry evolution Anthropic documented across 5,000+ organizations.
📊 Validated Adoption Data
| Pattern | Adoption Timeline | Productivity Gain | Business Impact |
|---|---|---|---|
| Agent Teams (9.20) | 3-6 months | 50-67% | Timeline: weeks → days |
| Multi-Instance (9.17) | 1-2 months | 2x output | Cost: $500-1K/month |
| Sandbox Isolation (guide/sandbox-native.md) | Immediate | Security baseline | Compliance requirement |
🎯 Research Insights (Anthropic Internal Study)
- 60% of work uses AI (vs 0% in 2023)
- 0-20% “fully delegated” → collaboration remains central, not replacement
- 67% more PRs merged per engineer per day
- 27% new work wouldn’t be done without AI (exploratory, nice-to-have)
⚠️ Enterprise Anti-Patterns
Over-Delegation (too many agents):
- Symptom: Context-switching cost > productivity gain
- Limit: >5 simultaneous agents = coordination overhead
- Fix: Start with 1-2 agents, scale gradually
Premature Automation:
- Symptom: Automating a workflow not yet mastered manually
- Fix: Manual → Semi-auto → Full-auto (progressive)
Tool Sprawl (MCP proliferation):
- Symptom: >10 MCP servers, conflicts, maintenance burden
- Fix: Start with a core stack (Serena, Context7, Sequential), add selectively
📚 Industry Case Studies
- Fountain (workforce mgmt): 50% faster screening via hierarchical multi-agent
- Rakuten (tech): 7h autonomous vLLM implementation (12.5M lines, 99.9% accuracy)
- CRED (fintech): 2x execution speed, quality maintained (15M users)
- TELUS (telecom): 500K hours saved, 13K custom solutions
- Zapier (automation): 89% adoption, 800+ internal agents
🔗 Navigation
Each pattern below includes:
- ✅ Industry validation (adoption stats, ROI)
- ✅ Practical guide (workflows step-by-step)
- ✅ Anti-patterns (pitfalls to avoid)
Full evaluation: docs/resource-evaluations/anthropic-2026-agentic-coding-trends.md
9.1 The Trinity
The most powerful Claude Code pattern combines three techniques:
```
┌─────────────────────────────────────────────────────────┐
│                      THE TRINITY                        │
├─────────────────────────────────────────────────────────┤
│                                                         │
│  ┌─────────────┐                                        │
│  │  Plan Mode  │  Safe exploration without changes      │
│  └──────┬──────┘                                        │
│         │                                               │
│         ▼                                               │
│  ┌─────────────┐                                        │
│  │ Ext.Thinking│  Deep analysis (Opus 4.5/4.6,          │
│  └──────┬──────┘  adaptive in 4.6)                      │
│         │                                               │
│         ▼                                               │
│  ┌─────────────────────┐                                │
│  │ Sequential Thinking │  Structured multi-step reason  │
│  └─────────────────────┘                                │
│                                                         │
│  Combined: Maximum understanding before action          │
│                                                         │
└─────────────────────────────────────────────────────────┘
```
When to Use the Trinity
| Situation | Use Trinity? |
|---|---|
| Fixing a typo | ❌ Overkill |
| Adding a feature | Maybe |
| Debugging complex issue | ✅ Yes |
| Architectural decision | ✅ Yes |
| Legacy system modernization | ✅ Yes |
Extended Thinking (Opus 4.5+) & Adaptive Thinking (Opus 4.6+)
⚠️ Breaking Change (Opus 4.6, Feb 2026): Opus 4.6 replaces budget-based thinking with Adaptive Thinking, which automatically decides when to use deep reasoning based on query complexity. The `budget_tokens` parameter is deprecated on Opus 4.6.
Evolution Timeline
| Version | Thinking Approach | Control Method |
|---|---|---|
| Opus 4.5 (pre-v2.0.67) | Opt-in, keyword-triggered (~4K/10K/32K tokens) | Prompt keywords |
| Opus 4.5 (v2.0.67+) | Always-on at max budget | Alt+T toggle, /config |
| Opus 4.6 (Feb 2026) | Adaptive thinking (dynamic depth) | effort parameter (API), Alt+T (CLI) |
Adaptive Thinking (Opus 4.6)
How it works: The `effort` parameter controls the model’s overall computational budget — not just thinking tokens, but the entire response including text generation and tool calls. The model dynamically allocates this budget based on query complexity.
Key insight: effort affects everything, even when thinking is disabled. Lower effort = fewer tool calls, more concise text. Higher effort = more tool calls with explanations, detailed analysis.
Effort levels (API only, official descriptions):

- `max`: Maximum capability, no constraints. Opus 4.6 only (returns an error on other models). Cross-system reasoning, irreversible decisions.
  Example: "Analyze the microservices event pipeline for race conditions across order-service, inventory-service, and notification-service"
- `high` (default): Complex reasoning, coding, agentic tasks. Best for production workflows requiring deep analysis.
  Example: "Redesign error handling in the payment module: add retry logic, partial failure recovery, and idempotency guarantees"
- `medium`: Balance between speed, cost, and performance. Good for agentic tasks with moderate complexity.
  Example: "Convert fetchUser() in api/users.ts from callbacks to async/await"
- `low`: Most efficient. Ideal for classification, lookups, sub-agents, or tasks where speed matters more than depth.
  Example: "Rename getUserById to findUserById across src/"
See Section 2.5 Model Selection & Thinking Guide for a complete decision table with effort, model, and cost estimates.
API syntax:
```python
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    output_config={"effort": "medium"},  # low|medium|high|max
    messages=[{"role": "user", "content": "Analyze..."}],
)
```

Effort and Tool Use:
The effort parameter significantly impacts how Claude uses tools:
- `low` effort: Combines operations to minimize tool calls. No explanatory preamble before actions. Faster, more efficient for simple tasks.
- `high` effort: More tool calls with detailed explanations. Describes the plan before executing. Provides comprehensive summaries after operations. Better for complex workflows requiring transparency.
Example: With low effort, Claude might read 3 files and edit them in one flow. With high effort, Claude explains why it’s reading those files, what it’s looking for, then provides a detailed summary of changes made.
Relationship between effort and thinking:
- Opus 4.6: `effort` is the recommended control for thinking depth. The `budget_tokens` parameter is deprecated on 4.6 (though still functional for backward compatibility).
- Opus 4.5: `effort` works in parallel with `budget_tokens`. Both parameters are supported and affect different aspects of the response.
- Without thinking enabled: `effort` still controls text generation and tool calls. It’s not a thinking-only parameter.
CLI usage: Three methods to control effort level in Claude Code:
- `/model` command with left/right arrow keys to adjust the effort slider (`low`, `medium`, `high`)
- `CLAUDE_CODE_EFFORT_LEVEL` environment variable (set before launching Claude)
- `effortLevel` field in settings.json (persistent across sessions)
Alt+T toggles thinking on/off globally (separate from effort level).
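The two non-interactive methods side by side — a minimal sketch, assuming the `effortLevel` field and `CLAUDE_CODE_EFFORT_LEVEL` variable behave exactly as named above:

```bash
# Method 2: environment variable (applies only to this shell session)
export CLAUDE_CODE_EFFORT_LEVEL=low
claude -p "Rename getUserById to findUserById across src/"

# Method 3: persist across sessions in .claude/settings.json
# (merge into your existing file; the exact JSON shape is an assumption)
#   { "effortLevel": "high" }
```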
Controlling Thinking Mode
Section titled “Controlling Thinking Mode”| Method | Opus 4.5 | Opus 4.6 | Persistence |
|---|---|---|---|
| Alt+T (Option+T on macOS) | Toggle on/off | Toggle on/off | Current session |
| /config → Thinking mode | Enable/disable globally | Enable/disable globally | Across sessions |
| `/model` slider (left/right arrows) | `low`/`medium`/`high` | `low`/`medium`/`high` | Current session |
| `CLAUDE_CODE_EFFORT_LEVEL` env var | `low`/`medium`/`high` | `low`/`medium`/`high` | Shell session |
| `effortLevel` in settings.json | `low`/`medium`/`high` | `low`/`medium`/`high` | Permanent |
| Ctrl+O | View thinking blocks | View thinking blocks | Display only |
Cost Implications
Thinking tokens are billed. With adaptive thinking:
- Opus 4.6: Thinking usage varies dynamically (less predictable than fixed budget)
- Simple tasks: Consider Alt+T to disable → faster responses, lower cost
- Complex tasks: Leave enabled → better reasoning, adaptive depth
- Sonnet/Haiku: No extended thinking available (Opus 4.5/4.6 only)
Migration for Existing Users
Before (no longer needed):

```bash
claude -p "Ultrathink. Analyze this architecture."
```

After (thinking is already max by default):

```bash
claude -p "Analyze this architecture."
```

To disable thinking for simple tasks: Press Alt+T before sending, or use Sonnet.
Legacy Keywords Reference
These keywords were functional before v2.0.67. They are now recognized visually but have no behavioral effect.
| Keyword | Previous Effect | Current Effect |
|---|---|---|
| “Think” | ~4K tokens | Cosmetic only |
| “Think hard” | ~10K tokens | Cosmetic only |
| “Ultrathink” | ~32K tokens | Cosmetic only |
API Breaking Changes (Opus 4.6)
Removed features:
- `assistant-prefill`: Deprecated on Opus 4.6. Previously allowed pre-filling Claude’s response to guide output format. Now unsupported — use system prompts or examples instead.
New features:
- Fast mode API: Add `speed: "fast"` + beta header `fast-mode-2026-02-01` for 2.5x faster responses (6x cost)

```python
response = client.messages.create(
    model="claude-opus-4-6",
    speed="fast",  # 2.5x faster, 6x price
    headers={"anthropic-beta": "fast-mode-2026-02-01"},
    messages=[...],
)
```
Migration:
- If using `assistant-prefill`: Replace with explicit instructions in the system prompt
- For speed: Use the fast mode API or the `/fast` command in the CLI
Example: Using the Trinity
```
You: /plan
Let's analyze this legacy authentication system before we touch anything.

[Thinking mode is enabled by default with Opus 4.5 - no keyword needed]

[Claude enters Plan Mode and does deep analysis]

Claude: I've analyzed the auth system. Here's what I found:
- 47 files depend on the current auth module
- 3 critical security issues
- Migration path needs 4 phases

Ready to implement?

You: /execute
Let's start with phase 1
```
9.2 Composition Patterns
Multi-Agent Delegation
Launch multiple agents for different aspects:
```
You: For this feature, I need:
1. Backend architect to design the API
2. Security reviewer to audit the design
3. Test engineer to plan the tests

Run these in parallel.
```

Claude will coordinate:
- Backend architect designs API
- Security reviewer audits (in parallel)
- Test engineer plans tests (in parallel)
Skill Stacking
Combine multiple skills for complex tasks:
```yaml
# code-reviewer.md
skills:
  - security-guardian
  - performance-patterns
  - accessibility-checker
```

The reviewer now has all three knowledge domains.
The “Rev the Engine” Pattern
For quality work, use multiple rounds of critique:
```
You: Write the function, then critique it, then improve it.
Do this 3 times.

Round 1: [Initial implementation]
Critique: [What's wrong]
Improvement: [Better version]

Round 2: [Improved implementation]
Critique: [What's still wrong]
Improvement: [Even better version]

Round 3: [Final implementation]
Final check: [Verification]
```
The “Stack Maximum” Pattern
For critical work, combine everything:
```
1. Plan Mode + Extended Thinking → Deep exploration
2. Multiple Agents              → Specialized analysis
3. Sequential Thinking          → Structured reasoning
4. Rev the Engine               → Iterative improvement
5. Code Review Agent            → Final validation
```
9.3 CI/CD Integration
Headless Mode
Run Claude Code without interactive prompts:
```bash
# Basic headless execution
claude -p "Run the tests and report results"

# With timeout
claude -p --timeout 300 "Build the project"

# With specific model
claude -p --model sonnet "Analyze code quality"
```
Unix Piping Workflows
Claude Code supports Unix pipe operations, enabling powerful shell integration for automated code analysis and transformation.
How piping works:
```bash
# Pipe content to Claude with a prompt
cat file.txt | claude -p 'analyze this code'

# Pipe command output for analysis
git diff | claude -p 'explain these changes'

# Chain commands with Claude
npm test 2>&1 | claude -p 'summarize test failures and suggest fixes'
```

Common patterns:
1. Code review automation:

```bash
git diff main...feature-branch | claude -p 'Review this diff for security issues'
```

2. Log analysis:

```bash
tail -n 100 /var/log/app.log | claude -p 'Find the root cause of errors'
```

3. Test output parsing:

```bash
npm test 2>&1 | claude -p 'Create a summary of failing tests with priority order'
```

4. Documentation generation:

```bash
cat src/api/*.ts | claude -p 'Generate API documentation in Markdown'
```

5. Batch file analysis:

```bash
find . -name "*.js" -exec cat {} \; | claude -p 'Identify unused dependencies'
```
Using with --output-format:
```bash
# Get structured JSON output
git status --short | claude -p 'Categorize changes' --output-format json

# Stream JSON for real-time processing
cat large-file.txt | claude -p 'Analyze line by line' --output-format stream-json
```

Best practices:
1. Be specific: Clear prompts yield better results

```bash
# Good: Specific task
git diff | claude -p 'List all function signature changes'

# Less effective: Vague request
git diff | claude -p 'analyze this'
```

2. Limit input size: Pipe only relevant content to avoid context overload

```bash
# Good: Filtered scope
git diff --name-only | head -n 10 | xargs cat | claude -p 'review'

# Risky: Could exceed context
cat entire-codebase/* | claude -p 'review'
```

3. Use non-interactive mode: Add `-p` for automation

```bash
cat file.txt | claude -p 'fix linting errors' > output.txt
```

4. Combine with jq for JSON: Parse Claude’s JSON output

```bash
echo "const x = 1" | claude -p 'analyze' --output-format json | jq '.suggestions[]'
```
Output format control:
The --output-format flag controls Claude’s response format:
| Format | Use Case | Example |
|---|---|---|
| `text` | Human-readable output (default) | `claude -p 'explain' --output-format text` |
| `json` | Machine-parseable structured data | `claude -p 'analyze' --output-format json` |
| `stream-json` | Real-time streaming for large outputs | `claude -p 'transform' --output-format stream-json` |
Example JSON workflow:
```bash
# Get structured analysis
git log --oneline -10 | claude -p 'Categorize commits by type' --output-format json

# Output:
# {
#   "categories": {
#     "features": ["add user auth", "new dashboard"],
#     "fixes": ["fix login bug", "resolve crash"],
#     "chores": ["update deps", "refactor tests"]
#   },
#   "summary": "10 commits: 2 features, 2 fixes, 6 chores"
# }
```

Integration with build scripts (package.json):

```json
{
  "scripts": {
    "claude-review": "git diff main | claude -p 'Review for security issues' --output-format json > review.json",
    "claude-test-summary": "npm test 2>&1 | claude -p 'Summarize failures and suggest fixes'",
    "claude-docs": "cat src/**/*.ts | claude -p 'Generate API documentation' > API.md",
    "precommit-check": "git diff --cached | claude -p 'Check for secrets or anti-patterns' && git diff --cached | prettier --check"
  }
}
```

CI/CD integration example:
```yaml
name: AI Code Review
on: [pull_request]

jobs:
  claude-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Run Claude Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git diff origin/main...HEAD | \
            claude -p 'Review this PR diff for security issues, performance problems, and code quality. Format as JSON.' \
            --output-format json > review.json

      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const review = JSON.parse(fs.readFileSync('review.json', 'utf8'));
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## 🤖 Claude Code Review\n\n${review.summary}`
            });
```

Limitations:
- Context size: Large pipes may exceed token limits (monitor with `/status`)
- Interactive prompts: Use `-p` for automation to avoid blocking
- Error handling: Pipe failures don’t always propagate; add `set -e` for strict mode
- API costs: Automated pipes consume API credits; monitor usage with `ccusage`
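Per the error-handling limitation above, a minimal strict-mode wrapper sketch that makes pipe failures actually fail the script:

```bash
#!/bin/bash
# -e aborts on error, -u on unset variables, and pipefail surfaces a
# failure from any stage of the pipe, not just the last command
set -euo pipefail

git diff main...HEAD | claude -p 'Review this diff for security issues' \
  --output-format json > review.json

# jq -e exits non-zero if .summary is missing or null, failing the script
jq -e '.summary' review.json
```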
💡 Pro tip: Combine piping with aliases for frequently used patterns:

```bash
# Add to ~/.bashrc or ~/.zshrc
alias claude-review='git diff | claude -p "Review for bugs and suggest improvements"'
alias claude-logs='tail -f /var/log/app.log | claude -p "Monitor for errors and alert on critical issues"'
```
Git Hooks Integration
Windows Note: Git hooks run in Git Bash on Windows, so the bash syntax below works. Alternatively, you can create `.cmd` or `.ps1` versions and reference them from a wrapper script.
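The hook scripts below live in `.git/hooks/` and are silently skipped unless they are executable; a quick setup sketch:

```bash
# Save each script below as .git/hooks/<hook-name>, then:
chmod +x .git/hooks/pre-commit .git/hooks/pre-push

# Verify: Git ignores hooks that lack the executable bit
ls -l .git/hooks/
```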
Pre-commit hook:
```bash
#!/bin/bash
# Run Claude Code for commit message validation
COMMIT_MSG=$(cat "$1")
claude -p "Is this commit message good? '$COMMIT_MSG'. Reply YES or NO with reason."
```

Pre-push hook:
```bash
#!/bin/bash
# Security check before push
claude -p "Scan staged files for secrets and security issues. Exit 1 if found."
EXIT_CODE=$?

if [ $EXIT_CODE -ne 0 ]; then
  echo "Security issues found. Push blocked."
  exit 1
fi
```

GitHub Actions Integration
Section titled “GitHub Actions Integration”name: Claude Code Review
on: pull_request: types: [opened, synchronize]
jobs: review: runs-on: ubuntu-latest steps: - uses: actions/checkout@v4
- name: Install Claude Code run: npm install -g @anthropic-ai/claude-code
- name: Run Review env: ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} run: | claude -p "Review the changes in this PR. \ Focus on security, performance, and code quality. \ Output as markdown."Debugging Failed CI Runs
When GitHub Actions fails, use the gh CLI to investigate without leaving your terminal:
Quick investigation workflow:
```bash
# List recent workflow runs
gh run list --limit 10

# View specific run details
gh run view <run-id>

# View logs for failed run
gh run view <run-id> --log-failed

# Download logs for detailed analysis
gh run download <run-id>
```

Common debugging commands:
| Command | Purpose |
|---|---|
| `gh run list --workflow=test.yml` | Filter by workflow file |
| `gh run view --job=<job-id>` | View specific job details |
| `gh run watch` | Watch the current run in real-time |
| `gh run rerun <run-id>` | Retry a failed run |
| `gh run rerun <run-id> --failed` | Retry only failed jobs |
Example: Investigate test failures:
```bash
# Get the latest failed run
FAILED_RUN=$(gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId')

# View the failure
gh run view $FAILED_RUN --log-failed

# Ask Claude to analyze
gh run view $FAILED_RUN --log-failed | claude -p "Analyze this CI failure and suggest fixes"
```

Pro tip: Combine with Claude Code for automated debugging:
```bash
# Fetch failures and auto-fix
gh run view --log-failed | claude -p "
  Analyze these test failures.
  Identify the root cause.
  Propose fixes for each failing test.
  Output as actionable steps."
```

This workflow saves time compared to navigating GitHub’s web UI and enables faster iteration on CI failures.
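The same investigation loop bundled into a shell function; a sketch using the gh flags shown above (the `ci-fix` name is illustrative):

```bash
# ci-fix: pipe the most recent failed run into Claude for analysis
ci-fix() {
  local run_id
  run_id=$(gh run list --status failure --limit 1 \
    --json databaseId --jq '.[0].databaseId')
  [ -n "$run_id" ] || { echo "No failed runs found" >&2; return 1; }
  gh run view "$run_id" --log-failed |
    claude -p 'Analyze this CI failure and suggest fixes'
}
```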
Verify Gate Pattern
Before creating a PR, ensure all local checks pass. This prevents wasted CI cycles and review time.
The pattern:
```
Build ✓ → Lint ✓ → Test ✓ → Type-check ✓ → THEN create PR
```

Implementation as a command (.claude/commands/complete-task.md):
# Complete Task
Run the full verification gate before creating a PR:
1. **Build**: Run `pnpm build` - must succeed
2. **Lint**: Run `pnpm lint` - must have zero errors
3. **Test**: Run `pnpm test` - all tests must pass
4. **Type-check**: Run `pnpm typecheck` - no type errors

If ANY step fails:
- Stop immediately
- Report what failed and why
- Suggest fixes
- Do NOT proceed to PR creation

If ALL steps pass:
- Create the PR with `gh pr create`
- Wait for CI with `gh pr checks --watch`
- If CI fails, fetch feedback and auto-fix
- Loop until mergeable or blocked
Autonomous retry loop:

```
┌─────────────────────────────────────────┐
│        VERIFY GATE + AUTO-FIX           │
├─────────────────────────────────────────┤
│                                         │
│  Local checks (build/lint/test)         │
│       │                                 │
│       ▼ FAIL?                           │
│  ┌─────────┐                            │
│  │ Auto-fix│ ──► Re-run checks          │
│  └─────────┘                            │
│       │                                 │
│       ▼ PASS                            │
│  Create PR                              │
│       │                                 │
│       ▼                                 │
│  Wait for CI (gh pr checks --watch)     │
│       │                                 │
│       ▼ FAIL?                           │
│  ┌─────────────────────┐                │
│  │ Fetch CI feedback   │                │
│  │ (CodeRabbit, etc.)  │                │
│  └─────────────────────┘                │
│       │                                 │
│       ▼                                 │
│  Auto-fix + push + loop                 │
│       │                                 │
│       ▼                                 │
│  PR mergeable OR blocked (ask human)    │
│                                         │
└─────────────────────────────────────────┘
```

Fetching CI feedback (GitHub GraphQL):
```bash
# Get PR review status and comments
gh api graphql -f query='
  query($pr: Int!) {
    repository(owner: "OWNER", name: "REPO") {
      pullRequest(number: $pr) {
        reviewDecision
        reviewThreads(first: 100) {
          nodes {
            isResolved
            comments(first: 1) { nodes { body } }
          }
        }
      }
    }
  }' -F pr=$PR_NUMBER
```

Inspired by Nick Tune’s Coding Agent Development Workflows
Release Notes Generation
Automate release notes and changelog generation using Claude Code.
Why automate release notes?
- Consistent format across releases
- Captures technical details from commits
- Translates technical changes to user-facing language
- Saves 30-60 minutes per release
Pattern: Git commits → Claude analysis → User-friendly release notes
Approach 1: Command-Based
Create .claude/commands/release-notes.md:
# Generate Release Notes
Analyze git commits since last release and generate release notes.
## Process
1. **Get commits since last tag**:

   ```bash
   git log $(git describe --tags --abbrev=0)..HEAD --oneline
   ```

2. **Read full commit details**:
   - Include commit messages
   - Include file changes
   - Include PR numbers if present

3. **Categorize changes**:
   - ✨ Features - New functionality
   - 🐛 Bug Fixes - Issue resolutions
   - ⚡ Performance - Speed/efficiency improvements
   - 🔒 Security - Security patches
   - 📝 Documentation - Doc updates
   - 🔧 Maintenance - Refactoring, dependencies
   - ⚠️ Breaking Changes - API changes (highlight prominently)

4. **Generate three versions**:

   A. CHANGELOG.md format (technical, for developers):

   ```
   ## [Version] - YYYY-MM-DD
   ### Added
   - Feature description with PR reference
   ### Fixed
   - Bug fix description
   ### Changed
   - Breaking change with migration guide
   ```

   B. GitHub Release Notes (balanced, technical + context):

   ```
   ## What's New
   Brief summary of the release
   ### ✨ New Features
   - User-facing feature description
   ### 🐛 Bug Fixes
   - Issue resolution description
   ### ⚠️ Breaking Changes
   - Migration instructions
   **Full Changelog**: v1.0.0...v1.1.0
   ```

   C. User Announcement (non-technical, benefits-focused):

   ```
   We're excited to announce [Version]!
   **Highlights**:
   - What users can now do
   - How it helps them
   - When to use it
   [Link to full release notes]
   ```

5. **Output files**:
   - Prepend to `CHANGELOG.md`
   - Save to `release-notes-[version].md`
   - Copy “User Announcement” to clipboard for Slack/blog
Verification
- Check for missed breaking changes
- Verify all PR references are valid
- Ensure migration guides are clear
Approach 2: CI/CD Automation

Add to `.github/workflows/release.yml`:

```yaml
name: Release

on:
  push:
    tags:
      - 'v*'

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history for changelog

      - name: Generate Release Notes
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Get version from tag
          VERSION=${GITHUB_REF#refs/tags/}

          # Generate with Claude
          claude -p "Generate release notes for $VERSION. \
            Analyze commits since last tag. \
            Output in GitHub Release format. \
            Save to release-notes.md"

          # Create GitHub Release
          gh release create $VERSION \
            --title "Release $VERSION" \
            --notes-file release-notes.md

      - name: Update CHANGELOG.md
        run: |
          # Prepend to CHANGELOG
          cat release-notes.md CHANGELOG.md > CHANGELOG.tmp
          mv CHANGELOG.tmp CHANGELOG.md

          # Commit back
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add CHANGELOG.md
          git commit -m "docs: update changelog for $VERSION"
          git push
```
Approach 3: Interactive Workflow
For more control, use an interactive session:
```
# 1. Start Claude Code
claude

# 2. Request release notes
You: "Generate release notes for v2.0.0"

# 3. Claude will:
#    - Run git log to get commits
#    - Ask clarifying questions:
#      - "Is this a major/minor/patch release?"
#      - "Any breaking changes users should know?"
#      - "Target audience for announcement?"

# 4. Review and refine
You: "Add more detail to the authentication feature"

# 5. Finalize
You: "Save these notes and update CHANGELOG.md"
```

Best Practices
Before generation:
- ✅ Ensure commits follow conventional commits format
- ✅ All PRs have been merged
- ✅ Version number decided (semver)
During generation:
- ✅ Review for accuracy (Claude might miss context)
- ✅ Add migration guides for breaking changes
- ✅ Include upgrade instructions if needed
After generation:
- ✅ Cross-reference with closed issues/PRs
- ✅ Test upgrade path on a staging project
- ✅ Share draft with team before publishing
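For an ad-hoc draft without the command file, the same git range works directly in a pipe (the output file name is illustrative):

```bash
# Draft release notes from commits since the last tag
git log "$(git describe --tags --abbrev=0)"..HEAD --oneline |
  claude -p 'Categorize these commits and draft GitHub release notes' \
  > release-notes-draft.md
```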
Example Output
Given these commits:

```
feat: add user avatar upload (PR #123)
fix: resolve login timeout issue (PR #124)
perf: optimize database queries by 40% (PR #125)
BREAKING: change API endpoint from /api/v1 to /v2 (PR #126)
```

Claude generates:
CHANGELOG.md (technical):
```markdown
## [2.0.0] - 2025-01-10

### Added
- User avatar upload functionality (#123)

### Fixed
- Login timeout issue affecting mobile users (#124)

### Performance
- Optimized database queries, reducing load time by 40% (#125)

### Breaking Changes
- **API Endpoints**: Migrated from `/api/v1/*` to `/v2/*`
  - Update client code: replace `/api/v1/` with `/v2/`
  - Old endpoints will return 410 Gone after 2025-02-01
  - Migration guide: docs/migration-v2.md (#126)
```

GitHub Release (balanced):
## What's New in v2.0.0
This release brings performance improvements, bug fixes, and a new avatar feature.
### ✨ New Features- **Avatar Upload**: Users can now upload custom profile pictures
### 🐛 Bug Fixes- Fixed login timeout issue that affected some mobile users
### ⚡ Performance- Database queries are now 40% faster
### ⚠️ Breaking Changes- **API Endpoint Migration**: All endpoints have moved from `/api/v1` to `/v2` - **Action Required**: Update your API client code - **Timeline**: Old endpoints will stop working on February 1, 2025 - **Migration Guide**: [See docs/migration-v2.md](./docs/migration-v2.md)
**Full Changelog**: v1.9.0...v2.0.0User Announcement (non-technical):
```markdown
📢 Version 2.0 is here!

We've made your experience faster and more personal:

✨ **Customize Your Profile** - Upload your own avatar
⚡ **Lightning Fast** - Pages load 40% faster
🐛 **More Reliable** - Fixed the login timeout issue

**For Developers**: This is a breaking release. See our migration guide for API changes.

[Read full release notes →]
```

Common Issues
“Release notes are too technical”
- Solution: Specify audience in prompt: “Generate for non-technical users”
“Claude missed a breaking change”
- Solution: Explicitly list breaking changes in prompt
- Better: Use “BREAKING:” prefix in commit messages
“Generated notes are generic”
- Solution: Provide more context: “This release focuses on mobile performance”
“Commits are messy/unclear”
- Solution: Clean up commit history before generation (interactive rebase)
- Better: Enforce commit message format with git hooks
Deployment Automation
Claude Code can automate deployments to Vercel, GCP, and other platforms using stored credentials. The key is assembling three components: secret management, a deploy skill, and mandatory guardrails.
Required secrets
Store credentials in the OS keychain rather than .env files:
```bash
# Vercel deployment (3 required variables)
security add-generic-password -a claude -s VERCEL_TOKEN -w "your_token"
security add-generic-password -a claude -s VERCEL_ORG_ID -w "your_org_id"
security add-generic-password -a claude -s VERCEL_PROJECT_ID -w "your_project_id"

# Retrieve in scripts
VERCEL_TOKEN=$(security find-generic-password -s VERCEL_TOKEN -w)
```

For multi-platform secrets (GitHub, Vercel, AWS simultaneously), Infisical provides centralized management with versioning and point-in-time recovery — a useful open-source alternative to HashiCorp Vault:
```bash
# Install Infisical CLI
brew install infisical/get-cli/infisical

# Inject secrets into Claude Code session
infisical run -- claude
# Infisical automatically sets all project secrets as env vars
```
Deployment skill
Create a skill that encapsulates the full deploy workflow:
```markdown
---
name: deploy-to-vercel
description: Deploy to Vercel staging then production with smoke tests
allowed-tools: Bash
---

## Deploy Workflow

1. Run tests: `pnpm test` — stop if any fail
2. Build: `pnpm build` — stop if build fails
3. Deploy to staging: `vercel deploy`
4. Run smoke tests against staging URL
5. **PAUSE** — output staging URL and ask for human confirmation before production
6. On approval: `vercel deploy --prod`
7. Verify production URL responds with HTTP 200
```
Non-negotiable guardrails
These guardrails are not optional. Production deployments without them create incidents:
| Guardrail | Implementation | Why |
|---|---|---|
| Staging-first | Always deploy to staging before prod | Catch environment-specific failures |
| Human confirmation | Stop and ask before --prod flag | No autonomous production deploys |
| Smoke test | Verify HTTP 200 on key endpoints after deploy | Catch silent deployment failures |
| Rollback ready | Keep previous deployment ID before promoting | vercel rollback <deployment-id> |
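A sketch of the rollback-ready guardrail: record the deployment currently serving production before promoting. How you obtain that ID depends on your Vercel CLI version, so the lookup here is a placeholder:

```bash
# Record the deployment currently serving production before promoting
PREV_DEPLOYMENT="<current-prod-deployment-id>"  # e.g. taken from `vercel ls`
echo "$PREV_DEPLOYMENT" > .last-good-deployment

vercel deploy --prod

# If the post-deploy smoke test fails, roll back to the recorded ID:
# vercel rollback "$(cat .last-good-deployment)"
```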
Hook for confirmation (prevent accidental production deploys):
{ "hooks": { "PreToolUse": [{ "matcher": "Bash", "hooks": [{ "type": "command", "command": "scripts/check-prod-deploy.sh" }] }] }}#!/bin/bash# check-prod-deploy.sh — exit 2 to block, exit 0 to allowINPUT=$(cat)if echo "$INPUT" | grep -q "vercel deploy --prod\|gcloud deploy.*production"; then echo "BLOCKED: Production deploy requires manual confirmation. Run the command directly from your terminal." exit 2fiexit 0Sources: Vercel deploy skill pattern documented by the community (lobehub.com, haniakrim21); Infisical multi-platform secrets management at infisical.com. No end-to-end automated deploy workflow exists in the community as of March 2026 — the building blocks are available but the staging-to-production promotion pattern is something each team assembles themselves.
9.4 IDE Integration
VS Code Integration
Claude Code integrates with VS Code:
- Install Extension: Search “Claude Code” in Extensions
- Configure: Set API key in settings
- Use:
  - `Ctrl+Shift+P` → “Claude Code: Start Session”
  - Select text → Right-click → “Ask Claude”
JetBrains Integration
Works with IntelliJ, WebStorm, PyCharm:
- Install Plugin: Settings → Plugins → “Claude Code”
- Configure: Tools → Claude Code → Set API key
- Use:
  - `Ctrl+Shift+A` → “Claude Code”
  - Tool window for persistent session
Xcode Integration (Feb 2026)
New: Xcode 26.3 RC+ includes native Claude Agent SDK support, using the same harness as Claude Code:
- Requirements: Xcode 26.3 RC or later (macOS)
- Setup: Configure API key in Xcode → Preferences → Claude
- Use:
- Built-in code assistant powered by Claude
- Same capabilities as Claude Code CLI
- Native integration with Xcode workflows
Claude Agent SDK: Separate product from Claude Code, but shares the same agent execution framework. Enables Claude-powered development tools in IDEs beyond VS Code.
Note: Claude Agent SDK is not Claude Code — it’s Anthropic’s framework for building agent-powered developer tools. Claude Code CLI and Xcode integration both use this SDK.
Terminal Integration
For terminal-native workflow:
macOS/Linux (Bash/Zsh)
```bash
# Add to .bashrc or .zshrc
alias cc='claude'
alias ccp='claude --plan'
alias cce='claude --execute'

# Quick code question
cq() {
  claude -p "$*"
}
```

Usage:

```bash
cq "What does this regex do: ^[a-z]+$"
```

Windows (PowerShell)
```powershell
# Add to $PROFILE (run: notepad $PROFILE to edit)
function cc { claude $args }
function ccp { claude --plan $args }
function cce { claude --execute $args }

function cq {
  param([Parameter(ValueFromRemainingArguments)]$question)
  claude -p ($question -join ' ')
}
```

To find your profile location: `echo $PROFILE`
Common locations:
- `C:\Users\YourName\Documents\PowerShell\Microsoft.PowerShell_profile.ps1`
- `C:\Users\YourName\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1`
If the file doesn’t exist, create it:
```powershell
New-Item -Path $PROFILE -Type File -Force
```

9.5 Tight Feedback Loops
Reading time: 5 minutes · Skill level: Week 1+
Tight feedback loops accelerate learning and catch issues early. Design your workflow to validate changes immediately.
The Feedback Loop Pyramid
```
      ┌─────────────┐
      │   Deploy    │  ← Hours/Days
      │   Tests     │
      ├─────────────┤
      │   CI/CD     │  ← Minutes
      │  Pipeline   │
      ├─────────────┤
      │   Local     │  ← Seconds
      │   Tests     │
      ├─────────────┤
      │  TypeCheck  │  ← Immediate
      │    Lint     │
      └─────────────┘
```
Implementing Tight Loops
Level 1: Immediate (IDE/Editor)
```bash
# Watch mode for instant feedback
pnpm tsc --watch
pnpm lint --watch
```
Level 2: On-Save (Git Hooks)

```bash
#!/bin/bash
# Pre-commit hook
pnpm lint-staged && pnpm tsc --noEmit
```
Level 3: On-Commit (CI)

```yaml
# GitHub Action for PR checks
- run: pnpm lint && pnpm tsc && pnpm test
```
Claude Code Integration
Use hooks for automatic validation:
{ "hooks": { "PostToolUse": [{ "matcher": "Edit|Write", "hooks": ["./scripts/validate.sh"] }] }}validate.sh:
```bash
#!/bin/bash
# Run after every file change
FILE=$(echo "$TOOL_INPUT" | jq -r '.file_path // .file')
if [[ "$FILE" == *.ts || "$FILE" == *.tsx ]]; then
  npx tsc --noEmit "$FILE" 2>&1 | head -5
fi
```
Feedback Loop Checklist
| Loop | Trigger | Response Time | What It Catches |
|---|---|---|---|
| Lint | On type | <1s | Style, imports |
| TypeCheck | On save | 1-3s | Type errors |
| Unit tests | On save | 5-15s | Logic errors |
| Integration | On commit | 1-5min | API contracts |
| E2E | On PR | 5-15min | User flows |
💡 Tip: Faster loops catch more bugs. Invest in making your test suite fast.
Background Tasks for Fullstack Development
Problem: Fullstack development often requires long-running processes (dev servers, watchers) that block the main Claude session, preventing iterative frontend work.
Solution: Use Ctrl+B to background tasks and maintain tight feedback loops across the stack.
When to Background Tasks
| Scenario | Background Command | Why |
|---|---|---|
| Dev server running | pnpm dev → Ctrl+B | Keeps server alive while iterating on frontend |
| Test watcher | pnpm test --watch → Ctrl+B | Monitor test results while coding |
| Build watcher | pnpm build --watch → Ctrl+B | Detect build errors without blocking session |
| Database migration | pnpm migrate → Ctrl+B | Long-running migration, work on other features |
| Docker compose | docker compose up → Ctrl+B | Infrastructure running, develop application |
Fullstack Workflow Pattern
```bash
# 1. Start backend dev server
pnpm dev:backend
# Press Ctrl+B to background

# 2. Now Claude can iterate on frontend
"Update the login form UI to match Figma designs"
# Claude can read files, make changes, all while backend runs

# 3. Check server logs when needed
/tasks  # View background task status

# 4. Bring server back to foreground if needed
# (Currently: no built-in foreground command, restart if needed)
```
Real-World Example: API + Frontend Iteration
Traditional (blocked) flow:
```bash
$ pnpm dev:backend
# Server starts... Claude waits... session blocked
# Cannot iterate on frontend until server stops
# Kill server → work on frontend → restart server → repeat
```

Background task flow:
```bash
$ pnpm dev:backend
# Server starts...
$ Ctrl+B  # Background the server
# Claude is now free to work

"Add loading state to the API calls"
# Claude iterates on frontend
# Backend still running, can test immediately
# Tight feedback loop maintained
```
Context Rot Prevention
Problem: Long-running background tasks can cause context rot—Claude loses awareness of what’s running.
Solution: Check task status periodically:
```bash
# Before major changes
/tasks

# Output example:
# Task 1 (background): pnpm dev:backend
# Status: Running (35 minutes)
# Last output: Server listening on :3000
```

Best practices:
- Background tasks at session start (setup phase)
- Check `/tasks` before major architecture changes
- Restart backgrounded tasks if context is lost
- Use descriptive commands (`pnpm dev:backend`, not just `npm run dev`)
Limitations
- No foreground command: Cannot bring tasks back to foreground (yet)
- Context loss: Long-running tasks may lose relevance to current work
- Output not streamed: Background task output not visible unless checked
- Session-scoped: Background tasks tied to Claude session, killed on exit
Workaround for foreground: If you need to interact with a backgrounded task, restart it in foreground:

```bash
# Can't foreground a task directly
# Instead: check status, then restart if needed
/tasks  # See what's running
# Ctrl+C to stop current session interaction
# Restart the command you need in foreground
```
Integration with Teleportation
When using session teleportation (web → local), background tasks are not transferred:
- Web sessions cannot background tasks
- Teleported sessions start with clean slate
- Restart required dev servers after teleportation
Teleport workflow:
```bash
# 1. Teleport session from web to local
claude --teleport

# 2. Restart dev environment
pnpm dev:backend
# Ctrl+B to background

# 3. Continue work locally with full feedback loops
```

Monitoring Background Tasks

```bash
/tasks  # View all background tasks

# Output includes:
# - Task ID
# - Command run
# - Runtime duration
# - Recent output (last few lines)
# - Status (running, completed, failed)
```

Use `/tasks` when:
- Starting new feature work (verify infrastructure running)
- Debugging (check for error output in background tasks)
- Before committing (ensure tests passed in background)
- Session feels slow (check if background tasks consuming resources)
Disabling Background Tasks

```bash
# Environment variable (v2.1.4+)
export CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=true
claude

# Useful when:
# - Debugging Claude Code itself
# - Running in resource-constrained environments
# - Avoiding accidental backgrounding
```

💡 Key insight: Background tasks optimize fullstack workflows by decoupling infrastructure (servers, watchers) from iterative development. Use them strategically to maintain tight feedback loops across the entire stack.
9.6 Todo as Instruction Mirrors
Reading time: 5 minutes · Skill level: Week 1+
TodoWrite isn’t just tracking—it’s an instruction mechanism. Well-crafted todos guide Claude’s execution.
The Mirror Principle
What you write as a todo becomes Claude’s instruction:
```
❌ Vague Todo → Vague Execution
"Fix the bug"

✅ Specific Todo → Precise Execution
"Fix null pointer in getUserById when user not found - return null instead of throwing"
```
Todo as Specification

```markdown
## Effective Todo Pattern

- [ ] **What**: Create user validation function
- [ ] **Where**: src/lib/validation.ts
- [ ] **How**: Use Zod schema with email, password rules
- [ ] **Verify**: Test with edge cases (empty, invalid format)
```
Todo Granularity Guide
| Task Complexity | Todo Granularity | Example |
|---|---|---|
| Simple fix | 1-2 todos | “Fix typo in header component” |
| Feature | 3-5 todos | Auth flow steps |
| Epic | 10+ todos | Full feature with tests |
Instruction Embedding
Embed constraints directly in todos:

```markdown
## Bad
- [ ] Add error handling

## Good
- [ ] Add error handling: try/catch around API calls, log errors with context, return user-friendly messages, use existing ErrorBoundary component
```
Todo Templates
Bug Fix:

```markdown
- [ ] Reproduce: [steps to reproduce]
- [ ] Root cause: [investigation findings]
- [ ] Fix: [specific change needed]
- [ ] Verify: [test command or manual check]
```

Feature:

```markdown
- [ ] Design: [what components/functions needed]
- [ ] Implement: [core logic]
- [ ] Tests: [test coverage expectations]
- [ ] Docs: [if public API]
```

9.7 Output Styles
Reading time: 5 minutes · Skill level: Week 1+
Control how Claude responds to match your workflow preferences.
Output Style Spectrum
```
← Minimal                                      Verbose →
───────────────────────────────────────────────────────
Code only | Code + comments | Explanations | Tutorial
```
Style Directives
Add to CLAUDE.md or prompt:
Minimal (Expert Mode):

```
Output code only. No explanations unless asked.
Assume I understand the codebase.
```

Balanced:

```
Explain significant decisions. Comment complex logic.
Skip obvious explanations.
```

Verbose (Learning Mode):

```
Explain each step. Include alternatives considered.
Link to documentation for concepts used.
```
Context-Aware Styles

```markdown
## In CLAUDE.md

### Output Preferences
- **Code reviews**: Detailed, cite specific lines
- **Bug fixes**: Minimal, show diff only
- **New features**: Balanced, explain architecture decisions
- **Refactoring**: Minimal, trust my review
```
Format Control
For code:

```
Format code output as:
- Full file with changes marked: // CHANGED
- Diff format for reviews
- Inline for small changes
```

For explanations:

```
Explain using:
- Bullet points for lists
- Tables for comparisons
- Diagrams for architecture
```
Output Templates
Bug Fix Output:

```markdown
**Root Cause**: [one line]
**Fix**: [code block]
**Test**: [verification command]
```

Feature Output:

```markdown
**Files Changed**: [list]
**Key Decisions**: [bullet points]
**Next Steps**: [if any]
```

Mermaid Diagram Generation
Claude Code can generate Mermaid diagrams for visual documentation. This is useful for architecture documentation, flow visualization, and system understanding.
Supported Diagram Types
| Type | Use Case | Syntax Start |
|---|---|---|
| Flowchart | Process flows, decision trees | flowchart TD |
| Sequence | API calls, interactions | sequenceDiagram |
| Class | OOP structure, relationships | classDiagram |
| ER | Database schema | erDiagram |
| State | State machines | stateDiagram-v2 |
| Gantt | Project timelines | gantt |
Request Patterns
Architecture diagram:

```
Generate a Mermaid flowchart showing the authentication flow:
1. User submits credentials
2. Server validates
3. JWT issued or error returned
```

Database schema:

```
Create an ER diagram for our user management system
showing User, Role, and Permission relationships.
```

Sequence diagram:

```
Show me a Mermaid sequence diagram of how our
checkout process calls payment API → inventory → notification services.
```
Example Outputs
Flowchart:

```mermaid
flowchart TD
    A[User Request] --> B{Authenticated?}
    B -->|Yes| C[Process Request]
    B -->|No| D[Return 401]
    C --> E[Return Response]
```

Sequence:

```mermaid
sequenceDiagram
    Client->>+API: POST /checkout
    API->>+Payment: charge()
    Payment-->>-API: success
    API->>+Inventory: reserve()
    Inventory-->>-API: confirmed
    API-->>-Client: order_id
```

Class:

```mermaid
classDiagram
    class User {
        +String email
        +String passwordHash
        +login()
        +logout()
    }
    class Role {
        +String name
        +Permission[] permissions
    }
    User "1" --> "*" Role
```
Where to Visualize
| Platform | Support |
|---|---|
| GitHub | Native rendering in README, issues, PRs |
| VS Code | Mermaid Preview extension |
| GitLab | Native rendering |
| Notion | Code block with mermaid language |
| mermaid.live | Online editor with export |
Integration Tips
- In CLAUDE.md: Ask Claude to document architecture decisions with diagrams
- In PRs: Include sequence diagrams for complex flows
- In docs/: Generate architecture.md with embedded diagrams
- Export: Use mermaid.live to export as PNG/SVG for presentations
9.8 Vibe Coding & Skeleton Projects
Reading time: 5 minutes · Skill level: Month 1
Vibe Coding
“Vibe Coding” is rapid prototyping through natural conversation—exploring ideas before committing to implementation.
When to Vibe Code
- Early exploration: Testing if an approach works
- Proof of concept: Quick validation before full implementation
- Learning: Understanding a new library or pattern
Vibe Coding Session
```
User: Let's explore how we could add real-time updates to the dashboard.
      Don't write code yet, just brainstorm approaches.

Claude: Here are three approaches we could consider:

1. **WebSockets**: Full duplex, lowest latency
   - Pro: Real-time push
   - Con: Connection management complexity

2. **Server-Sent Events (SSE)**: One-way server push
   - Pro: Simple, native browser support
   - Con: One direction only

3. **Polling**: Periodic HTTP requests
   - Pro: Simplest implementation
   - Con: Not truly real-time, more server load

User: Let's vibe on option 2. Show me a minimal SSE setup.

Claude: [Provides minimal example for exploration]
```
Vibe Coding Rules
- No production code: This is exploration only
- Throw away freely: No attachment to vibe code
- Focus on learning: Understand the approach
- Signal clearly: “This is vibe code, not for production”
Anti-Pattern: Context Overload
Jens Rusitschka identifies “context overload” as the primary failure mode of vibe coding: dumping entire codebases into context, hoping Claude will figure it out.
Symptoms:
- Pasting 5K+ lines of code in first prompt
- “Read the entire repo and implement X”
- Expecting Claude to maintain context across 20+ file changes
- Performance degradation after context pollution (see §2.2 Fresh Context Pattern)
Why it fails:
- Attention dilution across too many files and concerns
- Lost architectural reasoning in noise
- Failed attempts accumulate, further degrading quality
- Context bleeding between unrelated tasks
The Phased Context Strategy:
Instead of a big-bang context dump, use a staged approach that leverages Claude Code’s native features:
| Phase | Tool | Purpose | Context Size |
|---|---|---|---|
| 1. Exploration | /plan mode | Read-only analysis, safe investigation | Controlled (plan writes findings) |
| 2. Implementation | Normal mode | Execute planned changes | Focused (plan guides scope) |
| 3. Fresh Start | Session handoff | Reset when context >75% | Minimal (handoff doc only) |
Practical workflow:
```
# Phase 1: Exploration (read-only, safe)
/plan
You: "How should I refactor the auth system for OAuth?"
Claude: [explores codebase, writes plan to .claude/plans/oauth-refactor.md]
/execute  # exit plan mode

# Phase 2: Implementation (focused context)
You: "Execute the plan from .claude/plans/oauth-refactor.md"
Claude: [reads plan, implements in focused scope]

# Phase 3: Fresh start if needed (context >75%)
You: "Create session handoff document"
Claude: [writes handoff to claudedocs/handoffs/oauth-implementation.md]
# New session: cat claudedocs/handoffs/oauth-implementation.md | claude -p
```

Cross-references:
- Full `/plan` workflow: See §2.3 Plan Mode (line 2100)
- Fresh context pattern: See §2.2 Fresh Context Pattern (line 1525)
- Session handoffs: See Session Handoffs (line 2278)
The insight: Rusitschka’s “Vibe Coding, Level 2” is Claude Code’s native workflow — it just needed explicit framing as an anti-pattern antidote. Plan mode prevents context pollution during exploration, fresh context prevents accumulation during implementation, and handoffs enable clean phase transitions.
Skeleton Projects
Skeleton projects are minimal, working templates that establish patterns before full implementation.
Skeleton Structure

```
project/
├── src/
│   ├── index.ts          # Entry point (working)
│   ├── config.ts         # Config structure (minimal)
│   ├── types.ts          # Core types (defined)
│   └── features/
│       └── example/      # One working example
│           ├── route.ts
│           ├── service.ts
│           └── repo.ts
├── tests/
│   └── example.test.ts   # One working test
└── package.json          # Dependencies defined
```
Skeleton Principles
- It must run: `pnpm dev` works from day 1
- One complete vertical: Full stack for one feature
- Patterns, not features: Shows HOW, not WHAT
- Minimal dependencies: Only what’s needed
Creating a Skeleton
```
User: Create a skeleton for our new microservice. Include:
- Express setup
- One complete route (health check)
- Database connection pattern
- Test setup
- Docker configuration

Claude: [Creates minimal, working skeleton with these elements]
```
Skeleton Expansion

```
Skeleton (Day 1)  →  MVP (Week 1)  →  Full (Month 1)
────────────────────────────────────────────────────
1 route           →  5 routes      →  20 routes
1 test            →  20 tests      →  100+ tests
Basic config      →  Env-based     →  Full config
Local DB          →  Docker DB     →  Production DB
```

9.9 Batch Operations Pattern
Reading time: 5 minutes · Skill level: Week 1+
Batch operations improve efficiency and reduce context usage when making similar changes across files.
When to Batch
| Scenario | Batch? | Why |
|---|---|---|
| Same change in 5+ files | ✅ Yes | Efficiency |
| Related changes in 3 files | ✅ Yes | Coherence |
| Unrelated fixes | ❌ No | Risk of errors |
| Complex refactoring | ⚠️ Maybe | Depends on pattern |
Batch Patterns
1. Import Updates

```
User: Update all files in src/components to use the new Button import:
- Old: import { Button } from "~/ui/button"
- New: import { Button } from "~/components/ui/button"
```

2. API Migration

```
User: Migrate all API calls from v1 to v2:
- Change: /api/v1/* → /api/v2/*
- Update response handling for new format
- Files: src/services/*.ts
```

3. Pattern Application

```
User: Add error boundaries to all page components:
- Wrap each page export with ErrorBoundary
- Use consistent error fallback
- Files: src/pages/**/*.tsx
```
Batch Execution Strategy

```
1. Identify scope   → List all affected files
2. Define pattern   → Exact change needed
3. Create template  → One example implementation
4. Batch apply      → Apply to all files
5. Verify all       → Run tests, typecheck
```

Batch with Claude
```markdown
## Effective Batch Request

"Apply this change pattern to all matching files:

**Pattern**: Add 'use client' directive to components using hooks
**Scope**: src/components/**/*.tsx
**Rule**: If file contains useState, useEffect, or useContext
**Change**: Add 'use client' as first line

List affected files first, then make changes."
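Scoping the list mechanically before asking for edits keeps the batch honest; a sketch for the ‘use client’ example (the grep patterns are illustrative, and whether `-p` mode can apply edits depends on your permission settings):

```bash
# List candidate files first, so the scope is explicit and reviewable
grep -rl --include='*.tsx' -E 'useState|useEffect|useContext' \
  src/components | tee affected-files.txt

# Hand the exact list to Claude rather than a glob
cat affected-files.txt |
  claude -p "For each file listed, add 'use client' as the first line if missing"
```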
9.10 Continuous Improvement Mindset
The goal isn’t just to use AI for coding — it’s to continuously improve the workflow so AI produces better results with less intervention.
The Key Question
After every manual intervention, ask yourself:
“How can I improve the process so this error or manual fix can be avoided next time?”
Improvement Pipeline
```
Error or manual intervention detected
              │
              ▼
  Can a linting rule catch it?
       YES ──┴── NO
        │         │
        ▼         ▼
   Add lint   Can it go in conventions/docs?
     rule          YES ──┴── NO
                    │         │
                    ▼         ▼
                 Add to     Accept as
                 CLAUDE.md  edge case
                 or ADRs
```

Practical Examples
| Problem | Solution | Where to Add |
|---|---|---|
| Agent forgets to run tests | Add to workflow command | .claude/commands/complete-task.md |
| Code review catches style issue | Add ESLint rule | .eslintrc.js |
| Same architecture mistake repeated | Document decision | docs/conventions/architecture.md |
| Agent uses wrong import pattern | Add example | CLAUDE.md |
The Mindset Shift
Traditional: “I write code, AI helps”
AI-native: “I improve the workflow and context so AI writes better code”
“Software engineering might be more workflow + context engineering.” — Nick Tune
This is the meta-skill: instead of fixing code, fix the system that produces the code.
Inspired by Nick Tune’s Coding Agent Development Workflows
See also: §2.5 From Chatbot to Context System — the four-layer framework (CLAUDE.md, skills, hooks, memory) that makes this mindset operational.
9.11 Common Pitfalls & Best Practices
Learn from common mistakes to avoid frustration and maximize productivity.
Security Pitfalls
❌ Don’t:
- Use `--dangerously-skip-permissions` on production systems or sensitive codebases
- Hard-code secrets in commands, config files, or CLAUDE.md
- Grant overly broad permissions like `Bash(*)` without restrictions
- Run Claude Code with elevated privileges (sudo/Administrator) unnecessarily
- Commit `.claude/settings.local.json` to version control (contains API keys)
- Share session IDs or logs that may contain sensitive information
- Disable security hooks during normal development
✅ Do:
- Store secrets in environment variables or secure vaults
- Start from minimal permissions and expand gradually as needed
- Audit regularly with `claude config list` to review active permissions
- Isolate risky operations in containers, VMs, or separate environments
- Use `.gitignore` to exclude sensitive configuration files
- Review all diffs before accepting changes, especially in security-critical code
- Implement PreToolUse hooks to catch accidental secret exposure
- Use Plan Mode for exploring unfamiliar or sensitive codebases
Example Security Hook:
```bash
#!/bin/bash
# .claude/hooks/PreToolUse.sh - Block secrets in commits

INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool.name')

if [[ "$TOOL_NAME" == "Bash" ]]; then
  COMMAND=$(echo "$INPUT" | jq -r '.tool.input.command')

  # Block git commits with potential secrets
  if [[ "$COMMAND" == *"git commit"* ]] || [[ "$COMMAND" == *"git add"* ]]; then
    # Check for common secret patterns
    if git diff --cached | grep -E "(password|secret|api_key|token).*=.*['\"]"; then
      echo "❌ Potential secret detected in staged files" >&2
      exit 2  # Block the operation
    fi
  fi
fi

exit 0  # Allow
```
Section titled “Performance Pitfalls”❌ Don’t:
- Load entire monorepo when you only need one package
- Max out thinking/turn budgets for simple tasks (wastes time and money)
- Ignore session cleanup - old sessions accumulate and slow down Claude Code
- Use deep thinking prompts for trivial edits like typo fixes
- Keep context at 90%+ for extended periods
- Load large binary files or generated code into context
- Run expensive MCP operations in tight loops
✅ Do:
- Use `--add-dir` to allow tool access to directories outside the current working directory
- Manage thinking mode for cost efficiency:
  - Simple tasks: Alt+T to disable thinking → faster, cheaper
  - Complex tasks: Leave thinking enabled (default in Opus 4.5)
  - Note: Keywords like “ultrathink” no longer have effect
- Set `cleanupPeriodDays` in config to prune old sessions automatically (see the settings sketch below)
- Use `/compact` proactively when context reaches 70%
- Block sensitive files with `permissions.deny` in settings.json
- Monitor cost with `/status` and adjust model/thinking levels accordingly
- Cache expensive computations in memory with Serena MCP
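A minimal `.claude/settings.json` sketch combining the two configuration recommendations above, session pruning and file-level deny rules (the values and paths are illustrative; adjust to your project):

{
  "cleanupPeriodDays": 14,
  "permissions": {
    "deny": [
      "Read(./.env)",
      "Read(./.env.*)",
      "Read(./secrets/**)"
    ]
  }
}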
Context Management Strategy:
| Context Level | Action | Why |
|---|---|---|
| 0-50% | Work freely | Optimal performance |
| 50-70% | Be selective | Start monitoring |
| 70-85% | /compact now | Prevent degradation |
| 85-95% | /compact or /clear | Significant slowdown |
| 95%+ | /clear required | Risk of errors |
Workflow Pitfalls
Section titled “Workflow Pitfalls”❌ Don’t:
- Skip project context (CLAUDE.md) - leads to repeated corrections
- Use vague prompts like “fix this” or “check my code”
- Ignore errors in logs or dismiss warnings
- Automate workflows without testing in safe environments first
- Accept changes blindly without reviewing diffs
- Work without version control or backups
- Mix multiple unrelated tasks in one session
- Forget to commit after completing tasks
✅ Do:
- Maintain and update CLAUDE.md regularly with:
  - Tech stack and versions
  - Coding conventions and patterns
  - Architecture decisions
  - Common gotchas specific to your project
- Be specific and goal-oriented in prompts using WHAT/WHERE/HOW/VERIFY format
- Monitor via logs or OpenTelemetry when appropriate
- Test automation in dev/staging environments first
- Always review agent outputs before accepting — especially polished ones (see Artifact Paradox below)
- Use git branches for experimental changes
- Break complex tasks into focused sessions
- Commit frequently with descriptive messages
⚠️ The Artifact Paradox — Anthropic AI Fluency Index (Feb 2026)
Anthropic research on 9,830 Claude conversations reveals a critical counter-intuitive finding: when Claude produces a polished artifact (code, files, configs), users become measurably less critical, not more.
Compared to sessions without artifact production:
- −5.2pp likelihood of identifying missing context
- −3.7pp likelihood of fact-checking the output
- −3.1pp likelihood of questioning the reasoning
Users do become more directive (+14.7pp clarifying goals, +14.5pp specifying format) — but their critical evaluation drops precisely when the output looks finished.
For Claude Code, this is the everyday case. Every generated file, every written test, every created config is an artifact. The polished compile-and-run output is exactly when you should apply the most scrutiny — not the least.
Counter-measures:
- Run tests before accepting generated code, not after
- Explicitly ask: “What edge cases or requirements did you not address?”
- Use the `output-validator` hook for automated checks (see the sketch below)
- Apply the VERIFY step of the WHAT/WHERE/HOW/VERIFY format even when output looks complete
- In Plan Mode: challenge the plan before executing, not after seeing the result
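A minimal sketch of what such a validation hook can look like: a hook that runs the test suite after any file-writing tool call. The script name is illustrative and the shipped `output-validator` template may differ; the input field names follow the guide's earlier hook example.

#!/bin/bash
# .claude/hooks/post-edit-validate.sh (illustrative, not a shipped template)
INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool.name')

# After any file-writing tool call, run the project's tests
if [[ "$TOOL_NAME" == "Write" || "$TOOL_NAME" == "Edit" ]]; then
  if ! npm test --silent; then
    echo "❌ Tests failed after edit - review before accepting" >&2
    exit 2 # Surface the failure instead of silently accepting the artifact
  fi
fi

exit 0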
Source: Swanson et al., “The AI Fluency Index”, Anthropic (2026-02-23) — anthropic.com/research/AI-fluency-index
📊 Visual: AI Fluency — High vs Low Fluency Paths
Effective Prompt Format:
## Task Template
**WHAT**: [Concrete deliverable - e.g., "Add email validation to signup form"]
**WHERE**: [File paths - e.g., "src/components/SignupForm.tsx"]
**HOW**: [Constraints/approach - e.g., "Use Zod schema, show inline errors"]
**VERIFY**: [Success criteria - e.g., "Empty email shows error, invalid format shows error, valid email allows submit"]

## Example

WHAT: Add input validation to the login form
WHERE: src/components/LoginForm.tsx, src/schemas/auth.ts
HOW: Use Zod schema validation, display errors inline below inputs
VERIFY:
- Empty email shows "Email required"
- Invalid email format shows "Invalid email"
- Empty password shows "Password required"
- Valid inputs clear errors and allow submission
Collaboration Pitfalls
Section titled “Collaboration Pitfalls”❌ Don’t:
- Commit personal API keys or local settings to shared repos
- Override team conventions in personal `.claude/` without discussion
- Use non-standard agents/skills without team alignment
- Modify shared hooks without testing across team
- Skip documentation for custom commands/agents
- Use different Claude Code versions across team without coordinating
✅ Do:
- Use `.gitignore` for `.claude/settings.local.json` and personal configs
- Document team-wide conventions in project CLAUDE.md (committed)
- Share useful agents/skills via team repository or wiki
- Test hooks in isolation before committing
- Maintain README for `.claude/agents/` and `.claude/commands/`
- Coordinate Claude Code updates and test compatibility
- Use consistent naming conventions for custom components
- Share useful prompts and patterns in team knowledge base
Recommended .gitignore:
# Claude Code - Personal
.claude/settings.local.json
.claude/CLAUDE.md
.claude/.serena/

# Claude Code - Team (committed)
# .claude/agents/
# .claude/commands/
# .claude/hooks/
# .claude/settings.json

# Environment
.env.local
.env.*.local
Codebase Structure Pitfalls
Section titled “Codebase Structure Pitfalls”❌ Don’t:
- Use abbreviated variable/function names (`usr`, `evt`, `calcDur`) - agents can’t find them
- Write obvious comments that waste tokens (`// Import React`)
- Keep large monolithic files (>500 lines) that agents must read in chunks
- Hide business logic in tribal knowledge - agents need explicit documentation
- Assume agents know your custom patterns without documentation (ADRs)
- Delegate test writing to agents - they’ll write tests that match their (potentially flawed) implementation
✅ Do:
- Use complete, searchable terms (`user`, `event`, `calculateDuration`)
- Add synonyms in comments for discoverability (“member, subscriber, customer”)
- Split large files by concern (validation, sync, business logic)
- Embed domain knowledge in CLAUDE.md, ADRs, and code comments
- Document custom architectures with Architecture Decision Records (ADRs)
- Write tests manually first (TDD), then have agents implement to pass tests
- Use standard design patterns (Singleton, Factory, Repository) that agents know from training
- Add cross-references between related modules
Agent-hostile example:
class UsrMgr {
  async getUsr(id: string) { /* ... */ }
}

Agent-friendly example:

/**
 * User account management service.
 * Also known as: member manager, subscriber service
 *
 * Related: user-repository.ts, auth-service.ts
 */
class UserManager {
  /**
   * Fetch user by ID. Returns null if not found.
   * Common use: authentication, profile rendering
   */
  async getUser(userId: string): Promise<User | null> { /* ... */ }
}

Comprehensive guide: For complete codebase optimization strategies including token efficiency, testing approaches, and guardrails, see Section 9.18: Codebase Design for Agent Productivity.
Cost Optimization Pitfalls
Section titled “Cost Optimization Pitfalls”❌ Don’t:
- Use Opus for simple tasks that Sonnet can handle
- Use deep thinking prompts for every task by default
- Ignore the cost metrics in `/status`
- Load entire codebase for focused tasks
- Re-analyze unchanged code repeatedly
✅ Do:
- Use OpusPlan mode: Opus for planning, Sonnet for execution
- Match model to task complexity:
- Haiku: Code review, simple fixes
- Sonnet: Most development tasks
- Opus: Architecture, complex debugging
- Monitor cost with `/status` regularly
- Set budget alerts if using API directly
- Use Serena memory to avoid re-analyzing code
- Leverage context caching with `/compact`
- Batch similar operations together
Cost-Effective Model Selection:
See Section 2.5 Model Selection & Thinking Guide for the canonical decision table with effort levels and cost estimates.
Learning & Adoption Pitfalls
Section titled “Learning & Adoption Pitfalls”❌ Don’t:
- Try to learn everything at once - overwhelming and inefficient
- Skip the basics and jump to advanced features
- Expect perfection from AI - it’s a tool, not magic
- Blame Claude for errors without reviewing your prompts
- Work in isolation without checking community resources
- Give up after first frustration
- Trust AI output without proportional verification - AI code has 1.75× more logic errors than human-written code (source). Match verification effort to risk level (see Section 1.7)
✅ Do:
- Follow progressive learning path:
- Week 1: Basic commands, context management
- Week 2: CLAUDE.md, permissions
- Week 3: Agents and commands
- Month 2+: MCP servers, advanced patterns
- Start with simple, low-risk tasks
- Iterate on prompts based on results
- Review this guide and community resources regularly
- Join Claude Code communities (Discord, GitHub discussions)
- Share learnings and ask questions
- Celebrate small wins and track productivity gains
Learning Checklist:
□ Week 1: Installation & Basic Usage
  □ Install Claude Code successfully
  □ Complete first task (simple edit)
  □ Understand context management (use /compact)
  □ Learn permission modes (try Plan Mode)

□ Week 2: Configuration & Memory
  □ Create project CLAUDE.md
  □ Set up .gitignore correctly
  □ Configure permissions in settings.local.json
  □ Use @file references effectively

□ Week 3-4: Customization
  □ Create first custom agent
  □ Create first custom command
  □ Set up at least one hook
  □ Explore one MCP server (suggest: Context7)

□ Month 2+: Advanced Patterns
  □ Implement Trinity pattern (Git + TodoWrite + Agent)
  □ Set up CI/CD integration
  □ Configure OpusPlan mode
  □ Build team workflow patterns
Enterprise Anti-Patterns (2026 Industry Data)
Section titled “Enterprise Anti-Patterns (2026 Industry Data)”Based on Anthropic research across 5000+ organizations, these anti-patterns emerged as the most costly mistakes in agentic coding adoption.
❌ Over-Delegation (>5 Agents)
Section titled “❌ Over-Delegation (>5 Agents)”Symptom: Context switching cost exceeds productivity gain
Example:
Team spawns 10 agents simultaneously:
- 6 agents blocked waiting for each other
- 3 agents working on conflicting changes
- 1 agent actually productive
→ Net result: Slower than 2 well-coordinated agents

Why it fails: Coordination overhead grows quadratically (N agents = N² potential conflicts)
✅ Fix:
- Start with 2-3 agents maximum
- Measure productivity gain before scaling
- Anthropic data: Sweet spot = 3-5 agents for most teams
- Boris Cherny (creator): 5-15 agents, but with ideal architecture + resources
❌ Premature Automation
Section titled “❌ Premature Automation”Symptom: Automating workflow not mastered manually first
Example:
Team automates PR review before:
- Understanding what good reviews look like
- Having manual review checklist
- Testing on 10+ PRs manually
→ Automated garbage (agent reproduces poor manual practices)

Why it fails: AI amplifies existing patterns (garbage in = garbage out)
✅ Fix:
- Manual → Semi-auto → Full-auto (progressive)
- Document manual process first (becomes CLAUDE.md rules)
- Test automation on 20+ examples before full rollout
- Anthropic finding: 60% use AI, but only 0-20% fully delegate (collaboration ≠ replacement)
❌ Tool Sprawl (>10 MCP Servers)
Section titled “❌ Tool Sprawl (>10 MCP Servers)”Symptom: Maintenance burden, version conflicts, debugging hell
Example:
Project has 15 MCP servers:
- 8 unused (installed for one-off task)
- 4 duplicative (3 different doc lookup servers)
- 2 conflicting (competing file search implementations)
- 1 actually needed daily
→ Startup time: 45 seconds, frequent crashes

Why it fails: Each MCP server = additional failure point, dependency, configuration
✅ Fix:
- Start core stack: Serena (symbols), Context7 (docs), Sequential (reasoning)
- Add selectively: One MCP server at a time, measure value
- Audit quarterly: Remove unused servers (`/mcp list` → usage stats; see the sketch below)
- Anthropic team pattern: CLI/scripts over MCP unless bidirectional communication needed
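The quarterly audit can be as simple as listing configured servers and removing the ones nobody has used (a sketch using the `claude mcp` CLI; the server name here is hypothetical):

# List configured MCP servers and their status
claude mcp list

# Remove a server the team no longer uses
claude mcp remove unused-doc-server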
❌ Ignoring Collaboration Paradox
Section titled “❌ Ignoring Collaboration Paradox”Symptom: Expecting 100% delegation, frustrated by constant supervision needed
Example:
Engineer assumes "AI writes code, I review":
- Reality: Constant clarification questions
- Reality: Edge cases require human judgment
- Reality: Architecture decisions still need human input
→ Burnout from micromanaging instead of collaborating

Why it fails: Current AI state = collaboration tool, not autonomous replacement
✅ Fix:
- Accept 60% AI usage, 0-20% full delegation as normal (Anthropic data)
- Design workflows for collaboration, not delegation
- Use AI for: Easily verifiable, well-defined, repetitive tasks
- Keep human: High-level design, organizational context, “taste” decisions
❌ No ROI Measurement
Section titled “❌ No ROI Measurement”Symptom: Scaling spend without tracking productivity gain
Example:
Team increases from 3 to 10 Claude instances:
- Monthly cost: $500 → $2,000
- Measured output: ??? (no tracking)
- Actual gain: Unclear if positive ROI
→ CFO asks "Why $2K/month?" → No answer → Budget cut

Why it fails: Can’t optimize what you don’t measure
✅ Fix:
- Track baseline: PRs/week, features shipped/month, bugs fixed/sprint
- Measure after scaling: Same metrics
- Calculate ROI: (Productivity gain × engineer hourly rate) - Claude cost
- Anthropic validation: 67% more PRs merged/day = measurable productivity
- Share metrics with leadership (justify budget, demonstrate value)
Quick Reference: Avoiding Anti-Patterns
Section titled “Quick Reference: Avoiding Anti-Patterns”| Anti-Pattern | Limit | Measurement | Fix Trigger |
|---|---|---|---|
| Over-delegation | >5 agents | Coordination overhead | Reduce to 2-3, measure |
| Tool sprawl | >10 MCP servers | Startup time, crashes | Quarterly audit, remove unused |
| Premature automation | - | Manual process unclear | Document → Test → Automate |
| No ROI tracking | - | Can’t answer “What gain?” | Baseline → Measure → Optimize |
Industry benchmark (Anthropic 2026):
- 3-6 months adoption timeline for Agent Teams
- $500-1K/month cost for Multi-Instance (positive ROI at >3 instances)
- 27% new work (wouldn’t be done without AI) = harder to measure but valuable
9.12 Git Best Practices & Workflows
Section titled “9.12 Git Best Practices & Workflows”Effective git workflows with Claude Code for professional development.
Commit Message Best Practices
Section titled “Commit Message Best Practices”Claude Code generates commit messages automatically. Guide it with clear context.
Default behavior:
# After changes, Claude creates commits like:
git commit -m "feat: add user authentication middleware

- Implement JWT validation
- Add session management
- Create auth error handling

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"
AI Attribution Standards
Section titled “AI Attribution Standards”For comprehensive AI code attribution beyond Co-Authored-By, including:
- LLVM’s `Assisted-by:` trailer standard (example below)
- git-ai checkpoint tracking
- Team and enterprise compliance patterns
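For instance, an LLVM-style trailer can be added as a second `-m` flag, which git turns into a trailer paragraph (a sketch; check the exact trailer wording your project's policy requires):

git commit -m "fix: handle expired session tokens" \
  -m "Assisted-by: Claude Code"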
Customize commit style in CLAUDE.md:
## Git Commit Conventions
Follow Conventional Commits format:
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- refactor: Code restructuring
- test: Test additions/changes
- chore: Maintenance tasks

Keep first line under 72 characters.
Include ticket number: [PROJ-123]

Example with guidance:
You: "Implement login endpoint"
Claude: Creates commit:
git commit -m "feat: implement user login endpoint [AUTH-42]

- Add POST /api/auth/login route
- Validate credentials with bcrypt
- Generate JWT tokens
- Return user profile on success

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"
Git Amend Workflow
Section titled “Git Amend Workflow”When to amend:
- Fix typos in recent commit
- Add forgotten files to last commit
- Improve commit message
- Only for unpushed commits
Safe amend pattern:
# Check if commit is unpushed
git status
# Output: "Your branch is ahead of 'origin/main' by 1 commit"

# Amend the commit
You: "Add the missing test file to the last commit"

Claude:
git add tests/auth.test.ts
git commit --amend --no-edit

❌ Never amend pushed commits (requires force push, breaks team history)
✅ For pushed commits, create new commit:
You: "Fix typo in authentication function"
Claude:# Creates new commit instead of amendinggit commit -m "fix: correct typo in authenticateUser function"Branch Management
Section titled “Branch Management”Feature branch pattern:
## CLAUDE.md Git Workflow
Always work on feature branches:
1. Create branch from main: git checkout -b feature/name
2. Make changes and commit regularly
3. Push to remote: git push -u origin feature/name
4. Create pull request when ready
5. Never commit directly to main/master

Branch naming conventions:

feature/user-authentication
fix/login-validation-bug
refactor/api-endpoints
docs/update-readme
chore/dependency-updates

Claude Code with branches:
# Start new feature
You: "Create a new branch for adding search functionality"

Claude:
git checkout -b feature/search-functionality

# Work on feature...
# ...

# When complete
You: "Create pull request for search feature"

Claude:
git push -u origin feature/search-functionality
# Opens PR creation in browser
Rewind vs Revert
Section titled “Rewind vs Revert”/rewind (local undo):
- Undoes Claude’s recent changes in current session
- Does NOT create git commits
- Works only for uncommitted changes
- Use when: Claude made a mistake, you want to try different approach
Example:
You: "Add email validation to login form"Claude: [Makes changes]You: [Reviews diff] "This breaks the existing flow"/rewind# Changes are undone, back to previous stateYou: "Add email validation but preserve existing flow"git revert (committed changes):
- Creates new commit that undoes previous commit
- Safe for pushed commits (preserves history)
- Use when: Need to undo committed changes
Example:
You: "Revert the authentication changes from the last commit"
Claude:
git revert HEAD
# Creates new commit: "Revert 'feat: add authentication'"

Decision tree:

Changes not committed yet? → Use /rewind
Changes committed but not pushed? → Use git reset (careful!)
Changes committed and pushed? → Use git revert
Git Worktrees for Parallel Development
Section titled “Git Worktrees for Parallel Development”What are worktrees?
Git worktrees (available since Git 2.5.0, July 2015) create multiple working directories from the same repository, each checked out to a different branch.
Traditional workflow problem:
# Working on feature A
git checkout feature-a
# 2 hours of work...

# Urgent hotfix needed
git stash              # Save current work
git checkout main
git checkout -b hotfix
# Fix the bug...
git checkout feature-a
git stash pop          # Resume work

Worktree solution:
# One-time setup
git worktree add ../myproject-hotfix hotfix
git worktree add ../myproject-feature-a feature-a

# Now work in parallel
cd ../myproject-hotfix     # Terminal 1
claude                     # Fix the bug

cd ../myproject-feature-a  # Terminal 2
claude                     # Continue feature work

When to use worktrees:
✅ Use worktrees when:
- Working on multiple features simultaneously
- Need to test different approaches in parallel
- Reviewing code while developing
- Running long CI/CD builds while coding
- Maintaining multiple versions (v1 support + v2 development)
❌ Don’t use worktrees when:
- Simple branch switching is sufficient
- Disk space is limited (each worktree = full working directory)
- Team is unfamiliar with worktrees (adds complexity)
Worktree lifecycle commands:
The full worktree lifecycle is covered by 4 companion commands:
| Command | Purpose |
|---|---|
/git-worktree | Create worktree with branch validation, symlinked deps, background checks |
/git-worktree-status | Check background verification tasks (type check, tests, build) |
/git-worktree-remove | Safely remove single worktree with merge checks and DB cleanup |
/git-worktree-clean | Batch cleanup of stale worktrees with disk usage report |
# Create with auto-prefix and symlinked node_modules
You: "/git-worktree auth"
# → Creates feat/auth branch, symlinks node_modules, runs checks in background

# Check background verification status
You: "/git-worktree-status"
# → Type check: PASS, Tests: PASS (142 tests)

# Remove after merge
You: "/git-worktree-remove feat/auth"
# → Removes worktree + branch (local + remote) + DB cleanup reminder

# Batch cleanup of all merged worktrees
You: "/git-worktree-clean --dry-run"
# → Preview: 3 merged (4.2 MB), 1 unmerged (kept)

💡 Tip — Symlink node_modules: The `/git-worktree` command symlinks `node_modules` from the main worktree by default, saving ~30s per worktree creation and significant disk space. Use `--isolated` when you need fresh dependencies (e.g., testing upgrades).
Worktree management:
# List all worktrees
git worktree list

# Remove worktree (after merging feature)
git worktree remove .worktrees/feature/new-api

# Cleanup stale worktree references
git worktree prune

💡 Team tip — Shell aliases for fast worktree navigation: The Claude Code team uses single-letter aliases to hop between worktrees instantly:

# ~/.zshrc or ~/.bashrc
alias za="cd .worktrees/feature-a"
alias zb="cd .worktrees/feature-b"
alias zc="cd .worktrees/feature-c"
alias zlog="cd .worktrees/analysis"  # Dedicated worktree for logs & queries

The dedicated “analysis” worktree is used for reviewing logs and running database queries without polluting active feature branches.
Claude Code context in worktrees:
Each worktree maintains independent Claude Code context:
# Terminal 1 - Worktree A
cd .worktrees/feature-a
claude
You: "Implement user authentication"
# Claude indexes feature-a worktree

# Terminal 2 - Worktree B (simultaneous)
cd .worktrees/feature-b
claude
You: "Add payment integration"
# Claude indexes feature-b worktree (separate context)

Memory files with worktrees:
- Global memory (`~/.claude/CLAUDE.md`): Shared across all worktrees
- Project memory (repo root `CLAUDE.md`): Committed, shared
- Worktree-local memory (`.claude/CLAUDE.md` in worktree): Specific to that worktree
Recommended structure:
~/projects/
├── myproject/              # Main worktree (main branch)
│   ├── CLAUDE.md           # Project conventions (committed)
│   └── .claude/
├── myproject-develop/      # develop branch worktree
│   └── .claude/            # Develop-specific config
├── myproject-feature-a/    # feature-a branch worktree
│   └── .claude/            # Feature A context
└── myproject-hotfix/       # hotfix branch worktree
    └── .claude/            # Hotfix context

Best practices:
1. Name worktrees clearly:

   # Bad
   git worktree add ../temp feature-x
   # Good
   git worktree add ../myproject-feature-x feature-x

2. Add to .gitignore:

   # Worktree directories
   .worktrees/
   worktrees/

3. Clean up merged branches:

   git worktree remove myproject-feature-x
   git branch -d feature-x            # Delete local branch after merge
   git push origin --delete feature-x # Delete remote branch

4. Use consistent location:
   - .worktrees/ (hidden, in project root)
   - worktrees/ (visible, in project root)
   - ../myproject-* (sibling directories)

5. Don’t commit worktree contents:
   - Always ensure worktree directories are in .gitignore
   - The /git-worktree command verifies this automatically
Advanced: Parallel testing pattern:
# Test feature A while working on feature B
cd .worktrees/feature-a
npm test -- --watch &   # Run tests in background

cd .worktrees/feature-b
claude                  # Continue development
You: "Add new API endpoint"
# Tests for feature A still running in parallel

Worktree troubleshooting:
Problem: Worktree creation fails with “already checked out”
# Solution: You can't check out the same branch in multiple worktrees
git worktree list  # See which branches are checked out
# Use a different branch or remove the existing worktree first

Problem: Disk space issues

# Each worktree is a full working directory
# Solution: Clean up unused worktrees regularly
git worktree prune

Problem: Can’t delete worktree directory

# Solution: Use git worktree remove, not rm -rf
git worktree remove --force .worktrees/old-feature

Resources:
- Git Worktree Documentation
- Worktree lifecycle commands:
  - examples/commands/git-worktree.md — Create
  - examples/commands/git-worktree-status.md — Status
  - examples/commands/git-worktree-remove.md — Remove
  - examples/commands/git-worktree-clean.md — Clean
Claude Code Native Worktree Features (v2.1.49–v2.1.50)
Section titled “Claude Code Native Worktree Features (v2.1.49–v2.1.50)”Claude Code has built-in worktree integration beyond the manual git worktree workflow above.
Start Claude in an isolated worktree
Section titled “Start Claude in an isolated worktree”# --worktree / -w flag: creates a temporary worktree based on HEADclaude --worktreeclaude -wThe worktree is created automatically, Claude runs inside it, and it is cleaned up on exit (if no changes were made).
Declarative isolation in agent definitions
Section titled “Declarative isolation in agent definitions”Set isolation: "worktree" in an agent’s frontmatter to automatically spawn it in a fresh worktree every time (v2.1.50+):
---
name: refactoring-agent
description: Large-scale refactors that must not pollute the main working tree
model: opus
isolation: "worktree" # Each invocation gets its own isolated checkout
---

Perform the requested refactoring. Commit your changes inside the worktree.

This replaces the earlier pattern of manually passing isolation: "worktree" to each Task tool call.
Custom VCS setup with hook events (v2.1.50+)
Section titled “Custom VCS setup with hook events (v2.1.50+)”Two new hook events fire around agent worktree lifecycle:
| Event | Fires | Use case |
|---|---|---|
WorktreeCreate | When an agent worktree is created | Set up DB branch, copy .env, install deps |
WorktreeRemove | When an agent worktree is torn down | Clean up DB branch, delete temp credentials |
{ "hooks": { "WorktreeCreate": [ { "matcher": "", "hooks": [ { "type": "command", "command": "scripts/worktree-setup.sh $CLAUDE_WORKTREE_PATH" } ] } ], "WorktreeRemove": [ { "matcher": "", "hooks": [ { "type": "command", "command": "scripts/worktree-teardown.sh $CLAUDE_WORKTREE_PATH" } ] } ] }}Typical worktree-setup.sh: create a Neon/PlanetScale DB branch, copy .env.local, run npm install.
Enterprise config auditing with ConfigChange (v2.1.49+)
Section titled “Enterprise config auditing with ConfigChange (v2.1.49+)”The ConfigChange hook fires whenever a configuration file changes during a session. Use it to audit or block unauthorized live configuration modifications — particularly useful in enterprise environments with managed policy hooks.
{ "hooks": { "ConfigChange": [ { "matcher": "", "hooks": [ { "type": "command", "command": "scripts/audit-config-change.sh" } ] } ] }}Example audit-config-change.sh (log + optionally block):
#!/bin/bash# Receives JSON on stdin with changed config pathCONFIG=$(cat | jq -r '.config_path // "unknown"')echo "[ConfigChange] $(date -u +%Y-%m-%dT%H:%M:%SZ) $CONFIG" >> ~/.claude/logs/config-audit.log# Exit 2 to block the change, exit 0 to allow itexit 0Enterprise note:
disableAllHooks(v2.1.49+) can no longer bypass managed hooks — hooks set via organizational policy always run regardless of this setting. Only non-managed hooks are affected.
Database Branch Isolation with Worktrees
Section titled “Database Branch Isolation with Worktrees”Modern pattern (2024+): Combine git worktrees with database branches for true feature isolation.
The Problem:
Traditional workflow:
Git branch → Shared dev database → Schema conflicts → Migration hell

The Solution:

Modern workflow:
Git worktree + DB branch → Isolated environments → Safe experimentation

How it works:

# 1. Create worktree (standard)
/git-worktree feature/auth

# 2. Claude detects your database and suggests:
🔍 Detected Neon database
💡 DB Isolation: neonctl branches create --name feature-auth --parent main
   Then update .env with new DATABASE_URL

# 3. You run the commands (or skip if not needed)
# 4. Work in isolated environment

Provider detection:
The /git-worktree command automatically detects:
- Neon → Suggests `neonctl branches create`
- PlanetScale → Suggests `pscale branch create`
- Supabase → Notes lack of branching support
- Local Postgres → Suggests schema-based isolation (sketch below)
- Other → Reminds about isolation options
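For the local-Postgres case, schema-based isolation can be as simple as one schema per feature branch (a sketch; database and schema names are illustrative):

# Create a schema for the feature branch
psql myapp_dev -c 'CREATE SCHEMA IF NOT EXISTS feature_auth;'

# Point the app at it via search_path in the connection string
export DATABASE_URL="postgres://localhost/myapp_dev?options=-csearch_path%3Dfeature_auth"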
When to create DB branch:
| Scenario | Create Branch? |
|---|---|
| Adding database migrations | ✅ Yes |
| Refactoring data model | ✅ Yes |
| Bug fix (no schema change) | ❌ No |
| Performance experiments | ✅ Yes |
Prerequisites:
# For Neon:
npm install -g neonctl
neonctl auth

# For PlanetScale:
brew install pscale
pscale auth login

# For all providers:
# Ensure .worktreeinclude contains .env
echo ".env" >> .worktreeinclude
echo ".env.local" >> .worktreeinclude

Complete workflow:
# 1. Create worktree
/git-worktree feature/payments

# 2. Follow suggestion to create DB branch
cd .worktrees/feature-payments
neonctl branches create --name feature-payments --parent main

# 3. Update .env with new DATABASE_URL
# (Get connection string from neonctl output)

# 4. Work in isolation
npx prisma migrate dev
pnpm test

# 5. After PR merge, cleanup
git worktree remove .worktrees/feature-payments
neonctl branches delete feature-payments

See also:
- Database Branch Setup Guide - Complete provider-specific workflows
- Neon Branching - Official Neon documentation
- PlanetScale Branching - Official PlanetScale guide
Coordinating Parallel Worktrees: Task Dependencies
Section titled “Coordinating Parallel Worktrees: Task Dependencies”When running multiple agents in parallel worktrees, the hardest problem isn’t setup — it’s coordination. There is no built-in automatic dependency detection between worktree agents. You manage it explicitly.
The pattern: analyze files touched, then set blockedBy manually
Before spawning parallel agents, identify which tasks share files:
# Quick dependency check: list files each task will touch
echo "Task A (auth feature):"
grep -r "UserService\|auth/" src/ --include="*.ts" -l

echo "Task B (payment feature):"
grep -r "PaymentService\|billing/" src/ --include="*.ts" -l

# No overlap? Safe to parallelize.
# Overlap detected? Sequence them.

In the Tasks API, set blockedBy for tasks that depend on others completing first:
// Task B cannot start until Task A merges
TaskCreate("Implement payment service", { blockedBy: ["task-a-id"] })

Decision matrix:
| Scenario | Strategy |
|---|---|
| Tasks touch different files, different modules | Parallelize freely |
| Tasks touch same module, different files | Parallelize with explicit conflict resolution step |
| Tasks touch same files | Sequence them |
| Task B needs Task A’s API contract | Block Task B until Task A’s interface is defined |
Practical rule: A 5-minute analysis to find file overlaps before spawning agents saves hours of merge conflict resolution.
Tooling: coderabbitai/git-worktree-runner provides a bash-based worktree manager with basic AI tool integration. It handles the worktree lifecycle but not dependency detection — that stays manual.
Note: Fully automatic dependency detection (where the system infers which tasks conflict) doesn’t exist in Claude Code or the broader ecosystem as of March 2026. The approaches above are the practical state of the art.
9.13 Cost Optimization Strategies
Section titled “9.13 Cost Optimization Strategies”Practical techniques to minimize API costs while maximizing productivity.
Model Selection Matrix
Section titled “Model Selection Matrix”Choose the right model for each task to balance cost and capability.
See Section 2.5 Model Selection & Thinking Guide for the canonical decision table with effort levels and cost estimates.
OpusPlan mode (recommended):
- Planning: Opus for high-level thinking
- Execution: Sonnet for implementation
- Best of both worlds: Strategic thinking + cost-effective execution
# Activate OpusPlan mode
/model opusplan

# Enter Plan Mode (Opus for planning)
Shift+Tab × 2

You: "Design a caching layer for the API"
# Opus creates detailed architectural plan

# Exit Plan Mode (Sonnet for execution)
Shift+Tab

You: "Implement the caching layer following the plan"
# Sonnet executes the plan at lower cost
Token-Saving Techniques
Section titled “Token-Saving Techniques”Important: Claude Code uses lazy loading - it doesn’t “load” your entire codebase at startup. Files are read on-demand when you ask Claude to analyze them. The main context consumers at startup are your CLAUDE.md files and auto-loaded rules.
CLAUDE.md Token Cost Estimation:
| File Size | Approximate Tokens | Impact |
|---|---|---|
| 50 lines | 500-1,000 tokens | Minimal (recommended) |
| 100 lines | 1,000-2,000 tokens | Acceptable |
| 200 lines | 2,000-3,500 tokens | Upper limit |
| 500+ lines | 5,000+ tokens | Consider splitting |
Note: These are loaded once at session start, not per request. A 200-line CLAUDE.md costs ~2K tokens upfront but doesn’t grow during the session. The concern is the cumulative effect when combined with multiple @includes and all files in .claude/rules/.
Important: Beyond file size, context files containing non-essential information (style guides, architecture descriptions, general conventions) add +20-23% inference cost per session regardless of line count — because agents process and act on every instruction. (Gloaguen et al., 2026)
See also: Memory Loading Comparison for when each method loads.
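Using the guide's own heuristic of roughly 4 characters per token, you can sanity-check a CLAUDE.md's startup cost from the shell (a quick approximation, not an exact tokenizer):

# Approximate token count of CLAUDE.md (chars / 4)
echo "$(( $(wc -c < CLAUDE.md) / 4 )) tokens (approx.)"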
1. Keep CLAUDE.md files concise:
# ❌ Bloated CLAUDE.md (wastes tokens on every session)
- 500+ lines of instructions
- Multiple @includes importing other files
- Rarely-used guidelines

# ✅ Lean CLAUDE.md
- Essential project context only (<200 lines)
- Move specialized rules to .claude/rules/ (auto-loaded at session start)
- Split by concern: team rules in project CLAUDE.md, personal prefs in ~/.claude/CLAUDE.md
2. Use targeted file references:
# ❌ Vague request (Claude reads many files to find context)
"Fix the authentication bug"

# ✅ Specific request (Claude reads only what's needed)
"Fix the JWT validation in @src/auth/middleware.ts line 45"

3. Compact proactively:

# ❌ Wait until 90% context
/status  # Context: 92% - Too late, degraded performance

# ✅ Compact at 70%
/status  # Context: 72%
/compact # Frees up context, maintains performance

4. Agent specialization:
---
name: test-writer
description: Generate unit tests (use for test generation only)
model: haiku
---

Generate comprehensive unit tests with edge cases.

Benefits:
- Haiku costs less than Sonnet
- Focused context (tests only)
- Faster execution
5. Batch similar operations:
# ❌ Individual sessions for each fix
claude -p "Fix typo in auth.ts"
claude -p "Fix typo in user.ts"
claude -p "Fix typo in api.ts"

# ✅ Batch in single session
claude
You: "Fix typos in auth.ts, user.ts, and api.ts"
# Single context load, multiple fixes
Command Output Optimization with RTK
Section titled “Command Output Optimization with RTK”RTK (Rust Token Killer) filters bash command outputs before they reach Claude’s context, achieving 60-90% token reduction across git, testing, and development workflows. 446 stars, 38 forks, 700+ upvotes on r/ClaudeAI.
Repository: rtk-ai/rtk | Website: rtk-ai.app
Installation:
# Option 1: Homebrew (macOS/Linux)
brew install rtk-ai/tap/rtk

# Option 2: Cargo (all platforms)
cargo install rtk

# Option 3: Install script
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/main/install.sh | bash

# Verify installation
rtk --version # v0.16.0+

Proven Token Savings (Benchmarked on v0.2.0):
| Command | Baseline | RTK | Reduction |
|---|---|---|---|
rtk git log | 13,994 chars | 1,076 chars | 92.3% |
rtk git status | 100 chars | 24 chars | 76.0% |
rtk git diff | 15,815 chars | 6,982 chars | 55.9% |
rtk vitest run | ~50,000 chars | ~5,000 chars | 90.0% |
rtk pnpm list | ~8,000 chars | ~2,400 chars | 70.0% |
rtk cat CHANGELOG.md | 163,587 chars | 61,339 chars | 62.5% |
Average: 60-90% token reduction depending on commands
Key Features (v0.16.0):
# Git operations
rtk git log
rtk git status
rtk git diff HEAD~1

# JS/TS Stack
rtk vitest run            # Test results condensed
rtk pnpm list             # Dependency tree optimized
rtk prisma migrate status # Migration status filtered

# Python
rtk python pytest         # Python test output condensed

# Go
rtk go test               # Go test results filtered

# Rust
rtk cargo test            # Cargo test output condensed
rtk cargo build           # Build output filtered
rtk cargo clippy          # Lints grouped by severity

# Project Setup & Learning
rtk init                  # Initialize RTK in a project (hook-first install)
rtk tree                  # Project structure condensed
rtk learn                 # Interactive RTK learning

# Analytics
rtk gain                  # Token savings dashboard (SQLite tracking)
rtk discover              # Find missed optimization opportunities

Real-World Impact:
30-minute Claude Code session:
- Without RTK: ~150K tokens (10-15 git commands @ ~10K tokens each)
- With RTK: ~41K tokens (10-15 git commands @ ~2.7K tokens each)
- Savings: 109K tokens (72.6% reduction)

Integration Strategies:
-
Hook-first install (recommended):
rtk init # Sets up PreToolUse hook automatically
CLAUDE.md instruction (manual wrapper):
## Token OptimizationUse RTK for all supported commands:- `rtk git log` (92.3% reduction)- `rtk git status` (76.0% reduction)- `rtk git diff` (55.9% reduction) -
Skill (auto-suggestion):
- Template:
examples/skills/rtk-optimizer/SKILL.md - Detects high-verbosity commands
- Suggests RTK wrapper automatically
- Template:
-
Hook (automatic wrapper):
- Template:
examples/hooks/bash/rtk-auto-wrapper.sh - PreToolUse hook intercepts bash commands
- Applies RTK wrapper when beneficial
- Template:
Recommendation:
- ✅ Use RTK: Full-stack projects (JS/TS, Rust, Python, Go), testing workflows, analytics
- ❌ Skip RTK: Small outputs (<100 chars), quick exploration, interactive commands
See also:
- Evaluation:
docs/resource-evaluations/rtk-evaluation.md - Templates:
examples/{claude-md,skills,hooks}/rtk-* - GitHub: https://github.com/rtk-ai/rtk
- Website: https://www.rtk-ai.app/
- Third-party tools comparison:
guide/third-party-tools.md#rtk-rust-token-killer
Cost Tracking
Section titled “Cost Tracking”Monitor cost with /status:
/status
# Output:Model: Sonnet | Ctx: 45.2k | Cost: $1.23 | Ctx(u): 42.0%Set budget alerts (API usage):
# If using Anthropic API directlyimport anthropic
client = anthropic.Anthropic()
# Track spendingresponse = client.messages.create( model="claude-sonnet-4-5", max_tokens=1024, messages=[...], metadata={ "user_id": "user_123", "project": "api_development" })
# Log cost per requestcost = calculate_cost(response.usage)if cost > BUDGET_THRESHOLD: alert_team(f"Budget threshold exceeded: ${cost}")Session cost limits:
## CLAUDE.md - Cost Awareness
**Budget-conscious mode:**
- Use Haiku for reviews and simple tasks
- Reserve Sonnet for feature work
- Use Opus only for critical decisions
- Compact context at 70% to avoid waste
- Close sessions after task completion
Economic Workflows
Section titled “Economic Workflows”Pattern 1: Haiku for tests, Sonnet for implementation
# Terminal 1: Test generation (Haiku)
claude --model haiku
You: "Generate tests for the authentication module"
# Terminal 2: Implementation (Sonnet)
claude --model sonnet
You: "Implement the authentication module"

Pattern 2: Progressive model escalation
# Start with Haiku
claude --model haiku
You: "Review this code for obvious issues"
# If complex issues found, escalate to Sonnet
/model sonnet
You: "Deep analysis of the race condition"
# If architectural issue, escalate to Opus
/model opus
You: "Redesign the concurrency model"

Pattern 3: Context reuse
# Build context once, reuse for multiple tasks
claude
You: "Analyze the authentication flow"
# Context built: ~20k tokens
# Same session - context already loaded
You: "Now add 2FA to the authentication flow"
# No context rebuild needed
You: "Generate tests for the 2FA feature"# Still same context
# Commit when done
You: "Create commit for 2FA implementation"

Token Calculation Reference
Section titled “Token Calculation Reference”Input tokens:
- Source code loaded into context
- Conversation history
- Memory files (CLAUDE.md)
- Agent/skill instructions
Output tokens:
- Claude’s responses
- Generated code
- Explanations
Rough estimates:
- 1 token ≈ 0.75 words (English)
- 1 token ≈ 4 characters
- Average function: 50-200 tokens
- Average file (500 LOC): 2,000-5,000 tokens
Example calculation:
Context loaded:
- 10 files × 500 LOC × 4 tokens/LOC = 20,000 tokens
- Conversation history: 5,000 tokens
- CLAUDE.md: 1,000 tokens
Total input: 26,000 tokens

Claude response:
- Generated code: 500 LOC × 4 = 2,000 tokens
- Explanation: 500 tokens
Total output: 2,500 tokens

Total cost per request: (26,000 + 2,500) tokens × model price

Sonnet pricing (approximate):
- Input: $3 per million tokens
- Output: $15 per million tokens
Session cost:
Input:  26,000 × $3 / 1,000,000 = $0.078
Output: 2,500 × $15 / 1,000,000 = $0.0375
Total:  ~$0.12 per interaction
Cost Optimization Checklist
Section titled “Cost Optimization Checklist”
Daily practices:
□ Use /status to monitor context and cost
□ Compact at 70% context usage
□ Close sessions after task completion
□ Use `permissions.deny` to block sensitive files

Model selection:
□ Default to Sonnet for most work
□ Use Haiku for reviews and simple fixes
□ Reserve Opus for architecture and critical debugging
□ Try OpusPlan mode for strategic work

Context management:
□ Use specific file references (@path/to/file.ts)
□ Batch similar tasks in single session
□ Reuse context for multiple related tasks
□ Create specialized agents with focused context

Team practices:
□ Share cost-effective patterns in team wiki
□ Track spending per project
□ Set budget alerts for high-cost operations
□ Review cost metrics in retrospectives
Alternative: Flat-Rate via Copilot Pro+
Section titled “Alternative: Flat-Rate via Copilot Pro+”For heavy usage, consider cc-copilot-bridge to route requests through GitHub Copilot Pro+ ($10/month flat) instead of per-token billing.
# Switch to Copilot mode (flat rate)
ccc # Uses Copilot Pro+ subscription

# Back to direct Anthropic (per-token)
ccd # Uses ANTHROPIC_API_KEY

When this makes sense:
- You’re hitting rate limits frequently
- Monthly costs exceed $50-100
- You already have Copilot Pro+ subscription
See Section 11.2: Multi-Provider Setup for full details.
Advanced: Cost-Aware CI/CD
Section titled “Advanced: Cost-Aware CI/CD”name: Claude Code Review
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Use Haiku for cost-effective reviews
      - name: Run Claude review
        run: |
          claude --model haiku \
            -p "Review changes for security and style issues" \
            --add-dir src/ \
            --output-format json > review.json

      # Only escalate to Sonnet if issues found
      - name: Deep analysis (if needed)
        if: ${{ contains(steps.*.outputs.*, 'CRITICAL') }}
        run: |
          claude --model sonnet \
            -p "Detailed analysis of critical issues found" \
            --add-dir src/
Haiku review (per PR):  ~$0.02
Sonnet review (per PR): ~$0.10
Opus review (per PR):   ~$0.50

With 100 PRs/month:
- Haiku: $2/month
- Sonnet: $10/month
- Opus: $50/month

Smart escalation (Haiku → Sonnet for 10% of PRs):
- Base cost: $2 (Haiku for all)
- Escalation: $1 (Sonnet for 10%)
- Total: $3/month (vs $10 or $50)
Cost vs Productivity Trade-offs
Section titled “Cost vs Productivity Trade-offs”Don’t be penny-wise, pound-foolish:
❌ False economy:
- Spending 2 hours manually debugging to save $1 in API costs
- Using Haiku for complex tasks, generating incorrect code
- Over-compacting context, losing valuable history
✅ Smart optimization:
- Use right model for the task (time saved >> cost)
- Invest in good prompts and memory files (reduce iterations)
- Automate with agents (consistent, efficient)
Perspective on ROI:
Time savings from effective Claude Code usage typically far outweigh API costs for most development tasks. Rather than calculating precise ROI (which depends heavily on your specific context, hourly rate, and task complexity), focus on whether the tool is genuinely helping you ship faster. For team-level measurement, see Contribution Metrics — Anthropic’s GitHub-integrated dashboard for tracking PR and code attribution (Team/Enterprise plans, public beta).
When to optimize aggressively:
- High-volume operations (>1000 requests/day)
- Automated pipelines running 24/7
- Large teams (cost scales with users)
- Budget-constrained projects
When productivity matters more:
- Critical bug fixes
- Time-sensitive features
- Learning and experimentation
- Complex architectural decisions
9.14 Development Methodologies
Section titled “9.14 Development Methodologies”Full reference: methodologies.md | Hands-on workflows: workflows/
15 structured development methodologies have emerged for AI-assisted development (2025-2026). This section provides quick navigation; detailed workflows are in dedicated files.
Quick Decision Tree
Section titled “Quick Decision Tree”
┌─ "I want quality code" ────────────→ workflows/tdd-with-claude.md
├─ "I want to spec before code" ─────→ workflows/spec-first.md
├─ "I need to plan architecture" ────→ workflows/plan-driven.md
├─ "I'm iterating on something" ─────→ workflows/iterative-refinement.md
└─ "I need methodology theory" ──────→ methodologies.md
The 4 Core Workflows for Claude Code
Section titled “The 4 Core Workflows for Claude Code”| Workflow | When to Use | Key Prompt Pattern |
|---|---|---|
| TDD | Quality-critical code | ”Write FAILING tests first, then implement” |
| Spec-First | New features, APIs | Define in CLAUDE.md before asking |
| Plan-Driven | Multi-file changes | Use /plan mode |
| Iterative | Refinement | Specific feedback: “Change X because Y” |
The 15 Methodologies (Reference)
Section titled “The 15 Methodologies (Reference)”| Tier | Methodologies | Claude Fit |
|---|---|---|
| Orchestration | BMAD | ⭐⭐ Enterprise governance |
| Specification | SDD, Doc-Driven, Req-Driven, DDD | ⭐⭐⭐ Core patterns |
| Behavior | BDD, ATDD, CDD | ⭐⭐⭐ Testing focus |
| Delivery | FDD, Context Engineering | ⭐⭐ Process |
| Implementation | TDD, Eval-Driven, Multi-Agent | ⭐⭐⭐ Core workflows |
| Optimization | Iterative Loops, Prompt Engineering | ⭐⭐⭐ Foundation |
→ Full descriptions with examples: methodologies.md
SDD Tools (External)
Section titled “SDD Tools (External)”| Tool | Use Case | Integration |
|---|---|---|
| Spec Kit | Greenfield projects | /speckit.* slash commands |
| OpenSpec | Brownfield/existing | /openspec:* slash commands |
| Specmatic | API contract testing | MCP agent available |
→ See official documentation for installation and detailed usage.
Combination Patterns
Section titled “Combination Patterns”| Situation | Recommended Stack |
|---|---|
| Solo MVP | SDD + TDD |
| Team 5-10, greenfield | Spec Kit + TDD + BDD |
| Microservices | CDD + Specmatic |
| Existing SaaS | OpenSpec + BDD |
| Enterprise 10+ | BMAD + Spec Kit |
| LLM-native product | Eval-Driven + Multi-Agent |
9.15 Named Prompting Patterns
Section titled “9.15 Named Prompting Patterns”Reading time: 5 minutes Skill level: Week 2+
Memorable named patterns for effective Claude Code interaction. These patterns have emerged from community best practices and help you communicate more effectively.
The “As If” Pattern
Section titled “The “As If” Pattern”Set quality expectations by establishing context and standards.
Pattern: “Implement as if you were a [role] at [high-standard company/context]”
Examples:
# High quality code
Implement this authentication system as if you were a senior security engineer at a major bank.

# Production readiness
Review this code as if preparing for a SOC2 audit.

# Performance focus
Optimize this function as if it will handle 10,000 requests per second.

Why it works: Activates relevant knowledge patterns and raises output quality to match the stated context.
The Constraint Pattern
Section titled “The Constraint Pattern”Force creative solutions by adding explicit limitations.
Pattern: “Solve this [with constraint X] [without using Y]”
Examples:
# Dependency constraint
Implement this feature without adding any new dependencies.

# Size constraint
Solve this in under 50 lines of code.

# Time constraint (execution)
This must complete in under 100ms.

# Simplicity constraint
Use only standard library functions.
The “Explain First” Pattern
Section titled “The “Explain First” Pattern”Force planning before implementation.
Pattern: “Before implementing, explain your approach in [N] sentences”
Examples:
# Simple planning
Before writing code, explain in 2-3 sentences how you'll approach this.

# Detailed planning
Before implementing, outline:
1. What components you'll modify
2. What edge cases you've considered
3. What could go wrong

# Trade-off analysis
Before choosing an approach, explain 2-3 alternatives and why you'd pick one.
The “Rubber Duck” Pattern
Section titled “The “Rubber Duck” Pattern”Debug collaboratively by having Claude ask questions.
Pattern: “I’m stuck on [X]. Ask me questions to help me figure it out.”
Examples:
# Debugging
I'm stuck on why this test is failing. Ask me questions to help diagnose the issue.

# Design
I can't decide on the right architecture. Ask me questions about my requirements.

# Problem understanding
I don't fully understand what I need to build. Ask clarifying questions.
The “Incremental” Pattern
Section titled “The “Incremental” Pattern”Build complex features step by step with validation.
Pattern: “Let’s build this incrementally. Start with [minimal version], then we’ll add [features].”
Examples:
# Feature development
Build the user registration incrementally:
1. First: Basic form that saves to database
2. Then: Email validation
3. Then: Password strength requirements
4. Finally: Email verification flow

Show me step 1 first.

# Refactoring
Refactor this incrementally. First extract the validation logic,
run tests, then we'll continue.
The “Boundary” Pattern
Section titled “The “Boundary” Pattern”Define explicit scope to prevent over-engineering.
Pattern: “Only modify [X]. Don’t touch [Y].”
Examples:
# File scope
Only modify auth.ts. Don't change any other files.

# Function scope
Fix just the calculateTotal function. Don't refactor surrounding code.

# Feature scope
Add the logout button only. Don't add session management or remember-me features.
Pattern Combinations
Section titled “Pattern Combinations”| Situation | Pattern Combination |
|---|---|
| Critical feature | As If + Explain First + Incremental |
| Quick fix | Constraint + Boundary |
| Debugging session | Rubber Duck + Incremental |
| Architecture decision | Explain First + As If |
| Refactoring | Boundary + Incremental + Constraint |
Anti-Patterns to Avoid
Section titled “Anti-Patterns to Avoid”| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| ”Make it perfect” | Undefined standard | Use “As If” with specific context |
| ”Fix everything” | Scope explosion | Use “Boundary” pattern |
| ”Just do it” | No validation | Use “Explain First” |
| ”Make it fast” | Vague constraint | Specify: “under 100ms” |
| Overwhelming detail | Context pollution | Focus on relevant constraints only |
9.16 Session Teleportation
Section titled “9.16 Session Teleportation”Reading time: 5 minutes Skill level: Week 2+ Status: Research Preview (as of January 2026)
Session teleportation allows migrating coding sessions between cloud (claude.ai/code) and local (CLI) environments. This enables workflows where you start work on mobile/web and continue locally with full filesystem access.
Evolution Timeline
Section titled “Evolution Timeline”| Version | Feature |
|---|---|
| 2.0.24 | Initial Web → CLI teleport capability |
| 2.0.41 | Teleporting auto-sets upstream branch |
| 2.0.45 | & prefix for background tasks to web |
| 2.1.0 | /teleport and /remote-env commands |
Commands Reference
Section titled “Commands Reference”| Command | Usage |
|---|---|
% or & prefix | Send task to cloud (e.g., % Fix the auth bug) |
claude --teleport | Interactive picker for available sessions |
claude --teleport <id> | Teleport specific session by ID |
/teleport | In-REPL command to teleport current session |
/tasks | Monitor background tasks status |
/remote-env | Configure cloud environment settings |
Ctrl+B | Background all running tasks (unified in 2.1.0) |
Prerequisites
Section titled “Prerequisites”Required for teleportation:
- GitHub account connected + Claude GitHub App installed
- Clean git state (0 uncommitted changes)
- Same repository (not a fork)
- Branch exists on remote
- Same Claude.ai account on both environments
- CLI version 2.1.0+
Workflow Example
Section titled “Workflow Example”# 1. Start task on web (claude.ai/code)# "Refactor the authentication middleware"
# 2. Session works in cloud sandbox
# 3. Later, on local machine:claude --teleport# → Interactive picker shows available sessions
# 4. Select session, Claude syncs:# - Conversation context# - File changes (via git)# - Task state
# 5. Continue work locally with full filesystem accessEnvironment Support
Section titled “Environment Support”| Environment | Teleport Support |
|---|---|
| CLI/Terminal | Full bidirectional |
| VS Code | Via terminal (not Chat view) |
| Cursor | Via terminal |
| Web (claude.ai/code) | Outbound only (web → local) |
| iOS app | Monitoring only |
Current Limitations (Research Preview)
Section titled “Current Limitations (Research Preview)”⚠️ Important: Session teleportation is in research preview. Expect rough edges.
- Unidirectional: Web → local only (cannot teleport local → web)
- GitHub only: No GitLab or Bitbucket support yet
- Subscription required: Pro, Max, Team Premium, or Enterprise Premium
- Rate limits: Parallel sessions consume proportional rate limits
- Git dependency: Requires clean git state for sync
Troubleshooting
Section titled “Troubleshooting”| Issue | Solution |
|---|---|
| ”Uncommitted changes” | Commit or stash changes before teleporting |
| ”Branch not found” | Push local branch to remote first |
| ”Session not found” | Verify same Claude.ai account on both |
| ”Teleport failed” | Check internet connectivity, try again |
| Connection timeout | Use claude --teleport <id> with explicit ID |
Best Practices
Section titled “Best Practices”- Commit frequently — Clean git state is required
- Use meaningful branch names — Helps identify sessions
- Check `/tasks` — Verify background task status before teleporting (pre-flight sketch below)
- Same account — Ensure CLI and web use same Claude.ai login
- Push branches — Remote must have the branch for sync
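The prerequisites above condense into a quick pre-flight check (a sketch; run from the repo you plan to teleport into):

# Clean tree? (must print 0)
git status --porcelain | wc -l

# Branch exists on remote? (push it if not)
git push -u origin "$(git branch --show-current)"

# Then pick the session interactively
claude --teleport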
Environment Variables
Section titled “Environment Variables”| Variable | Purpose |
|---|---|
CLAUDE_CODE_DISABLE_BACKGROUND_TASKS | Disable background task functionality (v2.1.4+) |
9.17 Scaling Patterns: Multi-Instance Workflows
Section titled “9.17 Scaling Patterns: Multi-Instance Workflows”Reading time: 10 minutes
TL;DR: Multi-instance orchestration = advanced pattern for teams managing 10+ concurrent features. Requires modular architecture + budget + monitoring. 95% of users don’t need this — sequential workflows with 1-2 instances are more efficient for most contexts.
When Multi-Instance Makes Sense
Section titled “When Multi-Instance Makes Sense”Don’t scale prematurely. Multi-instance workflows introduce coordination overhead that outweighs benefits for most teams.
| Context | Recommendation | Monthly Cost | Reasoning |
|---|---|---|---|
| Solo dev | ❌ Don’t | - | Overhead > benefit, use Cursor instead |
| Startup <10 devs | ⚠️ Maybe | $400-750 | Only if modular architecture + tests |
| Scale-up 10-50 devs | ✅ Consider | $1,000-2,000 | Headless PM framework + monitoring justified |
| Enterprise 50+ | ✅ Yes | $2,000-5,000 | Clear ROI, budget available |
Red flags (don’t use multi-instance if true):
- Architecture: Legacy monolith, no tests, tight coupling
- Budget: <$500/month available for API costs
- Expertise: Team unfamiliar with Claude Code basics
- Context: Solo dev or <3 people
📊 Industry Validation: Multi-Instance ROI (Anthropic 2026)
Timeline Compression (weeks → days):
| Pattern | Before AI | With Multi-Instance | Gain |
|---|---|---|---|
| Feature implementation | 2-3 weeks | 3-5 days | 4-6x faster |
| Onboarding new codebase | 2-4 weeks | 4-8 hours | 10-50x faster |
| Legacy refactoring | Months (backlog) | 1-2 weeks | Finally viable |
Productivity Economics (Anthropic research):
| Metric | Finding | Implications |
|---|---|---|
| Output volume | +67% PRs merged/engineer/day | Gain via more output, not just speed |
| New work | 27% wouldn’t be done without AI | Experimental, nice-to-have, exploratory |
| Full delegation | 0-20% tasks | Collaboration > replacement |
| Cost multiplier | 3x (capabilities × orchestration × experience) | Compounds over time |
Enterprise Case Studies:
- TELUS (telecom, 50K+ employees): 500K hours saved, 13K custom solutions, 30% faster shipping
- Fountain (workforce platform): 50% faster screening, 40% faster onboarding via hierarchical multi-agent
- Rakuten (tech): 7h autonomous vLLM implementation (12.5M lines code, 99.9% accuracy)
The Boris pattern validation: Boris’s $500-1K/month cost and 259 PRs/month align with Anthropic’s enterprise data showing positive ROI at >3 parallel instances.
Anti-pattern alert (Anthropic findings):
- Over-delegation (>5 agents): Coordination overhead > productivity gain
- Premature scaling: Start 1-2 instances, measure ROI, scale progressively
- Tool sprawl: >10 MCP servers = maintenance burden (stick to core stack)
Real-World Case: Boris Cherny (Interval)
Boris Cherny, creator of Claude Code, shared his workflow orchestrating 5-15 Claude instances in parallel.
Setup:
- 5 instances in local terminal (iTerm2 tabs, numbered 1-5)
- 5-10 instances on claude.ai/code (`--teleport` to sync with local)
- Git worktrees for isolation (each instance = separate checkout)
- CLAUDE.md: 2.5k tokens, team-shared and versioned in git
- Model: Opus 4.6 (slower but fewer corrections needed, adaptive thinking)
- Slash commands: `/commit-push-pr` used “dozens of times per day”
Results (30 days, January 2026):
- 259 PRs merged
- 497 commits
- 40k lines added, 38k lines deleted (refactor-heavy)
Cost: ~$500-1,000/month API (Opus pricing)
Critical context: Boris is the creator of Claude Code, working with perfect architecture, Anthropic resources, and ideal conditions. This is not representative of average teams.
Key insights from Boris:
On multi-clauding: “I use Cowork as a ‘doer,’ not a chat: it touches files, browsers, and tools directly. I think about productivity as parallelism: multiple tasks running while I steer outcomes.”
On CLAUDE.md: “I treat Claude.md as compounding memory: every mistake becomes a durable rule for the team.”
On plan-first workflow: “I run plan-first workflows: once the plan is solid, execution gets dramatically cleaner.”
On verification loops: “I give Claude a way to verify output (browser/tests): verification drives quality.”
Why Opus 4.6 with Adaptive Thinking: Although more expensive per token ($5/1M input vs $3/1M for Sonnet, or $10/1M for 1M context beta), Opus requires fewer correction iterations thanks to adaptive thinking. Net result: faster delivery and lower total cost despite higher unit price.
The supervision model: Boris describes his role as “tending to multiple agents” rather than “doing every click yourself.” The workflow becomes about steering outcomes across 5-10 parallel sessions, unblocking when needed, rather than sequential execution.
Source: InfoQ - Claude Code Creator Workflow (Jan 2026) | Interview: I got a private lesson on Claude Cowork & Claude Code
Team patterns (broader Claude Code team, Feb 2026):
The broader team extends Boris’s individual workflow with institutional patterns:
- Skills as institutional knowledge: Anything done more than once daily becomes a skill checked into version control. Examples:
  - `/techdebt` — run at end of session to eliminate duplicate code
  - Context dump skills — sync 7 days of Slack, Google Drive, Asana, and GitHub into a single context
  - Analytics agents — dbt-powered skills that query BigQuery; one engineer reports not writing SQL manually for 6+ months
- CLI and scripts over MCP: The team prefers shell scripts and CLI integrations over MCP servers for external tool connections. Rationale: less magic, easier to debug, and more predictable behavior. MCP is reserved for cases where bidirectional communication is genuinely needed.
- Re-plan when stuck: Rather than pushing through a stalled implementation, the team switches back to Plan Mode. One engineer uses a secondary Claude instance to review plans “as a staff engineer” before resuming execution.
- Claude writes its own rules: After each correction, the team instructs Claude to update CLAUDE.md with the lesson learned. Over time, this compounds into a team-specific ruleset that prevents recurring mistakes (a sketch of this loop follows below).
Source: 10 Tips from Inside the Claude Code Team (Boris Cherny thread, Feb 2026)
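A minimal sketch of that rule-writing loop; the prompt wording and the example rule are illustrative, not an official mechanism:
```bash
# After correcting Claude, ask it (in-session) to persist the lesson:
#   "Update CLAUDE.md with a rule that prevents this mistake."
#
# The committed result might look like this (hypothetical rule text):
cat >> CLAUDE.md <<'EOF'

## Lessons Learned
- Run the full test suite before committing generated migrations
EOF
git add CLAUDE.md && git commit -m "docs: add lesson learned to CLAUDE.md"
```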
Alternative Pattern: Dual-Instance Planning (Vertical Separation)
While Boris’s workflow demonstrates horizontal scaling (5-15 instances in parallel), an alternative pattern focuses on vertical separation: using two Claude instances with distinct roles for quality-focused workflows.
Pattern source: Jon Williams (Product Designer, UK), transition from Cursor to Claude Code after 6 months. LinkedIn post, Feb 3, 2026
When to Use Dual-Instance Pattern
This pattern is orthogonal to Boris’s approach: instead of scaling breadth (more features in parallel), it scales depth (separation of planning and execution phases).
| Your Context | Use Dual-Instance? | Monthly Cost |
|---|---|---|
| Solo dev, spec-heavy work | ✅ Yes | $100-200 |
| Small team, complex requirements | ✅ Yes | $150-300 |
| Product designers coding | ✅ Yes | $100-200 |
| High-volume parallel features | ❌ No, use Boris pattern | $500-1K+ |
Use when:
- You need plan verification before execution
- Specs are complex or ambiguous (interview-based clarification helps)
- Lower budget than Boris pattern ($100-200/month vs $500-1K+)
- Quality > speed (willing to sacrifice parallelism for better plans)
Don’t use when:
- You need to ship 10+ features simultaneously (use Boris pattern)
- Plans are straightforward (single instance with `/plan` is enough)
- Budget is very limited (<$100/month)
Setup: Two Instances, Two Roles
```
┌─────────────────────────────────────────────────────┐
│ DUAL-INSTANCE ARCHITECTURE                          │
├─────────────────────────────────────────────────────┤
│                                                     │
│  ┌──────────────────┐                               │
│  │   Claude Zero    │  Planning & Review            │
│  │   (Planner)      │  - Explores codebase          │
│  └────────┬─────────┘  - Writes plans               │
│           │            - Reviews implementations    │
│           │            - NEVER touches code         │
│           ▼                                         │
│  ┌─────────────────┐                                │
│  │ Plans/Review/   │  Human review checkpoint       │
│  │ Plans/Active/   │                                │
│  └────────┬────────┘                                │
│           │                                         │
│           ▼                                         │
│  ┌──────────────────┐                               │
│  │   Claude One     │  Implementation               │
│  │   (Implementer)  │  - Reads approved plans       │
│  └──────────────────┘  - Writes code                │
│                        - Commits changes            │
│                        - Reports completion         │
│                                                     │
│  Key: Separation of concerns = fewer mistakes       │
│                                                     │
└─────────────────────────────────────────────────────┘
```
Setup steps:
- Create directory structure:
```bash
mkdir -p .claude/plans/{Review,Active,Completed}
```
- Launch Claude Zero (Terminal 1):
```bash
cd ~/projects/your-project
claude
# Set role in first message:
# "You are Claude Zero. Your role: explore codebase, write plans,
#  review implementations. NEVER edit code. Save all plans to
#  .claude/plans/Review/"
```
- Launch Claude One (Terminal 2):
```bash
cd ~/projects/your-project
claude
# Set role in first message:
# "You are Claude One. Your role: read plans from .claude/plans/Active/,
#  implement them, commit changes, report back."
```
Workflow: 5 Steps
Step 1: Planning (Claude Zero)
You (to Claude Zero):
```
/plan
Implement JWT authentication for the API.
- Support access tokens (15min expiry)
- Support refresh tokens (7 day expiry)
- Middleware to validate tokens on protected routes
```
Claude Zero explores the codebase and interviews you about requirements:
- “Should we support multiple sessions per user?”
- “Do you want token revocation (logout) capability?”
- “Which routes should be protected vs public?”
Claude Zero writes plan to .claude/plans/Review/auth-jwt.md:
```md
# Plan: JWT Authentication

## Summary
Add JWT-based authentication with access/refresh tokens.
Support token revocation for logout.

## Files to Create
- src/auth/jwt.ts (line 1-120)
  - generateAccessToken(userId)
  - generateRefreshToken(userId)
  - verifyToken(token)
- src/middleware/auth.ts (line 1-45)
  - requireAuth middleware
  - Token validation logic

## Files to Modify
- src/routes/api.ts (line 23)
  - Add auth middleware to protected routes
- src/config/env.ts (line 15)
  - Add JWT_SECRET, JWT_REFRESH_SECRET env vars

## Implementation Steps
1. Install jsonwebtoken library
2. Create JWT utility functions
3. Create auth middleware
4. Add JWT secrets to .env
5. Protect existing routes
6. Write tests for auth flow

## Success Criteria
- POST /auth/login returns access + refresh token
- Protected routes reject without valid token
- POST /auth/refresh exchanges refresh token for new access token
- POST /auth/logout revokes refresh token

## Risks
- Token secrets must be in .env (never committed)
- Refresh token storage needs database table
```
Step 2: Human Review
You review .claude/plans/Review/auth-jwt.md:
- Is the approach correct?
- Are all requirements covered?
- Any security issues?
If approved, move to Active:
```bash
mv .claude/plans/Review/auth-jwt.md .claude/plans/Active/
```
Step 3: Implementation (Claude One)
You (to Claude One): `Implement .claude/plans/Active/auth-jwt.md`
Claude One reads the plan file, implements all steps, commits.
Step 4: Verification (Claude Zero)
You (to Claude Zero): `Review the JWT implementation Claude One just completed.`
Claude Zero reviews:
- Code matches plan?
- Security best practices followed?
- Tests cover success criteria?
Step 5: Archive
If approved:
```bash
mv .claude/plans/Active/auth-jwt.md .claude/plans/Completed/
```
Comparison: Boris (Horizontal) vs Jon (Vertical)
| Dimension | Boris Pattern | Jon Pattern (Dual-Instance) |
|---|---|---|
| Scaling axis | Horizontal (5-15 instances, parallel features) | Vertical (2 instances, separated phases) |
| Primary goal | Speed via parallelism | Quality via separation of concerns |
| Monthly cost | $500-1,000 (Opus × 5-15) | $100-200 (Opus × 2 sequential) |
| Entry barrier | High (worktrees, CLAUDE.md 2.5K, orchestration) | Low (2 terminals, Plans/ directory) |
| Audience | Teams, high-volume, 10+ devs | Solo devs, product designers, spec-heavy |
| Context pollution | Isolated by worktrees (git branches) | Isolated by role separation (planner vs implementer) |
| Accountability | Git history (commits per instance) | Human-in-the-loop (review plans before execution) |
| Tooling required | Worktrees, teleport, /commit-push-pr | Plans/ directory structure |
| Coordination | Self-orchestrated (Boris steers 10 sessions) | Human gatekeeper (approve plans) |
| Best for | Shipping 10+ features/day, experienced teams | Complex specs, quality-critical, budget-conscious |
Key insight: These patterns are not mutually exclusive. You can use dual-instance for complex features (planning rigor) and Boris pattern for high-volume simple features (speed).
Cost Analysis: 2 Instances vs Correction Loops
Question: Is it cheaper to use 2 instances (planner + implementer) or 1 instance with correction loops?
| Scenario | 1 Instance (Corrections) | 2 Instances (Dual) | Winner |
|---|---|---|---|
| Simple feature (login form) | 1 session × $5 = $5 | 2 sessions × $3 each = $6 | 1 instance |
| Complex spec (auth system) | 1 session × $15 + 2 correction loops × $10 = $35 | 2 sessions × $12 each = $24 | 2 instances |
| Ambiguous requirements | 1 session × $20 + 3 correction loops × $15 = $65 | 2 sessions × $18 each = $36 | 2 instances |
Breakeven point: For features requiring ≥2 correction loops, dual-instance is cheaper and faster.
Hidden cost savings:
- Context pollution: Planner doesn’t see implementation details → cleaner reasoning
- Fewer hallucinations: Plans have file paths + line numbers → implementer is grounded
- Learning: Review step catches mistakes before they compound
Agent-Ready Plans: Best Practices
The key to dual-instance efficiency is plan structure. Jon Williams emphasizes “agent-ready plans with specific file references and line numbers.”
Bad plan (vague):
```md
## Implementation
Add authentication to the API.
Update the routes.
Create middleware.
```
Good plan (agent-ready):
```md
## Implementation

### Step 1: Create JWT utilities
**File**: src/auth/jwt.ts (new file, ~120 lines)
**Functions**:
- Line 10-30: generateAccessToken(userId: string): string
- Line 35-55: generateRefreshToken(userId: string): string
- Line 60-85: verifyToken(token: string): { userId: string } | null

**Dependencies**: jsonwebtoken (npm install)

### Step 2: Create auth middleware
**File**: src/middleware/auth.ts (new file, ~45 lines)
**Export**:
- Line 15-40: requireAuth middleware (checks Authorization header)

**Imports**: jwt.ts (Step 1)

### Step 3: Protect routes
**File**: src/routes/api.ts
**Location**: Line 23 (after imports, before route definitions)
**Change**: Import requireAuth, apply to /api/protected routes

**Example**:
router.get('/profile', requireAuth, profileController)
```
Why agent-ready plans work:
- File paths → Claude One knows exactly where to work
- Line numbers → Reduces guessing, fewer file reads
- Dependencies explicit → No surprises during implementation
- Examples included → Claude One understands expected structure
Template: See guide/workflows/dual-instance-planning.md for full plan template.
Tips for Success
1. Role enforcement: Set roles in first message of each session:
- Claude Zero: “NEVER edit code, only write plans to .claude/plans/Review/”
- Claude One: “ONLY implement plans from .claude/plans/Active/, never plan”
2. Plans directory in .gitignore:
```gitignore
.claude/plans/Review/   # Work in progress
.claude/plans/Active/   # Under implementation
# Don't ignore Completed/ (optional: archive for team learning)
```
3. Use /plan mode:
Claude Zero should start with /plan for safe exploration:
```
/plan
[Your feature request]
```
4. Interview prompts: Encourage Claude Zero to ask clarifying questions:
```
"Interview me about requirements before drafting the plan.
Ask about edge cases, success criteria, and constraints."
```
5. Review checklist: When Claude Zero reviews Claude One’s implementation:
- Code matches plan structure?
- All files from plan created/modified?
- Tests cover success criteria?
- Security best practices followed?
- No TODO comments for core functionality?
Limitations
When dual-instance doesn’t help:
- Trivial changes: Typo fixes, simple refactors → 1 instance faster
- Exploratory coding: Unknown problem space → planning overhead not justified
- Tight deadlines: Speed > quality → use 1 instance, accept corrections
- Very limited budget: <$100/month → use Sonnet, 1 instance
Overhead:
- Manual coordination: You move plans between directories (no automation)
- Context switching: Managing 2 terminal sessions
- Slower iteration: Plan → approve → implement (vs immediate execution)
Partial adoption: You can use this pattern selectively:
- Dual-instance for complex features
- Single instance for simple tasks
- No need to commit to one pattern exclusively
See Also
- Workflow guide: dual-instance-planning.md — Full workflow with templates
- Plan Mode: Section 9.1 “The Trinity” — Foundation for planning
- Multi-Instance (Boris): Section 9.17 — Horizontal scaling alternative
- Cost optimization: Section 8.10 — Budget management strategies
External resource: Jon Williams LinkedIn post (Feb 3, 2026)
Foundation: Git Worktrees (Non-Negotiable)
Multi-instance workflows REQUIRE git worktrees to avoid conflicts. Without worktrees, parallel instances create merge hell.
Why worktrees are critical:
- Each instance operates in isolated git checkout
- No branch switching = no context loss
- No merge conflicts during development
- Instant creation (~1s vs minutes for full clone)
Quick setup:
```bash
# Create worktree with new branch
/git-worktree feature/auth
# - Separate checkout
# - Shared .git history
# - Zero duplication overhead
```
See also:
- Command: /git-worktree
- Workflow: Database Branch Setup
Advanced Tooling for Worktree Management (Optional)
While git worktrees are foundational, daily productivity improves with automation wrappers. Multiple professional teams have independently created worktree management tools—a validated pattern.
Pattern Validation: 3 Independent Implementations
| Team | Solution | Key Features |
|---|---|---|
| incident.io | Custom bash wrapper (`w`) | Auto-completion, organized in ~/projects/worktrees/, Claude auto-launch |
| GitHub #1052 | Fish shell functions (8 commands) | LLM commits, rebase automation, worktree lifecycle |
| Worktrunk | Rust CLI (1.6K stars, 64 releases) | Project hooks, CI status, PR links, multi-platform |
Conclusion: The worktree wrapper pattern keeps being independently reinvented by power users. Vanilla git is sufficient but verbose for 5-10+ daily worktree operations.
Do I Need Worktrunk? (Self-Assessment)
Answer these 3 questions honestly:
1. Volume: How many worktrees do you create per week?
- ❌ <5/week → Vanilla git sufficient
- ⚠️ 5-15/week → Consider lightweight alias
- ✅ 15+/week → Worktrunk or DIY wrapper justified
2. Multi-instance workflow: Are you running 5+ parallel Claude instances regularly?
- ❌ No, 1-2 instances → Vanilla git sufficient
- ⚠️ Sometimes 3-5 instances → Alias or lightweight wrapper
- ✅ Yes, 5-10+ instances daily → Worktrunk features valuable (CI status, hooks)
3. Team context: Who else uses your worktree workflow?
- ❌ Solo dev → Alias (zero dependency)
- ⚠️ Small team, same OS/shell → DIY wrapper (shared script)
- ✅ Multi-platform team → Worktrunk (Homebrew/Cargo/Winget)
Decision matrix:
| Profile | Weekly Worktrees | Instances | Team | Recommendation |
|---|---|---|---|---|
| Beginner | <5 | 1-2 | Solo | ✅ Vanilla git - Learn fundamentals first |
| Casual user | 5-15 | 2-3 | Solo/Small | ⚠️ Alias (2 min setup, example below) |
| Power user | 15-30 | 5-10 | Multi-platform | ✅ Worktrunk - ROI justified |
| Boris scale | 30+ | 10-15 | Team | ✅ Worktrunk + orchestrator |
Quick alias alternative (for “Casual user” profile):
If you scored ⚠️ (5-15 worktrees/week), try this first before installing Worktrunk:
```bash
# Add to ~/.zshrc or ~/.bashrc (2 minutes setup)
wtc() {
  local branch=$1
  local path="../${PWD##*/}.${branch//\//-}"
  git worktree add -b "$branch" "$path" && cd "$path"
}
alias wtl='git worktree list'
alias wtd='git worktree remove'
```
Usage: `wtc feature/auth` (18 chars vs 88 chars vanilla git, -79% typing)
When to upgrade to Worktrunk:
- Alias feels limiting (want CI status, LLM commits, project hooks)
- Volume increases to 15+ worktrees/week
- Team adopts multi-instance workflows (need consistent tooling)
Bottom line: Most readers (80%) should start with vanilla git or alias. Worktrunk is for power users managing 5-10+ instances daily where typing friction and CI visibility matter.
Benchmark: Wrapper vs Vanilla Git
| Operation | Vanilla Git | Worktrunk | Custom Wrapper |
|---|---|---|---|
| Create + switch | git worktree add -b feat ../repo.feat && cd ../repo.feat | wt switch -c feat | w myproject feat |
| List worktrees | git worktree list | wt list (with CI status) | w list |
| Remove + cleanup | git worktree remove ../repo.feat && git worktree prune | wt remove feat | w finish feat |
| LLM commit msg | Manual or custom script | Built-in via llm tool | Custom via LLM API |
| Setup time | 0 (git installed) | 2 min (Homebrew/Cargo) | 10-30 min (copy-paste script) |
| Maintenance | Git updates only | Active (64 releases) | Manual (custom code) |
Trade-off: Wrappers reduce typing ~60% but add dependency. Learn git fundamentals first, add wrapper for speed later.
Option 1: Worktrunk (Recommended for Scale)
What: Rust CLI simplifying worktree management (1.6K stars, active development since 2023)
Unique features not in git:
- Project-level hooks: Automate post-create, pre-remove actions
- LLM integration: `wt commit` generates messages via `llm` tool
- CI status tracking: See build status inline with `wt list`
- PR link generation: Quick links to open PRs per worktree
- Path templates: Configure worktree location pattern once
Installation:
```bash
# macOS/Linux
brew install worktrunk

# Or via Rust
cargo install worktrunk

# Windows
winget install worktrunk
```
Typical workflow:
```bash
# Create worktree + switch
wt switch -c feature/auth

# Work with Claude...
claude

# LLM-powered commit
wt commit   # Generates message from diff

# List all worktrees with status
wt list

# Remove when done
wt remove feature/auth
```
When to use: Managing 5+ worktrees daily, want CI integration, multi-platform team (macOS/Linux/Windows).
Source: github.com/max-sixty/worktrunk
Option 2: DIY Custom Wrapper (Lightweight Alternative)
What: 10-50 lines of bash/fish/PowerShell tailored to your workflow.
Examples from production teams:
- incident.io approach (bash wrapper):
```bash
# Function: w myproject feature-name claude
# - Creates worktree in ~/projects/worktrees/myproject.feature-name
# - Auto-completion for projects and branches
# - Launches Claude automatically
```
  - ROI: 18% improvement (30s) on API generation time
  - Source: incident.io blog post
- GitHub #1052 approach (Fish shell, 8 functions):
```fish
git worktree-llm feature-name   # Create + start Claude
git worktree-merge              # Finish, commit, rebase, merge
git commit-llm                  # LLM-generated commit messages
```
  - Author quote: “I now use it for basically all my development where I can use claude code”
  - Source: Claude Code issue #1052
When to use: Want full control, small team (same shell), already have shell functions for git.
Trade-off: Custom scripts lack ongoing maintenance and cross-platform support, but they are zero-dependency and infinitely customizable.
Recommendation: Learn → Wrapper → Scale
```
Phase 1 (Weeks 1-2): Master vanilla git worktree via /git-worktree command
└─ Understand fundamentals, safety checks, database branching

Phase 2 (Week 3+): Add wrapper for productivity
├─ Worktrunk (if multi-platform, want CI status, LLM commits)
└─ DIY bash/fish (if lightweight, team uses same shell)

Phase 3 (Multi-instance scale): Combine with orchestration
└─ Worktrunk/wrapper + Headless PM for 5-10 instances
```
Philosophy: Tools amplify knowledge. Master git patterns (this guide) before adding convenience layers. Wrappers save 5-10 minutes/day but don’t replace understanding.
Anthropic stance: Official best practices recommend git worktrees (vanilla) but remain agnostic on wrappers. Choose what fits your team.
Anthropic Internal Study (August 2025)
Anthropic studied how their own engineers use Claude Code, providing empirical data on productivity and limitations.
Study scope:
- 132 engineers and researchers surveyed
- 53 qualitative interviews conducted
- 200,000 session transcripts analyzed (Feb-Aug 2025)
Productivity gains:
- +50% productivity (self-reported, vs +20% 12 months prior)
- 2-3x increase year-over-year in usage and output
- 59% of work involves Claude (vs 28% a year ago)
- 27% of work “wouldn’t have been done otherwise” (scope expansion, not velocity)
Autonomous actions:
- 21.2 consecutive tool calls without human intervention (vs 9.8 six months prior)
- +116% increase in autonomous action chains
- 33% reduction in human interventions required
- Average task complexity: 3.8/5 (vs 3.2 six months before)
Critical concerns (verbatim quotes from engineers):
“When producing is so easy and fast, it’s hard to really learn”
“It’s difficult to say what roles will be in a few years”
“I feel like I come to work each day to automate myself”
Implications: Even at Anthropic (perfect conditions: created the tool, ideal architecture, unlimited budget), engineers express uncertainty about long-term skill development and role evolution.
Source: Anthropic Research - How AI is Transforming Work at Anthropic (Aug 2025)
Contribution Metrics (January 2026)
Five months after the internal study, Anthropic published updated productivity data alongside a new analytics feature for Team and Enterprise customers.
Updated metrics (Anthropic internal):
- +67% PRs merged per engineer per day (vs Aug 2025 self-reported +50%)
- 70-90% of code now written with Claude Code assistance across teams
Methodological note: These figures are PR/commit-based (measured via GitHub integration), not self-reported surveys as in the Aug 2025 study. However, Anthropic discloses no baseline period, no team breakdown, and defines measurement only as “conservative — only code where we have high confidence in Claude Code’s involvement.” Treat as directional indicators, not rigorous benchmarks.
Product feature — Contribution Metrics dashboard:
- Status: Public beta (January 2026)
- Availability: Claude Team and Enterprise plans (exact add-on requirements unconfirmed)
- Tracks: PRs merged and lines of code committed, with/without Claude Code attribution
- Access: Workspace admins and owners only
- Setup: Install Claude GitHub App → Enable GitHub Analytics in Admin settings → Authenticate GitHub organization
- Positioning: Complement to existing engineering KPIs (DORA metrics, sprint velocity), not a replacement
Source: Anthropic — Contribution Metrics (Jan 2026)
Cost-Benefit Analysis
Multi-instance workflows have hard costs and soft overhead (coordination, supervision, merge conflicts).
Direct API Costs
| Scale | Model | Monthly Cost | Break-Even Productivity Gain |
|---|---|---|---|
| 5 devs, 2 instances each | Sonnet | $390-750 | 3-5% |
| 10 devs, 2-3 instances | Sonnet | $1,080-1,650 | 1.3-2% |
| Boris scale (15 instances) | Opus | $500-1,000 | Justified if 259 PRs/month |
Calculation basis (Sonnet 4.5):
- Input: $3/million tokens
- Output: $15/million tokens
- Estimate: 30k tokens/instance/day × 20 days
- 5 devs × 2 instances × 600k tokens/month = ~$540/month
OpusPlan optimization: Use Opus for planning (10-20% of work), Sonnet for execution (80-90%). Reduces cost while maintaining quality.
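In practice this can be as simple as launching the two phases with different models. A sketch, assuming the `--model` flag accepts these aliases on your CLI version:
```bash
# Phase 1: plan with Opus (10-20% of the work)
claude --model opus     # use /plan, produce and approve the plan

# Phase 2: execute with Sonnet (80-90% of the work)
claude --model sonnet   # implement the approved plan
```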
Hidden Costs (Not in API Bill)
| Cost Type | Impact | Mitigation |
|---|---|---|
| Coordination overhead | 10-20% time managing instances | Headless PM framework |
| Merge conflicts | 5-15% time resolving conflicts | Git worktrees + modular architecture |
| Context switching | Cognitive load × number of instances | Limit to 2-3 instances per developer |
| Supervision | Must review all autonomous output | Automated tests + code review |
ROI monitoring:
- Baseline: Track PRs/month before multi-instance (3 months)
- Implement: Scale to multi-instance with monitoring
- Measure: PRs/month after 3 months
- Decision: If gain <3%, rollback to sequential
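A sketch for step 1, counting merged PRs per month with the GitHub CLI (repo and dates are placeholders; requires `gh` and `jq`):
```bash
# Count PRs merged in a given month; run once per baseline month
gh pr list --repo your-org/your-repo \
  --search "is:merged merged:2026-01-01..2026-01-31" \
  --limit 1000 --json number | jq length
```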
Orchestration Frameworks
Coordinating multiple Claude instances without chaos requires tooling.
Headless PM (Open Source)
Project: madviking/headless-pm (158 stars)
Architecture:
- REST API for centralized coordination
- Task locking: Prevents parallel work on same file
- Role-based agents: PM, Architect, Backend, Frontend, QA
- Document-based communication: Agents @mention each other
- Git workflow guidance: Automatic PR/commit suggestions
Workflow:
```
Epic → Features → Tasks (major=PR, minor=commit)
  ↓
Agents register, lock tasks, update status
  ↓
Architect reviews (approve/reject)
  ↓
Communication via docs with @mention
```
Use case: Teams managing 5-10 instances without manual coordination overhead.
Alternatives
| Tool | Best For | Cost | Key Feature |
|---|---|---|---|
| Cursor Parallel Agents | Solo/small teams | $20-40/month | UI integrated, git worktrees built-in |
| Windsurf Cascade | Large codebases | $15/month | 10x faster context (Codemaps) |
| Sequential Claude | Most teams | $20/month | 1-2 instances with better prompting |
Implementation Guide (Progressive Scaling)
Don’t jump to 10 instances. Scale progressively with validation gates.
Phase 1: Single Instance Mastery (2-4 weeks)
Goal: Achieve >80% success rate with 1 instance before scaling.
```bash
# 1. Create CLAUDE.md (2-3k tokens)
#    - Conventions (naming, imports)
#    - Workflows (git, testing)
#    - Patterns (state management)

# 2. Implement feedback loops
#    - Automated tests (run after every change)
#    - Pre-commit hooks (validation gates)
#    - /validate command (quality checks)

# 3. Measure baseline
#    - PRs/month
#    - Test pass rate
#    - Time to merge
```
Success criteria: 80%+ PRs merged without major revisions.
Phase 2: Dual Instance Testing (1 month)
Goal: Validate that 2 instances increase throughput without chaos.
```bash
# 1. Setup git worktrees
/git-worktree feature/backend
/git-worktree feature/frontend

# 2. Parallel development
#    - Instance 1: Backend API
#    - Instance 2: Frontend UI
#    - Ensure decoupled work (no file overlap)

# 3. Monitor conflicts
#    - Track merge conflicts per week
#    - If >2% conflict rate, pause and fix architecture
```
Success criteria: <2% merge conflicts, >5% productivity gain vs single instance.
Phase 3: Multi-Instance (if Phase 2 successful)
Goal: Scale to 3-5 instances with orchestration framework.
```bash
# 1. Deploy orchestration framework (choose based on needs)
#    - Headless PM (manual coordination)
#    - Gas Town (parallel task execution)
#    - multiclaude (self-hosted, tmux-based)
#    - Entire CLI (governance + sequential handoffs)

# 2. Define roles
#    - Architect (reviews PRs)
#    - Backend (API development)
#    - Frontend (UI development)
#    - QA (test automation)

# 3. Weekly retrospectives
#    - Review conflict rate
#    - Measure ROI (cost vs output)
#    - Adjust instance count
```
Orchestration framework options:
| Tool | Paradigm | Best For |
|---|---|---|
| Manual (worktrees) | No framework | 2-3 instances, full control |
| Gas Town | Parallel coordination | 5+ instances, complex parallel tasks |
| multiclaude | Self-hosted spawner | Teams needing on-prem/airgap |
| Entire CLI | Governance + handoffs | Sequential workflows with compliance |
Entire CLI (Feb 2026): Alternative to parallel orchestration, focuses on sequential agent handoffs with governance layer (approval gates, audit trails). Useful for compliance-critical workflows (SOC2, HIPAA) or multi-agent handoffs (Claude → Gemini). See AI Ecosystem Guide for details.
Success criteria: Sustained 3-5% productivity gain over 3 months.
Monitoring & Observability
Track multi-instance workflows with metrics to validate ROI.
Essential Metrics
| Metric | Tool | Target | Red Flag |
|---|---|---|---|
| Merge conflicts | git log --grep="Merge conflict" | <2% | >5% |
| PRs/month | GitHub Insights | +3-5% vs baseline | Flat or declining |
| Test pass rate | CI/CD | >95% | <90% |
| API cost | Session stats script | Within budget | >20% over |
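For the merge-conflict metric above, a rough heuristic sketch (it only catches merge commits whose message mentions conflicts; adapt it to your commit conventions):
```bash
# Approximate merge-conflict rate over the last 30 days
total=$(git log --since="30 days ago" --merges --oneline | wc -l)
conflicts=$(git log --since="30 days ago" --merges -i --grep="conflict" --oneline | wc -l)
echo "conflict rate: ${conflicts}/${total} merge commits"
```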
Session stats script (from this guide):
```bash
# Track API usage across all instances
./examples/scripts/session-stats.sh --range 7d --json

# Monitor per-instance cost
./examples/scripts/session-stats.sh --project backend --range 30d
```
See also: Session Observability Guide
Warning Signs (Rollback Triggers)
Stop multi-instance and return to sequential if you see:
- Merge conflicts >5% of PRs
- CLAUDE.md grows >5k tokens (sign of chaos)
- Test quality degrades (coverage drops, flaky tests increase)
- Supervision overhead >30% developer time
- Team reports skill atrophy or frustration
When NOT to Use Multi-Instance
Be honest about your context. Most teams should stay sequential.
Architecture Red Flags
❌ Legacy monolith (tight coupling):
- Claude struggles with implicit dependencies
- Context pollution across instances
- Merge conflicts frequent
❌ Event-driven systems (complex interactions):
- Hard to decompose into parallel tasks
- Integration testing becomes a nightmare
❌ No automated tests:
- Can’t validate autonomous output
- “Death spirals” where broken tests stay broken
Team Red Flags
❌ Solo developer:
- Coordination overhead unjustified
- Cursor parallel agents simpler (UI integrated)
❌ Team <3 people:
- Not enough concurrent work to parallelize
- Better ROI from optimizing single-instance workflow
❌ Junior team:
- Requires expertise in Claude Code, git worktrees, prompt engineering
- Start with single instance, scale later
Budget Red Flags
❌ <$500/month available:
- Multi-instance costs $400-1,000/month minimum
- Better investment: training, better prompts, Cursor
Decision Matrix
Use this flowchart to decide if multi-instance is right for you:
```
New feature request
├─ Solo dev?
│  └─ Use Cursor ($20/month)
│
├─ Startup <10 devs?
│  ├─ Legacy code without tests?
│  │  └─ Fix architecture first (1-2 months)
│  └─ Modular + tested?
│     └─ Try 2 instances (1 month pilot)
│
├─ Scale-up 10-50 devs?
│  ├─ Budget >$1k/month?
│  │  └─ Deploy Headless PM framework
│  └─ Budget <$1k/month?
│     └─ Sequential optimized (better prompts)
│
└─ Enterprise 50+ devs?
   └─ Windsurf + custom orchestration
```
Resources
Primary sources:
- Boris Cherny workflow (InfoQ, Jan 2026)
- Anthropic internal study (Aug 2025)
- Headless PM framework (GitHub)
Related guides:
Community discussions:
9.18 Codebase Design for Agent Productivity
Source: “Agent Experience Best Practices for Coding Agent Productivity,” François Zaninotto, Marmelab (January 21, 2026). Additional validation: Netlify AX framework (2025), Speakeasy implementation guide, arXiv papers on agent context engineering.
📌 Section 9.18 TL;DR (2 minutes)
The paradigm shift: Traditional codebases are optimized for human developers. AI agents have different needs—they excel at pattern matching but struggle with implicit knowledge and scattered context.
Key principles:
- Domain Knowledge Embedding: Put business logic and design decisions directly in code (CLAUDE.md, ADRs, comments)
- Code Discoverability: Make code “searchable” like SEO—use synonyms, tags, complete terms
- Documentation Formats: Use llms.txt for AI-optimized documentation indexing (complements MCP servers)
- Token Efficiency: Split large files, remove obvious comments, use verbose flags for debug output
- Testing for Autonomy: TDD is more critical for agents than humans—tests guide behavior
- Guardrails: Hooks, CI checks, and PR reviews catch agent mistakes early
When to optimize for agents: High-impact files (core business logic, frequently modified modules) and greenfield projects. Don’t refactor stable code just for agents.
Cross-references: CLAUDE.md patterns (3.1) · Hooks (6.2) · Pitfalls (9.11) · Methodologies (9.14)
9.18.1 The Paradigm Shift: Designing for Agents
Traditional vs AI-Native Codebase Design
| Aspect | Human-Optimized | Agent-Optimized |
|---|---|---|
| Comments | Sparse, assume context | Explicit “why” + synonyms |
| File size | 1000+ lines OK | Split at 500 lines |
| Architecture docs | Separate wiki/Confluence | Embedded in CLAUDE.md + ADRs |
| Conventions | Oral tradition, tribal knowledge | Written, discoverable, tagged |
| Testing | Optional for prototypes | Critical—agents follow tests |
| Error messages | Generic | Specific with recovery hints |
Why this matters: Agents read code sequentially and lack the “mental model” humans build over time. What’s obvious to you (e.g., “this service handles auth”) must be made explicit.
The Agent Experience (AX) Framework
Netlify coined “Agent Experience” as the agent equivalent of Developer Experience (DX). Key questions:
- Can the agent find what it needs? (Discoverability)
- Can it understand design decisions? (Domain Knowledge)
- Can it validate its work? (Testing + Guardrails)
- Can it work efficiently? (Token budget)
“Agent Experience is about reducing cognitive friction for AI, just as DX reduces friction for humans.” — Netlify AX Research Team
Real-world impact:
- Marmelab: Refactored Atomic CRM codebase with AX principles → 40% faster feature delivery
- Speakeasy: Agent-friendly API docs → 3x higher API adoption rates
- Anthropic internal: Codebase restructuring → 60% reduction in agent hallucinations
When to invest in AX:
- ✅ Greenfield projects (design agent-friendly from start)
- ✅ High-churn files (business logic, API routes)
- ✅ Teams using agents extensively (>50% of commits)
- ❌ Stable legacy code (don’t refactor just for agents)
- ❌ Small scripts (<100 lines, agents handle fine)
Convention-Over-Configuration for AI Agents
Problem: Every configuration decision adds cognitive load for agents. Custom architectures require extensive CLAUDE.md documentation to prevent hallucinations.
Solution: Choose opinionated frameworks that reduce decision space through enforced conventions.
Why opinionated frameworks help agents:
| Aspect | Custom Architecture | Opinionated Framework |
|---|---|---|
| File organization | Agent must learn your structure | Standard conventions (e.g., Next.js app/, Rails MVC) |
| Routing | Custom logic, must be documented | Convention-based (file = route) |
| Data access | Multiple patterns possible | Single pattern enforced (e.g., Rails Active Record) |
| Testing setup | Agent must discover your approach | Framework provides defaults |
| CLAUDE.md size | Large (must document everything) | Smaller (conventions already known) |
Examples of opinionated frameworks:
- Next.js: `app/` directory structure, file-based routing, server components conventions
- Rails: MVC structure, Active Record patterns, generator conventions
- Phoenix (Elixir): Context boundaries, schema conventions, LiveView patterns
- Django: Apps structure, settings conventions, admin interface patterns
Real-world impact:
When agents work with opinionated frameworks, they:
- Make fewer mistakes (fewer choices = fewer wrong choices)
- Generate boilerplate faster (know the patterns)
- Require less CLAUDE.md documentation (conventions replace custom instructions)
- Produce more consistent code (follow framework idioms)
Trade-offs:
| Benefit | Cost |
|---|---|
| Faster agent onboarding | Less architectural flexibility |
| Smaller CLAUDE.md files | Framework lock-in |
| Fewer hallucinations | Must accept framework opinions |
| Consistent patterns | Learning curve for team |
Connection to CLAUDE.md sizing:
Convention-over-configuration directly reduces CLAUDE.md token requirements:
```md
# Custom Architecture (500+ lines CLAUDE.md)
## File Organization
- API routes in `src/endpoints/`
- Business logic in `src/domain/`
- Data access in `src/repositories/`
- Validation in `src/validators/`
... (extensive documentation of custom patterns)
```
```md
# Next.js (50 lines CLAUDE.md)
## Project Context
We use Next.js 14 with App Router.
... (minimal context, rest is framework conventions)
```
Recommendation: For greenfield projects with AI-assisted development, prefer opinionated frameworks unless architectural constraints require custom design. The reduction in agent cognitive load often outweighs loss of flexibility.
See also: CLAUDE.md sizing guidelines (Section 3.2) for token optimization patterns.
9.18.2 Domain Knowledge Embedding
Problem: Agents lack context about your business domain, design decisions, and project history. They can read code syntax but miss the “why” behind decisions.
Solution: Embed domain knowledge directly in discoverable locations.
CLAUDE.md: Advanced Patterns
Beyond basic project setup, use CLAUDE.md to encode deep domain knowledge:
Personas and roles:
```md
## Domain Context

**Product**: SaaS platform for event management (B2B, enterprise clients)
**Business model**: Subscription-based, tiered pricing
**Core value prop**: Seamless integration with 20+ calendar providers

## Design Principles

1. **Idempotency First**: All API mutations must be idempotent (event industry = duplicate requests common)
2. **Eventual Consistency**: Calendar sync uses queue-based reconciliation (not real-time)
3. **Graceful Degradation**: If external calendar API fails, store locally + retry (never block user)

## Domain Terms

- **Event**: User-created calendar entry (our domain model)
- **Appointment**: External calendar system's term (Google/Outlook)
- **Sync Job**: Background process reconciling our DB with external calendars
- **Conflict Resolution**: Algorithm handling overlapping events (see `src/services/conflict-resolver.ts`)

## Gotchas

- Google Calendar API has 10 req/sec rate limit per user → batch operations in `syncEvents()`
- Outlook timezone handling is non-standard → use `normalizeTimezone()` helper
- Event deletion = soft delete (set `deletedAt`) to maintain audit trail for compliance
```
Why this works: When the agent encounters `syncEvents()`, it understands the rate limiting constraint. When it sees `deletedAt`, it knows not to use hard deletes.
See also: CLAUDE.md Best Practices (3.1) for foundational setup.
Code Comments: What vs How
❌ Don’t write obvious comments:
```ts
// Get user by ID
function getUserById(id: string) {
  return db.users.findOne({ id });
}
```
✅ Do explain the “why” and business context:
```ts
// Fetch user with calendar permissions. Returns null if user exists but
// lacks calendar access (common after OAuth token expiration).
// Callers should handle null by redirecting to re-auth flow.
function getUserById(id: string) {
  return db.users.findOne({ id });
}
```
Even better: Add domain knowledge + edge cases:
```ts
// Fetch user with calendar permissions for event sync operations.
//
// Returns null in two cases:
// 1. User doesn't exist (rare, DB inconsistency)
// 2. User exists but calendar OAuth token expired (common, ~5% of calls)
//
// Callers MUST handle null by:
// - Redirecting to /auth/calendar/reauth (UI flows)
// - Logging + skipping sync (background jobs)
//
// Related: See `refreshCalendarToken()` for automatic token refresh strategy.
// Rate limits: Google Calendar = 10 req/sec, Outlook = 20 req/sec
function getUserById(id: string): Promise<User | null> {
  return db.users.findOne({ id });
}
```
What the agent gains:
- Knows null is expected, not an error condition
- Understands business context (OAuth expiration)
- Has concrete recovery strategies
- Can navigate to related code (`refreshCalendarToken`)
- Knows external API constraints
Architecture Decision Records (ADRs)
Store ADRs in docs/decisions/ and reference from code:
```md
# ADR-007: Event Deletion Strategy

**Status**: Accepted
**Date**: 2025-11-15
**Authors**: Engineering team

## Context

Event deletion is complex because:
1. Legal requirement to retain audit trail (GDPR Article 30)
2. External calendar APIs handle deletes differently (Google = permanent, Outlook = recoverable)
3. Users expect "undo" within 30-day window

## Decision

Use soft deletes with `deletedAt` timestamp:
- Events marked deleted remain in DB for 90 days
- UI hides deleted events immediately
- Background job purges after 90 days
- External calendars notified via webhook (eventual consistency)

## Consequences

**Benefits**:
- Compliance with GDPR audit requirements
- Consistent "undo" experience regardless of calendar provider
- Simpler conflict resolution (deleted events participate in sync)

**Drawbacks**:
- DB grows ~10% larger (deleted events retained)
- Complex query patterns (always filter `deletedAt IS NULL`)

## Related Code

- `src/models/event.ts` (Event model with deletedAt field)
- `src/services/event-deleter.ts` (soft delete logic)
- `src/jobs/purge-deleted-events.ts` (90-day cleanup)
```
In code, reference ADRs:
```ts
// Soft delete per ADR-007. Never use db.events.delete() due to
// compliance requirements (GDPR audit trail).
async function deleteEvent(eventId: string) {
  await db.events.update(
    { id: eventId },
    { deletedAt: new Date() }
  );
}
```
Agent benefit: When the agent sees `deletedAt`, it can read ADR-007 to understand the full context and constraints.
9.18.3 Code Discoverability (SEO for Agents)
Problem: Agents search for code using keyword matching. If your variable is named usr, the agent won’t find it when searching for “user”.
Solution: Treat code discoverability like SEO—use complete terms, synonyms, and tags.
Use Complete Terms, Not Abbreviations
❌ Agent-hostile:
```ts
function calcEvtDur(evt: Evt): number {
  const st = evt.stTm;
  const et = evt.etTm;
  return et - st;
}
```
✅ Agent-friendly:
```ts
// Calculate event duration in milliseconds.
// Also known as: event length, time span, appointment duration
function calculateEventDuration(event: Event): number {
  const startTime = event.startTime;
  const endTime = event.endTime;
  return endTime - startTime;
}
```
What changed:
- `calcEvtDur` → `calculateEventDuration` (full term)
- Comment includes synonyms (“event length”, “time span”) so agent finds this when searching for those terms
- Type `Evt` → `Event` (no abbreviation)
Add Synonyms in Comments
Your domain may use multiple terms for the same concept. Make them all searchable:
```ts
// User account record. Also called: member, subscriber, customer, client.
// Note: In external calendar APIs, this maps to their "principal" or "identity" concepts.
interface User {
  id: string;
  email: string;
  calendarToken: string; // OAuth token for calendar access, aka "access token", "auth credential"
}
```
Why this works: When the agent searches for “subscriber” or “principal”, it finds this code despite those terms not being in the type name.
Tags and Faceting
Use JSDoc-style tags for categorization:
```ts
/**
 * Process incoming webhook from Google Calendar.
 *
 * @domain calendar-sync
 * @external google-calendar-api
 * @rate-limit 100/min (Google's limit, not ours)
 * @failure-mode Queues failed webhooks for retry (see retry-queue.ts)
 * @related syncEvents, refreshCalendarToken
 */
async function handleGoogleWebhook(payload: WebhookPayload) {
  // implementation
}
```
Agent queries enabled:
- “What code touches the Google Calendar API?” → Finds via `@external` tag
- “Which functions have rate limits?” → Finds via `@rate-limit` tag
- “What’s related to syncEvents?” → Finds via `@related` tag
Directory README Pattern
Place a README.md in each major directory explaining its purpose:
```
src/
├── services/
│   ├── README.md            ← "Service layer: business logic, no HTTP concerns"
│   ├── event-service.ts
│   └── user-service.ts
├── controllers/
│   ├── README.md            ← "HTTP controllers: request/response handling only"
│   ├── event-controller.ts
│   └── user-controller.ts
```
src/services/README.md:
```md
# Services Layer

**Purpose**: Business logic and domain operations. Services are framework-agnostic (no Express/HTTP concerns).

**Conventions**:
- One service per domain entity (EventService, UserService)
- Services interact with repositories (data layer) and other services
- All service methods return domain objects, never HTTP responses
- Error handling: Throw domain errors (EventNotFoundError), not HTTP errors

**Dependencies**:
- Services may call other services
- Services may call repositories (`src/repositories/`)
- Services must NOT import from `controllers/` (layering violation)

**Testing**: Unit test services with mocked repositories. See `tests/services/` for examples.

**Related**: See ADR-003 for layered architecture rationale.
```
Agent benefit: When working in `services/`, the agent reads the README and understands constraints (no HTTP concerns, layer boundaries).
Example: Before vs After Discoverability
❌ Before (Agent-hostile):
```ts
class UsrMgr {
  async getUsr(id: string) {
    return db.query('SELECT * FROM usr WHERE id = ?', [id]);
  }

  async updUsr(id: string, data: any) {
    return db.query('UPDATE usr SET ? WHERE id = ?', [data, id]);
  }
}
```
- Abbreviated names (`UsrMgr`, `getUsr`) → hard to find
- No comments → no context
- `any` type → agent doesn’t know data shape
- No domain knowledge → what is “usr”?
✅ After (Agent-friendly):
```ts
/**
 * User account management service.
 * Also known as: member manager, subscriber service, customer service
 *
 * @domain user-management
 * @layer service
 * @related user-repository, auth-service
 */
class UserManager {
  /**
   * Fetch user account by ID. Returns null if not found.
   * Also called: get member, fetch subscriber, load customer
   *
   * Common use cases:
   * - Authentication flows (verifying user exists)
   * - Profile page rendering (loading user details)
   * - Admin operations (fetching user for support)
   */
  async getUser(userId: string): Promise<User | null> {
    return db.query('SELECT * FROM users WHERE id = ?', [userId]);
  }

  /**
   * Update user account fields. Performs partial update (only provided fields).
   * Also known as: modify user, edit member, change subscriber details
   *
   * @param userId - Unique user identifier (UUID v4)
   * @param updates - Partial user data (email, name, etc.)
   * @throws {UserNotFoundError} If user doesn't exist
   * @throws {ValidationError} If updates fail schema validation
   *
   * Example:
   *   await userManager.updateUser('user-123', { email: 'new@example.com' });
   */
  async updateUser(userId: string, updates: Partial<User>): Promise<User> {
    return db.query('UPDATE users SET ? WHERE id = ?', [updates, userId]);
  }
}
```
Improvements:
- Full names (`UserManager`, `getUser`)
- Synonyms in comments (member, subscriber, customer)
- Tags for faceting (`@domain`, `@layer`, `@related`)
- Typed parameters and return values
- Use case examples
- Error documentation
Agent search results:
| Query | Finds Before? | Finds After? |
|---|---|---|
| “user management” | ❌ | ✅ (class comment) |
| “member service” | ❌ | ✅ (synonym) |
| “fetch subscriber” | ❌ | ✅ (synonym) |
| “service layer” | ❌ | ✅ (@layer tag) |
| “authentication” | ❌ | ✅ (use case) |
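You can verify this effect directly with a code search tool. A sketch using ripgrep against the “after” version (the file path is hypothetical):
```bash
# Domain synonyms now land on the right file thanks to synonym-rich comments
rg -l "subscriber" src/          # → src/services/user-manager.ts (hypothetical path)
rg -l "member manager" src/      # matches the class-level comment
rg -l "fetch subscriber" src/    # matches the getUser doc comment
```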
9.18.4 Documentation Formats for Agents (llms.txt)
Problem: Agents need to discover and consume project documentation efficiently. Traditional documentation (wikis, Confluence) is hard to find and parse. MCP doc servers require installation and configuration.
Solution: Use the llms.txt standard for AI-optimized documentation indexing.
What is llms.txt?
llms.txt is a lightweight standard for making documentation discoverable to LLMs. It’s like robots.txt for AI agents—a simple index file that tells agents where to find relevant documentation.
Specification: https://llmstxt.org/
Format: Plain text file at /llms.txt or /machine-readable/llms.txt containing:
- Markdown content directly (inline docs)
- Links to external documentation files
- Structured sections for different topics
Example from this repo (machine-readable/llms.txt):
```md
# Claude Code Ultimate Guide

Complete guide for Anthropic's Claude Code CLI (19,000+ lines, 120 templates)

## Quick Start
- Installation: guide/ultimate-guide.md#installation (line 450)
- First Session: guide/cheatsheet.md#first-session
- CLAUDE.md Setup: guide/ultimate-guide.md#31-claudemd-project-context (line 1850)

## Core Concepts
- Agents: guide/ultimate-guide.md#4-agents (line 4100)
- Skills: guide/ultimate-guide.md#5-skills (line 5400)
- Hooks: guide/ultimate-guide.md#62-hooks (line 7200)

## Templates
- Custom agents: examples/agents/
- Slash commands: examples/commands/
- Event hooks: examples/hooks/
```
Why llms.txt Complements MCP Servers
llms.txt and MCP doc servers solve different problems:
| Aspect | llms.txt | Context7 MCP |
|---|---|---|
| Purpose | Static documentation index | Runtime library lookup |
| Setup | Zero config (just a file) | Requires MCP server install |
| Content | Project-specific docs | Official library docs |
| Token cost | Low (index only, ~500 tokens) | Medium (full doc fetching) |
| Use case | Project README, architecture | React API, Next.js patterns |
| Update frequency | Manual (on doc changes) | Automatic (tracks library versions) |
Best practice: Use both:
- llms.txt for project-specific documentation (architecture, conventions, getting started)
- Context7 MCP for official library documentation (React hooks, Express API)
Creating llms.txt for Your Project
Minimal example:
```md
# MyProject

Enterprise SaaS platform for event management

## Getting Started
- Setup: docs/setup.md
- Architecture: docs/architecture.md
- API Reference: docs/api.md

## Development
- Testing: docs/testing.md
- Deployment: docs/deployment.md
- Troubleshooting: docs/troubleshooting.md
```
Advanced example with line numbers:
```md
# MyProject

## Architecture Decisions
- Why microservices: docs/decisions/ADR-001.md (line 15)
- Event-driven design: docs/architecture.md#event-bus (line 230)
- Database strategy: docs/decisions/ADR-005.md (line 42)

## Common Patterns
- Authentication flow: src/services/auth-service.ts (line 78-125)
- Error handling: CLAUDE.md#error-patterns (line 150)
- Rate limiting: src/middleware/rate-limiter.ts (line 45)

## Domain Knowledge
- Event lifecycle: docs/domain/events.md
- Payment processing: docs/domain/payments.md
- Webhook handling: docs/domain/webhooks.md
```
## Domain Knowledge- Event lifecycle: docs/domain/events.md- Payment processing: docs/domain/payments.md- Webhook handling: docs/domain/webhooks.mdLine numbers help agents jump directly to relevant sections without reading entire files.
When to Update llms.txt
Update llms.txt when:
- Adding new major documentation files
- Restructuring docs directory
- Documenting new architectural patterns
- Adding ADRs (Architecture Decision Records)
- Creating domain-specific guides
Don’t update for:
- Code changes (unless architecture shifts)
- Minor doc tweaks
- Dependency updates
Integration with CLAUDE.md
llms.txt and CLAUDE.md serve different purposes:
| File | Purpose | Audience |
|---|---|---|
| CLAUDE.md | Active instructions, project context | Claude during this session |
| llms.txt | Documentation index | Claude discovering resources |
Pattern: Reference llms.txt from CLAUDE.md:
```md
## Project Documentation

Complete documentation is indexed in `machine-readable/llms.txt`.

Key resources:
- Architecture overview: docs/architecture.md
- API reference: docs/api.md
- Testing guide: docs/testing.md

For domain-specific knowledge, consult llms.txt index.
```
Real-World Example: This Guide
This guide uses both llms.txt and CLAUDE.md:
llms.txt (machine-readable/llms.txt):
- Indexes all major sections with line numbers
- Points to templates in `examples/`
- References workflows in `guide/workflows/`
CLAUDE.md:
- Active project context (repo structure, conventions)
- Current focus (guide version, changelog)
- Working instructions (version sync, landing sync)
Result: Agents can discover content via llms.txt, then consult CLAUDE.md for active context.
Real-World: Anthropic’s Official llms.txt
Anthropic publishes two LLM-optimized variants for Claude Code:
| File | URL | Size | Tokens (approx) | Use case |
|---|---|---|---|---|
| `llms.txt` | code.claude.com/docs/llms.txt | ~65 pages | ~15-20K | Quick index, section discovery |
| `llms-full.txt` | code.claude.com/docs/llms-full.txt | ~98 KB | ~25-30K | Fact-checking, complete docs, source of truth |
Pattern recommandé : fetch llms.txt d’abord pour identifier la section pertinente, puis fetch la page spécifique (ou llms-full.txt) pour les détails. Évite de charger 98 KB quand seules 2 pages sont nécessaires.
Ces URLs sont la source officielle à consulter en priorité quand un claim sur Claude Code semble incertain ou potentiellement obsolète.
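A minimal sketch of that two-step fetch, assuming Node 18+ with the global fetch API; the keyword matching and link extraction below are illustrative, not an official client:

```typescript
// Sketch: fetch the lightweight index first, then only the pages you need.
// Assumes Node 18+ (global fetch); the keyword filter is purely illustrative.
const INDEX_URL = 'https://code.claude.com/docs/llms.txt';

async function findRelevantPages(keyword: string): Promise<string[]> {
  const index = await (await fetch(INDEX_URL)).text();
  // llms.txt lines typically contain markdown links: [Title](url)
  return index
    .split('\n')
    .filter((line) => line.toLowerCase().includes(keyword.toLowerCase()))
    .map((line) => line.match(/\((https?:\/\/[^)]+)\)/)?.[1])
    .filter((url): url is string => Boolean(url));
}

async function fetchDetails(keyword: string): Promise<string> {
  const pages = await findRelevantPages(keyword);
  if (pages.length === 0) return ''; // fall back to llms-full.txt only if needed
  return (await fetch(pages[0])).text(); // ~1 page instead of ~98 KB
}
```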
Specification Resources
Section titled “Specification Resources”
- Official spec: https://llmstxt.org/
- Community examples: https://github.com/topics/llms-txt
- This guide’s implementation: machine-readable/llms.txt
Not recommended source: Framework-specific blog posts (often present llms.txt in opposition to MCP servers, when they’re complementary).
9.18.5 Token-Efficient Codebase
Section titled “9.18.5 Token-Efficient Codebase”
Problem: Agents have token limits. Large files consume context budget quickly, forcing agents to read in chunks and lose coherence.
Solution: Structure code to minimize token usage while maximizing agent comprehension.
Split Large Files (Agents Read in Chunks)
Section titled “Split Large Files (Agents Read in Chunks)”
Guideline: Keep files under 500 lines. Agents typically read 200-300 lines at a time (depending on model context).
❌ Monolithic file (1200 lines):
```text
src/services/event-service.ts
```

✅ Split by concern:

```text
src/services/event/
├── event-service.ts           (200 lines: public API + orchestration)
├── event-validator.ts         (150 lines: validation logic)
├── event-calendar-sync.ts     (300 lines: external calendar sync)
├── event-conflict-resolver.ts (250 lines: overlap detection)
└── README.md                  (explains module structure)
```

Why this works:
- Agent can load just what it needs (event-validator.ts for validation work)
- Each file has clear responsibility
- Easier to navigate via imports
When to split:
- File >500 lines and growing
- File has multiple unrelated concerns (validation + sync + conflict resolution)
- Agent frequently reads only part of the file
When NOT to split:
- File is cohesive (one class with related methods)
- Splitting would create artificial boundaries
- File size <300 lines
See also: Context Management (2.1) for token optimization strategies.
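To enforce the 500-line guideline mechanically, a small script can flag split candidates. A minimal sketch, assuming a Node.js project with sources under ./src; adjust the path and threshold to your project:

```typescript
// Sketch: list source files exceeding a line-count threshold.
// Assumes Node.js; scans ./src recursively for .ts files.
import { readdirSync, readFileSync, statSync } from 'node:fs';
import { join } from 'node:path';

const THRESHOLD = 500;

function* walk(dir: string): Generator<string> {
  for (const entry of readdirSync(dir)) {
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) yield* walk(path);
    else if (path.endsWith('.ts')) yield path;
  }
}

for (const file of walk('./src')) {
  const lines = readFileSync(file, 'utf8').split('\n').length;
  if (lines > THRESHOLD) console.log(`${file}: ${lines} lines (consider splitting)`);
}
```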
Remove Obvious Comments (Reduce Noise)
Section titled “Remove Obvious Comments (Reduce Noise)”
❌ Wasteful tokens:

```typescript
// Import React
import React from 'react';
// Import useState hook
import { useState } from 'react';

// Define Props interface
interface Props {
  // User name
  name: string;
  // User age
  age: number;
}

// User component
function User(props: Props) {
  // Render user info
  return <div>{props.name}</div>;
}
```

✅ Remove noise, keep value:

```typescript
import React, { useState } from 'react';

interface Props {
  name: string;
  age: number;
}

// Displays user name. Age is required for future age-gating feature (see ADR-012).
function User(props: Props) {
  return <div>{props.name}</div>;
}
```

Savings: Reduced from ~150 tokens to ~80 tokens (47% reduction) without losing critical info.
Keep comments that provide:
- Business context (“age for future age-gating”)
- Non-obvious decisions (“why age is required now but unused”)
- References (ADR-012)
Remove comments that are:
- Obvious from code (“Import React”)
- Redundant with types (“User name” when field is name: string)
Verbose Flags for Debug Output
Section titled “Verbose Flags for Debug Output”
Problem: Debug logging consumes tokens but is sometimes necessary.
Solution: Use verbose flags to conditionally include detailed output.
```typescript
export const DEBUG = process.env.DEBUG === 'true';

// event-service.ts
class EventService {
  async syncEvent(eventId: string) {
    if (DEBUG) {
      console.log(`[EventService.syncEvent] Starting sync for event ${eventId}`);
      console.log(`[EventService.syncEvent] Fetching external calendar data`);
    }

    const event = await this.getEvent(eventId);

    if (DEBUG) {
      console.log(`[EventService.syncEvent] Event data:`, event);
    }

    // sync logic
  }
}
```

CLAUDE.md configuration:

```markdown
## Debug Mode

To enable verbose logging:

\`\`\`bash
DEBUG=true npm run dev
\`\`\`

This adds detailed logs to help trace execution flow. Disable in production (default).
```

Agent behavior:
- In normal mode: Reads clean code without log noise
- In debug mode: Sees detailed execution trace when troubleshooting
Alternative: Use logger with levels:
```typescript
import { logger } from './logger';

class EventService {
  async syncEvent(eventId: string) {
    logger.debug(`Starting sync for event ${eventId}`);
    const event = await this.getEvent(eventId);
    logger.debug(`Event data:`, event);
    // sync logic
  }
}
```

Configure logger in CLAUDE.md:

```markdown
## Logging

- `logger.debug()`: Verbose details (disabled in production)
- `logger.info()`: Important milestones (always enabled)
- `logger.warn()`: Recoverable issues
- `logger.error()`: Failures requiring attention
```

9.18.6 Testing for Autonomy
Section titled “9.18.6 Testing for Autonomy”
Problem: Agents follow tests more reliably than documentation. Incomplete tests lead to incorrect implementations.
Solution: Use Test-Driven Development (TDD) with manually-written tests. Tests become the specification.
Why TDD is More Critical for Agents
Section titled “Why TDD is More Critical for Agents”
Humans: Can infer intent from vague requirements and course-correct during implementation.
Agents: Implement exactly what tests specify. Missing test = missing feature.
Example: Human vs Agent Behavior
Requirement: “Add email validation to signup form”
Human developer:
- Infers “validation” includes format check AND duplicate check
- Adds both even if tests only cover format
- Asks clarifying questions if uncertain
Agent:
- Implements only what tests specify
- If tests only cover format → agent only implements format
- If tests don’t cover edge cases → agent doesn’t handle them
Lesson: For agents, tests ARE the spec. Write comprehensive tests manually.
Tests Written Manually, Not Delegated
Section titled “Tests Written Manually, Not Delegated”
❌ Don’t ask the agent to write tests:

```text
User: "Implement email validation and write tests for it"
```

Why this fails:
- Agent may write incomplete tests (missing edge cases)
- Agent tests match its implementation (circular validation)
- No independent verification
✅ Do write tests first yourself:
```typescript
describe('Email validation', () => {
  it('accepts valid email formats', () => {
    expect(validateEmail('user@example.com')).toBe(true);
    expect(validateEmail('user+tag@example.co.uk')).toBe(true);
  });

  it('rejects invalid formats', () => {
    expect(validateEmail('invalid')).toBe(false);
    expect(validateEmail('user@')).toBe(false);
    expect(validateEmail('@example.com')).toBe(false);
  });

  it('rejects disposable email domains', () => {
    // Business requirement: Block temporary email services
    expect(validateEmail('user@tempmail.com')).toBe(false);
    expect(validateEmail('user@10minutemail.com')).toBe(false);
  });

  it('handles international characters', () => {
    // Business requirement: Support international domains
    expect(validateEmail('user@münchen.de')).toBe(true);
  });

  it('checks for duplicate emails in database', async () => {
    // Business requirement: Email must be unique
    await db.users.create({ email: 'existing@example.com' });
    await expect(validateEmail('existing@example.com')).rejects.toThrow('Email already registered');
  });
});
```

Then give agent the tests:

```text
User: "Implement the email validation function to pass all tests in tests/validation/email.test.ts. Requirements:
- Use validator.js for format checking
- Disposable domain list at src/data/disposable-domains.json
- Database check via userRepository.findByEmail()"
```

Agent outcome: Implements exactly what tests specify, including:
- Format validation
- Disposable domain blocking
- International character support
- Duplicate database check
Without manual tests: Agent might skip disposable domain blocking (not obvious from “email validation”) or miss international character support.
TDD Workflow for Agents
Section titled “TDD Workflow for Agents”
Step 1: Write failing test (you, the human)

```typescript
describe('EventService.createEvent', () => {
  it('prevents double-booking for same user + time', async () => {
    const userId = 'user-123';
    await eventService.createEvent({
      userId,
      startTime: '2026-01-21T10:00:00Z',
      endTime: '2026-01-21T11:00:00Z'
    });

    // Attempt overlapping event
    await expect(
      eventService.createEvent({
        userId,
        startTime: '2026-01-21T10:30:00Z', // overlaps by 30 min
        endTime: '2026-01-21T11:30:00Z'
      })
    ).rejects.toThrow('Scheduling conflict detected');
  });
});
```

Step 2: Give agent the test with implementation constraints

```text
User: "Implement EventService.createEvent() to pass the double-booking test. Requirements:
- Check for conflicts using conflictResolver.detectOverlap()
- Throw SchedulingConflictError with list of conflicting event IDs
- See ADR-009 for conflict resolution algorithm"
```

Step 3: Agent implements to pass the test

Step 4: Verify with test run

```bash
npm test tests/services/event-service.test.ts
```

Step 5: Iterate if test fails (agent fixes implementation)
Cross-reference: TDD Methodology (9.14) for full TDD workflow patterns.
Browser Automation for Validation
Section titled “Browser Automation for Validation”
For UI features, use browser automation to validate agent output:
```typescript
import { test, expect } from '@playwright/test';

test('signup form validates email', async ({ page }) => {
  await page.goto('/signup');

  // Test invalid format
  await page.fill('[name="email"]', 'invalid-email');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error')).toHaveText('Invalid email format');

  // Test disposable domain
  await page.fill('[name="email"]', 'user@tempmail.com');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error')).toHaveText('Temporary email addresses not allowed');

  // Test valid email
  await page.fill('[name="email"]', 'user@example.com');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error')).not.toBeVisible();
});
```

Why browser tests matter for agents:
- Validates actual user experience (not just unit logic)
- Catches CSS/accessibility issues agents might miss
- Provides visual proof of correctness
Give agent the E2E test:
User: "Implement signup form email validation to pass tests/e2e/signup-form.spec.ts. Use React Hook Form + Zod schema."Agent knows:
- Error messages must match test expectations
- Error display must use the .error class
- Form must prevent submission on invalid input
Test Coverage as Guardrail
Section titled “Test Coverage as Guardrail”
Post-implementation check:

```bash
npm test -- --coverage
```

Coverage thresholds in CI:

```json
{
  "jest": {
    "coverageThreshold": {
      "global": {
        "statements": 80,
        "branches": 80,
        "functions": 80,
        "lines": 80
      }
    }
  }
}
```

CLAUDE.md instruction:
```markdown
## Testing Requirements

All features must have:
- Unit tests (>80% coverage)
- Integration tests for API endpoints
- E2E tests for user-facing features

Run before committing:

\`\`\`bash
npm test -- --coverage
\`\`\`

CI will reject PRs below 80% coverage.
```

9.18.7 Conventions & Patterns
Section titled “9.18.7 Conventions & Patterns”
Problem: Agents hallucinate less when using familiar patterns from their training data.
Solution: Use well-known design patterns and mainstream technologies. Document custom patterns explicitly.
Design Patterns Agents Know
Section titled “Design Patterns Agents Know”
Agents are trained on massive codebases using standard design patterns. Leverage this:
✅ Use standard patterns:
```typescript
// Singleton pattern (widely known)
class DatabaseConnection {
  private static instance: DatabaseConnection;

  private constructor() { /* ... */ }

  public static getInstance(): DatabaseConnection {
    if (!DatabaseConnection.instance) {
      DatabaseConnection.instance = new DatabaseConnection();
    }
    return DatabaseConnection.instance;
  }
}
```

Agent recognizes: “This is Singleton pattern” → understands getInstance() returns same instance.
❌ Custom pattern without documentation:
```typescript
// Undocumented custom pattern
class DatabaseConnection {
  private static conn: DatabaseConnection;

  static make() {
    return this.conn ?? (this.conn = new DatabaseConnection());
  }
}
```

Agent confusion: “What’s make()? Is it factory? Builder? Why conn instead of instance?”
If you must use custom patterns, document heavily:
```typescript
/**
 * Database connection using Lazy Singleton pattern.
 *
 * Pattern: Singleton with lazy initialization (no eager instantiation).
 * Why custom naming: "make()" aligns with our framework's naming convention (Laravel-inspired).
 * Standard Singleton uses "getInstance()" but we use "make()" for consistency across all singletons.
 *
 * Related: See ADR-004 for singleton usage policy.
 */
class DatabaseConnection {
  private static conn: DatabaseConnection;

  static make() {
    return this.conn ?? (this.conn = new DatabaseConnection());
  }
}
```

The “Boring Tech” Advantage
Section titled “The “Boring Tech” Advantage”
Principle: Popular frameworks and libraries have more training data → agents perform better.
Framework training data volume (approximate):
| Framework/Library | GitHub repos | Agent performance |
|---|---|---|
| React | 10M+ | Excellent |
| Express | 5M+ | Excellent |
| Vue | 3M+ | Good |
| Angular | 2M+ | Good |
| Svelte | 500K | Fair |
| Custom framework | <1K | Poor |
Recommendation: Use mainstream tech unless you have strong reasons otherwise.
Example: React vs Custom Framework
React (agent-friendly):
```typescript
// Agent knows React patterns from training data
function UserProfile({ userId }: { userId: string }) {
  const [user, setUser] = useState<User | null>(null);

  useEffect(() => {
    fetchUser(userId).then(setUser);
  }, [userId]);

  if (!user) return <div>Loading...</div>;
  return <div>{user.name}</div>;
}
```

Custom framework (agent-hostile without docs):

```typescript
// Agent has no training data for "Fluxor" framework
@Component({
  state: ['user'],
  effects: ['loadUser']
})
class UserProfile {
  onMount() {
    this.loadUser(this.props.userId);
  }

  render() {
    return this.state.user
      ? `<div>${this.state.user.name}</div>`
      : '<div>Loading...</div>';
  }
}
```

Without Fluxor documentation: Agent doesn’t know @Component decorator, state, effects, or lifecycle hooks.
With Fluxor documentation:
```markdown
# Fluxor Framework

## Component Lifecycle

Fluxor components use decorators (similar to Angular):

- `@Component({ state, effects })` - Define component with reactive state
- `onMount()` - Equivalent to React's `useEffect` with empty deps
- `render()` - Returns HTML string (not JSX)

## State Management

- `this.state.user` - Access reactive state (equivalent to React `useState`)
- `this.loadUser()` - Dispatch effect (equivalent to Redux action)

## Example

\`\`\`typescript
@Component({ state: ['user'] })
class UserProfile {
  onMount() {
    // Runs once on component mount (like React useEffect)
    this.loadUser(this.props.userId);
  }

  render() {
    // Reactive: re-runs when this.state.user changes
    return this.state.user
      ? `<div>${this.state.user.name}</div>`
      : '<div>Loading...</div>';
  }
}
\`\`\`
```

Agent with docs: Understands Fluxor by mapping to familiar React concepts.
Document Architectural Decisions (ADRs)
Section titled “Document Architectural Decisions (ADRs)”
Problem: Custom architectures lack training data.
Solution: Document decisions in Architecture Decision Records.
ADR example:
```markdown
# ADR-011: Service Layer Architecture

**Status**: Accepted
**Date**: 2025-12-10

## Context

We need clear separation between HTTP handling and business logic.

## Decision

Adopt 3-layer architecture:

1. **Controllers** (`src/controllers/`): HTTP request/response, no business logic
2. **Services** (`src/services/`): Business logic, framework-agnostic
3. **Repositories** (`src/repositories/`): Data access, abstracts database

**Rules**:
- Controllers call services, never repositories directly
- Services call repositories, never touch HTTP (no `req`, `res` objects)
- Repositories encapsulate all database queries

**Similar to**: NestJS architecture, Spring Boot layers, Clean Architecture use cases

## Example

\`\`\`typescript
// ✅ Correct: Controller → Service → Repository
// src/controllers/user-controller.ts
class UserController {
  async getUser(req: Request, res: Response) {
    const user = await userService.getUser(req.params.id); // Calls service
    res.json(user);
  }
}

// src/services/user-service.ts
class UserService {
  async getUser(userId: string) {
    return userRepository.findById(userId); // Calls repository
  }
}

// src/repositories/user-repository.ts
class UserRepository {
  async findById(userId: string) {
    return db.query('SELECT * FROM users WHERE id = ?', [userId]);
  }
}
\`\`\`

\`\`\`typescript
// ❌ Incorrect: Controller calls repository directly
class UserController {
  async getUser(req: Request, res: Response) {
    const user = await userRepository.findById(req.params.id); // Layering violation!
    res.json(user);
  }
}
\`\`\`
```

Agent benefit: When working in controllers, agent reads ADR-011 and knows to call services (not repositories).
9.18.8 Guardrails & Validation
Section titled “9.18.8 Guardrails & Validation”
Problem: Agents make mistakes—hallucinations, incorrect assumptions, security oversights.
Solution: Multi-layer guardrails to catch errors before they reach production.
Hooks as Anti-Pattern Validators
Section titled “Hooks as Anti-Pattern Validators”
Beyond secrets: Use hooks to enforce codebase conventions.
Example: Prevent layering violations:
```bash
#!/bin/bash
INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool.name')

if [[ "$TOOL_NAME" == "Edit" ]] || [[ "$TOOL_NAME" == "Write" ]]; then
  FILE_PATH=$(echo "$INPUT" | jq -r '.tool.input.file_path')

  # Block controllers calling repositories directly (layering violation)
  if [[ "$FILE_PATH" == *"/controllers/"* ]]; then
    CONTENT=$(echo "$INPUT" | jq -r '.tool.input.new_string // .tool.input.content')

    if echo "$CONTENT" | grep -q "Repository\."; then
      echo "❌ Layering violation: Controllers must call Services, not Repositories directly" >&2
      echo "See ADR-011 for architecture rules" >&2
      exit 2 # Block
    fi
  fi
fi

exit 0 # Allow
```

Catches:

```typescript
// ❌ This edit will be BLOCKED by hook
class UserController {
  async getUser(req: Request, res: Response) {
    const user = await userRepository.findById(req.params.id); // BLOCKED!
  }
}
```

Agent sees: “❌ Layering violation: Controllers must call Services…” → revises to call service.
See: Hooks (6.2) for comprehensive hook examples.
“Tainted Code” Philosophy
Section titled ““Tainted Code” Philosophy”
Principle: Treat all agent-generated code as “tainted” until validated by CI.
CI checks:
```yaml
name: Agent Code Validation

on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Run linter
        run: npm run lint

      - name: Run type checker
        run: npm run type-check

      - name: Run tests
        run: npm test -- --coverage

      - name: Check test coverage
        run: |
          COVERAGE=$(npm test -- --coverage --json | jq '.coverage')
          if (( $(echo "$COVERAGE < 80" | bc -l) )); then
            echo "Coverage below 80%: $COVERAGE"
            exit 1
          fi

      - name: Check for TODO comments
        run: |
          if grep -r "TODO" src/; then
            echo "TODO comments found. Agent must implement fully, no placeholders."
            exit 1
          fi

      - name: Architecture compliance
        run: |
          # Check for layering violations
          if grep -r "Repository" src/controllers/; then
            echo "Controllers calling repositories directly (ADR-011 violation)"
            exit 1
          fi
```

What CI catches:
- Syntax errors (linting)
- Type mismatches (type checking)
- Broken logic (tests)
- Incomplete implementations (TODO comments)
- Architecture violations (custom checks)
CLAUDE.md instruction:
```markdown
## CI/CD Validation

All PRs run automated validation:
- Linting (ESLint)
- Type checking (TypeScript)
- Unit tests (Jest, >80% coverage)
- Architecture compliance (layering rules)

Agents must pass CI before PR approval. Never disable CI checks.
```

PR Reviews: Human-in-the-Loop
Section titled “PR Reviews: Human-in-the-Loop”
Even with CI, require human review:
```yaml
name: PR Rules

on: [pull_request]

jobs:
  require-review:
    runs-on: ubuntu-latest
    steps:
      - name: Check for approval
        run: |
          APPROVALS=$(gh pr view ${{ github.event.pull_request.number }} --json reviews --jq '.reviews | length')
          if [ "$APPROVALS" -lt 1 ]; then
            echo "PR requires at least 1 human review"
            exit 1
          fi
```

Why human review matters:
- Agents miss context (business requirements not in code)
- Agents may implement correct code for wrong problem
- Security vulnerabilities AI doesn’t recognize (novel attack vectors)
Review checklist for agent PRs:
```markdown
## Agent PR Review Checklist

- [ ] **Intent**: Does the code solve the actual problem (not just pass tests)?
- [ ] **Edge cases**: Are unusual inputs handled (null, empty, negative, extreme values)?
- [ ] **Security**: Any potential injection, XSS, or authorization bypasses?
- [ ] **Performance**: Will this scale (N+1 queries, memory leaks, inefficient algorithms)?
- [ ] **Maintainability**: Is code readable and well-documented for future humans?
- [ ] **Tests**: Do tests cover meaningful scenarios (not just happy path)?
```

See also: CI/CD Integration (9.3) for complete CI setup patterns.
Validation Layers Summary
Section titled “Validation Layers Summary”
| Layer | Catches | Speed | Automation |
|---|---|---|---|
| Hooks | Pre-execution (secrets, anti-patterns) | Instant | 100% |
| Linter | Syntax, style violations | <10s | 100% |
| Type checker | Type mismatches | <30s | 100% |
| Tests | Logic errors, broken functionality | <2min | 100% |
| CI checks | Coverage, TODOs, architecture | <5min | 100% |
| Human review | Intent, security, context | Hours | Manual |
Defense in depth: Each layer catches different error classes. All layers together minimize risk.
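The automated layers are easy to chain into one local gate before a PR even opens. A minimal sketch, assuming npm scripts named lint, type-check, and test exist in your project (adjust to your tooling):

```typescript
// Sketch: run the automated validation layers in order, fail fast.
// Assumes npm scripts "lint", "type-check", and "test" are defined.
import { execSync } from 'node:child_process';

const layers = [
  ['Linter', 'npm run lint'],
  ['Type checker', 'npm run type-check'],
  ['Tests', 'npm test -- --coverage'],
] as const;

for (const [name, cmd] of layers) {
  try {
    execSync(cmd, { stdio: 'inherit' });
    console.log(`✅ ${name} passed`);
  } catch {
    console.error(`❌ ${name} failed: fix before handing work back to the agent`);
    process.exit(1);
  }
}
```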
9.18.9 Serendipity & Cross-References
Section titled “9.18.9 Serendipity & Cross-References”
Problem: Agents work on isolated files and miss related code elsewhere in the codebase.
Solution: Add cross-references so agents discover related modules.
Module Cross-References
Section titled “Module Cross-References”
In each module, reference related code:

```typescript
/**
 * Event management service.
 *
 * Related modules:
 * - src/services/calendar-sync-service.ts (external calendar integration)
 * - src/services/conflict-resolver.ts (overlap detection)
 * - src/repositories/event-repository.ts (data access)
 * - src/jobs/reminder-sender.ts (sends event reminders via queue)
 *
 * See also: ADR-007 (event deletion strategy), ADR-009 (conflict resolution)
 */
class EventService {
  // implementation
}
```
- Working on event service → reads cross-references
- Discovers conflict-resolver.ts exists → uses it instead of re-implementing
- Knows to check ADRs for business logic context
Pattern: “See also” chains:
```typescript
// src/services/calendar-sync-service.ts
/**
 * Syncs events with external calendar providers (Google, Outlook).
 *
 * Related:
 * - src/services/event-service.ts (main event operations)
 * - src/integrations/google-calendar.ts (Google Calendar API client)
 * - src/integrations/outlook-calendar.ts (Outlook API client)
 */

// src/integrations/google-calendar.ts
/**
 * Google Calendar API integration.
 *
 * Related:
 * - src/services/calendar-sync-service.ts (orchestrates sync)
 * - src/models/calendar-event.ts (domain model)
 *
 * Rate limits: 10 req/sec per user (enforced in sync service)
 * See ADR-014 for rate limiting strategy.
 */
```

Result: Agent navigates from event-service → calendar-sync → google-calendar → understands full flow.
Self-Documenting Commands (--help)
Section titled “Self-Documenting Commands (--help)”
CLI tools should explain themselves:
```typescript
#!/usr/bin/env node
/**
 * CLI tool to manually trigger calendar sync for a user.
 *
 * Usage:
 *   npm run sync-calendars -- --user-id=USER_ID [--provider=google|outlook]
 *
 * Examples:
 *   npm run sync-calendars -- --user-id=user-123
 *   npm run sync-calendars -- --user-id=user-123 --provider=google
 *
 * What it does:
 *   1. Fetches user calendar credentials from database
 *   2. Connects to external calendar API (Google or Outlook)
 *   3. Syncs events bidirectionally (our DB ↔ external calendar)
 *   4. Logs sync results (events added/updated/deleted)
 *
 * Related:
 *   - src/services/calendar-sync-service.ts (sync logic)
 *   - docs/runbooks/calendar-sync-troubleshooting.md (debugging guide)
 */

if (process.argv.includes('--help')) {
  console.log(`Calendar Sync CLI

Usage:
  npm run sync-calendars -- --user-id=USER_ID [--provider=google|outlook]

Options:
  --user-id    Required. User ID to sync calendars for
  --provider   Optional. Specific provider to sync (google or outlook). Default: all providers

Examples:
  npm run sync-calendars -- --user-id=user-123
  npm run sync-calendars -- --user-id=user-123 --provider=google

See: docs/runbooks/calendar-sync-troubleshooting.md
`);
  process.exit(0);
}

// CLI implementation
```

Agent discovers:
- Reads --help output to understand CLI usage
- Finds related code (calendar-sync-service.ts)
- Knows where to look for troubleshooting (runbook)
Embedded Technical Docs
Section titled “Embedded Technical Docs”
Instead of a separate wiki, embed docs near code:

```text
src/integrations/google-calendar/
├── google-calendar.ts
├── google-calendar.test.ts
├── README.md          ← "How to use Google Calendar integration"
├── RATE_LIMITS.md     ← "Google Calendar API rate limits + handling"
└── TROUBLESHOOTING.md ← "Common errors + solutions"
```

README.md:
```markdown
# Google Calendar Integration

API client for Google Calendar API v3.

## Usage

\`\`\`typescript
import { GoogleCalendarClient } from './google-calendar';

const client = new GoogleCalendarClient(userCredentials);
const events = await client.listEvents(startDate, endDate);
\`\`\`

## Authentication

Uses OAuth 2.0 tokens stored in `users.calendar_token` field. If token expired, throws `TokenExpiredError` (caller should redirect to re-auth).

## Rate Limits

Google enforces 10 requests/second per user. Client automatically throttles using rate-limiter-flexible library. See RATE_LIMITS.md for details.

## Error Handling

Common errors:
- `TokenExpiredError`: Token expired, re-auth needed
- `RateLimitError`: Exceeded Google's rate limit (rare, automatic retry)
- `CalendarNotFoundError`: User hasn't granted calendar permission

See TROUBLESHOOTING.md for full error catalog + solutions.
```

Agent workflow:
- Agent needs to integrate Google Calendar
- Reads google-calendar.ts → sees README.md reference
- Reads README → understands usage, auth, rate limits
- Encounters error → reads TROUBLESHOOTING.md
- Implements correctly without hallucinating
Contrast with wiki:
- Wiki: Agent doesn’t know wiki exists or where to look
- Embedded docs: Agent finds docs naturally via file system
9.18.10 Usage Instructions
Section titled “9.18.10 Usage Instructions”
Problem: Agents guess API usage patterns and often guess wrong (argument order, error handling, return types).
Solution: Provide explicit usage examples in doc blocks.
Doc Blocks with Examples
Section titled “Doc Blocks with Examples”
❌ Minimal docs (agent guesses):

```typescript
// Validate email address
function validateEmail(email: string): boolean {
  // implementation
}
```

Agent must guess:
- What does “validate” mean? Format only? Uniqueness check?
- What about null or empty string?
- Are there side effects (database lookups)?
✅ Comprehensive docs with examples:
```typescript
/**
 * Validate email address format and uniqueness.
 *
 * Checks:
 * 1. Valid email format (RFC 5322 compliant)
 * 2. Not a disposable email domain (e.g., tempmail.com)
 * 3. Not already registered in database
 *
 * @param email - Email address to validate (trimmed automatically)
 * @returns Promise resolving to true if valid, throws error otherwise
 * @throws {ValidationError} If format invalid or disposable domain
 * @throws {DuplicateEmailError} If email already registered
 *
 * @example
 * // Valid email
 * await validateEmail('user@example.com'); // Returns true
 *
 * @example
 * // Invalid format
 * await validateEmail('invalid-email');
 * // Throws ValidationError: "Invalid email format"
 *
 * @example
 * // Disposable domain
 * await validateEmail('user@tempmail.com');
 * // Throws ValidationError: "Disposable email addresses not allowed"
 *
 * @example
 * // Duplicate email
 * await validateEmail('existing@example.com');
 * // Throws DuplicateEmailError: "Email already registered"
 *
 * @example
 * // Null handling
 * await validateEmail(null);
 * // Throws ValidationError: "Email is required"
 */
async function validateEmail(email: string | null): Promise<boolean> {
  // implementation
}
```

Agent now knows:
- Function is async (returns Promise)
- Throws errors (doesn’t return false)
- Handles null input
- Trims whitespace automatically
- Checks format, disposable domains, AND uniqueness
Agent can implement correctly:
```typescript
// In signup form handler
try {
  await validateEmail(formData.email);
  // Proceed with signup
} catch (error) {
  if (error instanceof DuplicateEmailError) {
    showError('This email is already registered. Try logging in instead.');
  } else if (error instanceof ValidationError) {
    showError(error.message); // "Invalid email format" or "Disposable email not allowed"
  }
}
```

Context7 MCP for Official Docs
Section titled “Context7 MCP for Official Docs”
Problem: Agents may use outdated API patterns from training data.
Solution: Use Context7 MCP to fetch current documentation.
CLAUDE.md configuration:
```markdown
## External Dependencies

### Google Calendar API

**Version**: v3 (current as of 2026-01-21)
**Docs**: Use Context7 MCP to fetch latest: "google calendar api v3 nodejs"

**Key methods**:
- `calendar.events.list()` - List events
- `calendar.events.insert()` - Create event
- `calendar.events.update()` - Update event
- `calendar.events.delete()` - Delete event

**Rate limits**: 10 req/sec per user (enforced by our client)

### Why Context7

Agent's training data may be outdated (pre-2025). Use Context7 to fetch current docs at implementation time.

Agent instruction: "When implementing Google Calendar integration, use Context7 MCP to fetch latest API docs."
```

Agent behavior:
- Reads CLAUDE.md → sees Context7 instruction
- Uses Context7 MCP → fetches current docs
- Implements with correct API (not outdated training data)
See: Context7 MCP (5.3) for setup.
Sensible Defaults
Section titled “Sensible Defaults”
Design APIs to work with minimal configuration:
❌ Requires all parameters:
```typescript
const client = new GoogleCalendarClient({
  credentials: userCredentials,
  rateLimit: 10,
  rateLimitWindow: 1000,
  retryAttempts: 3,
  retryDelay: 1000,
  timeout: 30000,
  userAgent: 'MyApp/1.0'
});
```

✅ Sensible defaults:
```typescript
// Minimal usage (defaults applied)
const client = new GoogleCalendarClient(userCredentials);

// Override defaults if needed
const customClient = new GoogleCalendarClient(userCredentials, {
  timeout: 60000 // Only override timeout, other defaults remain
});
```

Implementation with defaults:
```typescript
interface GoogleCalendarOptions {
  rateLimit?: number;     // Default: 10 req/sec
  retryAttempts?: number; // Default: 3
  retryDelay?: number;    // Default: 1000ms
  timeout?: number;       // Default: 30000ms
}

class GoogleCalendarClient {
  private options: Required<GoogleCalendarOptions>;

  constructor(
    private credentials: Credentials,
    options: GoogleCalendarOptions = {}
  ) {
    // Apply defaults
    this.options = {
      rateLimit: options.rateLimit ?? 10,
      retryAttempts: options.retryAttempts ?? 3,
      retryDelay: options.retryDelay ?? 1000,
      timeout: options.timeout ?? 30000
    };
  }
}
```

Agent benefit: Can use API immediately without researching all options.
Document defaults in code:
```typescript
/**
 * Google Calendar API client with automatic rate limiting and retries.
 *
 * Default configuration:
 * - Rate limit: 10 requests/second (Google's limit)
 * - Retry attempts: 3 (exponential backoff)
 * - Timeout: 30 seconds
 *
 * @example
 * // Use defaults
 * const client = new GoogleCalendarClient(credentials);
 *
 * @example
 * // Override specific options
 * const client = new GoogleCalendarClient(credentials, {
 *   timeout: 60000 // 60 second timeout for slow connections
 * });
 */
```

9.18.11 Decision Matrix & Implementation Checklist
Section titled “9.18.11 Decision Matrix & Implementation Checklist”
When to Optimize for Agents vs Humans
Section titled “When to Optimize for Agents vs Humans”
Not all code needs agent optimization. Use this decision matrix:
| Factor | Optimize for Agents | Optimize for Humans |
|---|---|---|
| Code churn | High (>5 edits/month) | Low (<2 edits/month) |
| Team usage | >50% commits by agents | <30% commits by agents |
| Complexity | Business logic, APIs | Infrastructure, DevOps |
| Project phase | Greenfield, active development | Stable, maintenance mode |
| File size | >500 lines | <300 lines |
| Team size | >5 developers | Solo or pair |
✅ High ROI for agent optimization:
- Core business logic files (e.g., order-service.ts, payment-processor.ts)
- Frequently modified features (e.g., UI components, API routes)
- Complex domains requiring context (e.g., healthcare, finance, legal)
- Greenfield projects (design agent-friendly from start)
❌ Low ROI for agent optimization:
- Stable infrastructure code (rarely modified)
- Small utility functions (<50 lines, self-evident)
- DevOps scripts (agents rarely touch these)
- Legacy code in maintenance mode (refactoring cost > benefit)
Agent-Friendly Codebase Checklist
Section titled “Agent-Friendly Codebase Checklist”
Use this checklist to assess your codebase’s agent-friendliness:
Domain Knowledge (Score: ___ / 5)
- CLAUDE.md exists with business context, design principles, domain terms
- Architecture Decision Records (ADRs) document key decisions
- Code comments explain “why” (not just “what”)
- Cross-references link related modules
- Directory READMEs explain module purpose
Discoverability (Score: ___ / 6)
- Files use complete terms (not abbreviations: user not usr)
- Comments include synonyms (e.g., “member, subscriber, customer”)
- Functions have JSDoc tags (@domain, @related, @external)
- README files in major directories
- CLI tools have --help with examples
- Embedded docs near code (not separate wiki)
Token Efficiency (Score: ___ / 4)
- Files under 500 lines (split larger files by concern)
- Obvious comments removed (keep only valuable context)
- Debug output controlled by verbose flags
- Large generated files excluded via .claudeignore
Testing (Score: ___ / 5)
- Tests written manually (not delegated to agent)
- TDD workflow for new features (test first, implement second)
- E2E tests for UI features (Playwright or similar)
- Test coverage >80% enforced in CI
- Tests cover edge cases (not just happy path)
Conventions (Score: ___ / 4)
- Standard design patterns used (Singleton, Factory, Repository, etc.)
- Mainstream frameworks (React, Express, etc.) preferred over custom
- ADRs document custom patterns
- “See also” comments reference similar patterns
Guardrails (Score: ___ / 5)
- Hooks validate code at pre-execution (layering, secrets, conventions)
- CI enforces linting, type checking, tests
- Test coverage thresholds in CI (e.g., 80%)
- Architecture compliance checks (layering violations, etc.)
- Human PR review required before merge
Usage Instructions (Score: ___ / 4)
- Functions have doc blocks with @example usage
- Error conditions documented (@throws)
- APIs have sensible defaults (minimal config required)
- Context7 MCP used for fetching current docs
Total Score: ___ / 33
Scoring:
- 25-33: Excellent agent-friendliness
- 18-24: Good, some improvements possible
- 10-17: Fair, significant gaps exist
- <10: Poor, major refactoring needed
Quick Wins (Immediate Impact)
Section titled “Quick Wins (Immediate Impact)”
Start with these high-impact, low-effort improvements:
1. Add CLAUDE.md (30 minutes)
```markdown
# Project Context

**Tech stack**: React, Express, PostgreSQL
**Architecture**: 3-layer (controllers, services, repositories)
**Conventions**: ESLint + Prettier, 80% test coverage required

## Key Files

- `src/services/` - Business logic (framework-agnostic)
- `src/controllers/` - HTTP handlers (thin layer)
- `src/repositories/` - Database access

See ADR-011 for layering rules.
```

2. Add directory READMEs (15 minutes per directory)
```markdown
# Services Layer

Business logic and domain operations. Services are framework-agnostic.

**Rules**:
- Call repositories for data access
- Never import from controllers (layering violation)
- Return domain objects (not HTTP responses)
```

3. Add cross-references to hot files (10 minutes per file)

```typescript
/**
 * Event service - core business logic for event management.
 *
 * Related:
 * - src/services/calendar-sync-service.ts (external calendar sync)
 * - src/repositories/event-repository.ts (data access)
 *
 * See ADR-007 for event deletion strategy.
 */
```

4. Split one large file (30 minutes)
- Find file >500 lines
- Split by concern (e.g., validation, sync, conflict resolution)
- Add README in new directory
5. Enable test coverage in CI (15 minutes)
```yaml
- name: Run tests with coverage
  run: npm test -- --coverage

- name: Check coverage threshold
  run: |
    COVERAGE=$(npm test -- --coverage --json | jq '.coverage')
    if (( $(echo "$COVERAGE < 80" | bc -l) )); then
      exit 1
    fi
```

Total time: ~2 hours for foundational improvements.
Resources
Section titled “Resources”
Primary source:
- Agent Experience Best Practices by François Zaninotto (Marmelab)
Related frameworks:
- Netlify AX (Agent Experience) Research (2025)
- Speakeasy API Developer Experience Guide (includes agent-friendly patterns)
Academic research:
- “Context Engineering for AI Agents” (ArXiv, June 2025)
- “Agent-Oriented Software Engineering” (ArXiv, March 2025)
- “Prompt Injection Prevention in Code Agents” (ArXiv, November 2024)
Cross-references in this guide:
- CLAUDE.md patterns (3.1)
- Hooks (6.2)
- CI/CD Integration (9.3)
- Pitfalls (9.11)
- Methodologies - TDD (9.14)
9.19 Permutation Frameworks
Section titled “9.19 Permutation Frameworks”Reading time: 10 minutes Skill level: Month 1+
The Problem: Single-Approach Thinking
Section titled “The Problem: Single-Approach Thinking”
Most developers pick one approach and stick with it. But Claude Code’s tooling supports systematic variation—testing multiple approaches to find the optimal solution.
Permutation Frameworks formalize this: instead of hoping your first approach works, you systematically generate and evaluate variations.
What Is a Permutation Framework?
Section titled “What Is a Permutation Framework?”
A permutation framework defines dimensions of variation and lets Claude generate all meaningful combinations. Each dimension represents a design choice; each combination is a distinct implementation approach.

```text
Dimension 1: Architecture   → [Monolith, Modular, Microservice]
Dimension 2: State Mgmt     → [Server-side, Client-side, Hybrid]
Dimension 3: Auth Strategy  → [JWT, Session, OAuth]

Total permutations: 3 × 3 × 3 = 27 approaches
Practical subset: 4-6 worth evaluating
```

When to Use Permutation Frameworks
Section titled “When to Use Permutation Frameworks”
| Scenario | Use Permutation? | Why |
|---|---|---|
| New project architecture | ✅ Yes | Multiple valid approaches, high impact |
| Component design with tradeoffs | ✅ Yes | Performance vs. readability vs. maintainability |
| Migration strategy | ✅ Yes | Big-bang vs. strangler vs. parallel |
| Bug fix with known root cause | ❌ No | One correct fix |
| Styling changes | ❌ No | Low impact, subjective |
| Performance optimization | ✅ Maybe | Profile first, then permute solutions |
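When several dimensions qualify, you can enumerate the combination space mechanically before pruning it to the 4-6 variants worth building. A minimal sketch; the dimension names and values below are illustrative:

```typescript
// Sketch: enumerate all combinations of design dimensions.
// Dimension names and values are illustrative, not prescriptive.
const dimensions: Record<string, string[]> = {
  architecture: ['monolith', 'modular', 'microservice'],
  state: ['server-side', 'client-side', 'hybrid'],
  auth: ['jwt', 'session', 'oauth'],
};

function permutations(dims: Record<string, string[]>): Record<string, string>[] {
  return Object.entries(dims).reduce<Record<string, string>[]>(
    (acc, [name, values]) =>
      acc.flatMap((combo) => values.map((v) => ({ ...combo, [name]: v }))),
    [{}]
  );
}

const all = permutations(dimensions); // 3 × 3 × 3 = 27 combinations
console.log(all.length); // 27: prune to the 4-6 worth actually building
```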
Implementation: CLAUDE.md-Driven Permutations
Section titled “Implementation: CLAUDE.md-Driven Permutations”
The key insight: use CLAUDE.md variations to generate consistent implementations across different approaches.
Step 1: Define the Base Template
Section titled “Step 1: Define the Base Template”

```markdown
# CLAUDE.md (base)

## Project: [Project Name]
## Permutation: {{VARIANT_NAME}}

### Architecture
{{ARCHITECTURE_PATTERN}}

### State Management
{{STATE_STRATEGY}}

### Conventions
- All implementations must include tests
- Use the same data model across variants
- Each variant in its own branch: `perm/{{VARIANT_NAME}}`
```

Step 2: Generate Variants
Section titled “Step 2: Generate Variants”

```bash
# Create variant branches with Claude
claude -p "Create 4 CLAUDE.md variants for our dashboard project:
1. 'server-heavy': Server components, minimal client JS, session auth
2. 'spa-optimized': Client SPA, REST API, JWT auth
3. 'hybrid-ssr': SSR with hydration, tRPC, session + JWT
4. 'edge-first': Edge functions, client cache, token auth

For each: create branch perm/<name>, write CLAUDE.md with filled template,
scaffold the base structure. Same data model across all variants."
```

Step 3: Implement in Parallel
Section titled “Step 3: Implement in Parallel”

```bash
# Terminal 1
git checkout perm/server-heavy
claude "Implement the dashboard following CLAUDE.md conventions"

# Terminal 2
git checkout perm/spa-optimized
claude "Implement the dashboard following CLAUDE.md conventions"

# Terminal 3 (or sequential)
git checkout perm/hybrid-ssr
claude "Implement the dashboard following CLAUDE.md conventions"
```

Step 4: Evaluate with Sub-Agents
Section titled “Step 4: Evaluate with Sub-Agents”

```text
User: Compare the 4 permutation branches. For each, evaluate:
- Bundle size and load time
- Code complexity (files, lines, dependencies)
- Test coverage achievable
- Maintenance burden estimate

Create a comparison matrix and recommend the best approach
for our team of 3 developers with moderate React experience.
```

Practical Example: API Design Permutations
Section titled “Practical Example: API Design Permutations”

```markdown
# Permutation: REST vs GraphQL vs tRPC

## Shared constraints (all variants)
- Same database schema (PostgreSQL + Prisma)
- Same auth (JWT)
- Same business logic (services layer)

## Variant A: REST
- Express routes, OpenAPI spec
- Separate validation layer (Zod)
- Standard REST conventions (GET/POST/PUT/DELETE)

## Variant B: GraphQL
- Apollo Server, schema-first
- Resolvers calling same services
- Dataloader for N+1 prevention

## Variant C: tRPC
- Type-safe end-to-end
- Shared types between client/server
- Zod validation built-in
```

Evaluation prompt:
User: I've implemented all 3 API variants. Now act as a reviewer:
1. Run tests for each: which has better coverage?
2. Count total lines of boilerplate vs business logic
3. Measure type safety (any manual type assertions?)
4. Rate developer experience for adding a new endpoint (1-5)

Give me a decision matrix, not a recommendation.
I'll decide based on our team context.

Permutation Anti-Patterns
Section titled “Permutation Anti-Patterns”
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Too many dimensions | Combinatorial explosion (3⁴ = 81) | Cap at 3 dimensions, 3-4 variants each |
| No shared constraints | Variants aren’t comparable | Define fixed elements first |
| Permuting the trivial | Wasting tokens on style choices | Only permute architectural decisions |
| No evaluation criteria | Can’t pick a winner | Define scoring before generating variants |
| Skipping implementation | Comparing on paper only | Build at least a skeleton for each |
Integration with Other Patterns
Section titled “Integration with Other Patterns”
Permutation + Plan Mode:

1. /plan → Define dimensions and constraints
2. Generate CLAUDE.md variants
3. /execute → Implement each variant
4. /plan → Compare and decide

Permutation + TDD (see the shared-spec sketch after the cross-references below):

1. Write tests that ALL variants must pass (shared spec)
2. Implement each variant against the same test suite
3. The variant with cleanest implementation wins

Permutation + Skeleton Projects:

1. Start from same skeleton
2. Branch per variant
3. Each variant evolves the skeleton differently
4. Compare which skeleton evolution is most maintainable

Cross-references:
- Skeleton Projects workflow: See Skeleton Projects Workflow
- Plan Mode: See §2.3 Plan Mode
- TDD workflow: See TDD with Claude
- Multi-Instance parallel execution: See §9.17 Scaling Patterns
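For the Permutation + TDD integration above, the shared spec is the anchor. A minimal sketch, assuming Jest and a hypothetical createDashboard() factory that every variant must export; all names and shapes here are illustrative assumptions, not part of any variant:

```typescript
// shared-spec.test.ts: every permutation branch must pass this unchanged.
// Assumes each variant exports a createDashboard() factory with the same
// contract; the names and shapes here are illustrative.
import { createDashboard } from './dashboard';

describe('Dashboard contract (all variants)', () => {
  it('loads metrics for a user', async () => {
    const dashboard = createDashboard();
    const metrics = await dashboard.getMetrics('user-123');
    expect(metrics).toHaveProperty('events');
    expect(metrics).toHaveProperty('revenue');
  });

  it('rejects unknown users consistently across variants', async () => {
    const dashboard = createDashboard();
    await expect(dashboard.getMetrics('nope')).rejects.toThrow('User not found');
  });
});
```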
9.20 Agent Teams (Multi-Agent Coordination)
Section titled “9.20 Agent Teams (Multi-Agent Coordination)”Reading time: 5 minutes (overview) | Quick Start → (8-10 min, practical) | Full workflow guide → (~30 min, theory) Skill level: Month 2+ (Advanced) Status: ⚠️ Experimental (v2.1.32+, Opus 4.6 required)
What Are Agent Teams?
Section titled “What Are Agent Teams?”
Agent teams enable multiple Claude instances to work in parallel on a shared codebase, coordinating autonomously without human intervention. One session acts as team lead to break down tasks and synthesize findings from teammate sessions.
Key difference from Multi-Instance (§9.17):
- Multi-Instance = You manually orchestrate separate Claude sessions (independent projects, no shared state)
- Agent Teams = Claude manages coordination automatically (shared codebase, git-based communication)
Setup:

```bash
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude
```

OR in ~/.claude/settings.json:

```json
{ "experimental": { "agentTeams": true } }
```

When Introduced & Production Validation
Section titled “When Introduced & Production Validation”
Version: v2.1.32 (2026-02-05) as research preview
Model requirement: Opus 4.6 minimum
Production metrics (validated cases):
- Fountain (workforce management): 50% faster screening, 2x conversions
- CRED (15M users, financial services): 2x execution speed
- Anthropic Research: Autonomous C compiler completion (no human intervention)
Source: 2026 Agentic Coding Trends Report, Anthropic Engineering Blog
Architecture Quick View
Section titled “Architecture Quick View”

```text
Team Lead (Main Session)
 ├─ Breaks tasks into subtasks
 ├─ Spawns teammate sessions (each with 1M token context)
 └─ Synthesizes findings from all agents
     │
     ├─ Teammate 1: Task A (independent context)
     └─ Teammate 2: Task B (independent context)

Coordination: Git-based (task locking, continuous merge, conflict resolution)
Navigation: Shift+Down to cycle through teammates, or tmux panes
```

Teams vs Multi-Instance vs Dual-Instance
Section titled “Teams vs Multi-Instance vs Dual-Instance”
| Pattern | Coordination | Best For | Cost | Setup |
|---|---|---|---|---|
| Agent Teams | Automatic (git-based) | Read-heavy tasks needing coordination | High (3x+) | Experimental flag |
| Multi-Instance (§9.17) | Manual (human) | Independent parallel tasks | Medium (2x) | Multiple terminals |
| Dual-Instance | Manual (human) | Quality assurance (plan-execute) | Medium (2x) | 2 terminals |
Use Cases That Work Well
Section titled “Use Cases That Work Well”
✅ Excellent fit (read-heavy, clear boundaries):
- Multi-layer code review: Security scope + API scope + Frontend scope (Fountain: 50% faster)
- Parallel hypothesis testing: Debug by testing 3 theories simultaneously
- Large-scale refactoring: 47+ files across layers with clear interfaces
- Full codebase analysis: Architecture review, pattern detection
❌ Poor fit (avoid these):
- Simple tasks (<5 files affected) — coordination overhead not justified
- Write-heavy tasks (many shared file modifications) — merge conflict risks
- Sequential dependencies — no parallelization benefit
- Budget-constrained projects — 3x token cost multiplier
Quick Example: Multi-Layer Code Review
Section titled “Quick Example: Multi-Layer Code Review”

```text
Prompt:
"Review this PR comprehensively using agent teams with scope-focused analysis:
- Security Scope: Check for vulnerabilities, auth issues, data exposure (context: auth, validation code)
- API Design Scope: Review endpoint design, validation, error handling (context: API routes, controllers)
- Frontend Scope: Check UI patterns, accessibility, performance (context: components, styles)

PR: https://github.com/company/repo/pull/123"

Result:
Team lead spawns 3 scope-focused agents → Each analyzes their scope in parallel →
Team lead synthesizes findings → Comprehensive review in 1/3 the time
```

Critical Limitations
Section titled “Critical Limitations”
Read-heavy > Write-heavy trade-off:
✅ Good: Code review (agents read, analyze, report)
✅ Good: Bug tracing (agents read logs, trace execution)
✅ Good: Architecture analysis (agents read structure)

⚠️ Risky: Refactoring shared types (merge conflicts)
⚠️ Risky: Database schema changes (coordinated migrations)
❌ Bad: Same file modified by multiple agents (conflict hell)

Mitigation: Assign non-overlapping file sets, use an interface-first approach, and define contracts before parallel work, as in the sketch below.
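What such a pre-agreed contract can look like in practice, as a minimal sketch: the interface below is a hypothetical example, not a required format.

```typescript
// contracts/event-api.ts: agreed BEFORE parallel work starts.
// Agent A implements this interface (backend); Agent B consumes it (frontend).
// Neither agent edits this file during the parallel phase.
export interface EventSummary {
  id: string;
  title: string;
  startTime: string; // ISO 8601
}

export interface EventApi {
  listEvents(userId: string): Promise<EventSummary[]>;
  createEvent(userId: string, event: Omit<EventSummary, 'id'>): Promise<EventSummary>;
}
```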
Token intensity: 3x+ cost multiplier (3 agents = 3 model inferences). Only justified when time saved > cost increase.
Experimental status: No stability guarantee, bugs expected, feature may change. Report issues to Anthropic GitHub.
Decision Tree: When to Use Agent Teams
Section titled “Decision Tree: When to Use Agent Teams”

```text
Is task simple (<5 files)? ──YES──> Single agent
  │ NO
Tasks completely independent? ──YES──> Multi-Instance (§9.17)
  │ NO
Need quality assurance split? ──YES──> Dual-Instance
  │ NO
Read-heavy (analysis, review)? ──YES──> Agent Teams ✓
  │ NO
Write-heavy (many file mods)? ──YES──> Single agent
  │ NO
Budget-constrained? ──YES──> Single agent
  │ NO
Complex coordination needed? ──YES──> Agent Teams ✓ ──NO──> Single agent
```

Swarm vs Sequential Coordination
Section titled “Swarm vs Sequential Coordination”
Two distinct coordination patterns exist for multi-agent review, and the choice matters:
| Dimension | Sequential Specialists | Swarm Mode |
|---|---|---|
| Structure | Predefined lead + members | Ad-hoc, no hierarchy |
| Coordination | Lead assigns tasks, synthesizes | Each reviewer works independently |
| Leadership | Team lead orchestrates | Human synthesizes findings |
| Task assignment | Lead delegates to specific agents | All relevant agents get the same input |
| Best for | Tasks with dependencies between reviewers | Independent review, final pre-merge pass |
| When to use | Complex workflows, state needs sharing | PR review, unfamiliar codebase, thoroughness |
Swarm Mode in practice (Every.to compound-engineering pattern):
Launch all relevant specialist reviewers in parallel against the same diff or PR, with no coordination between them. Each produces independent findings. You read all findings and decide what to act on.
```bash
# Swarm: all reviewers see the same input, report independently
/workflows:review --swarm   # Every.to compound-engineering command
```

This is distinct from Agent Teams: there is no persistent team structure, no shared context between agents, no lead synthesizing in real time. It is faster to set up and appropriate when thoroughness matters more than coordination.
Rule of thumb: Use Agent Teams for workflows with sequential dependencies (agent A’s output feeds agent B). Use Swarm when each reviewer can work from the same starting point and you want maximum coverage with minimum setup overhead.
Practitioner Testimonial
Section titled “Practitioner Testimonial”
Paul Rayner (CEO Virtual Genius, EventStorming Handbook author):
“Running 3 concurrent agent team sessions across separate terminals. Pretty impressive compared to previous multi-terminal workflows without coordination.”
Workflows used (Feb 2026):
- Job search app: Design research + bug fixing
- Business ops: Operating system + conference planning
- Infrastructure: Playwright MCP + beads framework management
Source: Paul Rayner LinkedIn
Navigation Between Agents
Section titled “Navigation Between Agents”
Built-in controls:
- Shift+Down: Cycle through active teammates (in-process mode)
- tmux: Use tmux commands if in tmux session
- Direct takeover: Take control of any agent’s work mid-execution
Monitoring: Each agent reports progress, team lead synthesizes when all complete.
Full Documentation
Section titled “Full Documentation”
This section is a quick overview. For the complete guide:
- Agent Teams Workflow (~30 min, 10 sections)
- Architecture deep-dive (team lead, teammates, git coordination)
- Setup instructions (2 methods)
- 5 production use cases with metrics
- Workflow impact analysis (before/after)
- Limitations & gotchas (read/write trade-offs)
- Decision framework (Teams vs Multi-Instance vs Beads)
- Best practices, troubleshooting
Related patterns:
- §9.17 Multi-Instance Workflows — Manual parallel coordination
- §4.3 Sub-Agents — Single-agent task delegation
- AI Ecosystem: Beads Framework — Alternative orchestration (Gas Town)
Official sources:
- Introducing Claude Opus 4.6 (Anthropic, Feb 2026)
- Building a C compiler with agent teams (Anthropic Engineering, Feb 2026)
- 2026 Agentic Coding Trends Report (Anthropic, Jan 2026)
9.21 Legacy Codebase Modernization
Section titled “9.21 Legacy Codebase Modernization”
Context: In February 2026, Anthropic published a COBOL modernization playbook positioning Claude Code as a direct replacement for legacy consulting teams. The same day, IBM stock dropped -13% (its worst single-day performance since October 2000). The workflow described is validated by independent research — it applies to any large legacy codebase (COBOL, Fortran, VB6, PL/I), not just COBOL.
Why Legacy Modernization Is Hard
Section titled “Why Legacy Modernization Is Hard”
The real cost isn’t the migration itself — it’s the discovery phase. Original developers have retired. Documentation is absent or wrong. Code has been patched for decades by engineers who never understood the full system. Finding what talks to what requires consultants billing by the hour.
AI changes the economics by automating this exact phase.
COBOL context (for scale reference):
- ~220 billion lines of COBOL still in production (IBM estimate)
- ~95% of US ATM transactions run on COBOL-based systems (Reuters/industry consensus — methodology varies by source)
- Modernization previously required multi-year, multi-team projects
The 4-Step Workflow
Section titled “The 4-Step Workflow”
Independent validation: Academic research (WJAETS 2025) shows -25 to -30% timeline reduction on average. Best-case: Airbnb migrated 3,500 test files in 6 weeks vs. an estimated 1.5 years. COBOL→Java accuracy: 93% in controlled studies (arXiv, April 2025).
Step 1 — Automated Exploration & Discovery
Map the entire codebase:
- Identify all program entry points and execution paths
- Trace subroutine calls across hundreds of files
- Document implicit dependencies via shared files, databases, and global state
- Generate a dependency graph before touching a single line

Prompt pattern:

```text
"Read the entire [COBOL/legacy] codebase. Map its structure:
entry points, execution paths, subroutine call chains,
and any implicit dependencies via shared data structures,
global variables, or file I/O. Output a dependency map."
```
Step 2 — Risk Analysis & Opportunity Mapping
With the dependency map in hand:
- Assess coupling levels between modules (high coupling = high risk)
- Surface isolated components as safe modernization candidates
- Identify duplicated logic and dead code
- Flag shared state as the highest-risk zones

Prompt pattern:

```text
"Based on the dependency map: rank modules by coupling level.
Which components can be modernized in isolation?
Which share state with 3+ other modules and should be touched last?"
```
Step 3 — Strategic Planning
Human + AI collaboration:
- AI suggests prioritization based on risk/dependency analysis
- Team reviews against business priorities (what breaks = most expensive)
- Define target architecture and code standards
- Design function-level tests for validation before migration begins

This phase is not fully automatable — business context requires human judgment. Hybrid human-AI workflows show 31% higher completion rates within initial time estimates vs. purely automated approaches (WJAETS 2025).
Step 4 — Incremental Implementation
Never migrate the whole system at once:
- Translate logic component by component
- Create API wrappers for legacy components still in use
- Run old and new code side-by-side in production
- Validate each component independently before proceeding to the next

Prompt pattern:

```text
"Translate [module X] to [target language].
Preserve exact business logic — no optimization yet.
Add a compatibility wrapper so both versions can run in parallel.
Write tests that verify identical outputs for identical inputs."
```
Key Principles
Section titled “Key Principles”
| Principle | Why it matters |
|---|---|
| Map before touching | Blind migrations fail; discovery first |
| Isolate before migrating | High-coupling modules = cascade failures |
| Parallel run | Rollback possible only if both versions coexist |
| Test at boundary | Test inputs/outputs, not internal logic (which will change) |
| Human review on business logic | AI doesn’t know which edge case is regulatory vs. dead code |
Realistic Expectations
Section titled “Realistic Expectations”
“Years to quarters” is real — but it’s the optimistic scenario, not the average:
| Scenario | Timeline reduction | Source |
|---|---|---|
| Conservative estimate | -25 to -30% | WJAETS 2025 academic review |
| Automation-heavy phases | -40 to -50% | Fullstack Labs industry synthesis |
| Best-case (test migration) | -88% (6 weeks vs 1.5 yr) | Airbnb case study |
| COBOL→Java conversion accuracy | 93% | arXiv, April 2025 |
The average gains are real and significant. The headline numbers require favorable conditions: good test coverage, isolated modules, and a team that understands both the legacy system and the target stack.
Anti-Patterns
Section titled “Anti-Patterns”
- ❌ Big bang migration — Rewriting everything at once. No company has survived this at scale.
- ❌ No parallel run — Cutting over without a fallback. One undiscovered edge case = production outage.
- ❌ Skipping discovery — Starting to translate before mapping. You will break things you didn’t know existed.
- ❌ Trusting AI on business logic — AI translates faithfully what it reads. If the original was wrong or context-dependent, the translation will be too.
Resources
Section titled “Resources”
- Anthropic COBOL Modernization Playbook (Feb 2026)
- AI-Driven Legacy Systems Modernization: COBOL to Java (arXiv, April 2025)
- AWS EKS COBOL Modernization Case Study (July 2025)
9.22 Remote Control (Mobile Access)
Section titled “9.22 Remote Control (Mobile Access)”Reading time: 7 minutes Skill level: Week 2+ Status: Research Preview (as of February 2026) Availability: Pro and Max plans only — not available on Team, Enterprise, or API keys
Remote Control lets you monitor and control a local Claude Code session from a phone, tablet, or web browser — without migrating anything to the cloud. Your terminal keeps running locally; the mobile/web interface is a remote window onto that session.
Key difference from Session Teleportation (§9.16): Teleportation migrates a session (web → local). Remote Control mirrors a local session to a remote viewer. Execution always stays on your local machine.
How It Works
Local terminal (running claude)
        │
        │ HTTPS outbound only (no inbound ports)
        ▼
  Anthropic relay
        │
        ▼
Phone / tablet / browser (claude.ai/code or Claude app)

- Execution: 100% local — your terminal does all the work
- Security: HTTPS outbound only, zero inbound ports, short-lived scoped credentials
- What you can do remotely: Send messages, approve/deny tool calls, read responses
Requirements:
- Claude Code v2.1.51+
- Active Pro or Max subscription (not Team/Enterprise)
- Logged in (/login)
Two Ways to Start
Option A — From the command line (start a new session):
claude remote-control
# Optional flags:
#   --verbose    Show detailed connection logs
#   --sandbox    Restrict to sandbox mode

Option B — From inside an active session:

/remote-control
# or the shorter alias:
/rc

Connecting from Your Device
Once started, Claude Code displays:
- A session URL (open in any browser)
- Press spacebar to show a QR code (scan with your phone)
- Or open the Claude app (iOS / Android) — your active session appears automatically
To enable remote control on every session by default:
/config → toggle "Remote Control: auto-enable"

Download the Mobile App

/mobile   # Shows App Store + Google Play download links

Known Limitations (Research Preview)

| Limitation | Detail |
|---|---|
| 1 session at a time | Only one active remote control session |
| Terminal must stay open | Closing the local terminal ends the session |
| Network timeout | ~10 min before session expires on disconnect |
| Slash commands don’t work remotely | /new, /compact, etc. are treated as plain text in the remote UI |
| Pro/Max only | Not available on Team, Enterprise, or API keys |
⚠️ Slash commands limitation: When you type /new, /compact, or any slash command in the remote interface (mobile app or browser), they are treated as plain text messages — not forwarded as commands to the local CLI. Use slash commands from your local terminal instead.
Advanced Patterns (Community-Validated)
Multi-Session via tmux (Workaround for 1-Session Limit)

# Start a tmux session with multiple panes
tmux new-session -s dev

# Each tmux pane can run its own claude session:
#   Pane 1: claude → run /rc → share URL with your phone
#   Pane 2: claude (local only)
#   Pane 3: claude (local only)

# To switch which session you're controlling remotely:
#   → Go to pane 2, run /rc (disconnects pane 1's remote, connects pane 2)

Each tmux pane hosts its own Claude session. Only one can use remote-control at a time, but you can switch between sessions by running /rc in different panes.
Persistent Server Architecture (VM/Cloud)
Remote Control works on remote machines (VMs, cloud servers) running in tmux:
# On your cloud server (e.g., Clever Cloud, AWS, etc.):
tmux new-session -s claude-server
claude remote-control
# → Scan QR code from your phone
# → Control a cloud-hosted Claude session from mobile
# → Sessions survive laptop reboots (tmux keeps them alive)

This gives you persistent sessions that survive closing your laptop. Combine 6-8 Claude sessions in tmux for continuous, uninterrupted work while traveling.
Alternatives (Pre-Remote Control)
| Alternative | How it worked | Status |
|---|---|---|
| happy.engineering | Open-source remote access for Claude Code | Community-declared obsolete post-RC |
| OpenClaw | Alternative Claude Code remote interface | Community-declared obsolete post-RC |
| SSH + mobile terminal | SSH into dev machine, run claude | Still valid for Team/Enterprise users |
| VS Code Remote | Remote SSH extension + Claude Code | Still valid, more complex setup |
Security Considerations
Full threat model: Security Hardening Guide: Remote Control Security
Quick summary:
- The session URL is a live access key — treat it like a password
- Anyone with the URL can send commands to your local Claude session while active
- Short-lived credentials and HTTPS outbound-only limit the exposure window
- Per-command approval prompts on mobile guard against accidental execution (not against active attackers)
- Not recommended on shared or untrusted workstations
- Corporate machines: verify your security policy even on personal Pro/Max accounts
Troubleshooting
| Issue | Solution |
|---|---|
| Session not appearing in Claude app | Known bug (Research Preview) — use claude.ai/code in Safari instead (see below) |
| QR code opens app but session not visible | Known bug on iOS — scan with native camera app, open in Safari rather than Claude app |
| QR code not showing | Press spacebar after starting remote-control |
| Slash commands not working | Type them in your local terminal instead |
| Session expired | Reconnect: run /rc again |
| Corporate firewall blocking | HTTPS outbound (port 443) must be allowed |
| “Not available” error | Verify Pro or Max subscription (not Team/Enterprise) |
Known bug (Research Preview, March 2026): On iOS (confirmed iPhone), scanning the QR code opens the Claude app but the remote session doesn’t appear in the session list. The bug also affects automatic session discovery in the Claude mobile app. MacStories confirmed this is inconsistent on non-local machines.
Most reliable workaround: open claude.ai/code in Safari on your phone — your active session appears in the list there. Alternatively, copy the session URL from the terminal and paste it directly in Safari. Both paths bypass the app’s sync bug entirely.
Evolution Timeline
| Version | Feature |
|---|---|
| 2.1.51 | Initial Remote Control feature (Research Preview) |
| 2.1.53 | Stability improvements and bug fixes |
🎯 Section 9 Recap: Pattern Mastery Checklist
Before moving to Section 10 (Reference), verify you understand:
Core Patterns:
- Trinity Pattern: Plan Mode → Extended Thinking → Sequential MCP for critical work
- Composition: Agents + Skills + Hooks working together seamlessly
- CI/CD Integration: Automated reviews and quality gates in pipelines
- IDE Integration: VS Code + Claude Code = seamless development flow
Productivity Patterns:
- Tight Feedback Loops: Test-driven workflows with instant validation
- Todo as Instruction Mirrors: Keep context aligned with reality
- Vibe Coding: Skeleton → iterate → production-ready
- Batch Operations: Process multiple files efficiently
Quality Awareness:
- Common Pitfalls: Understand security, performance, workflow mistakes
- Continuous Improvement: Refine over multiple sessions with learning mindset
- Best Practices: Do/Don’t patterns for professional work
- Development Methodologies: TDD, SDD, BDD, and other structured approaches
- Codebase Design for Agents: Optimize code for agent productivity (domain knowledge, discoverability, testing)
Communication Patterns:
- Named Prompting Patterns: As If, Constraint, Explain First, Rubber Duck, Incremental, Boundary
- Mermaid Diagrams: Generate visual documentation for architecture and flows
Advanced Workflows:
- Session Teleportation: Migrate sessions between cloud and local environments
- Remote Control: Monitor/control local sessions from mobile or browser (Research Preview, Pro/Max)
- Background Tasks: Run tasks in cloud while working locally (% prefix)
- Multi-Instance Scaling: Understand when/how to orchestrate parallel Claude instances (advanced teams only)
- Agent Teams: Multi-agent coordination for read-heavy tasks (experimental, Opus 4.6+)
- Permutation Frameworks: Systematically test multiple approaches before committing
- Legacy Modernization: 4-step workflow (Discovery → Risk → Planning → Incremental) for large legacy codebases
What’s Next?
Section 10 is your command reference — bookmark it for quick lookups during daily work.
You’ve mastered the concepts and patterns. Now Section 10 gives you the technical reference for efficient execution.
9.23 Configuration Lifecycle & The Update Loop
Reading time: 8 minutes Skill level: Month 1+
See also: §9.10 Continuous Improvement Mindset — the conceptual foundation for this section. §9.23 is the operational layer: detecting when to act, and how.
As your Claude Code setup matures — skills, agents, rules, CLAUDE.md — a silent failure mode emerges: your configuration drifts away from how you actually work. Skills accumulate assumptions that no longer hold. CLAUDE.md describes a codebase that has evolved. Rules cover edge cases that became the norm. The agent keeps making the same correctable mistakes because nothing captures what you learned last week.
This section covers how to detect that drift early and close the loop — turning session observations into concrete config improvements.
Why Configurations Go Stale
Staleness doesn’t happen in one go. It accumulates from small gaps:
- A skill was written for a v1 API that’s now v2 — the skill still “works” but generates code that needs manual fixing every time
- CLAUDE.md has context that’s 6 months old — the agent reasons from a mental model of the codebase that no longer exists
- A rule was added for an edge case that’s now the default pattern — it fires constantly and you’ve stopped reading its output
- You’ve corrected the same mistake across 5 sessions — but nothing ever captured that correction as a rule
The signal is always there: you keep doing the same manual fixes. The work is identifying which fixes are worth encoding.
Detecting Friction from Your JSONL Logs
Your sessions are already logged (see §Observability: Setting Up Session Logging). What’s missing is reading them for quality signals, not just cost metrics.
Three patterns that reliably indicate a skill or rule needs updating:
| Pattern | Signal | Likely Cause |
|---|---|---|
| Same file read multiple times per session | Missing context | Content should move to CLAUDE.md or a skill |
| Tool failure followed immediately by retry | Wrong assumption | A skill has an outdated command or path |
| User correction immediately after assistant turn | Prompt gap | A skill or rule doesn’t cover this case |
Run this script weekly against your session logs to surface these patterns:
#!/bin/bash
# Usage: ./scripts/detect-friction.sh [days-back]
# Requires: jq

DAYS=${1:-7}
LOG_DIR="${CLAUDE_LOG_DIR:-$HOME/.claude/logs}"
SINCE=$(date -v-${DAYS}d +%Y-%m-%d 2>/dev/null || date -d "-${DAYS} days" +%Y-%m-%d)

echo "=== Friction Report — last ${DAYS} days ==="
echo

# 1. Files read more than 3x in any single session
echo "## Repeated Reads (same file >3x in one session)"
for f in "$LOG_DIR"/activity-*.jsonl; do
  [[ "$(basename "$f" .jsonl | cut -d- -f2-)" < "$SINCE" ]] && continue
  jq -r 'select(.tool == "Read") | .file' "$f" 2>/dev/null
done | sort | uniq -c | sort -rn | awk '$1 > 3 {print "  " $1 "x  " $2}'

echo

# 2. Tool failures (Bash exit non-zero)
echo "## Tool Failures (potential stale commands in skills)"
for f in "$LOG_DIR"/activity-*.jsonl; do
  [[ "$(basename "$f" .jsonl | cut -d- -f2-)" < "$SINCE" ]] && continue
  jq -r 'select(.tool == "Bash" and (.exit_code // 0) != 0) | .command' "$f" 2>/dev/null
done | sort | uniq -c | sort -rn | head -10 | awk '{print "  " $0}'

echo

# 3. Most-edited files (proxy for agent missing context)
echo "## Most Edited Files (context gap candidates)"
for f in "$LOG_DIR"/activity-*.jsonl; do
  [[ "$(basename "$f" .jsonl | cut -d- -f2-)" < "$SINCE" ]] && continue
  jq -r 'select(.tool == "Edit") | .file' "$f" 2>/dev/null
done | sort | uniq -c | sort -rn | head -10 | awk '{print "  " $1 "x  " $2}'

echo
echo "→ For each friction point, ask: is there a skill, rule, or CLAUDE.md section that should cover this?"
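To make the weekly cadence automatic rather than aspirational, a crontab entry can run the report every Monday morning and archive it by date (the script path and report directory are assumptions; adjust to your setup):

# crontab -e
# Every Monday at 09:00, archive a dated friction report.
# Note: cron requires literal % to be escaped as \%.
0 9 * * 1  $HOME/scripts/detect-friction.sh 7 > $HOME/.claude/reports/friction-$(date +\%F).txt 2>&1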
Skills Lifecycle Management
Skills accumulate. Without a lifecycle policy, you end up with 20+ skills where half are unused, two contradict each other, and none have version history.
When to create a skill:
A task is worth encoding as a skill when you’ve done it manually 3+ times and the steps are stable enough to write down. If you’re still figuring out the right approach, don’t encode it yet — premature skills crystallize bad patterns.
When to update a skill (patch):
- A command in the skill fails because an API or path changed
- The output needs a small clarification you keep adding manually
- You added a convention and the skill doesn’t reflect it yet
When to version a skill (minor/major):
Add a version field and updated date to your skill frontmatter:
---
version: 1.2.0
updated: 2026-03-02
breaking_since: null
---

Use a simple policy:
- patch (x.x.Z): rewording, clarification, examples added — no behavior change
- minor (x.Y.z): new instructions, extended scope, new behavior opt-in
- major (X.y.z): default behavior changes — annotate what broke and when in your CHANGELOG
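To see which skills haven't adopted the convention yet, grep for the missing field (this assumes the standard .claude/skills/<name>/SKILL.md layout):

# List skill files whose frontmatter has no version field
grep -L '^version:' .claude/skills/*/SKILL.md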
When to deprecate a skill:
Add a deprecated: true flag and a note explaining what replaced it. Don’t delete immediately — other skills or commands may reference it.
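Continuing the frontmatter convention above, a deprecated skill could look like this (the deprecation_note field name is illustrative, not a Claude Code standard):

---
version: 2.1.0
updated: 2026-03-02
deprecated: true
deprecation_note: "Superseded by <new-skill-name>; kept because older commands still reference it"
---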
CI staleness check — CLAUDE.md vs source modules:
If your CLAUDE.md is assembled from source modules (e.g., via a pnpm ai:configure pipeline), add a CI job to catch divergence before it causes silent failures:
name: AI Config Staleness Check
on:
  push:
    paths:
      - '.claude/rules/**'
      - '.claude/skills/**'
      - '.claude/agents/**'
      - 'CLAUDE.md.src/**'   # adjust to your source dir

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # pnpm is not preinstalled on GitHub runners; set it up first
      - uses: pnpm/action-setup@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - name: Install dependencies
        run: pnpm install --frozen-lockfile
      - name: Verify CLAUDE.md is up to date
        run: |
          # Regenerate and compare
          pnpm ai:configure --dry-run > /tmp/expected-claude.md
          if ! diff -q CLAUDE.md /tmp/expected-claude.md > /dev/null; then
            echo "❌ CLAUDE.md is stale. Run: pnpm ai:configure"
            diff CLAUDE.md /tmp/expected-claude.md
            exit 1
          fi
          echo "✅ CLAUDE.md is up to date"

The Update Loop
The update loop formalizes what you already do informally: something doesn’t work well → you notice → you fix it. The difference is making the “notice” step systematic rather than accidental.
┌──────────────────────────────────────────────┐
│               THE UPDATE LOOP                │
│                                              │
│  Session → Observe friction                  │
│            (repeated fixes, tool fails)      │
│                    ↓                         │
│  Analyze root cause                          │
│            (which skill/rule is missing?)    │
│                    ↓                         │
│  Delta update                                │
│            (targeted edit, not rewrite)      │
│                    ↓                         │
│  Canary test                                 │
│            (verify the fix holds)            │
│                    ↓                         │
│  Next session → repeat                       │
└──────────────────────────────────────────────┘

The delta update principle: when updating a skill or rule, make the smallest targeted edit that fixes the observed problem. Don’t rewrite the whole skill — you’ll lose what was working. One problem, one edit, one test.
Integrating into /tech:handoff:
If you use a handoff command to persist session context, add a mandatory retrospective step before saving:
# Append to your handoff command prompt
Before saving context, answer:
- Which rules or skills were missing for today's work?
- Which corrections did you make more than once?
- What's the smallest edit that would prevent the most repeated friction?

Save conclusions via: write_memory("retro_[date]", your answers)

Canary testing a skill after update:
Before committing a skill change, verify it still produces the expected output on a known input:
# Example: test that the typescript-aristote skill generates Zod validation
claude -p "Using the typescript-aristote skill: create a basic user tRPC router" \
  --output-format text | grep -qE "(z\.object|publicProcedure)" \
  && echo "✅ Canary passed" \
  || echo "❌ Canary failed — skill may have regressed"

Run canary tests before merging skill changes, especially for skills that other agents depend on.
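One way to make that non-optional is a pre-commit hook that runs every canary whenever a skill file is staged. A sketch, assuming one executable canary script per skill under .claude/canaries/ (a directory convention invented here, not a Claude Code standard):

#!/bin/bash
# .git/hooks/pre-commit: run all skill canaries when skill files are staged.
# .claude/canaries/*.sh is an assumed layout, one executable script per skill.

staged=$(git diff --cached --name-only | grep '^\.claude/skills/' || true)
[[ -z "$staged" ]] && exit 0

failed=0
for canary in .claude/canaries/*.sh; do
  [[ -x "$canary" ]] || continue
  echo "Running canary: $canary"
  "$canary" || { echo "Canary failed: $canary"; failed=1; }
done
exit $failed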
Going Further
If you want to automate prompt optimization beyond the manual update loop, two frameworks are worth knowing:
DSPy (Stanford, open-source) — optimizes prompts programmatically given a metric and a set of examples. Requires 20+ labeled examples per skill for reliable results. Useful when you have a well-defined task and enough session history to build a dataset. dspy.ai
TextGrad — treats prompts as differentiable parameters and iterates using LLM-generated feedback as “gradients”. Better for creative or domain-specific tasks where the evaluation is qualitative. github.com/zou-group/textgrad
Both require more setup than the manual loop above, and neither eliminates the need for human judgment on what to optimize. Start with the update loop and canary tests — they’ll surface most of the value with a fraction of the overhead.
What’s Next?
- §9.10 Continuous Improvement Mindset — the decision framework for when to encode vs. accept as an edge case
- §Observability: Reading for Quality — qualitative JSONL analysis patterns
- §9.12 Git Best Practices — version control for your config alongside your code
10. Reference
Quick jump: Commands Table · Keyboard Shortcuts · Configuration Reference · Troubleshooting · Cheatsheet · Daily Workflow