Code Guide

9. Advanced Patterns

What you’ll learn: Production-grade workflows that combine multiple Claude Code features.

🎯 The Trinity (9.1) — Ultimate workflow: Plan Mode → Extended Thinking → Sequential MCP

  • When: Architecture decisions, complex refactoring, critical systems
  • Why: Maximum reasoning power + safe exploration

🔄 Integration Patterns (9.2-9.4)

  • Composition: Agents + Skills + Hooks working together
  • CI/CD: GitHub Actions, automated reviews, quality gates
  • IDE: VS Code + Claude Code = seamless flow

⚡ Productivity Patterns (9.5-9.8)

  • Tight feedback loops: Test-driven with instant validation
  • Todo as mirrors: Keep context aligned with reality
  • Vibe coding: Skeleton → iterate → production

🎨 Quality Patterns (9.9-9.11)

  • Batch operations: Process multiple files efficiently
  • Continuous improvement: Refine over multiple sessions
  • Common pitfalls: Learn from mistakes (Do/Don’t lists)

Read this section if:

  • ✅ You’re productive with basics and want mastery
  • ✅ You’re setting up team workflows or CI/CD
  • ✅ You hit the limits of the simple “ask Claude” approach
  • ❌ You’re still learning basics (finish Sections 1-8 first)

Reading time: 20 minutes Skill level: Month 1+ Goal: Master power-user techniques


🌍 Industry Context: 2026 Agentic Coding Trends

Source: Anthropic “2026 Agentic Coding Trends Report” (Feb 2026)

The patterns in this section reflect the industry evolution Anthropic documented across 5,000+ organizations.

| Pattern | Adoption Timeline | Productivity Gain | Business Impact |
|---|---|---|---|
| Agent Teams (9.20) | 3-6 months | 50-67% | Timeline: weeks → days |
| Multi-Instance (9.17) | 1-2 months | 2x output | Cost: $500-1K/month |
| Sandbox Isolation (guide/sandbox-native.md) | Immediate | Security baseline | Compliance requirement |

🎯 Research Insights (Anthropic Internal Study)

  • 60% of work uses AI (vs 0% in 2023)
  • 0-20% “fully delegated” → collaboration is central, not replacement
  • 67% more PRs merged per engineer per day
  • 27% of new work wouldn’t be done without AI (exploratory, nice-to-have)

Over-Delegation (too many agents):

  • Symptom: context-switching cost exceeds the productivity gain
  • Limit: >5 simultaneous agents = coordination overhead
  • Fix: start with 1-2 agents, scale progressively

Premature Automation:

  • Symptom: automating a workflow you have not yet mastered manually
  • Fix: manual → semi-auto → full-auto (progressive)

Tool Sprawl (MCP proliferation):

  • Symptom: >10 MCP servers, conflicts, maintenance burden
  • Fix: start with a core stack (Serena, Context7, Sequential), add selectively

Case studies from the report:

  • Fountain (workforce mgmt): 50% faster screening via hierarchical multi-agent
  • Rakuten (tech): 7h autonomous vLLM implementation (12.5M lines, 99.9% accuracy)
  • CRED (fintech): 2x execution speed, quality maintained (15M users)
  • TELUS (telecom): 500K hours saved, 13K custom solutions
  • Zapier (automation): 89% adoption, 800+ internal agents

Each pattern below includes:

  • Industry validation (adoption stats, ROI)
  • Practical guide (workflows step-by-step)
  • Anti-patterns (pitfalls to avoid)

Full evaluation: docs/resource-evaluations/anthropic-2026-agentic-coding-trends.md


The most powerful Claude Code pattern combines three techniques:

┌──────────────────────────────────────────────────────────┐
│                       THE TRINITY                        │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  ┌─────────────┐                                         │
│  │  Plan Mode  │  Safe exploration without changes       │
│  └──────┬──────┘                                         │
│         │                                                │
│         ▼                                                │
│  ┌─────────────┐                                         │
│  │ Ext.Thinking│  Deep analysis (Opus 4.5+;              │
│  └──────┬──────┘  adaptive in 4.6)                       │
│         │                                                │
│         ▼                                                │
│  ┌─────────────────────┐                                 │
│  │ Sequential Thinking │  Structured multi-step          │
│  └─────────────────────┘  reasoning                      │
│                                                          │
│  Combined: maximum understanding before action           │
│                                                          │
└──────────────────────────────────────────────────────────┘
| Situation | Use Trinity? |
|---|---|
| Fixing a typo | ❌ Overkill |
| Adding a feature | Maybe |
| Debugging complex issue | ✅ Yes |
| Architectural decision | ✅ Yes |
| Legacy system modernization | ✅ Yes |

Extended Thinking (Opus 4.5+) & Adaptive Thinking (Opus 4.6+)


⚠️ Breaking Change (Opus 4.6, Feb 2026): Opus 4.6 replaces budget-based thinking with Adaptive Thinking, which automatically decides when to use deep reasoning based on query complexity. The budget_tokens parameter is deprecated on Opus 4.6.

| Version | Thinking Approach | Control Method |
|---|---|---|
| Opus 4.5 (pre-v2.0.67) | Opt-in, keyword-triggered (~4K/10K/32K tokens) | Prompt keywords |
| Opus 4.5 (v2.0.67+) | Always-on at max budget | Alt+T toggle, /config |
| Opus 4.6 (Feb 2026) | Adaptive thinking (dynamic depth) | effort parameter (API), Alt+T (CLI) |

How it works: The effort parameter controls the model’s overall computational budget — not just thinking tokens, but the entire response including text generation and tool calls. The model dynamically allocates this budget based on query complexity.

Key insight: effort affects everything, even when thinking is disabled. Lower effort = fewer tool calls, more concise text. Higher effort = more tool calls with explanations, detailed analysis.

Effort levels (API only, official descriptions):

  • max: Maximum capability, no constraints. Opus 4.6 only (returns error on other models). Cross-system reasoning, irreversible decisions.

    Example: "Analyze the microservices event pipeline for race conditions across order-service, inventory-service, and notification-service"

  • high (default): Complex reasoning, coding, agentic tasks. Best for production workflows requiring deep analysis.

    Example: "Redesign error handling in the payment module: add retry logic, partial failure recovery, and idempotency guarantees"

  • medium: Balance between speed, cost, and performance. Good for agentic tasks with moderate complexity.

    Example: "Convert fetchUser() in api/users.ts from callbacks to async/await"

  • low: Most efficient. Ideal for classification, lookups, sub-agents, or tasks where speed matters more than depth.

    Example: "Rename getUserById to findUserById across src/"

See Section 2.5 Model Selection & Thinking Guide for a complete decision table with effort, model, and cost estimates.
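The level-to-task mapping above can be expressed as a simple dispatch. A hypothetical heuristic, shown here only to make the tiers concrete (the keyword lists are illustrative and not part of any API):

```python
# Hypothetical heuristic for picking an effort level per task description.
# Keyword lists are illustrative only; tune them to your own workload.
def choose_effort(task: str) -> str:
    t = task.lower()
    if any(k in t for k in ("rename", "lookup", "classify")):
        return "low"       # mechanical edits, classification, sub-agents
    if any(k in t for k in ("convert", "async/await")):
        return "medium"    # moderate, well-scoped changes
    if any(k in t for k in ("race condition", "cross-service", "irreversible")):
        return "max"       # cross-system reasoning (Opus 4.6 only)
    return "high"          # default: complex reasoning / agentic work

print(choose_effort("Rename getUserById to findUserById across src/"))  # low
```

In practice you would feed the chosen value into `output_config={"effort": ...}` as shown in the API syntax below.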

API syntax:

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    output_config={"effort": "medium"},  # low|medium|high|max
    messages=[{"role": "user", "content": "Analyze..."}],
)

Effort and Tool Use:

The effort parameter significantly impacts how Claude uses tools:

  • low effort: Combines operations to minimize tool calls. No explanatory preamble before actions. Faster, more efficient for simple tasks.
  • high effort: More tool calls with detailed explanations. Describes the plan before executing. Provides comprehensive summaries after operations. Better for complex workflows requiring transparency.

Example: With low effort, Claude might read 3 files and edit them in one flow. With high effort, Claude explains why it’s reading those files, what it’s looking for, then provides a detailed summary of changes made.

Relationship between effort and thinking:

  • Opus 4.6: effort is the recommended control for thinking depth. The budget_tokens parameter is deprecated on 4.6 (though still functional for backward compatibility).
  • Opus 4.5: effort works in parallel with budget_tokens. Both parameters are supported and affect different aspects of the response.
  • Without thinking enabled: effort still controls text generation and tool calls. It’s not a thinking-only parameter.

CLI usage: Three methods to control effort level in Claude Code:

  1. /model command with left/right arrow keys to adjust the effort slider (low, medium, high)
  2. CLAUDE_CODE_EFFORT_LEVEL environment variable (set before launching Claude)
  3. effortLevel field in settings.json (persistent across sessions)

Alt+T toggles thinking on/off globally (separate from effort level).

| Method | Opus 4.5 | Opus 4.6 | Persistence |
|---|---|---|---|
| Alt+T (Option+T on macOS) | Toggle on/off | Toggle on/off | Current session |
| /config → Thinking mode | Enable/disable globally | Enable/disable globally | Across sessions |
| /model slider (left/right arrows) | low\|medium\|high | low\|medium\|high | Current session |
| CLAUDE_CODE_EFFORT_LEVEL env var | low\|medium\|high | low\|medium\|high | Shell session |
| effortLevel in settings.json | low\|medium\|high | low\|medium\|high | Permanent |
| Ctrl+O | View thinking blocks | View thinking blocks | Display only |

Thinking tokens are billed. With adaptive thinking:

  • Opus 4.6: Thinking usage varies dynamically (less predictable than fixed budget)
  • Simple tasks: Consider Alt+T to disable → faster responses, lower cost
  • Complex tasks: Leave enabled → better reasoning, adaptive depth
  • Sonnet/Haiku: No extended thinking available (Opus 4.5/4.6 only)

Before (no longer needed):

Terminal window
claude -p "Ultrathink. Analyze this architecture."

After (thinking is already max by default):

Terminal window
claude -p "Analyze this architecture."

To disable thinking for simple tasks: Press Alt+T before sending, or use Sonnet.

These keywords were functional before v2.0.67. They are now recognized visually but have no behavioral effect.

| Keyword | Previous Effect | Current Effect |
|---|---|---|
| “Think” | ~4K tokens | Cosmetic only |
| “Think hard” | ~10K tokens | Cosmetic only |
| “Ultrathink” | ~32K tokens | Cosmetic only |

Removed features:

  • assistant-prefill: Deprecated on Opus 4.6. Previously allowed pre-filling Claude’s response to guide output format. Now unsupported — use system prompts or examples instead.

New features:

  • Fast mode API: Add speed: "fast" + beta header fast-mode-2026-02-01 for 2.5x faster responses (6x cost)
    response = client.messages.create(
        model="claude-opus-4-6",
        speed="fast",  # 2.5x faster, 6x price
        headers={"anthropic-beta": "fast-mode-2026-02-01"},
        messages=[...],
    )
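Whether fast mode pays off depends on how you weigh latency against spend. A back-of-envelope calculator using the published multipliers (2.5x faster, 6x price); the base figures are placeholders to substitute with your own measurements:

```python
# Back-of-envelope tradeoff for fast mode: 2.5x faster at 6x price.
# base_cost and base_latency_s are placeholder figures, not real prices.
def fast_mode_tradeoff(base_cost: float, base_latency_s: float) -> dict:
    fast_cost = base_cost * 6.0          # 6x price multiplier
    fast_latency = base_latency_s / 2.5  # 2.5x speed multiplier
    return {
        "extra_cost": round(fast_cost - base_cost, 4),
        "seconds_saved": round(base_latency_s - fast_latency, 4),
    }

print(fast_mode_tradeoff(base_cost=0.10, base_latency_s=30.0))
# {'extra_cost': 0.5, 'seconds_saved': 18.0}
```

If the seconds saved are worth more to you than the extra spend (e.g. a human is blocked waiting), fast mode is the right call; for batch jobs it rarely is.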

Migration:

  • If using assistant-prefill: Replace with explicit instructions in system prompt
  • For speed: Use fast mode API or /fast command in CLI
You: /plan
Let's analyze this legacy authentication system before we touch anything.
[Thinking mode is enabled by default with Opus 4.5 - no keyword needed]
[Claude enters Plan Mode and does deep analysis]
Claude: I've analyzed the auth system. Here's what I found:
- 47 files depend on the current auth module
- 3 critical security issues
- Migration path needs 4 phases
Ready to implement?
You: /execute
Let's start with phase 1

Launch multiple agents for different aspects:

You: For this feature, I need:
1. Backend architect to design the API
2. Security reviewer to audit the design
3. Test engineer to plan the tests
Run these in parallel.

Claude will coordinate:

  • Backend architect designs API
  • Security reviewer audits (in parallel)
  • Test engineer plans tests (in parallel)

Combine multiple skills for complex tasks:

# code-reviewer.md
skills:
  - security-guardian
  - performance-patterns
  - accessibility-checker

The reviewer now has all three knowledge domains.

For quality work, use multiple rounds of critique:

You: Write the function, then critique it, then improve it.
Do this 3 times.
Round 1: [Initial implementation]
Critique: [What's wrong]
Improvement: [Better version]
Round 2: [Improved implementation]
Critique: [What's still wrong]
Improvement: [Even better version]
Round 3: [Final implementation]
Final check: [Verification]
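The write → critique → improve loop above can also be driven programmatically. A sketch with the model call injected, so the control flow is testable without an API key (in practice `model` would wrap `claude -p` or the Messages API; the function name is ours):

```python
# Skeleton of the "rev the engine" loop. `model` is any callable that maps
# a prompt string to a response string, injected so this runs without an API.
def rev_the_engine(model, task: str, rounds: int = 3) -> str:
    draft = model(f"Write: {task}")
    for _ in range(rounds):
        critique = model(f"Critique this:\n{draft}")
        draft = model(f"Improve based on this critique:\n{critique}\n---\n{draft}")
    return draft
```

Each round costs two model calls (critique + improvement), so three rounds is seven calls total including the initial draft.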

For critical work, combine everything:

1. Plan Mode + Extended Thinking → Deep exploration
2. Multiple Agents → Specialized analysis
3. Sequential Thinking → Structured reasoning
4. Rev the Engine → Iterative improvement
5. Code Review Agent → Final validation

Run Claude Code without interactive prompts:

Terminal window
# Basic headless execution
claude -p "Run the tests and report results"
# With timeout
claude -p --timeout 300 "Build the project"
# With specific model
claude -p --model sonnet "Analyze code quality"

Claude Code supports Unix pipe operations, enabling powerful shell integration for automated code analysis and transformation.

How piping works:

Terminal window
# Pipe content to Claude with a prompt
cat file.txt | claude -p 'analyze this code'
# Pipe command output for analysis
git diff | claude -p 'explain these changes'
# Chain commands with Claude
npm test 2>&1 | claude -p 'summarize test failures and suggest fixes'

Common patterns:

  1. Code review automation:

    Terminal window
    git diff main...feature-branch | claude -p 'Review this diff for security issues'
  2. Log analysis:

    Terminal window
    tail -n 100 /var/log/app.log | claude -p 'Find the root cause of errors'
  3. Test output parsing:

    Terminal window
    npm test 2>&1 | claude -p 'Create a summary of failing tests with priority order'
  4. Documentation generation:

    Terminal window
    cat src/api/*.ts | claude -p 'Generate API documentation in Markdown'
  5. Batch file analysis:

    Terminal window
    find . -name "*.js" -exec cat {} \; | claude -p 'Identify unused dependencies'

Using with --output-format:

Terminal window
# Get structured JSON output
git status --short | claude -p 'Categorize changes' --output-format json
# Stream JSON for real-time processing
cat large-file.txt | claude -p 'Analyze line by line' --output-format stream-json

Best practices:

  • Be specific: Clear prompts yield better results

    Terminal window
    # Good: Specific task
    git diff | claude -p 'List all function signature changes'
    # Less effective: Vague request
    git diff | claude -p 'analyze this'
  • Limit input size: Pipe only relevant content to avoid context overload

    Terminal window
    # Good: Filtered scope
    git diff --name-only | head -n 10 | xargs cat | claude -p 'review'
    # Risky: Could exceed context
    cat entire-codebase/* | claude -p 'review'
  • Use non-interactive mode: Add -p for automation

    Terminal window
    cat file.txt | claude -p 'fix linting errors' > output.txt
  • Combine with jq for JSON: Parse Claude’s JSON output

    Terminal window
    echo "const x = 1" | claude -p 'analyze' --output-format json | jq '.suggestions[]'

Output format control:

The --output-format flag controls Claude’s response format:

| Format | Use Case | Example |
|---|---|---|
| text | Human-readable output (default) | claude -p 'explain' --output-format text |
| json | Machine-parseable structured data | claude -p 'analyze' --output-format json |
| stream-json | Real-time streaming for large outputs | claude -p 'transform' --output-format stream-json |

Example JSON workflow:

Terminal window
# Get structured analysis
git log --oneline -10 | claude -p 'Categorize commits by type' --output-format json
# Output:
# {
#   "categories": {
#     "features": ["add user auth", "new dashboard"],
#     "fixes": ["fix login bug", "resolve crash"],
#     "chores": ["update deps", "refactor tests"]
#   },
#   "summary": "10 commits: 2 features, 2 fixes, 6 chores"
# }
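Downstream tooling can consume that JSON shape directly. A minimal consumer (the payload below mirrors the illustrative output above, not a guaranteed schema):

```python
import json

# Sample payload in the shape of the --output-format json example above.
raw = '''{
  "categories": {
    "features": ["add user auth", "new dashboard"],
    "fixes": ["fix login bug", "resolve crash"],
    "chores": ["update deps", "refactor tests"]
  },
  "summary": "10 commits: 2 features, 2 fixes, 6 chores"
}'''

data = json.loads(raw)
for name, items in data["categories"].items():
    print(f"{name}: {len(items)}")  # one count per category
print(data["summary"])
```

This is the Python equivalent of the `jq` pipeline shown earlier, useful when the consumer is a script rather than a shell one-liner.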

Integration with build scripts (package.json):

{
  "scripts": {
    "claude-review": "git diff main | claude -p 'Review for security issues' --output-format json > review.json",
    "claude-test-summary": "npm test 2>&1 | claude -p 'Summarize failures and suggest fixes'",
    "claude-docs": "cat src/**/*.ts | claude -p 'Generate API documentation' > API.md",
    "precommit-check": "git diff --cached | claude -p 'Check for secrets or anti-patterns' && prettier --check ."
  }
}

CI/CD integration example:

.github/workflows/claude-review.yml
name: AI Code Review
on: [pull_request]
jobs:
  claude-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code
      - name: Run Claude Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          git diff origin/main...HEAD | \
            claude -p 'Review this PR diff for security issues, performance problems, and code quality. Format as JSON.' \
            --output-format json > review.json
      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const review = JSON.parse(fs.readFileSync('review.json', 'utf8'));
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## 🤖 Claude Code Review\n\n${review.summary}`
            });

Limitations:

  • Context size: Large pipes may exceed token limits (monitor with /status)
  • Interactive prompts: Use -p for automation to avoid blocking
  • Error handling: Pipe failures don’t always propagate; add `set -o pipefail` so a failing command anywhere in the pipe fails the whole pipeline
  • API costs: Automated pipes consume API credits; monitor usage with ccusage

💡 Pro tip: Combine piping with aliases for frequently used patterns:

Terminal window
# Add to ~/.bashrc or ~/.zshrc
alias claude-review='git diff | claude -p "Review for bugs and suggest improvements"'
alias claude-logs='tail -f /var/log/app.log | claude -p "Monitor for errors and alert on critical issues"'

Source: DeepTo Claude Code Guide - Unix Piping

Windows Note: Git hooks run in Git Bash on Windows, so the bash syntax below works. Alternatively, you can create .cmd or .ps1 versions and reference them from a wrapper script.

Commit-message hook:

.git/hooks/commit-msg
#!/bin/bash
# Validate the commit message (Git passes the message file path as $1 to
# the commit-msg hook; pre-commit hooks receive no arguments)
COMMIT_MSG=$(cat "$1")
claude -p "Is this commit message good? '$COMMIT_MSG'. Reply YES or NO with reason."

Pre-push hook:

.git/hooks/pre-push
#!/bin/bash
# Security check before push. claude -p exits 0 on success regardless of
# findings, so parse the verdict from stdout instead of the exit code.
RESULT=$(git diff origin/main...HEAD | claude -p "Scan this diff for secrets and security issues. Reply CLEAN or ISSUES with details.")
if echo "$RESULT" | grep -q "ISSUES"; then
  echo "Security issues found. Push blocked."
  echo "$RESULT"
  exit 1
fi
.github/workflows/claude-review.yml
name: Claude Code Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code
      - name: Run Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          claude -p "Review the changes in this PR. \
            Focus on security, performance, and code quality. \
            Output as markdown."

When GitHub Actions fails, use the gh CLI to investigate without leaving your terminal:

Quick investigation workflow:

Terminal window
# List recent workflow runs
gh run list --limit 10
# View specific run details
gh run view <run-id>
# View logs for failed run
gh run view <run-id> --log-failed
# Download logs for detailed analysis
gh run download <run-id>

Common debugging commands:

| Command | Purpose |
|---|---|
| gh run list --workflow=test.yml | Filter by workflow file |
| gh run view --job=<job-id> | View specific job details |
| gh run watch | Watch the current run in real-time |
| gh run rerun <run-id> | Retry a failed run |
| gh run rerun <run-id> --failed | Retry only failed jobs |

Example: Investigate test failures:

Terminal window
# Get the latest failed run
FAILED_RUN=$(gh run list --status failure --limit 1 --json databaseId --jq '.[0].databaseId')
# View the failure
gh run view $FAILED_RUN --log-failed
# Ask Claude to analyze
gh run view $FAILED_RUN --log-failed | claude -p "Analyze this CI failure and suggest fixes"

Pro tip: Combine with Claude Code for automated debugging:

Terminal window
# Fetch failures and auto-fix
gh run view --log-failed | claude -p "
Analyze these test failures.
Identify the root cause.
Propose fixes for each failing test.
Output as actionable steps.
"

This workflow saves time compared to navigating GitHub’s web UI and enables faster iteration on CI failures.

Before creating a PR, ensure all local checks pass. This prevents wasted CI cycles and review time.

The pattern:

Build ✓ → Lint ✓ → Test ✓ → Type-check ✓ → THEN create PR

Implementation as a command (.claude/commands/complete-task.md):

# Complete Task
Run the full verification gate before creating a PR:
1. **Build**: Run `pnpm build` - must succeed
2. **Lint**: Run `pnpm lint` - must have zero errors
3. **Test**: Run `pnpm test` - all tests must pass
4. **Type-check**: Run `pnpm typecheck` - no type errors
If ANY step fails:
- Stop immediately
- Report what failed and why
- Suggest fixes
- Do NOT proceed to PR creation
If ALL steps pass:
- Create the PR with `gh pr create`
- Wait for CI with `gh pr checks --watch`
- If CI fails, fetch feedback and auto-fix
- Loop until mergeable or blocked
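The four local checks can also be scripted as a gate that stops at the first failure. A sketch (the pnpm commands mirror the command file above; the runner itself is illustrative):

```python
import subprocess

# Minimal local verification gate: run each check in order and stop at the
# first failure, mirroring the command file above. Swap in your toolchain.
CHECKS = [
    ("build", "pnpm build"),
    ("lint", "pnpm lint"),
    ("test", "pnpm test"),
    ("typecheck", "pnpm typecheck"),
]

def run_gate(checks) -> bool:
    for name, cmd in checks:
        if subprocess.run(cmd, shell=True).returncode != 0:
            print(f"Gate failed at step: {name}")
            return False
    print("All checks passed - safe to create the PR")
    return True
```

Wire `run_gate(CHECKS)` in front of `gh pr create` so a failing step blocks PR creation, exactly as the command file specifies.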

Autonomous retry loop:

┌─────────────────────────────────────────┐
│          VERIFY GATE + AUTO-FIX         │
├─────────────────────────────────────────┤
│                                         │
│  Local checks (build/lint/test)         │
│        │                                │
│        ▼ FAIL?                          │
│  ┌─────────┐                            │
│  │ Auto-fix│ ──► Re-run checks          │
│  └─────────┘                            │
│        │                                │
│        ▼ PASS                           │
│  Create PR                              │
│        │                                │
│        ▼                                │
│  Wait for CI (gh pr checks --watch)     │
│        │                                │
│        ▼ FAIL?                          │
│  ┌─────────────────────┐                │
│  │ Fetch CI feedback   │                │
│  │ (CodeRabbit, etc.)  │                │
│  └─────────────────────┘                │
│        │                                │
│        ▼                                │
│  Auto-fix + push + loop                 │
│        │                                │
│        ▼                                │
│  PR mergeable OR blocked (ask human)    │
│                                         │
└─────────────────────────────────────────┘

Fetching CI feedback (GitHub GraphQL):

Terminal window
# Get PR review status and comments
gh api graphql -f query='
  query($pr: Int!) {
    repository(owner: "OWNER", name: "REPO") {
      pullRequest(number: $pr) {
        reviewDecision
        reviewThreads(first: 100) {
          nodes {
            isResolved
            comments(first: 1) {
              nodes { body }
            }
          }
        }
      }
    }
  }' -F pr=$PR_NUMBER
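The response can then be filtered down to the comments that still need fixes. A sketch over the `reviewThreads` shape selected by the query (the sample data is illustrative):

```python
# Extract unresolved review comment bodies from the GraphQL response shape
# selected by the query above (pullRequest.reviewThreads.nodes[...]).
def unresolved_comments(pr: dict) -> list:
    out = []
    for thread in pr["reviewThreads"]["nodes"]:
        if not thread["isResolved"]:
            for comment in thread["comments"]["nodes"]:
                out.append(comment["body"])
    return out

sample = {
    "reviewDecision": "CHANGES_REQUESTED",
    "reviewThreads": {"nodes": [
        {"isResolved": True,  "comments": {"nodes": [{"body": "nit: rename"}]}},
        {"isResolved": False, "comments": {"nodes": [{"body": "missing null check"}]}},
    ]},
}
print(unresolved_comments(sample))  # ['missing null check']
```

Feeding only the unresolved bodies back into the auto-fix step keeps the loop focused and the context small.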

Inspired by Nick Tune’s Coding Agent Development Workflows

Automate release notes and changelog generation using Claude Code.

Why automate release notes?

  • Consistent format across releases
  • Captures technical details from commits
  • Translates technical changes to user-facing language
  • Saves 30-60 minutes per release

Pattern: Git commits → Claude analysis → User-friendly release notes

Create .claude/commands/release-notes.md:

# Generate Release Notes
Analyze git commits since last release and generate release notes.
## Process
1. **Get commits since last tag**:
```bash
git log $(git describe --tags --abbrev=0)..HEAD --oneline
```
  1. Read full commit details:

    • Include commit messages
    • Include file changes
    • Include PR numbers if present
  2. Categorize changes:

    • ✨ Features - New functionality
    • 🐛 Bug Fixes - Issue resolutions
    • ⚡ Performance - Speed/efficiency improvements
    • 🔒 Security - Security patches
    • 📝 Documentation - Doc updates
    • 🔧 Maintenance - Refactoring, dependencies
    • ⚠️ Breaking Changes - API changes (highlight prominently)
  3. Generate three versions:

    A. CHANGELOG.md format (technical, for developers):

    ## [Version] - YYYY-MM-DD
    ### Added
    - Feature description with PR reference
    ### Fixed
    - Bug fix description
    ### Changed
    - Breaking change with migration guide

    B. GitHub Release Notes (balanced, technical + context):

    ## What's New
    Brief summary of the release
    ### ✨ New Features
    - User-facing feature description
    ### 🐛 Bug Fixes
    - Issue resolution description
    ### ⚠️ Breaking Changes
    - Migration instructions
    **Full Changelog**: v1.0.0...v1.1.0

    C. User Announcement (non-technical, benefits-focused):

    We're excited to announce [Version]!
    **Highlights**:
    - What users can now do
    - How it helps them
    - When to use it
    [Link to full release notes]
  4. Output files:

    • Prepend to CHANGELOG.md
    • Save to release-notes-[version].md
    • Copy “User Announcement” to clipboard for Slack/blog
  • Check for missed breaking changes
  • Verify all PR references are valid
  • Ensure migration guides are clear
Approach 2: CI/CD Automation

Add to .github/workflows/release.yml:
name: Release
on:
  push:
    tags:
      - 'v*'
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for changelog
      - name: Generate Release Notes
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          # Get version from tag
          VERSION=${GITHUB_REF#refs/tags/}
          # Generate with Claude
          claude -p "Generate release notes for $VERSION. \
            Analyze commits since last tag. \
            Output in GitHub Release format. \
            Save to release-notes.md"
          # Create GitHub Release
          gh release create $VERSION \
            --title "Release $VERSION" \
            --notes-file release-notes.md
      - name: Update CHANGELOG.md
        run: |
          # Prepend to CHANGELOG
          cat release-notes.md CHANGELOG.md > CHANGELOG.tmp
          mv CHANGELOG.tmp CHANGELOG.md
          # Commit back
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add CHANGELOG.md
          git commit -m "docs: update changelog for $VERSION"
          git push

For more control, use an interactive session:

Terminal window
# 1. Start Claude Code
claude
# 2. Request release notes
You: "Generate release notes for v2.0.0"
# 3. Claude will:
# - Run git log to get commits
# - Ask clarifying questions:
# - "Is this a major/minor/patch release?"
# - "Any breaking changes users should know?"
# - "Target audience for announcement?"
# 4. Review and refine
You: "Add more detail to the authentication feature"
# 5. Finalize
You: "Save these notes and update CHANGELOG.md"

Before generation:

  • ✅ Ensure commits follow conventional commits format
  • ✅ All PRs have been merged
  • ✅ Version number decided (semver)

During generation:

  • ✅ Review for accuracy (Claude might miss context)
  • ✅ Add migration guides for breaking changes
  • ✅ Include upgrade instructions if needed

After generation:

  • ✅ Cross-reference with closed issues/PRs
  • ✅ Test upgrade path on a staging project
  • ✅ Share draft with team before publishing

Given these commits:

feat: add user avatar upload (PR #123)
fix: resolve login timeout issue (PR #124)
perf: optimize database queries by 40% (PR #125)
BREAKING: change API endpoint from /api/v1 to /v2 (PR #126)

Claude generates:

CHANGELOG.md (technical):

## [2.0.0] - 2025-01-10
### Added
- User avatar upload functionality (#123)
### Fixed
- Login timeout issue affecting mobile users (#124)
### Performance
- Optimized database queries, reducing load time by 40% (#125)
### Breaking Changes
- **API Endpoints**: Migrated from `/api/v1/*` to `/v2/*`
  - Update client code: replace `/api/v1/` with `/v2/`
  - Old endpoints will return 410 Gone after 2025-02-01
  - Migration guide: docs/migration-v2.md (#126)

GitHub Release (balanced):

## What's New in v2.0.0
This release brings performance improvements, bug fixes, and a new avatar feature.
### ✨ New Features
- **Avatar Upload**: Users can now upload custom profile pictures
### 🐛 Bug Fixes
- Fixed login timeout issue that affected some mobile users
### ⚡ Performance
- Database queries are now 40% faster
### ⚠️ Breaking Changes
- **API Endpoint Migration**: All endpoints have moved from `/api/v1` to `/v2`
  - **Action Required**: Update your API client code
  - **Timeline**: Old endpoints will stop working on February 1, 2025
  - **Migration Guide**: [See docs/migration-v2.md](./docs/migration-v2.md)
**Full Changelog**: v1.9.0...v2.0.0

User Announcement (non-technical):

📢 Version 2.0 is here!
We've made your experience faster and more personal:
**Customize Your Profile** - Upload your own avatar
**Lightning Fast** - Pages load 40% faster
🐛 **More Reliable** - Fixed the login timeout issue
**For Developers**: This is a breaking release. See our migration guide for API changes.
[Read full release notes →]

“Release notes are too technical”

  • Solution: Specify audience in prompt: “Generate for non-technical users”

“Claude missed a breaking change”

  • Solution: Explicitly list breaking changes in prompt
  • Better: Use “BREAKING:” prefix in commit messages

“Generated notes are generic”

  • Solution: Provide more context: “This release focuses on mobile performance”

“Commits are messy/unclear”

  • Solution: Clean up commit history before generation (interactive rebase)
  • Better: Enforce commit message format with git hooks
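For conventional commits, the categorization step can be pre-computed deterministically before handing wording to Claude, which sidesteps the "missed a breaking change" problem entirely. A sketch (the prefix-to-section table is illustrative and matches the sections used in this guide):

```python
# Deterministic first pass: sort conventional-commit subject lines into
# changelog sections; Claude then only rewrites the wording per audience.
PREFIXES = {
    "feat": "Added",
    "fix": "Fixed",
    "perf": "Performance",
    "BREAKING": "Breaking Changes",
}

def categorize(commits: list) -> dict:
    out = {}
    for line in commits:
        prefix = line.split(":", 1)[0].strip()
        section = PREFIXES.get(prefix, "Maintenance")
        out.setdefault(section, []).append(line)
    return out

commits = [
    "feat: add user avatar upload (PR #123)",
    "fix: resolve login timeout issue (PR #124)",
    "perf: optimize database queries by 40% (PR #125)",
    "BREAKING: change API endpoint from /api/v1 to /v2 (PR #126)",
]
for section, entries in categorize(commits).items():
    print(section, "->", len(entries))
```

Because a `BREAKING:` prefix is matched mechanically, nothing tagged as breaking can slip through to the wrong section.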

Claude Code can automate deployments to Vercel, GCP, and other platforms using stored credentials. The key is assembling three components: secret management, a deploy skill, and mandatory guardrails.

Store credentials in the OS keychain rather than .env files:

Terminal window
# Vercel deployment (3 required variables)
security add-generic-password -a claude -s VERCEL_TOKEN -w "your_token"
security add-generic-password -a claude -s VERCEL_ORG_ID -w "your_org_id"
security add-generic-password -a claude -s VERCEL_PROJECT_ID -w "your_project_id"
# Retrieve in scripts
VERCEL_TOKEN=$(security find-generic-password -s VERCEL_TOKEN -w)

For multi-platform secrets (GitHub, Vercel, AWS simultaneously), Infisical provides centralized management with versioning and point-in-time recovery — a useful open-source alternative to HashiCorp Vault:

Terminal window
# Install Infisical CLI
brew install infisical/get-cli/infisical
# Inject secrets into Claude Code session
infisical run -- claude
# Infisical automatically sets all project secrets as env vars

Create a skill that encapsulates the full deploy workflow:

---
name: deploy-to-vercel
description: Deploy to Vercel staging then production with smoke tests
allowed-tools: Bash
---
## Deploy Workflow
1. Run tests: `pnpm test` — stop if any fail
2. Build: `pnpm build` — stop if build fails
3. Deploy to staging: `vercel deploy`
4. Run smoke tests against staging URL
5. **PAUSE** — output staging URL and ask for human confirmation before production
6. On approval: `vercel deploy --prod`
7. Verify production URL responds with HTTP 200

These guardrails are not optional. Production deployments without them create incidents:

GuardrailImplementationWhy
Staging-firstAlways deploy to staging before prodCatch environment-specific failures
Human confirmationStop and ask before --prod flagNo autonomous production deploys
Smoke testVerify HTTP 200 on key endpoints after deployCatch silent deployment failures
Rollback readyKeep previous deployment ID before promotingvercel rollback <deployment-id>

Hook for confirmation (prevent accidental production deploys):

.claude/settings.json
{
  "hooks": {
    "PreToolUse": [{
      "matcher": "Bash",
      "hooks": [{
        "type": "command",
        "command": "scripts/check-prod-deploy.sh"
      }]
    }]
  }
}
#!/bin/bash
# check-prod-deploy.sh — exit 2 to block, exit 0 to allow
INPUT=$(cat)
if echo "$INPUT" | grep -q "vercel deploy --prod\|gcloud deploy.*production"; then
  echo "BLOCKED: Production deploy requires manual confirmation. Run the command directly from your terminal."
  exit 2
fi
exit 0
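Hooks receive their payload as JSON on stdin, so parsing the structured field is more robust than grepping raw text. A Python variant of the same guard (the `tool_input.command` field name follows the documented hook payload; verify against your Claude Code version):

```python
import json
import re

# Block production deploy commands in a PreToolUse hook. The hook payload
# is JSON on stdin with the Bash command under tool_input.command
# (field name per the hook docs; treat it as an assumption to verify).
BLOCKED = re.compile(r"vercel deploy --prod|gcloud deploy.*production")

def check(payload: str) -> int:
    command = json.loads(payload).get("tool_input", {}).get("command", "")
    if BLOCKED.search(command):
        print("BLOCKED: Production deploy requires manual confirmation.")
        return 2  # exit code 2 blocks the tool call
    return 0      # exit code 0 allows it

# In the hook script itself:
#   import sys; sys.exit(check(sys.stdin.read()))
```

Parsing the JSON avoids false positives from the raw-grep approach (e.g. the string appearing inside a heredoc or a commit message).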

Sources: Vercel deploy skill pattern documented by the community (lobehub.com, haniakrim21); Infisical multi-platform secrets management at infisical.com. No end-to-end automated deploy workflow exists in the community as of March 2026 — the building blocks are available but the staging-to-production promotion pattern is something each team assembles themselves.

Claude Code integrates with VS Code:

  1. Install Extension: Search “Claude Code” in Extensions
  2. Configure: Set API key in settings
  3. Use:
    • Ctrl+Shift+P → “Claude Code: Start Session”
    • Select text → Right-click → “Ask Claude”

Works with IntelliJ, WebStorm, PyCharm:

  1. Install Plugin: Settings → Plugins → “Claude Code”
  2. Configure: Tools → Claude Code → Set API key
  3. Use:
    • Ctrl+Shift+A → “Claude Code”
    • Tool window for persistent session

New: Xcode 26.3 RC+ includes native Claude Agent SDK support, using the same harness as Claude Code:

  1. Requirements: Xcode 26.3 RC or later (macOS)
  2. Setup: Configure API key in Xcode → Preferences → Claude
  3. Use:
    • Built-in code assistant powered by Claude
    • Same capabilities as Claude Code CLI
    • Native integration with Xcode workflows

Claude Agent SDK: Separate product from Claude Code, but shares the same agent execution framework. Enables Claude-powered development tools in IDEs beyond VS Code.

Note: Claude Agent SDK is not Claude Code — it’s Anthropic’s framework for building agent-powered developer tools. Claude Code CLI and Xcode integration both use this SDK.

For terminal-native workflow:

Terminal window
# Add to .bashrc or .zshrc
alias cc='claude'
alias ccp='claude --plan'
alias cce='claude --execute'
# Quick code question
cq() {
  claude -p "$*"
}

Usage:

Terminal window
cq "What does this regex do: ^[a-z]+$"
Terminal window
# Add to $PROFILE (run: notepad $PROFILE to edit)
function cc { claude $args }
function ccp { claude --plan $args }
function cce { claude --execute $args }
function cq {
  param([Parameter(ValueFromRemainingArguments)]$question)
  claude -p ($question -join ' ')
}

To find your profile location: echo $PROFILE

Common locations:

  • C:\Users\YourName\Documents\PowerShell\Microsoft.PowerShell_profile.ps1
  • C:\Users\YourName\Documents\WindowsPowerShell\Microsoft.PowerShell_profile.ps1

If the file doesn’t exist, create it:

Terminal window
New-Item -Path $PROFILE -Type File -Force

Reading time: 5 minutes Skill level: Week 1+

Tight feedback loops accelerate learning and catch issues early. Design your workflow to validate changes immediately.

┌─────────────┐
│ Deploy      │  ← Hours/Days
│ Tests       │
├─────────────┤
│ CI/CD       │  ← Minutes
│ Pipeline    │
├─────────────┤
│ Local       │  ← Seconds
│ Tests       │
├─────────────┤
│ TypeCheck   │  ← Immediate
│ Lint        │
└─────────────┘
Terminal window
# Watch mode for instant feedback
pnpm tsc --watch
pnpm lint --watch
# Pre-commit hook
#!/bin/bash
pnpm lint-staged && pnpm tsc --noEmit
# GitHub Action for PR checks
- run: pnpm lint && pnpm tsc && pnpm test

Use hooks for automatic validation:

settings.json
{
  "hooks": {
    "PostToolUse": [{
      "matcher": "Edit|Write",
      "hooks": ["./scripts/validate.sh"]
    }]
  }
}

validate.sh:

#!/bin/bash
# Run after every file change
FILE=$(echo "$TOOL_INPUT" | jq -r '.file_path // .file')
if [[ "$FILE" == *.ts || "$FILE" == *.tsx ]]; then
npx tsc --noEmit "$FILE" 2>&1 | head -5
fi
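The jq extraction in validate.sh can be smoke-tested outside Claude Code by feeding it a hand-made payload. The payload shape below is an assumption; check your actual hook logs for the real field names before relying on them:

```shell
# Simulate a hook payload locally (field names are an assumption;
# the '// .file' alternative is a fallback for older payload shapes).
TOOL_INPUT='{"file_path":"src/app.ts"}'
FILE=$(echo "$TOOL_INPUT" | jq -r '.file_path // .file')
echo "$FILE"   # → src/app.ts
```

If the echoed path is empty or `null`, the key names in your jq filter don't match what Claude Code actually sends.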
| Loop | Trigger | Response Time | What It Catches |
|------|---------|---------------|-----------------|
| Lint | On type | <1s | Style, imports |
| TypeCheck | On save | 1-3s | Type errors |
| Unit tests | On save | 5-15s | Logic errors |
| Integration | On commit | 1-5min | API contracts |
| E2E | On PR | 5-15min | User flows |

💡 Tip: Faster loops catch more bugs. Invest in making your test suite fast.

Background Tasks for Fullstack Development

Section titled “Background Tasks for Fullstack Development”

Problem: Fullstack development often requires long-running processes (dev servers, watchers) that block the main Claude session, preventing iterative frontend work.

Solution: Use Ctrl+B to background tasks and maintain tight feedback loops across the stack.

| Scenario | Background Command | Why |
|----------|--------------------|-----|
| Dev server running | `pnpm dev` → Ctrl+B | Keeps server alive while iterating on frontend |
| Test watcher | `pnpm test --watch` → Ctrl+B | Monitor test results while coding |
| Build watcher | `pnpm build --watch` → Ctrl+B | Detect build errors without blocking session |
| Database migration | `pnpm migrate` → Ctrl+B | Long-running migration, work on other features |
| Docker compose | `docker compose up` → Ctrl+B | Infrastructure running, develop application |
Terminal window
# 1. Start backend dev server
pnpm dev:backend
# Press Ctrl+B to background
# 2. Now Claude can iterate on frontend
"Update the login form UI to match Figma designs"
# Claude can read files, make changes, all while backend runs
# 3. Check server logs when needed
/tasks # View background task status
# 4. Bring server back to foreground if needed
# (Currently: no built-in foreground command, restart if needed)

Real-World Example: API + Frontend Iteration

Section titled “Real-World Example: API + Frontend Iteration”

Traditional (blocked) flow:

Terminal window
$ pnpm dev:backend
# Server starts... Claude waits... session blocked
# Cannot iterate on frontend until server stops
# Kill server → work on frontend → restart server → repeat

Background task flow:

Terminal window
$ pnpm dev:backend
# Server starts...
$ Ctrl+B # Background the server
# Claude is now free to work
"Add loading state to the API calls"
# Claude iterates on frontend
# Backend still running, can test immediately
# Tight feedback loop maintained

Problem: Long-running background tasks can cause context rot—Claude loses awareness of what’s running.

Solution: Check task status periodically:

Terminal window
# Before major changes
/tasks
# Output example:
# Task 1 (background): pnpm dev:backend
# Status: Running (35 minutes)
# Last output: Server listening on :3000

Best practices:

  • Background tasks at session start (setup phase)
  • Check /tasks before major architecture changes
  • Restart backgrounded tasks if context is lost
  • Use descriptive commands (pnpm dev:backend not just npm run dev)

Limitations:

  • No foreground command: Cannot bring tasks back to foreground (yet)
  • Context loss: Long-running tasks may lose relevance to current work
  • Output not streamed: Background task output not visible unless checked
  • Session-scoped: Background tasks tied to Claude session, killed on exit

Workaround for foreground: If you need to interact with a backgrounded task, restart it in foreground:

Terminal window
# Can't foreground task directly
# Instead: check status, then restart if needed
/tasks # See what's running
# Ctrl+C to stop current session interaction
# Restart the command you need in foreground

When using session teleportation (web → local), background tasks are not transferred:

  • Web sessions cannot background tasks
  • Teleported sessions start with clean slate
  • Restart required dev servers after teleportation

Teleport workflow:

Terminal window
# 1. Teleport session from web to local
claude --teleport
# 2. Restart dev environment
pnpm dev:backend
Ctrl+B # Background
# 3. Continue work locally with full feedback loops
Terminal window
/tasks # View all background tasks
# Output includes:
# - Task ID
# - Command run
# - Runtime duration
# - Recent output (last few lines)
# - Status (running, completed, failed)

Use /tasks when:

  • Starting new feature work (verify infrastructure running)
  • Debugging (check for error output in background tasks)
  • Before committing (ensure tests passed in background)
  • Session feels slow (check if background tasks consuming resources)
Terminal window
# Environment variable (v2.1.4+)
export CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=true
claude
# Useful when:
# - Debugging Claude Code itself
# - Running in resource-constrained environments
# - Avoiding accidental backgrounding

💡 Key insight: Background tasks optimize fullstack workflows by decoupling infrastructure (servers, watchers) from iterative development. Use them strategically to maintain tight feedback loops across the entire stack.

Reading time: 5 minutes Skill level: Week 1+

TodoWrite isn’t just tracking—it’s an instruction mechanism. Well-crafted todos guide Claude’s execution.

What you write as a todo becomes Claude’s instruction:

❌ Vague Todo → Vague Execution
"Fix the bug"
✅ Specific Todo → Precise Execution
"Fix null pointer in getUserById when user not found - return null instead of throwing"
## Effective Todo Pattern
- [ ] **What**: Create user validation function
- [ ] **Where**: src/lib/validation.ts
- [ ] **How**: Use Zod schema with email, password rules
- [ ] **Verify**: Test with edge cases (empty, invalid format)
| Task Complexity | Todo Granularity | Example |
|-----------------|------------------|---------|
| Simple fix | 1-2 todos | "Fix typo in header component" |
| Feature | 3-5 todos | Auth flow steps |
| Epic | 10+ todos | Full feature with tests |

Embed constraints directly in todos:

## Bad
- [ ] Add error handling
## Good
- [ ] Add error handling: try/catch around API calls,
log errors with context, return user-friendly messages,
use existing ErrorBoundary component

Bug Fix:

- [ ] Reproduce: [steps to reproduce]
- [ ] Root cause: [investigation findings]
- [ ] Fix: [specific change needed]
- [ ] Verify: [test command or manual check]

Feature:

- [ ] Design: [what components/functions needed]
- [ ] Implement: [core logic]
- [ ] Tests: [test coverage expectations]
- [ ] Docs: [if public API]

Reading time: 5 minutes Skill level: Week 1+

Control how Claude responds to match your workflow preferences.

← Minimal                                        Verbose →
───────────────────────────────────────────────────────
Code only | Code + comments | Explanations | Tutorial

Add to CLAUDE.md or prompt:

Minimal (Expert Mode):

Output code only. No explanations unless asked.
Assume I understand the codebase.

Balanced:

Explain significant decisions. Comment complex logic.
Skip obvious explanations.

Verbose (Learning Mode):

Explain each step. Include alternatives considered.
Link to documentation for concepts used.
## In CLAUDE.md
### Output Preferences
- **Code reviews**: Detailed, cite specific lines
- **Bug fixes**: Minimal, show diff only
- **New features**: Balanced, explain architecture decisions
- **Refactoring**: Minimal, trust my review

For code:

Format code output as:
- Full file with changes marked: // CHANGED
- Diff format for reviews
- Inline for small changes

For explanations:

Explain using:
- Bullet points for lists
- Tables for comparisons
- Diagrams for architecture

Bug Fix Output:

**Root Cause**: [one line]
**Fix**: [code block]
**Test**: [verification command]

Feature Output:

**Files Changed**: [list]
**Key Decisions**: [bullet points]
**Next Steps**: [if any]

Claude Code can generate Mermaid diagrams for visual documentation. This is useful for architecture documentation, flow visualization, and system understanding.

| Type | Use Case | Syntax Start |
|------|----------|--------------|
| Flowchart | Process flows, decision trees | `flowchart TD` |
| Sequence | API calls, interactions | `sequenceDiagram` |
| Class | OOP structure, relationships | `classDiagram` |
| ER | Database schema | `erDiagram` |
| State | State machines | `stateDiagram-v2` |
| Gantt | Project timelines | `gantt` |

Architecture diagram:

Generate a Mermaid flowchart showing the authentication flow:
1. User submits credentials
2. Server validates
3. JWT issued or error returned

Database schema:

Create an ER diagram for our user management system
showing User, Role, and Permission relationships.

Sequence diagram:

Show me a Mermaid sequence diagram of how our
checkout process calls payment API → inventory → notification services.

Flowchart:

flowchart TD
  A[User Request] --> B{Authenticated?}
  B -->|Yes| C[Process Request]
  B -->|No| D[Return 401]
  C --> E[Return Response]

Sequence:

sequenceDiagram
  Client->>+API: POST /checkout
  API->>+Payment: charge()
  Payment-->>-API: success
  API->>+Inventory: reserve()
  Inventory-->>-API: confirmed
  API-->>-Client: order_id

Class:

classDiagram
  class User {
    +String email
    +String passwordHash
    +login()
    +logout()
  }
  class Role {
    +String name
    +Permission[] permissions
  }
  User "1" --> "*" Role
| Platform | Support |
|----------|---------|
| GitHub | Native rendering in README, issues, PRs |
| VS Code | Mermaid Preview extension |
| GitLab | Native rendering |
| Notion | Code block with mermaid language |
| mermaid.live | Online editor with export |
  1. In CLAUDE.md: Ask Claude to document architecture decisions with diagrams
  2. In PRs: Include sequence diagrams for complex flows
  3. In docs/: Generate architecture.md with embedded diagrams
  4. Export: Use mermaid.live to export as PNG/SVG for presentations

Reading time: 5 minutes Skill level: Month 1

“Vibe Coding” is rapid prototyping through natural conversation—exploring ideas before committing to implementation.

  • Early exploration: Testing if an approach works
  • Proof of concept: Quick validation before full implementation
  • Learning: Understanding a new library or pattern
User: Let's explore how we could add real-time updates to the dashboard.
Don't write code yet, just brainstorm approaches.

Claude: Here are three approaches we could consider:

1. **WebSockets**: Full duplex, lowest latency
   - Pro: Real-time push
   - Con: Connection management complexity
2. **Server-Sent Events (SSE)**: One-way server push
   - Pro: Simple, native browser support
   - Con: One direction only
3. **Polling**: Periodic HTTP requests
   - Pro: Simplest implementation
   - Con: Not truly real-time, more server load

User: Let's vibe on option 2. Show me a minimal SSE setup.

Claude: [Provides minimal example for exploration]
  1. No production code: This is exploration only
  2. Throw away freely: No attachment to vibe code
  3. Focus on learning: Understand the approach
  4. Signal clearly: “This is vibe code, not for production”

Jens Rusitschka identifies “context overload” as the primary failure mode of vibe coding: dumping entire codebases into context, hoping Claude will figure it out.

Symptoms:

  • Pasting 5K+ lines of code in first prompt
  • “Read the entire repo and implement X”
  • Expecting Claude to maintain context across 20+ file changes
  • Performance degradation after context pollution (see §2.2 Fresh Context Pattern)

Why it fails:

  • Attention dilution across too many files and concerns
  • Lost architectural reasoning in noise
  • Failed attempts accumulate, further degrading quality
  • Context bleeding between unrelated tasks

The Phased Context Strategy:

Instead of big-bang context dump, use a staged approach that leverages Claude Code’s native features:

| Phase | Tool | Purpose | Context Size |
|-------|------|---------|--------------|
| 1. Exploration | `/plan` mode | Read-only analysis, safe investigation | Controlled (plan writes findings) |
| 2. Implementation | Normal mode | Execute planned changes | Focused (plan guides scope) |
| 3. Fresh Start | Session handoff | Reset when context >75% | Minimal (handoff doc only) |

Practical workflow:

Terminal window
# Phase 1: Exploration (read-only, safe)
/plan
You: "How should I refactor the auth system for OAuth?"
Claude: [explores codebase, writes plan to .claude/plans/oauth-refactor.md]
/execute # exit plan mode
# Phase 2: Implementation (focused context)
You: "Execute the plan from .claude/plans/oauth-refactor.md"
Claude: [reads plan, implements in focused scope]
# Phase 3: Fresh start if needed (context >75%)
You: "Create session handoff document"
Claude: [writes handoff to claudedocs/handoffs/oauth-implementation.md]
# New session: cat claudedocs/handoffs/oauth-implementation.md | claude -p
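Phase 3 works best when the handoff document has a predictable shape. A minimal skeleton follows; the section names are a suggestion, not a Claude Code requirement:

```shell
# Write a minimal session-handoff document for the next session to read.
# The headings (Done / In progress / Next steps) are a suggested structure.
cd "$(mktemp -d)"
mkdir -p claudedocs/handoffs
cat > claudedocs/handoffs/oauth-implementation.md <<'EOF'
# Handoff: OAuth refactor
## Done
- Token exchange implemented in src/auth/oauth.ts
## In progress
- Refresh-token rotation
## Next steps
- Wire refresh flow into session middleware
EOF
grep '^##' claudedocs/handoffs/oauth-implementation.md
```

Keeping the handoff to done/in-progress/next-steps is what keeps the new session's starting context minimal.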


The insight: Rusitschka’s “Vibe Coding, Level 2” is Claude Code’s native workflow — it just needed explicit framing as an anti-pattern antidote. Plan mode prevents context pollution during exploration, fresh context prevents accumulation during implementation, and handoffs enable clean phase transitions.

Skeleton projects are minimal, working templates that establish patterns before full implementation.

project/
├── src/
│   ├── index.ts          # Entry point (working)
│   ├── config.ts         # Config structure (minimal)
│   ├── types.ts          # Core types (defined)
│   └── features/
│       └── example/      # One working example
│           ├── route.ts
│           ├── service.ts
│           └── repo.ts
├── tests/
│   └── example.test.ts   # One working test
└── package.json          # Dependencies defined
  1. It must run: pnpm dev works from day 1
  2. One complete vertical: Full stack for one feature
  3. Patterns, not features: Shows HOW, not WHAT
  4. Minimal dependencies: Only what’s needed
User: Create a skeleton for our new microservice. Include:
- Express setup
- One complete route (health check)
- Database connection pattern
- Test setup
- Docker configuration
Claude: [Creates minimal, working skeleton with these elements]
Skeleton (Day 1)   →   MVP (Week 1)   →   Full (Month 1)
─────────────────────────────────────────────────────────
1 route            →   5 routes       →   20 routes
1 test             →   20 tests       →   100+ tests
Basic config       →   Env-based      →   Full config
Local DB           →   Docker DB      →   Production DB

Reading time: 5 minutes Skill level: Week 1+

Batch operations improve efficiency and reduce context usage when making similar changes across files.

| Scenario | Batch? | Why |
|----------|--------|-----|
| Same change in 5+ files | ✅ Yes | Efficiency |
| Related changes in 3 files | ✅ Yes | Coherence |
| Unrelated fixes | ❌ No | Risk of errors |
| Complex refactoring | ⚠️ Maybe | Depends on pattern |
User: Update all files in src/components to use the new Button import:
- Old: import { Button } from "~/ui/button"
- New: import { Button } from "~/components/ui/button"
User: Migrate all API calls from v1 to v2:
- Change: /api/v1/* → /api/v2/*
- Update response handling for new format
- Files: src/services/*.ts
User: Add error boundaries to all page components:
- Wrap each page export with ErrorBoundary
- Use consistent error fallback
- Files: src/pages/**/*.tsx
1. Identify scope → List all affected files
2. Define pattern → Exact change needed
3. Create template → One example implementation
4. Batch apply → Apply to all files
5. Verify all → Run tests, typecheck
## Effective Batch Request
"Apply this change pattern to all matching files:
**Pattern**: Add 'use client' directive to components using hooks
**Scope**: src/components/**/*.tsx
**Rule**: If file contains useState, useEffect, or useContext
**Change**: Add 'use client' as first line
List affected files first, then make changes."
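The five-step workflow can be dry-run entirely in the shell before handing it to Claude. The sketch below uses the Button-import example from above on a throwaway directory; it assumes GNU sed (on macOS, use `sed -i ''` instead of `sed -i`):

```shell
# Batch-migrate an import path: identify scope, apply one pattern
# everywhere, then verify. GNU sed assumed (macOS: sed -i '').
set -eu
cd "$(mktemp -d)"
mkdir -p src/components
printf 'import { Button } from "~/ui/button"\n' > src/components/Nav.tsx
# Steps 1-2: identify scope and define the exact pattern
grep -rl '~/ui/button' src/components
# Step 4: batch apply the same change to every matching file
grep -rl '~/ui/button' src/components \
  | xargs sed -i 's|~/ui/button|~/components/ui/button|g'
# Step 5: verify every file now uses the new path
grep -r 'components/ui/button' src/components && echo "migrated"
```

Listing affected files before editing (the first grep) is the same "list affected files first, then make changes" discipline the prompt template asks of Claude.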

The goal isn’t just to use AI for coding — it’s to continuously improve the workflow so AI produces better results with less intervention.

After every manual intervention, ask yourself:

“How can I improve the process so this error or manual fix can be avoided next time?”

Error or manual intervention detected
                │
                ▼
   Can a linting rule catch it?
         YES ──┴── NO
          │         │
          ▼         ▼
     Add lint    Can it go in conventions/docs?
       rule            YES ──┴── NO
                        │         │
                        ▼         ▼
                   Add to      Accept as
                   CLAUDE.md   edge case
                   or ADRs
| Problem | Solution | Where to Add |
|---------|----------|--------------|
| Agent forgets to run tests | Add to workflow command | `.claude/commands/complete-task.md` |
| Code review catches style issue | Add ESLint rule | `.eslintrc.js` |
| Same architecture mistake repeated | Document decision | `docs/conventions/architecture.md` |
| Agent uses wrong import pattern | Add example | `CLAUDE.md` |

Traditional: “I write code, AI helps”

AI-native: “I improve the workflow and context so AI writes better code”

“Software engineering might be more workflow + context engineering.” — Nick Tune

This is the meta-skill: instead of fixing code, fix the system that produces the code.

Inspired by Nick Tune’s Coding Agent Development Workflows

See also: §2.5 From Chatbot to Context System — the four-layer framework (CLAUDE.md, skills, hooks, memory) that makes this mindset operational.

Learn from common mistakes to avoid frustration and maximize productivity.

❌ Don’t:

  • Use --dangerously-skip-permissions on production systems or sensitive codebases
  • Hard-code secrets in commands, config files, or CLAUDE.md
  • Grant overly broad permissions like Bash(*) without restrictions
  • Run Claude Code with elevated privileges (sudo/Administrator) unnecessarily
  • Commit .claude/settings.local.json to version control (contains API keys)
  • Share session IDs or logs that may contain sensitive information
  • Disable security hooks during normal development

✅ Do:

  • Store secrets in environment variables or secure vaults
  • Start from minimal permissions and expand gradually as needed
  • Audit regularly with claude config list to review active permissions
  • Isolate risky operations in containers, VMs, or separate environments
  • Use .gitignore to exclude sensitive configuration files
  • Review all diffs before accepting changes, especially in security-critical code
  • Implement PreToolUse hooks to catch accidental secret exposure
  • Use Plan Mode for exploring unfamiliar or sensitive codebases

Example Security Hook:

#!/bin/bash
# .claude/hooks/PreToolUse.sh - Block secrets in commits
INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool.name')
if [[ "$TOOL_NAME" == "Bash" ]]; then
  COMMAND=$(echo "$INPUT" | jq -r '.tool.input.command')
  # Block git commits with potential secrets
  if [[ "$COMMAND" == *"git commit"* ]] || [[ "$COMMAND" == *"git add"* ]]; then
    # Check for common secret patterns
    if git diff --cached | grep -E "(password|secret|api_key|token).*=.*['\"]"; then
      echo "❌ Potential secret detected in staged files" >&2
      exit 2 # Block the operation
    fi
  fi
fi
exit 0 # Allow
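It's worth probing the secret regex before trusting it: as written it catches lowercase assignments but misses uppercase names like `API_KEY`, so consider `grep -Ei` in your own hook:

```shell
# Probe the secret-detection pattern with known-good and known-bad input.
# Note the case-sensitivity gap: uppercase API_KEY slips through unless
# you add -i to grep.
pattern="(password|secret|api_key|token).*=.*['\"]"
printf 'api_key = "abc123"\n' | grep -qE "$pattern" && echo "matched"
printf 'API_KEY = "abc123"\n' | grep -qE "$pattern" || echo "missed"
```

Running a few positive and negative samples like this is a cheap way to tune the pattern before it starts blocking real commits.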

❌ Don’t:

  • Load entire monorepo when you only need one package
  • Max out thinking/turn budgets for simple tasks (wastes time and money)
  • Ignore session cleanup - old sessions accumulate and slow down Claude Code
  • Use deep thinking prompts for trivial edits like typo fixes
  • Keep context at 90%+ for extended periods
  • Load large binary files or generated code into context
  • Run expensive MCP operations in tight loops

✅ Do:

  • Use --add-dir to allow tool access to directories outside the current working directory
  • Manage thinking mode for cost efficiency:
    • Simple tasks: Alt+T to disable thinking → faster, cheaper
    • Complex tasks: Leave thinking enabled (default in Opus 4.5)
    • Note: Keywords like “ultrathink” no longer have effect
  • Set cleanupPeriodDays in config to prune old sessions automatically
  • Use /compact proactively when context reaches 70%
  • Block sensitive files with permissions.deny in settings.json
  • Monitor cost with /status and adjust model/thinking levels accordingly
  • Cache expensive computations in memory with Serena MCP
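The session-cleanup and file-blocking bullets above can live together in settings.json. A minimal sketch; the key names follow this guide (`cleanupPeriodDays`, `permissions.deny`), but the exact deny-rule syntax is an assumption, so verify against the official settings reference before copying:

```json
{
  "cleanupPeriodDays": 30,
  "permissions": {
    "deny": [
      "Read(.env)",
      "Read(.env.*)",
      "Read(secrets/**)"
    ]
  }
}
```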

Context Management Strategy:

| Context Level | Action | Why |
|---------------|--------|-----|
| 0-50% | Work freely | Optimal performance |
| 50-70% | Be selective | Start monitoring |
| 70-85% | `/compact` now | Prevent degradation |
| 85-95% | `/compact` or `/clear` | Significant slowdown |
| 95%+ | `/clear` required | Risk of errors |

❌ Don’t:

  • Skip project context (CLAUDE.md) - leads to repeated corrections
  • Use vague prompts like “fix this” or “check my code”
  • Ignore errors in logs or dismiss warnings
  • Automate workflows without testing in safe environments first
  • Accept changes blindly without reviewing diffs
  • Work without version control or backups
  • Mix multiple unrelated tasks in one session
  • Forget to commit after completing tasks

✅ Do:

  • Maintain and update CLAUDE.md regularly with:
    • Tech stack and versions
    • Coding conventions and patterns
    • Architecture decisions
    • Common gotchas specific to your project
  • Be specific and goal-oriented in prompts using WHAT/WHERE/HOW/VERIFY format
  • Monitor via logs or OpenTelemetry when appropriate
  • Test automation in dev/staging environments first
  • Always review agent outputs before accepting — especially polished ones (see Artifact Paradox below)
  • Use git branches for experimental changes
  • Break complex tasks into focused sessions
  • Commit frequently with descriptive messages
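The CLAUDE.md bullets above translate into a very small file. A minimal skeleton follows; the section names and sample entries are suggestions, not a required schema:

```shell
# Bootstrap a minimal CLAUDE.md covering stack, conventions,
# architecture decisions, and gotchas (contents are illustrative).
cd "$(mktemp -d)"
cat > CLAUDE.md <<'EOF'
# Project Context
## Tech stack
- TypeScript 5.x, Node 20, pnpm
## Conventions
- Imports: use ~/ path aliases, never relative ../../
## Architecture decisions
- Documented as ADRs under docs/adr/
## Gotchas
- pnpm test requires the local Postgres container to be running
EOF
grep -c '^## ' CLAUDE.md   # → 4
```

Four short sections is enough to stop most repeated corrections; grow it only when a new correction recurs.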

⚠️ The Artifact Paradox — Anthropic AI Fluency Index (Feb 2026)

Anthropic research on 9,830 Claude conversations reveals a critical counter-intuitive finding: when Claude produces a polished artifact (code, files, configs), users become measurably less critical, not more.

Compared to sessions without artifact production:

  • −5.2pp likelihood of identifying missing context
  • −3.7pp likelihood of fact-checking the output
  • −3.1pp likelihood of questioning the reasoning

Users do become more directive (+14.7pp clarifying goals, +14.5pp specifying format) — but their critical evaluation drops precisely when the output looks finished.

For Claude Code, this is the nominal case. Every generated file, every written test, every created config is an artifact. The polished compile-and-run output is exactly when you should apply the most scrutiny — not the least.

Counter-measures:

  • Run tests before accepting generated code, not after
  • Explicitly ask: “What edge cases or requirements did you not address?”
  • Use the output-validator hook for automated checks
  • Apply the VERIFY step of the WHAT/WHERE/HOW/VERIFY format even when output looks complete
  • In Plan Mode: challenge the plan before executing, not after seeing the result

Source: Swanson et al., “The AI Fluency Index”, Anthropic (2026-02-23) — anthropic.com/research/AI-fluency-index

📊 Visual: AI Fluency — High vs Low Fluency Paths

Effective Prompt Format:

## Task Template
**WHAT**: [Concrete deliverable - e.g., "Add email validation to signup form"]
**WHERE**: [File paths - e.g., "src/components/SignupForm.tsx"]
**HOW**: [Constraints/approach - e.g., "Use Zod schema, show inline errors"]
**VERIFY**: [Success criteria - e.g., "Empty email shows error, invalid format shows error, valid email allows submit"]
## Example
WHAT: Add input validation to the login form
WHERE: src/components/LoginForm.tsx, src/schemas/auth.ts
HOW: Use Zod schema validation, display errors inline below inputs
VERIFY:
- Empty email shows "Email required"
- Invalid email format shows "Invalid email"
- Empty password shows "Password required"
- Valid inputs clear errors and allow submission

❌ Don’t:

  • Commit personal API keys or local settings to shared repos
  • Override team conventions in personal .claude/ without discussion
  • Use non-standard agents/skills without team alignment
  • Modify shared hooks without testing across team
  • Skip documentation for custom commands/agents
  • Use different Claude Code versions across team without coordinating

✅ Do:

  • Use .gitignore for .claude/settings.local.json and personal configs
  • Document team-wide conventions in project CLAUDE.md (committed)
  • Share useful agents/skills via team repository or wiki
  • Test hooks in isolation before committing
  • Maintain README for .claude/agents/ and .claude/commands/
  • Coordinate Claude Code updates and test compatibility
  • Use consistent naming conventions for custom components
  • Share useful prompts and patterns in team knowledge base

Recommended .gitignore:

.gitignore
# Claude Code - Personal
.claude/settings.local.json
.claude/CLAUDE.md
.claude/.serena/
# Claude Code - Team (committed)
# .claude/agents/
# .claude/commands/
# .claude/hooks/
# .claude/settings.json
# Environment
.env.local
.env.*.local
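You can verify the ignore rules actually apply before anyone commits a key; `git check-ignore` exits 0 and prints the path when a rule matches:

```shell
# Confirm personal settings are excluded from version control.
# Uses a throwaway repo; adapt the paths to your project.
cd "$(mktemp -d)"
git init -q .
printf '.claude/settings.local.json\n.env.local\n' > .gitignore
mkdir -p .claude
touch .claude/settings.local.json
git check-ignore .claude/settings.local.json && echo "ignored"
```

Run the same check in CI to catch a teammate accidentally deleting the rule.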

❌ Don’t:

  • Use abbreviated variable/function names (usr, evt, calcDur) - agents can’t find them
  • Write obvious comments that waste tokens (// Import React)
  • Keep large monolithic files (>500 lines) that agents must read in chunks
  • Hide business logic in tribal knowledge - agents need explicit documentation
  • Assume agents know your custom patterns without documentation (ADRs)
  • Delegate test writing to agents - they’ll write tests that match their (potentially flawed) implementation

✅ Do:

  • Use complete, searchable terms (user, event, calculateDuration)
  • Add synonyms in comments for discoverability (“member, subscriber, customer”)
  • Split large files by concern (validation, sync, business logic)
  • Embed domain knowledge in CLAUDE.md, ADRs, and code comments
  • Document custom architectures with Architecture Decision Records (ADRs)
  • Write tests manually first (TDD), then have agents implement to pass tests
  • Use standard design patterns (Singleton, Factory, Repository) that agents know from training
  • Add cross-references between related modules

Agent-hostile example:

usr-mgr.ts
class UsrMgr {
  async getUsr(id: string) { /* ... */ }
}

Agent-friendly example:

user-manager.ts
/**
 * User account management service.
 * Also known as: member manager, subscriber service
 *
 * Related: user-repository.ts, auth-service.ts
 */
class UserManager {
  /**
   * Fetch user by ID. Returns null if not found.
   * Common use: authentication, profile rendering
   */
  async getUser(userId: string): Promise<User | null> { /* ... */ }
}

Comprehensive guide: For complete codebase optimization strategies including token efficiency, testing approaches, and guardrails, see Section 9.18: Codebase Design for Agent Productivity.

❌ Don’t:

  • Use Opus for simple tasks that Sonnet can handle
  • Use deep thinking prompts for every task by default
  • Ignore the cost metrics in /status
  • Use MCP servers that make external API calls excessively
  • Load entire codebase for focused tasks
  • Re-analyze unchanged code repeatedly

✅ Do:

  • Use OpusPlan mode: Opus for planning, Sonnet for execution
  • Match model to task complexity:
    • Haiku: Code review, simple fixes
    • Sonnet: Most development tasks
    • Opus: Architecture, complex debugging
  • Monitor cost with /status regularly
  • Set budget alerts if using API directly
  • Use Serena memory to avoid re-analyzing code
  • Leverage context caching with /compact
  • Batch similar operations together

Cost-Effective Model Selection:

See Section 2.5 Model Selection & Thinking Guide for the canonical decision table with effort levels and cost estimates.

❌ Don’t:

  • Try to learn everything at once - overwhelming and inefficient
  • Skip the basics and jump to advanced features
  • Expect perfection from AI - it’s a tool, not magic
  • Blame Claude for errors without reviewing your prompts
  • Work in isolation without checking community resources
  • Give up after first frustration
  • Trust AI output without proportional verification - AI code has 1.75× more logic errors than human-written code (source). Match verification effort to risk level (see Section 1.7)

✅ Do:

  • Follow progressive learning path:
    1. Week 1: Basic commands, context management
    2. Week 2: CLAUDE.md, permissions
    3. Week 3: Agents and commands
    4. Month 2+: MCP servers, advanced patterns
  • Start with simple, low-risk tasks
  • Iterate on prompts based on results
  • Review this guide and community resources regularly
  • Join Claude Code communities (Discord, GitHub discussions)
  • Share learnings and ask questions
  • Celebrate small wins and track productivity gains

Learning Checklist:

□ Week 1: Installation & Basic Usage
  □ Install Claude Code successfully
  □ Complete first task (simple edit)
  □ Understand context management (use /compact)
  □ Learn permission modes (try Plan Mode)
□ Week 2: Configuration & Memory
  □ Create project CLAUDE.md
  □ Set up .gitignore correctly
  □ Configure permissions in settings.local.json
  □ Use @file references effectively
□ Week 3-4: Customization
  □ Create first custom agent
  □ Create first custom command
  □ Set up at least one hook
  □ Explore one MCP server (suggest: Context7)
□ Month 2+: Advanced Patterns
  □ Implement Trinity pattern (Plan Mode → Extended Thinking → Sequential MCP)
  □ Set up CI/CD integration
  □ Configure OpusPlan mode
  □ Build team workflow patterns

Enterprise Anti-Patterns (2026 Industry Data)

Section titled “Enterprise Anti-Patterns (2026 Industry Data)”

Source: Anthropic 2026 Agentic Coding Trends Report

Based on Anthropic research across 5000+ organizations, these anti-patterns emerged as the most costly mistakes in agentic coding adoption.

Symptom: Context switching cost exceeds productivity gain

Example:

Team spawns 10 agents simultaneously:
- 6 agents blocked waiting for each other
- 3 agents working on conflicting changes
- 1 agent actually productive
→ Net result: Slower than 2 well-coordinated agents

Why it fails: Coordination overhead grows quadratically: N agents have N(N-1)/2 potential conflict pairs, so 10 agents means 45 ways to step on each other

✅ Fix:

  • Start with 2-3 agents maximum
  • Measure productivity gain before scaling
  • Anthropic data: Sweet spot = 3-5 agents for most teams
  • Boris Cherny (creator): 5-15 agents, but with ideal architecture + resources

Symptom: Automating workflow not mastered manually first

Example:

Team automates PR review before:
- Understanding what good reviews look like
- Having manual review checklist
- Testing on 10+ PRs manually
→ Automated garbage (agent reproduces poor manual practices)

Why it fails: AI amplifies existing patterns (garbage in = garbage out)

✅ Fix:

  • Manual → Semi-auto → Full-auto (progressive)
  • Document manual process first (becomes CLAUDE.md rules)
  • Test automation on 20+ examples before full rollout
  • Anthropic finding: 60% use AI, but only 0-20% fully delegate (collaboration ≠ replacement)

Symptom: Maintenance burden, version conflicts, debugging hell

Example:

Project has 15 MCP servers:
- 8 unused (installed for one-off task)
- 4 duplicative (3 different doc lookup servers)
- 2 conflicting (competing file search implementations)
- 1 actually needed daily
→ Startup time: 45 seconds, frequent crashes

Why it fails: Each MCP server = additional failure point, dependency, configuration

✅ Fix:

  • Start core stack: Serena (symbols), Context7 (docs), Sequential (reasoning)
  • Add selectively: One MCP server at a time, measure value
  • Audit quarterly: Remove unused servers (/mcp list → usage stats)
  • Anthropic team pattern: CLI/scripts over MCP unless bidirectional communication needed

Symptom: Expecting 100% delegation, frustrated by constant supervision needed

Example:

Engineer assumes "AI writes code, I review":
- Reality: Constant clarification questions
- Reality: Edge cases require human judgment
- Reality: Architecture decisions still need human input
→ Burnout from micromanaging instead of collaborating

Why it fails: Current AI state = collaboration tool, not autonomous replacement

✅ Fix:

  • Accept 60% AI usage, 0-20% full delegation as normal (Anthropic data)
  • Design workflows for collaboration, not delegation
  • Use AI for: Easily verifiable, well-defined, repetitive tasks
  • Keep human: High-level design, organizational context, “taste” decisions

Symptom: Scaling spend without tracking productivity gain

Example:

Team increases from 3 to 10 Claude instances:
- Monthly cost: $500 → $2,000
- Measured output: ??? (no tracking)
- Actual gain: Unclear if positive ROI
→ CFO asks "Why $2K/month?" → No answer → Budget cut

Why it fails: Can’t optimize what you don’t measure

✅ Fix:

  • Track baseline: PRs/week, features shipped/month, bugs fixed/sprint
  • Measure after scaling: Same metrics
  • Calculate ROI: (Productivity gain × engineer hourly rate) - Claude cost
  • Anthropic validation: 67% more PRs merged/day = measurable productivity
  • Share metrics with leadership (justify budget, demonstrate value)
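The ROI formula from the fix list can be sketched as a one-line calculation (the figures below are hypothetical, not Anthropic data):

```python
def monthly_roi(hours_saved: float, hourly_rate: float, claude_cost: float) -> float:
    """ROI = (productivity gain x engineer hourly rate) - Claude cost."""
    return hours_saved * hourly_rate - claude_cost

# Hypothetical: 30 engineer-hours saved/month at $100/h against $2,000/month spend
print(monthly_roi(30, 100, 2_000))  # → 1000.0
```

Negative results mean the spend isn't paying for itself yet; track the same baseline metrics (PRs/week, features/month) before and after scaling so the inputs aren't guesses.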
| Anti-Pattern | Limit | Measurement | Fix Trigger |
| --- | --- | --- | --- |
| Over-delegation | >5 agents | Coordination overhead | Reduce to 2-3, measure |
| Tool sprawl | >10 MCP servers | Startup time, crashes | Quarterly audit, remove unused |
| Premature automation | - | Manual process unclear | Document → Test → Automate |
| No ROI tracking | - | Can’t answer “What gain?” | Baseline → Measure → Optimize |

Industry benchmark (Anthropic 2026):

  • 3-6 months adoption timeline for Agent Teams
  • $500-1K/month cost for Multi-Instance (positive ROI at >3 instances)
  • 27% new work (wouldn’t be done without AI) = harder to measure but valuable

Effective git workflows with Claude Code for professional development.

Claude Code generates commit messages automatically. Guide it with clear context.

Default behavior:

Terminal window
# After changes, Claude creates commits like:
git commit -m "feat: add user authentication middleware
- Implement JWT validation
- Add session management
- Create auth error handling
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"

For comprehensive AI code attribution beyond Co-Authored-By, including:

  • LLVM’s Assisted-by: trailer standard
  • git-ai checkpoint tracking
  • Team and enterprise compliance patterns

See: AI Traceability Guide

Customize commit style in CLAUDE.md:

## Git Commit Conventions
Follow Conventional Commits format:
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- refactor: Code restructuring
- test: Test additions/changes
- chore: Maintenance tasks
Keep first line under 72 characters.
Include ticket number: [PROJ-123]
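If you want to enforce these conventions locally, a minimal commit-msg hook could look like the sketch below (the type list and 72-character limit mirror the rules above; the regex and file layout are illustrative, adapt to your tooling):

```python
#!/usr/bin/env python3
# Sketch of a .git/hooks/commit-msg hook enforcing the conventions above.
import re
import sys

TYPES = "feat|fix|docs|refactor|test|chore"
PATTERN = re.compile(rf"^({TYPES})(\([\w-]+\))?: .+")

def valid_subject(line: str) -> bool:
    """Type prefix present and first line under 72 characters."""
    return bool(PATTERN.match(line)) and len(line) < 72

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        subject = f.readline().rstrip("\n")
    if not valid_subject(subject):
        sys.exit(f"Commit subject must match '({TYPES}): ...' and stay under 72 chars")
```

Claude-generated commits already follow this format when CLAUDE.md specifies it; the hook just catches manual commits that drift.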

Example with guidance:

You: "Implement login endpoint"
Claude: Creates commit:
git commit -m "feat: implement user login endpoint [AUTH-42]
- Add POST /api/auth/login route
- Validate credentials with bcrypt
- Generate JWT tokens
- Return user profile on success
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"

When to amend:

  • Fix typos in recent commit
  • Add forgotten files to last commit
  • Improve commit message
  • Only for unpushed commits

Safe amend pattern:

Terminal window
# Check if commit is unpushed
git status
# Output: "Your branch is ahead of 'origin/main' by 1 commit"
# Amend the commit
You: "Add the missing test file to the last commit"
Claude:
git add tests/auth.test.ts
git commit --amend --no-edit

❌ Never amend pushed commits (requires force push, breaks team history)

✅ For pushed commits, create new commit:

Terminal window
You: "Fix typo in authentication function"
Claude:
# Creates new commit instead of amending
git commit -m "fix: correct typo in authenticateUser function"

Feature branch pattern:

## CLAUDE.md Git Workflow
Always work on feature branches:
1. Create branch from main: git checkout -b feature/name
2. Make changes and commit regularly
3. Push to remote: git push -u origin feature/name
4. Create pull request when ready
5. Never commit directly to main/master

Branch naming conventions:

feature/user-authentication
fix/login-validation-bug
refactor/api-endpoints
docs/update-readme
chore/dependency-updates
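A quick sanity check for these names can be scripted; this sketch accepts the prefixes listed above (the kebab-case rule is an assumption matching the examples):

```python
import re

# Prefixes from the naming conventions above; slug must be lowercase kebab-case
BRANCH_RE = re.compile(r"^(feature|fix|refactor|docs|chore)/[a-z0-9][a-z0-9-]*$")

def valid_branch(name: str) -> bool:
    """True when the branch name follows the conventions listed above."""
    return bool(BRANCH_RE.match(name))

print(valid_branch("feature/user-authentication"))  # → True
print(valid_branch("my-random-branch"))             # → False
```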

Claude Code with branches:

Terminal window
# Start new feature
You: "Create a new branch for adding search functionality"
Claude:
git checkout -b feature/search-functionality
# Work on feature...
# ...
# When complete
You: "Create pull request for search feature"
Claude:
git push -u origin feature/search-functionality
# Opens PR creation in browser

/rewind (local undo):

  • Undoes Claude’s recent changes in current session
  • Does NOT create git commits
  • Works only for uncommitted changes
  • Use when: Claude made a mistake, you want to try different approach

Example:

Terminal window
You: "Add email validation to login form"
Claude: [Makes changes]
You: [Reviews diff] "This breaks the existing flow"
/rewind
# Changes are undone, back to previous state
You: "Add email validation but preserve existing flow"

git revert (committed changes):

  • Creates new commit that undoes previous commit
  • Safe for pushed commits (preserves history)
  • Use when: Need to undo committed changes

Example:

Terminal window
You: "Revert the authentication changes from the last commit"
Claude:
git revert HEAD
# Creates new commit: "Revert 'feat: add authentication'"

Decision tree:

Changes not committed yet? → Use /rewind
Changes committed but not pushed? → Use git reset (careful!)
Changes committed and pushed? → Use git revert
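The decision tree reduces to a tiny lookup; a sketch:

```python
def undo_strategy(committed: bool, pushed: bool) -> str:
    """Pick the safe undo mechanism per the decision tree above."""
    if not committed:
        return "/rewind"       # uncommitted: session-local undo
    if not pushed:
        return "git reset"     # local history only -- still be careful
    return "git revert"        # shared history: revert preserves it
```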

What are worktrees?

Git worktrees (available since Git 2.5.0, July 2015) create multiple working directories from the same repository, each checked out to a different branch.

Traditional workflow problem:

Terminal window
# Working on feature A
git checkout feature-a
# 2 hours of work...
# Urgent hotfix needed
git stash # Save current work
git checkout main
git checkout -b hotfix
# Fix the bug...
git checkout feature-a
git stash pop # Resume work

Worktree solution:

Terminal window
# One-time setup
git worktree add ../myproject-hotfix hotfix
git worktree add ../myproject-feature-a feature-a
# Now work in parallel
cd ../myproject-hotfix # Terminal 1
claude # Fix the bug
cd ../myproject-feature-a # Terminal 2
claude # Continue feature work

When to use worktrees:

Use worktrees when:

  • Working on multiple features simultaneously
  • Need to test different approaches in parallel
  • Reviewing code while developing
  • Running long CI/CD builds while coding
  • Maintaining multiple versions (v1 support + v2 development)

Don’t use worktrees when:

  • Simple branch switching is sufficient
  • Disk space is limited (each worktree = full working directory)
  • Team is unfamiliar with worktrees (adds complexity)

Worktree lifecycle commands:

The full worktree lifecycle is covered by 4 companion commands:

| Command | Purpose |
| --- | --- |
| /git-worktree | Create worktree with branch validation, symlinked deps, background checks |
| /git-worktree-status | Check background verification tasks (type check, tests, build) |
| /git-worktree-remove | Safely remove single worktree with merge checks and DB cleanup |
| /git-worktree-clean | Batch cleanup of stale worktrees with disk usage report |
Terminal window
# Create with auto-prefix and symlinked node_modules
You: "/git-worktree auth"
# → Creates feat/auth branch, symlinks node_modules, runs checks in background
# Check background verification status
You: "/git-worktree-status"
# → Type check: PASS, Tests: PASS (142 tests)
# Remove after merge
You: "/git-worktree-remove feat/auth"
# → Removes worktree + branch (local + remote) + DB cleanup reminder
# Batch cleanup of all merged worktrees
You: "/git-worktree-clean --dry-run"
# → Preview: 3 merged (4.2 MB), 1 unmerged (kept)

💡 Tip — Symlink node_modules: The /git-worktree command symlinks node_modules from the main worktree by default, saving ~30s per worktree creation and significant disk space. Use --isolated when you need fresh dependencies (e.g., testing upgrades).

Worktree management:

Terminal window
# List all worktrees
git worktree list
# Remove worktree (after merging feature)
git worktree remove .worktrees/feature/new-api
# Cleanup stale worktree references
git worktree prune

💡 Team tip — Shell aliases for fast worktree navigation: The Claude Code team uses single-letter aliases to hop between worktrees instantly:

Terminal window
# ~/.zshrc or ~/.bashrc
alias za="cd .worktrees/feature-a"
alias zb="cd .worktrees/feature-b"
alias zc="cd .worktrees/feature-c"
alias zlog="cd .worktrees/analysis" # Dedicated worktree for logs & queries

The dedicated “analysis” worktree is used for reviewing logs and running database queries without polluting active feature branches.

Source: 10 Tips from Inside the Claude Code Team

Claude Code context in worktrees:

Each worktree maintains independent Claude Code context:

Terminal window
# Terminal 1 - Worktree A
cd .worktrees/feature-a
claude
You: "Implement user authentication"
# Claude indexes feature-a worktree
# Terminal 2 - Worktree B (simultaneous)
cd .worktrees/feature-b
claude
You: "Add payment integration"
# Claude indexes feature-b worktree (separate context)

Memory files with worktrees:

  • Global memory (~/.claude/CLAUDE.md): Shared across all worktrees
  • Project memory (repo root CLAUDE.md): Committed, shared
  • Worktree-local memory (.claude/CLAUDE.md in worktree): Specific to that worktree

Recommended structure:

~/projects/
├── myproject/ # Main worktree (main branch)
│ ├── CLAUDE.md # Project conventions (committed)
│ └── .claude/
├── myproject-develop/ # develop branch worktree
│ └── .claude/ # Develop-specific config
├── myproject-feature-a/ # feature-a branch worktree
│ └── .claude/ # Feature A context
└── myproject-hotfix/ # hotfix branch worktree
└── .claude/ # Hotfix context

Best practices:

  1. Name worktrees clearly:

    Terminal window
    # Bad
    git worktree add ../temp feature-x
    # Good
    git worktree add ../myproject-feature-x feature-x
  2. Add to .gitignore:

    # Worktree directories
    .worktrees/
    worktrees/
  3. Clean up merged branches:

    Terminal window
    git worktree remove myproject-feature-x
    git branch -d feature-x # Delete local branch after merge
    git push origin --delete feature-x # Delete remote branch
  4. Use consistent location:

    • .worktrees/ (hidden, in project root)
    • worktrees/ (visible, in project root)
    • ../myproject-* (sibling directories)
  5. Don’t commit worktree contents:

    • Always ensure worktree directories are in .gitignore
    • The /git-worktree command verifies this automatically

Advanced: Parallel testing pattern:

Terminal window
# Test feature A while working on feature B
cd .worktrees/feature-a
npm test -- --watch & # Run tests in background
cd .worktrees/feature-b
claude # Continue development
You: "Add new API endpoint"
# Tests for feature A still running in parallel

Worktree troubleshooting:

Problem: Worktree creation fails with “already checked out”

Terminal window
# Solution: You can't check out the same branch in multiple worktrees
git worktree list # See which branches are checked out
# Use a different branch or remove the existing worktree first

Problem: Disk space issues

Terminal window
# Each worktree is a full working directory
# Solution: Clean up unused worktrees regularly
git worktree prune

Problem: Can’t delete worktree directory

Terminal window
# Solution: Use git worktree remove, not rm -rf
git worktree remove --force .worktrees/old-feature


Claude Code Native Worktree Features (v2.1.49–v2.1.50)


Claude Code has built-in worktree integration beyond the manual git worktree workflow above.

Terminal window
# --worktree / -w flag: creates a temporary worktree based on HEAD
claude --worktree
claude -w

The worktree is created automatically, Claude runs inside it, and it is cleaned up on exit (if no changes were made).

Declarative isolation in agent definitions


Set isolation: "worktree" in an agent’s frontmatter to automatically spawn it in a fresh worktree every time (v2.1.50+):

---
name: refactoring-agent
description: Large-scale refactors that must not pollute the main working tree
model: opus
isolation: "worktree" # Each invocation gets its own isolated checkout
---
Perform the requested refactoring. Commit your changes inside the worktree.

This replaces the earlier pattern of manually passing isolation: "worktree" to each Task tool call.

Custom VCS setup with hook events (v2.1.50+)


Two new hook events fire around agent worktree lifecycle:

| Event | Fires | Use case |
| --- | --- | --- |
| WorktreeCreate | When an agent worktree is created | Set up DB branch, copy .env, install deps |
| WorktreeRemove | When an agent worktree is torn down | Clean up DB branch, delete temp credentials |
.claude/settings.json
{
  "hooks": {
    "WorktreeCreate": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "scripts/worktree-setup.sh $CLAUDE_WORKTREE_PATH"
          }
        ]
      }
    ],
    "WorktreeRemove": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "scripts/worktree-teardown.sh $CLAUDE_WORKTREE_PATH"
          }
        ]
      }
    ]
  }
}

Typical worktree-setup.sh: create a Neon/PlanetScale DB branch, copy .env.local, run npm install.

Enterprise config auditing with ConfigChange (v2.1.49+)


The ConfigChange hook fires whenever a configuration file changes during a session. Use it to audit or block unauthorized live configuration modifications — particularly useful in enterprise environments with managed policy hooks.

.claude/settings.json
{
  "hooks": {
    "ConfigChange": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "scripts/audit-config-change.sh"
          }
        ]
      }
    ]
  }
}

Example audit-config-change.sh (log + optionally block):

#!/bin/bash
# Receives JSON on stdin with changed config path
CONFIG=$(cat | jq -r '.config_path // "unknown"')
echo "[ConfigChange] $(date -u +%Y-%m-%dT%H:%M:%SZ) $CONFIG" >> ~/.claude/logs/config-audit.log
# Exit 2 to block the change, exit 0 to allow it
exit 0

Enterprise note: disableAllHooks (v2.1.49+) can no longer bypass managed hooks — hooks set via organizational policy always run regardless of this setting. Only non-managed hooks are affected.

Modern pattern (2024+): Combine git worktrees with database branches for true feature isolation.

The Problem:

Traditional workflow:
Git branch → Shared dev database → Schema conflicts → Migration hell

The Solution:

Modern workflow:
Git worktree + DB branch → Isolated environments → Safe experimentation

How it works:

Terminal window
# 1. Create worktree (standard)
/git-worktree feature/auth
# 2. Claude detects your database and suggests:
🔍 Detected Neon database
💡 DB Isolation: neonctl branches create --name feature-auth --parent main
Then update .env with new DATABASE_URL
# 3. You run the commands (or skip if not needed)
# 4. Work in isolated environment

Provider detection:

The /git-worktree command automatically detects:

  • Neon → Suggests neonctl branches create
  • PlanetScale → Suggests pscale branch create
  • Supabase → Notes lack of branching support
  • Local Postgres → Suggests schema-based isolation
  • Other → Reminds about isolation options

When to create DB branch:

| Scenario | Create Branch? |
| --- | --- |
| Adding database migrations | ✅ Yes |
| Refactoring data model | ✅ Yes |
| Bug fix (no schema change) | ❌ No |
| Performance experiments | ✅ Yes |

Prerequisites:

Terminal window
# For Neon:
npm install -g neonctl
neonctl auth
# For PlanetScale:
brew install pscale
pscale auth login
# For all providers:
# Ensure .worktreeinclude contains .env
echo ".env" >> .worktreeinclude
echo ".env.local" >> .worktreeinclude

Complete workflow:

Terminal window
# 1. Create worktree
/git-worktree feature/payments
# 2. Follow suggestion to create DB branch
cd .worktrees/feature-payments
neonctl branches create --name feature-payments --parent main
# 3. Update .env with new DATABASE_URL
# (Get connection string from neonctl output)
# 4. Work in isolation
npx prisma migrate dev
pnpm test
# 5. After PR merge, cleanup
git worktree remove .worktrees/feature-payments
neonctl branches delete feature-payments


Coordinating Parallel Worktrees: Task Dependencies


When running multiple agents in parallel worktrees, the hardest problem isn’t setup — it’s coordination. There is no built-in automatic dependency detection between worktree agents. You manage it explicitly.

The pattern: analyze files touched, then set blockedBy manually

Before spawning parallel agents, identify which tasks share files:

Terminal window
# Quick dependency check: list files each task will touch
echo "Task A (auth feature):"
grep -r "UserService\|auth/" src/ --include="*.ts" -l
echo "Task B (payment feature):"
grep -r "PaymentService\|billing/" src/ --include="*.ts" -l
# No overlap? Safe to parallelize.
# Overlap detected? Sequence them.

In the Tasks API, set blockedBy for tasks that depend on others completing first:

// Task B cannot start until Task A merges
TaskCreate("Implement payment service", { blockedBy: ["task-a-id"] })

Decision matrix:

| Scenario | Strategy |
| --- | --- |
| Tasks touch different files, different modules | Parallelize freely |
| Tasks touch same module, different files | Parallelize with explicit conflict resolution step |
| Tasks touch same files | Sequence them |
| Task B needs Task A’s API contract | Block Task B until Task A’s interface is defined |

Practical rule: A 5-minute analysis to find file overlaps before spawning agents saves hours of merge conflict resolution.
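The decision matrix can be encoded directly. Here is a sketch that classifies two tasks by the files they touch; treating "module" as the parent directory is an assumption, not a Claude Code rule:

```python
from pathlib import PurePath

def overlap_strategy(files_a: set[str], files_b: set[str]) -> str:
    """Classify two tasks per the decision matrix above (module = parent dir)."""
    if files_a & files_b:
        return "sequence"                              # same files: sequence them
    modules_a = {str(PurePath(f).parent) for f in files_a}
    modules_b = {str(PurePath(f).parent) for f in files_b}
    if modules_a & modules_b:
        return "parallelize with conflict resolution"  # same module, different files
    return "parallelize freely"                        # disjoint modules

print(overlap_strategy({"src/auth/jwt.ts"}, {"src/billing/pay.ts"}))  # → parallelize freely
```

Feed it the file lists from the grep step above before spawning agents or setting blockedBy.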

Tooling: coderabbitai/git-worktree-runner provides a bash-based worktree manager with basic AI tool integration. It handles the worktree lifecycle but not dependency detection — that stays manual.

Note: Fully automatic dependency detection (where the system infers which tasks conflict) doesn’t exist in Claude Code or the broader ecosystem as of March 2026. The approaches above are the practical state of the art.


Practical techniques to minimize API costs while maximizing productivity.

Choose the right model for each task to balance cost and capability.

See Section 2.5 Model Selection & Thinking Guide for the canonical decision table with effort levels and cost estimates.

OpusPlan mode (recommended):

  • Planning: Opus for high-level thinking
  • Execution: Sonnet for implementation
  • Best of both worlds: Strategic thinking + cost-effective execution
Terminal window
# Activate OpusPlan mode
/model opusplan
# Enter Plan Mode (Opus for planning)
Shift+Tab × 2
You: "Design a caching layer for the API"
# Opus creates detailed architectural plan
# Exit Plan Mode (Sonnet for execution)
Shift+Tab
You: "Implement the caching layer following the plan"
# Sonnet executes the plan at lower cost

Important: Claude Code uses lazy loading - it doesn’t “load” your entire codebase at startup. Files are read on-demand when you ask Claude to analyze them. The main context consumers at startup are your CLAUDE.md files and auto-loaded rules.

CLAUDE.md Token Cost Estimation:

| File Size | Approximate Tokens | Impact |
| --- | --- | --- |
| 50 lines | 500-1,000 tokens | Minimal (recommended) |
| 100 lines | 1,000-2,000 tokens | Acceptable |
| 200 lines | 2,000-3,500 tokens | Upper limit |
| 500+ lines | 5,000+ tokens | Consider splitting |

Note: These are loaded once at session start, not per request. A 200-line CLAUDE.md costs ~2K tokens upfront but doesn’t grow during the session. The concern is the cumulative effect when combined with multiple @includes and all files in .claude/rules/.
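As a rough self-check, you can estimate a CLAUDE.md's upfront cost with a characters-per-token heuristic (an approximation, not the real tokenizer):

```python
def estimate_startup_tokens(claude_md_text: str) -> int:
    """Very rough estimate: ~1 token per 4 characters of English/markdown."""
    return len(claude_md_text) // 4

# A 200-line file at ~50 chars/line lands near the table's upper limit:
print(estimate_startup_tokens("x" * 50 * 200))  # → 2500
```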

Important: Beyond file size, context files containing non-essential information (style guides, architecture descriptions, general conventions) add +20-23% inference cost per session regardless of line count — because agents process and act on every instruction. (Gloaguen et al., 2026)

See also: Memory Loading Comparison for when each method loads.

1. Keep CLAUDE.md files concise:

# ❌ Bloated CLAUDE.md (wastes tokens on every session)
- 500+ lines of instructions
- Multiple @includes importing other files
- Rarely-used guidelines
# ✅ Lean CLAUDE.md
- Essential project context only (<200 lines)
- Move specialized rules to .claude/rules/ (auto-loaded at session start)
- Split by concern: team rules in project CLAUDE.md, personal prefs in ~/.claude/CLAUDE.md

Research note (Gloaguen et al., ETH Zürich, Feb 2026 — 138 benchmarks, 12 repos): The first empirical study on context files shows developer-written CLAUDE.md improves agent success rate by +4%, but LLM-generated files reduce it by -3%. Cause: agents faithfully follow all instructions, even those irrelevant to the task, leading to broader file exploration and longer reasoning chains. Recommendation: include only build/test commands and project-specific tooling. Style guides and architecture descriptions belong in separate docs. (Full evaluation)

2. Use targeted file references:

Terminal window
# ❌ Vague request (Claude reads many files to find context)
"Fix the authentication bug"
# ✅ Specific request (Claude reads only what's needed)
"Fix the JWT validation in @src/auth/middleware.ts line 45"

3. Compact proactively:

Terminal window
# ❌ Wait until 90% context
/status # Context: 92% - Too late, degraded performance
# ✅ Compact at 70%
/status # Context: 72%
/compact # Frees up context, maintains performance

4. Agent specialization:

---
name: test-writer
description: Generate unit tests (use for test generation only)
model: haiku
---
Generate comprehensive unit tests with edge cases.

Benefits:

  • Haiku costs less than Sonnet
  • Focused context (tests only)
  • Faster execution

5. Batch similar operations:

Terminal window
# ❌ Individual sessions for each fix
claude -p "Fix typo in auth.ts"
claude -p "Fix typo in user.ts"
claude -p "Fix typo in api.ts"
# ✅ Batch in single session
claude
You: "Fix typos in auth.ts, user.ts, and api.ts"
# Single context load, multiple fixes

RTK (Rust Token Killer) filters bash command outputs before they reach Claude’s context, achieving 60-90% token reduction across git, testing, and development workflows. 446 stars, 38 forks, 700+ upvotes on r/ClaudeAI.

Repository: rtk-ai/rtk | Website: rtk-ai.app

Installation:

Terminal window
# Option 1: Homebrew (macOS/Linux)
brew install rtk-ai/tap/rtk
# Option 2: Cargo (all platforms)
cargo install rtk
# Option 3: Install script
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/main/install.sh | bash
# Verify installation
rtk --version # v0.16.0+

Proven Token Savings (Benchmarked on v0.2.0):

| Command | Baseline | RTK | Reduction |
| --- | --- | --- | --- |
| rtk git log | 13,994 chars | 1,076 chars | 92.3% |
| rtk git status | 100 chars | 24 chars | 76.0% |
| rtk git diff | 15,815 chars | 6,982 chars | 55.9% |
| rtk vitest run | ~50,000 chars | ~5,000 chars | 90.0% |
| rtk pnpm list | ~8,000 chars | ~2,400 chars | 70.0% |
| rtk cat CHANGELOG.md | 163,587 chars | 61,339 chars | 62.5% |

Average: 60-90% token reduction depending on commands

Key Features (v0.16.0):

Terminal window
# Git operations
rtk git log
rtk git status
rtk git diff HEAD~1
# JS/TS Stack
rtk vitest run # Test results condensed
rtk pnpm list # Dependency tree optimized
rtk prisma migrate status # Migration status filtered
# Python
rtk python pytest # Python test output condensed
# Go
rtk go test # Go test results filtered
# Rust
rtk cargo test # Cargo test output condensed
rtk cargo build # Build output filtered
rtk cargo clippy # Lints grouped by severity
# Project Setup & Learning
rtk init # Initialize RTK in a project (hook-first install)
rtk tree # Project structure condensed
rtk learn # Interactive RTK learning
# Analytics
rtk gain # Token savings dashboard (SQLite tracking)
rtk discover # Find missed optimization opportunities

Real-World Impact:

30-minute Claude Code session:
- Without RTK: ~150K tokens (10-15 git commands @ ~10K tokens each)
- With RTK: ~41K tokens (10-15 git commands @ ~2.7K tokens each)
- Savings: 109K tokens (72.6% reduction)

Integration Strategies:

  1. Hook-first install (recommended):

    Terminal window
    rtk init # Sets up PreToolUse hook automatically
  2. CLAUDE.md instruction (manual wrapper):

    ## Token Optimization
    Use RTK for all supported commands:
    - `rtk git log` (92.3% reduction)
    - `rtk git status` (76.0% reduction)
    - `rtk git diff` (55.9% reduction)
  3. Skill (auto-suggestion):

    • Template: examples/skills/rtk-optimizer/SKILL.md
    • Detects high-verbosity commands
    • Suggests RTK wrapper automatically
  4. Hook (automatic wrapper):

    • Template: examples/hooks/bash/rtk-auto-wrapper.sh
    • PreToolUse hook intercepts bash commands
    • Applies RTK wrapper when beneficial

Recommendation:

  • Use RTK: Full-stack projects (JS/TS, Rust, Python, Go), testing workflows, analytics
  • Skip RTK: Small outputs (<100 chars), quick exploration, interactive commands


Monitor cost with /status:

Terminal window
/status
# Output:
Model: Sonnet | Ctx: 45.2k | Cost: $1.23 | Ctx(u): 42.0%

Set budget alerts (API usage):

# If using Anthropic API directly
import anthropic

client = anthropic.Anthropic()

# Track spending
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[...],
    metadata={
        "user_id": "user_123",
        "project": "api_development"
    }
)

# Log cost per request (calculate_cost and alert_team are your own helpers)
cost = calculate_cost(response.usage)
if cost > BUDGET_THRESHOLD:
    alert_team(f"Budget threshold exceeded: ${cost}")

Session cost limits:

## CLAUDE.md - Cost Awareness
**Budget-conscious mode:**
- Use Haiku for reviews and simple tasks
- Reserve Sonnet for feature work
- Use Opus only for critical decisions
- Compact context at 70% to avoid waste
- Close sessions after task completion

Pattern 1: Haiku for tests, Sonnet for implementation

Terminal window
# Terminal 1: Test generation (Haiku)
claude --model haiku
You: "Generate tests for the authentication module"
# Terminal 2: Implementation (Sonnet)
claude --model sonnet
You: "Implement the authentication module"

Pattern 2: Progressive model escalation

Terminal window
# Start with Haiku
claude --model haiku
You: "Review this code for obvious issues"
# If complex issues found, escalate to Sonnet
/model sonnet
You: "Deep analysis of the race condition"
# If architectural issue, escalate to Opus
/model opus
You: "Redesign the concurrency model"

Pattern 3: Context reuse

Terminal window
# Build context once, reuse for multiple tasks
claude
You: "Analyze the authentication flow"
# Context built: ~20k tokens
# Same session - context already loaded
You: "Now add 2FA to the authentication flow"
# No context rebuild needed
You: "Generate tests for the 2FA feature"
# Still same context
# Commit when done
You: "Create commit for 2FA implementation"

Input tokens:

  • Source code loaded into context
  • Conversation history
  • Memory files (CLAUDE.md)
  • Agent/skill instructions

Output tokens:

  • Claude’s responses
  • Generated code
  • Explanations

Rough estimates:

  • 1 token ≈ 0.75 words (English)
  • 1 token ≈ 4 characters
  • Average function: 50-200 tokens
  • Average file (500 LOC): 2,000-5,000 tokens

Example calculation:

Context loaded:
- 10 files × 500 LOC × 4 tokens/LOC = 20,000 tokens
- Conversation history: 5,000 tokens
- CLAUDE.md: 1,000 tokens
Total input: 26,000 tokens
Claude response:
- Generated code: 500 LOC × 4 = 2,000 tokens
- Explanation: 500 tokens
Total output: 2,500 tokens
Total cost per request: (26,000 + 2,500) tokens × model price

Sonnet pricing (approximate):

  • Input: $3 per million tokens
  • Output: $15 per million tokens

Session cost:

Input: 26,000 × $3 / 1,000,000 = $0.078
Output: 2,500 × $15 / 1,000,000 = $0.0375
Total: ~$0.12 per interaction
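The same arithmetic as a small helper, using the approximate Sonnet prices above (check current pricing before relying on these constants):

```python
SONNET_INPUT_PER_TOKEN = 3 / 1_000_000    # $3 per million input tokens
SONNET_OUTPUT_PER_TOKEN = 15 / 1_000_000  # $15 per million output tokens

def session_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one interaction at the approximate Sonnet rates above."""
    return input_tokens * SONNET_INPUT_PER_TOKEN + output_tokens * SONNET_OUTPUT_PER_TOKEN

print(round(session_cost(26_000, 2_500), 4))  # → 0.1155 (~$0.12, as above)
```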

Daily practices:

□ Use /status to monitor context and cost
□ Compact at 70% context usage
□ Close sessions after task completion
□ Use `permissions.deny` to block sensitive files

Model selection:

□ Default to Sonnet for most work
□ Use Haiku for reviews and simple fixes
□ Reserve Opus for architecture and critical debugging
□ Try OpusPlan mode for strategic work

Context management:

□ Use specific file references (@path/to/file.ts)
□ Batch similar tasks in single session
□ Reuse context for multiple related tasks
□ Create specialized agents with focused context

Team practices:

□ Share cost-effective patterns in team wiki
□ Track spending per project
□ Set budget alerts for high-cost operations
□ Review cost metrics in retrospectives

For heavy usage, consider cc-copilot-bridge to route requests through GitHub Copilot Pro+ ($10/month flat) instead of per-token billing.

Terminal window
# Switch to Copilot mode (flat rate)
ccc # Uses Copilot Pro+ subscription
# Back to direct Anthropic (per-token)
ccd # Uses ANTHROPIC_API_KEY

When this makes sense:

  • You’re hitting rate limits frequently
  • Monthly costs exceed $50-100
  • You already have Copilot Pro+ subscription

See Section 11.2: Multi-Provider Setup for full details.

.github/workflows/claude-review.yml
name: Claude Code Review
on: [pull_request]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Use Haiku for cost-effective reviews
      - name: Run Claude review
        id: review
        run: |
          claude --model haiku \
            -p "Review changes for security and style issues" \
            --add-dir src/ \
            --output-format json > review.json
          if grep -q CRITICAL review.json; then
            echo "critical=true" >> "$GITHUB_OUTPUT"
          fi
      # Only escalate to Sonnet if issues found
      - name: Deep analysis (if needed)
        if: steps.review.outputs.critical == 'true'
        run: |
          claude --model sonnet \
            -p "Detailed analysis of critical issues found" \
            --add-dir src/

Cost comparison:

Haiku review (per PR): ~$0.02
Sonnet review (per PR): ~$0.10
Opus review (per PR): ~$0.50
With 100 PRs/month:
- Haiku: $2/month
- Sonnet: $10/month
- Opus: $50/month
Smart escalation (Haiku → Sonnet for 10% of PRs):
- Base cost: $2 (Haiku for all)
- Escalation: $1 (Sonnet for 10%)
- Total: $3/month (vs $10 or $50)
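The blended figure can be verified in one line, using the per-PR estimates above:

```python
def monthly_review_cost(prs: int, base_cost: float = 0.02,
                        escalation_rate: float = 0.10,
                        escalated_cost: float = 0.10) -> float:
    """Smart escalation: Haiku on every PR, Sonnet on the escalated fraction."""
    return prs * base_cost + prs * escalation_rate * escalated_cost

print(round(monthly_review_cost(100), 2))  # → 3.0
```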

Don’t be penny-wise, pound-foolish:

False economy:

  • Spending 2 hours manually debugging to save $1 in API costs
  • Using Haiku for complex tasks, generating incorrect code
  • Over-compacting context, losing valuable history

Smart optimization:

  • Use right model for the task (time saved >> cost)
  • Invest in good prompts and memory files (reduce iterations)
  • Automate with agents (consistent, efficient)

Perspective on ROI:

Time savings from effective Claude Code usage typically far outweigh API costs for most development tasks. Rather than calculating precise ROI (which depends heavily on your specific context, hourly rate, and task complexity), focus on whether the tool is genuinely helping you ship faster. For team-level measurement, see Contribution Metrics — Anthropic’s GitHub-integrated dashboard for tracking PR and code attribution (Team/Enterprise plans, public beta).

When to optimize aggressively:

  • High-volume operations (>1000 requests/day)
  • Automated pipelines running 24/7
  • Large teams (cost scales with users)
  • Budget-constrained projects

When productivity matters more:

  • Critical bug fixes
  • Time-sensitive features
  • Learning and experimentation
  • Complex architectural decisions

Full reference: methodologies.md | Hands-on workflows: workflows/

15 structured development methodologies have emerged for AI-assisted development (2025-2026). This section provides quick navigation; detailed workflows are in dedicated files.

┌─ "I want quality code" ────────────→ workflows/tdd-with-claude.md
├─ "I want to spec before code" ─────→ workflows/spec-first.md
├─ "I need to plan architecture" ────→ workflows/plan-driven.md
├─ "I'm iterating on something" ─────→ workflows/iterative-refinement.md
└─ "I need methodology theory" ──────→ methodologies.md
| Workflow | When to Use | Key Prompt Pattern |
| --- | --- | --- |
| TDD | Quality-critical code | “Write FAILING tests first, then implement” |
| Spec-First | New features, APIs | Define in CLAUDE.md before asking |
| Plan-Driven | Multi-file changes | Use /plan mode |
| Iterative | Refinement | Specific feedback: “Change X because Y” |
| Tier | Methodologies | Claude Fit |
| --- | --- | --- |
| Orchestration | BMAD | ⭐⭐ Enterprise governance |
| Specification | SDD, Doc-Driven, Req-Driven, DDD | ⭐⭐⭐ Core patterns |
| Behavior | BDD, ATDD, CDD | ⭐⭐⭐ Testing focus |
| Delivery | FDD, Context Engineering | ⭐⭐ Process |
| Implementation | TDD, Eval-Driven, Multi-Agent | ⭐⭐⭐ Core workflows |
| Optimization | Iterative Loops, Prompt Engineering | ⭐⭐⭐ Foundation |

→ Full descriptions with examples: methodologies.md

| Tool | Use Case | Integration |
|---|---|---|
| Spec Kit | Greenfield projects | `/speckit.*` slash commands |
| OpenSpec | Brownfield/existing | `/openspec:*` slash commands |
| Specmatic | API contract testing | MCP agent available |

→ See official documentation for installation and detailed usage.

| Situation | Recommended Stack |
|---|---|
| Solo MVP | SDD + TDD |
| Team 5-10, greenfield | Spec Kit + TDD + BDD |
| Microservices | CDD + Specmatic |
| Existing SaaS | OpenSpec + BDD |
| Enterprise 10+ | BMAD + Spec Kit |
| LLM-native product | Eval-Driven + Multi-Agent |

Reading time: 5 minutes Skill level: Week 2+

Memorable named patterns for effective Claude Code interaction. These patterns have emerged from community best practices and help you communicate more effectively.

Set quality expectations by establishing context and standards.

Pattern: “Implement as if you were a [role] at [high-standard company/context]”

Examples:

# High quality code
Implement this authentication system as if you were a senior security engineer at a major bank.
# Production readiness
Review this code as if preparing for a SOC2 audit.
# Performance focus
Optimize this function as if it will handle 10,000 requests per second.

Why it works: Activates relevant knowledge patterns and raises output quality to match the stated context.

Force creative solutions by adding explicit limitations.

Pattern: “Solve this [with constraint X] [without using Y]”

Examples:

# Dependency constraint
Implement this feature without adding any new dependencies.
# Size constraint
Solve this in under 50 lines of code.
# Time constraint (execution)
This must complete in under 100ms.
# Simplicity constraint
Use only standard library functions.

Why it works: Constraints prevent over-engineering and force focus on the essential solution.

Force planning before implementation.

Pattern: “Before implementing, explain your approach in [N] sentences”

Examples:

# Simple planning
Before writing code, explain in 2-3 sentences how you'll approach this.
# Detailed planning
Before implementing, outline:
1. What components you'll modify
2. What edge cases you've considered
3. What could go wrong
# Trade-off analysis
Before choosing an approach, explain 2-3 alternatives and why you'd pick one.

Why it works: Prevents premature coding and catches misunderstandings early. Especially useful for complex tasks.

Debug collaboratively by having Claude ask questions.

Pattern: “I’m stuck on [X]. Ask me questions to help me figure it out.”

Examples:

# Debugging
I'm stuck on why this test is failing. Ask me questions to help diagnose the issue.
# Design
I can't decide on the right architecture. Ask me questions about my requirements.
# Problem understanding
I don't fully understand what I need to build. Ask clarifying questions.

Why it works: Often the problem is unclear requirements or assumptions. Questions surface hidden constraints.

Build complex features step by step with validation.

Pattern: “Let’s build this incrementally. Start with [minimal version], then we’ll add [features].”

Examples:

# Feature development
Build the user registration incrementally:
1. First: Basic form that saves to database
2. Then: Email validation
3. Then: Password strength requirements
4. Finally: Email verification flow
Show me step 1 first.
# Refactoring
Refactor this incrementally. First extract the validation logic,
run tests, then we'll continue.

Why it works: Reduces risk, enables validation at each step, maintains working code throughout.

Define explicit scope to prevent over-engineering.

Pattern: “Only modify [X]. Don’t touch [Y].”

Examples:

# File scope
Only modify auth.ts. Don't change any other files.
# Function scope
Fix just the calculateTotal function. Don't refactor surrounding code.
# Feature scope
Add the logout button only. Don't add session management or remember-me features.

Why it works: Prevents scope creep and keeps changes focused and reviewable.

| Situation | Pattern Combination |
|---|---|
| Critical feature | As If + Explain First + Incremental |
| Quick fix | Constraint + Boundary |
| Debugging session | Rubber Duck + Incremental |
| Architecture decision | Explain First + As If |
| Refactoring | Boundary + Incremental + Constraint |

| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| "Make it perfect" | Undefined standard | Use "As If" with specific context |
| "Fix everything" | Scope explosion | Use "Boundary" pattern |
| "Just do it" | No validation | Use "Explain First" |
| "Make it fast" | Vague constraint | Specify: "under 100ms" |
| Overwhelming detail | Context pollution | Focus on relevant constraints only |

Reading time: 5 minutes Skill level: Week 2+ Status: Research Preview (as of January 2026)

Session teleportation allows migrating coding sessions between cloud (claude.ai/code) and local (CLI) environments. This enables workflows where you start work on mobile/web and continue locally with full filesystem access.

| Version | Feature |
|---|---|
| 2.0.24 | Initial Web → CLI teleport capability |
| 2.0.41 | Teleporting auto-sets upstream branch |
| 2.0.45 | `&` prefix for background tasks to web |
| 2.1.0 | `/teleport` and `/remote-env` commands |

| Command | Usage |
|---|---|
| `%` or `&` prefix | Send task to cloud (e.g., `% Fix the auth bug`) |
| `claude --teleport` | Interactive picker for available sessions |
| `claude --teleport <id>` | Teleport specific session by ID |
| `/teleport` | In-REPL command to teleport current session |
| `/tasks` | Monitor background tasks status |
| `/remote-env` | Configure cloud environment settings |
| Ctrl+B | Background all running tasks (unified in 2.1.0) |

Required for teleportation:

  • GitHub account connected + Claude GitHub App installed
  • Clean git state (0 uncommitted changes)
  • Same repository (not a fork)
  • Branch exists on remote
  • Same Claude.ai account on both environments
  • CLI version 2.1.0+
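
The git-state prerequisites can be checked up front before teleporting. A minimal pre-flight sketch (the helper name `teleport_ready` is ours; the git commands are standard):

```shell
# Sketch: pre-flight check for the teleport prerequisites above.
# Helper name is our convention; the git commands are standard.
teleport_ready() {
  # Clean git state: porcelain output must be empty (0 uncommitted changes)
  if [ -n "$(git status --porcelain)" ]; then
    echo "not ready: uncommitted changes"
    return 1
  fi
  # Current branch must already exist on the remote
  local branch
  branch=$(git rev-parse --abbrev-ref HEAD)
  if ! git ls-remote --exit-code --heads origin "$branch" >/dev/null 2>&1; then
    echo "not ready: push '$branch' to origin first"
    return 1
  fi
  echo "ready"
}
```

Each failure message maps to a row in the troubleshooting table: commit or stash, then push, before running `claude --teleport`.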
Terminal window
# 1. Start task on web (claude.ai/code)
# "Refactor the authentication middleware"
# 2. Session works in cloud sandbox
# 3. Later, on local machine:
claude --teleport
# → Interactive picker shows available sessions
# 4. Select session, Claude syncs:
# - Conversation context
# - File changes (via git)
# - Task state
# 5. Continue work locally with full filesystem access

| Environment | Teleport Support |
|---|---|
| CLI/Terminal | Full bidirectional |
| VS Code | Via terminal (not Chat view) |
| Cursor | Via terminal |
| Web (claude.ai/code) | Outbound only (web → local) |
| iOS app | Monitoring only |

⚠️ Important: Session teleportation is in research preview. Expect rough edges.

  • Unidirectional: Web → local only (cannot teleport local → web)
  • GitHub only: No GitLab or Bitbucket support yet
  • Subscription required: Pro, Max, Team Premium, or Enterprise Premium
  • Rate limits: Parallel sessions consume proportional rate limits
  • Git dependency: Requires clean git state for sync
| Issue | Solution |
|---|---|
| "Uncommitted changes" | Commit or stash changes before teleporting |
| "Branch not found" | Push local branch to remote first |
| "Session not found" | Verify same Claude.ai account on both |
| "Teleport failed" | Check internet connectivity, try again |
| Connection timeout | Use `claude --teleport <id>` with explicit ID |
  1. Commit frequently — Clean git state is required
  2. Use meaningful branch names — Helps identify sessions
  3. Check /tasks — Verify background task status before teleporting
  4. Same account — Ensure CLI and web use same Claude.ai login
  5. Push branches — Remote must have the branch for sync

| Variable | Purpose |
|---|---|
| `CLAUDE_CODE_DISABLE_BACKGROUND_TASKS` | Disable background task functionality (v2.1.4+) |

9.17 Scaling Patterns: Multi-Instance Workflows


Reading time: 10 minutes

TL;DR: Multi-instance orchestration = advanced pattern for teams managing 10+ concurrent features. Requires modular architecture + budget + monitoring. 95% of users don’t need this — sequential workflows with 1-2 instances are more efficient for most contexts.


Don’t scale prematurely. Multi-instance workflows introduce coordination overhead that outweighs benefits for most teams.

| Context | Recommendation | Monthly Cost | Reasoning |
|---|---|---|---|
| Solo dev | ❌ Don't | - | Overhead > benefit, use Cursor instead |
| Startup <10 devs | ⚠️ Maybe | $400-750 | Only if modular architecture + tests |
| Scale-up 10-50 devs | ✅ Consider | $1,000-2,000 | Headless PM framework + monitoring justified |
| Enterprise 50+ | ✅ Yes | $2,000-5,000 | Clear ROI, budget available |

Red flags (don’t use multi-instance if true):

  • Architecture: Legacy monolith, no tests, tight coupling
  • Budget: <$500/month available for API costs
  • Expertise: Team unfamiliar with Claude Code basics
  • Context: Solo dev or <3 people

📊 Industry Validation: Multi-Instance ROI (Anthropic 2026)


Source: 2026 Agentic Coding Trends Report

Timeline Compression (weeks → days):

| Pattern | Before AI | With Multi-Instance | Gain |
|---|---|---|---|
| Feature implementation | 2-3 weeks | 3-5 days | 4-6x faster |
| Onboarding new codebase | 2-4 weeks | 4-8 hours | 10-50x faster |
| Legacy refactoring | Months (backlog) | 1-2 weeks | Finally viable |

Productivity Economics (Anthropic research):

| Metric | Finding | Implications |
|---|---|---|
| Output volume | +67% PRs merged/engineer/day | Gain via more output, not just speed |
| New work | 27% wouldn't be done without AI | Experimental, nice-to-have, exploratory |
| Full delegation | 0-20% of tasks | Collaboration > replacement |
| Cost multiplier | 3x (capabilities × orchestration × experience) | Compounds over time |

Enterprise Case Studies:

  • TELUS (telecom, 50K+ employees): 500K hours saved, 13K custom solutions, 30% faster shipping
  • Fountain (workforce platform): 50% faster screening, 40% faster onboarding via hierarchical multi-agent
  • Rakuten (tech): 7h autonomous vLLM implementation (12.5M lines code, 99.9% accuracy)

The Boris pattern validation: Boris's $500-1K/month cost and 259 PRs/month align with Anthropic's enterprise data showing positive ROI at >3 parallel instances.

Anti-pattern alert (Anthropic findings):

  • Over-delegation (>5 agents): Coordination overhead > productivity gain
  • Premature scaling: Start 1-2 instances, measure ROI, scale progressively
  • Tool sprawl: >10 MCP servers = maintenance burden (stick to core stack)

Boris Cherny, creator of Claude Code, shared his workflow orchestrating 5-15 Claude instances in parallel.

Setup:

  • 5 instances in local terminal (iTerm2 tabs, numbered 1-5)
  • 5-10 instances on claude.ai/code (--teleport to sync with local)
  • Git worktrees for isolation (each instance = separate checkout)
  • CLAUDE.md: 2.5k tokens, team-shared and versioned in git
  • Model: Opus 4.6 (slower but fewer corrections needed, adaptive thinking)
  • Slash commands: /commit-push-pr used “dozens of times per day”

Results (30 days, January 2026):

  • 259 PRs merged
  • 497 commits
  • 40k lines added, 38k lines deleted (refactor-heavy)

Cost: ~$500-1,000/month API (Opus pricing)

Critical context: Boris is the creator of Claude Code, working with perfect architecture, Anthropic resources, and ideal conditions. This is not representative of average teams.

Key insights from Boris:

On multi-clauding: “I use Cowork as a ‘doer,’ not a chat: it touches files, browsers, and tools directly. I think about productivity as parallelism: multiple tasks running while I steer outcomes.”

On CLAUDE.md: “I treat Claude.md as compounding memory: every mistake becomes a durable rule for the team.”

On plan-first workflow: “I run plan-first workflows: once the plan is solid, execution gets dramatically cleaner.”

On verification loops: “I give Claude a way to verify output (browser/tests): verification drives quality.”

Why Opus 4.6 with Adaptive Thinking: Although more expensive per token ($5/1M input vs $3/1M for Sonnet, or $10/1M for 1M context beta), Opus requires fewer correction iterations thanks to adaptive thinking. Net result: faster delivery and lower total cost despite higher unit price.

The supervision model: Boris describes his role as “tending to multiple agents” rather than “doing every click yourself.” The workflow becomes about steering outcomes across 5-10 parallel sessions, unblocking when needed, rather than sequential execution.

Source: InfoQ - Claude Code Creator Workflow (Jan 2026) | Interview: I got a private lesson on Claude Cowork & Claude Code

Team patterns (broader Claude Code team, Feb 2026):

The broader team extends Boris’s individual workflow with institutional patterns:

  • Skills as institutional knowledge: Anything done more than once daily becomes a skill checked into version control. Examples:
    • /techdebt — run at end of session to eliminate duplicate code
    • Context dump skills — sync 7 days of Slack, Google Drive, Asana, and GitHub into a single context
    • Analytics agents — dbt-powered skills that query BigQuery; one engineer reports not writing SQL manually for 6+ months
  • CLI and scripts over MCP: The team prefers shell scripts and CLI integrations over MCP servers for external tool connections. Rationale: less magic, easier to debug, and more predictable behavior. MCP is reserved for cases where bidirectional communication is genuinely needed.
  • Re-plan when stuck: Rather than pushing through a stalled implementation, the team switches back to Plan Mode. One engineer uses a secondary Claude instance to review plans “as a staff engineer” before resuming execution.
  • Claude writes its own rules: After each correction, the team instructs Claude to update CLAUDE.md with the lesson learned. Over time, this compounds into a team-specific ruleset that prevents recurring mistakes.

Source: 10 Tips from Inside the Claude Code Team (Boris Cherny thread, Feb 2026)


Alternative Pattern: Dual-Instance Planning (Vertical Separation)


While Boris’s workflow demonstrates horizontal scaling (5-15 instances in parallel), an alternative pattern focuses on vertical separation: using two Claude instances with distinct roles for quality-focused workflows.

Pattern source: Jon Williams (Product Designer, UK), transition from Cursor to Claude Code after 6 months. LinkedIn post, Feb 3, 2026

This pattern is orthogonal to Boris’s approach: instead of scaling breadth (more features in parallel), it scales depth (separation of planning and execution phases).

| Your Context | Use Dual-Instance? | Monthly Cost |
|---|---|---|
| Solo dev, spec-heavy work | ✅ Yes | $100-200 |
| Small team, complex requirements | ✅ Yes | $150-300 |
| Product designers coding | ✅ Yes | $100-200 |
| High-volume parallel features | ❌ No, use Boris pattern | $500-1K+ |

Use when:

  • You need plan verification before execution
  • Specs are complex or ambiguous (interview-based clarification helps)
  • Lower budget than Boris pattern ($100-200/month vs $500-1K+)
  • Quality > speed (willing to sacrifice parallelism for better plans)

Don’t use when:

  • You need to ship 10+ features simultaneously (use Boris pattern)
  • Plans are straightforward (single instance with /plan is enough)
  • Budget is very limited (<$100/month)
┌─────────────────────────────────────┐
│     DUAL-INSTANCE ARCHITECTURE      │
└─────────────────────────────────────┘

  ┌────────────────┐    Planning & Review
  │  Claude Zero   │    - Explores codebase
  │   (Planner)    │    - Writes plans
  └───────┬────────┘    - Reviews implementations
          │             - NEVER touches code
          ▼
  ┌────────────────┐
  │ Plans/Review/  │    Human review checkpoint
  │ Plans/Active/  │
  └───────┬────────┘
          │
          ▼
  ┌────────────────┐    Implementation
  │  Claude One    │    - Reads approved plans
  │ (Implementer)  │    - Writes code
  └────────────────┘    - Commits changes
                        - Reports completion

  Key: separation of concerns = fewer mistakes

Setup steps:

  1. Create directory structure:
Terminal window
mkdir -p .claude/plans/{Review,Active,Completed}
  2. Launch Claude Zero (Terminal 1):
Terminal window
cd ~/projects/your-project
claude
# Set role in first message:
# "You are Claude Zero. Your role: explore codebase, write plans,
# review implementations. NEVER edit code. Save all plans to
# .claude/plans/Review/"
  3. Launch Claude One (Terminal 2):
Terminal window
cd ~/projects/your-project
claude
# Set role in first message:
# "You are Claude One. Your role: read plans from .claude/plans/Active/,
# implement them, commit changes, report back."

Step 1: Planning (Claude Zero)

You (to Claude Zero): /plan
Implement JWT authentication for the API.
- Support access tokens (15min expiry)
- Support refresh tokens (7 day expiry)
- Middleware to validate tokens on protected routes

Claude Zero explores codebase, interviews you about requirements:

  • “Should we support multiple sessions per user?”
  • “Do you want token revocation (logout) capability?”
  • “Which routes should be protected vs public?”

Claude Zero writes plan to .claude/plans/Review/auth-jwt.md:

# Plan: JWT Authentication
## Summary
Add JWT-based authentication with access/refresh tokens.
Support token revocation for logout.
## Files to Create
- src/auth/jwt.ts (lines 1-120)
  - generateAccessToken(userId)
  - generateRefreshToken(userId)
  - verifyToken(token)
- src/middleware/auth.ts (lines 1-45)
  - requireAuth middleware
  - Token validation logic
## Files to Modify
- src/routes/api.ts (line 23)
  - Add auth middleware to protected routes
- src/config/env.ts (line 15)
  - Add JWT_SECRET, JWT_REFRESH_SECRET env vars
## Implementation Steps
1. Install jsonwebtoken library
2. Create JWT utility functions
3. Create auth middleware
4. Add JWT secrets to .env
5. Protect existing routes
6. Write tests for auth flow
## Success Criteria
- POST /auth/login returns access + refresh token
- Protected routes reject without valid token
- POST /auth/refresh exchanges refresh token for new access token
- POST /auth/logout revokes refresh token
## Risks
- Token secrets must be in .env (never committed)
- Refresh token storage needs database table

Step 2: Human Review

You review .claude/plans/Review/auth-jwt.md:

  • Is the approach correct?
  • Are all requirements covered?
  • Any security issues?

If approved, move to Active:

Terminal window
mv .claude/plans/Review/auth-jwt.md .claude/plans/Active/

Step 3: Implementation (Claude One)

You (to Claude One): Implement .claude/plans/Active/auth-jwt.md

Claude One reads the plan file, implements all steps, commits.

Step 4: Verification (Claude Zero)

You (to Claude Zero): Review the JWT implementation Claude One just completed.

Claude Zero reviews:

  • Code matches plan?
  • Security best practices followed?
  • Tests cover success criteria?

Step 5: Archive

If approved:

Terminal window
mv .claude/plans/Active/auth-jwt.md .claude/plans/Completed/
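
The directory moves in steps 2 and 5 can be wrapped in a tiny helper (the function name is hypothetical; the `Review/Active/Completed` layout comes from the pattern):

```shell
# Hypothetical helper for the plan lifecycle: Review -> Active -> Completed.
# Directory layout comes from the dual-instance pattern; the name is ours.
plan_move() {   # usage: plan_move <plan.md> <from> <to>
  local plan=$1 from=$2 to=$3
  mkdir -p ".claude/plans/$to"
  mv ".claude/plans/$from/$plan" ".claude/plans/$to/$plan" \
    && echo "$plan: $from -> $to"
}
```

Usage: `plan_move auth-jwt.md Review Active` after approving a plan, then `plan_move auth-jwt.md Active Completed` once Claude Zero signs off.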

Comparison: Boris (Horizontal) vs Jon (Vertical)

| Dimension | Boris Pattern | Jon Pattern (Dual-Instance) |
|---|---|---|
| Scaling axis | Horizontal (5-15 instances, parallel features) | Vertical (2 instances, separated phases) |
| Primary goal | Speed via parallelism | Quality via separation of concerns |
| Monthly cost | $500-1,000 (Opus × 5-15) | $100-200 (Opus × 2 sequential) |
| Entry barrier | High (worktrees, CLAUDE.md 2.5K, orchestration) | Low (2 terminals, Plans/ directory) |
| Audience | Teams, high-volume, 10+ devs | Solo devs, product designers, spec-heavy |
| Context pollution | Isolated by worktrees (git branches) | Isolated by role separation (planner vs implementer) |
| Accountability | Git history (commits per instance) | Human-in-the-loop (review plans before execution) |
| Tooling required | Worktrees, teleport, /commit-push-pr | Plans/ directory structure |
| Coordination | Self-orchestrated (Boris steers 10 sessions) | Human gatekeeper (approve plans) |
| Best for | Shipping 10+ features/day, experienced teams | Complex specs, quality-critical, budget-conscious |

Key insight: These patterns are not mutually exclusive. You can use dual-instance for complex features (planning rigor) and Boris pattern for high-volume simple features (speed).

Cost Analysis: 2 Instances vs Correction Loops


Question: Is it cheaper to use 2 instances (planner + implementer) or 1 instance with correction loops?

| Scenario | 1 Instance (Corrections) | 2 Instances (Dual) | Winner |
|---|---|---|---|
| Simple feature (login form) | 1 session × $5 = $5 | 2 sessions × $3 each = $6 | 1 instance |
| Complex spec (auth system) | 1 session × $15 + 2 correction loops × $10 = $35 | 2 sessions × $12 each = $24 | 2 instances |
| Ambiguous requirements | 1 session × $20 + 3 correction loops × $15 = $65 | 2 sessions × $18 each = $36 | 2 instances |

Breakeven point: For features requiring ≥2 correction loops, dual-instance is cheaper and faster.
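
Worked out in shell, the "Complex spec" scenario from the table (illustrative numbers only):

```shell
# Illustrative breakeven math for the "Complex spec" scenario:
# 1 instance pays a base session plus N correction loops;
# dual-instance pays 2 sessions with no loops.
single_base=15; loop_cost=10; loops=2      # 1 instance + 2 correction loops
dual_session=12                            # dual: cost per session, 2 sessions
single_total=$(( single_base + loops * loop_cost ))
dual_total=$(( 2 * dual_session ))
echo "single=\$$single_total dual=\$$dual_total"   # single=$35 dual=$24
```

With these numbers, dual-instance wins as soon as the single instance needs two correction loops, matching the breakeven point stated above.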

Hidden cost savings:

  • Context pollution: Planner doesn’t see implementation details → cleaner reasoning
  • Fewer hallucinations: Plans have file paths + line numbers → implementer is grounded
  • Learning: Review step catches mistakes before they compound

The key to dual-instance efficiency is plan structure. Jon Williams emphasizes “agent-ready plans with specific file references and line numbers.”

Bad plan (vague):

## Implementation
Add authentication to the API.
Update the routes.
Create middleware.

Good plan (agent-ready):

## Implementation
### Step 1: Create JWT utilities
**File**: src/auth/jwt.ts (new file, ~120 lines)
**Functions**:
- Line 10-30: generateAccessToken(userId: string): string
- Line 35-55: generateRefreshToken(userId: string): string
- Line 60-85: verifyToken(token: string): { userId: string } | null
**Dependencies**: jsonwebtoken (npm install)
### Step 2: Create auth middleware
**File**: src/middleware/auth.ts (new file, ~45 lines)
**Export**:
- Line 15-40: requireAuth middleware (checks Authorization header)
**Imports**: jwt.ts (Step 1)
### Step 3: Protect routes
**File**: src/routes/api.ts
**Location**: Line 23 (after imports, before route definitions)
**Change**: Import requireAuth, apply to /api/protected routes
**Example**:
router.get('/profile', requireAuth, profileController)

Why agent-ready plans work:

  • File paths → Claude One knows exactly where to work
  • Line numbers → Reduces guessing, fewer file reads
  • Dependencies explicit → No surprises during implementation
  • Examples included → Claude One understands expected structure

Template: See guide/workflows/dual-instance-planning.md for full plan template.

1. Role enforcement: Set roles in first message of each session:

  • Claude Zero: “NEVER edit code, only write plans to .claude/plans/Review/”
  • Claude One: “ONLY implement plans from .claude/plans/Active/, never plan”

2. Plans directory in .gitignore:

.gitignore
.claude/plans/Review/ # Work in progress
.claude/plans/Active/ # Under implementation
# Don't ignore Completed/ (optional: archive for team learning)

3. Use /plan mode: Claude Zero should start with /plan for safe exploration:

/plan
[Your feature request]

4. Interview prompts: Encourage Claude Zero to ask clarifying questions:

"Interview me about requirements before drafting the plan.
Ask about edge cases, success criteria, and constraints."

5. Review checklist: When Claude Zero reviews Claude One’s implementation:

  • Code matches plan structure?
  • All files from plan created/modified?
  • Tests cover success criteria?
  • Security best practices followed?
  • No TODO comments for core functionality?
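
Tips 1 and 2 can be bootstrapped once per project. A hedged sketch (the `.claude/roles/` file convention and the function name are our additions, not part of Jon's pattern):

```shell
# Sketch: one-time project setup for role enforcement + plans directories.
# The .claude/roles/ files are our convention for pasting role prompts
# into each session's first message.
setup_dual_instance() {
  mkdir -p .claude/plans/Review .claude/plans/Active .claude/plans/Completed
  mkdir -p .claude/roles
  printf '%s\n' \
    "You are Claude Zero. Explore the codebase, write plans, review" \
    "implementations. NEVER edit code. Save plans to .claude/plans/Review/." \
    > .claude/roles/zero.md
  printf '%s\n' \
    "You are Claude One. Implement plans from .claude/plans/Active/ only," \
    "commit changes, report back. Never plan." \
    > .claude/roles/one.md
  # Ignore in-progress plans (keep Completed/ trackable if desired)
  printf '%s\n' ".claude/plans/Review/" ".claude/plans/Active/" >> .gitignore
}
```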

When dual-instance doesn’t help:

  • Trivial changes: Typo fixes, simple refactors → 1 instance faster
  • Exploratory coding: Unknown problem space → planning overhead not justified
  • Tight deadlines: Speed > quality → use 1 instance, accept corrections
  • Very limited budget: <$100/month → use Sonnet, 1 instance

Overhead:

  • Manual coordination: You move plans between directories (no automation)
  • Context switching: Managing 2 terminal sessions
  • Slower iteration: Plan → approve → implement (vs immediate execution)

Partial adoption: You can use this pattern selectively:

  • Dual-instance for complex features
  • Single instance for simple tasks
  • No need to commit to one pattern exclusively
  • Workflow guide: dual-instance-planning.md — Full workflow with templates
  • Plan Mode: Section 9.1 “The Trinity” — Foundation for planning
  • Multi-Instance (Boris): Section 9.17 — Horizontal scaling alternative
  • Cost optimization: Section 8.10 — Budget management strategies

External resource: Jon Williams LinkedIn post (Feb 3, 2026)


Foundation: Git Worktrees (Non-Negotiable)


Multi-instance workflows REQUIRE git worktrees to avoid conflicts. Without worktrees, parallel instances create merge hell.

Why worktrees are critical:

  • Each instance operates in isolated git checkout
  • No branch switching = no context loss
  • No merge conflicts during development
  • Instant creation (~1s vs minutes for full clone)

Quick setup:

# Create worktree with new branch (checked out at .worktrees/feature-auth/)
/git-worktree feature/auth
# Result:
# - Separate checkout
# - Shared .git history
# - Zero duplication overhead
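
If you prefer plain git over the slash command, an equivalent runnable sketch (a throwaway repo is created first so the commands work as-is):

```shell
# Demo scaffold: throwaway repo so the commands below run as-is
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "init"

# Create worktree with new branch: separate checkout, shared .git history
mkdir -p .worktrees
git worktree add -b feature/auth .worktrees/feature-auth
git worktree list                     # one line per checkout, zero duplication
# Each Claude instance runs inside its own worktree directory:
(cd .worktrees/feature-auth && git rev-parse --abbrev-ref HEAD)   # feature/auth
# Cleanup once the branch is merged
git worktree remove .worktrees/feature-auth
```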

See also:


Advanced Tooling for Worktree Management (Optional)


While git worktrees are foundational, daily productivity improves with automation wrappers. Multiple professional teams have independently created worktree management tools, which validates the pattern.

Pattern Validation: 3 Independent Implementations

| Team | Solution | Key Features |
|---|---|---|
| incident.io | Custom bash wrapper `w` | Auto-completion, organized in ~/projects/worktrees/, Claude auto-launch |
| GitHub #1052 | Fish shell functions (8 commands) | LLM commits, rebase automation, worktree lifecycle |
| Worktrunk | Rust CLI (1.6K stars, 64 releases) | Project hooks, CI status, PR links, multi-platform |

Conclusion: power users keep independently reinventing the worktree wrapper pattern. Vanilla git is sufficient, but becomes verbose at 5-10+ worktree operations per day.

Answer these 3 questions honestly:

  1. Volume: How many worktrees do you create per week?

    • ❌ <5/week → Vanilla git sufficient
    • ⚠️ 5-15/week → Consider lightweight alias
    • ✅ 15+/week → Worktrunk or DIY wrapper justified
  2. Multi-instance workflow: Are you running 5+ parallel Claude instances regularly?

    • ❌ No, 1-2 instances → Vanilla git sufficient
    • ⚠️ Sometimes 3-5 instances → Alias or lightweight wrapper
    • ✅ Yes, 5-10+ instances daily → Worktrunk features valuable (CI status, hooks)
  3. Team context: Who else uses your worktree workflow?

    • ❌ Solo dev → Alias (zero dependency)
    • ⚠️ Small team, same OS/shell → DIY wrapper (shared script)
    • ✅ Multi-platform team → Worktrunk (Homebrew/Cargo/Winget)

Decision matrix:

| Profile | Weekly Worktrees | Instances | Team | Recommendation |
|---|---|---|---|---|
| Beginner | <5 | 1-2 | Solo | Vanilla git - learn fundamentals first |
| Casual user | 5-15 | 2-3 | Solo/Small | ⚠️ Alias (2 min setup, example below) |
| Power user | 15-30 | 5-10 | Multi-platform | Worktrunk - ROI justified |
| Boris scale | 30+ | 10-15 | Team | Worktrunk + orchestrator |

Quick alias alternative (for “Casual user” profile):

If you scored ⚠️ (5-15 worktrees/week), try this first before installing Worktrunk:

Terminal window
# Add to ~/.zshrc or ~/.bashrc (2 minutes setup)
wtc() {
  local branch=$1
  # Worktree path: ../<repo-name>.<branch, with / replaced by ->
  local path="../${PWD##*/}.${branch//\//-}"
  git worktree add -b "$branch" "$path" && cd "$path"
}
alias wtl='git worktree list'
alias wtd='git worktree remove'

Usage: wtc feature/auth (18 chars vs 88 chars vanilla git, -79% typing)

When to upgrade to Worktrunk:

  • Alias feels limiting (want CI status, LLM commits, project hooks)
  • Volume increases to 15+ worktrees/week
  • Team adopts multi-instance workflows (need consistent tooling)

Bottom line: Most readers (80%) should start with vanilla git or alias. Worktrunk is for power users managing 5-10+ instances daily where typing friction and CI visibility matter.

| Operation | Vanilla Git | Worktrunk | Custom Wrapper |
|---|---|---|---|
| Create + switch | `git worktree add -b feat ../repo.feat && cd ../repo.feat` | `wt switch -c feat` | `w myproject feat` |
| List worktrees | `git worktree list` | `wt list` (with CI status) | `w list` |
| Remove + cleanup | `git worktree remove ../repo.feat && git worktree prune` | `wt remove feat` | `w finish feat` |
| LLM commit msg | Manual or custom script | Built-in via `llm` tool | Custom via LLM API |
| Setup time | 0 (git installed) | 2 min (Homebrew/Cargo) | 10-30 min (copy-paste script) |
| Maintenance | Git updates only | Active (64 releases) | Manual (custom code) |

Trade-off: Wrappers reduce typing ~60% but add dependency. Learn git fundamentals first, add wrapper for speed later.

Option 1: Worktrunk (Recommended for Scale)

What: Rust CLI simplifying worktree management (1.6K stars, active development since 2023)

Unique features not in git:

  • Project-level hooks: Automate post-create, pre-remove actions
  • LLM integration: wt commit generates messages via llm tool
  • CI status tracking: See build status inline with wt list
  • PR link generation: Quick links to open PRs per worktree
  • Path templates: Configure worktree location pattern once

Installation:

Terminal window
# macOS/Linux
brew install worktrunk
# Or via Rust
cargo install worktrunk
# Windows
winget install worktrunk

Typical workflow:

Terminal window
# Create worktree + switch
wt switch -c feature/auth
# Work with Claude...
claude
# LLM-powered commit
wt commit # Generates message from diff
# List all worktrees with status
wt list
# Remove when done
wt remove feature/auth

When to use: Managing 5+ worktrees daily, want CI integration, multi-platform team (macOS/Linux/Windows).

Source: github.com/max-sixty/worktrunk

Option 2: DIY Custom Wrapper (Lightweight Alternative)


What: 10-50 lines of bash/fish/PowerShell tailored to your workflow.

Examples from production teams:

  1. incident.io approach (bash wrapper):

    Terminal window
    # Function: w myproject feature-name claude
    # - Creates worktree in ~/projects/worktrees/myproject.feature-name
    # - Auto-completion for projects and branches
    # - Launches Claude automatically
  2. GitHub #1052 approach (Fish shell, 8 functions):

    Terminal window
    git worktree-llm feature-name # Create + start Claude
    git worktree-merge # Finish, commit, rebase, merge
    git commit-llm # LLM-generated commit messages
    • Author quote: “I now use it for basically all my development where I can use claude code”
    • Source: Claude Code issue #1052

When to use: Want full control, small team (same shell), already have shell functions for git.

Trade-off: Custom scripts lack maintenance, cross-platform support, but are zero-dependency and infinitely customizable.
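
For illustration, a minimal bash wrapper in the spirit of the incident.io approach (the real script is not public in full; the paths and behavior here are assumptions):

```shell
# Hedged sketch of a DIY wrapper like incident.io's `w` function.
# Assumes repos live under ~/projects/<project>; worktrees are placed in
# ~/projects/worktrees/<project>.<branch>. All names here are assumptions.
w() {   # usage: w <project> <branch> [claude]
  local project=$1 branch=$2
  local base="$HOME/projects/worktrees"
  local path="$base/${project}.${branch//\//-}"
  mkdir -p "$base"
  # New branch + isolated checkout, created from the project's main repo
  git -C "$HOME/projects/$project" worktree add -b "$branch" "$path" || return 1
  cd "$path" || return 1
  # Optionally launch Claude in the fresh worktree
  if [ "$3" = "claude" ]; then claude; fi
}
```

Usage: `w myproject feature/auth claude` creates the worktree, switches into it, and starts a Claude session there.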

Recommendation: Learn → Wrapper → Scale

Phase 1 (Weeks 1-2): Master vanilla git worktree via /git-worktree command
└─ Understand fundamentals, safety checks, database branching
Phase 2 (Week 3+): Add wrapper for productivity
├─ Worktrunk (if multi-platform, want CI status, LLM commits)
└─ DIY bash/fish (if lightweight, team uses same shell)
Phase 3 (Multi-instance scale): Combine with orchestration
└─ Worktrunk/wrapper + Headless PM for 5-10 instances

Philosophy: Tools amplify knowledge. Master git patterns (this guide) before adding convenience layers. Wrappers save 5-10 minutes/day but don’t replace understanding.

Anthropic stance: Official best practices recommend git worktrees (vanilla) but remain agnostic on wrappers. Choose what fits your team.


Anthropic studied how their own engineers use Claude Code, providing empirical data on productivity and limitations.

Study scope:

  • 132 engineers and researchers surveyed
  • 53 qualitative interviews conducted
  • 200,000 session transcripts analyzed (Feb-Aug 2025)

Productivity gains:

  • +50% productivity (self-reported, vs +20% 12 months prior)
  • 2-3x increase year-over-year in usage and output
  • 59% of work involves Claude (vs 28% a year ago)
  • 27% of work “wouldn’t have been done otherwise” (scope expansion, not velocity)

Autonomous actions:

  • 21.2 consecutive tool calls without human intervention (vs 9.8 six months prior)
  • +116% increase in autonomous action chains
  • 33% reduction in human interventions required
  • Average task complexity: 3.8/5 (vs 3.2 six months before)

Critical concerns (verbatim quotes from engineers):

“When producing is so easy and fast, it’s hard to really learn”

“It’s difficult to say what roles will be in a few years”

“I feel like I come to work each day to automate myself”

Implications: Even at Anthropic (perfect conditions: created the tool, ideal architecture, unlimited budget), engineers express uncertainty about long-term skill development and role evolution.

Source: Anthropic Research - How AI is Transforming Work at Anthropic (Aug 2025)


Five months after the internal study, Anthropic published updated productivity data alongside a new analytics feature for Team and Enterprise customers.

Updated metrics (Anthropic internal):

  • +67% PRs merged per engineer per day (vs Aug 2025 self-reported +50%)
  • 70-90% of code now written with Claude Code assistance across teams

Methodological note: These figures are PR/commit-based (measured via GitHub integration), not self-reported surveys as in the Aug 2025 study. However, Anthropic discloses no baseline period, no team breakdown, and defines measurement only as “conservative — only code where we have high confidence in Claude Code’s involvement.” Treat as directional indicators, not rigorous benchmarks.

Product feature — Contribution Metrics dashboard:

  • Status: Public beta (January 2026)
  • Availability: Claude Team and Enterprise plans (exact add-on requirements unconfirmed)
  • Tracks: PRs merged and lines of code committed, with/without Claude Code attribution
  • Access: Workspace admins and owners only
  • Setup: Install Claude GitHub App → Enable GitHub Analytics in Admin settings → Authenticate GitHub organization
  • Positioning: Complement to existing engineering KPIs (DORA metrics, sprint velocity), not a replacement

Source: Anthropic — Contribution Metrics (Jan 2026)


Multi-instance workflows have hard costs and soft overhead (coordination, supervision, merge conflicts).

| Scale | Model | Monthly Cost | Break-Even Productivity Gain |
| --- | --- | --- | --- |
| 5 devs, 2 instances each | Sonnet | $390-750 | 3-5% |
| 10 devs, 2-3 instances | Sonnet | $1,080-1,650 | 1.3-2% |
| Boris scale (15 instances) | Opus | $500-1,000 | Justified if 259 PRs/month |

Calculation basis (Sonnet 4.5):

  • Input: $3/million tokens
  • Output: $15/million tokens
  • Estimate: 30k tokens/instance/day × 20 days
  • 5 devs × 2 instances × 600k tokens/month = ~$540/month
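The arithmetic above can be sketched as a small calculator (a sketch using the list prices quoted; `outputShare` is an assumption to tune against your actual bills — with raw list prices alone, the toy scenario lands below the section's ~$540 figure, which evidently folds in heavier usage):

```typescript
// Rough monthly API cost estimator for multi-instance setups.
// Prices are $/million tokens (Sonnet 4.5 figures from this section).
const INPUT_PRICE = 3;
const OUTPUT_PRICE = 15;

interface TeamConfig {
  devs: number;
  instancesPerDev: number;
  tokensPerInstancePerDay: number; // e.g. 30_000
  workingDays: number;             // e.g. 20
  outputShare: number;             // fraction billed at output rate (assumption)
}

function monthlyCost(cfg: TeamConfig): number {
  const totalTokens =
    cfg.devs * cfg.instancesPerDev * cfg.tokensPerInstancePerDay * cfg.workingDays;
  const blendedPerMillion =
    INPUT_PRICE * (1 - cfg.outputShare) + OUTPUT_PRICE * cfg.outputShare;
  return (totalTokens / 1_000_000) * blendedPerMillion;
}

// 5 devs × 2 instances × 30k tokens/day × 20 days = 6M tokens/month
console.log(
  monthlyCost({
    devs: 5,
    instancesPerDev: 2,
    tokensPerInstancePerDay: 30_000,
    workingDays: 20,
    outputShare: 0.5,
  }),
);
```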

OpusPlan optimization: Use Opus for planning (10-20% of work), Sonnet for execution (80-90%). Reduces cost while maintaining quality.

| Cost Type | Impact | Mitigation |
| --- | --- | --- |
| Coordination overhead | 10-20% time managing instances | Headless PM framework |
| Merge conflicts | 5-15% time resolving conflicts | Git worktrees + modular architecture |
| Context switching | Cognitive load × number of instances | Limit to 2-3 instances per developer |
| Supervision | Must review all autonomous output | Automated tests + code review |

ROI monitoring:

  1. Baseline: Track PRs/month before multi-instance (3 months)
  2. Implement: Scale to multi-instance with monitoring
  3. Measure: PRs/month after 3 months
  4. Decision: If gain <3%, rollback to sequential

Coordinating multiple Claude instances without chaos requires tooling.

Project: madviking/headless-pm (158 stars)

Architecture:

  • REST API for centralized coordination
  • Task locking: Prevents parallel work on same file
  • Role-based agents: PM, Architect, Backend, Frontend, QA
  • Document-based communication: Agents @mention each other
  • Git workflow guidance: Automatic PR/commit suggestions

Workflow:

Epic → Features → Tasks (major=PR, minor=commit)
Agents register, lock tasks, update status
Architect reviews (approve/reject)
Communication via docs with @mention

Use case: Teams managing 5-10 instances without manual coordination overhead.
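The task-locking idea above can be sketched in a few lines (a hypothetical in-memory version for illustration; Headless PM itself implements this behind its REST API):

```typescript
// Minimal task-lock registry: one agent per file at a time,
// preventing two instances from editing the same file in parallel.
class TaskLocks {
  private locks = new Map<string, string>(); // file path -> holding agent id

  acquire(file: string, agent: string): boolean {
    const holder = this.locks.get(file);
    if (holder && holder !== agent) return false; // another agent holds the lock
    this.locks.set(file, agent);
    return true;
  }

  release(file: string, agent: string): void {
    if (this.locks.get(file) === agent) this.locks.delete(file);
  }
}

const locks = new TaskLocks();
locks.acquire("src/api/users.ts", "backend-agent");  // true — lock granted
locks.acquire("src/api/users.ts", "frontend-agent"); // false — blocked
```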

| Tool | Best For | Cost | Key Feature |
| --- | --- | --- | --- |
| Cursor Parallel Agents | Solo/small teams | $20-40/month | UI integrated, git worktrees built-in |
| Windsurf Cascade | Large codebases | $15/month | 10x faster context (Codemaps) |
| Sequential Claude | Most teams | $20/month | 1-2 instances with better prompting |

Implementation Guide (Progressive Scaling)

Section titled “Implementation Guide (Progressive Scaling)”

Don’t jump to 10 instances. Scale progressively with validation gates.

Phase 1: Single Instance Mastery (2-4 weeks)

Section titled “Phase 1: Single Instance Mastery (2-4 weeks)”

Goal: Achieve >80% success rate with 1 instance before scaling.

Terminal window
# 1. Create CLAUDE.md (2-3k tokens)
# - Conventions (naming, imports)
# - Workflows (git, testing)
# - Patterns (state management)
# 2. Implement feedback loops
# - Automated tests (run after every change)
# - Pre-commit hooks (validation gates)
# - /validate command (quality checks)
# 3. Measure baseline
# - PRs/month
# - Test pass rate
# - Time to merge

Success criteria: 80%+ PRs merged without major revisions.

Goal: Validate that 2 instances increase throughput without chaos.

Terminal window
# 1. Setup git worktrees
/git-worktree feature/backend
/git-worktree feature/frontend
# 2. Parallel development
# - Instance 1: Backend API
# - Instance 2: Frontend UI
# - Ensure decoupled work (no file overlap)
# 3. Monitor conflicts
# - Track merge conflicts per week
# - If >2% conflict rate, pause and fix architecture

Success criteria: <2% merge conflicts, >5% productivity gain vs single instance.
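Conflict-rate monitoring can be sketched like this (an assumption-laden sketch: it counts merge commits whose messages mention conflicts, which only works if your team records conflicts in merge messages, e.g. via a commit template):

```typescript
import { execSync } from "node:child_process";

// Fraction of merge commits whose message records a conflict.
function conflictRate(mergeMessages: string[]): number {
  if (mergeMessages.length === 0) return 0;
  const conflicted = mergeMessages.filter((m) => /conflict/i.test(m)).length;
  return conflicted / mergeMessages.length;
}

// Collect merge-commit subjects from the last N days (assumes a git repo).
function recentMergeMessages(days: number): string[] {
  const out = execSync(
    `git log --merges --since="${days} days ago" --pretty=%s`,
    { encoding: "utf8" },
  );
  return out.split("\n").filter(Boolean);
}

// Example: flag teams exceeding the 2% target from this phase.
// const rate = conflictRate(recentMergeMessages(30));
// if (rate > 0.02) console.warn(`Conflict rate ${(rate * 100).toFixed(1)}% — pause and fix architecture`);
```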

Phase 3: Multi-Instance (if Phase 2 successful)

Section titled “Phase 3: Multi-Instance (if Phase 2 successful)”

Goal: Scale to 3-5 instances with orchestration framework.

Terminal window
# 1. Deploy orchestration framework (choose based on needs)
# - Headless PM (manual coordination)
# - Gas Town (parallel task execution)
# - multiclaude (self-hosted, tmux-based)
# - Entire CLI (governance + sequential handoffs)
# 2. Define roles
# - Architect (reviews PRs)
# - Backend (API development)
# - Frontend (UI development)
# - QA (test automation)
# 3. Weekly retrospectives
# - Review conflict rate
# - Measure ROI (cost vs output)
# - Adjust instance count

Orchestration framework options:

| Tool | Paradigm | Best For |
| --- | --- | --- |
| Manual (worktrees) | No framework | 2-3 instances, full control |
| Gas Town | Parallel coordination | 5+ instances, complex parallel tasks |
| multiclaude | Self-hosted spawner | Teams needing on-prem/airgap |
| Entire CLI | Governance + handoffs | Sequential workflows with compliance |

Entire CLI (Feb 2026): Alternative to parallel orchestration, focuses on sequential agent handoffs with governance layer (approval gates, audit trails). Useful for compliance-critical workflows (SOC2, HIPAA) or multi-agent handoffs (Claude → Gemini). See AI Ecosystem Guide for details.

Success criteria: Sustained 3-5% productivity gain over 3 months.


Track multi-instance workflows with metrics to validate ROI.

| Metric | Tool | Target | Red Flag |
| --- | --- | --- | --- |
| Merge conflicts | `git log --grep="Merge conflict"` | <2% | >5% |
| PRs/month | GitHub Insights | +3-5% vs baseline | Flat or declining |
| Test pass rate | CI/CD | >95% | <90% |
| API cost | Session stats script | Within budget | >20% over |

Session stats script (from this guide):

Terminal window
# Track API usage across all instances
./examples/scripts/session-stats.sh --range 7d --json
# Monitor per-instance cost
./examples/scripts/session-stats.sh --project backend --range 30d

See also: Session Observability Guide

Stop multi-instance and return to sequential if you see:

  • Merge conflicts >5% of PRs
  • CLAUDE.md grows >5k tokens (sign of chaos)
  • Test quality degrades (coverage drops, flaky tests increase)
  • Supervision overhead >30% developer time
  • Team reports skill atrophy or frustration

Be honest about your context. Most teams should stay sequential.

Legacy monolith (tight coupling):

  • Claude struggles with implicit dependencies
  • Context pollution across instances
  • Merge conflicts frequent

Event-driven systems (complex interactions):

  • Hard to decompose into parallel tasks
  • Integration testing becomes nightmare

No automated tests:

  • Can’t validate autonomous output
  • “Death spirals” where broken tests stay broken

Solo developer:

  • Coordination overhead unjustified
  • Cursor parallel agents simpler (UI integrated)

Team <3 people:

  • Not enough concurrent work to parallelize
  • Better ROI from optimizing single-instance workflow

Junior team:

  • Requires expertise in Claude Code, git worktrees, prompt engineering
  • Start with single instance, scale later

<$500/month available:

  • Multi-instance costs $400-1,000/month minimum
  • Better investment: training, better prompts, Cursor

Use this flowchart to decide if multi-instance is right for you:

New feature request
├─ Solo dev?
│  └─ Use Cursor ($20/month)
├─ Startup <10 devs?
│  ├─ Legacy code without tests?
│  │  └─ Fix architecture first (1-2 months)
│  └─ Modular + tested?
│     └─ Try 2 instances (1 month pilot)
├─ Scale-up 10-50 devs?
│  ├─ Budget >$1k/month?
│  │  └─ Deploy Headless PM framework
│  └─ Budget <$1k/month?
│     └─ Sequential optimized (better prompts)
└─ Enterprise 50+ devs?
   └─ Windsurf + custom orchestration
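The same decision tree, transcribed as a small function (a sketch; the thresholds mirror the flowchart above, and the team fields are simplifications):

```typescript
type Setup =
  | "cursor"
  | "fix-architecture"
  | "two-instance-pilot"
  | "headless-pm"
  | "sequential-optimized"
  | "windsurf-orchestration";

interface Team {
  devs: number;
  hasTests: boolean;     // modular, tested codebase?
  monthlyBudget: number; // USD available for AI tooling
}

// Direct transcription of the flowchart above.
function recommend(t: Team): Setup {
  if (t.devs === 1) return "cursor";
  if (t.devs < 10) return t.hasTests ? "two-instance-pilot" : "fix-architecture";
  if (t.devs <= 50) return t.monthlyBudget > 1000 ? "headless-pm" : "sequential-optimized";
  return "windsurf-orchestration";
}

recommend({ devs: 6, hasTests: true, monthlyBudget: 500 }); // → "two-instance-pilot"
```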



9.18 Codebase Design for Agent Productivity

Section titled “9.18 Codebase Design for Agent Productivity”

Source: “Agent Experience Best Practices for Coding Agent Productivity”, François Zaninotto, Marmelab (January 21, 2026). Additional validation: Netlify AX framework (2025), Speakeasy implementation guide, ArXiv papers on agent context engineering.

The paradigm shift: Traditional codebases are optimized for human developers. AI agents have different needs—they excel at pattern matching but struggle with implicit knowledge and scattered context.

Key principles:

  • Domain Knowledge Embedding: Put business logic and design decisions directly in code (CLAUDE.md, ADRs, comments)
  • Code Discoverability: Make code “searchable” like SEO—use synonyms, tags, complete terms
  • Documentation Formats: Use llms.txt for AI-optimized documentation indexing (complements MCP servers)
  • Token Efficiency: Split large files, remove obvious comments, use verbose flags for debug output
  • Testing for Autonomy: TDD is more critical for agents than humans—tests guide behavior
  • Guardrails: Hooks, CI checks, and PR reviews catch agent mistakes early

When to optimize for agents: High-impact files (core business logic, frequently modified modules) and greenfield projects. Don’t refactor stable code just for agents.

Cross-references: CLAUDE.md patterns (3.1) · Hooks (6.2) · Pitfalls (9.11) · Methodologies (9.14)


9.18.1 The Paradigm Shift: Designing for Agents

Section titled “9.18.1 The Paradigm Shift: Designing for Agents”
| Aspect | Human-Optimized | Agent-Optimized |
| --- | --- | --- |
| Comments | Sparse, assume context | Explicit “why” + synonyms |
| File size | 1000+ lines OK | Split at 500 lines |
| Architecture docs | Separate wiki/Confluence | Embedded in CLAUDE.md + ADRs |
| Conventions | Oral tradition, tribal knowledge | Written, discoverable, tagged |
| Testing | Optional for prototypes | Critical—agents follow tests |
| Error messages | Generic | Specific with recovery hints |

Why this matters: Agents read code sequentially and lack the “mental model” humans build over time. What’s obvious to you (e.g., “this service handles auth”) must be made explicit.
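For the last row of that comparison, an agent-optimized error might look like this (a sketch; the class name, cause, and recovery paths are invented for illustration):

```typescript
// Agent-friendly error: states what failed, why it commonly happens,
// and how the caller should recover — instead of a generic "auth failed".
class CalendarTokenExpiredError extends Error {
  constructor(userId: string) {
    super(
      `Calendar OAuth token expired for user ${userId}. ` +
        `Common cause: provider revoked or aged-out token. ` +
        `Recovery: call refreshCalendarToken(userId), or redirect to /auth/calendar/reauth.`,
    );
    this.name = "CalendarTokenExpiredError";
  }
}

try {
  throw new CalendarTokenExpiredError("user-123");
} catch (err) {
  if (err instanceof CalendarTokenExpiredError) {
    console.log(err.message); // the agent reads the recovery hint directly
  }
}
```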

Netlify coined “Agent Experience” as the agent equivalent of Developer Experience (DX). Key questions:

  1. Can the agent find what it needs? (Discoverability)
  2. Can it understand design decisions? (Domain Knowledge)
  3. Can it validate its work? (Testing + Guardrails)
  4. Can it work efficiently? (Token budget)

“Agent Experience is about reducing cognitive friction for AI, just as DX reduces friction for humans.” — Netlify AX Research Team

Real-world impact:

  • Marmelab: Refactored Atomic CRM codebase with AX principles → 40% faster feature delivery
  • Speakeasy: Agent-friendly API docs → 3x higher API adoption rates
  • Anthropic internal: Codebase restructuring → 60% reduction in agent hallucinations

When to invest in AX:

  • ✅ Greenfield projects (design agent-friendly from start)
  • ✅ High-churn files (business logic, API routes)
  • ✅ Teams using agents extensively (>50% of commits)
  • ❌ Stable legacy code (don’t refactor just for agents)
  • ❌ Small scripts (<100 lines, agents handle fine)

Convention-Over-Configuration for AI Agents

Section titled “Convention-Over-Configuration for AI Agents”

Problem: Every configuration decision adds cognitive load for agents. Custom architectures require extensive CLAUDE.md documentation to prevent hallucinations.

Solution: Choose opinionated frameworks that reduce decision space through enforced conventions.

Why opinionated frameworks help agents:

| Aspect | Custom Architecture | Opinionated Framework |
| --- | --- | --- |
| File organization | Agent must learn your structure | Standard conventions (e.g., Next.js app/, Rails MVC) |
| Routing | Custom logic, must be documented | Convention-based (file = route) |
| Data access | Multiple patterns possible | Single pattern enforced (e.g., Rails Active Record) |
| Testing setup | Agent must discover your approach | Framework provides defaults |
| CLAUDE.md size | Large (must document everything) | Smaller (conventions already known) |

Examples of opinionated frameworks:

  • Next.js: app/ directory structure, file-based routing, server components conventions
  • Rails: MVC structure, Active Record patterns, generator conventions
  • Phoenix (Elixir): Context boundaries, schema conventions, LiveView patterns
  • Django: Apps structure, settings conventions, admin interface patterns

Real-world impact:

When agents work with opinionated frameworks, they:

  • Make fewer mistakes (fewer choices = fewer wrong choices)
  • Generate boilerplate faster (know the patterns)
  • Require less CLAUDE.md documentation (conventions replace custom instructions)
  • Produce more consistent code (follow framework idioms)

Trade-offs:

| Benefit | Cost |
| --- | --- |
| Faster agent onboarding | Less architectural flexibility |
| Smaller CLAUDE.md files | Framework lock-in |
| Fewer hallucinations | Must accept framework opinions |
| Consistent patterns | Learning curve for team |

Connection to CLAUDE.md sizing:

Convention-over-configuration directly reduces CLAUDE.md token requirements:

# Custom Architecture (500+ lines CLAUDE.md)
## File Organization
- API routes in `src/endpoints/`
- Business logic in `src/domain/`
- Data access in `src/repositories/`
- Validation in `src/validators/`
... (extensive documentation of custom patterns)
# Next.js (50 lines CLAUDE.md)
## Project Context
We use Next.js 14 with App Router.
... (minimal context, rest is framework conventions)

Recommendation: For greenfield projects with AI-assisted development, prefer opinionated frameworks unless architectural constraints require custom design. The reduction in agent cognitive load often outweighs loss of flexibility.

See also: CLAUDE.md sizing guidelines (Section 3.2) for token optimization patterns.


Problem: Agents lack context about your business domain, design decisions, and project history. They can read code syntax but miss the “why” behind decisions.

Solution: Embed domain knowledge directly in discoverable locations.

Beyond basic project setup, use CLAUDE.md to encode deep domain knowledge:

Domain context and design principles:

CLAUDE.md
## Domain Context
**Product**: SaaS platform for event management (B2B, enterprise clients)
**Business model**: Subscription-based, tiered pricing
**Core value prop**: Seamless integration with 20+ calendar providers
## Design Principles
1. **Idempotency First**: All API mutations must be idempotent (event industry = duplicate requests common)
2. **Eventual Consistency**: Calendar sync uses queue-based reconciliation (not real-time)
3. **Graceful Degradation**: If external calendar API fails, store locally + retry (never block user)
## Domain Terms
- **Event**: User-created calendar entry (our domain model)
- **Appointment**: External calendar system's term (Google/Outlook)
- **Sync Job**: Background process reconciling our DB with external calendars
- **Conflict Resolution**: Algorithm handling overlapping events (see `src/services/conflict-resolver.ts`)
## Gotchas
- Google Calendar API has 10 req/sec rate limit per user → batch operations in `syncEvents()`
- Outlook timezone handling is non-standard → use `normalizeTimezone()` helper
- Event deletion = soft delete (set `deletedAt`) to maintain audit trail for compliance

Why this works: When the agent encounters syncEvents(), it understands the rate limiting constraint. When it sees deletedAt, it knows not to use hard deletes.

See also: CLAUDE.md Best Practices (3.1) for foundational setup.

❌ Don’t write obvious comments:

// Get user by ID
function getUserById(id: string) {
  return db.users.findOne({ id });
}

✅ Do explain the “why” and business context:

// Fetch user with calendar permissions. Returns null if user exists but
// lacks calendar access (common after OAuth token expiration).
// Callers should handle null by redirecting to re-auth flow.
function getUserById(id: string) {
  return db.users.findOne({ id });
}

Even better: Add domain knowledge + edge cases:

// Fetch user with calendar permissions for event sync operations.
//
// Returns null in two cases:
// 1. User doesn't exist (rare, DB inconsistency)
// 2. User exists but calendar OAuth token expired (common, ~5% of calls)
//
// Callers MUST handle null by:
// - Redirecting to /auth/calendar/reauth (UI flows)
// - Logging + skipping sync (background jobs)
//
// Related: See `refreshCalendarToken()` for automatic token refresh strategy.
// Rate limits: Google Calendar = 10 req/sec, Outlook = 20 req/sec
function getUserById(id: string): Promise<User | null> {
  return db.users.findOne({ id });
}

What the agent gains:

  • Knows null is expected, not an error condition
  • Understands business context (OAuth expiration)
  • Has concrete recovery strategies
  • Can navigate to related code (refreshCalendarToken)
  • Knows external API constraints

Store ADRs in docs/decisions/ and reference from code:

# ADR-007: Event Deletion Strategy
**Status**: Accepted
**Date**: 2025-11-15
**Authors**: Engineering team
## Context
Event deletion is complex because:
1. Legal requirement to retain audit trail (GDPR Article 30)
2. External calendar APIs handle deletes differently (Google = permanent, Outlook = recoverable)
3. Users expect "undo" within 30-day window
## Decision
Use soft deletes with `deletedAt` timestamp:
- Events marked deleted remain in DB for 90 days
- UI hides deleted events immediately
- Background job purges after 90 days
- External calendars notified via webhook (eventual consistency)
## Consequences
**Benefits**:
- Compliance with GDPR audit requirements
- Consistent "undo" experience regardless of calendar provider
- Simpler conflict resolution (deleted events participate in sync)
**Drawbacks**:
- DB grows ~10% larger (deleted events retained)
- Complex query patterns (always filter `deletedAt IS NULL`)
## Related Code
- `src/models/event.ts` (Event model with deletedAt field)
- `src/services/event-deleter.ts` (soft delete logic)
- `src/jobs/purge-deleted-events.ts` (90-day cleanup)

In code, reference ADRs:

// Soft delete per ADR-007. Never use db.events.delete() due to
// compliance requirements (GDPR audit trail).
async function deleteEvent(eventId: string) {
  await db.events.update(
    { id: eventId },
    { deletedAt: new Date() }
  );
}

Agent benefit: When agent sees deletedAt, it can read ADR-007 to understand full context and constraints.


9.18.3 Code Discoverability (SEO for Agents)

Section titled “9.18.3 Code Discoverability (SEO for Agents)”

Problem: Agents search for code using keyword matching. If your variable is named usr, the agent won’t find it when searching for “user”.

Solution: Treat code discoverability like SEO—use complete terms, synonyms, and tags.

❌ Agent-hostile:

function calcEvtDur(evt: Evt): number {
  const st = evt.stTm;
  const et = evt.etTm;
  return et - st;
}

✅ Agent-friendly:

// Calculate event duration in milliseconds.
// Also known as: event length, time span, appointment duration
function calculateEventDuration(event: Event): number {
  const startTime = event.startTime;
  const endTime = event.endTime;
  return endTime - startTime;
}

What changed:

  • calcEvtDurcalculateEventDuration (full term)
  • Comment includes synonyms (“event length”, “time span”) so agent finds this when searching for those terms
  • Type EvtEvent (no abbreviation)

Your domain may use multiple terms for the same concept. Make them all searchable:

// User account record. Also called: member, subscriber, customer, client.
// Note: In external calendar APIs, this maps to their "principal" or "identity" concepts.
interface User {
  id: string;
  email: string;
  calendarToken: string; // OAuth token for calendar access, aka "access token", "auth credential"
}

Why this works: When agent searches for “subscriber” or “principal”, it finds this code despite those terms not being in the type name.

Use JSDoc-style tags for categorization:

/**
* Process incoming webhook from Google Calendar.
*
* @domain calendar-sync
* @external google-calendar-api
* @rate-limit 100/min (Google's limit, not ours)
* @failure-mode Queues failed webhooks for retry (see retry-queue.ts)
* @related syncEvents, refreshCalendarToken
*/
async function handleGoogleWebhook(payload: WebhookPayload) {
  // implementation
}

Agent queries enabled:

  • “What code touches the google calendar api?” → Finds via @external tag
  • “Which functions have rate limits?” → Finds via @rate-limit tag
  • “What’s related to syncEvents?” → Finds via @related tag

Place a README.md in each major directory explaining its purpose:

src/
├── services/
│   ├── README.md            ← "Service layer: business logic, no HTTP concerns"
│   ├── event-service.ts
│   └── user-service.ts
├── controllers/
│   ├── README.md            ← "HTTP controllers: request/response handling only"
│   ├── event-controller.ts
│   └── user-controller.ts

src/services/README.md:

# Services Layer
**Purpose**: Business logic and domain operations. Services are framework-agnostic (no Express/HTTP concerns).
**Conventions**:
- One service per domain entity (EventService, UserService)
- Services interact with repositories (data layer) and other services
- All service methods return domain objects, never HTTP responses
- Error handling: Throw domain errors (EventNotFoundError), not HTTP errors
**Dependencies**:
- Services may call other services
- Services may call repositories (`src/repositories/`)
- Services must NOT import from `controllers/` (layering violation)
**Testing**: Unit test services with mocked repositories. See `tests/services/` for examples.
**Related**: See ADR-003 for layered architecture rationale.

Agent benefit: When working in services/, agent reads README and understands constraints (no HTTP concerns, layer boundaries).

❌ Before (Agent-hostile):

usr-mgr.ts
class UsrMgr {
  async getUsr(id: string) {
    return db.query('SELECT * FROM usr WHERE id = ?', [id]);
  }
  async updUsr(id: string, data: any) {
    return db.query('UPDATE usr SET ? WHERE id = ?', [data, id]);
  }
}

Agent challenges:

  • Abbreviated names (UsrMgr, getUsr) → hard to find
  • No comments → no context
  • any type → agent doesn’t know data shape
  • No domain knowledge → what is “usr”?

✅ After (Agent-friendly):

user-manager.ts
/**
 * User account management service.
 * Also known as: member manager, subscriber service, customer service
 *
 * @domain user-management
 * @layer service
 * @related user-repository, auth-service
 */
class UserManager {
  /**
   * Fetch user account by ID. Returns null if not found.
   * Also called: get member, fetch subscriber, load customer
   *
   * Common use cases:
   * - Authentication flows (verifying user exists)
   * - Profile page rendering (loading user details)
   * - Admin operations (fetching user for support)
   */
  async getUser(userId: string): Promise<User | null> {
    return db.query('SELECT * FROM users WHERE id = ?', [userId]);
  }

  /**
   * Update user account fields. Performs partial update (only provided fields).
   * Also known as: modify user, edit member, change subscriber details
   *
   * @param userId - Unique user identifier (UUID v4)
   * @param updates - Partial user data (email, name, etc.)
   * @throws {UserNotFoundError} If user doesn't exist
   * @throws {ValidationError} If updates fail schema validation
   *
   * Example:
   * await userManager.updateUser('user-123', { email: 'new@example.com' });
   */
  async updateUser(userId: string, updates: Partial<User>): Promise<User> {
    return db.query('UPDATE users SET ? WHERE id = ?', [updates, userId]);
  }
}

Improvements:

  • Full names (UserManager, getUser)
  • Synonyms in comments (member, subscriber, customer)
  • Tags for faceting (@domain, @layer, @related)
  • Typed parameters and return values
  • Use case examples
  • Error documentation

Agent search results:

| Query | Finds Before? | Finds After? |
| --- | --- | --- |
| “user management” | ❌ | ✅ (class comment) |
| “member service” | ❌ | ✅ (synonym) |
| “fetch subscriber” | ❌ | ✅ (synonym) |
| “service layer” | ❌ | ✅ (@layer tag) |
| “authentication” | ❌ | ✅ (use case) |

9.18.4 Documentation Formats for Agents (llms.txt)

Section titled “9.18.4 Documentation Formats for Agents (llms.txt)”

Problem: Agents need to discover and consume project documentation efficiently. Traditional documentation (wikis, Confluence) is hard to find and parse. MCP doc servers require installation and configuration.

Solution: Use the llms.txt standard for AI-optimized documentation indexing.

llms.txt is a lightweight standard for making documentation discoverable to LLMs. It’s like robots.txt for AI agents—a simple index file that tells agents where to find relevant documentation.

Specification: https://llmstxt.org/

Format: Plain text file at /llms.txt or /machine-readable/llms.txt containing:

  • Markdown content directly (inline docs)
  • Links to external documentation files
  • Structured sections for different topics

Example from this repo (machine-readable/llms.txt):

# Claude Code Ultimate Guide
Complete guide for Anthropic's Claude Code CLI (19,000+ lines, 120 templates)
## Quick Start
- Installation: guide/ultimate-guide.md#installation (line 450)
- First Session: guide/cheatsheet.md#first-session
- CLAUDE.md Setup: guide/ultimate-guide.md#31-claudemd-project-context (line 1850)
## Core Concepts
- Agents: guide/ultimate-guide.md#4-agents (line 4100)
- Skills: guide/ultimate-guide.md#5-skills (line 5400)
- Hooks: guide/ultimate-guide.md#62-hooks (line 7200)
## Templates
- Custom agents: examples/agents/
- Slash commands: examples/commands/
- Event hooks: examples/hooks/

llms.txt and MCP doc servers solve different problems:

| Aspect | llms.txt | Context7 MCP |
| --- | --- | --- |
| Purpose | Static documentation index | Runtime library lookup |
| Setup | Zero config (just a file) | Requires MCP server install |
| Content | Project-specific docs | Official library docs |
| Token cost | Low (index only, ~500 tokens) | Medium (full doc fetching) |
| Use case | Project README, architecture | React API, Next.js patterns |
| Update frequency | Manual (on doc changes) | Automatic (tracks library versions) |
Best practice: Use both:

  • llms.txt for project-specific documentation (architecture, conventions, getting started)
  • Context7 MCP for official library documentation (React hooks, Express API)

Minimal example:

# MyProject
Enterprise SaaS platform for event management
## Getting Started
- Setup: docs/setup.md
- Architecture: docs/architecture.md
- API Reference: docs/api.md
## Development
- Testing: docs/testing.md
- Deployment: docs/deployment.md
- Troubleshooting: docs/troubleshooting.md

Advanced example with line numbers:

# MyProject
## Architecture Decisions
- Why microservices: docs/decisions/ADR-001.md (line 15)
- Event-driven design: docs/architecture.md#event-bus (line 230)
- Database strategy: docs/decisions/ADR-005.md (line 42)
## Common Patterns
- Authentication flow: src/services/auth-service.ts (line 78-125)
- Error handling: CLAUDE.md#error-patterns (line 150)
- Rate limiting: src/middleware/rate-limiter.ts (line 45)
## Domain Knowledge
- Event lifecycle: docs/domain/events.md
- Payment processing: docs/domain/payments.md
- Webhook handling: docs/domain/webhooks.md

Line numbers help agents jump directly to relevant sections without reading entire files.

Update llms.txt when:

  • Adding new major documentation files
  • Restructuring docs directory
  • Documenting new architectural patterns
  • Adding ADRs (Architecture Decision Records)
  • Creating domain-specific guides

Don’t update for:

  • Code changes (unless architecture shifts)
  • Minor doc tweaks
  • Dependency updates

llms.txt and CLAUDE.md serve different purposes:

| File | Purpose | Audience |
| --- | --- | --- |
| CLAUDE.md | Active instructions, project context | Claude during this session |
| llms.txt | Documentation index | Claude discovering resources |

Pattern: Reference llms.txt from CLAUDE.md:

CLAUDE.md
## Project Documentation
Complete documentation is indexed in `machine-readable/llms.txt`.
Key resources:
- Architecture overview: docs/architecture.md
- API reference: docs/api.md
- Testing guide: docs/testing.md
For domain-specific knowledge, consult llms.txt index.

This guide uses both llms.txt and CLAUDE.md:

llms.txt (machine-readable/llms.txt):

  • Indexes all major sections with line numbers
  • Points to templates in examples/
  • References workflows in guide/workflows/

CLAUDE.md (CLAUDE.md):

  • Active project context (repo structure, conventions)
  • Current focus (guide version, changelog)
  • Working instructions (version sync, landing sync)

Result: Agents can discover content via llms.txt, then consult CLAUDE.md for active context.

Real-World: Anthropic’s Official llms.txt

Section titled “Real-World: Anthropic’s Official llms.txt”

Anthropic publishes two LLM-optimized variants for Claude Code:

| File | URL | Size | Tokens (approx.) | Use case |
| --- | --- | --- | --- | --- |
| llms.txt | code.claude.com/docs/llms.txt | ~65 pages | ~15-20K | Quick index, section discovery |
| llms-full.txt | code.claude.com/docs/llms-full.txt | ~98 KB | ~25-30K | Fact-checking, complete docs, source of truth |

Recommended pattern: fetch llms.txt first to identify the relevant section, then fetch the specific page (or llms-full.txt) for details. This avoids loading 98 KB when only 2 pages are needed.

These URLs are the official source to consult first when a claim about Claude Code seems uncertain or potentially outdated.
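The index-first pattern can be sketched as follows (a sketch: the “## Section” plus “- link” conventions assumed here match the llms.txt examples in this guide, but real files may vary):

```typescript
// Pick the documentation links listed under the section heading matching a topic.
function linksForTopic(llmsTxt: string, topic: string): string[] {
  const links: string[] = [];
  let inSection = false;
  for (const line of llmsTxt.split("\n")) {
    if (line.startsWith("## ")) {
      inSection = line.toLowerCase().includes(topic.toLowerCase());
    } else if (inSection && line.startsWith("- ")) {
      links.push(line.slice(2).trim());
    }
  }
  return links;
}

// Usage: fetch the small index first, then only the pages you need.
// const index = await fetch("https://code.claude.com/docs/llms.txt").then(r => r.text());
// const pages = linksForTopic(index, "hooks"); // then fetch just those pages
```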

Not a recommended source: framework-specific blog posts, which often present llms.txt in opposition to MCP servers when the two are complementary.


Problem: Agents have token limits. Large files consume context budget quickly, forcing agents to read in chunks and lose coherence.

Solution: Structure code to minimize token usage while maximizing agent comprehension.

Guideline: Keep files under 500 lines. Agents typically read 200-300 lines at a time (depending on model context).

❌ Monolithic file (1200 lines):

src/services/event-service.ts

✅ Split by concern:

src/services/event/
├── event-service.ts (200 lines: public API + orchestration)
├── event-validator.ts (150 lines: validation logic)
├── event-calendar-sync.ts (300 lines: external calendar sync)
├── event-conflict-resolver.ts (250 lines: overlap detection)
└── README.md (explains module structure)

Why this works:

  • Agent can load just what it needs (event-validator.ts for validation work)
  • Each file has clear responsibility
  • Easier to navigate via imports

When to split:

  • File >500 lines and growing
  • File has multiple unrelated concerns (validation + sync + conflict resolution)
  • Agent frequently reads only part of the file

When NOT to split:

  • File is cohesive (one class with related methods)
  • Splitting would create artificial boundaries
  • File size <300 lines

See also: Context Management (2.1) for token optimization strategies.

❌ Wasteful tokens:

// Import React
import React from 'react';
// Import useState hook
import { useState } from 'react';
// Define Props interface
interface Props {
  // User name
  name: string;
  // User age
  age: number;
}
// User component
function User(props: Props) {
  // Render user info
  return <div>{props.name}</div>;
}

✅ Remove noise, keep value:

import React, { useState } from 'react';
interface Props {
  name: string;
  age: number;
}

// Displays user name. Age is required for future age-gating feature (see ADR-012).
function User(props: Props) {
  return <div>{props.name}</div>;
}

Savings: Reduced from ~150 tokens to ~80 tokens (47% reduction) without losing critical info.

Keep comments that provide:

  • Business context (“age for future age-gating”)
  • Non-obvious decisions (“why age is required now but unused”)
  • References (ADR-012)

Remove comments that are:

  • Obvious from code (“Import React”)
  • Redundant with types (“User name” when field is name: string)

Problem: Debug logging consumes tokens but is sometimes necessary.

Solution: Use verbose flags to conditionally include detailed output.

config.ts
export const DEBUG = process.env.DEBUG === 'true';

// event-service.ts
class EventService {
  async syncEvent(eventId: string) {
    if (DEBUG) {
      console.log(`[EventService.syncEvent] Starting sync for event ${eventId}`);
      console.log(`[EventService.syncEvent] Fetching external calendar data`);
    }
    const event = await this.getEvent(eventId);
    if (DEBUG) {
      console.log(`[EventService.syncEvent] Event data:`, event);
    }
    // sync logic
  }
}

CLAUDE.md configuration:

## Debug Mode
To enable verbose logging:
\`\`\`bash
DEBUG=true npm run dev
\`\`\`
This adds detailed logs to help trace execution flow. Disable in production (default).

Agent behavior:

  • In normal mode: Reads clean code without log noise
  • In debug mode: Sees detailed execution trace when troubleshooting

Alternative: Use logger with levels:

import { logger } from './logger';

class EventService {
  async syncEvent(eventId: string) {
    logger.debug(`Starting sync for event ${eventId}`);
    const event = await this.getEvent(eventId);
    logger.debug(`Event data:`, event);
    // sync logic
  }
}

Configure logger in CLAUDE.md:

## Logging
- `logger.debug()`: Verbose details (disabled in production)
- `logger.info()`: Important milestones (always enabled)
- `logger.warn()`: Recoverable issues
- `logger.error()`: Failures requiring attention
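A minimal sketch of what such a `./logger` module could look like, assuming the threshold is set in code at startup (a real module would read it from an env var such as `LOG_LEVEL` and export these symbols):

```typescript
// Minimal leveled logger sketch for a hypothetical ./logger module.
type Level = "debug" | "info" | "warn" | "error";

const order: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

let threshold: Level = "info"; // default: debug noise is hidden

function setLevel(level: Level): void {
  threshold = level;
}

function emit(level: Level, message: string, ...args: unknown[]): void {
  // Skip messages below the configured threshold.
  if (order[level] < order[threshold]) return;
  console.log(`[${level}] ${message}`, ...args);
}

const logger = {
  debug: (m: string, ...a: unknown[]) => emit("debug", m, ...a),
  info: (m: string, ...a: unknown[]) => emit("info", m, ...a),
  warn: (m: string, ...a: unknown[]) => emit("warn", m, ...a),
  error: (m: string, ...a: unknown[]) => emit("error", m, ...a),
};
```

With this shape, `logger.debug()` calls cost nothing in production output while staying available for troubleshooting sessions.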

Problem: Agents follow tests more reliably than documentation. Incomplete tests lead to incorrect implementations.

Solution: Use Test-Driven Development (TDD) with manually-written tests. Tests become the specification.

Humans: Can infer intent from vague requirements and course-correct during implementation.

Agents: Implement exactly what tests specify. Missing test = missing feature.

Example: Human vs Agent Behavior

Requirement: “Add email validation to signup form”

Human developer:

  • Infers “validation” includes format check AND duplicate check
  • Adds both even if tests only cover format
  • Asks clarifying questions if uncertain

Agent:

  • Implements only what tests specify
  • If tests only cover format → agent only implements format
  • If tests don’t cover edge cases → agent doesn’t handle them

Lesson: For agents, tests ARE the spec. Write comprehensive tests manually.

❌ Don’t ask the agent to write tests:

User: "Implement email validation and write tests for it"

Why this fails:

  • Agent may write incomplete tests (missing edge cases)
  • Agent tests match its implementation (circular validation)
  • No independent verification

✅ Do write tests first yourself:

tests/validation/email.test.ts
describe('Email validation', () => {
  it('accepts valid email formats', () => {
    expect(validateEmail('user@example.com')).toBe(true);
    expect(validateEmail('user+tag@example.co.uk')).toBe(true);
  });

  it('rejects invalid formats', () => {
    expect(validateEmail('invalid')).toBe(false);
    expect(validateEmail('user@')).toBe(false);
    expect(validateEmail('@example.com')).toBe(false);
  });

  it('rejects disposable email domains', () => {
    // Business requirement: Block temporary email services
    expect(validateEmail('user@tempmail.com')).toBe(false);
    expect(validateEmail('user@10minutemail.com')).toBe(false);
  });

  it('handles international characters', () => {
    // Business requirement: Support international domains
    expect(validateEmail('user@münchen.de')).toBe(true);
  });

  it('checks for duplicate emails in database', async () => {
    // Business requirement: Email must be unique
    await db.users.create({ email: 'existing@example.com' });
    await expect(validateEmail('existing@example.com')).rejects.toThrow('Email already registered');
  });
});

Then give agent the tests:

User: "Implement the email validation function to pass all tests in tests/validation/email.test.ts. Requirements:
- Use validator.js for format checking
- Disposable domain list at src/data/disposable-domains.json
- Database check via userRepository.findByEmail()"

Agent outcome: Implements exactly what tests specify, including:

  • Format validation
  • Disposable domain blocking
  • International character support
  • Duplicate database check

Without manual tests: Agent might skip disposable domain blocking (not obvious from “email validation”) or miss international character support.

Step 1: Write failing test (you, the human)

tests/services/event-service.test.ts
describe('EventService.createEvent', () => {
  it('prevents double-booking for same user + time', async () => {
    const userId = 'user-123';
    await eventService.createEvent({
      userId,
      startTime: '2026-01-21T10:00:00Z',
      endTime: '2026-01-21T11:00:00Z'
    });

    // Attempt overlapping event
    await expect(
      eventService.createEvent({
        userId,
        startTime: '2026-01-21T10:30:00Z', // overlaps by 30 min
        endTime: '2026-01-21T11:30:00Z'
      })
    ).rejects.toThrow('Scheduling conflict detected');
  });
});

Step 2: Give agent the test with implementation constraints

User: "Implement EventService.createEvent() to pass the double-booking test. Requirements:
- Check for conflicts using conflictResolver.detectOverlap()
- Throw SchedulingConflictError with list of conflicting event IDs
- See ADR-009 for conflict resolution algorithm"

Step 3: Agent implements to pass the test

Step 4: Verify with test run

Terminal window
npm test tests/services/event-service.test.ts

Step 5: Iterate if test fails (agent fixes implementation)

Cross-reference: TDD Methodology (9.14) for full TDD workflow patterns.
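The heart of the conflict check delegated to `conflictResolver.detectOverlap()` can be sketched as pure interval logic. The signature below is an assumption for illustration, not the guide's actual implementation (which would also query stored events):

```typescript
// Hedged sketch: two time ranges conflict when each starts before the other ends.
interface TimeRange {
  start: Date;
  end: Date;
}

function detectOverlap(a: TimeRange, b: TimeRange): boolean {
  return a.start < b.end && b.start < a.end;
}

// The intervals from the failing test above: 10:00-11:00 vs 10:30-11:30.
const existing = {
  start: new Date("2026-01-21T10:00:00Z"),
  end: new Date("2026-01-21T11:00:00Z"),
};
const incoming = {
  start: new Date("2026-01-21T10:30:00Z"),
  end: new Date("2026-01-21T11:30:00Z"),
};
console.log(detectOverlap(existing, incoming)); // true (30 minutes of overlap)
```

Note that a range starting exactly when another ends does not count as overlap, which matches typical back-to-back scheduling semantics.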

For UI features, use browser automation to validate agent output:

tests/e2e/signup-form.spec.ts
import { test, expect } from '@playwright/test';

test('signup form validates email', async ({ page }) => {
  await page.goto('/signup');

  // Test invalid format
  await page.fill('[name="email"]', 'invalid-email');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error')).toHaveText('Invalid email format');

  // Test disposable domain
  await page.fill('[name="email"]', 'user@tempmail.com');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error')).toHaveText('Temporary email addresses not allowed');

  // Test valid email
  await page.fill('[name="email"]', 'user@example.com');
  await page.click('button[type="submit"]');
  await expect(page.locator('.error')).not.toBeVisible();
});

Why browser tests matter for agents:

  • Validates actual user experience (not just unit logic)
  • Catches CSS/accessibility issues agents might miss
  • Provides visual proof of correctness

Give agent the E2E test:

User: "Implement signup form email validation to pass tests/e2e/signup-form.spec.ts. Use React Hook Form + Zod schema."

Agent knows:

  • Error messages must match test expectations
  • Error display must use .error class
  • Form must prevent submission on invalid input

Post-implementation check:

Terminal window
npm test -- --coverage

Coverage thresholds in CI:

package.json
{
  "jest": {
    "coverageThreshold": {
      "global": {
        "statements": 80,
        "branches": 80,
        "functions": 80,
        "lines": 80
      }
    }
  }
}

CLAUDE.md instruction:

## Testing Requirements
All features must have:
- Unit tests (>80% coverage)
- Integration tests for API endpoints
- E2E tests for user-facing features
Run before committing:
\`\`\`bash
npm test -- --coverage
\`\`\`
CI will reject PRs below 80% coverage.

Problem: Agents hallucinate less when using familiar patterns from their training data.

Solution: Use well-known design patterns and mainstream technologies. Document custom patterns explicitly.

Agents are trained on massive codebases using standard design patterns. Leverage this:

✅ Use standard patterns:

// Singleton pattern (widely known)
class DatabaseConnection {
  private static instance: DatabaseConnection;

  private constructor() { /* ... */ }

  public static getInstance(): DatabaseConnection {
    if (!DatabaseConnection.instance) {
      DatabaseConnection.instance = new DatabaseConnection();
    }
    return DatabaseConnection.instance;
  }
}

Agent recognizes: “This is the Singleton pattern” → understands that getInstance() returns the same instance.

❌ Custom pattern without documentation:

// Undocumented custom pattern
class DatabaseConnection {
  private static conn: DatabaseConnection;

  static make() {
    return this.conn ?? (this.conn = new DatabaseConnection());
  }
}

Agent confusion: “What’s make()? Is it a factory? A builder? Why conn instead of instance?”

If you must use custom patterns, document heavily:

/**
 * Database connection using Lazy Singleton pattern.
 *
 * Pattern: Singleton with lazy initialization (no eager instantiation).
 * Why custom naming: "make()" aligns with our framework's naming convention (Laravel-inspired).
 * Standard Singleton uses "getInstance()" but we use "make()" for consistency across all singletons.
 *
 * Related: See ADR-004 for singleton usage policy.
 */
class DatabaseConnection {
  private static conn: DatabaseConnection;

  static make() {
    return this.conn ?? (this.conn = new DatabaseConnection());
  }
}

Principle: Popular frameworks and libraries have more training data → agents perform better.

Framework training data volume (approximate):

| Framework/Library | GitHub repos | Agent performance |
| --- | --- | --- |
| React | 10M+ | Excellent |
| Express | 5M+ | Excellent |
| Vue | 3M+ | Good |
| Angular | 2M+ | Good |
| Svelte | 500K | Fair |
| Custom framework | <1K | Poor |

Recommendation: Use mainstream tech unless you have strong reasons otherwise.

Example: React vs Custom Framework

React (agent-friendly):

// Agent knows React patterns from training data
function UserProfile({ userId }: { userId: string }) {
  const [user, setUser] = useState<User | null>(null);

  useEffect(() => {
    fetchUser(userId).then(setUser);
  }, [userId]);

  if (!user) return <div>Loading...</div>;
  return <div>{user.name}</div>;
}

Custom framework (agent-hostile without docs):

// Agent has no training data for "Fluxor" framework
@Component({
  state: ['user'],
  effects: ['loadUser']
})
class UserProfile {
  onMount() {
    this.loadUser(this.props.userId);
  }

  render() {
    return this.state.user ? `<div>${this.state.user.name}</div>` : '<div>Loading...</div>';
  }
}

Without Fluxor documentation: Agent doesn’t know @Component decorator, state, effects, or lifecycle hooks.

With Fluxor documentation:

# Fluxor Framework
## Component Lifecycle
Fluxor components use decorators (similar to Angular):
- `@Component({ state, effects })` - Define component with reactive state
- `onMount()` - Equivalent to React's `useEffect` with empty deps
- `render()` - Returns HTML string (not JSX)
## State Management
- `this.state.user` - Access reactive state (equivalent to React `useState`)
- `this.loadUser()` - Dispatch effect (equivalent to Redux action)
## Example
\`\`\`typescript
@Component({ state: ['user'] })
class UserProfile {
  onMount() {
    // Runs once on component mount (like React useEffect)
    this.loadUser(this.props.userId);
  }

  render() {
    // Reactive: re-runs when this.state.user changes
    return this.state.user ? `<div>${this.state.user.name}</div>` : '<div>Loading...</div>';
  }
}
\`\`\`

Agent with docs: Understands Fluxor by mapping to familiar React concepts.

Problem: Custom architectures lack training data.

Solution: Document decisions in Architecture Decision Records.

ADR example:

# ADR-011: Service Layer Architecture
**Status**: Accepted
**Date**: 2025-12-10
## Context
We need clear separation between HTTP handling and business logic.
## Decision
Adopt 3-layer architecture:
1. **Controllers** (`src/controllers/`): HTTP request/response, no business logic
2. **Services** (`src/services/`): Business logic, framework-agnostic
3. **Repositories** (`src/repositories/`): Data access, abstracts database
**Rules**:
- Controllers call services, never repositories directly
- Services call repositories, never touch HTTP (no `req`, `res` objects)
- Repositories encapsulate all database queries
**Similar to**: NestJS architecture, Spring Boot layers, Clean Architecture use cases
## Example
\`\`\`typescript
// ✅ Correct: Controller → Service → Repository
// src/controllers/user-controller.ts
class UserController {
  async getUser(req: Request, res: Response) {
    const user = await userService.getUser(req.params.id); // Calls service
    res.json(user);
  }
}

// src/services/user-service.ts
class UserService {
  async getUser(userId: string) {
    return userRepository.findById(userId); // Calls repository
  }
}

// src/repositories/user-repository.ts
class UserRepository {
  async findById(userId: string) {
    return db.query('SELECT * FROM users WHERE id = ?', [userId]);
  }
}
\`\`\`

\`\`\`typescript
// ❌ Incorrect: Controller calls repository directly
class UserController {
  async getUser(req: Request, res: Response) {
    const user = await userRepository.findById(req.params.id); // Layering violation!
    res.json(user);
  }
}
\`\`\`

Agent benefit: When working in controllers, agent reads ADR-011 and knows to call services (not repositories).


Problem: Agents make mistakes—hallucinations, incorrect assumptions, security oversights.

Solution: Multi-layer guardrails to catch errors before they reach production.

Beyond secrets: Use hooks to enforce codebase conventions.

Example: Prevent layering violations:

.claude/hooks/PreToolUse.sh
#!/bin/bash
INPUT=$(cat)
TOOL_NAME=$(echo "$INPUT" | jq -r '.tool.name')

if [[ "$TOOL_NAME" == "Edit" ]] || [[ "$TOOL_NAME" == "Write" ]]; then
  FILE_PATH=$(echo "$INPUT" | jq -r '.tool.input.file_path')
  # Block controllers calling repositories directly (layering violation)
  if [[ "$FILE_PATH" == *"/controllers/"* ]]; then
    CONTENT=$(echo "$INPUT" | jq -r '.tool.input.new_string // .tool.input.content')
    if echo "$CONTENT" | grep -q "Repository\\."; then
      echo "❌ Layering violation: Controllers must call Services, not Repositories directly" >&2
      echo "See ADR-011 for architecture rules" >&2
      exit 2 # Block
    fi
  fi
fi

exit 0 # Allow

Catches:

// ❌ This edit will be BLOCKED by hook
class UserController {
  async getUser(req: Request, res: Response) {
    const user = await userRepository.findById(req.params.id); // BLOCKED!
  }
}

Agent sees: “❌ Layering violation: Controllers must call Services…” → revises the code to call the service instead.

See: Hooks (6.2) for comprehensive hook examples.

Principle: Treat all agent-generated code as “tainted” until validated by CI.

CI checks:

.github/workflows/agent-validation.yml
name: Agent Code Validation
on: [pull_request]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run linter
        run: npm run lint
      - name: Run type checker
        run: npm run type-check
      - name: Run tests
        run: npm test -- --coverage
      - name: Check test coverage
        run: |
          COVERAGE=$(npm test -- --coverage --json | jq '.coverage')
          if (( $(echo "$COVERAGE < 80" | bc -l) )); then
            echo "Coverage below 80%: $COVERAGE"
            exit 1
          fi
      - name: Check for TODO comments
        run: |
          if grep -r "TODO" src/; then
            echo "TODO comments found. Agent must implement fully, no placeholders."
            exit 1
          fi
      - name: Architecture compliance
        run: |
          # Check for layering violations
          if grep -r "Repository" src/controllers/; then
            echo "Controllers calling repositories directly (ADR-011 violation)"
            exit 1
          fi

What CI catches:

  • Syntax errors (linting)
  • Type mismatches (type checking)
  • Broken logic (tests)
  • Incomplete implementations (TODO comments)
  • Architecture violations (custom checks)

CLAUDE.md instruction:

## CI/CD Validation
All PRs run automated validation:
- Linting (ESLint)
- Type checking (TypeScript)
- Unit tests (Jest, >80% coverage)
- Architecture compliance (layering rules)
Agents must pass CI before PR approval. Never disable CI checks.

Even with CI, require human review:

.github/workflows/pr-rules.yml
name: PR Rules
on: [pull_request]
jobs:
  require-review:
    runs-on: ubuntu-latest
    steps:
      - name: Check for approval
        run: |
          APPROVALS=$(gh pr view ${{ github.event.pull_request.number }} --json reviews --jq '.reviews | length')
          if [ "$APPROVALS" -lt 1 ]; then
            echo "PR requires at least 1 human review"
            exit 1
          fi

Why human review matters:

  • Agents miss context (business requirements not in code)
  • Agents may implement correct code for wrong problem
  • Security vulnerabilities AI doesn’t recognize (novel attack vectors)

Review checklist for agent PRs:

## Agent PR Review Checklist
- [ ] **Intent**: Does the code solve the actual problem (not just pass tests)?
- [ ] **Edge cases**: Are unusual inputs handled (null, empty, negative, extreme values)?
- [ ] **Security**: Any potential injection, XSS, or authorization bypasses?
- [ ] **Performance**: Will this scale (N+1 queries, memory leaks, inefficient algorithms)?
- [ ] **Maintainability**: Is code readable and well-documented for future humans?
- [ ] **Tests**: Do tests cover meaningful scenarios (not just happy path)?

See also: CI/CD Integration (9.3) for complete CI setup patterns.

| Layer | Catches | Speed | Automation |
| --- | --- | --- | --- |
| Hooks | Pre-execution (secrets, anti-patterns) | Instant | 100% |
| Linter | Syntax, style violations | <10s | 100% |
| Type checker | Type mismatches | <30s | 100% |
| Tests | Logic errors, broken functionality | <2min | 100% |
| CI checks | Coverage, TODOs, architecture | <5min | 100% |
| Human review | Intent, security, context | Hours | Manual |

Defense in depth: Each layer catches different error classes. All layers together minimize risk.


Problem: Agents work on isolated files and miss related code elsewhere in the codebase.

Solution: Add cross-references so agents discover related modules.

In each module, reference related code:

src/services/event-service.ts
/**
 * Event management service.
 *
 * Related modules:
 * - src/services/calendar-sync-service.ts (external calendar integration)
 * - src/services/conflict-resolver.ts (overlap detection)
 * - src/repositories/event-repository.ts (data access)
 * - src/jobs/reminder-sender.ts (sends event reminders via queue)
 *
 * See also: ADR-007 (event deletion strategy), ADR-009 (conflict resolution)
 */
class EventService {
  // implementation
}

Agent behavior:

  • Working on event service → reads cross-references
  • Discovers conflict-resolver.ts exists → uses it instead of re-implementing
  • Knows to check ADRs for business logic context

Pattern: “See also” chains:

src/services/calendar-sync-service.ts
/**
 * Syncs events with external calendar providers (Google, Outlook).
 *
 * Related:
 * - src/services/event-service.ts (main event operations)
 * - src/integrations/google-calendar.ts (Google Calendar API client)
 * - src/integrations/outlook-calendar.ts (Outlook API client)
 */

// src/integrations/google-calendar.ts
/**
 * Google Calendar API integration.
 *
 * Related:
 * - src/services/calendar-sync-service.ts (orchestrates sync)
 * - src/models/calendar-event.ts (domain model)
 *
 * Rate limits: 10 req/sec per user (enforced in sync service)
 * See ADR-014 for rate limiting strategy.
 */

Result: Agent navigates from event-servicecalendar-syncgoogle-calendar → understands full flow.

CLI tools should explain themselves:

src/cli/sync-calendars.ts
#!/usr/bin/env node
/**
 * CLI tool to manually trigger calendar sync for a user.
 *
 * Usage:
 *   npm run sync-calendars -- --user-id=USER_ID [--provider=google|outlook]
 *
 * Examples:
 *   npm run sync-calendars -- --user-id=user-123
 *   npm run sync-calendars -- --user-id=user-123 --provider=google
 *
 * What it does:
 * 1. Fetches user calendar credentials from database
 * 2. Connects to external calendar API (Google or Outlook)
 * 3. Syncs events bidirectionally (our DB ↔ external calendar)
 * 4. Logs sync results (events added/updated/deleted)
 *
 * Related:
 * - src/services/calendar-sync-service.ts (sync logic)
 * - docs/runbooks/calendar-sync-troubleshooting.md (debugging guide)
 */
if (process.argv.includes('--help')) {
  console.log(`
Calendar Sync CLI

Usage:
  npm run sync-calendars -- --user-id=USER_ID [--provider=google|outlook]

Options:
  --user-id    Required. User ID to sync calendars for
  --provider   Optional. Specific provider to sync (google or outlook). Default: all providers

Examples:
  npm run sync-calendars -- --user-id=user-123
  npm run sync-calendars -- --user-id=user-123 --provider=google

See: docs/runbooks/calendar-sync-troubleshooting.md
`);
  process.exit(0);
}

// CLI implementation

Agent discovers:

  • Reads --help output to understand CLI usage
  • Finds related code (calendar-sync-service.ts)
  • Knows where to look for troubleshooting (runbook)

Instead of separate wiki, embed docs near code:

src/integrations/google-calendar/
├── google-calendar.ts
├── google-calendar.test.ts
├── README.md ← "How to use Google Calendar integration"
├── RATE_LIMITS.md ← "Google Calendar API rate limits + handling"
└── TROUBLESHOOTING.md ← "Common errors + solutions"

README.md:

# Google Calendar Integration
API client for Google Calendar API v3.
## Usage
\`\`\`typescript
import { GoogleCalendarClient } from './google-calendar';
const client = new GoogleCalendarClient(userCredentials);
const events = await client.listEvents(startDate, endDate);
\`\`\`
## Authentication
Uses OAuth 2.0 tokens stored in `users.calendar_token` field. If token expired, throws `TokenExpiredError` (caller should redirect to re-auth).
## Rate Limits
Google enforces 10 requests/second per user. Client automatically throttles using rate-limiter-flexible library. See RATE_LIMITS.md for details.
## Error Handling
Common errors:
- `TokenExpiredError`: Token expired, re-auth needed
- `RateLimitError`: Exceeded Google's rate limit (rare, automatic retry)
- `CalendarNotFoundError`: User hasn't granted calendar permission
See TROUBLESHOOTING.md for full error catalog + solutions.

Agent workflow:

  1. Agent needs to integrate Google Calendar
  2. Reads google-calendar.ts → sees README.md reference
  3. Reads README → understands usage, auth, rate limits
  4. Encounters error → reads TROUBLESHOOTING.md
  5. Implements correctly without hallucinating

Contrast with wiki:

  • Wiki: Agent doesn’t know wiki exists or where to look
  • Embedded docs: Agent finds docs naturally via file system

Problem: Agents guess API usage patterns and often guess wrong (argument order, error handling, return types).

Solution: Provide explicit usage examples in doc blocks.

❌ Minimal docs (agent guesses):

// Validate email address
function validateEmail(email: string): boolean {
// implementation
}

Agent must guess:

  • What does “validate” mean? Format only? Uniqueness check?
  • What about null or empty string?
  • Are there side effects (database lookups)?

✅ Comprehensive docs with examples:

/**
 * Validate email address format and uniqueness.
 *
 * Checks:
 * 1. Valid email format (RFC 5322 compliant)
 * 2. Not a disposable email domain (e.g., tempmail.com)
 * 3. Not already registered in database
 *
 * @param email - Email address to validate (trimmed automatically)
 * @returns Promise resolving to true if valid, throws error otherwise
 * @throws {ValidationError} If format invalid or disposable domain
 * @throws {DuplicateEmailError} If email already registered
 *
 * @example
 * // Valid email
 * await validateEmail('user@example.com'); // Returns true
 *
 * @example
 * // Invalid format
 * await validateEmail('invalid-email');
 * // Throws ValidationError: "Invalid email format"
 *
 * @example
 * // Disposable domain
 * await validateEmail('user@tempmail.com');
 * // Throws ValidationError: "Disposable email addresses not allowed"
 *
 * @example
 * // Duplicate email
 * await validateEmail('existing@example.com');
 * // Throws DuplicateEmailError: "Email already registered"
 *
 * @example
 * // Null handling
 * await validateEmail(null);
 * // Throws ValidationError: "Email is required"
 */
async function validateEmail(email: string | null): Promise<boolean> {
  // implementation
}

Agent now knows:

  • Function is async (returns Promise)
  • Throws errors (doesn’t return false)
  • Handles null input
  • Trims whitespace automatically
  • Checks format, disposable domains, AND uniqueness

Agent can implement correctly:

// In signup form handler
try {
  await validateEmail(formData.email);
  // Proceed with signup
} catch (error) {
  if (error instanceof DuplicateEmailError) {
    showError('This email is already registered. Try logging in instead.');
  } else if (error instanceof ValidationError) {
    showError(error.message); // "Invalid email format" or "Disposable email not allowed"
  }
}
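For illustration, a minimal implementation consistent with that doc block might look like the sketch below. The `DISPOSABLE_DOMAINS` set and the in-memory `userRepository` stub are stand-ins, not real project code, and the regex is a simplification of what a library like validator.js would provide:

```typescript
// Hedged sketch of an implementation matching the documented contract.
class ValidationError extends Error {}
class DuplicateEmailError extends Error {}

const DISPOSABLE_DOMAINS = new Set(["tempmail.com", "10minutemail.com"]);

// Stand-in for a real repository; returns a user record or null.
const userRepository = {
  async findByEmail(_email: string): Promise<object | null> {
    return null;
  },
};

async function validateEmail(email: string | null): Promise<boolean> {
  if (!email || email.trim() === "") {
    throw new ValidationError("Email is required");
  }
  const trimmed = email.trim();
  // Simplified format check; real code would use an RFC 5322 validator.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(trimmed)) {
    throw new ValidationError("Invalid email format");
  }
  const domain = trimmed.split("@")[1].toLowerCase();
  if (DISPOSABLE_DOMAINS.has(domain)) {
    throw new ValidationError("Disposable email addresses not allowed");
  }
  if (await userRepository.findByEmail(trimmed)) {
    throw new DuplicateEmailError("Email already registered");
  }
  return true;
}
```

Note how every branch corresponds to a documented `@throws` or `@example` entry, which is what lets the agent consume the function without guessing.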

Problem: Agents may use outdated API patterns from training data.

Solution: Use Context7 MCP to fetch current documentation.

CLAUDE.md configuration:

## External Dependencies
### Google Calendar API
**Version**: v3 (current as of 2026-01-21)
**Docs**: Use Context7 MCP to fetch latest: "google calendar api v3 nodejs"
**Key methods**:
- `calendar.events.list()` - List events
- `calendar.events.insert()` - Create event
- `calendar.events.update()` - Update event
- `calendar.events.delete()` - Delete event
**Rate limits**: 10 req/sec per user (enforced by our client)
### Why Context7
Agent's training data may be outdated (pre-2025). Use Context7 to fetch current docs at implementation time.
Agent instruction: "When implementing Google Calendar integration, use Context7 MCP to fetch latest API docs."

Agent behavior:

  • Reads CLAUDE.md → sees Context7 instruction
  • Uses Context7 MCP → fetches current docs
  • Implements with correct API (not outdated training data)

See: Context7 MCP (5.3) for setup.

Design APIs to work with minimal configuration:

❌ Requires all parameters:

const client = new GoogleCalendarClient({
  credentials: userCredentials,
  rateLimit: 10,
  rateLimitWindow: 1000,
  retryAttempts: 3,
  retryDelay: 1000,
  timeout: 30000,
  userAgent: 'MyApp/1.0'
});

✅ Sensible defaults:

// Minimal usage (defaults applied)
const client = new GoogleCalendarClient(userCredentials);

// Override defaults if needed
const client = new GoogleCalendarClient(userCredentials, {
  timeout: 60000 // Only override timeout, other defaults remain
});

Implementation with defaults:

interface GoogleCalendarOptions {
  rateLimit?: number;      // Default: 10 req/sec
  retryAttempts?: number;  // Default: 3
  retryDelay?: number;     // Default: 1000ms
  timeout?: number;        // Default: 30000ms
}

class GoogleCalendarClient {
  private options: Required<GoogleCalendarOptions>;

  constructor(
    private credentials: Credentials,
    options: GoogleCalendarOptions = {}
  ) {
    // Apply defaults
    this.options = {
      rateLimit: options.rateLimit ?? 10,
      retryAttempts: options.retryAttempts ?? 3,
      retryDelay: options.retryDelay ?? 1000,
      timeout: options.timeout ?? 30000
    };
  }
}

Agent benefit: Can use API immediately without researching all options.

Document defaults in code:

/**
 * Google Calendar API client with automatic rate limiting and retries.
 *
 * Default configuration:
 * - Rate limit: 10 requests/second (Google's limit)
 * - Retry attempts: 3 (exponential backoff)
 * - Timeout: 30 seconds
 *
 * @example
 * // Use defaults
 * const client = new GoogleCalendarClient(credentials);
 *
 * @example
 * // Override specific options
 * const client = new GoogleCalendarClient(credentials, {
 *   timeout: 60000 // 60 second timeout for slow connections
 * });
 */

9.18.11 Decision Matrix & Implementation Checklist


Not all code needs agent optimization. Use this decision matrix:

| Factor | Optimize for Agents | Optimize for Humans |
| --- | --- | --- |
| Code churn | High (>5 edits/month) | Low (<2 edits/month) |
| Team usage | >50% commits by agents | <30% commits by agents |
| Complexity | Business logic, APIs | Infrastructure, DevOps |
| Project phase | Greenfield, active development | Stable, maintenance mode |
| File size | >500 lines | <300 lines |
| Team size | >5 developers | Solo or pair |

✅ High ROI for agent optimization:

  • Core business logic files (e.g., order-service.ts, payment-processor.ts)
  • Frequently modified features (e.g., UI components, API routes)
  • Complex domains requiring context (e.g., healthcare, finance, legal)
  • Greenfield projects (design agent-friendly from start)

❌ Low ROI for agent optimization:

  • Stable infrastructure code (rarely modified)
  • Small utility functions (<50 lines, self-evident)
  • DevOps scripts (agents rarely touch these)
  • Legacy code in maintenance mode (refactoring cost > benefit)

Use this checklist to assess your codebase’s agent-friendliness:

Domain Knowledge (Score: ___ / 5)

  • CLAUDE.md exists with business context, design principles, domain terms
  • Architecture Decision Records (ADRs) document key decisions
  • Code comments explain “why” (not just “what”)
  • Cross-references link related modules
  • Directory READMEs explain module purpose

Discoverability (Score: ___ / 6)

  • Files use complete terms (not abbreviations: user not usr)
  • Comments include synonyms (e.g., “member, subscriber, customer”)
  • Functions have JSDoc tags (@domain, @related, @external)
  • README files in major directories
  • CLI tools have --help with examples
  • Embedded docs near code (not separate wiki)

Token Efficiency (Score: ___ / 4)

  • Files under 500 lines (split larger files by concern)
  • Obvious comments removed (keep only valuable context)
  • Debug output controlled by verbose flags
  • Large generated files excluded via .claudeignore
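For the last item, a starting `.claudeignore` might look like this (patterns are illustrative; match them to your actual build outputs):

```
# .claudeignore - keep large generated artifacts out of agent context
dist/
build/
coverage/
*.min.js
package-lock.json
```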

Testing (Score: ___ / 5)

  • Tests written manually (not delegated to agent)
  • TDD workflow for new features (test first, implement second)
  • E2E tests for UI features (Playwright or similar)
  • Test coverage >80% enforced in CI
  • Tests cover edge cases (not just happy path)

Conventions (Score: ___ / 4)

  • Standard design patterns used (Singleton, Factory, Repository, etc.)
  • Mainstream frameworks (React, Express, etc.) preferred over custom
  • ADRs document custom patterns
  • “See also” comments reference similar patterns

Guardrails (Score: ___ / 5)

  • Hooks validate code at pre-execution (layering, secrets, conventions)
  • CI enforces linting, type checking, tests
  • Test coverage thresholds in CI (e.g., 80%)
  • Architecture compliance checks (layering violations, etc.)
  • Human PR review required before merge

Usage Instructions (Score: ___ / 4)

  • Functions have doc blocks with @example usage
  • Error conditions documented (@throws)
  • APIs have sensible defaults (minimal config required)
  • Context7 MCP used for fetching current docs

Total Score: ___ / 33

Scoring:

  • 25-33: Excellent agent-friendliness
  • 18-24: Good, some improvements possible
  • 10-17: Fair, significant gaps exist
  • <10: Poor, major refactoring needed

Start with these high-impact, low-effort improvements:

1. Add CLAUDE.md (30 minutes)

# Project Context
**Tech stack**: React, Express, PostgreSQL
**Architecture**: 3-layer (controllers, services, repositories)
**Conventions**: ESLint + Prettier, 80% test coverage required
## Key Files
- `src/services/` - Business logic (framework-agnostic)
- `src/controllers/` - HTTP handlers (thin layer)
- `src/repositories/` - Database access
See ADR-011 for layering rules.

2. Add directory READMEs (15 minutes per directory)

# Services Layer
Business logic and domain operations. Services are framework-agnostic.
**Rules**:
- Call repositories for data access
- Never import from controllers (layering violation)
- Return domain objects (not HTTP responses)

3. Add cross-references to hot files (10 minutes per file)

/**
 * Event service - core business logic for event management.
 *
 * Related:
 * - src/services/calendar-sync-service.ts (external calendar sync)
 * - src/repositories/event-repository.ts (data access)
 *
 * See ADR-007 for event deletion strategy.
 */

4. Split one large file (30 minutes)

  • Find file >500 lines
  • Split by concern (e.g., validation, sync, conflict resolution)
  • Add README in new directory

5. Enable test coverage in CI (15 minutes)

.github/workflows/ci.yml
- name: Run tests with coverage
  run: npm test -- --coverage
- name: Check coverage threshold
  run: |
    COVERAGE=$(npm test -- --coverage --json | jq '.coverage')
    if (( $(echo "$COVERAGE < 80" | bc -l) )); then
      exit 1
    fi

Total time: ~2 hours for foundational improvements.

Academic research:

  • “Context Engineering for AI Agents” (ArXiv, June 2025)
  • “Agent-Oriented Software Engineering” (ArXiv, March 2025)
  • “Prompt Injection Prevention in Code Agents” (ArXiv, November 2024)

Reading time: 10 minutes Skill level: Month 1+

Most developers pick one approach and stick with it. But Claude Code’s tooling supports systematic variation—testing multiple approaches to find the optimal solution.

Permutation Frameworks formalize this: instead of hoping your first approach works, you systematically generate and evaluate variations.

A permutation framework defines dimensions of variation and lets Claude generate all meaningful combinations. Each dimension represents a design choice; each combination is a distinct implementation approach.

Dimension 1: Architecture → [Monolith, Modular, Microservice]
Dimension 2: State Mgmt → [Server-side, Client-side, Hybrid]
Dimension 3: Auth Strategy → [JWT, Session, OAuth]
Total permutations: 3 × 3 × 3 = 27 approaches
Practical subset: 4-6 worth evaluating
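The grid above can be enumerated mechanically before you prune down to the practical subset. A minimal sketch, using the example dimension values as placeholders (`list_permutations` is a hypothetical helper, not a Claude Code command):

```shell
# Sketch: enumerate every combination of the three example dimensions
# as candidate branch names. 3 dimensions × 3 values each = 27 combinations.
list_permutations() {
  for arch in monolith modular microservice; do
    for state in server client hybrid; do
      for auth in jwt session oauth; do
        echo "perm/${arch}-${state}-${auth}"
      done
    done
  done
}

list_permutations | wc -l
```

Seeing all 27 names listed makes the pruning step concrete: most combinations are obviously incoherent (e.g. microservice + client-side state + session auth) and can be discarded before any implementation work.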
| Scenario | Use Permutation? | Why |
|---|---|---|
| New project architecture | ✅ Yes | Multiple valid approaches, high impact |
| Component design with tradeoffs | ✅ Yes | Performance vs. readability vs. maintainability |
| Migration strategy | ✅ Yes | Big-bang vs. strangler vs. parallel |
| Bug fix with known root cause | ❌ No | One correct fix |
| Styling changes | ❌ No | Low impact, subjective |
| Performance optimization | ✅ Maybe | Profile first, then permute solutions |

Implementation: CLAUDE.md-Driven Permutations

Section titled “Implementation: CLAUDE.md-Driven Permutations”

The key insight: use CLAUDE.md variations to generate consistent implementations across different approaches.

# CLAUDE.md (base)
## Project: [Project Name]
## Permutation: {{VARIANT_NAME}}
### Architecture
{{ARCHITECTURE_PATTERN}}
### State Management
{{STATE_STRATEGY}}
### Conventions
- All implementations must include tests
- Use the same data model across variants
- Each variant in its own branch: `perm/{{VARIANT_NAME}}`
Terminal window
# Create variant branches with Claude
claude -p "Create 4 CLAUDE.md variants for our dashboard project:
1. 'server-heavy': Server components, minimal client JS, session auth
2. 'spa-optimized': Client SPA, REST API, JWT auth
3. 'hybrid-ssr': SSR with hydration, tRPC, session + JWT
4. 'edge-first': Edge functions, client cache, token auth
For each: create branch perm/<name>, write CLAUDE.md with filled template,
scaffold the base structure. Same data model across all variants."
Terminal window
# Terminal 1
git checkout perm/server-heavy
claude "Implement the dashboard following CLAUDE.md conventions"
# Terminal 2
git checkout perm/spa-optimized
claude "Implement the dashboard following CLAUDE.md conventions"
# Terminal 3 (or sequential)
git checkout perm/hybrid-ssr
claude "Implement the dashboard following CLAUDE.md conventions"
User: Compare the 4 permutation branches. For each, evaluate:
- Bundle size and load time
- Code complexity (files, lines, dependencies)
- Test coverage achievable
- Maintenance burden estimate
Create a comparison matrix and recommend the best approach
for our team of 3 developers with moderate React experience.

Practical Example: API Design Permutations

Section titled “Practical Example: API Design Permutations”
# Permutation: REST vs GraphQL vs tRPC
## Shared constraints (all variants)
- Same database schema (PostgreSQL + Prisma)
- Same auth (JWT)
- Same business logic (services layer)
## Variant A: REST
- Express routes, OpenAPI spec
- Separate validation layer (Zod)
- Standard REST conventions (GET/POST/PUT/DELETE)
## Variant B: GraphQL
- Apollo Server, schema-first
- Resolvers calling same services
- Dataloader for N+1 prevention
## Variant C: tRPC
- Type-safe end-to-end
- Shared types between client/server
- Zod validation built-in

Evaluation prompt:

User: I've implemented all 3 API variants. Now act as a reviewer:
1. Run tests for each: which has better coverage?
2. Count total lines of boilerplate vs business logic
3. Measure type safety (any manual type assertions?)
4. Rate developer experience for adding a new endpoint (1-5)
Give me a decision matrix, not a recommendation.
I'll decide based on our team context.
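Criterion 2 (boilerplate vs. business logic) can be approximated mechanically. A minimal sketch, assuming the layered layout from the variant specs above (`routes`/`resolvers` as transport, `services` as business logic — adjust the paths to your actual structure):

```shell
# Sketch: rough line-count split between transport code and business logic.
# The directory layout (routes/resolvers vs. services) is an assumption.
loc_split() {
  root=$1
  transport=$(find "$root" \( -path '*/routes/*' -o -path '*/resolvers/*' \) \
    -name '*.ts' -exec cat {} + | wc -l | tr -d ' ')
  logic=$(find "$root" -path '*/services/*' -name '*.ts' -exec cat {} + \
    | wc -l | tr -d ' ')
  echo "transport=$transport logic=$logic"
}
```

Run `loc_split .` on each variant branch: since the services layer is shared across variants, a higher `transport` count is a direct read on how much boilerplate that API style costs you.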
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Too many dimensions | Combinatorial explosion (3⁴ = 81) | Cap at 3 dimensions, 3-4 variants each |
| No shared constraints | Variants aren’t comparable | Define fixed elements first |
| Permuting the trivial | Wasting tokens on style choices | Only permute architectural decisions |
| No evaluation criteria | Can’t pick a winner | Define scoring before generating variants |
| Skipping implementation | Comparing on paper only | Build at least a skeleton for each |

Permutation + Plan Mode:

1. /plan → Define dimensions and constraints
2. Generate CLAUDE.md variants
3. /execute → Implement each variant
4. /plan → Compare and decide

Permutation + TDD:

1. Write tests that ALL variants must pass (shared spec)
2. Implement each variant against the same test suite
3. The variant with cleanest implementation wins
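The shared-spec idea above can be scripted: run one test command against every variant branch and report which pass. A minimal sketch (`run_shared_spec` is a hypothetical helper; the branch naming follows the `perm/*` convention from earlier):

```shell
# Sketch: run one shared test command against every perm/* branch.
# The test command is a parameter — e.g. run_shared_spec "npm test -- --silent"
run_shared_spec() {
  cmd=$1
  for branch in $(git branch --list 'perm/*' --format='%(refname:short)'); do
    git checkout -q "$branch"
    if eval "$cmd"; then echo "PASS $branch"; else echo "FAIL $branch"; fi
  done
}
```

A variant that can't pass the shared spec is disqualified before any subjective comparison happens, which keeps the evaluation honest.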

Permutation + Skeleton Projects:

1. Start from same skeleton
2. Branch per variant
3. Each variant evolves the skeleton differently
4. Compare which skeleton evolution is most maintainable


9.20 Agent Teams (Multi-Agent Coordination)

Section titled “9.20 Agent Teams (Multi-Agent Coordination)”

Reading time: 5 minutes (overview) | Quick Start → (8-10 min, practical) | Full workflow guide → (~30 min, theory) Skill level: Month 2+ (Advanced) Status: ⚠️ Experimental (v2.1.32+, Opus 4.6 required)

Agent teams enable multiple Claude instances to work in parallel on a shared codebase, coordinating autonomously without human intervention. One session acts as team lead to break down tasks and synthesize findings from teammate sessions.

Key difference from Multi-Instance (§9.17):

  • Multi-Instance = You manually orchestrate separate Claude sessions (independent projects, no shared state)
  • Agent Teams = Claude manages coordination automatically (shared codebase, git-based communication)
Setup:
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude
OR in ~/.claude/settings.json:
{
  "experimental": {
    "agentTeams": true
  }
}

Version: v2.1.32 (2026-02-05) as research preview Model requirement: Opus 4.6 minimum

Production metrics (validated cases):

  • Fountain (workforce management): 50% faster screening, 2x conversions
  • CRED (15M users, financial services): 2x execution speed
  • Anthropic Research: Autonomous C compiler completion (no human intervention)

Source: 2026 Agentic Coding Trends Report, Anthropic Engineering Blog

Team Lead (Main Session)
├─ Breaks tasks into subtasks
├─ Spawns teammate sessions (each with 1M token context)
│   ├─ Teammate 1: Task A (independent context)
│   └─ Teammate 2: Task B (independent context)
└─ Synthesizes findings from all agents
Coordination: Git-based (task locking, continuous merge, conflict resolution)
Navigation: Shift+Down to cycle through teammates, or tmux panes
| Pattern | Coordination | Best For | Cost | Setup |
|---|---|---|---|---|
| Agent Teams | Automatic (git-based) | Read-heavy tasks needing coordination | High (3x+) | Experimental flag |
| Multi-Instance (§9.17) | Manual (human) | Independent parallel tasks | Medium (2x) | Multiple terminals |
| Dual-Instance | Manual (human) | Quality assurance (plan-execute) | Medium (2x) | 2 terminals |

✅ Excellent fit (read-heavy, clear boundaries):

  1. Multi-layer code review: Security scope + API scope + Frontend scope (Fountain: 50% faster)
  2. Parallel hypothesis testing: Debug by testing 3 theories simultaneously
  3. Large-scale refactoring: 47+ files across layers with clear interfaces
  4. Full codebase analysis: Architecture review, pattern detection

❌ Poor fit (avoid these):

  • Simple tasks (<5 files affected) — coordination overhead not justified
  • Write-heavy tasks (many shared file modifications) — merge conflict risks
  • Sequential dependencies — no parallelization benefit
  • Budget-constrained projects — 3x token cost multiplier
Prompt:
"Review this PR comprehensively using agent teams with scope-focused analysis:
- Security Scope: Check for vulnerabilities, auth issues, data exposure (context: auth, validation code)
- API Design Scope: Review endpoint design, validation, error handling (context: API routes, controllers)
- Frontend Scope: Check UI patterns, accessibility, performance (context: components, styles)
PR: https://github.com/company/repo/pull/123"
Result:
Team lead spawns 3 scope-focused agents → Each analyzes their scope in parallel →
Team lead synthesizes findings → Comprehensive review in 1/3 the time

Read-heavy > Write-heavy trade-off:

✅ Good: Code review (agents read, analyze, report)
✅ Good: Bug tracing (agents read logs, trace execution)
✅ Good: Architecture analysis (agents read structure)
⚠️ Risky: Refactoring shared types (merge conflicts)
⚠️ Risky: Database schema changes (coordinated migrations)
❌ Bad: Same file modified by multiple agents (conflict hell)

Mitigation: Assign non-overlapping file sets, use interface-first approach, define contracts before parallel work.

Token intensity: 3x+ cost multiplier (3 agents = 3 model inferences). Only justified when time saved > cost increase.

Experimental status: No stability guarantee, bugs expected, feature may change. Report issues to Anthropic GitHub.

Is task simple (<5 files)? ──YES──> Single agent
        │ NO
        ▼
Tasks completely independent? ──YES──> Multi-Instance (§9.17)
        │ NO
        ▼
Need quality assurance split? ──YES──> Dual-Instance
        │ NO
        ▼
Read-heavy (analysis, review)? ──YES──> Agent Teams ✓
        │ NO
        ▼
Write-heavy (many file mods)? ──YES──> Single agent
        │ NO
        ▼
Budget-constrained? ──YES──> Single agent
        │ NO
        ▼
Complex coordination needed? ──YES──> Agent Teams ✓
                             ──NO──> Single agent

Two distinct coordination patterns exist for multi-agent review, and the choice matters:

| Dimension | Sequential Specialists | Swarm Mode |
|---|---|---|
| Structure | Predefined lead + members | Ad-hoc, no hierarchy |
| Coordination | Lead assigns tasks, synthesizes | Each reviewer works independently |
| Leadership | Team lead orchestrates | Human synthesizes findings |
| Task assignment | Lead delegates to specific agents | All relevant agents get the same input |
| Best for | Tasks with dependencies between reviewers | Independent review, final pre-merge pass |
| When to use | Complex workflows, state needs sharing | PR review, unfamiliar codebase, thoroughness |

Swarm Mode in practice (Every.to compound-engineering pattern):

Launch all relevant specialist reviewers in parallel against the same diff or PR, with no coordination between them. Each produces independent findings. You read all findings and decide what to act on.

Terminal window
# Swarm: all reviewers see the same input, report independently
/workflows:review --swarm # Every.to compound-engineering command

This is distinct from Agent Teams: there is no persistent team structure, no shared context between agents, no lead synthesizing in real time. It is faster to set up and appropriate when thoroughness matters more than coordination.

Rule of thumb: Use Agent Teams for workflows with sequential dependencies (agent A’s output feeds agent B). Use Swarm when each reviewer can work from the same starting point and you want maximum coverage with minimum setup overhead.

Paul Rayner (CEO Virtual Genius, EventStorming Handbook author):

“Running 3 concurrent agent team sessions across separate terminals. Pretty impressive compared to previous multi-terminal workflows without coordination.”

Workflows used (Feb 2026):

  1. Job search app: Design research + bug fixing
  2. Business ops: Operating system + conference planning
  3. Infrastructure: Playwright MCP + beads framework management

Source: Paul Rayner LinkedIn

Built-in controls:

  • Shift+Down: Cycle through active teammates (in-process mode)
  • tmux: Use tmux commands if in tmux session
  • Direct takeover: Take control of any agent’s work mid-execution

Monitoring: Each agent reports progress, team lead synthesizes when all complete.

This section is a quick overview. For complete guide:

  • Agent Teams Workflow (~30 min, 10 sections)
    • Architecture deep-dive (team lead, teammates, git coordination)
    • Setup instructions (2 methods)
    • 5 production use cases with metrics
    • Workflow impact analysis (before/after)
    • Limitations & gotchas (read/write trade-offs)
    • Decision framework (Teams vs Multi-Instance vs Beads)
    • Best practices, troubleshooting


Context: In February 2026, Anthropic published a COBOL modernization playbook positioning Claude Code as a direct replacement for legacy consulting teams. The same day, IBM stock dropped -13% (its worst single-day performance since October 2000). The workflow described is validated by independent research — it applies to any large legacy codebase (COBOL, Fortran, VB6, PL/I), not just COBOL.

The real cost isn’t the migration itself — it’s the discovery phase. Original developers have retired. Documentation is absent or wrong. Code has been patched for decades by engineers who never understood the full system. Finding what talks to what requires consultants billing by the hour.

AI changes the economics by automating this exact phase.

COBOL context (for scale reference):

  • ~220 billion lines of COBOL still in production (IBM estimate)
  • ~95% of US ATM transactions run on COBOL-based systems (Reuters/industry consensus — methodology varies by source)
  • Modernization previously required multi-year, multi-team projects

Independent validation: Academic research (WJAETS 2025) shows -25 to -30% timeline reduction on average. Best-case: Airbnb migrated 3,500 test files in 6 weeks vs. an estimated 1.5 years. COBOL→Java accuracy: 93% in controlled studies (arXiv, April 2025).


Step 1 — Automated Exploration & Discovery

Map the entire codebase:
- Identify all program entry points and execution paths
- Trace subroutine calls across hundreds of files
- Document implicit dependencies via shared files, databases, and global state
- Generate a dependency graph before touching a single line

Prompt pattern:

"Read the entire [COBOL/legacy] codebase. Map its structure:
entry points, execution paths, subroutine call chains,
and any implicit dependencies via shared data structures,
global variables, or file I/O. Output a dependency map."

Step 2 — Risk Analysis & Opportunity Mapping

With the dependency map in hand:
- Assess coupling levels between modules (high coupling = high risk)
- Surface isolated components as safe modernization candidates
- Identify duplicated logic and dead code
- Flag shared state as the highest-risk zones

Prompt pattern:

"Based on the dependency map: rank modules by coupling level.
Which components can be modernized in isolation?
Which share state with 3+ other modules and should be touched last?"
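A crude first pass at the coupling ranking can be done before asking Claude anything. A minimal sketch for COBOL, where `CALL 'NAME'` is the cross-module reference (`coupling_rank` is a hypothetical helper; file naming and flat layout are assumptions — real codebases will need a smarter scan):

```shell
# Sketch: rank modules by how many OTHER files reference them via CALL.
# Assumes one module per .cbl file, named after the program it defines.
coupling_rank() {
  dir=$1
  for mod in "$dir"/*.cbl; do
    name=$(basename "$mod" .cbl)
    refs=$(grep -l "CALL '$name'" "$dir"/*.cbl 2>/dev/null \
      | grep -cv "^$mod$")
    echo "$refs $name"
  done | sort -rn
}
```

Modules at the top of the list are your "touch last" zone; modules with zero inbound references are the safe, isolated candidates to modernize first.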

Step 3 — Strategic Planning

Human + AI collaboration:
- AI suggests prioritization based on risk/dependency analysis
- Team reviews against business priorities (what breaks = most expensive)
- Define target architecture and code standards
- Design function-level tests for validation before migration begins

This phase is not fully automatable — business context requires human judgment. Hybrid human-AI workflows show 31% higher completion rates within initial time estimates vs. purely automated approaches (WJAETS 2025).


Step 4 — Incremental Implementation

Never migrate the whole system at once:
- Translate logic component by component
- Create API wrappers for legacy components still in use
- Run old and new code side-by-side in production
- Validate each component independently before proceeding to the next

Prompt pattern:

"Translate [module X] to [target language].
Preserve exact business logic — no optimization yet.
Add a compatibility wrapper so both versions can run in parallel.
Write tests that verify identical outputs for identical inputs."
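The parallel-run validation in step 4 can be sketched as a tiny harness that feeds identical fixture inputs to both versions and diffs the outputs. Everything here is an assumption for illustration: `./legacy` and `./modern` stand in for your two executables, `fixtures/` for your recorded inputs:

```shell
# Sketch: feed each fixture to both implementations and diff the outputs.
# MATCH = behavior preserved; DIVERGE = investigate before cutover.
parallel_run_check() {
  for input in fixtures/*; do
    if diff <(./legacy < "$input") <(./modern < "$input") > /dev/null; then
      echo "MATCH $input"
    else
      echo "DIVERGE $input"
    fi
  done
}
```

This tests at the boundary (inputs/outputs), which is exactly the property you want to preserve: the internal logic is allowed to change, the observable behavior is not.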

| Principle | Why it matters |
|---|---|
| Map before touching | Blind migrations fail; discovery first |
| Isolate before migrating | High-coupling modules = cascade failures |
| Parallel run | Rollback possible only if both versions coexist |
| Test at boundary | Test inputs/outputs, not internal logic (which will change) |
| Human review on business logic | AI doesn’t know which edge case is regulatory vs. dead code |

“Years to quarters” is real — but it’s the optimistic scenario, not the average:

| Scenario | Timeline reduction | Source |
|---|---|---|
| Conservative estimate | -25 to -30% | WJAETS 2025 academic review |
| Automation-heavy phases | -40 to -50% | Fullstack Labs industry synthesis |
| Best-case (test migration) | -88% (6 weeks vs 1.5 yr) | Airbnb case study |
| COBOL→Java conversion accuracy | 93% | arXiv, April 2025 |

The average gains are real and significant. The headline numbers require favorable conditions: good test coverage, isolated modules, and a team that understands both the legacy system and the target stack.

  • ❌ Big bang migration — Rewriting everything at once. No company has survived this at scale.
  • ❌ No parallel run — Cutting over without a fallback. One undiscovered edge case = production outage.
  • ❌ Skipping discovery — Starting to translate before mapping. You will break things you didn’t know existed.
  • ❌ Trusting AI on business logic — AI translates faithfully what it reads. If the original was wrong or context-dependent, the translation will be too.

Reading time: 7 minutes Skill level: Week 2+ Status: Research Preview (as of February 2026) Availability: Pro and Max plans only — not available on Team, Enterprise, or API keys

Remote Control lets you monitor and control a local Claude Code session from a phone, tablet, or web browser — without migrating anything to the cloud. Your terminal keeps running locally; the mobile/web interface is a remote window onto that session.

Key difference from Session Teleportation (§9.16): Teleportation migrates a session (web → local). Remote Control mirrors a local session to a remote viewer. Execution always stays on your local machine.

Local terminal (running claude)
    │  HTTPS outbound only (no inbound ports)
    ▼
Anthropic relay
    │
    ▼
Phone / tablet / browser (claude.ai/code or Claude app)
  • Execution: 100% local — your terminal does all the work
  • Security: HTTPS outbound only, zero inbound ports, short-lived scoped credentials
  • What you can do remotely: Send messages, approve/deny tool calls, read responses

Requirements:

  • Claude Code v2.1.51+
  • Active Pro or Max subscription (not Team/Enterprise)
  • Logged in (/login)

Option A — From the command line (start a new session):

Terminal window
claude remote-control
# Optional flags:
# --verbose Show detailed connection logs
# --sandbox Restrict to sandbox mode

Option B — From inside an active session:

/remote-control
# or the shorter alias:
/rc

Once started, Claude Code displays:

  1. A session URL (open in any browser)
  2. Press spacebar to show a QR code (scan with your phone)
  3. Or open the Claude app (iOS / Android) — your active session appears automatically

To enable remote control on every session by default:

/config → toggle "Remote Control: auto-enable"
/mobile # Shows App Store + Google Play download links
| Limitation | Detail |
|---|---|
| 1 session at a time | Only one active remote control session |
| Terminal must stay open | Closing the local terminal ends the session |
| Network timeout | ~10 min before session expires on disconnect |
| Slash commands don’t work remotely | /new, /compact, etc. are treated as plain text in the remote UI |
| Pro/Max only | Not available on Team, Enterprise, or API keys |

⚠️ Slash commands limitation: When you type /new, /compact, or any slash command in the remote interface (mobile app or browser), they are treated as plain text messages — not forwarded as commands to the local CLI. Use slash commands from your local terminal instead.

Multi-Session via tmux (Workaround for 1-Session Limit)

Section titled “Multi-Session via tmux (Workaround for 1-Session Limit)”
Terminal window
# Start a tmux session with multiple panes
tmux new-session -s dev
# Each tmux pane can run its own claude session:
# Pane 1: claude → run /rc → share URL with your phone
# Pane 2: claude (local only)
# Pane 3: claude (local only)
# To switch which session you're controlling remotely:
# → Go to pane 2, run /rc (disconnects pane 1's remote, connects pane 2)

Each tmux pane hosts its own Claude session. Only one can use remote-control at a time, but you can switch between sessions by running /rc in different panes.

Remote Control works on remote machines (VMs, cloud servers) running in tmux:

Terminal window
# On your cloud server (e.g., Clever Cloud, AWS, etc.):
tmux new-session -s claude-server
claude remote-control
# → Scan QR code from your phone
# → Control a cloud-hosted Claude session from mobile
# → Sessions survive laptop reboots (tmux keeps them alive)

This gives you persistent sessions that survive closing your laptop. Combine 6-8 Claude sessions in tmux for continuous uninterrupted work while traveling.

| Alternative | How it worked | Status |
|---|---|---|
| happy.engineering | Open-source remote access for Claude Code | Community-declared obsolete post-RC |
| OpenClaw | Alternative Claude Code remote interface | Community-declared obsolete post-RC |
| SSH + mobile terminal | SSH into dev machine, run claude | Still valid for Team/Enterprise users |
| VS Code Remote | Remote SSH extension + Claude Code | Still valid, more complex setup |

Full threat model: Security Hardening Guide: Remote Control Security

Quick summary:

  • The session URL is a live access key — treat it like a password
  • Anyone with the URL can send commands to your local Claude session while active
  • Short-lived credentials + HTTPS outbound-only limits the exposure window
  • Per-command approval prompts on mobile guard against accidental execution (not against active attackers)
  • Not recommended on shared or untrusted workstations
  • Corporate machines: verify your security policy even on personal Pro/Max accounts
| Issue | Solution |
|---|---|
| Session not appearing in Claude app | Known bug (Research Preview) — use claude.ai/code in Safari instead (see below) |
| QR code opens app but session not visible | Known bug on iOS — scan with native camera app, open in Safari rather than Claude app |
| QR code not showing | Press spacebar after starting remote-control |
| Slash commands not working | Type them in your local terminal instead |
| Session expired | Reconnect: run /rc again |
| Corporate firewall blocking | HTTPS outbound (port 443) must be allowed |
| “Not available” error | Verify Pro or Max subscription (not Team/Enterprise) |

Known bug (Research Preview, March 2026): On iOS (confirmed iPhone), scanning the QR code opens the Claude app but the remote session doesn’t appear in the session list. The bug also affects automatic session discovery in the Claude mobile app. MacStories confirmed this is inconsistent on non-local machines.

Most reliable workaround: open claude.ai/code in Safari on your phone — your active session appears in the list there. Alternatively, copy the session URL from the terminal and paste it directly in Safari. Both paths bypass the app’s sync bug entirely.

| Version | Feature |
|---|---|
| 2.1.51 | Initial Remote Control feature (Research Preview) |
| 2.1.53 | Stability improvements and bug fixes |

🎯 Section 9 Recap: Pattern Mastery Checklist

Section titled “🎯 Section 9 Recap: Pattern Mastery Checklist”

Before moving to Section 10 (Reference), verify you understand:

Core Patterns:

  • Trinity Pattern: Plan Mode → Extended Thinking → Sequential MCP for critical work
  • Composition: Agents + Skills + Hooks working together seamlessly
  • CI/CD Integration: Automated reviews and quality gates in pipelines
  • IDE Integration: VS Code + Claude Code = seamless development flow

Productivity Patterns:

  • Tight Feedback Loops: Test-driven workflows with instant validation
  • Todo as Instruction Mirrors: Keep context aligned with reality
  • Vibe Coding: Skeleton → iterate → production-ready
  • Batch Operations: Process multiple files efficiently

Quality Awareness:

  • Common Pitfalls: Understand security, performance, workflow mistakes
  • Continuous Improvement: Refine over multiple sessions with learning mindset
  • Best Practices: Do/Don’t patterns for professional work
  • Development Methodologies: TDD, SDD, BDD, and other structured approaches
  • Codebase Design for Agents: Optimize code for agent productivity (domain knowledge, discoverability, testing)

Communication Patterns:

  • Named Prompting Patterns: As If, Constraint, Explain First, Rubber Duck, Incremental, Boundary
  • Mermaid Diagrams: Generate visual documentation for architecture and flows

Advanced Workflows:

  • Session Teleportation: Migrate sessions between cloud and local environments
  • Remote Control: Monitor/control local sessions from mobile or browser (Research Preview, Pro/Max)
  • Background Tasks: Run tasks in cloud while working locally (% prefix)
  • Multi-Instance Scaling: Understand when/how to orchestrate parallel Claude instances (advanced teams only)
  • Agent Teams: Multi-agent coordination for read-heavy tasks (experimental, Opus 4.6+)
  • Permutation Frameworks: Systematically test multiple approaches before committing
  • Legacy Modernization: 4-step workflow (Discovery → Risk → Planning → Incremental) for large legacy codebases

Section 10 is your command reference — bookmark it for quick lookups during daily work.

You’ve mastered the concepts and patterns. Now Section 10 gives you the technical reference for efficient execution.


9.23 Configuration Lifecycle & The Update Loop

Section titled “9.23 Configuration Lifecycle & The Update Loop”

Reading time: 8 minutes Skill level: Month 1+

See also: §9.10 Continuous Improvement Mindset — the conceptual foundation for this section. §9.23 is the operational layer: detecting when to act, and how.

As your Claude Code setup matures — skills, agents, rules, CLAUDE.md — a silent failure mode emerges: your configuration drifts away from how you actually work. Skills accumulate assumptions that no longer hold. CLAUDE.md describes a codebase that has evolved. Rules cover edge cases that became the norm. The agent keeps making the same correctable mistakes because nothing captures what you learned last week.

This section covers how to detect that drift early and close the loop — turning session observations into concrete config improvements.


Staleness doesn’t happen in one go. It accumulates from small gaps:

  • A skill was written for a v1 API that’s now v2 — the skill still “works” but generates code that needs manual fixing every time
  • CLAUDE.md has context that’s 6 months old — the agent reasons from a mental model of the codebase that no longer exists
  • A rule was added for an edge case that’s now the default pattern — it fires constantly and you’ve stopped reading its output
  • You’ve corrected the same mistake across 5 sessions — but nothing ever captured that correction as a rule

The signal is always there: you keep doing the same manual fixes. The work is identifying which fixes are worth encoding.


Your sessions are already logged (see §Observability: Setting Up Session Logging). What’s missing is reading them for quality signals, not just cost metrics.

Three patterns that reliably indicate a skill or rule needs updating:

| Pattern | Signal | Likely Cause |
|---|---|---|
| Same file read multiple times per session | Missing context | Content should move to CLAUDE.md or a skill |
| Tool failure followed immediately by retry | Wrong assumption | A skill has an outdated command or path |
| User correction immediately after assistant turn | Prompt gap | A skill or rule doesn’t cover this case |

Run this script weekly against your session logs to surface these patterns:

scripts/detect-friction.sh
#!/bin/bash
# Usage: ./scripts/detect-friction.sh [days-back]
# Requires: jq
DAYS=${1:-7}
LOG_DIR="${CLAUDE_LOG_DIR:-$HOME/.claude/logs}"
SINCE=$(date -v-${DAYS}d +%Y-%m-%d 2>/dev/null || date -d "-${DAYS} days" +%Y-%m-%d)

echo "=== Friction Report — last ${DAYS} days ==="
echo

# 1. Files read more than 3x in any single session
echo "## Repeated Reads (same file >3x in one session)"
for f in "$LOG_DIR"/activity-*.jsonl; do
  [[ "$(basename "$f" .jsonl | cut -d- -f2-)" < "$SINCE" ]] && continue
  jq -r 'select(.tool == "Read") | .file' "$f" 2>/dev/null
done | sort | uniq -c | sort -rn | awk '$1 > 3 {print "  " $1 "x  " $2}'
echo

# 2. Tool failures (Bash exit non-zero)
echo "## Tool Failures (potential stale commands in skills)"
for f in "$LOG_DIR"/activity-*.jsonl; do
  [[ "$(basename "$f" .jsonl | cut -d- -f2-)" < "$SINCE" ]] && continue
  jq -r 'select(.tool == "Bash" and (.exit_code // 0) != 0) | .command' "$f" 2>/dev/null
done | sort | uniq -c | sort -rn | head -10 | awk '{print "  " $0}'
echo

# 3. Most-edited files (proxy for agent missing context)
echo "## Most Edited Files (context gap candidates)"
for f in "$LOG_DIR"/activity-*.jsonl; do
  [[ "$(basename "$f" .jsonl | cut -d- -f2-)" < "$SINCE" ]] && continue
  jq -r 'select(.tool == "Edit") | .file' "$f" 2>/dev/null
done | sort | uniq -c | sort -rn | head -10 | awk '{print "  " $1 "x  " $2}'
echo
echo "→ For each friction point, ask: is there a skill, rule, or CLAUDE.md section that should cover this?"

Skills accumulate. Without a lifecycle policy, you end up with 20+ skills where half are unused, two contradict each other, and none have version history.

When to create a skill:

A task is worth encoding as a skill when you’ve done it manually 3+ times and the steps are stable enough to write down. If you’re still figuring out the right approach, don’t encode it yet — premature skills crystallize bad patterns.

When to update a skill (patch):

  • A command in the skill fails because an API or path changed
  • The output needs a small clarification you keep adding manually
  • You added a convention and the skill doesn’t reflect it yet

When to version a skill (minor/major):

Add a version field and updated date to your skill frontmatter:

---
version: 1.2.0
updated: 2026-03-02
breaking_since: null
---

Use a simple policy:

  • patch (x.x.Z): rewording, clarification, examples added — no behavior change
  • minor (x.Y.z): new instructions, extended scope, new behavior opt-in
  • major (X.y.z): default behavior changes — annotate what broke and when in your CHANGELOG
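The versioning policy above is only useful if something reads the frontmatter. A minimal sketch of a frontmatter reader you could call from a pre-merge check (`skill_version` is a hypothetical helper name):

```shell
# Sketch: extract the `version:` field from a skill file's YAML frontmatter
# (the block between the first pair of `---` lines).
skill_version() {
  awk '/^---$/ { n++; next } n == 1 && /^version:/ { print $2; exit }' "$1"
}
```

For example, a CI step could compare `skill_version` on the base branch vs. the PR branch and fail if a skill's body changed without a version bump.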

When to deprecate a skill:

Add a deprecated: true flag and a note explaining what replaced it. Don’t delete immediately — other skills or commands may reference it.

CI staleness check — CLAUDE.md vs source modules:

If your CLAUDE.md is assembled from source modules (e.g., via a pnpm ai:configure pipeline), add a CI job to catch divergence before it causes silent failures:

.github/workflows/ai-config-check.yml
name: AI Config Staleness Check
on:
  push:
    paths:
      - '.claude/rules/**'
      - '.claude/skills/**'
      - '.claude/agents/**'
      - 'CLAUDE.md.src/**' # adjust to your source dir
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify CLAUDE.md is up to date
        run: |
          # Regenerate and compare
          pnpm ai:configure --dry-run > /tmp/expected-claude.md
          if ! diff -q CLAUDE.md /tmp/expected-claude.md > /dev/null; then
            echo "❌ CLAUDE.md is stale. Run: pnpm ai:configure"
            diff CLAUDE.md /tmp/expected-claude.md
            exit 1
          fi
          echo "✅ CLAUDE.md is up to date"

The update loop formalizes what you already do informally: something doesn’t work well → you notice → you fix it. The difference is making the “notice” step systematic rather than accidental.

THE UPDATE LOOP

Session → Observe friction (repeated fixes, tool fails)
    ↓
Analyze root cause (which skill/rule is missing?)
    ↓
Delta update (targeted edit, not rewrite)
    ↓
Canary test (verify the fix holds)
    ↓
Next session → repeat

The delta update principle: when updating a skill or rule, make the smallest targeted edit that fixes the observed problem. Don’t rewrite the whole skill — you’ll lose what was working. One problem, one edit, one test.

Integrating into /tech:handoff:

If you use a handoff command to persist session context, add a mandatory retrospective step before saving:

# Append to your handoff command prompt
Before saving context, answer:
- Which rules or skills were missing for today's work?
- Which corrections did you make more than once?
- What's the smallest edit that would prevent the most repeated friction?
Save conclusions via: write_memory("retro_[date]", your answers)

Canary testing a skill after update:

Before committing a skill change, verify it still produces the expected output on a known input:

Terminal window
# Example: test that typescript-aristote skill generates Zod validation
claude -p "Using the typescript-aristote skill: create a basic user tRPC router" \
--output-format text | grep -qE "(z\.object|publicProcedure)" \
&& echo "✅ Canary passed" \
|| echo "❌ Canary failed — skill may have regressed"

Run canary tests before merging skill changes, especially for skills that other agents depend on.


If you want to automate prompt optimization beyond the manual update loop, two frameworks are worth knowing:

DSPy (Stanford, open-source) — optimizes prompts programmatically given a metric and a set of examples. Requires 20+ labeled examples per skill for reliable results. Useful when you have a well-defined task and enough session history to build a dataset. dspy.ai

TextGrad — treats prompts as differentiable parameters and iterates using LLM-generated feedback as “gradients”. Better for creative or domain-specific tasks where the evaluation is qualitative. github.com/zou-group/textgrad

Both require more setup than the manual loop above, and neither eliminates the need for human judgment on what to optimize. Start with the update loop and canary tests — they’ll surface most of the value with a fraction of the overhead.


What’s Next?


Quick jump: Commands Table · Keyboard Shortcuts · Configuration Reference · Troubleshooting · Cheatsheet · Daily Workflow