
Agent Teams Quick Start Guide

Practical guide for using agent teams in your projects.

Reading time: 8-10 min | Full documentation: Agent Teams (30 min overview)

You know agent teams exist. You’ve read the theory. But when should you actually use them in your projects?

This guide gives you:

  • 5-minute setup (environment → first test)
  • 4 copy-paste patterns for real projects (Guide + RTK)
  • Decision matrix (when YES, when NO)
  • Metrics to measure ROI
  • Red flags to avoid waste

Skip if: You want theory → Read Agent Teams full doc instead.


  1. 5-Minute Setup
  2. Patterns for Your Projects
  3. Decision Matrix: When to Use
  4. Minimal Workflow Template
  5. Success Metrics
  6. Limitations & Red Flags

1. 5-Minute Setup

Terminal window
# Check Claude Code version (v2.1.32+ required)
claude --version
# Check model availability
claude
> /model opus
# Should show: "Model changed to opus (claude-opus-4-6-20250624)"

Minimum requirements:

  • Claude Code v2.1.32+
  • Opus 4.6 model
  • Git repository (agent teams use git for coordination; see the quick check below)
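A quick preflight sketch to confirm these prerequisites in one pass (the exact version output depends on your installation):

```bash
# Preflight: confirm prerequisites before enabling agent teams
claude --version                       # expect v2.1.32 or newer
git rev-parse --is-inside-work-tree    # prints "true" when run inside a git repository
```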
Terminal window
# Set environment variable (add to ~/.bashrc or ~/.zshrc for persistence)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Launch Claude Code
claude
> Are agent teams enabled?

Expected response:

Yes, agent teams are enabled in this session. I can create teams
of agents to work in parallel on complex tasks using:
- Multi-agent coordination
- Git-based task claiming
- Autonomous team coordination

If disabled: Check environment variable is set (echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS).
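If it still reports disabled, a minimal troubleshooting sketch for a bash/zsh shell (the variable name is the one used in this guide; relaunching Claude Code from the same shell afterwards is assumed):

```bash
# Is the flag visible to the shell that launches Claude Code?
echo "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-unset}"   # should print 1

# If it prints "unset", export it and relaunch from the same shell
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude
```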

> Create a simple test team to analyze this README:
> - Agent 1: Check structure (sections, TOC, badges)
> - Agent 2: Check content quality (clarity, examples, completeness)

What happens:

  1. Claude spawns 2 agents (you’ll see “Creating team…” message)
  2. Agents work in parallel (you may see “Idle” messages - this is normal)
  3. Claude presents consolidated findings (convergence + unique insights)

Navigation:

  • Shift+Down: Cycle through teammate outputs (in-process mode)
  • Main view: Consolidated synthesis

Duration: 1-2 min for simple 2-agent task.


2. Patterns for Your Projects

2.1 Claude Code Guide - Pre-Release Review


Use case: Systematic audit before a version bump to catch consistency issues (broken links, desynced counts, wrong version numbers)

Trigger: Before every /release command execution

Duration: 3-5 min

Team composition:

Team: pre-release-audit (3 agents)
├─ accuracy-auditor → Verify claims, stats, versions, external links
├─ consistency-checker → Check template count, eval count, guide lines match across files
└─ breaking-checker → Identify breaking changes vs last release

Copy-paste prompt:

> Create a pre-release audit team:
> - Accuracy: Check all claims, stats, version numbers, external links in CHANGELOG.md, README.md, and guide/ultimate-guide.md
> - Consistency: Verify template count (count examples/ files), eval count (count docs/resource-evaluations/ files), guide lines (wc -l guide/ultimate-guide.md) match across README.md, guide/cheatsheet.md, machine-readable/reference.yaml
> - Breaking: Identify breaking changes vs v3.23.0 by analyzing CHANGELOG.md [Unreleased] section
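To spot-check the consistency agent's numbers yourself, the same counts referenced in the prompt can be produced manually (a sketch; the paths are the ones used throughout this guide's repository):

```bash
# Manual version of the counts the consistency agent compares
find examples/ -type f | wc -l                         # template count
find docs/resource-evaluations/ -name "*.md" | wc -l   # eval count
wc -l guide/ultimate-guide.md                           # guide line count
# Then confirm the same numbers appear in README.md, guide/cheatsheet.md
# and machine-readable/reference.yaml.
```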

ROI: Catches bugs like:

  • LinkedIn URL corruption (caught in real audit today)
  • Desync between guide/landing (template counts)
  • Wrong version numbers in multiple files
  • Broken external links

When to skip: Simple typo fixes (no version/count/link changes).

Real example (test from 2026-02-08):

Findings:
- Accuracy: 1 critical (LinkedIn URL malformed), 3 warnings (external links to verify)
- Consistency: 2 criticals (template count mismatch README vs landing, line numbers outdated in reference.yaml)
- Breaking: 0 breaking changes detected vs v3.23.0 (clean release)
Convergence: 2 agents flagged template count issue (high confidence)
Time: 4 min 12 sec
Verdict: ✅ High value, found 3 criticals that would've shipped

2.2 Claude Code Guide - Landing Sync Check

Use case: Verify guide/landing synchronization (version, counts, content) without running the manual sync script

Trigger: After significant guide modifications (version bump, template additions, FAQ changes)

Duration: 2-3 min

Team composition:

Team: landing-sync (2 agents)
├─ guide-scanner → Extract version, template count, eval count, guide lines, FAQ content
└─ landing-scanner → Compare with index.html, examples.html, check sync status

Copy-paste prompt:

> Validate guide/landing synchronization:
> - Guide scanner: Extract version from VERSION file, template count (find examples/ -type f | wc -l), eval count (find docs/resource-evaluations/ -name "*.md" | wc -l), guide lines (wc -l guide/ultimate-guide.md), FAQ entries from README.md
> - Landing scanner: Check /Users/florianbruniaux/Sites/perso/claude-code-ultimate-guide-landing/index.html and examples.html for version in footer+FAQ, template count in badges, eval count, guide lines approximation (~9800+)
>
> Report: Synced ✅ / Mismatches with line numbers
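For reference, a rough manual equivalent of what the two scanners compare, assuming the landing repository path from the prompt above (the grep patterns are illustrative; the actual markers in index.html may differ):

```bash
# Guide side: source-of-truth values
cat VERSION
find examples/ -type f | wc -l
wc -l guide/ultimate-guide.md

# Landing side: look for the same values (path taken from the prompt above)
LANDING=/Users/florianbruniaux/Sites/perso/claude-code-ultimate-guide-landing
grep -n "v3\." "$LANDING/index.html" | head       # version strings in footer/FAQ
grep -ni "template" "$LANDING/index.html" | head  # badges citing the template count
```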

ROI:

  • Zero desync shipped to production
  • Avoids manual ./scripts/check-landing-sync.sh execution
  • Faster than script (2 min vs 5 min to run script + fix)

When to skip: Pure code changes (no docs/counts affected).

Success criteria:

✅ All synced: Version, template count, eval count match
⚠️ Mismatch: Specific line numbers in index.html to fix

2.3 Claude Code Guide - Multi-File Doc Update


Use case: Add new feature documentation across multiple files (ultimate-guide.md, reference.yaml, README.md) with cross-reference consistency

Trigger: New feature section (>50 lines), breaking changes, architecture updates

Duration: 5-8 min

Team composition:

Team: doc-update (3 agents)
├─ content-writer → Write main section in ultimate-guide.md with examples
├─ index-updater → Update TOC, reference.yaml entries, README navigation
└─ consistency-checker → Verify cross-references, line numbers, anchors work

Copy-paste prompt:

> Update documentation for new "[FEATURE NAME]" feature:
> - Content Scope: Write section [X.Y] in guide/ultimate-guide.md with:
>   - Overview (what/why/when)
>   - 2-3 concrete examples
>   - Best practices + gotchas
>   - Links to related sections
>   Context: guide/ultimate-guide.md section [X.Y] only
> - Index Scope: Update:
>   - guide/ultimate-guide.md TOC (add section [X.Y])
>   - machine-readable/reference.yaml (add entry with line numbers)
>   - README.md navigation (add link if major feature)
>   Context: Index files only (TOC, reference.yaml, README.md)
> - Consistency Scope: Verify:
>   - All cross-references resolve correctly
>   - Line numbers in reference.yaml match actual content
>   - Anchors in README point to correct sections
>   - No broken internal links
>   Context: All modified files for cross-reference validation
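The consistency scope can also be approximated by hand with a few greps; a minimal sketch (the patterns below are illustrative, not the agent's actual method):

```bash
# List internal links and anchors, then compare against the actual headings
grep -no '\[[^]]*\](#[^)]*)' guide/ultimate-guide.md | head    # in-file anchor links
grep -no '\[[^]]*\]([^)]*\.md[^)]*)' README.md | head          # cross-file links
grep -n "^## " guide/ultimate-guide.md | head                   # real heading positions,
# to compare against the line numbers recorded in machine-readable/reference.yaml
```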

ROI:

  • Zero broken links (consistency-checker catches all)
  • 60% faster than sequential (write → index → verify)
  • Parallel work reduces waiting time

When to skip: Single-file edits (<50 lines), no cross-references needed.

Real example (Agent Teams section addition):

Task: Add "Agent Teams" as section 9.20 (300 lines)
Files touched: 3 (ultimate-guide.md, reference.yaml, README.md)
Sequential estimate: 12-15 min (write → index → verify)
Agent Teams: 7 min 30 sec (3 agents parallel)
Savings: 40% time + zero manual cross-ref checks

2.4 RTK - Security PR Review

Use case: Review an external contributor PR for security issues (injection, token leaks), Rust idioms, and performance

Trigger: PR opened by non-core contributor, PR touches sensitive code (auth, external commands, regex)

Duration: 5-8 min

Team composition:

Team: security-pr-review (3 agents)
├─ rust-expert → Check ownership patterns, error handling (anyhow/thiserror), idiomatic code
├─ security-auditor → Scan for injection risks, token leaks, input sanitization
└─ perf-analyzer → Review allocations, async patterns, compiled regex

Copy-paste prompt:

> Review PR #[NUMBER] with scope-focused analysis:
> - Rust Scope: Check:
>   - Ownership patterns (prefer &str over String, minimize clones)
>   - Error handling (anyhow::Result with .context(), no unwrap outside tests)
>   - Idiomatic code (impl after type, #[cfg(test)] mod tests)
>   - Clippy compliance (zero warnings)
>   Context: All modified .rs files
> - Security Scope: Scan for:
>   - Command injection (shell escapes, argument sanitization)
>   - Token/credential leaks (hardcoded secrets, logs, error messages)
>   - Input sanitization (path traversal, regex DoS)
>   - File operations (path validation, permissions)
>   Context: Input handling, auth, file I/O code
> - Performance Scope: Review:
>   - Unnecessary allocations (String::from vs &str)
>   - Async patterns (spawn_blocking for CPU-bound work)
>   - Compiled regex (lazy_static! for hot paths)
>   - Algorithm complexity (O(n) vs O(n²))
>   Context: Hot paths, loops, async functions
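Before spawning the team (or to sanity-check its findings), a few cheap greps cover the most mechanical parts of these scopes. A rough sketch, assuming ripgrep and cargo are available; it is not a substitute for the review:

```bash
# First-pass greps over the PR's Rust changes (illustrative patterns)
rg -n '\.unwrap\(\)|\.expect\(' src/       # unwrap/expect in production code (Rust scope)
rg -n 'Command::new|std::process' src/     # command invocations worth auditing (Security scope)
rg -n 'Regex::new' src/                    # regexes possibly compiled per call (Performance scope)
cargo clippy --all-targets -- -D warnings  # Clippy compliance (zero warnings)
```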

ROI:

  • Blind-spot detection: Security + Rust + Perf covers areas a solo reviewer would miss
  • Consistent review quality (not dependent on reviewer mood/focus)
  • Faster than sequential (3 agents parallel vs 3-pass review)

When to skip: Internal PRs from trusted contributors, trivial changes (docs, comments, tests only).

Success criteria:

✅ Convergence: 2+ agents flag same critical issue (high confidence)
✅ Unique insights: Each agent finds domain-specific issues (Rust/Security/Perf)
❌ False positives: <20% of findings are invalid

Real example (hypothetical external PR):

PR: Add new git filter command
Agents findings:
- Rust: 3 issues (unwrap in production code, missing .context(), non-idiomatic error handling)
- Security: 2 criticals (shell injection via user input, token leak in error message)
- Perf: 1 issue (regex compiled on every call, not lazy_static)
Convergence: Security + Rust both flagged missing input sanitization (high confidence)
Time: 6 min 40 sec
Verdict: ✅ Critical security issues caught, PR requires revision

3. Decision Matrix: When to Use

| Situation | Agent Teams? | Reason |
|-----------|--------------|--------|
| Pre-release review (Guide) | ✅ YES | Multi-layer audit (accuracy + consistency + breaking) requires parallel perspectives |
| Simple typo fix | ❌ NO | Overkill: 1 agent = 10 sec, 3 agents = cost bloat |
| External PR (RTK) | ✅ YES | Security + Rust + Perf = blind-spot detection, high-stakes review |
| Multi-file doc update (Guide) | ✅ YES | Content + Index + Consistency = zero broken links, parallel work |
| Landing sync check | ⚠️ MAYBE | Use agent teams if desync suspected; else ./scripts/check-landing-sync.sh is faster |
| CHANGELOG update | ❌ NO | Sequential task (linear writing), no parallelization benefit |
| Single-file edit (RTK) | ❌ NO | No coordination needed, sequential OK |
| Small README tweak (<50 lines) | ❌ NO | No cross-references, no complexity, single agent faster |
| Architecture design | ✅ YES | Multiple perspectives (frontend, backend, infra, security) reveal blind spots |
| Bug investigation | ⚠️ MAYBE | Simple bugs → NO; complex multi-component failures → YES |

Use Agent Teams when:

  • ✅ You’d naturally think “I should check X, Y, and Z”
  • ✅ High stakes (production release, external contributor, security-sensitive)
  • ✅ Multi-scope analysis needed (Rust scope + Security scope + Performance scope)
  • ✅ Cross-file consistency matters (links, counts, versions sync)
  • ✅ Parallel work possible (independent tasks, no sequential dependency)

Don’t use Agent Teams when:

  • ❌ Simple task (<5 files, <100 lines, 1 domain)
  • ❌ Sequential workflow (step B depends on step A result)
  • ❌ Budget tight (3x tokens, reserve for high-value tasks)
  • ❌ Write-heavy (many edits to same files = merge conflicts)

4. Minimal Workflow Template

Terminal window
# 1. Setup (once per session)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# 2. Launch Claude
claude
# 3. Create team (prompt template)
> Create a team to [TASK]:
> - Agent 1 ([SCOPE/CONTEXT]): [SPECIFIC MISSION]
> - Agent 2 ([SCOPE/CONTEXT]): [SPECIFIC MISSION]
> - Agent 3 ([SCOPE/CONTEXT]): [SPECIFIC MISSION]
>
> [FILES/DIRECTORIES TO ANALYZE]
# 4. Observe (optional)
# Shift+Down to cycle through teammate outputs (in-process mode)
# 5. Synthesis
# Claude presents consolidated findings automatically
# 6. Act
# Fix critical findings, skip minor ones

Copy-paste prompts (one per pattern):

> Create a pre-release audit team:
> - Accuracy: Check all claims, stats, version numbers, external links in CHANGELOG.md, README.md, and guide/ultimate-guide.md
> - Consistency: Verify template count (count examples/ files), eval count (count docs/resource-evaluations/ files), guide lines (wc -l guide/ultimate-guide.md) match across README.md, guide/cheatsheet.md, machine-readable/reference.yaml
> - Breaking: Identify breaking changes vs v3.23.0 by analyzing CHANGELOG.md [Unreleased] section

> Review PR #42 with scope-focused analysis:
> - Rust Scope: Check ownership patterns, error handling (anyhow/thiserror), idiomatic code in modified files (context: src/**/*.rs)
> - Security Scope: Scan for injection risks, token leaks, input sanitization (context: auth, input handling code)
> - Performance Scope: Review allocations, async patterns, compiled regex (context: hot paths, loops)

> Update documentation for new "Agent Teams Quick Start" feature:
> - Content Scope: Write guide/workflows/agent-teams-quick-start.md with overview, 4 patterns, decision matrix, metrics
> - Index Scope: Update guide/ultimate-guide.md (add reference section 9.20), machine-readable/reference.yaml (add entry), CHANGELOG.md (add "Added" entry)
> - Consistency Scope: Verify all cross-refs work, line numbers match, no broken links (context: all modified files)

> Validate guide/landing synchronization:
> - Guide scanner: Extract version from VERSION file, template count (find examples/ -type f | wc -l), eval count (find docs/resource-evaluations/ -name "*.md" | wc -l), guide lines (wc -l guide/ultimate-guide.md)
> - Landing scanner: Check /Users/florianbruniaux/Sites/perso/claude-code-ultimate-guide-landing/index.html and examples.html for version, counts, guide lines approximation

5. Success Metrics

| Metric | Target | How to Measure |
|--------|--------|----------------|
| Convergence rate | >50% | Count findings flagged by 2+ agents / total findings. High convergence = high confidence. |
| Unique insights | Each agent ≥1 | Each agent must find at least 1 unique issue in its domain. An agent with 0 unique findings = wasted tokens. |
| False positive rate | <20% | Count invalid findings / total findings. Too many false positives = poor prompts. |
| Time saving | 60-70% | Compare agent teams time vs sequential (estimate 3x single-agent time for 3 tasks). |
| Bug catch rate | >80% | Count critical bugs found by agents / total bugs found post-ship. High = effective prevention. |
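If you log the raw counts, the ratios in this table are one-liners; a throwaway sketch with placeholder values:

```bash
# Placeholder counts from one run; substitute your own numbers
total_findings=30; convergent=4; false_positives=2
awk -v c="$convergent" -v t="$total_findings" 'BEGIN { printf "convergence rate: %.0f%%\n", 100*c/t }'
awk -v f="$false_positives" -v t="$total_findings" 'BEGIN { printf "false positive rate: %.0f%%\n", 100*f/t }'
```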

Real Example: Pre-Release Review Test (2026-02-08)

Task: Pre-release audit for v3.23.1
Agents: 3 (accuracy-auditor, consistency-checker, breaking-checker)
Duration: 4 min 12 sec
Findings (raw): 45 total
├─ Accuracy: 12 (1 critical: LinkedIn URL malformed, 11 warnings)
├─ Consistency: 18 (2 criticals: template count desync, line numbers outdated)
└─ Breaking: 15 (0 breaking changes, 15 informational notes)
Findings (deduplicated): ~30 unique issues
Convergence analysis:
├─ High confidence (2-3 agents): 4 issues
│ ├─ Template count mismatch (consistency + accuracy)
│ ├─ Line numbers outdated (consistency + accuracy)
│ ├─ External link verification needed (accuracy + breaking)
│ └─ Version sync across files (all 3 agents)
└─ Unique insights:
├─ Accuracy: LinkedIn URL corruption (only this agent caught it)
├─ Consistency: TOC structure deviation (only this agent)
└─ Breaking: Changelog format improvement suggestion (only this agent)
Metrics:
├─ Convergence rate: 4/30 = 13% (lower than target, but 4 criticals flagged by multiple agents = high confidence on what matters)
├─ Unique insights: 3/3 agents = 100% (each agent found unique issues in their domain)
├─ False positive rate: 2/30 = 6.6% (below 20% target ✅)
├─ Time saving: 4 min vs estimated 12 min sequential = 66% savings ✅
└─ Bug catch rate: 3 critical bugs caught that would've shipped = prevented production issues ✅
Verdict: ✅ High value for pre-release audits

After each agent teams task:

  1. Count findings: Note raw findings per agent, then deduplicate
  2. Mark convergence: Which issues were flagged by 2+ agents?
  3. Check unique insights: Did each agent find at least 1 domain-specific issue?
  4. Verify false positives: How many findings were invalid/noise?
  5. Time comparison: Agent teams duration vs estimated sequential time
  6. Post-ship validation: Did agent teams catch bugs that would’ve shipped?

Keep a log (Markdown table in project docs):

| Date | Task | Agents | Duration | Findings | Convergence | Unique | False+ | Time Saved | Bugs Caught |
|------|------|--------|----------|----------|-------------|--------|--------|------------|-------------|
| 2026-02-08 | Pre-release v3.23.1 | 3 | 4m12s | 30 | 13% (4 critical) | 3/3 | 6.6% | 66% | 3 |

Adjust prompts if metrics fail:

  • Low convergence (<30%) → Scopes too narrow, overlap context boundaries more
  • No unique insights → Scopes too similar, diversify analysis angles
  • High false positives (>20%) → Prompts too vague, add concrete criteria

6. Limitations & Red Flags

| Limitation | What It Means | Mitigation |
|------------|---------------|------------|
| 3x tokens | Each agent = separate model call = 3x cost | Reserve for high-stakes tasks (pre-release, security PRs), not typo fixes |
| Idle spam | Agents show "Idle" messages during coordination | Normal behavior, not a bug; ignore it |
| Experimental | Research preview = stability not guaranteed | Expect bugs; don't rely on agent teams for production-critical workflows |
| Coordination overhead | 3-5 agents max, not 10 (coordination complexity grows) | Stick to 2-4 agents; avoid "team of 10" prompts |
| Context isolation | Agents don't see each other's discoveries (they work independently) | Claude synthesizes findings, but agents can't build on each other's work mid-task |

Simple task (<5 files, <100 lines, 1 domain)

  • Example: Fix typo in README.md
  • Why avoid: 3x tokens for 10-second task = waste

Sequential workflow (step B depends on step A result)

  • Example: Implement feature → Write tests → Deploy
  • Why avoid: Agents work in parallel, can’t handle dependencies

Budget tight (3x tokens, optimize cost)

  • Example: Personal project with limited API credits
  • Why avoid: Agent teams = luxury for high-stakes tasks, not daily workflow

Write-heavy (many edits to same files)

  • Example: Refactor entire codebase structure
  • Why avoid: Merge conflicts, coordination overhead, agents stepping on each other

Low-stakes review (internal PR, trusted contributor, simple change)

  • Example: Team member fixes small bug
  • Why avoid: Overkill, single-agent review faster + cheaper

Conversely, agent teams are worth the 3x token cost when the task is:

  • ✅ High-stakes (production release, security-sensitive, external contributor)
  • ✅ Multi-domain (Rust + Security + Performance = blind spots)
  • ✅ Parallel-friendly (independent tasks, no sequential dependencies)
  • ✅ Consistency-critical (cross-file sync, counts, versions)
  • ✅ A learning opportunity (understand blind spots, improve prompts)


Terminal window
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude
> Are agent teams enabled? # Verify

YES if:

  • High stakes + Multi-domain + Parallel-friendly

NO if:

  • Simple task + Sequential + Budget tight

4 patterns:

  1. Pre-Release Review (Guide) → Accuracy + Consistency + Breaking
  2. Landing Sync (Guide) → Guide scanner + Landing scanner
  3. Multi-File Doc Update (Guide) → Content + Index + Consistency
  4. Security PR Review (RTK) → Rust + Security + Performance

Metric targets:

  • Convergence: >50% (findings by 2+ agents)
  • Unique insights: each agent ≥1
  • False positives: <20%
  • Time saving: 60-70%
  • Bug catch: >80%

Red flags:

  • ❌ Simple task (<5 files)
  • ❌ Sequential workflow
  • ❌ Budget tight
  • ❌ Write-heavy (merge conflicts)

  1. Try first test (5 min setup + simple 2-agent task)
  2. Pick 1 pattern from your project (Guide or RTK)
  3. Measure metrics (convergence, unique insights, time saved)
  4. Adjust prompts based on results
  5. Read full doc for advanced patterns: Agent Teams

Got questions? Check full documentation for:

  • Architecture deep-dive (how git coordination works)
  • Advanced use cases (15+ production scenarios)
  • Troubleshooting (common issues + solutions)
  • Best practices (team size, prompt design, conflict resolution)