
Agent Teams Quick Start Guide

Practical guide for using agent teams in your projects.

Reading time: 8-10 min | Full documentation: Agent Teams (30 min overview)

You know agent teams exist. You’ve read the theory. But when should you actually use them in your projects?

This guide gives you:

  • 5-minute setup (environment → first test)
  • 4 copy-paste patterns for real projects (Guide + RTK)
  • Decision matrix (when YES, when NO)
  • Metrics to measure ROI
  • Red flags to avoid waste

Skip if: You want theory → Read Agent Teams full doc instead.


  1. 5-Minute Setup
  2. Patterns for Your Projects
  3. Decision Matrix: When to Use
  4. Minimal Workflow Template
  5. Success Metrics
  6. Limitations & Red Flags

1. 5-Minute Setup

Terminal window
# Check Claude Code version (v2.1.32+ required)
claude --version
# Check model availability
claude
> /model opus
# Should show: "Model changed to opus (claude-opus-4-6-20250624)"

Minimum requirements:

  • Claude Code v2.1.32+
  • Opus 4.6 model
  • Git repository (agent teams use git for coordination; see the quick check below)
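A quick preflight sketch to confirm these prerequisites in one pass (the exact version output depends on your installation):

```bash
# Preflight: confirm prerequisites before enabling agent teams
claude --version                       # expect v2.1.32 or newer
git rev-parse --is-inside-work-tree    # prints "true" when run inside a git repository
```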
Terminal window
# Set environment variable (add to ~/.bashrc or ~/.zshrc for persistence)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# Launch Claude Code
claude
> Are agent teams enabled?

Expected response:

Yes, agent teams are enabled in this session. I can create teams
of agents to work in parallel on complex tasks using:
- Multi-agent coordination
- Git-based task claiming
- Autonomous team coordination

If disabled: Check environment variable is set (echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS).
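If it still reports disabled, a minimal troubleshooting sketch for a bash/zsh shell (the variable name is the one used in this guide; relaunching Claude Code from the same shell afterwards is assumed):

```bash
# Is the flag visible to the shell that launches Claude Code?
echo "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-unset}"   # should print 1

# If it prints "unset", export it and relaunch from the same shell
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude
```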

> Create a simple test team to analyze this README:
> - Agent 1: Check structure (sections, TOC, badges)
> - Agent 2: Check content quality (clarity, examples, completeness)

What happens:

  1. Claude spawns 2 agents (you’ll see “Creating team…” message)
  2. Agents work in parallel (you may see “Idle” messages - this is normal)
  3. Claude presents consolidated findings (convergence + unique insights)

Navigation:

  • Shift+Down: Cycle through teammate outputs (in-process mode)
  • Main view: Consolidated synthesis

Duration: 1-2 min for simple 2-agent task.


2. Patterns for Your Projects

2.1 Claude Code Guide - Pre-Release Review


Use case: Systematic audit before a version bump to catch consistency issues (broken links, desynced counts, wrong version numbers)

Trigger: Before every /release command execution

Duration: 3-5 min

Team composition:

Team: pre-release-audit (3 agents)
├─ accuracy-auditor → Verify claims, stats, versions, external links
├─ consistency-checker → Check template count, eval count, guide lines match across files
└─ breaking-checker → Identify breaking changes vs last release

Copy-paste prompt:

> Create a pre-release audit team:
> - Accuracy: Check all claims, stats, version numbers, external links in CHANGELOG.md, README.md, and guide/ultimate-guide.md
> - Consistency: Verify template count (count examples/ files), eval count (count docs/resource-evaluations/ files), guide lines (wc -l guide/ultimate-guide.md) match across README.md, guide/cheatsheet.md, machine-readable/reference.yaml
> - Breaking: Identify breaking changes vs v3.23.0 by analyzing CHANGELOG.md [Unreleased] section
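To spot-check the consistency agent's numbers yourself, the same counts referenced in the prompt can be produced manually (a sketch; the paths are the ones used throughout this guide's repository):

```bash
# Manual version of the counts the consistency agent compares
find examples/ -type f | wc -l                         # template count
find docs/resource-evaluations/ -name "*.md" | wc -l   # eval count
wc -l guide/ultimate-guide.md                           # guide line count
# Then confirm the same numbers appear in README.md, guide/cheatsheet.md
# and machine-readable/reference.yaml.
```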

ROI: Catches bugs like:

  • LinkedIn URL corruption (caught in real audit today)
  • Desync between guide/landing (template counts)
  • Wrong version numbers in multiple files
  • Broken external links

When to skip: Simple typo fixes (no version/count/link changes).

Real example (test from 2026-02-08):

Findings:
- Accuracy: 1 critical (LinkedIn URL malformed), 3 warnings (external links to verify)
- Consistency: 2 criticals (template count mismatch README vs landing, line numbers outdated in reference.yaml)
- Breaking: 0 breaking changes detected vs v3.23.0 (clean release)
Convergence: 2 agents flagged template count issue (high confidence)
Time: 4 min 12 sec
Verdict: ✅ High value, found 3 criticals that would've shipped

2.2 Claude Code Guide - Landing Sync Check

Use case: Verify guide/landing synchronization (version, counts, content) without running the manual sync script

Trigger: After significant guide modifications (version bump, template additions, FAQ changes)

Duration: 2-3 min

Team composition:

Team: landing-sync (2 agents)
├─ guide-scanner → Extract version, template count, eval count, guide lines, FAQ content
└─ landing-scanner → Compare with index.html, examples.html, check sync status

Copy-paste prompt:

> Validate guide/landing synchronization:
> - Guide scanner: Extract version from VERSION file, template count (find examples/ -type f | wc -l), eval count (find docs/resource-evaluations/ -name "*.md" | wc -l), guide lines (wc -l guide/ultimate-guide.md), FAQ entries from README.md
> - Landing scanner: Check /Users/florianbruniaux/Sites/perso/claude-code-ultimate-guide-landing/index.html and examples.html for version in footer+FAQ, template count in badges, eval count, guide lines approximation (~9800+)
>
> Report: Synced ✅ / Mismatches with line numbers
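For reference, a rough manual equivalent of what the two scanners compare, assuming the landing repository path from the prompt above (the grep patterns are illustrative; the actual markers in index.html may differ):

```bash
# Guide side: source-of-truth values
cat VERSION
find examples/ -type f | wc -l
wc -l guide/ultimate-guide.md

# Landing side: look for the same values (path taken from the prompt above)
LANDING=/Users/florianbruniaux/Sites/perso/claude-code-ultimate-guide-landing
grep -n "v3\." "$LANDING/index.html" | head       # version strings in footer/FAQ
grep -ni "template" "$LANDING/index.html" | head  # badges citing the template count
```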

ROI:

  • Zero desync shipped to production
  • Avoids manual ./scripts/check-landing-sync.sh execution
  • Faster than script (2 min vs 5 min to run script + fix)

When to skip: Pure code changes (no docs/counts affected).

Success criteria:

✅ All synced: Version, template count, eval count match
⚠️ Mismatch: Specific line numbers in index.html to fix

2.3 Claude Code Guide - Multi-File Doc Update


Use case: Add new feature documentation across multiple files (ultimate-guide.md, reference.yaml, README.md) with cross-reference consistency

Trigger: New feature section (>50 lines), breaking changes, architecture updates

Duration: 5-8 min

Team composition:

Team: doc-update (3 agents)
├─ content-writer → Write main section in ultimate-guide.md with examples
├─ index-updater → Update TOC, reference.yaml entries, README navigation
└─ consistency-checker → Verify cross-references, line numbers, anchors work

Copy-paste prompt:

> Update documentation for new "[FEATURE NAME]" feature:
> - Content Scope: Write section [X.Y] in guide/ultimate-guide.md with:
>   - Overview (what/why/when)
>   - 2-3 concrete examples
>   - Best practices + gotchas
>   - Links to related sections
>   Context: guide/ultimate-guide.md section [X.Y] only
> - Index Scope: Update:
>   - guide/ultimate-guide.md TOC (add section [X.Y])
>   - machine-readable/reference.yaml (add entry with line numbers)
>   - README.md navigation (add link if major feature)
>   Context: Index files only (TOC, reference.yaml, README.md)
> - Consistency Scope: Verify:
>   - All cross-references resolve correctly
>   - Line numbers in reference.yaml match actual content
>   - Anchors in README point to correct sections
>   - No broken internal links
>   Context: All modified files for cross-reference validation
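The consistency scope can also be approximated by hand with a few greps; a minimal sketch (the patterns below are illustrative, not the agent's actual method):

```bash
# List internal links and anchors, then compare against the actual headings
grep -no '\[[^]]*\](#[^)]*)' guide/ultimate-guide.md | head    # in-file anchor links
grep -no '\[[^]]*\]([^)]*\.md[^)]*)' README.md | head          # cross-file links
grep -n "^## " guide/ultimate-guide.md | head                   # real heading positions,
# to compare against the line numbers recorded in machine-readable/reference.yaml
```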

ROI:

  • Zero broken links (consistency-checker catches all)
  • 60% faster than sequential (write → index → verify)
  • Parallel work reduces waiting time

When to skip: Single-file edits (<50 lines), no cross-references needed.

Real example (Agent Teams section addition):

Task: Add "Agent Teams" as section 9.20 (300 lines)
Files touched: 3 (ultimate-guide.md, reference.yaml, README.md)
Sequential estimate: 12-15 min (write → index → verify)
Agent Teams: 7 min 30 sec (3 agents parallel)
Savings: 40% time + zero manual cross-ref checks

2.4 RTK - Security PR Review

Use case: Review an external contributor PR for security issues (injection, token leaks), Rust idioms, and performance

Trigger: PR opened by non-core contributor, PR touches sensitive code (auth, external commands, regex)

Duration: 5-8 min

Team composition:

Team: security-pr-review (3 agents)
├─ rust-expert → Check ownership patterns, error handling (anyhow/thiserror), idiomatic code
├─ security-auditor → Scan for injection risks, token leaks, input sanitization
└─ perf-analyzer → Review allocations, async patterns, compiled regex

Copy-paste prompt:

> Review PR #[NUMBER] with scope-focused analysis:
> - Rust Scope: Check:
>   - Ownership patterns (prefer &str over String, minimize clones)
>   - Error handling (anyhow::Result with .context(), no unwrap outside tests)
>   - Idiomatic code (impl after type, #[cfg(test)] mod tests)
>   - Clippy compliance (zero warnings)
>   Context: All modified .rs files
> - Security Scope: Scan for:
>   - Command injection (shell escapes, argument sanitization)
>   - Token/credential leaks (hardcoded secrets, logs, error messages)
>   - Input sanitization (path traversal, regex DoS)
>   - File operations (path validation, permissions)
>   Context: Input handling, auth, file I/O code
> - Performance Scope: Review:
>   - Unnecessary allocations (String::from vs &str)
>   - Async patterns (spawn_blocking for CPU-bound work)
>   - Compiled regex (lazy_static! for hot paths)
>   - Algorithm complexity (O(n) vs O(n²))
>   Context: Hot paths, loops, async functions
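Before spawning the team (or to sanity-check its findings), a few cheap greps cover the most mechanical parts of these scopes. A rough sketch, assuming ripgrep and cargo are available; it is not a substitute for the review:

```bash
# First-pass greps over the PR's Rust changes (illustrative patterns)
rg -n '\.unwrap\(\)|\.expect\(' src/       # unwrap/expect in production code (Rust scope)
rg -n 'Command::new|std::process' src/     # command invocations worth auditing (Security scope)
rg -n 'Regex::new' src/                    # regexes possibly compiled per call (Performance scope)
cargo clippy --all-targets -- -D warnings  # Clippy compliance (zero warnings)
```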

ROI:

  • Blind-spot detection: Security + Rust + Perf covers areas a solo reviewer would miss
  • Consistent review quality (not dependent on reviewer mood/focus)
  • Faster than sequential (3 agents parallel vs 3-pass review)

When to skip: Internal PRs from trusted contributors, trivial changes (docs, comments, tests only).

Success criteria:

✅ Convergence: 2+ agents flag same critical issue (high confidence)
✅ Unique insights: Each agent finds domain-specific issues (Rust/Security/Perf)
❌ False positives: <20% of findings are invalid

Real example (hypothetical external PR):

PR: Add new git filter command
Agents findings:
- Rust: 3 issues (unwrap in production code, missing .context(), non-idiomatic error handling)
- Security: 2 criticals (shell injection via user input, token leak in error message)
- Perf: 1 issue (regex compiled on every call, not lazy_static)
Convergence: Security + Rust both flagged missing input sanitization (high confidence)
Time: 6 min 40 sec
Verdict: ✅ Critical security issues caught, PR requires revision

3. Decision Matrix: When to Use

| Situation | Agent Teams? | Reason |
|-----------|--------------|--------|
| Pre-release review (Guide) | ✅ YES | Multi-layer audit (accuracy + consistency + breaking) requires parallel perspectives |
| Simple typo fix | ❌ NO | Overkill: 1 agent = 10 sec, 3 agents = cost bloat |
| External PR (RTK) | ✅ YES | Security + Rust + Perf = blind-spot detection, high-stakes review |
| Multi-file doc update (Guide) | ✅ YES | Content + Index + Consistency = zero broken links, parallel work |
| Landing sync check | ⚠️ MAYBE | Use agent teams if desync suspected; else ./scripts/check-landing-sync.sh is faster |
| CHANGELOG update | ❌ NO | Sequential task (linear writing), no parallelization benefit |
| Single-file edit (RTK) | ❌ NO | No coordination needed, sequential OK |
| Small README tweak (<50 lines) | ❌ NO | No cross-references, no complexity, single agent faster |
| Architecture design | ✅ YES | Multiple perspectives (frontend, backend, infra, security) reveal blind spots |
| Bug investigation | ⚠️ MAYBE | Simple bugs → NO; complex multi-component failures → YES |

Use Agent Teams when:

  • ✅ You’d naturally think “I should check X, Y, and Z”
  • ✅ High stakes (production release, external contributor, security-sensitive)
  • ✅ Multi-scope analysis needed (Rust scope + Security scope + Performance scope)
  • ✅ Cross-file consistency matters (links, counts, versions sync)
  • ✅ Parallel work possible (independent tasks, no sequential dependency)

Don’t use Agent Teams when:

  • ❌ Simple task (<5 files, <100 lines, 1 domain)
  • ❌ Sequential workflow (step B depends on step A result)
  • ❌ Budget tight (3x tokens, reserve for high-value tasks)
  • ❌ Write-heavy (many edits to same files = merge conflicts)

4. Minimal Workflow Template

Terminal window
# 1. Setup (once per session)
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
# 2. Launch Claude
claude
# 3. Create team (prompt template)
> Create a team to [TASK]:
> - Agent 1 ([SCOPE/CONTEXT]): [SPECIFIC MISSION]
> - Agent 2 ([SCOPE/CONTEXT]): [SPECIFIC MISSION]
> - Agent 3 ([SCOPE/CONTEXT]): [SPECIFIC MISSION]
>
> [FILES/DIRECTORIES TO ANALYZE]
# 4. Observe (optional)
# Shift+Down to cycle through teammate outputs (in-process mode)
# 5. Synthesis
# Claude presents consolidated findings automatically
# 6. Act
# Fix critical findings, skip minor ones

Copy-paste prompts (one per pattern):

> Create a pre-release audit team:
> - Accuracy: Check all claims, stats, version numbers, external links in CHANGELOG.md, README.md, and guide/ultimate-guide.md
> - Consistency: Verify template count (count examples/ files), eval count (count docs/resource-evaluations/ files), guide lines (wc -l guide/ultimate-guide.md) match across README.md, guide/cheatsheet.md, machine-readable/reference.yaml
> - Breaking: Identify breaking changes vs v3.23.0 by analyzing CHANGELOG.md [Unreleased] section

> Review PR #42 with scope-focused analysis:
> - Rust Scope: Check ownership patterns, error handling (anyhow/thiserror), idiomatic code in modified files (context: src/**/*.rs)
> - Security Scope: Scan for injection risks, token leaks, input sanitization (context: auth, input handling code)
> - Performance Scope: Review allocations, async patterns, compiled regex (context: hot paths, loops)

> Update documentation for new "Agent Teams Quick Start" feature:
> - Content Scope: Write guide/workflows/agent-teams-quick-start.md with overview, 4 patterns, decision matrix, metrics
> - Index Scope: Update guide/ultimate-guide.md (add reference section 9.20), machine-readable/reference.yaml (add entry), CHANGELOG.md (add "Added" entry)
> - Consistency Scope: Verify all cross-refs work, line numbers match, no broken links (context: all modified files)

> Validate guide/landing synchronization:
> - Guide scanner: Extract version from VERSION file, template count (find examples/ -type f | wc -l), eval count (find docs/resource-evaluations/ -name "*.md" | wc -l), guide lines (wc -l guide/ultimate-guide.md)
> - Landing scanner: Check /Users/florianbruniaux/Sites/perso/claude-code-ultimate-guide-landing/index.html and examples.html for version, counts, guide lines approximation

5. Success Metrics

| Metric | Target | How to Measure |
|--------|--------|----------------|
| Convergence rate | >50% | Count findings flagged by 2+ agents / total findings. High convergence = high confidence. |
| Unique insights | Each agent ≥1 | Each agent must find at least 1 unique issue in its domain. An agent with 0 unique findings = wasted tokens. |
| False positive rate | <20% | Count invalid findings / total findings. Too many false positives = poor prompts. |
| Time saving | 60-70% | Compare agent teams time vs sequential (estimate 3x single-agent time for 3 tasks). |
| Bug catch rate | >80% | Count critical bugs found by agents / total bugs found post-ship. High = effective prevention. |
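If you log the raw counts, the ratios in this table are one-liners; a throwaway sketch with placeholder values:

```bash
# Placeholder counts from one run; substitute your own numbers
total_findings=30; convergent=4; false_positives=2
awk -v c="$convergent" -v t="$total_findings" 'BEGIN { printf "convergence rate: %.0f%%\n", 100*c/t }'
awk -v f="$false_positives" -v t="$total_findings" 'BEGIN { printf "false positive rate: %.0f%%\n", 100*f/t }'
```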

Real Example: Pre-Release Review Test (2026-02-08)

Task: Pre-release audit for v3.23.1
Agents: 3 (accuracy-auditor, consistency-checker, breaking-checker)
Duration: 4 min 12 sec
Findings (raw): 45 total
├─ Accuracy: 12 (1 critical: LinkedIn URL malformed, 11 warnings)
├─ Consistency: 18 (2 criticals: template count desync, line numbers outdated)
└─ Breaking: 15 (0 breaking changes, 15 informational notes)
Findings (deduplicated): ~30 unique issues
Convergence analysis:
├─ High confidence (2-3 agents): 4 issues
│ ├─ Template count mismatch (consistency + accuracy)
│ ├─ Line numbers outdated (consistency + accuracy)
│ ├─ External link verification needed (accuracy + breaking)
│ └─ Version sync across files (all 3 agents)
└─ Unique insights:
├─ Accuracy: LinkedIn URL corruption (only this agent caught it)
├─ Consistency: TOC structure deviation (only this agent)
└─ Breaking: Changelog format improvement suggestion (only this agent)
Metrics:
├─ Convergence rate: 4/30 = 13% (lower than target, but 4 criticals flagged by multiple agents = high confidence on what matters)
├─ Unique insights: 3/3 agents = 100% (each agent found unique issues in their domain)
├─ False positive rate: 2/30 = 6.6% (below 20% target ✅)
├─ Time saving: 4 min vs estimated 12 min sequential = 66% savings ✅
└─ Bug catch rate: 3 critical bugs caught that would've shipped = prevented production issues ✅
Verdict: ✅ High value for pre-release audits

After each agent teams task:

  1. Count findings: Note raw findings per agent, then deduplicate
  2. Mark convergence: Which issues were flagged by 2+ agents?
  3. Check unique insights: Did each agent find at least 1 domain-specific issue?
  4. Verify false positives: How many findings were invalid/noise?
  5. Time comparison: Agent teams duration vs estimated sequential time
  6. Post-ship validation: Did agent teams catch bugs that would’ve shipped?

Keep a log (Markdown table in project docs):

| Date | Task | Agents | Duration | Findings | Convergence | Unique | False+ | Time Saved | Bugs Caught |
|------|------|--------|----------|----------|-------------|--------|--------|------------|-------------|
| 2026-02-08 | Pre-release v3.23.1 | 3 | 4m12s | 30 | 13% (4 critical) | 3/3 | 6.6% | 66% | 3 |

Adjust prompts if metrics fail:

  • Low convergence (<30%) → Scopes too narrow, overlap context boundaries more
  • No unique insights → Scopes too similar, diversify analysis angles
  • High false positives (>20%) → Prompts too vague, add concrete criteria

6. Limitations & Red Flags

| Limitation | What It Means | Mitigation |
|------------|---------------|------------|
| 3x tokens | Each agent = separate model call = 3x cost | Reserve for high-stakes tasks (pre-release, security PRs), not typo fixes |
| Idle spam | Agents show "Idle" messages during coordination | Normal behavior, not a bug; ignore it |
| Experimental | Research preview = stability not guaranteed | Expect bugs; don't rely on agent teams for production-critical workflows |
| Coordination overhead | 3-5 agents max, not 10 (coordination complexity grows) | Stick to 2-4 agents; avoid "team of 10" prompts |
| Context isolation | Agents don't see each other's discoveries (they work independently) | Claude synthesizes findings, but agents can't build on each other's work mid-task |

Simple task (<5 files, <100 lines, 1 domain)

  • Example: Fix typo in README.md
  • Why avoid: 3x tokens for 10-second task = waste

Sequential workflow (step B depends on step A result)

  • Example: Implement feature → Write tests → Deploy
  • Why avoid: Agents work in parallel, can’t handle dependencies

Budget tight (3x tokens, optimize cost)

  • Example: Personal project with limited API credits
  • Why avoid: Agent teams = luxury for high-stakes tasks, not daily workflow

Write-heavy (many edits to same files)

  • Example: Refactor entire codebase structure
  • Why avoid: Merge conflicts, coordination overhead, agents stepping on each other

Low-stakes review (internal PR, trusted contributor, simple change)

  • Example: Team member fixes small bug
  • Why avoid: Overkill, single-agent review faster + cheaper

Conversely, agent teams are worth the 3x token cost when the task is:

  • ✅ High-stakes (production release, security-sensitive, external contributor)
  • ✅ Multi-domain (Rust + Security + Performance = blind spots)
  • ✅ Parallel-friendly (independent tasks, no sequential dependencies)
  • ✅ Consistency-critical (cross-file sync, counts, versions)
  • ✅ A learning opportunity (understand blind spots, improve prompts)


Terminal window
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
claude
> Are agent teams enabled? # Verify

YES if:

  • High stakes + Multi-domain + Parallel-friendly

NO if:

  • Simple task + Sequential + Budget tight

4 patterns:

  1. Pre-Release Review (Guide) → Accuracy + Consistency + Breaking
  2. Landing Sync (Guide) → Guide scanner + Landing scanner
  3. Multi-File Doc Update (Guide) → Content + Index + Consistency
  4. Security PR Review (RTK) → Rust + Security + Performance

Metric targets:

  • Convergence: >50% (findings by 2+ agents)
  • Unique insights: each agent ≥1
  • False positives: <20%
  • Time saving: 60-70%
  • Bug catch: >80%

Red flags:

  • ❌ Simple task (<5 files)
  • ❌ Sequential workflow
  • ❌ Budget tight
  • ❌ Write-heavy (merge conflicts)

  1. Try first test (5 min setup + simple 2-agent task)
  2. Pick 1 pattern from your project (Guide or RTK)
  3. Measure metrics (convergence, unique insights, time saved)
  4. Adjust prompts based on results
  5. Read full doc for advanced patterns: Agent Teams

Got questions? Check full documentation for:

  • Architecture deep-dive (how git coordination works)
  • Advanced use cases (15+ production scenarios)
  • Troubleshooting (common issues + solutions)
  • Best practices (team size, prompt design, conflict resolution)