Code Guide

4. Agents

What are Agents: Specialized AI personas for specific tasks (think “expert consultants”)

When to create one:

  • ✅ Task repeats often (security reviews, API design)
  • ✅ Requires specialized knowledge domain
  • ✅ Needs consistent behavior/tone
  • ❌ One-off tasks (just ask Claude directly)

Quick Start:

  1. Create .claude/agents/my-agent.md
  2. Add YAML frontmatter (name, description, tools, model)
  3. Write instructions
  4. Use: @my-agent "task description"
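The four steps above can be scripted in one go. A minimal sketch (the agent name, trigger, and instructions are placeholders you would replace):

```shell
# Create the agents directory and a minimal agent file (names are illustrative)
mkdir -p .claude/agents
cat > .claude/agents/my-agent.md <<'EOF'
---
name: my-agent
description: Use when [your trigger]
model: sonnet
tools: Read, Grep
---
You are an expert in [domain]. Follow the project's conventions.
EOF
```

After this, `@my-agent "task description"` invokes it from the Claude Code prompt.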

Popular agent types: Security auditor, Test generator, Code reviewer, API designer

Read this section if: you have repeating tasks or need domain expertise.
Skip if: all your tasks are one-off exploratory work.


Reading time: 20 minutes · Skill level: Week 1-2 · Goal: Create specialized AI assistants

Agents are specialized sub-processes that Claude can delegate tasks to.

| Without Agents | With Agents |
|---|---|
| One Claude doing everything | Specialized experts for each domain |
| Context gets cluttered | Each agent has focused context |
| Generic responses | Domain-specific expertise |
| Manual tool selection | Pre-configured tool access |

Direct Prompt:
You: Review this code for security issues, focusing on OWASP Top 10,
checking for SQL injection, XSS, CSRF, and authentication vulnerabilities...
With Agent:
You: Use the security-reviewer agent to audit this code

The agent encapsulates all that expertise.

| Type | Source | Example |
|---|---|---|
| Built-in | Claude Code default | Explore, Plan |
| Custom | Your `.claude/agents/` | Backend architect, Code reviewer |

Agents are markdown files in .claude/agents/ with YAML frontmatter.

---
name: agent-name
description: Clear activation trigger (50-100 chars)
model: sonnet
tools: Read, Write, Edit, Bash, Grep, Glob
---
[Markdown instructions for the agent]
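To make the file layout concrete, here is how the frontmatter separates from the body. This is a simplified illustration, assuming flat `key: value` pairs rather than full YAML:

```python
def parse_agent_file(text: str):
    """Split the '---'-delimited frontmatter from the markdown body.

    Simplified sketch: assumes flat `key: value` pairs, not full YAML.
    """
    _, frontmatter, body = text.split("---\n", 2)
    fields = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields, body

agent_text = """---
name: agent-name
description: Clear activation trigger
model: sonnet
---
[Markdown instructions for the agent]
"""
fields, body = parse_agent_file(agent_text)
```

Claude Code performs this parsing itself; the sketch only shows which parts of the file carry configuration versus instructions.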

All official fields supported by Claude Code:

| Field | Required | Description |
|---|---|---|
| `name` | ✅ | Kebab-case identifier |
| `description` | ✅ | When to activate this agent (use "PROACTIVELY" for auto-invocation) |
| `model` | | `sonnet` (default), `opus`, `haiku`, or `inherit` |
| `tools` | | Allowed tools (comma-separated). Supports `Task(agent_type)` syntax to restrict spawnable subagents |
| `disallowedTools` | | Tools to deny, removed from the inherited or specified list |
| `permissionMode` | | `default`, `acceptEdits`, `dontAsk`, `bypassPermissions`, or `plan` |
| `maxTurns` | | Maximum agentic turns before the subagent stops |
| `skills` | | Skills to preload into the agent's context at startup (full content injected, not just made available) |
| `mcpServers` | | MCP servers for this subagent: server name strings or inline configs |
| `hooks` | | Lifecycle hooks scoped to this subagent (`PreToolUse`, `PostToolUse`, `Stop`) |
| `memory` | | Persistent memory scope: `user`, `project`, or `local` |
| `background` | | `true` to always run as a background task (default: `false`) |
| `isolation` | | `worktree` to run in a temporary git worktree (auto-cleaned if no changes) |
| `color` | | CLI output color for visual distinction (e.g., `green`, `magenta`) |

Memory scopes — choose based on how broadly the knowledge should apply:

| Scope | Storage | Use when |
|---|---|---|
| `user` | `~/.claude/agent-memory/<name>/` | Cross-project learning |
| `project` | `.claude/agent-memory/<name>/` | Project-specific, shareable via git |
| `local` | `.claude/agent-memory-local/<name>/` | Project-specific, not committed |

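The storage path follows directly from the scope. A sketch (the helper name is illustrative; Claude Code resolves these paths itself):

```python
from pathlib import Path

# Storage roots per memory scope, following the table above
MEMORY_ROOTS = {
    "user": Path.home() / ".claude" / "agent-memory",
    "project": Path(".claude") / "agent-memory",
    "local": Path(".claude") / "agent-memory-local",
}

def memory_dir(scope: str, agent_name: str) -> Path:
    """Resolve the persistent-memory directory for an agent."""
    if scope not in MEMORY_ROOTS:
        raise ValueError(f"unknown memory scope: {scope!r}")
    return MEMORY_ROOTS[scope] / agent_name
```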
| Model | Best For | Speed | Cost |
|---|---|---|---|
| `haiku` | Quick tasks, simple changes | Fast | Low |
| `sonnet` | Most tasks (default) | Balanced | Medium |
| `opus` | Complex reasoning, architecture | Slow | High |

Copy this template to create your own agent:

---
name: your-agent-name
description: Use this agent when [specific trigger description]
model: sonnet
tools: Read, Write, Edit, Bash, Grep, Glob
skills: []
---
# Your Agent Name
## Role Definition
You are an expert in [domain]. Your responsibilities include:
- [Responsibility 1]
- [Responsibility 2]
- [Responsibility 3]
## Activation Triggers
Use this agent when:
- [Trigger 1]
- [Trigger 2]
- [Trigger 3]
## Methodology
When given a task, you should:
1. [Step 1]
2. [Step 2]
3. [Step 3]
4. [Step 4]
## Output Format
Your deliverables should include:
- [Output 1]
- [Output 2]
## Constraints
- [Constraint 1]
- [Constraint 2]
## Examples
### Example 1: [Scenario Name]
**User**: [Example prompt]
**Your approach**:
1. [What you do first]
2. [What you do next]
3. [Final output]

| ✅ Do | ❌ Don't |
|---|---|
| Make agents specialists | Create generalist agents |
| Define clear triggers | Use vague descriptions |
| Include concrete examples | Leave activation ambiguous |
| Limit tool access | Give all tools to all agents |
| Compose via skills | Duplicate expertise |

Good: An agent for each concern

backend-architect → API design, database, performance
security-reviewer → OWASP, auth, encryption
test-engineer → Test strategy, coverage, TDD

Bad: One agent for everything

full-stack-expert → Does everything (poorly)

Good description:

description: Use when designing APIs, reviewing database schemas, or optimizing backend performance

Bad description:

description: Backend stuff

Instead of duplicating knowledge:

# security-reviewer.md
skills:
- security-guardian # Inherits OWASP knowledge

Before deploying a custom agent, validate against these criteria:

Efficacy (Does it work?)

  • Tested on 3+ real use cases from your project
  • Output matches expected format consistently
  • Handles edge cases gracefully (empty input, errors, timeouts)
  • Integrates correctly with existing workflows

Efficiency (Is it cost-effective?)

  • <5000 tokens per typical execution
  • <30 seconds for standard tasks
  • Doesn’t duplicate work done by other agents/skills
  • Justifies its existence vs. native Claude capabilities

Security (Is it safe?)

  • Tools restricted to minimum necessary
  • No Bash access unless absolutely required
  • File access limited to relevant directories
  • No credentials or secrets in agent definition

Maintainability (Will it last?)

  • Clear, descriptive name and description
  • Explicit activation triggers documented
  • Examples show common usage patterns
  • Version compatibility noted if framework-dependent

💡 Rule of Three: If an agent doesn’t save significant time on at least 3 recurring tasks, it’s probably over-engineering. Start with skills, graduate to agents only when complexity demands it.

Automated audit: Run /audit-agents-skills for a comprehensive quality audit across all agents, skills, and commands. Scores each file on 16 criteria with weighted grading (32 points for agents/skills, 20 for commands). See examples/skills/audit-agents-skills/ for the full scoring methodology.

Subagents can run in the background without blocking the main session. This is useful for fire-and-forget tasks like running tests, linting, or notifications.

| Mode | Behavior | Use when |
|---|---|---|
| Default | Parent waits for agent output | Need result before continuing |
| Background | Agent runs in parallel, parent continues | Fire-and-forget (tests, linting, notifications) |

Managing background agents:

# List and kill running agents (opens the agent manager overlay)
ctrl+f
# Cancel the main thread only (background agents keep running)
ESC
ctrl+c
---
name: code-reviewer
description: Use for code quality reviews, security audits, and performance analysis
model: sonnet
tools: Read, Grep, Glob
skills:
- security-guardian
---
# Code Reviewer
## Scope Definition
Perform comprehensive code reviews with isolated context, focusing on:
- Code quality and maintainability
- Security best practices (OWASP Top 10)
- Performance optimization
- Test coverage analysis
Scope: Code review analysis only. Provide findings without implementing fixes.
## Activation Triggers
Use this agent when:
- Completing a feature before PR (need fresh eyes on code)
- Reviewing someone else's code (isolated review context)
- Auditing security-sensitive code (security-focused scope)
- Analyzing performance bottlenecks (performance-focused scope)
## Methodology
1. **Understand Context**: Read the code and understand its purpose
2. **Check Quality**: Evaluate readability, maintainability, DRY principles
3. **Security Scan**: Look for OWASP Top 10 vulnerabilities
4. **Performance Review**: Identify potential bottlenecks
5. **Provide Feedback**: Structured report with severity levels
## Output Format
### Code Review Report
**Summary**: [1-2 sentence overview]
**Critical Issues** (Must Fix):
- [Issue with file:line reference]
**Warnings** (Should Fix):
- [Issue with file:line reference]
**Suggestions** (Nice to Have):
- [Improvement opportunity]
**Positive Notes**:
- [What was done well]
---
name: debugger
description: Use when encountering errors, test failures, or unexpected behavior
model: sonnet
tools: Read, Bash, Grep, Glob
---
# Debugger
## Scope Definition
Perform systematic debugging with isolated context:
- Investigate root causes, not symptoms
- Use evidence-based debugging approach
- Verify rather than assume (always review output—LLMs can make mistakes)
Scope: Debugging analysis only. Focus on root cause identification without context pollution from previous debugging attempts.
## Methodology
1. **Reproduce**: Confirm the issue exists
2. **Isolate**: Narrow down to smallest reproducible case
3. **Analyze**: Read code, check logs, trace execution
4. **Hypothesize**: Form theories about the cause
5. **Test**: Verify hypothesis with minimal changes
6. **Fix**: Implement the solution
7. **Verify**: Confirm fix works and doesn't break other things
## Output Format
### Debug Report
**Issue**: [Description]
**Root Cause**: [What's actually wrong]
**Evidence**: [How you know]
**Fix**: [What to change]
**Verification**: [How to confirm it works]
---
name: backend-architect
description: Use for API design, database optimization, and system architecture decisions
model: opus
tools: Read, Write, Edit, Bash, Grep
skills:
- backend-patterns
---
# Backend Architect
## Scope Definition
Analyze backend architecture with isolated context, focusing on:
- API design (REST, GraphQL, tRPC)
- Database modeling and optimization
- System scalability
- Clean architecture patterns
Scope: Backend architecture analysis only. Focus on design decisions without frontend or DevOps considerations.
## Activation Triggers
Use this agent when:
- Designing new API endpoints (need architecture-focused analysis)
- Optimizing database queries (database scope isolation)
- Planning system architecture (system design scope)
- Refactoring backend code (backend-only scope)
## Methodology
1. **Requirements Analysis**: Understand the business need
2. **Architecture Review**: Check current system state
3. **Design Options**: Propose 2-3 approaches with trade-offs
4. **Recommendation**: Suggest best approach with rationale
5. **Implementation Plan**: Break down into actionable steps
## Constraints
- Follow existing project patterns
- Prioritize backward compatibility
- Consider performance implications
- Document architectural decisions

The description field determines when Claude auto-activates your agent. Optimize it like SEO:

# ❌ Bad description
description: Reviews code

# ✅ Good description (Tool SEO)
description: |
  Security code reviewer - use PROACTIVELY when:
  - Reviewing authentication/authorization code
  - Analyzing API endpoints
  - Checking input validation
  - Auditing data handling
  Triggers: security, auth, vulnerability, OWASP, injection

Tool SEO Techniques:

  1. “use PROACTIVELY”: Encourages automatic activation
  2. Explicit triggers: Keywords that trigger the agent
  3. Listed contexts: When the agent is relevant
  4. Short nicknames: sec-1, perf-a, doc-gen

| Category | Tokens | Init Time | Optimal Use |
|---|---|---|---|
| Lightweight | <3K | <1s | Frequent tasks, workers |
| Medium | 10-15K | 2-3s | Analysis, reviews |
| Heavy | 25K+ | 5-10s | Architecture, full audits |

Golden Rule: A lightweight agent used 100x > A heavy agent used 10x
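Sketched as a helper, using the thresholds from the table above (the table leaves the 3K-10K range unspecified; this sketch folds it into "medium"):

```python
def weight_class(tokens: int) -> str:
    """Classify an agent by context footprint (thresholds from the table above).

    Note: the 3K-10K gap in the table is treated as "medium" here.
    """
    if tokens < 3_000:
        return "lightweight"
    if tokens < 25_000:
        return "medium"
    return "heavy"
```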

Launch 7 scope-focused sub-agents in parallel for complete features:

┌─────────────────────────────────────────────────────────────┐
│ PARALLEL FEATURE IMPLEMENTATION │
│ │
│ Task 1: Components → Create React components │
│ Task 2: Styles → Generate Tailwind styles │
│ Task 3: Tests → Write unit tests │
│ Task 4: Types → Define TypeScript types │
│ Task 5: Hooks → Create custom hooks │
│ Task 6: Integration → Connect with API/state │
│ Task 7: Config → Update configurations │
│ │
│ All in parallel → Final consolidation │
└─────────────────────────────────────────────────────────────┘

Example Prompt:

Implement the "User Profile" feature using 7 parallel sub-agents:
1. COMPONENTS: Create UserProfile.tsx, UserAvatar.tsx, UserStats.tsx
2. STYLES: Define Tailwind classes in a styles file
3. TESTS: Write tests for each component
4. TYPES: Create types in types/user-profile.ts
5. HOOKS: Create useUserProfile and useUserStats hooks
6. INTEGRATION: Connect with existing tRPC router
7. CONFIG: Update exports and routing
Launch all agents in parallel.

Concept: Multi-perspective analysis in parallel.

Process:

┌─────────────────────────────────────────────────────────────┐
│ SPLIT ROLE ANALYSIS │
│ │
│ Step 1: Setup │
│ └─ Activate Plan Mode (thinking enabled by default) │
│ │
│ Step 2: Role Suggestion │
│ └─ "What expert roles would analyze this code?" │
│ Claude suggests: Security, Performance, UX, etc. │
│ │
│ Step 3: Selection │
│ └─ "Use: Security Expert, Senior Dev, Code Reviewer" │
│ │
│ Step 4: Parallel Analysis │
│ ├─ Security Agent: [Vulnerability analysis] │
│ ├─ Senior Agent: [Architecture analysis] │
│ └─ Reviewer Agent: [Readability analysis] │
│ │
│ Step 5: Consolidation │
│ └─ Synthesize 3 reports into recommendations │
└─────────────────────────────────────────────────────────────┘

Code Review Prompt (scope-focused):

Analyze this PR with isolated scopes:
1. Architecture Scope: Design patterns, SOLID principles, modularity
2. Security Scope: Vulnerabilities, injection risks, auth/authz flaws
3. Performance Scope: Database queries, algorithmic complexity, caching
4. Maintainability Scope: Code clarity, documentation, naming conventions
5. Testing Scope: Test coverage, edge cases, testability
Context: src/**, tests/**, only files changed in PR

UX Review Prompt (scope-focused):

Evaluate this interface with isolated scopes:
1. Visual Design Scope: Consistency with design system, spacing, typography
2. Usability Scope: Discoverability, user flow, cognitive load
3. Efficiency Scope: Keyboard shortcuts, power user features, quick actions
4. Accessibility Scope: WCAG 2.1 AA compliance, screen reader, keyboard nav
5. Responsive Scope: Mobile breakpoints, touch targets, viewport handling
Context: src/components/**, styles/**, only UI-related files

Production Example: Multi-Agent Code Review (Pat Cullen, Jan 2026):

Scope-focused agents for comprehensive PR review:

  1. Consistency Scope: Duplicate logic, pattern violations, DRY compliance (context: full PR diff)
  2. SOLID Scope: SRP violations, nested conditionals (>3 levels), cyclomatic complexity >10 (context: changed classes/functions)
  3. Defensive Code Scope: Silent catches, swallowed exceptions, hidden fallbacks (context: error handling code)

Key patterns (beyond generic Split Role):

  • Pre-flight check: git log --oneline -10 | grep "Co-Authored-By: Claude" to detect follow-up passes and avoid repeating suggestions
  • Anti-hallucination: Use Grep/Glob to verify patterns before recommending them (occurrence rule: >10 = established, <3 = not established)
  • Reconciliation: Prioritize existing project patterns over ideal patterns, skip suggestions with documented reasoning
  • Severity classification: 🔴 Must Fix (blockers) / 🟡 Should Fix (improvements) / 🟢 Can Skip (nice-to-haves)
  • Convergence loop: Review → fix → re-review → repeat (max 3 iterations) until only optional improvements remain
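The convergence loop can be sketched as follows. The `review` and `apply_fixes` callables are hypothetical stand-ins for agent invocations, and the severity labels follow the classification above:

```python
def converge(review, apply_fixes, max_iterations=3):
    """Review -> fix -> re-review until only optional improvements remain.

    `review()` returns a list of finding dicts with a "severity" key
    ("must-fix", "should-fix", or "can-skip"). Both callables are
    hypothetical stand-ins for subagent invocations.
    """
    for iteration in range(1, max_iterations + 1):
        findings = review()
        blockers = [f for f in findings if f["severity"] in ("must-fix", "should-fix")]
        if not blockers:
            return iteration, findings  # only nice-to-haves remain
        apply_fixes(blockers)
    return max_iterations, review()  # stop at the iteration cap
```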

Production safeguards:

  • Read full file context (not just diff lines)
  • Conditional context loading based on diff content (DB queries → check indexes, API routes → check auth middleware)
  • Protected files skip list (package.json, migrations, .env)
  • Quality gates: tsc && lint validation before each iteration

Source: Pat Cullen’s Final Review Implementation: See /review-pr advanced section, examples/agents/code-reviewer.md, guide/workflows/iterative-refinement.md (Review Auto-Correction Loop)

The guide lists “roleplaying expertise personas” as a bad reason to use agents (see §3.x, When NOT to use agents). Named Perspective Agents are a different pattern and should not be confused with it.

The distinction:

| Pattern | What it is | Effect |
|---|---|---|
| Persona roleplay (anti-pattern) | "You are a senior backend developer with 10 years of experience" | Generic role; adds nothing over a good prompt |
| Named Perspective | "Review from DHH's perspective" | Encodes a specific, recognizable set of engineering opinions |

A Named Perspective Agent uses a well-known engineering name as a compressed prompt. Naming an agent “DHH” bundles the following without spelling it out: fat models, thin controllers, REST conventions over configuration, skepticism of premature abstraction, Rails pragmatism. The name is a shortcut to a distinct opinionated style, not a costume.

When it works: Only for engineers whose views Claude has been trained on and whose opinions map to a stable, recognizable style. DHH (Rails), Kent Beck (TDD, simplicity), Martin Fowler (refactoring, patterns) are good candidates. Random names are not.

Example (from Every.to compound-engineering plugin):

---
name: dhh-reviewer
description: Review code from DHH's perspective. Prioritize Rails conventions, fat models, thin controllers, pragmatic REST, and skepticism of unnecessary abstraction.
allowed-tools: Read, Grep
---

The agent’s value is in surfacing a coherent perspective that might disagree with your default approach, not in simulating a person.

Caveat: Named Perspective Agents can drift as Claude’s training evolves. Treat the name as a convenient shorthand, not a guarantee that the agent will track a real person’s current opinions.

Source: Every.to compound-engineering plugin (2026)

┌─────────────────────────────────────────────────────────────┐
│ PARALLELIZABLE? │
│ │
│ Non-destructive Destructive │
│ (read-only) (write) │
│ │
│ Independent ✅ PARALLEL ⚠️ SEQUENTIAL │
│ Max efficiency Plan Mode first │
│ │
│ Dependent ⚠️ SEQUENTIAL ❌ CAREFUL │
│ Order matters Risk of conflicts │
│ │
└─────────────────────────────────────────────────────────────┘

✅ Perfectly parallelizable:

"Search 8 different GitHub repos for best practices on X"
"Analyze these 5 files for vulnerabilities (without modifying)"
"Compare 4 libraries and produce a comparative report"

⚠️ Sequential recommended:

"Refactor these 3 files (they depend on each other)"
"Migrate DB schema then update models then update routers"

❌ Needs extra care:

"Modify these 10 files in parallel"
→ Risk: conflicts if files share imports/exports
→ Solution: Plan Mode → Identify dependencies → Sequence if needed
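The decision matrix above reduces to two booleans. A sketch:

```python
def dispatch_mode(destructive: bool, dependent: bool) -> str:
    """Map the two axes of the matrix above to a dispatch strategy."""
    if not destructive and not dependent:
        return "parallel"    # read-only and independent: max efficiency
    if destructive and dependent:
        return "careful"     # writes + interdependence: risk of conflicts
    return "sequential"      # either writes or ordering constraints
```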
┌─────────────────────────────────────────────────────────────┐
│ ORCHESTRATION PATTERN │
│ │
│ ┌──────────────┐ │
│ │ Sonnet 4.5 │ │
│ │ Orchestrator │ │
│ └──────┬───────┘ │
│ │ │
│ ┌────────────┼────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Haiku │ │ Haiku │ │ Haiku │ │
│ │ Worker1 │ │ Worker2 │ │ Worker3 │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ └────────────┼────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Sonnet 4.5 │ │
│ │ Validator │ │
│ └──────────────┘ │
│ │
│ Cost: 2-2.5x cheaper than Opus everywhere │
│ Quality: Equivalent for most common tasks │
└─────────────────────────────────────────────────────────────┘

See Section 2.5 Model Selection & Thinking Guide for the canonical decision table with effort levels and cost estimates.

Cost Optimization Example:

Scenario: Refactoring 100 files
❌ Naive approach:
- Opus for everything
- Cost: ~$50-100
- Time: 2-3h
✅ Optimized approach:
- Sonnet: Analysis and plan (1x)
- Haiku: Parallel workers (100x)
- Sonnet: Final validation (1x)
- Cost: ~$5-15
- Time: 1h (parallelized)
Estimated savings: significant (varies by project)
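Back-of-envelope arithmetic for the example above. The per-task costs are made-up placeholders for illustration, not real API pricing:

```python
# Hypothetical per-task costs in dollars -- placeholders, not real pricing
OPUS_PER_TASK = 0.75
SONNET_PER_TASK = 0.15
HAIKU_PER_TASK = 0.03

files = 100
naive = files * OPUS_PER_TASK                              # Opus on every file
optimized = 2 * SONNET_PER_TASK + files * HAIKU_PER_TASK   # plan + 100 workers + validate
```

With these placeholder numbers the optimized pipeline costs a small fraction of the naive one; the exact ratio depends entirely on your real token usage and pricing.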

An agent that updates its own skills after each execution. Instead of manually maintaining documentation, the agent reads the current state of its domain and rewrites the knowledge injected into itself.

When to use: Long-lived agents whose domain evolves — presentation editors, API clients tracking schema changes, agents managing living documents.

Core mechanism (in agent system prompt):

### Step N: Self-Evolution (after every execution)
After completing your main task, update your preloaded skills to stay in sync:
1. Read the current state of [the domain you modified]
2. Update `.claude/skills/<your-skill>/SKILL.md` to reflect reality
3. Log what changed and why in a "## Learnings" section of this agent file
This prevents knowledge drift between what you know and what is.

Full example — a presentation curator agent that keeps its own layout/weight knowledge fresh:

---
name: presentation-curator
description: PROACTIVELY use when updating slides, structure, or weights
tools: Read, Write, Edit, Grep, Glob
model: sonnet
color: magenta
skills:
- presentation/slide-structure
- presentation/styling
---
## Step 5: Self-Evolution (after every execution)
Read presentation/index.html and update your skills:
- slide-structure skill: update section ranges, weight table, slide count
- styling skill: update CSS patterns if new ones were introduced
- Append new findings to the "## Learnings" section below
## Learnings
_Each run appends findings here. Future invocations start informed._
- Slide badges are JS-injected — never hardcode them in HTML.

Why it works: The skills: frontmatter injects skill content at agent startup. By writing back to those files after each run, the agent’s next invocation starts with current knowledge. No human maintenance required.

Key constraints:

  • Scope updates narrowly — only update what actually changed
  • Keep a ## Learnings log so the agent builds cumulative knowledge over sessions
  • Pair with memory: project for cross-session persistence of broader context
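The write-back step can be sketched in Python. This helper is illustrative, not part of Claude Code itself; in practice the agent performs the edit with its Write/Edit tools. It assumes, as in the example above, that `## Learnings` is the last section of the agent file:

```python
from pathlib import Path

def append_learning(agent_file: Path, note: str) -> None:
    """Append a bullet under the agent's '## Learnings' section.

    Illustrative sketch of the self-evolution write-back. Assumes
    '## Learnings' is the final section, so appending to the end of
    the file lands inside it.
    """
    text = agent_file.read_text()
    if "## Learnings" not in text:
        text += "\n## Learnings\n"
    text = text.rstrip("\n") + f"\n- {note}\n"
    agent_file.write_text(text)
```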

Quick jump: Two Kinds of Skills · Understanding Skills · Creating Skills · Skill Lifecycle · Skill Evals · Skill Template · Skill Examples


Note (January 2026): Skills and Commands are being unified. Both now use the same invocation mechanism (/skill-name or /command-name), share YAML frontmatter syntax, and can be triggered identically. The conceptual distinction (skills = knowledge modules, commands = workflow templates) remains useful for organization, but technically they’re converging. Create new ones based on purpose, not mechanism.


Reading time: 20 minutes · Skill level: Week 2 · Goal: Create, test, and manage reusable knowledge modules