Plan-Validate-Execute Pipeline
Confidence: Tier 2 (battle-tested by production teams shipping AI-first products at scale). Extends native /plan mode with structured agent orchestration and institutional memory.
A complete development workflow in 3 commands: plan with a dynamic research team, validate with independent specialist reviewers, execute with parallel agents. Each run improves the next through an ADR learning loop that progressively reduces human interruptions.
Reading time: ~25 min
Prerequisites: Sub-agents, Task tool, worktrees, basic ADR concepts
Related: Plan-Driven Development, Agent Teams, Spec-First
Table of Contents
- TL;DR
- Philosophy
- The Three Commands
- Dynamic Agent Pool
- ADR Learning Loop
- CLAUDE.md Discipline
- Context Management
- When to Use
- Cost Profile
- See Also
TL;DR

    /plan-start    → 5-phase planning: PRD analysis + dynamic research team + ADRs
    /plan-validate → 2-layer review: structural checks + trigger-based specialist agents
    /plan-execute  → worktree + TDD + parallel execution + PR + merge + cleanup

What makes this different from /plan mode:
- Research is done by specialized agents in parallel, not one agent sequentially
- Validation is independent from planning (no confirmation bias)
- Every significant decision generates an ADR that auto-resolves future decisions
- Execution spawns per-task agents in a git worktree, commits per task, handles everything through to merged PR
Run /clear between each command to reset context and avoid compacting overhead.
Philosophy
Non-Prescriptive AI-First
Tell Claude what to achieve, never how to implement it. The moment you prescribe implementation details, you’re using your knowledge as a ceiling instead of Claude’s as a floor.
A good opening prompt for a new project:
    How should I use you most effectively to build this platform?

Let Claude propose the architecture. Your job is to validate decisions, not dictate them.
No Bandaids, No Workarounds
Hard rule for every agent in the pipeline:
We build state-of-the-art software. Always choose the best-in-class architecture, the most robust pattern, and the industry-standard approach.
Build time and effort are irrelevant to architectural decisions. Never factor implementation complexity into option assessments. The right solution is always the best solution.
Enforcement checklist (apply before implementing):
- Am I using a backward-compatibility flag, shim, or legacy mode?
- Would a new project following current official docs do it differently?
- Am I porting old patterns instead of learning the new ones?
- Am I patching one component when the fix belongs at the system level?
If any answer is yes: stop, fix at the correct level.
Why Independent Validation?
Validators that didn’t write the plan are not anchored to its assumptions. Research shows multi-agent review with adversarial framing catches significantly more issues than self-review. The average plan produces ~18 issues when challenged by an independent team, and ~95% of them auto-resolve from existing ADRs and first principles.
The Three Commands
/plan-start: 5-Phase Planning
Phase 1: PRD & Design Analysis (interactive, no agents)
Read the PRD and surface issues in 3 buckets before any agent work:
- Missing requirements (unclear acceptance criteria, unspecified edge cases)
- Ambiguous requirements (multiple valid interpretations)
- Compliance concerns (security, data privacy, API contracts)
Present options with pros/cons, record decisions. Skip for non-PRD work (refactors, infra, bug fixes).
If UI changes are in scope, extend to design analysis: screen inventory, state catalog (empty/loading/populated/error), interaction specs, animation patterns, accessibility (ARIA), design token updates.
Phase 2: Technical Analysis (1-2 Explore agents + interactive)
Check existing ADRs and PATTERNS.md first. If 3+ confirmed ADRs match the decision → auto-resolve without asking. Otherwise:
- Spawn Explore agents for targeted codebase research
- Present architecture decisions with options and recommendations
- Create ADR documents for significant decisions (see ADR Learning Loop)
- Update PATTERNS.md
Phase 3: Scope Assessment (automatic + user approval)
Apply trigger rules against the agent pool (see Dynamic Agent Pool). Present the proposed team with justification for each agent. User can add or remove agents before research starts.
| Tier | Agent Count | Label |
|---|---|---|
| 0 | 0 | Solo (inline research) |
| 1 | 1-3 | Focused |
| 2 | 4-6 | Standard |
| 3 | 7-9 | Comprehensive |
| 4 | 10+ | Full Spectrum |
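The tier boundaries above amount to a simple lookup on team size. A minimal sketch, assuming the tier is derived only from the number of approved agents; the function and constant names are illustrative and not part of the pipeline:

```python
# Tier boundaries copied from the table above. Names are hypothetical.
TIERS = [
    (0, 0, "Solo"),
    (1, 3, "Focused"),
    (4, 6, "Standard"),
    (7, 9, "Comprehensive"),
]


def tier_for(agent_count: int) -> tuple[int, str]:
    """Return (tier number, label) for a proposed team size."""
    if agent_count < 0:
        raise ValueError("agent_count must be non-negative")
    for tier, (low, high, label) in enumerate(TIERS):
        if low <= agent_count <= high:
            return tier, label
    return 4, "Full Spectrum"  # 10 or more agents


print(tier_for(5))
```

If the user adds or removes agents during approval, calling the function again on the new count gives the updated tier.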
Phase 4: Research & Plan Creation (dynamic team)
- Tier 0: inline research, no agents
- Tier 1+: spawn approved agents in parallel (background), lead monitors via TaskOutput loop
- planning-coordinator (Opus) synthesizes all reports into the final plan
- Commit plan file, ADRs, and creation artifacts
Output: docs/plans/plan-{name}.md + docs/adr/ADR-XXXX.md + docs/plans/metrics/{name}.json
Auto-transition: no unresolved ambiguity → auto-start /plan-validate
/plan-validate: 2-Layer Validation
Layer 1: Structural (inline, instant)
Mechanical checks that don’t need agents:
- Plan format and completeness (all required sections present)
- Task ordering and dependency chain (no circular deps)
- File existence checks (files listed for modification actually exist)
- ADR consistency (plan aligns with its ADRs)
- CLAUDE.md rule compliance
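Two of these mechanical checks are easy to picture in code. The sketch below is an illustration only: it assumes the plan's task dependencies are available as a mapping from each task to the set of tasks it depends on, and that file paths are plain strings. Both function names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from pathlib import Path
from typing import Optional


def find_cycle(deps: dict[str, set[str]]) -> Optional[list[str]]:
    """Return the tasks forming a dependency cycle, or None if the order is valid."""
    try:
        TopologicalSorter(deps).prepare()
    except CycleError as err:
        # graphlib reports the cycle as a list whose first and last node are equal.
        return list(err.args[1])
    return None


def missing_files(paths: list[str]) -> list[str]:
    """Return the listed paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]
```

Neither check needs a model call, which is why Layer 1 can run inline before any reviewer is spawned.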
Layer 2: Specialist Review (trigger-based, 0-8 agents)
| Agent | Trigger | Model |
|---|---|---|
| security-reviewer | Auth, payments, PII, RBAC, new APIs | Opus |
| db-migration-reviewer | New tables, columns, indexes, migrations | Opus |
| performance-reviewer | New resolvers, queries, routes, new deps | Sonnet |
| design-system-reviewer | New UI components, visual styling | Sonnet |
| ux-reviewer | New pages, forms, interactions | Sonnet |
| cross-platform-reviewer | Web + mobile, or shared packages | Sonnet |
| native-app-reviewer | Mobile screens, native UI packages | Sonnet |
| integration-reviewer | New services, libraries, OTEL config | Opus |
Trivial plans trigger no specialist agents. A payments feature might trigger four or more.
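Trigger-based selection can be modeled as set intersection between a plan's characteristics and each reviewer's triggers. This is a sketch under assumptions: the tag vocabulary (`"payments"`, `"new-table"`, and so on) is invented for illustration, and the source does not say how the pipeline actually detects triggers.

```python
# Reviewer names and models come from the table above.
# The tag strings are hypothetical stand-ins for the trigger column.
REVIEWERS = {
    "security-reviewer": ({"auth", "payments", "pii", "rbac", "new-api"}, "Opus"),
    "db-migration-reviewer": ({"new-table", "new-column", "new-index", "migration"}, "Opus"),
    "performance-reviewer": ({"new-resolver", "new-query", "new-route", "new-dep"}, "Sonnet"),
    "design-system-reviewer": ({"new-ui-component", "visual-styling"}, "Sonnet"),
    "ux-reviewer": ({"new-page", "new-form", "new-interaction"}, "Sonnet"),
    "cross-platform-reviewer": ({"web-and-mobile", "shared-package"}, "Sonnet"),
    "native-app-reviewer": ({"mobile-screen", "native-ui-package"}, "Sonnet"),
    "integration-reviewer": ({"new-service", "new-library", "otel-config"}, "Opus"),
}


def select_reviewers(plan_tags: set[str]) -> dict[str, str]:
    """Map each triggered reviewer to its model; empty for trivial plans."""
    return {
        name: model
        for name, (triggers, model) in REVIEWERS.items()
        if triggers & plan_tags
    }


payments_plan = {"payments", "new-table", "new-route", "new-form"}
print(select_reviewers(payments_plan))
```

With these illustrative tags, the payments plan selects four reviewers (security, DB migration, performance, UX), while an empty tag set selects none.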
Auto-Fix Phase
Every issue must be resolved — no skipping. Triage:
- Issues matching existing ADR decisions → auto-resolve
- Issues matching confirmed PATTERNS.md entries → auto-resolve
- Issues resolvable from first principles → auto-resolve
- Residual → human decision → new rule → auto-resolved next time
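The triage order above can be read as a first-match cascade. A minimal sketch, assuming each issue carries a topic key that can be compared against ADR topics and confirmed PATTERNS.md entries; the `Issue` fields and the `Resolution` names are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Resolution(Enum):
    ADR = "auto-resolved from existing ADR"
    PATTERN = "auto-resolved from confirmed pattern"
    FIRST_PRINCIPLES = "auto-resolved from first principles"
    HUMAN = "needs human decision"


@dataclass
class Issue:
    topic: str
    # Hypothetical field: set when a first-principles rule already answers the issue.
    first_principles_answer: Optional[str] = None


def triage(issue: Issue, adr_topics: set[str], confirmed_patterns: set[str]) -> Resolution:
    """Apply the triage steps in order and return the first one that matches."""
    if issue.topic in adr_topics:
        return Resolution.ADR
    if issue.topic in confirmed_patterns:
        return Resolution.PATTERN
    if issue.first_principles_answer is not None:
        return Resolution.FIRST_PRINCIPLES
    return Resolution.HUMAN
```

An issue that falls through to `Resolution.HUMAN` is the case where the human decision becomes a new rule, so the same topic would match an earlier branch on the next run.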
Auto-transition: all issues resolved → auto-start /plan-execute
/plan-execute: Execution to Merged PR
A single command handles everything:
- Worktree creation — isolated branch from current branch
- TDD scaffolding — write failing tests first for TDD-marked tasks
- Level-based parallel execution — detect independent tasks, spawn per-task agents, commit per task
- Drift detection — flag if implementation diverges from plan
- Quality gate — parallel tests + integration smoke test (GraphQL probe, container log scan, plan-defined smoke commands)
- Pre-PR docs update — PRD reconciliation + plan archival (in worktree)
- PR creation and merge — squash merge, clean commit message
- Post-merge metrics — execution data committed to metrics file
- Worktree cleanup — remove branch and worktree
If quality gate fails: up to 3 auto-fix attempts by dedicated debug agents. Still failing → notify human.
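Two parts of this flow are algorithmic: grouping tasks into levels that can run in parallel, and bounding the auto-fix retries. The sketch below is one possible reading, not the pipeline's implementation. It assumes task dependencies are known up front, and `gate` and `fix` are hypothetical callables standing in for the quality gate and a debug agent.

```python
from graphlib import TopologicalSorter
from typing import Callable


def execution_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into levels; tasks within a level have no mutual dependencies."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        levels.append(ready)
        sorter.done(*ready)
    return levels


def gate_with_auto_fix(gate: Callable[[], bool], fix: Callable[[], None], max_fixes: int = 3) -> bool:
    """Run the gate; on failure try up to max_fixes repairs. False means notify a human."""
    if gate():
        return True
    for _ in range(max_fixes):
        fix()
        if gate():
            return True
    return False


deps = {"schema": set(), "api": {"schema"}, "docs": {"schema"}, "ui": {"api"}}
print(execution_levels(deps))  # [['schema'], ['api', 'docs'], ['ui']]
```

Each inner list is a batch of per-task agents that could be spawned together; the next batch starts only after the previous one has committed.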
Dynamic Agent Pool
Agents are not hardcoded in CLAUDE.md. They are defined at invocation time, with description, trigger criteria, and model selection embedded in the plan phase where they’re spawned. This keeps CLAUDE.md lightweight while giving each agent full context for its role.
Research Pool (/plan-start)
| Agent | Trigger | Model |
|---|---|---|
| code-explorer | Always | Sonnet |
| arch-researcher | Multi-layer changes (2+ layers) | Sonnet |
| database-analyst | Any DB schema changes | Sonnet |
| security-analyst | Auth, payments, PII, RBAC | Opus |
| test-analyzer | Non-trivial feature | Sonnet |
| cross-platform-specialist | Mobile parity needed | Sonnet |
| native-app-specialist | Tasks touch mobile/UI | Sonnet |
| design-system-researcher | UI changes in scope | Sonnet |
| dependency-researcher | New packages being added | Sonnet |
| devops-specialist | Docker, env vars, CI/CD | Sonnet |
| integration-researcher | New services, libraries, OTEL | Opus |
| planning-coordinator | Always (when 2+ agents) | Opus |
Key design choices:
- Opus only for high-stakes roles (security, integration, coordination)
- Sonnet for standard research (good quality, lower cost)
- planning-coordinator is only spawned when 2+ agents are selected: it synthesizes, it doesn’t research
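These design choices can be expressed as a small team-assembly step. A sketch, assuming the 2+ threshold counts research agents and excludes the coordinator itself (the source does not say which); `build_team` is a hypothetical helper:

```python
# Roles the tables above assign to Opus; everything else runs on Sonnet.
OPUS_ROLES = {"security-analyst", "integration-researcher", "planning-coordinator"}


def build_team(selected: list[str]) -> dict[str, str]:
    """Assign a model to each selected agent, adding the coordinator for 2+ agents."""
    team = list(selected)
    if len(team) >= 2:  # assumption: the coordinator is not counted toward the threshold
        team.append("planning-coordinator")
    return {name: ("Opus" if name in OPUS_ROLES else "Sonnet") for name in team}


print(build_team(["code-explorer", "security-analyst"]))
```

A single-agent team gets no coordinator, since there is nothing to synthesize.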
Validation Pool (/plan-validate)
See the Layer 2 table above. These are different agents from the research pool, so validators are not biased by the creation process.
ADR Learning Loop
Every significant architectural decision generates an ADR. Over time, ADRs compound into institutional memory that reduces human interruptions.
What Triggers an ADR
Always create an ADR for:
- Choice between multiple valid interaction patterns (overlay vs page, drawer vs modal)
- New animation or transition patterns not in existing conventions
- Platform divergence decisions (web vs mobile behavior)
- Loading state strategy that introduces a new pattern
- Auth strategy, DB schema approach, service boundary decisions
- New dependency selections with architectural implications
Do NOT create an ADR for:
- Decisions dictated by an approved source or existing convention
- Minor layout choices within established patterns
- Obvious state catalog entries (standard empty/loading/error states)
Maturity Levels
    1 ADR   → Watching: tracked, not yet prescriptive
    2 ADRs  → Emerging: presented as recommended default with precedent context
    3+ ADRs → Confirmed: auto-resolved during planning (no human input needed)
              → Candidate for promotion to CLAUDE.md as hard rule

The Loop

    /plan-start Phase 2    → ADR created
                                ↓
                             PATTERNS.md updated
                                ↓
    /adr-review (periodic) → Detect patterns across ADRs
                                ↓
                             Propose CLAUDE.md promotions
                                ↓
    Future /plan-start     ← Confirmed rules auto-resolve decisions

Run /adr-review every 10-15 plans to batch-analyze patterns and propose CLAUDE.md additions.
The compounding effect: a project with 20 plans auto-resolves ~80% of architecture decisions. Human input focuses on genuinely novel decisions only.
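The maturity levels reduce to counting how many ADRs back the same pattern. A minimal sketch, assuming each ADR is tagged with a single pattern key; the `"Untracked"` label for zero ADRs and both function names are additions for illustration:

```python
from collections import Counter


def maturity(adr_count: int) -> str:
    """Map the number of ADRs supporting a pattern to its maturity level."""
    if adr_count >= 3:
        return "Confirmed"
    if adr_count == 2:
        return "Emerging"
    if adr_count == 1:
        return "Watching"
    return "Untracked"  # hypothetical label; the source defines no level for zero ADRs


def auto_resolvable(adr_patterns: list[str]) -> set[str]:
    """Return the patterns that planning may resolve without human input."""
    counts = Counter(adr_patterns)
    return {pattern for pattern, n in counts.items() if maturity(n) == "Confirmed"}


history = ["drawer-over-modal"] * 3 + ["optimistic-updates"] * 2 + ["skeleton-loading"]
print(auto_resolvable(history))  # {'drawer-over-modal'}
```

Only the pattern with three supporting ADRs is returned; the other two stay in the Emerging and Watching states until more decisions confirm them.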
CLAUDE.md Discipline
Hard Limit: 120 Lines
Every line in CLAUDE.md costs context on every request. A 300-line CLAUDE.md is overhead that runs before every single prompt. Set and enforce a hard limit.
What earns a line in CLAUDE.md:
- First principles (hard rules that override agent preferences)
- Confirmed ADR patterns (3+ occurrences)
- Project-specific conventions that agents cannot infer from the codebase
- Pointers to sub-files
What should NOT be in CLAUDE.md:
- Full content of design systems, env configs, architecture docs
- Rules that apply to <10% of tasks
- Explanations and rationale (write those in ADRs)
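One way to enforce the limit is a small check in CI or a pre-commit hook. This is a sketch under assumptions: it counts every line, including blank ones, and the function name is hypothetical.

```python
MAX_LINES = 120  # the hard limit described in this section


def lines_over_limit(text: str, limit: int = MAX_LINES) -> int:
    """Return how many lines the file exceeds the limit by (0 if within it)."""
    return max(0, len(text.splitlines()) - limit)


sample = "rule\n" * 300
print(lines_over_limit(sample))  # 180
```

In practice the text would come from reading CLAUDE.md, for example with `Path("CLAUDE.md").read_text()`, and a non-zero result would fail the check.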
Pointer Strategy
Instead of loading all context into CLAUDE.md, use pointers:

    ## Context Files (load only when relevant)
    - @docs/DESIGN_SYSTEM.md: when UI changes are in scope
    - @docs/ARCHITECTURE.md: when service boundaries are touched
    - @docs/ENV_CONFIG.md: when Docker or env vars are modified
    - @docs/ADR_PATTERNS.md: during planning phases

Agents load only what their task requires. A backend task never loads the design system. Context stays clean.
Regular Trimming
Review CLAUDE.md every 10-15 plans alongside /adr-review. Promote confirmed patterns, remove rules that have become obvious through codebase conventions, trim anything that hasn’t been referenced.
Context Management
/clear Between Steps
Run /clear between /plan-start, /plan-validate, and /plan-execute. Each command is self-contained: the plan file on disk is the handoff artifact, not in-memory context.
Without /clear: context accumulates across all phases, compacting triggers earlier, agents inherit irrelevant context from previous phases, and token costs increase significantly.
Why This Works
Each command reads its inputs from disk (plan files, ADRs, codebase). There’s no state that needs to live in the context window between steps. The discipline of clearing between steps is what makes the pipeline scale to large projects without hitting context limits mid-execution.
When to Use
✅ Use This Pipeline When
- Feature requires multiple files and layers (API + DB + UI)
- Security-sensitive changes (auth, payments, PII)
- Complex DB migrations
- New external service integrations
- Anything where a planning mistake would be expensive to undo
- Team projects where decision history matters
❌ Don’t Use When
- Typo fix, trivial refactor (use standard /plan mode)
- Exploratory prototyping where requirements are unknown
- Hotfix under time pressure (use dual-instance planning instead)
- Changes touching ≤2 files with no architectural decisions
⚡ Tier 0 Shortcut
Small but non-trivial changes still go through the pipeline. The system detects Tier 0 scope and skips agent spawning: research happens inline, validation is Layer 1 only, and execution is single-agent. Same commands, lower overhead.
Cost Profile
| Phase | Cost Driver | Approximate Range |
|---|---|---|
| /plan-start Tier 0 | Inline research only | $0.10-0.30 |
| /plan-start Tier 1-2 | 2-6 Sonnet agents | $0.50-2.00 |
| /plan-start Tier 3 | 7+ agents + Opus coordinator | $2.00-8.00 |
| /plan-validate | 0-8 agents | $0.20-3.00 |
| /plan-execute | Per-task agents + quality gate | $0.50-5.00 |
| Typical feature (Tier 2) | Full pipeline | $2-10 |
Cost falls as ADR coverage grows: fewer agents are needed, fewer validation issues surface, and execution is faster.
Use /plan-metrics periodically to review historical cost trends and calibrate estimates.
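As a sanity check on the table, the per-phase ranges for a Tier 1-2 run can be summed. The figures below are copied from the table; the helper name is hypothetical, and the sum is only an envelope, not a prediction.

```python
# Approximate per-phase ranges in USD, taken from the cost table above.
PHASES = {
    "/plan-start (Tier 1-2)": (0.50, 2.00),
    "/plan-validate": (0.20, 3.00),
    "/plan-execute": (0.50, 5.00),
}


def total_range(phases: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Sum the low ends and the high ends of each phase's range."""
    low = sum(lo for lo, _ in phases.values())
    high = sum(hi for _, hi in phases.values())
    return round(low, 2), round(high, 2)


print(total_range(PHASES))  # (1.2, 10.0)
```

The upper bound matches the table's $10 figure for a typical Tier 2 feature. The summed lower bound of $1.20 is below the table's $2, which suggests the typical-feature row assumes that not every phase lands at its minimum.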
See Also
- Plan-Driven Development: native /plan mode, lighter alternative
- Dual-Instance Planning: simpler 2-instance pattern
- Agent Teams: native parallel coordination (experimental)
- Task Management: Tasks API for cross-session coordination
- Spec-First Development: CLAUDE.md as specification contract
- ADR Writer Agent: standalone ADR generation
- Plan Challenger Agent: adversarial plan review