
Development Methodologies Reference

Confidence: Tier 2 — Validated by multiple production reports and official documentation.

Last updated: February 2026

This is a quick reference for 15 structured development methodologies that have emerged for AI-assisted development in 2025-2026. For hands-on practical workflows, see workflows/.


  1. Decision Tree
  2. The 15 Methodologies
  3. SDD Tools Reference
  4. Writing Effective Specs
  5. Combination Patterns
  6. Sources

┌─ "I want quality code" ────────────→ workflows/tdd-with-claude.md
├─ "I want to spec before code" ─────→ workflows/spec-first.md
├─ "I need to plan architecture" ────→ workflows/plan-driven.md
├─ "I'm iterating on something" ─────→ workflows/iterative-refinement.md
└─ "I need methodology theory" ──────→ Continue reading below

Where each methodology sits on two axes: Spec-First vs Code-First (Y) and Lean/Solo vs Enterprise/Governed (X).

SPEC / PLANNING FIRST
── lean · spec ── │ ── governed · spec ──
[Doc-Driven] [SDD] │ [BDD] [ATDD] [Req-Driven]
[GSD] [Plan-First] │ [CDD] [ADR-Driven] [DDD] [BMAD]
LEAN ─────────────────────────┼────────────────────────────────► ENTERPRISE
── lean · code ── │ ── governed · code ──
[Context Eng.] [TDD] │ [Multi-Agent]
[Prompt Eng.] [Iterative] │ [Eval-Driven] [FDD]
[Ralph Loop] │ [JiTTesting]
CODE / EMERGENT

How to read it:

  • Top-left — Spec-first lean: SDD, Doc-Driven, Plan-First. Natural entry point for solo devs and small teams moving away from “code first”.
  • Top-right — Spec-first governed: BMAD, Req-Driven, ATDD, DDD. Real governance, but costly to set up. ROI is driven by project complexity and requirement stability, not headcount alone.
  • Bottom-left — Code-first lean: the natural Claude Code terrain. TDD + Ralph Loop + Iterative = core solo workflow.
  • Bottom-right — Code-first at scale: Multi-Agent, Eval-Driven, JiTTesting (Meta, 100M+ LoC). Emerging patterns for high-volume teams.
  • On the axis — Plan-First, CDD, ADR-Driven, GSD: hybrid approaches that adapt to any context.

Organized in a 6-tier pyramid from strategic orchestration down to optimization techniques.

| Name | What | Best For | Claude Fit |
| --- | --- | --- | --- |
| BMAD | Multi-agent governance with constitution as guardrail | High-complexity projects with stable requirements, compliance or governance needs | ⭐⭐ Niche but powerful |
| GSD | Meta-prompting 6-phase workflow with fresh contexts per task | Solo devs, Claude Code CLI | ⭐⭐ Similar to patterns in guide |

BMAD (Breakthrough Method for Agile AI-Driven Development) inverts the traditional paradigm: documentation becomes the source of truth, not code. Uses specialized agents (Analyst, PM, Architect, Developer, QA) orchestrated with strict governance. Note: BMAD’s role-based agent naming reflects their methodology; see §9.17 Agent Anti-Patterns for scope-focused alternatives.

  • Key concept: Constitution.md as strategic guardrail
  • When to use: Complex enterprise projects needing governance
  • When to avoid: MVPs, rapid prototyping, evolving requirements — BMAD is brittle when specs change mid-project

GSD (Get Shit Done) addresses context rot through systematic 6-phase workflow (Initialize → Discuss → Plan → Execute → Verify → Complete) with fresh 200k-token contexts per task. Core concepts (multi-agent orchestration, fresh context management) overlap significantly with existing patterns like Ralph Loop, Gas Town, and BMAD. See resource evaluation for detailed comparison.

Emerging: Ralph Inferno implements autonomous multi-persona workflows (Analyst→PM→UX→Architect→Business) with VM-based execution and self-correcting E2E loops. Experimental but interesting for “vibe coding at scale”.


Foundational Discipline: Plan-First Workflow


“Once the plan is good, the code is good.” — Boris Cherny, creator of Claude Code

Not just a feature (/plan command) — a systematic discipline.

Context Engineering: Thoughtworks designates this broader approach “Context Engineering” in their Technology Radar (Nov 2025)1 — the systematic design of information provided to LLMs during inference. Three core techniques: context setup (minimal system prompts, few-shot examples), context management for long-horizon tasks (summarization, external memories, sub-agent architectures), and dynamic information retrieval (JIT context loading). Related patterns in Claude Code: AGENTS.md, MCP Context7, Plan Mode.

The Mental Model:

Planning isn’t optional for complex tasks. It’s the difference between:

  • ❌ 8 iterations of “try → fix → retry → fix again”
  • ✅ 1 iteration of “plan → validate → execute cleanly”

When to plan first:

| Task Complexity | Plan First? | Why |
| --- | --- | --- |
| >3 files modified | ✅ Yes | Cross-file dependencies need architecture |
| >50 lines changed | ✅ Yes | Enough complexity for mistakes |
| Architectural changes | ✅ Yes | Impact analysis required |
| Unfamiliar codebase | ✅ Yes | Need exploration before action |
| Typo/obvious fix | ❌ No | Planning overhead > task time |
| Single-line change | ❌ No | Just do it |
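The trigger table above can be expressed as a simple heuristic. A minimal sketch — the thresholds come from the table, but the function name and signature are illustrative, not part of any tool:

```python
def should_plan_first(files_modified: int, lines_changed: int,
                      architectural: bool, unfamiliar_codebase: bool) -> bool:
    """Toy heuristic mirroring the plan-first trigger table."""
    # Architectural changes and unfamiliar codebases always warrant a plan.
    if architectural or unfamiliar_codebase:
        return True
    # Otherwise, plan when the change crosses the size thresholds.
    return files_modified > 3 or lines_changed > 50

# A cross-cutting refactor clearly warrants a plan:
print(should_plan_first(5, 120, False, False))   # True
# A one-line typo fix does not:
print(should_plan_first(1, 1, False, False))     # False
```

In practice the decision stays with the human; encoding it only makes the team's policy explicit and reviewable (e.g. in CLAUDE.md).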

How plan-first works:

  1. Exploration phase (Plan Mode via Shift+Tab):

    • Claude reads files, explores architecture
    • No edits allowed → forces thinking before action
    • Proposes approach with trade-offs
  2. Validation phase (you review):

    • Plan exposes assumptions and gaps
    • Easier to correct direction now vs after 100 lines written
    • Plan becomes contract for execution
  3. Execution phase (toggle back to Normal Mode with Shift+Tab):

    • Plan → code becomes mechanical translation
    • Fewer surprises, cleaner implementation
    • Faster overall despite “slower” start

Boris Cherny's workflow:

“I run many sessions, start in plan mode, then switch into execution once the plan looks right. The signature upgrade is verification—giving Claude a way to test and confirm its own output.”

Benefits over “just start coding”:

  • Fewer correction iterations: Plan catches issues before they become code
  • Better architecture: Forced to think about structure first
  • Clearer communication: Plan is shared understanding with team/Claude
  • Reduced cost: One clean iteration < multiple messy iterations (even if plan phase costs tokens)

Integration with CLAUDE.md:

Document your team’s plan-first triggers:

## Planning Policy
- ALWAYS plan first: API changes, database migrations, new features
- OPTIONAL planning: Bug fixes <10 lines, test additions
- NEVER skip: Changes affecting >2 modules

See also: Plan Mode documentation for /plan command usage.

Advanced pattern: For an iterative annotation-based approach to plan-driven development, see Custom Markdown Plans (Boris Tane Pattern).


| Name | What | Best For | Claude Fit |
| --- | --- | --- | --- |
| SDD | Specs before code | APIs, contracts | ⭐⭐⭐ Core pattern |
| Doc-Driven | Docs = source of truth | Cross-team alignment | ⭐⭐⭐ CLAUDE.md native |
| Req-Driven | Rich artifact context (20+ artifacts) | Complex requirements | ⭐⭐ Heavy setup |
| DDD | Domain language first | Business logic | ⭐⭐ Design-time |

SDD (Spec-Driven Development) — Specifications BEFORE code. One well-structured iteration equals 8 unstructured ones. CLAUDE.md IS your spec file.

Doc-Driven Development — Living documentation versioned in git becomes the single source of truth. Changes to specs trigger implementation.

Requirements-Driven Development — Uses CLAUDE.md as comprehensive implementation guide with 20+ structured artifacts.

DDD (Domain-Driven Design) — Aligns software with business language through:

  • Ubiquitous Language: Shared vocabulary in code
  • Bounded Contexts: Isolated domain boundaries
  • Domain Distillation: Core vs Support vs Generic domains

| Name | What | Best For | Claude Fit |
| --- | --- | --- | --- |
| BDD | Given-When-Then scenarios | Stakeholder collaboration | ⭐⭐⭐ Tests & specs |
| ATDD | Acceptance criteria first | Compliance, regulated | ⭐⭐ Process-heavy |
| CDD | API contracts as interface | Microservices | ⭐⭐⭐ OpenAPI native |

BDD (Behavior-Driven Development) — Beyond testing: a collaboration process.

  1. Discovery: Involve devs and business experts
  2. Formulation: Write Given-When-Then examples
  3. Automation: Convert to executable tests (Gherkin/Cucumber)
Feature: Order Management
  Scenario: Cannot buy without stock
    Given product with 0 stock
    When customer attempts purchase
    Then system refuses with error message
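Tools like Cucumber or pytest-bdd bind such scenarios to step definitions. A library-free sketch of what that binding does for the scenario above — the `Shop` class and its methods are invented for illustration:

```python
class Shop:
    """Minimal domain object standing in for the real system under test."""
    def __init__(self, stock: int):
        self.stock = stock
        self.error = None

    def purchase(self) -> bool:
        if self.stock <= 0:
            self.error = "out of stock"
            return False
        self.stock -= 1
        return True

# Given product with 0 stock
shop = Shop(stock=0)
# When customer attempts purchase
ok = shop.purchase()
# Then system refuses with error message
assert not ok and shop.error == "out of stock"
```

The value of BDD is that the Gherkin text, not the Python, is what business experts review; the step code is just glue.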

ATDD (Acceptance Test-Driven Development) — Acceptance criteria defined BEFORE coding, collaboratively (“Three Amigos”: Business, Dev, Test).

In agentic development, ATDD is particularly effective because agents need unambiguous success conditions. The flow maps cleanly to agent tasks:

  1. Define acceptance criteria in Gherkin (human-readable, machine-executable)
  2. Agent writes failing tests based on scenarios (not implementation)
  3. Agent implements until tests pass
Feature: Password Reset
  Scenario: User resets via email
    Given a registered user with email "user@example.com"
    When they request a password reset
    Then they receive a reset email within 60 seconds
    And the reset link expires after 24 hours

This Gherkin scenario is the contract between intent and implementation. The agent cannot misinterpret scope because "done" is defined before a line of code is written.

Applied to agents: Pass the Gherkin file to Claude Code before implementing. “Write failing tests for this feature file, then implement until they pass.” The scenario writer role (human or agent) forces explicit scope before execution starts.

CDD (Contract-Driven Development) — API contracts (OpenAPI specs) as executable interface between teams. Patterns: Contract as Test, Contract as Stub.

JiTTesting (Just-in-Time Testing) — Tests generated on-the-fly at PR submission, designed to fail, then discarded after merge. No maintenance cost, no test suite growth.

TDD/BDD/ATDD all assume the developer controls the pace of code authoring. Agentic development breaks that assumption: an agent can generate 200 lines per hour, faster than any human test-writing workflow can keep up with. JiTTests are the industrial response to that mismatch.

The mechanism: at PR time, an LLM infers the intent of the diff, generates code mutants (deliberately broken variants), writes tests that catch those mutants, runs ensemble rule-based and LLM assessors to filter false positives, and surfaces only real regressions to the engineer. The tests never land in the codebase.
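The mutant-filtering step can be illustrated in miniature. This is a toy sketch, not Meta's system: in the real pipeline both the mutants and the tests are LLM-generated from the PR diff, and the function names here are invented:

```python
def apply_discount(price: float, percent: float) -> float:
    """The (correct) code under test."""
    return price * (1 - percent / 100)

# Deliberately broken variants ("mutants") of the code under test.
mutants = [
    lambda price, percent: price * (1 + percent / 100),  # sign flipped
    lambda price, percent: price * (1 - percent),        # missing / 100
]

def candidate_test(fn) -> bool:
    """A generated test: True iff fn behaves as intended on this input."""
    return abs(fn(200.0, 10.0) - 180.0) < 1e-9

# Keep a candidate test only if it passes on the original code
# AND fails on ("kills") every mutant — i.e. it would catch a regression.
useful = (candidate_test(apply_discount)
          and not any(candidate_test(m) for m in mutants))
print(useful)  # True: this test discriminates correct code from regressions
```

Tests that pass on mutants are useless (they can't detect the breakage) and are discarded — that filtering is what keeps the surfaced failures high-signal.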

Meta deployed this at scale (100M+ LoC): 4x improvement in catching regressions over traditional hardening tests, 70% reduction in human review load, 4 serious production failures prevented from 41 candidates reviewed.

No open-source implementation exists yet. You can approximate this today: before merging any agent-generated PR, prompt Claude with “generate tests that would catch regressions introduced by this diff specifically — I’ll run them locally and discard them after the PR closes.” The ephemeral framing focuses test generation on what actually changed rather than general coverage.

Reference: Just-in-Time Catching Test Generation at Meta — Harman, 2026.


| Name | What | Best For | Claude Fit |
| --- | --- | --- | --- |
| FDD | Feature-by-feature delivery | Feature teams with parallel delivery | ⭐⭐ Structure |
| Context Eng. | Context as first-class design | Long sessions | ⭐⭐⭐ Fundamental |

FDD (Feature-Driven Development) — Five processes:

  1. Develop Overall Model
  2. Build Features List
  3. Plan by Feature
  4. Design by Feature
  5. Build by Feature

Strict iteration: 2 weeks max per feature.

Context Engineering — Treat context as design element:

  • Progressive Disclosure: Let agent discover incrementally
  • Memory Management: Conversation vs persistent memory
  • Dynamic Refresh: Rewrite TODO list before response

| Name | What | Best For | Claude Fit |
| --- | --- | --- | --- |
| TDD | Red-Green-Refactor | Quality code | ⭐⭐⭐ Core workflow |
| Eval-Driven | Evals for LLM outputs | AI products | ⭐⭐⭐ Agents |
| Multi-Agent | Orchestrate sub-agents | Complex tasks | ⭐⭐⭐ Task tool |

TDD (Test-Driven Development) — The classic cycle:

  1. Red: Write failing test
  2. Green: Minimal code to pass
  3. Refactor: Clean up, tests stay green

With Claude: Be explicit. “Write FAILING tests that don’t exist yet.”

Verification Loops — A formalized pattern for autonomous iteration (broader than TDD):

Core principle: Give Claude a mechanism to verify its own output.

Code generated → Verification tool → Feedback loop → Improvement

Why it works (Boris Cherny): “An agent that can ‘see’ what it has done produces better results.”

Verification mechanisms by domain:

| Domain | Verification Tool | What Claude "Sees" |
| --- | --- | --- |
| Frontend | Browser preview (live reload) | Visual rendering, layout, interactions |
| Backend | Tests (unit/integration) | Pass/fail status, error messages |
| Types | TypeScript compiler | Type errors, incompatibilities |
| Style | Linters (ESLint, Prettier) | Style violations, formatting issues |
| Performance | Profilers, benchmarks | Execution time, memory usage |
| Accessibility | axe-core, screen readers | WCAG violations, navigation issues |
| Security | Static analyzers (Semgrep) | Vulnerability patterns |
| UX | User testing, recordings | Usability problems, confusion points |

TDD as canonical example:

  1. Claude writes tests for the feature
  2. Claude iterates code until tests pass
  3. Continue until explicit completion criteria met
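The loop can be sketched as driver code. Both helpers are stand-ins: in a real setup `run_tests` would invoke pytest or Jest, and `ask_agent` would be a Claude call that receives the failures as feedback:

```python
def run_tests(code: str) -> list[str]:
    """Stand-in for a real test runner; returns names of failing tests."""
    return [] if "fix" in code else ["test_feature"]

def ask_agent(code: str, failures: list[str]) -> str:
    """Stand-in for a Claude call that revises code given test failures."""
    return code + " fix"

code, attempts = "draft", 0
while (failures := run_tests(code)) and attempts < 5:
    code = ask_agent(code, failures)   # feed failures back, get a revision
    attempts += 1

print(attempts, failures)  # loop exits as soon as the suite is green
```

The `attempts < 5` cap matters: without a budget, a loop that never converges burns tokens indefinitely.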

Official guidance: “Tell Claude to keep going until all tests pass. It will usually take a few iterations.” — Anthropic Best Practices

Implementation patterns:

  • Hooks: PostToolUse hook runs verification after each edit
  • Browser extension: Claude in Chrome sees rendered output
  • Test watchers: Jest/Vitest watch mode provides instant feedback
  • CI/CD gates: GitHub Actions runs full validation suite
  • Multi-Claude verification: One Claude codes, another reviews

Anti-pattern: Blind iteration without feedback. Without a verification mechanism, Claude can’t converge toward the correct solution—it guesses.

Eval-Driven Development — TDD for LLMs. Test agent behaviors via evals:

  • Code-based: output == golden_answer
  • LLM-based: Another Claude evaluates
  • Human grading: Reference, slow

Eval Harness — The infrastructure that runs evaluations end-to-end: providing instructions and tools, running tasks concurrently, recording steps, grading outputs, and aggregating results.
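A code-based grader plus harness, in miniature. All names here are illustrative; real harnesses additionally record intermediate steps, run cases concurrently, and mix grader types:

```python
def grade_exact(output: str, golden: str) -> float:
    """Code-based grader: 1.0 on exact match with the golden answer."""
    return 1.0 if output.strip() == golden.strip() else 0.0

def run_eval(agent, cases) -> float:
    """Run each case through the agent, grade it, aggregate a mean score."""
    scores = [grade_exact(agent(c["input"]), c["golden"]) for c in cases]
    return sum(scores) / len(scores)

# A fake agent standing in for a real model call.
fake_agent = lambda prompt: "4" if prompt == "2+2?" else "unsure"

cases = [
    {"input": "2+2?", "golden": "4"},
    {"input": "capital of France?", "golden": "Paris"},
]
print(run_eval(fake_agent, cases))  # 0.5
```

Swapping `grade_exact` for an LLM-based grader (another Claude judging the output) gives the second eval style from the list above without changing the harness.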

See Anthropic’s comprehensive guide: Demystifying Evals for AI Agents

Multi-Agent Orchestration — From single assistant to orchestrated team:

Meta-Agent (Orchestrator)
├── Analyst (requirements)
├── Architect (design)
├── Developer (code)
└── Reviewer (validation)
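The hand-off between roles can be sketched as a pipeline of functions. Everything here is illustrative: in practice each stage would be a separate sub-agent invocation (e.g. via the Task tool) with its own fresh context, not a local function:

```python
def analyst(goal):
    return {"requirements": f"requirements for: {goal}"}

def architect(state):
    return {**state, "design": "layered design"}

def developer(state):
    return {**state, "code": "implementation"}

def reviewer(state):
    # Approve only if the developer stage actually produced code.
    return {**state, "approved": "code" in state}

def orchestrate(goal, stages=(analyst, architect, developer, reviewer)):
    """Meta-agent: each stage consumes the previous stage's output."""
    state = goal
    for stage in stages:
        state = stage(state)
    return state

result = orchestrate("export feature")
print(result["approved"])  # True
```

The key property the sketch preserves: the orchestrator owns the sequencing, while each role sees only the accumulated state, never the other roles' internals.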

Pattern: Write plain English ADRs → Feed to implement-adr skill → Execute natively

Architecture Decision Records (ADRs) combined with Claude Code skills create a workflow where architectural decisions drive implementation directly.

Workflow Steps:

  1. Document decision in ADR format (context, decision, consequences)
  2. Create implementation skill (generic or implement-adr specialized)
  3. Feed ADR as prompt to skill with clear acceptance criteria
  4. Claude executes based on architectural guidance in ADR

Example ADR Template:

# ADR-001: Database Migration Strategy
## Context
Legacy MySQL schema needs migration to PostgreSQL for better JSON support.
## Decision
Use incremental dual-write pattern with feature flags.
## Consequences
- Positive: Zero-downtime migration
- Negative: Temporary code complexity during transition

Implementation Workflow:

# 1. Write ADR (plain English)
vim docs/adr/001-database-migration.md
# 2. Feed to implementation skill
/implement-adr docs/adr/001-database-migration.md
# 3. Claude executes based on ADR guidance
# → Creates migration scripts
# → Updates ORM configuration
# → Adds feature flags
# → Implements dual-write logic

Benefits:

  • Documentation-driven: Architecture and code stay synchronized
  • Native execution: No external frameworks needed
  • Traceable decisions: Clear audit trail from decision to implementation
  • Team alignment: ADRs communicate intent to both humans and AI

Source: Gur Sannikov embedded engineering workflow


| Name | What | Best For | Claude Fit |
| --- | --- | --- | --- |
| Iterative Loops | Autonomous refinement | Optimization | ⭐⭐⭐ Core |
| Fresh Context | Reset per task, state in files | Long autonomous sessions | ⭐⭐⭐ Power users |
| Prompt Engineering | Technique foundation | Everything | ⭐⭐⭐ Prerequisite |

Iterative Refinement Loops — Autonomous convergence:

  1. Execute prompt
  2. Observe result
  3. If result ≠ “DONE” → refine and repeat

Prompt Engineering — Foundations for ALL Claude usage:

  • Zero-Shot Chain of Thought: “Think step by step”
  • Few-Shot Learning: 2-3 examples of expected pattern
  • Structured Prompts: XML tags for organization
  • Position Matters: For long docs, place question at end

Fresh Context Pattern (Ralph Loop) — Solves context rot by spawning fresh agent instances per task. State persists in git + progress files, not chat history. Ideal for long autonomous sessions (migrations, overnight runs). See Ultimate Guide - Fresh Context Pattern for implementation.
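State-in-files can be sketched with a progress file that each fresh instance reads, advances, and writes back. The file layout is hypothetical, and `run_fresh_instance` stands in for spawning a brand-new Claude session per task:

```python
import json, pathlib, tempfile

# Progress lives on disk, not in any conversation history.
progress = pathlib.Path(tempfile.mkdtemp()) / "progress.json"
progress.write_text(json.dumps(
    {"tasks": ["migrate users", "migrate orders"], "done": []}))

def run_fresh_instance(path: pathlib.Path) -> dict:
    """One 'fresh context': load state from disk, do one task, persist."""
    state = json.loads(path.read_text())
    if state["tasks"]:
        task = state["tasks"].pop(0)
        state["done"].append(task)   # stand-in for the agent's actual work
    path.write_text(json.dumps(state))
    return state

# Each loop iteration could be a brand-new agent: nothing carries over
# in memory, so context rot cannot accumulate.
while json.loads(progress.read_text())["tasks"]:
    run_fresh_instance(progress)

print(json.loads(progress.read_text())["done"])
```

Combined with git commits per task, this makes overnight runs resumable from any point: kill the process, restart, and the next instance picks up exactly where the file says.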


Four tools have emerged to formalize Spec-Driven Development:

| Tool | Use Case | Official Docs | Claude Integration |
| --- | --- | --- | --- |
| Spec Kit | Greenfield, governance | github.blog/spec-kit | /speckit.constitution, /speckit.specify, /speckit.plan |
| OpenSpec | Brownfield, changes | github.com/Fission-AI/OpenSpec | /openspec:proposal, /openspec:apply, /openspec:archive |
| Specmatic | API contract testing | specmatic.io | MCP agent available |
| Spec-to-Code Factory | Greenfield, tooled enforcement | github.com/SylvainChabaud/spec-to-code-factory | Multi-agent reference implementation (BREAK→MODEL→ACT→DEBRIEF) |

5-phase workflow:

  1. Constitution: /speckit.constitution → guardrails
  2. Specify: /speckit.specify → requirements
  3. Plan: /speckit.plan → architecture
  4. Tasks: /speckit.tasks → decomposition
  5. Implement: /speckit.implement → code

Two-folder architecture:

openspec/
├── specs/ ← Current truth (stable)
└── changes/ ← Proposals (temporary)

Workflow: Proposal → Review → Apply → Archive

  • Contract as Test: Auto-generates 1000s of tests from OpenAPI spec
  • Contract as Stub: Mock server for parallel development
  • Backward Compatibility: Detects breaking changes

Based on analysis of 2,500+ agent configuration files. Source: Addy Osmani

| Component | What to Include | Example |
| --- | --- | --- |
| Commands | Executable with flags | npm test -- --coverage |
| Testing | Framework, coverage, locations | vitest, 80%, tests/ |
| Project structure | Explicit directories | src/, lib/, tests/ |
| Code style | One example > paragraphs | Show a real function |
| Git workflow | Branch, commit, PR format | feat/name, conventional commits |
| Boundaries | Permission tiers | See below |

| Tier | Symbol | Use For |
| --- | --- | --- |
| Always do | ✅ | Safe actions, no approval (lint, format) |
| Ask first | ⚠️ | High-impact changes (delete, publish) |
| Never do | 🚫 | Hard stops (commit secrets, force push main) |

⚠️ Research shows more instructions = worse adherence to each one.

Solution: Feed only relevant spec sections per task, not the entire document.

| Project Size | Approach |
| --- | --- |
| Small (<10 files) | Single spec file |
| Medium (10-50 files) | Sectioned spec, feed per task |
| Large (50+ files) | Sub-agent routing by domain |
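Section-per-task feeding can be sketched as a keyword router over a sectioned spec. This is a naive sketch with invented section names; real routing might use embeddings or a sub-agent per domain:

```python
# A sectioned spec (contents are illustrative).
spec_sections = {
    "auth": "## Auth\nPasswords hashed with bcrypt; sessions expire in 24h.",
    "billing": "## Billing\nInvoices generated monthly; amounts in cents.",
    "api": "## API\nAll endpoints versioned under /v1; JSON only.",
}

def relevant_sections(task: str) -> str:
    """Return only the spec sections whose key appears in the task text."""
    hits = [text for key, text in spec_sections.items()
            if key in task.lower()]
    # Fallback: if nothing matches, send the whole spec rather than none.
    return "\n\n".join(hits) or "\n\n".join(spec_sections.values())

prompt_context = relevant_sections("Fix the billing rounding bug")
print("Billing" in prompt_context and "Auth" not in prompt_context)  # True
```

The point is the contract, not the matching logic: each task's prompt carries only the instructions it needs, which is exactly what the adherence research above argues for.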

Recommended stacks by situation:

| Situation | Recommended Stack | Notes |
| --- | --- | --- |
| Solo MVP | SDD + TDD | Minimal overhead, quality focus |
| Team 5-10, greenfield | Spec Kit + TDD + BDD | Governance + quality + collaboration |
| Microservices | CDD + Specmatic | Contract-first, parallel dev |
| Existing SaaS (100+ features) | OpenSpec + BDD | Change tracking, no spec drift |
| High-complexity / compliance | BMAD + Spec Kit + Specmatic | Full governance + contracts |
| LLM-native product | Eval-Driven + Multi-Agent | Self-improving systems |

| Methodology | Level | Primary Focus | Best Context | Learning Curve |
| --- | --- | --- | --- | --- |
| BMAD | Orchestration | Governance | High complexity, stable requirements | High |
| SDD | Specification | Contracts | Any | Medium |
| Doc-Driven | Specification | Alignment | Any | Low |
| Req-Driven | Specification | Context | Complex requirements, many artifacts | Medium |
| DDD | Specification | Domain | Complex business domain | Very High |
| BDD | Behavior | Collaboration | Multi-role stakeholder involvement | Medium |
| ATDD | Behavior | Compliance | Regulated, explicit acceptance criteria | Medium |
| CDD | Behavior | APIs | Service boundaries, parallel teams | Medium |
| FDD | Delivery | Features | Feature teams, parallel delivery | Medium |
| Context Eng. | Delivery | AI sessions | Any | Low |
| TDD | Implementation | Quality | Any | Low |
| Eval-Driven | Implementation | AI outputs | Any | Medium |
| Multi-Agent | Implementation | Complexity | Any | Medium |
| Iterative | Optimization | Refinement | Any | Low |
| Prompt Eng. | Optimization | Foundation | Any | Very Low |

SDD & Spec-First

BMAD

TDD with AI

BDD & DDD

Context Engineering

Eval-Driven & Multi-Agent


  1. Thoughtworks Technology Radar Vol 33, Nov 2025. PDF. See also: Macro trends blog post.