Spec-First Development with Claude
Spec-First Development with Claude
Section titled “Spec-First Development with Claude”Confidence: Tier 2 — Validated by multiple production teams and aligns with official SDD guidance.
Define what you want in CLAUDE.md BEFORE asking Claude to build. One well-structured iteration equals 8 unstructured ones.
Table of Contents
Section titled “Table of Contents”- TL;DR
- The Pattern
- Task Granularity: Sizing Work for Agents
- CLAUDE.md Spec Templates
- Step-by-Step Workflow
- Integration with Tools
- When to Use
- Anti-Patterns
- See Also
1. Write spec in CLAUDE.md2. Claude reads spec automatically3. Implementation follows spec exactly4. Verify against specCLAUDE.md IS your spec file. Treat it as a contract.
The Pattern
Section titled “The Pattern”Spec-First Development inverts the typical AI coding flow:
Traditional: Spec-First:─────────── ──────────Prompt → Code Spec → Prompt → Code → Verify │ │ │ │ └─ Hope it's └── Contract └── Follows spec what you want defined └── Check against specThe spec becomes the source of truth that:
- Constrains what Claude builds
- Documents decisions for the team
- Enables verification of completeness
Task Granularity: Sizing Work for Agents
Section titled “Task Granularity: Sizing Work for Agents”Before writing the spec, verify the task is the right size. Agents work best with vertical slices — thin, end-to-end units that cut through all layers but implement exactly one complete user behavior (e.g. “password reset via email”, not “authentication system”).
Rule of thumb: One agent session = one vertical slice. If the task description requires “and” between two user behaviors, split it.
PRD Quality Checklist
Section titled “PRD Quality Checklist”Run this before handing any task to an agent. Six dimensions to verify:
| Dimension | Question to ask | Red flag |
|---|---|---|
| Problem Clarity | Is the problem statement unambiguous? | ”Improve performance” |
| Testable Criteria | Can completion be verified automatically? | ”Works well” |
| Scope Boundaries | What is explicitly OUT of scope? | Nothing listed as excluded |
| Observable Done | What does “done” look like to a user? | Internal-only description |
| Requirements Clarity | No implementation details in the spec? | ”Use Redis for caching” |
| Terminology | Same terms used throughout? | “user” and “account” mixed |
A task that fails 2+ dimensions needs rework before an agent touches it. The spec review catches ambiguity that will otherwise surface as incorrect implementation mid-session.
❌ Too big, ambiguous:"Add user authentication to the app"
✅ One vertical slice:"Users can log in with email + password.- POST /auth/login returns JWT on success, 401 on failure- Invalid credentials show 'Email or password incorrect' (not which is wrong)- Session expires after 24h- Out of scope: OAuth, password reset, remember me"CLAUDE.md Spec Templates
Section titled “CLAUDE.md Spec Templates”Feature Spec (Most Common)
Section titled “Feature Spec (Most Common)”## Feature: [Name]
### Description[2-3 sentences explaining the feature purpose]
### Capabilities- MUST: [Required functionality]- MUST: [Another requirement]- SHOULD: [Nice to have]- MUST NOT: [Explicit exclusions]
### Tech Stack- Required: [lib1, lib2, lib3]- Forbidden: [lib4, lib5]
### Acceptance Criteria- [ ] Criterion 1: [Specific, testable condition]- [ ] Criterion 2: [Another condition]- [ ] Criterion 3: [Edge case handling]
### API Contract (if applicable)- Endpoint: POST /api/[resource]- Request: { field1: string, field2: number }- Response: { id: string, created: timestamp }- Errors: 400 (validation), 404 (not found), 500 (server)Architecture Spec
Section titled “Architecture Spec”## Architecture: [Component Name]
### Purpose[Why this component exists]
### Boundaries- Owns: [What this component is responsible for]- Delegates to: [What other components handle]- Does NOT: [Explicit non-responsibilities]
### Dependencies- Upstream: [Components that call this]- Downstream: [Components this calls]
### Data FlowInput → Validation → Processing → Output │ │ └─ Errors ─────┘
### Constraints- Performance: [Response time, throughput]- Security: [Auth requirements, data handling]- Scalability: [Expected load, limits]API Spec
Section titled “API Spec”## API: [Endpoint Name]
### Endpoint`POST /api/v1/[resource]`
### AuthenticationBearer token required. Scopes: `read:resource`, `write:resource`
### Request```json{ "field1": "string (required, max 255 chars)", "field2": "number (optional, default: 0)", "nested": { "subfield": "boolean" }}Response
Section titled “Response”{ "id": "uuid", "created_at": "ISO 8601 timestamp", "data": { ... }}Error Codes
Section titled “Error Codes”| Code | Meaning | Response Body |
|---|---|---|
| 400 | Validation failed | { "errors": [...] } |
| 401 | Not authenticated | { "message": "..." } |
| 403 | Not authorized | { "message": "..." } |
| 404 | Resource not found | { "message": "..." } |
---
## Step-by-Step Workflow
### Step 1: Write the Spec
Before any implementation request, add spec to CLAUDE.md:
```markdown## Feature: User Authentication
### Capabilities- MUST: Email/password login- MUST: JWT token generation- MUST: Password hashing with bcrypt- SHOULD: Remember me functionality- MUST NOT: Store plain text passwords
### Tech Stack- Required: bcrypt, jsonwebtoken- Forbidden: passport.js (too heavy for this use case)
### Acceptance Criteria- [ ] User can login with valid credentials- [ ] Invalid credentials return 401- [ ] Token expires after 24h (or 7d with remember me)- [ ] Passwords hashed with cost factor 12Step 2: Reference Spec in Prompt
Section titled “Step 2: Reference Spec in Prompt”Implement the User Authentication feature as specified in CLAUDE.md.Follow the acceptance criteria exactly.Claude automatically reads CLAUDE.md and follows the spec.
Step 3: Verify Against Spec
Section titled “Step 3: Verify Against Spec”After implementation, verify:
Review the implementation against the User Authentication spec.Check off each acceptance criterion that's satisfied.List any gaps.Step 4: Update Spec if Needed
Section titled “Step 4: Update Spec if Needed”If requirements change during implementation:
Update the User Authentication spec to include:- MUST: Rate limiting (5 attempts per minute)Then implement the rate limiting.Integration with Tools
Section titled “Integration with Tools”With Spec Kit (Greenfield)
Section titled “With Spec Kit (Greenfield)”# Install Spec Kitnpx @anthropic/spec-kit init
# Use slash commands/speckit.constitution # Define project guardrails/speckit.specify # Write feature specs/speckit.plan # Create implementation plan/speckit.implement # Build from specWith OpenSpec (Brownfield)
Section titled “With OpenSpec (Brownfield)”# Install OpenSpecnpm install -g @fission-ai/openspec@latestopenspec init
# Use slash commands/openspec:proposal "Add dark mode" # Create change proposal/openspec:apply add-dark-mode # Implement changes/openspec:archive add-dark-mode # Merge to specsWith Plan Mode
Section titled “With Plan Mode”[Press Shift+Tab to enter Plan Mode]
I need to implement the Payment Processing feature.Review the spec in CLAUDE.md and create an implementation plan.When to Use
Section titled “When to Use”Use Spec-First
Section titled “Use Spec-First”| Scenario | Why |
|---|---|
| New features | Define before building |
| API design | Contract must be explicit |
| Architecture decisions | Document constraints |
| Team collaboration | Shared understanding |
| Complex requirements | Reduce ambiguity |
Skip Spec-First
Section titled “Skip Spec-First”| Scenario | Why |
|---|---|
| Quick fixes | Overhead not worth it |
| Exploration | Don’t know what you want yet |
| Prototyping | Requirements will change |
| Single-line changes | Obvious intent |
Anti-Patterns
Section titled “Anti-Patterns”Vague Specs
Section titled “Vague Specs”# Wrong## Feature: User Management- Handle users
# Right## Feature: User Management### Capabilities- MUST: Create user with email, password, name- MUST: Update user profile (name, avatar)- MUST: Soft delete (mark as inactive, don't remove data)- MUST NOT: Allow duplicate emailsSpec After Code
Section titled “Spec After Code”# Wrong workflow1. Ask Claude to implement feature2. Write spec documenting what was built
# Right workflow1. Write spec defining what should be built2. Ask Claude to implement from specIgnoring Forbidden
Section titled “Ignoring Forbidden”# Don't forget exclusions### Tech Stack- Required: React, TypeScript- Forbidden: jQuery, vanilla JS, class components ↑ These constraints prevent driftModular Spec Design
Section titled “Modular Spec Design”Pattern: Break large specifications into multiple focused files instead of cramming everything into a single CLAUDE.md.
The Problem: Monolithic CLAUDE.md
Section titled “The Problem: Monolithic CLAUDE.md”When specs exceed ~200 lines, several issues emerge:
- Context pollution: Claude struggles to extract relevant information from bloated context
- Cognitive overload: Developers can’t quickly scan for what they need
- Maintenance burden: Updating one area requires navigating unrelated sections
- Performance degradation: Large CLAUDE.md files slow down context loading and processing
When to Split
Section titled “When to Split”| Threshold | Action |
|---|---|
| <100 lines | Single CLAUDE.md is fine |
| 100-200 lines | Consider splitting if distinct domains exist |
| >200 lines | Split immediately — you’re past the cognitive load threshold |
| Multi-team projects | Split by domain/ownership regardless of size |
Split Strategies
Section titled “Split Strategies”1. Feature-Based Split
CLAUDE.md # Core project contextCLAUDE-auth.md # Authentication specCLAUDE-api.md # API endpoints specCLAUDE-billing.md # Payment processing spec2. Role-Based Split
CLAUDE.md # Shared conventionsCLAUDE-frontend.md # UI/UX specificationsCLAUDE-backend.md # API/database specsCLAUDE-infra.md # DevOps/deployment specs3. Workflow-Based Split
CLAUDE.md # Daily development rulesCLAUDE-testing.md # Test specificationsCLAUDE-release.md # Release process specCLAUDE-security.md # Security requirementsImplementation Pattern
Section titled “Implementation Pattern”Main CLAUDE.md (stays concise):
# Project: [NAME]
## Tech Stack[Core technologies]
## Commands[Daily commands]
## Rules[Universal rules]
## Detailed Specs- Authentication: See @CLAUDE-auth.md- API Design: See @CLAUDE-api.md- Testing: See @CLAUDE-testing.mdCLAUDE-auth.md (focused spec):
# Authentication Specification
## Capabilities- MUST: JWT-based authentication- MUST: Refresh token rotation- MUST NOT: Store tokens in localStorage
## API Contract[Detailed auth endpoints...]
## Security Requirements[Specific auth security rules...]Benefits:
- Claude can reference specific files with
@CLAUDE-auth.md - Faster context loading (only relevant specs)
- Easier maintenance (edit one domain without affecting others)
- Better team collaboration (ownership per spec file)
Source: Addy Osmani, “How to write a good spec for AI agents” (Jan 2026)
Operational Boundaries
Section titled “Operational Boundaries”Pattern: Define explicit boundaries for what AI agents should do automatically, ask about, or never touch.
The Three-Tier System
Section titled “The Three-Tier System”Traditional specs use binary constraints (MUST/MUST NOT), but operational work requires three levels:
| Tier | Meaning | Claude Code Mapping |
|---|---|---|
| Always | Execute automatically without asking | Auto-accept mode |
| Ask First | Get user confirmation before proceeding | Default mode |
| Never | Block or require Plan Mode | Plan mode / Hook blocking |
Operational Boundaries Template
Section titled “Operational Boundaries Template”## Boundaries
### Always (Auto-accept)- Run tests after code changes- Format code with Prettier- Update imports when moving files- Fix linting errors- Add type annotations for untyped code
### Ask First (Confirm)- Modify database schemas- Add new dependencies- Change API contracts- Refactor >50 lines of code- Update configuration files
### Never (Block)- Push to production branch- Commit secrets or API keys- Delete data without backup- Modify CI/CD workflows without review- Bypass security checksMapping to Claude Code Permissions
Section titled “Mapping to Claude Code Permissions”Always → Permission Allowlist:
// In .claude/settings.json{ "permissions": { "allow": [ "Bash(npm test*)", "Bash(npx prettier*)", "Bash(npx tsc*)" ] }}Ask First → Default Mode:
- Standard behavior, prompts for every action
- Use for actions with moderate risk/impact
Never → Plan Mode + Hooks:
# Hook configured via settings.json (PreToolUse event)#!/bin/bashif [[ "$TOOL_NAME" == "Bash" ]] && [[ "$INPUT" =~ "git push origin main" ]]; then echo "BLOCKED: Direct push to main blocked. Use feature branches." exit 2 # Send feedback to Claude (non-zero exit blocks the action)fiDecision Framework
Section titled “Decision Framework”Ask yourself for each action:
- Can it cause data loss? → Ask First or Never
- Is it reversible with git? → Maybe Always
- Does it affect other developers? → Ask First
- Is it a security risk? → Never
- Is it part of the standard workflow? → Always
Example: API Development
Section titled “Example: API Development”### Always- Run unit tests (npm test)- Validate request schemas- Generate API documentation- Check response formats
### Ask First- Add new API endpoints- Change existing endpoint signatures- Modify authentication requirements- Update rate limiting rules
### Never- Expose internal endpoints publicly- Log sensitive user data- Disable authentication checks- Remove rate limitingMaintenance
Section titled “Maintenance”Review boundaries quarterly:
- Promote: Actions that never caused issues (Ask First → Always)
- Demote: Actions that caused problems (Always → Ask First)
- Block: Repeated mistakes (Ask First → Never)
Source: Addy Osmani, “How to write a good spec for AI agents” (Jan 2026)
Command Spec Template
Section titled “Command Spec Template”Pattern: Document executable commands with expected outputs and error handling.
Why Command Specs Matter
Section titled “Why Command Specs Matter”Most specs focus on features (“build authentication”), but commands (“how to test authentication”) are equally critical for AI agents.
Template Structure
Section titled “Template Structure”## Commands
### [Command Category]
**Purpose**: [What this command accomplishes]
#### Command: `[actual command]`**When**: [Trigger condition]**Expected Output**: [What success looks like]**Error Handling**: [What to do on failure]**Flags**: [Important options]
---Example: Testing Commands
Section titled “Example: Testing Commands”## Commands
### Testing
#### Command: `pnpm test`**When**: Before every commit, after code changes**Expected Output**:- All tests pass (exit code 0)- Coverage ≥80% (lines, branches, functions)- No console warnings**Error Handling**:- If tests fail → Fix tests, don't skip- If coverage drops → Add tests for uncovered code- If warnings appear → Investigate before committing**Flags**:- `--coverage`: Generate coverage report- `--watch`: Run in watch mode for development- `--silent`: Suppress console output
#### Command: `pnpm test:e2e`**When**: Before merging to main, in CI pipeline**Expected Output**:- All E2E scenarios pass- Screenshots captured for failures- Test duration <5 minutes**Error Handling**:- If flaky → Investigate race conditions, don't retry blindly- If timeout → Check network mocks, async handling- If screenshots differ → Review UI changes deliberately**Flags**:- `--headed`: Run with visible browser (debugging)- `--project chromium`: Test specific browserExample: Build & Deployment
Section titled “Example: Build & Deployment”## Commands
### Build
#### Command: `pnpm build`**When**: Before deployment, in CI pipeline**Expected Output**:- Build succeeds (exit code 0)- Output in `dist/` directory- No TypeScript errors- Bundle size <500KB (main chunk)**Error Handling**:- If TypeScript errors → Fix types, don't use `@ts-ignore`- If bundle too large → Analyze with `pnpm analyze`, code-split- If missing assets → Check public/ directory, update paths**Flags**:- `--mode production`: Production optimizations- `--analyze`: Generate bundle size report
### Deployment
#### Command: `pnpm deploy:staging`**When**: After PR approval, before production**Expected Output**:- Deployment succeeds- Health check returns 200 OK- Staging URL: https://staging.example.com**Error Handling**:- If health check fails → Rollback automatically- If database migration fails → Don't proceed, investigate- If environment vars missing → Check .env.staging, update secrets**Never**: Run `pnpm deploy:production` manually — use CI/CD onlyExample: Database Commands
Section titled “Example: Database Commands”## Commands
### Database
#### Command: `pnpm db:migrate`**When**: After pulling schema changes, before development**Expected Output**:- Migrations applied successfully- Database schema matches models- Seed data loaded (development only)**Error Handling**:- If migration fails → Check database connection, review SQL- If conflicts detected → Resolve migrations, don't force**Never**: Run migrations in production manually — CI/CD only
#### Command: `pnpm db:reset`**When**: Development only, never in staging/production**Expected Output**:- Database dropped and recreated- All migrations applied- Seed data loaded**Error Handling**:- If production check fails → Abort immediately, verify environment**Safety**: Requires `NODE_ENV=development` checkIntegration with CLAUDE.md
Section titled “Integration with CLAUDE.md”Reference command specs in your main CLAUDE.md:
## Commands- Build: `pnpm build` (see spec for error handling)- Test: `pnpm test` (must pass before commit)- Deploy: See CLAUDE-deployment.md for full proceduresSource: Addy Osmani, “How to write a good spec for AI agents” (Jan 2026)
Anti-Pattern: Monolithic CLAUDE.md
Section titled “Anti-Pattern: Monolithic CLAUDE.md”The Problem
Section titled “The Problem”Symptom: Your CLAUDE.md has grown to 300+ lines, mixing feature specs, API contracts, testing requirements, deployment procedures, and team conventions.
Impact:
- Context inefficiency: Claude loads entire 300 lines for every request, even for simple tasks
- Slow response time: Large context = slower processing
- Reduced accuracy: Important details get lost in noise
- Maintenance overhead: Updating one section requires navigating unrelated content
- Team friction: Multiple developers editing same file = merge conflicts
Real-World Example
Section titled “Real-World Example”Before (monolithic):
# CLAUDE.md (387 lines)
## Tech Stack[20 lines]
## Authentication[45 lines of auth spec]
## API Endpoints[67 lines of API contracts]
## Database Schema[52 lines of schema rules]
## Testing[38 lines of test requirements]
## Deployment[41 lines of deployment procedures]
## Security Rules[55 lines of security requirements]
## Team Conventions[33 lines of coding standards]
## Git Workflow[28 lines of branching rules]
## Troubleshooting[8 lines of common issues]Problem: Claude loads all 387 lines even when user asks: “Add a new API endpoint for user profile”
After (modular):
CLAUDE.md (82 lines) # Core context: tech stack, commands, rulesCLAUDE-auth.md (45 lines) # Authentication spec onlyCLAUDE-api.md (67 lines) # API contracts onlyCLAUDE-database.md (52 lines) # Database schema onlyCLAUDE-testing.md (38 lines) # Test requirements onlyCLAUDE-deploy.md (41 lines) # Deployment procedures onlyCLAUDE-security.md (55 lines) # Security requirements onlyBenefit: Claude loads CLAUDE.md (82 lines) + CLAUDE-api.md (67 lines) = 149 lines (61% reduction)
Split Strategy
Section titled “Split Strategy”Step 1: Identify Domains
Look for natural boundaries in your spec:
- Do these sections serve different purposes?
- Would different team members own different sections?
- Are some sections referenced more frequently than others?
Step 2: Extract to Focused Files
Move domain-specific content to dedicated files:
# Keep in CLAUDE.md (always loaded)- Tech stack (unchanging baseline)- Daily commands (frequent reference)- Universal rules (apply to all work)
# Extract to domain files (load on demand)- Feature specs → CLAUDE-[feature].md- API contracts → CLAUDE-api.md- Testing → CLAUDE-testing.md- Deployment → CLAUDE-deploy.mdStep 3: Create Index in Main CLAUDE.md
# Project: [NAME]
## Tech Stack[Core technologies]
## Commands[Daily commands]
## Rules[Universal rules]
## Detailed SpecificationsReference these files for domain-specific requirements:- @CLAUDE-auth.md — Authentication & authorization- @CLAUDE-api.md — API endpoint contracts- @CLAUDE-database.md — Schema and migrations- @CLAUDE-testing.md — Test requirements- @CLAUDE-deploy.md — Deployment procedures- @CLAUDE-security.md — Security requirementsStep 4: Reference When Needed
Claude can reference specific files:
User: "Add a new API endpoint for user settings"Claude: Reads CLAUDE.md + @CLAUDE-api.md (relevant context only)Maintenance Rules
Section titled “Maintenance Rules”- Keep CLAUDE.md <100 lines (core context only)
- Domain files <150 lines each (if bigger, split further)
- Review quarterly: Merge rarely-used files, split frequently-updated sections
- Use @file references: Explicitly load what you need
Migration Checklist
Section titled “Migration Checklist”- Identify domains in current CLAUDE.md (>200 lines?)
- Create domain-specific files (CLAUDE-[domain].md)
- Move content to focused files
- Update main CLAUDE.md with index/references
- Test: Ask Claude to perform domain-specific task
- Verify: Check context usage with
/status - Document: Update team on new structure
Source: Addy Osmani, “How to write a good spec for AI agents” (Jan 2026)
See Also
Section titled “See Also”- ../methodologies.md — SDD and other methodologies
- Spec Kit Documentation
- OpenSpec Documentation
- tdd-with-claude.md — Combine with TDD
- Spec-to-Code Factory — Implémentation référence complète avec enforcement outillé (6 gates via Node.js, invariants “No Spec No Code” + “No Task No Commit”, ~900K tokens/projet)