DORA 2025 · SPACE Framework · 5–25 people

Pilot Your AI-Augmented Team

The metrics that actually matter when AI writes 70% of your code.

When Velocity Lies

When AI accelerates delivery, old benchmarks break. PRs per day goes up, which looks like progress while masking quality and skill problems underneath. You need different sensors.

  • 70–90% of code AI-assisted (Anthropic research, Aug 2025)
  • DORA performance tiers abandoned (DORA 2025 report)

4 Metric Categories

Each one covers a blind spot the others miss. Together they give you a complete picture.

Delivery Health

DORA

The baseline. Automate these first; a computation sketch follows the list.

  • Deployment Frequency: how often you ship to production
  • Lead Time for Changes: elapsed time from commit to production
  • Change Failure Rate: % of deploys causing incidents
  • MTTR: mean time to restore after a failure
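
A minimal sketch of computing all four from a deploy log, assuming a hypothetical record shape (ship time, earliest commit in the release, incident outcome); the field names are illustrative, not from any specific tool:

```python
from datetime import datetime
from statistics import mean

# Hypothetical deploy log: ship time, earliest commit in the release,
# whether the deploy caused an incident, and restore time if it did.
deploys = [
    {"shipped": datetime(2025, 8, 1, 14, 0),
     "first_commit": datetime(2025, 7, 31, 9, 0),
     "caused_incident": False, "restore_hours": None},
    {"shipped": datetime(2025, 8, 2, 16, 30),
     "first_commit": datetime(2025, 8, 2, 10, 0),
     "caused_incident": True, "restore_hours": 1.5},
]
window_days = 7  # reporting window

deploy_frequency = len(deploys) / window_days  # deploys per day
lead_time_h = mean(
    (d["shipped"] - d["first_commit"]).total_seconds() / 3600 for d in deploys
)
failure_rate = sum(d["caused_incident"] for d in deploys) / len(deploys)
restores = [d["restore_hours"] for d in deploys if d["restore_hours"] is not None]
mttr_h = mean(restores) if restores else 0.0

print(f"Deploys/day: {deploy_frequency:.2f}, lead time: {lead_time_h:.1f} h")
print(f"Change failure rate: {failure_rate:.0%}, MTTR: {mttr_h:.1f} h")
```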

Quality Signal

Where AI hides its mistakes.

  • Bug Escape Rate: bugs found in production per sprint
  • PR Review Comprehension: % of PRs reviewed with genuine understanding
  • CI Speed (P50/P90): pipeline latency at the median and 90th percentile (see the sketch after this list)
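
Bug escape rate and CI percentiles can be computed directly; PR review comprehension stays a human judgment. A sketch of the first two, assuming hypothetical pipeline durations and sprint bug counts:

```python
from statistics import quantiles

# Hypothetical inputs: CI run durations in minutes, plus sprint bug counts.
ci_durations_min = [4.2, 5.1, 3.8, 12.7, 4.9, 6.3, 5.5, 18.0, 4.4, 5.0]
bugs_found_in_prod = 3        # escaped to production this sprint
bugs_found_pre_release = 21   # caught by review, tests, or staging

# quantiles(n=10) returns the 9 decile cut points:
# index 4 is the median (P50), index 8 is the 90th percentile (P90).
deciles = quantiles(ci_durations_min, n=10)
p50, p90 = deciles[4], deciles[8]

escape_rate = bugs_found_in_prod / (bugs_found_in_prod + bugs_found_pre_release)

print(f"CI latency P50: {p50:.1f} min, P90: {p90:.1f} min")
print(f"Bug escape rate: {escape_rate:.0%}")
```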

Product Impact

The layer most teams skip.

  • Time-to-Value: days from feature start to first user value
  • Feature Adoption (14-day): % of target users activating a feature within two weeks of release (computed in the sketch after this list)
  • CSAT on Key Features: user satisfaction score on high-investment areas
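
A sketch of the 14-day adoption calculation, assuming a hypothetical activation log (user id to first use) and a known target-user list:

```python
from datetime import datetime, timedelta

release_date = datetime(2025, 8, 1)
window = timedelta(days=14)

# Hypothetical activation log: each user's first use of the feature.
first_use = {
    "u1": datetime(2025, 8, 3),
    "u2": datetime(2025, 8, 20),  # outside the 14-day window, doesn't count
    "u3": datetime(2025, 8, 10),
}
target_users = {"u1", "u2", "u3", "u4", "u5"}  # who the feature was built for

activated = {
    uid for uid, ts in first_use.items()
    if uid in target_users and ts - release_date <= window
}
adoption_14d = len(activated) / len(target_users)
print(f"14-day adoption: {adoption_14d:.0%}")  # 2 of 5 -> 40%
```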

Human Health

SPACE

What DORA doesn't see.

  • Developer Satisfaction: quarterly CSAT survey (5 questions, anonymous)
  • PR Review Time: average hours from PR open to first review
  • Burnout Signals: after-hours commits, PTO utilization, qualitative check-ins (see the sketch after this list)
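
Of the three, after-hours commits are the only signal you can compute straight from the repo. A sketch, assuming hypothetical commit timestamps already converted to the author's local time:

```python
from datetime import datetime

# Hypothetical commit timestamps pulled from `git log`, in local time.
commit_times = [
    datetime(2025, 8, 4, 10, 15),
    datetime(2025, 8, 4, 22, 40),  # late evening
    datetime(2025, 8, 5, 14, 5),
    datetime(2025, 8, 6, 23, 55),  # late evening
    datetime(2025, 8, 9, 11, 30),  # Saturday
]

def is_after_hours(ts: datetime, start: int = 9, end: int = 18) -> bool:
    """True outside 9:00-18:00 on weekdays, or any time on a weekend."""
    return ts.weekday() >= 5 or not (start <= ts.hour < end)

share = sum(is_after_hours(t) for t in commit_times) / len(commit_times)
print(f"After-hours commit share: {share:.0%}")  # watch the trend, not one week
```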

Start Small, Scale Right

The right metrics depend on your team size. Don't track what you can't act on.

5-person team

  • Deployment Frequency
  • Cycle Time
  • Time-to-Value
  • Bugs in prod/month
  • Quarterly satisfaction
Recommended tooling: GitHub Insights + a spreadsheet
Full breakdown in guide →

25-person team

  • All 4 DORA (automated)
  • Cycle Time per squad
  • Bug Escape Rate
  • AI contribution %
  • Quarterly satisfaction
Recommended tooling: LinearB or Faros.ai
Full breakdown in guide →

The 4-Question Test

For any metric you're considering tracking, run it through this checklist.

1. Can you act on it in under 2 weeks?
2. Does it explain WHY, not just WHAT?
3. Is it correlated to a business outcome?
4. Can it be measured automatically?
Rule: fewer than 3 "yes" answers means the metric is not worth tracking. Tracking too much costs attention, the one resource AI can't augment.
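
A sketch of the test as a scorecard, with hypothetical answers filled in for PRs per day, the vanity metric this page opened with:

```python
# Hypothetical assessment of one candidate metric against the four questions.
candidate = "PRs per day"
answers = {
    "actionable_within_2_weeks": True,    # you could change review or CI policy
    "explains_why_not_just_what": False,  # a count, with no causal story
    "correlated_to_business_outcome": False,
    "measurable_automatically": True,
}

yes_count = sum(answers.values())
verdict = "track it" if yes_count >= 3 else "drop it: noise, not signal"
print(f"{candidate}: {yes_count}/4 yes -> {verdict}")  # 2/4 -> drop it
```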

Read the full framework in the guide

Covers implementation playbook, tooling comparison, anti-patterns, and worked examples for squads already using Claude Code daily.