Code Guide
DORA 2025 · SPACE Framework · 5-25 people

Pilot Your AI-Augmented Team

The metrics that actually matter when AI writes 70% of your code.

When Velocity Lies

When AI accelerates delivery, old benchmarks break. PRs per day goes up, which looks like progress, while masking quality and skill problems underneath. You need different sensors.

70–90% of code is AI-assisted (Anthropic research, Aug 2025)
DORA tiers abandoned (DORA 2025 report)

4 Metric Categories

Each one covers a blind spot the others miss. Together they give you a complete picture.

Delivery Health

DORA

The baseline. Automate these first.

  • Deployment Frequency: how often you ship to production
  • Lead Time for Changes: elapsed time from commit to production
  • Change Failure Rate: % of deploys causing incidents
  • MTTR: mean time to restore after a failure
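As a rough illustration of the four definitions above, here is a minimal Python sketch that derives them from a list of deploy records. The `Deploy` record, its field names, and the `dora_summary` helper are hypothetical and not taken from any particular tool; it also assumes each incident is attributed to exactly one deploy, which real incident data often does not guarantee.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deploy:
    # Hypothetical record shape; adapt to whatever your CI/CD system exports.
    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deploys: list[Deploy], window_days: int) -> dict:
    """Summarise the four DORA metrics over a reporting window."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deploy and a positive window")

    n = len(deploys)
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.caused_incident]
    # Only incidents that have been resolved contribute to MTTR.
    restores = [d.restored_at - d.deployed_at
                for d in failures if d.restored_at is not None]

    return {
        "deploys_per_day": n / window_days,
        "mean_lead_time_hours": hours(sum(lead_times, timedelta())) / n,
        "change_failure_rate": len(failures) / n,
        "mttr_hours": (hours(sum(restores, timedelta())) / len(restores)
                       if restores else None),
    }
```

A mean is used for lead time to keep the sketch short; a median is often less sensitive to a single slow change.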

Quality Signal

Where AI hides its mistakes.

  • Bug Escape Rate: bugs found in prod per sprint
  • PR Review Comprehension: % of PRs reviewed with genuine understanding
  • CI Speed P50/P90: pipeline latency at the median and 90th percentile
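Two of these signals are straightforward to compute once the raw data is exported. The sketch below assumes you have pipeline durations in minutes and a sprint label for each bug found in production; the function names are hypothetical, and the percentile uses the nearest-rank method, which is one of several common definitions.

```python
import math
from collections import Counter


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("no values")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def ci_speed(durations_min: list[float]) -> dict:
    """P50 and P90 pipeline latency, in the same unit as the input."""
    return {"p50": percentile(durations_min, 50),
            "p90": percentile(durations_min, 90)}


def bugs_per_sprint(escaped_bug_sprints: list[str]) -> dict:
    """Count production bugs per sprint label (one label per escaped bug)."""
    return dict(Counter(escaped_bug_sprints))
```

Tracking P90 alongside P50 matters because a healthy median can hide the slow runs that developers actually wait on.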

Product Impact

The layer most teams skip.

  • Time-to-Value: days from feature start to first user value
  • Feature Adoption (14-day): % of target users activating a feature within 2 weeks of release
  • CSAT on Key Features: user satisfaction score on high-investment areas
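Time-to-Value and 14-day adoption reduce to simple date arithmetic. The following is a minimal sketch, assuming you can export each target user's first-use date for a feature; the function names and the `first_use` mapping are hypothetical, and what counts as "activating" or "first user value" is left for your team to define.

```python
from datetime import date, timedelta


def time_to_value_days(started_on: date, first_value_on: date) -> int:
    """Days from feature start to the first day a user got value from it."""
    return (first_value_on - started_on).days


def adoption_rate(target_users: set[str],
                  first_use: dict[str, date],
                  released_on: date,
                  window_days: int = 14) -> float:
    """Percentage of target users whose first use falls within the window."""
    if not target_users:
        raise ValueError("no target users")
    cutoff = released_on + timedelta(days=window_days)
    activated = {
        user for user in target_users
        if user in first_use and released_on <= first_use[user] <= cutoff
    }
    return 100 * len(activated) / len(target_users)
```

Restricting the denominator to target users, rather than all users, keeps the rate meaningful for features aimed at a small segment.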

Human Health

SPACE

What DORA doesn't see.

  • Developer Satisfaction: quarterly CSAT survey (5 questions, anonymous)
  • PR Review Time: avg hours from PR open to first review
  • Burnout Signals: after-hours commits, PTO utilization, qualitative check-ins
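PR Review Time and the after-hours commit signal can both be derived from timestamps. This sketch is illustrative only: the function names are hypothetical, the 09:00 to 18:00 weekday working window is an assumption you should replace with your team's actual norms, and timestamps are assumed to already be in each author's local time.

```python
from datetime import datetime
from typing import Optional


def avg_hours_to_first_review(
        prs: list[tuple[datetime, Optional[datetime]]]) -> Optional[float]:
    """Average hours from PR open to first review; unreviewed PRs are skipped."""
    waits = [(reviewed - opened).total_seconds() / 3600
             for opened, reviewed in prs if reviewed is not None]
    return sum(waits) / len(waits) if waits else None


def after_hours_share(commit_times: list[datetime],
                      start_hour: int = 9,
                      end_hour: int = 18) -> float:
    """Fraction of commits made on weekends or outside the assumed working window."""
    if not commit_times:
        raise ValueError("no commits")
    after = [t for t in commit_times
             if t.weekday() >= 5 or not (start_hour <= t.hour < end_hour)]
    return len(after) / len(commit_times)
```

A rising after-hours share is a prompt for a conversation, not a verdict, which is why the list above pairs it with qualitative check-ins.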

Start Small, Scale Right

The right metrics depend on your team size. Don't track what you can't act on.

5-person team

  • Deployment Frequency
  • Cycle Time
  • Time-to-Value
  • Bugs in prod/month
  • Quarterly satisfaction
Recommended tooling: GitHub Insights + spreadsheet
Full breakdown in guide →

25-person team

  • All 4 DORA metrics (automated)
  • Cycle Time per squad
  • Bug Escape Rate
  • AI contribution %
  • Quarterly satisfaction
Recommended tooling: LinearB or Faros.ai
Full breakdown in guide →

The 4-Question Test

For any metric you're considering tracking, run it through this checklist. Fewer than 3 "yes" answers? Drop it. It's noise, not signal.

1. Can you act on it in <2 weeks?
2. Does it explain WHY, not just WHAT?
3. Is it correlated with a business outcome?
4. Can it be measured automatically?
Rule: a metric needs at least 3 of the 4 yes answers to be worth tracking. Tracking too much costs attention, the one resource AI can't augment.
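The checklist is easy to encode so that every proposed metric gets the same treatment. Below is a minimal sketch; the `MetricCheck` class and its field names are hypothetical, and the yes/no answers in the usage example are purely illustrative, since each team has to answer the four questions for itself.

```python
from dataclasses import dataclass


@dataclass
class MetricCheck:
    name: str
    actionable_within_two_weeks: bool
    explains_why: bool
    tied_to_business_outcome: bool
    measurable_automatically: bool

    def yes_count(self) -> int:
        return sum([
            self.actionable_within_two_weeks,
            self.explains_why,
            self.tied_to_business_outcome,
            self.measurable_automatically,
        ])

    def worth_tracking(self) -> bool:
        # Fewer than 3 "yes" answers means the metric is noise, not signal.
        return self.yes_count() >= 3


# Illustrative answers only; yours may differ.
candidates = [
    MetricCheck("PRs per day", False, False, False, True),
    MetricCheck("Change Failure Rate", True, False, True, True),
]
keep = [m.name for m in candidates if m.worth_tracking()]
```

Running proposed metrics through a check like this before adding them to a dashboard makes the decision to drop one explicit rather than implicit.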

Read the full framework in the guide

Covers the implementation playbook, a tooling comparison, anti-patterns, and worked examples for squads already using Claude Code daily.