Code Guide
T21 Intermediate Technical

Fast Mode & API Breaking Changes

Fast mode and the major API changes you need to know

Fast Mode

Fast Mode produces responses 2.5x faster at 6x the price. The underlying model is still Opus 4.6: the speedup comes from priority resource allocation, not from simplified reasoning.

| Parameter    | Standard  | Fast Mode   |
|--------------|-----------|-------------|
| Model        | Opus 4.6  | Opus 4.6    |
| Speed        | Reference | 2.5x faster |
| Input price  | $5/MTok   | $30/MTok    |
| Output price | $25/MTok  | $150/MTok   |
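To make the price difference concrete, here is a minimal sketch that computes the cost of one request in each mode from the per-MTok prices in the table above. The `request_cost` helper and the token counts are hypothetical and chosen only for illustration.

```python
# Prices in USD per million tokens (MTok), copied from the table above.
PRICES = {
    "standard": {"input": 5.0, "output": 25.0},
    "fast": {"input": 30.0, "output": 150.0},
}


def request_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the USD cost of one request in the given mode."""
    price = PRICES[mode]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical request: 20,000 input tokens and 4,000 output tokens.
standard = request_cost(20_000, 4_000, "standard")
fast = request_cost(20_000, 4_000, "fast")
print(f"standard=${standard:.2f} fast=${fast:.2f} ratio={fast / standard:.1f}x")
```

Because input and output prices are both multiplied by 6, the ratio stays at 6x whatever the mix of input and output tokens.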

In the CLI, /fast activates Fast Mode for the current session.

When to Use Fast Mode

Fast Mode is worthwhile when response time has direct business value: a live demo, intensive pair programming, or mechanical code generation in a tight loop. It is not a replacement for Sonnet on simple tasks: Sonnet remains about 10x cheaper than Opus 4.6 in Fast Mode.

Relevant:

  • Repetitive boilerplate generation in an interactive session
  • Reformatting or large-scale code conversion
  • Demo context where visible latency impacts the experience

Not relevant:

  • Background or asynchronous tasks (speed does not matter)
  • Simple tasks covered by Sonnet or Haiku
  • Tight API budgets

Fast Mode via API (Opus 4.6)

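import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,  # required by the Messages API; adjust as needed
    speed="fast",
    # Beta header that enables Fast Mode
    extra_headers={"anthropic-beta": "fast-mode-2026-02-01"},
    messages=[{"role": "user", "content": "..."}],
)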

The beta header is mandatory. Without it, the speed parameter is silently ignored.
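Since a missing header fails silently, one way to guard against it is to build the request arguments in a single place so that `speed` and the beta header always travel together. The sketch below is an assumption-laden illustration: `build_request` is a hypothetical helper, the `max_tokens` default is arbitrary, and passing `speed` as a keyword argument simply follows the request shown above.

```python
FAST_MODE_BETA = "fast-mode-2026-02-01"  # beta header value quoted in this card


def build_request(messages, fast=False, model="claude-opus-4-6", max_tokens=1024):
    """Build keyword arguments for client.messages.create().

    When fast=True, the speed parameter and the beta header are set
    together, so one cannot be sent without the other.
    """
    kwargs = {"model": model, "max_tokens": max_tokens, "messages": messages}
    if fast:
        kwargs["speed"] = "fast"
        kwargs["extra_headers"] = {"anthropic-beta": FAST_MODE_BETA}
    return kwargs


params = build_request([{"role": "user", "content": "Hello"}], fast=True)
# With a configured client: response = client.messages.create(**params)
```

Keeping the pairing in one function means a call site can switch Fast Mode on or off with a single flag.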

Breaking Change: assistant-prefill Removed

Opus 4.6 removed support for assistant prefill: the technique that allowed pre-filling Claude’s response to guide its output format.

# Before (no longer works on Opus 4.6)
messages=[
    {"role": "user", "content": "Respond in JSON"},
    {"role": "assistant", "content": "{"},  # prefill
]

# After: state the format in the system prompt and end on the user turn
system="Always respond with valid JSON only."
messages=[{"role": "user", "content": "Respond in JSON"}]

Impact: any API pipeline using assistant-prefill on Opus 4.6 must migrate to explicit instructions in the system prompt or few-shot examples.
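For pipelines with many call sites, the migration can be sketched as a small transformation over the message list. `migrate_prefill` is a hypothetical helper, it assumes message content is a plain string, and the wording of the generated instruction is an assumption: it is not guaranteed to reproduce the behavior the prefill used to give.

```python
def migrate_prefill(messages, system=""):
    """Drop a trailing assistant prefill and describe it in the system prompt.

    Returns (new_messages, new_system). Conversations that do not end
    with an assistant turn are returned unchanged.
    """
    if not messages or messages[-1]["role"] != "assistant":
        return list(messages), system
    prefill = messages[-1]["content"]
    instruction = f"Begin your response with exactly: {prefill}"
    new_system = f"{system}\n{instruction}".strip()
    return list(messages[:-1]), new_system


old_messages = [
    {"role": "user", "content": "Respond in JSON"},
    {"role": "assistant", "content": "{"},  # the old prefill
]
new_messages, new_system = migrate_prefill(
    old_messages, system="Always respond with valid JSON only."
)
```

After the call, `new_messages` ends on the user turn and `new_system` carries both the original system prompt and the instruction derived from the prefill.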

The effort Parameter (API)

The effort parameter replaces budget_tokens on Opus 4.6 for controlling reasoning depth.

output_config={"effort": "medium"} # low|medium|high|max

budget_tokens remains functional on Opus 4.5 but is deprecated on 4.6. Migrate to effort for new pipelines.
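A minimal sketch of how a pipeline might validate the effort level and spot requests that still rely on budget_tokens. Both helpers are hypothetical, and `uses_budget_tokens` assumes the old setting lives inside a `thinking` dictionary in the request arguments.

```python
EFFORT_LEVELS = ("low", "medium", "high", "max")  # values listed in this card


def effort_config(effort="medium"):
    """Return an output_config dict, rejecting unknown effort levels."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}, got {effort!r}")
    return {"effort": effort}


def uses_budget_tokens(params):
    """Report whether request kwargs still set the deprecated budget_tokens."""
    thinking = params.get("thinking") or {}
    return "budget_tokens" in thinking


legacy = {"thinking": {"type": "enabled", "budget_tokens": 2048}}
if uses_budget_tokens(legacy):
    print("migrate this request to output_config:", effort_config("medium"))
```

Validating the level locally surfaces typos before a request is sent rather than relying on the API to reject them.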

Summary of Opus 4.6 Changes

| Feature           | Status                           |
|-------------------|----------------------------------|
| assistant-prefill | Removed                          |
| budget_tokens     | Deprecated (replaced by effort)  |
| Fast Mode         | New (speed: "fast")              |
| Adaptive Thinking | New (replaces opt-in thinking)   |
