Context Engineering: Tools & Ecosystem
The context window is not storage — it's a budget of attention. This page maps the tools that help you spend it well.
From Prompt Engineering to Context Engineering
Prompt engineering optimizes one request. Context engineering optimizes the entire information architecture — what the model knows before any request begins.
When Claude generates generic output, ignores a convention, or hallucinates, the model is almost never broken. The context it received was incomplete, stale, or carrying too much noise. That reframe shifts troubleshooting from "the AI is bad at this" to "what is missing from the context?"
The tools on this page attack that problem from different angles: compressing what enters the context, filtering what shouldn't, routing intelligently, and measuring the results.
| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Question | How to phrase this? | What should the model know? |
| Scope | Single request | Full system |
| Data source | Static user input | RAG, memory, tool outputs |
| Goal | One good response | Reliable system at scale |
| Approach | Heuristic, artisanal | Algorithmic, systematic |
Three Concepts That Change How You Think
Before picking tools, these mental models determine whether you're solving the right problem.
Minimum Viable Context
Provide exactly what the model needs — nothing more. Over-context degrades instruction adherence as quickly as under-context produces hallucinations.
Context Rot
As context length grows, models increasingly ignore information in the middle of the window — the "lost in the middle" effect. Instructions at line 400 of a CLAUDE.md are followed roughly 60% as often as instructions at line 10.
Semantic Priming
Ultra-compressed context works not by verbatim recall, but by activating the model's pre-trained knowledge. 10 precise keywords can outperform 100 tokens of prose.
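The priming idea can be sketched with a toy comparison: a keyword list carries the same cues as a prose paragraph at a fraction of the cost. Token counts here use naive whitespace splitting, which only approximates a real tokenizer; the example strings are illustrative, not from any real project.

```python
# Toy illustration of semantic priming: keywords activate the model's
# pre-trained knowledge without spelling everything out in prose.
# Whitespace splitting is a rough proxy for real token counts.

def token_count(text: str) -> int:
    """Rough proxy for token usage (whitespace split)."""
    return len(text.split())

verbose = (
    "This project is a REST API written in Python using the FastAPI "
    "framework. We use PostgreSQL as the database, accessed through "
    "SQLAlchemy. All code must be formatted with black and type-checked "
    "with mypy in strict mode before committing."
)

# The same cues, compressed to keywords that prime the same knowledge.
primed = "Stack: Python, FastAPI, PostgreSQL, SQLAlchemy. Style: black, mypy --strict."

print(token_count(verbose), token_count(primed))
```

The compressed version loses no actionable signal for a model that already knows what FastAPI and mypy are — which is exactly the point.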
Six Tool Categories
Each category intercepts at a different point in the pipeline — from CLI output to API gateway to observability layer.
Output Compression
Filter CLI and tool output before it reaches the model
- RTK CLI proxy, 89% avg savings
- Headroom Lossless, 70–95% on structured data
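The shape of output compression can be sketched in a few lines — this is a hedged illustration of the idea, not RTK's or Headroom's actual algorithm: keep the lines the model needs (failures, errors, a summary) and drop the noise before it enters the window.

```python
# Sketch of CLI-output compression: summarize passing noise, keep failures.
# Not RTK's or Headroom's implementation — just the underlying idea.

def compress_test_output(raw: str, max_lines: int = 5) -> str:
    """Keep failing lines plus a one-line summary; drop passing noise."""
    lines = raw.splitlines()
    failures = [l for l in lines if "FAIL" in l or "Error" in l]
    summary = f"{len(lines)} lines total, {len(failures)} failures"
    return "\n".join([summary] + failures[:max_lines])

# 200 passing tests and one failure: the model only needs the failure.
raw = "\n".join(
    [f"PASS test_{i}" for i in range(200)] + ["FAIL test_auth: AssertionError"]
)
print(compress_test_output(raw))
```

Two lines reach the model instead of 201, with no loss of debugging signal.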
Prompt Compression
Reduce input tokens before sending to the LLM
- LLMLingua 20x compression, ~1.5% perf loss
- LLMLingua-2 Distillation-based, faster
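LLMLingua ranks tokens by perplexity with a small language model and drops the low-information ones. The sketch below substitutes a crude stopword filter for that LM step — a stand-in to show the input/output shape of token pruning, not LLMLingua's API or algorithm.

```python
# Illustrative token pruning. LLMLingua drops low-perplexity tokens using
# a small LM; this stand-in drops common stopwords instead, purely to
# show the shape of the transformation.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "are", "in", "on",
             "that", "this", "and", "with", "for", "it", "be"}

def prune(prompt: str) -> str:
    kept = [w for w in prompt.split() if w.lower() not in STOPWORDS]
    return " ".join(kept)

prompt = "The goal of this function is to parse the config file and return a dict"
print(prune(prompt))  # → "goal function parse config file return dict"
```

The pruned prompt remains intelligible to a model because, as with semantic priming, the surviving tokens activate the knowledge the dropped ones merely scaffolded.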
AI Gateways
Routing, guardrails, and compression at the API layer
- Edgee Edge compression up to 50%
- Portkey 250+ LLMs, semantic caching
RAG Optimization
Improve retrieval quality and reduce retrieval noise
- Contextual Retrieval Anthropic — 67% fewer failures
- RAG Triad Evaluation: context / answer / groundedness
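Anthropic's Contextual Retrieval prepends an LLM-generated blurb to each chunk before embedding, so the chunk carries its document context into the index. In this sketch, `situate_chunk` is a hypothetical stand-in for that LLM call, stubbed with a fixed template; the example strings are illustrative.

```python
# Sketch of Contextual Retrieval: prefix each chunk with a short context
# before embedding. `situate_chunk` stands in for the real LLM call that
# reads the whole document and writes 1-2 situating sentences per chunk.

def situate_chunk(document_title: str, chunk: str) -> str:
    # Stub: the real technique generates this with an LLM.
    return f"This chunk is from '{document_title}'."

def contextualize(document_title: str, chunks: list[str]) -> list[str]:
    return [f"{situate_chunk(document_title, c)}\n{c}" for c in chunks]

chunks = ["Revenue grew 3% over the previous quarter."]
print(contextualize("ACME Q2 2023 10-Q", chunks)[0])
```

Without the prefix, "revenue grew 3%" is unretrievable for queries about ACME specifically — the chunk alone never mentions the company. That orphaned-chunk failure is what the 67% reduction targets.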
Memory Systems
Persist context across sessions without flooding the window
- ICM Infinite Context Memory, dual-layer
- /compact Native to Claude Code — run at ~70% context usage
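The dual-layer idea can be sketched as two tiers: recent turns stay verbatim, older turns collapse into a running summary once a budget is exceeded. This illustrates the pattern, not ICM's actual design; `summarize` is a hypothetical stand-in for an LLM summarization call.

```python
# Sketch of a dual-layer session memory: a verbatim short-term buffer
# plus a compressed long-term summary. Not ICM's implementation — an
# illustration of the pattern. `summarize` stubs an LLM call.

def summarize(turns: list) -> str:
    return f"[summary of {len(turns)} earlier turns]"

class DualLayerMemory:
    def __init__(self, max_recent: int = 4):
        self.max_recent = max_recent
        self.summary = ""            # long-term, compressed layer
        self.recent: list = []       # short-term, verbatim layer

    def add(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) > self.max_recent:
            # Fold overflow (and any prior summary) into the long-term layer.
            overflow = self.recent[:-self.max_recent]
            self.summary = summarize(
                ([self.summary] if self.summary else []) + overflow
            )
            self.recent = self.recent[-self.max_recent:]

    def context(self) -> str:
        parts = ([self.summary] if self.summary else []) + self.recent
        return "\n".join(parts)

mem = DualLayerMemory(max_recent=2)
for i in range(5):
    mem.add(f"turn {i}")
print(mem.context())
```

The window sees one summary line plus the last two turns — bounded size regardless of session length, which is the property `/compact` provides natively.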
LLMOps
Trace, measure, and improve your AI pipelines
- Langfuse Open-source tracing, self-hostable
- Arize Phoenix RAG Triad evaluation specialist
Benchmarks Worth Knowing
The figures quoted on this page (89% average CLI savings, 20x prompt compression, 67% fewer retrieval failures) come from published research and production data, not marketing copy.
Tool Selection by Use Case
Which tools matter depends on where you sit in the stack.
Claude Code developer
| Problem | Tool |
|---|---|
| Commands flooding context | RTK |
| Context growing too long | /compact at 70% |
| Rules ignored in large CLAUDE.md | Path-scoping |
| Session memory lost | ICM |
AI application builder
| Problem | Tool |
|---|---|
| Tool output JSON too verbose | Headroom |
| Prompts too long | LLMLingua |
| Multi-provider routing | Portkey |
| RAG chunks losing context | Contextual Retrieval |
| Guardrails + compression at edge | Edgee |
| RAG quality measurement | Arize Phoenix |
Self-hosted LLM deployer
| Problem | Tool |
|---|---|
| GPU memory fragmentation | vLLM (PagedAttention) |
| Shared-prefix cache reuse | SGLang (RadixAttention) |
| Repeated query caching | Redis semantic cache |
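Semantic caching can be sketched without any infrastructure: embed each query, and serve a cached answer when a new query lands close enough in embedding space. This shows the idea behind a Redis semantic cache, not Redis's API; a toy bag-of-words embedding stands in for a real embedding model, and the threshold is an illustrative value.

```python
# Sketch of a semantic cache: near-duplicate queries hit the cache and
# skip the LLM call entirely. Toy bag-of-words embedding; a real system
# would use an embedding model and a vector store such as Redis.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def get(self, query: str):
        q = embed(query)
        for vec, answer in self.entries:
            if cosine(q, vec) >= self.threshold:
                return answer  # cache hit: no LLM call needed
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("what is the capital of france", "Paris")
print(cache.get("what is the capital of france ?"))  # near-duplicate: hit
```

Unlike an exact-match cache, a rephrased query still hits — which is what makes the technique pay off for repeated user questions.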
"Context engineering is the art of filling the context window with the right information at the right time."
Andrej Karpathy
Two phrases carry the weight: right information (not all information) and right time (not always-on for everything). The tools on this page are the mechanics that make that "right" a system property rather than a manual judgment call.
Full Reference in the Guide
The guide covers every tool with install instructions, measured benchmarks, and when-to-choose-what guidance — including LLMOps tools (Langfuse, LangSmith, Arize Phoenix), KV cache infrastructure (vLLM, SGLang), and the full research landscape (SlimInfer, TopV, AttnComp).