Prompt caching is one of the highest-leverage optimizations available — when it hits, it cuts the cost of the cached portion by ~10× and latency by a meaningful chunk. The catch: hit rate is everything, and most teams design their prompts in ways that destroy hit rate.
Hit-rate killers
- User-specific data near the top of the prompt — pushes the static suffix out of the cache. Put user data at the END, after the cacheable parts.
- Timestamps in the system prompt — every minute is a cache miss. Generate the timestamp at call time and put it in a fresh user message, not the system prompt.
- A/B variants in the prompt — split traffic across cache lines. Either ship the winner or accept the cost.
- Tool definitions that change between calls — keep the tool catalog stable and reflect the situation through a "context" message instead.
Designing for cache from day one
Order your prompt: stable system prompt → stable tool definitions → cacheable context (knowledge base passages) → variable user input. Once you do this, hit rates of 80%+ are normal. Cost falls accordingly.
§ Further reading
- 01
Knowledge check
0/1 answered1. Which of these destroys prompt cache hit rate fastest?
Discussion
0 commentsBe the first to start the conversation.