Right-sized models, prompt caching, agent routing. Driving run cost down without sacrificing quality.
When to send a request to the small model, when to escalate to a larger one, when to refuse.
Cache hit ratios in real traffic, and what kills the hit rate.
A run-cost model for a real system in your org, with two cost-reduction levers proven via eval.
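
A minimal sketch of what such a run-cost model might look like, assuming a router that picks one model per request up front and a cache that discounts some fraction of input tokens. Every name and price here (ModelPrice, request_cost, blended_cost, SMALL, LARGE) is hypothetical and for illustration only; substitute your own measured traffic and your vendor's actual rates. It ignores cache-write surcharges and the double payment incurred when a request is tried on the small model and then escalated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Dollar prices per million tokens. All values used below are hypothetical."""
    input_per_mtok: float
    cached_input_per_mtok: float
    output_per_mtok: float


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float) -> float:
    """Expected dollar cost of one request.

    cache_hit_ratio is the fraction of input tokens read from the prompt cache.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    total = (uncached * price.input_per_mtok
             + cached * price.cached_input_per_mtok
             + output_tokens * price.output_per_mtok)
    return total / 1_000_000


def blended_cost(small: ModelPrice, large: ModelPrice, small_share: float,
                 input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float) -> float:
    """Expected cost per request when small_share of traffic is routed to the small model."""
    if not 0.0 <= small_share <= 1.0:
        raise ValueError("small_share must be between 0 and 1")
    small_cost = request_cost(small, input_tokens, output_tokens, cache_hit_ratio)
    large_cost = request_cost(large, input_tokens, output_tokens, cache_hit_ratio)
    return small_share * small_cost + (1.0 - small_share) * large_cost


# Hypothetical prices, not real vendor rates.
SMALL = ModelPrice(input_per_mtok=1.0, cached_input_per_mtok=0.1, output_per_mtok=4.0)
LARGE = ModelPrice(input_per_mtok=10.0, cached_input_per_mtok=1.0, output_per_mtok=40.0)

# Baseline: everything on the large model, no caching.
baseline = blended_cost(SMALL, LARGE, small_share=0.0,
                        input_tokens=2000, output_tokens=500, cache_hit_ratio=0.0)
# Two levers applied: 80% of traffic routed small, half of input tokens cached.
with_levers = blended_cost(SMALL, LARGE, small_share=0.8,
                           input_tokens=2000, output_tokens=500, cache_hit_ratio=0.5)

print(f"baseline ${baseline:.5f} per request, with levers ${with_levers:.5f} per request")
```

The cost arithmetic is only half of the deliverable: a lever counts as proven when an eval shows quality holds at the chosen small_share and cache configuration, which this sketch does not measure.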