- 68%
- reduction in LLM spend
- 2.1x
- faster median response
- +9 pts
- eval accuracy improvement
The challenge
Every user request hit the most expensive frontier model with a bloated prompt assembled by string concatenation. There were no evals — so nobody could safely change anything, and costs scaled linearly with success. The team was one pricing change away from negative unit economics.
How we approached it
- 1
Built an evaluation suite from real production traffic first — no optimization without a quality baseline.
- 2
Introduced model routing: a fast, cheap model for the 70% of requests it handles well, escalating to frontier models only when needed.
- 3
Restructured prompts for cache efficiency and cut token volume with retrieval that fetched only relevant context.
- 4
Added cost observability per feature and per customer, making unit economics visible in the metrics stack.
The outcome
Spend dropped 68% while eval scores improved — the routing forced clearer prompts and better retrieval, which helped the big model too. The feature's unit economics went from a liability to a margin story the company now tells investors.
→ Start a project
Tell us what's broken. We'll tell you how we'd fix it.
Start with a conversation — thirty minutes, engineers on both ends of the call. If we're not the right team, we'll say so and point you somewhere better.