Cutting LLM spend 68% while improving answer quality

68% reduction in LLM spend
2.1x faster median response
+9 pts eval accuracy improvement

The challenge

Every user request hit the most expensive frontier model with a bloated prompt assembled by string concatenation. There were no evals — so nobody could safely change anything, and costs scaled linearly with success. The team was one pricing change away from negative unit economics.

How we approached it

1. Built an evaluation suite from real production traffic first: no optimization without a quality baseline.
2. Introduced model routing: a fast, cheap model for the 70% of requests it handles well, escalating to frontier models only when needed.
3. Restructured prompts for cache efficiency and cut token volume with retrieval that fetched only relevant context.
4. Added cost observability per feature and per customer, making unit economics visible in the metrics stack.
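To make the routing and cost-observability steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used on this project: the confidence-based escalation rule, the 0.8 threshold, the per-1k-token prices, and every name in it (Model, CostLedger, route, the stub models) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_1k_tokens: float  # hypothetical price, not a real vendor rate
    # call(prompt) -> (answer, self-reported confidence in [0, 1], tokens used)
    call: Callable[[str], Tuple[str, float, int]]


class CostLedger:
    """Accumulates spend keyed by (customer, feature)."""

    def __init__(self) -> None:
        self.spend: Dict[Tuple[str, str], float] = defaultdict(float)

    def record(self, customer: str, feature: str, model: Model, tokens: int) -> float:
        cost = tokens / 1000 * model.usd_per_1k_tokens
        self.spend[(customer, feature)] += cost
        return cost


def route(prompt: str, cheap: Model, frontier: Model, ledger: CostLedger,
          customer: str, feature: str, min_confidence: float = 0.8) -> Tuple[str, str]:
    """Try the cheap model first; escalate only when its confidence is too low.

    Returns (answer, name of the model that produced it). Both calls are
    billed to the ledger, so an escalation shows up as extra spend.
    """
    answer, confidence, tokens = cheap.call(prompt)
    ledger.record(customer, feature, cheap, tokens)
    if confidence >= min_confidence:
        return answer, cheap.name

    answer, _, tokens = frontier.call(prompt)
    ledger.record(customer, feature, frontier, tokens)
    return answer, frontier.name


# Stub models standing in for real API clients (assumption: a real client
# would return token counts and some usable confidence signal).
def cheap_stub(prompt: str) -> Tuple[str, float, int]:
    easy = len(prompt.split()) <= 8  # toy heuristic: short prompts are "easy"
    return "cheap answer", (0.9 if easy else 0.4), 100


def frontier_stub(prompt: str) -> Tuple[str, float, int]:
    return "frontier answer", 0.99, 100


CHEAP = Model("small", 0.001, cheap_stub)
FRONTIER = Model("frontier", 0.03, frontier_stub)
```

Calling route(...) with a shared CostLedger and then reading ledger.spend gives spend per (customer, feature) pair, which is the kind of number a metrics stack could chart. In practice the escalation signal might come from a classifier or a validation check rather than a self-reported confidence score.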

The outcome

Spend dropped 68% and eval accuracy rose 9 points. Routing forced clearer prompts and better retrieval, which helped the frontier model too. The feature's unit economics went from a liability to a margin story the company now tells investors.
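The "quality baseline first" rule behind these numbers can be pictured as a regression gate: a cheaper configuration ships only if it scores at least as well as the current one on a fixed set of cases. The Python sketch below is a hedged illustration, not the evaluation suite built for this project. Exact-match scoring, the max_drop tolerance, and the function names are all assumptions.

```python
from typing import Callable, Sequence, Tuple

Case = Tuple[str, str]  # (prompt, expected answer)


def accuracy(answer_fn: Callable[[str], str], cases: Sequence[Case]) -> float:
    """Fraction of cases answered correctly, using case-insensitive exact match."""
    if not cases:
        raise ValueError("need at least one eval case")
    correct = sum(
        1 for prompt, expected in cases
        if answer_fn(prompt).strip().lower() == expected.strip().lower()
    )
    return correct / len(cases)


def passes_gate(baseline_fn: Callable[[str], str],
                candidate_fn: Callable[[str], str],
                cases: Sequence[Case],
                max_drop: float = 0.0) -> bool:
    """True if the candidate's accuracy is within max_drop of the baseline's."""
    return accuracy(candidate_fn, cases) >= accuracy(baseline_fn, cases) - max_drop
```

Real suites usually need fuzzier scoring than exact match (rubrics, model graders, task-specific checks), but the gate itself stays the same shape: measure both, compare, and block the change if quality falls.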

→ Start a project

Tell us what's broken. We'll tell you how we'd fix it.

Start with a conversation — thirty minutes, engineers on both ends of the call. If we're not the right team, we'll say so and point you somewhere better.
