B2B SaaS company

Name withheld

Cutting LLM spend 68% while improving answer quality

68% reduction in LLM spend
2.1x faster median response
+9 pts eval accuracy improvement

The challenge

Every user request hit the most expensive frontier model with a bloated prompt assembled by string concatenation. With no evals in place, nobody could safely change anything, and costs grew linearly with usage. The team was one pricing change away from negative unit economics.

How we approached it

  1. Built an evaluation suite from real production traffic first: no optimization without a quality baseline.

  2. Introduced model routing: a fast, cheap model for the 70% of requests it handles well, escalating to frontier models only when needed.

  3. Restructured prompts for cache efficiency and cut token volume with retrieval that fetched only relevant context.

  4. Added cost observability per feature and per customer, making unit economics visible in the metrics stack.
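
To make the first two steps concrete, here is a minimal sketch of an eval harness wrapped around a cheap-first router. It is an illustration under stated assumptions, not the client's implementation: the source does not say how escalation was decided, so the confidence threshold, the exact-match scoring, and every name here (`route`, `evaluate`, `EvalCase`, `cheap_stub`, `frontier_stub`) are hypothetical. The stub functions stand in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Answer:
    text: str
    confidence: float  # assumed 0.0-1.0 score; how it is estimated is up to the caller
    model: str         # which tier produced the answer


ModelFn = Callable[[str], Answer]


def route(prompt: str, cheap: ModelFn, frontier: ModelFn,
          min_confidence: float = 0.8) -> Answer:
    """Try the cheap model first; escalate only when its answer looks weak."""
    first = cheap(prompt)
    if first.confidence >= min_confidence:
        return first
    return frontier(prompt)


@dataclass
class EvalCase:
    prompt: str
    expected: str


def evaluate(answer_fn: ModelFn, cases: Sequence[EvalCase]) -> dict:
    """Score a policy on a fixed eval set (exact match is an assumed metric)."""
    if not cases:
        raise ValueError("eval set is empty")
    answers = [answer_fn(c.prompt) for c in cases]
    correct = sum(a.text == c.expected for a, c in zip(answers, cases))
    escalated = sum(a.model == "frontier" for a in answers)
    return {"accuracy": correct / len(cases),
            "frontier_share": escalated / len(cases)}


# Stubs standing in for real model calls (hypothetical behavior).
def cheap_stub(prompt: str) -> Answer:
    if len(prompt) < 20:
        return Answer(prompt.upper(), 0.9, "cheap")
    return Answer("?", 0.3, "cheap")  # unsure on long prompts


def frontier_stub(prompt: str) -> Answer:
    return Answer(prompt.upper(), 0.95, "frontier")


def routed(prompt: str) -> Answer:
    return route(prompt, cheap_stub, frontier_stub)


CASES = [
    EvalCase("hi", "HI"),
    EvalCase("ok", "OK"),
    EvalCase("a much longer and harder question",
             "A MUCH LONGER AND HARDER QUESTION"),
]

baseline = evaluate(cheap_stub, CASES)  # cheap model alone
report = evaluate(routed, CASES)        # cheap-first with escalation
```

Running the same eval set against each policy is what makes a routing change safe to ship: the accuracy figure shows whether quality held, and the share of requests escalated to the frontier tier indicates where the spend is going.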

The outcome

Spend dropped 68% while eval accuracy improved by 9 points. Routing forced clearer prompts and better retrieval, which helped the frontier model too. The feature's unit economics went from a liability to a margin story the company now tells investors.
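
The per-feature and per-customer cost visibility behind that unit-economics story can be sketched as a simple aggregation over usage records. This is an assumption-heavy illustration: the record fields, the `cost_by` helper, and the prices in `PRICES` are hypothetical placeholders, not real vendor pricing or the client's schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Usage:
    feature: str
    customer: str
    model: str
    input_tokens: int
    output_tokens: int


# Placeholder prices in USD per million tokens: (input, output).
# These numbers are invented for the example.
PRICES = {"cheap": (0.5, 1.5), "frontier": (10.0, 30.0)}


def cost_usd(u: Usage) -> float:
    """Cost of one request under the placeholder price table."""
    in_price, out_price = PRICES[u.model]
    return (u.input_tokens * in_price + u.output_tokens * out_price) / 1_000_000


def cost_by(records: Iterable[Usage], key: str) -> dict:
    """Total cost grouped by a Usage field such as 'feature' or 'customer'."""
    totals = defaultdict(float)
    for u in records:
        totals[getattr(u, key)] += cost_usd(u)
    return dict(totals)


records = [
    Usage("search", "acme", "cheap", 1_000_000, 0),
    Usage("search", "globex", "frontier", 1_000_000, 0),
    Usage("summaries", "acme", "frontier", 0, 1_000_000),
]

per_feature = cost_by(records, "feature")
per_customer = cost_by(records, "customer")
```

In practice these totals would be emitted as tagged metrics rather than computed in memory, but the grouping logic is the same: every request carries its feature and customer, so cost can be set against revenue for each.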
