Why Your Agent Costs More Than You Think (And How to Fix It)

19th December 2025 | Playbooks

A 500-employee SaaS company built an agent to handle refund requests. In the demo it looked great: checking the policy, drafting a reply, even knowing when to offer a discount.

Then the CFO saw the bill. Each request was costing about £1.50 in compute to recover roughly £5 in margin. Worse, around one in ten replies misinterpreted the policy and that was enough to shut down the project.

Most firms think about agents like software subscriptions, but they behave more like digital workers. Every step they take has a cost and their reasoning shows up on the bill.

Treat them like software and you won’t see the cost until it’s too late. Treat them like headcount and you can ask a harder question: is this role actually paying for itself?

The real aim is simpler. Build agents where machine thinking costs comfortably less than the human time spent on the same problem. That gap is where the value appears.

Three Ways Agents Drain Your Margins

  1. Call it the negative-value agent. A task costs about £0.50 when a human handles it. The agent doing the same job racks up £0.75 once you add API calls, lookups, retries, and fixes. You’ve automated something, but you’ve made it more expensive.
  2. Then there’s the assumption that AI is basically free. Teams carry that belief from demos into production. It doesn’t hold. That £0.05 per task you saw in testing can easily become £0.50 once the edge cases, traffic, and guardrails show up.
  3. The worst cases are the ones no one notices at first. No one is watching closely, so costs leak out through token bloat. Expensive models get used for trivial steps because nobody has gone back to audit the workflow. Margins erode a few pennies at a time, and by the time it’s visible, the damage is already done.

The Trap: The CapEx Mindset

Treating agents as a one-off build cost is a common mistake.

Agents sit firmly in operational expenditure. Their costs rise with usage. As volume grows, spend grows with it. If the business doubles, agent costs double too, unless someone is actively paying attention. That’s how margins start to slip.

Without a clear P&L owner, costs drift. Tokens accumulate. Models get upgraded “just in case”. Features creep in without anyone checking what they do to the bill.

More dashboards don’t fix that. Ownership does. Every agent needs a named P&L owner. Not the engineer who built it. Not the data scientist tuning it. A business owner who notices margin movement and steps in early.

That’s the problem the next framework is built to address.

The Framework: The Agent Value Matrix

Before anything gets built, the matrix forces two questions.

  1. How much cheaper is the agent than a human? (thinking cost)
  2. What happens if it fails? (risk of failure)

Every proposed agent should be placed on this matrix. The position determines whether you build, whether you require human oversight, or whether the work should be refused altogether.

Step 1: Calculate the Thinking Cost

Start with the human baseline.

Human cost
(Annual salary + benefits + overhead) ÷ annual task volume
Example: £45,000 ÷ 5,000 tasks = £9 per task

Then calculate the agent cost.

Agent cost
API calls + lookups + retries + error correction, plus a buffer (20% is a sensible minimum)
Example: £0.50 per task

From this, set a hurdle rate.

An agent needs to be at least 10x cheaper than a human to justify full automation. That level of spread creates enough room for edge cases, growth, and mistakes.

If the agent is less than 2x cheaper, the economics are fragile. Small changes in volume or complexity will wipe out the benefit.
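The arithmetic above can be sketched as a quick calculation. The figures mirror the worked example (£45,000 fully loaded cost, 5,000 tasks per year, roughly £0.50 per agent task); the function names and the breakdown of agent costs are illustrative, not from the article.

```python
def human_cost_per_task(annual_loaded_cost_gbp, annual_task_volume):
    """Fully loaded human cost per task: (salary + benefits + overhead) / volume."""
    return annual_loaded_cost_gbp / annual_task_volume

def agent_cost_per_task(api_calls, lookups, retries, error_correction, buffer=0.20):
    """Sum per-task agent costs and apply a safety buffer (20% is a sensible minimum)."""
    raw = api_calls + lookups + retries + error_correction
    return raw * (1 + buffer)

def arbitrage_multiple(human_cost, agent_cost):
    """How many times cheaper the agent is than the human baseline."""
    return human_cost / agent_cost

human = human_cost_per_task(45_000, 5_000)             # £9.00 per task
agent = agent_cost_per_task(0.25, 0.05, 0.07, 0.047)   # ≈ £0.50 with buffer

print(f"Human: £{human:.2f}, Agent: £{agent:.2f}, "
      f"Arbitrage: {arbitrage_multiple(human, agent):.1f}x")
```

At roughly 18x, this example clears the 10x hurdle with room to spare; at under 2x, small changes in retries or model pricing would erase the gap.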

Step 2: Plot the Task on the Matrix

Next, layer in risk. Ask one blunt question: if the agent fails, what breaks?

  • Low risk: the output can be fixed quickly with minimal consequence
  • High risk: failure leads to client loss, data exposure, or material harm
  • High arbitrage (>10x savings), low risk of failure: Zone 1, the “No-Brainer”. Build immediately and let it run on auto-pilot.
  • High arbitrage (>10x savings), high risk of failure: Zone 2, the “Tethered Agent”. Build, but mandate human-in-the-loop.
  • Low arbitrage (<2x savings), low risk of failure: Zone 3, the “Distraction”. Refuse; maintenance outweighs value.
  • Low arbitrage (<2x savings), high risk of failure: Zone 4, the “Money Pit”. Burn the project; high risk, low margin.

How to Use This Matrix

Zone 1 is where you scale. The savings are real and the downside is limited.

Zone 2 is where agents support humans, not replace them. You gain efficiency, but retain control.

Zone 3 looks tempting but rarely pays off. Ongoing maintenance consumes the margin.

Zone 4 should be rejected early. High risk and thin economics make failure likely.
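The zone logic can be written as a small decision function. It uses the thresholds the matrix names (>10x for high arbitrage, <2x for fragile economics); the matrix doesn’t name the 2x–10x band, so treating it as “borderline” is an assumption of this sketch.

```python
def classify_agent(arbitrage_multiple: float, high_risk: bool) -> str:
    """Place a proposed agent on the Agent Value Matrix.

    >10x savings justifies full automation; <2x is fragile. The middle
    band (2x-10x) is unnamed in the matrix, so it is flagged as borderline.
    """
    if arbitrage_multiple >= 10:
        if high_risk:
            return "Zone 2: Tethered Agent (build with human-in-the-loop)"
        return "Zone 1: No-Brainer (build and scale)"
    if arbitrage_multiple < 2:
        if high_risk:
            return "Zone 4: Money Pit (reject)"
        return "Zone 3: Distraction (refuse)"
    return "Borderline: re-scope the task or cut agent cost before building"

print(classify_agent(18.0, high_risk=False))  # prints "Zone 1: No-Brainer (build and scale)"
print(classify_agent(18.0, high_risk=True))   # prints "Zone 2: Tethered Agent (build with human-in-the-loop)"
```

The point of encoding it this way is that placement becomes a gate in the intake process, not a debate after the build.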

In-the-Wild Example: The Logistics Refund Agent

Initial placement

The firm attempted to build a fully autonomous refund agent. It was treated internally as a ‘God Agent’ – checking policy, making decisions, and issuing refunds without review.

On the matrix, this sat squarely in Zone 4.

  • Arbitrage was low once compute, retries, and error handling were included
  • Risk was high, as mistakes directly affected customers and revenue
  • The agent hallucinated policy edge cases and relied on expensive models 

The outcome was predictable. The agent created real financial exposure, and the CFO shut the project down.

Revised placement

The fix wasn’t better prompts or a larger model. It was a change in design.

The agent was re-scoped into Zone 2.

  • The agent now retrieves data and drafts responses only
  • A human reviews and sends the final message
  • Expensive reasoning steps were removed

Result

  • Agent cost dropped from £0.50 to £0.05 per task
  • Human handling time fell from three minutes to thirty seconds
  • Risk was contained. Efficiency stayed intact.

The same agent became viable once it was placed in the right zone.
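The before/after economics can be sanity-checked with a quick per-task comparison. The agent costs and handling times come from the example above; the £30-per-hour loaded human rate is an assumed illustration, not a number from the article.

```python
HOURLY_RATE_GBP = 30.0  # assumed loaded human rate for illustration

def total_cost_per_task(agent_cost, human_seconds, hourly_rate=HOURLY_RATE_GBP):
    """Per-task cost: agent compute plus the human time spent on the task."""
    return agent_cost + (human_seconds / 3600) * hourly_rate

before = total_cost_per_task(0.50, 180)  # God Agent: £0.50 compute, ~3 min of human handling
after = total_cost_per_task(0.05, 30)    # Tethered agent: £0.05 compute, 30s human review

print(f"Before: £{before:.2f}/task, After: £{after:.2f}/task")  # prints "Before: £2.00/task, After: £0.30/task"
```

Under that assumed rate, re-scoping cuts the per-task cost by more than 6x, and crucially the human stays in the loop on the risky step: sending the refund.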

The Executive Takeaway

The first question isn’t whether an agent can do the task. It’s whether the P&L can live with the cost of failure. If not, you shouldn’t build it. The strength of an agent programme shows up in what you refuse, not what you deploy.
