A 500-employee SaaS company built an agent to handle refund requests. In the demo, it looked great: checking the policy, drafting a reply, even knowing when to offer a discount.
Then the CFO saw the bill. Each request was costing about £1.50 in compute to recover roughly £5 in margin. Worse, around one in ten replies misinterpreted the policy, and that was enough to shut down the project.
Most firms think about agents like software subscriptions, but they behave more like digital workers. Every step they take has a cost and their reasoning shows up on the bill.
Treat them like software and you won’t see the cost until it’s too late. Treat them like headcount and you can ask a harder question: is this role actually paying for itself?
The real aim is simpler. Build agents where machine thinking costs comfortably less than the human time spent on the same problem. That gap is where the value appears.
Three Ways Agents Drain Your Margins
Call it the negative value agent. A task costs about £0.50 when a human handles it. The agent doing the same job racks up £0.75 once you add API calls, lookups, retries, and fixes. You’ve automated something, but you’ve made it more expensive.
Then there’s the assumption that AI is basically free. Teams carry that belief from demos into production. It doesn’t hold. That £0.05 per task you saw in testing can easily become £0.50 once the edge cases, traffic, and guardrails show up.
The worst cases are the ones no one notices at first. No one is watching closely, so costs leak out through token bloat. Expensive models get used for trivial steps because nobody has gone back to audit the workflow. Margins erode a few pennies at a time, and by the time it’s visible, the damage is already done.
The Trap: The CapEx Mindset
Treating agents as a one-off build cost is a common mistake.
Agents sit firmly in operational expenditure. Their costs rise with usage. As volume grows, spend grows with it. If the business doubles, agent costs double too, unless someone is actively paying attention. That’s how margins start to slip.
Without a clear P&L owner, costs drift. Tokens accumulate. Models get upgraded “just in case”. Features creep in without anyone checking what they do to the bill.
More dashboards don’t fix that. Ownership does. Every agent needs a named P&L owner. Not the engineer who built it. Not the data scientist tuning it. A business owner who notices margin movement and steps in early.
That’s the problem the next framework is built to address.
The Framework: The Agent Value Matrix
Before anything gets built, the matrix forces two questions.
How much cheaper is the agent than a human? (thinking cost)
What happens if it fails? (risk of failure)
Every proposed agent should be placed on this matrix. The position determines whether you build, whether you require human oversight, or whether the work should be refused altogether.
Step 1: Calculate the Thinking Cost
Add up everything the agent spends per task: API calls + lookups + retries + error correction, plus a buffer. A 20% buffer is a sensible minimum. Example: £0.50 per task.
From this, set a hurdle rate.
An agent needs to be at least 10x cheaper than a human to justify full automation. That level of spread creates enough room for edge cases, growth, and mistakes.
If the agent is less than 2x cheaper, the economics are fragile. Small changes in volume or complexity will wipe out the benefit.
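The cost and hurdle-rate arithmetic above can be sketched in a few lines. This is a minimal illustration, not a costing tool: the function names and the per-component figures are my own, chosen so the example lands on the article’s £0.50-per-task total.

```python
def thinking_cost(api_calls: float, lookups: float, retries: float,
                  error_correction: float, buffer: float = 0.20) -> float:
    """Per-task agent cost: raw compute components plus a safety buffer
    (the article suggests 20% as a sensible minimum)."""
    raw = api_calls + lookups + retries + error_correction
    return raw * (1 + buffer)

def arbitrage(human_cost: float, agent_cost: float) -> float:
    """How many times cheaper the agent is than a human on the same task."""
    return human_cost / agent_cost

# Illustrative component costs (assumed) that sum to ~£0.50 per task.
agent = thinking_cost(api_calls=0.25, lookups=0.05,
                      retries=0.07, error_correction=0.0467)
human = 1.50  # assumed fully loaded human cost per task
print(f"agent cost: £{agent:.2f}, arbitrage: {arbitrage(human, agent):.1f}x")
```

At roughly 3x, this hypothetical task clears the 2x floor but falls well short of the 10x hurdle for full automation.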
Step 2: Plot the Task on the Matrix
Next, layer in risk. Ask one blunt question: if the agent fails, what breaks?
Low risk: the output can be fixed quickly with minimal consequence
High risk: failure leads to client loss, data exposure, or material harm
High Arbitrage (>10x savings) + Low Risk of Failure → Zone 1: The “No-Brainer”. Build immediately. Let it run on auto-pilot.
High Arbitrage (>10x savings) + High Risk of Failure → Zone 2: The “Tethered Agent”. Build, but mandate human-in-the-loop.
Low Arbitrage (<2x savings) + Low Risk of Failure → Zone 3: The “Distraction”. Refuse. Maintenance outweighs value.
Low Arbitrage (<2x savings) + High Risk of Failure → Zone 4: The “Money Pit”. Burn the project. High risk, low margin.
How to Use This Matrix
Zone 1 is where you scale. The savings are real and the downside is limited.
Zone 2 is where agents support humans, not replace them. You gain efficiency, but retain control.
Zone 3 looks tempting but rarely pays off. Ongoing maintenance consumes the margin.
Zone 4 should be rejected early. High risk and thin economics make failure likely.
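The zone placement logic is mechanical enough to write down. A minimal sketch, assuming the article’s thresholds (≥10x counts as high arbitrage, <2x as low) and treating anything in between as a judgment call:

```python
def classify(arbitrage_multiple: float, high_risk: bool) -> str:
    """Place a proposed agent in the Agent Value Matrix.

    arbitrage_multiple: how many times cheaper the agent is than a human.
    high_risk: True if failure means client loss, data exposure, or harm.
    """
    if arbitrage_multiple >= 10:
        return "Zone 2: Tethered Agent" if high_risk else "Zone 1: No-Brainer"
    if arbitrage_multiple < 2:
        return "Zone 4: Money Pit" if high_risk else "Zone 3: Distraction"
    # The article leaves the 2x-10x band implicit; flagging it for review
    # is my own reading of "the economics are fragile".
    return "Borderline: re-scope before building"
```

Usage: a fully autonomous refund agent at 1.5x savings with real revenue exposure classifies as the Money Pit, which is exactly where the example below starts.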
In The Wild Example: The Logistics Refund Agent
Initial placement
The firm attempted to build a fully autonomous refund agent. It was treated internally as a ‘God Agent’ – checking policy, making decisions, and issuing refunds without review.
On the matrix, this sat squarely in Zone 4.
Arbitrage was low once compute, retries, and error handling were included
Risk was high, as mistakes directly affected customers and revenue
The agent hallucinated policy edge cases and relied on expensive models
The outcome was predictable. The agent created real financial exposure, and the CFO shut the project down.
Revised placement
The fix wasn’t better prompts or a larger model. It was a change in design.
The agent was re-scoped into Zone 2.
The agent now retrieves data and drafts responses only
A human reviews and sends the final message
Expensive reasoning steps were removed
Result
Agent cost dropped from £0.50 to £0.05 per task
Human handling time fell from three minutes to thirty seconds
Risk was contained. Efficiency stayed intact.
The same agent became viable once it was placed in the right zone.
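The before-and-after economics are easy to check. Only the per-task agent costs and handling times come from the article; the £30-per-hour human rate is an assumption added for illustration.

```python
HOURLY_RATE = 30.0  # assumed fully loaded human cost per hour (not from the article)

def total_cost(agent_cost: float, human_minutes: float) -> float:
    """Agent compute plus the cost of human time spent on the same task."""
    return agent_cost + (human_minutes / 60) * HOURLY_RATE

before = total_cost(agent_cost=0.50, human_minutes=3.0)  # Zone 4 design
after = total_cost(agent_cost=0.05, human_minutes=0.5)   # Zone 2 re-scope
print(f"before: £{before:.2f} per task, after: £{after:.2f} per task")
```

Under this assumed rate, the re-scoped design cuts the all-in cost per task from £2.00 to £0.30, and that’s before counting the avoided cost of wrong refunds.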
The Executive Takeaway
The first question isn’t whether an agent can do the task. It’s whether the P&L can live with the cost of failure. If not, you shouldn’t build it. The strength of an agent programme shows up in what you refuse, not what you deploy.