AI in Practice: What broke when the checking stopped

10th April 2026 | Insights & Case Studies

This week’s article is based on an interview with the founder of a mid-market logistics firm that had rolled out an AI agent to handle intake for new freight forwarding clients.

The agent’s job was straightforward: collect intake information from new clients, check it against internal templates, flag gaps, and generate a first draft of the setup documents. In the pilot it seemed reliable, with people still close enough to catch what it missed.

When they expanded the rollout, the checkpoint that had kept the pilot honest disappeared, and nobody noticed.

From the start, the escalation rule was wrong. When the agent hit a gap it couldn’t resolve, it was supposed to flag the item for human review. Instead, it marked the item as reviewed and moved it forward. Nobody caught it because nobody was looking closely enough.
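The intended rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the firm's implementation: the `Status` values, the `IntakeItem` shape, and the `escalate` and `mark_reviewed` functions are all hypothetical names. The point is that the agent can only ever move an item to "needs review", and only a named person can move it to "reviewed".

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    DRAFTED = "drafted"
    NEEDS_REVIEW = "needs_review"
    REVIEWED = "reviewed"


@dataclass
class IntakeItem:
    """Hypothetical intake item; field names are assumptions."""
    client: str
    unresolved_gaps: List[str] = field(default_factory=list)
    status: Status = Status.DRAFTED
    reviewed_by: Optional[str] = None


def escalate(item: IntakeItem) -> IntakeItem:
    """Agent-side rule: an unresolved gap goes to a human, never to REVIEWED."""
    if item.unresolved_gaps:
        item.status = Status.NEEDS_REVIEW
    return item


def mark_reviewed(item: IntakeItem, reviewer: str) -> IntakeItem:
    """Human-side rule: REVIEWED always carries the name of who reviewed it."""
    if not reviewer:
        raise ValueError("a named reviewer is required")
    item.status = Status.REVIEWED
    item.reviewed_by = reviewer
    return item
```

Keeping the two transitions in separate functions means a "reviewed" status with no reviewer attached cannot be produced by the agent's own code path, which is the failure described above.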

The issue surfaced six months after the rollout expanded, when an account manager noticed a setup document marked as reviewed at the exact moment it had been generated. The escalation rule had been wrong since the pilot, but during the pilot people were checking every output before it moved to the next stage. Once that checking stopped, the defect ran uncaught for most of those six months.

What the review had been catching

The discovery raised one question: whose job had it been to catch this?

Three people, asked separately, named someone else. The senior account manager thought the team lead had taken over the review, or assumed she had. The team lead thought it had been distributed across the account managers. The account managers thought the senior account manager was still doing it. Nobody had written down what review meant in production, who owned it, or what happened when something surfaced.

Which meant nobody could say what had actually happened. The ops lead pulled the source system data alongside the agent’s logs. On their own, the agent’s records looked clean. Compared against the source system, the timestamps told a different story: documents were being recorded as completed when they were generated, not when intake was actually finished. The trail was there. It just described a process that hadn’t happened.
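A comparison of this kind can be sketched as a small audit. This is an illustration only, not the ops lead's actual tooling: the record shape, the field names (`generated_at`, `completed_at`), the source-system lookup, and the function name are all assumptions. It flags any document whose completion time matches its generation time, precedes the point at which the source system says intake finished, or has no source record at all.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class AgentRecord:
    """One row from the agent's log (hypothetical shape)."""
    doc_id: str
    generated_at: datetime
    completed_at: datetime


def find_suspect_completions(
    agent_records: List[AgentRecord],
    intake_finished_at: Dict[str, datetime],
) -> List[str]:
    """Return doc_ids whose 'completed' time cannot reflect finished intake.

    intake_finished_at maps doc_id to the time the source system says
    intake actually finished (an assumed lookup, keyed the same way).
    """
    suspects = []
    for rec in agent_records:
        finished = intake_finished_at.get(rec.doc_id)
        # Completed in the same instant it was generated: nobody looked.
        stamped_at_generation = rec.completed_at == rec.generated_at
        # No source record, or completion recorded before intake finished.
        before_intake = finished is None or rec.completed_at < finished
        if stamped_at_generation or before_intake:
            suspects.append(rec.doc_id)
    return suspects
```

Treating a missing source record as suspect is a deliberate choice in this sketch: a document the source system cannot account for is exactly the case a reviewer would want to see.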

In the cases aibl covers, the more common failure isn’t a pilot that never worked. It’s one that worked well enough that the team stopped checking, and a rollout that inherited the agent but not the conditions that made the pilot reliable.

What they rebuilt

Fixing the escalation rule came first. Without it, they had no way to tell, going forward, which cases the agent had completed and which it had waved through. Reconstructing the previous six months required that fix plus a comparison against the source system records.

That made the ownership question answerable. The checkpoint had disappeared when the rollout expanded, not because anyone removed it but because everyone assumed it was still there. They reinstated it: one person, clear scope, accountable when something surfaced. It was not a new role but the same one the pilot had run on, made explicit. The timestamp fix came last and took longest, because it needed the systems team, but it only moved forward once someone was accountable for looking at what the trail actually showed.

With those changes in place, the ops lead could reconstruct any client’s onboarding sequence from the record itself, rather than pulling people back through Slack threads and logs.

Stopping the review hadn’t felt like a decision. The pilot had gone well, the agent had earned trust, and expanding its autonomy felt like the next step. But the pilot had gone well partly because people were still close enough to catch the misses, often without naming them as misses. When the rollout expanded, that cover went with it. Nobody noticed the last safety layer was gone until the consequences had been running for months.
