Field notes
Your AI reporting tool is making up your client's numbers
25 April 2026
Last month I looked at a monthly performance report an agency had generated with an AI reporting tool. The report cited a benchmark: "Industry average CTR for search campaigns in this sector is 4.2%." The campaign had a 1.9% CTR. The benchmark was in there to contextualise why results were below expectations.
No source. No date. No campaign type specified. I checked. The number does not appear anywhere I could verify it. The tool invented it, the report included it, and the account manager sent it.
The term "hallucination" makes it sound exotic. It is not. It is an AI tool filling a gap in its input with a plausible-sounding number, citation, or comparison because it was asked to generate a complete report and completeness is what it optimises for.
In practice, I have seen three failure modes appear repeatedly once agencies move AI into reporting workflows.
**Invented benchmarks.** The tool is asked to contextualise performance. It does not have verified benchmark data for your client's niche. It produces one anyway: a tidy figure, often with one decimal place for credibility. "Email open rates in the property sector average 31.4%." Said who? Based on what period? The tool does not know. It generated something that fits.
**Phantom competitor comparisons.** "Competitor A is running approximately 12 campaigns on Meta." How does the tool know this? It does not. It inferred, guessed, or recalled something from training data that may be two years old. This lands in the report as fact.
**Fabricated citations.** The tool adds a source to sound authoritative. The source either does not exist or does not say what the tool claims. I have seen a report cite a HubSpot study with a specific year and statistic that, when I looked for the actual study, was nowhere in HubSpot's published research.
None of these are edge cases anymore. They are the normal failure output of AI tools used without a structured verification step. This is the same upstream problem I covered in why most AI pilots fail before they ship. Workflows that generate output without a defined gate produce work nobody can trust.
Most agencies running AI-generated reports have some version of a review. An account manager reads it before it sends. A strategist glances at the numbers. Someone checks the formatting.
That is not verification. That is proofreading with extra confidence.
Checking means reading for coherence: does this make sense? Verifying means tracing a claim to a source: where did this number actually come from?
The distinction matters because AI-generated content is coherent almost by design. It reads well. It sounds authoritative. It uses the right vocabulary for the industry. The things that will catch you out are the invisible ones: the benchmark with no source, the competitor claim based on inference, the percentage that sounds plausible but is not tied to any real data.
A human reading for sense will miss all of those. A human checking against a source will catch them.
"Someone will catch it" is how you describe a hope, not a process. A process has a named owner, a specific check, and evidence that it happened.
The risk here is not just a bad report. It is a client who does their own research, finds the benchmark is invented, and asks why your agency is putting unverified data in a document they are paying for. That conversation is hard to recover from.
A verification gate is not a full audit of every sentence. That would take longer than building the report manually. It is a specific set of checks targeted at the categories of claims that AI tools routinely get wrong.
For AI-generated client reports, I use a gate with four checks:
| Check | What it covers | How to verify |
|---|---|---|
| Benchmarks | Any industry average, sector comparison, or "typical" figure | Source must be named and linkable. No source, no benchmark |
| Competitor claims | Any specific activity attributed to a named brand | Remove or clearly label as estimate unless sourced from a live tool |
| Statistics with citations | Any percentage or number attributed to a named study | Pull the original source and confirm the figure is there |
| Platform data match | Campaign metrics cited in the report | Cross-check against the raw platform export, not just the AI summary |
That last check catches a different problem: AI tools that pull from platform APIs can misread or misaggregate data. Clicks for one campaign period appearing under another. ROAS calculated on spend that does not match the dashboard. These are not hallucinations in the traditional sense. They are data handling errors. They belong in the same gate.
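The platform cross-check, at least, can be mechanical. Here is a minimal sketch of what it could look like, assuming a hypothetical raw CSV export with campaign, clicks, and spend columns; the file name, column names, figures, and tolerance are all illustrative, not tied to any specific platform or tool.

```python
import csv

# Hypothetical figures the AI-generated report cites, keyed by campaign name.
REPORT_CLAIMS = {
    "Brand Search": {"clicks": 1240, "spend": 1875.00},
    "Generic Search": {"clicks": 3310, "spend": 4120.50},
}

TOLERANCE = 0.01  # allow 1% rounding drift between report and export


def load_export(path: str) -> dict:
    """Read the raw platform export into {campaign: {metric: value}}."""
    rows = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rows[row["campaign"]] = {
                "clicks": float(row["clicks"]),
                "spend": float(row["spend"]),
            }
    return rows


def mismatches(claims: dict, export: dict) -> list[str]:
    """Return every claim that is missing from, or disagrees with, the export."""
    problems = []
    for campaign, metrics in claims.items():
        if campaign not in export:
            problems.append(f"{campaign}: not in platform export")
            continue
        for metric, claimed in metrics.items():
            actual = export[campaign][metric]
            drift = abs(claimed - actual) / max(actual, 1)
            if drift > TOLERANCE:
                problems.append(
                    f"{campaign} {metric}: report says {claimed}, export says {actual}"
                )
    return problems


if __name__ == "__main__":
    for issue in mismatches(REPORT_CLAIMS, load_export("platform_export.csv")):
        print("FLAG:", issue)
```

The important design choice is what it compares against: the raw export, not the AI tool's own summary of the export.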
The gate should be a named task on the report workflow, assigned to a specific person, with a sign-off field. Not a note that says "please review." A checkbox with a name next to it and a timestamp.
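As a rough illustration of what that record could look like, here is a sketch where the four checks from the table above are explicit fields, the owner is a named person, and the timestamp is only written once every check is done. The field names are hypothetical; the same structure works equally well as a row in a project management tool or a spreadsheet.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The four gate checks from the table above. Names are illustrative.
GATE_CHECKS = (
    "benchmarks_sourced_or_removed",
    "competitor_claims_sourced_or_labelled",
    "cited_statistics_traced_to_original",
    "platform_data_cross_checked",
)


@dataclass
class GateSignOff:
    """One record per report: who verified it, what they checked, and when."""
    report_id: str
    owner: str                        # a named person, not "the team"
    checks: dict = field(default_factory=dict)
    signed_at: str | None = None

    def complete(self, check: str) -> None:
        self.checks[check] = True

    def sign(self) -> None:
        missing = [c for c in GATE_CHECKS if not self.checks.get(c)]
        if missing:
            raise ValueError(f"Cannot sign off, unfinished checks: {missing}")
        self.signed_at = datetime.now(timezone.utc).isoformat()
```

Where this lives matters less than the fact that the owner, the checks, and the timestamp exist somewhere other than the account manager's memory.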
The instinct is to put the verification gate at the final step before sending. That is the wrong place.
By the time a report is formatted and assembled, there is real pressure to get it out. The client is expecting it. The account manager has a deadline. A last-minute gate feels like friction at exactly the wrong moment, so it gets compressed. The check becomes a skim, the skim becomes a send.
The gate needs to sit at the draft stage, before the report moves into layout or presentation. That means:
- AI generates the draft
- Verification gate runs: benchmarks sourced or removed, competitor claims flagged, platform data cross-checked
- Amended draft moves to formatting
- Final review is for formatting and tone, not data accuracy
This structure separates the jobs. Data verification is not a quick task you do while reading for flow. It requires switching to a different mental mode: source-checking, not sense-checking. Mixing the two means neither happens properly.
Building the gate into the draft stage also means the account manager who runs the check has time to go back to the AI tool or the data source and fix the report rather than simply removing the offending claim. The output is better, not just safer.
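To make the ordering concrete, a minimal sketch of the hand-off rule, reusing the hypothetical sign-off record above: a draft cannot move into formatting until the gate has been signed.

```python
def move_to_formatting(report_id: str, sign_off: GateSignOff) -> None:
    """Refuse the hand-off unless the verification gate has been signed."""
    if sign_off.report_id != report_id or sign_off.signed_at is None:
        raise RuntimeError(
            f"Report {report_id} has no completed verification sign-off; "
            "it stays at draft stage."
        )
    print(f"{report_id}: verified by {sign_off.owner} at {sign_off.signed_at}, "
          "moving to formatting.")
```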
Agencies that have been running AI in their reporting workflows for twelve to eighteen months are hitting the same problem from different angles: the output is fast, the output is coherent, and the output is sometimes wrong in ways that are not immediately visible.
The solution is not to stop using AI tools. The economics of manual reporting do not make sense at scale. The solution is to treat the AI as a drafter, not a publisher, and to build the human check into the workflow at the structural level, not as a cultural expectation that relies on individuals being thorough under deadline pressure.
That means a named owner for verification. A documented gate with specific checks. A sign-off trail that exists somewhere other than the account manager's memory.
Agencies that build this now are the ones that will still have client trust in twelve months. The ones that treat speed as the only metric will eventually send something a client can disprove with a thirty-second search. That is the conversation that ends retainers.
If your agency is at that crossroads, our AI Operations Support service is designed for exactly this: keeping AI workflows reliable after they go live, with named ownership and documented verification gates. The related problem upstream (vague briefs that produce vague output, faster) is covered in the brief is the bottleneck.
Fast and unverified is not a workflow. It is a liability that has not been invoiced yet.