Field notes
Your AI reporting tool is making up your client's numbers
25 April 2026
Last month I looked at a monthly performance report an agency had generated with an AI reporting tool. The report cited a benchmark: "Industry average CTR for search campaigns in this sector is 4.2%." The campaign had a 1.9% CTR. The benchmark was in there to contextualise why results were below expectations.
No source. No date. No campaign type specified. I checked. The number does not appear anywhere I could verify it. The tool invented it, the report included it, and the account manager sent it.
The term "hallucination" makes it sound exotic. It is not. It is an AI tool filling a gap in its input with a plausible-sounding number, citation, or comparison because it was asked to generate a complete report and completeness is what it optimises for.
In practice, I have seen three failure modes appear repeatedly once agencies move AI into reporting workflows.
**Invented benchmarks.** The tool is asked to contextualise performance. It does not have verified benchmark data for your client's niche. It produces one anyway: a tidy figure, often with one decimal place for credibility. "Email open rates in the property sector average 31.4%." Said who? Based on what period? The tool does not know. It generated something that fits.
**Phantom competitor comparisons.** "Competitor A is running approximately 12 campaigns on Meta." How does the tool know this? It does not. It inferred, guessed, or recalled something from training data that may be two years old. This lands in the report as fact.
**Fabricated citations.** The tool adds a source to sound authoritative. The source either does not exist or does not say what the tool claims. I have seen a report cite a HubSpot study with a specific year and statistic that, when I looked for the actual study, was nowhere in HubSpot's published research.
None of these are edge cases anymore. They are the normal failure output of AI tools used without a structured verification step. This is the same upstream problem I covered in why most AI pilots fail before they ship. Workflows that generate output without a defined gate produce work nobody can trust.
Most agencies running AI-generated reports have some version of a review. An account manager reads it before it sends. A strategist glances at the numbers. Someone checks the formatting.
That is not verification. That is proofreading with extra confidence.
Checking means reading for coherence: does this make sense? Verifying means tracing a claim to a source: where did this number actually come from?
The distinction matters because AI-generated content is coherent almost by design. It reads well. It sounds authoritative. It uses the right vocabulary for the industry. The things that will catch you out are the invisible ones: the benchmark with no source, the competitor claim based on inference, the percentage that sounds plausible but is not tied to any real data.
A human reading for sense will miss all of those. A human checking against a source will catch them.
"Someone will catch it" is how you describe a hope, not a process. A process has a named owner, a specific check, and evidence that it happened.
The risk here is not just a bad report. It is a client who does their own research, finds the benchmark is invented, and asks why your agency is putting unverified data in a document they are paying for. That conversation is hard to recover from.
A verification gate is not a full audit of every sentence. That would take longer than building the report manually. It is a specific set of checks targeted at the categories of claims that AI tools routinely get wrong.
For AI-generated client reports, I use a gate with four checks:
| Check | What it covers | How to verify |
|---|---|---|
| Benchmarks | Any industry average, sector comparison, or "typical" figure | Source must be named and linkable. No source, no benchmark |
| Competitor claims | Any specific activity attributed to a named brand | Remove or clearly label as estimate unless sourced from a live tool |
| Statistics with citations | Any percentage or number attributed to a named study | Pull the original source and confirm the figure is there |
| Platform data match | Campaign metrics cited in the report | Cross-check against the raw platform export, not just the AI summary |
That last check catches a different problem: AI tools that pull from platform APIs can misread or misaggregate data. Clicks for one campaign period appearing under another. ROAS calculated on spend that does not match the dashboard. These are not hallucinations in the traditional sense. They are data handling errors. They belong in the same gate.
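The platform cross-check, at least, can be mechanical. Here is a minimal sketch of what it could look like, assuming a hypothetical raw CSV export with campaign, clicks, and spend columns; the file name, column names, figures, and tolerance are all illustrative, not tied to any specific platform or tool.

```python
import csv

# Hypothetical figures the AI-generated report cites, keyed by campaign name.
REPORT_CLAIMS = {
    "Brand Search": {"clicks": 1240, "spend": 1875.00},
    "Generic Search": {"clicks": 3310, "spend": 4120.50},
}

TOLERANCE = 0.01  # allow 1% rounding drift between report and export


def load_export(path: str) -> dict:
    """Read the raw platform export into {campaign: {metric: value}}."""
    rows = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            rows[row["campaign"]] = {
                "clicks": float(row["clicks"]),
                "spend": float(row["spend"]),
            }
    return rows


def mismatches(claims: dict, export: dict) -> list[str]:
    """Return every claim that is missing from, or disagrees with, the export."""
    problems = []
    for campaign, metrics in claims.items():
        if campaign not in export:
            problems.append(f"{campaign}: not in platform export")
            continue
        for metric, claimed in metrics.items():
            actual = export[campaign][metric]
            drift = abs(claimed - actual) / max(actual, 1)
            if drift > TOLERANCE:
                problems.append(
                    f"{campaign} {metric}: report says {claimed}, export says {actual}"
                )
    return problems


if __name__ == "__main__":
    for issue in mismatches(REPORT_CLAIMS, load_export("platform_export.csv")):
        print("FLAG:", issue)
```

The important design choice is what it compares against: the raw export, not the AI tool's own summary of the export.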
The gate should be a named task on the report workflow, assigned to a specific person, with a sign-off field. Not a note that says "please review." A checkbox with a name next to it and a timestamp.
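As a rough illustration of what that record could look like, here is a sketch where the four checks from the table above are explicit fields, the owner is a named person, and the timestamp is only written once every check is done. The field names are hypothetical; the same structure works equally well as a row in a project management tool or a spreadsheet.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# The four gate checks from the table above. Names are illustrative.
GATE_CHECKS = (
    "benchmarks_sourced_or_removed",
    "competitor_claims_sourced_or_labelled",
    "cited_statistics_traced_to_original",
    "platform_data_cross_checked",
)


@dataclass
class GateSignOff:
    """One record per report: who verified it, what they checked, and when."""
    report_id: str
    owner: str                        # a named person, not "the team"
    checks: dict = field(default_factory=dict)
    signed_at: str | None = None

    def complete(self, check: str) -> None:
        self.checks[check] = True

    def sign(self) -> None:
        missing = [c for c in GATE_CHECKS if not self.checks.get(c)]
        if missing:
            raise ValueError(f"Cannot sign off, unfinished checks: {missing}")
        self.signed_at = datetime.now(timezone.utc).isoformat()
```

Where this lives matters less than the fact that the owner, the checks, and the timestamp exist somewhere other than the account manager's memory.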
The instinct is to put the verification gate at the final step before sending. That is the wrong place.
By the time a report is formatted and assembled, there is real pressure to get it out. The client is expecting it. The account manager has a deadline. A last-minute gate feels like friction at exactly the wrong moment, so it gets compressed. The check becomes a skim, the skim becomes a send.
The gate needs to sit at the draft stage, before the report moves into layout or presentation. That means:
- AI generates the draft
- Verification gate runs: benchmarks sourced or removed, competitor claims flagged, platform data cross-checked
- Amended draft moves to formatting
- Final review is for formatting and tone, not data accuracy
This structure separates the jobs. Data verification is not a quick task you do while reading for flow. It requires switching to a different mental mode: source-checking, not sense-checking. Mixing the two means neither happens properly.
Building the gate into the draft stage also means the account manager who runs the check has time to go back to the AI tool or the data source and fix the report rather than simply removing the offending claim. The output is better, not just safer.
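To make the ordering concrete, a minimal sketch of the hand-off rule, reusing the hypothetical sign-off record above: a draft cannot move into formatting until the gate has been signed.

```python
def move_to_formatting(report_id: str, sign_off: GateSignOff) -> None:
    """Refuse the hand-off unless the verification gate has been signed."""
    if sign_off.report_id != report_id or sign_off.signed_at is None:
        raise RuntimeError(
            f"Report {report_id} has no completed verification sign-off; "
            "it stays at draft stage."
        )
    print(f"{report_id}: verified by {sign_off.owner} at {sign_off.signed_at}, "
          "moving to formatting.")
```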
Agencies that have been running AI in their reporting workflows for twelve to eighteen months are hitting the same problem from different angles: the output is fast, the output is coherent, and the output is sometimes wrong in ways that are not immediately visible.
The solution is not to stop using AI tools. The economics of manual reporting do not make sense at scale. The solution is to treat the AI as a drafter, not a publisher, and to build the human check into the workflow at the structural level, not as a cultural expectation that relies on individuals being thorough under deadline pressure.
That means a named owner for verification. A documented gate with specific checks. A sign-off trail that exists somewhere other than the account manager's memory.
Agencies that build this now are the ones that will still have client trust in twelve months. The ones that treat speed as the only metric will eventually send something a client can disprove with a thirty-second search. That is the conversation that ends retainers.
If your agency is at that crossroads, our AI Operations Support service is designed for exactly this: keeping AI workflows reliable after they go live, with named ownership and documented verification gates. The related problem upstream (vague briefs that produce vague output, faster) is covered in the brief is the bottleneck.
Fast and unverified is not a workflow. It is a liability that has not been invoiced yet.