Web Project Studios

Field notes

The AI WhatsApp reply that almost cost an estate agent a complaint

16 May 2026

estate-agency · ai-workflow · verification

A negotiator at a busy sales branch sent a WhatsApp reply to a buyer asking about a property on their books. The message confirmed the garden was south-facing. The buyer booked a viewing on the strength of it. The garden faced north. The buyer arrived, looked at the fence, and went straight to the branch manager.

The negotiator hadn't written the message. An AI assistant had drafted it from the CRM thread and a vague property summary. The negotiator had skimmed it, assumed it was accurate, and hit send.

No complaint was filed. The manager smoothed it over. But the near-miss sat in the back of my head for weeks, because the structural problem behind it is one I see constantly, just wearing different clothes each time.

This post is about that structure. Not the model, not the prompt. The gap between AI-generated outbound text and the verified listing record it should be drawing from.

I'm not a solicitor. Nothing here is legal advice. If a misrepresentation complaint lands on your desk, speak to your professional indemnity insurer and your compliance lead.

The agency was running an AI drafting tool layered on top of their CRM. This kind of setup is common now across Reapit, Alto, and Jupix installations. A thread comes in, the tool reads the recent messages and any property notes attached to the contact record, and it generates a reply.

The problem is what it reads from. CRM contact records and message threads are unstructured. They contain things like "buyer loves south-facing gardens" from a note a negotiator added six months ago during a different search. The AI read that note, matched it to the current enquiry, and inserted it into the reply as a property fact.

The actual listing record, with the correct orientation data, was in a separate part of the system. The AI had no instruction to check it. So it didn't.

This is the same pattern I wrote about in AI-generated reporting and the hallucination problem: the model isn't lying, it's filling gaps with whatever is plausible given the context it was handed. The output sounds confident because that's what these models do. The error isn't in the generation step. It's in the input design.

A verification gate is a checkpoint. Before AI-generated text leaves a system, something confirms that the claims in that text match a trusted data source.

In this case, the gate would have been simple: before the reply is drafted, pull the canonical listing record for the property referenced in the thread. Feed that record into the prompt as the authoritative source. Instruct the model to use only that data for property-specific claims. Flag any claim in the draft that cannot be traced back to the record.

That's it. Not a better prompt asking the model to "be accurate." A structural constraint that makes accuracy checkable.
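
To make that concrete, here is a minimal sketch in Python of the drafting side of the gate. The CRM client, the field list, and the method name fetch_listing_record are assumptions standing in for whatever your stack exposes; what matters is the order of operations: canonical record first, prompt built around it second.

AUTHORITATIVE_FIELDS = ["address", "orientation", "bedrooms",
                        "tenure", "parking", "key_features"]

def build_gated_prompt(property_id, thread_messages, crm):
    # Pull the canonical listing record first -- not contact notes,
    # not thread history. `crm.fetch_listing_record` stands in for
    # whatever your CRM or portal-feed client actually exposes.
    record = crm.fetch_listing_record(property_id)
    facts = "\n".join(f"{field}: {record[field]}"
                      for field in AUTHORITATIVE_FIELDS if field in record)
    return (
        "You are drafting a reply to a buyer enquiry.\n"
        "Use ONLY the LISTING RECORD below for property-specific claims.\n"
        "Do not infer property features from the thread; it is for tone "
        "and question context only.\n"
        "If the record does not answer a question, say a negotiator will confirm.\n\n"
        f"LISTING RECORD (authoritative):\n{facts}\n\n"
        "THREAD:\n" + "\n".join(thread_messages)
    )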

The negotiator's skim-and-send behaviour is a separate issue, but it's downstream of the same problem. When people trust AI output at volume, their review quality degrades. That's not laziness. It's how attention works under load. The system has to be designed so that a fast review is still a safe review.

The AML verification workflow post covers a similar point in a compliance context: the policy document looks fine; the individual check log does not. Here the listing data is fine. The outbound message is not. The gap is always between the authoritative record and the thing that actually reached the client.

One misquoted orientation is embarrassing. At volume, this becomes a liability pattern.

Agencies using AI to handle buyer enquiries across a portfolio of 200 or 300 active listings are generating dozens of outbound messages a day. If the drafting tool is reading from contact notes and message history rather than the listing record, every reply is a small gamble. Most of them will be fine. The CRM notes will happen to match the listing. But the ones that don't will be the ones that matter.

Property misdescription sits under the Consumer Protection from Unfair Trading Regulations 2008. A buyer who makes a decision based on a materially false statement about a property has grounds for a complaint, potentially a claim. "The AI wrote it" is not a defence. The agency sent it.

Trading Standards and The Property Ombudsman both treat written representations, including messages, as carrying weight. A WhatsApp message confirming a feature that doesn't exist is a written representation. The channel doesn't change the exposure.

This is also where professional indemnity starts to get interesting. Insurers are beginning to ask whether AI-generated client communications sit within the scope of PI cover as written. Worth checking your policy wording if you haven't.

The fix is a checkpoint before send, not a better prompt. Here is the structure I would implement:

ai_outbound_reply_workflow:
  trigger: inbound_buyer_enquiry
  steps:
    - step: resolve_property_reference
      action: extract_property_id_from_thread_or_contact_record
      fallback: flag_for_human_if_no_property_id_found
 
    - step: fetch_listing_record
      action: pull_canonical_listing_data_from_crm_or_portal_feed
      fields_required:
        - address
        - orientation
        - bedrooms
        - tenure
        - parking
        - key_features
 
    - step: draft_reply
      action: generate_ai_draft
      context_sources:
        - listing_record (authoritative)
        - buyer_thread (tone and question context only)
      instruction: use listing_record for all property facts. do not infer features from thread history.
 
    - step: verification_check
      action: compare_claims_in_draft_against_listing_record
      on_mismatch: flag_claim_for_human_review_before_send
 
    - step: human_review
      action: negotiator_approves_or_edits
      sla: before_send
 
    - step: send
      action: dispatch_via_whatsapp_or_email
      log: record_draft_source_and_review_timestamp
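
The verification_check step is the one that sounds like it needs sophisticated claim extraction. A first version does not. A keyword scan against the record catches the highest-risk facts, orientation and tenure chief among them. A minimal sketch, assuming the listing record arrives as a plain dict using the field names from the workflow above; the word lists are illustrative, not exhaustive.

ORIENTATION_CLAIMS = {"north", "south", "east", "west",
                      "north-facing", "south-facing",
                      "east-facing", "west-facing"}

def check_draft(draft, record):
    # Returns a list of flags. Non-empty means: human review before send.
    flags = []
    words = set(draft.lower().replace(",", " ").replace(".", " ").split())

    # Orientation: any compass claim must be traceable to the record.
    recorded = str(record.get("orientation", "")).lower()
    for claim in ORIENTATION_CLAIMS & words:
        if not recorded:
            flags.append(f"orientation claim '{claim}' has no source in the record")
        elif claim not in recorded and recorded not in claim:
            flags.append(f"orientation claim '{claim}' contradicts record '{recorded}'")

    # Tenure: freehold/leasehold must match the record exactly.
    for tenure in ("freehold", "leasehold"):
        if tenure in words and str(record.get("tenure", "")).lower() != tenure:
            flags.append(f"tenure claim '{tenure}' does not match the record")

    return flags

Run against the message that opened this post with a record reading orientation: north, this returns one flag. That flag is the entire job of the gate.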

The logging step at the end matters more than most people think. If a complaint arrives, you want to be able to show that the message was reviewed by a named person at a specific time before it was sent. That audit trail is the difference between a process failure and a documented, defensible workflow.
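
Getting that trail does not require new infrastructure. Here is a sketch of the minimum viable version, one appended row per outbound message; the file path and column set are my assumptions, and any append-only store carrying the same fields does the same job.

import csv
from datetime import datetime, timezone

def log_review(path, message_id, property_id, reviewer, flags, action):
    # One row per AI-drafted outbound message: who reviewed it, when,
    # what the verification check raised, and what they did about it.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),  # review timestamp
            message_id,
            property_id,
            reviewer,           # a named person, not "the branch"
            "; ".join(flags),   # flags raised by the verification check
            action,             # approved / edited / blocked
        ])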

If you are using any AI drafting tool connected to Reapit, Alto, Jupix, or a similar CRM, run this check this week:

  1. Open the tool's configuration or prompt template. Find where it pulls context from.
  2. Ask: is the canonical listing record included as a named input, or is the model working from thread history and contact notes?
  3. If the listing record is not a named input, that is the gap. Fix the input before you touch the prompt.
  4. Add a pre-send flag for any outbound message containing property-specific claims (orientation, tenure, parking, square footage); a minimal trigger for this is sketched after this list. These are the facts most likely to generate a complaint if wrong.
  5. Log who reviewed each AI-generated message and when. One column in a spreadsheet is enough to start.
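
For step 4, the pre-send flag can start as a single pattern match: if the outbound text mentions a high-risk fact at all, route it to review. A sketch; the pattern below covers the facts named in the list and is a starting point, not a taxonomy.

import re

# Route to review any message touching a high-risk property fact.
HIGH_RISK = re.compile(
    r"\b(north|south|east|west)[- ]facing\b"        # orientation
    r"|\b(freehold|leasehold|share of freehold)\b"  # tenure
    r"|\b(parking|garage|driveway)\b"               # parking
    r"|\b\d[\d,]*\s*(sq\.?\s*ft|square\s+feet|sqm|m2)\b",  # floor area
    re.IGNORECASE,
)

def needs_review(message: str) -> bool:
    return HIGH_RISK.search(message) is not None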

The goal is not to stop using AI for buyer communications. The volume argument is real and the time savings are real. The goal is to make the workflow safe enough that a fast human review is a genuine check, not a rubber stamp on whatever the model decided to say.

If you want a structured look at where your current AI workflows have gaps like this one, the AI Workflow Audit is where we start. We map the inputs, the gates, and the points where unverified output reaches a client. Most of the problems we find are not in the model. They are in the design around it.