Web Project Studios

Field notes

Your CRM is a data lake. Your AI tool is drinking from the wrong tap.

1 May 2026

estate-agency, crm, workflow

I sat with a lettings manager last autumn who had just run an AI summary across her pipeline in Alto. The tool told her that a particular landlord had "expressed interest in portfolio management services." She had no memory of that conversation. She went back through the contact notes. The tag had been applied eighteen months earlier by a negotiator who had since left. The landlord had called to complain about a maintenance delay. Someone had mis-tagged the record. The AI had no way to know that. It just read what was there.

That is the problem in one story. The AI did not hallucinate. It summarised accurately. The data was wrong, and the summary inherited that wrongness with complete confidence.

This is where most estate agency AI projects are sitting right now. The tool is fine. The tap it is drinking from is not.

Reapit, Alto, Jupix, AgentOS. All of them accumulate the same categories of rot over time, because no one has an incentive to clean the data in the moment.

Duplicate contacts are the most obvious problem. A buyer registers on Rightmove. They call in directly two weeks later. A second record is created. Both records have notes. One has a viewing history. One has a mortgage status tag. Neither is complete. When you run AI summarisation across your active buyer list, the model picks up whichever record it encounters first, or tries to merge them, or surfaces both. The output looks coherent. It is not.
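
If you want to see the shape of it, here is a minimal sketch in Python. The field names are invented, not any platform's schema; the point is that neither record is wrong, and neither is complete.

# Two records for the same buyer, created by different channels.
# Field names are illustrative; real schemas vary by platform.
records = [
    {"id": 101, "source": "rightmove", "email": "j.smith@example.com",
     "phone": None, "viewings": 3, "mortgage_status": None},
    {"id": 202, "source": "phone-in", "email": "j.smith@example.com",
     "phone": "07700 900123", "viewings": 0, "mortgage_status": "AIP"},
]

# A naive integration takes whichever record it encounters first, so
# the summary sees three viewings but no mortgage status, or the
# reverse, depending on query order.
first = records[0]
print(first["viewings"], first["mortgage_status"])  # 3 None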

Soft-deleted records are quieter. Most CRM platforms do not hard-delete enquiries, viewings, or contact associations when you click remove. They flag the record as inactive and exclude it from the standard UI. But the underlying data is still there, and many AI integrations query at the database or API layer, not through the filtered front-end view. So a contact who withdrew their offer in frustration eighteen months ago can appear in an AI-generated summary of "warm leads" because their record was soft-deleted rather than properly closed and tagged.
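
A stripped-down illustration of that gap, again with invented fields rather than any real CRM's schema:

all_contacts = [
    {"name": "A. Patel", "status": "offer_withdrawn", "deleted": True},
    {"name": "B. Jones", "status": "viewing_booked", "deleted": False},
]

# The front-end applies this filter for you; an integration querying
# at the database or API layer often does not.
ui_view = [c for c in all_contacts if not c["deleted"]]

print(len(all_contacts), "records at the API layer")   # 2
print(len(ui_view), "records in the front-end view")   # 1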

Outdated tags are the slowest drift. Someone sets up a tag called "investor buyer" in 2022. Negotiators apply it loosely for a year. The definition shifts. Nobody audits it. By 2025, the tag means three different things depending on who applied it. The AI treats it as a consistent signal. It is not.

None of this is the AI's fault. A model summarising contact notes is doing exactly what it was designed to do. The issue is that the notes were never designed to be summarised at scale by a machine. They were written by humans, for humans, in the moment, with all the inconsistency that implies.

There is a version of this problem that existed before AI, and it was manageable. A negotiator reading a contact record could apply judgement. They would notice that a note from three years ago was probably stale. They would recognise a colleague's shorthand. They would know that "investor query" on a record from 2021 might mean something different to what it means now.

AI does not do that. It reads the field, processes the value, and produces an output. The confidence of the output does not correlate with the quality of the input. This is what makes AI-generated reporting feel authoritative even when it is wrong. The prose is fluent. The summary is structured. The underlying data is four years of accumulated drift.

The practical effect is that agents start making decisions based on AI-surfaced insights without the instinct to interrogate them. A manager sees "twelve warm leads have not been contacted in thirty days" and acts on it. Some of those twelve are duplicates of contacts who were called last week on a different record. Some are soft-deleted. Some have a status that was never updated after the deal fell through. The AI found twelve records matching a filter. It cannot tell you whether those twelve records reflect twelve real situations.

This is not a model problem. Better prompting will not fix it. A more expensive integration will not fix it. The data needs to be right before the AI touches it.

Before you connect an AI layer to your CRM, four categories of data need attention.

Duplicate contacts. Run a deduplication report. Most CRMs have one built in, or you can export to a spreadsheet and match on email and phone. Where duplicates exist, decide which record is canonical, merge the notes manually, and archive the secondary record properly. Do not soft-delete. Mark it as merged and link it to the primary.
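
If you go the spreadsheet route, a pandas sketch of the matching step might look like this. The column names are assumptions about your export, not a standard:

import pandas as pd

contacts = pd.read_csv("contacts_export.csv")  # expects: id, email, phone, last_activity

# Normalise before matching: case, whitespace, and phone formatting
# cause most false "uniques".
contacts["email_key"] = contacts["email"].str.strip().str.lower()
contacts["phone_key"] = contacts["phone"].astype(str).str.replace(r"\D", "", regex=True)

# Flag any email or phone that appears on more than one record.
dupe_email = contacts.duplicated("email_key", keep=False) & contacts["email_key"].notna()
dupe_phone = contacts.duplicated("phone_key", keep=False) & (contacts["phone_key"] != "")

# Canonical-record rule from the week-1 plan below: most recent activity wins.
flagged = contacts[dupe_email | dupe_phone].sort_values("last_activity", ascending=False)
flagged.to_csv("dedupe_review.csv", index=False)  # hand this to the branch reviewer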

Soft-deleted records. Talk to whoever manages your CRM configuration. Ask what "delete" actually does in your system. If records are flagged inactive rather than removed, find out whether your AI integration queries the full dataset or the filtered view. If it queries the full dataset, you need a hard exclusion rule, or you need to go back and properly close every soft-deleted record with a status that the AI can read as terminal.
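
One way to express a hard exclusion rule, sketched in Python under the assumption that each record carries a status field. The status vocabulary here is illustrative; agree your own with whoever configures the CRM.

TERMINAL_STATUSES = {"withdrawn", "completed", "archived_duplicate", "spam"}

def ai_visible(record: dict) -> bool:
    # A record reaches the AI layer only if it is live and non-terminal.
    return not record.get("deleted", False) and record.get("status") not in TERMINAL_STATUSES

full_export = [
    {"name": "C. Okafor", "status": "withdrawn", "deleted": True},
    {"name": "D. Hughes", "status": "offer_made", "deleted": False},
]
print([r["name"] for r in full_export if ai_visible(r)])  # ['D. Hughes']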

Stale and inconsistent tags. Pull a full list of every tag in your system. Count how many records carry each one. Look at the oldest applications of each tag. If the meaning has drifted, either retire the tag and retag records manually, or document the current definition and accept that historical applications are unreliable. The AI should not be drawing inferences from tags without that context.
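
The counting part is mechanical. A sketch, assuming your CRM can export one row per tag application, using the same thresholds as the week-3 plan below:

import pandas as pd

# Expects one row per (record_id, tag, date_applied).
tags = pd.read_csv("tag_applications.csv", parse_dates=["date_applied"])

audit = tags.groupby("tag").agg(
    records=("record_id", "nunique"),
    first_applied=("date_applied", "min"),
    last_applied=("date_applied", "max"),
)

# Candidates for retirement: thinly used or long dormant.
cutoff = pd.Timestamp.now() - pd.DateOffset(months=18)
stale = audit[(audit["records"] < 10) | (audit["last_applied"] < cutoff)]
print(stale.sort_values("records"))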

Contact notes written for humans. This one is harder. Notes like "called, no answer, try again" or "interested but waiting on mortgage" are useful to a negotiator in context. They are noise to an AI without structure. You cannot rewrite four years of notes. But you can add structured fields going forward: status, last action type, next action date. Give the AI something machine-readable to anchor to.
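
What "structured fields" means in practice is a small, fixed vocabulary. A sketch, with illustrative values you would want to agree with your own team rather than adopt wholesale:

from dataclasses import dataclass
from datetime import date
from enum import Enum

class ActionType(Enum):  # keep this list short enough that people actually use it
    CALL = "call"
    EMAIL = "email"
    VIEWING = "viewing"
    OFFER = "offer"

@dataclass
class ContactActions:
    status: str                    # e.g. "active_buyer", "offer_withdrawn"
    last_action: ActionType
    last_action_date: date
    next_action_date: date | None  # None means no follow-up is planned

# "interested but waiting on mortgage" stays in the notes for humans;
# the machine-readable anchor sits alongside it:
record = ContactActions("active_buyer", ActionType.CALL, date(2026, 4, 28), date(2026, 5, 12))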

This is not glamorous work. It is the work that makes everything else worth doing. Here is how it breaks down, week by week:

week_1:
  focus: "Duplicate contact audit"
  tasks:
    - Export full contact list with email, phone, and created date
    - Run deduplication match on email and mobile
    - Flag all records with more than one match
    - Assign one negotiator per branch to review and merge flagged records
    - Set canonical record rule: most recent activity wins
  output: "Clean, single-record contacts with merged note history"
 
week_2:
  focus: "Soft-delete and status audit"
  tasks:
    - Request database-level export or API query of all records including inactive
    - Compare against front-end active list to identify soft-deleted volume
    - Categorise each soft-deleted record: withdrawn, duplicate, spam, or historic
    - Apply hard status tags readable by AI integration
    - Confirm with CRM provider whether AI queries filtered or unfiltered data
  output: "Closed records properly marked, AI exclusion rules confirmed"
 
week_3:
  focus: "Tag rationalisation"
  tasks:
    - Pull full tag list with record counts and date ranges
    - Identify tags with fewer than 10 uses or last applied more than 18 months ago
    - Retire or consolidate low-use tags
    - Document current definitions for retained tags in a shared reference doc
    - Remove retired tags from all records in bulk
  output: "Tag library with documented definitions, no orphaned tags"
 
week_4:
  focus: "Structured field implementation"
  tasks:
    - Add or enforce: contact status, last action type, next action date
    - Brief all negotiators on what each field means and how to complete it
    - Set a minimum data standard for new records going forward
    - Run a test AI query across cleaned data and review output for anomalies
    - Document what the AI can and cannot reliably surface
  output: "Structured data layer the AI can read with confidence"

Four weeks is achievable for a single branch. Multi-branch or multi-system operations will take longer, and that is fine. The point is to do it before you scale the AI layer, not after.

If you have already plugged an AI tool into your CRM and you are reading this with a familiar feeling, the answer is not to disconnect it. It is to be honest about what the output is worth right now.

Treat every AI-generated insight as a hypothesis, not a conclusion. When the tool surfaces a warm lead or flags a landlord as a portfolio prospect, have someone verify it against the raw record before acting. That is extra work in the short term. It is also how you learn which categories of output are reliable and which are not, which tells you exactly where to focus the data cleaning.
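
Even a crude log of those verifications tells you something. A sketch of the idea:

from collections import Counter

verifications = [
    # (what the AI surfaced, did the raw record back it up?)
    ("warm_lead", True),
    ("warm_lead", False),           # duplicate, called last week on another record
    ("portfolio_landlord", False),  # the mis-tagged maintenance complaint
]

# Categories that keep getting rejected are where the cleaning starts.
for (category, confirmed), n in sorted(Counter(verifications).items()):
    print(f"{category}: {'confirmed' if confirmed else 'rejected'} x{n}")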

The AI Workflow Audit at Web Project Studios is built around this kind of diagnosis. We look at what data your AI is actually reading, where the quality breaks down, and what needs to happen in your CRM before the tool can be trusted. It is the same principle I wrote about in the context of AI pilots that stop without anyone formally cancelling them. The tool did not fail. The foundation was never solid enough to build on.

Fix the data first. The AI will still be there when you are done.