Web Project Studios

Field notes

The Make scenario that runs every night and has never once been checked

29 June 2026

ai-workflow-ops, make, observability, automation

A recruitment agency's candidate intake workflow ran every night for eleven weeks. The scenario completed without errors. The execution history showed green ticks. Nobody looked at it. Then someone noticed the CRM had stopped receiving new candidates from one source around the time the job board updated its export format. The field previously called candidate_name was now full_name. Make did not throw an error. It mapped an empty value, wrote a blank record, and moved on. Eleven weeks of candidates, silently dropped.

This post is about that gap. Not the technical failure specifically, but the organisational habit that made it possible: the assumption that a running scenario is a working scenario. Make's built-in execution history is not observability. It is a receipt. What you need instead is a deliberate monitoring layer, and most teams never build one.

Make marks a scenario execution as successful when it completes without throwing a hard error. That is a narrow definition of success. It says nothing about whether the data was valid, whether all expected records were processed, or whether the output matched what the downstream system needed.

The field-rename pattern above is one of the most common silent failures in Make workflows. Upstream systems change their schemas. APIs version. Spreadsheet column headers get tidied up by someone who did not know the automation existed. Make does not validate that the fields it is reading still contain what they contained when you built the scenario. It reads what is there, maps it, and continues. If what is there is now empty or structurally different, the execution still completes. The green tick appears. Nothing alerts.
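The check Make does not perform can be sketched in a few lines. This is an illustration in Python, not something Make runs natively: the function name and the sample record are hypothetical, and inside a scenario the same logic would more likely live in a filter or router condition.

```python
from typing import Any, Iterable


def find_empty_fields(record: dict[str, Any], required: Iterable[str]) -> list[str]:
    """Return the required field names that are missing or blank in the record."""
    empty = []
    for name in required:
        value = record.get(name)
        # Treat absent keys, None, and whitespace-only strings as empty.
        if value is None or (isinstance(value, str) and not value.strip()):
            empty.append(name)
    return empty


# Hypothetical export after the job board renamed candidate_name to full_name.
renamed_export = {
    "full_name": "A. Candidate",
    "email": "a@example.com",
    "source": "job-board",
}
print(find_empty_fields(renamed_export, ["candidate_name", "email", "source"]))
# → ['candidate_name']
```

A non-empty result is the signal the green tick hides: the run completed, but a field the scenario depends on no longer exists.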

The same failure mode appears with record counts. A scenario that processes 200 records on Monday and 3 records on Tuesday has not necessarily had a quiet day. It may have hit a pagination change, a permissions expiry on the connected account, or a filter condition that now excludes almost everything. Without a count check at the end of each run, you will not know which it is.

Running is not working. The distinction matters because most teams design for the former and assume the latter.

No error route to anywhere visible. Make allows you to add error handlers to modules. Most production scenarios do not have them, or have them only on the modules that felt risky at build time. When something unexpected happens elsewhere, the scenario either halts or, worse, continues with degraded data. Either way, nobody is notified. The error sits in the execution log, which nobody is reading.

No owner after go-live. The person who built the scenario is often not the person responsible for the process it supports. After handover, there is no one with a standing obligation to check it. It becomes infrastructure: assumed to be working until someone notices a downstream consequence. By the time the consequence surfaces, the failure may be weeks old.

No review cadence. Even teams with good intentions rarely formalise a log review step. It is the kind of task that gets done when there is time, which means it does not get done. The execution history accumulates. Nobody reads it. A pattern of degraded runs can persist for months without triggering any action because the signal is buried in a list of green ticks.

These three failures compound. A scenario with no error routing, no owner, and no review cadence is not being operated. It is being hosted.

The goal is to make failure visible without requiring anyone to go looking for it. That means pushing signals out rather than waiting for someone to pull them.

Error routes to a dedicated Slack channel. Every scenario that touches production data should have an error handler on each module, routing failures to a single dedicated Slack channel rather than a general #dev or #ops channel where the message will be lost. The message should include the scenario name, the module that failed, the error message, the timestamp, and the record identifier that was being processed. This is a ten-minute addition to any existing scenario. It means the first signal of a failure arrives in a place someone actually looks, rather than in an execution log nobody opens.
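As a rough sketch of the message shape, the snippet below builds the JSON body that a Slack incoming webhook accepts (a "text" field). In Make you would normally use its own Slack module instead; the function name, the module name, and the sample values here are assumptions for illustration, and the HTTP POST to your webhook URL is deliberately left out.

```python
import json
from datetime import datetime, timezone


def build_alert(scenario_name: str, module_name: str, record_id: str,
                error_message: str, timestamp: datetime) -> str:
    """Build a JSON body for a Slack incoming webhook from the failure details."""
    text = "\n".join([
        "Make scenario failure",
        f"Scenario: {scenario_name}",
        f"Module: {module_name}",
        f"Record: {record_id}",
        f"Error: {error_message}",
        f"Time: {timestamp.isoformat()}",
    ])
    return json.dumps({"text": text})


# Hypothetical failure details for the candidate intake scenario.
body = build_alert(
    scenario_name="candidate-intake-crm-sync",
    module_name="Create CRM record",
    record_id="cand-0001",
    error_message="Required field is empty",
    timestamp=datetime(2026, 6, 29, 2, 0, tzinfo=timezone.utc),
)
print(body)
```

Keeping every field on its own line makes the alert scannable in the channel, and the record identifier is what lets someone replay the one failed item rather than the whole run.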

Record count assertions at the end of each run. Before the final module in a scenario, add a step that counts the records processed and compares that count to an expected range. If the count falls outside that range, route to the Slack channel. This does not require complex logic. It requires knowing roughly how many records a healthy run should produce and encoding that expectation explicitly. A scenario that usually processes between 150 and 300 records and suddenly processes 4 should tell someone immediately.
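The assertion itself is a single comparison. A minimal sketch, assuming the count is already available at the end of the run; the function name is hypothetical and the range is the 150 to 300 example from above.

```python
from typing import Optional


def check_record_count(processed: int, minimum: int, maximum: int) -> Optional[str]:
    """Return an alert message if the count is outside the expected range, else None."""
    if minimum <= processed <= maximum:
        return None
    return f"Record count {processed} is outside the expected range {minimum}-{maximum}"


print(check_record_count(4, 150, 300))
# → Record count 4 is outside the expected range 150-300
print(check_record_count(200, 150, 300))
# → None
```

Returning a message rather than raising keeps the decision about where to route it (here, the alerts channel) separate from the check.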

A weekly log review in the ops checklist. This is the organisational habit, not the technical one. Someone should have a standing task, once a week, to open the execution history for each production scenario and scan for patterns: runs that took significantly longer than usual, modules that are retrying more than they should, execution counts that have drifted. This does not need to be a long review. Fifteen minutes across five scenarios is enough to catch the slow degradation that error routing alone will not surface.
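If the reviewer notes run durations during the weekly pass, the "significantly longer than usual" judgement can be made explicit. The sketch below is one assumed way to do it: compare the median of recent runs against the median of an earlier baseline. The function name, the 1.5 tolerance, and the sample durations are all illustrative, not values Make provides.

```python
from statistics import median


def duration_drift(baseline_seconds: list[float], recent_seconds: list[float],
                   tolerance: float = 1.5) -> bool:
    """True when the recent median duration exceeds the baseline median by the tolerance factor."""
    if not baseline_seconds or not recent_seconds:
        raise ValueError("both samples need at least one run")
    return median(recent_seconds) > tolerance * median(baseline_seconds)


# Hypothetical durations copied from the execution history during a review.
baseline = [40.0, 42.0, 45.0, 41.0]
print(duration_drift(baseline, [70.0, 80.0, 75.0]))
# → True
print(duration_drift(baseline, [43.0, 44.0, 40.0]))
# → False
```

Medians are used so that one unusually slow night does not trigger the flag on its own; the review is looking for drift, not spikes.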

Here is a baseline observability spec for a Make production scenario:

scenario_observability_baseline:
  scenario_name: "candidate-intake-crm-sync"
  owner: "ops_lead@agency.com"
  review_cadence: "weekly"
 
  error_routing:
    enabled: true
    destination: "Slack #make-alerts"
    message_fields:
      - scenario_name
      - module_name
      - error_message
      - record_id
      - timestamp
 
  record_count_assertion:
    enabled: true
    expected_range:
      min: 50
      max: 400
    on_breach: "route to Slack #make-alerts"
 
  field_validation:
    spot_check_fields:
      - candidate_name
      - email
      - source
    on_empty: "route to Slack #make-alerts"
 
  log_review:
    frequency: "weekly"
    checklist_item: true
    reviewer: "ops_lead@agency.com"
    items_to_check:
      - execution_duration_drift
      - retry_rate_increase
      - execution_count_drop

The owner field matters as much as the technical config. If no one is named, the review cadence will not happen.
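A spec like this is only useful if someone checks scenarios against it. As a sketch, assuming the spec has been loaded into a plain Python dict with the same keys as above, a small audit function can list what is missing. The function name and the gap wording are hypothetical.

```python
def audit_spec(spec: dict) -> list[str]:
    """List the observability gaps in a scenario spec shaped like the baseline above."""
    gaps = []
    if not spec.get("owner"):
        gaps.append("no owner named")
    if not (spec.get("error_routing") or {}).get("enabled"):
        gaps.append("error routing missing or disabled")
    if not (spec.get("record_count_assertion") or {}).get("enabled"):
        gaps.append("record count assertion missing or disabled")
    if not (spec.get("field_validation") or {}).get("spot_check_fields"):
        gaps.append("no fields spot-checked")
    if not (spec.get("log_review") or {}).get("reviewer"):
        gaps.append("no log reviewer named")
    return gaps


# A scenario that has error routing but nothing else.
partial = {
    "scenario_name": "candidate-intake-crm-sync",
    "error_routing": {"enabled": True, "destination": "Slack #make-alerts"},
}
print(audit_spec(partial))
```

An empty list means the scenario meets the baseline. Anything else is the to-do list for that scenario.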

If you have Make scenarios in production right now and you are not sure which ones have error routing, start there. Open each scenario. Check whether any module has an error handler attached. If not, add a basic route-to-Slack handler, starting with the data-mapping modules. A field rename does not throw an error, so a handler alone will not catch it: pair the handler with an empty-field check on those same modules, because that is where field-rename failures will surface.

Then do the count check. Pick your two highest-volume scenarios and look at the last thirty days of execution history. Note the record counts per run. Calculate a rough expected range. Add an assertion module before the final step.
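One rough way to turn thirty days of counts into a range is to widen the observed minimum and maximum by a margin. This is a sketch under that assumption, not a statistical method: the function name, the 25% margin, and the sample counts are illustrative, and it only makes sense if the runs you sample were healthy ones.

```python
import math


def expected_range(counts: list[int], margin: float = 0.25) -> tuple[int, int]:
    """Derive a rough expected range by widening the observed min and max by a margin."""
    if not counts:
        raise ValueError("need at least one run")
    low = math.floor(min(counts) * (1 - margin))
    high = math.ceil(max(counts) * (1 + margin))
    return low, high


# Hypothetical per-run record counts noted from the execution history.
history = [180, 210, 160, 290, 240]
print(expected_range(history))
# → (120, 363)
```

If the history already contains a broken run, its count will drag the lower bound down, so remove any runs you know were bad before calculating.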

Then name an owner. Write it in the scenario description field in Make. Put a weekly review task in whatever ops tool your team uses. It does not need to be elaborate. It needs to exist.

The scenarios most likely to be silently broken are the ones that have been running the longest without anyone checking them. They were built when the upstream systems looked a certain way. The upstream systems have changed since then. The scenario does not know that.

This is the same organisational failure pattern that makes most AI pilots quietly stop: not a dramatic breakdown, but a gradual drift from working to not-working with no moment where anyone decides to look. The technical fix for Make scenarios is straightforward. The harder fix is building the habit of treating automation as something that requires ongoing operation, not just initial deployment.

If you want to audit the Make scenarios your team has in production and map where the observability gaps are, the AI Workflow Audit is designed for exactly that. We look at what is running, what is owned, and what would happen if it silently stopped working tonight.