Web Project Studios

Field notes

How to know whether your n8n workflow is actually running in production

10 June 2026

ai-workflow-ops, n8n, observability

A client showed me their n8n instance three weeks after go-live. Eleven workflows. All green in the UI. No errors in the execution log. I asked when the last successful run had actually pushed data to their CRM. They opened a tab, checked manually, and went quiet. The answer was nine days ago.

The workflows had not crashed. They had just stopped being triggered. A webhook endpoint had changed upstream, the trigger had received no new events, and n8n had logged nothing because nothing had technically failed. From the dashboard's perspective, everything was fine.

That is the specific failure mode I want to talk about.

Most small-team automation setups treat absence of errors as proof of health. It is not. It is proof of absence of errors. Those two things are different, and conflating them is how you end up with a referral workflow that has not fired in a fortnight while the team assumes leads are just slow.

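n8n's execution log is useful, but its retention is finite. On self-hosted instances, execution pruning is on by default and is governed by the EXECUTIONS_DATA_MAX_AGE and EXECUTIONS_DATA_PRUNE_MAX_COUNT settings. In recent versions the defaults are 336 hours (14 days) and 10,000 executions across the whole instance, not a fixed number per workflow, so check what your version and configuration actually keep. On n8n Cloud, it depends on your plan. Either way, on a busy instance the window you can inspect may be shorter than you expect. And if a workflow is supposed to run daily and has not run in five days, the log just looks sparse. There is no alarm. There is no badge. The UI does not know your workflow is supposed to have run.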

This is not a criticism of n8n. It is a description of what observability means and why it has to be designed in, not assumed from the platform.

The question to ask is: how would I know, within two hours, if this workflow stopped working? If you cannot answer that without logging into n8n and checking manually, you do not have monitoring. You have hope.

The most reliable pattern I use for high-stakes workflows is a heartbeat ping, sometimes called a dead-man's switch. The idea is simple: at the end of every successful execution, the workflow sends a signal to an external service. If that signal stops arriving on schedule, the service alerts you.

The workflow does not need to report failure. It just needs to report success. Silence becomes the alarm.

Here is a minimal outline of the pattern, written as annotated pseudo-config rather than an importable n8n export, using n8n's HTTP Request node and a heartbeat monitor from a service like Better Uptime:

# Heartbeat pattern: n8n workflow health check
workflow_name: "Daily tenancy renewal check"
trigger: "Schedule (09:00 daily)"
 
steps:
  - id: fetch_renewals
    type: Airtable
    action: List records
    filter: "renewal_date within 60 days"
 
  - id: send_notifications
    type: HTTP Request
    method: POST
    url: "{{ $env.NOTIFICATION_WEBHOOK }}"
    body:
      records: "{{ $json.records }}"
 
  - id: heartbeat_ping
    type: HTTP Request
    method: GET
    url: "https://betteruptime.com/api/v1/heartbeat/{{ $env.HEARTBEAT_TOKEN }}"
    description: >
      Runs ONLY if previous steps succeed.
      Better Uptime expects this ping every 24 hours.
      If it misses, an alert fires to on-call.
 
alerting:
  provider: Better Uptime
  heartbeat_interval: 24h
  grace_period: 1h
  alert_channels:
    - email
    - sms

The grace period matters. If your workflow runs at 09:00 and occasionally takes twelve minutes due to a slow API, a zero-tolerance window will generate false alarms. One hour of grace on a daily workflow is usually enough.
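To make the grace-period arithmetic concrete, here is a minimal sketch of how a heartbeat service might decide a ping is overdue. The function name `is_overdue` and the rule "deadline = last ping + interval + grace" are my assumptions for illustration; each provider's exact evaluation may differ.

```python
from datetime import datetime, timedelta


def is_overdue(last_ping: datetime, now: datetime,
               interval: timedelta, grace: timedelta) -> bool:
    """Return True once `now` is past last_ping + interval + grace."""
    deadline = last_ping + interval + grace
    return now > deadline


# Yesterday's run pinged at 09:00. Today's run is slow and has not pinged by 09:12.
last_ping = datetime(2026, 6, 9, 9, 0)
now = datetime(2026, 6, 10, 9, 12)
day = timedelta(hours=24)

zero_grace = is_overdue(last_ping, now, day, timedelta(0))        # would alert
hour_grace = is_overdue(last_ping, now, day, timedelta(hours=1))  # still tolerated
```

With no grace, the twelve-minute delay already counts as a miss. With one hour of grace, the same delay is tolerated, and the alert only fires if nothing has arrived by 10:00.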

You can use Better Uptime, Cronitor, or Healthchecks.io. The tool is secondary. The pattern is what matters.
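Stripped of any particular tool, the pattern looks roughly like the sketch below. The names `run_with_heartbeat` and `http_ping` are hypothetical, and the URL in the usage comment is a placeholder, not a real endpoint. In n8n itself the equivalent is simply placing the ping node last, so it is only reached when everything before it succeeded.

```python
import urllib.request
from typing import Callable


def http_ping(url: str, timeout: float = 10.0) -> None:
    """Send a GET to the heartbeat URL; urlopen raises on HTTP errors and timeouts."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def run_with_heartbeat(job: Callable[[], object], ping: Callable[[], None]) -> object:
    """Run the job, then ping only if the job returned without raising."""
    result = job()  # any exception propagates here and the ping is skipped
    ping()
    return result


# Usage (placeholder URL, supplied by whichever heartbeat service you choose):
# run_with_heartbeat(sync_renewals, lambda: http_ping("https://example.com/heartbeat/<token>"))
```

The design choice is that a failed job never reports anything. The monitoring service notices the missing ping, which is the whole point of the pattern.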

If you are thinking about how this fits alongside compliance-adjacent workflows, the verification gate approach I described for AML checks uses a similar logic: the absence of a completed step is itself a signal worth capturing.

Heartbeat pings tell you whether a workflow ran. They do not tell you what it did. For that, you need a log your team can read without opening n8n.

My preferred approach for small teams is an Airtable audit table. Every workflow that touches anything consequential writes a row on completion: timestamp, workflow name, records processed, outcome, and any relevant IDs. This gives you two things the n8n execution log does not: a retention window you control (Airtable keeps records until you delete them, though each plan caps the number of records per base), and a view your non-technical team members can open without n8n credentials.

The write step adds maybe 200ms to execution time. It is worth it.
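As a rough illustration of what that row could contain, the sketch below builds the JSON body for Airtable's create-records endpoint. The field names ("Timestamp", "Workflow", and so on) and the helper `build_audit_row` are assumptions for this example; use whatever columns your audit table actually has. In n8n the Airtable node does this for you, so treat this as a description of the payload rather than something you need to run.

```python
import json
from datetime import datetime, timezone


def build_audit_row(workflow_name: str, records_processed: int, outcome: str,
                    related_ids: list[str], finished_at: datetime) -> dict:
    """Build a request body in Airtable's {"records": [{"fields": {...}}]} shape."""
    return {
        "records": [{
            "fields": {
                # Store UTC so rows from different workflows sort consistently.
                "Timestamp": finished_at.astimezone(timezone.utc).isoformat(),
                "Workflow": workflow_name,
                "Records processed": records_processed,
                "Outcome": outcome,
                "Related IDs": ", ".join(related_ids),
            }
        }]
    }


row = build_audit_row(
    "Daily tenancy renewal check", 14, "success", ["rec001", "rec002"],
    datetime(2026, 6, 10, 9, 3, tzinfo=timezone.utc),
)
body = json.dumps(row)  # POST to https://api.airtable.com/v0/<baseId>/<tableName>
```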

For higher-volume workflows where Airtable would accumulate thousands of rows quickly, I use a dedicated logging workflow instead: a separate n8n workflow that receives a webhook from the main workflow, formats the log entry, and writes it to a Postgres table or Google Sheet depending on what the client already has running.
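The logging workflow's job is small: check the incoming payload, then write one row. The sketch below shows that logic with an in-memory SQLite database standing in for Postgres, so it runs anywhere. The table name `workflow_log`, its columns, and the payload keys are all assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS workflow_log (
    id INTEGER PRIMARY KEY,
    logged_at TEXT NOT NULL,
    workflow_name TEXT NOT NULL,
    records_processed INTEGER NOT NULL,
    outcome TEXT NOT NULL
)
"""

REQUIRED = ("logged_at", "workflow_name", "records_processed", "outcome")


def write_log_entry(conn: sqlite3.Connection, payload: dict) -> None:
    """Reject incomplete payloads, then insert one log row."""
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise ValueError(f"log payload missing fields: {missing}")
    conn.execute(
        "INSERT INTO workflow_log "
        "(logged_at, workflow_name, records_processed, outcome) VALUES (?, ?, ?, ?)",
        tuple(payload[key] for key in REQUIRED),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the client's Postgres database
conn.execute(SCHEMA)
write_log_entry(conn, {
    "logged_at": "2026-06-10T09:03:00+00:00",
    "workflow_name": "Daily tenancy renewal check",
    "records_processed": 14,
    "outcome": "success",
})
row_count = conn.execute("SELECT COUNT(*) FROM workflow_log").fetchone()[0]
```

Rejecting incomplete payloads loudly is deliberate: a log row with missing fields is harder to spot later than a logging workflow that errors straight away.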

The point is that logging should be a first-class node in your workflow design, not an afterthought. If your workflow diagram does not have a log step, your workflow is not finished.

n8n has a built-in error workflow setting. In the workflow settings panel, you can designate a separate workflow to trigger whenever the main one errors. Most people leave this blank.

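A minimal error workflow starts with an Error Trigger node, reads the failed workflow's name and the error message from that trigger's output (its workflow and execution objects), formats a readable alert, and sends it somewhere a human will see within the hour. Note that $workflow and $execution inside the error workflow describe the error workflow itself, not the one that failed. Slack, email, or a PagerDuty webhook all work. The specifics depend on your on-call setup.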

What it should not do is send a generic "workflow failed" message with no context. I have seen error alerts that say exactly that and nothing else. By the time someone investigates, the execution log has rotated and the context is gone.

Log the error payload. Log the timestamp. Log which node failed. Log enough that someone can diagnose the problem from the alert alone without needing to reproduce it.
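Here is a sketch of that formatting step, assuming the error workflow begins with n8n's Error Trigger node and that its output item carries `workflow.name`, `execution.url`, `execution.lastNodeExecuted`, and `execution.error.message`. That is the shape I have seen documented, but verify it against your n8n version. The function name and the sample values are hypothetical, and the `{"text": ...}` body is the format Slack incoming webhooks accept.

```python
from datetime import datetime, timezone


def format_error_alert(item: dict, detected_at: datetime) -> dict:
    """Turn one Error Trigger item into a Slack-style {"text": ...} body."""
    workflow = item.get("workflow", {})
    execution = item.get("execution", {})
    error = execution.get("error", {})
    lines = [
        f"Workflow failed: {workflow.get('name', 'unknown workflow')}",
        f"Detected at: {detected_at.isoformat()}",
        f"Failed node: {execution.get('lastNodeExecuted', 'unknown')}",
        f"Error: {error.get('message', 'no message provided')}",
        f"Execution: {execution.get('url', 'no URL available')}",
    ]
    return {"text": "\n".join(lines)}


# Sample item with made-up values, in the assumed Error Trigger shape.
sample_item = {
    "workflow": {"id": "1", "name": "Daily tenancy renewal check"},
    "execution": {
        "id": "231",
        "url": "https://n8n.example.com/execution/231",
        "error": {"message": "Request failed with status code 404"},
        "lastNodeExecuted": "send_notifications",
    },
}
alert = format_error_alert(sample_item, datetime(2026, 6, 10, 9, 5, tzinfo=timezone.utc))
```

Every field falls back to a placeholder, so the alert still goes out if part of the payload is missing.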

This connects to a broader point I made in the post on AI reporting and hallucinated metrics: when the system produces no output, the temptation is to assume the input was bad. Sometimes the workflow just stopped and nobody noticed.

If you have n8n workflows running in production right now and no alerting layer, here is the order I would tackle it:

  1. Audit your triggers. List every workflow and its expected run frequency. If you cannot state the frequency, you cannot monitor it.
  2. Add a heartbeat to your three most important workflows first. Set up a Healthchecks.io account (the free tier is generous), create one check per workflow, and add the ping node as the final step.
  3. Create one error workflow. Wire it to all three of those workflows. Have it post to a Slack channel you actually check. Do not make it clever. Make it fast to build.
  4. Add an Airtable log node to any workflow that writes to a CRM, sends a communication, or touches financial data. Name the table clearly. Share it with whoever owns that process.
  5. Set a calendar reminder for a monthly execution review. Open the Airtable log. Check that the row counts match expectations. Look for gaps.

That is a working observability layer. It is not sophisticated. It does not require a dedicated ops tool. It requires designing the monitoring in at the same time as the workflow, not six weeks later when something has already gone wrong.
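The "look for gaps" part of the monthly review in step 5 can be partly mechanised. The sketch below takes the completion timestamps from an audit log and reports any pair of consecutive runs spaced further apart than the expected interval plus a tolerance. The function name `find_gaps` and the sample dates are assumptions for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps: list[datetime], expected_interval: timedelta,
              tolerance: timedelta) -> list[tuple[datetime, datetime]]:
    """Return (earlier, later) pairs of consecutive runs that are too far apart."""
    ordered = sorted(timestamps)
    limit = expected_interval + tolerance
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > limit
    ]


# A daily 09:00 workflow that silently skipped 4, 5 and 6 June.
runs = [datetime(2026, 6, day, 9, 0) for day in (1, 2, 3, 7, 8)]
gaps = find_gaps(runs, timedelta(days=1), timedelta(hours=1))
```

This only compares logged runs with each other. It will not notice that the most recent run is itself overdue, which is the job of the heartbeat.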

The workflows that drift are always the ones where someone said "we'll add proper monitoring later." Later is when the referral pipeline is dry and nobody knows why.


If you are building automation for a client or running your own agency's internal workflows, the AI Workflow Audit is where I start every engagement: mapping what is running, what has no alerting, and what is silently failing. It is a short engagement with a clear output. If the audit above sounds like a list of things you have been meaning to do, that is probably the right place to start.