Production AI agents with n8n + Claude
The agent stack I use for client automations — including the prompts and guardrails that keep things from breaking at scale.
I run AI agents in production for three clients. They handle tasks like:
- Summarizing 200 daily news articles into a Slack digest
- Triaging customer support tickets and drafting responses
- Generating weekly social posts from a content calendar
All three run on the same stack: n8n for orchestration, Claude for reasoning, Supabase for state. None of them have caused a Slack-at-3am incident in the last six months. Here's how I build them so they don't.
Why n8n + Claude (and not LangGraph, CrewAI, etc.)
I tried the popular "agent frameworks" early last year. They're impressive demos. They're also:
- A new abstraction to learn that mostly hides the fact that you're calling an LLM in a loop
- A vendored runtime that someone has to debug at 3am when it breaks
- A thing your client cannot read or modify
n8n is a visual workflow tool. Every step is a node you can click on, see the input, see the output, run independently. When something breaks, the client can usually fix it themselves. When something needs a tweak, they don't have to wait for me.
The "agent" is just: an n8n trigger → fetch some data → call Claude with a careful prompt → parse the response → take the action. That's it. No framework. No runtime. No magic.
The architecture I keep landing on
┌─────────────────┐
│ Trigger │ cron, webhook, or new row in DB
└────────┬────────┘
│
┌────────▼────────┐
│ Fetch context │ pull the data Claude needs to decide
└────────┬────────┘
│
┌────────▼────────┐
│ Call Claude │ with a strict JSON-output prompt
└────────┬────────┘
│
┌────────▼────────┐
│ Validate │ parse JSON, check schema, fail loudly
└────────┬────────┘
│
┌────────▼────────┐
│ Side effects │ send email, update DB, post to Slack
└─────────────────┘
Five steps. Easy to inspect. Easy to retry from the failure point. Easy to add a sixth step.
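Stripped of the n8n UI, the whole thing fits in one function. A minimal sketch, assuming each step is an async function you wire in (the step bodies here are hypothetical stand-ins for the real n8n nodes, not actual integrations):

```javascript
// Minimal sketch of the five-step pipeline as plain functions.
// Each parameter mirrors a node in the n8n workflow.
async function runAgent({ fetchContext, callClaude, validate, sideEffects }, trigger) {
  const context = await fetchContext(trigger);  // step 2: pull the data Claude needs
  const raw = await callClaude(context);        // step 3: LLM call with a strict prompt
  const decision = validate(raw);               // step 4: parse and check, fail loudly
  return sideEffects(decision);                 // step 5: email, DB update, Slack post
}
```

Because each step is just a function, you can rerun any one of them in isolation with the previous step's saved output, which is exactly the retry-from-the-failure-point property the diagram promises.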
The prompt template that survives in production
Here's the format I use for almost every Claude call inside an agent. The structure matters:
You are an expert {role}. You will receive {input description}.
Your job is to output exactly one JSON object matching this shape:
{
  "decision": "send" | "skip" | "escalate",
  "reasoning": "string, one paragraph",
  "draft_response": "string or null"
}
Rules:
- Output ONLY the JSON object. No prose before or after.
- If you are uncertain, set "decision" to "escalate".
- "reasoning" must explain the decision in plain language.
- Never invent facts not in the input.
Input:
<input>
{actual_data}
</input>
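Filling the template is plain string assembly. A sketch (the helper name and parameter names are mine; the shape matches the template above):

```javascript
// Assemble the final prompt from the template pieces. User data goes
// inside <input> tags so the model treats it as data, not instructions.
function buildPrompt({ role, inputDescription, schema, rules, data }) {
  return [
    `You are an expert ${role}. You will receive ${inputDescription}.`,
    "Your job is to output exactly one JSON object matching this shape:",
    schema,
    "Rules:",
    ...rules.map((r) => `- ${r}`),
    "Input:",
    `<input>\n${data}\n</input>`,
  ].join("\n");
}
```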
Three things that matter here:
- The schema is in the prompt, including a discriminated union for the decision. Claude is excellent at following structured-output instructions when the structure is explicit.
- <input> tags wrap the user data. This makes prompt injection from user content much harder, because Claude treats it as data, not instructions.
- There's an "escalate" path. When the model is uncertain, it tells you. You then route those to a human queue. Without this, agents quietly hallucinate confidence and you don't notice until something breaks.
The validate step nobody talks about
Even with structured prompts, Claude occasionally outputs malformed JSON (about 1 in 500 calls in my logs). If your agent crashes on bad JSON, it will go down at the worst possible moment.
My validate step (a Function node in n8n):
function validate(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Try extracting JSON from a code block, etc.
    const match = raw.match(/\{[\s\S]*\}/);
    if (!match) throw new Error("no json found");
    parsed = JSON.parse(match[0]);
  }
  if (!["send", "skip", "escalate"].includes(parsed.decision)) {
    throw new Error("invalid decision: " + parsed.decision);
  }
  return parsed;
}
If validation fails, I retry once with a recovery prompt: "Your previous response was not valid JSON. Output only the JSON object, exactly matching the schema." That recovers ~80% of the failures. The rest go to the escalate queue.
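The retry-then-escalate logic, sketched as a wrapper around a validate function like the one above (callClaude stands in for the real API node, and the fallback object matches the schema from the prompt template):

```javascript
const RECOVERY_PROMPT =
  "Your previous response was not valid JSON. " +
  "Output only the JSON object, exactly matching the schema.";

// Validate; on failure, retry once with the recovery prompt.
// If that also fails, hand the item to the human escalate queue.
async function validateWithRetry(raw, callClaude, validate) {
  try {
    return validate(raw);
  } catch {
    const retried = await callClaude(RECOVERY_PROMPT);
    try {
      return validate(retried);
    } catch {
      return {
        decision: "escalate",
        reasoning: "model output was not valid JSON after one retry",
        draft_response: null,
      };
    }
  }
}
```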
Rate limiting and cost guards
Two non-obvious things will bite you in production:
1. The bursty trigger. A webhook fires for every new row. Marketing imports a 2,000-row CSV. You just queued 2,000 Claude calls. At a few cents each, that's a $40 surprise that arrives in 15 minutes.
The fix: every workflow has an n8n queue at the front with a max concurrency of 5 and a per-workflow daily token budget. When the budget hits 80%, I get a Slack ping. When it hits 100%, the workflow pauses until I review.
2. The infinite loop. Agent calls another agent calls the first agent. Or: agent emails customer, customer replies, your support-ticket agent fires, it emails the customer, ad infinitum.
The fix: every agent stamps a header in its outputs (X-Agent-Name) and refuses to process inputs with its own name. Cheap. Lifesaving.
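Both guards are a few lines each. A sketch, with the thresholds matching the rules above (the function names and the example agent name are mine):

```javascript
const AGENT_NAME = "support-triage"; // this workflow's own stamp (example name)

// Guard 1: daily token budget. Returns "ok", "warn" (Slack ping at 80%),
// or "pause" (stop the workflow at 100% until reviewed).
function budgetState(tokensUsedToday, dailyBudget) {
  if (tokensUsedToday >= dailyBudget) return "pause";
  if (tokensUsedToday >= 0.8 * dailyBudget) return "warn";
  return "ok";
}

// Guard 2: loop breaker. Refuse any input stamped with our own agent name.
function shouldProcess(headers) {
  return headers["x-agent-name"] !== AGENT_NAME;
}
```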
Memory that actually works
The "give your agent memory" thing is mostly a product narrative. In practice, you have two options:
Short-term: stuff the last N messages into the prompt. Works for chat. Doesn't scale beyond that.
Long-term: Supabase + pgvector. Embed every message you want to remember. At query time, embed the user's question, vector-search for the top 5 most relevant memories, stuff those into the prompt as context.
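A sketch of the retrieval step, assuming a pgvector similarity-search function exposed as a Supabase RPC in the style of Supabase's own examples (the `match_memories` function name, its parameters, and the `content` column are assumptions; `embed` stands in for whatever embedding call you use):

```javascript
// Format retrieved memories into a context block for the prompt.
function buildMemoryContext(memories) {
  return memories.map((m, i) => `[memory ${i + 1}] ${m.content}`).join("\n");
}

// Embed the question, vector-search for the top 5, format for the prompt.
// `supabase` is a supabase-js client; `match_memories` is a hypothetical
// pgvector RPC like the match_documents example in Supabase's docs.
async function recallMemories(supabase, embed, question) {
  const queryEmbedding = await embed(question);
  const { data, error } = await supabase.rpc("match_memories", {
    query_embedding: queryEmbedding,
    match_count: 5,
  });
  if (error) throw error;
  return buildMemoryContext(data);
}
```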
This works. It's also overkill for 90% of "agents." Most automations don't need memory — they need access to the right database query at the right moment. Read the row. Pass it to Claude. Make a decision. Don't dress it up as memory.
Costs in practice
For my three production agents:
- News digest: ~$0.40/day in Claude calls (200 articles × ~$0.002 each).
- Support triage: ~$2/day, scales with ticket volume.
- Content calendar: ~$0.10/day (a few calls per week).
All three replace work that would cost the client $400–1,500/month in part-time human attention. The math is comically lopsided in favor of the agent. The constraint isn't cost. The constraint is trust, and the only way to earn it is the validate step, the escalate queue, and shipping boring, unsexy guardrails.
Build agents like you'd build a database migration. Slow, deliberate, paranoid. They'll outlast your enthusiasm for them.