Production AI agents with n8n + Claude
The agent stack I use for client automations — including the prompts and guardrails that keep things from breaking at scale.
I run AI agents in production for three clients. They handle tasks like:
- Summarizing 200 daily news articles into a Slack digest
- Triaging customer support tickets and drafting responses
- Generating weekly social posts from a content calendar
All three run on the same stack: n8n for orchestration, Claude for reasoning, Supabase for state. None of them have caused a Slack-at-3am incident in the last six months. Here's how I build them so they don't.
Why n8n + Claude (and not LangGraph, CrewAI, etc.)
I tried the popular "agent frameworks" early last year. They're impressive demos. They're also:
- A new abstraction to learn that mostly hides the fact that you're calling an LLM in a loop
- A vendored runtime that someone has to debug at 3am when it breaks
- A thing your client cannot read or modify
n8n is a visual workflow tool. Every step is a node you can click on, see the input, see the output, run independently. When something breaks, the client can usually fix it themselves. When something needs a tweak, they don't have to wait for me.
The "agent" is just: an n8n trigger → fetch some data → call Claude with a careful prompt → parse the response → take the action. That's it. No framework. No runtime. No magic.
The architecture I keep landing on
┌─────────────────┐
│ Trigger │ cron, webhook, or new row in DB
└────────┬────────┘
│
┌────────▼────────┐
│ Fetch context │ pull the data Claude needs to decide
└────────┬────────┘
│
┌────────▼────────┐
│ Call Claude │ with a strict JSON-output prompt
└────────┬────────┘
│
┌────────▼────────┐
│ Validate │ parse JSON, check schema, fail loudly
└────────┬────────┘
│
┌────────▼────────┐
│ Side effects │ send email, update DB, post to Slack
└─────────────────┘
Five steps. Easy to inspect. Easy to retry from the failure point. Easy to add a sixth step.
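Stripped of the n8n UI, the whole thing fits in one function. A minimal sketch, assuming each step is an async function you wire in (the step bodies here are hypothetical stand-ins for the real n8n nodes, not actual integrations):

```javascript
// Minimal sketch of the five-step pipeline as plain functions.
// Each parameter mirrors a node in the n8n workflow.
async function runAgent({ fetchContext, callClaude, validate, sideEffects }, trigger) {
  const context = await fetchContext(trigger);  // step 2: pull the data Claude needs
  const raw = await callClaude(context);        // step 3: LLM call with a strict prompt
  const decision = validate(raw);               // step 4: parse and check, fail loudly
  return sideEffects(decision);                 // step 5: email, DB update, Slack post
}
```

Because each step is just a function, you can rerun any one of them in isolation with the previous step's saved output, which is exactly the retry-from-the-failure-point property the diagram promises.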
The prompt template that survives in production
Here's the format I use for almost every Claude call inside an agent. The structure matters:
You are an expert {role}. You will receive {input description}.
Your job is to output exactly one JSON object matching this shape:
{
  "decision": "send" | "skip" | "escalate",
  "reasoning": "string, one paragraph",
  "draft_response": "string or null"
}
Rules:
- Output ONLY the JSON object. No prose before or after.
- If you are uncertain, set "decision" to "escalate".
- "reasoning" must explain the decision in plain language.
- Never invent facts not in the input.
Input:
<input>
{actual_data}
</input>
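Filling the template is plain string assembly. A sketch (the helper name and parameter names are mine; the shape matches the template above):

```javascript
// Assemble the final prompt from the template pieces. User data goes
// inside <input> tags so the model treats it as data, not instructions.
function buildPrompt({ role, inputDescription, schema, rules, data }) {
  return [
    `You are an expert ${role}. You will receive ${inputDescription}.`,
    "Your job is to output exactly one JSON object matching this shape:",
    schema,
    "Rules:",
    ...rules.map((r) => `- ${r}`),
    "Input:",
    `<input>\n${data}\n</input>`,
  ].join("\n");
}
```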
Three things that matter here:
- The schema is in the prompt, including a discriminated union for the decision. Claude is excellent at following structured-output instructions when the structure is explicit.
- <input> tags wrap the user data. This makes prompt injection from user content much harder, because Claude treats it as data, not instructions.
- There's an "escalate" path. When the model is uncertain, it tells you. You then route those to a human queue. Without this, agents quietly hallucinate confidence and you don't notice until something breaks.
The validate step nobody talks about
Even with structured prompts, Claude occasionally outputs malformed JSON (about 1 in 500 calls in my logs). If your agent crashes on bad JSON, it will go down at the worst possible moment.
My validate step (a Function node in n8n):
function validate(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Try extracting JSON from a code block, etc.
    const match = raw.match(/\{[\s\S]*\}/);
    if (!match) throw new Error("no json found");
    parsed = JSON.parse(match[0]);
  }
  if (!["send", "skip", "escalate"].includes(parsed.decision)) {
    throw new Error("invalid decision: " + parsed.decision);
  }
  return parsed;
}
If validation fails, I retry once with a recovery prompt: "Your previous response was not valid JSON. Output only the JSON object, exactly matching the schema." That recovers ~80% of the failures. The rest go to the escalate queue.
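The retry-then-escalate logic, sketched as a wrapper around a validate function like the one above (callClaude stands in for the real API node, and the fallback object matches the schema from the prompt template):

```javascript
const RECOVERY_PROMPT =
  "Your previous response was not valid JSON. " +
  "Output only the JSON object, exactly matching the schema.";

// Validate; on failure, retry once with the recovery prompt.
// If that also fails, hand the item to the human escalate queue.
async function validateWithRetry(raw, callClaude, validate) {
  try {
    return validate(raw);
  } catch {
    const retried = await callClaude(RECOVERY_PROMPT);
    try {
      return validate(retried);
    } catch {
      return {
        decision: "escalate",
        reasoning: "model output was not valid JSON after one retry",
        draft_response: null,
      };
    }
  }
}
```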
Rate limiting and cost guards
Two non-obvious things will bite you in production:
1. The bursty trigger. A webhook fires for every new row. Marketing imports a 2,000-row CSV. You just queued 2,000 Claude calls. At a few cents each, that's a $40 surprise that arrives in 15 minutes.
The fix: every workflow has an n8n queue at the front with a max concurrency of 5 and a per-workflow daily token budget. When the budget hits 80%, I get a Slack ping. When it hits 100%, the workflow pauses until I review.
2. The infinite loop. Agent calls another agent calls the first agent. Or: agent emails customer, customer replies, your support-ticket agent fires, it emails the customer, ad infinitum.
The fix: every agent stamps a header in its outputs (X-Agent-Name) and refuses to process inputs with its own name. Cheap. Lifesaving.
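Both guards are a few lines each. A sketch, with the thresholds matching the rules above (the function names and the example agent name are mine):

```javascript
const AGENT_NAME = "support-triage"; // this workflow's own stamp (example name)

// Guard 1: daily token budget. Returns "ok", "warn" (Slack ping at 80%),
// or "pause" (stop the workflow at 100% until reviewed).
function budgetState(tokensUsedToday, dailyBudget) {
  if (tokensUsedToday >= dailyBudget) return "pause";
  if (tokensUsedToday >= 0.8 * dailyBudget) return "warn";
  return "ok";
}

// Guard 2: loop breaker. Refuse any input stamped with our own agent name.
function shouldProcess(headers) {
  return headers["x-agent-name"] !== AGENT_NAME;
}
```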
Memory that actually works
The "give your agent memory" thing is mostly a product narrative. In practice, you have two options:
Short-term: stuff the last N messages into the prompt. Works for chat. Doesn't scale beyond that.
Long-term: Supabase + pgvector. Embed every message you want to remember. At query time, embed the user's question, vector-search for the top 5 most relevant memories, stuff those into the prompt as context.
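A sketch of the retrieval step, assuming a pgvector similarity-search function exposed as a Supabase RPC in the style of Supabase's own examples (the `match_memories` function name, its parameters, and the `content` column are assumptions; `embed` stands in for whatever embedding call you use):

```javascript
// Format retrieved memories into a context block for the prompt.
function buildMemoryContext(memories) {
  return memories.map((m, i) => `[memory ${i + 1}] ${m.content}`).join("\n");
}

// Embed the question, vector-search for the top 5, format for the prompt.
// `supabase` is a supabase-js client; `match_memories` is a hypothetical
// pgvector RPC like the match_documents example in Supabase's docs.
async function recallMemories(supabase, embed, question) {
  const queryEmbedding = await embed(question);
  const { data, error } = await supabase.rpc("match_memories", {
    query_embedding: queryEmbedding,
    match_count: 5,
  });
  if (error) throw error;
  return buildMemoryContext(data);
}
```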
This works. It's also overkill for 90% of "agents." Most automations don't need memory — they need access to the right database query at the right moment. Read the row. Pass it to Claude. Make a decision. Don't dress it up as memory.
Costs in practice
For my three production agents:
- News digest: ~$0.40/day in Claude calls (200 articles × ~$0.002 each).
- Support triage: ~$2/day, scales with ticket volume.
- Content calendar: ~$0.10/day (a few calls per week).
All three replace work that would cost the client $400–1,500/month in part-time human attention. The math is comically lopsided in favor of the agent. The constraint isn't cost. The constraint is trust, and the only way to earn it is the validate step, the escalate queue, and shipping boring, unsexy guardrails.
Build agents like you'd build a database migration. Slow, deliberate, paranoid. They'll outlast your enthusiasm for them.