✦ Stack

Next.js + Claude API + Supabase pgvector

Production-grade retrieval-augmented generation. Cited answers. No hallucination.

What RAG actually is, briefly

Retrieval-Augmented Generation: instead of asking the LLM "what's in your training data about X?", you fetch relevant context from your own corpus and pass it to the LLM as input. The LLM's job is to synthesize an answer from your data, with citations, and to refuse when retrieval comes back empty.
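That "pass it as input" step comes down to prompt assembly. A minimal sketch (function name and exact wording are illustrative, not a fixed template): retrieved chunks are numbered so the model can cite them, and the refusal behavior is spelled out explicitly rather than left implied.

```typescript
// Hypothetical prompt builder: numbers each chunk as [1], [2], …
// so the model can cite sources, and makes refusal explicit.
function buildPrompt(query: string, chunks: string[]): string {
  const sources = chunks.map((c, i) => `[${i + 1}] ${c}`).join("\n\n");
  return [
    "Answer using ONLY the sources below. Cite each claim as [n].",
    "If the sources do not contain the answer, reply: \"I don't have information on that.\"",
    "",
    "Sources:",
    sources,
    "",
    `Question: ${query}`,
  ].join("\n");
}
```

The resulting string becomes the user message in a Claude API call; the system prompt can reinforce the same citation and refusal rules.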

The pipeline I build

  1. Ingestion — your docs/website/PDFs chunked into ~500-token segments, embedded with Voyage or OpenAI embeddings, stored in Supabase pgvector.
  2. Retrieval — user query embedded with the same model, vector search returns top-K matching chunks.
  3. Generation — Claude gets the chunks + user query in a structured prompt, returns answer + cited sources.
  4. Logging — every conversation stored. Bad answers get flagged, retrieval gets re-tuned.
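Step 1's chunker can be sketched in a few lines. This version approximates token counts as whitespace-delimited words; a real pipeline would count with the embedding model's own tokenizer, and `maxTokens` is a hypothetical parameter defaulting to the ~500-token target above.

```typescript
// Rough ingestion chunker: splits a document into ~maxTokens-sized
// segments, approximating one token per word (real code would use
// the embedding model's tokenizer).
function chunkText(text: string, maxTokens = 500): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxTokens) {
    chunks.push(words.slice(i, i + maxTokens).join(" "));
  }
  return chunks;
}
```

Each chunk is then embedded and inserted into a pgvector column; the same function boundaries matter at retrieval time, since the query is embedded with the same model.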

The thing that distinguishes production RAG from demo RAG is "I don't know" behavior. Demos hallucinate confidently. Production says "I don't have information on that" when retrieval doesn't find a strong match. That requires explicit prompt engineering and a similarity-score threshold.
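The threshold half of that is a small gate in front of generation. A sketch, assuming cosine similarity scores in [0, 1] from the vector search; the 0.75 cutoff is illustrative, and in practice it gets tuned against the logged conversations from step 4.

```typescript
// Similarity gate: only pass chunks above a tuned threshold to the
// LLM; return null when nothing clears it, which maps to the
// "I don't have information on that" response.
interface Match {
  content: string;
  score: number; // cosine similarity, higher = closer
}

function selectContext(matches: Match[], minScore = 0.75): Match[] | null {
  const strong = matches.filter((m) => m.score >= minScore);
  return strong.length > 0 ? strong : null;
}
```

When `selectContext` returns null, the app answers from a fixed refusal template without ever calling the LLM, which is both cheaper and safer than asking the model to decline.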

✦ Good for
  • Internal knowledge bots
  • Customer support automation
  • Document Q&A
  • Research tools
✦ Skip if
  • Pure conversational chatbots (use the Claude API directly; no RAG needed)
  • Real-time data needs (RAG is for static-ish corpora)
✦ Built with this stack

Deep-SKAI™

AI demo for hospital supply chains with interactive ROI calculator, persona-based briefing flows, and Replit handoff package.

Read the case study →
✦ Keep reading

Build on Next.js + Claude API + Supabase pgvector?

See pricing