Kensink Labs
14 RAG patterns · Direct LLM · no framework · Production grade
RAG · ARCHITECTURES · 14 PATTERNS

Fourteen RAG patterns. Five you will pick from. Nine you should know.

Five patterns drive most production RAG conversations: Advanced, Agentic, GraphRAG, Self-RAG, and Corrective. Nine more are specialised — each earns the build in narrower shapes of corpus or query. All fourteen sketched, ranked honestly, by a lab that ships them.

pgvector · Cohere · OpenAI · Anthropic · Eval pipelines
Coverage: 14 named industry patterns
Default: Advanced RAG (hybrid + rerank)
Cycle: Sprint or program · sized to corpus
Discipline: Citations + eval gates
[THE FIVE THAT MATTER]

Five primary patterns.

In order of how often we ship them. Advanced is the 2026 production default. GraphRAG and Agentic earn the build for multi-hop and heterogeneous-corpus work. Self-RAG and Corrective are the quality-check loops for high-stakes accuracy.

01 · Primary · OUR DEFAULT

Advanced RAG

Query rewriting + hybrid (vector + BM25) + reciprocal-rank fusion + cross-encoder rerank + citation discipline. The 2026 production consensus.

When it earns the build

Most production RAG. Reranking alone lifts Recall@5 by ~17 points; hybrid catches the exact-term matches embeddings miss. Worth the extra ~30ms p95 on almost every build.

When it doesn't

Throughput-extreme workloads where you cannot pay the rerank latency tax, or corpora so well-keyed that hybrid + rerank doesn't move the needle.

Query rewrite → Vector search + BM25 search → Reciprocal-rank fusion → Cross-encoder rerank → LLM + citations
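The fusion step in the pipeline above can be sketched in a few lines. This is a minimal illustration of reciprocal-rank fusion, not our production code: the function name, the document ids, and the constant `k=60` (a commonly used default) are assumptions for the example.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it returns
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical result lists from the two retrievers
vector_hits = ["d3", "d1", "d7"]
bm25_hits = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that both retrievers return (`d1`, `d3`) rise to the top, which is the reason to fuse before handing the candidate list to the reranker.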
02 · Primary · TOOL-USING

Agentic RAG

LLM decides what to retrieve, from which source, and whether the result is good enough. Multiple retrieval rounds, source-specific agents, validation step.

When it earns the build

Heterogeneous corpora, multi-source research (legal + financial + internal), complex queries that need decomposition. Cases where a single retrieval pass was never going to be enough.

When it doesn't

Cost-sensitive volume traffic. Each query costs multiple LLM calls. Latency is unpredictable.

Query → Planner LLM → Source A agent / Source B agent / Source C agent → Validator LLM (retry if weak) → Final generate
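The plan, retrieve, validate, retry loop above can be sketched as a small control function. Everything here is a stand-in: `plan`, `validate`, and `generate` would be LLM calls in a real build, and the source names and evidence threshold are hypothetical.

```python
def agentic_answer(query, plan, sources, validate, generate, max_rounds=3):
    """Run retrieval rounds until the validator accepts the evidence."""
    evidence = []
    for _ in range(max_rounds):
        # The planner picks which source to query next, given evidence so far
        for source_name, sub_query in plan(query, evidence):
            evidence.extend(sources[source_name](sub_query))
        if validate(query, evidence):
            break  # evidence is good enough, stop paying for more rounds
    return generate(query, evidence)

# Hypothetical stubs standing in for LLM calls and source-specific agents
def plan(query, evidence):
    return [("legal", query)] if not evidence else [("finance", query)]

sources = {
    "legal": lambda q: [f"legal note on {q}"],
    "finance": lambda q: [f"finance note on {q}"],
}
validate = lambda q, ev: len(ev) >= 2
generate = lambda q, ev: f"{q}: " + "; ".join(ev)

answer = agentic_answer("merger risk", plan, sources, validate, generate)
```

The `max_rounds` cap is the design choice that matters: it bounds the number of LLM calls per query, which is where the cost and latency risk of this pattern sits.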
03 · Primary · MULTI-HOP

GraphRAG

Build a knowledge graph over entities + relationships. Retrieval becomes graph traversal. Reasoning becomes path-following.

When it earns the build

Multi-hop reasoning across linked facts: clinical decision support, regulatory analysis, complex case files, investigative journalism. Published numbers report 81%+ accuracy in specialised domains, +6.8 pts over flat RAG.

When it doesn't

Flat document collections, fast-changing data, small corpora where the graph build cost dwarfs the retrieval gain.

Source · Microsoft GraphRAG
Query → Entity extract → Knowledge graph → Multi-hop traversal → Path scoring → Generate with paths
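The multi-hop traversal step can be illustrated with a breadth-first walk over a toy adjacency list. This is a sketch under stated assumptions: the graph, entity names, and relation labels are invented, and path scoring is left out.

```python
from collections import deque

def multi_hop_paths(graph, start, max_hops=2):
    """Return every path of 1..max_hops edges from start.

    Paths are flat lists: [entity, relation, entity, relation, entity, ...].
    """
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if len(path) // 2 == max_hops:
            continue  # hop budget spent on this path
        for relation, neighbour in graph.get(path[-1], []):
            if neighbour in path[::2]:
                continue  # skip entities already on the path (no cycles)
            new_path = path + [relation, neighbour]
            paths.append(new_path)
            queue.append(new_path)
    return paths

# Hypothetical toy graph: entity -> list of (relation, entity)
graph = {
    "acme": [("owns", "beta")],
    "beta": [("licensed_in", "germany")],
}
paths = multi_hop_paths(graph, "acme")
```

Each returned path can be rendered as text and passed to the generator, so the answer can cite the chain of facts rather than a single chunk.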
04 · Primary · SELF-CRITIQUE

Self-RAG

Model evaluates its own retrieval and answer. Decides if it needs to retrieve again, with what query, before responding.

When it earns the build

Vague or under-specified queries, domains where a confidently wrong answer is more expensive than a slow one. Catches bad retrieval before the user sees it.

When it doesn't

Latency-critical interactive use. Self-critique adds round-trips. Can also refuse to answer too often when uncertainty is high.

Source · Asai et al., Self-RAG (2023)
Query → Retrieve → Draft answer → Self-critique LLM (retry or refine) → Final answer
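A prompt-level approximation of the loop above might look like the following. Note the assumption: the Self-RAG paper trains the model to emit reflection tokens, whereas this sketch uses a separate, hypothetical `critique` callable that returns a verdict dictionary. All stubs and data are invented.

```python
def self_rag(query, retrieve, draft, critique, max_rounds=3):
    """Retrieve, draft, critique; retry with a refined query if unsupported."""
    current_query = query
    for _ in range(max_rounds):
        docs = retrieve(current_query)
        answer = draft(query, docs)
        verdict = critique(query, docs, answer)
        if verdict["supported"]:
            return answer
        # The critic may propose a better retrieval query for the next round
        current_query = verdict.get("better_query", current_query)
    return None  # abstain rather than return an unsupported answer

# Hypothetical stubs: a tiny index and rule-based stand-ins for LLM calls
index = {"refund policy window": ["Refunds are accepted within 30 days."]}
retrieve = lambda q: index.get(q, [])
draft = lambda q, docs: docs[0] if docs else "I am not sure."
critique = lambda q, docs, a: {
    "supported": bool(docs),
    "better_query": "refund policy window",
}

result = self_rag("refund", retrieve, draft, critique)
```

Returning `None` after `max_rounds` is the abstention path. Tuning that cap is how you trade the over-refusal risk against extra round-trips.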
05 · Primary · POST-CHECK

Corrective RAG (CRAG)

After retrieval, evaluate document quality. If weak, fall back to web search or query rewrite before generation.

When it earns the build

High-stakes accuracy contexts — legal research, academic writing, policy analysis — where catching a bad retrieval before generation is worth the extra latency.

When it doesn't

Volume traffic. Quality checks add cost on every query, including the ones that would have been fine.

Source · Yan et al., Corrective RAG (2024)
Query → Retrieve → Doc quality eval → (strong: use) or (weak: fallback to web search or rewrite) → Generate
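The strong/weak gate above reduces to a grading step between retrieval and generation. In this sketch the grader is a toy word-overlap score and the fallback is a stub. A real build would more likely use a trained evaluator or an LLM judge, and the `threshold` value is an assumption.

```python
def overlap_grade(query, doc):
    """Toy relevance score: share of query words that appear in the doc."""
    q_words = set(query.lower().split())
    return len(q_words & set(doc.lower().split())) / len(q_words)

def corrective_rag(query, retrieve, grade, fallback, generate, threshold=0.5):
    docs = retrieve(query)
    strong = [d for d in docs if grade(query, d) >= threshold]
    if not strong:
        # Weak retrieval: fall back (web search or query rewrite) before generating
        strong = fallback(query)
    return generate(query, strong)

# Hypothetical stubs for the retriever, the fallback, and the generator
retrieve = lambda q: ["holiday schedule for the office"]
fallback = lambda q: ["termination requires a notice period of 30 days"]
generate = lambda q, docs: " | ".join(docs)

result = corrective_rag(
    "termination notice period", retrieve, overlap_grade, fallback, generate
)
```

The grader runs on every query, including the ones with good retrieval, which is the per-query cost this pattern adds.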
[SPECIALISED PATTERNS]

Nine specialised patterns.

Each earns the build in a specific shape of corpus or query. Not the place to start a project — the place to look when the primary five don't fit.

01 · BASELINE

Naive RAG

Embed the query, fetch top-K, stuff them in the prompt. No rewriting, no fusion, no reranking.

When it earns the build

Single-domain FAQ, internal docs search, narrow-scope chatbots where queries are short and the corpus is well-keyed. The fastest thing that can possibly work.
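For reference, the whole baseline fits in a few functions. This sketch swaps a real embedding model for bag-of-words counts so it runs without dependencies. `embed`, the sample documents, and the prompt wording are all assumptions.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words term counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def top_k(query, docs, k=2):
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    # Stuff the top-K chunks into the prompt, numbered so they can be cited
    hits = top_k(query, docs, k)
    context = "\n".join(f"[{i}] {d}" for i, d in enumerate(hits, start=1))
    return f"Answer using only the context.\n{context}\nQuestion: {query}"

docs = [
    "returns are accepted within 30 days",
    "shipping takes five business days",
    "gift cards never expire",
]
best = top_k("when are returns accepted", docs, k=1)
```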

02 · CONVERSATIONAL

Simple RAG with memory

Naive RAG plus a conversation buffer. The model can resolve pronouns and follow-up references because it sees prior turns.

When it earns the build

Customer support chat, tutoring bots, personal assistants where the second question depends on the first. Cheaper than rebuilding context every turn.
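A conversation buffer can be as small as a bounded queue of turns. The sketch below folds recent user turns into the retrieval query by plain concatenation. That is a deliberate simplification, and the class and method names are hypothetical. A production build would more likely ask an LLM to rewrite the follow-up into a standalone question.

```python
from collections import deque

class ConversationBuffer:
    """Keep the last few turns so follow-up questions carry their context."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)  # oldest turns drop off automatically

    def add(self, user, assistant):
        self.turns.append((user, assistant))

    def contextual_query(self, question):
        # Prepend recent user turns so the retriever sees what "it" refers to
        history = " ".join(user for user, _ in self.turns)
        return f"{history} {question}".strip()

buffer = ConversationBuffer()
buffer.add("Tell me about the Pro plan", "The Pro plan includes priority support.")
query = buffer.contextual_query("How much does it cost?")
```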

03 · ARCHITECTURE STYLE

Modular RAG

Compose the pipeline from swappable parts (retriever, reranker, generator). Not a pattern itself; an engineering posture.

When it earns the build

Every production build. Module boundaries let you swap a reranker or change the embedding model without rewriting the whole thing. The 2026 default engineering posture.
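One way to express those module boundaries in Python is with structural interfaces. This is a sketch of the posture, not a prescribed layout: the `Retriever`, `Reranker`, `Generator`, and `Pipeline` names and the stub implementations are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...

class Reranker(Protocol):
    def rerank(self, query: str, docs: list[str]) -> list[str]: ...

class Generator(Protocol):
    def generate(self, query: str, docs: list[str]) -> str: ...

@dataclass
class Pipeline:
    retriever: Retriever
    reranker: Reranker
    generator: Generator

    def run(self, query: str) -> str:
        docs = self.retriever.retrieve(query)
        return self.generator.generate(query, self.reranker.rerank(query, docs))

# Hypothetical stub modules
class StaticRetriever:
    def retrieve(self, query): return ["b doc", "a doc"]

class SortReranker:
    def rerank(self, query, docs): return sorted(docs)

class NoopReranker:
    def rerank(self, query, docs): return docs

class EchoGenerator:
    def generate(self, query, docs): return docs[0]

pipeline = Pipeline(StaticRetriever(), SortReranker(), EchoGenerator())
swapped = replace(pipeline, reranker=NoopReranker())  # swap one module only
```

Swapping the reranker touches one field and nothing else, which is the property that makes later embedding-model or reranker changes cheap.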

04 · PARALLEL EXPLORATION

Branched RAG

Run multiple interpretations of the query in parallel, score each, pick or merge the best answer.

When it earns the build

Open-ended research queries, comparative analysis (this product vs that), domains where the question has multiple legitimate framings.
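The branch, score, and pick steps can be sketched with a thread pool. The interpretations, `answer_fn`, and `score_fn` below are hypothetical stubs. In a real build each branch would run its own retrieval and generation, and the scorer would be an LLM judge or an eval metric.

```python
from concurrent.futures import ThreadPoolExecutor

def branched_answer(interpretations, answer_fn, score_fn):
    """Answer every interpretation in parallel and keep the best-scoring one."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(answer_fn, interpretations))
    # Each candidate is an (interpretation, answer) pair
    return max(zip(interpretations, answers), key=lambda pair: score_fn(*pair))

# Hypothetical stubs
interpretations = ["price comparison", "feature comparison"]
answer_fn = lambda interp: f"answer about {interp}"
score_fn = lambda interp, answer: len(answer)  # toy scorer: longer wins

best = branched_answer(interpretations, answer_fn, score_fn)
```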

05 · ROUTING

Adaptive RAG

Classify the query (simple / complex / broad / narrow) and route to a matched retrieval strategy. Simple queries get fast pipelines; complex ones get the agentic shape.

When it earns the build

Mixed-shape traffic — public-facing assistants, support bots, internal tools that see everything from "what's our return policy?" to "compare these three contracts".
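A router of this kind is a classifier in front of a dispatch table. The keyword heuristic below is only a placeholder, and the route names and length cutoff are assumptions. A production router would more likely use a small trained classifier or an LLM call.

```python
def classify(query):
    """Hypothetical heuristic: comparison words or long queries are 'complex'."""
    words = query.lower().split()
    if any(w in words for w in ("compare", "versus", "vs")) or len(words) > 20:
        return "complex"
    return "simple"

# Each route is a stand-in for a full pipeline
ROUTES = {
    "simple": lambda q: f"fast:{q}",
    "complex": lambda q: f"agentic:{q}",
}

def route(query):
    return ROUTES[classify(query)](query)

simple_result = route("what's our return policy?")
complex_result = route("compare these three contracts")
```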

06 · PRE-FETCH

Speculative RAG

Predict the next likely query while answering the current one. Pre-fetch retrieval for the predicted follow-up.

When it earns the build

Latency-critical interactive use where the conversation has predictable shape. Autocomplete-style search, support flows with well-known follow-ups.
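The pre-fetch idea can be sketched as a retriever wrapper with a small cache. The prediction table and class name are hypothetical, and the pre-fetch runs inline here for simplicity. In practice it would run in the background while the current answer streams.

```python
class PrefetchingRetriever:
    """Serve predicted follow-ups from cache and pre-fetch the next one."""

    def __init__(self, retrieve, predict_next):
        self.retrieve = retrieve
        self.predict_next = predict_next
        self.cache = {}
        self.hits = 0

    def get(self, query):
        if query in self.cache:
            self.hits += 1
            docs = self.cache.pop(query)  # predicted correctly, no retrieval wait
        else:
            docs = self.retrieve(query)
        predicted = self.predict_next(query)
        if predicted and predicted not in self.cache:
            # Assumption: done inline here; a real system would do this asynchronously
            self.cache[predicted] = self.retrieve(predicted)
        return docs

# Hypothetical follow-up table for a support flow
follow_ups = {"reset password": "password requirements"}
retriever = PrefetchingRetriever(
    retrieve=lambda q: [f"doc for {q}"],
    predict_next=follow_ups.get,
)
first = retriever.get("reset password")
second = retriever.get("password requirements")
```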

07 · QUERY EXPANSION

HyDE (Hypothetical Document Embeddings)

LLM writes a hypothetical answer to the query, embeds the answer, retrieves documents semantically similar to it. Semantic matching, not term matching.

When it earns the build

Technical or specialist domains where the query and the document use different vocabulary. Medical, legal, academic. When BM25 misses and embeddings need the right anchor.

Source · Gao et al., HyDE (2022)
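The vocabulary-gap effect is easy to see on a toy example. In this sketch `embed` is a bag-of-words stand-in for a real embedding model and `write_hypothetical` is a stub for the LLM call. The documents and the query are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for an embedding model: bag-of-words term counts
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def hyde_retrieve(query, docs, write_hypothetical, k=1):
    hypothetical = write_hypothetical(query)  # an LLM call in a real system
    anchor = embed(hypothetical)  # embed the hypothetical answer, not the query
    return sorted(docs, key=lambda d: cosine(anchor, embed(d)), reverse=True)[:k]

docs = [
    "caffeine intake can cause tachycardia and palpitations",
    "coffee shops near the station open at seven",
]
query = "why does my heart race after coffee"
write_hypothetical = lambda q: "caffeine can cause tachycardia"

direct = sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)[:1]
via_hyde = hyde_retrieve(query, docs, write_hypothetical)
```

The raw query shares only "coffee" with the corpus and lands on the wrong document. The hypothetical answer speaks the documents' vocabulary and finds the right one.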
08 · BEYOND TEXT

Multimodal RAG

Text + images + tables + audio in one retrieval surface. Vision LLMs and multimodal embedding models (CLIP, ColPali) make documents understandable, not just searchable.

When it earns the build

PDFs with tables and figures (legal, financial, technical), visual catalogs, scanned archives, medical imaging notes. See our /llm/rag/multimodal/ playbook for the full build.

09 · REASONING LOOPS

Iterative / multi-step RAG

Generate, retrieve based on the partial answer, generate again. Used inside agentic and chain-of-thought workflows when one retrieval pass isn't enough.

When it earns the build

Long-form synthesis, structured report writing, queries that decompose into sub-questions. Often a composition pattern inside agentic systems rather than a standalone build.
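The generate, retrieve, generate cycle can be sketched as a loop that feeds the partial answer back into retrieval. The keyword-triggered retriever and the concatenating generator are hypothetical stubs, and the stopping rule (no new documents) is one reasonable choice among several.

```python
def iterative_rag(query, retrieve, generate, max_steps=3):
    """Alternate retrieval and generation until no new evidence turns up."""
    notes, partial = [], ""
    for _ in range(max_steps):
        # Retrieve against the question plus whatever has been drafted so far
        found = retrieve(f"{query} {partial}".strip())
        new_docs = [d for d in found if d not in notes]
        if not new_docs:
            break
        notes.extend(new_docs)
        partial = generate(query, notes)
    return partial, notes

# Hypothetical stubs: each doc is retrieved when its trigger word appears
docs_by_trigger = {
    "outage": "The outage began in the billing service.",
    "billing": "The billing service depends on the ledger database.",
}
retrieve = lambda q: [d for trig, d in docs_by_trigger.items() if trig in q.lower()]
generate = lambda q, notes: " ".join(notes)

answer, notes = iterative_rag("What caused the outage?", retrieve, generate)
```

The second document is only reachable because the first draft mentions "billing", which is the behaviour a single retrieval pass cannot reproduce.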

[WHAT YOU GET]

What we leave on every RAG build.

14: Patterns considered, one named
Hybrid: Vector + BM25 by default
Reranked: Cross-encoder on top-K
Cited: Every claim, every answer
[COMMON QUESTIONS]

What buyers ask before they sign.

If you had to ship one pattern tomorrow, which one?
Advanced RAG: hybrid retrieval (pgvector + BM25 fused with RRF) + cross-encoder rerank (Cohere Rerank v3) + citation discipline. It's the 2026 production consensus, and the +17 pts of Recall@5 from reranking alone almost always justifies the latency. Promote to Agentic or GraphRAG only when the corpus or query shape demands.
When is GraphRAG worth the build cost?
When answer quality depends on multi-hop reasoning across linked entities: clinical decision support, regulatory analysis, complex case files, multi-document Q&A in regulated domains. Microsoft and follow-on research report 6-8 point accuracy gains over flat RAG, hitting 81%+ in specialised domains. The cost is building and maintaining the graph, which is non-trivial.
Aren't Self-RAG and Corrective RAG basically the same?
Close but not identical. Self-RAG critiques its own answer and decides whether to retrieve again. Corrective RAG evaluates retrieval quality before generation, and falls back to web search or query rewrite if the retrieved docs are weak. In practice we often combine them with eval gates at both stages.
Is HyDE still useful with modern embedding models?
Yes, in specialist domains where the query vocabulary diverges from the document vocabulary — medical, legal, code. Always pair with rerank: HyDE expands the candidate set; rerank cleans it up. Without rerank, hallucinated hypotheses can pull retrieval off course.
Do you ever ship Naive RAG to production?
Only for very narrow, well-keyed corpora: a single product's FAQ, internal docs with consistent vocabulary, narrow-domain chatbots. Even then, we add reranking and citations on the second iteration. The cost of "upgrade later" is almost always lower than the cost of shipping a system that is quietly wrong.
DIRECT RAG · APPLIED K

Pick the pattern. Bring the corpus.

We will sketch the pipeline against your real data, name the trade, and ship a measured build. Sized to the work — sprint, program, or ongoing partnership.