Some questions cannot be answered in one round of retrieval; the system retrieves, partially answers, identifies what is still missing, and retrieves again. Each round narrows the gap. This is the pattern for research questions that decompose into sequential sub-questions.
Best for: Research questions that decompose into sequential sub-questions. Comparative analysis where one finding informs the next search. Investigation-style workloads where the question is iteratively refined.
Iterative RAG runs multiple rounds. Each round retrieves against the current state of knowledge, generates a partial answer, identifies what is still missing, and refines the query for the next round. A controller decides when to stop.
Standard retrieval against the original query. Generates a first partial answer with what is known so far.
LLM evaluates the partial answer against the original question. Names the missing pieces explicitly: 'we know X but still need Y'.
Refined query targets the identified gap. Retrieves additional context. Updates the partial answer.
Controller decides when to stop. Common conditions: gap closes, round budget exhausts, or LLM declares the answer complete.
Synthesizer combines findings from all rounds into a coherent answer with citation discipline.
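The five steps above can be sketched as a single loop. This is a minimal illustration under stated assumptions, not a reference implementation: `retrieve`, `answer`, `find_gaps`, and `synthesize` are hypothetical callables standing in for the retriever and the LLM calls, and using the first named gap as the next query is a simplification of query refinement. The sketch stops when no gaps remain or the round budget is spent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RoundState:
    """Evidence and partial answer accumulated across rounds."""
    question: str
    evidence: List[str] = field(default_factory=list)
    partial_answer: str = ""
    rounds_run: int = 0


def iterative_rag(
    question: str,
    retrieve: Callable[[str], List[str]],          # hypothetical retriever
    answer: Callable[[str, List[str]], str],       # hypothetical partial-answer LLM call
    find_gaps: Callable[[str, str], List[str]],    # hypothetical gap-identifier LLM call
    synthesize: Callable[[str, List[str]], str],   # hypothetical synthesis LLM call
    max_rounds: int = 5,
) -> Tuple[str, RoundState]:
    state = RoundState(question=question)
    query = question  # the initial round retrieves against the original query
    while state.rounds_run < max_rounds:
        # Retrieve, then update the partial answer with everything known so far.
        state.evidence.extend(retrieve(query))
        state.partial_answer = answer(question, state.evidence)
        state.rounds_run += 1
        # Name what is still missing relative to the original question.
        gaps = find_gaps(question, state.partial_answer)
        if not gaps:
            break  # gap closed
        # Simplification (assumption): the first named gap becomes the next query.
        query = gaps[0]
    # Synthesis sees the evidence from every round, not only the last one.
    return synthesize(question, state.evidence), state
```

Keeping the loop free of any specific retriever or model makes each stage replaceable with a stub, which helps when the per-round components need to be tested in isolation.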
The stack is base RAG plus a multi-round controller plus state management across rounds. Cost scales with round count.
Each round is a full Advanced RAG retrieval. Quality compounds; weak per-round retrieval poisons subsequent rounds.
Mid-tier LLM names what is still missing. Structured output: list of sub-questions or required facts.
Decides when to continue and when to stop. Per-query round budget capped (typically 3-5 rounds).
Accumulates partial answers and retrieved evidence across rounds. Per-tenant scoped; cleaned up at session end.
High-tier LLM for final synthesis. Combines round-by-round findings into a coherent answer.
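One possible shape for the state store described above, as a hedged sketch: an in-memory store keyed by tenant and session, so that one tenant's rounds are never visible to another, with an explicit cleanup call at session end. The class and method names (`RoundStateStore`, `record_round`, `end_session`) are hypothetical; a production system would likely back this with a database or cache rather than a dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SessionState:
    """Partial answers and evidence accumulated across rounds of one session."""
    partial_answers: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)


class RoundStateStore:
    """In-memory store; every key includes the tenant, so state is tenant-scoped."""

    def __init__(self) -> None:
        self._sessions: Dict[Tuple[str, str], SessionState] = {}

    def record_round(self, tenant_id: str, session_id: str,
                     partial_answer: str, evidence: List[str]) -> None:
        # Create the session state on first use, then append this round's output.
        state = self._sessions.setdefault((tenant_id, session_id), SessionState())
        state.partial_answers.append(partial_answer)
        state.evidence.extend(evidence)

    def get(self, tenant_id: str, session_id: str) -> SessionState:
        # Unknown sessions return an empty state rather than another tenant's data.
        return self._sessions.get((tenant_id, session_id), SessionState())

    def end_session(self, tenant_id: str, session_id: str) -> None:
        # Cleanup at session end; safe to call more than once.
        self._sessions.pop((tenant_id, session_id), None)
```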
Iterative RAG deploys as a multi-step pattern that requires careful round budget control. We size it as a 10-week first build because the controller calibration is non-trivial.
Most queries are not iterative. On the eval set, identify which questions actually benefit from multiple rounds. Usually research-style or comparative.
Each round runs full Advanced RAG. The per-round quality must be strong; iteration cannot rescue weak base retrieval.
LLM identifies what is missing after each round. Structured output: explicit sub-questions or facts still needed.
Stop on (a) gap closes, (b) round budget exhausts, or (c) LLM declares complete. Hard cap (typically 5 rounds).
Combines findings. Citation discipline maintained across rounds.
Eval set must include questions that genuinely benefit from multiple rounds. Otherwise the iterative benefit is invisible.
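The gap identification and stop condition steps above can be sketched together. The JSON shape (`missing`, `complete`) is an assumed output contract for the gap-identifier LLM, not a format the playbook prescribes, and the precedence between the three stop conditions is a design choice made for this sketch. Returning a named reason, rather than a bare boolean, makes the stop decision easy to log and grade.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class StopReason(Enum):
    GAP_CLOSED = "gap_closed"
    DECLARED_COMPLETE = "declared_complete"
    BUDGET_EXHAUSTED = "budget_exhausted"


@dataclass
class GapReport:
    missing: List[str]  # sub-questions or facts still needed
    complete: bool      # the LLM's own verdict on the partial answer


def parse_gap_report(raw: str) -> GapReport:
    """Parse the assumed JSON contract: {"missing": [...], "complete": bool}."""
    data = json.loads(raw)
    # Drop blank entries so an "empty" gap does not trigger another round.
    missing = [str(m).strip() for m in data.get("missing", []) if str(m).strip()]
    return GapReport(missing=missing, complete=bool(data.get("complete", False)))


def decide_stop(report: GapReport, rounds_run: int,
                max_rounds: int = 5) -> Optional[StopReason]:
    """Return why the loop should stop, or None to run another round."""
    if not report.missing:
        return StopReason.GAP_CLOSED
    if report.complete:
        return StopReason.DECLARED_COMPLETE
    if rounds_run >= max_rounds:
        return StopReason.BUDGET_EXHAUSTED  # hard cap, regardless of open gaps
    return None
```

A typical call after each round would be `decide_stop(parse_gap_report(llm_output), rounds_run)`, where `llm_output` is the hypothetical raw string returned by the gap-identifier model.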
Iterative RAG lifts completeness on questions that decompose into sub-questions. It is hard to benchmark cleanly because public datasets rarely have multi-step ground truth.
Iterative RAG eval grades the per-round retrieval, the gap identification, the stop condition, and the final synthesis. Each component must be measurable independently so regressions are locatable.
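To illustrate why independent per-component scores make regressions locatable, here is a small sketch that compares two eval runs component by component. The component names, the 0-to-1 score scale, and the `tolerance` threshold are assumptions for illustration; how each score is actually computed is left to the eval harness.

```python
from typing import Dict, List

# The four graded components, in pipeline order.
COMPONENTS = ("retrieval", "gap_identification", "stop_condition", "synthesis")


def locate_regressions(baseline: Dict[str, float],
                       candidate: Dict[str, float],
                       tolerance: float = 0.02) -> List[str]:
    """Return the components whose score dropped by more than `tolerance`."""
    regressed = []
    for name in COMPONENTS:
        if candidate[name] < baseline[name] - tolerance:
            regressed.append(name)
    return regressed
```

With a single end-to-end score, a drop in gap identification and a drop in synthesis look identical; with one score per component, the comparison points at the stage that changed.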
Iterative RAG is emerging as a named pattern in 2025-2026. The shape has existed in agentic systems for longer; the consolidation under one name is recent.
The practitioner consensus is that iterative RAG is a research-grade pattern that earns the build for research-grade workloads. The cost is meaningful (3-5x base) and the latency is sequential. The win is real but only on questions that genuinely require multiple rounds of investigation. Routing matters: most production queries are not iterative.
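The cost and latency claim can be made concrete with simple arithmetic. The sketch below assumes each round costs roughly one base RAG call and that rounds run strictly one after another; it ignores the extra gap-identification and synthesis calls, so real figures would be somewhat higher. The function name and parameters are hypothetical.

```python
from typing import Tuple


def estimate_iterative(rounds: int, base_cost: float,
                       base_latency_s: float) -> Tuple[float, float]:
    """Rough totals under the assumption that each round is one base RAG call."""
    total_cost = rounds * base_cost            # cost multiplier equals round count
    total_latency_s = rounds * base_latency_s  # sequential rounds: latencies add up
    return total_cost, total_latency_s
```

Under this model a budget of 3 to 5 rounds gives the 3-5x cost multiplier directly, and the same multiplier applies to latency because no round can start before the previous one finishes.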
We reach for Iterative RAG on research and analytical workloads where questions genuinely decompose into sub-questions. We do not reach for it on interactive or cost-sensitive workloads.
Iterative RAG earns the build on a narrow set of workloads. The shape that fits: asynchronous research, comparative analysis, deep investigation. The shape that does not: interactive chat, cost-sensitive support, single-shot Q&A. The eval set decides.
Use Agentic RAG when the work is multi-source rather than multi-round, Branched RAG when the work is multi-interpretation rather than multi-round, and plain Advanced RAG when one round is enough.
Closest family; agentic is plan-then-execute, iterative is execute-then-refine.
Multi-step alternative; branched is parallel, iterative is sequential.
Iterative often runs with a CRAG-style evaluator on each round.