Kensink Labs
Iterative RAG · Specialised pattern · Eval-gated
ITERATIVE RAG · MULTI-ROUND REFINEMENT

Iterative RAG. Each round refines the next retrieval.

Some questions cannot be answered in one round of retrieval. The system retrieves, partially answers, identifies what is still missing, and retrieves again. Each round narrows the gap. This is the pattern for research questions that decompose into sequential sub-questions.

Claude · OpenAI · Eval pipelines
Best for: Research questions · sequential decomposition
Stack: Multi-round retrieval + state
Cost: Per-round LLM + retrieval
Latency: Multiple sequential rounds
[AT A GLANCE]

Best for: Research questions that decompose into sequential sub-questions. Comparative analysis where one finding informs the next search. Investigation-style workloads where the question is iteratively refined.

Origin: Various, 2023-2026; consolidated as a named pattern
Year: 2023-2026
Complexity: Complex
Production stage: Emerging
[THE PIPELINE]

Retrieve, partial answer, identify gap, retrieve more.

Iterative RAG runs multiple rounds. Each round retrieves against the current state of knowledge, generates a partial answer, identifies what is still missing, and refines the query for the next round. A controller decides when to stop.

Initial query → Round 1: retrieve → Partial answer → Identify gap (LLM) → Round N: retrieve → Refined answer → Stop condition (LLM) → Final synthesis

01 · Initial round

Standard retrieval against the original query. Generates a first partial answer with what is known so far.

02 · Gap identification

LLM evaluates the partial answer against the original question. Names the missing pieces explicitly: 'we know X but still need Y'.

03 · Next round

Refined query targets the identified gap. Retrieves additional context. Updates the partial answer.

04 · Stop condition

Controller decides when to stop. Common conditions: gap closes, round budget exhausts, or LLM declares the answer complete.

05 · Final synthesis

Synthesizer combines findings from all rounds into a coherent answer with citation discipline.
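The five steps can be sketched as a single loop. This is a minimal illustration, not our production controller: the function names (`iterative_rag`, `retrieve`, `answer`, `find_gaps`, `synthesize`) and the toy corpus are hypothetical, and the LLM and retriever are passed in as plain callables so the sketch runs without any external service. Targeting only the first named gap per round is also an assumption made for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RoundState:
    """Accumulated knowledge for one query (hypothetical structure)."""
    question: str
    evidence: list[str] = field(default_factory=list)
    partial_answer: str = ""
    rounds_run: int = 0


def iterative_rag(
    question: str,
    retrieve: Callable[[str], list[str]],
    answer: Callable[[str, list[str]], str],
    find_gaps: Callable[[str, str], list[str]],
    synthesize: Callable[[str, list[str]], str],
    max_rounds: int = 5,
) -> tuple[str, RoundState]:
    state = RoundState(question=question)
    query = question  # round 1 retrieves against the original query
    while state.rounds_run < max_rounds:
        state.evidence.extend(retrieve(query))
        state.partial_answer = answer(question, state.evidence)
        state.rounds_run += 1
        gaps = find_gaps(question, state.partial_answer)
        if not gaps:  # stop condition: the gap has closed
            break
        query = gaps[0]  # assumption: refine toward the first named gap
    # final synthesis over evidence from all rounds
    return synthesize(question, state.evidence), state


# Toy stand-ins for the retriever and the LLM calls (illustrative only).
CORPUS = {
    "Who acquired Acme?": ["Acme was acquired by Borealis."],
    "Who is the CEO of Borealis?": ["The CEO of Borealis is J. Doe."],
}


def toy_retrieve(query: str) -> list[str]:
    return CORPUS.get(query, [])


def toy_answer(question: str, evidence: list[str]) -> str:
    return " ".join(evidence)


def toy_find_gaps(question: str, partial: str) -> list[str]:
    if "Borealis" not in partial:
        return ["Who acquired Acme?"]
    if "CEO" not in partial:
        return ["Who is the CEO of Borealis?"]
    return []


final, state = iterative_rag(
    "Who runs the company that acquired Acme?",
    toy_retrieve, toy_answer, toy_find_gaps, toy_answer,
)
print(state.rounds_run, final)
```

In the toy run, the original question retrieves nothing, the gap identifier names the acquisition as the first missing fact, and the CEO lookup only becomes possible once that fact is in the evidence. That dependency between rounds is what makes a question iterative.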

[TECHNICAL STACK]

What we'd actually deploy.

Stack is base RAG plus a multi-round controller plus state management across rounds. The cost scales with round count.

BASE RETRIEVAL

Advanced RAG per round

Each round is a full Advanced RAG retrieval. Quality compounds; weak per-round retrieval poisons subsequent rounds.

GAP IDENTIFIER

Claude Sonnet or GPT-5.5

Mid-tier LLM names what is still missing. Structured output: list of sub-questions or required facts.
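One way to hold the gap identifier to structured output is to have it return JSON and validate that before the controller sees it. The sketch below assumes a hypothetical schema with `known` and `missing` keys, each a list of strings; the schema, the `GapReport` type, and `parse_gap_report` are illustrative names, not a fixed contract.

```python
import json
from dataclasses import dataclass


@dataclass
class GapReport:
    """Validated gap-identifier output (hypothetical schema)."""
    known: list[str]
    missing: list[str]

    @property
    def is_closed(self) -> bool:
        # The gap is closed when nothing is listed as missing.
        return not self.missing


def parse_gap_report(raw: str) -> GapReport:
    """Parse and validate the raw JSON text returned by the LLM."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("gap report must be a JSON object")
    known = data.get("known", [])
    missing = data.get("missing", [])
    for name, value in (("known", known), ("missing", missing)):
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{name}' must be a list of strings")
    # Drop blank entries so an empty string never triggers another round.
    missing = [m.strip() for m in missing if m.strip()]
    return GapReport(known=list(known), missing=missing)


report = parse_gap_report('{"known": ["X"], "missing": ["Y"]}')
print(report.is_closed, report.missing)
```

Rejecting malformed output at this boundary keeps a bad LLM response from silently becoming the next round's query.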

ROUND CONTROLLER

State machine with stop condition

Decides when to continue and when to stop. Per-query round budget capped (typically 3-5 rounds).
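A small explicit state machine is one way to keep the stop logic auditable. The following is a sketch under assumptions: the phase names, the `RoundController` class, and the stop-reason strings are hypothetical, and the order in which the three stop conditions are checked is a design choice rather than something the pattern prescribes.

```python
from enum import Enum, auto
from typing import Optional


class Phase(Enum):
    RETRIEVE = auto()
    IDENTIFY_GAP = auto()
    SYNTHESIZE = auto()


class RoundController:
    """Tracks the round count and decides whether another round runs."""

    def __init__(self, max_rounds: int = 5) -> None:
        if max_rounds < 1:
            raise ValueError("max_rounds must be at least 1")
        self.max_rounds = max_rounds
        self.rounds_run = 0
        self.phase = Phase.RETRIEVE
        self.stop_reason: Optional[str] = None

    def after_retrieve(self) -> Phase:
        if self.phase is not Phase.RETRIEVE:
            raise RuntimeError(f"unexpected transition from {self.phase}")
        self.rounds_run += 1
        self.phase = Phase.IDENTIFY_GAP
        return self.phase

    def after_gap_check(self, open_gaps: int, llm_says_complete: bool = False) -> Phase:
        if self.phase is not Phase.IDENTIFY_GAP:
            raise RuntimeError(f"unexpected transition from {self.phase}")
        if open_gaps == 0:
            self.stop_reason = "gap_closed"
        elif llm_says_complete:
            self.stop_reason = "llm_complete"
        elif self.rounds_run >= self.max_rounds:
            self.stop_reason = "budget_exhausted"
        # Any stop reason ends the loop; otherwise run another round.
        self.phase = Phase.SYNTHESIZE if self.stop_reason else Phase.RETRIEVE
        return self.phase


ctrl = RoundController(max_rounds=2)
ctrl.after_retrieve()
ctrl.after_gap_check(open_gaps=1)   # gap still open, budget remains
ctrl.after_retrieve()
ctrl.after_gap_check(open_gaps=1)   # gap still open, budget spent
print(ctrl.phase, ctrl.stop_reason)
```

Recording the stop reason per query makes it possible to see how often the budget, rather than a closed gap, is what ends the loop.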

STATE STORE

Per-query session state

Accumulates partial answers and retrieved evidence across rounds. Per-tenant scoped; cleaned up at session end.
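As an illustration of per-tenant scoping and cleanup, here is an in-memory sketch. The `SessionStore` class, its method names, and the `(tenant_id, session_id)` key are assumptions for the example; a real deployment would likely sit on a shared store with expiry rather than a process-local dict.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionState:
    partial_answers: list[str] = field(default_factory=list)
    evidence: list[str] = field(default_factory=list)


class SessionStore:
    """In-memory per-query state, keyed by (tenant_id, session_id)."""

    def __init__(self) -> None:
        self._sessions: dict[tuple[str, str], SessionState] = {}

    def record_round(
        self, tenant_id: str, session_id: str, partial_answer: str, evidence: list[str]
    ) -> SessionState:
        state = self._sessions.setdefault((tenant_id, session_id), SessionState())
        state.partial_answers.append(partial_answer)
        # De-duplicate evidence while preserving first-seen order.
        for passage in evidence:
            if passage not in state.evidence:
                state.evidence.append(passage)
        return state

    def get(self, tenant_id: str, session_id: str) -> Optional[SessionState]:
        # Lookups always include the tenant, so sessions never cross tenants.
        return self._sessions.get((tenant_id, session_id))

    def end_session(self, tenant_id: str, session_id: str) -> None:
        # Clean up at session end; a no-op if the session is already gone.
        self._sessions.pop((tenant_id, session_id), None)


store = SessionStore()
store.record_round("tenant-a", "s1", "partial 1", ["doc-1", "doc-2"])
store.record_round("tenant-a", "s1", "partial 2", ["doc-2", "doc-3"])
print(store.get("tenant-a", "s1").evidence)
store.end_session("tenant-a", "s1")
```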

FINAL SYNTHESIZER

Claude Opus or GPT-5.5 (high effort)

High-tier LLM for final synthesis. Combines round-by-round findings into a coherent answer.

[HOW WE DEPLOY]

Day one to live traffic.

Iterative RAG deploys as a multi-step pattern that requires careful round budget control. We size it as a 10-week first build because the controller calibration is non-trivial.

  1. Identify iterative workloads

    Most queries are not iterative. On the eval set, identify which questions actually benefit from multiple rounds. Usually research-style or comparative.

  2. Base retrieval per round

    Each round runs full Advanced RAG. The per-round quality must be strong; iterative cannot rescue weak base retrieval.

  3. Gap identifier prompt

    LLM identifies what is missing after each round. Structured output: explicit sub-questions or facts still needed.

  4. Stop condition + round budget

    Stop on (a) gap closes, (b) round budget exhausts, or (c) LLM declares complete. Hard cap (typically 5 rounds).

  5. Final synthesizer

    Combines findings. Citation discipline maintained across rounds.

  6. Eval set with iterative cases

    Eval set must include questions that genuinely benefit from multiple rounds. Otherwise the iterative benefit is invisible.
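The first step, identifying which eval questions actually benefit from multiple rounds, can be approximated by scoring each question twice and keeping those where the multi-round run clearly wins. The sketch below assumes a completeness score in [0, 1] from whatever grader the eval set uses; the `EvalResult` type, the `iterative_candidates` function, the 0.10 threshold, and the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    question_id: str
    single_round_score: float  # completeness with one retrieval round
    multi_round_score: float   # completeness with the iterative loop


def iterative_candidates(results: list[EvalResult], min_gain: float = 0.10) -> list[str]:
    """Return IDs of questions whose score improves by at least min_gain."""
    return [
        r.question_id
        for r in results
        if r.multi_round_score - r.single_round_score >= min_gain
    ]


# Illustrative scores only, not measured results.
results = [
    EvalResult("q1-comparative", 0.55, 0.85),
    EvalResult("q2-lookup", 0.90, 0.91),
    EvalResult("q3-research", 0.40, 0.75),
]
candidates = iterative_candidates(results)
print(candidates, f"{len(candidates) / len(results):.0%} of eval set")
```

The share of candidates in the eval set is a rough proxy for how much traffic a router should send down the iterative path.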

[ACCURACY + BENCHMARKS]

What the numbers say.

Iterative RAG lifts completeness on questions that decompose into sub-questions. Hard to benchmark cleanly because public datasets rarely have multi-step ground truth.

Completeness on iterative questions: +15-30%
Cost vs single-round: 3-5x
Latency: sequential, multiplied by round count
Routing: most queries should not be iterative
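The cost and latency figures follow from simple arithmetic if every round costs roughly the same. That uniform-round assumption is ours for the sake of the sketch; real rounds vary with context size. The function names and unit values below are hypothetical.

```python
def iterative_cost(rounds: int, per_round_cost: float, synthesis_cost: float = 0.0) -> float:
    """Total cost: each round pays for retrieval plus LLM calls, then one synthesis."""
    return rounds * per_round_cost + synthesis_cost


def iterative_latency(rounds: int, per_round_latency_s: float, synthesis_latency_s: float = 0.0) -> float:
    """Rounds run sequentially, so their latencies add rather than overlap."""
    return rounds * per_round_latency_s + synthesis_latency_s


baseline = iterative_cost(rounds=1, per_round_cost=1.0)
for n in (3, 4, 5):
    multiple = iterative_cost(rounds=n, per_round_cost=1.0) / baseline
    print(f"{n} rounds: {multiple:.0f}x cost, {iterative_latency(n, 2.0):.0f}s latency")
```

Under this model a 3-5 round budget lines up with the 3-5x cost multiple, and a hypothetical 2-second round becomes 6 to 10 seconds end to end before synthesis is counted.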
Our eval methodology

Iterative RAG eval grades the per-round retrieval, the gap identification, the stop condition, and the final synthesis. Each component must be measurable independently so regressions are locatable.
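Grading the four components separately is what lets a regression be pinned to one stage. As a sketch, assume each component already has its own score in [0, 1] from the eval suite; the component keys, the `locate_regressions` function, and the tolerance are hypothetical.

```python
COMPONENTS = ("retrieval", "gap_identification", "stop_condition", "synthesis")


def locate_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> list[str]:
    """Return the components whose score dropped by more than tolerance."""
    regressed = []
    for name in COMPONENTS:
        if baseline[name] - candidate[name] > tolerance:
            regressed.append(name)
    return regressed


# Illustrative scores only, not measured results.
baseline = {"retrieval": 0.80, "gap_identification": 0.75, "stop_condition": 0.90, "synthesis": 0.85}
candidate = {"retrieval": 0.70, "gap_identification": 0.75, "stop_condition": 0.90, "synthesis": 0.84}
print(locate_regressions(baseline, candidate))
```

In this example the end-to-end answer quality would dip, but the per-component view points at retrieval rather than the gap prompt or the synthesizer.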

[COMMUNITY FEEDBACK]

What practitioners report.

Iterative RAG is emerging as a named pattern in 2025-2026. The shape has existed in agentic systems for longer; the consolidation under one name is recent.

The practitioner consensus is that iterative RAG is a research-grade pattern that earns the build for research-grade workloads. The cost is meaningful (3-5x base) and the latency is sequential. The win is real but only on questions that genuinely require multiple rounds of investigation. Routing matters: most production queries are not iterative.

[COMMON PITFALLS]
  • Running iterative on every query. Cost blows up; quality gain absent on queries that did not need it.
  • No stop condition. Loops run until the round budget exhausts on every query.
  • Weak base retrieval. Iterative cannot rescue weak retrieval; it just compounds the problem.
  • Sequential latency. If the workload is interactive, iterative is the wrong pattern.
[KENSINK LABS EVALUATION]

Our honest take.

We reach for Iterative RAG on research and analytical workloads where questions genuinely decompose into sub-questions. We do not reach for it on interactive or cost-sensitive workloads.

Iterative RAG earns the build on a narrow set of workloads. The shape that fits: asynchronous research, comparative analysis, deep investigation. The shape that does not: interactive chat, cost-sensitive support, single-shot Q&A. The eval set decides.

[WHEN WE REACH FOR IT]
  • Research and deep-investigation workloads where multi-round retrieval is required.
  • Comparative analysis where one finding informs the next search.
  • Asynchronous report generation where latency budget is generous.
What we'd substitute

Agentic RAG when the work is multi-source rather than multi-round. Branched RAG when the work is multi-interpretation rather than multi-round. Plain Advanced RAG when one round is enough.

[COMMON QUESTIONS]

What buyers ask before they sign.

How many rounds?
3-5 is typical. Beyond that, the marginal information gain per round is small and the cost keeps adding up. Round budget capped at deploy time, refined on the eval set.
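Refining the budget on the eval set can be as simple as finding where the marginal gain per round flattens. The sketch assumes a mean completeness score has been measured for each candidate budget; `choose_round_budget`, the 0.02 gain threshold, and the scores are hypothetical.

```python
def choose_round_budget(
    score_by_rounds: list[float],
    min_marginal_gain: float = 0.02,
    hard_cap: int = 5,
) -> int:
    """Pick the last budget that still buys at least min_marginal_gain.

    score_by_rounds[i] is the mean eval score with a budget of i + 1 rounds.
    """
    if not score_by_rounds:
        raise ValueError("need at least one measured budget")
    budget = 1
    limit = min(len(score_by_rounds), hard_cap)
    for rounds in range(2, limit + 1):
        gain = score_by_rounds[rounds - 1] - score_by_rounds[rounds - 2]
        if gain < min_marginal_gain:
            break  # further rounds add cost without meaningful gain
        budget = rounds
    return budget


# Illustrative scores for budgets of 1 to 5 rounds, not measured results.
print(choose_round_budget([0.60, 0.72, 0.80, 0.81, 0.815]))
```

With these illustrative scores the fourth round adds about one point of completeness, so the budget settles at three.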
Iterative RAG vs Agentic RAG?
Different shape. Agentic plans across sources; iterative refines within a source over multiple rounds. They compose: an agentic loop can include iterative retrieval within each source.
Can iterative RAG work for interactive chat?
Rarely. The sequential latency is hard to hide. If a chat workload genuinely needs iterative, we usually pair it with speculative pre-fetch or accept the latency budget.