Kensink Labs
Branched RAG · Specialised pattern · Eval-gated
BRANCHED RAG · PARALLEL HYPOTHESES

Branched RAG. Explore multiple interpretations in parallel.

When a query has multiple plausible interpretations, run retrieval and partial generation on each branch in parallel, compare the answers, and pick the strongest. Trades cost for completeness on open-ended or ambiguous questions.

Claude · OpenAI · Eval pipelines
Best for: Open-ended · ambiguous queries
Cost: N-way LLM cost
Latency: Best-case parallel
Pattern type: Quality + coverage
[AT A GLANCE]

Best for: Market research that needs structured findings from multiple angles. Open-ended technical questions where 'what about X' and 'what about Y' are both worth answering. Comprehensive briefing tasks.

Origin: Industry practice, 2024-2026
Year: 2024-2026
Complexity: Complex
Production stage: Emerging
[THE PIPELINE]

Branch, retrieve in parallel, compare, synthesize.

A query decomposer generates 2-5 interpretations or sub-questions. Each branch runs an independent retrieval and partial generation. A comparator picks the best, or the synthesizer merges across them.

Query → Branch generator (LLM) → Branches 1, 2, 3 retrieve + draft (in parallel) → Compare drafts → Synthesize final
01 · Branch generation

LLM generates 2-5 distinct interpretations of the query. Different angles, different sub-questions, different rephrasings.

02 · Per-branch retrieval + draft

Each branch runs through Advanced RAG independently, in parallel. Each produces a partial draft.

03 · Compare and synthesize

Comparator LLM evaluates the drafts on completeness and faithfulness. Synthesizer either picks the strongest or merges across branches.
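The three stages above can be sketched as a small asyncio orchestration. This is a minimal sketch, not a production implementation: `generate_branches`, `retrieve_and_draft`, and `compare_and_synthesize` are hypothetical placeholders standing in for the branch-generator LLM, one Advanced RAG call, and the comparator/synthesizer LLM.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Draft:
    branch: str  # the interpretation this draft answers
    text: str    # partial answer produced for that branch


async def generate_branches(query: str, n: int = 3) -> list[str]:
    # Placeholder for the branch-generator LLM call (hypothetical).
    return [f"{query} (angle {i + 1})" for i in range(n)]


async def retrieve_and_draft(branch: str) -> Draft:
    # Placeholder for one Advanced RAG call: retrieve, then draft.
    await asyncio.sleep(0)  # stands in for network latency
    return Draft(branch=branch, text=f"draft for: {branch}")


async def compare_and_synthesize(drafts: list[Draft]) -> str:
    # Placeholder for the comparator/synthesizer LLM; here it simply merges.
    return "\n".join(d.text for d in drafts)


async def branched_rag(query: str, n: int = 3) -> str:
    # Stage 01: generate the interpretations.
    branches = await generate_branches(query, n)
    # Stage 02: all branches run concurrently; gather keeps branch order.
    drafts = await asyncio.gather(*(retrieve_and_draft(b) for b in branches))
    # Stage 03: compare and synthesize across the drafts.
    return await compare_and_synthesize(list(drafts))
```

Calling `asyncio.run(branched_rag("What limits battery life?"))` runs the whole flow. With real LLM calls behind the placeholders, the wall-clock time of stage 02 is roughly that of the slowest branch rather than the sum of all branches.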

[TECHNICAL STACK]

What we'd actually deploy.

Stack is parallel Advanced RAG plus a comparator and synthesizer. Cost scales linearly with branch count.

BRANCH GENERATOR

Claude Sonnet or GPT-5.5

Mid-tier LLM generates branches. Structured output: N labeled interpretations.
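One way to hold the generator to "N labeled interpretations" is to validate its output before any branch runs. The sketch below assumes the generator returns JSON shaped like `{"branches": [{"label": ..., "interpretation": ...}]}`; that schema and the `parse_branches` helper are illustrative assumptions, not a fixed contract.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    label: str           # short name for the interpretation
    interpretation: str  # the rephrased question or sub-question


def parse_branches(raw: str, min_n: int = 2, max_n: int = 5) -> list[Branch]:
    """Parse and validate the generator's JSON output (assumed schema)."""
    items = json.loads(raw)["branches"]
    if not min_n <= len(items) <= max_n:
        raise ValueError(f"expected {min_n}-{max_n} branches, got {len(items)}")
    branches = [
        Branch(item["label"].strip(), item["interpretation"].strip())
        for item in items
    ]
    if any(not b.label or not b.interpretation for b in branches):
        raise ValueError("empty label or interpretation")
    # Labels must be unique so drafts can be traced back to their branch.
    if len({b.label.lower() for b in branches}) != len(branches):
        raise ValueError("duplicate branch labels")
    return branches
```

A failed validation is a reasonable trigger for retrying the generator or falling back to the unbranched path.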

PER-BRANCH RETRIEVAL

Advanced RAG per branch

Each branch is a full Advanced RAG call. Parallel where the API supports it.

COMPARATOR + SYNTHESIZER

Claude Opus or GPT-5.5 (high effort)

Final stage gets the high-tier model because it must merge across draft candidates faithfully.
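The pick-versus-merge decision can be expressed as a margin rule over the comparator's scores. This is a sketch under stated assumptions: the 0-1 score scale, the equal weighting of completeness and faithfulness, and the `margin` default are all hypothetical and would need calibrating against an eval set.

```python
from dataclasses import dataclass


@dataclass
class ScoredDraft:
    text: str
    completeness: float  # 0..1, from the comparator (assumed scale)
    faithfulness: float  # 0..1, from the comparator (assumed scale)

    @property
    def score(self) -> float:
        # Equal weighting is an assumption, not a recommendation.
        return 0.5 * self.completeness + 0.5 * self.faithfulness


def pick_or_merge(
    drafts: list[ScoredDraft], margin: float = 0.15
) -> tuple[str, list[ScoredDraft]]:
    """Return ("pick", [winner]) or ("merge", [drafts close to the top])."""
    if not drafts:
        raise ValueError("no drafts to compare")
    ranked = sorted(drafts, key=lambda d: d.score, reverse=True)
    # A clear winner is picked outright.
    if len(ranked) == 1 or ranked[0].score - ranked[1].score >= margin:
        return "pick", ranked[:1]
    # No clear winner: pass every draft within the margin to the synthesizer.
    close = [d for d in ranked if ranked[0].score - d.score < margin]
    return "merge", close
```

Only the drafts within the margin reach the synthesizer, so a weak branch is not blended into an otherwise strong answer.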

COST BUDGET

Per-query hard cap

N branches cost roughly N times a single Advanced RAG call, whether or not they run in parallel. Hard cap on N (typically 3-5) and on total per-query tokens.
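The two caps can be enforced before any branch is launched. A minimal sketch, assuming a simple linear cost model (fixed overhead for the generator and comparator plus a per-branch estimate); `plan_branches` and its default limits are hypothetical values for illustration.

```python
def plan_branches(
    requested: int,
    tokens_per_branch: int,
    overhead_tokens: int,
    max_branches: int = 5,
    max_total_tokens: int = 60_000,
) -> int:
    """Return how many branches fit under both the branch cap and the token cap."""
    n = min(requested, max_branches)
    # Linear cost model: fixed overhead plus n times the per-branch estimate.
    while n > 0 and overhead_tokens + n * tokens_per_branch > max_total_tokens:
        n -= 1
    return n
```

For example, `plan_branches(4, 20_000, 5_000)` returns 2, because three branches would need 65,000 tokens against a 60,000 cap. A result of 0 or 1 is a signal to skip branching for that query.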

[HOW WE DEPLOY]

Day one to live traffic.

Branched RAG deploys as an add-on to Advanced RAG. The hard part is calibrating which queries benefit from branching vs which just pay extra cost.

  1. Eval set for branching benefit

    Identify which queries on the eval set actually benefit from multiple interpretations. Most do not; the ones that do are typically open-ended or ambiguous.

  2. Branch generator

    Calibrated to produce distinct (not paraphrased) interpretations. Structured output validates branch distinctness.

  3. Parallel retrieval orchestration

    Per-branch retrieval runs in parallel where the LLM API supports concurrent calls. Where it does not, branches run sequentially and latency grows with the branch count.

  4. Comparator + synthesizer

    Final stage merges drafts. The harder calibration: when to pick the best draft vs when to merge.

  5. Routing: which queries get branched

    A classifier decides per query whether branching is worth the cost. Most queries skip the branched path.
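The routing step can be illustrated with a toy router. This keyword heuristic is a stand-in for a trained classifier, named plainly as such: the cue lists, the `branching_score` function, and the threshold are hypothetical, and a real router would be calibrated against the eval set described in the first step.

```python
import re

# Hypothetical cue lists; a production router would be a trained classifier.
OPEN_ENDED_CUES = ("compare", "overview", "trade-offs", "landscape", "options", "pros and cons")
AMBIGUOUS_CUES = ("best", "should we", "how do i choose", "what about")


def branching_score(query: str) -> int:
    q = query.lower()
    # One point for every cue phrase that appears in the query.
    score = sum(cue in q for cue in OPEN_ENDED_CUES)
    score += sum(cue in q for cue in AMBIGUOUS_CUES)
    # Very short queries usually ask for a single fact and rarely need branching.
    if len(re.findall(r"\w+", q)) <= 4:
        score -= 1
    return score


def should_branch(query: str, threshold: int = 2) -> bool:
    return branching_score(query) >= threshold
```

With these cues, "Compare the options for vector databases and their trade-offs" is routed to the branched path, while "What is the capital of France?" skips it.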

[ACCURACY + BENCHMARKS]

What the numbers say.

Branched RAG lifts completeness on open-ended questions and trades cost for coverage. It is hard to benchmark cleanly because most public benchmarks have a single ground truth.

Completeness on open-ended: +15-25%
Cost: Nx (N = branch count)
Latency: Parallel if API supports it
Routing: Most queries should skip branching
Our eval methodology

Branched RAG eval needs a 'comprehensiveness' grading axis in addition to standard faithfulness. We grade whether the synthesized answer covers the angles that branching surfaced, not just whether each individual fact is faithful.
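The comprehensiveness axis can be sketched as a coverage ratio: the share of branch angles that the final answer addresses. This is an illustrative sketch only; `keyword_judge` is a toy stand-in for an LLM grader, and the `Judge` signature is an assumption made for the example.

```python
from typing import Callable

# A judge returns True when the answer addresses the given angle.
Judge = Callable[[str, str], bool]


def keyword_judge(answer: str, angle: str) -> bool:
    # Toy stand-in for an LLM grader: the angle's keyword must appear in the answer.
    return angle.lower() in answer.lower()


def comprehensiveness(
    answer: str, angles: list[str], judge: Judge = keyword_judge
) -> float:
    """Fraction of branch angles that the synthesized answer covers."""
    if not angles:
        return 0.0
    covered = sum(judge(answer, angle) for angle in angles)
    return covered / len(angles)
```

This score is reported alongside faithfulness, not in place of it: an answer can cover every angle and still contain unfaithful claims.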

[COMMUNITY FEEDBACK]

What practitioners report.

Branched RAG is an emerging production pattern. Less standardised than Advanced or Agentic; teams ship their own variants tuned to their workload.

The practitioner consensus is that branching pays off on a small subset of queries and overpays on the rest. The win is the routing: a classifier that decides per query whether to branch. Builds that branch every query waste cost; builds that branch when the eval set says to are competitive.

[COMMON PITFALLS]
  • Branching every query. Cost explodes; quality gain is small on queries that did not need it.
  • Paraphrased branches. If the branches are not distinct, the parallel work is wasted.
  • Sequential branch retrieval. If your LLM API supports parallel calls, use them; sequential branching adds latency with every branch.
  • Merging instead of picking. Sometimes the strongest branch is the answer; merging across branches produces mush.
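The paraphrased-branches pitfall can be caught with a cheap distinctness check before any retrieval runs. The sketch below uses word-level Jaccard overlap, which is a deliberately crude swap-in (embedding similarity would be a more typical choice); `drop_paraphrases` and the 0.6 cutoff are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words, with duplicates removed.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0  # two empty strings count as identical
    return len(ta & tb) / len(ta | tb)


def drop_paraphrases(branches: list[str], max_overlap: float = 0.6) -> list[str]:
    """Keep a branch only if it is distinct from every branch already kept."""
    kept: list[str] = []
    for branch in branches:
        if all(jaccard(branch, k) <= max_overlap for k in kept):
            kept.append(branch)
    return kept
```

If fewer than two branches survive, the query is probably not worth branching at all.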
[KENSINK LABS EVALUATION]

Our honest take.

We reach for Branched RAG on comprehensive briefing and open-ended research tasks where the cost is acceptable. We do not reach for it on cost-sensitive interactive workloads.

Branched RAG is one of those patterns that earns the build in narrow shapes. Most production traffic does not benefit. The shape that does benefit (research, briefing, comprehensive technical answers) usually has higher cost tolerance, which makes the trade workable.

[WHEN WE REACH FOR IT]
  • Market research and comprehensive briefing tasks.
  • Open-ended technical questions where multiple angles are worth answering.
  • Asynchronous workloads where the latency adder is acceptable.
What we'd substitute

Agentic RAG when the branching is across sources rather than across interpretations. Plain Advanced RAG with query rewriting when the queries are not actually multi-faceted.

[COMMON QUESTIONS]

What buyers ask before they sign.

How many branches?
Usually 3. Fewer than that does not buy enough coverage; more than that overpays on cost. Your eval set should confirm.
How do we decide when to branch?
A classifier on the query (open-ended? ambiguous? research-style?) decides per query whether branching is worth the cost. Most production traffic skips the branched path.
Branched RAG vs Agentic RAG?
Different shape. Branched explores interpretations in parallel; Agentic plans across sources sequentially. Pick by whether the query has multiple interpretations or whether it needs multiple sources.