Branch generation
LLM generates 2-5 distinct interpretations of the query. Different angles, different sub-questions, different rephrasings.
When a query has multiple plausible interpretations, run retrieval and partial generation on each branch in parallel, compare the answers, and either pick the strongest or merge them. Trades cost for completeness on open-ended or ambiguous questions.
Best for: Market research that needs structured findings from multiple angles. Open-ended technical questions where 'what about X' and 'what about Y' are both worth answering. Comprehensive briefing tasks.
A query decomposer generates 2-5 interpretations or sub-questions. Each branch runs an independent retrieval and partial generation. A comparator picks the best draft, or a synthesizer merges across them.
Each branch runs through Advanced RAG independently, in parallel. Each produces a partial draft.
Comparator LLM evaluates the drafts on completeness and faithfulness. Synthesizer either picks the strongest or merges across branches.
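The three stages above can be sketched as a small orchestration function. This is a minimal sketch, not a reference implementation: `branched_rag`, `Draft`, and every callback (`decompose`, `run_branch`, `score`, `merge`) are hypothetical names standing in for the LLM and Advanced RAG calls, and branches run sequentially here to keep the control flow readable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Draft:
    branch: str   # the interpretation this draft answers
    text: str     # partial answer from one Advanced RAG run
    score: float  # comparator score (hypothetical scale)


def branched_rag(
    query: str,
    decompose: Callable[[str], list[str]],
    run_branch: Callable[[str], str],
    score: Callable[[str, str], float],
    merge: Callable[[list[Draft]], str],
    pick_best: bool,
) -> str:
    # Stage 1: branch generation (2-5 interpretations of the query).
    branches = decompose(query)
    # Stage 2: one independent Advanced RAG run per branch.
    drafts = [Draft(b, run_branch(b), 0.0) for b in branches]
    # Stage 3a: comparator scores each draft against the original query.
    for d in drafts:
        d.score = score(query, d.text)
    # Stage 3b: synthesizer either picks the strongest draft or merges them.
    if pick_best:
        return max(drafts, key=lambda d: d.score).text
    return merge(drafts)


# Toy stand-ins for the LLM calls, so the control flow runs end to end.
stubs = dict(
    decompose=lambda q: ["pricing angle", "compliance angle"],
    run_branch=lambda b: f"draft for {b}",
    score=lambda q, text: float(len(text)),  # toy scorer: longer draft wins
    merge=lambda drafts: " | ".join(d.text for d in drafts),
)
merged = branched_rag("Should we adopt vendor X?", pick_best=False, **stubs)
best = branched_rag("Should we adopt vendor X?", pick_best=True, **stubs)
```

In a real build each callback wraps a model or retrieval call; keeping them as parameters makes it easy to swap the comparator or synthesizer without touching the branching logic.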
Stack is parallel Advanced RAG plus a comparator and synthesizer. Cost scales roughly linearly with branch count, on top of the fixed cost of the decomposition and synthesis calls.
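A back-of-the-envelope cost model makes the linear scaling concrete. The sketch assumes each branch costs about one Advanced RAG call and adds fixed decomposition and synthesis overheads (synthesis priced higher because it runs on the high-tier model); the unit costs are illustrative placeholders, not measurements.

```python
def branched_cost(n_branches: int, branch_cost: float,
                  decompose_cost: float, synth_cost: float) -> float:
    """Per-query cost: fixed decompose + synthesis overhead plus N branch runs."""
    return decompose_cost + n_branches * branch_cost + synth_cost


# Illustrative units: 1.0 = one plain Advanced RAG call (hypothetical ratios).
for n in (3, 5):
    print(n, branched_cost(n, branch_cost=1.0, decompose_cost=0.25, synth_cost=1.5))
# 3 4.75
# 5 6.75
```

Under these assumed ratios, a three-branch query costs several times a plain Advanced RAG call, which is why routing only the queries that benefit matters.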
Mid-tier LLM generates branches. Structured output: N labeled interpretations.
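One way to consume the structured output is to ask the branch-generation model for JSON and validate it before fanning out. The `{"branches": [{"label", "interpretation"}]}` schema and the `parse_branches` helper are assumptions for illustration; in practice, use whatever structured-output mechanism the LLM API provides.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    label: str
    interpretation: str


def parse_branches(raw: str, min_n: int = 2, max_n: int = 5) -> list[Branch]:
    """Parse and validate N labeled interpretations from the model's JSON output."""
    data = json.loads(raw)
    branches = []
    for item in data.get("branches", []):
        text = str(item.get("interpretation", "")).strip()
        if text:  # drop empty interpretations instead of running empty branches
            branches.append(Branch(str(item.get("label", "")).strip(), text))
    if not min_n <= len(branches) <= max_n:
        raise ValueError(f"expected {min_n}-{max_n} branches, got {len(branches)}")
    return branches


raw = json.dumps({"branches": [
    {"label": "cost", "interpretation": "What does vendor X cost at our scale?"},
    {"label": "risk", "interpretation": "What lock-in risks does vendor X carry?"},
]})
branches = parse_branches(raw)
```

Raising on too few branches lets the caller fall back to a single Advanced RAG pass rather than "branching" into one path.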
Each branch is a full Advanced RAG call. Parallel where the API supports it.
Final stage gets the high-tier model because it must merge across draft candidates faithfully.
N branches run in parallel, so latency tracks the slowest branch while cost multiplies by N. Hard cap on N (typically 3-5) and on total per-query tokens.
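Both caps can be enforced before any branch runs. In this sketch, `apply_caps`, the per-branch token estimate, and the default limits (4 branches, 20,000 tokens, 4,000 reserved for the synthesizer) are hypothetical values chosen for illustration.

```python
def apply_caps(branches: list[str], est_tokens_per_branch: int,
               max_branches: int = 4, max_query_tokens: int = 20_000,
               reserved_for_synthesis: int = 4_000) -> list[str]:
    """Trim the branch list to respect both the N cap and the per-query token budget."""
    if est_tokens_per_branch <= 0:
        raise ValueError("est_tokens_per_branch must be positive")
    capped = branches[:max_branches]                      # hard cap on N
    budget = max(0, max_query_tokens - reserved_for_synthesis)
    affordable = budget // est_tokens_per_branch          # branches the budget covers
    # An empty result signals the caller to fall back to single-path Advanced RAG.
    return capped[:affordable]


kept = apply_caps(["a", "b", "c", "d", "e"], est_tokens_per_branch=5_000)
```

Here the N cap trims five branches to four, and the token budget (16,000 tokens after the synthesis reserve) trims them further to three.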
Branched RAG deploys as an add-on to Advanced RAG. The hard part is calibrating which queries benefit from branching vs which just pay extra cost.
Identify which queries on the eval set actually benefit from multiple interpretations. Most do not; the ones that do are typically open-ended or ambiguous.
The branch generator is calibrated to produce distinct (not paraphrased) interpretations. A validation step over the structured output checks branch distinctness.
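A cheap distinctness check can run on the parsed branches before retrieval. This sketch uses token-set Jaccard similarity as a stand-in for whatever similarity measure a team actually uses (embedding cosine similarity is a common alternative); the 0.6 threshold and the function names are assumptions.

```python
import re
from itertools import combinations


def _tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Overlap of the two token sets divided by their union."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def near_paraphrases(branches: list[str], threshold: float = 0.6) -> list[tuple[int, int]]:
    """Index pairs of branches too similar to count as distinct interpretations."""
    return [(i, j) for (i, a), (j, b) in combinations(enumerate(branches), 2)
            if jaccard(a, b) >= threshold]


pairs = near_paraphrases([
    "What does vendor X charge for storage?",
    "How much does vendor X charge for storage?",
    "What compliance certifications does vendor X hold?",
])
```

The first two branches are rephrasings of each other and get flagged; a flagged pair can trigger a regeneration request or simply drop one branch before paying for its retrieval.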
Per-branch retrieval runs in parallel where the LLM API supports concurrent calls. Where it does not, branches run sequentially and latency grows with each added branch.
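An async fan-out with a concurrency limit covers both cases: set the limit to the number of branches for full parallelism, or to 1 for a sequential fallback. This is a sketch using Python's standard `asyncio`; `run_branches` and `fake_rag` are hypothetical names, and `fake_rag` simply sleeps in place of a real retrieval and generation call.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def run_branches(
    branches: list[str],
    run_one: Callable[[str], Awaitable[str]],
    max_concurrency: int,
) -> list[str]:
    """Run one RAG call per branch; max_concurrency=1 degrades to sequential."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(branch: str) -> str:
        async with sem:  # limit in-flight calls to what the API allows
            return await run_one(branch)

    # gather preserves input order, so drafts line up with their branches.
    return list(await asyncio.gather(*(guarded(b) for b in branches)))


async def fake_rag(branch: str) -> str:
    await asyncio.sleep(0.05)  # stand-in for one retrieval + generation call
    return f"draft: {branch}"


async def timed(concurrency: int) -> tuple[list[str], float]:
    start = time.perf_counter()
    drafts = await run_branches(["a", "b", "c"], fake_rag, concurrency)
    return drafts, time.perf_counter() - start


parallel_drafts, parallel_s = asyncio.run(timed(3))      # about one call's latency
sequential_drafts, sequential_s = asyncio.run(timed(1))  # about three calls' latency
```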
Final stage merges drafts. The harder calibration: when to pick the best draft vs when to merge.
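One simple way to frame the pick-versus-merge decision is a margin rule over comparator scores: pick when one draft clearly dominates, and merge when several drafts are comparably strong. `pick_or_merge`, the 0.15 margin, and the 0.5 quality floor are hypothetical and would need calibrating on the eval set.

```python
def pick_or_merge(
    scored: list[tuple[str, float]], margin: float = 0.15, floor: float = 0.5
) -> tuple[str, list[str]]:
    """Decide between picking the top draft and merging the strong ones.

    scored: (draft_text, comparator_score) pairs; higher score is better.
    """
    if not scored:
        raise ValueError("no drafts to synthesize")
    ranked = sorted(scored, key=lambda d: d[1], reverse=True)
    # One draft clearly dominates: return it as-is and skip the merge call.
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= margin:
        return "pick", [ranked[0][0]]
    # Several drafts are comparably strong: merge those above the quality floor.
    strong = [text for text, score in ranked if score >= floor]
    return "merge", strong or [ranked[0][0]]


decision_a = pick_or_merge([("draft A", 0.9), ("draft B", 0.6)])
decision_b = pick_or_merge([("draft A", 0.8), ("draft B", 0.75), ("draft C", 0.3)])
```

Picking also saves the high-tier synthesis call on queries where merging would add little.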
A classifier decides per query whether branching is worth the cost. Most queries skip the branched path.
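The router can be prototyped before a trained classifier exists. The sketch below is a keyword heuristic, not a learned classifier: the cue list, the `min_words` threshold, and `should_branch` are assumptions, and in practice the decision would be fitted to eval-set labels of which queries actually benefited from branching.

```python
import re

# Hypothetical cues for open-ended or multi-angle questions; tune on the eval set.
OPEN_ENDED_CUES = ("compare", "overview", "landscape", "pros and cons",
                   "options", "trade-offs", "tradeoffs", "approaches")


def should_branch(query: str, min_words: int = 6) -> bool:
    """Route to the branched path only for longer, open-ended-looking queries."""
    q = query.lower()
    if len(re.findall(r"[a-z0-9']+", q)) < min_words:
        return False  # short lookups go straight to plain Advanced RAG
    return any(cue in q for cue in OPEN_ENDED_CUES)


lookup = should_branch("What is the default port for PostgreSQL?")
briefing = should_branch("Give me an overview of vector database options for a mid-size team")
```

The default is to skip branching, which matches the expectation that most queries do not benefit.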
Branched RAG lifts completeness on open-ended questions, trading cost for coverage. It is hard to benchmark cleanly because most public benchmarks have a single ground truth.
Branched RAG eval needs a 'comprehensiveness' grading axis in addition to standard faithfulness. We grade whether the synthesized answer covers the angles that branching surfaced, not just whether each individual fact is faithful.
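The comprehensiveness axis can be computed as the share of branch angles that the final answer addresses. In this sketch each angle carries a few hypothetical key terms, and `keyword_covers` does substring matching; a production grader would more likely put an LLM judge behind the same `covers` callback. Faithfulness is graded separately.

```python
from typing import Callable


def keyword_covers(answer: str, key_terms: list[str]) -> bool:
    """Crude coverage check: does any key term for this angle appear in the answer?"""
    text = answer.lower()
    return any(term.lower() in text for term in key_terms)


def comprehensiveness(
    answer: str,
    angles: dict[str, list[str]],
    covers: Callable[[str, list[str]], bool] = keyword_covers,
) -> tuple[float, list[str]]:
    """Return (fraction of angles covered, names of missed angles)."""
    if not angles:
        return 1.0, []
    missed = [name for name, terms in angles.items() if not covers(answer, terms)]
    return (len(angles) - len(missed)) / len(angles), missed


score, missed = comprehensiveness(
    "Vendor X costs less than Y and is SOC 2 certified.",
    {"pricing": ["price", "cost"], "compliance": ["soc 2", "hipaa"], "migration": ["migrat"]},
)
```

Returning the missed angles, not just the score, shows which branch the synthesizer dropped.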
Branched RAG is an emerging-to-mature production pattern. Less standardised than Advanced or Agentic; teams ship their own variants tuned to their workload.
The practitioner consensus is that branching pays off on a small subset of queries and overpays on the rest. The win is the routing: a classifier that decides per query whether to branch. Builds that branch every query waste cost; builds that branch when the eval set says to are competitive.
We reach for Branched RAG on comprehensive briefing and open-ended research tasks where the cost is acceptable. We do not reach for it on cost-sensitive interactive workloads.
Branched RAG is one of those patterns that earns the build in narrow shapes. Most production traffic does not benefit. The shape that does benefit (research, briefing, comprehensive technical answers) usually has higher cost tolerance, which makes the trade workable.
Agentic RAG when the branching is across sources rather than across interpretations. Plain Advanced RAG with query rewriting when the queries are not actually multi-faceted.
Same family of multi-step patterns; agentic is plan-then-execute, branched is parallel-explore.
Related pattern: Sequential multi-step where branched is parallel multi-step.
Related pattern: Routing rather than branching; relevant when branching is overkill.