Query classifier
Lightweight classifier (LLM or fine-tuned model) reads the query and picks a mode. Calibrated against the eval set.
A classifier reads the incoming query and routes it: simple queries go to a no-retrieval LLM call, mid-complexity queries to Advanced RAG, and complex queries to Agentic RAG. This cuts cost on the easy queries and spends the budget where the query genuinely needs it. A standard cost-optimisation pattern as of 2025-2026.
Best for: Production workloads with a wide query distribution. For example, customer support over a mature product where 60% of queries are simple FAQ and 40% are complex troubleshooting. Saves budget on the easy 60%.
Incoming query goes through a classifier that picks a route: no-retrieval (the LLM already knows), Advanced RAG (standard retrieval), or Agentic RAG (complex / multi-source). The pipeline that runs depends on the route.
Each mode is its own pipeline: no-retrieval, Advanced RAG, or Agentic RAG. The classifier picks; the pipeline runs.
Each mode generates an answer with its own discipline: citations are required from the Advanced and Agentic routes, and the no-retrieval route refuses cautiously if the query turns out to be non-trivial.
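The flow above can be sketched as a small dispatch table. This is a minimal illustration, not a reference implementation: the `Route` enum, the `PIPELINES` mapping, the stub pipeline functions, and the word-count `toy_classifier` are all hypothetical names standing in for real components.

```python
from enum import Enum
from typing import Callable, Dict


class Route(Enum):
    NO_RETRIEVAL = "no_retrieval"
    ADVANCED_RAG = "advanced_rag"
    AGENTIC_RAG = "agentic_rag"


# Placeholder pipelines (assumption): each would wrap a real implementation.
def no_retrieval_pipeline(query: str) -> str:
    return f"[no-retrieval] {query}"


def advanced_rag_pipeline(query: str) -> str:
    return f"[advanced-rag] {query}"


def agentic_rag_pipeline(query: str) -> str:
    return f"[agentic-rag] {query}"


PIPELINES: Dict[Route, Callable[[str], str]] = {
    Route.NO_RETRIEVAL: no_retrieval_pipeline,
    Route.ADVANCED_RAG: advanced_rag_pipeline,
    Route.AGENTIC_RAG: agentic_rag_pipeline,
}


def answer(query: str, classify: Callable[[str], Route]) -> str:
    """Classify the query, then run only the pipeline for the chosen route."""
    route = classify(query)
    return PIPELINES[route](query)


def toy_classifier(query: str) -> Route:
    # Stand-in heuristic for illustration only. A real classifier would be
    # an LLM call or a fine-tuned model calibrated against the eval set.
    words = len(query.split())
    if words <= 4:
        return Route.NO_RETRIEVAL
    if words <= 12:
        return Route.ADVANCED_RAG
    return Route.AGENTIC_RAG
```

Passing the classifier in as a callable keeps the dispatch code unchanged when the classifier is swapped or recalibrated.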
Stack is multiple RAG modes plus a routing classifier. The work is in calibrating the classifier; the modes are off-the-shelf shapes.
Lightweight call rates query complexity. Structured output: route choice plus confidence.
No-retrieval route: used for queries the model can answer from its training (well-known facts, simple definitions). Calibrated conservatively.
Advanced RAG route: the middle route. Most production traffic ends up here.
Agentic RAG route: the highest-cost route. Reserved for queries that genuinely need it.
Each route's quality is gated independently. Routing misclassifications are caught in the eval set.
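One way the classifier's structured output (route choice plus confidence) might be consumed is sketched below. It assumes the classifier returns JSON with `route` and `confidence` fields. Those field names, the `RouteDecision` type, and the 0.8 threshold are illustrative assumptions rather than values from a real deployment. Low-confidence or malformed outputs escalate toward the more expensive route, which is the conservative default.

```python
import json
from dataclasses import dataclass

# Routes ordered from cheapest to most expensive.
ROUTES = ["no_retrieval", "advanced_rag", "agentic_rag"]


@dataclass(frozen=True)
class RouteDecision:
    route: str
    confidence: float


def parse_decision(raw: str) -> RouteDecision:
    """Parse the classifier's JSON output.

    Malformed or out-of-range output falls back to the most expensive
    route with zero confidence.
    """
    fallback = RouteDecision(ROUTES[-1], 0.0)
    try:
        data = json.loads(raw)
        route = data["route"]
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        return fallback
    if route not in ROUTES or not 0.0 <= confidence <= 1.0:
        return fallback
    return RouteDecision(route, confidence)


def resolve_route(decision: RouteDecision, min_confidence: float = 0.8) -> str:
    """Escalate one step to the next more expensive route when confidence is low."""
    idx = ROUTES.index(decision.route)
    if decision.confidence < min_confidence:
        idx = min(idx + 1, len(ROUTES) - 1)
    return ROUTES[idx]
```

The threshold would itself be tuned against the hand-labelled eval set: raising it spends more on borderline queries, lowering it accepts more risk on the cheap route.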
Adaptive RAG deploys after the underlying modes are in place. The work is in classifier calibration and per-route quality gating.
Build the underlying modes first: no-retrieval, Advanced RAG, Agentic RAG (or whichever subset applies). Evaluate each independently.
Hand-label representative queries with their correct route. This is the source of truth for classifier calibration.
Build the classifier as a lightweight LLM call or a fine-tuned model. We default to a prompt-based classifier checked against a held-out eval set.
Each route gated separately. Routing misclassifications are caught when the wrong route is taken and quality drops.
Distribution of routes watched in production. Drift on routing percentages triggers re-evaluation of the classifier.
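The drift check in the last step could look roughly like the sketch below. It compares the current share of traffic per route against a baseline using total variation distance. The choice of metric, the function names, and the 0.10 threshold are assumptions for illustration; the source does not prescribe a specific drift measure.

```python
from collections import Counter
from typing import Dict, Iterable


def route_shares(routes: Iterable[str]) -> Dict[str, float]:
    """Turn a stream of routing decisions into per-route traffic shares."""
    counts = Counter(routes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no routing decisions to summarise")
    return {route: n / total for route, n in counts.items()}


def total_variation(baseline: Dict[str, float], current: Dict[str, float]) -> float:
    """Half the sum of absolute share differences; 0 means identical, 1 means disjoint."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


def needs_reevaluation(
    baseline: Dict[str, float],
    current: Dict[str, float],
    threshold: float = 0.10,  # hypothetical alerting threshold
) -> bool:
    """Flag the classifier for re-evaluation when the route mix has shifted."""
    return total_variation(baseline, current) > threshold
```

A shift in the route mix does not by itself say the classifier is wrong, since the query traffic may have genuinely changed. It is a trigger to re-run the routing eval, not a verdict.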
Adaptive RAG does not change the ceiling of accuracy; it changes the cost per query. The win is operational: same average quality, lower average cost.
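The cost claim is a weighted average: average cost per query is the sum over routes of traffic share times per-query cost. The sketch below works this through with hypothetical relative costs (a no-retrieval call assumed to cost one tenth of an Advanced RAG call). The numbers are illustrative, not measurements, and the classifier call's own cost is ignored.

```python
from typing import Dict


def average_cost(shares: Dict[str, float], cost_per_query: Dict[str, float]) -> float:
    """Expected cost per query: sum of (traffic share * per-query cost) over routes."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    return sum(share * cost_per_query[route] for route, share in shares.items())


# Hypothetical relative costs, in arbitrary units.
COSTS = {"no_retrieval": 0.1, "advanced_rag": 1.0}

# Baseline: every query goes through the retrieval pipeline.
baseline = average_cost({"advanced_rag": 1.0}, COSTS)

# Routed: the easy 60% skip retrieval, the remaining 40% still use it.
routed = average_cost({"no_retrieval": 0.6, "advanced_rag": 0.4}, COSTS)

# Fractional saving relative to the baseline (about 0.54 under these assumptions).
saving = 1.0 - routed / baseline
```

The saving depends on how large the easy share is and how much cheaper the cheap route is. Quality does not appear in this arithmetic, which is why it has to be gated separately per route.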
Adaptive RAG eval grades the routing decision (precision per route) separately from the per-route answer quality. Both matter; the routing decision is the new thing the eval must measure.
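Precision per route can be computed directly from the hand-labelled routing set. The sketch below assumes the eval data is a list of `(predicted_route, labelled_route)` pairs; that representation and the function name are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def precision_per_route(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Compute routing precision for each predicted route.

    `pairs` holds (predicted_route, labelled_route) for each eval query.
    Precision for a route is the number of correct predictions of that route
    divided by all predictions of that route. Routes that were never
    predicted are omitted from the result.
    """
    predicted: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for pred, label in pairs:
        predicted[pred] += 1
        if pred == label:
            correct[pred] += 1
    return {route: correct[route] / n for route, n in predicted.items()}
```

Precision on the cheapest route is usually the number to watch most closely, since a query wrongly sent there skips retrieval it needed.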
Adaptive RAG became a standard cost-optimisation pattern in 2025-2026 as agentic RAG costs got large enough that teams started looking for ways to skip it on easy queries.
The practitioner consensus is that routing is the right cost lever once a build has heterogeneous query traffic. Teams report cost savings of 40-60% on average without quality regression. The risk is classifier miscalibration; the mitigation is conservative routing (default to a more expensive route when in doubt) plus per-route eval gating.
We reach for Adaptive RAG on production workloads where the query distribution is wide enough that running every query through the same pipeline is wasteful.
We have shipped Adaptive RAG on customer-support workloads where 60% of queries were known-FAQ and only 40% needed real retrieval. Routing the easy 60% to a no-retrieval Claude Haiku call cut average cost by half. The classifier was the work; the modes were straightforward.
Use plain Advanced RAG instead when the query distribution is homogeneous (every query needs retrieval). Use Speculative RAG when latency matters more than cost.
Cousin pattern: speculative is latency-optimised, adaptive is cost-optimised.
Related pattern (Agentic RAG): one of the routes adaptive picks from for complex queries.
Related pattern (Advanced RAG): the standard middle route in most adaptive deployments.