Kensink Labs
Adaptive RAG · Specialised pattern · Eval-gated
ADAPTIVE RAG · ROUTE BY QUERY TYPE

Adaptive RAG. Route each query to the cheapest pattern that handles it.

A classifier reads the incoming query and routes it: simple queries to a no-retrieval LLM call, mid-complexity queries to Advanced RAG, complex queries to Agentic RAG. This cuts cost on the easy queries and spends the budget where a query genuinely needs it. The 2026 cost-optimisation pattern.

Claude Haiku · Classifier · Eval pipelines
Best for: Heterogeneous query distributions
Stack: Classifier + multiple RAG modes
Cost: Average per-query cost goes down
Risk: Classifier miscalibration
[AT A GLANCE]

Best for: Production workloads with a wide query distribution. A typical case is customer support over a mature product, where 60% of queries are simple FAQ and 40% are complex troubleshooting. Routing saves budget on the easy half.

Origin: Jeong et al., Adaptive-RAG (2024)
Year: 2024-2026
Complexity: Complex
Production stage: Mature
[THE PIPELINE]

Classify, route, run the chosen mode.

Incoming query goes through a classifier that picks a route: no-retrieval (the LLM already knows), Advanced RAG (standard retrieval), or Agentic RAG (complex / multi-source). The pipeline that runs depends on the route.

Pipeline diagram: Query → Classifier → one of (Route A: no retrieval, Route B: Advanced RAG, Route C: Agentic RAG) → Answer
01

Query classifier

Lightweight classifier (LLM or fine-tuned model) reads the query and picks a mode. Calibrated against the eval set.

02

Route to the matching mode

Each mode is its own pipeline: no-retrieval, Advanced RAG, or Agentic RAG. The classifier picks; the pipeline runs.

03

Generate the answer

Each mode generates an answer with its own discipline: Advanced and Agentic must cite their sources; the no-retrieval mode refuses cautiously if the query turns out to be non-trivial.
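The three steps above can be sketched as a small dispatcher. This is a minimal illustration, not our production code: the `Route` enum, the `answer` function, and the stub classifier and pipelines are all hypothetical names, and the stub classifier is a hard-coded lookup standing in for a real model call.

```python
from enum import Enum
from typing import Callable, Dict


class Route(Enum):
    NO_RETRIEVAL = "no_retrieval"
    ADVANCED_RAG = "advanced_rag"
    AGENTIC_RAG = "agentic_rag"


# A pipeline takes the query text and returns an answer string.
Pipeline = Callable[[str], str]


def answer(query: str,
           classify: Callable[[str], Route],
           pipelines: Dict[Route, Pipeline]) -> str:
    """Classify the query, then run only the pipeline for the chosen route."""
    route = classify(query)
    return pipelines[route](query)


# Hypothetical stand-ins so the sketch runs without a model or an index.
TOY_LABELS = {
    "What is RAG?": Route.NO_RETRIEVAL,
    "Compare the refund terms across our EU and US contracts": Route.AGENTIC_RAG,
}


def toy_classify(query: str) -> Route:
    # Hard-coded lookup; anything unknown falls to the middle route.
    return TOY_LABELS.get(query, Route.ADVANCED_RAG)


def make_stub(route: Route) -> Pipeline:
    # Each stub just tags the query with the route that handled it.
    return lambda query: f"[{route.value}] {query}"


PIPELINES: Dict[Route, Pipeline] = {route: make_stub(route) for route in Route}
```

Calling `answer("What is RAG?", toy_classify, PIPELINES)` runs only the no-retrieval stub. Swapping a stub for a real pipeline does not change the dispatcher.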

[TECHNICAL STACK]

What we'd actually deploy.

Stack is multiple RAG modes plus a routing classifier. The work is in calibrating the classifier; the modes are off-the-shelf shapes.

CLASSIFIER

Claude Haiku or fine-tuned classifier

Lightweight call rates query complexity. Structured output: route choice plus confidence.
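A sketch of what "route choice plus confidence" could look like once the classifier's reply reaches application code. It assumes the classifier is prompted to return a JSON object with `route` and `confidence` fields; that schema, the route names, and `parse_decision` are illustrative assumptions, and no model API is called here.

```python
import json
from dataclasses import dataclass

# Assumed route labels; a real build would use whatever the prompt specifies.
VALID_ROUTES = ("no_retrieval", "advanced_rag", "agentic_rag")


@dataclass(frozen=True)
class RouteDecision:
    route: str
    confidence: float


def parse_decision(raw: str) -> RouteDecision:
    """Parse the classifier's JSON reply; raise ValueError on anything malformed."""
    try:
        data = json.loads(raw)
        route = data["route"]
        confidence = float(data["confidence"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"unparseable classifier output: {raw!r}") from exc
    if route not in VALID_ROUTES:
        raise ValueError(f"unknown route: {route!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return RouteDecision(route, confidence)
```

Validating the reply before routing means a malformed classifier output surfaces as an explicit error the caller can handle, for example by falling back to a more expensive route, rather than as a silent misroute.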

ROUTE: NO RETRIEVAL

Direct LLM call

Used for queries the model can answer from its training (well-known facts, simple definitions). Calibrated conservatively.

ROUTE: ADVANCED RAG

Standard hybrid + rerank pipeline

The middle route. Most production traffic ends up here.

ROUTE: AGENTIC RAG

Multi-step / multi-source pipeline

Highest-cost route. Reserved for queries that genuinely need it.

ROUTING METRICS

Per-route eval gating

Each route's quality is gated independently. Routing misclassifications are caught in the eval set.
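Independent gating can be illustrated with a small check over eval results. The sketch below assumes each eval case records the route it ran on and a pass/fail grade; the function names, the sample counts, and the 0.85 thresholds are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of passing eval cases, computed separately for each route."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for route, passed in results:
        totals[route] += 1
        passes[route] += int(passed)
    return {route: passes[route] / totals[route] for route in totals}


def failing_routes(results: Iterable[Tuple[str, bool]],
                   thresholds: Dict[str, float]) -> List[str]:
    """Routes whose pass rate is below their own threshold.

    A route with a threshold but no eval cases counts as failing.
    """
    rates = pass_rates(results)
    return sorted(r for r, t in thresholds.items() if rates.get(r, 0.0) < t)


# Made-up eval results: (route the case ran on, did the answer pass grading).
RESULTS = (
    [("no_retrieval", True)] * 9 + [("no_retrieval", False)] * 1
    + [("advanced_rag", True)] * 19 + [("advanced_rag", False)] * 1
    + [("agentic_rag", True)] * 3 + [("agentic_rag", False)] * 2
)
THRESHOLDS = {"no_retrieval": 0.85, "advanced_rag": 0.85, "agentic_rag": 0.85}
```

In this made-up data the aggregate pass rate is 31 of 35, which clears 0.85, yet `failing_routes(RESULTS, THRESHOLDS)` flags `agentic_rag` at 3 of 5. That is the case aggregate gating would miss.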

[HOW WE DEPLOY]

Day one to live traffic.

Adaptive RAG deploys after the underlying modes are in place. The work is in classifier calibration and per-route quality gating.

  1. Ship the underlying modes

    No-retrieval, Advanced RAG, Agentic RAG (or whichever subset applies). Eval each independently.

  2. Label a routing training set

    Hand-label representative queries with their correct route. This is the source of truth for classifier calibration.

  3. Train or prompt the classifier

    Lightweight LLM call or fine-tuned classifier. We default to a prompt-based classifier with a held-out eval set.

  4. Per-route eval gating

    Each route gated separately. Routing misclassifications are caught when the wrong route is taken and quality drops.

  5. Production routing metrics

    Distribution of routes watched in production. Drift on routing percentages triggers re-evaluation of the classifier.
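The drift check in the last step can be sketched as a comparison between a baseline route mix and a recent window of production decisions. Total variation distance is one reasonable choice of measure here, not necessarily the one a given build uses; the baseline shares, the window, and the 0.10 threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def route_shares(routes: Iterable[str]) -> Dict[str, float]:
    """Share of traffic per route in a window of routing decisions."""
    counts = Counter(routes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty window")
    return {route: n / total for route, n in counts.items()}


def total_variation(baseline: Dict[str, float], current: Dict[str, float]) -> float:
    """Half the sum of absolute share differences; 0 means identical mixes."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)


def needs_reevaluation(baseline: Dict[str, float],
                       window: Iterable[str],
                       max_drift: float = 0.10) -> bool:
    """True when the recent route mix has moved further than max_drift."""
    return total_variation(baseline, route_shares(window)) > max_drift


# Hypothetical baseline mix at launch and a recent window of 100 decisions.
BASELINE = {"no_retrieval": 0.60, "advanced_rag": 0.30, "agentic_rag": 0.10}
WINDOW = ["no_retrieval"] * 45 + ["advanced_rag"] * 35 + ["agentic_rag"] * 20
```

Here the mix has shifted by 0.15 in total variation, so `needs_reevaluation(BASELINE, WINDOW)` returns `True`. A shift like this can mean the traffic changed or the classifier did; the check only says it is time to look.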

[ACCURACY + BENCHMARKS]

What the numbers say.

Adaptive RAG does not change the ceiling of accuracy; it changes the cost per query. The win is operational: same average quality, lower average cost.

Average cost per query: 40-60% lower
Accuracy if routing is right: Equal
Latency by route: Variable
Classifier calibration: Critical
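A worked example of where the cost reduction comes from. The unit costs below are invented relative units, not measured prices, and the 60/40 split mirrors the support example elsewhere on this page; the classifier's own cost is charged on every query.

```python
from typing import Dict


def average_cost(route_shares: Dict[str, float],
                 route_costs: Dict[str, float],
                 classifier_cost: float) -> float:
    """Expected cost per query: classifier call plus the share-weighted route cost."""
    if abs(sum(route_shares.values()) - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    routed = sum(share * route_costs[route] for route, share in route_shares.items())
    return classifier_cost + routed


# Hypothetical relative costs per query (arbitrary units).
COSTS = {"no_retrieval": 1.0, "advanced_rag": 5.0}
SHARES = {"no_retrieval": 0.6, "advanced_rag": 0.4}
CLASSIFIER_COST = 0.2

baseline = COSTS["advanced_rag"]  # every query through Advanced RAG
adaptive = average_cost(SHARES, COSTS, CLASSIFIER_COST)
saving = 1.0 - adaptive / baseline
```

Under these assumptions the adaptive average is 0.2 + 0.6 × 1 + 0.4 × 5 = 2.8 units against a baseline of 5, a saving of 44%. The saving shrinks as the classifier gets more expensive or as less traffic qualifies for the cheap route.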
Our eval methodology

Adaptive RAG eval grades the routing decision (precision per route) separately from the per-route answer quality. Both matter; the routing decision is the new thing the eval must measure.
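Precision per route can be computed directly from the hand-labelled routing set. A minimal sketch, assuming each eval record pairs the classifier's predicted route with the labelled route; `precision_per_route` and the sample pairs are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def precision_per_route(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """For each predicted route: fraction of those predictions matching the label."""
    predicted: Counter = Counter()
    correct: Counter = Counter()
    for pred, gold in pairs:
        predicted[pred] += 1
        if pred == gold:
            correct[pred] += 1
    return {route: correct[route] / predicted[route] for route in predicted}


# Made-up (predicted route, labelled route) pairs.
PAIRS = (
    [("no_retrieval", "no_retrieval")] * 3
    + [("no_retrieval", "advanced_rag")] * 1   # hard query sent to the cheap route
    + [("advanced_rag", "advanced_rag")] * 4
    + [("agentic_rag", "agentic_rag")] * 1
    + [("agentic_rag", "advanced_rag")] * 1    # mid query sent to the expensive route
)
```

Low precision on `no_retrieval` is the quality risk, because queries that needed retrieval were answered without it. Low precision on `agentic_rag` is the cost risk.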

[COMMUNITY FEEDBACK]

What practitioners report.

Adaptive RAG became a standard cost-optimisation pattern in 2025-2026 as agentic RAG costs got large enough that teams started looking for ways to skip it on easy queries.

The practitioner consensus is that routing is the right cost lever once a build has heterogeneous query traffic. Teams report cost savings of 40-60% on average without quality regression. The risk is classifier miscalibration; the mitigation is conservative routing (default to a more expensive route when in doubt) plus per-route eval gating.

[COMMON PITFALLS]
  • Classifier miscalibration. Sends easy queries to expensive routes (waste) or hard queries to cheap routes (quality drop).
  • No per-route eval. Aggregate quality looks fine while one route is silently bad.
  • Routing on superficial query features. A long query is not necessarily a hard query.
  • No conservative defaults. When the classifier is unsure, route to the more expensive option, not the cheaper one.
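The conservative default from the last pitfall can be sketched as a one-step escalation when the classifier's confidence is low. The 0.7 threshold, the route names, and `conservative_route` are hypothetical; a real threshold would be tuned on the eval set.

```python
# Routes ordered cheapest first (assumed ordering for this sketch).
ROUTES_BY_COST = ["no_retrieval", "advanced_rag", "agentic_rag"]


def conservative_route(route: str, confidence: float, min_confidence: float = 0.7) -> str:
    """Escalate one step to a more expensive route when confidence is low."""
    idx = ROUTES_BY_COST.index(route)  # raises ValueError for an unknown route
    if confidence < min_confidence:
        # Already at the most expensive route: stay there.
        idx = min(idx + 1, len(ROUTES_BY_COST) - 1)
    return ROUTES_BY_COST[idx]
```

For example, `conservative_route("no_retrieval", 0.55)` returns `"advanced_rag"`, while a confident `conservative_route("no_retrieval", 0.9)` stays on `"no_retrieval"`. The trade is deliberate: an unsure decision costs a little more instead of risking a quality drop.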
[KENSINK LABS EVALUATION]

Our honest take.

We reach for Adaptive RAG on production workloads where the query distribution is wide enough that running every query through the same pipeline is wasteful.

We have shipped Adaptive RAG on customer-support workloads where 60% of queries were known-FAQ and only 40% needed real retrieval. Routing the easy 60% to a no-retrieval Claude Haiku call cut average cost by half. The classifier was the work; the modes were straightforward.

[WHEN WE REACH FOR IT]
  • Customer support workloads with a heavy FAQ tail.
  • Internal assistants serving diverse query types.
  • Cost-sensitive production builds with strong eval discipline.
What we'd substitute

Plain Advanced RAG when the query distribution is homogeneous (every query needs retrieval). Speculative RAG when the latency optimisation matters more than the cost optimisation.

[COMMON QUESTIONS]

What buyers ask before they sign.

How does the classifier work?
Lightweight LLM call (Claude Haiku or similar) reads the query and outputs a structured route choice. Calibrated against a hand-labeled training set; per-route quality gated in CI.
What if the classifier picks wrong?
Per-route eval gating catches it. Conservative defaults (when in doubt, the more expensive route) limit the downside. We track routing precision per route in production and re-train when it drifts.
Adaptive RAG vs Agentic RAG?
Adaptive picks a route per query; Agentic plans across sources within a route. They compose: adaptive can route to an agentic mode for complex queries while skipping it for easy ones.
DIRECT RAG · APPLIED K

Bring the corpus. We'll bring the build.

Senior engineers, eval suite at handoff, full source ownership. We integrate against the model and the index the same way we integrate against Postgres. Sized to the work in front of you.