★ Adaptive RAG · Specialised pattern · Eval-gated

ADAPTIVE RAG · ROUTE BY QUERY TYPE

Adaptive RAG. Route each query to the cheapest pattern that handles it.

A classifier reads the incoming query and routes it: simple queries to a no-retrieval LLM call, mid-complexity queries to Advanced RAG, complex queries to Agentic RAG. This cuts cost on the easy queries and spends the budget where the query genuinely needs it. The 2026 cost-optimisation pattern.

Claude Haiku · Classifier · Eval pipelines

Best for

Heterogeneous query distributions

Stack

Classifier + multiple RAG modes

Cost

Average per-query goes down

Risk

Classifier miscalibration

[AT A GLANCE]

Best for: Production workloads with a wide query distribution, for example customer support over a mature product where 60% of queries are simple FAQ and 40% are complex troubleshooting. Routing saves budget on the easy 60%.

Origin

Jeong et al., Adaptive-RAG (2024)

Year

2024-2026

Complexity

Complex

Production stage

Mature

[THE PIPELINE]

Classify, route, run the chosen mode.

Each incoming query goes through a classifier that picks a route: no-retrieval (the LLM already knows the answer), Advanced RAG (standard retrieval), or Agentic RAG (complex or multi-source queries). The pipeline that runs depends on the route.

Query classifier

Lightweight classifier (LLM or fine-tuned model) reads the query and picks a mode. Calibrated against the eval set.

Route to the matching mode

Each mode is its own pipeline: no-retrieval, Advanced RAG, or Agentic RAG. The classifier picks; the pipeline runs.
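
"The classifier picks; the pipeline runs" can be read as a lookup from route label to pipeline. The sketch below is a minimal illustration under stated assumptions: the three `answer_*` functions are hypothetical placeholders for the real modes, and `classify` is any callable that returns a route label. None of these names come from a real library.

```python
from typing import Callable, Dict

# Hypothetical placeholder pipelines; a real build would call the
# actual no-retrieval, Advanced RAG, and Agentic RAG modes here.
def answer_no_retrieval(query: str) -> str:
    return f"[no_retrieval] {query}"

def answer_advanced_rag(query: str) -> str:
    return f"[advanced_rag] {query}"

def answer_agentic_rag(query: str) -> str:
    return f"[agentic_rag] {query}"

# Route label -> pipeline. The labels are illustrative assumptions.
PIPELINES: Dict[str, Callable[[str], str]] = {
    "no_retrieval": answer_no_retrieval,
    "advanced_rag": answer_advanced_rag,
    "agentic_rag": answer_agentic_rag,
}

def run(query: str, classify: Callable[[str], str]) -> str:
    """Classify the query, then run the pipeline for the chosen route."""
    route = classify(query)
    # An unknown label falls back to the most expensive pipeline,
    # in line with the conservative-default rule.
    pipeline = PIPELINES.get(route, answer_agentic_rag)
    return pipeline(query)
```

Keeping the mapping in one table means adding or removing a mode is a one-line change, and the fallback makes an unexpected classifier output cost money rather than quality.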

Generate the answer

Each mode generates an answer with its own discipline: citations are required on the Advanced and Agentic routes, and the no-retrieval route refuses cautiously if the query turns out to be non-trivial.

[TECHNICAL STACK]

What we'd actually deploy.

Stack is multiple RAG modes plus a routing classifier. The work is in calibrating the classifier; the modes are off-the-shelf shapes.

CLASSIFIER

Claude Haiku or fine-tuned classifier

Lightweight call rates query complexity. Structured output: route choice plus confidence.
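
A sketch of what "route choice plus confidence" might look like once it reaches application code. It assumes the classifier replies with a JSON object holding `route` and `confidence` keys. That shape, the route labels, and the 0.8 threshold are illustrative assumptions, not a fixed contract. Malformed output and low confidence both resolve toward the more expensive route.

```python
import json
from dataclasses import dataclass

# Routes ordered from cheapest to most expensive (labels are assumptions).
ROUTES = ["no_retrieval", "advanced_rag", "agentic_rag"]

@dataclass(frozen=True)
class RouteDecision:
    route: str
    confidence: float

def parse_decision(raw: str) -> RouteDecision:
    """Parse the classifier's JSON reply.

    Any malformed or out-of-range reply falls back to the most
    expensive route with zero confidence.
    """
    fallback = RouteDecision(ROUTES[-1], 0.0)
    try:
        data = json.loads(raw)
        route = data["route"]
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        return fallback
    if route not in ROUTES or not 0.0 <= confidence <= 1.0:
        return fallback
    return RouteDecision(route, confidence)

def apply_conservative_default(
    decision: RouteDecision, threshold: float = 0.8
) -> RouteDecision:
    """Below the threshold, escalate one step to the next more expensive route."""
    if decision.confidence >= threshold:
        return decision
    idx = ROUTES.index(decision.route)
    escalated = ROUTES[min(idx + 1, len(ROUTES) - 1)]
    return RouteDecision(escalated, decision.confidence)
```

Escalating by one step rather than jumping straight to the top route is a design choice. The threshold itself is something to tune against the labelled routing set.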

ROUTE: NO RETRIEVAL

Direct LLM call

Used for queries the model can answer from its training (well-known facts, simple definitions). Calibrated conservatively.

ROUTE: ADVANCED RAG

Standard hybrid + rerank pipeline

The middle route. Most production traffic ends up here.

ROUTE: AGENTIC RAG

Multi-step / multi-source pipeline

Highest-cost route. Reserved for queries that genuinely need it.

ROUTING METRICS

Per-route eval gating

Each route's quality is gated independently. Routing misclassifications are caught in the eval set.
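
Independent gating can be sketched as a pass-rate check per route rather than one aggregate number. The input shape (a list of `(route, passed)` pairs from an eval run) and the thresholds are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def per_route_pass_rates(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of passing eval cases for each route."""
    totals: Dict[str, int] = defaultdict(int)
    passes: Dict[str, int] = defaultdict(int)
    for route, passed in results:
        totals[route] += 1
        if passed:
            passes[route] += 1
    return {route: passes[route] / totals[route] for route in totals}

def failing_routes(
    results: Iterable[Tuple[str, bool]], thresholds: Dict[str, float]
) -> List[str]:
    """Return the routes below their threshold; an empty list means the gate passes."""
    rates = per_route_pass_rates(results)
    failing = []
    for route, minimum in thresholds.items():
        # A route with no eval cases counts as failing, so coverage
        # gaps show up instead of passing silently.
        if rates.get(route, 0.0) < minimum:
            failing.append(route)
    return failing
```

With nine of ten Advanced RAG cases passing and five of ten no-retrieval cases passing, the aggregate pass rate is 70%, yet the no-retrieval route is the only one that fails a 0.85 threshold. That is the case an aggregate metric hides.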

[HOW WE DEPLOY]

Day one to live traffic.

Adaptive RAG deploys after the underlying modes are in place. The work is in classifier calibration and per-route quality gating.

01 · Ship the underlying modes
No-retrieval, Advanced RAG, Agentic RAG (or whichever subset applies). Eval each independently.
02 · Label a routing training set
Hand-label representative queries with their correct route. This is the source of truth for classifier calibration.
03 · Train or prompt the classifier
Lightweight LLM call or fine-tuned classifier. We default to a prompt-based classifier with a held-out eval set.
04 · Per-route eval gating
Each route gated separately. Routing misclassifications are caught when the wrong route is taken and quality drops.
05 · Production routing metrics
Distribution of routes watched in production. Drift on routing percentages triggers re-evaluation of the classifier.
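
One way to turn "drift on routing percentages" into a check is to compare the recent route mix with a baseline mix. The sketch below uses total variation distance. That metric and the 0.10 tolerance are assumptions chosen for illustration, not a prescribed method.

```python
from collections import Counter
from typing import Dict, Iterable

def route_shares(routes: Iterable[str]) -> Dict[str, float]:
    """Convert a stream of route labels into per-route shares of traffic."""
    counts = Counter(routes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {route: count / total for route, count in counts.items()}

def total_variation(baseline: Dict[str, float], current: Dict[str, float]) -> float:
    """Half the sum of absolute share differences; 0 means identical mixes."""
    keys = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(k, 0.0) - current.get(k, 0.0)) for k in keys)

def routing_drifted(
    baseline: Dict[str, float], recent_routes: Iterable[str], tolerance: float = 0.10
) -> bool:
    """True when the recent route mix has moved more than the tolerance from baseline."""
    return total_variation(baseline, route_shares(recent_routes)) > tolerance
```

A drift flag says only that the mix has changed. The cause could be a real shift in user traffic or a classifier problem, which is why the flag triggers re-evaluation against labelled queries rather than an automatic change.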

[ACCURACY + BENCHMARKS]

What the numbers say.

Adaptive RAG does not change the ceiling of accuracy; it changes the cost per query. The win is operational: same average quality, lower average cost.

40-60% lower

Average cost per query

Equal

Accuracy if routing is right

Variable

Latency by route

Critical

Classifier calibration
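
The cost effect is a weighted average: each route's share of traffic times its per-query cost, plus the classifier call that every query pays for. The relative costs below are made-up numbers for illustration only. They are not measurements from any deployment.

```python
from typing import Dict

def average_cost(
    shares: Dict[str, float],
    cost_per_query: Dict[str, float],
    classifier_cost: float = 0.0,
) -> float:
    """Expected cost per query for a given route mix, including classifier overhead."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    routed = sum(share * cost_per_query[route] for route, share in shares.items())
    return routed + classifier_cost

# Hypothetical relative costs per query (arbitrary units, assumptions only).
costs = {"no_retrieval": 1.0, "advanced_rag": 10.0, "agentic_rag": 40.0}

# Baseline: every query goes through Advanced RAG, with no classifier.
baseline = average_cost({"advanced_rag": 1.0}, costs)

# Routed: 60% of queries skip retrieval, and every query pays for the classifier.
routed = average_cost(
    {"no_retrieval": 0.6, "advanced_rag": 0.4}, costs, classifier_cost=0.5
)

saving = 1 - routed / baseline  # fraction of baseline cost saved under these assumptions
```

The saving depends on two quantities: the share of traffic that can safely take the cheap route, and the cost gap between routes. A homogeneous query distribution leaves only the classifier overhead.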

Our eval methodology

Adaptive RAG eval grades the routing decision (precision per route) separately from the per-route answer quality. Both matter; the routing decision is the new thing the eval must measure.
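
Precision per route can be computed directly from the labelled routing set. The sketch assumes each eval case yields a `(predicted_route, labelled_route)` pair; the function name and input shape are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def routing_precision(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Precision per route.

    For each route, this is the fraction of queries sent there whose
    hand-assigned label is that same route.
    """
    predicted: Counter = Counter()
    correct: Counter = Counter()
    for predicted_route, labelled_route in pairs:
        predicted[predicted_route] += 1
        if predicted_route == labelled_route:
            correct[predicted_route] += 1
    # Routes the classifier never predicted are omitted rather than
    # reported as zero.
    return {route: correct[route] / n for route, n in predicted.items()}
```

Low precision on the cheapest route is the costly failure: it means hard queries are being answered without retrieval. Low precision on an expensive route mostly means wasted spend.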

[COMMUNITY FEEDBACK]

What practitioners report.

Adaptive RAG became a standard cost-optimisation pattern in 2025-2026 as agentic RAG costs got large enough that teams started looking for ways to skip it on easy queries.

The practitioner consensus is that routing is the right cost lever once a build has heterogeneous query traffic. Teams report cost savings of 40-60% on average without quality regression. The risk is classifier miscalibration; the mitigation is conservative routing (default to a more expensive route when in doubt) plus per-route eval gating.

[COMMON PITFALLS]

Classifier miscalibration. Sends easy queries to expensive routes (waste) or hard queries to cheap routes (quality drop).
No per-route eval. Aggregate quality looks fine while one route is silently bad.
Routing on superficial query features. A long query is not necessarily a hard query.
No conservative defaults. When the classifier is unsure, route to the more expensive option, not the cheaper one.

[KENSINK LABS EVALUATION]

Our honest take.

We reach for Adaptive RAG on production workloads where the query distribution is wide enough that running every query through the same pipeline is wasteful.

We have shipped Adaptive RAG on customer-support workloads where 60% of queries were known-FAQ and only 40% needed real retrieval. Routing the easy 60% to a no-retrieval Claude Haiku call cut average cost by half. The classifier was the work; the modes were straightforward.

[WHEN WE REACH FOR IT]

Customer support workloads with a heavy FAQ tail.
Internal assistants serving diverse query types.
Cost-sensitive production builds with strong eval discipline.

What we'd substitute

Plain Advanced RAG when the query distribution is homogeneous (every query needs retrieval). Speculative RAG when latency matters more than cost.

[RELATED PATTERNS]

Worth a look next.

Speculative RAG

Agentic RAG

Advanced RAG

[COMMON QUESTIONS]

What buyers ask before they sign.

How does the classifier work?

A lightweight LLM call (Claude Haiku or similar) reads the query and outputs a structured route choice. The classifier is calibrated against a hand-labelled routing set, and per-route quality is gated in CI.

What if the classifier picks wrong?

Per-route eval gating catches it. Conservative defaults (when in doubt, take the more expensive route) limit the downside. We track routing precision per route in production and recalibrate the classifier when it drifts.

Adaptive RAG vs Agentic RAG?

Adaptive picks a route per query; Agentic plans across sources within a route. They compose: Adaptive can route complex queries to an agentic mode while skipping it for easy ones.

DIRECT RAG · APPLIED K

Bring the corpus. We'll bring the build.

Senior engineers, eval suite at handoff, full source ownership. We integrate against the model and the index the same way we integrate against Postgres. Sized to the work in front of you.
