Yan et al. (2024) introduced an explicit retrieval-evaluation step: after the initial retrieval, a lightweight evaluator rates the retrieved documents. Strong results go to generation; weak results trigger a fallback (web search or query rewrite). The corrective loop catches bad retrievals before they bleed through to bad answers.
Best for: Knowledge bases with a long tail of queries the corpus does not cover well. Customer support over an evolving product, news, legal research, academic literature search. Anywhere a confident 'I do not know' is worse than reaching for web evidence.
CRAG runs base retrieval, then a lightweight evaluator scores the result. Strong results pass through to generation. Weak results trigger a corrective branch: query rewrite, web search, or escalation to a richer retrieval mode. The corrective branch's evidence is decomposed, refined, and merged with whatever original passages were worth keeping.
Run Advanced RAG over the primary corpus. Returns top-K passages plus their fusion + rerank scores.
Lightweight model rates the result on confidence (strong / mediocre / weak). The original paper fine-tuned T5-large for this; prompt-only mid-tier LLMs work in production.
Pass through to generation with the original retrieved passages. No fallback work, no extra cost.
Trigger Tavily, Brave Search, or similar API. Retrieved web snippets go through decomposition (extract claims) and refinement (filter to query-relevant) before merging with any kept original passages.
Final answer prompt sees the union of strong original passages and refined web evidence. Citation discipline names each source.
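The five steps above can be sketched as a single routing function. This is a minimal sketch, not the paper's or any library's implementation: `Passage`, `CragResult`, `crag_retrieve`, and the four injected callables are hypothetical names, and the handling of a "mediocre" verdict (keep the original passages and add refined web evidence) is an assumption, since the steps above only spell out the strong and weak paths.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Passage:
    text: str
    source: str         # chunk id for internal evidence, URL for web evidence
    score: float = 0.0  # fusion/rerank score; unused for web evidence


@dataclass
class CragResult:
    verdict: str
    evidence: List[Passage] = field(default_factory=list)
    used_fallback: bool = False


def crag_retrieve(
    query: str,
    base_retrieve: Callable[[str], List[Passage]],
    evaluate: Callable[[str, List[Passage]], str],
    web_search: Callable[[str], List[Passage]],
    refine: Callable[[str, List[Passage]], List[Passage]],
) -> CragResult:
    """Route one query through base retrieval, evaluation, and fallback."""
    passages = base_retrieve(query)
    verdict = evaluate(query, passages)  # "strong" | "mediocre" | "weak"

    if verdict == "strong":
        # Strong path: no fallback work, original passages go to generation.
        return CragResult(verdict, passages, used_fallback=False)

    # Corrective branch: fetch web evidence, then decompose and refine it.
    web_evidence = refine(query, web_search(query))

    # Assumption: a mediocre verdict keeps the originals, a weak one drops them.
    kept = passages if verdict == "mediocre" else []
    return CragResult(verdict, kept + web_evidence, used_fallback=True)
```

Passing the retriever, evaluator, search client, and refiner in as callables keeps the routing logic testable with stubs before any real API is wired in.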
The CRAG stack is Advanced RAG plus a retrieval evaluator plus a web search API. The evaluator is the operational lever; the web search adds external evidence but also external cost and review complexity.
CRAG sits on top of normal retrieval. The corrective branch only fires when base retrieval is genuinely weak.
Cheap classifier-style call rates retrieval strength. Structured output: strong / mediocre / weak with rationale.
Managed web search APIs designed for LLM consumption. We pick per buyer constraint (data residency, content licensing, cost).
Web snippets get decomposed (extract claim-level statements) and refined (filter to query-relevant). Keeps the noise of raw web search from poisoning the generation prompt.
Citation-required prompt that handles both internal and external evidence. Internal evidence cited to chunk; external to URL.
Common weak queries get cached web results to amortise the cost of the corrective branch. Cache TTL aligned to corpus freshness needs.
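The evaluator's structured output mentioned in the stack above (a strong / mediocre / weak label with a rationale) could be validated along these lines. This is a sketch under assumptions: the JSON field names `label` and `rationale`, the `Verdict` type, and `parse_verdict` are hypothetical, and treating malformed output as "weak" is one possible policy, not something the stack description prescribes.

```python
import json
from dataclasses import dataclass

VALID_LABELS = {"strong", "mediocre", "weak"}


@dataclass(frozen=True)
class Verdict:
    label: str
    rationale: str


def parse_verdict(raw: str) -> Verdict:
    """Parse the evaluator's JSON reply into a validated Verdict.

    Assumption: malformed or unknown output is treated as "weak", so a
    broken evaluator reply triggers the corrective branch rather than
    silently passing a possibly bad retrieval through to generation.
    """
    try:
        data = json.loads(raw)
        label = str(data["label"]).strip().lower()
        rationale = str(data.get("rationale", ""))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return Verdict("weak", "unparseable evaluator output")

    if label not in VALID_LABELS:
        return Verdict("weak", f"unknown label: {label!r}")
    return Verdict(label, rationale)
```

Failing towards "weak" costs a web search on every parse error; a build that is more sensitive to spend than to missed fallbacks might fail towards "strong" instead and alert on the error rate.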
CRAG deploys in two phases: get the retrieval evaluator calibrated against the eval set, then plug in the corrective branch. The first phase is harder than it sounds; the second is mostly integration work.
Ship the Advanced RAG pipeline first. The eval set tells you which queries actually need a corrective branch vs. which are just naturally weak.
Lightweight evaluator (LLM call or fine-tuned classifier) rates retrieval strength. Calibrate on the eval set: target precision so we do not trigger fallback on retrievals that were actually fine.
Tavily / Brave / Bing API. Rate limits, cost controls, content-licensing review handled at integration time.
LLM-based filter on web snippets. Extracts claim-level evidence, drops off-topic noise. Eval-gated against a held-out web-fallback set.
Common weak queries cached. Per-query cost budget. Circuit breaker on the corrective branch if the daily web search budget is exhausted.
Eval set explicitly includes queries that should trigger fallback. We grade both the evaluator's decision and the final answer quality with the corrective branch enabled.
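The caching and circuit-breaker controls from the rollout steps above might be combined in one wrapper around the web search call. This is an illustrative sketch: `GuardedWebSearch`, `BudgetExhausted`, the flat `cost_per_call`, and the lowercase-and-strip cache key are all assumptions made for the example, not features of any particular search API.

```python
import time
from typing import Callable, Dict, List, Tuple


class BudgetExhausted(Exception):
    """Raised when the daily web search budget would be exceeded."""


class GuardedWebSearch:
    """Wraps a search callable with a TTL cache and a daily budget breaker."""

    def __init__(
        self,
        search: Callable[[str], List[str]],
        ttl_seconds: float,
        daily_budget: float,
        cost_per_call: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._search = search
        self._ttl = ttl_seconds
        self._budget = daily_budget
        self._cost = cost_per_call
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}
        self._spent = 0.0
        self._day = int(clock() // 86400)

    def __call__(self, query: str) -> List[str]:
        now = self._clock()
        key = query.strip().lower()  # assumption: simple normalisation

        # Serve from cache while the entry is younger than the TTL.
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]

        # Reset the spend counter when the (UTC) day rolls over.
        today = int(now // 86400)
        if today != self._day:
            self._day, self._spent = today, 0.0

        # Circuit breaker: refuse paid calls once the budget is used up.
        if self._spent + self._cost > self._budget:
            raise BudgetExhausted("daily web search budget exhausted")

        results = self._search(query)
        self._spent += self._cost
        self._cache[key] = (now, results)
        return results
```

A caller would catch `BudgetExhausted` and degrade to answering from the original passages, or to an explicit "not covered" response, rather than failing the request.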
CRAG is most useful where the eval set has a long tail of queries the corpus does not cover well. The headline metric is fallback precision (did the evaluator pick the right cases?) plus final answer quality on those cases.
CRAG eval grades the evaluator's decision (precision and recall on 'should fall back'), the corrective branch quality (web evidence refinement), and the final answer (faithfulness, completeness). We split metrics so a regression in any one component is locatable without rebuilding the whole pipeline.
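The first of those split metrics, precision and recall on the "should fall back" decision, can be computed from labelled eval rows. A minimal sketch, assuming each eval query carries a ground-truth `should_fall_back` flag and the evaluator's observed `did_fall_back` decision; `FallbackMetrics` and `fallback_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class FallbackMetrics:
    precision: float  # of the fallbacks fired, how many were warranted
    recall: float     # of the warranted fallbacks, how many fired
    true_pos: int
    false_pos: int    # over-triggers: paid for web search unnecessarily
    false_neg: int    # under-triggers: weak retrieval slipped through


def fallback_metrics(rows: Iterable[Tuple[bool, bool]]) -> FallbackMetrics:
    """rows holds (should_fall_back, did_fall_back) per eval query."""
    tp = fp = fn = 0
    for should, did in rows:
        if did and should:
            tp += 1
        elif did and not should:
            fp += 1
        elif should and not did:
            fn += 1
    # Report 0.0 rather than dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return FallbackMetrics(precision, recall, tp, fp, fn)
```

Keeping the raw false-positive and false-negative counts next to the ratios makes it easier to tell whether a regression is an over-trigger or an under-trigger problem.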
CRAG is becoming a standard add-on to Advanced RAG in 2026 production. The shape is small enough to add to an existing build without restructuring; the quality win lands on the queries where base retrieval was weak; the cost on the strong-retrieval path is limited to the cheap evaluator call.
Practitioners report the same trade we have seen: most queries do not fire the corrective branch, so the average cost addition is small. The queries that do fire get a meaningful answer-quality lift. The hard work is calibrating the evaluator so we do not over-trigger (paying for web search on queries the corpus actually answered fine) or under-trigger (missing weak retrievals that should have fallen back).
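One way to approach that calibration is a threshold sweep over the eval set. The sketch below assumes the evaluator emits a numeric weakness score per query (higher means weaker retrieval) and that fallback fires when the score is at or above a threshold; both the score and the `pick_threshold` helper are assumptions for illustration, since a label-only evaluator would need a different calibration method.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(
    scored: Sequence[Tuple[float, bool]],
    target_precision: float,
) -> Optional[float]:
    """Return the lowest threshold that meets the target fallback precision.

    scored holds (weakness_score, should_fall_back) per eval query.
    Fallback fires when weakness_score >= threshold. Recall can only drop
    as the threshold rises, so the lowest qualifying threshold also gives
    the best recall among thresholds that meet the precision target.
    Returns None when no threshold reaches the target.
    """
    for threshold in sorted({score for score, _ in scored}):
        # Ground-truth labels of the queries that would fire at this threshold.
        fired = [should for score, should in scored if score >= threshold]
        precision = sum(fired) / len(fired)  # fired is never empty here
        if precision >= target_precision:
            return threshold
    return None
```

A lower precision target trades more over-triggering (extra web search spend) for fewer missed weak retrievals; the right point depends on the per-query cost budget.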
We reach for CRAG when the corpus has a known long tail (time-sensitive content, evolving product docs, regulatory updates) and the buyer accepts the web-search dependency. The pattern is cheap on most queries and lifts answer quality on the long tail.
We have shipped CRAG on customer-support builds where the product docs lag the product. The corrective branch catches questions about newly shipped features before the docs are updated, with web evidence (often the company's own blog or release notes) cited in the answer. That bridges the docs-lag gap operationally without forcing the docs team to publish faster.
Reach for Self-RAG instead when the failure mode is bad generation more than bad retrieval, and for plain Advanced RAG when the long tail is not material to the workload.
Self-RAG is a sibling quality-check loop: CRAG focuses on retrieval, Self-RAG on the answer.
CRAG sits on top of Advanced RAG; the base retrieval still needs to be strong.
Adaptive RAG routes per query; CRAG falls back per query. The two are related shapes.