Kensink Labs
GraphRAG · Primary pattern · production default · Eval-gated
GRAPHRAG · KNOWLEDGE GRAPH RETRIEVAL

GraphRAG. Retrieval becomes traversal. Reasoning becomes paths.

Microsoft's GraphRAG and its descendants build a knowledge graph over the entities and relationships in the corpus. Retrieval is graph traversal; reasoning is path-following. In domains where the answer hops across linked facts, it hits 81%+ accuracy in published benchmarks, a 6.8% lift over Advanced RAG on the same tasks.

Neo4j · pgvector · Claude · OpenAI · Entity linking
Best for: Multi-hop · regulated reasoning
Stack: Graph DB + vector + LLM
Accuracy: 81%+ in regulated domains
Build cost: Graph construction is real work
[AT A GLANCE]

Best for: Clinical research, regulatory analysis, legal case files, complex multi-document compliance, fraud investigation, intelligence work. Anything where the answer is a path across linked entities, not a quote from one passage.

Origin: Microsoft Research, GraphRAG (Larson et al., 2024)
Year: 2024-2026
Complexity: Very complex
Production stage: Mature
[THE PIPELINE]

Build the graph offline, traverse it online.

GraphRAG splits cleanly into two halves. Offline: extract entities and relationships from the corpus, cluster into communities, summarise each community. Online: route the query to the right community, traverse the graph for relevant context, generate with the path as evidence.

Offline: Corpus → Entity extraction (LLM) → Relationship extraction (LLM) → Knowledge graph build → Community detection → Community summaries (LLM)
Online: Query → Community routing → Graph traversal → Synthesize with path → Cited multi-hop answer
01

Entity + relationship extraction

LLM passes over the corpus extracting named entities (people, organisations, drugs, regulations, transactions) and the relationships between them. Output is a typed graph schema.
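
A minimal sketch of what "typed" can mean at this step, assuming a small clinical schema. The entity types echo the ones named elsewhere on this page; the relationship names (`TESTED_IN`, `REPORTED_IN`, `GRANTED_FOR`) and the `validate_extraction` helper are hypothetical, for illustration only. The LLM call itself is out of scope; this only checks its output.

```python
from dataclasses import dataclass

# Hypothetical clinical schema: allowed entity types and, per relationship
# type, the (source type, target type) pair it may connect.
ENTITY_TYPES = {"Drug", "Trial", "AdverseEvent", "Approval"}
RELATION_TYPES = {
    "TESTED_IN": ("Drug", "Trial"),
    "REPORTED_IN": ("AdverseEvent", "Trial"),
    "GRANTED_FOR": ("Approval", "Drug"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    type: str


@dataclass(frozen=True)
class Relation:
    source: str
    type: str
    target: str


def validate_extraction(entities, relations):
    """Return a list of schema violations; an empty list means the record is valid."""
    errors = []
    types_by_name = {}
    for e in entities:
        if e.type not in ENTITY_TYPES:
            errors.append(f"unknown entity type {e.type!r} for {e.name!r}")
        types_by_name[e.name] = e.type
    for r in relations:
        if r.type not in RELATION_TYPES:
            errors.append(f"unknown relation type {r.type!r}")
            continue
        missing = [n for n in (r.source, r.target) if n not in types_by_name]
        for name in missing:
            errors.append(f"relation {r.type} references unknown entity {name!r}")
        if not missing:
            expected = RELATION_TYPES[r.type]
            actual = (types_by_name[r.source], types_by_name[r.target])
            if actual != expected:
                errors.append(f"{r.type} expects {expected}, got {actual}")
    return errors
```

Records that come back with a non-empty error list are the ones to log and re-run rather than load into the graph.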

02

Graph construction + community detection

Entities and edges loaded into a graph store. Hierarchical Leiden or similar clusters the graph into communities at multiple resolutions.
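
To make "multiple resolutions" concrete, here is a deliberately simplified stand-in. It is not Leiden: it takes connected components over edges at or above a weight threshold, so a low threshold yields coarse communities and a high one yields fine ones. The edge weights, thresholds, and function names are assumptions for illustration; a real build would call a Leiden implementation instead.

```python
from collections import defaultdict


def components(nodes, edges):
    """Connected components via iterative DFS; returns a list of sorted node lists."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(sorted(group))
    return result


def communities_by_resolution(weighted_edges, thresholds):
    """Map each threshold to the communities formed by edges at or above it."""
    nodes = {n for a, b, _ in weighted_edges for n in (a, b)}
    return {
        t: components(nodes, [(a, b) for a, b, w in weighted_edges if w >= t])
        for t in thresholds
    }


# Toy graph: two tight clusters joined by one weak edge (C-D).
EDGES = [("A", "B", 0.9), ("B", "C", 0.8), ("C", "D", 0.2), ("D", "E", 0.9)]
levels = communities_by_resolution(EDGES, thresholds=[0.1, 0.5])
# levels[0.1] is one coarse community; levels[0.5] splits it at the weak edge.
```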

03

Community summarisation

LLM writes a summary per community at each resolution. These summaries become the global retrieval surface; the underlying graph is the local one.

04

Query routing

Incoming query routed to the relevant community summaries via vector search over the summaries themselves. Picks the resolution that matches query specificity.
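
A rough sketch of the routing step, assuming summaries from every resolution sit in one index and each carries its `level`. The three-dimensional vectors are toy values standing in for real embeddings, and `route` is a hypothetical helper; in the stack described here the similarity ranking would be done by the vector store rather than in Python.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def route(query_vec, summaries, top_k=2):
    """Rank community summaries by similarity to the query embedding."""
    scored = [(cosine(query_vec, s["embedding"]), s) for s in summaries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:top_k]]


# Toy index: one coarse (level 0) summary and two finer (level 1) summaries.
SUMMARIES = [
    {"id": "c0", "level": 0, "embedding": [1.0, 1.0, 0.0]},
    {"id": "c1", "level": 1, "embedding": [1.0, 0.0, 0.0]},
    {"id": "c2", "level": 1, "embedding": [0.0, 1.0, 0.0]},
]

# A specific query lands closest to a fine-grained community, then the coarse one.
hits = route([0.9, 0.1, 0.0], SUMMARIES)
```

Because the winning summary carries its own level, the resolution is picked implicitly: broad questions tend to match coarse summaries, narrow ones match fine summaries.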

05

Graph traversal + synthesis

Selected communities expanded to the underlying entities and relationships. Synthesizer LLM writes the answer with explicit path evidence (entity A linked to entity B linked to entity C).
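
The "entity A linked to entity B linked to entity C" evidence can be sketched as a breadth-first search over typed edges. The triples, relation names, and both helper functions below are illustrative assumptions; a production build would issue the traversal against the graph store rather than an in-memory list.

```python
from collections import deque

# Hypothetical typed edges: (source, relation, target).
TRIPLES = [
    ("Regulation 12", "CONTAINS", "Clause 4"),
    ("Clause 4", "IMPOSES", "Reporting obligation"),
    ("Regulation 12", "AMENDED_BY", "Regulation 19"),
]


def find_path(triples, start, goal):
    """Breadth-first search along edge direction; shortest list of triples, or None."""
    outgoing = {}
    for s, rel, t in triples:
        outgoing.setdefault(s, []).append((s, rel, t))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for triple in outgoing.get(node, []):
            nxt = triple[2]
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [triple]))
    return None


def format_evidence(path):
    """Render a path as 'A -[REL]-> B -[REL]-> C' for the synthesis prompt."""
    if not path:
        return ""
    parts = [path[0][0]]
    for _, rel, target in path:
        parts.append(f"-[{rel}]-> {target}")
    return " ".join(parts)


path = find_path(TRIPLES, "Regulation 12", "Reporting obligation")
evidence = format_evidence(path)
```

The rendered string is what goes into the synthesis prompt alongside the source passages, so the model can cite the chain rather than reconstruct it.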

[TECHNICAL STACK]

What we'd actually deploy.

GraphRAG is the most stack-heavy pattern in the list. You operate a graph store, a vector store, and substantial LLM compute for the offline build. The deploy is worth the build cost only in domains where the multi-hop accuracy gain is the product.

GRAPH STORE

Neo4j or pgRouting (Postgres extension)

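Neo4j for graph-first workloads; pgRouting for Postgres-first teams who want to keep the graph in the same DB. Both cover the traversal we need. Leiden community detection is available through Neo4j's Graph Data Science library; with a Postgres-backed graph, plan to run it as a separate job outside the database.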

VECTOR STORE

pgvector for community summaries

Community summaries are the global retrieval surface. They live in pgvector for cheap routing. Underlying entity graph lives in the graph store.

EXTRACTION LLM

Claude Sonnet or GPT-5.5

Mid-tier model for entity and relationship extraction. Run in batch on the corpus, output validated against a typed schema. Cheap per token, large bill across a real corpus.
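
The "cheap per token, large bill" point is simple arithmetic. The sketch below is a back-of-envelope estimator; every number in the example call (corpus size, tokens per document, output ratio, prices per million tokens) is a placeholder assumption, not a vendor price or a figure from a real engagement.

```python
def extraction_cost(num_docs, tokens_per_doc, output_ratio,
                    price_in_per_mtok, price_out_per_mtok, passes=1):
    """Estimate batch extraction spend, in the same currency as the prices.

    output_ratio is output tokens generated per input token read;
    passes counts how many times each document is sent through the model.
    """
    input_tokens = num_docs * tokens_per_doc * passes
    output_tokens = input_tokens * output_ratio
    total = input_tokens * price_in_per_mtok + output_tokens * price_out_per_mtok
    return total / 1_000_000


# Placeholder inputs: 100k documents of 4k tokens, one pass,
# 0.25 output tokens per input token, 3.0 in / 15.0 out per million tokens.
estimate = extraction_cost(100_000, 4_000, 0.25, 3.0, 15.0)
```

Re-runs with refined prompts and a second relationship pass both multiply through `passes`, which is usually where the estimate grows.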

SYNTHESIS LLM

Claude Opus or GPT-5.5 (high effort)

Final-answer model is the higher tier. Synthesises across the traversal path, maintains citation discipline.

COMMUNITY DETECTION

Hierarchical Leiden

Standard 2026 default. Produces communities at multiple resolutions so query routing can pick specificity per question.

GRAPH SCHEMA

Typed, domain-specific

Entity and relationship types tuned to the domain (Drug, Trial, Adverse Event, Approval for clinical; Party, Contract, Clause, Obligation for legal). Generic schemas rarely earn the build.

[HOW WE DEPLOY]

Day one to live traffic.

GraphRAG deploys as a 12-16 week first build because graph construction over a real corpus is non-trivial. We size it as two phases: extract and validate the graph (slow), then build the retrieval layer (relatively fast once the graph is right).

  1. Domain schema

    Define the entity and relationship types with domain experts. The schema is the most expensive thing to change later; we get it right before any extraction runs.

  2. Extraction pipeline

    LLM batch extraction over the corpus, structured output validated against the schema. Failed extractions logged and re-run with refined prompts; the bottom 5% of extractions usually need human review.

  3. Graph build + validation

    Load into the graph store. Validate against domain expert sample queries before going further. Adjustments to schema or extraction happen here, not later.

  4. Community detection at multiple resolutions

    Hierarchical Leiden produces clusters from very local to global. Each level becomes a retrieval surface for different query specificity.

  5. Community summary generation

    LLM summarises each community. Summaries embedded into pgvector. Manual review of a sample to catch bad summaries before production.

  6. Query router

    Vector search over summaries with resolution-aware ranking. Routes to the level of community detail that matches the question.

  7. Graph traversal + synthesis

    Expand selected communities, retrieve entities + relationships, synthesize with path-aware prompts. Output cites both passages and traversal paths.

  8. Eval gating + corpus drift

    Domain-expert golden set, accuracy gated in CI. Corpus updates trigger graph rebuilds (full or incremental) on a schedule the eval set defines.
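
For the last step, one way to decide between a full and an incremental rebuild is to fingerprint the schema: new documents alone allow an incremental update, while any schema change forces a full rebuild (the same rule of thumb given in the common questions further down). The schema shape and both function names are hypothetical.

```python
import hashlib
import json


def schema_fingerprint(schema):
    """Stable hash of the schema so structural changes are detectable."""
    canonical = json.dumps(schema, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_refresh(previous_schema, current_schema, new_doc_ids):
    """Return ('full', []), ('incremental', doc_ids) or ('none', [])."""
    if schema_fingerprint(previous_schema) != schema_fingerprint(current_schema):
        # New entity types, renamed relationships, migrations: rebuild everything.
        return ("full", [])
    if new_doc_ids:
        # Same schema, new documents: extract and merge only those.
        return ("incremental", sorted(new_doc_ids))
    return ("none", [])


old_schema = {"entities": ["Drug", "Trial"]}
new_schema = {"entities": ["Drug", "Trial", "Approval"]}
plan = plan_refresh(old_schema, new_schema, ["doc-17"])
```

Whichever branch fires, the golden set still has to pass before the refreshed graph serves traffic.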

[ACCURACY + BENCHMARKS]

What the numbers say.

Microsoft's published GraphRAG paper and its open-source implementation report consistent accuracy lifts on multi-hop benchmarks. Independent reproductions confirm the win in domains where multi-hop reasoning is the question.

81%+ · Accuracy in regulated domains · Multiple 2025-2026 reports
+6.8% · vs Advanced RAG on multi-hop tasks · GraphRAG benchmark, 2024
Path-evidence · Every claim traceable to a graph path · By construction
Rebuilds · Graph drift on corpus updates · Operational reality

Our eval methodology

We eval GraphRAG with a domain-expert curated golden set of multi-hop questions, where the expected answer cites a traversal path. Path correctness graded separately from final-answer faithfulness; both gate ship. Single-hop questions are kept in the set as a control to make sure we have not regressed on the easy cases.
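
A minimal sketch of the two-metric gate, assuming each golden item stores an expected traversal path and each prediction carries the path it cited plus a faithfulness verdict from a separate grader (human or model). The thresholds, field names, and the exact-match rule for paths are illustrative assumptions.

```python
def gate(golden, predictions, min_path=0.8, min_faithful=0.9):
    """Score path correctness and answer faithfulness separately; both must pass."""
    path_hits = 0
    faithful_hits = 0
    for item in golden:
        pred = predictions.get(item["id"], {})
        # Exact match on the cited path; a missing prediction counts as a miss.
        path_hits += pred.get("path") == item["expected_path"]
        faithful_hits += bool(pred.get("faithful"))
    total = len(golden)
    path_acc = path_hits / total if total else 0.0
    faithful_acc = faithful_hits / total if total else 0.0
    return {
        "path_accuracy": path_acc,
        "faithfulness": faithful_acc,
        "ship": path_acc >= min_path and faithful_acc >= min_faithful,
    }


GOLDEN = [
    {"id": "q1", "expected_path": ["A", "B", "C"]},  # multi-hop
    {"id": "q2", "expected_path": ["A", "B"]},       # single-hop control
]
PREDICTIONS = {
    "q1": {"path": ["A", "B", "C"], "faithful": True},
    "q2": {"path": ["A", "C"], "faithful": True},
}
report = gate(GOLDEN, PREDICTIONS)
```

Here the answers are faithful but one cited path is wrong, so the gate blocks the release; running the same function over only the single-hop subset gives the control check.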

[COMMUNITY FEEDBACK]

What practitioners report.

GraphRAG went from research to production fast. Microsoft's open-source release set the reference shape; LangChain, LlamaIndex, and Neo4j ship native variants. Gartner lists it in the 2026 top data and analytics trends.

Practitioners report the same trade-off. The accuracy gain is real on multi-hop and regulated domains, sometimes dramatic. The build cost is also real: graph construction takes time, the LLM bill for extraction over a corpus of any real size is substantial, and corpus changes require either incremental updates or periodic full rebuilds. Teams that ship GraphRAG successfully treat it as an offline-heavy / online-light split, where the expensive work happens during build and refresh, not at query time.

[COMMON PITFALLS]
  • Generic entity types. A graph of (Entity, RelatedTo, Entity) is not very useful; the schema needs to be domain-specific.
  • Ignoring extraction error rates. Bottom 5% of LLM extractions are usually wrong in interesting ways; they need a review path.
  • Treating the graph as immutable. Corpora change; the graph needs scheduled refresh aligned to the eval set's stability.
  • Using GraphRAG when the eval set does not actually require multi-hop. If single-shot retrieval is winning, GraphRAG will not magically help.
[KENSINK LABS EVALUATION]

Our honest take.

We reach for GraphRAG when the eval set demonstrably requires multi-hop reasoning, the corpus has strong entity structure, and the buyer cares about path-evidence in answers. All three need to be true.

The compounding effect we keep seeing: a domain that has clear entity structure (regulations, clinical, legal) also tends to be one where path-evidence matters in the answer (a compliance auditor wants the chain of regulations, not a paraphrase). That alignment is what makes the build cost pay off. In domains where entities exist but path-evidence does not matter, Advanced RAG with good metadata filters usually wins on cost.

[WHEN WE REACH FOR IT]
  • Clinical research and drug-safety analysis where adverse events link across trials and reports.
  • Regulatory compliance where the answer chain goes regulation to clause to obligation.
  • Multi-document legal case files where the question crosses statutes, prior cases, and exhibits.
  • Intelligence and fraud work where entity relationships are the signal.

What we'd substitute

Advanced RAG with strong metadata filtering when the corpus has entity structure but the queries do not actually require multi-hop. Agentic RAG when the corpus is heterogeneous and the question is which source rather than which path.

[COMMON QUESTIONS]

What buyers ask before they sign.

How big does the corpus need to be for GraphRAG to pay off?
Less about size than structure. We have shipped GraphRAG over corpora of a few thousand documents where entity density was high. We have walked away from GraphRAG on corpora of millions of documents where entity density was low. The eval set tells you.
Can we incrementally update the graph as the corpus changes?
Yes for new documents added to existing communities. No for structural changes (new entity types, renamed relationships, schema migrations). Plan for periodic full rebuilds aligned with major corpus shifts.
Neo4j vs Postgres for the graph?
Neo4j when the team is comfortable operating it and the workload is graph-first. Postgres with pgRouting when the team already runs Postgres and the graph workload is one of many. Both ship GraphRAG-quality results.
How do we explain GraphRAG to a non-technical buyer?
We say: the model gets to follow the chain of facts, not just look at one passage at a time. That maps directly to how a clinician, a regulator, or a lawyer actually reasons about their work.
DIRECT RAG · APPLIED K

Bring the corpus. We'll bring the build.

Senior engineers, eval suite at handoff, full source ownership. We integrate against the model and the index the same way we integrate against Postgres. Sized to the work in front of you.