★ Vector databases · Direct LLM · no framework · Production grade

RAG · VECTOR DATABASES · 2026 COMPARISON

The seven vector databases. Honest 2026 comparison and our default.

pgvector, Qdrant, Milvus, Weaviate, Vespa, LanceDB, Pinecone. The selection matrix you actually need, written by a lab that ships RAG on top of these systems. We default to pgvector and name the three signals that tell us to leave it.

pgvector · PostgreSQL · Qdrant · Milvus · Weaviate · Vespa


DBs compared

pgvector, Qdrant, Milvus, Weaviate, Vespa, LanceDB, Pinecone

Our default

pgvector (HNSW)

Scale ceiling

Per-DB, named below

Discipline

Eval-driven, not benchmark-driven

Seven vector databases, five dimensions.

The dimensions that change the build decision: where the DB sits in scale, whether hybrid is native, what the ops model looks like, published p99 at 10M vectors, and our take. Our default is highlighted.

Database	Scale sweet spot	Hybrid (native vector + lexical, fused)	Ops model	p99 @ 10M (2026 published)	Our take
pgvector (Postgres extension · open source)	Up to ~10M vectors comfortably; ~50M with care	Yes, assembled in the app (Postgres full-text search via tsvector + RRF; ts_rank is not true BM25)	One Postgres. Your existing backups, your existing IdP.	Comparable to dedicated DBs at <1M with HNSW	Default. Schemas, embeddings, metadata, and access control in one DB. We move off only when scale or workload genuinely demands it.
Qdrant (Rust · open source · managed available)	10M to 1B vectors, fast filtering on payloads	Yes (native, v1.10+)	Self-host or Qdrant Cloud. Kubernetes operator.	~12ms p99 at 10M, fastest open source in published benchmarks	First choice when leaving pgvector. Strong payload filtering, low memory floor, hybrid is first-class.
Milvus (C++/Go · open source · GPU-accelerated)	100M to 10B+. The billion-scale option.	Yes (native, 2.5+)	Heavier ops surface; complex topology (coords, datanodes, querynodes).	~18ms p99 at 10M; built for the billion-scale tier	Pick at billion-scale. Otherwise the operational tax doesn't pay back vs Qdrant.
Weaviate (Go · open source · GraphQL API)	Up to ~50M comfortably; memory-hungry past that	Yes (native)	Easy single-binary or Kubernetes. Good docs, graph-aware schema.	~16ms p99 at 10M	Strong default when the data model is genuinely graph-shaped and BM25 + dense fusion is required out of the box.
Vespa (Java · open source · Yahoo-scale)	Billions of vectors with native hybrid + structured filtering	Yes (built around it; ColBERT, late interaction)	Steeper learning curve; application-package configuration and the YQL query language.	Tuned for sub-100ms across hybrid + rerank at large scale	Real answer for hyperscale search + ranking + vector in one engine. Pick when search relevance matters as much as retrieval.
LanceDB (Rust · embedded · columnar Lance format)	Embedded use cases; up to tens of millions	Yes (native, 2026)	Embedded or serverless. No daemon. Tight Python integration.	Fast on local / SSD-attached workloads	Underrated for AI agents and edge-deployed RAG. When the index travels with the app, this is the answer.
Pinecone (closed source · managed only)	Up to billions on the higher tiers	Yes (native)	Fully managed. No infra to run. Vendor lock-in.	Serverless tier trades latency for convenience	Pick when the customer wants zero ops AND has signed off on the lock-in. Otherwise Qdrant Cloud or pgvector gets the same outcome without the licence cost.

Latency numbers from published 2026 benchmarks; your distance from the metal and your filter complexity will move them. Always validate on your workload before committing.
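Validating on your own workload mostly means timing your real queries and reading the tail, not the mean. A minimal sketch of that measurement, assuming a hypothetical `search` callable that wraps whichever candidate DB client is under test (the function names here are ours, not any vendor's API):

```python
import math
import time
from typing import Callable, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number inputs stay exact in floating point.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(search: Callable[[str], object], queries: Sequence[str]) -> list[float]:
    """Run one search per query and return wall-clock latencies in milliseconds."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        search(query)  # hypothetical: your pgvector / Qdrant / Milvus call goes here
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies
```

Run it against a sample of production queries, with your real filters attached, and compare `percentile(latencies, 99)` across candidates. This sketch times queries one at a time, so it says nothing about behaviour under concurrent load.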

[THE DECISION]

The four-step selection.

Read the row that matches your corpus. The default is the starting point; the others earn the build when a real signal tells us they do.

Default: < ~5M vectors

Postgres + pgvector + HNSW. Hybrid via tsvector full-text search. Reciprocal-rank fusion in the app. One DB.

Scale up: 5M to ~500M

Qdrant (self-host or Cloud). Hybrid native, fast payload filters, lower memory floor than Weaviate. Postgres still owns the source of truth + metadata.

Billion-scale: 500M to multi-B

Milvus for pure vector workloads at GPU-accelerated scale. Vespa when hybrid search + reranking + structured filtering all need to run in one engine.

Edge / embedded

LanceDB when the index has to travel with the app (offline agents, on-device search, mobile RAG).
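The four rows above reduce to a small decision function. This is only a sketch of the starting point: the thresholds are the approximate figures from the rows, and the function and flag names are hypothetical. Real selection still goes through an eval on your own data.

```python
def pick_vector_db(
    num_vectors: int,
    embedded: bool = False,
    needs_single_engine_hybrid_rerank: bool = False,
) -> str:
    """Return the starting-point database for a corpus, following the four rows above."""
    if num_vectors < 0:
        raise ValueError("num_vectors must be non-negative")
    if embedded:
        # Edge / embedded: the index has to travel with the app.
        return "LanceDB"
    if num_vectors < 5_000_000:
        # Default tier: Postgres + pgvector + HNSW.
        return "pgvector"
    if num_vectors < 500_000_000:
        # Scale-up tier; Postgres keeps the source of truth and metadata.
        return "Qdrant"
    # Billion-scale tier: Vespa when hybrid + rerank + filtering must share one engine.
    return "Vespa" if needs_single_engine_hybrid_rerank else "Milvus"


print(pick_vector_db(2_000_000))
print(pick_vector_db(80_000_000))
print(pick_vector_db(3_000_000_000, needs_single_engine_hybrid_rerank=True))
```

The boundaries are soft. A corpus near 5M vectors with light query load can reasonably stay on pgvector, which is why the cut-offs belong in a reviewable function rather than in anyone's head.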

[WHAT YOU GET]

What you get from a Kensink RAG VDB build.

Right-sized

DB picked to your real corpus

HNSW

Tuned to your recall/latency target

Hybrid

Vector + BM25 with RRF

Portable

Migration path priced before commit
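For the hybrid item above, reciprocal-rank fusion is small enough to live in application code. A minimal sketch, assuming a hypothetical `docs` table with an `embedding` vector column and a `tsv` tsvector column. The SQL strings are illustrative only and are not executed here. The lexical query uses Postgres full-text ranking, which is not BM25. `k = 60` is the commonly used RRF constant, not a tuned value.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence

# Hypothetical schema: docs(id, body, tsv tsvector, embedding vector).
HNSW_INDEX_SQL = "CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);"
VECTOR_SQL = "SELECT id FROM docs ORDER BY embedding <=> %s::vector LIMIT 50;"
LEXICAL_SQL = (
    "SELECT id FROM docs WHERE tsv @@ plainto_tsquery('english', %s) "
    "ORDER BY ts_rank_cd(tsv, plainto_tsquery('english', %s)) DESC LIMIT 50;"
)


def rrf_fuse(rankings: Iterable[Sequence[Hashable]], k: int = 60, top_n: int = 10) -> list:
    """Reciprocal-rank fusion: score(doc) = sum over result lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id repr so output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], repr(item[0])))
    return [doc_id for doc_id, _ in ordered[:top_n]]


vector_hits = ["d3", "d1", "d7"]   # ids from the vector query, best first
lexical_hits = ["d1", "d9", "d3"]  # ids from the full-text query, best first
print(rrf_fuse([vector_hits, lexical_hits]))  # ['d1', 'd3', 'd9', 'd7']
```

RRF only looks at ranks, so the cosine distances and the `ts_rank_cd` scores never need to be put on a common scale. That is the main reason it is a comfortable default for in-app fusion.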

[COMMON QUESTIONS]

What buyers ask before they sign.

Why is pgvector your default when Pinecone is more popular?

Three reasons. First, pgvector with HNSW matches or beats dedicated vector databases up to ~1M vectors on equivalent compute (Supabase 2026 benchmarks). Second, your Postgres already has your text, your metadata, your access control, and your backups; one database is fewer failure modes than two. Third, Pinecone is closed source and vendor-locked. The day you outgrow it (or its pricing), the migration is non-trivial. pgvector is portable to any of the dedicated DBs compared above when you actually need them.

When do you leave pgvector?

Three signals: (1) the corpus crosses ~10M vectors with HNSW and recall starts to slide, (2) query throughput exceeds what one Postgres can serve without sharding, or (3) the workload needs first-class hybrid search at sub-100ms p95. Qdrant is almost always the next step. Milvus, Vespa, and Weaviate enter the conversation when scale or workload shape demands it.

Qdrant vs Weaviate?

Qdrant is faster (~12ms p99 vs ~16ms at 10M in 2026 benchmarks), uses less memory, and has stronger payload filtering. Weaviate has a better story for graph-shaped data and a richer GraphQL API. We default to Qdrant for general RAG; pick Weaviate when the data is genuinely graph-native and the team wants the GraphQL ergonomics.

Should we use Vespa instead of building hybrid + rerank ourselves?

Yes, if you have hyperscale search workloads where vector retrieval, BM25, and learned reranking all need to run in one engine at sub-100ms across billions of documents. That is the Yahoo / Spotify / Bing-class shape. For most production RAG (< 100M vectors), assembling pgvector + Cohere Rerank in your app is simpler.

What about MongoDB Atlas Vector Search, Elasticsearch dense_vector, Redis vector?

Each is a good answer if you're already running that stack. MongoDB Atlas Vector Search is reasonable when Mongo is the source of truth. Elasticsearch dense_vector is reasonable when ES already serves your search. Redis vector is fine for small caches. We don't reach for them as new deployments because each has scale and operational ceilings the dedicated VDBs don't.
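The first signal in the pgvector answer above, recall starting to slide, is measurable: run the same queries once through the HNSW index and once as an exact scan, then compare the ids. A minimal sketch of the comparison step. The function name is ours, and both result sets are assumed to be lists of ids ordered best first.

```python
from typing import Hashable, Sequence


def recall_at_k(
    approx: Sequence[Sequence[Hashable]],
    exact: Sequence[Sequence[Hashable]],
    k: int = 10,
) -> float:
    """Mean fraction of the exact top-k neighbours that the ANN index also returned in its top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not exact or len(approx) != len(exact):
        raise ValueError("need one ANN result list per exact result list")
    total = 0.0
    for ann_ids, true_ids in zip(approx, exact):
        truth = set(true_ids[:k])
        if not truth:
            raise ValueError("each exact result list must be non-empty")
        total += len(truth & set(ann_ids[:k])) / len(truth)
    return total / len(exact)


ann_results = [[1, 2, 3], [4, 5, 6]]    # ids returned through the HNSW index
exact_results = [[1, 2, 9], [6, 5, 4]]  # ids from an exact (brute-force) scan
print(round(recall_at_k(ann_results, exact_results, k=3), 3))  # 0.833
```

As we understand pgvector's documentation, the exact baseline can be produced by disabling index scans for a transaction (`SET LOCAL enable_indexscan = off`), and `hnsw.ef_search` is the query-time knob that trades latency for recall. If recall stays below target even after raising it, that is the signal to start evaluating the next tier.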


Bring the corpus. We will pick the DB.

We will run your real query distribution against the candidate DBs, name the trade, and ship the build on the one that wins for your data. Not the one that wins the leaderboard.
