Kensink Labs
RAG · VECTOR DATABASES · 2026 COMPARISON

The seven vector databases. Honest 2026 comparison and our default.

pgvector, Qdrant, Milvus, Weaviate, Vespa, LanceDB, Pinecone. The selection matrix you actually need, written by a lab that ships RAG on top of these systems. We default to pgvector and name the three signals that tell us to leave it.

DBs compared: pgvector, Qdrant, Milvus, Weaviate, Vespa, LanceDB, Pinecone
Our default: pgvector (HNSW)
Scale ceiling: Per DB, named below
Discipline: Eval-driven, not benchmark-driven

Seven vector databases, five dimensions.

The dimensions that change the build decision: where the DB sits in scale, whether hybrid is native, what the ops model looks like, published p99 latency at 10M vectors, and our take. Our default, pgvector, comes first.

Hybrid native = vector + lexical search, fused in the engine. Latency = p99 at 10M vectors, 2026 published figures.

pgvector · Postgres extension · open source (our default)
Scale sweet spot: Up to ~10M vectors comfortably; ~50M with care.
Hybrid native: Assembled in the app, not in the engine: Postgres full-text search (tsvector, ranked with ts_rank, which is not BM25) fused with vector results via RRF.
Ops model: One Postgres. Your existing backups, your existing IdP.
p99 @ 10M: Comparable to dedicated DBs at <1M with HNSW.
Our take: Default. Schemas, embeddings, metadata, and access control in one DB. We move off only when scale or workload genuinely demands it.

Qdrant · Rust · open source · managed available
Scale sweet spot: 10M to 1B vectors, fast filtering on payloads.
Hybrid native: Yes (native, v1.9+).
Ops model: Self-host or Qdrant Cloud. Kubernetes operator.
p99 @ 10M: ~12ms; the fastest open-source result in published benchmarks.
Our take: First choice when leaving pgvector. Strong payload filtering, low memory floor, first-class hybrid.

Milvus · C++/Go · open source · GPU-accelerated
Scale sweet spot: 100M to 10B+. The billion-scale option.
Hybrid native: Yes (native, 2.5+).
Ops model: Heavier ops surface; complex topology (coordinators, data nodes, query nodes).
p99 @ 10M: ~18ms; built for the billion-scale tier.
Our take: Pick at billion scale. Below that, the operational tax doesn't pay back against Qdrant.

Weaviate · Go · open source · GraphQL API
Scale sweet spot: Up to ~50M comfortably; memory-hungry past that.
Hybrid native: Yes (native).
Ops model: Easy single binary or Kubernetes. Good docs, graph-aware schema.
p99 @ 10M: ~16ms.
Our take: Strong default when the data model is genuinely graph-shaped and BM25 + dense fusion is required out of the box.

Vespa · Java · open source · Yahoo-scale
Scale sweet spot: Billions of vectors with native hybrid + structured filtering.
Hybrid native: Yes; the engine is built around it (ColBERT, late interaction).
Ops model: Steeper learning curve; configuration through application packages and schemas, queries in YQL.
p99 @ 10M: Tuned for sub-100ms across hybrid + rerank at large scale.
Our take: The real answer for hyperscale search + ranking + vector in one engine. Pick when search relevance matters as much as retrieval.

LanceDB · Rust · embedded · columnar (Lance format)
Scale sweet spot: Embedded use cases; up to tens of millions.
Hybrid native: Yes (native, 2026).
Ops model: Embedded or serverless. No daemon. Tight Python integration.
p99 @ 10M: Fast on local / SSD-attached workloads.
Our take: Underrated for AI agents and edge-deployed RAG. When the index travels with the app, this is the answer.

Pinecone · closed source · managed only
Scale sweet spot: Up to billions on the higher tiers.
Hybrid native: Yes (native).
Ops model: Fully managed. No infra to run. Vendor lock-in.
p99 @ 10M: Serverless tier trades latency for convenience.
Our take: Pick when the customer wants zero ops AND has signed off on the lock-in. Otherwise Qdrant Cloud or pgvector delivers the same outcome without the lock-in.

Latency numbers from published 2026 benchmarks; your distance from the metal and your filter complexity will move them. Always validate on your workload before committing.
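Validating on your workload means measuring recall and tail latency on your own queries. A minimal sketch, assuming you can call both the ANN index (`search_fn`) and an exact brute-force search (`exact_fn`) over your real query set; both callables and all function names here are hypothetical, not any database's API. It reports mean recall@k against the exact results and p99 latency using a nearest-rank percentile.

```python
import math
import time


def recall_at_k(approx_ids, exact_ids, k):
    """Share of the exact top-k neighbours that the ANN index also returned in its top-k."""
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # nothing to find counts as perfect recall
    return len(truth.intersection(approx_ids[:k])) / len(truth)


def percentile(samples, pct):
    """Nearest-rank percentile: pct=99 gives p99."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def evaluate(search_fn, exact_fn, queries, k=10):
    """Mean recall@k and p99 latency (ms) of search_fn over your real queries.

    search_fn(query, k) and exact_fn(query, k) are hypothetical callables that
    return ranked document ids; only search_fn is timed.
    """
    recalls, latencies_ms = [], []
    for query in queries:
        start = time.perf_counter()
        approx = search_fn(query, k)
        latencies_ms.append((time.perf_counter() - start) * 1000)
        recalls.append(recall_at_k(approx, exact_fn(query, k), k))
    return sum(recalls) / len(recalls), percentile(latencies_ms, 99)
```

Run it once per candidate index setting (for pgvector HNSW, for example, different `hnsw.ef_search` values) and keep the cheapest setting that clears both your recall target and your p99 budget.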

[THE DECISION]

The four-step selection.

Read the row that matches your corpus. The default is the starting point; the others earn the build when a real signal tells us they do.

01

Default — < ~5M vectors

Postgres + pgvector + HNSW. Hybrid via Postgres full-text search (tsvector). Reciprocal-rank fusion in the app. One DB.
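A minimal sketch of the default tier, assuming a hypothetical `chunks` table with a precomputed `tsv tsvector` column and a `vector(1024)` embedding column. The SQL strings use real pgvector and Postgres syntax (`USING hnsw`, `vector_cosine_ops`, the `<=>` cosine-distance operator, `plainto_tsquery`, `ts_rank`) and would be executed by a driver such as psycopg; `ts_rank` is Postgres's own ranking, not BM25. `rrf_fuse` is a hypothetical helper implementing reciprocal-rank fusion: each document scores the sum of 1/(k + rank) over the lists it appears in, with the conventional k = 60.

```python
from collections import defaultdict

# Hypothetical schema: chunks(id bigint, body text, tsv tsvector, embedding vector(1024)).
# HNSW index on the embedding column with cosine distance (pgvector 0.5.0+).
CREATE_HNSW = "CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);"

# Dense leg: nearest neighbours by cosine distance via pgvector's <=> operator.
DENSE_SQL = "SELECT id FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s;"

# Lexical leg: Postgres full-text search, ranked by ts_rank (not BM25).
LEXICAL_SQL = """
SELECT id FROM chunks
WHERE tsv @@ plainto_tsquery('english', %s)
ORDER BY ts_rank(tsv, plainto_tsquery('english', %s)) DESC
LIMIT %s;
"""


def rrf_fuse(ranked_lists, k=60, top_n=10):
    """Reciprocal-rank fusion: each doc scores sum(1 / (k + rank)) over the lists it appears in."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_n]]


# Ids as they might come back from the two queries above.
dense_ids = [3, 1, 2]
lexical_ids = [1, 4, 3]
print(rrf_fuse([dense_ids, lexical_ids]))  # [1, 3, 4, 2]
```

Document 1 wins because it ranks near the top of both lists. RRF only looks at ranks, so the cosine distances and `ts_rank` values never have to be put on a common scale.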

02

Scale up — 5M to ~500M

Qdrant (self-host or Cloud). Hybrid native, fast payload filters, lower memory floor than Weaviate. Postgres still owns the source of truth + metadata.

03

Billion-scale — 500M to multi-B

Milvus for pure vector workloads at GPU-accelerated scale. Vespa when hybrid search + reranking + structured filtering all need to run in one engine.

04

Edge / embedded

LanceDB when the index has to travel with the app (offline agents, on-device search, mobile RAG).
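The four tiers reduce to a small decision function. This is a sketch only: `pick_vector_db` and its parameters are hypothetical, the 5M and 500M cut-offs are the approximate boundaries of the tiers above, and in practice the move off pgvector is triggered by measured signals (recall, throughput, hybrid latency), not by a vector count alone.

```python
def pick_vector_db(n_vectors, embedded=False, hybrid_rerank_in_one_engine=False):
    """Map a corpus to the tier that is the starting point for the build.

    Cut-offs are the approximate (~) boundaries of the four tiers.
    """
    if embedded:  # the index travels with the app
        return "LanceDB"
    if n_vectors < 5_000_000:
        return "pgvector"
    if n_vectors < 500_000_000:
        return "Qdrant"
    # Billion scale: Vespa when hybrid + rerank + filters must share one engine.
    return "Vespa" if hybrid_rerank_in_one_engine else "Milvus"
```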

[WHAT YOU GET]

What you get from a Kensink RAG VDB build.

Right-sized: DB picked to your real corpus
HNSW: Tuned to your recall/latency target
Hybrid: Vector + BM25 with RRF
Portable: Migration path priced before commit

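For reference, the lexical leg that native-hybrid engines fuse with dense results is typically BM25. A minimal sketch of Okapi BM25 over pre-tokenised documents, assuming the common defaults k1 = 1.2 and b = 0.75 and the Lucene-style IDF (which stays positive); `bm25_scores` is a hypothetical name, and production engines add tokenisation, stemming, and an inverted index on top.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each pre-tokenised doc for the query, with Lucene-style IDF."""
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many docs each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b).
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


docs = [["vector", "database"], ["postgres", "database"], ["cat"]]
print(bm25_scores(["vector"], docs))  # only the first doc scores above zero
```

Sorting document ids by these scores gives the ranked lexical list that RRF then fuses with the dense list.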
[COMMON QUESTIONS]

What buyers ask before they sign.

Why is pgvector your default when Pinecone is more popular?
Three reasons. First, pgvector with HNSW matches or beats dedicated vector databases up to ~1M vectors on equivalent compute (Supabase 2026 benchmarks). Second, your Postgres already has your text, your metadata, your access control, and your backups; one database is fewer failure modes than two. Third, Pinecone is closed source and vendor-locked. The day you outgrow it (or its pricing), the migration is non-trivial. pgvector is portable to any of the dedicated DBs compared above when you actually need them.
When do you leave pgvector?
Three signals: (1) corpus crosses ~10M vectors with HNSW and recall starts to slide, (2) query throughput exceeds what one Postgres can serve without sharding, or (3) the workload needs first-class hybrid search at sub-100ms p95. Qdrant is almost always the next step. Milvus, Vespa, and Weaviate enter the conversation when scale or workload-shape demands.
Qdrant vs Weaviate?
Qdrant is faster (~12ms p99 vs ~16ms at 10M in 2026 benchmarks), uses less memory, and has stronger payload filtering. Weaviate has a better story for graph-shaped data and a richer GraphQL API. We default to Qdrant for general RAG; pick Weaviate when the data is genuinely graph-native and the team wants the GraphQL ergonomics.
Should we use Vespa instead of building hybrid + rerank ourselves?
Yes, if you have hyperscale search workloads where vector retrieval, BM25, and learned reranking all need to run in one engine at sub-100ms across billions of documents: the Yahoo / Spotify / Bing-class shape. For most production RAG (< 100M vectors), assembling retrieval (pgvector, or Qdrant once you pass pgvector's range) plus Cohere Rerank in your app is simpler.
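The assembled alternative is a two-stage pipeline: pull a wide candidate set from the vector DB, then let a reranker reorder it. A sketch, assuming `retrieve` and `score` are callables you supply (for example a pgvector query and a call to a reranking model such as Cohere Rerank); the function and parameter names are hypothetical and no vendor API is shown.

```python
def retrieve_then_rerank(query, retrieve, score, candidates=50, top_n=5):
    """Stage 1: cheap, wide recall from the vector DB. Stage 2: precise reordering by a reranker.

    retrieve(query, n) -> [(doc_id, text), ...] and score(query, text) -> float are
    hypothetical callables you supply; no vendor API is assumed.
    """
    hits = retrieve(query, candidates)
    rescored = [(score(query, text), doc_id) for doc_id, text in hits]
    # Highest reranker score first; Python's sort is stable, so ties keep retrieval order.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in rescored[:top_n]]
```

Keeping `candidates` well above `top_n` is what gives the reranker room to fix ordering mistakes made by approximate search.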
What about MongoDB Atlas Vector Search, Elasticsearch dense_vector, Redis vector?
Each is a good answer if you're already running that stack. MongoDB Atlas Vector Search is reasonable when Mongo is the source of truth. Elasticsearch dense_vector is reasonable when ES already serves your search. Redis vector is fine for small caches. We don't reach for them as new deployments because each has scale or operational ceilings that the dedicated VDBs don't.
DIRECT RAG · APPLIED K

Bring the corpus. We will pick the DB.

We will run your real query distribution against the candidate DBs, name the trade, and ship the build on the one that wins for your data — not for the leaderboard.