pgvector, Qdrant, Milvus, Weaviate, Vespa, LanceDB, Pinecone. The selection matrix you actually need, written by a lab that ships RAG on top of these systems. We default to pgvector and name the four signals that tell us to leave it.
The dimensions that change the build decision: where the DB sits in scale, whether hybrid is native, what the ops model looks like, published p99 at 10M vectors, and our take. Our default is highlighted.
| Database | Scale sweet spot | Hybrid (native vector + lexical, fused) | Ops model | p99 @ 10M (published 2026) | Our take |
|---|---|---|---|---|---|
| **pgvector** (Postgres extension · open source) | Up to ~10M vectors comfortably; ~50M with care | Yes (full-text ranking via tsvector + RRF in app) | One Postgres. Your existing backups, your existing IDP. | Comparable to dedicated DBs at <1M with HNSW | Default. Schemas, embeddings, metadata, and access control in one DB. We move off only when scale or workload genuinely demands it. |
| Qdrant (Rust · open source · managed available) | 10M – 1B vectors, fast filtering on payloads | Yes (native, v1.9+) | Self-host or Qdrant Cloud. Kubernetes operator. | ~12ms p99 at 10M; fastest open source in published benchmarks | First choice when leaving pgvector. Strong payload filtering, low memory floor, hybrid is first-class. |
| Milvus (C++/Go · open source · GPU-accelerated) | 100M – 10B+. The billion-scale option. | Yes (native, 2.5+) | Heavier ops surface; complex topology (coords, datanodes, querynodes). | ~18ms p99 at 10M; built for the billion-scale tier | Pick at billion-scale. Otherwise the operational tax doesn't pay back vs Qdrant. |
| Weaviate (Go · open source · GraphQL API) | Up to ~50M comfortably; memory-hungry past that | Yes (native) | Easy single-binary or Kubernetes. Good docs, graph-aware schema. | ~16ms p99 at 10M | Strong default when the data model is genuinely graph-shaped and BM25 + dense fusion is required out of the box. |
| Vespa (Java · open source · Yahoo-scale) | Billions of vectors with native hybrid + structured filtering | Yes (built around it; ColBERT, late interaction) | Steeper learning curve; YQL configuration model. | Tuned for sub-100ms across hybrid + rerank at large scale | Real answer for hyperscale search + ranking + vector in one engine. Pick when search relevance matters as much as retrieval. |
| LanceDB (Rust · embedded · columnar Lance format) | Embedded use cases; up to tens of millions | Yes (native, 2026) | Embedded or serverless. No daemon. Tight Python integration. | Fast on local / SSD-attached workloads | Underrated for AI agents and edge-deployed RAG. When the index travels with the app, this is the answer. |
| Pinecone (closed source · managed only) | Up to billions on the higher tiers | Yes (native) | Fully managed. No infra to run. Vendor lock-in. | Serverless tier trades latency for convenience | Pick when the customer wants zero ops AND has signed off on the lock-in. Otherwise Qdrant Cloud or pgvector is the same outcome without the licence cost. |
Latency numbers from published 2026 benchmarks; your distance from the metal and your filter complexity will move them. Always validate on your workload before committing.
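Validating on your own workload can start with something as small as the harness below. It is a minimal sketch, not a benchmark suite: `run_query` is a hypothetical callable that wraps whatever client you are testing, the warmup count is an arbitrary assumption, and the percentile uses the simple nearest-rank definition.

```python
import math
import time
from typing import Callable, Iterable, List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples to summarise")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def measure_latencies_ms(
    run_query: Callable[[object], object],  # hypothetical: wraps your DB client call
    queries: Iterable[object],
    warmup: int = 10,  # assumption: a few untimed calls to warm caches and connections
) -> List[float]:
    """Time each query once, after an untimed warmup pass over the first few queries."""
    queries = list(queries)
    for q in queries[:warmup]:
        run_query(q)
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies
```

Feed it real queries with your real filters, then compare `percentile(latencies, 99)` across candidates. Single-threaded timing like this ignores concurrency, so treat the result as a floor rather than a production figure.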
Read the row that matches your corpus. The default is the starting point; the others earn the build when a real signal tells us they do.
Default (< ~5M vectors): Postgres + pgvector + HNSW. Hybrid via tsvector. Reciprocal-rank fusion in the app. One DB.
Qdrant (self-host or Cloud). Hybrid native, fast payload filters, lower memory floor than Weaviate. Postgres still owns the source of truth + metadata.
Milvus for pure vector workloads at GPU-accelerated scale. Vespa when hybrid search + reranking + structured filtering all need to run in one engine.
LanceDB when the index has to travel with the app (offline agents, on-device search, mobile RAG).
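For the default tier, the "fusion in the app" step is small enough to sketch. The example below assumes a hypothetical `docs` table with `body_tsv` (tsvector) and `embedding` (vector) columns, psycopg-style named placeholders, and the conventional RRF constant of 60; none of these names come from our production setup. The SQL strings are shown for shape only and are not executed here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

# Hypothetical schema: docs(id, body, body_tsv tsvector, embedding vector).
INDEX_DDL = """
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON docs USING gin (body_tsv);
"""

# Dense leg: nearest neighbours by cosine distance (pgvector's <=> operator).
DENSE_SQL = """
SELECT id FROM docs
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s
"""

# Lexical leg: Postgres full-text match, ranked with ts_rank.
LEXICAL_SQL = """
SELECT id FROM docs
WHERE body_tsv @@ plainto_tsquery('english', %(query_text)s)
ORDER BY ts_rank(body_tsv, plainto_tsquery('english', %(query_text)s)) DESC
LIMIT %(k)s
"""


def rrf_fuse(
    rankings: Sequence[Sequence[Hashable]],
    rrf_k: int = 60,  # assumption: the commonly used smoothing constant
    top_n: int = 10,
) -> List[Hashable]:
    """Reciprocal-rank fusion: each list adds 1 / (rrf_k + rank) to a document's score."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    # Highest fused score first; ties keep first-seen order because sorted() is stable.
    ordered = sorted(scores.items(), key=lambda item: -item[1])
    return [doc_id for doc_id, _ in ordered[:top_n]]
```

With dense results `[1, 2, 3]` and lexical results `[3, 1, 4]`, `rrf_fuse` returns `[1, 3, 2, 4]`: documents that appear in both lists outrank those that appear in one. RRF only needs ranks, so the incomparable score scales of cosine distance and `ts_rank` never have to be normalised.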
Naive, Advanced, Modular, Agentic, GraphRAG, CRAG, Self-RAG. Seven named patterns with the decision tree for picking one.
Embeddings, chunking, hybrid search, reranking. The four layers retrieval quality lives or dies in.
Proven designs from under 100k chunks to over 1B. The architecture changes with the scale.
PDFs with tables and figures. Vision LLM extraction, ColPali, BGE-M3, court-ready citations.