Corpus + chunking audit
We characterise your corpus before we pick a chunker. PDFs with tables, code with comments, transcripts with timestamps — each gets a different strategy.
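As a rough illustration of what "a different strategy per content type" can look like, here is a minimal sketch that routes text to a chunker by kind. The function names, the `kind` labels and the `[hh:mm:ss]` timestamp format are assumptions made for the example, not a description of any particular audit.

```python
import re

# Assumed timestamp format for transcripts: a line starting with [hh:mm:ss].
TIMESTAMP = re.compile(r"\[\d{2}:\d{2}:\d{2}\]")

def chunk_transcript(text):
    """Start a new chunk at every line that begins with a timestamp."""
    chunks, current = [], []
    for line in text.splitlines():
        if TIMESTAMP.match(line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def chunk_paragraphs(text):
    """Fallback strategy: split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

# Hypothetical registry: one chunker per content kind, with a generic fallback.
CHUNKERS = {"transcript": chunk_transcript}

def chunk(text, kind):
    """Pick the chunker registered for this kind of content."""
    return CHUNKERS.get(kind, chunk_paragraphs)(text)

transcript = "[00:00:01] Hello\nand welcome\n[00:00:09] First topic"
transcript_chunks = chunk(transcript, "transcript")
fallback_chunks = chunk("First paragraph.\n\nSecond paragraph.", "pdf")
```

A real corpus would add entries for tables, code and so on; the point is only that the choice of chunker follows from what the corpus contains.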
Postgres + pgvector for vector search, BM25 for lexical search, fused with reciprocal-rank fusion. Citation-first answers: every claim links to the chunk it came from. No separate vector DB, no five-system synchronisation problem.
A naive cosine-similarity search over a 1M-chunk corpus retrieves plausible-but-wrong context, the LLM hallucinates confidently on top, and the user discovers the failure mode by getting fired for citing it. Hybrid retrieval + citations make the failure mode visible.
One database for your text, your embeddings, your metadata, your access controls. Skip the vector-DB-shaped tax.
BM25 finds the exact-term matches the embeddings miss; vector search finds the semantic matches BM25 misses. Reciprocal-rank fusion merges them.
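To make the merge step concrete, here is a minimal sketch of reciprocal-rank fusion over two ranked lists of chunk IDs. Each chunk scores the sum of 1 / (k + rank) across the lists it appears in. The function name, the chunk IDs and the constant k = 60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    k=60 is an assumed, commonly used smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))

# Hypothetical results from the two retrievers, best hit first.
bm25_hits = ["c7", "c2", "c9"]
vector_hits = ["c2", "c4", "c7"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Chunks that both retrievers return (`c2`, `c7`) rise above chunks only one of them found, which is the behaviour hybrid retrieval relies on. Because only ranks are used, the BM25 and vector scores never need to be put on a common scale.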
The prompt requires the model to cite each claim. The UI renders inline citations. Users (and auditors) can verify every assertion.
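One way to back the citation requirement with a check is to scan the answer for sentences that carry no citation, or that cite a chunk that was never retrieved. The sketch below assumes a hypothetical marker format of `[n]` placed before the sentence's closing punctuation, and a deliberately naive sentence splitter; a real pipeline would define its own format.

```python
import re

# Assumed citation marker: a chunk number in square brackets, e.g. [3].
CITATION = re.compile(r"\[(\d+)\]")

def unsupported_sentences(answer, retrieved_ids):
    """Return sentences with no citation or with a citation to an unknown chunk."""
    problems = []
    # Naive split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        cited = {int(n) for n in CITATION.findall(sentence)}
        if not cited or not cited <= retrieved_ids:
            problems.append(sentence)
    return problems

# Hypothetical answer and set of chunk IDs that were actually retrieved.
answer = (
    "Refunds are processed within 14 days [3]. "
    "Shipping is free [9]. "
    "Returns need a receipt."
)
flagged = unsupported_sentences(answer, retrieved_ids={1, 3})
```

Here the second sentence is flagged because chunk 9 was not in the retrieved set, and the third because it cites nothing. This checks only that citations exist and point at retrieved chunks; whether the chunk actually supports the claim is still for the reader or auditor to verify.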