Vector Databases, Explained for Builders

What a vector database actually does, how approximate nearest-neighbour search works, and when pgvector beats a dedicated engine.

Written by BSH Technologies
Published on 2026-05-04

A vector database finds meaning by distance

A vector database stores embeddings — lists of numbers that capture the meaning of text, images, or audio — and finds the ones closest to a query vector. "Closest" is the whole trick: items with similar meaning end up near each other in this high-dimensional space, so nearest-neighbour search quietly becomes semantic search. That is what powers retrieval for RAG, recommendations, deduplication, and any similarity lookup where exact-match keywords fall short.

The naive approach is to compare your query against every stored vector and keep the top matches. That is exact, and it is perfectly fine until you have a few hundred thousand vectors. Past that, scanning everything on every query is too slow to sit behind a user-facing feature, which is where approximate nearest-neighbour algorithms come in and change the economics entirely.
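To make the naive approach concrete, here is a minimal sketch of exact search in plain Python. The function names and the three-dimensional toy vectors are illustrative assumptions, not part of any particular library; real embeddings have hundreds or thousands of dimensions.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best.

    `vectors` maps an id to its embedding. Cost is O(n * d) per query,
    which is why this stops scaling as n grows.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


# Toy data: made-up vectors standing in for real embeddings.
store = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}
nearest = exact_top_k([1.0, 0.0, 0.0], store, k=2)
```

Every query touches every vector, so the work grows linearly with the size of the store. That linear scan is what an ANN index is designed to avoid.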

Approximate search is a deliberate trade

Approximate nearest-neighbour, or ANN, search trades a sliver of accuracy for an enormous speed-up. Instead of checking every vector, the index organises them so a query only visits a small, promising subset. Two families dominate in practice, and knowing the difference helps you reason about cost.

  • HNSW builds a layered graph you traverse from coarse to fine. It is fast and accurate, with the cost paid in memory consumption.
  • IVF partitions vectors into clusters and searches only the nearest few. It is lighter on memory and tunable, at some cost to recall.

Both expose knobs that trade recall against latency, such as the search breadth in HNSW or the number of clusters probed in IVF. The right setting is an empirical question you answer with your own data, not a default you accept blindly from a tutorial. The only honest way to find it is to measure recall and latency on a representative sample of your real queries.
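The IVF idea and the recall measurement can be sketched together in a few lines. This is a toy under stated assumptions: the centroids are simply sampled from the data as a stand-in for the k-means step a real index would run, the data is random 2-D points, and names such as `nprobe` are used here only as a label for "how many clusters to visit".

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, centroids):
    """Assign each vector index to the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(vec, centroids[c]))
        buckets[nearest].append(idx)
    return buckets


def ivf_search(query, vectors, centroids, buckets, k, nprobe):
    """Search only the nprobe buckets whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [idx for c in probe for idx in buckets[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


def exact_search(query, vectors, k):
    """Ground truth: rank every vector by distance to the query."""
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:k]


def recall_at_k(approx, exact):
    """Fraction of the true top-k that the approximate search also returned."""
    return len(set(approx) & set(exact)) / len(exact)


rng = random.Random(0)
vectors = [[rng.random(), rng.random()] for _ in range(500)]
centroids = rng.sample(vectors, 8)  # stand-in for learned k-means centroids
buckets = build_ivf(vectors, centroids)
queries = [[rng.random(), rng.random()] for _ in range(50)]

# Mean recall@10 for several settings of the knob.
mean_recall = {}
for nprobe in (1, 2, 8):
    scores = [
        recall_at_k(
            ivf_search(q, vectors, centroids, buckets, 10, nprobe),
            exact_search(q, vectors, 10),
        )
        for q in queries
    ]
    mean_recall[nprobe] = sum(scores) / len(scores)
```

Probing all eight buckets is equivalent to an exact scan, so recall there is 1.0 by construction. Lower settings visit fewer candidates and may miss neighbours that sit just across a cluster boundary. Running the same loop against a sample of your own queries is the measurement the text recommends.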

Distance metrics are not interchangeable

The metric decides what "close" means, and it must match how your embedding model was trained. Cosine similarity compares direction only and is the common default for text. Dot product factors in magnitude as well as direction. Euclidean distance measures straight-line separation. On unit-length vectors all three produce the same ranking, so the choice matters most when your model emits unnormalised vectors. Pick the wrong one and your results degrade in ways that are maddening to diagnose, because nothing actually errors: the answers are simply a bit worse than they should be, and you have no obvious thread to pull. The fix is to read the embedding model's documentation, use the metric it recommends, and confirm with a quick test against known-good pairs before you trust the index in production.
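A small worked example shows how the metrics can disagree. The vectors below are made up for illustration: one candidate points the same way as the query but is short, the other is long but points elsewhere.

```python
import math


def dot(a, b):
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def cosine_similarity(a, b):
    """Direction only: magnitude is divided out."""
    return dot(a, b) / (norm(a) * norm(b))


def euclidean_distance(a, b):
    """Straight-line separation; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a vector to unit length."""
    n = norm(v)
    return [x / n for x in v]


query = [1.0, 1.0]
short_aligned = [0.5, 0.5]  # same direction as the query, small magnitude
long_off_axis = [3.0, 0.0]  # different direction, large magnitude

# Cosine prefers the aligned vector; dot product prefers the long one.
cosine_winner = max([short_aligned, long_off_axis], key=lambda v: cosine_similarity(query, v))
dot_winner = max([short_aligned, long_off_axis], key=lambda v: dot(query, v))
```

After normalising both sides to unit length, the dot product equals the cosine similarity, which is why the mismatch only bites when vectors are stored unnormalised.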

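Metadata filtering is where many projects stumble. Restricting a search to documents from one team, or to those created after a certain date, has to be combined with the vector search efficiently. Bolting the filter on afterwards either scans everything or, if it runs only on the top results, can return far fewer matches than you asked for.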
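The difference between filtering before and after the vector search is easiest to see on a toy example. The `Doc` record, the team names, and the dates below are hypothetical, and both functions use a plain linear scan purely to show the semantics; a real engine would push the filter into its index.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    doc_id: str
    team: str
    created: date
    embedding: list


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pre_filtered_search(query, docs, k, team, since):
    """Apply the metadata predicate first, then rank only the survivors."""
    allowed = [d for d in docs if d.team == team and d.created >= since]
    allowed.sort(key=lambda d: l2(query, d.embedding))
    return [d.doc_id for d in allowed[:k]]


def post_filtered_search(query, docs, k, team, since):
    """Take the global top-k, then drop non-matching rows (may return fewer than k)."""
    top = sorted(docs, key=lambda d: l2(query, d.embedding))[:k]
    return [d.doc_id for d in top if d.team == team and d.created >= since]


docs = [
    Doc("a", "search", date(2026, 1, 10), [0.0, 0.0]),
    Doc("b", "billing", date(2026, 2, 1), [0.1, 0.0]),
    Doc("c", "billing", date(2026, 3, 1), [0.2, 0.1]),
    Doc("d", "search", date(2026, 3, 15), [0.9, 0.9]),
]
query = [0.0, 0.1]

pre = pre_filtered_search(query, docs, k=2, team="search", since=date(2026, 1, 1))
post = post_filtered_search(query, docs, k=2, team="search", since=date(2026, 1, 1))
```

Here the pre-filtered search returns two documents from the requested team, while the post-filtered one returns only one, because a document from another team took a slot in the global top two.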

pgvector or a dedicated engine?

You do not always need a specialised database. If you are already running PostgreSQL, the pgvector extension adds vector columns and ANN indexes to the database you operate, back up, and secure today. For workloads up to a few million vectors with real metadata to filter on, that consolidation is a genuine advantage — one fewer system to run, monitor, and reason about at three in the morning.

Dedicated vector engines earn their place at larger scale, with very high query throughput, or when you want managed sharding and replication built around vectors specifically. The honest answer for most teams starting out is to begin with pgvector and migrate only when you hit a wall you can actually measure, rather than one you imagine you might hit someday.
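As a rough sketch of what starting with pgvector looks like, the SQL below is held in Python strings so it can be passed to whichever Postgres driver you use. The table name `documents`, its columns, and the dimension of 3 are hypothetical placeholders, and the `%s` placeholders assume a psycopg-style driver. Nothing here is executed against a database.

```python
# Hypothetical schema: table `documents` with `team`, `body`, `embedding`.
# The dimension (3) is a placeholder; use your embedding model's dimension.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    team      text NOT NULL,
    body      text NOT NULL,
    embedding vector(3)
);

-- HNSW index using cosine distance.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
"""

# `<=>` is pgvector's cosine-distance operator; smaller means closer.
# The WHERE clause shows a metadata filter combined with the vector ordering.
SEARCH_SQL = """
SELECT id, body
FROM documents
WHERE team = %s
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def to_vector_literal(values):
    """Format a list of numbers as pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Parameters for SEARCH_SQL: team filter, query vector, and result limit.
params = ("search", to_vector_literal([0.1, 0.2, 0.3]), 5)
```

The appeal is that the filter and the similarity ordering live in one ordinary SQL statement against a table you already back up and secure.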

Operational realities

Embeddings are tied to the model that produced them. Change the embedding model and every stored vector must be regenerated, because vectors from different models live in incompatible spaces and comparing them is meaningless. Budget for that re-indexing cost before it surprises you, version your embeddings so you know which model produced what, and always keep the original source text so you can rebuild the whole index from scratch when you need to.
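One way to make this manageable is to store the model identifier and the source text next to every vector. The sketch below assumes a hypothetical `StoredEmbedding` record and an `embed_fn` callable standing in for whatever embedding API you actually use; the model names are made up.

```python
from dataclasses import dataclass


@dataclass
class StoredEmbedding:
    doc_id: str
    source_text: str  # kept so the vector can always be regenerated
    model: str        # identifier of the model that produced `vector`
    vector: list


def needs_reembedding(records, current_model):
    """Return the records whose vectors came from a different model."""
    return [r for r in records if r.model != current_model]


def reembed(records, current_model, embed_fn):
    """Regenerate stale vectors in place and return how many were updated.

    `embed_fn` is a hypothetical callable: text in, list of floats out.
    """
    stale = needs_reembedding(records, current_model)
    for record in stale:
        record.vector = embed_fn(record.source_text)
        record.model = current_model
    return len(stale)


records = [
    StoredEmbedding("a", "hello", "model-v1", [0.1]),
    StoredEmbedding("b", "world!", "model-v2", [0.2]),
]
stale = needs_reembedding(records, "model-v2")
```

With the model recorded per row, a migration can run in batches in the background, and queries can be restricted to vectors from a single model while it is in progress.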

Two more details catch teams off guard. Embedding dimensions are not free — higher-dimensional vectors improve quality up to a point but cost more memory and slow every query, so the largest model is not automatically the right one. And index build time grows with your corpus, which means a naive "rebuild everything on deploy" step that took seconds in development can take many minutes in production. Plan to build indexes incrementally or in the background, and measure both query latency and build time as first-class numbers, not afterthoughts you discover under load.
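The memory side of the dimension trade-off is simple arithmetic. The sketch below assumes 4-byte float32 values and counts only the raw vectors, not index structures or metadata; the dimensions 384 and 1536 are illustrative choices rather than a recommendation.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Bytes needed for the raw vectors alone, assuming float32 by default."""
    return num_vectors * dimensions * bytes_per_value


GIB = 2 ** 30

small = raw_vector_bytes(1_000_000, 384)   # 1,536,000,000 bytes
large = raw_vector_bytes(1_000_000, 1536)  # 6,144,000,000 bytes

print(f"384 dims:  {small / GIB:.2f} GiB")
print(f"1536 dims: {large / GIB:.2f} GiB")
```

Quadrupling the dimension quadruples the raw storage for the same corpus, and any graph or cluster structure the index keeps comes on top of that.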

How BSH can help

BSH Technologies helps teams choose and run the right vector infrastructure — frequently pgvector on a managed Postgres instance, sometimes a dedicated engine when the numbers demand it. We tune indexes for your recall and latency targets, get metadata filtering right, and plan re-embedding so model upgrades do not become outages. If semantic search is on your roadmap, our Thrissur engineers can help you build it on solid foundations.
