How to Build RAG With pgvector and Postgres
Build a complete RAG pipeline inside PostgreSQL using pgvector — schema, indexing, hybrid search, and metadata filtering in one database.

pgvector lets you build RAG inside the database you already run
Building RAG with pgvector and Postgres means your embeddings, your metadata, and your similarity search all live in one database you already operate, back up, and secure. The pgvector extension adds a vector column type and approximate-nearest-neighbour indexes to PostgreSQL, so a retrieval-augmented pipeline becomes ordinary SQL plus an embedding step. For teams already on Postgres, this consolidation is a real advantage — one fewer system to run, monitor, and reason about when something breaks at three in the morning.
This guide covers the parts that actually decide whether a pgvector RAG system is fast and accurate: schema design, the right index, hybrid search, and metadata filtering that does not fall apart under load. None of it is exotic, but each has a sharp edge that quietly degrades quality if you get it wrong, so it pays to be deliberate.
Step 1: Schema and the embedding column
Design the table so each chunk carries its vector, its source text, and the metadata you will filter and cite on. A good schema makes everything downstream easier; a careless one forces awkward workarounds later.
- Add a vector column sized to your embedding model's dimension, since the column type fixes the length.
- Keep the original chunk text in a column so answers can quote and verify it directly.
- Store source path, page, document id, and any access tags as ordinary columns you can index and filter.
Step 2: Choose and build the index
pgvector supports HNSW and IVFFlat indexes, and the choice affects both speed and memory. HNSW gives excellent recall and speed at the cost of memory and longer build time; IVFFlat is lighter and faster to build but needs its lists tuned and the data present before you build it, because it learns the partitions from existing rows.
Build the index after loading a representative amount of data, and match the index's operator class to the distance metric your embedding model expects, or recall quietly suffers.
Step 3: Get the distance metric right
pgvector offers cosine distance, inner product, and Euclidean (L2). The metric must match how your embedding model was trained — cosine is the common default for text embeddings, but some models expect inner product. Use the matching operator in your query and the matching operator class on your index, or recall degrades with nothing erroring to warn you that anything is wrong.
- Confirm the recommended metric in your embedding model's documentation before you commit to an index.
- Use the same metric for the index and the query so they agree on what "close" means.
- Validate with a few known-good pairs before trusting the index in anything real.
Step 4: Add hybrid search and filtering
Pure vector search is great at meaning and surprisingly weak at exact tokens — error codes, names, identifiers. Postgres already has full-text search built in, so you can run keyword search and vector search together and fuse the rankings, all in the same query against the same data. This is a quiet superpower of the pgvector approach: hybrid retrieval without standing up a second system to maintain.
- Combine pgvector similarity with Postgres full-text search for the best of meaning and exact match.
- Apply metadata filters — team, date, document type — in the same SQL so you never scan the whole table needlessly.
- Add a reranking pass over the top candidates for the single biggest quality jump in a struggling system.
Step 5: Operate it like a database
Because it is just Postgres, your existing operational muscle applies directly. Back it up, monitor it, and scale it the way you already do, which is a real saving in both effort and risk. The one RAG-specific discipline to add is re-embedding: when you change embedding models, every stored vector must be regenerated, because vectors from different models live in incompatible spaces. Version your embeddings so you know which model produced what, keep the source text so you can rebuild from scratch, and treat a model upgrade as a planned re-index rather than a surprise outage.
Prefer it built and managed for you?
BSH Technologies builds production RAG on pgvector and Postgres — schema and indexes tuned to your recall and latency targets, hybrid search, correct metrics, and a re-embedding plan that keeps model upgrades from becoming outages. If you are already on Postgres, talk to BSH Technologies or explore our AI & automation services.
Frequently asked questions
Can I build a full RAG system using only pgvector and Postgres?
Yes. pgvector adds vector columns and approximate-nearest-neighbour indexes to PostgreSQL, so embeddings, metadata, and similarity search all live in one database. Combined with Postgres full-text search for hybrid retrieval, it covers the whole pipeline without a separate vector service for most small-to-mid-scale workloads.
Should I use HNSW or IVFFlat in pgvector?
HNSW gives higher recall and faster queries at the cost of more memory and slower index builds, making it the default choice for quality. IVFFlat uses less memory and builds faster but needs its list count tuned and the data loaded first. Pick HNSW unless memory is tight or builds are too slow.
Which distance metric should I use in pgvector?
Match the metric to your embedding model. Cosine distance is the common default for text embeddings, while some models expect inner product or Euclidean. Use the matching operator in queries and the matching operator class on the index. The wrong metric degrades recall silently with no error to warn you.
Can pgvector do hybrid keyword and vector search?
Yes, and that is a key advantage. PostgreSQL has built-in full-text search, so you can run keyword search alongside pgvector similarity in the same query and fuse the rankings. Vector search handles meaning, keyword search handles exact tokens like error codes and names, with no second system needed.
When should I move off pgvector to a dedicated vector engine?
When you reach very high query throughput, tens of millions of vectors, or need managed sharding built specifically around vectors. Below that, pgvector keeps everything in one database you already operate. Migrate only when you can measure a wall you have actually hit, not one you imagine you might.
From the blog
View all posts
How to Build an AI Agent for Free in 2026
You can build a working AI agent for free in 2026 using n8n, open-source frameworks, and a free LLM tier. Here is the exact stack and the steps.

Best Free AI Agent Frameworks in 2026
The best free AI agent frameworks in 2026 are LangChain, CrewAI, Microsoft AutoGen, LangGraph, and n8n. Here is how to choose between them.