
Better Search With Embeddings

Keyword search misses what users mean. Embedding-based semantic search finds it — without overcomplicating your stack to get there.

Written by BSH Technologies
Published on 2025-12-27

Search should match meaning, not just words

Embeddings let you build search that understands intent instead of matching strings. A user searching for "ways to lower my monthly bill" should find an article titled "Reducing recurring costs", even though the two share no words at all. Keyword search can miss that query entirely; semantic search built on embeddings handles it naturally, because it compares meaning rather than spelling. For most content-heavy products, that is the difference between a search box people rely on and one they abandon after two tries.

An embedding is simply a list of numbers that captures the semantic content of a piece of text. Texts that mean similar things end up close together in that numeric space, so finding relevant results reduces to finding the nearest vectors to the query. That one idea — meaning as geometry — is the whole foundation, and everything practical follows from it.
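To make "close together" concrete: one common measure of closeness is cosine similarity, the cosine of the angle between two vectors, where 1.0 means they point the same way. The sketch below computes it in plain Python. The three-dimensional vectors are made-up values for illustration; real embedding models return hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for a query and two documents.
query = [0.9, 0.1, 0.3]
doc_close = [0.8, 0.2, 0.35]
doc_far = [-0.1, 0.9, -0.4]
print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))  # True
```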

The pipeline is smaller than you expect

A working semantic search system has only a handful of moving parts, and you can stand up a useful version in days rather than months.

  1. Split your content into passages sized to answer a question on their own.
  2. Run each passage through an embedding model and store the resulting vector alongside the original text.
  3. At query time, embed the user's query with the same model and retrieve the stored vectors closest to it.

The retrieved passages are your results. The entire system lives or dies on two decisions: how you split your content into chunks, and which embedding model you choose. Get those two right and the rest is ordinary plumbing you already know how to build.
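The three steps can be sketched in a few dozen lines of standard-library Python. The `embed` function here is a hypothetical placeholder (a unit-length bag of words stored as a sparse dict) so the example runs without a model; it matches shared words, not meaning. In a real system you would replace it with a call to your embedding model and use that same model for both passages and queries.

```python
import math
import re
from collections import Counter

def embed(text: str) -> dict[str, float]:
    """Hypothetical stand-in for an embedding model.

    Returns a unit-length bag-of-words vector stored sparsely as {word: weight}.
    It captures shared words, not meaning; replace it with a real model.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}

def similarity(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())

def build_index(passages: list[str]) -> list[tuple[dict[str, float], str]]:
    # Step 2: embed each passage and keep the vector next to its original text.
    return [(embed(p), p) for p in passages]

def search(index: list[tuple[dict[str, float], str]], query: str, k: int = 3) -> list[str]:
    # Step 3: embed the query with the same function, then return the k closest passages.
    q = embed(query)
    ranked = sorted(index, key=lambda entry: similarity(q, entry[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Step 1 is assumed done: these are already passage-sized chunks.
passages = [
    "Reset your password from the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "Export reports as CSV from the dashboard.",
]
index = build_index(passages)
print(search(index, "How do I reset my password?", k=1))
# ['Reset your password from the account settings page.']
```

Swapping `embed` for a real model is the only change this skeleton needs; indexing and retrieval stay the same, which is why chunking and model choice, not the plumbing, decide result quality.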

Chunk for self-contained meaning

Chunking is where many implementations quietly go wrong, and the damage rarely shows until results feel vaguely off. Chunks that are too large dilute the signal, because a single vector has to represent too many distinct ideas at once. Chunks that are too small lose the surrounding context that gave them meaning. Split on natural boundaries such as sections and paragraphs, rather than on fixed character counts that slice sentences and tables in half.

  • Aim for passages that could each stand alone as a sensible answer to some plausible question.
  • Add a little overlap between adjacent chunks so meaning is not lost at the seams where you cut.
  • Keep the title or section heading attached to each chunk, so a passage retrieved in isolation still carries the context of where it came from.
  • Store some structured metadata with each chunk (source, date, category) so you can filter results by recency or section before ranking; that filtering often matters as much as the semantic match itself.
  • Revisit your chunking when results disappoint: it is usually the first thing to tune, and it often pays off more than swapping models or moving to a larger embedding dimension.
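A minimal sketch of paragraph-based chunking along those lines: whole paragraphs are packed up to a size budget, the last paragraph of each chunk is repeated at the start of the next as overlap, the title and section heading are prefixed to every chunk, and source, date, and section are kept as metadata. The names `Chunk`, `group_paragraphs`, and `chunk_document`, the 1,000-character budget, and the one-paragraph overlap are illustrative assumptions to tune against your own content, not recommended values.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str  # title and heading plus the passage; this is what gets embedded
    metadata: dict = field(default_factory=dict)

def group_paragraphs(paragraphs: list[str], max_chars: int, overlap: int) -> list[list[str]]:
    """Pack whole paragraphs into groups of at most max_chars, repeating the
    last `overlap` paragraphs of each group at the start of the next one.
    A single paragraph longer than max_chars is kept whole rather than cut."""
    groups: list[list[str]] = []
    current: list[str] = []
    fresh = 0  # paragraphs in `current` that were not carried over as overlap
    for para in paragraphs:
        if fresh and len("\n\n".join(current + [para])) > max_chars:
            groups.append(current)
            current = current[-overlap:] if overlap > 0 else []
            fresh = 0
        current.append(para)
        fresh += 1
    if fresh:
        groups.append(current)
    return groups

def chunk_document(title: str, sections: list[tuple[str, str]], source: str, date: str,
                   max_chars: int = 1000, overlap: int = 1) -> list[Chunk]:
    chunks = []
    for heading, body in sections:
        # Split on blank lines (paragraph boundaries) so sentences are never cut.
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        for group in group_paragraphs(paragraphs, max_chars, overlap):
            text = f"{title} > {heading}\n\n" + "\n\n".join(group)
            meta = {"source": source, "date": date, "section": heading}
            chunks.append(Chunk(text=text, metadata=meta))
    return chunks

sections = [("Billing", "Invoices go out monthly.\n\n"
                        "You can change plans anytime.\n\n"
                        "Refunds take five days.")]
for chunk in chunk_document("Help Centre", sections, source="help/billing",
                            date="2025-01-01", max_chars=60):
    print(repr(chunk.text))
```

With a 60-character budget the three paragraphs become two chunks, and the middle paragraph appears in both because of the overlap.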

Combine semantic and keyword search

Pure embedding search has one notable blind spot: exact identifiers. Product codes, model numbers, proper names, and precise figures are where keyword matching shines and embeddings can drift, because a code like X-4471 carries little semantic meaning for a model to lean on. Robust systems run both approaches and merge the results, using semantic search for meaning and keyword search for precision, so each covers the other's weakness.

  • Use keyword matching to guarantee that exact terms like SKUs, error codes, and proper nouns are never missed.
  • Use embeddings to catch paraphrases, synonyms, and conceptually related content that share no literal words.
  • Blend the two ranked lists with a sensible weighting (for example reciprocal rank fusion, or a weighted sum of normalized scores) so the final ordering reflects both kinds of relevance.

Hybrid retrieval has become a common default for good reason: across the full range of queries real users throw at a search box, it tends to outperform either method on its own.
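One widely used way to blend the two lists is reciprocal rank fusion (RRF), which gives each document 1 / (k + rank) from every list it appears in and sums the results. Because it uses only rank positions, it sidesteps the problem that keyword scores and cosine similarities live on different scales. The sketch below is a weighted variant; k = 60 is the constant commonly used in the literature rather than a tuned value, the document IDs are made up, and the keyword list is assumed to come from whatever full-text search you already run.

```python
from typing import Optional

def reciprocal_rank_fusion(ranked_lists: dict[str, list[str]],
                           weights: Optional[dict[str, float]] = None,
                           k: int = 60) -> list[str]:
    """Weighted reciprocal rank fusion.

    Each document earns weight / (k + rank) from every list it appears in
    (rank starts at 1); documents are returned by total score, highest first.
    """
    scores: dict[str, float] = {}
    for name, ranking in ranked_lists.items():
        weight = 1.0 if weights is None else weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Made-up results: keyword search finds the exact SKU, semantic search finds paraphrases.
keyword_hits = ["sku-x4471", "doc-b", "doc-c"]
semantic_hits = ["doc-b", "doc-d", "doc-c"]
fused = reciprocal_rank_fusion({"keyword": keyword_hits, "semantic": semantic_hits})
print(fused)  # ['doc-b', 'doc-c', 'sku-x4471', 'doc-d']
```

Documents both methods agree on rise to the top, while `sku-x4471`, found only by the keyword side, still makes the fused list instead of being lost.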

Choose your vector storage to fit your scale

You do not need exotic infrastructure to begin, and reaching for it early is a common and costly mistake. For tens of thousands of documents, a vector extension on the database you already operate is usually plenty, and it keeps your data and your search in one place you already know how to back up, secure, and monitor. Reach for a dedicated vector database only when scale, query latency, or advanced filtering needs genuinely demand it, and let measured requirements, not hype, make that call. Match the tool to the problem and you will ship sooner, spend less, and have far less to maintain.
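To illustrate why modest scale needs no special machinery: an exact nearest-neighbour scan with a metadata pre-filter fits in a few lines, and is conceptually what a database vector extension does when you query it without an approximate index. The row layout, the `top_k_filtered` name, and the `date` and `section` metadata keys are assumptions for illustration; dates are ISO-8601 strings because those compare correctly as plain strings.

```python
import heapq
import math
from typing import Optional

Row = tuple[list[float], str, dict]  # (vector, text, metadata)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def top_k_filtered(query_vec: list[float], rows: list[Row], k: int = 5,
                   min_date: Optional[str] = None,
                   section: Optional[str] = None) -> list[tuple[float, str]]:
    """Exact nearest-neighbour search: filter on metadata first, then score
    every remaining row and keep the k best as (score, text) pairs."""
    candidates = (
        (cosine(query_vec, vec), text)
        for vec, text, meta in rows
        if (min_date is None or meta["date"] >= min_date)
        and (section is None or meta["section"] == section)
    )
    return heapq.nlargest(k, candidates, key=lambda pair: pair[0])

rows: list[Row] = [
    ([1.0, 0.0], "Old pricing page", {"date": "2023-03-01", "section": "billing"}),
    ([0.9, 0.1], "Current pricing page", {"date": "2025-06-01", "section": "billing"}),
    ([0.0, 1.0], "Release notes", {"date": "2025-06-01", "section": "changelog"}),
]
print(top_k_filtered([1.0, 0.0], rows, k=1, min_date="2025-01-01"))
```

The date filter removes the old page before any scoring happens, so the current page wins even though the old one is the closer vector match.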

How BSH can help

BSH Technologies builds semantic search and retrieval systems sized to your actual content and traffic — sensible chunking, hybrid ranking that blends meaning and exact terms, and storage that fits your existing stack rather than inflating it. If your current search frustrates users who know exactly what they want but cannot find it, we can help you fix that without rebuilding your infrastructure from scratch.
