Hybrid Search: Combining Keyword and Vector Search for Better Retrieval
Introduction
Search is deceptively hard. When a user types a query, they might be searching by meaning ("how do I fix a login problem?") or by an exact identifier (such as "SKU-12345"). These two cases need fundamentally different approaches, and most search systems handle only one of them well.
Vector search, also called semantic search, excels at understanding intent and meaning. It can match "password reset" with "account recovery" because the two phrases convey similar meaning, even though they share no words. But vector search is poor at exact matching: it may fail to find the one document that contains "SKU-12345" if the embedding model represents that identifier as a generic token rather than a precise signal.
Keyword search, most famously the BM25 algorithm, is the opposite. It excels at exact matching and is fast and predictable, but it has no understanding of synonyms, context, or semantic relationships.
Hybrid search runs both in parallel and combines their results. It has become a common default in production retrieval-augmented generation (RAG) systems. This article explains how both methods work, why combining them usually beats either alone, and how to implement the fusion correctly using Reciprocal Rank Fusion (RRF).
Problem Statement
The retrieval problem in RAG systems is subtle. A retrieval system must perform well across a wide variety of query types simultaneously: natural-language questions, exact product codes, technical abbreviations, proper nouns, and short keyword queries. No single retrieval method handles all of these well.
Pure vector search fails on exact matches and rare identifiers. It is built on the principle that similar meaning implies similar representation, but an identifier like "RFC-8446" is not similar in meaning to anything else; it is unique. When the embedding model compresses that token into a generic vector, the document containing it may not surface at all.
Pure BM25 fails on natural-language queries. A user who asks "what should I do when I cannot log in?" will not find a document titled "Account Access Recovery Guide" through keyword search, because the two phrasings share no words. BM25 has no mechanism to recognise that the query and the document describe the same situation.
The result is that production RAG systems built on a single retrieval method always have a class of queries they handle poorly. Hybrid search is the systematic fix for this inherent limitation.
Core Concepts and Terminology
| Term | Definition | Role in Hybrid Search |
|---|---|---|
| Vector Search (Dense Retrieval) | Retrieval using embedding vectors to find documents semantically similar to a query | Handles natural language, conceptual queries, and synonyms |
| Keyword Search (Sparse Retrieval) | Retrieval based on term matching between query and document | Handles exact identifiers, rare terms, and short keyword queries |
| BM25 (Best Match 25) | A specific keyword search algorithm that scores documents using term frequency, inverse document frequency, and document length normalisation | The standard sparse retrieval baseline; fast, effective, and requires no model training |
| Embedding | A dense numerical vector representation of a piece of text, produced by a neural language model | The representation used by vector search to measure semantic similarity |
| Term Frequency (TF) | How many times a query term appears in a document | One component of BM25 scoring; more occurrences signal higher relevance, up to a saturation point |
| Inverse Document Frequency (IDF) | A measure of how rare a term is across the entire document corpus | Rare terms are more informative, so BM25 rewards documents that contain rare query terms more strongly |
| Reciprocal Rank Fusion (RRF) | A rank-based fusion method that combines multiple ranked lists by scoring each document based on its position in each list | A widely used, robust default for merging BM25 and vector search results |
| Ranked List | A list of retrieved documents ordered by their relevance score from a single retrieval method | Each retrieval method produces one ranked list; fusion combines them into a single final ranking |
| Cosine Similarity | The cosine of the angle between two vectors; used to score document relevance in vector search | Produces scores on a different scale from BM25, which is why rank-based fusion is preferable to naive score-based fusion |
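The BM25 row above combines term frequency, IDF, and length normalisation. As a minimal sketch (not a production scorer), the Python below implements the standard BM25 formula over whitespace-tokenised documents. The parameter values k1 = 1.2 and b = 0.75 are common defaults (Lucene uses them), the IDF uses the Lucene-style variant with +1 inside the logarithm so scores never go negative, and the toy documents are invented for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against `query_terms` with BM25.

    k1 controls term-frequency saturation; b controls document-length normalisation.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Rare terms (low df) get a high IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF, scaled by this document's length relative to the average.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "grpc-tls-v2 connection timeout settings and defaults".split(),
    "the load balancer handles failover between zones".split(),
    "connection pooling and timeout tuning guide".split(),
]
print(bm25_scores(["grpc-tls-v2", "timeout"], docs))
```

The rare term "grpc-tls-v2" contributes most of the first document's score because its IDF is roughly twice that of "timeout", which is the exact-match signal that vector search can miss.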
How It Works
Hybrid search combines two independently operating retrieval systems and merges their outputs. Think of it as two expert researchers: a librarian who finds documents by their exact index entries, and a subject-matter expert who finds documents by topic and concept. You ask both the same question, get two lists of recommended documents, and then reconcile them into one ranked answer. Documents that both experts recommend move to the top; documents that only one recommends stay lower.
- Build two separate indexes over the same document corpus. The first is a BM25 inverted index, a classic data structure that maps each term to the documents containing it, along with the term counts BM25 needs for scoring. The second is a vector index, a data structure (such as HNSW or IVF) that stores document embeddings and supports fast approximate nearest-neighbour lookup.
- At query time, query both indexes in parallel. The user's query text is sent to the BM25 index unchanged. It is also converted into an embedding vector and sent to the vector index. Both searches run simultaneously to minimise latency.
- Each index returns a ranked list of results. BM25 returns documents ranked by BM25 score. The vector index returns documents ranked by embedding similarity. These are two independent ranked lists that may overlap partially or significantly depending on the query.
- Apply Reciprocal Rank Fusion to merge the two ranked lists. For each document, RRF computes a score based on that document's rank in each list. A document ranked first by both methods receives a high combined score. A document ranked 50th by one method and absent from the other receives a low score. RRF uses only the rank position, not the raw scores, which makes it immune to the scale incompatibility between BM25 scores and cosine similarity values.
- Return the final merged ranking. Documents are sorted by their combined RRF score. The top results are passed to the LLM as context for the RAG system.
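The query-time steps above can be sketched with Python's standard `concurrent.futures` module. The two search functions here are hypothetical stand-ins that sleep and return fixed document IDs; in a real system they would be network calls to your keyword and vector indexes, which is why threads are a reasonable fit (the work is I/O-bound).

```python
import time
from concurrent.futures import ThreadPoolExecutor

def retrieve_both(query, bm25_search, vector_search):
    """Query both indexes concurrently and return (bm25_ranked, vector_ranked).

    bm25_search and vector_search are hypothetical callables: each takes the raw
    query text and returns document IDs ordered best-first. vector_search is
    assumed to embed the query itself before its nearest-neighbour lookup.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        bm25_future = pool.submit(bm25_search, query)
        vector_future = pool.submit(vector_search, query)
        # Total wait is roughly the slower of the two calls, not their sum.
        return bm25_future.result(), vector_future.result()

# Toy stand-ins that each simulate 0.3 s of I/O.
def fake_bm25(query):
    time.sleep(0.3)
    return ["cfg-grpc", "net-timeouts", "lb-overview"]

def fake_vector(query):
    time.sleep(0.3)
    return ["lb-failover", "lb-overview", "net-timeouts"]

start = time.perf_counter()
bm25_ranked, vector_ranked = retrieve_both("grpc-tls-v2 timeout", fake_bm25, fake_vector)
elapsed = time.perf_counter() - start
print(bm25_ranked, vector_ranked, f"{elapsed:.2f}s")
```

Both lists arrive after about 0.3 s rather than 0.6 s; they are then handed to the fusion step. An `asyncio`-based client would achieve the same overlap if your search libraries are asynchronous.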
RRF gives a document a contribution of 1 / (k + rank) from each ranked list it appears in, where rank is the document's position in that list (starting at 1) and k is a constant. The constant, typically set to 60, prevents the very top results in one list from receiving disproportionately large scores relative to documents ranked second or third. A document appearing highly in both lists accumulates contributions from each, naturally surfacing to the top of the merged result.
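Written as a formula, RRF scores each document d as RRF(d) = Σ 1 / (k + rank_i(d)), summed over every ranked list i that contains d. Below is a minimal Python sketch of that rule with hypothetical document IDs; it follows the description in this article and is not taken from any particular library.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse best-first lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in (ranks
    start at 1); a list that does not contain the document contributes nothing.
    Returns (doc_id, score) pairs sorted by descending fused score.
    """
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_ranked = ["cfg-grpc", "net-timeouts", "tls-setup"]
vector_ranked = ["lb-failover", "net-timeouts", "lb-overview"]
for doc_id, score in reciprocal_rank_fusion([bm25_ranked, vector_ranked]):
    print(f"{doc_id:14s} {score:.5f}")
```

Here `net-timeouts` is ranked second by both methods and overtakes `cfg-grpc` and `lb-failover`, each of which only one method ranked first. Those two tie on 1/61, so a production system would want a deterministic tie-breaker, such as document ID.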
Practical Example
Consider a technical documentation system for a cloud infrastructure product. Users ask two types of questions: conceptual questions like "how does the load balancer handle failover?" and specific queries like "what is the default timeout for connection type GRPC-TLS-v2?"
For the conceptual question, vector search performs well. The query embedding lands close to documents about load balancing and high availability, even if they use different phrasing. BM25 may return fewer relevant results because it only matches documents containing "failover" literally.
For the specific query, BM25 performs well. The rare technical string "GRPC-TLS-v2" is a distinctive term that BM25 can match exactly in the one or two documents that mention it. Vector search, however, may fail: the embedding model has no special treatment for this protocol identifier and may produce an embedding that is close to generic networking documents rather than the one document that contains the exact term.
With hybrid search and RRF fusion, both documents surface correctly. The conceptual document is boosted by its high vector search rank; the specific configuration document is boosted by its high BM25 rank. Neither method's failure mode dominates the final result.
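The same effect can be checked by hand. Suppose, hypothetically, that BM25 ranks the GRPC-TLS-v2 configuration document 1st and vector search ranks it only 40th, while a generic networking page is ranked 1st by vector search and is absent from the BM25 list. With k = 60:

```latex
\begin{aligned}
\mathrm{RRF}(d) &= \sum_{i} \frac{1}{k + \mathrm{rank}_i(d)} \\
\mathrm{RRF}(\text{config doc}) &= \frac{1}{60 + 1} + \frac{1}{60 + 40} \approx 0.01639 + 0.01000 = 0.02639 \\
\mathrm{RRF}(\text{generic doc}) &= \frac{1}{60 + 1} \approx 0.01639
\end{aligned}
```

The configuration document ranks higher because it collects a contribution from both lists, even though its vector rank is poor.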
When evaluated on a held-out set of 500 mixed queries from this documentation system, hybrid search achieves a Precision@5 of 0.84, compared to 0.71 for pure vector search and 0.68 for pure BM25. The improvement is consistent across query types, not just at the extremes.
Advantages
- Covers each method's failure modes. By running both retrieval systems in parallel, hybrid search ensures that exact matches are found when they matter and semantic understanding is applied when keywords alone are insufficient. Neither failure mode goes unaddressed.
- RRF requires no training and no score normalisation. Weighted score fusion requires normalising BM25 and cosine similarity scores to the same scale, a non-trivial engineering problem. RRF sidesteps this entirely by working only with rank positions, which are always comparable regardless of the underlying scoring algorithm.
- Consistent performance across diverse query types. Empirical results on information retrieval benchmarks generally show that hybrid search outperforms either method alone, particularly on datasets with mixed query types. This robustness is its primary value in production.
- Degrades gracefully. When one retrieval method returns poor results for a specific query, the other method still contributes to the final ranking. The result may not be perfect, but it is rarely catastrophically wrong, whereas complete misses are a common failure mode of single-method systems.
- Native support in modern vector databases reduces implementation burden. Weaviate, Qdrant, Elasticsearch, and Pinecone all support hybrid search with built-in RRF or similar fusion, meaning you do not need to implement the fusion layer from scratch in most cases.
Limitations and Trade-offs
- Two indexes mean extra storage and maintenance costs. Running hybrid search requires maintaining a BM25 inverted index and a vector index over the same corpus, keeping both updated in sync, and paying for the storage of both. For very large corpora, this cost is significant.
- Retrieval cost increases compared to a single-method approach. Even when the two queries run in parallel, they consume roughly twice the compute of one, and end-to-end latency is set by the slower of the two retrievals plus the fusion step. In latency-sensitive applications, this overhead must be budgeted for explicitly.
- Index synchronisation is an operational burden. When documents are added, updated, or deleted, both indexes must be updated consistently. Inconsistent indexes produce confusing results: documents that appear in one index but not the other can cause mismatches in the fusion step.
- Chunking strategy affects both methods differently. BM25 is sensitive to chunk size because large chunks dilute term frequency. Vector search is sensitive to the semantic coherence of each chunk. A chunking strategy that is good for one method may not be optimal for the other, requiring careful calibration.
- RRF assumes both methods contribute meaningfully. If 90% of your queries are exact keyword lookups and BM25 dominates every result, you are paying the cost of running vector search without gaining much from it. Audit your query distribution before committing to hybrid search as a universal approach.
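The synchronisation burden described above can be monitored with a simple consistency check. The sketch below assumes you can list the document IDs held by each index (how you do that depends on your search engine, so it is left out); it reports IDs that exist in one index but not the other, which is the drift that causes fusion mismatches. The function name is hypothetical.

```python
def find_index_drift(bm25_ids, vector_ids):
    """Report document IDs present in one index but missing from the other.

    bm25_ids and vector_ids are iterables of the document IDs each index
    currently holds; enumerating them is engine-specific and assumed here.
    """
    bm25_ids, vector_ids = set(bm25_ids), set(vector_ids)
    return {
        "missing_from_vector": sorted(bm25_ids - vector_ids),
        "missing_from_bm25": sorted(vector_ids - bm25_ids),
    }

drift = find_index_drift({"doc-1", "doc-2", "doc-3"}, {"doc-1", "doc-3", "doc-4"})
print(drift)  # {'missing_from_vector': ['doc-2'], 'missing_from_bm25': ['doc-4']}
```

Running a check like this on a schedule, and alerting when either list is non-empty, turns silent drift into a visible operational signal.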
Common Mistakes
- Using weighted score fusion without normalising scores first. Adding a BM25 score of 12.4 to a cosine similarity of 0.83 is meaningless because the scales are incompatible. If you choose weighted score fusion over RRF, you must normalise both score distributions to the same range before combining them. Many teams skip this step and then wonder why the fusion does not work correctly.
- Skipping evaluation after adding hybrid search. Teams often add hybrid search on the assumption that it will improve retrieval quality, but never measure whether it actually does. Always compare Precision@K, Recall@K, and MRR against each single-method baseline on a labelled evaluation set before deploying.
- Not running the two retrievals in parallel. Some implementations query BM25, wait for results, then query the vector index. This doubles retrieval latency unnecessarily. Both retrievals are independent and should run simultaneously.
- Letting one method dominate without investigation. If the BM25 top result is always ranked first in the hybrid output, it may mean BM25 is overwhelming the fusion rather than complementing it. Check whether both methods are contributing to the final ranking. If one is consistently irrelevant, your query distribution may not need hybrid search at all.
- Poor chunking undermining the benefit of hybrid search. If document chunks are too large, BM25 term frequencies are diluted and exact matches lose their signal. If chunks are too small, vector embeddings lack sufficient context and semantic similarity degrades. Invest in chunking strategy before optimising the fusion layer.
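If you do choose weighted score fusion, the normalisation step from the first mistake above might look like this minimal sketch. It uses min-max normalisation (one common choice among several, such as z-score normalisation) and a hypothetical weight `alpha` between vector and BM25 scores; the scores themselves are invented.

```python
def min_max_normalise(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; if all scores are equal, map them to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def weighted_score_fusion(bm25_scores, vector_scores, alpha=0.5):
    """Combine normalised scores as alpha * vector + (1 - alpha) * bm25.

    A document missing from one method's results contributes 0 for that method.
    alpha = 0.5 is an arbitrary illustrative default, not a recommendation.
    """
    bm25_n = min_max_normalise(bm25_scores)
    vec_n = min_max_normalise(vector_scores)
    fused = {
        doc_id: alpha * vec_n.get(doc_id, 0.0) + (1 - alpha) * bm25_n.get(doc_id, 0.0)
        for doc_id in bm25_n.keys() | vec_n.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

bm25 = {"cfg-grpc": 12.4, "net-timeouts": 7.1, "tls-setup": 3.0}
vector = {"lb-failover": 0.83, "net-timeouts": 0.80, "lb-overview": 0.61}
for doc_id, score in weighted_score_fusion(bm25, vector):
    print(f"{doc_id:14s} {score:.3f}")
```

After normalisation, `net-timeouts`, which both methods rate highly, comes out on top. Min-max scaling is sensitive to outliers and to how many results each method returns, so treat `alpha` and the normalisation choice as parameters to tune on a labelled set.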
Best Practices
- Always run BM25 and vector search in parallel to minimise latency overhead from running two retrievals.
- Use RRF as your default fusion method unless offline evaluation gives you strong evidence that a more complex approach is justified. RRF is robust, has a single parameter with a well-established default, and is hard to misconfigure.
- Keep the RRF constant at 60 unless you have a specific reason to change it. The original RRF paper used this value and found it robust across the test collections it evaluated.
- Maintain both indexes with the same update cadence. When documents are added or deleted, update both the BM25 index and the vector index in the same pipeline step to prevent synchronisation drift.
- Evaluate retrieval quality on a labelled held-out set using Precision@K, Recall@K, and MRR before and after adding hybrid search. Do not assume it improves quality; measure it.
- Analyse your query distribution before investing in hybrid search. If the vast majority of queries are natural-language questions with no exact identifiers, pure vector search may already be sufficient.
- For new projects, use a vector database that supports hybrid search natively, such as Weaviate, Qdrant, or Elasticsearch, rather than building the fusion layer manually.
- Treat chunking strategy as a prerequisite to fusion tuning. The quality of both retrieval methods is bounded by the quality of your chunks.
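As a rough starting point for analysing your query distribution, you can measure what share of logged queries contain identifier-like tokens. The regular expression below is a hypothetical heuristic (a letter-digit mix such as "SKU-12345" or "GRPC-TLS-v2"), not a standard, and it will miss some identifiers (for example, bare numeric codes); hand-labelling a sample of queries gives a more reliable picture.

```python
import re

# Hypothetical heuristic: a token looks like an identifier if letters and digits
# meet, optionally separated by "-" or "_" (e.g. "SKU-12345", "RFC-8446", "v2").
IDENTIFIER_TOKEN = re.compile(r"[A-Za-z]+[-_]?\d|\d+[-_]?[A-Za-z]")

def identifier_share(queries):
    """Return the fraction of queries containing at least one identifier-like token."""
    if not queries:
        return 0.0
    flagged = sum(1 for q in queries if IDENTIFIER_TOKEN.search(q))
    return flagged / len(queries)

queries = [
    "how does the load balancer handle failover?",
    "default timeout for GRPC-TLS-v2",
    "what should I do when I cannot log in?",
    "RFC-8446 handshake",
]
print(f"{identifier_share(queries):.0%} of queries contain identifier-like tokens")
```

A very low share suggests pure vector search may be enough; a substantial share is a signal that the BM25 side of a hybrid system will earn its cost.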
Comparison: Search Methods
| Aspect | Pure Vector Search | Pure BM25 | Hybrid Search (RRF) |
|---|---|---|---|
| Semantic Understanding | Excellent | None | Excellent |
| Exact Keyword Matching | Weak | Excellent | Excellent |
| Rare Terms and Identifiers | Weak | Excellent | Excellent |
| Short Keyword Queries | Moderate | Good | Good |
| Synonym and Paraphrase Handling | Excellent | None | Excellent |
| Score Normalisation Required | No | No | No (with RRF) |
| Implementation Complexity | Low | Low | Medium |
| Storage Cost | Medium | Low | Medium to High |
| Robustness on Mixed Query Types | Good | Good | Excellent |
Frequently Asked Questions
Why use RRF instead of simply averaging the scores from both methods?
BM25 scores and cosine similarity scores live on completely different numerical scales. A BM25 score of 10 means something entirely different from a cosine similarity of 0.9. Averaging them without normalisation produces results that are dominated by whichever method produces larger absolute numbers, usually BM25. RRF sidesteps this problem entirely by working only with rank positions, which are always on the same scale regardless of the underlying scoring method.
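A toy example makes the scale problem concrete. The scores below are invented for illustration: BM25 values in the single and double digits, cosine similarities below 1.

```python
# Invented scores for four documents from each method.
bm25 = {"a": 12.4, "b": 9.0, "c": 4.2, "d": 2.1}
cosine = {"a": 0.12, "b": 0.85, "c": 0.90, "d": 0.40}

# Naive average of raw scores: BM25's larger numbers decide the order.
raw_avg = {doc: (bm25[doc] + cosine[doc]) / 2 for doc in bm25}
print(sorted(raw_avg, key=raw_avg.get, reverse=True))  # ['a', 'b', 'c', 'd']

def ranks(scores):
    """Map each document to its 1-based rank within one method's scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {doc: position for position, doc in enumerate(ordered, start=1)}

# RRF on the same data uses only each document's rank in each list.
k = 60
bm25_rank, cosine_rank = ranks(bm25), ranks(cosine)
rrf = {doc: 1 / (k + bm25_rank[doc]) + 1 / (k + cosine_rank[doc]) for doc in bm25}
print(sorted(rrf, key=rrf.get, reverse=True))  # ['c', 'b', 'a', 'd']
```

The raw average reproduces the BM25 order exactly, so the cosine scores barely matter. RRF, working only from ranks, lets document `c`, which vector search ranked first, reach the top.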
What value should I use for the RRF constant k?
The value 60 comes from the original RRF paper and remains the common default. In practice, values between 40 and 80 usually produce very similar rankings. Unless you have a specific reason, backed by offline evaluation, to change it, keep it at 60 and save your tuning effort for more impactful variables like chunk size and top-K retrieval count.
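The role of the constant is easiest to see in the ratio between the contributions of rank 1 and rank 3 from a single list. This short sketch prints that ratio for a few values of k; it illustrates the shape of the scores only and says nothing about retrieval quality on your data. The helper name is hypothetical.

```python
def rrf_contribution(rank, k):
    """Contribution of one ranked list to a document's RRF score (rank starts at 1)."""
    return 1.0 / (k + rank)

# How much more does rank 1 earn than rank 3 for different values of k?
for k in (1, 10, 60):
    ratio = rrf_contribution(1, k) / rrf_contribution(3, k)
    print(f"k={k:>2}: rank-1 earns {ratio:.3f}x the rank-3 contribution")

# k= 1: rank-1 earns 2.000x the rank-3 contribution
# k=10: rank-1 earns 1.182x the rank-3 contribution
# k=60: rank-1 earns 1.033x the rank-3 contribution
```

With a small k, a single first-place finish dominates; at k = 60 the top few positions are weighted almost equally, so agreement between lists matters more than any one list's top pick.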
Does hybrid search always outperform single-method retrieval?
It outperforms on average across diverse query sets, but not on every individual query. For a corpus where 95% of queries are pure natural-language questions with no exact identifiers, pure vector search may perform comparably to hybrid search and will be simpler and cheaper to operate. Measure your specific query distribution and evaluate before committing.
Which vector databases support hybrid search natively?
Weaviate supports hybrid search with a configurable weighting between BM25 and vector results. Qdrant supports hybrid queries whose results are fused with RRF or a score-based fusion method. Elasticsearch can combine keyword and dense vector retrieval and offers RRF as a fusion option. Pinecone supports sparse-dense hybrid search. All of these provide built-in fusion or weighting, so you usually do not need to implement RRF yourself.
How should I evaluate whether hybrid search is actually improving my system?
Build a labelled evaluation set of at least several hundred queries with known relevant documents. Measure Precision@5, Recall@5, and MRR for pure vector search, pure BM25, and hybrid search separately. Hybrid search should beat both baselines on average; if it does not, either the fusion is misconfigured or your query distribution does not benefit from it.
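The three metrics are straightforward to compute per query and then average over the evaluation set. The helpers below are a minimal sketch with made-up document IDs; note that this Precision@K divides by k even when fewer than k results come back, which is one common convention.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant result per query (0 if none is found).

    runs is a list of (ranked_ids, relevant_id_set) pairs, one per query.
    """
    total = 0.0
    for ranked, relevant in runs:
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

runs = [
    (["d3", "d1", "d7", "d2", "d9"], {"d1", "d2"}),
    (["d4", "d5", "d6", "d8", "d0"], {"d5"}),
]
for ranked, relevant in runs:
    print(precision_at_k(ranked, relevant, 5), recall_at_k(ranked, relevant, 5))
print(mean_reciprocal_rank(runs))
# 0.4 1.0
# 0.2 1.0
# 0.5
```

Running the same `runs` structure through each of the three retrieval configurations gives directly comparable baselines.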
References
- Robertson, S., and Zaragoza, H. (2009). The Probabilistic Relevance Framework: BM25 and Beyond. Foundations and Trends in Information Retrieval, 3(4), 333–389.
- Cormack, G. V., Clarke, C. L. A., and Buettcher, S. (2009). Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods. SIGIR 2009.
- Thakur, N., et al. (2021). BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models. NeurIPS 2021 Datasets and Benchmarks Track.
- Weaviate. Hybrid Search Explained.
- Qdrant. Hybrid Search Documentation.
Key Takeaways
- Pure vector search fails on exact identifiers and rare terms; BM25 fails on semantic and conceptual queries. Hybrid search covers both failure modes simultaneously.
- Reciprocal Rank Fusion is a strong default fusion method: it works with rank positions rather than raw scores, making it immune to the scale incompatibility between BM25 and cosine similarity.
- Always evaluate retrieval quality against labelled data using Precision@K, Recall@K, and MRR before deploying any change to the retrieval layer.
- Run both retrievals in parallel to minimise latency. Keep both indexes synchronised on the same update cadence to prevent inconsistent results.
- Most major vector databases now support hybrid search natively. Prefer a built-in implementation over a custom fusion layer to reduce operational complexity.
- Chunking strategy is a prerequisite to fusion quality. Both retrieval methods are bounded by how well your documents are chunked; optimise chunking before tuning fusion parameters.