In my last two blog posts [https://dev.to/shannonlal/unlocking-the-power-of-hybrid-search-5bej, https://dev.to/shannonlal/building-blocks-for-hybrid-search-combining-keyword-and-semantic-search-236k] I gave an overview of MongoDB's Vector Search with the goal of demonstrating hybrid search in Mongo. In this post I am going to present how I got hybrid search working with Mongo. It took several attempts, so I will also walk through the strategies that did not work.
Attempting MongoDB Aggregation for Hybrid Search
My initial strategy was to use a single MongoDB aggregation to perform two searches: one on the text and another on the vectors. The idea was to use MongoDB's $search stage to run a text search followed by a vector search within the same pipeline. Here is the aggregation query I put together:
```javascript
[
  // Stage 1: Text-based search on 'description' field
  {
    $search: {
      index: 'text_index',
      text: {
        query: 'searchTerm',
        path: 'description',
        score: { boost: { value: 2 } }
      }
    }
  },
  // Stage 2: Incorporate the vector search based on the embedding.
  // This is the stage that fails: $search is only allowed as the first stage.
  {
    $search: {
      index: 'vector_index',
      compound: {
        should: [
          {
            vector: {
              path: 'embedding',
              query: [/* your vector embedding here */],
              score: { boost: { value: 1 } }
            }
          }
        ]
      }
    }
  },
  // Note: 'textScore' is the metadata of the legacy $text operator;
  // $search results expose their score as 'searchScore'.
  { $sort: { 'score': { $meta: 'textScore' } } },
  {
    $project: {
      _id: 0, // excluding the id field
      name: 1,
      description: 1,
      textScore: { $meta: 'textScore' },
      vectorScore: { $meta: 'searchScore' }
    }
  }
];
```
However, MongoDB only allows one $search stage per pipeline, and it must be the first stage, so this aggregation cannot work as written.
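The text half of that attempt is valid on its own, as long as $search is the only search stage and comes first. Below is a minimal sketch, assuming the same 'text_index' index and 'description' field as above. `buildTextSearchPipeline` and `searchStageIsFirst` are hypothetical helpers, and 'limo hire' is just an example query. The score is read with `{ $meta: 'searchScore' }`, which is the metadata field $search populates.

```javascript
// Hypothetical helper: builds a text-only Atlas Search pipeline.
// $search must be the first (and only) $search stage in the pipeline.
function buildTextSearchPipeline(searchTerm, limit = 10) {
  return [
    {
      $search: {
        index: 'text_index',
        text: {
          query: searchTerm,
          path: 'description',
          score: { boost: { value: 2 } },
        },
      },
    },
    // $search already returns documents in descending score order,
    // so no extra $sort stage is needed here.
    { $limit: limit },
    {
      $project: {
        _id: 0,
        name: 1,
        description: 1,
        textScore: { $meta: 'searchScore' },
      },
    },
  ];
}

// Hypothetical check mirroring the rule above: a $search stage may only appear at index 0.
function searchStageIsFirst(pipeline) {
  return pipeline.every((stage, i) => !('$search' in stage) || i === 0);
}

const textPipeline = buildTextSearchPipeline('limo hire');
console.log(searchStageIsFirst(textPipeline)); // true
```

Run against the two-stage pipeline from the first attempt, `searchStageIsFirst` returns false because the second $search sits at index 1.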
Crafting a Combined Search Index
My second strategy was to create a single search index that could handle both text and vector searches. Here is the index definition I tried:
```json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "description": {
        "type": "string",
        "analyzer": "lucene.standard"
      },
      "embedding": {
        "type": "vector",
        "similarity": "cosine",
        "numDimensions": 512
      }
    }
  }
}
```
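Unfortunately, this approach hit a roadblock: the Atlas Search index mappings format does not accept 'vector' as a field type. Vector fields are defined in a separate Atlas Vector Search index, which is why the final approach below uses two indexes ('vector_index' and 'default').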
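One way to get both kinds of search over the same collection is to split the definition above into two indexes. Below is a minimal sketch of the two definitions as JavaScript objects, assuming the field names, 512 dimensions, and cosine similarity from the attempt above. The names 'default' and 'vector_index' match the indexes used in the final pipeline. The objects could be pasted into the Atlas UI's JSON editor or passed to your driver's search-index helper (check that your driver version supports vector search indexes).

```javascript
// Atlas Search index for keyword search (used by $search as 'default').
// Only the text field is mapped here.
const textIndexDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      description: { type: 'string', analyzer: 'lucene.standard' },
    },
  },
};

// Atlas Vector Search index (used by $vectorSearch as 'vector_index').
// Vector Search indexes use a top-level `fields` array instead of `mappings`,
// and this is where the 'vector' type is accepted.
const vectorIndexDefinition = {
  fields: [
    {
      type: 'vector',
      path: 'embedding',
      numDimensions: 512,
      similarity: 'cosine',
    },
  ],
};
```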
Mongo Union with Aggregation
My final approach was to run the vector search first in an aggregation pipeline and then use the $unionWith stage to append the results of a text search on the same collection. The combined documents are then grouped by _id so that each document carries both scores.
The following code builds on my previous post on hybrid search. Here is the aggregation pipeline:
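```javascript
// Assumes `embedding` (the query vector) and `searchTerm` (the query text)
// are already defined, e.g. by the embedding step from the previous post.
const pipeline = [
  // 1. Semantic search: top 10 documents by vector similarity.
  // numCandidates must be >= limit; MongoDB's docs suggest a larger value for better recall.
  {
    $vectorSearch: {
      index: 'vector_index',
      path: 'embedding',
      queryVector: embedding,
      numCandidates: 10,
      limit: 10,
    },
  },
  { $addFields: { vs_score: { $meta: 'vectorSearchScore' } } },
  {
    $project: {
      vs_score: 1,
      _id: 1,
      description: 1,
      name: 1,
    },
  },
  // 2. Keyword search on the same collection, appended to the vector results
  {
    $unionWith: {
      coll: 'vector_test',
      pipeline: [
        {
          $search: {
            index: 'default',
            text: { query: searchTerm, path: 'description' },
          },
        },
        { $limit: 10 },
        { $addFields: { fts_score: { $meta: 'searchScore' } } },
        {
          $project: {
            fts_score: 1,
            _id: 1,
            description: 1,
            name: 1,
          },
        },
      ],
    },
  },
  // 3. Merge documents found by both searches into one row per _id
  {
    $group: {
      _id: '$_id',
      vs_score: { $max: '$vs_score' },
      fts_score: { $max: '$fts_score' },
      description: { $first: '$description' },
      name: { $first: '$name' },
    },
  },
  // 4. A document found by only one search gets 0 for the missing score
  {
    $project: {
      description: 1,
      name: 1,
      vs_score: { $ifNull: ['$vs_score', 0] },
      fts_score: { $ifNull: ['$fts_score', 0] },
    },
  },
  // 5. The combined score is the plain sum of the two scores
  {
    $project: {
      description: 1,
      name: 1,
      score: { $add: ['$fts_score', '$vs_score'] },
      _id: 1,
      vs_score: 1,
      fts_score: 1,
    },
  },
  { $sort: { score: -1 } },
  { $limit: 10 },
];
```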
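To make the merge stages concrete, here is a plain JavaScript sketch that mirrors what $group, $ifNull, $add, $sort, and $limit do to the unioned results. It runs without a database. `combineResults` is a hypothetical name, and the input documents and scores are made up for illustration.

```javascript
// Mirrors the $group / $ifNull / $add / $sort / $limit stages of the pipeline.
// vectorResults: [{ _id, name, vs_score }], textResults: [{ _id, name, fts_score }]
function combineResults(vectorResults, textResults, limit = 10) {
  const byId = new Map();

  // $unionWith + $group: one entry per _id, keeping the max of each score
  for (const doc of [...vectorResults, ...textResults]) {
    const key = String(doc._id);
    const entry = byId.get(key) ?? { _id: doc._id, name: doc.name, vs_score: null, fts_score: null };
    for (const field of ['vs_score', 'fts_score']) {
      if (typeof doc[field] === 'number') {
        entry[field] = entry[field] === null ? doc[field] : Math.max(entry[field], doc[field]);
      }
    }
    byId.set(key, entry);
  }

  return [...byId.values()]
    // $ifNull: a document missing from one search scores 0 for it
    .map((d) => ({ ...d, vs_score: d.vs_score ?? 0, fts_score: d.fts_score ?? 0 }))
    // $add: the combined score is the plain sum
    .map((d) => ({ ...d, score: d.fts_score + d.vs_score }))
    // $sort (descending) + $limit
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const vectorResults = [
  { _id: 1, name: 'doc A', vs_score: 0.72 },
  { _id: 2, name: 'doc B', vs_score: 0.74 },
];
const textResults = [
  { _id: 1, name: 'doc A', fts_score: 0.65 },
  { _id: 3, name: 'doc C', fts_score: 0.3 },
];

console.log(combineResults(vectorResults, textResults).map((d) => d.name));
// → [ 'doc A', 'doc B', 'doc C' ]
```

Document A ranks first because it was found by both searches, which is the same effect seen in the results below.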
The aggregation is more complex than I would like, but it does the job. The one thing I would recommend paying attention to is how the combined score is calculated. This approach simply adds the two scores (vs_score and fts_score) together, but they are on different scales: the text score from $search is not normalized to a fixed range, while the cosine-based vector score falls between 0 and 1. A plain sum may therefore not be the best choice for your use case. Below are the scores from my test search:
Query Results:
| Result | Combined Score | Text Score (fts_score) | Vector Score (vs_score) |
|---|---|---|---|
| Car for hire | 1.373 | 0.653 | 0.720 |
| Limo Hires | 0.775 | 0.037 | 0.737 |
| Electric Scooter | 0.733 | 0.044 | 0.689 |
| Bike Share | 0.731 | 0.042 | 0.689 |
| Car Dealership | 0.651 | 0.036 | 0.615 |
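If a plain sum does not fit your data, one commonly used alternative is Reciprocal Rank Fusion (RRF). Note that this is not what the pipeline above does. RRF ignores the raw scores and combines each document's rank in the two result lists as score = Σ 1 / (k + rank), with k often set to 60. Below is a minimal sketch in plain JavaScript that could be applied to the ids returned by each search. `reciprocalRankFusion` and the sample ids are hypothetical.

```javascript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document,
// where rank is 1-based. Raw scores are ignored, so the different scales of
// text and vector scores no longer matter.
function reciprocalRankFusion(rankedLists, k = 60) {
  const fused = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      fused.set(id, (fused.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...fused.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Ids ordered best-first, e.g. as returned by $vectorSearch and $search respectively.
const vectorRanking = ['b', 'a', 'c'];
const textRanking = ['a', 'd'];

console.log(reciprocalRankFusion([vectorRanking, textRanking]).map((r) => r.id));
// → [ 'a', 'b', 'd', 'c' ]
```

Document 'a' wins because it appears in both lists, even though it is not first in the vector ranking. A larger k flattens the difference between top and lower ranks.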
The Road Ahead
Over the next couple of weeks I am going to load test this query to see how it performs when searching a large number of documents. I would welcome any feedback or comments on how to improve the query, or on better strategies for getting hybrid search working.
Thanks