Add MongoDB as a vector store #21
- remove unused dependencies
- make test a little less janky
- check for if collection exists before attempting to create
- Add basic test for modification
- Get basic test 1 for vector store working
- Add basic use of mongo db as a vector store
Thanks so much, will review!
Hi @Kirbstomper,
Thank you for your contribution. You can find my comments and requests inline.
But my primary concern is that MongoDB does not provide native Vector Search capabilities, or at least this implementation doesn't leverage them?
    this.metadata = metadata;
}

public Document() {
What necessitates an empty constructor?
Mind that currently the Document class doesn't have setters for the id, text and metadata!
At one point I was exploring mapping the Document object straight from Mongo and needed one, but eventually went with creating a new Document by mapping from the BasicDBObject, to avoid messing with the immutability of the class.
So removed!
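For illustration, here is a minimal sketch of that mapping approach. It is not the PR's code: `ImmutableDocument`, `DocumentMapper`, and the field names `id`, `text`, and `metadata` are hypothetical, and a plain `Map<String, Object>` stands in for the record read from Mongo (`BasicDBObject` in the MongoDB Java driver is itself a `Map<String, Object>`, so the same lookups should apply).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the Document class: all fields final,
// no setters, and no empty constructor.
final class ImmutableDocument {
    private final String id;
    private final String text;
    private final Map<String, Object> metadata;

    ImmutableDocument(String id, String text, Map<String, Object> metadata) {
        this.id = id;
        this.text = text;
        // Defensive, unmodifiable copy (rejects null keys and values).
        this.metadata = Map.copyOf(metadata);
    }

    String getId() { return id; }
    String getText() { return text; }
    Map<String, Object> getMetadata() { return metadata; }
}

// Hypothetical mapper: builds a new document from a stored record
// instead of letting a driver populate an empty instance.
final class DocumentMapper {
    static ImmutableDocument fromRecord(Map<String, Object> record) {
        String id = (String) record.get("id");       // assumed field name
        String text = (String) record.get("text");   // assumed field name
        @SuppressWarnings("unchecked")
        Map<String, Object> metadata =
                (Map<String, Object>) record.getOrDefault("metadata", new LinkedHashMap<String, Object>());
        return new ImmutableDocument(id, text, metadata);
    }
}
```

Because every field is passed through the constructor, no empty constructor or setters are needed.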
    this.similarity = similarity;
}

public double getSimilarity() {
Not sure those methods in the InMemoryVectorStore are relevant to the MongoDB store?
.stream()
.map(this::mapBasicDbObject)
.map(entry -> new InMemoryVectorStore.Similarity(entry.getId(),
        InMemoryVectorStore.EmbeddingMath.cosineSimilarity(queryEmbedding, entry.getEmbedding())))
I'm confused. I've assumed, wrongly perhaps, that MongoDB provides built-in Vector Search support?
If it doesn't provide a native vector search capability, then I'm not convinced this MongoDB store would perform or scale any better than the in-memory one.
@markpollack what do you think?
Thanks for taking the time to review @tzolov!
The managed version of MongoDB (Atlas) can be configured to provide semantic search by automatically attaching embeddings to documents, using fields configured for each collection.
https://www.mongodb.com/products/platform/atlas-vector-search
Given that, I more or less went for making a PersistentVectorStore backed by MongoDB, partly out of "could I do it" and partly to provide another persistent (though not optimal) store out of the box.
This is a strange case. All other vector stores that I'm aware of implement the search inside the database, whereas this one implements the search as a linear in-memory scan. I think we would violate the "Principle of least astonishment" with this implementation. The implementation should be updated to use Atlas Vector Search from MongoDB. I'll leave the PR open for a while to discuss.
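To make the concern concrete, here is a rough sketch of what a linear in-memory scan looks like. It is not the PR's code: `LinearScanSearch` and its method names are hypothetical, and a `Map` of ids to embeddings stands in for the documents fetched from the collection. Every stored embedding is loaded and scored against the query, so the cost grows linearly with the collection size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a brute-force similarity search over embeddings
// that have already been loaded from the database into memory.
final class LinearScanSearch {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosineSimilarity(List<Double> a, List<Double> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.size(); i++) {
            dot += a.get(i) * b.get(i);
            normA += a.get(i) * a.get(i);
            normB += b.get(i) * b.get(i);
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Vectors must be non-zero");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores every stored embedding against the query (O(n) comparisons),
    // then returns the ids of the k most similar entries, best first.
    static List<String> topK(List<Double> query, Map<String, List<Double>> embeddingsById, int k) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, List<Double>> e : embeddingsById.entrySet()) {
            scored.add(Map.entry(e.getKey(), cosineSimilarity(query, e.getValue())));
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            ids.add(scored.get(i).getKey());
        }
        return ids;
    }
}
```

A database-side vector index avoids both the full transfer of embeddings to the client and the per-query scan, which is the behavior users would likely expect from a store named after a database.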
Maybe contributing a MongoDB implementation of
@Kirbstomper,
@markpollack @tzolov As for better supporting the capabilities provided by Atlas, I can start to explore that direction. For now I think we can close this PR until that reaches a more realized state.
Have a look at Spring AI Issue #48. MongoDB offers Atlas Vector Search, which might be a suitable way to go.
So I was looking into this a bit last week. Using its built-in similarity search seemed to give results I didn't expect, so I need to take a deeper look at it.
The API provided by Atlas is currently still under development so might change.
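As a rough sketch of the direction being discussed, the query could be expressed as an aggregation stage rather than a client-side scan. This assumes the `$vectorSearch` aggregation stage with `index`, `path`, `queryVector`, `numCandidates`, and `limit` fields; since the API was still under development, the stage name and fields may differ. The class name `VectorSearchStageSketch`, the index name `vector_index`, and the field path `embedding` are hypothetical. Plain maps are used so the sketch runs without the MongoDB driver.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds the shape of a database-side vector query
// as nested maps. Stage and field names are assumptions and may change.
final class VectorSearchStageSketch {

    static Map<String, Object> vectorSearchStage(String indexName, String path,
            List<Double> queryVector, int numCandidates, int limit) {
        Map<String, Object> options = new LinkedHashMap<>();
        options.put("index", indexName);             // name of the Atlas search index
        options.put("path", path);                   // document field holding the embedding
        options.put("queryVector", queryVector);     // embedding of the query text
        options.put("numCandidates", numCandidates); // candidates considered by the index
        options.put("limit", limit);                 // number of results returned

        Map<String, Object> stage = new LinkedHashMap<>();
        stage.put("$vectorSearch", options);
        return stage;
    }

    public static void main(String[] args) {
        // "vector_index" and "embedding" are placeholder names.
        System.out.println(vectorSearchStage("vector_index", "embedding",
                List.of(0.1, 0.2, 0.3), 100, 5));
    }
}
```

With the driver on the classpath, a map like this would be converted to a BSON document and passed as one stage of an aggregation pipeline, so the similarity ranking happens inside the database.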
I'll close this PR for now, and there can be a new one that includes support for MongoDB's native capabilities.
Adding MongoDB as a vector store.
Implemented using basically the same logic and math as InMemoryVectorStore, so it's prob worth breaking some of those shared classes out.
Big thanks to @tzolov for the work on pgvector-store; I was able to use most of his tests as guidance.