@Kirbstomper
Contributor

@Kirbstomper Kirbstomper commented Sep 19, 2023

Adding MongoDB as a vector store.

Implemented using basically the same logic and math as InMemoryVectorStore, so it's probably worth breaking some of those shared classes out.

Big thanks to @tzolov for the work on pgvector-store; I was able to use most of his tests as guidance.

remove unused dependencies
make test a little less janky
check for if collection exists before attempting to create
Add basic test for modification
Get basic test 1 for vector store working
Add basic use of mongo db as a vector store
@markpollack
Member

Thanks so much, will review!

Contributor

@tzolov tzolov left a comment

Hi @Kirbstomper,

Thank you for your contribution. You can find my comments and requests inline.

But my primary concern is that MongoDB does not provide native vector search capabilities, or at least this implementation doesn't leverage them.

this.metadata = metadata;
}

public Document() {
Contributor

What necessitates an empty constructor?
Mind that currently the Document class doesn't have setters for the id, text and metadata!

Contributor Author

@Kirbstomper Kirbstomper Sep 22, 2023

At one point I was exploring mapping the Document object straight from Mongo and needed one, but eventually went with creating a new Document by mapping from the BasicDBObject to avoid messing with the immutability of the class.

So removed!
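
For readers following along, the approach described above (build a fresh immutable object from the stored record rather than adding an empty constructor and setters) might look roughly like the sketch below. This is not the PR's code: `ImmutableDoc`, `MongoDocMapper`, and the field names `id`, `text`, and `metadata` are hypothetical, and a plain `Map<String, Object>` stands in for the `BasicDBObject` so the example runs without the MongoDB driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for an immutable document: final fields, no setters, no empty constructor. */
final class ImmutableDoc {
    private final String id;
    private final String text;
    private final Map<String, Object> metadata;

    ImmutableDoc(String id, String text, Map<String, Object> metadata) {
        this.id = id;
        this.text = text;
        this.metadata = Map.copyOf(metadata); // unmodifiable defensive copy
    }

    String getId() { return id; }
    String getText() { return text; }
    Map<String, Object> getMetadata() { return metadata; }
}

/** Hypothetical mapper: reads fields out of the stored record and calls the full constructor. */
final class MongoDocMapper {
    @SuppressWarnings("unchecked")
    static ImmutableDoc fromRecord(Map<String, Object> record) {
        String id = (String) record.get("id");     // assumed field name
        String text = (String) record.get("text"); // assumed field name
        Object raw = record.get("metadata");       // assumed field name
        Map<String, Object> metadata =
                raw instanceof Map ? (Map<String, Object>) raw : Map.of();
        return new ImmutableDoc(id, text, metadata);
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", "doc-1");
        record.put("text", "hello");
        record.put("metadata", Map.of("source", "test"));
        ImmutableDoc doc = fromRecord(record);
        System.out.println(doc.getId() + " " + doc.getText() + " " + doc.getMetadata());
    }
}
```

Keeping the mapping in a separate helper means the document class itself never needs to expose a mutable state just to satisfy an object mapper.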

this.similarity = similarity;
}

public double getSimilarity() {
Contributor

Not sure those methods in the InMemoryVectorStore are relevant to the MongoDB store?

.stream()
.map(this::mapBasicDbObject)
.map(entry -> new InMemoryVectorStore.Similarity(entry.getId(),
InMemoryVectorStore.EmbeddingMath.cosineSimilarity(queryEmbedding, entry.getEmbedding())))
Contributor

I'm confused. I had assumed, wrongly perhaps, that MongoDB provides built-in vector search support?

If it doesn't provide a native vector search capability, then I'm not convinced this MongoDB store would perform or scale any better than the in-memory one.
@markpollack what do you think?

Contributor Author

Thanks for taking the time to review @tzolov!

The managed version of MongoDB (Atlas) allows configuration to provide semantic search capability by automatically attaching embeddings to documents using fields configured for each collection.
https://www.mongodb.com/products/platform/atlas-vector-search

Given that, I more or less went for making a PersistentVectorStore backed by MongoDB, partially out of "could I do it" and partially to provide another persistent (though not optimal) option out of the box.

@markpollack
Member

This is a strange case. All other vector stores that I'm aware of implement the search inside the database, whereas this one implements the search as a linear in-memory scan. I think we would violate the "Principle of least astonishment" with this implementation. The implementation should be changed/updated to use Atlas Vector Search from MongoDB.

I'll leave the PR open for a while to discuss.
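
To make the concern concrete, a linear in-memory scan amounts to something like the following minimal sketch: every stored embedding is pulled to the client, scored against the query with cosine similarity, and sorted. `LinearScanSearch` and its method names are hypothetical and only approximate the shape of the logic under discussion; they are not the PR's or `InMemoryVectorStore`'s actual code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical brute-force search: work grows linearly with the number of stored embeddings. */
final class LinearScanSearch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 when either vector has zero length. */
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every stored vector against the query and returns the ids of the k best matches. */
    static List<String> topK(Map<String, double[]> stored, double[] query, int k) {
        return stored.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), cosineSimilarity(query, e.getValue())))
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, double[]> stored = Map.of(
                "a", new double[] {1, 0},
                "b", new double[] {0, 1},
                "c", new double[] {1, 1});
        System.out.println(topK(stored, new double[] {1, 0}, 2));
    }
}
```

Because each query touches every stored vector, the database contributes persistence only; the search cost is the same as for the in-memory store, plus the cost of fetching the data.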

@markpollack
Member

Maybe contributing a MongoDB implementation of org.springframework.ai.loader.Loader would be a good reuse of this PR for another feature. Thoughts?

@tzolov
Contributor

tzolov commented Sep 28, 2023

@Kirbstomper,
For me it would bring more value if the MongoDB integration supported Atlas Vector Search.
I don't know how challenging this task might be, but I would appreciate it if you could explore this direction.
The Loader is a good idea as well.

@Kirbstomper
Contributor Author

@markpollack @tzolov
Getting a MongoDB data loader together doesn't look like it will be too much trouble. I can explore that in a separate feature/PR.

As for better supporting the capabilities provided by Atlas, I can start to explore that direction. For now I think we can close this PR until that gets to a more realized state.

@jxblum
Contributor

jxblum commented Oct 9, 2023

Have a look at Spring AI Issue #48. MongoDB offers Atlas Vector Search, which might be a suitable way to go.
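
As a rough illustration of what "search inside the database" could look like, MongoDB documents a `$vectorSearch` aggregation stage for Atlas. The sketch below only builds the shape of that stage as plain Java maps so it runs without the driver; the index name `vector_index`, the field path `embedding`, and the class `VectorSearchStageSketch` are assumptions for illustration, and the exact stage fields should be checked against current MongoDB documentation, since the API was still changing when this thread was written.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that assembles the shape of an Atlas $vectorSearch stage as plain maps. */
final class VectorSearchStageSketch {

    static Map<String, Object> vectorSearchStage(String indexName, String path,
            List<Double> queryVector, int numCandidates, int limit) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("index", indexName);             // name of the Atlas vector index (assumed)
        params.put("path", path);                   // document field holding the embedding (assumed)
        params.put("queryVector", queryVector);     // embedding of the query text
        params.put("numCandidates", numCandidates); // candidates considered before the final cut
        params.put("limit", limit);                 // number of results returned
        return Map.of("$vectorSearch", params);
    }

    public static void main(String[] args) {
        Map<String, Object> stage =
                vectorSearchStage("vector_index", "embedding", List.of(0.1, 0.2, 0.3), 100, 5);
        System.out.println(stage);
    }
}
```

With a stage like this, the similarity ranking would run on the server against an index, and only the top results would travel to the client, in contrast to scoring every stored embedding in application code.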

@Kirbstomper
Contributor Author

Have a look at Spring AI Issue #48. MongoDB offers Atlas Vector Search, which might be a suitable way to go.

So I was looking into this a bit last week. Using its built-in similarity search seemed to give results I didn't expect, so I need to take a deeper look at it.

The API provided by Atlas is currently still under development, so it might change.
@markpollack
Member

I'll close this PR for now, and there can be a new one that includes support for MongoDB's native capabilities.
