Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Request for Embedding in Vector Store #719

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ricken07
Copy link
Contributor

Currently, vector store automatically calls the embedding client to generate the document embedding without checking whether the document already had an embedding.

In this PR, I first check if the document doesn't already have an embedding before calling the client to generate an embedding. This prevents too many calls to generate an embedding.

  • Tests are green for impacted vector stores

@markpollack markpollack added this to the 1.0.0-M2 milestone May 24, 2024
@tzolov
Copy link
Contributor

tzolov commented Aug 21, 2024

If not mistaken this is the same or related to #413 ?

But this change comes with some risks. For example, it is not clear when one would have to invalidate the pre-computed embedding (e.g. the index). Likely when
Also I'm not sure how useful this feature would be. What is the use case where you will use repeatedly the same Documents (with pre-computed embeddings) for searching? Or what are the reasons you might what to re-add a document that has precomputed embedding?

Maybe I'm missing some interesting use cases?

Right now we do not allow the Vector Store to use other embeddings but those computed by the embedding-model registered with the VectorStore. Using the embedding field would allow one to pre-compute the embeddings externally using different embedding-model and then the VectorStore will store the document with the externally computed embedding.
But I'm not sure if this is a real or needed use case, nor if this is the right approach to support it.

@tzolov
Copy link
Contributor

tzolov commented Aug 21, 2024

If the pre-computed embeddings are not applicable/useful for real use cases, IMO, we should remove the embedding field from the Document class.

@markpollack markpollack removed this from the 1.0.0-M2 milestone Aug 22, 2024
@markpollack markpollack added this to the 1.0.0-M5 milestone Nov 25, 2024
@markpollack
Copy link
Member

See #1781

I think this was a design mistake to begin with, we shouldn't be caching/storing the embedding in the document in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants