What
We currently use OpenAI embeddings (ada) to embed text and store the resulting vectors in a vector store (say, pgvector). All metadata must be tagged at index time. There is no straightforward mechanism to update existing chunks with new metadata. We also cannot use arbitrary objects in the metadata filters.
We propose using instructor-embedding, which can embed a query together with the provided filters, and using that embedding to retrieve relevant chunks. LangChain has an Instruct Embeddings implementation that we can use to embed arbitrary text.
Why
instructor-embedding embeds a (prompt, text) pair jointly. This lets us use any custom prompt to embed any text.
For example, to embed a query with certain filters applied, we can embed the pair:
("Represent the query with filters categories=['x', 'y']", "<Some long text>")
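The pair above could be built programmatically from a filter dictionary. The following is a minimal sketch, not a settled design: the helper names (build_instruction, build_pair, embed_query) and the way filters are rendered into the instruction are assumptions made for illustration, and the hkunlp/instructor-large checkpoint is just one possible model choice. The encode call follows the usage shown in the InstructorEmbedding package, and the import is kept inside embed_query so the pair-building helpers work without the package installed.

```python
from typing import List, Mapping, Sequence


def build_instruction(filters: Mapping[str, Sequence[str]]) -> str:
    """Render metadata filters into an instruction string (hypothetical format)."""
    if not filters:
        return "Represent the query"
    # Sort keys so the same filters always produce the same instruction.
    rendered = ", ".join(
        f"{key}={list(values)!r}" for key, values in sorted(filters.items())
    )
    return f"Represent the query with filters {rendered}"


def build_pair(query: str, filters: Mapping[str, Sequence[str]]) -> List[str]:
    """Return the [instruction, text] pair that the model embeds jointly."""
    return [build_instruction(filters), query]


def embed_query(query, filters, model_name="hkunlp/instructor-large"):
    """Embed a query with its filters (assumes InstructorEmbedding is installed)."""
    # Imported lazily: loading the model downloads weights on first use.
    from InstructorEmbedding import INSTRUCTOR

    model = INSTRUCTOR(model_name)
    # encode takes a list of [instruction, text] pairs and returns one vector each.
    return model.encode([build_pair(query, filters)])[0]


pair = build_pair("<Some long text>", {"categories": ["x", "y"]})
```

Here pair holds the instruction and the text in the order shown in the example above. Sorting the filter keys is a design choice to keep the instruction deterministic, since two orderings of the same filters would otherwise yield different embeddings.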