Implement instructor embedding based retrieval #16

# What

> We currently use OpenAI embeddings (ada) to embed chunks and store the resulting vectors in a vector store (e.g. pgvector). All metadata must be tagged at index time, and there is no straightforward way to update existing chunks with new metadata. We also cannot use arbitrary objects in the metadata filters.

We propose using [instructor-embedding](https://github.com/xlang-ai/instructor-embedding), which can embed a query together with the provided filters so that the resulting vector retrieves the relevant chunks. LangChain has an [Instruct Embeddings](https://python.langchain.com/docs/integrations/text_embedding/instruct_embeddings) implementation that we can use to embed arbitrary text.

# Why

instructor-embedding embeds a (prompt, text) pair jointly, so any custom prompt can be used to embed any text.
For example, to embed a query with certain filters applied, we can pass the pair:
`("Represent the query with filters categories=['x', 'y']", "<Some long text>")`
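A minimal sketch of how such pairs could be built and embedded, under stated assumptions: the helper names (`build_query_instruction`, `build_pair`, `embed_pairs`) and the exact instruction wording are hypothetical and not part of any library, and the model name `hkunlp/instructor-large` is just one of the checkpoints published by the instructor-embedding project. Only the `INSTRUCTOR(...).encode([[instruction, text]])` call comes from the library's README.

```python
from typing import List, Mapping, Sequence


def build_query_instruction(filters: Mapping[str, Sequence[str]]) -> str:
    """Render metadata filters into an instruction string (hypothetical format)."""
    if not filters:
        return "Represent the query"
    # Sort keys so the same filters always produce the same instruction text.
    rendered = ", ".join(
        f"{key}={list(values)!r}" for key, values in sorted(filters.items())
    )
    return f"Represent the query with filters {rendered}"


def build_pair(filters: Mapping[str, Sequence[str]], text: str) -> List[str]:
    """Return the [instruction, text] pair that INSTRUCTOR models take as input."""
    return [build_query_instruction(filters), text]


def embed_pairs(pairs: List[List[str]], model_name: str = "hkunlp/instructor-large"):
    """Embed [instruction, text] pairs; requires the InstructorEmbedding package."""
    # Imported lazily so the helpers above work without the package installed.
    from InstructorEmbedding import INSTRUCTOR

    model = INSTRUCTOR(model_name)
    return model.encode(pairs)


pair = build_pair({"categories": ["x", "y"]}, "<Some long text>")
print(pair[0])
```

Calling `embed_pairs([pair])` would then return one vector per pair. Whether filters encoded in the instruction actually steer retrieval as well as hard metadata filters do is an open question this issue would need to evaluate.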

