-
Notifications
You must be signed in to change notification settings - Fork 1.6k
fix: add initialize_kwargs for embedder API authentication #457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,13 @@ | ||
| { | ||
| "embedder": { | ||
| "client_class": "OpenAIClient", | ||
| "initialize_kwargs": { | ||
| "api_key": "${OPENAI_API_KEY}", | ||
| "base_url": "${OPENAI_BASE_URL}" | ||
| }, | ||
| "batch_size": 500, | ||
| "model_kwargs": { | ||
| "model": "text-embedding-3-small", | ||
| "dimensions": 256, | ||
| "encoding_format": "float" | ||
| } | ||
|
Comment on lines
9
to
12
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔴 Removal of Removing the Click to expandBackgroundThe How the bug is triggered
Code flow
if self.repo_paths and os.path.exists(self.repo_paths["save_db_file"]):
logger.info("Loading existing database...")
self.db = LocalDB.load_state(self.repo_paths["save_db_file"])
documents = self.db.get_transformed_data(key="split_and_embed")
if documents:
# ... logs dimensions but doesn't validate against current config
return documents # Returns old embeddings
self.retriever = FAISSRetriever(
**configs["retriever"],
embedder=retrieve_embedder, # Uses new 1536-dim embedder
documents=self.transformed_docs, # Contains old 256-dim embeddings
document_map_func=lambda doc: doc.vector,
)Impact
Recommendation: Either: (1) Keep the Was this helpful? React with 👍 or 👎 to provide feedback. |
||
| }, | ||
|
|
@@ -16,6 +19,9 @@ | |
| }, | ||
| "embedder_google": { | ||
| "client_class": "GoogleEmbedderClient", | ||
| "initialize_kwargs": { | ||
| "api_key": "${GOOGLE_API_KEY}" | ||
| }, | ||
|
Comment on lines
+22
to
+24
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similar to the OpenAI configuration, this change can cause issues if the This bypasses the client's logic to raise a The I recommend removing this |
||
| "batch_size": 100, | ||
| "model_kwargs": { | ||
| "model": "text-embedding-004", | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change introduces a potential issue when environment variables are not set. The
replace_env_placeholdersfunction will pass the literal placeholder string (e.g.,"${OPENAI_BASE_URL}") to theOpenAIClientconstructor if the corresponding environment variable is missing.This causes two problems:
base_url, it bypasses the client's logic to use the default OpenAI URL (https://api.openai.com/v1) whenOPENAI_BASE_URLis not set. The client will instead try to connect to the invalid URL"${OPENAI_BASE_URL}".api_key, it bypasses the client's validation that raises aValueErrorif the API key is missing. Instead, it will attempt to authenticate with the invalid key"${OPENAI_API_KEY}".The
OpenAIClientis already designed to readapi_keyandbase_urlfrom environment variables if they are not passed to the constructor. Relying on that existing mechanism is more robust.While the removal of the
dimensionsfield is a good improvement, I recommend removing theinitialize_kwargssection entirely. The client will correctly handle credentials on its own. If the original issue persists, it might be due to the environment variables not being correctly propagated to the application, which should be investigated.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OpenAIClient is already designed to read api_key and base_url from environment variables if they are not passed to the constructor. Relying on that existing mechanism is more robust.
I am curious what issues are you seeing if it is not defined directly in the kwargs?