
Conversation


@asimurka asimurka commented Jan 13, 2026

Description

This PR introduces Azure Entra ID authentication.
It also adds management of the automatic enrichment of the Llama Stack config with the BYOK section from the LCS config, which follows the same principle as the dynamic injection of access tokens.
Because Llama Stack needs a valid Azure API key already at service startup, mechanisms were added to generate the initial access token before the Llama Stack service starts. This works in both library and service mode.
Documentation is included as well; it discusses in detail the access-token workflow for both deployment modes and provides example configs for local deployment.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Azure Entra ID authentication with automatic token refresh support
    • Enhanced client routing for library and service mode configurations
  • Bug Fixes

    • Updated Azure LLM compatibility matrix with corrected model support status
    • Fixed environment variable propagation for Azure credentials across deployments
  • Documentation

    • Added comprehensive Azure Entra ID authentication guide covering configuration and setup
  • Tests

    • Added Azure token manager unit tests validating credential refresh and expiration handling
  • Chores

    • Added dependencies: azure-core, azure-identity, einops


@asimurka asimurka marked this pull request as draft January 13, 2026 16:15

coderabbitai bot commented Jan 13, 2026

Walkthrough

This PR introduces Azure Entra ID authentication for Azure OpenAI inference, replacing static API key handling with dynamic token management. It adds a singleton token manager, configuration support, token refresh logic across query endpoints, updated deployment manifests, and comprehensive test coverage.

Changes

Cohort / File(s) Summary
Azure Entra ID Token Manager
src/authorization/azure_token_manager.py
New singleton implementing token caching, refresh via ClientSecretCredential, expiration tracking with leeway, and environment variable integration via AZURE_API_KEY
Configuration Models & Structures
src/models/config.py, src/configuration.py
Added AzureEntraIdConfiguration class with tenant_id, client_id, client_secret, scope fields; exposed via AppConfig.azure_entra_id property
Configuration Generation & Enrichment
src/llama_stack_configuration.py
Added setup_azure_entra_id_token() for token initialization, enrich_byok_rag() for BYOK RAG enrichment, generate_configuration() orchestration, updated construct_vector_dbs_section() and construct_vector_io_providers_section() for dict-based BYOK handling
Application Initialization
src/app/main.py
Added AzureEntraIDManager initialization on startup; calls set_config and refresh_token during lifespan startup with error logging
Application Endpoints
src/app/endpoints/query.py, src/app/endpoints/streaming_query.py
Added Azure token expiration check and refresh logic before response retrieval; reloads library client or updates provider data with refreshed credentials
Query Endpoints - Cleanup
src/app/endpoints/query_v2.py
Removed informational OpenAI response logging statement
Llama Stack Client Management
src/client.py
Added library-mode vs service-mode routing; new _load_library_client(), _load_service_client(), _enrich_library_config(); added reload_library_client(), update_provider_data() methods; added is_library_client property
Lightspeed Stack Startup
src/lightspeed_stack.py
Removed Llama Stack configuration generation CLI flags (-g, -i, -o) and related code path
Docker Compose - Service Mode
docker-compose.yaml
Added lightspeed-stack.yaml volume mounts; replaced AZURE_API_KEY with TENANT_ID, CLIENT_ID, CLIENT_SECRET environment variables in llama-stack and lightspeed-stack services
Docker Compose - Library Mode
docker-compose-library.yaml
Replaced AZURE_API_KEY with TENANT_ID, CLIENT_ID, CLIENT_SECRET environment variables; added comment indicating dynamic token handling
Build & Runtime Configuration
Makefile
Added CONFIG and LLAMA_STACK_CONFIG variables; updated run target to use -c $(CONFIG); added run-llama-stack target
Container Configuration
test.containerfile
Added azure-identity to pip install; copied llama_stack_configuration.py and llama-stack-entrypoint.sh; added ENTRYPOINT directive
Container Entrypoint
scripts/llama-stack-entrypoint.sh
New script for config enrichment and environment variable setup; executes enrichment if lightspeed config exists
E2E Workflow - General
.github/workflows/e2e_tests.yaml
Changed config path from fixed run-ci.yaml to dynamic run-${{ matrix.environment }}.yaml
E2E Workflow - Providers
.github/workflows/e2e_tests_providers.yaml
Removed "Get Azure API key" step; added "Add Azure Entra ID config block" step; updated Run services steps to use TENANT_ID, CLIENT_ID, CLIENT_SECRET instead of AZURE_API_KEY
Configuration Examples - Entra ID Library
examples/lightspeed-stack-azure-entraid-lib.yaml
New example config for library-mode Lightspeed Core Service with azure_entra_id credentials sourced from environment variables
Configuration Examples - Entra ID Service
examples/lightspeed-stack-azure-entraid-service.yaml
New example config for service-mode Lightspeed Core Service with azure_entra_id credentials sourced from environment variables
Configuration Examples - Cleanup
examples/azure-run.yaml, tests/e2e/configs/run-azure.yaml
Minor trailing newline formatting fixes
Test Configs - Whitespace Cleanup
tests/e2e/configuration/library-mode/*, tests/e2e/configuration/server-mode/*
Removed trailing blank lines from test configuration files
E2E Test Steps - Shield Validation
tests/e2e/features/steps/info.py
Removed dynamic provider_resource_id validation; hardcoded expectation to "openai/gpt-4o-mini"
Unit Tests - Azure Token Manager
tests/unit/authorization/test_azure_token_manager.py
New comprehensive test module covering singleton behavior, token expiration, refresh success/failure, configuration management
Unit Tests - Client
tests/unit/test_client.py
Added reset_singleton fixture; added test_update_provider_data_service_client() and test_reload_library_client() tests
Unit Tests - Configuration
tests/unit/test_configuration.py
Added test_load_configuration_with_azure_entra_id() and test_load_configuration_with_incomplete_azure_entra_id_raises() tests
Unit Tests - Llama Stack Configuration
tests/unit/test_llama_stack_configuration.py
Refactored tests to use dict-based BYOK RAG configurations; updated assertions for dictionary structures
Unit Tests - Configuration Dump
tests/unit/models/config/test_dump_configuration.py
Added assertions for azure_entra_id presence in dumped configuration
Unit Tests - Formatting
tests/unit/authentication/test_api_key_token.py
Added blank line after module docstring
Documentation - Provider Guide
docs/providers.md
Added comprehensive "Azure Provider - Entra ID Authentication Guide" section covering configuration, token lifecycle, startup sequencing, security, and deployment examples
Documentation - Compatibility Table
README.md
Updated Azure row in LLM compatibility table; split into two rows with different model sets and support status
OpenAPI Schema
docs/openapi.json
Added AzureEntraIdConfiguration schema; updated Configuration to include azure_entra_id; expanded ModelContextProtocolServer with authorization_headers and timeout; updated probes wording
Dependencies - Python
pyproject.toml
Added azure-core and azure-identity dependencies
Dependencies - Build
requirements-build.txt
Added commented azure-identity entries under wheel and unsafe dependencies blocks
Dependencies - Hashes
requirements.hashes.source.txt
Added hashes for azure-core, azure-identity, msal, msal-extensions, regex
Dependencies - Wheels
requirements.hashes.wheel.txt
Added einops==0.8.1; removed regex==2025.11.3 entries
Build Pipelines
.tekton/lightspeed-stack-pull-request.yaml, .tekton/lightspeed-stack-push.yaml
Added einops to prefetch pip-binary packages list

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client Request
    participant Endpoint as Query Endpoint
    participant TokenMgr as AzureEntraIDManager
    participant AzureAD as Azure AD
    participant AzureOAI as Azure OpenAI

    Client->>Endpoint: Query Request
    Endpoint->>TokenMgr: Check is_token_expired
    alt Token Expired
        TokenMgr->>AzureAD: refresh_token()
        AzureAD-->>TokenMgr: AccessToken (with expiry)
        TokenMgr->>TokenMgr: Cache token & update AZURE_API_KEY
        alt Library Client Mode
            Endpoint->>Endpoint: reload_library_client()
        else Service Client Mode
            Endpoint->>Endpoint: update_provider_data(api_key, api_base)
        end
    end
    Endpoint->>AzureOAI: Request with Token
    AzureOAI-->>Endpoint: Response
    Endpoint-->>Client: Query Result
sequenceDiagram
    participant Startup as Application Startup
    participant Config as AppConfig
    participant TokenMgr as AzureEntraIDManager
    participant AzureAD as Azure AD
    participant Llama as Llama Stack

    Startup->>Config: Load Configuration
    Config-->>Startup: Configuration Object
    Startup->>TokenMgr: Initialize (singleton)
    Startup->>Config: Get azure_entra_id
    Config-->>Startup: AzureEntraIdConfiguration
    Startup->>TokenMgr: set_config()
    Startup->>TokenMgr: refresh_token()
    TokenMgr->>AzureAD: ClientSecretCredential.get_token()
    AzureAD-->>TokenMgr: AccessToken
    TokenMgr->>TokenMgr: Cache & store in AZURE_API_KEY
    Startup->>Llama: Initialize Llama Stack
    Llama-->>Startup: Ready
    Startup-->>Startup: Application Ready

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

The PR introduces a substantial authentication system spanning configuration, token management, multiple endpoints, and deployment. The changes are heterogeneous: new singleton token manager with credential flow, configuration model extensions, token refresh logic integrated across query and streaming endpoints, Docker/Makefile updates, and comprehensive test refactoring. Token expiration logic, provider data updates, and deployment manifest correctness require careful analysis.

Possibly related PRs

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik
  • radofuchs
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'LCORE-861: Azure Entra ID token managment' references the issue key and main feature, but contains a typo ('managment' instead of 'management') and is partially related to the changeset. The PR primarily implements Azure Entra ID authentication, token management, and Llama Stack configuration enrichment.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



@asimurka asimurka force-pushed the azure-entra-id-final branch from 481a427 to def9221 Compare January 14, 2026 08:27
echo "=== Recent commits ==="
git log --oneline -5
- name: Add Azure Entra ID config block to all test configs

The same set of LCS config files is shared across all test jobs, so the Entra ID config block has to be added only in the relevant test jobs.

def _update_access_token(self, token: str, expires_on: int) -> None:
"""Update the token in env var and track expiration time."""
self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
os.environ["AZURE_API_KEY"] = token

The Azure token is stored only as an environment variable (single-source-of-truth principle). Every uvicorn process gets its own environment-variable scope. The environment variable is also used to inject the token into the LLS config.

def update_provider_data(self, updates: dict[str, str]) -> AsyncLlamaStackClient:
"""Update provider data headers for service client.
For use with service mode only.

A different principle applies to the service client than to the library client. Re-injecting a new access token into the config would require restarting the whole service, which is not feasible. Instead, the tokens are replaced in the provider headers, which is the preferred approach.

# configuration for Llama Stack
if args.generate_llama_stack_configuration:
try:
generate_configuration(

Moved to src/llama_stack_configuration.py

@asimurka asimurka marked this pull request as ready for review January 14, 2026 09:17

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
src/lightspeed_stack.py (2)

26-40: Docstring references removed CLI flags.

The docstring still documents the removed -g/--generate-llama-stack-configuration, -i/--input-config-file, and -o/--output-config-file flags (lines 33-36) that are no longer part of the argument parser.

📝 Suggested fix
 def create_argument_parser() -> ArgumentParser:
     """Create and configure argument parser object.

     The parser includes these options:
     - -v / --verbose: enable verbose output
     - -d / --dump-configuration: dump the loaded configuration to JSON and exit
     - -c / --config: path to the configuration file (default "lightspeed-stack.yaml")
-    - -g / --generate-llama-stack-configuration: generate a Llama Stack
-                                                 configuration from the service configuration
-    - -i / --input-config-file: Llama Stack input configuration filename (default "run.yaml")
-    - -o / --output-config-file: Llama Stack output configuration filename (default "run_.yaml")

     Returns:
         Configured ArgumentParser for parsing the service CLI options.
     """

69-86: Docstring references removed functionality.

The main() docstring still references --generate-llama-stack-configuration behavior (lines 77-79) and mentions "Llama Stack generation fails" as a SystemExit condition (lines 84-85), but this functionality has been moved to src/llama_stack_configuration.py.

📝 Suggested fix
 def main() -> None:
     """Entry point to the web service.

     Start the Lightspeed Core Stack service process based on CLI flags and configuration.

     Parses command-line arguments, loads the configured settings, and then:
     - If --dump-configuration is provided, writes the active configuration to
       configuration.json and exits (exits with status 1 on failure).
-    - If --generate-llama-stack-configuration is provided, generates and stores
-      the Llama Stack configuration to the specified output file and exits
-      (exits with status 1 on failure).
     - Otherwise, sets LIGHTSPEED_STACK_CONFIG_PATH for worker processes, starts
       the quota scheduler, and starts the Uvicorn web service.

     Raises:
-        SystemExit: when configuration dumping or Llama Stack generation fails
-                    (exits with status 1).
+        SystemExit: when configuration dumping fails (exits with status 1).
     """
tests/unit/test_client.py (2)

53-66: Missing @pytest.mark.asyncio decorator.

This async test function is missing the @pytest.mark.asyncio decorator, which could cause it to not execute properly as an async test.

Proposed fix
+@pytest.mark.asyncio
 async def test_get_async_llama_stack_remote_client() -> None:

69-83: Missing @pytest.mark.asyncio decorator.

Same issue as above - this async test function needs the @pytest.mark.asyncio decorator.

Proposed fix
+@pytest.mark.asyncio
 async def test_get_async_llama_stack_wrong_configuration() -> None:
🤖 Fix all issues with AI agents
In `@docs/providers.md`:
- Around line 87-89: The YAML example uses AZURE_TENANT_ID / AZURE_CLIENT_ID /
AZURE_CLIENT_SECRET but the bash export example uses TENANT_ID / CLIENT_ID /
CLIENT_SECRET; make them consistent by renaming the bash exports to
AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET (or alternately rename
the YAML keys to the non-prefixed names) and update all occurrences (including
the other instance noted) or add a brief note explaining the intentional
difference so readers aren’t confused.

In `@examples/azure-run.yaml`:
- Around line 71-77: The YAML uses an outdated Azure OpenAI api_version value;
update the api_version field under the inference -> config block (the entry with
provider_id: azure and provider_type: remote::azure) to a current release—e.g.,
set api_version to "2025-04-01-preview" for the data-plane or "2025-06-01" if
you need control-plane GA features—so the inference provider uses a supported
API version.

In `@scripts/llama-stack-entrypoint.sh`:
- Around line 7-10: The issue is that INPUT_CONFIG and ENRICHED_CONFIG can point
to the same file so the existence check for ENRICHED_CONFIG can be true even
when enrichment failed; update the script so ENRICHED_CONFIG uses a distinct
default path (e.g., a different filename) or capture the enrichment script's
exit status instead of suppressing errors with "|| true" and use that exit code
to decide whether to treat the output as enriched; refer to the variables
INPUT_CONFIG and ENRICHED_CONFIG and the enrichment invocation (the command
currently followed by "|| true") and the conditional "if [ -f
\"$ENRICHED_CONFIG\" ]" when making the change.

In `@src/app/endpoints/query.py`:
- Around line 319-339: The current code uses next(...) to find azure_config and
will raise StopIteration if no provider with provider_type == "remote::azure"
exists and may convert a missing api_base into the string "None"; change the
lookup to a safe search (e.g., use next((p.config for p in await
client.providers.list() if p.provider_type == "remote::azure"), None) or
explicitly iterate and return None if not found), then check that azure_config
is not None before calling client_holder.update_provider_data (handle the
missing provider case appropriately), and when building azure_api_base use a
safe default instead of str(None) (e.g., use azure_config.get("api_base") or ""
only if not None) so you never store the literal "None" string.

In `@src/app/endpoints/streaming_query.py`:
- Around line 896-916: The next(...) call that looks for a provider with
provider_type == "remote::azure" can raise StopIteration; change the lookup to
use a safe lookup (e.g., azure_config = next((p.config for p in await
client.providers.list() if p.provider_type == "remote::azure"), None)) and then
check azure_config is not None before using it in
client_holder.update_provider_data; if azure_config is None, handle gracefully
(log/raise a clear error or return an appropriate response) so
AzureEntraIDManager.refresh_token() path won't crash when the Azure provider is
missing.

In `@src/llama_stack_configuration.py`:
- Around line 106-108: The except block that catches ClientAuthenticationError
and CredentialUnavailableError (the Azure Entra ID token generation block that
currently calls logger.error("Azure Entra ID: Failed to generate token: %s", e))
must not silently swallow failures; after logging the error re-raise the caught
exception (raise) so callers can handle or fail fast. Locate the except handling
those two exceptions in the token generation function and append a re-raise (or
alternatively change the function to return an explicit failure status and
propagate that to callers) so authentication failures are observable to callers.
- Around line 146-148: The code builds provider_id using
brag.get("vector_db_id") which can be None and yield a malformed "byok_" value;
update the logic around the vector_db_id/provider_id assignment (where
"vector_db_id", "provider_id", "embedding_model" are set) to validate or provide
a safe default for vector_db_id (e.g., use empty-string-safe fallback or a
sentinel like "unknown") and then construct provider_id from that validated
value (or skip/omit provider_id when vector_db_id is absent) so provider_id is
never "byok_" with an empty suffix.
🧹 Nitpick comments (22)
pyproject.toml (1)

63-64: Add version constraints to Azure dependencies for consistency with other packages.

Both azure-core and azure-identity are missing version constraints, while every other dependency in the file specifies minimum or exact versions (e.g., fastapi>=0.115.12, kubernetes>=30.1.0). Consider using azure-core>=1.37.0 and azure-identity>=1.25.1 to maintain consistency and prevent unexpected behavior from major version updates.

src/app/main.py (1)

44-55: Redundant environment variable assignment.

Lines 49-51 set AZURE_API_KEY from access_token, but AzureEntraIDManager.refresh_token() already sets os.environ["AZURE_API_KEY"] internally via _update_access_token(). The explicit assignment is unnecessary.

♻️ Suggested simplification
     azure_config = configuration.configuration.azure_entra_id
     if azure_config is not None:
         AzureEntraIDManager().set_config(azure_config)
         try:
             await AzureEntraIDManager().refresh_token()
-            os.environ["AZURE_API_KEY"] = (
-                AzureEntraIDManager().access_token.get_secret_value()
-            )
             logger.info("Azure Entra ID token set in environment")
         except ValueError as e:
             logger.error("Failed to refresh Azure token: %s", e)

Additionally, consider whether a failed token refresh should prevent startup. Currently, a ValueError is logged but the service continues, which could lead to authentication failures at runtime.

src/app/endpoints/query.py (1)

319-327: Consider storing the singleton reference for cleaner code.

AzureEntraIDManager() is instantiated multiple times (lines 321, 322, 324, and 336). While it's a singleton, storing a single reference improves readability.

♻️ Suggested refactor
         if provider_id == "azure":
+            azure_manager = AzureEntraIDManager()
             if (
-                AzureEntraIDManager().is_entra_id_configured
-                and AzureEntraIDManager().is_token_expired
+                azure_manager.is_entra_id_configured
+                and azure_manager.is_token_expired
             ):
-                await AzureEntraIDManager().refresh_token()
+                await azure_manager.refresh_token()

                 if client_holder.is_library_client:
                     client = await client_holder.reload_library_client()
                 else:
                     # ... provider lookup ...
                     client = client_holder.update_provider_data(
                         {
-                            "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
+                            "azure_api_key": azure_manager.access_token.get_secret_value(),
                             "azure_api_base": str(azure_config.get("api_base")),
                         }
                     )
scripts/llama-stack-entrypoint.sh (1)

15-19: Consider capturing enrichment failures for debugging.

The || true suppresses all errors from the enrichment script without logging them. If enrichment fails, operators won't have visibility into why.

♻️ Suggested improvement
     echo "Enriching llama-stack config..."
-    python3 /opt/app-root/llama_stack_configuration.py \
+    if ! python3 /opt/app-root/llama_stack_configuration.py \
         -c "$LIGHTSPEED_CONFIG" \
         -i "$INPUT_CONFIG" \
         -o "$ENRICHED_CONFIG" \
-        -e "$ENV_FILE" 2>&1 || true
+        -e "$ENV_FILE" 2>&1; then
+        echo "Warning: Config enrichment failed, will use original config"
+    fi
src/models/config.py (1)

1482-1494: Consider adding Field metadata for consistency.

The tenant_id, client_id, and client_secret fields lack the title and description metadata that other configuration fields in this module include. This affects API documentation and schema generation consistency.

♻️ Suggested addition of field metadata
 class AzureEntraIdConfiguration(ConfigurationBase):
     """Microsoft Entra ID authentication attributes for Azure."""

-    tenant_id: SecretStr
-    client_id: SecretStr
-    client_secret: SecretStr
+    tenant_id: SecretStr = Field(
+        ...,
+        title="Tenant ID",
+        description="Azure AD tenant ID for authentication.",
+    )
+    client_id: SecretStr = Field(
+        ...,
+        title="Client ID",
+        description="Azure AD application (client) ID.",
+    )
+    client_secret: SecretStr = Field(
+        ...,
+        title="Client Secret",
+        description="Azure AD client secret for authentication.",
+    )
     scope: str = Field(
         "https://cognitiveservices.azure.com/.default",
         title="Token scope",
         description="Azure Cognitive Services scope for token requests. "
         "Override only if using a different Azure service.",
     )
examples/lightspeed-stack-azure-entraid-service.yaml (2)

1-8: Consider adding a security note for auth_enabled: false.

This example has auth_enabled: false on line 5. While appropriate for development examples, consider adding a comment clarifying this should be enabled in production deployments.


9-17: Clarify the placeholder api_key value.

The api_key: xyzzy on line 17 is a placeholder. Since Azure Entra ID manages authentication dynamically, consider adding a comment explaining that this value is overridden at runtime by the token management system, or use a more descriptive placeholder like "<replaced-at-runtime>".

Makefile (1)

20-23: Handle missing .env file gracefully and clarify in-place config modification.

Two observations:

  1. Missing .env handling: If .env doesn't exist or AZURE_API_KEY isn't defined, grep will fail silently and AZURE_API_KEY will be empty, potentially causing runtime issues without clear error messages.

  2. In-place modification: Using the same file for -i and -o ($(LLAMA_STACK_CONFIG)) overwrites the original. This may be intentional but could surprise users.

Suggested improvement for `.env` handling
 run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
+	@test -f .env || (echo "Error: .env file not found" && exit 1)
 	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
-	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2-) \
+	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2- || echo "") \
 	uv run llama stack run $(LLAMA_STACK_CONFIG)
src/app/endpoints/streaming_query.py (1)

897-901: Consider caching the AzureEntraIDManager() instance.

While AzureEntraIDManager is a singleton, calling it three times in this block is slightly verbose. Consider storing it in a local variable for readability.

Suggested improvement
         if provider_id == "azure":
+            azure_manager = AzureEntraIDManager()
             if (
-                AzureEntraIDManager().is_entra_id_configured
-                and AzureEntraIDManager().is_token_expired
+                azure_manager.is_entra_id_configured
+                and azure_manager.is_token_expired
             ):
-                await AzureEntraIDManager().refresh_token()
+                await azure_manager.refresh_token()
tests/unit/authorization/test_azure_token_manager.py (1)

122-143: Type annotation improvement and assertion placement.

  1. The caplog parameter should use pytest.LogCaptureFixture instead of Any for proper type checking.
  2. The assertion at line 143 is inside the caplog.at_level context manager - while this works, placing it outside is cleaner.
Suggested improvement
    @pytest.mark.asyncio
     async def test_refresh_token_failure_logs_error(
         self,
         token_manager: AzureEntraIDManager,
         dummy_config: AzureEntraIdConfiguration,
         mocker: MockerFixture,
-        caplog: Any,
+        caplog: pytest.LogCaptureFixture,
     ) -> None:
         """Log error when token retrieval fails due to ClientAuthenticationError."""
         token_manager.set_config(dummy_config)
         mock_credential_instance = mocker.Mock()
         mock_credential_instance.get_token.side_effect = ClientAuthenticationError(
             "fail"
         )
         mocker.patch(
             "authorization.azure_token_manager.ClientSecretCredential",
             return_value=mock_credential_instance,
         )

         with caplog.at_level("ERROR"):
             await token_manager.refresh_token()
-            assert "Failed to retrieve Azure access token" in caplog.text
+        assert "Failed to retrieve Azure access token" in caplog.text
tests/unit/test_llama_stack_configuration.py (1)

157-177: Multiple type: ignore comments indicate potential model validation issues.

The extensive use of # type: ignore[call-arg] suggests the Pydantic models have required fields not being provided. While this works at runtime, consider:

  1. Creating a test fixture with all required fields
  2. Using model_construct() to bypass validation for test-only scenarios

This would improve type safety and make the test more maintainable.

docker-compose.yaml (1)

11-14: Inconsistent SELinux volume mount labels.

Line 12 uses :Z (exclusive private label) for run.yaml, while line 14 uses :z (shared label) for lightspeed-stack.yaml. This inconsistency could cause issues:

  • :Z relabels the content to be private to the container
  • :z allows sharing between containers

If both services need to read lightspeed-stack.yaml, :z is correct for that file. Consider making the labels consistent based on actual sharing requirements.

src/authorization/azure_token_manager.py (2)

59-74: Consider making token retrieval truly async or documenting blocking behavior.

The refresh_token method is async but calls the synchronous _retrieve_access_token which blocks on the Azure SDK's get_token(). This blocks the event loop during token retrieval.

Options:

  1. Run the blocking call in an executor: await asyncio.to_thread(self._retrieve_access_token)
  2. Document that brief blocking is expected (token refresh is infrequent)

For a token refresh that happens every ~hour, the brief blocking may be acceptable, but worth documenting.

Option 1: Use asyncio.to_thread
+import asyncio
+
     async def refresh_token(self) -> None:
         """Refresh the cached Azure access token.

         This is async to enforce proper ordering in the event loop -
         callers must await this before using the refreshed token.

         Raises:
             ValueError: If Entra ID configuration has not been set.
         """
         if self._entra_id_config is None:
             raise ValueError("Azure Entra ID configuration not set")

         logger.info("Refreshing Azure access token")
-        token_obj = self._retrieve_access_token()
+        token_obj = await asyncio.to_thread(self._retrieve_access_token)
         if token_obj:
             self._update_access_token(token_obj.token, token_obj.expires_on)

90-100: Error logging lacks exception details for debugging.

When token retrieval fails, only a generic message is logged without the exception details. This makes troubleshooting authentication issues harder.

Include exception in log
         except (ClientAuthenticationError, CredentialUnavailableError):
-            logger.error("Failed to retrieve Azure access token")
+            logger.error("Failed to retrieve Azure access token", exc_info=True)
             return None
src/llama_stack_configuration.py (8)

82-100: Consider security implications of writing token to disk.

The Azure access token is written in plaintext to the .env file. While this enables the Llama Stack service to read it at startup, be aware that:

  1. Tokens persisted to disk may be accessed by other processes
  2. The file should have restrictive permissions (e.g., 0600)

Consider adding file permission restrictions after writing.

Proposed enhancement: Set restrictive file permissions
         with open(env_file, "w", encoding="utf-8") as f:
             f.writelines(lines)
+        
+        # Restrict file permissions to owner-only
+        os.chmod(env_file, 0o600)

         logger.info(

126-127: Docstring type doesn't match signature.

The docstring references list[ByokRag] but the actual parameter type is list[dict[str, Any]]. Update the docstring to match the implementation.

Proposed fix
     Parameters:
         ls_config (dict[str, Any]): Existing Llama Stack configuration mapping
         used as the base; existing `vector_dbs` entries are preserved if
         present.
-        byok_rag (list[ByokRag]): List of BYOK RAG definitions to be added to
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG definitions to be added to
         the `vector_dbs` section.

139-140: Potential mutation of input data via reference assignment.

Assigning output = ls_config["vector_dbs"] creates a reference to the original list. Subsequent output.append() calls will modify the original ls_config["vector_dbs"] in place. Per coding guidelines, avoid in-place parameter modification; return new data structures instead.

Proposed fix: Create a copy
     # fill-in existing vector_dbs entries
     if "vector_dbs" in ls_config:
-        output = ls_config["vector_dbs"]
+        output = list(ls_config["vector_dbs"])

173-174: Docstring type doesn't match signature.

Same issue as construct_vector_dbs_section: docstring mentions list[ByokRag] but actual type is list[dict[str, Any]].

Proposed fix
-        byok_rag (list[ByokRag]): List of BYOK RAG specifications to convert
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG specifications to convert
         into provider entries.

187-188: Same reference assignment issue - creates mutation side effect.

As per coding guidelines, avoid in-place parameter modification.

Proposed fix
     # fill-in existing vector_io entries
     if "providers" in ls_config and "vector_io" in ls_config["providers"]:
-        output = ls_config["providers"]["vector_io"]
+        output = list(ls_config["providers"]["vector_io"])

253-256: Missing error handling for input file operations.

If input_file doesn't exist or contains invalid YAML, the exception will propagate up unhandled. Consider wrapping in try-except for consistent error handling.

Proposed fix
     logger.info("Reading Llama Stack configuration from file %s", input_file)

-    with open(input_file, "r", encoding="utf-8") as file:
-        ls_config = yaml.safe_load(file)
+    try:
+        with open(input_file, "r", encoding="utf-8") as file:
+            ls_config = yaml.safe_load(file)
+    except FileNotFoundError:
+        logger.error("Input Llama Stack config not found: %s", input_file)
+        raise
+    except yaml.YAMLError as e:
+        logger.error("Invalid YAML in input file %s: %s", input_file, e)
+        raise

310-312: Consider using sys.exit for error conditions.

When the config file is not found, the function logs an error and returns. For CLI tools, it's more conventional to exit with a non-zero status code to signal failure to calling scripts or CI pipelines.

Proposed fix
+import sys
+
 ...
 
     except FileNotFoundError:
         logger.error("Config not found: %s", args.config)
-        return
+        sys.exit(1)

275-276: Docstring could be more descriptive.

Per coding guidelines, all functions require docstrings with brief descriptions. The current docstring "CLI entry point." is minimal. Consider expanding to describe what the CLI does.

Proposed enhancement
 def main() -> None:
-    """CLI entry point."""
+    """CLI entry point for Llama Stack configuration enrichment.
+
+    Reads Lightspeed configuration, applies Azure Entra ID token setup
+    and BYOK RAG enrichment, and writes the enriched Llama Stack config.
+    """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c8a7ff and def9221.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (38)
  • .github/workflows/e2e_tests.yaml
  • .github/workflows/e2e_tests_providers.yaml
  • Makefile
  • README.md
  • docker-compose-library.yaml
  • docker-compose.yaml
  • docs/providers.md
  • examples/azure-run.yaml
  • examples/lightspeed-stack-azure-entraid-lib.yaml
  • examples/lightspeed-stack-azure-entraid-service.yaml
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/authorization/azure_token_manager.py
  • src/client.py
  • src/configuration.py
  • src/lightspeed_stack.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • test.containerfile
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/steps/info.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
💤 Files with no reviewable changes (6)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • src/app/endpoints/query_v2.py
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
🧰 Additional context used
📓 Path-based instructions (8)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • src/models/config.py
  • src/configuration.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/e2e/features/steps/info.py
  • tests/unit/authorization/test_azure_token_manager.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • src/lightspeed_stack.py
  • src/app/endpoints/query.py
  • src/client.py
  • src/authorization/azure_token_manager.py
  • src/app/main.py
  • src/llama_stack_configuration.py
src/models/config.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Configuration base class sets extra="forbid" to reject unknown fields
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Extend ConfigurationBase for configuration Pydantic models

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Extend BaseModel for data Pydantic models
Use @model_validator and @field_validator for Pydantic model validation

Files:

  • src/models/config.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests

Files:

  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/e2e/features/steps/info.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion

Files:

  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
tests/e2e/features/steps/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define E2E test steps in tests/e2e/features/steps/ directory

Files:

  • tests/e2e/features/steps/info.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/endpoints/query.py
  • src/app/main.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Check pyproject.toml for supported Python versions before development
Always check pyproject.toml for existing dependencies before adding new ones
Always verify current library versions in pyproject.toml rather than assuming versions
Use pylint with source-roots = "src" configuration

Files:

  • pyproject.toml
🧠 Learnings (13)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

Applied to files:

  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/models/config.py
  • src/configuration.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/**/*.py : Use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")` pattern for authentication mocks in tests

Applied to files:

  • tests/unit/authentication/test_api_key_token.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • tests/unit/test_client.py
  • test.containerfile
  • src/app/endpoints/streaming_query.py
  • src/lightspeed_stack.py
  • src/client.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • test.containerfile
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • examples/azure-run.yaml
  • src/lightspeed_stack.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/lightspeed_stack.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt`

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • .github/workflows/e2e_tests.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to pyproject.toml : Always check `pyproject.toml` for existing dependencies before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data

Applied to files:

  • src/authorization/azure_token_manager.py
🧬 Code graph analysis (11)
src/models/config.py (1)
src/configuration.py (1)
  • azure_entra_id (345-349)
src/configuration.py (1)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1482-1493)
tests/unit/test_client.py (2)
src/client.py (5)
  • load (32-40)
  • get_client (101-115)
  • is_library_client (28-30)
  • update_provider_data (133-164)
  • reload_library_client (117-131)
src/utils/types.py (1)
  • Singleton (43-57)
tests/unit/authorization/test_azure_token_manager.py (3)
src/authorization/azure_token_manager.py (7)
  • AzureEntraIDManager (22-100)
  • access_token (55-57)
  • is_token_expired (50-52)
  • is_entra_id_configured (45-47)
  • set_config (39-42)
  • refresh_token (59-74)
  • _update_access_token (76-83)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1482-1493)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-100)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-74)
  • access_token (55-57)
src/client.py (5)
  • AsyncLlamaStackClientHolder (21-164)
  • get_client (101-115)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
tests/unit/test_configuration.py (1)
src/configuration.py (2)
  • load_configuration (65-74)
  • azure_entra_id (345-349)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
  • generate_configuration (239-267)
  • construct_vector_dbs_section (115-157)
  • construct_vector_io_providers_section (160-210)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (4)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-74)
  • access_token (55-57)
src/client.py (4)
  • get_client (101-115)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1482-1493)
src/utils/types.py (1)
  • Singleton (43-57)
src/app/main.py (6)
src/authorization/azure_token_manager.py (4)
  • AzureEntraIDManager (22-100)
  • set_config (39-42)
  • refresh_token (59-74)
  • access_token (55-57)
src/app/database.py (1)
  • initialize_database (102-136)
src/configuration.py (2)
  • configuration (95-106)
  • azure_entra_id (345-349)
src/log.py (1)
  • get_logger (7-25)
src/a2a_storage/storage_factory.py (1)
  • A2AStorageFactory (20-185)
src/models/responses.py (1)
  • InternalServerErrorResponse (1787-1945)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
  • config (325-342)
  • config (1306-1314)
  • dump (1607-1619)
src/lightspeed_stack.py (1)
  • main (69-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (55)
tests/unit/authentication/test_api_key_token.py (1)

3-6: LGTM!

The added blank line after the module docstring improves formatting consistency with PEP 8 style conventions.

tests/e2e/configs/run-azure.yaml (2)

79-82: Verify: Should this config use the new Azure Entra ID credentials?

This e2e test configuration still uses ${env.AZURE_API_KEY} for the Azure inference provider. Given that the PR introduces Azure Entra ID token management with TENANT_ID, CLIENT_ID, and CLIENT_SECRET, please confirm whether this configuration should be updated to use the new credential pattern or if AZURE_API_KEY is intentionally retained for backward compatibility testing.


90-90: LGTM!

The whitespace normalization (blank line after provider entry and trailing newline) improves file formatting consistency.

Also applies to: 156-156

tests/e2e/configuration/server-mode/lightspeed-stack.yaml (1)

19-20: LGTM!

No functional changes; the file formatting is normalized with a proper trailing newline.

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

18-19: LGTM!

Trailing newline added for proper file formatting. No functional changes.

tests/e2e/features/steps/info.py (1)

113-116: Hardcoded value reduces test flexibility and creates inconsistency.

The check_shield_structure function now uses a hardcoded "openai/gpt-4o-mini" for provider_resource_id, while check_model_structure (lines 49-58) dynamically uses context.default_model and context.default_provider. This inconsistency could cause test failures if the test environment configuration changes.

Consider using the context variables for consistency:

♻️ Suggested fix
-    assert found_shield["provider_resource_id"] == "openai/gpt-4o-mini", (
-        f"provider_resource_id should be 'openai/gpt-4o-mini', "
+    expected_model = f"{context.default_provider}/{context.default_model}"
+    assert found_shield["provider_resource_id"] == expected_model, (
+        f"provider_resource_id should be '{expected_model}', "
         f"but is '{found_shield['provider_resource_id']}'"
     )

Was this hardcoding intentional to stabilize tests during the Azure credential migration? If so, please add a comment explaining the rationale.

tests/unit/models/config/test_dump_configuration.py (7)

103-103: LGTM!

The assertion correctly verifies that the new azure_entra_id field is present in the serialized configuration output.


205-206: LGTM!

The expected JSON correctly includes "azure_entra_id": None, aligning with the new optional configuration field and maintaining consistency with the assertion on line 103.


415-415: LGTM!

Consistent assertion for the new field presence in the quota limiters test.


532-533: LGTM!

Expected JSON correctly updated for the quota limiters with different values test.


749-750: LGTM!

Expected JSON correctly updated for the quota limiters with different scheduler values test.


940-941: LGTM!

Expected JSON correctly updated for the BYOK RAG configuration test.


1117-1118: LGTM!

Expected JSON correctly updated for the PostgreSQL namespace configuration test.

src/app/main.py (1)

12-12: LGTM!

Import added for the new Azure Entra ID token management functionality.

docker-compose-library.yaml (1)

23-26: LGTM with a naming consideration.

The migration from AZURE_API_KEY to separate Entra ID credentials (TENANT_ID, CLIENT_ID, CLIENT_SECRET) correctly supports dynamic token management. The comment on line 23 helpfully explains the new flow.

Consider using more specific names like AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET to avoid potential conflicts with other services or providers that might use similar generic names.

Verify that the configuration loading code expects these exact variable names and update consistently if renamed.

README.md (1)

179-180: LGTM!

Documentation updated to reflect Azure model compatibility with the new Entra ID authentication flow. The table formatting is consistent with other provider entries.

.github/workflows/e2e_tests.yaml (1)

63-63: LGTM!

The whitespace adjustment and the updated configuration source message now correctly reflect the environment-specific run YAML selection (run-${{ matrix.environment }}.yaml), aligning with the existing config selection logic.

Also applies to: 103-103

examples/lightspeed-stack-azure-entraid-lib.yaml (1)

1-30: LGTM!

This example configuration correctly demonstrates the Azure Entra ID integration for library mode, with proper environment variable references for credentials. The structure aligns with the new AzureEntraIdConfiguration model.

src/app/endpoints/query.py (2)

72-73: LGTM!

The import of AzureEntraIDManager follows the absolute import pattern as per coding guidelines.


310-311: LGTM!

The refactored client acquisition via AsyncLlamaStackClientHolder properly supports the dynamic token refresh workflow.

scripts/llama-stack-entrypoint.sh (1)

21-25: LGTM!

The .env sourcing with set -a / set +a correctly exports all variables, and the shellcheck directive is appropriately applied.

src/models/config.py (1)

1605-1605: LGTM!

The azure_entra_id field is correctly added as an optional configuration section, consistent with other optional sections like authorization and customization. Based on learnings, all config uses Pydantic models extending ConfigurationBase, which is satisfied.

test.containerfile (3)

2-2: LGTM!

Base image update is consistent with the project's image versioning approach.


6-11: LGTM!

Adding azure-identity dependency aligns with the Azure Entra ID token management feature. The directory setup and ownership configuration follow best practices for rootless container execution.


13-22: No action needed. The llama_stack_configuration.py script is invoked explicitly with the python3 interpreter in the entrypoint script (python3 /opt/app-root/llama_stack_configuration.py), so a shebang line is not required. The chmod +x is unnecessary but harmless, since the file is never executed directly.

src/configuration.py (2)

14-14: LGTM!

Import follows the existing pattern for configuration model imports.


344-349: LGTM!

The new azure_entra_id property follows the established pattern of other configuration accessors in this class. The Optional return type correctly reflects that Azure Entra ID configuration may not be present. The guard clause appropriately raises LogicError when configuration hasn't been loaded.

src/lightspeed_stack.py (1)

108-116: LGTM!

The core startup flow (environment setup, quota scheduler, and Uvicorn) is clean and correct. The comment clarification on line 109 improves readability.

examples/lightspeed-stack-azure-entraid-service.yaml (1)

27-30: LGTM!

The azure_entra_id section correctly wires credentials to environment variables using the ${env.VAR_NAME} syntax, consistent with Llama Stack configuration patterns.

Makefile (1)

13-18: LGTM!

Good use of ?= for configuration variables allowing command-line overrides. The run target update correctly passes the config file path.

tests/unit/test_client.py (3)

13-16: Good practice: Singleton reset fixture for test isolation.

The autouse fixture properly resets singleton state between tests, preventing test pollution.


86-131: Test coverage for update_provider_data looks good.

The test properly validates:

  • Pre-populating existing provider data
  • Merging new updates while preserving existing fields
  • Client replacement behavior

One note: line 102 reaches into the private _custom_headers attribute. Consider whether there is a public way to set the initial headers, though this is often acceptable for test setup.


133-152: Test coverage for reload_library_client is adequate.

The test verifies that:

  • Library mode is correctly identified
  • Reloading returns a new client instance
  • The holder is updated with the new client
tests/unit/test_configuration.py (2)

812-846: Good test coverage for Azure Entra ID configuration.

The test properly validates that all three required fields (tenant_id, client_id, client_secret) are loaded and accessible via get_secret_value().


849-877: Good validation error test for incomplete Azure Entra ID config.

This test ensures that partial configurations properly raise ValidationError, preventing misconfiguration at runtime.

src/app/endpoints/streaming_query.py (1)

52-52: Import follows project conventions.

The Azure token manager import aligns with the codebase's import style.

.github/workflows/e2e_tests_providers.yaml (2)

55-68: Azure Entra ID config injection step looks correct.

The script properly:

  • Only runs when matrix.environment == 'azure'
  • Uses proper escaping (\${env.VAR}) to preserve YAML env var syntax
  • Provides feedback on each modified file

190-192: Credential environment variables properly added.

The TENANT_ID, CLIENT_ID, and CLIENT_SECRET environment variables are correctly propagated to both server and library mode runs.

Also applies to: 221-223

docs/providers.md (1)

68-177: Comprehensive Azure Entra ID documentation.

The documentation covers the authentication flow thoroughly:

  • Configuration requirements with clear attribute table
  • Critical note about the ${env.AZURE_API_KEY} placeholder requirement
  • Token lifecycle for both library and service modes
  • Security considerations
tests/unit/authorization/test_azure_token_manager.py (2)

21-41: Well-structured test fixtures.

The fixtures properly handle singleton reset with autouse=True, ensuring test isolation. The dummy_config fixture provides a clean configuration object for testing.


145-158: Good dynamic expiration testing.

The time mocking approach properly validates the is_token_expired property behavior across time boundaries.
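
A condensed sketch of the technique using the standard library's unittest.mock (the repo's tests use pytest-mock's mocker equivalently; TokenHolder is an illustrative stand-in for the real manager):

```python
import time
from unittest.mock import patch


class TokenHolder:
    """Illustrative stand-in for the token manager's expiry check."""

    def __init__(self, expires_on: float) -> None:
        self._expires_on = expires_on

    @property
    def is_token_expired(self) -> bool:
        # expired once the current time reaches the recorded expiry
        return time.time() >= self._expires_on


holder = TokenHolder(expires_on=1_000.0)

with patch("time.time", return_value=999.0):
    assert not holder.is_token_expired  # just before expiry

with patch("time.time", return_value=1_001.0):
    assert holder.is_token_expired  # just after expiry
```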

tests/unit/test_llama_stack_configuration.py (2)

28-83: Comprehensive vector_dbs section tests.

Tests properly cover empty input, preserving existing entries, adding new entries, default values, and merge behavior. Good coverage of the construct_vector_dbs_section function.


90-128: Good coverage for vector_io providers section.

Tests cover the key scenarios: empty config, preserving existing entries, adding new entries, and default provider type fallback.

src/client.py (4)

32-41: Clean separation of library vs service client initialization.

The early return pattern and clear branching between library and service modes is well-structured.


117-131: Library client reload implementation looks correct.

The reload_library_client properly reuses the stored _config_path to reinitialize the library client, ensuring env var changes (like refreshed tokens) are picked up.
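The reload-from-stored-path pattern can be sketched as follows (hypothetical names; the real holder constructs an `AsyncLlamaStackClient`, but the mechanism is the same):

```python
class ClientHolder:
    """Holds a client and can rebuild it from the config path stored at load time."""

    def __init__(self) -> None:
        self._config_path: str | None = None
        self._client: dict | None = None

    def load(self, config_path: str) -> None:
        self._config_path = config_path
        self._client = self._build(config_path)

    def reload(self) -> dict:
        # Rebuilding from the stored path picks up environment changes made
        # since the first load, such as a refreshed AZURE_API_KEY.
        if self._config_path is None:
            raise RuntimeError("load() must be called before reload()")
        self._client = self._build(self._config_path)
        return self._client

    @staticmethod
    def _build(config_path: str) -> dict:
        return {"config_path": config_path}  # stand-in for client construction
```

Each `reload()` returns a fresh instance and updates the holder, matching the behavior the tests above verify.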


133-164: Provider data update handles edge cases well.

The update_provider_data method properly handles:

  • Missing headers
  • Invalid JSON in existing header
  • Merging new values with existing provider data

This aligns with the Azure token refresh flow for service mode clients.


72-99: Use unique temporary filenames or document library-mode single-worker requirement.

The enriched config is written to a fixed path in the temp directory. If library mode is used with multiple workers (despite examples showing workers: 1), concurrent processes could overwrite each other's config files. Consider using tempfile.NamedTemporaryFile(delete=False) for unique filenames, or add validation to prevent library mode with workers > 1 and document this constraint.
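A minimal sketch of the suggested fix (the helper name is hypothetical; the real code writes the enriched YAML configuration):

```python
import os
import tempfile


def write_enriched_config(contents: str) -> str:
    """Write the enriched config to a uniquely named temp file and return its path.

    NamedTemporaryFile(delete=False) generates a unique filename per call, so
    concurrent library-mode workers cannot overwrite each other's config.
    """
    with tempfile.NamedTemporaryFile(
        mode="w", encoding="utf-8", prefix="llama-stack-", suffix=".yaml", delete=False
    ) as f:
        f.write(contents)
        return f.name
```

Since `delete=False` leaves the file behind, the caller is responsible for removing it on shutdown.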

docker-compose.yaml (1)

21-24: Azure Entra ID credential environment variables look correct.

The switch from AZURE_API_KEY to individual credential components (TENANT_ID, CLIENT_ID, CLIENT_SECRET) properly supports the new token management flow where the API key is obtained dynamically via client credentials grant.

Also applies to: 64-67

src/authorization/azure_token_manager.py (2)

34-43: Clean initialization and configuration setup.

The initialization properly sets default state, and set_config correctly stores the configuration with debug logging.


44-57: Properties are well-implemented.

The expiration check correctly handles both uninitialized state (_expires_on == 0) and actual expiry. Using SecretStr for the access token ensures it won't be accidentally logged.
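The two expiry conditions can be sketched like this (hypothetical class and margin value; the real manager stores the token itself as a `SecretStr` alongside this bookkeeping):

```python
import time


class TokenState:
    """Tracks token expiry; _expires_on == 0 means no token has been fetched yet."""

    REFRESH_MARGIN_SECONDS = 300  # assumed safety margin before hard expiry

    def __init__(self) -> None:
        self._expires_on: float = 0

    @property
    def is_token_expired(self) -> bool:
        if self._expires_on == 0:
            return True  # uninitialized state counts as expired
        return time.time() >= self._expires_on - self.REFRESH_MARGIN_SECONDS
```

Treating the uninitialized state as expired means the first access always triggers a fetch, so no separate "bootstrap" branch is needed.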

examples/azure-run.yaml (2)

114-136: Well-structured storage configuration.

The storage section properly separates backends (kv_sqlite, sql_sqlite) from stores, with clear namespace isolation. The inference store configuration with max_write_queue_size: 10000 and num_writers: 4 provides reasonable defaults for handling concurrent writes.


101-110: LGTM!

The registered resources section provides a clean starter configuration with the rag-runtime tool group properly registered. Empty lists allow users to add their own resources as needed.

src/llama_stack_configuration.py (3)

1-19: LGTM! Module setup follows coding guidelines.

The imports use absolute paths, the logger is configured correctly using logging.getLogger(__name__), and the module docstring explains both usage modes.


46-48: Return type annotation is present.

The signature of setup_azure_entra_id_token already ends with -> None, satisfying the coding guideline on complete type annotations; no change is needed.


213-231: LGTM! In-place modification is documented.

The docstring clearly states ls_config is "modified in place", making the mutation behavior explicit and intentional. This is acceptable when documented.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


@tisnik tisnik left a comment


Looks very good in overall. I'll add some comments later, but LGTM

for brag in byok_rag:
    output.append(
        {
            "vector_db_id": brag.vector_db_id,

FYI: Right now the registration of vector_stores from config is not working properly. AFAIK it should but it doesn't, since llama-stack picks up the registration of the vector store from the metadata store

for brag in byok_rag:
    output.append(
        {
            "provider_id": "byok_" + brag.vector_db_id,

This has also changed in LLS 0.3.x

  vector_io:
  - config: # Define the storage backend for RAG
      persistence:
        namespace: vector_io::faiss
        backend: kv_default
    provider_id: faiss
    provider_type: inline::faiss

this is the config for vector_io, then you need to register a storage backend:

storage:
  backends:
    kv_default: # Define the storage backend type for RAG, in this case registry and RAG are unified i.e. information on registered resources (e.g. models, vector_stores) are saved together with the RAG chunks
      type: kv_sqlite
      db_path: ~/.llama/storage/rag/kv_store.db

"urllib3==2.6.2",
# Used for agent card configuration
"PyYAML>=6.0.0",
"azure-core",

I would suggest to add a comment as other deps

@are-ces

are-ces commented Jan 14, 2026

LGTM overall, I have added some comments


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)

191-204: Same vector_db_id validation issue applies here.

The same validation gap exists in this function. If vector_db_id is missing or empty, the provider_id becomes "byok_" and db_path becomes ".llama/.db", both malformed. Apply consistent validation as suggested for construct_vector_dbs_section.

🤖 Fix all issues with AI agents
In `@src/llama_stack_configuration.py`:
- Around line 310-312: The except FileNotFoundError block currently logs the
missing config and returns, which doesn't signal failure to the CLI; replace the
silent return with a non-zero exit (e.g., call sys.exit(1) or raise
SystemExit(1)) so the process terminates with a failure status when args.config
is missing; ensure sys is imported if you choose sys.exit and keep the
logger.error("Config not found: %s", args.config) call.
♻️ Duplicate comments (2)
src/llama_stack_configuration.py (2)

106-108: Authentication failures are silently swallowed.

When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues execution. Callers have no way to know authentication failed, which may lead to runtime failures later when the missing AZURE_API_KEY causes API calls to fail.

Consider re-raising the exception after logging so callers can handle or fail fast.

Proposed fix: Re-raise after logging
     except (ClientAuthenticationError, CredentialUnavailableError) as e:
         logger.error("Azure Entra ID: Failed to generate token: %s", e)
+        raise

143-151: Missing vector_db_id validation results in malformed provider_id.

If brag.get("vector_db_id") returns None or empty string, the provider_id will be "byok_" (empty suffix), which may cause issues downstream. Consider validating and skipping entries with missing vector_db_id.

Proposed fix: Add validation
     # append new vector_dbs entries
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id", ""),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
-                "embedding_model": brag.get("embedding_model", ""),
-                "embedding_dimension": brag.get("embedding_dimension"),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
+                "embedding_model": brag.get("embedding_model", ""),
+                "embedding_dimension": brag.get("embedding_dimension"),
             }
         )
🧹 Nitpick comments (8)
docs/providers.md (3)

106-106: Consider adding a note about the preview API version.

The example shows api_version: 2025-01-01-preview. Since this is a preview API version, consider adding a brief note that users should verify the latest stable or preview API version from Azure's documentation, or explain that preview versions may have different availability/SLA characteristics.


122-129: Clarify the pre-startup script mechanism.

The documentation mentions "runs a pre-startup script" but doesn't specify what script, where it's located, or how it's invoked. Consider adding more details about this mechanism or referencing the relevant code/configuration, as users deploying in service mode may need to understand or troubleshoot this step.


143-145: Consider specifying the token refresh safety margin.

The documentation mentions that "Lightspeed refreshes tokens proactively before expiration (with a safety margin)" but doesn't specify what this margin is. If this is a configurable value or a specific time window (e.g., 5 minutes before expiration), documenting it would help users understand the refresh behavior and plan accordingly.

src/llama_stack_configuration.py (5)

126-127: Docstring type mismatch with function signature.

The docstring states byok_rag (list[ByokRag]) but the function signature uses list[dict[str, Any]]. Update the docstring to match the actual type.

Proposed fix
-        byok_rag (list[ByokRag]): List of BYOK RAG definitions to be added to
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG definitions to be added to

173-174: Docstring type mismatch with function signature.

Similar to construct_vector_dbs_section, the docstring states list[ByokRag] but the signature uses list[dict[str, Any]].

Proposed fix
-        byok_rag (list[ByokRag]): List of BYOK RAG specifications to convert
+        byok_rag (list[dict[str, Any]]): List of BYOK RAG specifications to convert

213-231: In-place modification is documented but consider returning a new dict.

Per coding guidelines, in-place parameter modification should be avoided. While the docstring documents this behavior, returning a modified copy would be more explicit and testable.

Additionally, use more Pythonic empty check: if not byok_rag: instead of if len(byok_rag) == 0:.

Proposed minor improvement
-    if len(byok_rag) == 0:
+    if not byok_rag:
         logger.info("BYOK RAG is not configured: skipping")
         return

255-256: Missing error handling for file read operations.

If the input file doesn't exist or contains invalid YAML, an unhandled exception will be raised. Consider adding try-except to handle FileNotFoundError and yaml.YAMLError with appropriate logging, similar to the error handling in main().

Proposed fix
     logger.info("Reading Llama Stack configuration from file %s", input_file)

-    with open(input_file, "r", encoding="utf-8") as file:
-        ls_config = yaml.safe_load(file)
+    try:
+        with open(input_file, "r", encoding="utf-8") as file:
+            ls_config = yaml.safe_load(file)
+    except FileNotFoundError:
+        logger.error("Input file not found: %s", input_file)
+        raise
+    except yaml.YAMLError as e:
+        logger.error("Invalid YAML in input file %s: %s", input_file, e)
+        raise

275-276: Docstring is minimal for CLI entry point.

Per coding guidelines, all functions require docstrings with brief descriptions. Consider expanding the docstring to describe the CLI behavior, arguments, and exit conditions.

Proposed improvement
 def main() -> None:
-    """CLI entry point."""
+    """CLI entry point for Llama Stack configuration enrichment.
+
+    Parses command-line arguments for config paths, loads the Lightspeed
+    configuration, applies environment variable substitution, and generates
+    an enriched Llama Stack configuration with Azure Entra ID and BYOK RAG
+    settings.
+
+    Exits with status 1 if the configuration file is not found.
+    """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between def9221 and 3e82d84.

📒 Files selected for processing (5)
  • docs/providers.md
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • src/app/main.py
  • src/llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • scripts/llama-stack-entrypoint.sh
  • src/app/main.py
  • pyproject.toml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • src/llama_stack_configuration.py
🧠 Learnings (2)
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.

Applied to files:

  • docs/providers.md
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/llama_stack_configuration.py
🧬 Code graph analysis (1)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
  • config (325-342)
  • config (1306-1314)
  • dump (1607-1619)
src/lightspeed_stack.py (1)
  • main (69-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci
🔇 Additional comments (4)
docs/providers.md (1)

68-91: LGTM! Clear and well-structured documentation.

The introduction and configuration requirements are well-documented with clear tables and examples. The environment variable naming is now consistent throughout the document.

src/llama_stack_configuration.py (3)

1-19: LGTM!

Imports are well-organized, using absolute imports as per guidelines. The logger follows the logging.getLogger(__name__) pattern correctly.


22-38: LGTM!

The custom YAML dumper properly enforces block-style indentation. The docstring is well-written and the pylint disable is appropriately scoped.


46-48: Return type annotation is present.

Line 48 of setup_azure_entra_id_token already carries the -> None return annotation, so the guideline on complete type annotations is satisfied and no change is needed.


@asimurka asimurka force-pushed the azure-entra-id-final branch from 3e82d84 to afc72ca Compare January 14, 2026 12:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@pyproject.toml`:
- Around line 63-65: Fix the typo in the dependency comment by changing
"managment" to "management" and add consistent version constraints for the two
entries "azure-core" and "azure-identity" to match the project's dependency
version pattern (e.g., pin or use the same range format used elsewhere); update
the quoted entries "azure-core" and "azure-identity" in the pyproject.toml
dependency list to include the appropriate version specifiers and ensure
formatting matches surrounding dependencies.

In `@src/app/main.py`:
- Around line 44-50: Startup silently continues when the token fetch fails,
because AzureEntraIDManager.refresh_token() swallows authentication errors from
_retrieve_access_token() and returns None. Update refresh_token() to re-raise
ClientAuthenticationError/CredentialUnavailableError instead of returning None,
and/or verify immediately after calling AzureEntraIDManager().refresh_token() in
main.py that AzureEntraIDManager().access_token is non-empty; if it is empty,
log the error and fail startup (raise or exit) so the service does not continue
with an empty AZURE_API_KEY.
♻️ Duplicate comments (3)
src/llama_stack_configuration.py (3)

106-107: Authentication failures are silently swallowed.

When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues. Callers have no indication authentication failed, potentially leading to runtime failures when the missing AZURE_API_KEY causes Azure API calls to fail.

Proposed fix: Re-raise after logging
     except (ClientAuthenticationError, CredentialUnavailableError) as e:
         logger.error("Azure Entra ID: Failed to generate token: %s", e)
+        raise

143-150: Missing vector_db_id results in malformed provider_id.

If brag.get("vector_db_id") returns None or empty string, the provider_id will be "byok_" (empty suffix). Consider validating and skipping invalid entries.

Proposed fix: Add validation
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id", ""),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
                 "embedding_model": brag.get("embedding_model", ""),
                 "embedding_dimension": brag.get("embedding_dimension"),
             }
         )

310-312: CLI should exit with non-zero status on error.

When the config file is not found, the function logs an error and returns. For a CLI entry point, this should exit with a non-zero status code to indicate failure to shell scripts and CI pipelines.

Proposed fix
+import sys
+
 ...

     except FileNotFoundError:
         logger.error("Config not found: %s", args.config)
-        return
+        sys.exit(1)
🧹 Nitpick comments (4)
src/llama_stack_configuration.py (3)

139-140: Reference assignment mutates input config.

output = ls_config["vector_dbs"] creates a reference, not a copy. Subsequent output.append() calls modify ls_config["vector_dbs"] directly. This is inconsistent with the function's documented behavior of "building" a section and may cause unexpected side effects if callers reuse ls_config.

Proposed fix: Use a copy
     # fill-in existing vector_dbs entries
     if "vector_dbs" in ls_config:
-        output = ls_config["vector_dbs"]
+        output = list(ls_config["vector_dbs"])
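The distinction matters because `list(...)` makes a shallow copy while plain assignment aliases the caller's list; a hypothetical `build_section` helper shows the copying variant:

```python
def build_section(ls_config: dict, new_entries: list[dict]) -> list[dict]:
    """Build a vector_dbs section without mutating the caller's config."""
    output = list(ls_config.get("vector_dbs", []))  # copy, not a reference
    output.extend(new_entries)
    return output
```

With the copy, appends affect only the returned list; the original `ls_config` can safely be reused by the caller.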

187-188: Reference assignment mutates input config.

Same issue as construct_vector_dbs_section - output = ls_config["providers"]["vector_io"] is a reference, not a copy.

Proposed fix
     if "providers" in ls_config and "vector_io" in ls_config["providers"]:
-        output = ls_config["providers"]["vector_io"]
+        output = list(ls_config["providers"]["vector_io"])

83-100: Security consideration: Token written to plaintext .env file.

Writing the Azure access token to a .env file leaves credentials in plaintext on disk. This may be acceptable for development/containerized environments, but consider:

  1. Setting restrictive file permissions (e.g., 0600)
  2. Documenting this behavior and its security implications
Optional: Set restrictive permissions
+        import stat
+
         with open(env_file, "w", encoding="utf-8") as f:
             f.writelines(lines)
+        os.chmod(env_file, stat.S_IRUSR | stat.S_IWUSR)  # 0600
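An alternative to chmod-after-write is creating the file with the restrictive mode from the start (hypothetical helper; note the mode passed to `os.open` only applies when the file is created, not to a pre-existing file):

```python
import os


def write_env_file(path: str, values: dict[str, str]) -> None:
    """Write KEY=VALUE lines to a file created with owner-only permissions.

    Passing mode 0o600 to os.open applies the restriction at creation time,
    so the secret never exists on disk with looser permissions.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for key, value in values.items():
            f.write(f"{key}={value}\n")
```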
tests/unit/test_llama_stack_configuration.py (1)

132-132: The test configuration file tests/configuration/run.yaml exists in the repository. However, consider whether reducing external file dependencies would improve test maintainability. If desired, create a minimal test configuration inline or via a fixture instead of relying on the committed file.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e82d84 and afc72ca.

📒 Files selected for processing (6)
  • docs/providers.md
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • src/app/main.py
  • src/llama_stack_configuration.py
  • tests/unit/test_llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • scripts/llama-stack-entrypoint.sh
  • docs/providers.md
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: (same guideline list as in the previous review)

Files:

  • src/llama_stack_configuration.py
  • tests/unit/test_llama_stack_configuration.py
  • src/app/main.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests

Files:

  • tests/unit/test_llama_stack_configuration.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion

Files:

  • tests/unit/test_llama_stack_configuration.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/main.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Check pyproject.toml for supported Python versions before development
Always check pyproject.toml for existing dependencies before adding new ones
Always verify current library versions in pyproject.toml rather than assuming versions
Use pylint with source-roots = "src" configuration

Files:

  • pyproject.toml
🧠 Learnings (2)
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use `logger.error()` for serious problems that prevented function execution

Applied to files:

  • src/llama_stack_configuration.py
🧬 Code graph analysis (3)
src/llama_stack_configuration.py (1)
src/models/config.py (3)
  • config (325-342)
  • config (1306-1314)
  • dump (1607-1619)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
  • generate_configuration (239-267)
  • construct_vector_dbs_section (115-157)
  • construct_vector_io_providers_section (160-210)
src/app/main.py (2)
src/authorization/azure_token_manager.py (3)
  • AzureEntraIDManager (22-100)
  • set_config (39-42)
  • refresh_token (59-74)
src/configuration.py (2)
  • configuration (95-106)
  • azure_entra_id (345-349)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci
🔇 Additional comments (4)
src/app/main.py (1)

12-12: LGTM!

Import of AzureEntraIDManager is correctly added for the new Azure Entra ID integration.

tests/unit/test_llama_stack_configuration.py (3)

1-13: LGTM on test structure and imports.

Tests are correctly structured with pytest and properly import the functions under test.


28-74: Good coverage for construct_vector_dbs_section.

Tests cover empty input, preserving existing entries, adding new entries, and merging - all key scenarios.


81-110: Good coverage for construct_vector_io_providers_section.

Tests cover empty input, preserving existing entries, and adding new entries.


@asimurka asimurka force-pushed the azure-entra-id-final branch from afc72ca to 50bc95d Compare January 14, 2026 13:07

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/llama_stack_configuration.py (2)

142-154: In-place modification of input parameter.

Line 143 assigns a reference to the existing list, so subsequent append() calls modify ls_config["vector_dbs"] in place. Per coding guidelines, avoid in-place parameter modification; return new data structures instead.

Additionally, empty vector_db_id produces malformed provider_id of "byok_" (previously flagged).

Proposed fix
     output = []

     # fill-in existing vector_dbs entries
     if "vector_dbs" in ls_config:
-        output = ls_config["vector_dbs"]
+        output = list(ls_config["vector_dbs"])

     # append new vector_dbs entries
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id", ""),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
-                "embedding_model": brag.get("embedding_model", ""),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
+                "embedding_model": brag.get("embedding_model", ""),
                 "embedding_dimension": brag.get("embedding_dimension"),
             }
         )

189-207: Same in-place modification and validation issues.

Similar to construct_vector_dbs_section: line 191 assigns a reference causing in-place modification, and empty vector_db_id produces malformed paths like ".llama/.db".

Proposed fix
     output = []

     # fill-in existing vector_io entries
     if "providers" in ls_config and "vector_io" in ls_config["providers"]:
-        output = ls_config["providers"]["vector_io"]
+        output = list(ls_config["providers"]["vector_io"])

     # append new vector_io entries
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            continue  # Already warned in construct_vector_dbs_section
         output.append(
             {
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "provider_id": f"byok_{vector_db_id}",
                 "provider_type": brag.get("rag_type", "inline::faiss"),
                 "config": {
                     "kvstore": {
-                        "db_path": ".llama/" + brag.get("vector_db_id", "") + ".db",
+                        "db_path": f".llama/{vector_db_id}.db",
                         "namespace": None,
                         "type": "sqlite",
                     }
                 },
             }
         )
🤖 Fix all issues with AI agents
In `@src/app/endpoints/streaming_query.py`:
- Around lines 895-916: After calling AzureEntraIDManager().refresh_token(),
verify the refresh succeeded and that AzureEntraIDManager().access_token
contains a non-empty token before proceeding. If the refresh failed or the
token is empty, log the error and skip updating the client (or raise) rather
than calling AsyncLlamaStackClientHolder().update_provider_data with an
invalid token. Concretely: check the return value or
AzureEntraIDManager().access_token.get_secret_value() for presence, call
AsyncLlamaStackClientHolder().reload_library_client() or
update_provider_data(...) only when the token is valid, and handle/log the
failure path instead of continuing on to providers.list() /
update_provider_data with an empty token.

In `@src/authorization/azure_token_manager.py`:
- Around lines 59-71: The tests await a synchronous method. Either keep
refresh_token as a regular def on AzureTokenManager and update
tests/unit/authorization/test_azure_token_manager.py to stop awaiting it
(the calls at lines ~84, ~114, and ~142 should invoke
token_manager.refresh_token() directly), or convert refresh_token to async
def, await it at every production call site (main.py, query.py,
streaming_query.py), and run the blocking
_retrieve_access_token()/ClientSecretCredential.get_token() call in an
executor. Pick one approach and apply it consistently (recommended: keep
refresh_token synchronous and remove the await from the tests).
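If the async route were chosen instead, the blocking credential call could be pushed off the event loop. A sketch with a stand-in for the real credential object (the function names here are assumptions, not the Azure SDK API):

```python
import asyncio
import time


def get_token_blocking() -> str:
    # Stand-in for ClientSecretCredential.get_token(), which blocks on network I/O.
    time.sleep(0.01)
    return "tok"


async def refresh_token_async() -> str:
    loop = asyncio.get_running_loop()
    # The default thread-pool executor keeps the event loop responsive
    # while the blocking call runs.
    return await loop.run_in_executor(None, get_token_blocking)


print(asyncio.run(refresh_token_async()))  # tok
```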

In `@src/llama_stack_configuration.py`:
- Around lines 102-107: The code writes the Azure token to env_file but
never sets it in the running process, so later calls such as
generate_configuration that read os.environ won't see AZURE_API_KEY. After
writing the file (the block that opens env_file and calls logger.info), set
os.environ["AZURE_API_KEY"] to the same token value written into lines so
the current process environment matches the .env on disk; keep the
logger.info call as-is to record the write.
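The two-sided update can be sketched like this (hypothetical helper name; the real function also builds the rest of the .env contents):

```python
import os
import tempfile


def write_azure_key(env_file: str, token: str) -> None:
    # Persist for child processes via the .env file...
    with open(env_file, "w", encoding="utf-8") as f:
        f.write(f"AZURE_API_KEY={token}\n")
    # ...and mirror into the current process so later os.environ
    # lookups (e.g. during generate_configuration) see the same value.
    os.environ["AZURE_API_KEY"] = token


path = os.path.join(tempfile.gettempdir(), "demo.env")
write_azure_key(path, "tok-123")
assert os.environ["AZURE_API_KEY"] == "tok-123"
with open(path, encoding="utf-8") as f:
    assert f.read() == "AZURE_API_KEY=tok-123\n"
```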
♻️ Duplicate comments (4)
pyproject.toml (1)

63-65: Add version constraints for Azure dependencies.

The azure-core and azure-identity packages lack version constraints, which is inconsistent with the project's dependency versioning pattern (most dependencies specify minimum versions like >=x.y.z).

Suggested fix
     # Used for Azure Entra ID token management
-    "azure-core",
-    "azure-identity",
+    "azure-core>=1.30.0",
+    "azure-identity>=1.15.0",
src/app/main.py (1)

44-48: Token retrieval failures are silently ignored during startup.

If refresh_token() fails to obtain a token (e.g., due to ClientAuthenticationError), it silently returns without raising an exception, and the service continues with an empty AZURE_API_KEY. This will cause all subsequent Azure API calls to fail with authentication errors.

Consider verifying the token was obtained after refresh:

Suggested fix
     azure_config = configuration.configuration.azure_entra_id
     if azure_config is not None:
         AzureEntraIDManager().set_config(azure_config)
         AzureEntraIDManager().refresh_token()
+        if not AzureEntraIDManager().access_token.get_secret_value():
+            logger.error("Failed to obtain Azure access token during startup")
+            raise RuntimeError("Azure Entra ID configured but token acquisition failed")
src/llama_stack_configuration.py (2)

109-111: Authentication failures are silently swallowed.

This was previously flagged. When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues execution. Callers have no way to know authentication failed, which may lead to runtime failures later when the missing AZURE_API_KEY causes API calls to fail.


309-314: CLI should handle errors and exit with non-zero status.

The CLI entry point lacks error handling for file operations. When config or input files are not found, the function should log the error and exit with a non-zero status code to signal failure to shell scripts and CI pipelines. This was previously flagged.

Proposed fix
+import sys
+
 ...

 def main() -> None:
     """CLI entry point."""
     ...
     args = parser.parse_args()

+    try:
         with open(args.config, "r", encoding="utf-8") as f:
             config = yaml.safe_load(f)
             config = replace_env_vars(config)
-
-    generate_configuration(args.input, args.output, config, args.env_file)
+        generate_configuration(args.input, args.output, config, args.env_file)
+    except FileNotFoundError as e:
+        logger.error("File not found: %s", e.filename)
+        sys.exit(1)
+    except Exception as e:
+        logger.error("Failed to generate configuration: %s", e)
+        sys.exit(1)
🧹 Nitpick comments (3)
scripts/llama-stack-entrypoint.sh (1)

7-10: Path collision risk remains between INPUT_CONFIG and ENRICHED_CONFIG.

When LLAMA_STACK_CONFIG is not set, both INPUT_CONFIG and ENRICHED_CONFIG default to /opt/app-root/run.yaml. While the ENRICHMENT_FAILED flag on line 28 mitigates incorrect "enriched" reporting, it would be clearer to use distinct paths.

Suggested fix
 INPUT_CONFIG="${LLAMA_STACK_CONFIG:-/opt/app-root/run.yaml}"
-ENRICHED_CONFIG="/opt/app-root/run.yaml"
+ENRICHED_CONFIG="/opt/app-root/run-enriched.yaml"
 LIGHTSPEED_CONFIG="${LIGHTSPEED_CONFIG:-/opt/app-root/lightspeed-stack.yaml}"
 ENV_FILE="/opt/app-root/.env"
src/llama_stack_configuration.py (2)

223-225: Minor: Prefer Pythonic empty check.

Use if not byok_rag: instead of if len(byok_rag) == 0: for checking empty lists.

-    if len(byok_rag) == 0:
+    if not byok_rag:
         logger.info("BYOK RAG is not configured: skipping")
         return

258-270: Missing error handling for file operations.

The function does not handle FileNotFoundError for input_file or PermissionError/OSError for write operations. For a function intended to be called as a module API, consider adding error handling or documenting potential exceptions in the docstring.

Proposed approach
 def generate_configuration(
     input_file: str,
     output_file: str,
     config: dict[str, Any],
     env_file: str = ".env",
 ) -> None:
     """Generate enriched Llama Stack configuration for service/container mode.

     Args:
         input_file: Path to input Llama Stack config
         output_file: Path to write enriched config
         config: Lightspeed config dict (from YAML)
         env_file: Path to .env file
+
+    Raises:
+        FileNotFoundError: If input_file does not exist
+        OSError: If output_file cannot be written
     """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afc72ca and 50bc95d.

📒 Files selected for processing (9)
  • docs/providers.md
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/authorization/azure_token_manager.py
  • src/llama_stack_configuration.py
  • tests/unit/test_llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • docs/providers.md
  • src/app/endpoints/query.py
  • tests/unit/test_llama_stack_configuration.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • src/app/endpoints/streaming_query.py
  • src/authorization/azure_token_manager.py
  • src/llama_stack_configuration.py
  • src/app/main.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/main.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

pyproject.toml: Check pyproject.toml for supported Python versions before development
Always check pyproject.toml for existing dependencies before adding new ones
Always verify current library versions in pyproject.toml rather than assuming versions
Use pylint with source-roots = "src" configuration

Files:

  • pyproject.toml
🧠 Learnings (6)
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to pyproject.toml : Always check `pyproject.toml` for existing dependencies before adding new ones

Applied to files:

  • pyproject.toml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data

Applied to files:

  • src/authorization/azure_token_manager.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use `logger.error()` for serious problems that prevented function execution

Applied to files:

  • src/llama_stack_configuration.py
🧬 Code graph analysis (4)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-97)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-71)
  • access_token (55-57)
src/client.py (4)
  • AsyncLlamaStackClientHolder (21-164)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1482-1493)
src/utils/types.py (1)
  • Singleton (43-57)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
  • config (325-342)
  • config (1306-1314)
  • dump (1607-1619)
src/lightspeed_stack.py (1)
  • main (69-116)
src/app/main.py (3)
src/authorization/azure_token_manager.py (3)
  • AzureEntraIDManager (22-97)
  • set_config (39-42)
  • refresh_token (59-71)
tests/integration/test_rh_identity_integration.py (1)
  • client (17-46)
src/configuration.py (2)
  • configuration (95-106)
  • azure_entra_id (345-349)
🪛 GitHub Actions: Unit tests
src/authorization/azure_token_manager.py

[error] 1-1: AzureEntraIDTokenManager.refresh_token is not awaitable / returns None, causing TypeError when awaited during pytest run. Ensure the method is defined as an async function or returns an awaitable.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: build-pr
🔇 Additional comments (5)
src/app/endpoints/streaming_query.py (1)

52-52: LGTM!

Import follows the project's absolute import convention.

src/authorization/azure_token_manager.py (2)

22-42: LGTM!

The singleton pattern is correctly implemented using the project's Singleton metaclass. Configuration handling is clean with proper type annotations and logging.
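For reference, a metaclass-based singleton of the general shape the project's Singleton (src/utils/types.py) appears to use can be sketched as follows; the attribute and class names here are assumptions for illustration, not the project's exact code:

```python
from typing import Any


class Singleton(type):
    """Metaclass that caches one instance per class."""

    _instances: dict[type, Any] = {}

    def __call__(cls, *args: Any, **kwargs: Any) -> Any:
        # First call constructs the instance; later calls reuse it.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class Manager(metaclass=Singleton):
    def __init__(self) -> None:
        self.token: str = ""


assert Manager() is Manager()  # every call yields the same instance
```

This also explains why the test fixtures below reset Singleton._instances between tests: without that, state set in one test would leak into the next.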


73-80: LGTM!

Token update logic correctly applies the expiration leeway and provides clear logging of the refresh event.

src/llama_stack_configuration.py (2)

1-20: LGTM on imports and module structure.

The module docstring clearly explains the dual usage modes (script vs module), and imports follow the coding guidelines with logger = logging.getLogger(__name__) pattern.


23-39: LGTM!

The custom YAML dumper properly enforces block-style indentation with clear documentation.


@asimurka asimurka force-pushed the azure-entra-id-final branch 3 times, most recently from fefbaca to 22841de on January 14, 2026 at 14:18
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/lightspeed_stack.py (2)

26-40: Stale docstring references removed CLI options.

The docstring documents -g/--generate-llama-stack-configuration, -i/--input-config-file, and -o/--output-config-file options (lines 33-36), but these have been removed from the parser. Update the docstring to match the actual arguments.

📝 Suggested docstring fix
 def create_argument_parser() -> ArgumentParser:
     """Create and configure argument parser object.
 
     The parser includes these options:
     - -v / --verbose: enable verbose output
     - -d / --dump-configuration: dump the loaded configuration to JSON and exit
     - -c / --config: path to the configuration file (default "lightspeed-stack.yaml")
-    - -g / --generate-llama-stack-configuration: generate a Llama Stack
-                                                 configuration from the service configuration
-    - -i / --input-config-file: Llama Stack input configuration filename (default "run.yaml")
-    - -o / --output-config-file: Llama Stack output configuration filename (default "run_.yaml")
 
     Returns:
         Configured ArgumentParser for parsing the service CLI options.
     """

69-86: main() docstring references removed functionality.

The docstring mentions --generate-llama-stack-configuration (lines 77-79) and "Llama Stack generation fails" (line 84), but this feature has been moved to src/llama_stack_configuration.py. Update to reflect current behavior.

📝 Suggested docstring fix
     """Entry point to the web service.
 
     Start the Lightspeed Core Stack service process based on CLI flags and configuration.
 
     Parses command-line arguments, loads the configured settings, and then:
     - If --dump-configuration is provided, writes the active configuration to
       configuration.json and exits (exits with status 1 on failure).
-    - If --generate-llama-stack-configuration is provided, generates and stores
-      the Llama Stack configuration to the specified output file and exits
-      (exits with status 1 on failure).
     - Otherwise, sets LIGHTSPEED_STACK_CONFIG_PATH for worker processes, starts
       the quota scheduler, and starts the Uvicorn web service.
 
     Raises:
-        SystemExit: when configuration dumping or Llama Stack generation fails
-                    (exits with status 1).
+        SystemExit: when configuration dumping fails (exits with status 1).
     """
🤖 Fix all issues with AI agents
In `@src/app/endpoints/query.py`:
- Around lines 318-339: The token refresh path can proceed with a missing
or empty token and can set the literal string "None" as api_base. After
calling AzureEntraIDManager().refresh_token(), verify the refresh succeeded
and that AzureEntraIDManager().access_token (or its secret value) is
non-empty before updating the client via
AsyncLlamaStackClientHolder().update_provider_data; if the refresh failed,
stop and log or raise instead of updating the provider data. Also sanitize
azure_config.get("api_base"): include it only when it is not None rather
than converting None to the string "None" when building the dict for
update_provider_data.

In `@test.containerfile`:
- Around lines 13-17: The Containerfile adds execute permission to the
Python module llama_stack_configuration.py even though it is invoked via
the interpreter. Update the RUN line so chmod +x applies only to
/opt/app-root/enrich-entrypoint.sh (remove the +x for
llama_stack_configuration.py) while chown continues to set ownership for
both files, either in the same command or in a separate one.
♻️ Duplicate comments (10)
src/app/endpoints/streaming_query.py (1)

895-916: Token refresh failure not handled; duplicated logic with query.py.

This block has the same issue as query.py: if refresh_token() fails silently, the code proceeds with a potentially empty token. Additionally, this logic is duplicated verbatim from query.py lines 318-339.

Consider:

  1. Validating the token after refresh (see suggestion for query.py)
  2. Extracting the token refresh logic into a shared utility to avoid duplication
♻️ Suggested refactor to shared utility

Create a helper in a shared module (e.g., utils/azure_token.py):

async def refresh_azure_token_if_needed(
    provider_id: str,
    client: AsyncLlamaStackClient,
    client_holder: AsyncLlamaStackClientHolder,
) -> AsyncLlamaStackClient:
    """Refresh Azure token if expired and return updated client."""
    if (
        provider_id != "azure"
        or not AzureEntraIDManager().is_entra_id_configured
        or not AzureEntraIDManager().is_token_expired
    ):
        return client

    AzureEntraIDManager().refresh_token()
    
    if not AzureEntraIDManager().access_token.get_secret_value():
        raise ValueError("Failed to refresh Azure access token")

    if client_holder.is_library_client:
        return await client_holder.reload_library_client()
    
    azure_config = next(
        p.config for p in await client.providers.list()
        if p.provider_type == "remote::azure"
    )
    api_base = azure_config.get("api_base")
    if not api_base:
        raise ValueError("Azure provider config missing api_base")
    
    return client_holder.update_provider_data({
        "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
        "azure_api_base": str(api_base),
    })
src/app/main.py (1)

44-47: Token retrieval failures are silently ignored during startup.

The refresh_token() method catches ClientAuthenticationError and CredentialUnavailableError internally and returns None, allowing startup to continue with an empty AZURE_API_KEY. Subsequent Azure API calls will fail with confusing errors.

Consider validating the token after refresh and failing startup if authentication fails:

🔧 Suggested validation
     azure_config = configuration.configuration.azure_entra_id
     if azure_config is not None:
         AzureEntraIDManager().set_config(azure_config)
         AzureEntraIDManager().refresh_token()
+        
+        if not AzureEntraIDManager().access_token.get_secret_value():
+            logger.error("Failed to obtain Azure Entra ID access token during startup")
+            raise RuntimeError(
+                "Azure Entra ID authentication failed. "
+                "Check tenant_id, client_id, and client_secret configuration."
+            )
src/authorization/azure_token_manager.py (1)

59-71: Missing failure signal prevents callers from handling token refresh errors.

_retrieve_access_token() logs a warning when authentication fails and returns None, but refresh_token() silently accepts this and returns without signaling to callers. All three call sites in streaming_query.py, main.py, and query.py do not (and cannot) check whether refresh succeeded, allowing the code to proceed with a potentially stale or missing token.

While error visibility exists via warning logs, the method design prevents callers from handling failures gracefully. Returning a success indicator or raising an exception on failure would allow appropriate error handling.

🔧 Option 1: Return success indicator
-    def refresh_token(self) -> None:
+    def refresh_token(self) -> bool:
     """Refresh the cached Azure access token.
 
+        Returns:
+            bool: True if token was successfully refreshed, False otherwise.
+
         Raises:
             ValueError: If Entra ID configuration has not been set.
         """
         if self._entra_id_config is None:
             raise ValueError("Azure Entra ID configuration not set")
 
         logger.info("Refreshing Azure access token")
         token_obj = self._retrieve_access_token()
         if token_obj:
             self._update_access_token(token_obj.token, token_obj.expires_on)
+            return True
+        return False
🔧 Option 2: Raise on failure
         token_obj = self._retrieve_access_token()
         if token_obj:
             self._update_access_token(token_obj.token, token_obj.expires_on)
+        else:
+            raise RuntimeError("Failed to retrieve Azure access token")
tests/unit/test_llama_stack_configuration.py (1)

118-183: Add test coverage for untested functions to meet 60% code coverage requirement.

The test file covers construct_vector_dbs_section, construct_vector_io_providers_section, and generate_configuration, but is missing tests for:

  1. setup_azure_entra_id_token - Azure token generation and .env file handling
  2. enrich_byok_rag - orchestration function for BYOK RAG enrichment
  3. main() - CLI entry point

Key scenarios to add:

  • setup_azure_entra_id_token: skips when AZURE_API_KEY set, skips when config is None, handles missing credentials, writes token to file, catches auth errors
  • enrich_byok_rag: returns early when list empty, modifies config in place
  • main(): argument parsing, config loading, FileNotFoundError handling

Based on learnings, unit tests require 60% code coverage.

examples/azure-run.yaml (1)

71-77: Update Azure OpenAI API version to a current release.

The inference provider correctly uses ${env.AZURE_API_KEY} for credential injection, aligning with Azure Entra ID token management. However, api_version: 2025-01-01-preview is outdated. Consider updating to a more recent version like 2025-04-01-preview.

src/llama_stack_configuration.py (4)

109-110: Authentication failures are silently swallowed.

When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues. Callers have no way to know authentication failed, which may cause API calls to fail later due to missing AZURE_API_KEY.

Consider re-raising the exception or returning a status indicator. As per coding guidelines, use logger.error() for serious problems that prevented function execution - but the function should also propagate the failure.


102-107: Token written to file but not set in current process environment.

After writing the token to .env, the AZURE_API_KEY won't be available via os.environ in the current process. If downstream code needs the token during the same execution, it won't find it.

♻️ Proposed fix
         with open(env_file, "w", encoding="utf-8") as f:
             f.writelines(lines)

+        # Make token available to current process
+        os.environ["AZURE_API_KEY"] = token.token
+
         logger.info(
             "Azure Entra ID: Access token set in env and written to %s", env_file
         )

146-152: Missing vector_db_id produces malformed provider_id.

If brag.get("vector_db_id") returns None or empty string, provider_id becomes "byok_" which is likely invalid. Consider validating vector_db_id before appending.

♻️ Proposed fix with validation
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id", ""),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
                 "embedding_model": brag.get("embedding_model", ""),
                 "embedding_dimension": brag.get("embedding_dimension"),
             }
         )

309-313: CLI lacks error handling for missing config file.

The main() function opens args.config without a try-except block. If the file doesn't exist, it will raise an unhandled FileNotFoundError. For a CLI entry point, this should catch the exception and exit with a non-zero status code.

♻️ Proposed fix
+import sys
+
 ...

-    with open(args.config, "r", encoding="utf-8") as f:
-        config = yaml.safe_load(f)
-        config = replace_env_vars(config)
+    try:
+        with open(args.config, "r", encoding="utf-8") as f:
+            config = yaml.safe_load(f)
+            config = replace_env_vars(config)
+    except FileNotFoundError:
+        logger.error("Config not found: %s", args.config)
+        sys.exit(1)

     generate_configuration(args.input, args.output, config, args.env_file)
src/client.py (1)

163-163: Remove copy() method call — it does not exist on AsyncLlamaStackClient.

The code at line 163 calls self._lsc.copy(set_default_headers=updated_headers), but AsyncLlamaStackClient v0.3.5 does not expose a copy() method. Use with_options() or create a new AsyncLlamaStackClient instance with the updated headers instead. The # type: ignore comment masks this as a type error rather than fixing the underlying issue.

🧹 Nitpick comments (10)
src/configuration.py (1)

344-349: LGTM!

The property follows the established pattern in the class with proper LogicError guard and correct return type. The implementation is consistent with other configuration accessors.

Optional: For full alignment with the Google Python docstring conventions referenced in the coding guidelines, consider expanding the docstring to include Returns and Raises sections, similar to other properties like configuration or service_configuration.

📝 Optional docstring enhancement
     @property
     def azure_entra_id(self) -> Optional[AzureEntraIdConfiguration]:
-        """Return Azure Entra ID configuration, or None if not provided."""
+        """Return Azure Entra ID configuration.
+
+        Returns:
+            Optional[AzureEntraIdConfiguration]: The Azure Entra ID
+            configuration if provided, otherwise None.
+
+        Raises:
+            LogicError: If the configuration has not been loaded.
+        """
         if self._configuration is None:
             raise LogicError("logic error: configuration is not loaded")
         return self._configuration.azure_entra_id
Makefile (1)

20-23: Consider adding validation for .env file and AZURE_API_KEY.

The run-llama-stack target reads AZURE_API_KEY from .env silently. If the file doesn't exist or the key is missing, AZURE_API_KEY will be empty, which may cause the llama stack to fail with a confusing error.

Consider adding a check:

🛠️ Suggested improvement for robustness
 run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
 	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
+	if [ ! -f .env ]; then echo "Error: .env file not found"; exit 1; fi && \
+	if ! grep -q '^AZURE_API_KEY=' .env; then echo "Error: AZURE_API_KEY not found in .env"; exit 1; fi && \
 	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2-) \
 	uv run llama stack run $(LLAMA_STACK_CONFIG)

Alternatively, document the .env requirement in the target's help comment.
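Note the -f2- (fields 2 through end) in the existing extraction: it keeps values that themselves contain '=', which base64-padded secrets often do. A quick shell check:

```shell
env_file=$(mktemp)
# A token value containing '=' characters.
printf 'AZURE_API_KEY=abc==def\nOTHER=1\n' > "$env_file"
key=$(grep '^AZURE_API_KEY=' "$env_file" | cut -d'=' -f2-)
echo "$key"   # abc==def
rm -f "$env_file"
```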

.github/workflows/e2e_tests_providers.yaml (1)

55-68: Echo-based YAML modification is safe in this context.

All test configuration files properly end with newlines, eliminating the concatenation risk mentioned. The code defensively adds a blank line before the azure_entra_id: block, providing additional safety. The approach is acceptable for these controlled test configurations.

For future robustness or when handling untrusted YAML files, consider yq as an alternative, but this is not required here.

examples/lightspeed-stack-azure-entraid-service.yaml (1)

16-17: Consider using environment variable for api_key.

The api_key: xyzzy appears to be a placeholder. For consistency with the Azure Entra ID credentials (which use ${env.VAR} syntax), consider using an environment variable reference here as well to avoid accidentally committing real credentials.

💡 Suggested change
   url: http://localhost:8321
-  api_key: xyzzy
+  api_key: ${env.LLAMA_STACK_API_KEY:xyzzy}
tests/unit/test_configuration.py (1)

849-877: Test validates incomplete Azure Entra ID config correctly.

Consider adding a test case to test_default_configuration() for the azure_entra_id property to maintain consistency with other configuration properties.

♻️ Optional: Add consistency test

Add to test_default_configuration():

with pytest.raises(Exception, match="logic error: configuration is not loaded"):
    # try to read property
    _ = cfg.azure_entra_id  # pylint: disable=pointless-statement
tests/unit/test_client.py (2)

13-16: Fixture missing cleanup and type hint.

The fixture resets singleton state before tests but doesn't clean up after. Follow the pattern from tests/unit/test_configuration.py:

♻️ Suggested fix
+from typing import Generator
+
+
 @pytest.fixture(autouse=True)
-def reset_singleton() -> None:
+def reset_singleton() -> Generator:
     """Reset singleton state between tests."""
     Singleton._instances = {}
+    yield
+    Singleton._instances = {}

101-107: Test accesses internal _custom_headers attribute.

Setting _custom_headers directly is fragile as it depends on internal httpx/client implementation. The test works because this eventually gets merged into default_headers, but consider using a cleaner approach:

♻️ Alternative: Use update_provider_data to set initial state
-    # Pre-populate with existing provider data via headers
-    original_client._custom_headers["X-LlamaStack-Provider-Data"] = json.dumps(
-        {
-            "existing_field": "keep_this",
-            "azure_api_key": "old_token",
-        }
-    )
-
-    updated_client = holder.update_provider_data(
+    # First call to set initial provider data
+    holder.update_provider_data(
+        {
+            "existing_field": "keep_this",
+            "azure_api_key": "old_token",
+        }
+    )
+    intermediate_client = holder.get_client()
+
+    # Second call to update - this is what we're testing
+    updated_client = holder.update_provider_data(
         {
             "azure_api_key": "new_token",
             "azure_api_base": "https://new.example.com",
         }
     )
+
+    # Returns new client and updates holder
+    assert updated_client is not intermediate_client
src/authorization/azure_token_manager.py (1)

95-97: Exception details not logged.

The warning message doesn't include the exception details, making troubleshooting difficult. Consider logging the exception:

♻️ Log exception details
-        except (ClientAuthenticationError, CredentialUnavailableError):
-            logger.warning("Failed to retrieve Azure access token")
+        except (ClientAuthenticationError, CredentialUnavailableError) as e:
+            logger.warning("Failed to retrieve Azure access token: %s", e)
             return None
src/client.py (2)

88-89: Fixed temp file path may cause issues in concurrent deployments.

Using tempfile.gettempdir() with a fixed filename llama_stack_enriched_config.yaml could cause race conditions if multiple processes enrich configs simultaneously. Consider using tempfile.NamedTemporaryFile with delete=False for unique paths.

♻️ Proposed fix using unique temp file
-        enriched_path = os.path.join(
-            tempfile.gettempdir(), "llama_stack_enriched_config.yaml"
-        )
+        # Use unique temp file to avoid race conditions
+        fd, enriched_path = tempfile.mkstemp(
+            suffix=".yaml", prefix="llama_stack_enriched_"
+        )
+        os.close(fd)  # Close fd, will reopen with encoding

79-84: Silent fallback on config read errors may mask issues.

When YAML read fails, the function logs a warning and returns the original path, but the caller proceeds as if BYOK RAG was configured. This could lead to confusing runtime behavior where RAG features silently don't work.

Consider logging at error level or raising the exception since BYOK RAG was explicitly configured but enrichment failed.
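A minimal sketch of that failure mode, assuming a hypothetical wrapper around the YAML read (the function and names are illustrative, not the repository's implementation):

```python
import logging

logger = logging.getLogger(__name__)


def read_enrichment_config(read_fn, path: str):
    """Hypothetical wrapper: log at error level and re-raise instead of
    silently returning the original path when BYOK RAG enrichment fails."""
    try:
        return read_fn(path)
    except Exception as exc:
        logger.error("BYOK RAG enrichment failed for %s: %s", path, exc)
        raise
```

Raising here makes the misconfiguration visible at startup rather than surfacing later as RAG features that silently do nothing.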

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL



📥 Commits

Reviewing files that changed from the base of the PR and between fefbaca and 22841de.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (38)
  • .github/workflows/e2e_tests.yaml
  • .github/workflows/e2e_tests_providers.yaml
  • Makefile
  • README.md
  • docker-compose-library.yaml
  • docker-compose.yaml
  • docs/providers.md
  • examples/azure-run.yaml
  • examples/lightspeed-stack-azure-entraid-lib.yaml
  • examples/lightspeed-stack-azure-entraid-service.yaml
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/authorization/azure_token_manager.py
  • src/client.py
  • src/configuration.py
  • src/lightspeed_stack.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • test.containerfile
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/steps/info.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
💤 Files with no reviewable changes (6)
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • src/app/endpoints/query_v2.py
✅ Files skipped from review due to trivial changes (1)
  • tests/e2e/configs/run-azure.yaml
🚧 Files skipped from review as they are similar to previous changes (12)
  • scripts/llama-stack-entrypoint.sh
  • tests/unit/authentication/test_api_key_token.py
  • README.md
  • docs/providers.md
  • src/models/config.py
  • tests/e2e/features/steps/info.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • .github/workflows/e2e_tests.yaml
  • pyproject.toml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • docker-compose-library.yaml
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/lightspeed_stack.py
  • src/client.py
  • tests/unit/test_llama_stack_configuration.py
  • src/app/main.py
  • src/configuration.py
  • tests/unit/test_client.py
  • src/authorization/azure_token_manager.py
  • tests/unit/test_configuration.py
  • src/llama_stack_configuration.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests

Files:

  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion

Files:

  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
🧠 Learnings (13)
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/app/endpoints/streaming_query.py
  • src/lightspeed_stack.py
  • src/client.py
  • tests/unit/test_client.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Handle `APIConnectionError` from Llama Stack in error handling

Applied to files:

  • src/lightspeed_stack.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/lightspeed_stack.py
  • examples/azure-run.yaml
  • src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Unit tests require 60% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Write unit tests covering new functionality before completion

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/integration/**/*.py : Integration tests require 10% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data

Applied to files:

  • src/authorization/azure_token_manager.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use `logger.error()` for serious problems that prevented function execution

Applied to files:

  • src/llama_stack_configuration.py
🧬 Code graph analysis (9)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-97)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-71)
  • access_token (55-57)
src/client.py (3)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-97)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-71)
  • access_token (55-57)
src/client.py (4)
  • AsyncLlamaStackClientHolder (21-164)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/client.py (2)
src/configuration.py (2)
  • configuration (95-106)
  • llama_stack_configuration (123-134)
src/llama_stack_configuration.py (2)
  • enrich_byok_rag (216-234)
  • YamlDumper (23-39)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
  • generate_configuration (242-270)
  • construct_vector_dbs_section (118-160)
  • construct_vector_io_providers_section (163-213)
src/app/main.py (2)
src/authorization/azure_token_manager.py (3)
  • AzureEntraIDManager (22-97)
  • set_config (39-42)
  • refresh_token (59-71)
src/configuration.py (2)
  • configuration (95-106)
  • azure_entra_id (345-349)
src/configuration.py (1)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
tests/unit/test_client.py (2)
src/client.py (6)
  • AsyncLlamaStackClientHolder (21-164)
  • load (32-40)
  • get_client (101-115)
  • is_library_client (28-30)
  • update_provider_data (133-164)
  • reload_library_client (117-131)
src/utils/types.py (1)
  • Singleton (43-57)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
src/utils/types.py (1)
  • Singleton (43-57)
tests/unit/test_configuration.py (1)
src/configuration.py (2)
  • load_configuration (65-74)
  • azure_entra_id (345-349)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
🔇 Additional comments (30)
tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

18-19: LGTM!

The configuration maintains the noop authentication module as expected for library-mode e2e tests. Based on learnings, noop authentication tests use this default configuration, which is the correct setup.

src/configuration.py (1)

14-14: LGTM!

The import follows the absolute import pattern and is correctly placed with other model imports.

.github/workflows/e2e_tests_providers.yaml (2)

190-192: LGTM!

The Azure Entra ID credentials are properly sourced from GitHub secrets and align with the transition from static API keys to dynamic token management.


221-223: LGTM!

Library mode step correctly mirrors the server mode credential setup, maintaining consistency across deployment modes.

test.containerfile (2)

6-6: LGTM!

Adding azure-identity is necessary for Azure Entra ID token acquisition. The package is appropriate for this authentication flow.


21-22: LGTM!

The ENTRYPOINT correctly references the enrichment script that handles configuration enrichment and initial token generation before service startup, as clarified in past comments.

Makefile (2)

13-15: LGTM!

Good use of conditional assignment (?=) allowing users to override configuration files via command line (e.g., make run CONFIG=custom.yaml).


17-18: LGTM!

The run target now properly accepts a configurable configuration file path.

src/app/endpoints/query.py (1)

72-73: LGTM!

Import is correctly placed and follows the absolute import pattern per coding guidelines.

src/app/main.py (1)

12-12: LGTM!

Import follows the absolute import pattern per coding guidelines.

src/app/endpoints/streaming_query.py (1)

52-52: LGTM!

Import follows the absolute import pattern.

examples/lightspeed-stack-azure-entraid-lib.yaml (1)

1-25: LGTM!

Configuration structure is clear and follows the expected format for library client mode. Environment variable references for secrets follow best practices.

examples/lightspeed-stack-azure-entraid-service.yaml (2)

27-30: Verify if scope is required in the Azure Entra ID configuration.

Same as the library config - the scope field may be required by AzureEntraIdConfiguration.


1-25: LGTM!

Configuration structure is correct for service mode with appropriate comments explaining the alternatives.

tests/unit/test_configuration.py (1)

812-846: LGTM! Good coverage for Azure Entra ID configuration.

The test validates that all three credential fields (tenant_id, client_id, client_secret) are correctly loaded as SecretStr values. The pattern matches existing tests in the file.

tests/unit/test_client.py (2)

69-84: LGTM! Exception type now matches implementation.

The change from Exception to ValueError correctly reflects the actual exception raised in src/client.py line 47.


133-152: LGTM! Good coverage for library client reload.

The test properly validates that reload_library_client creates a new client instance and updates the holder.

docker-compose.yaml (2)

21-24: LGTM! Azure credentials properly configured for Entra ID flow.

The switch from AZURE_API_KEY to individual TENANT_ID, CLIENT_ID, and CLIENT_SECRET environment variables aligns with the new Azure Entra ID token management implementation.


14-14: Volume mount added for configuration file.

The shared SELinux label (:z) is appropriate since both llama-stack and lightspeed-stack services need read access to lightspeed-stack.yaml.

src/authorization/azure_token_manager.py (4)

1-19: LGTM! Module setup follows coding guidelines.

Imports, logging setup, and constant definition follow the project patterns. The 30-second leeway for token expiration is a reasonable safety margin.
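The leeway described here can be sketched in a few lines; the constant name is assumed from the review description, not taken from the module:

```python
import time

EXPIRATION_LEEWAY_SECONDS = 30  # assumed name for the 30-second safety margin


def is_token_expired(expires_on: float, leeway: int = EXPIRATION_LEEWAY_SECONDS) -> bool:
    """Treat the token as expired `leeway` seconds before its real expiry,
    so a refresh is triggered before Azure rejects the token."""
    return time.time() >= expires_on - leeway
```

A token with five seconds left is already "expired" under this check, which is exactly the point of the margin.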


22-42: Singleton token manager with proper initialization.

The singleton pattern ensures consistent token state across the application. Configuration is properly typed with AzureEntraIdConfiguration.


54-57: access_token returns empty string when not set.

The property returns SecretStr("") when AZURE_API_KEY is unset. Callers should check is_entra_id_configured and is_token_expired before using the token value.
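A hedged sketch of the recommended guard order; the manager interface mirrors the reviewed AzureEntraIDManager, but the helper and the stand-in object below are illustrative:

```python
from types import SimpleNamespace


def usable_token(manager):
    """Return a non-empty token string, or None. Illustrative helper showing
    the check order the comment recommends; not repository code."""
    if not manager.is_entra_id_configured:
        return None
    if manager.is_token_expired and not manager.refresh_token():
        return None
    token = manager.access_token.get_secret_value()
    return token or None  # SecretStr("") collapses to None


# minimal stand-in for the real manager
fake = SimpleNamespace(
    is_entra_id_configured=True,
    is_token_expired=False,
    refresh_token=lambda: True,
    access_token=SimpleNamespace(get_secret_value=lambda: ""),
)
```

With the empty-string token, `usable_token(fake)` yields None instead of an unusable credential.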


73-80: Token update implementation is correct.

The leeway is properly applied to the expiration time, and the token is stored in the environment variable as designed (single source of truth for Llama Stack config injection).

tests/unit/test_llama_stack_configuration.py (2)

28-74: LGTM! Good coverage for construct_vector_dbs_section.

Tests cover empty input, preserving existing entries, adding new entries, and merging behavior. Assertions validate the expected dict structure with vector_db_id, provider_id, embedding_model, and embedding_dimension.


81-110: LGTM! Adequate coverage for construct_vector_io_providers_section.

Tests validate empty inputs, preserving existing entries, and adding new BYOK RAG entries with correct provider_id and provider_type values.

examples/azure-run.yaml (1)

1-137: LGTM! Configuration follows Llama Stack 0.3.x patterns.

The configuration properly uses the 0.3.x storage backend structure with kv_default/sql_default backends, persistence namespaces, and registered_resources scaffolding. The vector_io provider uses the correct persistence.backend pattern per the Llama Stack 0.3.x format. Based on learnings, telemetry is correctly not configured under the providers section.

src/client.py (2)

117-131: LGTM! Reload mechanism for library client.

The reload_library_client method properly reinitializes the library client from the stored config path, enabling token refresh for library mode. The guard for missing _config_path provides clear error messaging.


53-53: The double .configuration access pattern is correct and intentional.

The code correctly accesses a module-level singleton instance (configuration) which is an AppConfig object, then calls its configuration property to retrieve the loaded Configuration object, then accesses the byok_rag field. This is a standard pattern when using singleton classes.
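A self-contained sketch of that two-step access, with deliberately simplified stand-ins for AppConfig and Configuration (the real classes are Pydantic models with far more fields):

```python
class LogicError(Exception):
    """Raised when configuration is accessed before loading."""


class Configuration:
    """Simplified stand-in for the loaded configuration model."""
    def __init__(self) -> None:
        self.byok_rag: list = []


class AppConfig:
    """Simplified stand-in for the module-level singleton wrapper."""
    def __init__(self) -> None:
        self._configuration = None

    def load(self) -> None:
        self._configuration = Configuration()

    @property
    def configuration(self) -> Configuration:
        if self._configuration is None:
            raise LogicError("logic error: configuration is not loaded")
        return self._configuration


configuration = AppConfig()  # module-level singleton instance
configuration.load()
byok_rag = configuration.configuration.byok_rag  # the double access
```

The outer `configuration` is the singleton instance; the inner `.configuration` property guards against use before loading, which is why the double access is intentional.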

src/llama_stack_configuration.py (2)

216-234: LGTM! Clean orchestration function for BYOK RAG enrichment.

The enrich_byok_rag function properly guards against empty input, initializes the providers dict if missing, and delegates to the construct functions. The in-place modification is documented in the docstring.


242-270: LGTM! Well-structured configuration generation.

The generate_configuration function properly orchestrates reading input, applying Azure Entra ID token setup, BYOK RAG enrichment, and writing output. The use of YamlDumper ensures consistent formatting.


@asimurka asimurka force-pushed the azure-entra-id-final branch from 22841de to b84740b Compare January 14, 2026 15:49

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)

194-207: Same vector_db_id validation issue as in construct_vector_dbs_section.

Apply consistent validation logic to skip entries with missing vector_db_id to avoid malformed provider_id and db_path values.

Proposed fix: Add validation
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id in vector_io, skipping")
+            continue
         output.append(
             {
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "provider_id": f"byok_{vector_db_id}",
                 "provider_type": brag.get("rag_type", "inline::faiss"),
                 "config": {
                     "kvstore": {
-                        "db_path": ".llama/" + brag.get("vector_db_id", "") + ".db",
+                        "db_path": f".llama/{vector_db_id}.db",
                         "namespace": None,
                         "type": "sqlite",
                     }
                 },
             }
         )
♻️ Duplicate comments (5)
test.containerfile (1)

13-17: Remove unnecessary execute permission from Python file.

The chmod +x applies to both scripts, but llama_stack_configuration.py is invoked via python3 interpreter (not as a direct executable) and lacks a shebang. Only the shell script needs execute permission.

Proposed fix
-RUN chmod +x /opt/app-root/enrich-entrypoint.sh && \
-    chown 1001:0 /opt/app-root/enrich-entrypoint.sh /opt/app-root/llama_stack_configuration.py
+RUN chmod +x /opt/app-root/enrich-entrypoint.sh && \
+    chown 1001:0 /opt/app-root/enrich-entrypoint.sh && \
+    chown 1001:0 /opt/app-root/llama_stack_configuration.py
src/llama_stack_configuration.py (3)

109-110: Authentication failures are silently swallowed.

When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues. Callers cannot detect the failure, leading to runtime errors later when AZURE_API_KEY is missing.

Proposed fix: Re-raise after logging
     except (ClientAuthenticationError, CredentialUnavailableError) as e:
         logger.error("Azure Entra ID: Failed to generate token: %s", e)
+        raise

102-107: Token written to file but not set in current process environment.

After writing the token to .env, downstream code in the same process won't have access to AZURE_API_KEY via os.environ. If generate_configuration or other code needs this token during the same execution, it won't be available.

Proposed fix: Set environment variable in current process
         with open(env_file, "w", encoding="utf-8") as f:
             f.writelines(lines)

+        # Make token available to current process
+        os.environ["AZURE_API_KEY"] = token.token
+
         logger.info(
             "Azure Entra ID: Access token set in env and written to %s", env_file
         )

146-154: Missing vector_db_id validation results in malformed entries.

If brag.get("vector_db_id") returns None or empty string, provider_id becomes "byok_" (empty suffix). Consider validating vector_db_id before constructing the entry.

Proposed fix: Add validation
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id", ""),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
                 "embedding_model": brag.get("embedding_model", ""),
                 "embedding_dimension": brag.get("embedding_dimension"),
             }
         )
tests/unit/test_llama_stack_configuration.py (1)

118-183: Good test coverage for generate_configuration, but missing tests for other functions.

The tests effectively cover generate_configuration scenarios. However, as noted in a past review, the module is missing tests for:

  1. setup_azure_entra_id_token - token generation, skip conditions, error handling
  2. enrich_byok_rag - empty list handling, config modification
  3. main() - CLI argument parsing

Per retrieved learnings, unit tests require 60% code coverage.

Suggested test additions for setup_azure_entra_id_token
from unittest.mock import patch, MagicMock

def test_setup_azure_entra_id_token_skips_when_env_set(monkeypatch: pytest.MonkeyPatch) -> None:
    """Test skips token generation when AZURE_API_KEY already set."""
    monkeypatch.setenv("AZURE_API_KEY", "existing_token")
    # Should return early without error
    from llama_stack_configuration import setup_azure_entra_id_token
    setup_azure_entra_id_token({"tenant_id": "t", "client_id": "c", "client_secret": "s"}, ".env")


def test_setup_azure_entra_id_token_skips_when_config_none() -> None:
    """Test skips when azure_config is None."""
    from llama_stack_configuration import setup_azure_entra_id_token
    setup_azure_entra_id_token(None, ".env")  # Should not raise


def test_setup_azure_entra_id_token_warns_on_missing_fields(caplog: pytest.LogCaptureFixture) -> None:
    """Test warns when required fields are missing."""
    from llama_stack_configuration import setup_azure_entra_id_token
    setup_azure_entra_id_token({"tenant_id": "t"}, ".env")
    assert "Missing required fields" in caplog.text
🧹 Nitpick comments (5)
src/authorization/azure_token_manager.py (1)

87-102: Consider logging the exception details for auth failures.

The warning log on line 101 doesn't include the exception details, which could make debugging authentication issues harder in production.

Proposed improvement
         except (ClientAuthenticationError, CredentialUnavailableError):
-            logger.warning("Failed to retrieve Azure access token")
+            logger.warning("Failed to retrieve Azure access token", exc_info=True)
             return None
src/app/endpoints/query.py (1)

72-73: Import placement inconsistent with other imports.

The AzureEntraIDManager import is placed after the relative imports from utils.types, breaking the import organization pattern. Consider moving it to group with other authorization imports.

Suggested placement
 from authentication import get_auth_dependency
 from authentication.interface import AuthTuple
 from authorization.middleware import authorize
+from authorization.azure_token_manager import AzureEntraIDManager
 from client import AsyncLlamaStackClientHolder
src/app/endpoints/streaming_query.py (1)

894-915: Code duplication with query.py - consider extracting shared helper.

This Azure token refresh logic is nearly identical to lines 317-338 in query.py. Consider extracting this into a shared helper function to reduce duplication and ensure consistent behavior.

The same str(azure_config.get("api_base")) concern applies here as noted in the query.py review.

Suggested refactor: Extract shared helper

Create a helper in src/authorization/azure_token_manager.py or a shared utils module:

async def refresh_azure_token_and_update_client(
    client: AsyncLlamaStackClient,
    client_holder: AsyncLlamaStackClientHolder,
) -> AsyncLlamaStackClient:
    """Refresh Azure token and update client if needed.
    
    Returns:
        Updated client instance.
    """
    if not (
        AzureEntraIDManager().is_entra_id_configured
        and AzureEntraIDManager().is_token_expired
        and AzureEntraIDManager().refresh_token()
    ):
        return client
    
    if client_holder.is_library_client:
        return await client_holder.reload_library_client()
    
    azure_config = next(
        p.config
        for p in await client.providers.list()
        if p.provider_type == "remote::azure"
    )
    return client_holder.update_provider_data(
        {
            "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
            "azure_api_base": str(azure_config.get("api_base", "")),
        }
    )

Then use in both endpoints:

if provider_id == "azure":
    client = await refresh_azure_token_and_update_client(
        client, AsyncLlamaStackClientHolder()
    )
src/llama_stack_configuration.py (1)

216-234: In-place modification is documented but contradicts coding guidelines.

The function modifies ls_config in place, which is documented in the docstring. However, per coding guidelines, returning a new data structure is preferred. Consider either:

  1. Returning the modified config (preferred per guidelines)
  2. Adding an explicit note that this is intentional for performance/integration reasons
Optional: Return modified config instead
-def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> None:
-    """Enrich Llama Stack config with BYOK RAG settings.
-
-    Args:
-        ls_config: Llama Stack configuration dict (modified in place)
-        byok_rag: List of BYOK RAG configurations
-    """
+def enrich_byok_rag(ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]) -> dict[str, Any]:
+    """Enrich Llama Stack config with BYOK RAG settings.
+
+    Args:
+        ls_config: Llama Stack configuration dict
+        byok_rag: List of BYOK RAG configurations
+
+    Returns:
+        Enriched configuration dict
+    """
     if len(byok_rag) == 0:
         logger.info("BYOK RAG is not configured: skipping")
-        return
+        return ls_config
 
     logger.info("Enriching Llama Stack config with BYOK RAG")
     ls_config["vector_dbs"] = construct_vector_dbs_section(ls_config, byok_rag)
 
     if "providers" not in ls_config:
         ls_config["providers"] = {}
     ls_config["providers"]["vector_io"] = construct_vector_io_providers_section(
         ls_config, byok_rag
     )
+    return ls_config
tests/unit/test_llama_stack_configuration.py (1)

81-111: Consider adding a merge test for consistency.

The construct_vector_dbs_section tests include a merge test (test_construct_vector_dbs_section_merge), but construct_vector_io_providers_section tests don't have an equivalent. Consider adding one for consistency.

Suggested: Add merge test
def test_construct_vector_io_providers_section_merge() -> None:
    """Test merges existing and new entries."""
    ls_config = {"providers": {"vector_io": [{"provider_id": "existing"}]}}
    byok_rag = [{"vector_db_id": "new_db"}]
    output = construct_vector_io_providers_section(ls_config, byok_rag)
    assert len(output) == 2
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22841de and b84740b.

📒 Files selected for processing (11)
  • docs/providers.md
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/authorization/azure_token_manager.py
  • src/llama_stack_configuration.py
  • test.containerfile
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/test_llama_stack_configuration.py
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/unit/authorization/test_azure_token_manager.py
  • scripts/llama-stack-entrypoint.sh
  • docs/providers.md
  • pyproject.toml
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/llama_stack_configuration.py
  • src/authorization/azure_token_manager.py
  • src/app/main.py
  • tests/unit/test_llama_stack_configuration.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests

Files:

  • tests/unit/test_llama_stack_configuration.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion

Files:

  • tests/unit/test_llama_stack_configuration.py
🧠 Learnings (8)
📚 Learning: 2025-08-19T08:57:27.714Z
Learnt from: onmete
Repo: lightspeed-core/lightspeed-stack PR: 417
File: src/lightspeed_stack.py:60-63
Timestamp: 2025-08-19T08:57:27.714Z
Learning: In the lightspeed-stack project, file permission hardening (chmod 0o600) for stored configuration JSON files is not required as it's not considered a security concern in their deployment environment.

Applied to files:

  • test.containerfile
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data

Applied to files:

  • src/authorization/azure_token_manager.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Unit tests require 60% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Write unit tests covering new functionality before completion

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/integration/**/*.py : Integration tests require 10% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
🧬 Code graph analysis (5)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-102)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-76)
  • access_token (55-57)
src/client.py (3)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-102)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-76)
  • access_token (55-57)
src/client.py (4)
  • AsyncLlamaStackClientHolder (21-164)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/llama_stack_configuration.py (1)
src/models/config.py (3)
  • config (330-347)
  • config (1361-1369)
  • dump (1701-1713)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
src/utils/types.py (1)
  • Singleton (43-57)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
  • generate_configuration (242-270)
  • construct_vector_dbs_section (118-160)
  • construct_vector_io_providers_section (163-213)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🔇 Additional comments (18)
test.containerfile (2)

6-6: LGTM: Azure Identity dependency added.

The azure-identity package is correctly added to support the new Azure Entra ID token management feature.


21-22: LGTM: Entrypoint correctly configured.

The entrypoint is set to the enrichment script which will handle configuration enrichment and token generation before starting the Llama Stack service.

src/authorization/azure_token_manager.py (5)

1-20: LGTM: Module setup follows coding guidelines.

  • Module docstring present
  • Logger pattern logging.getLogger(__name__) used correctly
  • Absolute imports used
  • Type imports properly organized
  • Constant TOKEN_EXPIRATION_LEEWAY appropriately defined at module level

22-47: LGTM: Singleton class with proper configuration management.

The class correctly implements the singleton pattern via metaclass and provides clear configuration lifecycle management with set_config and is_entra_id_configured.


49-57: Potential race condition when checking expiration and reading token from different sources.

is_token_expired checks _expires_on (instance state) while access_token reads from os.environ["AZURE_API_KEY"]. In concurrent scenarios, these could become inconsistent if another thread updates the env var between checking expiration and reading the token.

Since this is a singleton used across requests in an async context, consider whether this inconsistency could cause issues. If library mode reloads the client (which re-reads env vars), this might be acceptable.


59-76: LGTM: Token refresh with boolean return.

The method correctly returns True/False to indicate success, allowing callers to handle failures appropriately. This addresses the previous review concern about silent failures.


78-85: Token stored in environment variable - verify thread safety.

Using os.environ["AZURE_API_KEY"] as storage works for the intended use case (env var injection for Llama Stack), but be aware that environment variable updates are not atomic. In high-concurrency scenarios with multiple simultaneous refreshes, this could theoretically cause brief inconsistencies.

Based on learnings, the environment variable approach is intentional as the single source of truth for Azure token storage.

src/app/main.py (2)

12-12: LGTM: Import follows project conventions.

Absolute import from authorization.azure_token_manager correctly added.


43-51: LGTM: Startup token refresh with graceful degradation.

The implementation correctly:

  1. Checks if Azure Entra ID is configured before attempting initialization
  2. Uses set_config to provide configuration to the singleton
  3. Checks refresh_token() return value and logs a warning on failure
  4. Allows service to continue (will retry on next Azure request)

This addresses the previous review concern about silent failures at startup.
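The graceful-degradation shape of that startup flow can be sketched with a stand-in manager (all names here are hypothetical, not the code in src/app/main.py):

```python
import logging

logger = logging.getLogger(__name__)


class FakeManager:
    """Stand-in for the token manager, to show the degradation path."""

    def __init__(self, configured: bool, refresh_ok: bool) -> None:
        self.is_entra_id_configured = configured
        self._refresh_ok = refresh_ok

    def set_config(self, config: dict) -> None:
        self._config = config

    def refresh_token(self) -> bool:
        return self._refresh_ok


def startup(manager: FakeManager, config: dict) -> bool:
    """Return True if a token was obtained; the service starts either way."""
    if not manager.is_entra_id_configured:
        return False
    manager.set_config(config)
    if not manager.refresh_token():
        logger.warning("Initial Azure token refresh failed; will retry on demand")
        return False
    return True


assert startup(FakeManager(True, True), {}) is True
assert startup(FakeManager(True, False), {}) is False  # degraded, not fatal
```

The key property is that a failed initial refresh logs a warning and lets startup continue, deferring the next attempt to the first Azure-bound request.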

src/app/endpoints/streaming_query.py (1)

52-52: LGTM: Import correctly placed with authorization imports.

src/app/endpoints/query.py (1)

317-338: Code is correct; api_base is guaranteed by Llama Stack schema validation.

The short-circuit pattern with and AzureEntraIDManager().refresh_token() is clean and prevents invalid token updates. Regarding the api_base concern: according to Lightspeed's documentation, the Azure provider config must include api_key, api_base, and api_version as required fields for Llama Stack schema validation—Llama Stack fails to start if any are missing. By the time line 335 executes, api_base is guaranteed to be present, so str(azure_config.get("api_base")) is safe and will not produce "None".

src/llama_stack_configuration.py (4)

1-20: LGTM!

Module docstring is clear, imports are properly organized with standard library first, then third-party, and the logger follows the logger = logging.getLogger(__name__) pattern as per coding guidelines.


23-39: LGTM!

The custom YAML dumper is well-documented with a clear docstring explaining its purpose and behavior.


242-271: LGTM!

The function is well-structured with clear documentation. File errors propagate naturally, and the enrichment flow is logical.


278-317: LGTM!

CLI entry point is well-structured with clear argument definitions and defaults. Unhandled exceptions (like FileNotFoundError) will naturally result in non-zero exit status.

tests/unit/test_llama_stack_configuration.py (3)

9-13: LGTM!

Imports are properly organized using absolute imports as per coding guidelines.


28-74: LGTM!

Good test coverage for construct_vector_dbs_section with tests for empty inputs, preservation of existing entries, adding new entries, and merging behavior.


127-137: Test fixture file is properly committed.

The file tests/configuration/run.yaml exists in the repository and is available for the test to use. The test code correctly references this fixture file.


@asimurka asimurka force-pushed the azure-entra-id-final branch from f991321 to c68a5f4 Compare January 15, 2026 11:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/llama_stack_configuration.py (1)

194-207: Same vector_db_id validation needed here.

This function has the same issue - missing vector_db_id creates malformed provider IDs like "byok_".

♻️ Suggested fix
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            continue  # Already logged in construct_vector_dbs_section
         output.append(
             {
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "provider_id": f"byok_{vector_db_id}",
                 "provider_type": brag.get("rag_type", "inline::faiss"),
                 "config": {
                     "kvstore": {
-                        "db_path": ".llama/" + brag.get("vector_db_id", "") + ".db",
+                        "db_path": f".llama/{vector_db_id}.db",
                         "namespace": None,
                         "type": "sqlite",
                     }
                 },
             }
         )
🤖 Fix all issues with AI agents
In `@requirements.x86_64.txt`:
- Around line 248-257: pyproject.toml currently lists unpinned azure-core and
azure-identity, while the generated lockfiles (requirements.x86_64.txt /
requirements.aarch64.txt) pin azure-core==1.38.0 and azure-identity==1.25.1.
Update the dependency declarations in pyproject.toml (the entries around lines
66–67 and the other listed occurrences) to the pinned specs (==1.38.0 and
==1.25.1), or add a clear comment documenting that exact versions are enforced
via the lockfiles, so the two sources do not drift.
♻️ Duplicate comments (3)
src/llama_stack_configuration.py (2)

146-154: Missing validation for vector_db_id creates malformed provider IDs.

If brag.get("vector_db_id") returns None or empty string, the provider_id becomes "byok_" which is likely invalid. Consider validating or skipping entries with missing vector_db_id:

♻️ Suggested fix
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id", ""),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
                 "embedding_model": brag.get("embedding_model", ""),
                 "embedding_dimension": brag.get("embedding_dimension"),
             }
         )

74-111: Token not available in current process after writing to .env.

After writing the token to the .env file, the current process's os.environ doesn't have AZURE_API_KEY set. If downstream code in the same process needs this token, it won't be available.

Additionally, authentication failures are silently swallowed - the function logs an error but returns normally, making it difficult for callers to detect failures.

♻️ Suggested fix
         with open(env_file, "w", encoding="utf-8") as f:
             f.writelines(lines)

+        # Make token available to current process
+        os.environ["AZURE_API_KEY"] = token.token
+
         logger.info(
             "Azure Entra ID: Access token set in env and written to %s", env_file
         )

     except (ClientAuthenticationError, CredentialUnavailableError) as e:
         logger.error("Azure Entra ID: Failed to generate token: %s", e)
+        raise
tests/unit/test_llama_stack_configuration.py (1)

118-183: Missing test coverage for setup_azure_entra_id_token, enrich_byok_rag, and main() functions.

Based on the past review comment, the module has approximately 35% code coverage, well below the 60% requirement. The following functions remain untested:

  1. setup_azure_entra_id_token (~68 lines) - Azure token generation and .env file writing
  2. enrich_byok_rag (~17 lines) - BYOK RAG enrichment orchestration
  3. main() (~40 lines) - CLI entry point

Key scenarios to add:

  • setup_azure_entra_id_token: skips when AZURE_API_KEY already set, handles missing credentials, writes token to file
  • enrich_byok_rag: returns early when list is empty, modifies config in place
  • main(): parses arguments, handles FileNotFoundError

Based on learnings, unit tests require 60% code coverage.

🧹 Nitpick comments (15)
tests/unit/models/config/test_dump_configuration.py (1)

206-206: Consider adding a test with populated azure_entra_id.

All tests assert azure_entra_id: None. Consider adding a test case where azure_entra_id is populated with an AzureEntraIdConfiguration to verify secrets are properly redacted in the dump output (similar to how api_key and password show "**********").

src/configuration.py (1)

344-350: Prefer delegating to self.configuration to avoid duplicating the “not loaded” guard.
This keeps the guard logic/message centralized.

Proposed tweak
 `@property`
 def azure_entra_id(self) -> Optional[AzureEntraIdConfiguration]:
     """Return Azure Entra ID configuration, or None if not provided."""
-    if self._configuration is None:
-        raise LogicError("logic error: configuration is not loaded")
-    return self._configuration.azure_entra_id
+    return self.configuration.azure_entra_id
tests/unit/test_configuration.py (1)

812-877: Good coverage for azure_entra_id: happy path + incomplete-block validation.
The assertions on SecretStr.get_secret_value() are the right way to validate secrets without changing the model.

  • Consider switching the fixture param from tmpdir to tmp_path (and keep it typed as Path) to avoid the legacy tmpdir typing mismatch in new tests.
Makefile (1)

20-23: Fragile .env parsing may fail for edge cases.

The cut -d'=' -f2- correctly handles values containing =, but this approach will fail silently if .env doesn't exist or if AZURE_API_KEY uses quotes (e.g., AZURE_API_KEY="token"). Consider using a more robust approach or adding a check:

♻️ Suggested improvement
 run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
 	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
-	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2-) \
+	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env 2>/dev/null | cut -d'=' -f2- | tr -d '"'"'" || echo "") \
 	uv run llama stack run $(LLAMA_STACK_CONFIG)
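To illustrate the quoting edge case, this shell snippet (demo file path is hypothetical) shows why `cut -f2-` plus quote stripping matters when the value itself contains `=`:

```shell
# Write a demo .env whose value is quoted and contains '='.
printf 'AZURE_API_KEY="abc=123"\n' > /tmp/demo.env

# 'cut -f2-' keeps everything after the first '='; 'tr' strips the quotes.
TOKEN=$(grep '^AZURE_API_KEY=' /tmp/demo.env | cut -d'=' -f2- | tr -d '"')
echo "$TOKEN"   # prints abc=123
```

Plain `cut -d'=' -f2` would truncate the value at the embedded `=`, and without `tr` the literal quote characters would be passed to the service as part of the key.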
src/authorization/azure_token_manager.py (2)

78-85: Consider thread safety for concurrent token refreshes.

If multiple requests trigger refresh_token concurrently, there's a potential race condition where _expires_on and the environment variable could be updated non-atomically. While Python's GIL provides some protection, the multi-step update could still result in inconsistent state.

For production robustness, consider adding a threading lock around the token update:

♻️ Suggested improvement
+import threading
+
 class AzureEntraIDManager(metaclass=Singleton):
     ...
     def __init__(self) -> None:
         """Initialize the token manager with empty state."""
         self._expires_on: int = 0
         self._entra_id_config: Optional[AzureEntraIdConfiguration] = None
+        self._lock = threading.Lock()
     
     def _update_access_token(self, token: str, expires_on: int) -> None:
         """Update the token in env var and track expiration time."""
-        self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
-        os.environ["AZURE_API_KEY"] = token
+        with self._lock:
+            self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
+            os.environ["AZURE_API_KEY"] = token

87-102: Silent failure may mask authentication issues in production.

When ClientAuthenticationError or CredentialUnavailableError occurs, the method logs a warning and returns None. While callers can check the return value of refresh_token(), the warning-level log may not be prominent enough for authentication failures that could indicate misconfiguration or credential rotation issues.

Consider logging at error level for authentication failures to improve observability:

♻️ Suggested improvement
         except (ClientAuthenticationError, CredentialUnavailableError):
-            logger.warning("Failed to retrieve Azure access token")
+            logger.error("Failed to retrieve Azure access token - check credentials")
             return None
tests/unit/authorization/test_azure_token_manager.py (1)

119-155: Use logging.WARNING constant for log level.

Using string "WARNING" works but logging.WARNING constant is more explicit and avoids potential typos.

♻️ Suggested improvement
+import logging as log_module
+
     def test_refresh_token_failure_logs_error(
         ...
     ) -> None:
         ...
-        with caplog.at_level("WARNING"):
+        with caplog.at_level(log_module.WARNING):
             result = token_manager.refresh_token()
src/llama_stack_configuration.py (1)

278-317: CLI should exit with non-zero status on file errors.

If the config file doesn't exist or the input file is invalid, the script will raise an unhandled exception. For CLI tools, wrapping in try/except with sys.exit(1) provides cleaner error handling:

♻️ Suggested improvement
+import sys
+
 def main() -> None:
     """CLI entry point."""
     ...
     args = parser.parse_args()

+    try:
         with open(args.config, "r", encoding="utf-8") as f:
             config = yaml.safe_load(f)
             config = replace_env_vars(config)

         generate_configuration(args.input, args.output, config, args.env_file)
+    except FileNotFoundError as e:
+        logger.error("File not found: %s", e.filename)
+        sys.exit(1)
+    except yaml.YAMLError as e:
+        logger.error("Invalid YAML: %s", e)
+        sys.exit(1)
tests/unit/test_llama_stack_configuration.py (1)

140-160: Excessive type: ignore comments suggest incomplete test model construction.

The # type: ignore[call-arg] comments on lines 142, 144, 145, 149, 150 indicate that required fields are not being passed to the Pydantic models. While this may work at runtime with defaults, it creates fragile tests that won't catch breaking changes to required model fields.

Consider providing required fields explicitly
def test_generate_configuration_with_pydantic_model(tmp_path: Path) -> None:
    """Test generate_configuration accepts Pydantic model via model_dump()."""
    cfg = Configuration(
        name="test",
        service=ServiceConfiguration(port=8080),  # provide required fields
        llama_stack=LlamaStackConfiguration(
            use_as_library_client=True,
            library_client_config_path="run.yaml",
            url="http://localhost:5000",  # if required
        ),
        user_data_collection=UserDataCollection(transcripts_enabled=False),
        inference=InferenceConfiguration(
            default_model="test-model",
            default_provider="test-provider",
        ),
    )
    # ... rest of test
src/client.py (4)

72-99: Fixed temp file path creates potential collision in concurrent scenarios.

The enriched config is written to a fixed path llama_stack_enriched_config.yaml in the system temp directory. If multiple processes run simultaneously (e.g., during parallel tests), they could overwrite each other's configs.

Consider using a unique temp file per instance
-        enriched_path = os.path.join(
-            tempfile.gettempdir(), "llama_stack_enriched_config.yaml"
-        )
+        # Use NamedTemporaryFile or include process ID for uniqueness
+        enriched_path = os.path.join(
+            tempfile.gettempdir(), f"llama_stack_enriched_config_{os.getpid()}.yaml"
+        )

Alternatively, use tempfile.NamedTemporaryFile(delete=False, suffix='.yaml') for guaranteed uniqueness.
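A sketch of the `NamedTemporaryFile` alternative; the OS chooses the file name, so concurrent processes cannot collide (the YAML content here is a placeholder):

```python
import os
import tempfile

# delete=False keeps the file around so another component can read it later.
with tempfile.NamedTemporaryFile(mode="w", suffix=".yaml", delete=False) as f:
    f.write("version: 2\n")
    enriched_path = f.name

# Each call yields a distinct path, unlike a fixed name in gettempdir().
with tempfile.NamedTemporaryFile(suffix=".yaml", delete=False) as g:
    assert g.name != enriched_path

os.unlink(enriched_path)
os.unlink(g.name)
```

The trade-off is that the path is no longer predictable, so it must be passed explicitly to whatever consumes the enriched config, and the caller becomes responsible for cleanup.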


79-84: Silent fallback on config read failure may mask configuration issues.

When YAML reading fails, the method logs a warning and returns the original path, which may not contain the required BYOK RAG configuration. This could lead to subtle runtime failures.

Consider whether this should raise an exception in production scenarios where BYOK RAG is explicitly configured but the enrichment fails. At minimum, escalate to logger.error() to make failures more visible.


117-131: reload_library_client lacks mode validation.

The method is documented as "For use with library mode only" but doesn't verify the client is actually in library mode before reloading. If called on a service-mode client, it would create a library client from an invalid config path.

Add mode validation
     async def reload_library_client(self) -> AsyncLlamaStackClient:
         """Reload library client to pick up env var changes.

         For use with library mode only.

         Returns:
             The reloaded client instance.
+
+        Raises:
+            RuntimeError: If not in library mode or config path not set.
         """
+        if not self.is_library_client:
+            raise RuntimeError("Cannot reload: not in library mode")
         if not self._config_path:
             raise RuntimeError("Cannot reload: config path not set")

133-163: update_provider_data lacks mode validation.

The method is documented as "For use with service mode only" but doesn't verify the client is actually in service mode. Calling this on a library client would succeed but have no effect on actual authentication.

Add mode validation
     def update_provider_data(self, updates: dict[str, str]) -> AsyncLlamaStackClient:
         """Update provider data headers for service client.

         For use with service mode only.

         Args:
             updates: Key-value pairs to merge into provider data header.

         Returns:
             The updated client instance.
+
+        Raises:
+            RuntimeError: If client not initialized or in library mode.
         """
         if not self._lsc:
             raise RuntimeError(
                 "AsyncLlamaStackClient has not been initialised. Ensure 'load(..)' has been called."
             )
+        if self.is_library_client:
+            raise RuntimeError("Cannot update provider data in library mode")
src/app/endpoints/streaming_query.py (1)

318-338: Consider extracting duplicated token refresh logic.

The Azure Entra ID token refresh logic in query.py (lines 318-338) and streaming_query.py (lines 895-914) is nearly identical. Consider extracting this to a shared helper function to improve maintainability.

Example extraction
# In a shared module (e.g., utils/azure_token.py or client.py)
async def refresh_azure_token_if_needed(
    client: AsyncLlamaStackClient,
    provider_id: str,
    client_holder: AsyncLlamaStackClientHolder,
) -> AsyncLlamaStackClient:
    """Refresh Azure token if configured and expired.
    
    Returns:
        Updated client instance (may be same or reloaded/updated).
    """
    if (
        provider_id != "azure"
        or not AzureEntraIDManager().is_entra_id_configured
        or not AzureEntraIDManager().is_token_expired
        or not AzureEntraIDManager().refresh_token()
    ):
        return client
    
    if client_holder.is_library_client:
        return await client_holder.reload_library_client()
    
    azure_config = next(
        p.config
        for p in await client.providers.list()
        if p.provider_type == "remote::azure"
    )
    return client_holder.update_provider_data({
        "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
        "azure_api_base": str(azure_config.get("api_base")),
    })

Also applies to: 895-914

tests/unit/test_client.py (1)

13-16: Consider using Generator pattern for consistency with existing fixtures.

The reset_singleton fixture in tests/unit/authorization/test_azure_token_manager.py uses a Generator pattern with yield. While the current implementation works, matching the existing pattern improves consistency and allows for potential teardown logic.

♻️ Suggested refactor for consistency
+from typing import Generator
+
+
 @pytest.fixture(autouse=True)
-def reset_singleton() -> None:
+def reset_singleton() -> Generator[None, None, None]:
     """Reset singleton state between tests."""
     Singleton._instances = {}
+    yield
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f991321 and c68a5f4.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (41)
  • .github/workflows/e2e_tests.yaml
  • .github/workflows/e2e_tests_providers.yaml
  • Makefile
  • README.md
  • docker-compose-library.yaml
  • docker-compose.yaml
  • docs/providers.md
  • examples/azure-run.yaml
  • examples/lightspeed-stack-azure-entraid-lib.yaml
  • examples/lightspeed-stack-azure-entraid-service.yaml
  • pyproject.toml
  • requirements-build.txt
  • requirements.aarch64.txt
  • requirements.x86_64.txt
  • scripts/llama-stack-entrypoint.sh
  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/authorization/azure_token_manager.py
  • src/client.py
  • src/configuration.py
  • src/lightspeed_stack.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • test.containerfile
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/steps/info.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
💤 Files with no reviewable changes (6)
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • src/app/endpoints/query_v2.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
✅ Files skipped from review due to trivial changes (1)
  • docs/providers.md
🚧 Files skipped from review as they are similar to previous changes (14)
  • docker-compose-library.yaml
  • README.md
  • src/lightspeed_stack.py
  • examples/lightspeed-stack-azure-entraid-service.yaml
  • scripts/llama-stack-entrypoint.sh
  • src/app/main.py
  • test.containerfile
  • examples/lightspeed-stack-azure-entraid-lib.yaml
  • .github/workflows/e2e_tests.yaml
  • pyproject.toml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • examples/azure-run.yaml
  • src/models/config.py
  • tests/e2e/configs/run-azure.yaml
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • tests/e2e/features/steps/info.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/test_configuration.py
  • src/client.py
  • tests/unit/test_llama_stack_configuration.py
  • src/llama_stack_configuration.py
  • src/configuration.py
  • src/authorization/azure_token_manager.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests

Files:

  • tests/e2e/features/steps/info.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
tests/e2e/features/steps/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define E2E test steps in tests/e2e/features/steps/ directory

Files:

  • tests/e2e/features/steps/info.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion

Files:

  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/test_client.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/query.py
  • src/app/endpoints/streaming_query.py
🧠 Learnings (12)
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • requirements.aarch64.txt
  • tests/unit/test_client.py
  • src/client.py
  • requirements.x86_64.txt
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/**/*.py : Use `MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token")` pattern for authentication mocks in tests

Applied to files:

  • tests/unit/authentication/test_api_key_token.py
📚 Learning: 2026-01-14T09:37:51.612Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:37:51.612Z
Learning: In the lightspeed-stack repository, when provider_id == "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list. Therefore, avoid defensive StopIteration handling for next() when locating the Azure provider in providers within src/app/endpoints/query.py. This change applies specifically to this file (or nearby provider lookup code) and relies on the invariant that the Azure provider exists; if the invariant could be violated, keep the existing StopIteration handling.

Applied to files:

  • src/app/endpoints/query.py
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Unit tests require 60% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Write unit tests covering new functionality before completion

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/integration/**/*.py : Integration tests require 10% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data

Applied to files:

  • src/authorization/azure_token_manager.py
🧬 Code graph analysis (8)
tests/unit/test_client.py (2)
src/utils/types.py (1)
  • Singleton (43-57)
tests/unit/authorization/test_azure_token_manager.py (1)
  • reset_singleton (32-35)
src/app/endpoints/query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-102)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-76)
  • access_token (55-57)
src/client.py (3)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
tests/unit/authorization/test_azure_token_manager.py (2)
src/authorization/azure_token_manager.py (7)
  • AzureEntraIDManager (22-102)
  • access_token (55-57)
  • is_token_expired (50-52)
  • is_entra_id_configured (45-47)
  • set_config (39-42)
  • refresh_token (59-76)
  • _update_access_token (78-85)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
tests/unit/test_configuration.py (1)
src/configuration.py (2)
  • load_configuration (65-74)
  • azure_entra_id (345-349)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
  • generate_configuration (242-270)
  • construct_vector_dbs_section (118-160)
  • construct_vector_io_providers_section (163-213)
src/llama_stack_configuration.py (1)
src/models/config.py (3)
  • config (330-347)
  • config (1361-1369)
  • dump (1701-1713)
src/configuration.py (1)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
src/utils/types.py (1)
  • Singleton (43-57)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: build-pr
🔇 Additional comments (36)
tests/unit/authentication/test_api_key_token.py (1)

5-5: LGTM!

Trivial formatting change (blank line after docstring). The existing tests provide good coverage for the APIKeyTokenAuthDependency class.

requirements-build.txt (1)

216-250: LGTM!

The trove-classifiers version update and the azure-identity build dependency comments are auto-generated by pip-compile. These align with the Azure Entra ID support being added to the project.

tests/unit/models/config/test_dump_configuration.py (1)

104-104: LGTM!

The assertion correctly verifies that the new azure_entra_id field is present in the dumped configuration output.

tests/e2e/configuration/library-mode/lightspeed-stack.yaml (1)

18-19: LGTM!

Trailing newline normalization. The noop authentication module is correctly configured for e2e testing as per the project's test conventions.

requirements.aarch64.txt (2)

544-545: Provenance updates look consistent; ensure transitive constraints don’t diverge across environments.

The updated “via …” comments for cryptography, pyjwt, requests, and typing-extensions align with the Azure/MSAL dependency chain. Please double-check that the other platform lockfiles (e.g., x86_64 vs. aarch64) and pyproject.toml end up with the same direct Azure pins to avoid drifting dependency graphs across platforms.

Also applies to: 2521-2521, 2765-2772, 3309-3311


248-257: All four Azure identity stack versions are confirmed available on PyPI and vulnerability-free.

  • azure-core==1.38.0 ✓ (actually fixes CVE-2026-21226)
  • azure-identity==1.25.1 ✓ (not affected by CVE-2024-35255)
  • msal==1.34.0 ✓ (no published Python-specific CVEs)
  • msal-extensions==1.3.1 ✓ (no known vulnerabilities)

Safe to merge.

src/configuration.py (1)

10-26: Import addition is consistent with existing config access patterns.
Pulling AzureEntraIdConfiguration into the import list aligns with the new AppConfig.azure_entra_id property and keeps typing explicit.

tests/unit/test_configuration.py (1)

5-5: Nice: explicit ValidationError import for schema failures.

Makefile (1)

13-18: LGTM!

The configuration variables with ?= allow easy overrides, and the updated run target correctly passes the config flag.

docker-compose.yaml (2)

15-25: LGTM!

The switch to :z SELinux label is correct for shared volume access between containers. Replacing AZURE_API_KEY with Azure Entra ID credentials (TENANT_ID, CLIENT_ID, CLIENT_SECRET) aligns with the dynamic token generation approach.


62-68: LGTM!

The lightspeed-stack service correctly mirrors the Azure Entra ID credential configuration from the llama-stack service.

src/authorization/azure_token_manager.py (4)

1-20: LGTM!

Module structure follows coding guidelines with descriptive docstring, proper imports (absolute for internal modules), and module-level logger using logging.getLogger(__name__).


22-42: LGTM!

The singleton implementation correctly uses the Singleton metaclass from utils.types. The class docstring clearly describes the responsibilities.


44-57: LGTM!

Properties are well-defined with proper type annotations. The access_token property correctly wraps the env var value in SecretStr to prevent accidental logging.


59-76: LGTM!

The refresh_token method correctly returns a boolean to indicate success/failure, addressing the previous review feedback. The ValueError for missing config provides clear error messaging.

tests/unit/authorization/test_azure_token_manager.py (4)

1-19: LGTM!

Imports are appropriate, using pytest and pytest-mock as required by coding guidelines. The pylint: disable=protected-access is justified for testing internal methods.


21-41: LGTM!

Fixtures are well-structured:

  • dummy_config provides reusable test configuration
  • reset_singleton with autouse=True ensures test isolation
  • token_manager provides clean instances per test

44-92: LGTM!

Tests comprehensively cover:

  • Singleton behavior verification
  • Initial state checks (empty token, expired status, unconfigured)
  • Configuration setting
  • Token expiration logic with direct _expires_on manipulation
  • Error handling when config is missing
  • Token update functionality
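The expiration tests above hinge on how `is_token_expired` reads `_expires_on`. A hypothetical sketch of that check with a clock-skew buffer (the 300-second buffer is an assumption for illustration, not the PR's actual value):

```python
import time


class TokenState:
    """Sketch of expiry tracking; the real AzureEntraIDManager stores the
    credential's expires_on timestamp and may use a different buffer."""

    def __init__(self, skew_seconds: float = 300.0) -> None:
        self._expires_on: float = 0.0  # POSIX timestamp of token expiry
        self._skew = skew_seconds      # refresh early to avoid edge-of-expiry failures

    @property
    def is_token_expired(self) -> bool:
        """True when the token is expired or within the skew window of expiring."""
        return time.time() >= (self._expires_on - self._skew)
```

Setting `_expires_on` directly, as the tests do, exercises both the expired and the not-yet-expired branches without real credentials.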

94-117: LGTM!

The mock setup correctly patches ClientSecretCredential at the module level and verifies:

  • Token refresh returns True on success
  • Token value is correctly stored
  • Token is not expired after refresh
  • Correct scope is passed to get_token
src/llama_stack_configuration.py (4)

1-20: LGTM!

Module structure follows coding guidelines with descriptive docstring, proper imports (absolute for internal modules), and module-level logger.


47-72: LGTM!

Good defensive programming:

  • Early return when AZURE_API_KEY is already set (orchestrator-injected)
  • Early return when azure_config is None
  • Validation of required fields before attempting token generation

216-234: LGTM!

The enrich_byok_rag function correctly:

  • Handles empty BYOK RAG list with early return
  • Initializes providers dict if missing
  • Delegates to construction functions

242-270: LGTM!

The generate_configuration function orchestrates the enrichment flow correctly, reading input, applying enrichments, and writing output.

.github/workflows/e2e_tests_providers.yaml (2)

55-68: LGTM - Azure Entra ID config injection for test configurations.

The step correctly injects the Azure Entra ID configuration block into all test configs when the environment is azure. The use of environment variable placeholders (${env.TENANT_ID}) aligns with the YAML template pattern used elsewhere in the project.

One minor note: the script appends without checking if azure_entra_id already exists. If re-run on the same config, this could create duplicate blocks. However, since CI runs on fresh checkouts, this is not a practical concern.


186-229: LGTM - Credentials properly propagated to both server and library mode steps.

The TENANT_ID, CLIENT_ID, and CLIENT_SECRET environment variables are correctly sourced from GitHub secrets and made available to both "Run services (Server Mode)" and "Run services (Library Mode)" steps. This aligns with the PR's shift from AZURE_API_KEY to Entra ID client credentials.

src/client.py (1)

42-64: LGTM - Library client initialization with BYOK RAG enrichment.

The _load_library_client method properly validates the config path, handles BYOK RAG enrichment when configured, and stores the config path for potential reloads. The flow is clear and well-structured.

src/app/endpoints/query.py (2)

318-338: LGTM - Azure Entra ID token refresh integration.

The token refresh logic correctly:

  1. Uses short-circuit evaluation to only refresh when needed
  2. Dispatches to reload_library_client() or update_provider_data() based on client mode
  3. Only proceeds if refresh_token() returns True

Based on learnings, the Azure provider is guaranteed to be present when provider_id == "azure", so the next() without default is acceptable in this context.


72-73: LGTM - Import follows project conventions.

The import of AzureEntraIDManager from authorization.azure_token_manager uses absolute imports as per coding guidelines.

src/app/endpoints/streaming_query.py (2)

895-914: LGTM - Azure Entra ID token refresh (mirrors query.py implementation).

The token refresh logic correctly handles both library and service client modes. Based on learnings, the Azure provider is guaranteed to be present when provider_id == "azure".


52-52: LGTM - Import follows project conventions.

tests/unit/test_llama_stack_configuration.py (1)

127-138: The test fixture path tests/configuration/run.yaml exists in the repository and is a stable, committed file. The test is correctly written and will pass as-is using this standard fixture pattern for configuration-based tests. No changes needed.

tests/unit/test_client.py (5)

3-4: LGTM!

The pylint disable directive is appropriate for test files that need to access protected members like Singleton._instances and _custom_headers for verification purposes.


5-10: LGTM!

Imports are absolute as per coding guidelines and are necessary for the new test functionality.


78-83: LGTM!

Using ValueError instead of a generic Exception is more precise and helps catch only the intended configuration error, making the test more robust.


86-131: LGTM!

Well-structured test that thoroughly verifies the update_provider_data behavior:

  • Confirms new client is returned and holder is updated
  • Validates provider data merging (existing fields preserved, new fields added/updated)
  • Good use of descriptive assertions and comments

133-152: LGTM!

Concise test that verifies the essential behavior of reload_library_client - confirming a new client is created and the holder reference is updated.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines 248 to 257
azure-core==1.38.0 \
--hash=sha256:8194d2682245a3e4e3151a667c686464c3786fed7918b394d035bdcd61bb5993 \
--hash=sha256:ab0c9b2cd71fecb1842d52c965c95285d3cfb38902f6766e4a471f1cd8905335
# via
# lightspeed-stack (pyproject.toml)
# azure-identity
azure-identity==1.25.1 \
--hash=sha256:87ca8328883de6036443e1c37b40e8dc8fb74898240f61071e09d2e369361456 \
--hash=sha256:e9edd720af03dff020223cd269fa3a61e8f345ea75443858273bcb44844ab651
# via lightspeed-stack (pyproject.toml)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find pyproject.toml
echo "=== Finding pyproject.toml ==="
find . -name "pyproject.toml" -type f 2>/dev/null | head -20

# List all requirements files
echo -e "\n=== All requirements files ==="
ls -la requirements*.txt 2>/dev/null || echo "No requirements files in root"
find . -name "requirements*.txt" -type f 2>/dev/null | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 662


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check pyproject.toml for azure dependencies
echo "=== Checking pyproject.toml for azure dependencies ==="
if [ -f "pyproject.toml" ]; then
  cat -n pyproject.toml | grep -A 5 -B 5 -i "azure-core\|azure-identity" || echo "Not found in grep"
fi

# Show the full pyproject.toml if it exists and is not too large
if [ -f "pyproject.toml" ]; then
  lines=$(wc -l < pyproject.toml)
  echo "pyproject.toml has $lines lines"
  if [ "$lines" -lt 300 ]; then
    echo "=== Full pyproject.toml ==="
    cat -n pyproject.toml
  fi
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 7929


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check requirements.x86_64.txt at the specified lines
echo "=== Requirements.x86_64.txt at lines 248-257 ==="
if [ -f "requirements.x86_64.txt" ]; then
  sed -n '248,257p' requirements.x86_64.txt
  
  echo -e "\n=== Also check other line ranges ==="
  echo "Lines 544-546:"
  sed -n '544,546p' requirements.x86_64.txt
  echo "Lines 1455-1464:"
  sed -n '1455,1464p' requirements.x86_64.txt
fi

Repository: lightspeed-core/lightspeed-stack

Length of output: 1203


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check if aarch64 lockfile has same versions
echo "=== Checking requirements.aarch64.txt for same Azure versions ==="
echo "Azure-core:"
grep "^azure-core==" requirements.aarch64.txt
echo "Azure-identity:"
grep "^azure-identity==" requirements.aarch64.txt
echo "MSAL:"
grep "^msal==" requirements.aarch64.txt
echo "MSAL-extensions:"
grep "^msal-extensions==" requirements.aarch64.txt

# Also check if there are other requirements files with different content
echo -e "\n=== Checking requirements.torch.txt ==="
cat requirements.torch.txt

echo -e "\n=== Checking requirements.hermetic.txt ==="
cat requirements.hermetic.txt

Repository: lightspeed-core/lightspeed-stack

Length of output: 2323


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check PyPI for current versions of these packages as of January 2026
echo "=== Checking PyPI for latest versions ==="

for pkg in azure-core azure-identity msal msal-extensions; do
  echo "=== $pkg ==="
  curl -fsSL "https://pypi.org/pypi/${pkg}/json" 2>/dev/null | python3 << 'PYEOF'
import json, sys
try:
    data = json.load(sys.stdin)
    print(f"Latest version: {data['info']['version']}")
    # Show release date of current version
    releases = data.get('releases', {})
    latest = data['info']['version']
    if latest in releases:
        print(f"Release info: {releases[latest][0] if releases[latest] else 'N/A'}")
except Exception as e:
    print(f"Error: {e}")
PYEOF
done

Repository: lightspeed-core/lightspeed-stack

Length of output: 186


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check for known vulnerabilities using OSV database
echo "=== Checking for CVEs/vulnerabilities ==="

# Define specific versions to check
declare -A versions
versions[azure-core]="1.38.0"
versions[azure-identity]="1.25.1"
versions[msal]="1.34.0"
versions[msal-extensions]="1.3.1"

for pkg in azure-core azure-identity msal msal-extensions; do
  version=${versions[$pkg]}
  echo -e "\n=== $pkg==$version ==="
  
  # Try to get vulnerability info from OSV
  curl -fsSL "https://api.osv.dev/v1/query" \
    -H "Content-Type: application/json" \
    -d "{\"package\":{\"name\":\"${pkg}\",\"ecosystem\":\"PyPI\"},\"version\":\"${version}\"}" 2>/dev/null | python3 << PYEOF || echo "OSV lookup failed"
import json, sys
try:
    r = json.load(sys.stdin)
    vulns = r.get('vulns') or []
    print(f"Found {len(vulns)} vulnerability entries")
    for v in vulns[:10]:
        print(f"  - {v.get('id', 'N/A')}: {v.get('summary', 'N/A')[:100]}")
        if v.get('references'):
            for ref in v['references'][:2]:
                print(f"    {ref.get('url', 'N/A')}")
except Exception as e:
    print(f"Parse error: {e}")
PYEOF
done

Repository: lightspeed-core/lightspeed-stack

Length of output: 523


🌐 Web query:

azure-core 1.38.0 CVE security advisory 2026

💡 Result:

Short answer — yes.

  • A high‑severity CVE (CVE‑2026‑21226) was published for Azure Core (Python): deserialization of untrusted data enabling remote code execution. (nvd.nist.gov)
  • Affected versions: azure-core < 1.38.0; fixed in 1.38.0 (security update released mid‑January 2026). (advisories.ecosyste.ms)
  • References: Microsoft Security Response Center advisory / update guide and NVD entry. (nvd.nist.gov)

If you want, I can: (1) show the exact GitHub changelog entry and release date, (2) give upgrade commands for pip / Maven, or (3) scan your project for vulnerable azure-core usages. Which do you want?

Citations:


🌐 Web query:

azure-identity 1.25.1 security vulnerabilities

💡 Result:

Short answer: azure-identity 1.25.1 is not affected by the known race‑condition elevation-of‑privilege issue CVE-2024-35255 (that affects versions < 1.16.1). No other publicly documented, high‑severity vulnerabilities specific to azure-identity 1.25.1 were found in my search. [1][2][3]

Sources

  • CVE‑2024‑35255 details and affected versions (< 1.16.1). [Wiz / vulnerability database]. [1]
  • Package vulnerability trackers showing CVE-2024-35255 affects azure-identity < 1.16.1 (so 1.25.1 is fixed). [Snyk / Safety DB]. [2][3]
  • Azure SDK releases / azure-identity 1.25.1 changelog (release listing). [4]

References

  1. Wiz / CVE-2024-35255 entry.
  2. Snyk / azure-identity package page.
  3. Safety DB / azure-identity vulnerabilities summary.
  4. Azure SDK releases (azure-identity 1.25.1 changelog).

🌐 Web query:

msal 1.34.0 msal-extensions 1.3.1 security updates

💡 Result:

Short answer: as of the sources I checked, msal 1.34.0 and msal-extensions 1.3.1 do not have any currently recorded direct security vulnerabilities; msal-extensions 1.3.1 was released Mar 14–15, 2025 and msal 1.34.0 was published Jan 7, 2026 (PyPI / package-db records). Recommendations: keep packages up-to-date and monitor advisories (if you depend on older msal versions upgrade to >=1.24.1 which fixed a historical issue). [1][2][3][4]

Sources

  • Package health / vulnerability scan for msal (no direct issues reported for 1.34.0). [1]
  • PyPI release history for msal-extensions (1.3.1 release date). [2]
  • Package vulnerability summary for msal-extensions (no known issues for 1.3.1). [3]
  • Advisory / historical vulnerability note (older msal versions were vulnerable; upgrade to ≥1.24.1). [4]

If you want, I can: (a) fetch vendor (Microsoft) security advisories for these exact versions, or (b) run a dependency scan on your project to check indirect vulnerabilities. Which would you prefer?


Lockfiles are consistent across platforms, and versions are secure—but pyproject.toml has unpinned versions.

Cross-platform sync is verified: both requirements.x86_64.txt and requirements.aarch64.txt pin identical versions (azure-core==1.38.0, azure-identity==1.25.1, msal==1.34.0, msal-extensions==1.3.1).

Security status:

  • azure-core 1.38.0: This is the security fix for CVE-2026-21226 (high-severity RCE via untrusted data deserialization, affects versions < 1.38.0).
  • azure-identity 1.25.1: No known vulnerabilities; safe version.
  • msal 1.34.0 and msal-extensions 1.3.1: No known direct vulnerabilities.

Action item: pyproject.toml (lines 66–67) specifies unpinned azure-core and azure-identity dependencies, while the lockfiles pin specific versions. Consider either pinning versions in pyproject.toml or documenting the pinning strategy to match the lockfile approach.

Also applies to: 544-546, 1455-1464, 2521-2522, 2760-2772, 3304-3311

🤖 Prompt for AI Agents
In `@requirements.x86_64.txt` around lines 248 - 257, pyproject.toml currently
lists unpinned azure-core and azure-identity while the generated lockfiles
(requirements.x86_64.txt / requirements.aarch64.txt) pin azure-core==1.38.0 and
azure-identity==1.25.1; update the dependency declarations for azure-core and
azure-identity in pyproject.toml (the entries around lines 66–67 and the other
listed occurrences) to match the pinned versions (or alternatively add a clear
comment/section documenting that exact versions are enforced via lockfiles),
ensuring the pyproject uses the correct pinned spec (==1.38.0 and ==1.25.1) or
documents the pinning strategy to avoid drift.
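One way to realize the action item, sketched against an assumed `[project]` dependencies layout (your pyproject.toml may group these entries differently):

```toml
[project]
dependencies = [
    # Pin to the lockfile versions to avoid drift; azure-core 1.38.0
    # carries the fix for CVE-2026-21226.
    "azure-core==1.38.0",
    "azure-identity==1.25.1",
]
```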

Comment on lines +113 to 116
assert found_shield["provider_resource_id"] == "openai/gpt-4o-mini", (
    f"provider_resource_id should be 'openai/gpt-4o-mini', "
    f"but is '{found_shield['provider_resource_id']}'"
)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's look at the specific file mentioned in the review
echo "=== File: tests/e2e/features/steps/info.py (lines around 113-116) ==="
if [ -f "tests/e2e/features/steps/info.py" ]; then
  cat -n tests/e2e/features/steps/info.py | sed -n '100,130p'
else
  echo "File not found"
fi

echo ""
echo "=== Checking the check_model_structure function mentioned (lines 38-86) ==="
cat -n tests/e2e/features/steps/info.py | sed -n '38,90p'

Repository: lightspeed-core/lightspeed-stack

Length of output: 4385


🏁 Script executed:

#!/bin/bash
# Check shield configuration in e2e test files
echo "=== Searching for shield/llama-guard configuration ==="
rg -l "shield|llama-guard" tests/e2e/ --type py --type yaml 2>/dev/null | head -20

echo ""
echo "=== Check context setup and default values ==="
rg -n "default_provider|default_model" tests/e2e/features/ -A 2 -B 2 --type py

echo ""
echo "=== Look for conftest.py or context setup ==="
fd "conftest.py|context.py" tests/e2e/ --type f

Repository: lightspeed-core/lightspeed-stack

Length of output: 4223


🏁 Script executed:

#!/bin/bash
# Check shield configuration in the run config files
echo "=== Shield configuration in run-ci.yaml ==="
rg -A 10 -B 2 "llama-guard|shield" tests/e2e/configs/run-ci.yaml

echo ""
echo "=== Check all run-*.yaml files for shield patterns ==="
for file in tests/e2e/configs/run-*.yaml; do
  echo "--- $file ---"
  grep -A 5 -B 2 "llama-guard\|shield" "$file" 2>/dev/null | head -15
done

echo ""
echo "=== Check for shield-related constants or configuration ==="
rg -n "gpt-4o-mini|provider_resource_id" tests/e2e/ --type py

Repository: lightspeed-core/lightspeed-stack

Length of output: 4414


🏁 Script executed:

#!/bin/bash
# Look at the environment setup more carefully
echo "=== Full environment.py setup for context ==="
cat -n tests/e2e/features/environment.py | sed -n '50,120p'

echo ""
echo "=== Check if there's any shield-specific context setup ==="
rg -n "shield|llama.guard" tests/e2e/features/environment.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 3712


Hardcoded shield provider_resource_id will cause test failures across environments.

The shield validation hardcodes "openai/gpt-4o-mini" (line 113), but this value is only specified in run-ci.yaml. The other configuration files (run-azure.yaml, run-watsonx.yaml, run-rhelai.yaml, run-rhaiis.yaml, run-vertexai.yaml) do not specify provider_shield_id at all, meaning they may have different or missing values.

Unlike model validation which dynamically uses context.default_model and context.default_provider (lines 49-50), there is no corresponding shield context setup in environment.py. This creates an inconsistency where the test will fail when run with any configuration file other than run-ci.yaml.

Consider either:

  • Setting up dynamic shield configuration in the test context (similar to how models are handled)
  • Making this test environment-specific (tagged to run only with CI configuration)
  • Using the actual shield value from the service response without asserting a hardcoded value

@asimurka asimurka force-pushed the azure-entra-id-final branch from c68a5f4 to 4550d0b Compare January 15, 2026 13:44
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/openapi.json (1)

4102-4137: Fix duplicate operationId for /a2a POST method (breaks codegen/tooling).

Both GET and POST operations at /a2a currently use handle_a2a_jsonrpc_a2a_get (lines 4109 and 4127). The operationId must be unique per operation in OpenAPI specifications, or code generation and documentation tools will fail.

Proposed fix
            "post": {
                "tags": [
                    "a2a"
                ],
                "summary": "Handle A2A Jsonrpc",
                "description": "...",
-               "operationId": "handle_a2a_jsonrpc_a2a_get",
+               "operationId": "handle_a2a_jsonrpc_a2a_post",
                "responses": {
🤖 Fix all issues with AI agents
In `@docs/openapi.json`:
- Around line 5060-5096: The AzureEntraIdConfiguration schema mislabels
tenant_id and client_id as secrets; remove "writeOnly": true and "format":
"password" from the tenant_id and client_id properties in the
AzureEntraIdConfiguration object so only client_secret remains marked as
password/writeOnly; locate the AzureEntraIdConfiguration block (properties:
tenant_id, client_id, client_secret) in the OpenAPI JSON and delete the
writeOnly and format entries for tenant_id and client_id while leaving
client_secret unchanged.

In `@README.md`:
- Line 206: The Markdown table row has a missing space before the closing pipe
after the model list; update the table string containing "gpt-5, gpt-5-mini,
gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1" to insert a space after "o1" so
it becomes "o1 " before the final pipe, ensuring the table cell formats
correctly.

In `@src/client.py`:
- Around line 88-99: The current code writes to a fixed temp filename via
enriched_path which can race in concurrent runs; change the logic to create a
unique temp file (e.g., use tempfile.NamedTemporaryFile(delete=False,
dir=tempfile.gettempdir(), prefix="llama_stack_enriched_", suffix=".yaml") or
append os.getpid() / uuid to the filename), write ls_config with yaml.dump using
YamlDumper to that unique path, close the file, and return that path instead of
input_config_path on success; keep the same logger.warning/return
input_config_path behavior in the OSError except block and ensure the variable
names (enriched_path, ls_config, YamlDumper, input_config_path, logger) are used
to locate and update the code.
♻️ Duplicate comments (6)
tests/e2e/features/steps/info.py (1)

113-116: Hardcoded provider_resource_id creates environment-specific test fragility.

This assertion hardcodes "openai/gpt-4o-mini", but the model validation (lines 49-50) dynamically uses context.default_model and context.default_provider. This inconsistency means the shield test will fail when run against configurations other than run-ci.yaml (e.g., Azure, WatsonX, VertexAI).

Consider adding dynamic shield context setup in environment.py (similar to how models are handled) or parameterizing this value.

tests/unit/test_llama_stack_configuration.py (1)

118-183: Coverage gap remains for setup_azure_entra_id_token, enrich_byok_rag, and main().

As noted in the previous review, these functions lack test coverage. The current tests cover approximately 35% of the module. To meet the 60% code coverage requirement per coding guidelines, add tests for:

  1. setup_azure_entra_id_token - token generation, env var skip logic, error handling
  2. enrich_byok_rag - empty list early return, in-place modification
  3. main() - CLI argument parsing, config loading
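A minimal sketch of what tests for the `enrich_byok_rag` contract could look like. The stand-in below only mirrors the behavior described in this review (early return on an empty list, in-place enrichment with a `byok_` provider prefix); it is not the repo's implementation, and the real tests would import the function from `llama_stack_configuration` instead.

```python
from typing import Any


def enrich_byok_rag_standin(
    ls_config: dict[str, Any], byok_rag: list[dict[str, Any]]
) -> None:
    """Stand-in: no-op on empty input, otherwise append vector_dbs in place."""
    if not byok_rag:
        return
    vector_dbs = ls_config.setdefault("vector_dbs", [])
    for brag in byok_rag:
        vector_dbs.append(
            {
                "vector_db_id": brag.get("vector_db_id", ""),
                "provider_id": "byok_" + brag.get("vector_db_id", ""),
            }
        )


def test_empty_byok_rag_leaves_config_untouched() -> None:
    """Empty BYOK list must leave the llama stack config unchanged."""
    config: dict[str, Any] = {"vector_dbs": []}
    enrich_byok_rag_standin(config, [])
    assert config == {"vector_dbs": []}


def test_byok_rag_enriches_in_place() -> None:
    """BYOK entries are added in place with the byok_ provider prefix."""
    config: dict[str, Any] = {}
    enrich_byok_rag_standin(config, [{"vector_db_id": "docs"}])
    assert config["vector_dbs"][0]["provider_id"] == "byok_docs"
```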
src/llama_stack_configuration.py (4)

194-207: Same vector_db_id validation issue applies here.

Apply the same validation pattern suggested for construct_vector_dbs_section to avoid malformed provider_id and db_path values.


109-111: Authentication failures are silently swallowed.

When ClientAuthenticationError or CredentialUnavailableError occurs, the function logs an error but continues execution. Callers have no way to know authentication failed, which may lead to runtime failures later when the missing AZURE_API_KEY causes API calls to fail.

Consider re-raising the exception after logging so callers can handle the failure appropriately.

Proposed fix: Re-raise after logging
     except (ClientAuthenticationError, CredentialUnavailableError) as e:
         logger.error("Azure Entra ID: Failed to generate token: %s", e)
+        raise

102-107: Token written to file but not set in current process environment.

After writing the token to the .env file, subsequent code in the same process won't have access to AZURE_API_KEY via os.environ. If downstream code (like replace_env_vars or YAML processing) needs this token during the same execution, it won't be available.

Proposed fix: Set env var in current process
         with open(env_file, "w", encoding="utf-8") as f:
             f.writelines(lines)

+        # Make token available to current process
+        os.environ["AZURE_API_KEY"] = token.token
+
         logger.info(
             "Azure Entra ID: Access token set in env and written to %s", env_file
         )

146-154: Missing vector_db_id validation produces malformed entries.

If brag.get("vector_db_id") returns None or empty string, provider_id becomes "byok_" (malformed) and other fields may be empty. Consider validating required fields and skipping invalid entries.

Proposed fix: Add validation
     for brag in byok_rag:
+        vector_db_id = brag.get("vector_db_id")
+        if not vector_db_id:
+            logger.warning("BYOK RAG entry missing vector_db_id, skipping")
+            continue
         output.append(
             {
-                "vector_db_id": brag.get("vector_db_id", ""),
-                "provider_id": "byok_" + brag.get("vector_db_id", ""),
-                "embedding_model": brag.get("embedding_model", ""),
-                "embedding_dimension": brag.get("embedding_dimension"),
+                "vector_db_id": vector_db_id,
+                "provider_id": f"byok_{vector_db_id}",
+                "embedding_model": brag.get("embedding_model", ""),
+                "embedding_dimension": brag.get("embedding_dimension"),
             }
         )
🧹 Nitpick comments (16)
src/models/config.py (2)

1537-1548: Configuration structure is correct; consider field type refinement.

The AzureEntraIdConfiguration class correctly extends ConfigurationBase and uses SecretStr for sensitive credentials. However, a minor observation:

  • tenant_id and client_id are typically not secrets—they're public identifiers often found in documentation and URLs. Only client_secret is truly confidential.
  • Using SecretStr for all three works correctly but may cause minor inconvenience when these values need to be logged for debugging (they'll appear as ***).

This is not a blocking issue since the current implementation is secure by default.

Optional: Use plain str for public identifiers
 class AzureEntraIdConfiguration(ConfigurationBase):
     """Microsoft Entra ID authentication attributes for Azure."""

-    tenant_id: SecretStr
-    client_id: SecretStr
+    tenant_id: str = Field(
+        ...,
+        title="Tenant ID",
+        description="Azure AD tenant ID (directory ID).",
+    )
+    client_id: str = Field(
+        ...,
+        title="Client ID",
+        description="Application (client) ID registered in Azure AD.",
+    )
     client_secret: SecretStr

1660-1660: Consider adding Field metadata for consistency with other configuration fields.

Other fields in the Configuration class include Field() with title and description for documentation and schema generation. This field could benefit from the same treatment.

Optional: Add Field metadata
-    azure_entra_id: Optional[AzureEntraIdConfiguration] = None
+    azure_entra_id: Optional[AzureEntraIdConfiguration] = Field(
+        None,
+        title="Azure Entra ID configuration",
+        description="Microsoft Entra ID authentication configuration for Azure OpenAI. "
+        "When configured, enables automatic token management for Azure services.",
+    )
src/app/endpoints/streaming_query.py (1)

895-914: Token refresh logic looks correct given the conditional structure.

The refactored conditional now includes AzureEntraIDManager().refresh_token() as part of the and chain (line 899), which means the provider update block only executes when refresh succeeds. This addresses the previous concern about proceeding with an empty token.

However, if refresh fails, the code silently continues with the existing (possibly expired) token. Consider logging when refresh fails but the request proceeds:

Suggested improvement
         if (
             provider_id == "azure"
             and AzureEntraIDManager().is_entra_id_configured
             and AzureEntraIDManager().is_token_expired
-            and AzureEntraIDManager().refresh_token()
         ):
+            if not AzureEntraIDManager().refresh_token():
+                logger.warning(
+                    "Azure token refresh failed; proceeding with existing token"
+                )
-            if AsyncLlamaStackClientHolder().is_library_client:
+            elif AsyncLlamaStackClientHolder().is_library_client:
                 client = await AsyncLlamaStackClientHolder().reload_library_client()
             else:
                 azure_config = next(
                     p.config
                     for p in await client.providers.list()
                     if p.provider_type == "remote::azure"
                 )
                 client = AsyncLlamaStackClientHolder().update_provider_data(
                     {
                         "azure_api_key": AzureEntraIDManager().access_token.get_secret_value(),
                         "azure_api_base": str(azure_config.get("api_base")),
                     }
                 )

Regarding the next() call without a default (lines 904-908): Based on learnings, the Azure provider is guaranteed to be present when provider_id == "azure", so defensive handling is not required here.

src/app/main.py (1)

44-51: Token refresh failure handling is minimal but intentional.

The warning message indicates the design intent: allow startup to proceed and retry token refresh on the first Azure request. This aligns with the token refresh logic added in the query endpoints.

However, per the past review concern, if Azure Entra ID is configured but authentication fails (e.g., invalid credentials), the service starts with an empty AZURE_API_KEY. All Azure requests will then fail until credentials are fixed, which could be confusing to operators.

Consider making startup failure configurable or logging an error-level message that explicitly states Azure requests will fail:

Suggested improvement
     azure_config = configuration.configuration.azure_entra_id
     if azure_config is not None:
         AzureEntraIDManager().set_config(azure_config)
         if not AzureEntraIDManager().refresh_token():
-            logger.warning(
-                "Failed to refresh Azure token at startup. "
-                "Token refresh will be retried on next Azure request."
+            logger.error(
+                "Failed to obtain Azure token at startup. "
+                "Azure inference requests will fail until token is successfully refreshed."
             )
Makefile (1)

20-23: Potential silent failure if .env file or AZURE_API_KEY is missing.

The grep '^AZURE_API_KEY=' .env | cut -d'=' -f2- command will silently produce an empty string if:

  1. The .env file doesn't exist
  2. AZURE_API_KEY is not defined in .env

This could lead to confusing failures when running Llama Stack locally. Consider adding a check or documenting the requirement:

Suggested improvement
 run-llama-stack: ## Start Llama Stack with enriched config (for local service mode)
 	uv run src/llama_stack_configuration.py -c $(CONFIG) -i $(LLAMA_STACK_CONFIG) -o $(LLAMA_STACK_CONFIG) && \
-	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env | cut -d'=' -f2-) \
+	AZURE_API_KEY=$$(grep '^AZURE_API_KEY=' .env 2>/dev/null | cut -d'=' -f2- || echo "") \
 	uv run llama stack run $(LLAMA_STACK_CONFIG)

Alternatively, document that .env with AZURE_API_KEY is required for Azure deployments.

tests/unit/test_llama_stack_configuration.py (2)

81-111: Tests for construct_vector_io_providers_section are adequate.

Consider adding a merge test (similar to test_construct_vector_dbs_section_merge) to verify that existing vector_io entries are preserved when adding new BYOK entries, ensuring test parity with the vector_dbs section.
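Such a merge test could look roughly like this. The stand-in construct function only models the contract implied above (pre-existing entries preserved, BYOK entries appended with a `byok_` provider prefix) and is not the repo's implementation:

```python
from typing import Any


def construct_vector_io_providers_standin(
    existing: list[dict[str, Any]], byok_rag: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Stand-in: keep existing provider entries, then append BYOK ones."""
    merged = list(existing)
    for brag in byok_rag:
        merged.append({"provider_id": "byok_" + brag.get("vector_db_id", "")})
    return merged


def test_existing_vector_io_entries_are_preserved() -> None:
    """Merging BYOK entries must not drop pre-existing providers."""
    existing = [{"provider_id": "faiss"}]
    merged = construct_vector_io_providers_standin(
        existing, [{"vector_db_id": "docs"}]
    )
    assert merged[0] == {"provider_id": "faiss"}
    assert merged[1]["provider_id"] == "byok_docs"
```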


142-151: Consider providing required fields instead of suppressing type errors.

The # type: ignore[call-arg] comments suppress validation for required fields. While acceptable for tests, providing explicit values (even dummy ones) would make tests more explicit and catch configuration model changes.

src/authorization/azure_token_manager.py (1)

100-102: Consider logging exception details for debugging.

The warning log would be more helpful with exception details to aid troubleshooting authentication failures.

♻️ Suggested improvement
         except (ClientAuthenticationError, CredentialUnavailableError):
-            logger.warning("Failed to retrieve Azure access token")
+            logger.warning("Failed to retrieve Azure access token", exc_info=True)
             return None
src/client.py (2)

117-131: Consider enforcing library mode check in reload_library_client.

The docstring states "For use with library mode only" but there's no runtime enforcement. Consider adding a check to prevent misuse.

♻️ Suggested improvement
     async def reload_library_client(self) -> AsyncLlamaStackClient:
         """Reload library client to pick up env var changes.

         For use with library mode only.

         Returns:
             The reloaded client instance.
         """
+        if not self.is_library_client:
+            raise RuntimeError("reload_library_client is only valid for library mode")
         if not self._config_path:
             raise RuntimeError("Cannot reload: config path not set")

133-164: update_provider_data implementation is robust.

The method correctly handles JSON parsing edge cases and merges updates into the provider data header. The use of client.copy() ensures immutability of the original client configuration.

Similar to reload_library_client, consider adding a runtime check to enforce service mode usage as stated in the docstring.

♻️ Optional: Add service mode enforcement
     def update_provider_data(self, updates: dict[str, str]) -> AsyncLlamaStackClient:
         """Update provider data headers for service client.

         For use with service mode only.
         ...
         """
         if not self._lsc:
             raise RuntimeError(
                 "AsyncLlamaStackClient has not been initialised. Ensure 'load(..)' has been called."
             )
+        if self.is_library_client:
+            raise RuntimeError("update_provider_data is only valid for service mode")
docs/openapi.json (3)

4894-4899: Consider clarifying “authorization” vs “authentication” wording for probe skipping.
skip_for_health_probes lives under AuthenticationConfiguration, but the title/description now say “Skip authorization…” (Line 4896-4897). If probes skip both authN and authZ (or only authN), this wording could mislead API consumers.


5342-5356: Add title/description metadata for Configuration.azure_entra_id for API usability.
The new field is present (Line 5347-5356) but has no title/description, unlike most other config fields—this makes the OpenAPI output harder to consume.

Proposed fix
 "azure_entra_id": {
+  "title": "Azure Entra ID configuration",
+  "description": "Optional Microsoft Entra ID (OAuth2 client credentials) settings used to obtain Azure access tokens for Azure OpenAI / Cognitive Services.",
   "anyOf": [
     { "$ref": "#/components/schemas/AzureEntraIdConfiguration" },
     { "type": "null" }
   ]
 }

6772-6793: authorization_headers schema/description mismatches the likely mental model (“header values” vs “secret sources”).
The schema is map[string]string, but the description says values are paths to files (or special tokens like kubernetes / client) (Line 6773-6780). That’s fine, but readers will easily assume the value is the literal header value.

Suggestion: add explicit examples (file path + kubernetes + client) and/or rename to reflect “header value sources” if the underlying config is still evolving.
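For instance, the description could ship with an example along these lines (header names and the secret path are illustrative, not taken from the repo's config):

```yaml
authorization_headers:
  # value is a path to a file whose contents become the header value
  Authorization: /run/secrets/bearer-token
  # special token: use the in-cluster Kubernetes service account token
  X-Cluster-Auth: kubernetes
  # special token: forward the calling client's own credential
  X-Forwarded-Auth: client
```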

tests/unit/authorization/test_azure_token_manager.py (2)

44-45: Minor: Docstring class name mismatch.

The docstring references "AzureEntraIDTokenManager" but the actual class is named AzureEntraIDManager. Consider aligning for consistency.

 class TestAzureEntraIDTokenManager:
-    """Unit tests for AzureEntraIDTokenManager."""
+    """Unit tests for AzureEntraIDManager."""

119-140: Consider adding test coverage for CredentialUnavailableError.

The implementation handles both ClientAuthenticationError and CredentialUnavailableError, but only the former is tested. Consider adding a similar test for CredentialUnavailableError to ensure complete coverage of the exception handling path.

💡 Suggested additional test
def test_refresh_token_failure_credential_unavailable(
    self,
    token_manager: AzureEntraIDManager,
    dummy_config: AzureEntraIdConfiguration,
    mocker: MockerFixture,
    caplog: Any,
) -> None:
    """Log error when credential is unavailable."""
    from azure.identity import CredentialUnavailableError
    
    token_manager.set_config(dummy_config)
    mock_credential_instance = mocker.Mock()
    mock_credential_instance.get_token.side_effect = CredentialUnavailableError(
        "unavailable"
    )
    mocker.patch(
        "authorization.azure_token_manager.ClientSecretCredential",
        return_value=mock_credential_instance,
    )

    with caplog.at_level("WARNING"):
        result = token_manager.refresh_token()
        assert result is False
        assert "Failed to retrieve Azure access token" in caplog.text
src/llama_stack_configuration.py (1)

309-313: Consider explicit error handling for better CLI UX.

If the config file doesn't exist or is invalid YAML, the exception traceback may not be user-friendly. Consider wrapping with try/except for cleaner error messages:

Suggested improvement
+    try:
-    with open(args.config, "r", encoding="utf-8") as f:
-        config = yaml.safe_load(f)
-        config = replace_env_vars(config)
+        with open(args.config, "r", encoding="utf-8") as f:
+            config = yaml.safe_load(f)
+            config = replace_env_vars(config)
+    except FileNotFoundError:
+        logger.error("Config file not found: %s", args.config)
+        raise SystemExit(1)
+    except yaml.YAMLError as e:
+        logger.error("Invalid YAML in config file: %s", e)
+        raise SystemExit(1)

     generate_configuration(args.input, args.output, config, args.env_file)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c68a5f4 and 4550d0b.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (44)
  • .github/workflows/e2e_tests.yaml
  • .github/workflows/e2e_tests_providers.yaml
  • .tekton/lightspeed-stack-pull-request.yaml
  • .tekton/lightspeed-stack-push.yaml
  • Makefile
  • README.md
  • docker-compose-library.yaml
  • docker-compose.yaml
  • docs/openapi.json
  • docs/providers.md
  • examples/azure-run.yaml
  • examples/lightspeed-stack-azure-entraid-lib.yaml
  • examples/lightspeed-stack-azure-entraid-service.yaml
  • pyproject.toml
  • requirements-build.txt
  • requirements.hashes.source.txt
  • requirements.hashes.wheel.txt
  • scripts/llama-stack-entrypoint.sh
  • src/app/endpoints/query.py
  • src/app/endpoints/query_v2.py
  • src/app/endpoints/streaming_query.py
  • src/app/main.py
  • src/authorization/azure_token_manager.py
  • src/client.py
  • src/configuration.py
  • src/lightspeed_stack.py
  • src/llama_stack_configuration.py
  • src/models/config.py
  • test.containerfile
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/e2e/features/steps/info.py
  • tests/unit/authentication/test_api_key_token.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/test_client.py
  • tests/unit/test_configuration.py
  • tests/unit/test_llama_stack_configuration.py
💤 Files with no reviewable changes (6)
  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml
  • src/app/endpoints/query_v2.py
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack-no-cache.yaml
✅ Files skipped from review due to trivial changes (2)
  • docs/providers.md
  • examples/azure-run.yaml
🚧 Files skipped from review as they are similar to previous changes (16)
  • examples/lightspeed-stack-azure-entraid-lib.yaml
  • src/lightspeed_stack.py
  • test.containerfile
  • tests/unit/test_client.py
  • docker-compose.yaml
  • src/app/endpoints/query.py
  • docker-compose-library.yaml
  • examples/lightspeed-stack-azure-entraid-service.yaml
  • tests/e2e/configs/run-azure.yaml
  • pyproject.toml
  • scripts/llama-stack-entrypoint.sh
  • tests/e2e/configuration/library-mode/lightspeed-stack.yaml
  • tests/e2e/configuration/server-mode/lightspeed-stack.yaml
  • tests/unit/models/config/test_dump_configuration.py
  • tests/unit/authentication/test_api_key_token.py
  • requirements-build.txt
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use absolute imports for internal modules: from authentication import get_auth_dependency
Use FastAPI dependencies: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use Llama Stack imports: from llama_stack_client import AsyncLlamaStackClient
Check constants.py for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use logger = logging.getLogger(__name__) pattern for module logging
Type aliases defined at module level for clarity
All functions require docstrings with brief descriptions
Complete type annotations for function parameters and return types, using typing_extensions.Self for model validators
Use union types with modern syntax: str | int instead of Union[str, int]
Use Optional[Type] for optional parameters
Use snake_case with descriptive, action-oriented function names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling
Use logger.debug() for detailed diagnostic information
Use logger.info() for general information about program execution
Use logger.warning() for unexpected events or potential problems
Use logger.error() for serious problems that prevented function execution
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with descriptive names and standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with @abstractmethod decorators
Complete type annotations for all class attributes
Follow Google Python docstring conventions (https://google.github.io/styleguide/pyguide.html) with sections: Args, Returns, Raises, Attributes
Run uv run make format to auto-format code with black and ruff before completion
Run uv run make verify to run all linters (black, pyl...

Files:

  • src/app/endpoints/streaming_query.py
  • src/authorization/azure_token_manager.py
  • src/configuration.py
  • tests/e2e/features/steps/info.py
  • src/client.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/authorization/test_azure_token_manager.py
  • src/models/config.py
  • tests/unit/test_configuration.py
  • src/app/main.py
  • src/llama_stack_configuration.py
src/app/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/streaming_query.py
  • src/app/main.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest for all unit and integration tests, not unittest
Use pytest-mock for AsyncMock objects in tests
Use MOCK_AUTH = ("mock_user_id", "mock_username", False, "mock_token") pattern for authentication mocks in tests

Files:

  • tests/e2e/features/steps/info.py
  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/test_configuration.py
tests/e2e/features/steps/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Define E2E test steps in tests/e2e/features/steps/ directory

Files:

  • tests/e2e/features/steps/info.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/unit/**/*.py: Unit tests require 60% code coverage
Write unit tests covering new functionality before completion

Files:

  • tests/unit/test_llama_stack_configuration.py
  • tests/unit/authorization/test_azure_token_manager.py
  • tests/unit/test_configuration.py
src/models/config.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/config.py: All config uses Pydantic models extending ConfigurationBase
Configuration base class sets extra="forbid" to reject unknown fields
Use type hints: Optional[FilePath], PositiveInt, SecretStr for configuration fields
Extend ConfigurationBase for configuration Pydantic models

Files:

  • src/models/config.py
src/models/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/models/**/*.py: Use @field_validator and @model_validator for custom validation in Pydantic models
Extend BaseModel for data Pydantic models
Use @model_validator and @field_validator for Pydantic model validation
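A minimal sketch of the two validator kinds named above; the model and field names are illustrative, not the actual classes in src/models:

```python
"""Sketch of @field_validator and @model_validator usage."""

from pydantic import BaseModel, field_validator, model_validator
from typing_extensions import Self


class EntraIdSettings(BaseModel):
    """Data model validating Entra ID identifiers (illustrative)."""

    tenant_id: str
    client_id: str

    @field_validator("tenant_id", "client_id")
    @classmethod
    def non_empty(cls, value: str) -> str:
        """Reject blank identifiers at the field level."""
        if not value.strip():
            raise ValueError("identifier must not be blank")
        return value

    @model_validator(mode="after")
    def ids_must_differ(self) -> Self:
        """Cross-field check that runs after field validation."""
        if self.tenant_id == self.client_id:
            raise ValueError("tenant_id and client_id must differ")
        return self
```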

Files:

  • src/models/config.py
🧠 Learnings (12)
📚 Learning: 2026-01-14T09:38:00.340Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 988
File: src/app/endpoints/query.py:319-339
Timestamp: 2026-01-14T09:38:00.340Z
Learning: In the lightspeed-stack codebase, when provider_id is "azure", the Azure provider with provider_type "remote::azure" is guaranteed to be present in the providers list, so defensive checks for StopIteration when using next() to find the Azure provider are not required.

Applied to files:

  • src/app/endpoints/streaming_query.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Never commit secrets/keys; use environment variables for sensitive data

Applied to files:

  • src/authorization/azure_token_manager.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/models/config.py : All config uses Pydantic models extending `ConfigurationBase`

Applied to files:

  • src/configuration.py
  • src/models/config.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to **/*.py : Use Llama Stack imports: `from llama_stack_client import AsyncLlamaStackClient`

Applied to files:

  • src/client.py
📚 Learning: 2025-08-18T10:57:39.266Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:59-59
Timestamp: 2025-08-18T10:57:39.266Z
Learning: In the lightspeed-stack project, transitive dependencies like faiss-cpu are intentionally pinned as top-level dependencies to maintain better control over the dependency graph and avoid version conflicts when bundling ML/LLM tooling packages.

Applied to files:

  • requirements.hashes.wheel.txt
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Unit tests require 60% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/unit/**/*.py : Write unit tests covering new functionality before completion

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/integration/**/*.py : Integration tests require 10% code coverage

Applied to files:

  • tests/unit/test_llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt`

Applied to files:

  • .github/workflows/e2e_tests.yaml
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.

Applied to files:

  • src/models/config.py
📚 Learning: 2025-12-18T10:21:09.038Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 935
File: run.yaml:114-115
Timestamp: 2025-12-18T10:21:09.038Z
Learning: In Llama Stack version 0.3.x, telemetry provider configuration is not supported under the `providers` section in run.yaml configuration files. Telemetry can be enabled with just `telemetry.enabled: true` without requiring an explicit provider block.

Applied to files:

  • src/llama_stack_configuration.py
📚 Learning: 2026-01-11T16:30:41.784Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-11T16:30:41.784Z
Learning: Applies to src/authentication/k8s.py : Pyright type checking excludes `src/authentication/k8s.py`

Applied to files:

  • docs/openapi.json
🧬 Code graph analysis (10)
src/app/endpoints/streaming_query.py (2)
src/authorization/azure_token_manager.py (5)
  • AzureEntraIDManager (22-102)
  • is_entra_id_configured (45-47)
  • is_token_expired (50-52)
  • refresh_token (59-76)
  • access_token (55-57)
src/client.py (4)
  • AsyncLlamaStackClientHolder (21-164)
  • is_library_client (28-30)
  • reload_library_client (117-131)
  • update_provider_data (133-164)
src/authorization/azure_token_manager.py (3)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
src/utils/types.py (1)
  • Singleton (43-57)
src/configuration.py (1)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
src/client.py (2)
src/llama_stack_configuration.py (2)
  • enrich_byok_rag (216-234)
  • YamlDumper (23-39)
src/utils/types.py (1)
  • Singleton (43-57)
tests/unit/test_llama_stack_configuration.py (1)
src/llama_stack_configuration.py (3)
  • generate_configuration (242-270)
  • construct_vector_dbs_section (118-160)
  • construct_vector_io_providers_section (163-213)
tests/unit/authorization/test_azure_token_manager.py (3)
src/authorization/azure_token_manager.py (7)
  • AzureEntraIDManager (22-102)
  • access_token (55-57)
  • is_token_expired (50-52)
  • is_entra_id_configured (45-47)
  • set_config (39-42)
  • refresh_token (59-76)
  • _update_access_token (78-85)
src/configuration.py (1)
  • configuration (95-106)
src/models/config.py (1)
  • AzureEntraIdConfiguration (1537-1548)
src/models/config.py (1)
src/configuration.py (1)
  • azure_entra_id (345-349)
tests/unit/test_configuration.py (1)
src/configuration.py (2)
  • load_configuration (65-74)
  • azure_entra_id (345-349)
src/app/main.py (2)
src/authorization/azure_token_manager.py (3)
  • AzureEntraIDManager (22-102)
  • set_config (39-42)
  • refresh_token (59-76)
src/configuration.py (2)
  • configuration (95-106)
  • azure_entra_id (345-349)
src/llama_stack_configuration.py (2)
src/models/config.py (3)
  • config (330-347)
  • config (1361-1369)
  • dump (1701-1713)
src/lightspeed_stack.py (1)
  • main (69-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
🔇 Additional comments (34)
.tekton/lightspeed-stack-pull-request.yaml (1)

57-62: LGTM — einops correctly added to prefetch packages.

The addition of einops to the binary packages list is consistent with the corresponding update in requirements.hashes.wheel.txt. The package is correctly placed in alphabetical order within the list.

requirements.hashes.wheel.txt (2)

1-2: Clarify the regex dependency concern against the correct file and version.

The review references regex==2025.11.3, but the actual version in requirements.hashes.source.txt is regex==2026.1.15. Additionally, requirements.hashes.wheel.txt and requirements.hashes.source.txt are generated from different sources using different indexes (RHOAI wheels vs. PyPI), so their contents are expected to differ. The wheel file is compiled with the --no-deps flag against a private Red Hat index, which may not include regex or may resolve it as a transitive dependency. Verify whether the absence of regex from the wheel requirements file actually causes runtime issues with nltk or transformers during builds.


59-60: Add einops==0.8.1 with correct hash.

The package addition is valid (einops 0.8.1 is the latest version as of February 2025), and the hash is correctly formed. However, there is no evidence that this addition aligns with a prefetch-dependencies update in the Tekton pipeline—the einops dependency appears to be added through standard uv dependency resolution rather than as part of a Tekton configuration change. The Tekton pipeline does contain a prefetch-dependencies task, but it was not modified as part of this requirements update.

Likely an incorrect or invalid review comment.

requirements.hashes.source.txt (2)

145-150: MSAL dependencies support Azure Entra ID token acquisition.

The msal and msal-extensions packages are the Microsoft Authentication Library components that handle token acquisition, caching, and refresh for Azure Entra ID. These are appropriate additions for the token management feature.


11-16: Azure SDK dependencies appropriately pinned to secure versions.

The addition of azure-core==1.38.0 and azure-identity==1.25.1 is well-chosen. These are the current latest versions and address security requirements:

  • azure-core==1.38.0 fixes CVE-2026-21226 (critical deserialization/RCE vulnerability)
  • azure-identity==1.25.1 has no known CVEs and is well beyond the vulnerable range of CVE-2024-35255 (<1.16.1)

The versions are current and secure.

src/configuration.py (1)

344-349: LGTM! Property follows established configuration access pattern.

The new azure_entra_id property correctly follows the same lazy-access pattern and error handling as other configuration properties in the class. The type annotation and docstring are appropriate.

.tekton/lightspeed-stack-push.yaml (1)

49-54: LGTM! Package addition for build prefetch.

The einops package is correctly added to the binary packages list for prefetching during the hermetic build. The package is positioned alphabetically in the list, maintaining consistency.

.github/workflows/e2e_tests.yaml (1)

103-103: LGTM! Fix aligns display message with actual configuration source.

The change correctly reflects the dynamic environment-based configuration path that's actually used in the "Select and configure run.yaml" step. This improves debugging transparency by showing the accurate source file path.

src/app/endpoints/streaming_query.py (1)

52-52: LGTM!

The import follows the absolute import convention specified in the coding guidelines.

.github/workflows/e2e_tests_providers.yaml (2)

55-68: Workflow step correctly scoped to Azure environment.

The conditional if: matrix.environment == 'azure' ensures the Entra ID config block is only added for Azure tests, addressing the concern about shared config files across test jobs.

One minor consideration: if this step re-runs (e.g., workflow retry), it would append duplicate azure_entra_id blocks to the config files. Since CI workflows typically use fresh checkouts, this is unlikely to be an issue in practice.


190-192: LGTM!

Azure Entra ID credentials are correctly passed to both server and library mode containers, replacing the previous AZURE_API_KEY approach with the new credential-based token management.

src/app/main.py (1)

12-12: LGTM!

Import follows the absolute import convention.

Makefile (1)

13-18: LGTM!

The configurable defaults with override capability improve flexibility for local development across different environments.

tests/unit/test_configuration.py (3)

812-846: LGTM!

The test correctly validates Azure Entra ID configuration loading with proper assertions for all three credential fields. The test follows the existing patterns in the file and uses get_secret_value() appropriately for secret fields.


849-877: LGTM!

The test validates that incomplete Azure Entra ID configuration raises ValidationError, which is the expected Pydantic behavior for required fields. The test coverage is adequate for the validation scenario.


5-5: LGTM!

The ValidationError import is correctly added for the new validation test.

tests/unit/test_llama_stack_configuration.py (2)

1-20: Imports and module setup look good.

The imports are properly organized and follow the coding guidelines with absolute imports. The test file correctly imports the functions under test from llama_stack_configuration and necessary models from models.config.


28-74: Tests for construct_vector_dbs_section provide good coverage.

The four tests appropriately cover empty input, preserving existing entries, adding new BYOK entries, and merging scenarios. The assertions correctly validate the expected dict structure including provider_id prefixing with "byok_".

src/authorization/azure_token_manager.py (4)

1-19: Module setup and imports are well-organized.

The imports follow coding guidelines with proper separation. The TOKEN_EXPIRATION_LEEWAY constant is appropriately defined at module level. Good use of type hints with Optional for nullable types.


22-57: Singleton class structure is well-designed.

The class properly encapsulates Azure Entra ID token management with clear separation of concerns. The is_token_expired property correctly handles both uninitialized state (_expires_on == 0) and actual expiration checks. Using SecretStr for the access token is a good security practice.


59-76: refresh_token method correctly returns success status.

The boolean return value enables callers to handle refresh failures appropriately, addressing the earlier review feedback. The ValueError for missing configuration provides clear error messaging.


78-85: Token update implementation is secure and correct.

The leeway subtraction ensures proactive token refresh. Logging only the expiry time (not the token value) follows security best practices. Environment variable storage aligns with the single source of truth principle.
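The lifecycle described in these comments can be sketched as follows. The class and method names mirror the review summary, but the body, the injected credential parameter, and the AZURE_API_KEY variable name are assumptions for illustration (the singleton machinery is omitted here), not the actual implementation:

```python
"""Sketch of an Entra ID token manager with expiry leeway."""

import os
import time
from typing import Optional

TOKEN_EXPIRATION_LEEWAY = 300  # refresh 5 minutes before the real expiry


class AzureEntraIDManager:
    """Caches an access token and tracks when it should be refreshed."""

    def __init__(self) -> None:
        self._access_token: Optional[str] = None
        self._expires_on: int = 0

    @property
    def is_token_expired(self) -> bool:
        # An uninitialized manager (_expires_on == 0) also counts as expired.
        return time.time() >= self._expires_on

    def refresh_token(self, credential, scope: str) -> bool:
        """Fetch a new token via the given credential; return success."""
        try:
            # credential follows the azure.identity get_token shape:
            # it returns an object with .token and .expires_on attributes.
            token = credential.get_token(scope)
        except Exception:  # pylint: disable=broad-except
            return False
        self._update_access_token(token.token, token.expires_on)
        return True

    def _update_access_token(self, token: str, expires_on: int) -> None:
        self._access_token = token
        # Subtract leeway so callers refresh proactively, before true expiry.
        self._expires_on = expires_on - TOKEN_EXPIRATION_LEEWAY
        # Assumed env var name: providers read the key from the environment.
        os.environ["AZURE_API_KEY"] = token
```

A stub credential with the same `get_token` shape is enough to exercise the refresh path without contacting Azure.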

src/client.py (3)

1-18: Imports are properly organized and follow coding guidelines.

The new imports for json, os, tempfile, and yaml support the BYOK RAG enrichment workflow. The absolute imports from internal modules follow the coding guidelines.


32-40: Clean routing logic for client initialization.

The load method now cleanly separates library and service mode initialization. The early return prevents redundant initialization.


42-64: Library client initialization handles BYOK RAG enrichment correctly.

The method properly validates the config path, conditionally applies BYOK RAG enrichment, and stores the config path for potential reload. The async initialization is correctly awaited.
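The enrichment step in that flow can be sketched as: load the YAML config, append BYOK entries, and dump to a temp file so the original stays untouched. The function name, dict shape, and the "byok_" prefix follow the descriptions in this review; field names are assumptions:

```python
"""Sketch of library-mode config enrichment (paths and fields assumed)."""

import tempfile

import yaml  # PyYAML, already used by the module under review


def prepare_library_config(config_path: str, byok_rag: list[dict]) -> str:
    """Return a path to a config enriched with BYOK entries, if any."""
    with open(config_path, encoding="utf-8") as handle:
        ls_config = yaml.safe_load(handle)
    if not byok_rag:
        return config_path  # nothing to enrich; use the original file
    for entry in byok_rag:
        ls_config.setdefault("vector_dbs", []).append(
            {
                "vector_db_id": entry["vector_db_id"],
                "provider_id": f"byok_{entry['vector_db_id']}",
            }
        )
    # Write the enriched config to a temp file; the source file is untouched.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".yaml", delete=False, encoding="utf-8"
    ) as out:
        yaml.safe_dump(ls_config, out)
        return out.name
```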

tests/unit/authorization/test_azure_token_manager.py (6)

31-41: LGTM!

The fixtures correctly handle singleton reset for test isolation and follow pytest conventions. The autouse=True on reset_singleton ensures clean state between tests.


47-76: LGTM!

Tests for singleton behavior, initial state, configuration, and expiration logic are well-structured and provide good coverage of the basic functionality.


78-92: LGTM!

The error handling test correctly verifies the ValueError exception with proper message matching. The token update test validates both the environment variable update and the expiration leeway calculation.


94-117: LGTM!

Comprehensive test for the happy path - correctly mocks ClientSecretCredential, verifies the token is set, expiration is updated, and the scope is passed correctly.


142-155: LGTM!

Good test for dynamic expiration behavior using time mocking. Verifies the token is not expired initially and becomes expired after time passes the threshold.


18-18: No changes needed. The import from configuration import AzureEntraIdConfiguration is correct and consistent with the codebase pattern, where the class is defined in src/models/config.py and re-exported through src/configuration.py.
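The singleton-reset fixture pattern praised above can be sketched like this; the Singleton metaclass internals and the fixture body are assumptions, since only the fixture's intent is visible in the review:

```python
"""Sketch of an autouse fixture resetting a Singleton between tests."""

import pytest


class Singleton(type):
    """Metaclass caching one instance per class (illustrative internals)."""

    _instances: dict[type, object] = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class AzureEntraIDManager(metaclass=Singleton):
    """Stand-in for the real manager; state persists across instantiations."""

    def __init__(self) -> None:
        self.access_token = None


@pytest.fixture(autouse=True)
def reset_singleton():
    """Drop the cached instance so every test starts from a clean state."""
    Singleton._instances.pop(AzureEntraIDManager, None)
    yield
    Singleton._instances.pop(AzureEntraIDManager, None)
```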

src/llama_stack_configuration.py (3)

1-40: LGTM!

Module docstring, imports, and YamlDumper class follow coding guidelines. Logger uses the correct logging.getLogger(__name__) pattern.


216-234: LGTM!

The function clearly documents that ls_config is modified in place, which is acceptable for configuration enrichment patterns. The early return for empty byok_rag prevents unnecessary processing.


242-270: LGTM!

The function orchestrates the enrichment workflow cleanly. Consider wrapping the file operations in error handling if this function is called from contexts where graceful failure is preferred over propagating exceptions.


Comment on lines +5060 to +5096
"AzureEntraIdConfiguration": {
"properties": {
"tenant_id": {
"type": "string",
"format": "password",
"title": "Tenant Id",
"writeOnly": true
},
"client_id": {
"type": "string",
"format": "password",
"title": "Client Id",
"writeOnly": true
},
"client_secret": {
"type": "string",
"format": "password",
"title": "Client Secret",
"writeOnly": true
},
"scope": {
"type": "string",
"title": "Token scope",
"description": "Azure Cognitive Services scope for token requests. Override only if using a different Azure service.",
"default": "https://cognitiveservices.azure.com/.default"
}
},
"additionalProperties": false,
"type": "object",
"required": [
"tenant_id",
"client_id",
"client_secret"
],
"title": "AzureEntraIdConfiguration",
"description": "Microsoft Entra ID authentication attributes for Azure."
},

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's confirm the current state of the file at the specified lines
head -5096 docs/openapi.json | tail -50 | cat -n

Repository: lightspeed-core/lightspeed-stack

Length of output: 2661


🏁 Script executed:

# Check if there are any other references to AzureEntraIdConfiguration or Azure setup docs
rg -i "azureentraidconfiguration|tenant_id.*client_id" --type json -A 5 -B 5 | head -100

Repository: lightspeed-core/lightspeed-stack

Length of output: 1958


🏁 Script executed:

# Check if there are any comments or documentation about these fields in the codebase
rg -i "tenant.*id|client.*id" docs/ --type md -B 2 -A 2 | head -80

Repository: lightspeed-core/lightspeed-stack

Length of output: 2845




🌐 Web query:

OpenAPI format password writeOnly security best practices

💡 Result:

Summary — OpenAPI writeOnly for passwords + security best practices

Key OpenAPI rules

  • Mark password fields as writeOnly: true (property MAY appear in requests but SHOULD NOT appear in responses). [1]
  • writeOnly applies only to schema properties (not root schemas); a property MUST NOT be both readOnly and writeOnly = true. [1][2]

API design best practices (concise)

  1. Use writeOnly + format: "password" for input-only password fields so docs/clients hide them in responses. [1][2]
  2. Never return or echo plaintext passwords in any API responses or error messages; strip them from logs and telemetry. (OpenAPI writeOnly intent + OWASP guidance). [1][3]
  3. Transport: require TLS for any endpoint that accepts passwords; never send credentials in URLs. [4]
  4. Storage: never store plaintext — use modern password hashing (Argon2id preferred; bcrypt/PBKDF2 if unavailable), unique salts, and consider peppering in a vault/HSM. Follow OWASP password-storage guidance. [3][5]
  5. Server-side validation & rate-limiting: enforce complexity, length, account lockout / throttling to prevent brute-force or credential-stuffing attacks. [4]
  6. Secrets handling: treat passwords as sensitive secrets in code/repo/fixtures; use vaults/secret managers and rotate keys/peppers regularly. [3]
  7. UX/APIs: prefer passwordless (WebAuthn/FIDO2) or token-based (OAuth2 authorization_code) flows when possible; use ephemeral tokens for authentication. [4]
  8. Codegen & clients: ensure code generators and SDKs respect writeOnly (generate separate request/response types) and that server code enforces not including writeOnly properties in responses. Test APIs to confirm they never serialize password fields back to clients. [2][6]

If you want, I can:

  • scan your OpenAPI document and flag password fields, writeOnly misuse, and any places that may return secrets; or
  • produce an example OpenAPI schema snippet and server-side checklist implementing the above.

Sources
[1] OpenAPI Specification (writeOnly/readOnly rules).
[2] APIMatic linting guidance / OpenAPI tooling notes on writeOnly usage.
[3] OWASP Password Storage Cheat Sheet (hashing, salts, pepper, algorithms).
[4] OWASP API Security — Broken User Authentication guidance.
[5] Snyk — Password storage best practices.
[6] Community discussion / examples showing writeOnly usage (StackOverflow / GitHub issues).


Remove writeOnly and format: password from tenant_id and client_id in AzureEntraIdConfiguration.
These are public identifiers in Azure Entra ID (directory and application registration IDs), not secrets. Only client_secret should carry password/sensitivity markings. The current schema misclassifies their security level and violates OpenAPI semantic intent for password fields.

Proposed fix
 "AzureEntraIdConfiguration": {
   "properties": {
     "tenant_id": {
       "type": "string",
-      "format": "password",
       "title": "Tenant Id",
-      "writeOnly": true
+      "description": "Microsoft Entra tenant (directory) ID."
     },
     "client_id": {
       "type": "string",
-      "format": "password",
       "title": "Client Id",
-      "writeOnly": true
+      "description": "Application (client) ID."
     },
     "client_secret": {
       "type": "string",
       "format": "password",
       "title": "Client Secret",
       "writeOnly": true
     },
     ...
   },
   ...
 }
📝 Committable suggestion

Suggested change
"AzureEntraIdConfiguration": {
"properties": {
"tenant_id": {
"type": "string",
"title": "Tenant Id",
"description": "Microsoft Entra tenant (directory) ID."
},
"client_id": {
"type": "string",
"title": "Client Id",
"description": "Application (client) ID."
},
"client_secret": {
"type": "string",
"format": "password",
"title": "Client Secret",
"writeOnly": true
},
"scope": {
"type": "string",
"title": "Token scope",
"description": "Azure Cognitive Services scope for token requests. Override only if using a different Azure service.",
"default": "https://cognitiveservices.azure.com/.default"
}
},
"additionalProperties": false,
"type": "object",
"required": [
"tenant_id",
"client_id",
"client_secret"
],
"title": "AzureEntraIdConfiguration",
"description": "Microsoft Entra ID authentication attributes for Azure."
},
🤖 Prompt for AI Agents
In `@docs/openapi.json` around lines 5060 - 5096, The AzureEntraIdConfiguration
schema mislabels tenant_id and client_id as secrets; remove "writeOnly": true
and "format": "password" from the tenant_id and client_id properties in the
AzureEntraIdConfiguration object so only client_secret remains marked as
password/writeOnly; locate the AzureEntraIdConfiguration block (properties:
tenant_id, client_id, client_secret) in the OpenAPI JSON and delete the
writeOnly and format entries for tenant_id and client_id while leaving
client_secret unchanged.
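If docs/openapi.json is generated from the Pydantic model (the format: password markers suggest SecretStr fields in src/models/config.py), the durable fix is in the model rather than the generated JSON: Pydantic emits format: password plus writeOnly: true exactly for SecretStr fields, so declaring tenant_id and client_id as plain str yields the schema proposed above. A minimal sketch, not the actual class body:

```python
"""Sketch showing how field types drive the generated OpenAPI schema."""

from pydantic import BaseModel, SecretStr


class AzureEntraIdConfiguration(BaseModel):
    """Illustrative model; only the secret keeps SecretStr."""

    tenant_id: str  # public directory ID: plain str, no password marking
    client_id: str  # public application (client) ID: plain str
    client_secret: SecretStr  # emits format=password, writeOnly=true
    scope: str = "https://cognitiveservices.azure.com/.default"


schema = AzureEntraIdConfiguration.model_json_schema()
```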

| RHEL AI (vLLM)| meta-llama/Llama-3.1-8B-Instruct | Yes | remote::vllm | [1](tests/e2e/configs/run-rhelai.yaml) |
| Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini, o4-mini | Yes | remote::azure | [1](examples/azure-run.yaml) |
| Azure | o1, o1-mini | No | remote::azure | |
| Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1| Yes | remote::azure | [1](examples/azure-run.yaml) |

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Azure OpenAI available models January 2026

💡 Result:

Quick list of Azure OpenAI / Azure AI Foundry models available (status as of Jan 15, 2026):

  • OpenAI GPT family: gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini (including gpt-4o-mini-audio and gpt-4o-mini-realtime). (learn.microsoft.com)
  • GPT-5 family (gpt-5, gpt-5-mini, gpt-5-nano) — GA entries listed on Microsoft docs. (learn.microsoft.com)
  • O-series reasoning models: o1, o3, o3-mini, o3-pro, o3-deep-research, o4-mini. (learn.microsoft.com)
  • Code/codex: codex-mini. (learn.microsoft.com)
  • Real-time / audio-focused variants and previews (audio/real-time GPT-4o mini models). (learn.microsoft.com)
  • Third‑party / partner models hosted via Azure AI Foundry (examples: xAI Grok, DeepSeek R1 and other marketplace/Foundry models offered on Azure). (theverge.com)

Sources: Microsoft Azure / Azure OpenAI documentation and announcements, plus Azure/Azure AI Foundry news. See Microsoft docs for exact region/version availability and retirement schedule. (learn.microsoft.com)

Would you like a region-by-region availability table or links to the specific model pages?

Citations:


🏁 Script executed:

# Check the actual line in README.md to confirm formatting
git ls-files README.md && sed -n '206p' README.md

Repository: lightspeed-core/lightspeed-stack

Length of output: 273


Fix formatting issue: add space before closing pipe.

The line is missing a space between o1 and the closing pipe character.

Formatting fix
-| Azure    | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1| Yes          | remote::azure  | [1](examples/azure-run.yaml)                                               |
+| Azure    | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1 | Yes          | remote::azure  | [1](examples/azure-run.yaml)                                               |

All listed model names are valid Azure OpenAI models as of January 2026 (gpt-5 family, gpt-4o-mini, and o-series reasoning models including o4-mini are confirmed available per Microsoft Azure documentation).

🤖 Prompt for AI Agents
In `@README.md` at line 206, The Markdown table row has a missing space before the
closing pipe after the model list; update the table string containing "gpt-5,
gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1" to insert a space
after "o1" so it becomes "o1 " before the final pipe, ensuring the table cell
formats correctly.

| Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini, o4-mini | Yes | remote::azure | [1](examples/azure-run.yaml) |
| Azure | o1, o1-mini | No | remote::azure | |
| Azure | gpt-5, gpt-5-mini, gpt-5-nano, gpt-4o-mini, o3-mini, o4-mini, o1| Yes | remote::azure | [1](examples/azure-run.yaml) |
| Azure | gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1-mini | No or limited | remote::azure | |

⚠️ Potential issue | 🟠 Major

Fix spacing, clarify the "No or limited" status, and add the missing example link.

This row has formatting and clarity issues:

  1. Formatting: extra spaces before gpt-5-chat and o1-mini; the cells should align with the other table entries.

  2. Ambiguous status: "No or limited" is vague. State whether support is unavailable outright or available with specific limitations for these models.

  3. Missing example: the row above links an example configuration file; this row should do the same for consistency.

Suggested fix

```diff
-| Azure    |  gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano,  o1-mini | No or limited         | remote::azure  |  |
+| Azure    | gpt-5-chat, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o1-mini | Limited[^2]     | remote::azure  | [1](examples/azure-run.yaml) |
```

Then add a footnote:

```diff
+[^2]: Tool calling support varies by model; some features may be unavailable or in preview.
```

Comment on lines +88 to +99
```python
        enriched_path = os.path.join(
            tempfile.gettempdir(), "llama_stack_enriched_config.yaml"
        )

        try:
            with open(enriched_path, "w", encoding="utf-8") as f:
                yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
            logger.info("Wrote enriched llama-stack config to %s", enriched_path)
            return enriched_path
        except OSError as e:
            logger.warning("Failed to write enriched config: %s", e)
            return input_config_path
```

⚠️ Potential issue | 🟡 Minor

Hardcoded temp file path may cause issues in concurrent deployments.

Using a fixed filename "llama_stack_enriched_config.yaml" in the system temp directory could cause race conditions if multiple processes initialize simultaneously. Consider using tempfile.NamedTemporaryFile with delete=False or adding a unique identifier (e.g., PID) to the filename.
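As a sketch of the race-free approach (the function name and config content below are illustrative, not the PR's actual code), `tempfile.mkstemp` creates each file atomically under a unique name, so concurrent processes can never collide on the same path:

```python
import os
import tempfile


def write_config_atomically(content: str, fallback_path: str) -> str:
    """Write content to a uniquely named temp file; fall back on error.

    mkstemp creates the file with O_EXCL semantics, so no two callers
    (processes or threads) can ever receive the same path.
    """
    try:
        fd, path = tempfile.mkstemp(
            prefix="llama_stack_enriched_", suffix=".yaml"
        )
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
        return path
    except OSError:
        return fallback_path


p1 = write_config_atomically("version: 1\n", "run.yaml")
p2 = write_config_atomically("version: 1\n", "run.yaml")
print(p1 != p2)  # distinct paths even for identical content
```

Unlike a PID-suffixed filename, `mkstemp` also stays unique across threads within one process and across reused PIDs.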

🔧 Suggested improvement
```diff
-        enriched_path = os.path.join(
-            tempfile.gettempdir(), "llama_stack_enriched_config.yaml"
-        )
+        enriched_path = os.path.join(
+            tempfile.gettempdir(), f"llama_stack_enriched_config_{os.getpid()}.yaml"
+        )
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```diff
-        enriched_path = os.path.join(
-            tempfile.gettempdir(), "llama_stack_enriched_config.yaml"
-        )
+        enriched_path = os.path.join(
+            tempfile.gettempdir(), f"llama_stack_enriched_config_{os.getpid()}.yaml"
+        )

         try:
             with open(enriched_path, "w", encoding="utf-8") as f:
                 yaml.dump(ls_config, f, Dumper=YamlDumper, default_flow_style=False)
             logger.info("Wrote enriched llama-stack config to %s", enriched_path)
             return enriched_path
         except OSError as e:
             logger.warning("Failed to write enriched config: %s", e)
             return input_config_path
```
🤖 Prompt for AI Agents
```
In `@src/client.py` around lines 88-99: the code writes to a fixed temp
filename via enriched_path, which can race in concurrent runs. Create a unique
temp file instead (e.g., tempfile.NamedTemporaryFile(delete=False,
dir=tempfile.gettempdir(), prefix="llama_stack_enriched_", suffix=".yaml"), or
append os.getpid() / a uuid to the filename), write ls_config to that unique
path with yaml.dump using YamlDumper, close the file, and return that path.
Keep the existing logger.warning / return input_config_path behavior in the
OSError except block.
```

@tisnik tisnik merged commit 1a12376 into lightspeed-core:main Jan 15, 2026
21 of 23 checks passed