Skip to content

LCORE-2035: Add logic for tunnel and interception proxies on konflux#1758

Open
radofuchs wants to merge 4 commits into
lightspeed-core:mainfrom
radofuchs:LCORE_2035_konflux_fix
Open

LCORE-2035: Add logic for tunnel and interception proxies on konflux#1758
radofuchs wants to merge 4 commits into
lightspeed-core:mainfrom
radofuchs:LCORE_2035_konflux_fix

Conversation

@radofuchs
Copy link
Copy Markdown
Contributor

@radofuchs radofuchs commented May 18, 2026

Description

Add logic for tunnel and interception proxies on konflux

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added in-cluster tunnel and TLS-interception proxies with runtime stats endpoints and CA handling for e2e runs.
  • Tests

    • Enabled proxy-related e2e scenarios to run in-cluster (adjusted tags and expectations) and added in-cluster stats verification.
  • Chores

    • Script and tooling updates to deploy/sync proxy components and improved pod operation error reporting.

Review Change Stack

@radofuchs radofuchs requested review from max-svistunov and tisnik and removed request for tisnik May 18, 2026 14:46
@radofuchs radofuchs marked this pull request as ready for review May 18, 2026 14:46
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 18, 2026

Walkthrough

This PR enables proxy feature e2e tests to run inside Prow/Konflux clusters by introducing in-cluster tunnel and interception proxy pods, extending proxy implementations with in-cluster mode and stats endpoints, adding orchestration commands to deploy and manage proxies, and updating test steps to route through and verify in-cluster proxies.

Changes

In-Cluster Proxy for Prow E2E Execution

Layer / File(s) Summary
In-Cluster Pod Manifests
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml, tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml
Kubernetes Pod and Service definitions for tunnel proxy (8888 proxy, 8887 stats) and interception proxy (8889 proxy, 8886 stats), both with non-root security hardening, readiness/liveness probes on /stats, and ConfigMap volumes for script execution.
Llama manifest: optional interception CA mount
tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
Adds optional Secret-backed volume/volumeMount for e2e-interception-proxy-ca to mount ca.pem into /tmp/interception-proxy-ca.pem in the Llama container.
Proxy Implementations with In-Cluster Mode
tests/e2e/proxy/tunnel_proxy.py, tests/e2e/proxy/interception_proxy.py
Both proxies extended with run_in_cluster() entry points, HTTP stats servers exposing /stats with CONNECT counts and intercepted host:port targets as JSON, and main() CLI functions for containerized execution.
E2E Operations: Deployment and Stats
tests/e2e-prow/rhoai/scripts/e2e-ops.sh, tests/e2e-prow/rhoai/pipeline-konflux.sh
New e2e-ops commands deploy-e2e-tunnel-proxy, deploy-e2e-interception-proxy, tunnel-proxy-stats, interception-proxy-stats, and sync-interception-proxy-ca-secret to deploy manifests, fetch stats, and manage CA certificates. cmd_restart_llama_stack now can sync the CA before recreating the Llama pod and waits/verifies CA mount. Pipeline comment updated to reference proxy deployment steps.
Test Feature Enablement for Prow
tests/e2e/features/proxy.feature, tests/e2e/features/steps/proxy.py, tests/e2e/utils/prow_utils.py
Removed @skip-in-prow tags from proxy scenarios and adjusted flaky tags. Step implementations detect Prow context, deploy in-cluster proxies when needed, route Llama Stack through in-cluster DNS/ports vs local proxies, fetch/parse in-cluster /stats JSON for verification, sync interception CA into a Secret, and scenario cleanup updated. restart_pod() now raises CalledProcessError with stderr-derived detail on failure.

Sequence Diagram(s)

sequenceDiagram
  participant Test as E2E Test (proxy.feature)
  participant StepImpl as Step Implementation (proxy.py)
  participant EzOps as e2e-ops.sh
  participant Cluster as Prow Cluster
  participant TProxy as Tunnel Proxy Pod
  participant IProxy as Interception Proxy Pod
  participant Llama as Llama Stack Pod

  Test->>StepImpl: Scenario starts (Prow context detected)
  StepImpl->>EzOps: run_e2e_ops deploy-e2e-tunnel-proxy
  EzOps->>Cluster: kubectl apply e2e-tunnel-proxy manifest
  Cluster->>TProxy: Create Pod + Service
  TProxy->>TProxy: Start CONNECT proxy + stats server
  StepImpl->>EzOps: run_e2e_ops deploy-e2e-interception-proxy
  EzOps->>Cluster: kubectl apply e2e-interception-proxy manifest
  Cluster->>IProxy: Create Pod + Service
  IProxy->>IProxy: Start TLS interception proxy + stats server (exports CA PEM)
  StepImpl->>Llama: Configure proxy env (in-cluster DNS/ports)
  Test->>StepImpl: Verify proxy behavior
  StepImpl->>TProxy: GET /stats -> JSON counters
  TProxy-->>StepImpl: JSON counters (connect_count, last_connect_target)
  StepImpl->>IProxy: GET /stats -> JSON counters/hosts
  IProxy-->>StepImpl: JSON counters (connect_count, intercepted hosts)
  StepImpl->>Test: Assert CONNECT counts and intercepted hosts
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • tisnik
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.91% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding logic for tunnel and interception proxies on Konflux, which is reflected throughout the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@radofuchs radofuchs requested a review from tisnik May 18, 2026 14:47
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/proxy/interception_proxy.py (1)

165-167: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a timeout for upstream TLS connect.

The interception path opens the remote TLS connection without a timeout, so a stalled upstream can hang this handler much longer than expected. Align with tunnel_proxy.py behavior and wrap this in asyncio.wait_for(...).

Suggested patch
-                remote_reader, remote_writer = await asyncio.open_connection(
-                    target_host, target_port, ssl=True
-                )
+                remote_reader, remote_writer = await asyncio.wait_for(
+                    asyncio.open_connection(target_host, target_port, ssl=True),
+                    timeout=10,
+                )
-            except (OSError, ConnectionRefusedError, ssl.SSLError) as e:
+            except (asyncio.TimeoutError, OSError, ConnectionRefusedError, ssl.SSLError) as e:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/proxy/interception_proxy.py` around lines 165 - 167, The remote TLS
connect call using asyncio.open_connection (the line assigning remote_reader,
remote_writer) lacks a timeout; wrap that await in asyncio.wait_for(...) using
the same upstream connect timeout constant/variable used by the tunnel proxy
(e.g., UPSTREAM_CONNECT_TIMEOUT or the existing upstream_connect_timeout
parameter) so the connect becomes: remote_reader, remote_writer = await
asyncio.wait_for(asyncio.open_connection(target_host, target_port, ssl=True),
timeout=UPSTREAM_CONNECT_TIMEOUT). Ensure you import or reference the same
timeout symbol used by tunnel_proxy behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml`:
- Around line 35-37: Remove the runtime pip install from the pod startup and
instead bake the trustme dependency into the container image used by the
interception proxy; specifically, eliminate the "pip install --quiet
--no-cache-dir --target /tmp/pydeps 'trustme>=1.2.1'" line in the startup
snippet that ultimately execs python /app/interception_proxy.py and update the
Dockerfile or image build pipeline that produces the image used by
tests/e2e-proxy to pip-install trustme (or add it to requirements) so
tests/e2e/proxy/interception_proxy.py can import trustme at runtime without
network access.

In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml`:
- Around line 204-208: The Secret mount for e2e-interception-proxy-ca is marked
optional and uses a subPath (volumeMount name interception-proxy-ca, mountPath
/tmp/interception-proxy-ca.pem, subPath ca.pem), which causes Kubernetes to
create an empty directory when the Secret is missing; either remove optional:
true to make the Secret required so the pod will fail fast, or remove subPath
and mount the whole secret directory (so the app can check for the file at
/tmp/interception-proxy-ca.pem at runtime); apply the same change for the other
occurrence referenced (lines 221-224).

---

Outside diff comments:
In `@tests/e2e/proxy/interception_proxy.py`:
- Around line 165-167: The remote TLS connect call using asyncio.open_connection
(the line assigning remote_reader, remote_writer) lacks a timeout; wrap that
await in asyncio.wait_for(...) using the same upstream connect timeout
constant/variable used by the tunnel proxy (e.g., UPSTREAM_CONNECT_TIMEOUT or
the existing upstream_connect_timeout parameter) so the connect becomes:
remote_reader, remote_writer = await
asyncio.wait_for(asyncio.open_connection(target_host, target_port, ssl=True),
timeout=UPSTREAM_CONNECT_TIMEOUT). Ensure you import or reference the same
timeout symbol used by tunnel_proxy behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 726f808a-48bb-4619-8976-b6595102ebd2

📥 Commits

Reviewing files that changed from the base of the PR and between 73637c6 and 176e358.

📒 Files selected for processing (10)
  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
  • tests/e2e-prow/rhoai/pipeline-konflux.sh
  • tests/e2e-prow/rhoai/scripts/e2e-ops.sh
  • tests/e2e/features/proxy.feature
  • tests/e2e/features/steps/proxy.py
  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/proxy/tunnel_proxy.py
  • tests/e2e/utils/prow_utils.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: build-pr
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/proxy.feature
  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/utils/prow_utils.py
  • tests/e2e/proxy/tunnel_proxy.py
  • tests/e2e/features/steps/proxy.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Use pytest.mark.asyncio marker for async tests

Files:

  • tests/e2e/proxy/interception_proxy.py
  • tests/e2e/utils/prow_utils.py
  • tests/e2e/proxy/tunnel_proxy.py
  • tests/e2e/features/steps/proxy.py
🧠 Learnings (3)
📚 Learning: 2026-02-19T10:06:50.647Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1181
File: tests/e2e-prow/rhoai/manifests/lightspeed/mock-jwks.yaml:32-34
Timestamp: 2026-02-19T10:06:50.647Z
Learning: In the rhoai tests under tests/e2e-prow/rhoai/manifests, avoid static ConfigMap definitions for mock-jwks-script and mcp-mock-server-script since these ConfigMaps are created dynamically by the pipeline.sh deployment script using 'oc create configmap'. Ensure there are no static ConfigMap resources for these names in the manifests. If such ConfigMaps are added in the future, coordinate with the pipeline to reflect dynamic creation or adjust tests to rely on the dynamic provisioning.

Applied to files:

  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml
  • tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml
📚 Learning: 2026-04-07T09:20:26.590Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1467
File: tests/e2e/features/steps/common.py:36-49
Timestamp: 2026-04-07T09:20:26.590Z
Learning: For Behave-based Python tests, rely on Behave’s Context layered stack for attribute lifecycle: Behave pushes a new Context layer when entering feature scope (before_feature) and again for scenario scope (before_scenario). Attributes assigned inside given/when/then steps live on the current scenario layer and are automatically removed when the scenario ends. As a result, step-set attributes should not be expected to persist across scenarios or features, and manual cleanup in after_scenario/after_feature is generally unnecessary for attributes set in step functions. Only perform manual cleanup for attributes that you set explicitly in before_feature/before_scenario, since those live on the respective feature/scenario layers.

Applied to files:

  • tests/e2e/features/steps/proxy.py
📚 Learning: 2026-04-13T13:39:54.963Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 1490
File: tests/e2e/features/environment.py:206-211
Timestamp: 2026-04-13T13:39:54.963Z
Learning: In lightspeed-stack E2E tests under tests/e2e/features, it is intentional to set context.feature_config inside Background/step functions (scenario-scoped Behave layer). The environment.py after_scenario restore logic should only restore configuration when context.scenario_lightspeed_override_active is True; this flag is set by configure_service only when a real config switch occurs (so restore does not run for scenarios without a switch). Additionally, steps/common.py’s module-level _active_lightspeed_stack_config_basename is used to prevent re-applying the same config across subsequent scenarios, ensuring scenario_lightspeed_override_active stays False after the first apply. Therefore, reviewers should not “fix” this flow as if feature_config were incorrectly scoped or if after_scenario restoration is missing—config switching and restoration are meant to happen exactly once per actual switch, not redundantly per scenario.

Applied to files:

  • tests/e2e/features/steps/proxy.py
🪛 Checkov (3.2.528)
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml

[low] 3-63: CPU limits should be set

(CKV_K8S_11)


[low] 3-63: CPU requests should be set

(CKV_K8S_10)


[low] 3-63: The default namespace should not be used

(CKV_K8S_21)


[low] 3-63: Image should use digest

(CKV_K8S_43)


[low] 3-63: Image Pull Policy should be Always

(CKV_K8S_15)


[low] 3-63: Memory limits should be set

(CKV_K8S_13)


[low] 3-63: Memory requests should be set

(CKV_K8S_12)


[low] 3-63: Use read-only filesystem for containers where possible

(CKV_K8S_22)


[low] 3-63: Containers should run as a high UID to avoid host conflict

(CKV_K8S_40)


[low] 3-63: Ensure that Service Account Tokens are only mounted where necessary

(CKV_K8S_38)


[low] 64-77: The default namespace should not be used

(CKV_K8S_21)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml

[medium] 6-224: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[low] 6-224: CPU limits should be set

(CKV_K8S_11)


[low] 6-224: CPU requests should be set

(CKV_K8S_10)


[low] 6-224: The default namespace should not be used

(CKV_K8S_21)


[low] 6-224: Minimize the admission of containers with the NET_RAW capability

(CKV_K8S_28)


[low] 6-224: Image should use digest

(CKV_K8S_43)


[low] 6-224: Image Tag should be fixed - not latest or blank

(CKV_K8S_14)


[low] 6-224: Memory limits should be set

(CKV_K8S_13)


[low] 6-224: Memory requests should be set

(CKV_K8S_12)


[low] 6-224: Minimize the admission of containers with capabilities assigned

(CKV_K8S_37)


[low] 6-224: Use read-only filesystem for containers where possible

(CKV_K8S_22)


[low] 6-224: Containers should run as a high UID to avoid host conflict

(CKV_K8S_40)


[low] 6-224: Prefer using secrets as files over secrets as environment variables

(CKV_K8S_35)


[low] 6-224: Ensure that Service Account Tokens are only mounted where necessary

(CKV_K8S_38)

tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml

[low] 3-55: CPU limits should be set

(CKV_K8S_11)


[low] 3-55: CPU requests should be set

(CKV_K8S_10)


[low] 3-55: The default namespace should not be used

(CKV_K8S_21)


[low] 3-55: Image should use digest

(CKV_K8S_43)


[low] 3-55: Image Pull Policy should be Always

(CKV_K8S_15)


[low] 3-55: Memory limits should be set

(CKV_K8S_13)


[low] 3-55: Memory requests should be set

(CKV_K8S_12)


[low] 3-55: Use read-only filesystem for containers where possible

(CKV_K8S_22)


[low] 3-55: Containers should run as a high UID to avoid host conflict

(CKV_K8S_40)


[low] 3-55: Ensure that Service Account Tokens are only mounted where necessary

(CKV_K8S_38)


[low] 56-69: The default namespace should not be used

(CKV_K8S_21)

🪛 Trivy (0.69.3)
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml

[info] 15-58: CPU not limited

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.limits.cpu'

Rule: KSV-0011

Learn more

(IaC/Kubernetes)


[error] 15-58: Root file system is not read-only

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[info] 15-58: CPU requests not specified

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.requests.cpu'

Rule: KSV-0015

Learn more

(IaC/Kubernetes)


[info] 15-58: Memory requests not specified

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.requests.memory'

Rule: KSV-0016

Learn more

(IaC/Kubernetes)


[info] 15-58: Memory not limited

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'resources.limits.memory'

Rule: KSV-0018

Learn more

(IaC/Kubernetes)


[info] 15-58: Runs with UID <= 10000

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'securityContext.runAsUser' > 10000

Rule: KSV-0020

Learn more

(IaC/Kubernetes)


[info] 15-58: Runs with GID <= 10000

Container 'e2e-interception-proxy' of Pod 'e2e-interception-proxy' should set 'securityContext.runAsGroup' > 10000

Rule: KSV-0021

Learn more

(IaC/Kubernetes)


[info] 5-8: Workloads in the default namespace

pod e2e-interception-proxy in default namespace should set metadata.namespace to a non-default namespace

Rule: KSV-0110

Learn more

(IaC/Kubernetes)

tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml

[info] 95-208: CPU not limited

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.cpu'

Rule: KSV-0011

Learn more

(IaC/Kubernetes)


[warning] 95-208: Image tag ":latest" used

Container 'llama-stack-container' of Pod 'llama-stack-service' should specify an image tag

Rule: KSV-0013

Learn more

(IaC/Kubernetes)


[error] 95-208: Root file system is not read-only

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[info] 95-208: CPU requests not specified

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.cpu'

Rule: KSV-0015

Learn more

(IaC/Kubernetes)


[info] 95-208: Memory requests not specified

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.memory'

Rule: KSV-0016

Learn more

(IaC/Kubernetes)


[info] 95-208: Memory not limited

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.memory'

Rule: KSV-0018

Learn more

(IaC/Kubernetes)


[info] 95-208: Runs with UID <= 10000

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsUser' > 10000

Rule: KSV-0020

Learn more

(IaC/Kubernetes)


[info] 95-208: Runs with GID <= 10000

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsGroup' > 10000

Rule: KSV-0021

Learn more

(IaC/Kubernetes)


[warning] 95-208: Restrict container images to trusted registries

Container llama-stack-container in pod llama-stack-service (namespace: default) uses an image from an untrusted registry.

Rule: KSV-0125

Learn more

(IaC/Kubernetes)

tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml

[info] 15-50: CPU not limited

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.limits.cpu'

Rule: KSV-0011

Learn more

(IaC/Kubernetes)


[error] 15-50: Root file system is not read-only

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[info] 15-50: CPU requests not specified

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.requests.cpu'

Rule: KSV-0015

Learn more

(IaC/Kubernetes)


[info] 15-50: Memory requests not specified

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.requests.memory'

Rule: KSV-0016

Learn more

(IaC/Kubernetes)


[info] 15-50: Memory not limited

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'resources.limits.memory'

Rule: KSV-0018

Learn more

(IaC/Kubernetes)


[info] 15-50: Runs with UID <= 10000

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'securityContext.runAsUser' > 10000

Rule: KSV-0020

Learn more

(IaC/Kubernetes)


[info] 15-50: Runs with GID <= 10000

Container 'e2e-tunnel-proxy' of Pod 'e2e-tunnel-proxy' should set 'securityContext.runAsGroup' > 10000

Rule: KSV-0021

Learn more

(IaC/Kubernetes)


[info] 5-8: Workloads in the default namespace

pod e2e-tunnel-proxy in default namespace should set metadata.namespace to a non-default namespace

Rule: KSV-0110

Learn more

(IaC/Kubernetes)

🔇 Additional comments (7)
tests/e2e-prow/rhoai/manifests/lightspeed/e2e-tunnel-proxy.yaml (1)

3-69: LGTM!

tests/e2e/proxy/tunnel_proxy.py (1)

22-29: LGTM!

Also applies to: 187-263

tests/e2e-prow/rhoai/scripts/e2e-ops.sh (1)

631-746: LGTM!

Also applies to: 800-836

tests/e2e-prow/rhoai/pipeline-konflux.sh (1)

138-139: LGTM!

tests/e2e/features/proxy.feature (1)

24-25: LGTM!

Also applies to: 56-57, 69-70

tests/e2e/features/steps/proxy.py (1)

99-193: LGTM!

Also applies to: 324-604

tests/e2e/utils/prow_utils.py (1)

113-118: LGTM!

Comment on lines +35 to +37
set -e
pip install --quiet --no-cache-dir --target /tmp/pydeps 'trustme>=1.2.1'
exec python /app/interception_proxy.py
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify runtime dependency fetch is in startup command and identify where trustme is used.
rg -n "pip install|trustme" tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml tests/e2e/proxy/interception_proxy.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 892


Avoid runtime package installation in pod startup path.

The pip install trustme command in the pod startup (line 36) introduces an external network dependency during pod initialization, which can cause flaky or slow e2e runs. Since trustme is required by the interception proxy logic (tests/e2e/proxy/interception_proxy.py imports and uses it), prebuild the container image with this dependency included so pod startup is deterministic and does not depend on external package indices.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e-prow/rhoai/manifests/lightspeed/e2e-interception-proxy.yaml` around
lines 35 - 37, Remove the runtime pip install from the pod startup and instead
bake the trustme dependency into the container image used by the interception
proxy; specifically, eliminate the "pip install --quiet --no-cache-dir --target
/tmp/pydeps 'trustme>=1.2.1'" line in the startup snippet that ultimately execs
python /app/interception_proxy.py and update the Dockerfile or image build
pipeline that produces the image used by tests/e2e-proxy to pip-install trustme
(or add it to requirements) so tests/e2e/proxy/interception_proxy.py can import
trustme at runtime without network access.

Comment on lines +204 to +208
# proxy.feature (interception): PEM from Secret e2e-interception-proxy-ca (optional).
- name: interception-proxy-ca
mountPath: /tmp/interception-proxy-ca.pem
subPath: ca.pem
readOnly: true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

In Kubernetes, when a Secret volume is defined with optional: true, what happens if a container mounts a file from that volume using subPath and the secret/key is missing? Does pod/container startup fail?

💡 Result:

In Kubernetes, when a Secret volume is defined with optional: true, the pod/container startup does not fail if the secret or the specific key is missing [1]. However, the behavior when using subPath in this scenario is that Kubernetes will mount an empty directory at the specified mountPath instead of failing or leaving the path empty [2][3]. This occurs because the kubelet, when encountering a subPath mount for an optional volume that is missing, defaults to creating an empty directory at the target location [2][3]. This can lead to unexpected behavior if your application expects a file to exist or expects an error when the secret is missing [3]. Key points: - Pod/Container Startup: The pod will start successfully because the volume is marked as optional [1]. - Mount Behavior: Instead of the expected file, you will find an empty directory at the mountPath [2][3]. - No Error/Warning: This behavior is often silent, meaning there are no explicit errors or warnings in the pod events indicating that the secret was missing or that an empty directory was created in its place [2].

Citations:


Optional Secret + subPath file mount creates silent empty directory instead of file

When e2e-interception-proxy-ca is marked optional: true, Kubernetes does not fail pod startup if the Secret is missing. However, it silently creates an empty directory at /tmp/interception-proxy-ca.pem instead of the expected file. This is problematic because the application will encounter an empty directory rather than an error or file, making failures hard to debug.

Please either:

  • Make the Secret required (remove optional: true), or
  • Mount the Secret directory (without subPath) and let the application check for file existence at runtime

Also applies to: 221-224

🧰 Tools
🪛 Trivy (0.69.3)

[info] 95-208: CPU not limited

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.cpu'

Rule: KSV-0011

Learn more

(IaC/Kubernetes)


[warning] 95-208: Image tag ":latest" used

Container 'llama-stack-container' of Pod 'llama-stack-service' should specify an image tag

Rule: KSV-0013

Learn more

(IaC/Kubernetes)


[error] 95-208: Root file system is not read-only

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.readOnlyRootFilesystem' to true

Rule: KSV-0014

Learn more

(IaC/Kubernetes)


[info] 95-208: CPU requests not specified

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.cpu'

Rule: KSV-0015

Learn more

(IaC/Kubernetes)


[info] 95-208: Memory requests not specified

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.requests.memory'

Rule: KSV-0016

Learn more

(IaC/Kubernetes)


[info] 95-208: Memory not limited

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'resources.limits.memory'

Rule: KSV-0018

Learn more

(IaC/Kubernetes)


[info] 95-208: Runs with UID <= 10000

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsUser' > 10000

Rule: KSV-0020

Learn more

(IaC/Kubernetes)


[info] 95-208: Runs with GID <= 10000

Container 'llama-stack-container' of Pod 'llama-stack-service' should set 'securityContext.runAsGroup' > 10000

Rule: KSV-0021

Learn more

(IaC/Kubernetes)


[warning] 95-208: Restrict container images to trusted registries

Container llama-stack-container in pod llama-stack-service (namespace: default) uses an image from an untrusted registry.

Rule: KSV-0125

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e-prow/rhoai/manifests/lightspeed/llama-stack-openai.yaml` around
lines 204 - 208, The Secret mount for e2e-interception-proxy-ca is marked
optional and uses a subPath (volumeMount name interception-proxy-ca, mountPath
/tmp/interception-proxy-ca.pem, subPath ca.pem), which causes Kubernetes to
create an empty directory when the Secret is missing; either remove optional:
true to make the Secret required so the pod will fail fast, or remove subPath
and mount the whole secret directory (so the app can check for the file at
/tmp/interception-proxy-ca.pem at runtime); apply the same change for the other
occurrence referenced (lines 221-224).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/e2e/features/proxy.feature`:
- Around line 79-80: The feature currently accepts either 404 or 500 as valid
outcomes for proxy errors which masks which subsystem failed; update the test(s)
to assert a single deterministic status per scenario by either (A) splitting the
ambiguous step in tests/e2e/features/proxy.feature into two explicit scenarios
each expecting a specific status, or (B) change the step usage to call the
specific step definition "The status code of the response is one of {first:d} or
{second:d}" in common_http.py and replace it with a new step (e.g., "The status
code of the response is {status:d}") and update scenario(s) to use the new step;
modify the step implementation in common_http.py to only accept one status and
fail clearly when other codes are returned so the tests precisely reflect
whether the main query or Llama Guard moderation path failed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4bf0739a-9ad0-42dd-af6a-2d84feebbdd5

📥 Commits

Reviewing files that changed from the base of the PR and between 176e358 and 0757895.

📒 Files selected for processing (1)
  • tests/e2e/features/proxy.feature
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: Pylinter
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
📓 Path-based instructions (1)
tests/e2e/**/*.{py,feature}

📄 CodeRabbit inference engine (AGENTS.md)

Use behave (BDD) framework for end-to-end testing with Gherkin feature files

Files:

  • tests/e2e/features/proxy.feature
🔇 Additional comments (2)
tests/e2e/features/proxy.feature (2)

24-24: Inconsistency between summary and code changes.

The AI-generated summary claims that @skip-in-prow tags were removed from the scenarios at lines 24, 41, and 69, but these lines are not marked as changed in the provided code. Either the summary is incorrect, or these changes occurred in a different commit not shown here.

Also applies to: 41-41, 69-69


56-56: 🏗️ Heavy lift

No action needed — @flaky is a designed feature with automatic retry logic.

The @flaky tag is not a workaround that masks issues. It's a documented, intentional feature implemented in tests/e2e/features/environment.py that automatically retries scenarios up to 5 times to handle transient infrastructure failures (timing issues, network instability, etc.). The hook applies patch_scenario_with_autoretry() to all scenarios tagged with @flaky, allowing the test suite to accommodate infrastructure flakiness without compromising test validity.

Comment on lines +79 to +80
#will be fixed in https://redhat.atlassian.net/browse/LCORE-2255
Then The status code of the response is one of 404 or 500
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check JIRA issue LCORE-2255 status and related error handling

# Search for other scenarios with relaxed error code expectations
echo "=== Scenarios accepting multiple status codes ==="
rg -n 'one of.*or' --glob '*.feature' -C2

# Check if there's error handling or retry logic that might affect status codes
echo -e "\n=== Error handling in proxy steps ==="
rg -n -A5 'status.*code.*(404|500)' tests/e2e/features/steps/proxy.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 813


🏁 Script executed:

# Find the step definition that checks status codes
rg -n 'The status code of the response is' tests/e2e/ -A3

# Get the full scenarios from the feature file
echo "=== Scenario context for both occurrences ==="
sed -n '40,82p' tests/e2e/features/proxy.feature

Repository: lightspeed-core/lightspeed-stack

Length of output: 50389


🏁 Script executed:

# Get the step implementation
sed -n '43,50p' tests/e2e/features/steps/common_http.py

# Get full proxy.feature scenarios for both occurrences
echo "=== Scenario at lines 40-52 ==="
sed -n '40,52p' tests/e2e/features/proxy.feature

echo -e "\n=== Scenario at lines 68-82 ==="
sed -n '68,82p' tests/e2e/features/proxy.feature

# Check if there's any documentation about why both codes are needed
echo -e "\n=== Proxy feature file header/comments ==="
head -40 tests/e2e/features/proxy.feature

Repository: lightspeed-core/lightspeed-stack

Length of output: 3500


🏁 Script executed:

# Look for comments or documentation about proxy error handling
rg -n 'proxy.*error|error.*proxy' tests/e2e/ -i -A2 -B2

# Check the full step implementation
sed -n '43,60p' tests/e2e/features/steps/common_http.py

# Search for any recent changes or comments in proxy-related code
rg -n '404|500' tests/e2e/features/proxy.feature

# Check if there's any configuration or setup that might affect error codes
grep -n "unreachable\|CA\|certificate" tests/e2e/features/proxy.feature

Repository: lightspeed-core/lightspeed-stack

Length of output: 1684


Accept both 404 and 500 status codes as temporary trade-off while LCORE-2255 is resolved.

The dual status codes (404 or 500) in both proxy error scenarios appear intentional. The step definition "The status code of the response is one of {first:d} or {second:d}" is explicitly defined in common_http.py (lines 43-60) with proper error reporting.

The feature file header explains that "Llama Guard moderation can issue separate provider calls inside Llama Stack that may not inherit the same proxy/TLS CA trust as the scenario's remote inference provider." This means both error scenarios can fail via different code paths—either the main query fails or the Llama Guard moderation calls fail—resulting in either 404 or 500 depending on where the failure occurs.

While this reduces test precision, it appears to be a pragmatic solution acknowledging multiple error paths in the system. The JIRA reference (LCORE-2255) confirms this is a known issue being tracked for resolution. Consider prioritizing the fix so that both scenarios specify exactly which code should be returned for each failure mode.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/e2e/features/proxy.feature` around lines 79 - 80, The feature currently
accepts either 404 or 500 as valid outcomes for proxy errors which masks which
subsystem failed; update the test(s) to assert a single deterministic status per
scenario by either (A) splitting the ambiguous step in
tests/e2e/features/proxy.feature into two explicit scenarios each expecting a
specific status, or (B) change the step usage to call the specific step
definition "The status code of the response is one of {first:d} or {second:d}"
in common_http.py and replace it with a new step (e.g., "The status code of the
response is {status:d}") and update scenario(s) to use the new step; modify the
step implementation in common_http.py to only accept one status and fail clearly
when other codes are returned so the tests precisely reflect whether the main
query or Llama Guard moderation path failed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant