
trusted-execution-clusters: Intro Azure upstream CI#79035

Draft
Jakob-Naucke wants to merge 1 commit intoopenshift:mainfrom
Jakob-Naucke:tec-azure

Conversation

@Jakob-Naucke
Member

@Jakob-Naucke Jakob-Naucke commented May 8, 2026

Create a Kind VM (also on Azure) to serve as the container platform, and run the integration tests on Azure.

Azure Upstream CI for trusted-execution-clusters Operator

This PR introduces Azure upstream CI infrastructure for the trusted-execution-clusters/operator repository, enabling integration testing on Azure cloud infrastructure.

CI Configuration Changes

Added ci-operator/config/trusted-execution-clusters/operator/azure.yaml to define the main test job operator-lifecycle-azure-verify. This job:

  • Runs the trusted-execution-clusters-operator-azure-lifecycle test chain
  • Executes the trusted-execution-clusters-operator-azure-cleanup cleanup chain afterward
  • Requires intranet capability
  • Skips execution for documentation-only changes
  • Uses the telco-runner:latest container image for test execution

Test Infrastructure

Added two main CI step chains:

Azure Lifecycle Chain (trusted-execution-clusters-operator-azure-lifecycle): Provisioning and testing flow that:

  • Provisions an ephemeral Azure VM (Standard_D2s_v3, in eastus region)
  • Creates a Kind Kubernetes cluster on the VM
  • Builds and pushes the operator images to a local registry on the VM
  • Runs the operator's integration tests against the Kind cluster
  • Authenticates to Azure using service principal credentials

Azure Cleanup Chain (trusted-execution-clusters-operator-azure-cleanup): Resource cleanup that:

  • Removes the provisioned Azure resource group after testing completes

Implementation Details

The test step (test.sh) handles the full provisioning workflow:

  • Clones the operator repository if not already present
  • Installs required tooling (azure-cli, cargo, jq, rustfmt)
  • Creates an Azure resource group and VM using subscription credentials
  • Configures networking (opens ports 6443, 8000, 8080)
  • Sets up a Kind cluster with a Docker registry
  • Builds and deploys operator images to the local registry
  • Executes integration tests with proper environment configuration
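Because the deprovision step only knows which resource group to delete from the name left in SHARED_DIR, the order of "create the group" and "record the name" matters. A minimal sketch of that step, assuming a hypothetical `create_resource_group` helper (not part of the PR) that writes `${SHARED_DIR}/az-resource-group` only after `az group create` has succeeded:

```shell
#!/bin/bash
set -eu

# Hypothetical helper: create the resource group first, then record its name
# for the deprovision step. If `az group create` fails, no marker is written,
# so cleanup never targets a group that was never created.
create_resource_group() {
  local name=$1 location=$2
  # `|| return 1` keeps the failure visible even when the caller runs this
  # function inside an `if` condition, where `set -e` is suspended.
  az group create --name "$name" --location "$location" --output none || return 1
  echo "$name" > "${SHARED_DIR:?SHARED_DIR must be set}/az-resource-group"
}

# Example call (eastus is the region named in the PR description):
# create_resource_group "$az_resource_group" eastus
```

The same ordering is what the review below asks for when it flags that test.sh writes the marker before `az login` and `az group create`.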

The deprovision step (deprovision.sh) performs cleanup:

  • Authenticates to Azure using a service principal
  • Deletes the Azure resource group created during testing

Governance

Added OWNERS files specifying alicefr, Jakob-Naucke, yalzhang, and fangge1212 as reviewers/approvers for the Azure test infrastructure components, along with corresponding metadata JSON files for tracking.

@coderabbitai
Contributor

coderabbitai Bot commented May 8, 2026

Warning

Rate limit exceeded

@Jakob-Naucke has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 31 minutes and 50 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 6da3c054-9a4b-4f96-ae98-0162a61604ab

📥 Commits

Reviewing files that changed from the base of the PR and between 904bd57 and c6d2529.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/trusted-execution-clusters/operator/trusted-execution-clusters-operator-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (16)
  • ci-operator/config/trusted-execution-clusters/operator/azure.yaml
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup-chain.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup-chain.yaml
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/trusted-execution-clusters-operator-azure-lifecycle-chain.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/trusted-execution-clusters-operator-azure-lifecycle-chain.yaml
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/trusted-execution-clusters-ref-operator-azure-deprovision-commands.sh
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/trusted-execution-clusters-ref-operator-azure-deprovision-ref.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/trusted-execution-clusters-ref-operator-azure-deprovision-ref.yaml
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/trusted-execution-clusters-ref-operator-azure-test-commands.sh
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/trusted-execution-clusters-ref-operator-azure-test-ref.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/trusted-execution-clusters-ref-operator-azure-test-ref.yaml

Walkthrough

This PR adds Azure-based CI infrastructure for the trusted-execution-clusters operator, including step registry entries for provisioning an Azure VM with Kind Kubernetes cluster and running integration tests, along with cleanup automation and ownership metadata.

Changes

Azure Operator CI Infrastructure

Step Registry - Test Ref
  • Files: ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/{OWNERS, test.metadata.json, test.yaml}, ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh
  • Summary: Defines the trusted-execution-clusters-ref-operator-azure-test step ref, which provisions an ephemeral Azure VM, creates a Kind cluster, builds/pushes operator images to an intra-VM registry, and executes integration tests. The test.sh script handles repository checkout, tooling installation, Azure resource provisioning, SSH networking, Kind cluster setup, and test execution.

Step Registry - Deprovision Ref
  • Files: ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/{OWNERS, deprovision.metadata.json, deprovision.yaml, deprovision.sh}
  • Summary: Defines the trusted-execution-clusters-ref-operator-azure-deprovision step ref for cleanup. The deprovision.sh script validates the environment, authenticates to Azure via service principal credentials, and deletes the provisioned resource group.

Chain Definitions
  • Files: ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/{OWNERS, azure.metadata.json, azure.yaml}, ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/{OWNERS, trusted-execution-clusters-operator-azure-cleanup.metadata.json, trusted-execution-clusters-operator-azure-cleanup.yaml}
  • Summary: Defines the trusted-execution-clusters-operator-lifecycle-azure chain (runs the test step) and the trusted-execution-clusters-operator-azure-cleanup chain (runs the deprovision step) with ownership metadata.

Operator Configuration & Ownership
  • Files: ci-operator/config/trusted-execution-clusters/operator/azure.yaml, ci-operator/step-registry/trusted-execution-clusters/operator-azure/OWNERS
  • Summary: Main Azure CI operator config that sets the base image stream (rhel-9-golang-1.25), resource limits (4Gi memory), and registers the lifecycle-azure verification test with cleanup. The top-level OWNERS file assigns reviewers and approvers.

Sequence Diagram

sequenceDiagram
    participant Prow as Prow CI
    participant Test as Test Step
    participant Azure as Azure Cloud
    participant VM as Azure VM
    participant Registry as Local Registry
    participant Kind as Kind Cluster
    participant Cleanup as Cleanup Step

    Prow->>Test: Trigger test step
    Test->>Test: Validate SHARED_DIR
    Test->>Test: Checkout repository
    Test->>Test: Install tooling (azure-cli, kind, jq)
    Test->>Azure: Authenticate with service principal
    Test->>Azure: Create resource group & VM
    Test->>Azure: Write resource group name to SHARED_DIR
    Test->>VM: Wait for VM readiness
    Test->>VM: Configure SSH & NSG inbound rules
    Test->>VM: Transfer source & provisioning script
    VM->>VM: Install Docker & extend LVM
    VM->>VM: Install kind, kubectl, yq
    Test->>Kind: Create Kind cluster on VM
    Kind->>Kind: Wait for cluster readiness
    Test->>Test: Retrieve kubeconfig & update API endpoint
    Test->>Registry: Build & push operator images
    Registry->>Registry: Store images on VM
    Test->>Kind: Run integration tests
    Kind->>Kind: Execute test suite

    Prow->>Cleanup: Trigger cleanup step
    Cleanup->>Cleanup: Validate SHARED_DIR & resource group
    Cleanup->>Cleanup: Authenticate to Azure
    Cleanup->>Azure: Delete resource group
    Azure->>VM: Deprovision VM & resources
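The "Retrieve kubeconfig & update API endpoint" step in the diagram amounts to pointing the kubeconfig's `server:` entry at the VM instead of the loopback address Kind writes. A minimal sketch, with several assumptions not confirmed by the PR: the helper name is hypothetical, the Kind API server port is published on the VM (the NSG rule for 6443 suggests so), and the API server certificate covers the VM address. Where kubectl is available, `kubectl config set-cluster <cluster> --server=https://<vm>:6443` does the same job.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: rewrite every `server: https://HOST:PORT` entry in a
# kubeconfig file to use the VM address, keeping the original port.
point_kubeconfig_at_vm() {
  local file=$1 addr=$2
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  sed -E -i.bak "s#(server: https://)[^:/]+(:[0-9]+)#\1${addr}\2#" "$file"
  rm -f "$file.bak"
}

# Example call (variable names are illustrative):
# point_kubeconfig_at_vm "$kubeconfig" "$vm_public_ip"
```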

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (11 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately summarizes the main change: adding Azure upstream CI infrastructure for the trusted-execution-clusters operator.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names | ✅ Passed | This PR adds CI infrastructure and shell scripts only, not Ginkgo test definitions. Actual test code resides in the external operator repository and is invoked via make. The check is not applicable.
Test Structure And Quality | ✅ Passed | Custom check for Ginkgo test quality is not applicable to this PR. The PR adds CI configuration, shell scripts, and metadata files only; no Ginkgo test code is present.
Microshift Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. The PR only adds CI infrastructure (config, metadata, OWNERS, and provisioning scripts). The custom check is not applicable to CI configuration files.
Single Node Openshift (Sno) Test Compatibility | ✅ Passed | This PR adds CI infrastructure (shell scripts, YAML configs, OWNERS files) for Azure upstream CI. No Ginkgo e2e test definitions are added. The custom check is not applicable.
Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds only CI/CD configuration and test scripts. No deployment manifests, operator code, or controllers are present. Topology-aware scheduling check is not applicable.
Ote Binary Stdout Contract | ✅ Passed | Check not applicable. PR adds CI config (YAML, JSON, shell scripts) to openshift/release, not OTE binaries. No Go process-level entry points are added.
Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No new Ginkgo e2e tests are added in this PR. PR adds CI infrastructure (YAML, shell scripts) to run existing tests on Azure. Check not applicable.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot requested review from alicefr and danilo-gemoli May 8, 2026 09:12
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 8, 2026
@Jakob-Naucke Jakob-Naucke marked this pull request as draft May 8, 2026 09:12
@openshift-ci openshift-ci Bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. labels May 8, 2026
@openshift-merge-bot
Contributor

@Jakob-Naucke, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

could not determine changed registry steps: could not load step registry: invalid file name: /var/tmp/gitrepo131959482/ci-operator/step-registry/trusted-execution-clusters/operator/cleanup/trusted-execution-clusters-operator-cleanup-azure.yaml
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.
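Comparing the rejected path above with the file names in the final file list reported by CodeRabbit suggests that the step registry expects each file name to be the directory path under ci-operator/step-registry/ joined with dashes, followed by a suffix such as -ref.yaml, -chain.yaml or -commands.sh. A hypothetical local pre-check under that assumption (the suffix list is inferred from this thread and may be incomplete) could catch such names before pushing:

```shell
#!/bin/bash
set -eu

# Hypothetical pre-check, assuming the naming convention visible in this PR's
# final file list: <dir path joined with dashes>-<suffix>.
check_registry_filename() {
  local rel=${1#*ci-operator/step-registry/}   # path relative to the registry root
  local dir=${rel%/*}                          # e.g. trusted-execution-clusters/operator-azure/cleanup
  local base=${rel##*/}                        # the file name itself
  local prefix=${dir//\//-}                    # slashes -> dashes
  case "$base" in
    OWNERS) return 0 ;;
    "$prefix"-ref.yaml | "$prefix"-chain.yaml | "$prefix"-workflow.yaml) return 0 ;;
    "$prefix"-commands.sh) return 0 ;;
    "$prefix"-ref.metadata.json | "$prefix"-chain.metadata.json | "$prefix"-workflow.metadata.json) return 0 ;;
    *)
      echo "invalid step registry file name: $1 (expected prefix $prefix)" >&2
      return 1
      ;;
  esac
}
```

Under this rule, trusted-execution-clusters-operator-cleanup-azure.yaml in operator/cleanup/ is rejected because the name does not match the directory path and carries no -chain suffix.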

@openshift-ci openshift-ci Bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label May 8, 2026
@openshift-merge-bot
Contributor

@Jakob-Naucke, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

could not determine changed registry steps: could not load step registry: invalid file name: /var/tmp/gitrepo1196305233/ci-operator/step-registry/trusted-execution-clusters/operator/cleanup/trusted-execution-clusters-operator-cleanup-azure.yaml

@openshift-ci
Contributor

openshift-ci Bot commented May 8, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jakob-Naucke

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label May 8, 2026
@openshift-merge-bot
Contributor

@Jakob-Naucke, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

could not determine changed registry steps: could not load step registry: invalid file name: /var/tmp/gitrepo3321177835/ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup.yaml

@openshift-ci openshift-ci Bot removed the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label May 8, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh (1)

48-48: 💤 Low value

Brittle parse of KIND_HOST_URN from the Makefile.

grep KIND_HOST_URN operator/Makefile | cut -d= -f2 | tr -d ' ' will silently misbehave if the variable is referenced anywhere else (a comment, $(KIND_HOST_URN) usage, a derived variable, etc.) — the result becomes a multiline blob fed straight to az vm create --image. Same caveat applies to the awk '/kindest/ {print $NF}' Cargo.toml on line 142.

Consider asking make to print the resolved value, e.g. make -s -C operator print-KIND_HOST_URN with a small print-%: ; @echo $($*) rule, or anchor the grep to the start of the line (grep '^KIND_HOST_URN[[:space:]]*=').

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`
at line 48, The current extraction of KIND_HOST_URN into kind_vm_image using a
loose grep is brittle; update the script to obtain the resolved Makefile
variable instead of grepping (e.g. invoke make -s -C operator
print-KIND_HOST_URN relying on a simple print-%: ; `@echo` $($*) rule added to the
Makefile) or at minimum anchor the grep to the line start (grep
'^KIND_HOST_URN[[:space:]]*=') to avoid matching comments/usages; likewise
replace the fragile awk '/kindest/ {print $NF}' Cargo.toml usage with a more
robust query or anchored grep/Make print target that returns the single intended
kindest image value so that kind_vm_image and the Cargo.toml-derived image are
always single, resolved strings passed to az vm create.
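A sketch of the anchored-grep option the nitpick describes, as a hypothetical `read_make_var` helper (not from the PR). It only handles simple `NAME = value`, `NAME := value` and `NAME ?= value` lines and does not expand `$(...)` references inside the value; the `print-%` Make rule, as suggested, would return the fully resolved value instead.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: print the value of the first simple assignment to NAME
# in a Makefile. Anchoring at the start of the line skips comments and
# $(NAME) usages; $(...) references inside the value are NOT expanded.
read_make_var() {
  local file=$1 name=$2
  grep -E "^${name}[[:space:]]*[?:]?=" "$file" \
    | head -n 1 \
    | sed -E 's/^[^=]*=[[:space:]]*//; s/[[:space:]]+$//'
}

# Example use in test.sh (fails loudly instead of passing an empty image):
# kind_vm_image=$(read_make_var operator/Makefile KIND_HOST_URN)
# [ -n "$kind_vm_image" ] || { echo "[ERROR] KIND_HOST_URN not found" >&2; exit 1; }
```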
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/azure.yaml`:
- Line 2: The chain name in the YAML uses
"trusted-execution-clusters-operator-lifecycle-azure" which doesn't match the
consumer's expected chain name
"trusted-execution-clusters-operator-azure-lifecycle"; update the `as:` value to
the consumer-facing identifier
"trusted-execution-clusters-operator-azure-lifecycle" (or alternatively update
the consumer to use the current name) so chain resolution succeeds—look for the
`as:` entry in this file and replace the existing string with the exact
consumer-config name.

In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh`:
- Around line 4-7: The guard that checks SHARED_DIR uses direct expansion under
nounset mode (set -u), which can cause an unbound-variable error; update the if
condition to use a safe default expansion like ${SHARED_DIR:-} (e.g., if [ -z
"${SHARED_DIR:-}" ]; then ...) so the check works even when SHARED_DIR is unset,
and leave the existing error/exit behavior unchanged.
- Around line 9-27: Make the cleanup script idempotent: if the marker file
"${SHARED_DIR}/az-resource-group" is missing, exit 0 instead of failing; if
present, read az_resource_group and call "az group exists --name
\"$az_resource_group\"" and use its stdout ("true"/"false") to decide whether to
delete; only run "az group delete --name \"$az_resource_group\" --yes" when
exists returns true and print a success message for both "deleted" and "already
absent" cases; keep the existing az login (secret_base/client-id, client-secret,
tenant-id) flow but ensure missing marker file is treated as no-op and use "az
group exists" to guard deletion.

In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`:
- Around line 43-61: The script writes the resource group name to
$SHARED_DIR/az-resource-group before authentication and creation, which can
cause cleanup to try deleting a non-existent RG; move the write so it happens
only after az login and a successful az group create (i.e., after the az group
create call that uses az_resource_group), or alternatively make the downstream
deprovision tolerant of a missing group (check az group exists or handle "not
found" in azure-deprovision/deprovision.sh). Update the sequence around az
login, az group create and the SHARED_DIR/az-resource-group write so the file is
persisted only on success.
- Around line 20-23: The guard checking SHARED_DIR fails under nounset (-u)
because ${SHARED_DIR} is expanded when unset; change the conditional to use a
safe default expansion such as if [ -z "${SHARED_DIR:-}" ]; then so the test can
run even if SHARED_DIR is unset, keep the existing log_error and exit 1
behavior, and update any other checks that reference SHARED_DIR in this script
(e.g., replace bare ${SHARED_DIR} expansions with ${SHARED_DIR:-} or
${SHARED_DIR:+...} as appropriate) so the script respects the #!/bin/bash -eu
shebang.
- Around line 186-193: The ssh-agent is started by eval "$(ssh-agent)" but no
key is added, so either remove the ssh-agent startup or add an ssh-add step
before calling make integration-tests; update the test script around eval
"$(ssh-agent)" and the make integration-tests invocation to either (1) drop the
eval line entirely (matching
trusted-execution-clusters-ref-operator-test-commands.sh) if agent auth is not
required, or (2) call ssh-add with the correct private key (e.g., ssh-add
"$SSH_KEY_PATH" or ssh-add "$secret_base/id_rsa") and ensure SSH_AUTH_SOCK is
set so make integration-tests can use agent auth. Ensure the change touches the
block containing eval "$(ssh-agent)" and the make integration-tests invocation
so the agent is either populated or removed.
- Around line 91-104: The NSG rule loop currently opens port 6443 to the
internet by using --source-address-prefixes "*" in the az network nsg rule
create call; change the logic in the ports/loop (ports=(6443 8000 8080) and the
for loop that calls az network nsg rule create) to scope
--source-address-prefixes for the kube-apiserver port (6443) to a specific
origin (e.g., fetch the CI host or Prow egress public IP and use it as CIDR/32)
while keeping the other ports as-is; locate the az network nsg rule create
invocation and add conditional handling so when port == 6443 you substitute the
fetched public IP (or a configured allowed CIDR) instead of "*" for
--source-address-prefixes.

---

Nitpick comments:
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`:
- Line 48: The current extraction of KIND_HOST_URN into kind_vm_image using a
loose grep is brittle; update the script to obtain the resolved Makefile
variable instead of grepping (e.g. invoke make -s -C operator
print-KIND_HOST_URN relying on a simple print-%: ; `@echo` $($*) rule added to the
Makefile) or at minimum anchor the grep to the line start (grep
'^KIND_HOST_URN[[:space:]]*=') to avoid matching comments/usages; likewise
replace the fragile awk '/kindest/ {print $NF}' Cargo.toml usage with a more
robust query or anchored grep/Make print target that returns the single intended
kindest image value so that kind_vm_image and the Cargo.toml-derived image are
always single, resolved strings passed to az vm create.
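The NSG suggestion for test.sh (scope 6443, keep 8000 and 8080 as they are) can be sketched as a hypothetical `open_kind_ports` helper. Assumptions: the NSG name is passed in (the PR's actual name is not shown here; `az vm create` typically names it `<vm-name>NSG`), the caller supplies the allowed CIDR, for instance the CI pod's egress address obtained with something like `"$(curl -fsS https://api.ipify.org)/32"`, which only helps if that egress IP is stable, and rule priorities start at 1100 to stay clear of the 1000 that `az vm create` usually assigns to its default SSH rule.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: open the Kind ports on the VM's NSG. Only the
# kube-apiserver port (6443) is limited to API_SOURCE_CIDR; 8000 and 8080
# stay open to any source, as in the PR.
open_kind_ports() {
  local rg=$1 nsg=$2 api_cidr=$3
  local ports=(6443 8000 8080)
  local priority=1100 port src
  for port in "${ports[@]}"; do
    if [ "$port" = 6443 ]; then
      src=$api_cidr
    else
      src='*'
    fi
    az network nsg rule create \
      --resource-group "$rg" \
      --nsg-name "$nsg" \
      --name "allow-kind-$port" \
      --priority "$priority" \
      --direction Inbound \
      --access Allow \
      --protocol Tcp \
      --source-address-prefixes "$src" \
      --destination-port-ranges "$port" \
      --output none || return 1
    priority=$((priority + 10))   # each rule in an NSG needs a unique priority
  done
}
```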

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 1f4cff19-27c8-448b-97ac-a00ce96abb6c

📥 Commits

Reviewing files that changed from the base of the PR and between be1c46c and 904bd57.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/trusted-execution-clusters/operator/trusted-execution-clusters-operator-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (16)
  • ci-operator/config/trusted-execution-clusters/operator/azure.yaml
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup.yaml
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/azure.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/azure.yaml
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.yaml
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/OWNERS
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.metadata.json
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh
  • ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.yaml

Comment on lines +4 to +7
if [ -z "${SHARED_DIR}" ]; then
  echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment."
  exit 1
fi

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the pattern is nounset-safe
rg -n '^\#!/bin/bash -eu|SHARED_DIR' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh
# Repro of current behavior with nounset:
env -i bash -eu -c '[ -z "${SHARED_DIR}" ]' || true

Repository: openshift/release

Length of output: 444


Guard SHARED_DIR safely under set -u.

Line 4 expands ${SHARED_DIR} directly under nounset mode (-u), causing an unbound-variable error before the intended guard check runs. Use a default expansion to allow the condition to evaluate properly when the variable is unset.

Suggested fix
-if [ -z "${SHARED_DIR}" ]; then
+if [ -z "${SHARED_DIR:-}" ]; then
   echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment."
   exit 1
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [ -z "${SHARED_DIR}" ]; then
  echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment."
  exit 1
fi
if [ -z "${SHARED_DIR:-}" ]; then
  echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment."
  exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh`
around lines 4 - 7, The guard that checks SHARED_DIR uses direct expansion under
nounset mode (set -u), which can cause an unbound-variable error; update the if
condition to use a safe default expansion like ${SHARED_DIR:-} (e.g., if [ -z
"${SHARED_DIR:-}" ]; then ...) so the check works even when SHARED_DIR is unset,
and leave the existing error/exit behavior unchanged.
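The suggested guard can be exercised in isolation by wrapping it in a hypothetical `require_shared_dir` function (the name is illustrative, not from the PR). Under `set -u`, `${SHARED_DIR:-}` expands to an empty string when the variable is unset, so the script reaches its own error message instead of aborting on the expansion.

```shell
#!/bin/bash
set -eu

# ${SHARED_DIR:-} is "" when SHARED_DIR is unset, so this test runs under
# `set -u` and prints the intended error rather than
# "SHARED_DIR: unbound variable".
require_shared_dir() {
  if [ -z "${SHARED_DIR:-}" ]; then
    echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment." >&2
    return 1
  fi
}
```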

Comment on lines +9 to +27
if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then
  echo "[ERROR] az-resource-group was not placed in SHARED_DIR"
  exit 1
fi

rpm --import https://packages.microsoft.com/keys/microsoft.asc
dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm
dnf install -y azure-cli

secret_base=/var/run/azure-upstream-ci
az login --service-principal \
  --username "$(cat $secret_base/client-id)" \
  --password "$(cat $secret_base/client-secret)" \
  --tenant "$(cat $secret_base/tenant-id)"

az_resource_group=$(cat "${SHARED_DIR}/az-resource-group")
echo "[INFO] Delete Kind VM resource group $az_resource_group"
az group delete --name "$az_resource_group" --yes
echo "[SUCCESS] Deleted Kind VM resource group $az_resource_group"

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

`For Azure CLI, what are the documented outputs/exit semantics of:

  1. az group exists --name <group>
  2. az group delete --name <group> --yes
    especially when the resource group does not exist?`

💡 Result:

  1. az group exists --name <group>
     - Documented output: emits the string true or false (the default output format is JSON, but for human use it prints true/false).[1][2]
     - Exit semantics when the resource group does not exist: the command can return exit code 0 even when the output is false, so scripts should inspect stdout rather than rely on the exit code.[3]
     - Note: with --output json you get a JSON boolean, but the meaning of true vs false is the same.[1]
  2. az group delete --name <group> --yes
     - Documented behavior: --yes only suppresses the interactive confirmation prompt; the docs do not describe any special handling for a group that does not exist.[1]
     - Exit semantics when the resource group does not exist: the official command docs we found do not state a specific exit code. The underlying Azure Resource Manager delete operation treats HTTP 404 responses as successful for deletions (Resource Manager issues a GET, expects 404, and considers the deletion successful when it gets one).[4]
     - In practice, if Azure returns "not found" for the delete, ARM treats it as success, so you should expect the CLI not to fail merely because the group is already missing; however, the exact CLI exit code is not explicitly documented in the sources above.[4][5]

  Bottom line for scripting:
     - Use stdout (true/false) from az group exists rather than $? / the exit code.[3]
     - For az group delete --yes, treat "resource group not found" as a successful no-op at the ARM level (404 is treated as success). If you need an exact exit code guarantee, test it in your environment, since the CLI docs we located do not specify the exit code for the not-found case.[4][1]


Make the cleanup script idempotent to avoid failures in post-steps.

The script hard-fails when the marker file is missing (exits with code 1) and deletes unconditionally without checking if the resource group still exists. In post-step execution, this causes noisy failures when these conditions aren't met. The script should treat "nothing to delete" and "already deleted" as success states.

The suggested changes are valid:

  • Exit 0 when the marker file is missing (the operation has nothing to do, which is not an error in post-cleanup)
  • Check if the resource group exists before deletion using az group exists, which reliably returns true or false to stdout (exit code cannot be used reliably for this check)
  • Handle the case where the resource group is already absent with a success message

This ensures the post-step completes without error whether resources were previously provisioned or already cleaned up.
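The idempotent flow described above can be sketched as a standalone function. This is a minimal sketch, not the PR's script: `deprovision_rg` is a hypothetical name, it assumes the marker file holds only the resource group name, and it relies on `az group exists` printing `true` or `false` on stdout, per the result quoted above. The `az` function in the usage part is a stub standing in for the real Azure CLI so the sketch runs without credentials.

```shell
# Idempotent Azure resource group cleanup (sketch).
# "No marker file", "empty marker" and "group already gone" all count as
# success. The decision uses the stdout of `az group exists`, not its exit code.
deprovision_rg() {
  local marker="$1" rg
  if [ ! -f "$marker" ]; then
    echo "[WARN] $marker not found; skipping Azure cleanup."
    return 0
  fi
  rg="$(<"$marker")"
  if [ -z "$rg" ]; then
    echo "[WARN] Empty resource group marker; skipping Azure cleanup."
    return 0
  fi
  if [ "$(az group exists --name "$rg")" = "true" ]; then
    # A real delete failure is still reported as an error.
    if ! az group delete --name "$rg" --yes; then
      echo "[ERROR] Failed to delete resource group $rg" >&2
      return 1
    fi
    echo "[SUCCESS] Deleted resource group $rg"
  else
    echo "[INFO] Resource group $rg already absent; nothing to delete."
  fi
}

# --- Usage with a stub (assumption: stands in for the real Azure CLI) ---
# The stub reports that only "upstream-ci-demo" exists and prints delete calls.
az() {
  case "$1 $2" in
    "group exists") if [ "$4" = "upstream-ci-demo" ]; then echo true; else echo false; fi ;;
    "group delete") echo "stub: deleted $4" ;;
  esac
}

tmp=$(mktemp -d)
echo "upstream-ci-demo" > "$tmp/present"
echo "upstream-ci-gone" > "$tmp/absent"
out_present=$(deprovision_rg "$tmp/present")
out_absent=$(deprovision_rg "$tmp/absent")
out_missing=$(deprovision_rg "$tmp/no-such-marker")
printf '%s\n' "$out_present" "$out_absent" "$out_missing"
rm -rf "$tmp"
```

Unlike the unconditional `[SUCCESS]` line in the original script, this sketch prints a distinct message for the "already absent" case, which makes post-step logs easier to read.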

Suggested fix
-if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then
-  echo "[ERROR] az-resource-group was not placed in SHARED_DIR"
-  exit 1
+if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then
+  echo "[WARN] az-resource-group not found; skipping Azure cleanup."
+  exit 0
 fi
@@
-az_resource_group=$(cat "${SHARED_DIR}/az-resource-group")
+az_resource_group="$(<"${SHARED_DIR}/az-resource-group")"
+if [ -z "${az_resource_group}" ]; then
+  echo "[WARN] Empty az-resource-group; skipping Azure cleanup."
+  exit 0
+fi
 echo "[INFO] Delete Kind VM resource group $az_resource_group"
-az group delete --name "$az_resource_group" --yes
+if [ "$(az group exists --name "$az_resource_group")" = "true" ]; then
+  az group delete --name "$az_resource_group" --yes
+else
+  echo "[INFO] Resource group $az_resource_group already absent; nothing to delete."
+fi
 echo "[SUCCESS] Deleted Kind VM resource group $az_resource_group"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh` around lines 9-27: make the cleanup script idempotent. If the marker file "${SHARED_DIR}/az-resource-group" is missing, exit 0 instead of failing; if it is present, read az_resource_group, call "az group exists --name \"$az_resource_group\"", and use its stdout ("true"/"false") to decide whether to delete. Only run "az group delete --name \"$az_resource_group\" --yes" when exists returns true, and print a success message for both the "deleted" and "already absent" cases. Keep the existing az login flow (secret_base/client-id, client-secret, tenant-id), but treat a missing marker file as a no-op and use "az group exists" to guard deletion.

Comment on lines +20 to +23
if [ -z "${SHARED_DIR}" ]; then
  log_error "SHARED_DIR is not set. This script must run in Prow CI environment."
  exit 1
fi

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

SHARED_DIR guard fails under set -u.

Because the shebang is #!/bin/bash -eu, ${SHARED_DIR} is expanded under nounset. If SHARED_DIR is unset, this line aborts with SHARED_DIR: unbound variable before the helpful log_error message ever runs, defeating the purpose of the check.

🛡️ Use a default expansion so the guard can actually fire
-if [ -z "${SHARED_DIR}" ]; then
+if [ -z "${SHARED_DIR:-}" ]; then
   log_error "SHARED_DIR is not set. This script must run in Prow CI environment."
   exit 1
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh` around lines 20-23: the guard checking SHARED_DIR fails under nounset (-u) because ${SHARED_DIR} is expanded while unset. Change the conditional to use a safe default expansion such as if [ -z "${SHARED_DIR:-}" ]; then so the guard itself can run even if SHARED_DIR is unset, and keep the existing log_error and exit 1 behavior. Also update any other checks that reference SHARED_DIR in this script (e.g., replace bare ${SHARED_DIR} expansions with ${SHARED_DIR:-} or ${SHARED_DIR:+...} as appropriate) so the script respects the #!/bin/bash -eu shebang.
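If several required variables need the same nounset-safe guard, a small helper that uses indirect expansion avoids repeating the pattern. This is a sketch, not code from the PR: `require_vars` is a hypothetical name, it writes to stderr with `echo` instead of the script's own `log_error`, and `ARTIFACT_DIR` is only used as a second example variable.

```shell
# Fail fast, with a readable message, when any named variable is unset or
# empty. ${!name:-} is an indirect expansion with a default, so it is safe
# under `set -u` even when the target variable does not exist.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "[ERROR] $name is not set. This script must run in a Prow CI environment." >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: run each check in a subshell with nounset enabled, as under #!/bin/bash -eu.
unset SHARED_DIR
ARTIFACT_DIR=/tmp/artifacts
status_unset=0
( set -u; require_vars SHARED_DIR ARTIFACT_DIR ) 2>/dev/null || status_unset=$?
SHARED_DIR=/tmp/shared
status_set=0
( set -u; require_vars SHARED_DIR ARTIFACT_DIR ) || status_set=$?
echo "unset: $status_unset, set: $status_set"
```

In a script, a call such as `require_vars SHARED_DIR || exit 1` right after the shebang would replace the hand-written guard.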

Comment on lines +43 to +61
az_region=eastus
az_resource_group=upstream-ci-$test_id
echo "$az_resource_group" > "$SHARED_DIR/az-resource-group"
kind_vm_user=ci
kind_vm_name=kind-vm
kind_vm_image=$(grep KIND_HOST_URN operator/Makefile | cut -d= -f2 | tr -d ' ')
vm_size=Standard_D2s_v3

AZURE_SUBSCRIPTION_ID=$(cat $secret_base/subscription-id)
export AZURE_SUBSCRIPTION_ID
az login --service-principal \
  --username "$(cat $secret_base/client-id)" \
  --password "$(cat $secret_base/client-secret)" \
  --tenant "$(cat $secret_base/tenant-id)"

log_info "Create Azure resource group $az_resource_group"
az group create \
  --location $az_region \
  --resource-group "$az_resource_group"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resource group name is recorded before the group actually exists.

echo "$az_resource_group" > "$SHARED_DIR/az-resource-group" (line 45) runs before az login and az group create (lines 53–61). If login or group creation fails, the cleanup step downstream will attempt to delete a resource group that was never created. Depending on how azure-deprovision/deprovision.sh is implemented, this can either be a benign 404 or a failing cleanup that masks the original error.

Recommend persisting the RG name only after a successful az group create, or making the deprovision step explicitly tolerate a missing group (e.g. az group exists first, or az group delete --no-wait and ignore "not found").
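A minimal sketch of the first option (persist the name only after a successful create). Under `#!/bin/bash -eu` simply moving the `echo` below `az group create` already has this effect, because a failed create aborts the script; the helper just makes the ordering explicit. `create_and_record_rg` and the `AZ_STUB_FAIL` switch are hypothetical, and the `az` function is a stub standing in for the real CLI.

```shell
# Record the resource group name for the cleanup step only after
# `az group create` has succeeded. A failed login or create then leaves no
# marker, so the deprovision step has nothing to delete.
create_and_record_rg() {
  local rg="$1" region="$2" marker="$3"
  if ! az group create --location "$region" --resource-group "$rg"; then
    echo "[ERROR] Failed to create resource group $rg" >&2
    return 1
  fi
  echo "$rg" > "$marker"
}

# --- Usage with a stub (assumption: stands in for the real Azure CLI) ---
# AZ_STUB_FAIL=1 makes every stubbed az call fail.
az() { [ "${AZ_STUB_FAIL:-0}" != 1 ]; }

tmp=$(mktemp -d)
AZ_STUB_FAIL=1
fail_status=0
create_and_record_rg upstream-ci-fail eastus "$tmp/failed-marker" 2>/dev/null || fail_status=$?
AZ_STUB_FAIL=0
ok_status=0
create_and_record_rg upstream-ci-ok eastus "$tmp/ok-marker" || ok_status=$?
```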

♻️ Move the SHARED_DIR write to after group creation
 az_resource_group=upstream-ci-$test_id
-echo "$az_resource_group" > "$SHARED_DIR/az-resource-group"
 kind_vm_user=ci
@@
 log_info "Create Azure resource group $az_resource_group"
 az group create \
   --location $az_region \
   --resource-group "$az_resource_group"
+echo "$az_resource_group" > "$SHARED_DIR/az-resource-group"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh` around lines 43-61: the script writes the resource group name to $SHARED_DIR/az-resource-group before authentication and creation, which can cause cleanup to try to delete a nonexistent RG. Move the write so it happens only after az login and a successful az group create (i.e., after the az group create call that uses az_resource_group), or alternatively make the downstream deprovision step tolerate a missing group (check az group exists or handle "not found" in azure-deprovision/deprovision.sh). Update the sequence around az login, az group create, and the SHARED_DIR/az-resource-group write so the file is persisted only on success.

Comment on lines +91 to +104
ports=(6443 8000 8080)
for i in "${!ports[@]}"; do
  port=${ports[$i]}
  az network nsg rule create \
    --resource-group "$az_resource_group" \
    --nsg-name "$nsg" \
    --name "allow-$port" \
    --priority $((1001 + i)) \
    --source-address-prefixes "*" \
    --destination-port-ranges "$port" \
    --protocol Tcp \
    --access Allow \
    --direction Inbound
done

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🌐 Web query:

Azure CLI az network nsg rule create source-address-prefixes scope to single IP example

💡 Result:

In Azure CLI, you set the source match for an NSG rule with --source-address-prefixes. It takes a space-separated list of CIDR prefixes or IP ranges, so a single IP works when you pass just one value.[1]

Single IP example:[1]

az network nsg rule create \
  --resource-group <resource-group> \
  --nsg-name <nsg-name> \
  --name <rule-name> \
  --priority <priority> \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes 203.0.113.10 \
  --destination-address-prefixes '*' \
  --destination-port-ranges 22

Multiple IPs/prefixes: pass multiple values as separate CLI arguments (space-separated).[1]

Common pitfall (variables): if your shell passes a variable as a single token, make sure the CLI still receives multiple tokens (e.g., via an array expansion or separate arguments). A single string that contains embedded commas or spaces may be treated as one invalid address prefix and fail validation.[2]
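The variable pitfall above can be checked locally without calling Azure. In this sketch `count_args` is a hypothetical stand-in for the CLI that only reports how many arguments it received; the prefixes are documentation addresses.

```shell
# Each address prefix must reach the CLI as its own argument.
# count_args is a stand-in that reports how many arguments it received.
count_args() { echo "$#"; }

prefixes=(203.0.113.10/32 198.51.100.0/24)
joined="203.0.113.10/32 198.51.100.0/24"

# "${prefixes[@]}" expands to one word per element: 1 flag + 2 prefixes.
as_array=$(count_args --source-address-prefixes "${prefixes[@]}")
# A quoted string stays a single word: 1 flag + 1 (invalid) prefix.
as_string=$(count_args --source-address-prefixes "$joined")
echo "array: $as_array args, quoted string: $as_string args"  # prints: array: 3 args, quoted string: 2 args

# With the real CLI the array form would be used like this (not executed here):
#   az network nsg rule create ... --source-address-prefixes "${prefixes[@]}" ...
```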


🏁 Script executed:

# Read the test.sh file to understand the context
head -110 ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh | tail -40

Repository: openshift/release

Length of output: 1168


🏁 Script executed:

# Get more context around the suspicious lines and understand what this script does
head -20 ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 274


🏁 Script executed:

# Check if there's any documentation about this script
ls -la ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/

Repository: openshift/release

Length of output: 413


🏁 Script executed:

# Read the test.yaml to understand the test setup
cat ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.yaml

Repository: openshift/release

Length of output: 588


🏁 Script executed:

# Check more of the test.sh to understand what runs on ports 8000/8080
wc -l ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 150


🏁 Script executed:

# Read the full test.sh to understand what services run on these ports
sed -n '105,150p' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 1328


🏁 Script executed:

# Search for what uses ports 8000 and 8080 in the test script
grep -n "8000\|8080" ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 87


🏁 Script executed:

# Check if there's any authentication configuration mentioned
grep -n "anonymous\|auth\|insecure" ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 151


Restrict Kubernetes API server access from the public internet.

Port 6443 (kube-apiserver) is exposed to the entire internet via --source-address-prefixes "*". The Kind cluster configuration explicitly binds the API server to 0.0.0.0 on this port, creating an unnecessary attack surface. Even for ephemeral CI VMs, this risks exposure to API CVEs, authn/authz misconfigurations, and probing attacks.

Scope --source-address-prefixes to known origins such as the Prow build cluster egress IP range or the public IP of the host running this script (e.g., curl -fsS https://api.ipify.org). Access from the CI host using kubeconfig is sufficient for cluster setup and testing without requiring world-open ingress.
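A minimal sketch of the port-scoped rule, under stated assumptions: the helper names `to_cidr` and `source_prefix_for_port` are hypothetical, only port 6443 is narrowed (8000 and 8080 keep `"*"` as in the original loop), and the allowed CIDR must be the address Azure actually sees from the CI pod. If the Prow egress IP varies between runs, a configured range is needed instead of a single /32. The loop prints the rules rather than calling `az`, so it runs offline with a documentation address.

```shell
# Normalize a bare IPv4 address to a /32 CIDR; leave existing CIDRs alone.
to_cidr() {
  case "$1" in
    */*) echo "$1" ;;
    *) echo "$1/32" ;;
  esac
}

# Pick the NSG source prefix per port: only the kube-apiserver port (6443)
# is narrowed; other ports keep "*" as in the original loop.
source_prefix_for_port() {
  local port="$1" allowed_cidr="$2"
  if [ "$port" = 6443 ]; then
    echo "$allowed_cidr"
  else
    echo "*"
  fi
}

# In CI the address might come from: to_cidr "$(curl -fsS https://api.ipify.org)"
# A documentation address is used here so the sketch runs offline.
ci_cidr=$(to_cidr 203.0.113.10)
ports=(6443 8000 8080)
for i in "${!ports[@]}"; do
  port=${ports[$i]}
  prefix=$(source_prefix_for_port "$port" "$ci_cidr")
  # Dry run: print the rule instead of calling az network nsg rule create.
  echo "allow-$port priority=$((1001 + i)) source=$prefix"
done
```

In the real loop, `--source-address-prefixes "$prefix"` would replace the hard-coded `"*"`.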

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh` around lines 91-104: the NSG rule loop currently opens port 6443 to the internet by passing --source-address-prefixes "*" to az network nsg rule create. Change the logic in the ports loop (ports=(6443 8000 8080) and the for loop that calls az network nsg rule create) to scope --source-address-prefixes for the kube-apiserver port (6443) to a specific origin (e.g., fetch the CI host's or Prow egress public IP and use it as a /32 CIDR) while keeping the other ports as-is. Locate the az network nsg rule create invocation and add conditional handling so that when port == 6443 the fetched public IP (or a configured allowed CIDR) is substituted for "*" in --source-address-prefixes.

Comment on lines +186 to +193
log_info "Run integration tests"
eval "$(ssh-agent)"
PLATFORM=kind_public VIRT_PROVIDER=azure REGISTRY=localhost:5000 \
  TEST_NAMESPACE_PREFIX="$az_resource_group-" \
  TEST_IMAGE=$(cat $secret_base/test-image) \
  CLUSTER_URL="$kind_vm_ip" \
  make integration-tests
log_success "Ran integration tests"

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look at the operator's integration-tests target to see whether it relies on ssh-agent.
fd -i -t f 'Makefile|\.mk$' --exec rg -nP -C5 '\b(integration-tests|ssh-add|SSH_AUTH_SOCK)\b' {}

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# First, verify the test.sh file exists and see context
cd ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test 2>/dev/null && ls -la test.sh 2>/dev/null || echo "File not found in expected path"

Repository: openshift/release

Length of output: 118


🏁 Script executed:

# Search for the test.sh file more broadly
find . -name "test.sh" -path "*/trusted-execution-clusters/*" -type f 2>/dev/null | head -20

Repository: openshift/release

Length of output: 148


🏁 Script executed:

# Look for any Makefile in the repo
fd -i -t f '(Makefile|makefile)' 2>/dev/null | head -20

Repository: openshift/release

Length of output: 366


🏁 Script executed:

# Search for ssh-add and ssh-agent usage in the repository
rg -i 'ssh-add|ssh-agent' --type-list | head -5
rg -i 'ssh-add|ssh-agent' -l 2>/dev/null | head -20

Repository: openshift/release

Length of output: 836


🏁 Script executed:

# Read the test.sh file to see the context around lines 186-193
cat -n ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh | sed -n '170,200p'

Repository: openshift/release

Length of output: 1213


🏁 Script executed:

# Check if ssh-add is called anywhere in test.sh
rg -n 'ssh-add' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# Check for SSH_AUTH_SOCK or other ssh-agent references in test.sh
rg -n 'SSH_AUTH_SOCK|ssh-agent' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 85


🏁 Script executed:

# Find where the integration-tests make target is defined - check the root Makefile first
rg -n 'integration-tests:' Makefile

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# Search for integration-tests in all files to understand what this target does
rg -n 'integration-tests' ci-operator/step-registry/trusted-execution-clusters/ --type-list=shell 2>/dev/null | head -20

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# Better search for integration-tests
rg -n 'integration-tests' ci-operator/step-registry/trusted-execution-clusters/ | head -30

Repository: openshift/release

Length of output: 920


🏁 Script executed:

# Look at the test-commands.sh file around line 551 where it calls make integration-tests
sed -n '530,570p' ci-operator/step-registry/trusted-execution-clusters/ref/operator/test/trusted-execution-clusters-ref-operator-test-commands.sh

Repository: openshift/release

Length of output: 1804


🏁 Script executed:

# Check if ssh-agent is used in test-commands.sh at all
rg -n 'ssh-agent\|ssh-add' ci-operator/step-registry/trusted-execution-clusters/ref/operator/test/trusted-execution-clusters-ref-operator-test-commands.sh

Repository: openshift/release

Length of output: 43


🏁 Script executed:

# Get more context from the beginning of test.sh to understand the overall flow
head -50 ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 1281


🏁 Script executed:

# Look for any documentation or comments about ssh-agent usage
rg -B5 -A5 'ssh-agent' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh

Repository: openshift/release

Length of output: 554


ssh-agent is started but no key is added.

eval "$(ssh-agent)" launches an agent but nothing ever calls ssh-add, so SSH_AUTH_SOCK points at an empty agent. If make integration-tests relies on agent-based auth (e.g., to ssh into the Kind VM), it will fail; if it doesn't, this line is dead code that just leaks a process. Notably, the comparable test suite in trusted-execution-clusters-ref-operator-test-commands.sh runs make integration-tests without any ssh-agent setup.

Please confirm whether the make target needs an agent and either add the corresponding ssh-add call with the appropriate key, or drop the line.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh` around lines 186-193: ssh-agent is started by eval "$(ssh-agent)" but no key is added, so either remove the ssh-agent startup or add an ssh-add step before calling make integration-tests. Update the test script around eval "$(ssh-agent)" and the make integration-tests invocation to either (1) drop the eval line entirely (matching trusted-execution-clusters-ref-operator-test-commands.sh) if agent auth is not required, or (2) call ssh-add with the correct private key (e.g., ssh-add "$SSH_KEY_PATH" or ssh-add "$secret_base/id_rsa") and ensure SSH_AUTH_SOCK is set so make integration-tests can use agent auth. Make sure the change touches the block containing eval "$(ssh-agent)" and the make integration-tests invocation so the agent is either populated or removed.

@openshift-merge-bot
Contributor

@Jakob-Naucke, pj-rehearse: unable to determine affected jobs. This could be due to a branch that needs to be rebased. ERROR:

could not determine changed registry steps: could not load step registry: invalid file name: /var/tmp/gitrepo3990203622/ci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup.yaml
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

Create a Kind VM (also on Azure) as container platform. Run
integration tests on Azure.

Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@Jakob-Naucke: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-trusted-execution-clusters-operator-main-operator-lifecycle-azure-verify | trusted-execution-clusters/operator | presubmit | Presubmit changed |

@Jakob-Naucke
Member Author

/pj-rehearse

@openshift-merge-bot
Contributor

@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@Jakob-Naucke
Member Author

/pj-rehearse

@openshift-merge-bot
Contributor

@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@Jakob-Naucke
Member Author

/pj-rehearse

@openshift-merge-bot
Contributor

@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@Jakob-Naucke
Member Author

last attempt today (none are scheduling)

/pj-rehearse

@openshift-merge-bot
Contributor

@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
