trusted-execution-clusters: Intro Azure upstream CI#79035
trusted-execution-clusters: Intro Azure upstream CI#79035Jakob-Naucke wants to merge 1 commit intoopenshift:mainfrom
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (16)
WalkthroughThis PR adds Azure-based CI infrastructure for the trusted-execution-clusters operator, including step registry entries for provisioning an Azure VM with Kind Kubernetes cluster and running integration tests, along with cleanup automation and ownership metadata. ChangesAzure Operator CI Infrastructure
Sequence DiagramsequenceDiagram
participant Prow as Prow CI
participant Test as Test Step
participant Azure as Azure Cloud
participant VM as Azure VM
participant Registry as Local Registry
participant Kind as Kind Cluster
participant Cleanup as Cleanup Step
Prow->>Test: Trigger test step
Test->>Test: Validate SHARED_DIR
Test->>Test: Checkout repository
Test->>Test: Install tooling (azure-cli, kind, jq)
Test->>Azure: Authenticate with service principal
Test->>Azure: Create resource group & VM
Test->>Azure: Write resource group name to SHARED_DIR
Test->>VM: Wait for VM readiness
Test->>VM: Configure SSH & NSG inbound rules
Test->>VM: Transfer source & provisioning script
VM->>VM: Install Docker & extend LVM
VM->>VM: Install kind, kubectl, yq
Test->>Kind: Create Kind cluster on VM
Kind->>Kind: Wait for cluster readiness
Test->>Test: Retrieve kubeconfig & update API endpoint
Test->>Registry: Build & push operator images
Registry->>Registry: Store images on VM
Test->>Kind: Run integration tests
Kind->>Kind: Execute test suite
Prow->>Cleanup: Trigger cleanup step
Cleanup->>Cleanup: Validate SHARED_DIR & resource group
Cleanup->>Cleanup: Authenticate to Azure
Cleanup->>Azure: Delete resource group
Azure->>VM: Deprovision VM & resources
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@Jakob-Naucke, Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
@Jakob-Naucke, Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Jakob-Naucke The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@Jakob-Naucke, Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh (1)
48-48: 💤 Low valueBrittle parse of
KIND_HOST_URNfrom the Makefile.
grep KIND_HOST_URN operator/Makefile | cut -d= -f2 | tr -d ' 'will silently misbehave if the variable is referenced anywhere else (a comment,$(KIND_HOST_URN)usage, a derived variable, etc.) — the result becomes a multiline blob fed straight toaz vm create --image. Same caveat applies to theawk '/kindest/ {print $NF}' Cargo.tomlon line 142.Consider asking make to print the resolved value, e.g.
make -s -C operator print-KIND_HOST_URNwith a smallprint-%: ;@echo$($*)rule, or anchor the grep to the start of the line (grep '^KIND_HOST_URN[[:space:]]*=').🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh` at line 48, The current extraction of KIND_HOST_URN into kind_vm_image using a loose grep is brittle; update the script to obtain the resolved Makefile variable instead of grepping (e.g. invoke make -s -C operator print-KIND_HOST_URN relying on a simple print-%: ; `@echo` $($*) rule added to the Makefile) or at minimum anchor the grep to the line start (grep '^KIND_HOST_URN[[:space:]]*=') to avoid matching comments/usages; likewise replace the fragile awk '/kindest/ {print $NF}' Cargo.toml usage with a more robust query or anchored grep/Make print target that returns the single intended kindest image value so that kind_vm_image and the Cargo.toml-derived image are always single, resolved strings passed to az vm create.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/azure.yaml`:
- Line 2: The chain name in the YAML uses
"trusted-execution-clusters-operator-lifecycle-azure" which doesn't match the
consumer's expected chain name
"trusted-execution-clusters-operator-azure-lifecycle"; update the `as:` value to
the consumer-facing identifier
"trusted-execution-clusters-operator-azure-lifecycle" (or alternatively update
the consumer to use the current name) so chain resolution succeeds—look for the
`as:` entry in this file and replace the existing string with the exact
consumer-config name.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh`:
- Around line 4-7: The guard that checks SHARED_DIR uses direct expansion under
nounset mode (set -u), which can cause an unbound-variable error; update the if
condition to use a safe default expansion like ${SHARED_DIR:-} (e.g., if [ -z
"${SHARED_DIR:-}" ]; then ...) so the check works even when SHARED_DIR is unset,
and leave the existing error/exit behavior unchanged.
- Around line 9-27: Make the cleanup script idempotent: if the marker file
"${SHARED_DIR}/az-resource-group" is missing, exit 0 instead of failing; if
present, read az_resource_group and call "az group exists --name
\"$az_resource_group\"" and use its stdout ("true"/"false") to decide whether to
delete; only run "az group delete --name \"$az_resource_group\" --yes" when
exists returns true and print a success message for both "deleted" and "already
absent" cases; keep the existing az login (secret_base/client-id, client-secret,
tenant-id) flow but ensure missing marker file is treated as no-op and use "az
group exists" to guard deletion.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`:
- Around line 43-61: The script writes the resource group name to
$SHARED_DIR/az-resource-group before authentication and creation, which can
cause cleanup to try deleting a non-existent RG; move the write so it happens
only after az login and a successful az group create (i.e., after the az group
create call that uses az_resource_group), or alternatively make the downstream
deprovision tolerant of a missing group (check az group exists or handle "not
found" in azure-deprovision/deprovision.sh). Update the sequence around az
login, az group create and the SHARED_DIR/az-resource-group write so the file is
persisted only on success.
- Around line 20-23: The guard checking SHARED_DIR fails under nounset (-u)
because ${SHARED_DIR} is expanded when unset; change the conditional to use a
safe default expansion such as if [ -z "${SHARED_DIR:-}" ]; then so the test can
run even if SHARED_DIR is unset, keep the existing log_error and exit 1
behavior, and update any other checks that reference SHARED_DIR in this script
(e.g., replace bare ${SHARED_DIR} expansions with ${SHARED_DIR:-} or
${SHARED_DIR:+...} as appropriate) so the script respects the #!/bin/bash -eu
shebang.
- Around line 186-193: The ssh-agent is started by eval "$(ssh-agent)" but no
key is added, so either remove the ssh-agent startup or add an ssh-add step
before calling make integration-tests; update the test script around eval
"$(ssh-agent)" and the make integration-tests invocation to either (1) drop the
eval line entirely (matching
trusted-execution-clusters-ref-operator-test-commands.sh) if agent auth is not
required, or (2) call ssh-add with the correct private key (e.g., ssh-add
"$SSH_KEY_PATH" or ssh-add "$secret_base/id_rsa") and ensure SSH_AUTH_SOCK is
set so make integration-tests can use agent auth. Ensure the change touches the
block containing eval "$(ssh-agent)" and the make integration-tests invocation
so the agent is either populated or removed.
- Around line 91-104: The NSG rule loop currently opens port 6443 to the
internet by using --source-address-prefixes "*" in the az network nsg rule
create call; change the logic in the ports/loop (ports=(6443 8000 8080) and the
for loop that calls az network nsg rule create) to scope
--source-address-prefixes for the kube-apiserver port (6443) to a specific
origin (e.g., fetch the CI host or Prow egress public IP and use it as CIDR/32)
while keeping the other ports as-is; locate the az network nsg rule create
invocation and add conditional handling so when port == 6443 you substitute the
fetched public IP (or a configured allowed CIDR) instead of "*" for
--source-address-prefixes.
---
Nitpick comments:
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`:
- Line 48: The current extraction of KIND_HOST_URN into kind_vm_image using a
loose grep is brittle; update the script to obtain the resolved Makefile
variable instead of grepping (e.g. invoke make -s -C operator
print-KIND_HOST_URN relying on a simple print-%: ; `@echo` $($*) rule added to the
Makefile) or at minimum anchor the grep to the line start (grep
'^KIND_HOST_URN[[:space:]]*=') to avoid matching comments/usages; likewise
replace the fragile awk '/kindest/ {print $NF}' Cargo.toml usage with a more
robust query or anchored grep/Make print target that returns the single intended
kindest image value so that kind_vm_image and the Cargo.toml-derived image are
always single, resolved strings passed to az vm create.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 1f4cff19-27c8-448b-97ac-a00ce96abb6c
⛔ Files ignored due to path filters (1)
ci-operator/jobs/trusted-execution-clusters/operator/trusted-execution-clusters-operator-main-presubmits.yamlis excluded by!ci-operator/jobs/**
📒 Files selected for processing (16)
ci-operator/config/trusted-execution-clusters/operator/azure.yamlci-operator/step-registry/trusted-execution-clusters/operator-azure/OWNERSci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/OWNERSci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup.metadata.jsonci-operator/step-registry/trusted-execution-clusters/operator-azure/cleanup/trusted-execution-clusters-operator-azure-cleanup.yamlci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/OWNERSci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/azure.metadata.jsonci-operator/step-registry/trusted-execution-clusters/operator-azure/lifecycle/azure.yamlci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/OWNERSci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.metadata.jsonci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.shci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.yamlci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/OWNERSci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.metadata.jsonci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.yaml
| if [ -z "${SHARED_DIR}" ]; then | ||
| echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment." | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the pattern is nounset-safe
rg -n '^\#!/bin/bash -eu|SHARED_DIR' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh
# Repro of current behavior with nounset:
env -i bash -eu -c '[ -z "${SHARED_DIR}" ]' || trueRepository: openshift/release
Length of output: 444
Guard SHARED_DIR safely under set -u.
Line 4 expands ${SHARED_DIR} directly under nounset mode (-u), causing an unbound-variable error before the intended guard check runs. Use a default expansion to allow the condition to evaluate properly when the variable is unset.
Suggested fix
-if [ -z "${SHARED_DIR}" ]; then
+if [ -z "${SHARED_DIR:-}" ]; then
echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment."
exit 1
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ -z "${SHARED_DIR}" ]; then | |
| echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment." | |
| exit 1 | |
| fi | |
| if [ -z "${SHARED_DIR:-}" ]; then | |
| echo "[ERROR] SHARED_DIR is not set. This script must run in Prow CI environment." | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh`
around lines 4 - 7, The guard that checks SHARED_DIR uses direct expansion under
nounset mode (set -u), which can cause an unbound-variable error; update the if
condition to use a safe default expansion like ${SHARED_DIR:-} (e.g., if [ -z
"${SHARED_DIR:-}" ]; then ...) so the check works even when SHARED_DIR is unset,
and leave the existing error/exit behavior unchanged.
| if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then | ||
| echo "[ERROR] az-resource-group was not placed in SHARED_DIR" | ||
| exit 1 | ||
| fi | ||
|
|
||
| rpm --import https://packages.microsoft.com/keys/microsoft.asc | ||
| dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm | ||
| dnf install -y azure-cli | ||
|
|
||
| secret_base=/var/run/azure-upstream-ci | ||
| az login --service-principal \ | ||
| --username "$(cat $secret_base/client-id)" \ | ||
| --password "$(cat $secret_base/client-secret)" \ | ||
| --tenant "$(cat $secret_base/tenant-id)" | ||
|
|
||
| az_resource_group=$(cat "${SHARED_DIR}/az-resource-group") | ||
| echo "[INFO] Delete Kind VM resource group $az_resource_group" | ||
| az group delete --name "$az_resource_group" --yes | ||
| echo "[SUCCESS] Deleted Kind VM resource group $az_resource_group" |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
`For Azure CLI, what are the documented outputs/exit semantics of:
az group exists --name <group>az group delete --name <group> --yes
especially when the resource group does not exist?`
💡 Result:
az group exists --name <group>- Documented output: emits the stringtrueorfalse(default output format is JSON, but for human use it printstrue/false).[1][2] - Exit semantics when the resource group does not exist: the command can return exit code 0 even when the output isfalse(i.e., scripts should not rely on exit code; they should inspect stdout).[3] - Note: if you’re using--output jsonthen you’ll get a JSON boolean, but the “false vs true” meaning remains the same.[1] 2)az group delete --name <group> --yes- Documented behavior:--yesonly suppresses the interactive confirmation prompt; it doesn’t document any special handling for “group does not exist”.[1] - Exit semantics when the resource group does not exist: the authoritative behavior (specific exit code) is not clearly stated in the official command docs we found; instead, the underlying Azure Resource Manager delete operation treats HTTP 404 responses as successful for deletions (Resource Manager issues GET and expects 404; it considers deletion successful when it gets 404).[4] - Practically, this means: if Azure returns “not found” for the delete, it is treated as successful by ARM deletion logic, so you should expect the CLI not to fail the operation purely due to “already missing”, but the exact CLI exit code isn’t explicitly documented in the sources above.[4][5] Bottom line for scripting - Use stdout (true/false) foraz group existsrather than$?/exit code.[3] - Foraz group delete --yes, treat “resource group not found” as a successful/no-op at the ARM level (404 is treated as success), but if you need an exact exit code guarantee, you’ll have to test in your environment since the CLI docs we located don’t spell out the specific exit code for the not-found case.[4][1]
Citations:
- 1: https://learn.microsoft.com/en-us/cli/azure/group?view=azure-cli-latest
- 2: https://stackoverflow.com/questions/46458034/azure-cli-how-to-check-if-a-resource-exists
- 3: az group exists does not return false Azure/azure-cli#8594
- 4: https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/delete-resource-group?tabs=azure-powershell
- 5: https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/delete-resource-group
Make the cleanup script idempotent to avoid failures in post-steps.
The script hard-fails when the marker file is missing (exits with code 1) and deletes unconditionally without checking if the resource group still exists. In post-step execution, this causes noisy failures when these conditions aren't met. The script should treat "nothing to delete" and "already deleted" as success states.
The suggested changes are valid:
- Exit 0 when the marker file is missing (the operation has nothing to do, which is not an error in post-cleanup)
- Check if the resource group exists before deletion using
az group exists, which reliably returnstrueorfalseto stdout (exit code cannot be used reliably for this check) - Handle the case where the resource group is already absent with a success message
This ensures the post-step completes without error whether resources were previously provisioned or already cleaned up.
Suggested fix
-if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then
- echo "[ERROR] az-resource-group was not placed in SHARED_DIR"
- exit 1
+if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then
+ echo "[WARN] az-resource-group not found; skipping Azure cleanup."
+ exit 0
fi
@@
-az_resource_group=$(cat "${SHARED_DIR}/az-resource-group")
+az_resource_group="$(<"${SHARED_DIR}/az-resource-group")"
+if [ -z "${az_resource_group}" ]; then
+ echo "[WARN] Empty az-resource-group; skipping Azure cleanup."
+ exit 0
+fi
echo "[INFO] Delete Kind VM resource group $az_resource_group"
-az group delete --name "$az_resource_group" --yes
+if [ "$(az group exists --name "$az_resource_group")" = "true" ]; then
+ az group delete --name "$az_resource_group" --yes
+else
+ echo "[INFO] Resource group $az_resource_group already absent; nothing to delete."
+fi
echo "[SUCCESS] Deleted Kind VM resource group $az_resource_group"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then | |
| echo "[ERROR] az-resource-group was not placed in SHARED_DIR" | |
| exit 1 | |
| fi | |
| rpm --import https://packages.microsoft.com/keys/microsoft.asc | |
| dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm | |
| dnf install -y azure-cli | |
| secret_base=/var/run/azure-upstream-ci | |
| az login --service-principal \ | |
| --username "$(cat $secret_base/client-id)" \ | |
| --password "$(cat $secret_base/client-secret)" \ | |
| --tenant "$(cat $secret_base/tenant-id)" | |
| az_resource_group=$(cat "${SHARED_DIR}/az-resource-group") | |
| echo "[INFO] Delete Kind VM resource group $az_resource_group" | |
| az group delete --name "$az_resource_group" --yes | |
| echo "[SUCCESS] Deleted Kind VM resource group $az_resource_group" | |
| if [ ! -f "${SHARED_DIR}/az-resource-group" ]; then | |
| echo "[WARN] az-resource-group not found; skipping Azure cleanup." | |
| exit 0 | |
| fi | |
| rpm --import https://packages.microsoft.com/keys/microsoft.asc | |
| dnf install -y https://packages.microsoft.com/config/rhel/9.0/packages-microsoft-prod.rpm | |
| dnf install -y azure-cli | |
| secret_base=/var/run/azure-upstream-ci | |
| az login --service-principal \ | |
| --username "$(cat $secret_base/client-id)" \ | |
| --password "$(cat $secret_base/client-secret)" \ | |
| --tenant "$(cat $secret_base/tenant-id)" | |
| az_resource_group="$(<"${SHARED_DIR}/az-resource-group")" | |
| if [ -z "${az_resource_group}" ]; then | |
| echo "[WARN] Empty az-resource-group; skipping Azure cleanup." | |
| exit 0 | |
| fi | |
| echo "[INFO] Delete Kind VM resource group $az_resource_group" | |
| if [ "$(az group exists --name "$az_resource_group")" = "true" ]; then | |
| az group delete --name "$az_resource_group" --yes | |
| else | |
| echo "[INFO] Resource group $az_resource_group already absent; nothing to delete." | |
| fi | |
| echo "[SUCCESS] Deleted Kind VM resource group $az_resource_group" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-deprovision/deprovision.sh`
around lines 9 - 27, Make the cleanup script idempotent: if the marker file
"${SHARED_DIR}/az-resource-group" is missing, exit 0 instead of failing; if
present, read az_resource_group and call "az group exists --name
\"$az_resource_group\"" and use its stdout ("true"/"false") to decide whether to
delete; only run "az group delete --name \"$az_resource_group\" --yes" when
exists returns true and print a success message for both "deleted" and "already
absent" cases; keep the existing az login (secret_base/client-id, client-secret,
tenant-id) flow but ensure missing marker file is treated as no-op and use "az
group exists" to guard deletion.
| if [ -z "${SHARED_DIR}" ]; then | ||
| log_error "SHARED_DIR is not set. This script must run in Prow CI environment." | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
SHARED_DIR guard fails under set -u.
Because the shebang is #!/bin/bash -eu, ${SHARED_DIR} is expanded under nounset. If SHARED_DIR is unset, this line aborts with SHARED_DIR: unbound variable before the helpful log_error message ever runs, defeating the purpose of the check.
🛡️ Use a default expansion so the guard can actually fire
-if [ -z "${SHARED_DIR}" ]; then
+if [ -z "${SHARED_DIR:-}" ]; then
log_error "SHARED_DIR is not set. This script must run in Prow CI environment."
exit 1
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [ -z "${SHARED_DIR}" ]; then | |
| log_error "SHARED_DIR is not set. This script must run in Prow CI environment." | |
| exit 1 | |
| fi | |
| if [ -z "${SHARED_DIR:-}" ]; then | |
| log_error "SHARED_DIR is not set. This script must run in Prow CI environment." | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`
around lines 20 - 23, The guard checking SHARED_DIR fails under nounset (-u)
because ${SHARED_DIR} is expanded when unset; change the conditional to use a
safe default expansion such as if [ -z "${SHARED_DIR:-}" ]; then so the test can
run even if SHARED_DIR is unset, keep the existing log_error and exit 1
behavior, and update any other checks that reference SHARED_DIR in this script
(e.g., replace bare ${SHARED_DIR} expansions with ${SHARED_DIR:-} or
${SHARED_DIR:+...} as appropriate) so the script respects the #!/bin/bash -eu
shebang.
| az_region=eastus | ||
| az_resource_group=upstream-ci-$test_id | ||
| echo "$az_resource_group" > "$SHARED_DIR/az-resource-group" | ||
| kind_vm_user=ci | ||
| kind_vm_name=kind-vm | ||
| kind_vm_image=$(grep KIND_HOST_URN operator/Makefile | cut -d= -f2 | tr -d ' ') | ||
| vm_size=Standard_D2s_v3 | ||
|
|
||
| AZURE_SUBSCRIPTION_ID=$(cat $secret_base/subscription-id) | ||
| export AZURE_SUBSCRIPTION_ID | ||
| az login --service-principal \ | ||
| --username "$(cat $secret_base/client-id)" \ | ||
| --password "$(cat $secret_base/client-secret)" \ | ||
| --tenant "$(cat $secret_base/tenant-id)" | ||
|
|
||
| log_info "Create Azure resource group $az_resource_group" | ||
| az group create \ | ||
| --location $az_region \ | ||
| --resource-group "$az_resource_group" |
There was a problem hiding this comment.
Resource group name is recorded before the group actually exists.
echo "$az_resource_group" > "$SHARED_DIR/az-resource-group" (line 45) runs before az login and az group create (lines 53–61). If login or group creation fails, the cleanup step downstream will attempt to delete a resource group that was never created. Depending on how azure-deprovision/deprovision.sh is implemented, this can either be a benign 404 or a failing cleanup that masks the original error.
Recommend persisting the RG name only after a successful az group create, or making the deprovision step explicitly tolerate a missing group (e.g. az group exists first, or az group delete --no-wait and ignore "not found").
♻️ Move the SHARED_DIR write to after group creation
az_resource_group=upstream-ci-$test_id
-echo "$az_resource_group" > "$SHARED_DIR/az-resource-group"
kind_vm_user=ci
@@
log_info "Create Azure resource group $az_resource_group"
az group create \
--location $az_region \
--resource-group "$az_resource_group"
+echo "$az_resource_group" > "$SHARED_DIR/az-resource-group"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| az_region=eastus | |
| az_resource_group=upstream-ci-$test_id | |
| echo "$az_resource_group" > "$SHARED_DIR/az-resource-group" | |
| kind_vm_user=ci | |
| kind_vm_name=kind-vm | |
| kind_vm_image=$(grep KIND_HOST_URN operator/Makefile | cut -d= -f2 | tr -d ' ') | |
| vm_size=Standard_D2s_v3 | |
| AZURE_SUBSCRIPTION_ID=$(cat $secret_base/subscription-id) | |
| export AZURE_SUBSCRIPTION_ID | |
| az login --service-principal \ | |
| --username "$(cat $secret_base/client-id)" \ | |
| --password "$(cat $secret_base/client-secret)" \ | |
| --tenant "$(cat $secret_base/tenant-id)" | |
| log_info "Create Azure resource group $az_resource_group" | |
| az group create \ | |
| --location $az_region \ | |
| --resource-group "$az_resource_group" | |
| az_region=eastus | |
| az_resource_group=upstream-ci-$test_id | |
| kind_vm_user=ci | |
| kind_vm_name=kind-vm | |
| kind_vm_image=$(grep KIND_HOST_URN operator/Makefile | cut -d= -f2 | tr -d ' ') | |
| vm_size=Standard_D2s_v3 | |
| AZURE_SUBSCRIPTION_ID=$(cat $secret_base/subscription-id) | |
| export AZURE_SUBSCRIPTION_ID | |
| az login --service-principal \ | |
| --username "$(cat $secret_base/client-id)" \ | |
| --password "$(cat $secret_base/client-secret)" \ | |
| --tenant "$(cat $secret_base/tenant-id)" | |
| log_info "Create Azure resource group $az_resource_group" | |
| az group create \ | |
| --location $az_region \ | |
| --resource-group "$az_resource_group" | |
| echo "$az_resource_group" > "$SHARED_DIR/az-resource-group" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`
around lines 43 - 61, The script writes the resource group name to
$SHARED_DIR/az-resource-group before authentication and creation, which can
cause cleanup to try deleting a non-existent RG; move the write so it happens
only after az login and a successful az group create (i.e., after the az group
create call that uses az_resource_group), or alternatively make the downstream
deprovision tolerant of a missing group (check az group exists or handle "not
found" in azure-deprovision/deprovision.sh). Update the sequence around az
login, az group create and the SHARED_DIR/az-resource-group write so the file is
persisted only on success.
| ports=(6443 8000 8080) | ||
| for i in "${!ports[@]}"; do | ||
| port=${ports[$i]} | ||
| az network nsg rule create \ | ||
| --resource-group "$az_resource_group" \ | ||
| --nsg-name "$nsg" \ | ||
| --name "allow-$port" \ | ||
| --priority $((1001 + i)) \ | ||
| --source-address-prefixes "*" \ | ||
| --destination-port-ranges "$port" \ | ||
| --protocol Tcp \ | ||
| --access Allow \ | ||
| --direction Inbound | ||
| done |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
Azure CLI az network nsg rule create source-address-prefixes scope to single IP example
💡 Result:
In Azure CLI, you set the source match for an NSG rule with --source-address-prefixes. It takes a space-separated list of CIDR prefixes or IP ranges, so a single IP works when you pass just one value [1]. Single IP example: az network nsg rule create \ --resource-group \ --nsg-name \ --name \ --priority \ --direction Inbound \ --access Allow \ --protocol Tcp \ --source-address-prefixes 203.0.113.10 \ --destination-address-prefixes '*' \ --destination-port-ranges 22 [1] Multiple IPs/prefixes: pass multiple values as separate CLI arguments (space-separated) [1]. Common pitfall (variables): if you’re using a shell that passes the variable as a single token, ensure the CLI receives multiple tokens (e.g., an array expansion / separate arguments). If you pass a single string that contains embedded commas/spaces, it may be treated as one invalid address prefix and fail validation [2].
Citations:
- 1: https://learn.microsoft.com/en-us/cli/azure/network/nsg/rule?view=azure-cli-latest
- 2: https://stackoverflow.com/questions/62654936/azure-network-security-group-add-source-ip-prefixes-using-variable
🏁 Script executed:
# Read the test.sh file to understand the context
head -110 ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh | tail -40Repository: openshift/release
Length of output: 1168
🏁 Script executed:
# Get more context around the suspicious lines and understand what this script does
head -20 ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 274
🏁 Script executed:
# Check if there's any documentation about this script
ls -la ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/Repository: openshift/release
Length of output: 413
🏁 Script executed:
# Read the test.yaml to understand the test setup
cat ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.yamlRepository: openshift/release
Length of output: 588
🏁 Script executed:
# Check more of the test.sh to understand what runs on ports 8000/8080
wc -l ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 150
🏁 Script executed:
# Read the full test.sh to understand what services run on these ports
sed -n '105,150p' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 1328
🏁 Script executed:
# Search for what uses ports 8000 and 8080 in the test script
grep -n "8000\|8080" ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 87
🏁 Script executed:
# Check if there's any authentication configuration mentioned
grep -n "anonymous\|auth\|insecure" ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 151
Restrict Kubernetes API server access from the public internet.
Port 6443 (kube-apiserver) is exposed to the entire internet via --source-address-prefixes "*". The Kind cluster configuration explicitly binds the API server to 0.0.0.0 on this port, creating an unnecessary attack surface. Even for ephemeral CI VMs, this risks exposure to API CVEs, authn/authz misconfigurations, and probing attacks.
Scope --source-address-prefixes to known origins such as the Prow build cluster egress IP range or the public IP of the host running this script (e.g., curl -fsS https://api.ipify.org). Access from the CI host using kubeconfig is sufficient for cluster setup and testing without requiring world-open ingress.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`
around lines 91 - 104, The NSG rule loop currently opens port 6443 to the
internet by using --source-address-prefixes "*" in the az network nsg rule
create call; change the logic in the ports/loop (ports=(6443 8000 8080) and the
for loop that calls az network nsg rule create) to scope
--source-address-prefixes for the kube-apiserver port (6443) to a specific
origin (e.g., fetch the CI host or Prow egress public IP and use it as CIDR/32)
while keeping the other ports as-is; locate the az network nsg rule create
invocation and add conditional handling so when port == 6443 you substitute the
fetched public IP (or a configured allowed CIDR) instead of "*" for
--source-address-prefixes.
| log_info "Run integration tests" | ||
| eval "$(ssh-agent)" | ||
| PLATFORM=kind_public VIRT_PROVIDER=azure REGISTRY=localhost:5000 \ | ||
| TEST_NAMESPACE_PREFIX="$az_resource_group-" \ | ||
| TEST_IMAGE=$(cat $secret_base/test-image) \ | ||
| CLUSTER_URL="$kind_vm_ip" \ | ||
| make integration-tests | ||
| log_success "Ran integration tests" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Look at the operator's integration-tests target to see whether it relies on ssh-agent.
fd -i -t f 'Makefile|\.mk$' --exec rg -nP -C5 '\b(integration-tests|ssh-add|SSH_AUTH_SOCK)\b' {}Repository: openshift/release
Length of output: 43
🏁 Script executed:
# First, verify the test.sh file exists and see context
cd ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test 2>/dev/null && ls -la test.sh 2>/dev/null || echo "File not found in expected path"Repository: openshift/release
Length of output: 118
🏁 Script executed:
# Search for the test.sh file more broadly
find . -name "test.sh" -path "*/trusted-execution-clusters/*" -type f 2>/dev/null | head -20Repository: openshift/release
Length of output: 148
🏁 Script executed:
# Look for any Makefile in the repo
fd -i -t f '(Makefile|makefile)' 2>/dev/null | head -20Repository: openshift/release
Length of output: 366
🏁 Script executed:
# Search for ssh-add and ssh-agent usage in the repository
rg -i 'ssh-add|ssh-agent' --type-list | head -5
rg -i 'ssh-add|ssh-agent' -l 2>/dev/null | head -20Repository: openshift/release
Length of output: 836
🏁 Script executed:
# Read the test.sh file to see the context around lines 186-193
cat -n ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh | sed -n '170,200p'Repository: openshift/release
Length of output: 1213
🏁 Script executed:
# Check if ssh-add is called anywhere in test.sh
rg -n 'ssh-add' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 43
🏁 Script executed:
# Check for SSH_AUTH_SOCK or other ssh-agent references in test.sh
rg -n 'SSH_AUTH_SOCK|ssh-agent' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 85
🏁 Script executed:
# Find where the integration-tests make target is defined - check the root Makefile first
rg -n 'integration-tests:' MakefileRepository: openshift/release
Length of output: 43
🏁 Script executed:
# Search for integration-tests in all files to understand what this target does
rg -n 'integration-tests' ci-operator/step-registry/trusted-execution-clusters/ --type-list=shell 2>/dev/null | head -20Repository: openshift/release
Length of output: 43
🏁 Script executed:
# Better search for integration-tests
rg -n 'integration-tests' ci-operator/step-registry/trusted-execution-clusters/ | head -30Repository: openshift/release
Length of output: 920
🏁 Script executed:
# Look at the test-commands.sh file around line 551 where it calls make integration-tests
sed -n '530,570p' ci-operator/step-registry/trusted-execution-clusters/ref/operator/test/trusted-execution-clusters-ref-operator-test-commands.shRepository: openshift/release
Length of output: 1804
🏁 Script executed:
# Check if ssh-agent is used in test-commands.sh at all
rg -n 'ssh-agent\|ssh-add' ci-operator/step-registry/trusted-execution-clusters/ref/operator/test/trusted-execution-clusters-ref-operator-test-commands.shRepository: openshift/release
Length of output: 43
🏁 Script executed:
# Get more context from the beginning of test.sh to understand the overall flow
head -50 ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 1281
🏁 Script executed:
# Look for any documentation or comments about ssh-agent usage
rg -B5 -A5 'ssh-agent' ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.shRepository: openshift/release
Length of output: 554
ssh-agent is started but no key is added.
eval "$(ssh-agent)" launches an agent but nothing ever calls ssh-add, so SSH_AUTH_SOCK points at an empty agent. If make integration-tests relies on agent-based auth (e.g., to ssh into the Kind VM), it will fail; if it doesn't, this line is dead code that just leaks a process. Notably, the comparable test suite in trusted-execution-clusters-ref-operator-test-commands.sh runs make integration-tests without any ssh-agent setup.
Please confirm whether the make target needs an agent and either add the corresponding ssh-add call with the appropriate key, or drop the line.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@ci-operator/step-registry/trusted-execution-clusters/ref/operator/azure-test/test.sh`
around lines 186 - 193, The ssh-agent is started by eval "$(ssh-agent)" but no
key is added, so either remove the ssh-agent startup or add an ssh-add step
before calling make integration-tests; update the test script around eval
"$(ssh-agent)" and the make integration-tests invocation to either (1) drop the
eval line entirely (matching
trusted-execution-clusters-ref-operator-test-commands.sh) if agent auth is not
required, or (2) call ssh-add with the correct private key (e.g., ssh-add
"$SSH_KEY_PATH" or ssh-add "$secret_base/id_rsa") and ensure SSH_AUTH_SOCK is
set so make integration-tests can use agent auth. Ensure the change touches the
block containing eval "$(ssh-agent)" and the make integration-tests invocation
so the agent is either populated or removed.
|
@Jakob-Naucke, Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
Create a Kind VM (also on Azure) as container platform. Run integration tests on Azure. Signed-off-by: Jakob Naucke <jnaucke@redhat.com>
|
[REHEARSALNOTIFIER]
Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
/pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
last attempt today (none are scheduling) /pj-rehearse |
|
@Jakob-Naucke: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Create a Kind VM (also on Azure) as container platform. Run integration tests on Azure.
Azure Upstream CI for trusted-execution-clusters Operator
This PR introduces Azure upstream CI infrastructure for the
trusted-execution-clusters/operatorrepository, enabling integration testing on Azure cloud infrastructure.CI Configuration Changes
Added
ci-operator/config/trusted-execution-clusters/operator/azure.yamlto define the main test joboperator-lifecycle-azure-verify. This job:trusted-execution-clusters-operator-azure-lifecycletest chaintrusted-execution-clusters-operator-azure-cleanupcleanup chain afterwardtelco-runner:latestcontainer image for test executionTest Infrastructure
Added two main CI step chains:
Azure Lifecycle Chain (
trusted-execution-clusters-operator-azure-lifecycle): Provisioning and testing flow that:Azure Cleanup Chain (
trusted-execution-clusters-operator-azure-cleanup): Resource cleanup that:Implementation Details
The test step (
test.sh) handles the full provisioning workflow:The deprovision step (
deprovision.sh) performs cleanup:Governance
Added OWNERS files specifying
alicefr,Jakob-Naucke,yalzhang, andfangge1212as reviewers/approvers for the Azure test infrastructure components, along with corresponding metadata JSON files for tracking.