
docs: improve GCP UPI install guide based on real-world verification #10515

Draft
rochacbruno wants to merge 2 commits into openshift:main from rochacbruno:no-jira/improve-gcp-upi-docs

rochacbruno (Member) commented Apr 23, 2026

Summary

  • Increase signed URL duration from 1h to 2h to survive deployment retries caused by zone capacity or Cloud Build transient failures
  • Add warning about manifest regeneration changing the Infrastructure ID, which causes DNS zone mismatches and ingress operator failures
  • Clarify why mastersSchedulable: false is critical (ingress LB cannot reach router pods on masters, causing connection refused)
  • Add notes about zone capacity fallback options and e2-standard-4 as alternative machine type
  • Warn that Infrastructure Manager does not recreate VMs on metadata-only changes (Ignition only runs on first boot)
  • Add batch CSR approval command and background approval loop for easier node onboarding
  • Add troubleshooting section covering signed URL expiration, zone capacity, VM recreation, and IAP SSH debugging
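
The Infrastructure ID warning in the list above lends itself to a quick check before any infrastructure is created. The following is a minimal sketch, assuming the installer's `metadata.json` stores the ID under the `infraID` key and that the value exported earlier as `INFRA_ID` is still available. The function names are hypothetical and are not part of the guide.

```python
import json


def infra_id_from_metadata(metadata_text):
    """Return the infraID recorded in the text of the installer's metadata.json."""
    return json.loads(metadata_text)["infraID"]


def infra_id_drifted(saved_infra_id, metadata_text):
    """Return True when the ID saved earlier no longer matches metadata.json.

    A mismatch is the situation the warning describes: manifests or ignition
    configs were regenerated after INFRA_ID had already been extracted.
    """
    return infra_id_from_metadata(metadata_text) != saved_infra_id
```

A caller would typically pass `os.environ["INFRA_ID"]` and the contents of `metadata.json`, and stop before deploying anything if the function returns `True`.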

Context

These improvements come from hands-on verification testing of OCPSTRAT-2830 (GCP UPI migration from Deployment Manager to Terraform/Infrastructure Manager). All 8 Terraform template stages were verified successfully, but the documentation gaps led to several avoidable issues during the process.

Test plan

  • Verify markdown renders correctly on GitHub
  • Confirm all links and cross-references are valid
  • Run through the guide with a fresh cluster deployment

Generated with Claude Code

Summary by CodeRabbit

  • Documentation
    • Documented ingress failure symptoms and detection guidance
    • Stronger warnings about regenerating manifests/ignition to avoid DNS desync
    • Updated bootstrap signed URL validity to 2 hours
    • Added guidance on zone capacity, bootstrap redeploy requirements, and CSR bulk-approval automation
    • New troubleshooting section including remote debugging via IAP SSH tunneling

Add practical guidance discovered during hands-on OCPSTRAT-2830
verification testing of the Terraform/Infrastructure Manager templates:

- Increase signed URL duration from 1h to 2h to survive deployment retries
- Add warning about manifest regeneration changing the Infrastructure ID
- Explain ingress degradation when masters are schedulable
- Add notes about zone capacity fallback and machine type alternatives
- Warn that Infrastructure Manager does not recreate VMs on metadata changes
- Add batch CSR approval command and background approval loop
- Add troubleshooting section covering common failure scenarios
- Add IAP tunnel firewall rule for debugging nodes without public IPs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
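
For the signed URL expiration scenario, it can help to confirm when a bootstrap URL stops working before retrying a deployment. This is a sketch under the assumption that the URL was produced with V4 signing, which records the issue time in `X-Goog-Date` and the lifetime in seconds in `X-Goog-Expires`. Whether the guide's command produces that form is not confirmed by this PR, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def signed_url_expiry(url):
    """Return the UTC expiry time encoded in a V4 signed URL's query string."""
    query = parse_qs(urlparse(url).query)
    # X-Goog-Date is the signing time, e.g. 20260423T120000Z (always UTC).
    issued = datetime.strptime(query["X-Goog-Date"][0], "%Y%m%dT%H%M%SZ")
    # X-Goog-Expires is the validity window in seconds (7200 for 2 hours).
    lifetime = timedelta(seconds=int(query["X-Goog-Expires"][0]))
    return issued.replace(tzinfo=timezone.utc) + lifetime


def signed_url_expired(url, now=None):
    """Return True when the URL's validity window has already passed."""
    now = now or datetime.now(timezone.utc)
    return now >= signed_url_expiry(url)
```

If `signed_url_expired` returns `True`, a new URL has to be generated, and per the PR description the existing bootstrap VM must be recreated because Ignition only runs on first boot.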
openshift-ci Bot added the do-not-merge/work-in-progress label Apr 23, 2026
openshift-ci Bot (Contributor) commented Apr 23, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai Bot commented Apr 23, 2026

Walkthrough

Documentation for GCP UPI installation updated: documents ingress degradation symptoms, warns against regenerating manifests/ignition after extracting INFRA_ID, extends bootstrap signed URL lifetime from 1h to 2h, adds zone capacity and redeploy guidance, expands CSR approval commands, and adds a Troubleshooting section including IAP SSH tunneling.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GCP UPI Installation Documentation (docs/user/gcp/install_upi.md) | Added expected ingress degradation symptoms (Degraded=True, connection refused health-checks), strong warning against regenerating manifests/ignition after INFRA_ID extraction, extended bootstrap signed URL example to 2 hours, guidance on zone capacity and bootstrap redeploy (delete existing bootstrap VM when URL changes), expanded CSR bulk/loop approval commands, and a new Troubleshooting section covering expired URLs, capacity limits, metadata-only Ignition behavior, and IAP SSH tunneling for nodes without public IPs. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 12
✅ Passed checks (12 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change—improving the GCP UPI installation guide based on real-world verification testing. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Documentation-only PR with no Ginkgo test code changes; check not applicable. |
| Test Structure And Quality | ✅ Passed | The custom check is not applicable to this PR. The PR only modifies documentation files and contains no Ginkgo test code. |
| Microshift Test Compatibility | ✅ Passed | This pull request contains only documentation updates and no new Ginkgo e2e tests, making this check not applicable. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | Pull request contains only documentation updates to docs/user/gcp/install_upi.md without new Ginkgo e2e tests. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only documentation (docs/user/gcp/install_upi.md). No deployment manifests, operator code, or controller code are changed, so the scheduling constraints check does not apply. |
| Ote Binary Stdout Contract | ✅ Passed | The OTE Binary Stdout Contract check is not applicable to this PR as it exclusively modifies documentation files with no Go code, test code, or binary implementations. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR only modifies documentation (docs/user/gcp/install_upi.md), a markdown file, and does not add any Ginkgo e2e test files. The custom check assessing IPv6 and disconnected network compatibility is not applicable. |


openshift-ci Bot (Contributor) commented Apr 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign patrickdillon for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
docs/user/gcp/install_upi.md (1)

906-947: Troubleshooting section is strong; consider adding firewall-rule cleanup guidance.

Nice addition overall. Optional hardening: note that the temporary IAP SSH firewall rule should be removed after debugging to reduce long-lived administrative exposure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/gcp/install_upi.md` around lines 906 - 947, Add a brief note after
the IAP SSH firewall-rule creation and usage that instructs operators to remove
the temporary firewall rule when finished (reference the created rule name
pattern `${INFRA_ID}-allow-iap-ssh` and the `gcloud compute firewall-rules`
command used to create it) and to remove the target tags from instances if they
were added for debugging, showing the cleanup action as a single step to reduce
long-lived administrative exposure.
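
The cleanup step requested above could be as small as one `gcloud compute firewall-rules delete` call. The sketch below only builds and runs that call; the rule name pattern `${INFRA_ID}-allow-iap-ssh` comes from the review comment, while the helper names are hypothetical and the active `gcloud` project is assumed to be the cluster's project.

```python
import subprocess


def iap_ssh_rule_name(infra_id):
    """Name of the temporary debugging rule, following the pattern in the review."""
    return f"{infra_id}-allow-iap-ssh"


def delete_iap_ssh_rule_command(infra_id):
    """Build the gcloud command that removes the temporary IAP SSH rule."""
    # --quiet skips the interactive confirmation prompt.
    return ["gcloud", "compute", "firewall-rules", "delete",
            iap_ssh_rule_name(infra_id), "--quiet"]


def delete_iap_ssh_rule(infra_id):
    """Run the delete command and raise if gcloud reports a failure."""
    subprocess.run(delete_iap_ssh_rule_command(infra_id), check=True)
```

Keeping the command construction separate from the call makes it easy to print the command for review before running it.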
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/user/gcp/install_upi.md`:
- Around line 505-513: The two adjacent blockquotes starting with "**Note**: If
a zone..." and "**Warning**: If you need to redeploy..." are broken by a blank
line inside the blockquote; remove the empty line so the blockquote remains
continuous (or ensure each paragraph keeps the leading ">" on every line) so
markdownlint MD028 is satisfied and the "**Note**" and "**Warning**" lines
remain inside a single valid blockquote.
- Around line 737-751: The current bulk CSR approval commands approve every
pending CSR; change them to only select CSRs where the requestor username
matches known node bootstrap/node identities before piping to oc adm certificate
approve. Update the oc get csr invocation (keep using oc get csr and oc adm
certificate approve) to filter on the username field (e.g. via -o go-template or
-o jsonpath) by matching .spec.username (or the JSONPath expression that selects
usernames) against your expected node requestor list (for example
system:node:<name> or system:node-bootstrap) and only emit those .metadata.name
values; apply the same filtered selection inside your background loop so only
approved CSRs come from known node requestors.

---

Nitpick comments:
In `@docs/user/gcp/install_upi.md`:
- Around line 906-947: Add a brief note after the IAP SSH firewall-rule creation
and usage that instructs operators to remove the temporary firewall rule when
finished (reference the created rule name pattern `${INFRA_ID}-allow-iap-ssh`
and the `gcloud compute firewall-rules` command used to create it) and to remove
the target tags from instances if they were added for debugging, showing the
cleanup action as a single step to reduce long-lived administrative exposure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a8fb579-f0ef-4bb5-957a-fde6523efa2a

📥 Commits

Reviewing files that changed from the base of the PR and between 0bd82bc and 80fc0fb.

📒 Files selected for processing (1)
  • docs/user/gcp/install_upi.md

Comment thread docs/user/gcp/install_upi.md
Comment thread docs/user/gcp/install_upi.md Outdated
Fix blockquote continuity (MD028) between Note and Warning blocks, and
filter CSR approval commands to only approve CSRs from known node
requestors (node-bootstrapper and system:node:*) instead of all pending
CSRs indiscriminately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
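
The filtering described in this commit can be illustrated against the JSON that `oc get csr -o json` returns. This is a sketch, not the command added to the guide. It assumes the node bootstrapper requestor is the service account `system:serviceaccount:openshift-machine-config-operator:node-bootstrapper` and that a CSR with no `status.conditions` is still pending. The function names are hypothetical.

```python
import subprocess
from typing import List

# Assumed full username of the node-bootstrapper requestor.
NODE_BOOTSTRAPPER = (
    "system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"
)


def pending_node_csrs(csr_list: dict) -> List[str]:
    """Return names of pending CSRs whose requestor looks like a node identity."""
    names = []
    for item in csr_list.get("items", []):
        # Approved or denied CSRs carry at least one condition; skip them.
        if (item.get("status") or {}).get("conditions"):
            continue
        username = item.get("spec", {}).get("username", "")
        if username == NODE_BOOTSTRAPPER or username.startswith("system:node:"):
            names.append(item["metadata"]["name"])
    return names


def approve(names: List[str]) -> None:
    """Approve the selected CSRs in a single oc invocation; no-op when empty."""
    if names:
        subprocess.run(["oc", "adm", "certificate", "approve", *names], check=True)
```

A background loop would parse the output of `oc get csr -o json`, pass it to `pending_node_csrs`, and hand the result to `approve`. Matching on the requestor username narrows what gets approved but does not validate the identity requested inside the CSR.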
coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/user/gcp/install_upi.md (1)

11-20: ⚠️ Potential issue | 🟡 Minor

Add jq to the prerequisites list.

The document uses jq extensively for JSON processing (extracting metadata at lines 236, 260-266, parsing CSR output at lines 740-748, 756-763, and many other locations), but jq is not listed in the required binaries. Users without jq installed will encounter command failures.

📋 Suggested prerequisite addition

Add jq to the binaries list:

 * the following binaries installed and in $PATH:
   * gcloud
   * python
+  * jq
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/user/gcp/install_upi.md` around lines 11 - 20, The prerequisites list is
missing the jq binary which the guide repeatedly uses for JSON processing (e.g.,
alongside gcloud and python entries and the PyYAML bullet); update the
prerequisites bullet list that currently lists "gcloud", "python", and "PyYAML"
to also include "jq" so users are informed to install it before running the
documented commands.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@docs/user/gcp/install_upi.md`:
- Around line 11-20: The prerequisites list is missing the jq binary which the
guide repeatedly uses for JSON processing (e.g., alongside gcloud and python
entries and the PyYAML bullet); update the prerequisites bullet list that
currently lists "gcloud", "python", and "PyYAML" to also include "jq" so users
are informed to install it before running the documented commands.
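
A preflight check makes the missing-prerequisite failure mode visible before any command in the guide runs. The sketch below assumes only the three binaries named in this comment (`gcloud`, `python`, `jq`); the guide's full list may contain more, and the function name is hypothetical.

```python
import shutil

# Binaries named in the review comment; extend to match the guide's full list.
REQUIRED_BINARIES = ["gcloud", "python", "jq"]


def missing_binaries(names=REQUIRED_BINARIES):
    """Return the subset of names that cannot be found on $PATH."""
    return [name for name in names if shutil.which(name) is None]
```

Calling `missing_binaries()` returns an empty list when every prerequisite is installed, so a script can exit early with the returned names otherwise.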

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4399d398-1f31-4322-9ebe-fa20914a5bfc

📥 Commits

Reviewing files that changed from the base of the PR and between 80fc0fb and 75ee033.

📒 Files selected for processing (1)
  • docs/user/gcp/install_upi.md


Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.
