METAL-156: Use Live ISO for baremetal bootstrap VM #9814
openshift-merge-bot[bot] merged 14 commits into openshift:main
@zaneb: This pull request references METAL-156 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. |
|
/cc @bfournie @honza |
Force-pushed: 4476d96 to 400865d
|
LOL, virtualmedia now works (thanks to openshift/image-customization-controller#143), but PXE (which previously worked) has started failing due to the fix for OSSA-2025-001. |
|
/retest-required |
|
/test e2e-metal-ipi-ovn-dualstack |
|
/test e2e-metal-ipi-ovn-ipv6 |
|
/retest |
|
Tests should be fixed again by openshift/image-customization-controller#147 when it lands. |
Force-pushed: 4d939c2 to 12107f3
|
PXE is broken again because metal3-io/ironic-image#681 overrode the default |
|
openshift/ironic-image#702 should get PXE working again, if nothing else breaks it in the meantime. |
|
/retest |
 Environment="XDG_RUNTIME_DIR=/run/user/${UID}"
 Environment="KUBECONFIG=/opt/openshift/auth/kubeconfig-loopback"
-Environment="DEPLOY_KERNEL_URL=file:///shared/html/images/ironic-python-agent.kernel"
+Environment="DEPLOY_KERNEL_URL=file:///var/lib/ironic/pxe/vmlinuz"
Not super critical, but nowadays this variable should be passed to Ironic, not BMO (although I haven't checked if the downstream fork already has the required change).
It's not being passed to ironic now, so I guess that change hasn't made it to the bootstrap yet.
It's either Ironic or BMO, both work. But the future-proof way is to pass it to Ironic. No urgency, we'll clean it up one day.
|
/lgtm from metal platform perspective (but you have a merge conflict again) |
Separate the non-agent-specific parts out into the rhcos package, leaving the agent-specific parts behind in the agent/image package.
Use the same struct to marshal and unmarshal. Presumably this wasn't previously possible when going through actual Terraform.
The amount of storage assigned to the Live /dev/loop0 partition in a Live image boot is proportional to the total amount of RAM. With a 6GiB VM, /dev/loop0 gets a little less than 3GiB. After running the node-image-pull.service there is typically about 2.8GiB used on /dev/loop0 (of which <200MiB was used before running the service). In a 6GiB VM this causes the "ostree container image pull" process to fail due to insufficient free space on the disk. 20GiB of RAM allows us to pull both the ostree container image and the necessary container images to run the metal bootstrap services.
Instead of downloading an RHCOS qemu image for the baremetal bootstrap - which disconnected users have to mirror locally - reuse the agent installer code for fetching a live ISO. If the `oc` binary is present, the live ISO will be retrieved from the release image (respecting the mirror config), which means disconnected users have nothing else to do. In the case that `oc` is not present, we fall back to downloading the image directly (this requires a connection to the Internet). In any case, the old qemu image is no longer used, and if a mirror URL is configured in the install-config it will be ignored.
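The selection rule described in this commit message can be sketched as follows. This is a minimal illustration in Python, not the installer's actual code; the function name `choose_iso_source` and the two source labels are hypothetical, and the only behaviour taken from the commit message is "use the release image when `oc` is on the PATH, otherwise download directly".

```python
import shutil
from typing import Callable, Optional

# Hypothetical labels for the two ways of obtaining the live ISO.
RELEASE_IMAGE = "release-image"      # extracted via `oc`, respects mirror config
DIRECT_DOWNLOAD = "direct-download"  # fetched over the Internet


def choose_iso_source(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the live ISO source according to whether `oc` is available.

    `which` is injectable so the rule can be exercised without touching
    the real PATH; by default it is shutil.which.
    """
    if which("oc") is not None:
        return RELEASE_IMAGE
    return DIRECT_DOWNLOAD


if __name__ == "__main__":
    print(choose_iso_source())
```

In this sketch a disconnected user with `oc` installed always lands on the release-image path, which is why no separate mirroring step is needed.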
If the cluster is in FIPS mode, we should also enable FIPS on the baremetal bootstrap VM.
When the bootstrap is booted from a live ISO, we have access to both the ISO image and the PXE boot components in it directly from the host filesystem; there is no need to extract it again from the release image. This saves a lot of RAM in the live environment. We must pass SecurityLabelDisable=true because the mounted live ISO is a read-only filesystem, and therefore we are unable to relabel the files with the necessary SELinux context.
We have almost 9GiB of data to store in /var (mostly container images in the crio storage), and since only 50% of RAM is used for the tmpfs this requires an inordinate amount of RAM. Create a volume for mounting as /var so that none of this data ends up in the tmpfs. This allows us to return to the original RAM allocation of 6GiB. Avoid creating a new tmpfs for node-image-pull.service to pull the ostree repo, and instead let it land on disk. This is added to the ignition during the creation of the VM, so other use cases for the baremetal bootstrap ignition (e.g. assisted installer) cannot be affected.
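The sizing argument here is simple arithmetic, sketched below in Python. The helper names are hypothetical, and the figures are the approximate ones quoted in the commit messages (a tmpfs limited to 50% of RAM, and roughly 9GiB of data in /var); real usage will vary.

```python
def tmpfs_capacity_gib(ram_gib: float, fraction: float = 0.5) -> float:
    """Approximate tmpfs capacity when it is capped at a fraction of RAM."""
    return ram_gib * fraction


def min_ram_gib(data_gib: float, fraction: float = 0.5) -> float:
    """Approximate RAM needed for `data_gib` to fit in such a tmpfs."""
    return data_gib / fraction


if __name__ == "__main__":
    # A 6GiB VM leaves about 3GiB of tmpfs, far short of ~9GiB of data.
    print(tmpfs_capacity_gib(6))  # 3.0
    # Holding ~9GiB in tmpfs would need about 18GiB of RAM.
    print(min_ram_gib(9))  # 18.0
```

Under these assumptions, keeping /var in RAM would need about 18GiB, which is consistent with the earlier 20GiB allocation; moving /var to a disk-backed volume removes that requirement and allows the 6GiB allocation again.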
This has not been required since 4.10, as we now can deploy the RHCOS live ISO to disk directly, and there is no need for a separate qcow image.
This field is neither required nor used if it is provided.
Doing systemctl isolate results in killing node-image-finish.service itself, which is thereafter reported as a failed unit. Stop the unit without killing the command so that this does not show up as a failure.
Avoid modifying the bootstrap user ignition, and instead add the disk configuration for the bootstrap VM to the system ignition.
|
/lgtm |
|
/retest-required |
|
@zaneb: The following tests failed, say
|
/verified by sgoveas link |
|
@sgoveas: This PR has been marked as verified by |
|
/lgtm |
Merged ed29c38 into openshift:main