MCO-2200: refactored node-joiner to use embedded rhcos data#10537
MCO-2200: refactored node-joiner to use embedded rhcos data#10537andfasano wants to merge 3 commits intoopenshift:mainfrom
Conversation
|
@andfasano: This pull request references MCO-2200 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
WalkthroughBase RHCOS artifact fetching was centralized: injected CoreOS stream fetchers were removed in favor of a package-level metal-artifact getter. Call sites and signatures across rhcos, agent image, imagebased, and joiner code were simplified, and ClusterInfo no longer stores OS image metadata. ChangesCoreOS Artifact Fetch Consolidation
Sequence Diagram(s)sequenceDiagram
participant Agent as AgentImage / Ignition
participant Base as BaseIso / ReleasePayload
participant RHCOS as rhcos.GetMetalArtifact
participant Cache as Cache/Storage
Agent->>Base: Request base ISO or rootfs URL (ctx, arch)
Base->>RHCOS: GetMetalArtifact(ctx, arch)
RHCOS-->>Base: Return metal artifact (iso, installer metadata)
Base->>Cache: Check/verify or download ISO using artifact info
Cache-->>Base: Return cached or downloaded ISO + hash
Base-->>Agent: Return ISO path / rootfs URL
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/asset/rhcos/releaseextract.go (1)
269-319:⚠️ Potential issue | 🟠 Major | ⚡ Quick winCache validation can incorrectly trust default-stream metadata over the actual release payload.
On Line 273 you fetch hash data from
rhcos.DefaultOSImageStream, and on Line 317 you return early when that hash matches. That short-circuits the release-payload SHA check (Lines 330-344), so a cached ISO can be accepted even if it does not match the specificimagebeing installed.Suggested fix
- // Check if the hash of cached file matches hash in rhcos.json - found, rhcosSha := r.getHashFromInstaller(architecture) - if found && matchingHash(fileSha, rhcosSha) { - logrus.Debug("Found matching hash in installer metadata") - return true, nil - } - - // If no match, get the file containing the coreos sha256 and compare that + // First, compare against release payload SHA (authoritative for `image`) tempDir, err := os.MkdirTemp("", "cache") if err != nil { return false, err @@ if matchingHash(fileSha, string(payloadSha)) { logrus.Debugf("Found matching hash in %s", shaFilename) return true, nil } + + // Fallback to installer metadata only when payload SHA cannot confirm a match. + found, rhcosSha := r.getHashFromInstaller(architecture) + if found && matchingHash(fileSha, rhcosSha) { + logrus.Debug("Found matching hash in installer metadata") + return true, nil + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/rhcos/releaseextract.go` around lines 269 - 319, verifyCacheFile currently trusts the default stream hash from getHashFromInstaller(architecture) and returns early when it matches, which can bypass validating the SHA from the specific release payload referenced by image; change the logic so verifyCacheFile first extracts the RHCOS SHA from the release payload associated with the given image (instead of or before calling getHashFromInstaller), compare fileSha against that release-specific SHA, and only if no release-payload SHA is available fall back to using getHashFromInstaller(architecture); update getHashFromInstaller or add a new helper to clearly indicate it returns a default-stream/fallback SHA so the code in verifyCacheFile checks payload SHA first, then fallback SHA.
🧹 Nitpick comments (1)
pkg/asset/agent/image/baseiso.go (1)
53-70: 💤 Low valueRemove unused
ClusterInfodependency.The
clusterInfovariable is fetched at line 70 but never used in theGeneratemethod. Remove it from theDependencies()declaration at line 57 and thedependencies.Get()call at line 70.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/agent/image/baseiso.go` around lines 53 - 70, Remove the unused ClusterInfo dependency from BaseIso: in the Dependencies() method of BaseIso, delete the &joiner.ClusterInfo{} entry, and in the Generate(ctx context.Context, dependencies asset.Parents) method remove the clusterInfo variable and its inclusion from the dependencies.Get(...) call so you only fetch the actually used assets (manifests.AgentManifests, mirror.RegistriesConf, workflow.AgentWorkflow / agent.OptionalInstallConfig as applicable).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/nodejoiner/.claude/skills/nodejoiner-command/SKILL.md`:
- Around line 8-9: Update the documentation text in SKILL.md for the nodejoiner
description: fix the typos "builind" → "building", "come" → "code", and
"sligtly" → "slightly", and also correct "is a internal command" to "is an
internal command" so the sentence reads clearly; the sentence references the
nodejoiner used by oc adm nodeimage and the workflow.AgentWorkflowTypeAddNodes
enum — make these wording fixes in that paragraph accordingly.
---
Outside diff comments:
In `@pkg/asset/rhcos/releaseextract.go`:
- Around line 269-319: verifyCacheFile currently trusts the default stream hash
from getHashFromInstaller(architecture) and returns early when it matches, which
can bypass validating the SHA from the specific release payload referenced by
image; change the logic so verifyCacheFile first extracts the RHCOS SHA from the
release payload associated with the given image (instead of or before calling
getHashFromInstaller), compare fileSha against that release-specific SHA, and
only if no release-payload SHA is available fall back to using
getHashFromInstaller(architecture); update getHashFromInstaller or add a new
helper to clearly indicate it returns a default-stream/fallback SHA so the code
in verifyCacheFile checks payload SHA first, then fallback SHA.
---
Nitpick comments:
In `@pkg/asset/agent/image/baseiso.go`:
- Around line 53-70: Remove the unused ClusterInfo dependency from BaseIso: in
the Dependencies() method of BaseIso, delete the &joiner.ClusterInfo{} entry,
and in the Generate(ctx context.Context, dependencies asset.Parents) method
remove the clusterInfo variable and its inclusion from the dependencies.Get(...)
call so you only fetch the actually used assets (manifests.AgentManifests,
mirror.RegistriesConf, workflow.AgentWorkflow / agent.OptionalInstallConfig as
applicable).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8e9c903b-e447-4a5c-b2a8-91faad36afd7
📒 Files selected for processing (11)
pkg/asset/agent/image/agentimage.gopkg/asset/agent/image/baseiso.gopkg/asset/agent/image/ignition.gopkg/asset/agent/image/unconfigured_ignition.gopkg/asset/agent/joiner/clusterinfo.gopkg/asset/agent/joiner/clusterinfo_test.gopkg/asset/rhcos/iso.gopkg/asset/rhcos/iso_test.gopkg/asset/rhcos/releaseextract.gopkg/nodejoiner/.claude/skills/nodejoiner-command/SKILL.mdpkg/nodejoiner/.claude/skills/nodejoiner-command/acceptance/coreos-bootimages-data.md
💤 Files with no reviewable changes (1)
- pkg/asset/agent/joiner/clusterinfo.go
currently it contains just the additional file for removing the bootstrap images management from ClusterInfo
1f8983c to
df11856
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/asset/imagebased/image/baseiso.go (1)
88-94:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPotential nil pointer dereference on
format.Disk.The code checks if the
"iso"format exists in the map but does not verify thatformat.Diskis non-nil before accessing its fields. In stream-metadata-go'sImageFormat, theDiskfield is a pointer and could be nil if only other artifact types (e.g.,Kernel,Initramfs) are set for that format.🛡️ Proposed fix to add nil check
format, ok := metal.Formats["iso"] if !ok { return "", fmt.Errorf("no ISO found to download for %s", archName) } + if format.Disk == nil { + return "", fmt.Errorf("no ISO disk artifact found for %s", archName) + } url := format.Disk.Location sha := format.Disk.Sha256🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/imagebased/image/baseiso.go` around lines 88 - 94, The code uses metal.Formats["iso"] without verifying format.Disk is non-nil, risking a nil pointer when accessing format.Disk.Location and format.Disk.Sha256; update the logic after retrieving format (from metal.Formats["iso"]) to check that format.Disk != nil and return a clear error (including archName) if it's nil, before using format.Disk to set url and sha.
🧹 Nitpick comments (2)
pkg/asset/imagebased/image/baseiso_test.go (2)
52-58: 💤 Low valueRedundant manual env var restoration in defer.
t.Setenvautomatically restores the previous value when the test completes, so manually restoringXDG_CACHE_HOMEandOPENSHIFT_INSTALL_OS_IMAGE_OVERRIDEin the defer is unnecessary. Only theos.RemoveAll(tmpPath)cleanup is needed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/imagebased/image/baseiso_test.go` around lines 52 - 58, The defer in the test currently manually restores environment variables that were set with t.Setenv (XDG_CACHE_HOME and OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE); remove those manual t.Setenv restore calls from the deferred function and only perform the filesystem cleanup (os.RemoveAll(tmpPath)) there so t.Setenv can handle env restoration automatically (look for the defer func containing t.Setenv and os.RemoveAll in baseiso_test.go).
17-28: ⚖️ Poor tradeoffTest coverage gap: the non-override code path is untested.
The test only covers the
OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDEbranch. The primary production code path that callsdownloadBaseIso()→assetrhcos.GetMetalArtifact()has no unit test coverage. Consider adding a test case for the non-override path, or confirm that integration tests adequately cover this functionality.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/imagebased/image/baseiso_test.go` around lines 17 - 28, Add a unit test path in TestBaseIso_Generate that exercises the non-override branch by leaving OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE unset/empty and asserting the code calls downloadBaseIso which in turn uses assetrhcos.GetMetalArtifact; mock or stub assetrhcos.GetMetalArtifact (and any download or HTTP calls) to return a controlled filename/reader and verify expectedBaseIsoFilename and behavior for the downloadBaseIso path, ensuring the non-override production flow is covered (reference TestBaseIso_Generate, OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE, downloadBaseIso, assetrhcos.GetMetalArtifact).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/asset/imagebased/image/baseiso.go`:
- Around line 88-94: The code uses metal.Formats["iso"] without verifying
format.Disk is non-nil, risking a nil pointer when accessing
format.Disk.Location and format.Disk.Sha256; update the logic after retrieving
format (from metal.Formats["iso"]) to check that format.Disk != nil and return a
clear error (including archName) if it's nil, before using format.Disk to set
url and sha.
---
Nitpick comments:
In `@pkg/asset/imagebased/image/baseiso_test.go`:
- Around line 52-58: The defer in the test currently manually restores
environment variables that were set with t.Setenv (XDG_CACHE_HOME and
OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE); remove those manual t.Setenv restore calls
from the deferred function and only perform the filesystem cleanup
(os.RemoveAll(tmpPath)) there so t.Setenv can handle env restoration
automatically (look for the defer func containing t.Setenv and os.RemoveAll in
baseiso_test.go).
- Around line 17-28: Add a unit test path in TestBaseIso_Generate that exercises
the non-override branch by leaving OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE
unset/empty and asserting the code calls downloadBaseIso which in turn uses
assetrhcos.GetMetalArtifact; mock or stub assetrhcos.GetMetalArtifact (and any
download or HTTP calls) to return a controlled filename/reader and verify
expectedBaseIsoFilename and behavior for the downloadBaseIso path, ensuring the
non-override production flow is covered (reference TestBaseIso_Generate,
OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE, downloadBaseIso,
assetrhcos.GetMetalArtifact).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 32baf4ab-59a7-4cb3-b80f-00457ff08a03
📒 Files selected for processing (2)
pkg/asset/imagebased/image/baseiso.gopkg/asset/imagebased/image/baseiso_test.go
- removed CoreOSBuildFetcher, defaultCoreOSStreamGetter and streamGetter (replaced by a direct call to rhcos.FetchCoreOSBuild) - removed OSImage/OSImageLocation from ClusterInfo
df11856 to
d1b6a22
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/asset/imagebased/image/baseiso_test.go (1)
17-71: 💤 Low valueTest coverage reduced to only the override scenario.
The test now only covers the
OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDEpath. The previous architecture-specific and default cases appear to have been removed. If the default path (non-override) is critical for this asset, consider adding a test case that exercises theGetMetalArtifactfallback, or confirm that this path is adequately covered by integration tests or tests inpkg/asset/rhcos/iso_test.go.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/asset/imagebased/image/baseiso_test.go` around lines 17 - 71, The test TestBaseIso_Generate was narrowed to only the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE path; restore coverage by adding cases exercising the default (no override) flow and the architecture-specific fallback that exercises BaseIso.Generate's call to GetMetalArtifact (or the rhcos ISO retrieval path), e.g., add test cases where OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE is empty and where arch-specific behavior is required, create appropriate httptest servers or mocks for GetMetalArtifact responses, and assert the resulting baseIso.File.Filename matches the expected default/arch filenames; reference TestBaseIso_Generate and BaseIso.Generate (and GetMetalArtifact/rhcos ISO retrieval) when locating the code to extend.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@pkg/asset/imagebased/image/baseiso_test.go`:
- Around line 17-71: The test TestBaseIso_Generate was narrowed to only the
OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE path; restore coverage by adding cases
exercising the default (no override) flow and the architecture-specific fallback
that exercises BaseIso.Generate's call to GetMetalArtifact (or the rhcos ISO
retrieval path), e.g., add test cases where OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE
is empty and where arch-specific behavior is required, create appropriate
httptest servers or mocks for GetMetalArtifact responses, and assert the
resulting baseIso.File.Filename matches the expected default/arch filenames;
reference TestBaseIso_Generate and BaseIso.Generate (and GetMetalArtifact/rhcos
ISO retrieval) when locating the code to extend.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 666a873c-81aa-4c42-b813-62467f541fc3
📒 Files selected for processing (14)
pkg/asset/agent/image/agentimage.gopkg/asset/agent/image/baseiso.gopkg/asset/agent/image/ignition.gopkg/asset/agent/image/unconfigured_ignition.gopkg/asset/agent/joiner/clusterinfo.gopkg/asset/agent/joiner/clusterinfo_test.gopkg/asset/imagebased/image/baseiso.gopkg/asset/imagebased/image/baseiso_test.gopkg/asset/rhcos/iso.gopkg/asset/rhcos/iso_test.gopkg/asset/rhcos/releaseextract.gopkg/infrastructure/baremetal/bootstrap.gopkg/nodejoiner/.claude/skills/nodejoiner-command/SKILL.mdpkg/nodejoiner/.claude/skills/nodejoiner-command/acceptance/coreos-bootimages-data.md
💤 Files with no reviewable changes (1)
- pkg/asset/agent/joiner/clusterinfo.go
✅ Files skipped from review due to trivial changes (3)
- pkg/nodejoiner/.claude/skills/nodejoiner-command/acceptance/coreos-bootimages-data.md
- pkg/nodejoiner/.claude/skills/nodejoiner-command/SKILL.md
- pkg/asset/agent/image/unconfigured_ignition.go
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/asset/rhcos/releaseextract.go
- pkg/asset/imagebased/image/baseiso.go
|
/retest |
|
/retest-required |
|
I don't have powers here to approve or lgtm, but based on the Slack discussions we have had this change makes total sense from my point of view. |
|
/retest |
|
@andfasano: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
(Followup from the discussions in #10534)
This patch simplifies the node-joiner code, so that now it retrieves the bootimages data from the embedded streams (instead of fetching them from the
coreos-bootimagesConfigMap in the target cluster).As previously discussed, the advantages of this new approach are:
osImageStreamadoption for selecting the current steamSummary by CodeRabbit
Refactor
Tests
Documentation