Skip to content

BLS shared entries GC fix#2108

Open
Johan-Liebert1 wants to merge 4 commits intobootc-dev:mainfrom
Johan-Liebert1:gc-fix
Open

BLS shared entries GC fix#2108
Johan-Liebert1 wants to merge 4 commits intobootc-dev:mainfrom
Johan-Liebert1:gc-fix

Conversation

@Johan-Liebert1
Copy link
Copy Markdown
Collaborator

boot/config: Add method to get boot artifact name

Add a method in BLSConfig and Grub Menuconfig to get the boot artifact
name, i.e. get the name of the UKI or the name of the directory
containing the Kernel + Initrd. The names are stripped of all our custom
prefixes and suffixes, so basically they return the verity digest part
of the name.

This is useful for GC-ing Kernel + Initrd that are shared among multiple
deployments since we can't rely on the composefs= parameter in the
options as the cmdline verity digest might be different than the verity
digest of the shared Kernel + Initrd.

Tests written by Claude Code (Opus)


composefs/boot: Change the way we find sharable vmlinuz + initrd

Instead of looking in the ".origin" files and trying to match the
boot_digest to digests in other origin files, we now simply re-compute
the sha256sum for vmlinuz + initrd for all boot entries present.

This fixes the bug that arises after mutiple upgrades where the original
deployment that created the boot entry has been garbage collected, so we
end up linking to another deployment that does have the same boot
digest, but the verity digest doesn't match the verity digest used for
the name of the directory where we store the kernel + initrd pair.


composefs/gc: Fix shared entry GC even when they're in use

This fixes a bug where a shared Type1 entry would get GCd even when it's
in use due to the original image that created it being deleted. Combined
with the fact that we were comparing the fsverity digest in the options
field of the BLS config (which will be different than the name of the
directory containing the vmlinuz + initrd pair).

Now, we compare against the directory name when GC-ing boot binaries

Fixes: #2102


test: Add test for GC-ing shared Type1 entries

Add a test to make sure we do not GC shared Type1 entries when they're
still referenced

Add a method in BLSConfig and Grub Menuconfig to get the boot artifact
name, i.e. get the name of the UKI or the name of the directory
containing the Kernel + Initrd. The names are stripped of all our custom
prefixes and suffixes, so basically they return the verity digest part
of the name.

This is useful for GC-ing Kernel + Initrd that are shared among multiple
deployments since we can't rely on the composefs= parameter in the
options as the cmdline verity digest might be different than the verity
digest of the shared Kernel + Initrd.

Tests written by Claude Code (Opus)

Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
@bootc-bot bootc-bot bot requested a review from jmarrero March 31, 2026 09:03
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the handling of shared boot binaries in bootc-composefs by dynamically hashing binaries to identify duplicates and introducing a BootloaderEntry struct for better artifact management. It also updates garbage collection logic and expands integration tests. Key feedback includes a critical regression where boot binaries are no longer written during fresh installations, potential fragility in the duplicate detection logic when encountering incomplete directories, and performance concerns regarding repeated binary hashing.

Comment on lines +381 to 404
pub(crate) fn find_vmlinuz_initrd_duplicate(
storage: &Storage,
digest: &str,
) -> Result<Option<String>> {
let boot_dir = storage.bls_boot_binaries_dir()?;

for entry in boot_dir.entries_utf8()? {
let entry = entry?;
let dir_name = entry.file_name()?;

if !entry.file_type()?.is_dir() {
continue;
}

let config = depl
.open_dir()
.with_context(|| format!("Opening {depl_file_name}"))?
.read_to_string(format!("{depl_file_name}.origin"))
.context("Reading origin file")?;
let Some(..) = dir_name.strip_prefix(TYPE1_BOOT_DIR_PREFIX) else {
continue;
};

let ini = tini::Ini::from_string(&config)
.with_context(|| format!("Failed to parse file {depl_file_name}.origin as ini"))?;
let entry_digest = compute_boot_digest_type1(&boot_dir.open_dir(&dir_name)?)?;

match ini.get::<String>(ORIGIN_KEY_BOOT, ORIGIN_KEY_BOOT_DIGEST) {
Some(hash) => {
if hash == digest {
match symlink_to {
Some(ref mut prev) => prev.push(depl_file_name.to_string()),
None => symlink_to = Some(vec![depl_file_name.to_string()]),
}
}
}

// No SHASum recorded in origin file
// `symlink_to` is already none, but being explicit here
None => symlink_to = None,
};
if entry_digest == digest {
return Ok(Some(dir_name));
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The find_vmlinuz_initrd_duplicate function is fragile because it calls compute_boot_digest_type1 which fails if any directory matching the TYPE1_BOOT_DIR_PREFIX is missing the expected vmlinuz or initrd files. If a previous operation left a stray or corrupted directory in /boot, this function will cause bootc upgrade to fail entirely. It would be safer to skip directories that cannot be hashed.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be fixed by the GC, but the GC would need to be run manually. Maybe we update the error message, or should we run the GC before and after the upgrade? @cgwalters I believe ostree does a GC before the upgrade operation?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm well one thing here is we should ensure we atomically generate the dir by using the "create tempdir + rename_exchange" model.

But yes we should also be robust to partially written dirs - and sure we could GC those before upgrade too, but it seems like it'd also work to ensure we ignore corrupted/partial ones instead of erroring out.

@Johan-Liebert1 Johan-Liebert1 force-pushed the gc-fix branch 2 times, most recently from a2f5216 to 2acf235 Compare March 31, 2026 11:59
@cgwalters
Copy link
Copy Markdown
Collaborator

Currently btw ostree never shares entries between deployments, but it makes sense to do so for karg differences!

/// Returns name of UKI in case of EFI config
/// Returns name of the directory containing Kernel + Initrd in case of Non-EFI config
///
/// The names are stripped of our custom prefix and suffixes, so this basically returns
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the "basically" here. Either it's a verity digest or it's not.

///
/// The names are stripped of our custom prefix and suffixes, so this basically returns
/// the verity digest part of the name
pub(crate) fn boot_artifact_name(&self) -> Result<&str> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in that vein...and yes this is not a new problem but I'd like to start using a formal type for string digests. https://docs.rs/oci-spec/latest/oci_spec/image/struct.Digest.html is one candidate.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put up composefs/composefs-rs#278 related to this

}

#[test]
fn test_boot_artifact_name_efi_success() -> Result<()> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rework these to be table-driven tests, it really helps reduce the amount of code.

Instead of looking in the ".origin" files and trying to match the
boot_digest to digests in other origin files, we now simply re-compute
the sha256sum for vmlinuz + initrd for all boot entries present.

This fixes the bug that arises after mutiple upgrades where the original
deployment that created the boot entry has been garbage collected, so we
end up linking to another deployment that does have the same boot
digest, but the verity digest doesn't match the verity digest used for
the name of the directory where we store the kernel + initrd pair.

Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
This fixes a bug where a shared Type1 entry would get GCd even when it's
in use due to the original image that created it being deleted. Combined
with the fact that we were comparing the fsverity digest in the options
field of the BLS config (which will be different than the name of the
directory containing the vmlinuz + initrd pair).

Now, we compare against the directory name when GC-ing boot binaries

Fixes: bootc-dev#2102

Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
Add a test to make sure we do not GC shared Type1 entries when they're
still referenced

Remove openh264 from repos to speed up installs

Signed-off-by: Pragyan Poudyal <pragyanpoudyal41999@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ComposeFS-Native] Garbage collection sometimes removes all files from /Linux in EFI partition

2 participants