Skip to content

cmd, commit: change --pause default to true#28801

Open
giuseppe wants to merge 3 commits into
containers:mainfrom
giuseppe:commit-pause-by-default
Open

cmd, commit: change --pause default to true#28801
giuseppe wants to merge 3 commits into
containers:mainfrom
giuseppe:commit-pause-by-default

Conversation

@giuseppe
Copy link
Copy Markdown
Member

Pause the container by default during commit. It is safer as it avoids conflicts, and potentially security issues, when another process is accessing the container rootfs.

Originally this was not done because it was a breaking change and rootless containers weren't able to use the freezer cgroup controller. Now that we support only cgroup v2, there is no gap anymore with root (exotic configurations can still use --pause=false).

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match
    the sign-off email address. See CONTRIBUTING.md
    for more information.
  • Referenced issues using Fixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed)
  • Documentation has been updated (or no documentation changes are needed)
  • All commits pass make validatepr (format/lint checks)
  • Release note entered in the section below (or None if no user-facing changes)

Does this PR introduce a user-facing change?

Now commit will pause the container by default

@giuseppe giuseppe added 6.0 Breaking changes for Podman 6.0 No New Tests Allow PR to proceed without adding regression tests labels May 27, 2026
@github-actions github-actions Bot added the kind/api-change Change to remote API; merits scrutiny label May 27, 2026
@giuseppe giuseppe changed the title cmd, commit: change --pause default to true [RFC] cmd, commit: change --pause default to true May 27, 2026
@giuseppe giuseppe force-pushed the commit-pause-by-default branch from 6a942fe to 8a8e25c Compare May 27, 2026 09:41
Copy link
Copy Markdown
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre existing but ContainerStateStopping is also technically a running container state so should that also cause a pause? I mean commit take the container lock so the other stop process will not kill it until commit will be done then which seems fine.

Also pause does error for containers with NoCgroups set, no idea how common this is but I guess breaking them by default is not nice? Or do we think it is best to make them pass --pause=false as explicit acknowledgement of the risks

if err := c.pause(); err != nil {
return nil, fmt.Errorf("pausing container %q to commit: %w", c.ID(), err)
}
handlerName := fmt.Sprintf("commit-unpause-%s", c.ID())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth noting that the container is locked here and thus we should never have two caller getting here at the same time as part of the podman service?

return nil, fmt.Errorf("pausing container %q to commit: %w", c.ID(), err)
}
handlerName := fmt.Sprintf("commit-unpause-%s", c.ID())
if err := shutdown.Register(handlerName, func(sig os.Signal) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a race condition, the process can get killed between pause() and the call here, the handler must be registered before pause() is called.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I'll fix this one

Comment thread test/e2e/commit_test.go Outdated
Expect(ec).To(Equal(0))
Expect(podmanTest.NumberOfContainers()).To(Equal(1))

// Default is --pause=true, test explicit --pause=false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is is not testing any relevant, there is nothing that ensure that the process was indeed paused or not paused so the flag might as well be a NOP.

But then I don't know of a good race free way to know if it was paused there either? Do we have something in the podman state we could show in inspect?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like:

python3 -c "
  import select, sys
  f = open(sys.argv[1])
  f.read()
  p = select.poll()
  p.register(f, select.POLLPRI)
  while True:
      p.poll()
      f.seek(0)
      echo('something happened')
  " /sys/fs/cgroup/cgroup.events

inside the container we could detect that something happened on the cgroup, but reading the value of the file is not race free. Maybe this is good enough?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am nor familiar with cgroups behaviours, if the event is proof enough that we got frozen then sure this sounds like a good idea.

Copy link
Copy Markdown
Member Author

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also pause does error for containers with NoCgroups set, no idea how common this is but I guess breaking them by default is not nice? Or do we think it is best to make them pass --pause=false as explicit acknowledgement of the risks

I think it is better to expect an explicit --pause=false in this case and understand there are some risks involved with this action.

return nil, fmt.Errorf("pausing container %q to commit: %w", c.ID(), err)
}
handlerName := fmt.Sprintf("commit-unpause-%s", c.ID())
if err := shutdown.Register(handlerName, func(sig os.Signal) error {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I'll fix this one

@giuseppe giuseppe force-pushed the commit-pause-by-default branch from 8a8e25c to 60044d2 Compare May 27, 2026 10:00
@mheon
Copy link
Copy Markdown
Member

mheon commented May 27, 2026

I'm a NACK on requiring --pause=false for non-cgroup containers - we should handle that natively. Requiring explicit input is just broken.


## DESCRIPTION
**podman commit** creates an image based on a changed *container*. The author of the image can be set using the **--author** OPTION. Various image instructions can be configured with the **--change** OPTION and a commit message can be set using the **--message** OPTION. The *container* and its processes aren't paused while the image is committed. If this is not desired, the **--pause** OPTION can be set to *true*. When the commit is complete, Podman prints out the ID of the new image.
**podman commit** creates an image based on a changed *container*. The author of the image can be set using the **--author** OPTION. Various image instructions can be configured with the **--change** OPTION and a commit message can be set using the **--message** OPTION. The *container* and its processes are paused while the image is committed. If this is not desired, the **--pause** OPTION can be set to *false*. When the commit is complete, Podman prints out the ID of the new image.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend adding something about security in here, give a reason for why this is useful

@giuseppe
Copy link
Copy Markdown
Member Author

I'm a NACK on requiring --pause=false for non-cgroup containers - we should handle that natively. Requiring explicit input is just broken.

how to make explicit that it is a security risk that doesn't happen with a sane configuration? That defeats this change, because in this way we make explicit that the user accepted the risk and commit is safe by default, if we make it only in some cases then we are still vulnerable to race conditions while accessing the rootfs

@Luap99
Copy link
Copy Markdown
Member

Luap99 commented May 27, 2026

I'm a NACK on requiring --pause=false for non-cgroup containers - we should handle that natively. Requiring explicit input is just broken.

how to make explicit that it is a security risk that doesn't happen with a sane configuration? That defeats this change, because in this way we make explicit that the user accepted the risk and commit is safe by default, if we make it only in some cases then we are still vulnerable to race conditions while accessing the rootfs

I mean logically using commit is rare, using no cgroups mode is rare, using both at the same time is does even rarer so I think requiring an explicit arg is fine in practise, we should just make sure the error/docs tells the user what do.

And if we decide to merge it like this/next week before 6.0 we can call it a breaking change if you want.

@mheon
Copy link
Copy Markdown
Member

mheon commented May 27, 2026

For reference, I am LGTM on the rest of the PR

@mheon
Copy link
Copy Markdown
Member

mheon commented May 27, 2026

@Luap99 I feel like supported configurations should not require outside intervention to function, and --cgroups=none is still a supported configuration. I'd be fine with throwing warnings by default that "Safety features have been DISABLED due to a lack of cgroups" or similar of course.

@giuseppe giuseppe force-pushed the commit-pause-by-default branch 2 times, most recently from f63eabf to 0b60208 Compare May 27, 2026 13:32
@giuseppe
Copy link
Copy Markdown
Member Author

@Luap99 I feel like supported configurations should not require outside intervention to function, and --cgroups=none is still a supported configuration. I'd be fine with throwing warnings by default that "Safety features have been DISABLED due to a lack of cgroups" or similar of course.

how can I differentiate between an explicit --pause=true (which I think should still fail with nocgroups) and the default being used without breaking ContainerCommitOptions?

giuseppe and others added 2 commits May 27, 2026 16:37
ContainerStateStopping is still a running state where processes are
active, so it should also be paused during commit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
When --pause defaults to true, a Ctrl-C during commit would leave the
container paused.  Register a shutdown handler that unpauses the
container on SIGINT/SIGTERM so it is always restored to its running
state.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the commit-pause-by-default branch from 0b60208 to 4769c4c Compare May 27, 2026 14:37
@giuseppe
Copy link
Copy Markdown
Member Author

@mtrmac PTAL

@giuseppe giuseppe changed the title [RFC] cmd, commit: change --pause default to true cmd, commit: change --pause default to true May 27, 2026
@giuseppe giuseppe force-pushed the commit-pause-by-default branch from 4769c4c to f4f14ae Compare May 28, 2026 15:42
@giuseppe
Copy link
Copy Markdown
Member Author

I am not happy with the special handling for nocgroups, it makes us unsafe in certain cases and I've reverted to the previous version.

I see that Docker does pause the container by default on commit and there is a --no-pause option to run without pausing, so I don't think we should be less safe than Docker.

@mtrmac
Copy link
Copy Markdown
Contributor

mtrmac commented May 28, 2026

I am not happy with the special handling for nocgroups, it makes us unsafe in certain cases and I've reverted to the previous version.

I see that Docker does pause the container by default on commit and there is a --no-pause option to run without pausing, so I don't think we should be less safe than Docker.

We are working on a a different kind of change that should, eventually, make the trade-off of features vs. performance (vs. compatibility) ~moot. Certainly, that code will be unproven and risky for a while, but maybe we don’t need to be so drastic as to break existing scripts in the default configuration.

@giuseppe
Copy link
Copy Markdown
Member Author

Certainly, that code will be unproven and risky for a while, but maybe we don’t need to be so drastic as to break existing scripts in the default configuration.

nocgroups is used rarely, so I don't think we will really break many scripts out there

@Luap99
Copy link
Copy Markdown
Member

Luap99 commented May 28, 2026

IMO this is about consistency, tar'ing all files while the container process can write to them can lead to "corrupted" exports as you end up with partial states between files. pause on its own would not safe us from reading half written files if we pause the process during writes, but at least files wont change while we read them which greatly reduces risk.

Comment thread test/system/126-commit.bats Outdated
# Start a background watcher before commit.
_cgroup_watch "$events_file" 10000 > "$watch_log" &
local watcher_pid=$!
sleep 0.5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at my new CI work 0.5 is not nearly good enough,
a better race free way would be to add a print between

p.register(fd, select.POLLPRI | select.POLLERR)
print("poll ready")
events = p.poll(int(sys.argv[2]))

and then loop cat the log file for that print before running podman commit

Comment thread test/system/126-commit.bats Outdated
type -p python3 &>/dev/null || skip "python3 needed for cgroup poll(2)"
if is_rootless && ! is_cgroupsv2; then
skip "'podman pause' (rootless) only works with cgroups v2"
fi
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though we removed all the cgroups conditions? As we not longer support v1 we should not bother with these checks and just drop them

Comment thread test/system/126-commit.bats Outdated

# bats test_tags=ci:parallel
@test "podman commit --pause default" {
skip_if_remote "commit --pause via api does not change the cgroup freeze state locally"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not true though?! Remote tests always run on the same system and must be. So you can read the cgroup info just fine.

I think testing remote is generally important so we know the API side works the same.

Pause the container by default during commit.  It is safer as it
avoids conflicts, and potentially security issues, when another
process is accessing the container rootfs.

Originally this was not done because it was a breaking change and
rootless containers weren't able to use the freezer cgroup controller.
Now that we support only cgroup v2, there is no gap anymore with
root (exotic configurations can still use --pause=false).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the commit-pause-by-default branch from f4f14ae to 69139f3 Compare May 28, 2026 16:39
@mheon
Copy link
Copy Markdown
Member

mheon commented May 28, 2026

I said this in Matrix, but I think I've been outvoted and am dropping my NACK

Copy link
Copy Markdown
Member

@Luap99 Luap99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.0 Breaking changes for Podman 6.0 kind/api-change Change to remote API; merits scrutiny No New Tests Allow PR to proceed without adding regression tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants