Skip to content

guest: unify pod model for V1, virtual pod, and V2 shim support#2699

Open
shreyanshjain7174 wants to merge 2 commits intomicrosoft:mainfrom
shreyanshjain7174:guest-pod-unification-v2
Open

guest: unify pod model for V1, virtual pod, and V2 shim support#2699
shreyanshjain7174 wants to merge 2 commits intomicrosoft:mainfrom
shreyanshjain7174:guest-pod-unification-v2

Conversation

@shreyanshjain7174
Copy link
Copy Markdown
Contributor

@shreyanshjain7174 shreyanshjain7174 commented Apr 22, 2026

The GCS guest runtime (internal/guest/runtime/hcsv2/uvm.go) tracks virtual pods separately from V1 sandbox containers — a dedicated VirtualPod type, seven exported methods, a parent cgroup manager, and a reverse-lookup map. V1 sandboxes have no pod-level tracking at all. Adding V2 shim support would need a third path.

This collapses all three into one: a private uvmPod type and a single pods map on Host. Every sandbox — V1, virtual pod, or V2 shim — goes through createPodInUVM, which allocates a cgroup under /pods/{sandboxID}. Workload containers nest at /pods/{sandboxID}/{containerID}. Container-to-pod membership is tracked via addContainerToPod. Cleanup in RemoveContainer is a single code path: remove the container from the pod, and when the sandbox container itself is removed, delete the pod's cgroup.

Cgroup hierarchy changes from:

/containers/{id}                         (V1 sandbox)
/containers/virtual-pods/{virtualPodID}  (virtual pod)

to:

/pods/{sandboxID}                        (all pod types)
/pods/{sandboxID}/{containerID}          (workload containers)

Standalone (non-CRI) containers keep their own cgroup at /pods/{id} with no pod entry — same isolation as before, just under the new prefix.

Network namespace teardown for virtual pod sandboxes is preserved: RemoveContainer skips RemoveNetworkNamespace for virtual pod sandbox containers since the host-driven path (TearDownNetworkingRemoveNetNSremoveNIC) handles adapter removal first.

cmd/gcs/main.go replaces the /containers/virtual-pods parent cgroup with /pods and drops the InitializeVirtualPodSupport call.

Tested E2E with both shims:

V1 shim (io.containerd.runhcs.v1) V2 shim (io.containerd.lcow.v2)
OCIBundlePath /run/gcs/c/<podId> /run/gcs/pods/<podId>/<podId>
Pod cgroup /sys/fs/cgroup/memory/pods/<podId> /sys/fs/cgroup/memory/pods/<podId>
/containers/virtual-pods/ absent absent

Comment thread cmd/gcs/main.go Outdated
Comment thread cmd/gcs/main.go Outdated
Comment thread cmd/gcs/main.go Outdated
Comment thread cmd/gcs/main.go Outdated
Comment thread internal/guest/runtime/hcsv2/container.go Outdated
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Comment thread internal/guest/runtime/hcsv2/uvm.go
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Comment thread internal/guest/runtime/hcsv2/workload_container.go
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Comment thread internal/guest/runtime/hcsv2/uvm.go
@rawahars rawahars requested a review from helsaawy May 4, 2026 06:35
@shreyanshjain7174 shreyanshjain7174 force-pushed the guest-pod-unification-v2 branch from ad3ee5f to f51f773 Compare May 4, 2026 06:46
@shreyanshjain7174 shreyanshjain7174 requested a review from rawahars May 4, 2026 07:54
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Comment thread internal/guest/runtime/hcsv2/uvm.go Outdated
Replace the separate VirtualPod tracking (dedicated type, exported
methods, parent cgroup manager, reverse-lookup map) with a unified
uvmPod type and a single pods map on Host. All pod types (V1 sandbox,
virtual pod, V2 shim) now go through the same code path:

- createPodInUVM allocates a cgroup under /pods/{sandboxID}
- RemoveContainer handles cleanup uniformly

Cgroup hierarchy changes from:
  /containers/{id}                         (V1 sandbox)
  /containers/virtual-pods/{virtualPodID}  (virtual pod)
to:
  /pods/{sandboxID}                        (all pod types)
  /pods/{sandboxID}/{containerID}          (workload containers)

Signed-off-by: Shreyansh Jain <shreyanshjain7174@gmail.com>
Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
@shreyanshjain7174 shreyanshjain7174 force-pushed the guest-pod-unification-v2 branch from f51f773 to 32802d9 Compare May 4, 2026 11:22
@shreyanshjain7174 shreyanshjain7174 requested a review from rawahars May 4, 2026 11:29
…ID assign

- Lock containersMutex over the entire createPodInUVM method instead of
  the double-check pattern.
- Assign sandboxID directly from annotation without intermediate sid
  variable in the early resolution block.

Signed-off-by: Shreyansh Jain <shreyanshjain7174@gmail.com>
Signed-off-by: Shreyansh Sancheti <shsancheti@microsoft.com>
spec.Linux.CgroupsPath = "/containers/" + id
// Set cgroup path under the pod.
sandboxID := id
if vpID := spec.Annotations[annotations.VirtualPodID]; vpID != "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: You have virtualSandboxID from above. You can consolidate them.

if vpID := spec.Annotations[annotations.VirtualPodID]; vpID != "" {
sandboxID = vpID
}
spec.Linux.CgroupsPath = "/pods/" + sandboxID
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do the following tests-

  1. Run a pod with cgroup resources set at pod level. Run 3 containers such that 2 normal containers do not consume much memory but 3rd one is OOM. Then the entire pod should be killed.
  2. Run a pod with cgroup resources set at pod level. Run 3 containers such that 2 normal containers do not consume much memory. 3rd one has cgroup limit set and is lower than pod limit. This time when 3rd one OOMs, only 3rd container should be killed.

}
}

// Normally we would be doing policy checking here at the start of our
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add these comments after modifying.

}
h.containersMutex.Unlock()
} else {
// Capture namespaceID if any because setupStandaloneContainerSpec clears the Windows section.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please add comments back.

return fmt.Errorf("virtual pod %s does not exist", virtualSandboxID)
h.pods[sid] = &uvmPod{
sandboxID: sid,
networkNamespace: nsID,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please check if you are using networkNamespace and cgroupPath anywhere.

}

// Register container with its pod.
h.containersMutex.Lock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this block to front after resolving sandboxID.
Return error if pod or container already exists.

spec.Linux.CgroupsPath = "/containers/" + id
}
// Set cgroup path under the sandbox's pod cgroup.
spec.Linux.CgroupsPath = "/pods/" + sbid + "/" + id
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you please convert the format as a constant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants