Skip to content

fix(bootstrap): align midstream with main and fix rootful Podman cgroupns#21

Merged
maxamillion merged 10 commits intomidstreamfrom
merge/align-midstream-with-main
Apr 7, 2026
Merged

fix(bootstrap): align midstream with main and fix rootful Podman cgroupns#21
maxamillion merged 10 commits intomidstreamfrom
merge/align-midstream-with-main

Conversation

@maxamillion
Copy link
Copy Markdown

Summary

Merge main into midstream to bring the midstream branch into alignment with upstream changes, and fix a rootful Podman cgroup namespace bug discovered during end-to-end testing on Fedora 43.

Changes

Merge alignment (7 upstream commits)

Merge conflict resolution (3 files)

  • crates/openshell-bootstrap/src/docker.rs: Combined midstream's container runtime detection imports with main's node_name import; added both runtime and resume parameters to ensure_container
  • crates/openshell-bootstrap/src/lib.rs: Updated ensure_container call site to pass both runtime and resume arguments
  • install.sh: Kept main's security hardening (--no-same-owner, --no-same-permissions, explicit filename in tar) with midstream's indentation

Bug fix: rootful Podman cgroupns

  • The code unconditionally set cgroupns=PRIVATE for all Podman, but rootful Podman (uid 0) needs cgroupns=HOST — same as Docker — because root has full write access to the host cgroup tree
  • Without this fix, k3s kubelet fails with missing controllers: cpu, cpuset, hugetlb, memory, pids
  • Added is_rootless() detection to container_runtime.rs using existing current_uid() pattern

Testing

End-to-end tested on ephemeral Fedora 43 x86_64 system (Linode, 4 CPU, 7.7 GB RAM):

Build from source:

  • cargo check --workspace passes locally
  • cargo build --release --bin openshell succeeds on Fedora 43

Rootful Podman (as root):

  • setup-podman-linux.sh --rootful completes successfully
  • openshell gateway start deploys and becomes healthy
  • openshell sandbox create creates sandbox and runs commands
  • openshell sandbox list shows sandbox in Ready state
  • openshell gateway destroy cleans up resources

Rootless Podman (as non-root testuser):

  • setup-podman-linux.sh (rootless) completes successfully
  • openshell gateway start deploys and becomes healthy
  • openshell sandbox create creates sandbox and runs commands
  • openshell sandbox list shows sandbox in Ready state
  • openshell gateway destroy cleans up resources

Fedora 43 prerequisites discovered during testing

  • Kernel modules iptable_nat, iptable_filter, iptable_mangle must be loaded (Fedora 43 defaults to nftables-only)
  • br_netfilter module required for pod-to-service networking
  • IMAGE_TAG=midstream required when using LobsterTrap container images

Checklist

  • Follows Conventional Commits
  • Code compiles without errors
  • Tested on target platform (Fedora 43, rootful + rootless Podman)

johntmyers and others added 9 commits April 2, 2026 12:54
…VEs (NVIDIA#737)

- tar 0.4.44 -> 0.4.45 (CVE-2026-33055: PAX size header skip)
- aws-lc-rs 1.16.1 -> 1.16.2 / aws-lc-sys 0.38.0 -> 0.39.1
  (BDSA-2026-5232: name constraints bypass in certificate validation)
- Pygments 2.19.2 -> 2.20.0
  (BDSA-2026-5113 / CVE-2026-4539: catastrophic regex backtracking)
…VIDIA#742)

Remove test_live_policy_update_and_logs and
test_live_policy_update_from_empty_network_policies from the Python e2e
suite. Both used a manual 90s poll loop against GetSandboxPolicyStatus
that flaked in CI with 'Policy v2 was not loaded within 90s'.

Add e2e/rust/tests/live_policy_update.rs with two replacement tests
that exercise the same policy lifecycle (version bumping, hash
idempotency, policy list history) through the CLI using the built-in
--wait flag for reliable synchronization.
… OS-23) (NVIDIA#744)

* fix(install): restrict tar extraction to expected binary member

Prevents CWE-22 path traversal by extracting only the expected APP_NAME
member instead of the full archive contents. Adds --no-same-owner and
--no-same-permissions for defense-in-depth.

OS-20

* fix(deploy): quote registry credentials in YAML heredocs

Wraps username/password values with a yaml_quote helper to prevent YAML
injection from special characters in registry credentials (CWE-94).
Applied to all three heredoc blocks that emit registries.yaml auth.

OS-23

* fix(server): redact session token in SSH tunnel rate-limit log

Logs only the last 4 characters of bearer tokens to prevent credential
exposure in log aggregation systems (CWE-532).

OS-18

* fix(server): escape gateway_display in auth connect page

Applies html_escape() to the Host/X-Forwarded-Host header value before
rendering it into the HTML template, preventing HTML injection (CWE-79).

OS-17

* fix(server): prevent XSS via code param with validation and proper JS escaping

Adds server-side validation rejecting confirmation codes that do not
match the CLI-generated format, replaces manual JS string escaping with
serde_json serialization (handling U+2028/U+2029 line terminators), and
adds a Content-Security-Policy header with nonce-based script-src.

OS-16

* fix(sandbox): add byte cap and idle timeout to streaming inference relay

Prevents resource exhaustion from upstream inference endpoints that stream
indefinitely or hold connections open. Adds a 32 MiB total body limit
and 30-second per-chunk idle timeout (CWE-400).

OS-21

* fix(policy): narrow port field from u32 to u16 to reject invalid values

Prevents meaningless port values >65535 from being accepted in policy
YAML definitions. The proto field remains uint32 (protobuf has no u16)
with validation at the conversion boundary.

OS-22

* fix(deps): migrate from archived serde_yaml to serde_yml

Replaces serde_yaml 0.9 (archived, RUSTSEC-2024-0320) with serde_yml
0.0.12, a maintained API-compatible fork. All import sites updated
across openshell-policy, openshell-sandbox, and openshell-router.

OS-19

* fix(server): re-validate sandbox-submitted security_notes and cap hit_count

The gateway now re-runs security heuristics on proposed policy chunks
instead of trusting sandbox-provided security_notes, validates host
wildcards, caps hit_count at 100, and clamps confidence to [0,1]. The
TUI approve-all path is updated to use ApproveAllDraftChunks RPC which
respects the security_notes filtering gate (CWE-284, confused deputy).

OS-15

* chore: apply cargo fmt and update Cargo.lock for serde_yml

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
Resolves merge conflicts in:
- crates/openshell-bootstrap/src/docker.rs: integrate both container
  runtime detection (midstream) and sandbox persistence/resume (main)
- crates/openshell-bootstrap/src/lib.rs: pass both runtime and resume
  parameters to ensure_container
- install.sh: keep main's security hardening (--no-same-owner,
  --no-same-permissions) with midstream's tab indentation
Rootful Podman (uid 0) has full write access to the host cgroup tree,
so it should use cgroupns=host like Docker. Only rootless Podman needs
cgroupns=private because user namespace restrictions make the host
cgroup tree read-only.

Previously, all Podman instances used PRIVATE cgroupns mode, causing
k3s kubelet to fail with 'missing controllers: cpu, cpuset, hugetlb,
memory, pids' when running as root.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 6, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 41ef1fc5-d4ce-466e-81a4-e4aafc8c63e1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch merge/align-midstream-with-main

Comment @coderabbitai help to get the list of available commands and usage tips.

- Apply cargo fmt to container_runtime.rs (import order) and docker.rs
  (line-break style)
- Make install.sh REPO overridable via OPENSHELL_REPO env var so the
  e2e tests can point at the upstream NVIDIA repo (which has actual
  releases) when running against fork repos that have no releases
- Update e2e/install helpers (sh and fish) to default OPENSHELL_REPO
  to NVIDIA/OpenShell for test runs
@maxamillion maxamillion merged commit 9a4c43a into midstream Apr 7, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants