Open
Description
Rationale
Shadow FDs copy LKL file contents into a memfd at open time, enabling native host kernel mmap (critical for dynamic linkers that mmap .so files with MAP_PRIVATE). Three inherent limitations:
- Point-in-time: if the LKL file changes after open, the shadow is stale. The guest sees old data.
- 256MB cap (KBOX_SHADOW_MAX_SIZE): files larger than 256MB cannot be shadowed, returning EFBIG at open time.
- O_RDONLY only: files opened for writing use virtual FDs and cannot be mmapped by the host kernel.
Additionally, shadowed files become writable from the host side since the tracee receives a memfd where write() can succeed despite the O_RDONLY open mode -- a known semantic quirk.
These limitations are acceptable for the primary use case (dynamic linker .so loading) but restrict workloads involving large read-only datasets or files that change after open.
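To make the snapshot semantics concrete, here is a minimal sketch of a size-capped copy into a memfd. It is not the code in src/shadow-fd.c: the function name `shadow_snapshot` is hypothetical, and host `fstat`/`pread` stand in for the LKL reads the real implementation performs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy src_fd into a fresh memfd, refusing files
 * larger than max_size. Returns the memfd on success, -errno on failure.
 * Host fstat/pread are stand-ins for the LKL-side reads. */
static int shadow_snapshot(int src_fd, off_t max_size)
{
    assert(max_size >= 0);

    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return -errno;
    if (st.st_size > max_size)
        return -EFBIG; /* mirrors the cap behaviour described above */

    int mfd = memfd_create("shadow-sketch", MFD_CLOEXEC);
    if (mfd < 0)
        return -errno;

    char buf[65536];
    off_t off = 0;
    while (off < st.st_size) {
        size_t want = sizeof(buf);
        if ((off_t) want > st.st_size - off)
            want = (size_t) (st.st_size - off);

        ssize_t n = pread(src_fd, buf, want, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            int err = errno;
            close(mfd);
            return -err;
        }
        if (n == 0)
            break; /* source shrank while copying */

        /* Write the whole chunk, tolerating short writes. */
        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(mfd, buf + done, (size_t) (n - done), off + done);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                int err = errno;
                close(mfd);
                return -err;
            }
            done += w;
        }
        off += n;
    }
    return mfd; /* contents are frozen at this point in time */
}
```

Because the copy happens once, any later write to the source is invisible through the returned descriptor, which is the point-in-time limitation listed above.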
Proposed Changes (priority order)
- Configurable size cap: allow users to set the shadow size limit via environment variable or CLI flag, with 256MB as default. Users with sufficient memory can raise it for specific workloads. Include an upper bound to prevent OOM.
- Document limitations: add user-facing documentation explaining mmap behavior, snapshot semantics, and the size cap.
- Staleness detection: compare LKL file mtime/size with snapshot metadata on fstat. Useful for observability and debugging only -- cannot repair already-established mmaps.
- Future exploration: lazy demand-paged population via userfaultfd. Significant scope increase (new kernel dependency, portability concerns, security implications).
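The first and third proposals can be sketched as follows. Everything named here is an assumption for illustration: the `SHADOW_HARD_MAX` value of 4GB, the megabyte unit, the function names, and the `shadow_meta` struct are not part of the existing code, and host `struct stat` stands in for whatever metadata the LKL side exposes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

#define SHADOW_DEFAULT_MAX (256ULL << 20) /* 256MB default, as in the issue */
#define SHADOW_HARD_MAX    (4ULL << 30)   /* hypothetical upper bound (4GB) */

/* Parse a cap given in megabytes. Missing or malformed input falls back to
 * the default; values above the hard bound are clamped to it. */
static uint64_t shadow_cap_from_string(const char *s)
{
    if (!s || s[0] < '0' || s[0] > '9')
        return SHADOW_DEFAULT_MAX; /* also rejects signs and whitespace */

    errno = 0;
    char *end;
    unsigned long long mb = strtoull(s, &end, 10);
    if (errno != 0 || *end != '\0' || mb == 0)
        return SHADOW_DEFAULT_MAX;
    if (mb > (SHADOW_HARD_MAX >> 20))
        return SHADOW_HARD_MAX;
    return (uint64_t) mb << 20;
}

/* Metadata recorded when the snapshot is taken (hypothetical layout). */
struct shadow_meta {
    off_t size;
    struct timespec mtime;
};

/* Report whether the source has changed since the snapshot. This is for
 * observability only; it does not touch existing mappings. */
static bool shadow_is_stale(const struct shadow_meta *snap, const struct stat *now)
{
    assert(snap && now);
    return snap->size != now->st_size ||
           snap->mtime.tv_sec != now->st_mtim.tv_sec ||
           snap->mtime.tv_nsec != now->st_mtim.tv_nsec;
}
```

A caller might read the cap once at startup with something like `shadow_cap_from_string(getenv("KBOX_SHADOW_MAX_MB"))`, where the variable name is again only a placeholder.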
Considerations
- Current design is correct for its intended purpose; avoid over-engineering the hot path
- Raising the cap without limits risks OOM on the host -- enforce an upper bound or memory budget
- Re-snapshotting on staleness could cause subtle bugs if the guest has already mmapped pages from the old snapshot
- Write-back would contradict the O_RDONLY open mode invariant
- The host-writable memfd quirk is documented but not enforced; consider memfd_create with MFD_NOEXEC_SEAL or F_SEAL_WRITE if the kernel supports it
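As a rough illustration of the sealing idea in the last point, the sketch below seals a populated memfd so that later write() calls fail while MAP_PRIVATE mappings keep working. The helper name `shadow_seal_readonly` is hypothetical. It assumes the memfd was created with MFD_ALLOW_SEALING, since F_SEAL_WRITE is applied through fcntl(F_ADD_SEALS) rather than passed to memfd_create. MFD_NOEXEC_SEAL is left out because it governs the executable bit rather than writes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: freeze a memfd after its contents are in place.
 * Requires a memfd created with MFD_ALLOW_SEALING.
 * Returns 0 on success, -errno on failure (for example -EPERM when
 * sealing was not allowed at creation, or -EBUSY when a shared writable
 * mapping of the memfd still exists). */
static int shadow_seal_readonly(int mfd)
{
    assert(mfd >= 0);

    /* Block writes and resizing, then lock the seal set itself. */
    int seals = F_SEAL_WRITE | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_SEAL;
    if (fcntl(mfd, F_ADD_SEALS, seals) < 0)
        return -errno;
    return 0;
}
```

Sealing would have to happen after the snapshot copy completes, because the copy itself writes to the memfd.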
References
- src/shadow-fd.h: KBOX_SHADOW_MAX_SIZE definition (256MB)
- src/shadow-fd.c: EFBIG enforcement; snapshot creation (pread64 loop into memfd)
- src/seccomp-dispatch.c: O_RDONLY gating for shadow creation
- tests/guest/errno-test.c: documents the host-writable memfd quirk