PCI: hv: Reserve hv_pci swiotlb from buddy and publish via sysfs#260
Draft
benhillis wants to merge 1 commit into
Draft
PCI: hv: Reserve hv_pci swiotlb from buddy and publish via sysfs#260benhillis wants to merge 1 commit into
benhillis wants to merge 1 commit into
Conversation
The old early_param parsed hv_pci_swiotlb=<base>,<size> and reserved the
host-supplied physical address with memblock_reserve(), which does not
validate that the range is backed by EPT. Under Hyper-V page-reporting
the backing for a nominally usable e820 range can be absent, so the
memset() inside swiotlb_init_io_tlb_pool() triple-faulted the guest.
Pick the base in the guest instead:
* core_initcall calls alloc_contig_pages(__GFP_DMA32 | __GFP_ZERO) for
a kernel-owned, contiguous, below-4G range. __GFP_ZERO faults the
pages in, and kernel ownership keeps page reporting away. Gated on
CONFIG_CONTIG_ALLOC; without it the dedicated pool is skipped.
* (base, size) is exposed via DRIVER_ATTR_RO under
/sys/bus/vmbus/drivers/hv_pci/swiotlb_{base,size} so userspace can
forward the real GPA to the host-side device backend.
swiotlb has no destroy_pool() counterpart, so the pages are leaked on
driver unload; hv_pci is rarely hot-replaced and the pool is bounded.
Signed-off-by: Ben Hillis <benhillis@microsoft.com>
9077664 to
9bc4922
Compare
There was a problem hiding this comment.
Pull request overview
Reworks how the hv_pci driver provisions its dedicated swiotlb pool. Instead of accepting a host-supplied GPA (which could be reclaimed by Hyper-V page-reporting and triple-fault the guest), the driver now reserves a contiguous DMA32 range from the buddy allocator at core_initcall time and exposes the resulting base/size via /sys/bus/vmbus/drivers/hv_pci/swiotlb_{base,size} for a userspace agent to forward to the host.
Changes:
- Replaces
<base>,<size>cmdline parsing with size-onlyhv_pci_swiotlb=<size>, 2 MiB aligned. - Adds a
core_initcall(hv_pci_swiotlb_alloc_pool) that usesalloc_contig_pages(GFP_KERNEL|__GFP_DMA32|__GFP_ZERO, …)to back the pool, gated byCONFIG_CONTIG_ALLOCwith a no-op fallback. - Adds
DRIVER_ATTR_RO(swiotlb_base/size)sysfs files published aftervmbus_driver_register()and removed on exit; backing pages are intentionally leaked on driver unload.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
479
to
489
| static int __init early_hv_pci_swiotlb(char *p) | ||
| { | ||
| hv_pci_swiotlb_base = memparse(p, &p); | ||
| if (*p == ',') | ||
| hv_pci_swiotlb_size = memparse(p + 1, NULL); | ||
| if (hv_pci_swiotlb_base && hv_pci_swiotlb_size) | ||
| memblock_reserve(hv_pci_swiotlb_base, hv_pci_swiotlb_size); | ||
| return 0; | ||
| if (!p || !*p) | ||
| return 0; | ||
|
|
||
| hv_pci_swiotlb_size = memparse(p, NULL); | ||
| if (hv_pci_swiotlb_size) | ||
| hv_pci_swiotlb_size = ALIGN(hv_pci_swiotlb_size, SZ_2M); | ||
|
|
||
| return 0; | ||
| } |
Comment on lines
479
to
489
| static int __init early_hv_pci_swiotlb(char *p) | ||
| { | ||
| hv_pci_swiotlb_base = memparse(p, &p); | ||
| if (*p == ',') | ||
| hv_pci_swiotlb_size = memparse(p + 1, NULL); | ||
| if (hv_pci_swiotlb_base && hv_pci_swiotlb_size) | ||
| memblock_reserve(hv_pci_swiotlb_base, hv_pci_swiotlb_size); | ||
| return 0; | ||
| if (!p || !*p) | ||
| return 0; | ||
|
|
||
| hv_pci_swiotlb_size = memparse(p, NULL); | ||
| if (hv_pci_swiotlb_size) | ||
| hv_pci_swiotlb_size = ALIGN(hv_pci_swiotlb_size, SZ_2M); | ||
|
|
||
| return 0; | ||
| } |
Comment on lines
+4352
to
+4353
| hv_pci_swiotlb_pages = pages; | ||
| hv_pci_swiotlb_nr_pages = nr_pages; |
Comment on lines
+4263
to
+4280
| static void hv_pci_swiotlb_unpublish(void) | ||
| { | ||
| driver_remove_file(&hv_pci_drv.driver, &driver_attr_swiotlb_size); | ||
| driver_remove_file(&hv_pci_drv.driver, &driver_attr_swiotlb_base); | ||
| } | ||
|
|
||
| static void hv_pci_swiotlb_publish(void) | ||
| { | ||
| if (driver_create_file(&hv_pci_drv.driver, &driver_attr_swiotlb_base) || | ||
| driver_create_file(&hv_pci_drv.driver, &driver_attr_swiotlb_size)) { | ||
| pr_warn("hv_pci: failed to publish swiotlb range to sysfs\n"); | ||
| hv_pci_swiotlb_unpublish(); | ||
| } | ||
| } | ||
|
|
||
| static void __exit exit_hv_pci_drv(void) | ||
| { | ||
| if (hv_pci_swiotlb_pool) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Reserve the dedicated hv_pci swiotlb pool from the buddy allocator at
core_initcalltime and publish the resulting(base, size)under/sys/bus/vmbus/drivers/hv_pci/swiotlb_{base,size}so userspace can forward the real GPA to the host-side device backend. This replaces the old "host dictates a GPA" flow.Why
WSL container test runs intermittently saw the guest die with
WorkerExitType=StoppedOnReset WorkerExitDetail=TripleFault WorkerExitInitiator=GuestOSbetweenio scheduler mq-deadline registeredand the next initcall. Root cause:memblock_reserve()accepts ranges that are not actually backed by EPT, andswiotlb_create_pool()->swiotlb_init_io_tlb_pool()thenmemsets 64 MiB of unbacked pages.What changed
hv_pci_swiotlb=<size>is now the only accepted form. The legacy<base>,<size>form is rejected withpr_warn(memparse would otherwise silently treat the leading hex value as the size).core_initcall(hv_pci_swiotlb_alloc_pool)asks the buddy allocator for a contiguous DMA32 range viaalloc_contig_pages(__GFP_DMA32 | __GFP_ZERO, first_online_node, &node_online_map).__GFP_ZEROfaults every page in via the page allocator, so by the timeswiotlb_create_pool()runs the memory is known-good. Kernel ownership keeps Hyper-V page reporting from yanking the backing.CONFIG_CONTIG_ALLOCwith a no-op stub fallback.(base, size)published viaDRIVER_ATTR_ROoncevmbus_driver_register()succeeds.Validation
scripts/checkpatch.pl --strict -g HEAD-> 0 errors, 0 warnings, 0 checks, 209 lines checked.make W=1 drivers/pci/controller/pci-hyperv.o-> no new warnings.swiotlb=force hv_pci_swiotlb=64M:dmesg:hv_pci: reserved swiotlb pool [0x0000000008000000..0x000000000c000000)/sys/bus/vmbus/drivers/hv_pci/swiotlb_base->0x8000000/sys/bus/vmbus/drivers/hv_pci/swiotlb_size->67108864Notes
swiotlbhas nodestroy_pool()counterpart toswiotlb_create_pool(), so the backing pages are deliberately leaked on driver unload. hv_pci is rarely hot-replaced and the pool is bounded (default 64 MiB).