PCI: hv: Reserve hv_pci swiotlb from buddy and publish via sysfs #260

Draft: benhillis wants to merge 1 commit into linux-msft-wsl-6.18.y from user/benhillis/hv-pci-swiotlb-fix

@benhillis commented May 26, 2026

Summary

Reserve the dedicated hv_pci swiotlb pool from the buddy allocator at core_initcall time and publish the resulting (base, size) under /sys/bus/vmbus/drivers/hv_pci/swiotlb_{base,size} so userspace can forward the real GPA to the host-side device backend. This replaces the old "host dictates a GPA" flow.

Why

WSL container test runs intermittently saw the guest die with WorkerExitType=StoppedOnReset WorkerExitDetail=TripleFault WorkerExitInitiator=GuestOS between the "io scheduler mq-deadline registered" boot message and the next initcall. Root cause: memblock_reserve() does not check that a range is actually backed by EPT, so swiotlb_create_pool() -> swiotlb_init_io_tlb_pool() then memset()s 64 MiB of unbacked memory.

What changed

  • hv_pci_swiotlb=<size> is now the only accepted form; the size is rounded up to 2 MiB. The legacy <base>,<size> form is rejected with pr_warn(), since memparse() would otherwise silently take the leading hex base as the size.
  • core_initcall(hv_pci_swiotlb_alloc_pool) asks the buddy allocator for a contiguous DMA32 range via alloc_contig_pages(nr_pages, __GFP_DMA32 | __GFP_ZERO, first_online_node, &node_online_map). __GFP_ZERO makes the page allocator touch every page, faulting it in, so by the time swiotlb_create_pool() runs the memory is known to be backed. Because the kernel owns the pages, Hyper-V page reporting cannot reclaim their backing.
  • Allocator path gated on CONFIG_CONTIG_ALLOC with a no-op stub fallback.
  • (base, size) published via DRIVER_ATTR_RO once vmbus_driver_register() succeeds.

Validation

  • scripts/checkpatch.pl --strict -g HEAD -> 0 errors, 0 warnings, 0 checks, 209 lines checked.
  • make W=1 drivers/pci/controller/pci-hyperv.o -> no new warnings.
  • Boot-tested under WSL2 with swiotlb=force hv_pci_swiotlb=64M:
    • dmesg: hv_pci: reserved swiotlb pool [0x0000000008000000..0x000000000c000000)
    • /sys/bus/vmbus/drivers/hv_pci/swiotlb_base -> 0x8000000
    • /sys/bus/vmbus/drivers/hv_pci/swiotlb_size -> 67108864

Notes

swiotlb has no destroy_pool() counterpart to swiotlb_create_pool(), so the backing pages are deliberately leaked on driver unload. hv_pci is rarely hot-replaced and the pool is bounded (default 64 MiB).

The old early_param parsed hv_pci_swiotlb=<base>,<size> and reserved the
host-supplied physical address with memblock_reserve(), which does not
validate that the range is backed by EPT.  Under Hyper-V page-reporting
the backing for a nominally usable e820 range can be absent, so the
memset() inside swiotlb_init_io_tlb_pool() triple-faulted the guest.

Pick the base in the guest instead:

  * core_initcall calls alloc_contig_pages(__GFP_DMA32 | __GFP_ZERO) for
    a kernel-owned, contiguous, below-4G range.  __GFP_ZERO faults the
    pages in, and kernel ownership keeps page reporting away.  Gated on
    CONFIG_CONTIG_ALLOC; without it the dedicated pool is skipped.

  * (base, size) is exposed via DRIVER_ATTR_RO under
    /sys/bus/vmbus/drivers/hv_pci/swiotlb_{base,size} so userspace can
    forward the real GPA to the host-side device backend.

swiotlb has no destroy_pool() counterpart, so the pages are leaked on
driver unload; hv_pci is rarely hot-replaced and the pool is bounded.

Signed-off-by: Ben Hillis <benhillis@microsoft.com>

@benhillis force-pushed the user/benhillis/hv-pci-swiotlb-fix branch from 9077664 to 9bc4922 (May 26, 2026 23:19)
@benhillis requested a review from Copilot (May 27, 2026 00:41)

Copilot AI left a comment

Pull request overview

Reworks how the hv_pci driver provisions its dedicated swiotlb pool. Instead of accepting a host-supplied GPA (which could be reclaimed by Hyper-V page-reporting and triple-fault the guest), the driver now reserves a contiguous DMA32 range from the buddy allocator at core_initcall time and exposes the resulting base/size via /sys/bus/vmbus/drivers/hv_pci/swiotlb_{base,size} for a userspace agent to forward to the host.

Changes:

  • Replaces <base>,<size> cmdline parsing with size-only hv_pci_swiotlb=<size>, 2 MiB aligned.
  • Adds a core_initcall (hv_pci_swiotlb_alloc_pool) that uses alloc_contig_pages(GFP_KERNEL|__GFP_DMA32|__GFP_ZERO, …) to back the pool, gated by CONFIG_CONTIG_ALLOC with a no-op fallback.
  • Adds DRIVER_ATTR_RO(swiotlb_base/size) sysfs files published after vmbus_driver_register() and removed on exit; backing pages are intentionally leaked on driver unload.

Comment on lines 479 to 489
```diff
 static int __init early_hv_pci_swiotlb(char *p)
 {
-	hv_pci_swiotlb_base = memparse(p, &p);
-	if (*p == ',')
-		hv_pci_swiotlb_size = memparse(p + 1, NULL);
-	if (hv_pci_swiotlb_base && hv_pci_swiotlb_size)
-		memblock_reserve(hv_pci_swiotlb_base, hv_pci_swiotlb_size);
-	return 0;
+	if (!p || !*p)
+		return 0;
+
+	hv_pci_swiotlb_size = memparse(p, NULL);
+	if (hv_pci_swiotlb_size)
+		hv_pci_swiotlb_size = ALIGN(hv_pci_swiotlb_size, SZ_2M);
+
+	return 0;
 }
```
Comment on lines +4352 to +4353
```c
	hv_pci_swiotlb_pages = pages;
	hv_pci_swiotlb_nr_pages = nr_pages;
```
Comment on lines +4263 to +4280
```c
static void hv_pci_swiotlb_unpublish(void)
{
	driver_remove_file(&hv_pci_drv.driver, &driver_attr_swiotlb_size);
	driver_remove_file(&hv_pci_drv.driver, &driver_attr_swiotlb_base);
}

static void hv_pci_swiotlb_publish(void)
{
	if (driver_create_file(&hv_pci_drv.driver, &driver_attr_swiotlb_base) ||
	    driver_create_file(&hv_pci_drv.driver, &driver_attr_swiotlb_size)) {
		pr_warn("hv_pci: failed to publish swiotlb range to sysfs\n");
		hv_pci_swiotlb_unpublish();
	}
}

static void __exit exit_hv_pci_drv(void)
{
	if (hv_pci_swiotlb_pool)
```