Skip to content

PCI: prepare for VFIO#5758

Open
ShadowCurse wants to merge 8 commits intofirecracker-microvm:mainfrom
ShadowCurse:pci_prepare_for_vfio
Open

PCI: prepare for VFIO#5758
ShadowCurse wants to merge 8 commits intofirecracker-microvm:mainfrom
ShadowCurse:pci_prepare_for_vfio

Conversation

@ShadowCurse
Copy link
Copy Markdown
Contributor

Changes

Main change here is the detachment of BAR and MsixConfig handling from the PciConfiguration type. BARs and MsixConfig will be used as separate components by the VFIO, so to prevent duplication, move them away from PciConfiguration.

Reason

Preparation for VFIO

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@ShadowCurse ShadowCurse self-assigned this Mar 13, 2026
@ShadowCurse ShadowCurse added the Type: Enhancement Indicates new feature requests label Mar 13, 2026
@ShadowCurse ShadowCurse force-pushed the pci_prepare_for_vfio branch 4 times, most recently from 07cd55a to 8e74111 Compare March 13, 2026 15:13
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 75.60976% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.91%. Comparing base (8b55c86) to head (f229ae5).

Files with missing lines Patch % Lines
src/vmm/src/pci/configuration.rs 80.00% 11 Missing ⚠️
src/vmm/src/pci/msix.rs 0.00% 10 Missing ⚠️
src/vmm/src/devices/virtio/transport/pci/device.rs 85.71% 8 Missing ⚠️
src/vmm/src/device_manager/pci_mngr.rs 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5758      +/-   ##
==========================================
- Coverage   83.00%   82.91%   -0.09%     
==========================================
  Files         275      275              
  Lines       29383    29364      -19     
==========================================
- Hits        24388    24348      -40     
- Misses       4995     5016      +21     
Flag Coverage Δ
5.10-m5n.metal 83.22% <75.60%> (-0.10%) ⬇️
5.10-m6a.metal 82.54% <75.60%> (-0.10%) ⬇️
5.10-m6g.metal 79.79% <75.60%> (-0.10%) ⬇️
5.10-m6i.metal 83.22% <75.60%> (-0.10%) ⬇️
5.10-m7a.metal-48xl 82.53% <75.60%> (-0.11%) ⬇️
5.10-m7g.metal 79.79% <75.60%> (-0.10%) ⬇️
5.10-m7i.metal-24xl 83.19% <75.60%> (-0.10%) ⬇️
5.10-m7i.metal-48xl 83.19% <75.60%> (-0.10%) ⬇️
5.10-m8g.metal-24xl 79.79% <75.60%> (-0.10%) ⬇️
5.10-m8g.metal-48xl 79.79% <75.60%> (-0.10%) ⬇️
5.10-m8i.metal-48xl 83.20% <75.60%> (-0.10%) ⬇️
5.10-m8i.metal-96xl 83.19% <75.60%> (-0.10%) ⬇️
6.1-m5n.metal 83.25% <75.60%> (-0.10%) ⬇️
6.1-m6a.metal 82.57% <75.60%> (?)
6.1-m6g.metal 79.79% <75.60%> (-0.10%) ⬇️
6.1-m6i.metal 83.25% <75.60%> (-0.09%) ⬇️
6.1-m7a.metal-48xl 82.56% <75.60%> (-0.10%) ⬇️
6.1-m7g.metal 79.79% <75.60%> (-0.11%) ⬇️
6.1-m7i.metal-24xl 83.26% <75.60%> (-0.09%) ⬇️
6.1-m7i.metal-48xl 83.26% <75.60%> (-0.10%) ⬇️
6.1-m8g.metal-24xl 79.78% <75.60%> (-0.11%) ⬇️
6.1-m8g.metal-48xl 79.79% <75.60%> (-0.10%) ⬇️
6.1-m8i.metal-48xl 83.26% <75.60%> (-0.10%) ⬇️
6.1-m8i.metal-96xl 83.26% <75.60%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ShadowCurse ShadowCurse marked this pull request as ready for review March 13, 2026 15:41
@ShadowCurse ShadowCurse added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Mar 13, 2026
} else {
self.configuration
.write_config_register(reg_idx, offset, data);
if BAR0_REG as usize <= reg_idx && reg_idx < (BAR0_REG + NUM_BAR_REGS) as usize {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would a match or a if/else if/else be more clear in this case?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self.cap_pci_cfg_info.offset would be changed to be a register and not a byte offset, then this can be a match. But I did not want to touch unrelated code here. In general, I think we should agree on if we use registers or byte_offsets in the PCI code. I can do a follow up PR with this (or push it into #5752).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define 3 variables (or use functions/macros) and do if in_bar_range {} else if in_cap_pci_cfg_range {} else {}?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in the last commit

if data.len() == 4 {
if bar.about_to_be_read {
data.copy_from_slice(bar.size.as_bytes());
bar.about_to_be_read = false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is exactly equivalent as the current code, where we actually have 2 fields, the current programmed addr and the register value. When the guest writes all 1s, only those beyond the size mask are persisted and the reprogramming detection is not kicked. When the guest writes something else, then we store the actual value and detect if the BAR was moved. How does reprogram detection works here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is to always ignore writes (unless it is a size reading write). This makes the BAR relocation be ignored. If the driver will try to relocate it, it will write to the BAR (and this write will be ignored) and then it will read the addr back to validate the write. Since we ignore it, it will read the old addr and assume that the relocation failed. At least Linux kernel does this: https://elixir.bootlin.com/linux/v6.19.8/source/drivers/pci/setup-res.c#L107.

Basically I tried to minimize the logic here to only handle the sizing write since this is all we can currently support. The BAR relocation (or even resizing) can be implemented later (and it will be a bit different for virtio and vfio devices anyway)

@ShadowCurse ShadowCurse force-pushed the pci_prepare_for_vfio branch 5 times, most recently from 8b1b015 to 8050f34 Compare March 26, 2026 09:41
@ShadowCurse ShadowCurse force-pushed the pci_prepare_for_vfio branch 2 times, most recently from dda34d8 to a615239 Compare March 31, 2026 11:08
self.configuration.detect_bar_reprogramming(reg_idx, data)
}

fn move_bar(&mut self, old_base: u64, new_base: u64) -> Result<(), DeviceRelocationError> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did not get a dead code warning?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can this be merged with "pci: virtio: refactor: remove BAR handling from PciConfiguration"?

Copy link
Copy Markdown
Contributor Author

@ShadowCurse ShadowCurse Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this commit in general as move_bar is actually a trait function

/// Bits 31:4: Base Address (read/write, writable bits depend on size)
#[derive(Debug, Default, Clone, Copy, Serialize, Deserialize)]
pub struct Bar {
/// Encoded address value of the register.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: This comment could clarify what 'encoded' means in this case (lower bit carrying information)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

/// Encoded size value of the register.
pub encoded_size: u32,
/// Indicator if the register was prepared to be read as the `size` instead of `addr`
pub about_to_be_read: bool,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would size_read be more descriptive?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me size_read sounds like: "we did read of the size and here it is".

if let Ok(value) = u32::read_from_bytes(data)
&& value == 0xffff_ffff
{
assert!(offset == 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we assert here()? Can the guest trigger this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these read/write functions supposed to only be called by internal code. And that internal code will need to ensure that guest provided values make sense.

assert!(offset == 0);
self.bars[bar_idx as usize].about_to_be_read = true;
} else {
// No BAR relocation/resizing support. Ignore write.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment saying that the PCIe spec doesn't offer any way to reject this, but in practice Linux checks if it succeeded?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

data.copy_from_slice(bar.encoded_addr.as_bytes());
}
} else {
assert!(offset < 4);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, do these asserts here mean that a "wrong" guest read will trigger a crash?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous comment

registers: [u32; NUM_CONFIGURATION_REGISTERS],
writable_bits: [u32; NUM_CONFIGURATION_REGISTERS], // writable bits for each register.
bars: [PciBar; NUM_BAR_REGS],
bars: [PciBar; NUM_BAR_REGS as usize],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid changing this type to minimize this diff? Otherwise do it in a separate patch, that patch is already quite long.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

split into multiple commits

VIRTIO_BAR_INDEX,
virtio_pci_bar_addr,
CAPABILITY_BAR_SIZE,
false,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Perhaps an enum type Prefetchable / Non-prefetchable would be more descriptive and avoid the comment above?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use enum if there were more than 2 options, but here it does not seem like it would change much.

/// BARs
pub bars: [Bar; 6],
}
impl Bars {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like this patch to get smaller so that it's easier to review and argue about what it's doing. Can we have a patch that adds the new implementation along with the tests (and tell clippy to ignore dead code temporarily if needed)? And then in future commits we swap to using it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did the split

} else {
self.configuration
.write_config_register(reg_idx, offset, data);
if BAR0_REG as usize <= reg_idx && reg_idx < (BAR0_REG + NUM_BAR_REGS) as usize {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this overlaps with the code in PciConfigMmio::config_space_write()? The commit title only talks about PciConfiguration. I would prefer it to be more explicit about this overlap.

} else {
self.configuration
.write_config_register(reg_idx, offset, data);
if BAR0_REG as usize <= reg_idx && reg_idx < (BAR0_REG + NUM_BAR_REGS) as usize {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define 3 variables (or use functions/macros) and do if in_bar_range {} else if in_cap_pci_cfg_range {} else {}?

@ShadowCurse ShadowCurse force-pushed the pci_prepare_for_vfio branch 2 times, most recently from 4e30a8d to 473e3ff Compare April 2, 2026 13:42
@@ -550,10 +541,10 @@ mod tests {

fn detect_bar_reprogramming(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we remove this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not really about VFIO preparation. I can remove this in another PR if you want.

self.configuration.detect_bar_reprogramming(reg_idx, data)
}

fn move_bar(&mut self, old_base: u64, new_base: u64) -> Result<(), DeviceRelocationError> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And can this be merged with "pci: virtio: refactor: remove BAR handling from PciConfiguration"?

cap_pci_cfg_info: VirtioPciCfgCapInfo::default(),
bars: Bars::default(),
msix_config,
msix_config_offset: 0,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think from this name it's ambiguous whether this is a capability offset or a BAR offset for the MSI-x table.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to msix_config_cap_offset

let base = reg_idx as usize * 4;
if base + offset as usize >= self.cap_pci_cfg_info.offset as usize
&& base + offset as usize + data.len()
// Handle access to the header of the MsixCapability. The rest of the
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline I find this a bit weird so I'll wait to hear for more opinions on this.

@ShadowCurse ShadowCurse force-pushed the pci_prepare_for_vfio branch 4 times, most recently from 512fc75 to 1754d4a Compare April 2, 2026 16:45
For some reason we had 2 constants that we used interchangeably
to specify the BAR for virtio device. Use the more reasonably
named one and delete the old one.
Also delete VIRTIO_SHM_BAR_INDEX since it is never used.

No functional changes.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Add a new Bars type that handles basic interactions with PCI BAR
registers. This type is separate from PciConfiguration and can be
reused for future VFIO work.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Change BAR0_REG and NUM_BAR_REGS u8. Update PciConfiguration code and
tests to use the new types.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Make VirtioPciDevice use the new Bars type instead of relying on
PciConfiguration for BAR handling. This separates BAR management
from PCI configuration space, making PciConfiguration unaware of
BARs. The VirtioPciDevice now directly handles BAR reads/writes
and uses Bars for address tracking instead of a bare bar_address
field.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Previous commit created a separate `Bars` type to handle
BARs region and made VirtioPciDevice type to use it instead of
PciConfiguration to handle BARs region. This made BARs handling
in PciConfiguration redundant, so this commit removes it.
This also removes `detect_bar_reprogramming` support from
PciConfiguration. But Firecracker does not support BAR relocation
anyway, this does not affect functionality.
The removal of `detect_bar_reprogramming` also required to remove
the unit test for it in the pci/bus.rs.

No functional change.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Following previous commits notion, move MsixConfig handling out of
PciConfiguration into VirtioPciDevice.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
This constant is only used as u16, so change it's type from u8 to u16 to
avoid unnecessary conversions. Also rename with `_IDX` to have
similarity with `reg_idx` we use in PCI code.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the pci_prepare_for_vfio branch from 1754d4a to 7c53d11 Compare April 2, 2026 16:45
Move the comparisons into boolean vars to make dispatch code a bit
more readable.

Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants