netvsp: handle eqe 135 and reconfigure vf on release/1.7.2511#2610
For netvsp to recover from a SoC crash and NMC servicing, it must detect EQE 135 from MANA and reconfigure the Virtual Function.
* `GDMA_EQE_HWC_RECONFIG_VF` added to events handled by the GDMA driver
* `vf_reconfiguration_pending` bool added to the GDMA driver. Set when the EQE is received, read by the MANA driver when processing all EQ events.
* `vf_reconfig_sender` added to MANA driver to signal Netvsp VF Manager when it sees 'pending' is true
* `vf_reconfig_receiver` added to Netvsp HclNetworkVFManagerWorker to send `VfReconfig` message when signaled
* `VfReconfig` message added to Netvsp VF Manager, which removes the old VF and then creates a new VF

Smaller changes:
* `GDMA_GENERATE_TEST_EQE` and `GDMA_GENERATE_RECONFIG_VF_EVENT` added in order to create a unit test, `test_gdma_reconfig_vf()`
* The logic of `NextWorkItem::ManaDeviceArrived` refactored into `startup_vtl2_device()` so the logic can be shared with `VfReconfig`

Testing:
* Unit tests pass
* Tested on a lab machine with SoC MANA privates which allowed EQE 135 to be generated by command. Netvsp is able to see the EQE and `VfReconfig` is called. Before and after sending the EQE, ping and ntttcp traffic succeed.
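The propagation path described above (EQE sets a flag, EQ processing reads it, a channel signals the VF manager) can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `GdmaSketch` and `ManaSketch` are hypothetical stand-ins, `std::sync::mpsc` replaces the mesh channel, and clearing the flag when it is forwarded is an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};

/// EQE type 135; the name mirrors the constant mentioned in the description.
const GDMA_EQE_HWC_RECONFIG_VF: u8 = 135;

/// Hypothetical stand-in for the GDMA driver state.
#[derive(Default)]
struct GdmaSketch {
    vf_reconfiguration_pending: AtomicBool,
}

impl GdmaSketch {
    /// Record a reconfig EQE; every other event type is ignored in this sketch.
    fn handle_eqe(&self, eqe_type: u8) {
        if eqe_type == GDMA_EQE_HWC_RECONFIG_VF {
            self.vf_reconfiguration_pending.store(true, Ordering::SeqCst);
        }
    }
}

/// Hypothetical stand-in for the MANA driver.
struct ManaSketch {
    gdma: GdmaSketch,
    vf_reconfig_sender: Option<Sender<()>>,
}

impl ManaSketch {
    /// Hand out a receiver that is signaled when a reconfig is pending.
    fn subscribe_vf_reconfig(&mut self) -> Receiver<()> {
        let (tx, rx) = channel();
        self.vf_reconfig_sender = Some(tx);
        rx
    }

    /// Runs after EQ events are processed: forward a pending reconfig, if any.
    /// Assumption: the flag is cleared once it has been observed.
    fn process_eq_events(&self) {
        if self.gdma.vf_reconfiguration_pending.swap(false, Ordering::SeqCst) {
            if let Some(tx) = &self.vf_reconfig_sender {
                // A send error only means the receiver is gone.
                let _ = tx.send(());
            }
        }
    }
}
```

In this sketch the receiver side would react to the signal by issuing the `VfReconfig` message; the real worker is async and is not modeled here.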
Pull request overview
This PR adds support for handling EQE 135 (VF reconfiguration event) to enable netvsp recovery from SoC crashes and NMC servicing. The implementation spans three layers: GDMA driver (event detection), MANA driver (event propagation), and Netvsp (VF reconfiguration orchestration).
Key changes:
- GDMA driver now handles `GDMA_EQE_HWC_RECONFIG_VF` events and maintains a `vf_reconfiguration_pending` flag
- MANA driver propagates VF reconfig events through a mesh channel subscription mechanism
- Netvsp implements a state machine with exponential backoff retry logic to gracefully restart the VTL2 device after reconfiguration
- Full save/restore support for the pending flag to handle servicing scenarios
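As a rough illustration of the exponential backoff mentioned above, a delay calculation might look like the sketch below. The function name, the base delay, and the cap are all hypothetical; the overview does not state which values or structure the PR actually uses.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): `base * 2^attempt`, capped at `max`.
/// Hypothetical helper, not taken from the PR.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate the multiplier instead of overflowing for large attempt counts.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}
```

With a 100 ms base and a 5 s cap, the delays would grow as 100 ms, 200 ms, 400 ms, and so on until they reach the cap.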
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| vm/devices/net/gdma_defs/src/lib.rs | Adds EQE 135 constant and test request type for VF reconfiguration |
| vm/devices/net/gdma/src/hwc.rs | Implements hardware control handler to generate VF reconfig events for testing |
| vm/devices/net/mana_driver/src/gdma_driver.rs | Adds vf_reconfiguration_pending flag, event handling, getter method, and test helper |
| vm/devices/net/mana_driver/src/mana.rs | Implements subscription mechanism for VF reconfig events with sender/receiver channel |
| vm/devices/net/mana_driver/src/save_restore.rs | Extends saved state to include vf_reconfiguration_pending flag |
| vm/devices/net/mana_driver/src/tests.rs | Adds unit test verifying VF reconfiguration event detection and flag behavior |
| openhcl/underhill_core/src/emuplat/netvsp.rs | Implements VF reconfiguration state machine with shutdown, restart, and exponential backoff retry logic; refactors device startup into shared method |
| NextWorkItem::Continue | ||
| let exists = Path::new(&device_path).exists(); | ||
| match (vtl2_device_state, exists) { | ||
| (Vtl2DeviceState::Missing, true) => NextWorkItem::ManaDeviceArrived, |
The device_arrival event is only processed when vtl2_device_state is Missing (line 644), but this excludes the Reconfiguring state. If a uevent for device arrival occurs while in the Reconfiguring state (e.g., the device actually becomes available), it will be ignored until the next uevent. This could cause unnecessary retry delays. Consider also handling device arrival during Reconfiguring state to allow faster recovery.
| (Vtl2DeviceState::Missing, true) => NextWorkItem::ManaDeviceArrived, | |
| (Vtl2DeviceState::Missing | Vtl2DeviceState::Reconfiguring, true) => { | |
| NextWorkItem::ManaDeviceArrived | |
| } |
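A self-contained sketch of the suggested match follows, for readers who want to see the effect in isolation. The `Present` variant and the catch-all arm are assumptions added so the example compiles; the real `Vtl2DeviceState` and `NextWorkItem` enums have other variants and arms that are not shown in this review.

```rust
/// Simplified stand-in; only `Missing` and `Reconfiguring` appear in the review.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Vtl2DeviceState {
    Missing,
    Reconfiguring,
    Present, // hypothetical placeholder for "device is up"
}

/// Simplified stand-in with only the two work items visible in the diff context.
#[derive(Debug, PartialEq)]
enum NextWorkItem {
    Continue,
    ManaDeviceArrived,
}

/// Decide what to do on a uevent, given the current state and whether the
/// device path exists.
fn on_uevent(state: Vtl2DeviceState, exists: bool) -> NextWorkItem {
    match (state, exists) {
        // Treat arrival during reconfiguration the same as arrival while missing.
        (Vtl2DeviceState::Missing | Vtl2DeviceState::Reconfiguring, true) => {
            NextWorkItem::ManaDeviceArrived
        }
        // Assumption: everything else is a no-op in this sketch.
        _ => NextWorkItem::Continue,
    }
}
```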
| // Don't 'keep alive'. VTL2 is reconfigured when in a bad state. | ||
| let keep_vf_alive = false; | ||
| self.shutdown_vtl2_device(keep_vf_alive).await; |
After shutdown_vtl2_device is called (line 892), the vf_reconfig_receiver retains the old receiver whose sender was dropped during shutdown. If startup_vtl2_device fails to create a new device during retry attempts, subsequent VF reconfiguration events will be lost until the device successfully restarts. Consider either clearing vf_reconfig_receiver in shutdown_vtl2_device and checking for None in the event loop, or documenting this behavior to make the subtle coupling explicit.
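One way to read the first option (clear the receiver on shutdown and check for `None` when polling) is sketched below. `WorkerSketch` is a hypothetical, synchronous stand-in: the `device_sender` field models the sender owned by the device, and `std::sync::mpsc` replaces the real channel type.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the VF manager worker.
struct WorkerSketch {
    /// Models the sender that lives inside the device and is dropped with it.
    device_sender: Option<Sender<()>>,
    vf_reconfig_receiver: Option<Receiver<()>>,
}

impl WorkerSketch {
    fn new() -> Self {
        Self { device_sender: None, vf_reconfig_receiver: None }
    }

    /// Create the device and subscribe in one step.
    fn startup_vtl2_device(&mut self) {
        let (tx, rx) = channel();
        self.device_sender = Some(tx);
        self.vf_reconfig_receiver = Some(rx);
    }

    /// Drop the device and clear the now-stale receiver alongside it.
    fn shutdown_vtl2_device(&mut self) {
        self.device_sender = None;
        self.vf_reconfig_receiver = None;
    }

    /// Event-loop check: no receiver means there is nothing to wait on.
    fn reconfig_signaled(&self) -> bool {
        match &self.vf_reconfig_receiver {
            Some(rx) => rx.try_recv().is_ok(),
            None => false,
        }
    }
}
```

Making the receiver `None` between shutdown and a successful restart turns the coupling into something the event loop has to handle explicitly, rather than polling a channel whose sender is already gone.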
| device.start_notification_task(&self.driver_source).await; | ||
| self.vf_reconfig_receiver = Some(device.subscribe_vf_reconfig().await); |
There is a potential race condition between subscribing to VF reconfig events and processing pending events. The subscription happens in startup_vtl2_device (line 555) after start_notification_task (line 554). However, if a VF reconfiguration event was already pending in the GDMA driver before the subscription, it could be processed and sent to a non-existent receiver, causing the event to be lost. Consider subscribing to VF reconfig events before starting the notification task, or ensure the pending flag is checked after subscription.
| device.start_notification_task(&self.driver_source).await; | |
| self.vf_reconfig_receiver = Some(device.subscribe_vf_reconfig().await); | |
| self.vf_reconfig_receiver = Some(device.subscribe_vf_reconfig().await); | |
| device.start_notification_task(&self.driver_source).await; |
I concur with this, and also in the other location below. If we are racing with a device reconfig, we could miss this and the device would be broken from the start. This can be a follow up PR in main.
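The ordering concern can be shown with a small synchronous model. Everything here is a simplification: `DeviceSketch` is hypothetical, the notification task is reduced to a single pass, and the assumption that a pending event is dropped when no subscriber exists is taken from the review comment rather than verified against the driver.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical stand-in for the MANA device.
struct DeviceSketch {
    pending: bool,
    sender: Option<Sender<()>>,
}

impl DeviceSketch {
    fn subscribe_vf_reconfig(&mut self) -> Receiver<()> {
        let (tx, rx) = channel();
        self.sender = Some(tx);
        rx
    }

    /// Models the first pass of the notification task: a pending event is
    /// consumed, and lost if nobody has subscribed yet.
    fn start_notification_task(&mut self) {
        if std::mem::take(&mut self.pending) {
            if let Some(tx) = &self.sender {
                let _ = tx.send(());
            }
        }
    }
}

/// Returns whether an event that was pending before startup reaches the receiver.
fn event_delivered(subscribe_first: bool) -> bool {
    let mut dev = DeviceSketch { pending: true, sender: None };
    let rx = if subscribe_first {
        let rx = dev.subscribe_vf_reconfig();
        dev.start_notification_task();
        rx
    } else {
        dev.start_notification_task();
        dev.subscribe_vf_reconfig()
    };
    rx.try_recv().is_ok()
}
```

Under these assumptions, only the subscribe-first ordering delivers the event, which is the behavior the suggested change aims for.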
| } | ||
| pub async fn save(&mut self) -> anyhow::Result<GdmaDriverSavedState> { | ||
| if self.hwc_failure { |
I would make a follow up PR to remove vf_reconfiguration_pending from save state and just fail here instead. Otherwise we will save/restore the device only to have our first action be to tear down and recreate.
@justus-camp-microsoft I notice when save fails we still leak the device instead of cleaning it up. Is this intentional? This would make my proposal not work as well since the device would be left running in the bad state.
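A minimal sketch of the proposal (refuse to save while a VF reconfiguration is pending, instead of persisting the flag) might look like this. The struct names, the placeholder saved-state field, and the error strings are hypothetical; the real `save()` is async and returns `anyhow::Result<GdmaDriverSavedState>`, and treating `hwc_failure` as a save error is an assumption based on the visible diff context.

```rust
/// Hypothetical saved state; the real `GdmaDriverSavedState` fields are not shown here.
#[derive(Debug, PartialEq)]
struct SavedStateSketch {
    placeholder: u32,
}

/// Hypothetical stand-in for the GDMA driver.
struct GdmaDriverSketch {
    hwc_failure: bool,
    vf_reconfiguration_pending: bool,
}

impl GdmaDriverSketch {
    /// Fail the save rather than carrying a pending reconfig across restore.
    fn save(&self) -> Result<SavedStateSketch, String> {
        if self.hwc_failure {
            return Err("cannot save after HWC failure".to_string());
        }
        if self.vf_reconfiguration_pending {
            return Err("cannot save while VF reconfiguration is pending".to_string());
        }
        Ok(SavedStateSketch { placeholder: 0 })
    }
}
```

As the question above points out, this only helps if a failed save also cleans up the device; otherwise the device keeps running in the bad state.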
#2679) This cherry-picks the following changes, and includes a few minor merge conflict fix-ups.
* vmm_tests: add hyper-v openhcl pcat tests (#2602)
* petri: backend agnostic additional disk configuration (#2551)
* vmm_tests/underhill_core: allow command line to specify settings / config timeout, and make it 30s for many devices test (#2619)
* mesh/petri/vmm_tests/vpci: allow env vars when launching mesh process + verbose vpci logs (#2567)
* petri: backend agnostic vtl2 settings configuration (#2550)
* petri: allow more time for the vm to be off during reboot (#2533)
* petri: check if the VM is off when waiting for hyper-v events (#2525)

---------
Co-authored-by: Trevor Jones <trevor@thjmedia.net>
mattkur
left a comment
Approving for 1.7 based on Brian's approval.
* release/1.7.2511: petri/vmm_tests: cherry-pick microsoft#2525 microsoft#2533 microsoft#2550 microsoft#2567 microsoft#2610 microsoft#2551 microsoft#2602 (microsoft#2679)
* release/1.7.2511: netvsp: handle eqe 135 and reconfigure vf on release/1.7.2511 (microsoft#2610)
Clean cherry-pick of #2576