Conversation

@nwochtma

No description provided.

@walking-machine (Owner) left a comment

A lot of nits in this review, but one thing that needs to be fixed: I think there would be memory leaks for larger packets, because you always allocate the same number of header buffers as data buffers. If we have 2 data buffers per packet, we only need 8 new header buffers per 16 new data buffers; the remaining 8 would be leaked. There are several solutions to such a problem:

  • If total packet size (`READ_ONCE(adapter->netdev->mtu) + LIBETH_RX_LL_LEN`) is 3K or greater, do not use pseudo header split; I prefer this option
  • do separate `next_to_use` and `next_to_clean` accounting for the header buffers

@alobakin

> A lot of nits in this review, but one thing that needs to be fixed: I think there would be memory leaks for larger packets, because you always allocate the same number of header buffers as data buffers. If we have 2 data buffers per packet, we only need 8 new header buffers per 16 new data buffers; the remaining 8 would be leaked. There are several solutions to such a problem:

The solution is to recycle the buffer to the page pool just like in idpf and ice (`libeth_xdp_process_buff()` does that by default when the size is zero).
I think it should work like that: the W/A only under `xdp->data`, but `process_buff()` always when there's `hdr_pp`.

> * If total packet size (`READ_ONCE(adapter->netdev->mtu) + LIBETH_RX_LL_LEN`) is 3K or greater, do not use pseudo header split; I prefer this option

You mean during the configuration or on a per-packet basis? The latter is not possible: buffers from the data FQ have a different layout when created with the hsplit flag: they lack headroom and tailroom.

The initial idea of this W/A is: if we can't fine-tune the HW-writable buffer length using a 128-byte stride, then we always create header pools, so that data pools always have pow-2 buffers (almost the same as `construct_skb()` / "legacy-rx", but faster and more generic).

> * do separate `next_to_use` and `next_to_clean` accounting for header buffer

Please don't :D

@nwochtma nwochtma force-pushed the larysa-check-2025-09-12 branch 5 times, most recently from 7e71a22 to 35f9f52 on September 29, 2025 14:23
```c
#define IXGBEVF_RX_PAGE_LEN(hr)	(ALIGN_DOWN(LIBETH_RX_PAGE_LEN(hr), \
					    IXGBE_SRRCTL_BSIZEPKT_STEP))

#define IXGBEVF_FLAG_PSEUDO_HSPLIT	BIT(0)
```


I'd just name it HSPLIT maybe. Both pseudo and normal hsplit won't differ too much.

Owner

But we do not have header split, so it would be confusing.

@nwochtma nwochtma force-pushed the larysa-check-2025-09-12 branch from 35f9f52 to 68621c1 on October 2, 2025 12:38
@walking-machine (Owner) commented Oct 2, 2025

> A lot of nits in this review, but one thing that needs to be fixed: I think there would be memory leaks for larger packets, because you always allocate the same number of header buffers as data buffers. If we have 2 data buffers per packet, we only need 8 new header buffers per 16 new data buffers; the remaining 8 would be leaked. There are several solutions to such a problem:

> The solution is to recycle the buffer to the page pool just like in idpf and ice (`libeth_xdp_process_buff()` does that by default when the size is zero). I think it should work like that: the W/A only under `xdp->data`, but `process_buff()` always when there's `hdr_pp`.

> * If total packet size (`READ_ONCE(adapter->netdev->mtu) + LIBETH_RX_LL_LEN`) is 3K or greater, do not use pseudo header split; I prefer this option

> You mean during the configuration or on a per-packet basis? The latter is not possible: buffers from the data FQ have a different layout when created with the hsplit flag: they lack headroom and tailroom.

When MTU + LL is greater than 3K, it behaves just like any other ixgbevf HW, so it would not be per-packet (see, I have used `netdev->mtu`): every packet buffer would have headroom + 3K + tailroom. The only advantage of the newer HW here is the packet size filter, which obviously does not help in case of a bigger MTU.

> The initial idea of this W/A is: if we can't fine-tune the HW-writable buffer length using a 128-byte stride, then we always create header pools, so that data pools always have pow-2 buffers (almost the same as `construct_skb()` / "legacy-rx", but faster and more generic).

> * do separate `next_to_use` and `next_to_clean` accounting for header buffer

> Please don't :D

@nwochtma nwochtma force-pushed the larysa-check-2025-09-12 branch from 68621c1 to fa3678e on October 3, 2025 12:35
@nwochtma nwochtma force-pushed the larysa-check-2025-09-12 branch from fa3678e to 395ae75 on October 9, 2025 12:33
@walking-machine (Owner) left a comment

Overall looks fine to me

```c
	struct page_pool *hdr_pp;

	struct xdp_rxq_info xdp_rxq;
```

Owner

Newlines not related to the patch

```c
	union ixgbe_adv_rx_desc *rx_desc;
	struct libeth_fqe *rx_buffer;
	struct libeth_fqe *hdr_buff;
	unsigned int hdr_size = 0;
```
Owner

You can move it under the `if ()` now.

```c
			hdr_size = ixgbevf_rx_hsplit_wa(hdr_buff,
							rx_buffer,
							size);
			size -= hdr_size ? : size;
```
Owner

So if we cannot copy from the buffer, we zero its size, so it is dropped?

This patch introduces pseudo header split support in the ixgbevf
driver, specifically targeting ixgbe_mac_82599_vf.

On older hardware (e.g. ixgbe_mac_82599_vf), RX DMA write size can only be
limited in 1K increments. This causes issues when attempting to fit
multiple packets per page, as a DMA write may overwrite the
headroom of the next packet.

To address this, introduce pseudo header split support, where the hardware
copies the full L2 header into a dedicated header buffer. This avoids the
need for HR/TR alignment and allows safe skb construction from the header
buffer without risking overwrites.

Signed-off-by: Natalia Wochtman <natalia.wochtman@intel.com>
@nwochtma nwochtma force-pushed the larysa-check-2025-09-12 branch from 395ae75 to 5341a01 on October 13, 2025 07:41

3 participants