virtio_netmap: Use half the vring's size as netmap slots #726
base: master
Conversation
For every slot in a netmap ring, two descriptors in the virtqueue's
vring are used. Therefore, we can only use half the vring's size as
slots for netmap rings. Otherwise, the vring and the netmap ring are
out of sync. At present this breaks proper operation and produces
bad data (see below).
This patch also sets an explicit nm_config to ensure we keep the
values for num_rx_desc/num_tx_desc from attach time and marks the
VQ_FULL case in virtio_netmap_init_buffers() as an issue that should
never be triggered.
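As a toy illustration (not driver code), the mismatch described above can be modeled like this: each posted receive buffer consumes two vring descriptors, so a vring of size N can only back N // 2 netmap slots.

```python
# Toy model of the ring mismatch (not driver code): each posted receive
# buffer consumes two vring descriptors (one for the virtio-net header,
# one for the data), so a vring of size N can only back N // 2 netmap slots.
VRING_SIZE = 256
DESCS_PER_SLOT = 2

def backed_slots(vring_size, netmap_slots):
    """Number of netmap slots that can actually be backed by posted buffers."""
    return min(netmap_slots, vring_size // DESCS_PER_SLOT)

# Buggy 1:1 sizing: only the first 128 of 256 slots ever hold real data,
# matching the 128-good/128-null pattern observed in the dumps.
print(backed_slots(VRING_SIZE, VRING_SIZE))       # 128 (of 256 slots)
# Fixed sizing: netmap ring = half the vring, so every slot is backed.
print(backed_slots(VRING_SIZE, VRING_SIZE // 2))  # 128 (of 128 slots)
```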
Background: Experiment / Observation
Running a QEMU guest with a virtio_net NIC (rx_queue_size=256) attached
to a bridge on the host. Sending known/predictable packets in a loop
from the host via scapy. On the guest side, running tcpdump with native
netmap support and inspecting the packet data:
tcpdump -i netmap:enp8s0 -Xn#
The first 128 packets look as expected, but thereafter come 128 packets
of "null data":
128 15:59:46.592378 IP 10.0.0.128.128 > 127.0.0.1.256: UDP, length 141
0x0000: 4500 00a9 0001 0000 4011 f0c2 0a00 0080 E.......@.......
0x0010: 7f00 0001 0080 0100 0095 7f07 3030 3132 ............0012
0x0020: 3820 3031 3330 2046 4646 4646 4646 4646 8.0130.FFFFFFFFF
0x0030: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0040: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0050: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0060: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0070: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0080: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0090: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x00a0: 4646 4646 4646 4646 46 FFFFFFFFF
129 15:59:46.712936 00:00:00:00:00:00 > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 121
0x0000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0050: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0060: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0070: 0000 0000 00 .....
At packet 256, tcpdump receives non-null data again, but the payloads
represent what should have been received for packet 129 and later. Because
the length is taken from the vring, some of the slots also expose
stale/corrupt data:
281 16:03:51.846490 IP 10.0.0.153.153 > 127.0.0.1.306: UDP, length 74
0x0000: 4500 0066 0001 0000 4011 f0ec 0a00 0099 E..f....@.......
0x0010: 7f00 0001 0099 0132 0052 d633 3030 3135 .......2.R.30015
0x0020: 3320 3030 3633 2046 4646 4646 4646 4646 3.0063.FFFFFFFFF
0x0030: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0040: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0050: 4646 4646 4646 4646 4646 4646 4646 4646 FFFFFFFFFFFFFFFF
0x0060: 4646 4646 4646 5757 5757 5757 5757 5757 FFFFFFWWWWWWWWWW
0x0070: 5757 5757 5757 5757 5757 5757 5757 5757 WWWWWWWWWWWWWWWW
0x0080: 5757 5757 5757 5757 5757 5757 5757 5757 WWWWWWWWWWWWWWWW
0x0090: 5757 5757 5757 5757 5757 5757 0000 0000 WWWWWWWWWWWW....
0x00a0: 0000 0000 0000 0000 ........
As an additional comment, FreeBSD also uses only half the slots, unless indirect descriptors are in use. Linux virtio_net doesn't seem to use indirect descriptors (checked 4.4 and 4.19), so it seems reasonable to do the same. https://github.com/freebsd/freebsd/blob/master/sys/dev/netmap/if_vtnet_netmap.h#L386
 * num_rx_desc descriptors and so the queue is not
 * allowed to become full.
 */
if (vq->num_free == 0) {
Use the VQ_FULL macro for portability
}
}
}
Defining this function is unnecessary, because nothing is going to change the values listed in struct nm_config_info...
Indeed, you are not updating them here.
This seems like a biggie, and I wonder why it hasn't come up before; maybe there's something off in the analysis?
Scapy code to produce packets
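The original scapy script is collapsed in the PR; a hypothetical reconstruction, inferred from the dumps above (payload "SSSSS LLLL " followed by LLLL bytes of 'F' filler; src 10.0.0.&lt;seq&gt;, sport seq, dport 2*seq; the interface name and the length schedule are assumptions), might look like:

```python
# Hypothetical reconstruction of the sender (the original script is not
# included in the PR); the header/payload pattern is inferred from the
# hex dumps: payload "SSSSS LLLL " + "F" * LLLL, src 10.0.0.<seq>,
# sport = seq, dport = 2 * seq.
def make_payload(seq: int, fill_len: int) -> bytes:
    """ASCII sequence number, filler length, then 'F' filler bytes."""
    return ("%05d %04d " % (seq, fill_len)).encode() + b"F" * fill_len

def send_all(iface: str, count: int = 1000) -> None:
    # Imported here so the pure payload helper works without scapy installed.
    from scapy.all import Ether, IP, UDP, Raw, sendp
    for i in range(1, count + 1):
        pkt = (Ether()
               / IP(src="10.0.0.%d" % (i & 0xFF), dst="127.0.0.1")
               / UDP(sport=i & 0xFFFF, dport=(2 * i) & 0xFFFF)
               / Raw(make_payload(i, 60 + i % 80)))  # length schedule assumed
        sendp(pkt, iface=iface, verbose=False)

# usage (requires scapy and root; bridge interface name assumed):
# send_all("br0")
```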