
Drop packets with invalid NAT requirements in flow-filter #1341

Open
qmonnet wants to merge 2 commits into pr/qmonnet/cleanup-flow-filter-lib from pr/qmonnet/flow-filter-skip-nat

@qmonnet
Member

@qmonnet qmonnet commented Mar 13, 2026

NOT BLOCKING FOR 26.01

Some NAT requirements are not currently supported, including:

  • Masquerading for destination IP address, when the packet has no flow information attached
  • Port forwarding for the source IP address/port, when the packet has no flow information attached

The flow-filter stage has visibility on these NAT requirements, and on the availability of flow session information for the packet. And yet, on non-ambiguous lookup results, it will let packets go through even if the NAT requirements are not valid. One consequence is that additional processing is required, because it falls to the relevant NAT stages to check their context and drop the packet in that case. Another consequence is that, once a NAT stage eventually drops the packet, it may do so for reasons that are not obvious when looking at the log. For example, we've observed logs such as:

ERROR dp-worker-8 dataplane_nat::stateful::apalloc: 256: No address pool found for source address 10.50.2.2. Did we hit a bug when building the stateful NAT allocator?
ERROR dp-worker-8 dataplane_nat::stateful: 513: stateful-NAT: Error processing packet: allocation failed: new NAT session creation denied

These logs are not incorrect, in the sense that in the context of the stateful NAT stage, reaching that point might be a bug if we assume that the packet does need to be NAT-ed.

So in this PR, we add a check to the flow-filter stage to check the two cases described above, and to drop the packet with more helpful log information when we get invalid NAT requirements.
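As a rough illustration of the check described above, here is a minimal sketch. It assumes a hypothetical `NatRequirements` struct with boolean flags, a hypothetical `InvalidNat` error type, and a `has_flow_info` argument standing in for "the packet carries flow session information"; the real check_nat_requirements() in flow-filter/src/lib.rs may use a different signature and data model.

```rust
/// Hypothetical, simplified view of the NAT requirements attached to a
/// remote entry returned by the flow-filter table lookup.
#[derive(Debug, Clone, Copy, Default)]
pub struct NatRequirements {
    pub src_masquerade: bool,
    pub dst_masquerade: bool,
    pub src_port_forwarding: bool,
    pub dst_port_forwarding: bool,
}

/// Hypothetical drop reasons: one variant per unsupported case, so the
/// log says why the packet was dropped instead of surfacing later as an
/// allocator error in a NAT stage.
#[derive(Debug, PartialEq, Eq)]
pub enum InvalidNat {
    DstMasqueradeWithoutFlow,
    SrcPortForwardingWithoutFlow,
}

/// Sketch of the validation: the two unsupported combinations only apply
/// when the packet has no flow session information attached.
pub fn check_nat_requirements(
    req: &NatRequirements,
    has_flow_info: bool,
) -> Result<(), InvalidNat> {
    if has_flow_info {
        // Packets belonging to an existing session are left to the NAT stages.
        return Ok(());
    }
    if req.dst_masquerade {
        return Err(InvalidNat::DstMasqueradeWithoutFlow);
    }
    if req.src_port_forwarding {
        return Err(InvalidNat::SrcPortForwardingWithoutFlow);
    }
    Ok(())
}
```

In this sketch, the flow-filter stage would drop the packet and log the `InvalidNat` reason whenever the function returns `Err`, rather than setting the NAT requirements and passing the packet on.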

@Fredi-raspall Let me know what you think of it. It sounds like the right place to drop packets when we have invalid NAT requirements, but it pushes a bit more logic into the flow-filter stage.

@qmonnet qmonnet requested a review from a team as a code owner March 13, 2026 17:34
@qmonnet qmonnet added the area/nat Related to Network Address Translation (NAT) label Mar 13, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR adds early validation in the flow-filter stage to drop packets with unsupported NAT requirements (destination masquerade or source port-forwarding) when no flow session info is attached. Previously these packets would pass through to downstream NAT stages, which would fail with confusing error messages.

Changes:

  • Added check_nat_requirements() validation before setting NAT requirements for single-match lookups in flow-filter
  • Updated tests to expect packets to be filtered/dropped in invalid NAT scenarios, and added new test cases with flow info attached
  • Extracted fake_flow_session test helper to reduce duplication

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files:
  • flow-filter/src/lib.rs: Added check_nat_requirements() and called it before set_nat_requirements for single-match lookups
  • flow-filter/src/tests.rs: Updated assertions for now-dropped packets, added test cases with flow info, extracted fake_flow_session helper


@qmonnet qmonnet force-pushed the pr/qmonnet/flow-filter-skip-nat branch from a173135 to 9609b0e Compare March 13, 2026 17:41
@qmonnet qmonnet self-assigned this Mar 13, 2026
@Fredi-raspall
Contributor

What issue does this relate to?

@Fredi-raspall
Contributor

I think the changes are okay. I'm concerned that this could block legitimate traffic. Also, I believe the error occurs in the multiple matches case, which is not yet covered. I'd opt not to add risky logic there and instead let the next NFs drop packets as they deem fit, because the checks are already there, but we may make a decision when the multiple-match case is addressed.

@Fredi-raspall
Contributor

@qmonnet I believe that the approach in #1342 (which we wanted to implement anyway) solves the issue.

@qmonnet qmonnet marked this pull request as draft March 16, 2026 12:37
@qmonnet
Member Author

qmonnet commented Mar 16, 2026

Moved to draft until we clear up the discussion on the relationship with #1342.

What issue does this relate to?

This PR was initially driven by a report on Slack about the confusing logs mentioned above. I don't think we have a corresponding issue on GitHub.

I think the changes are okay. I'm concerned that this could block legitimate traffic.

Conversely, if we assume in the NAT stages that we only receive traffic that we should indeed NAT, we risk letting packets go through even though they're not legit 🤷. I still prefer the risk of dropping legit traffic.

Also, I believe the error occurs in the multiple matches case, which is not yet covered.

As I replied to Copilot, I've been looking into it.

I'd opt not to add risky logic there and instead let the next NFs drop packets as they deem fit, because the checks are already there, but we may make a decision when the multiple-match case is addressed.

👍 I'll work on completing it

@qmonnet I believe that the approach in #1342 (which we wanted to implement anyway) solves the issue.

I don't believe it does; I think it addresses something different. #1342 bypasses flow-filter when we have a valid, established flow for the packet; if there's no flow in place, it doesn't change the logic (and I expect we'd still get the same logs from stateful NAT as described above, for example). By contrast, in the current PR, check_nat_requirements() actually checks (and may drop) when there is no flow for the packet.
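A simplified model of the ordering discussed here: the #1342 bypass only applies when the packet has an active flow, so packets with no flow (or an expired one) still reach the table lookup, which is where this PR's check applies. All types and names below (`FlowState`, `Verdict`, `Packet`, `flow_filter`) are hypothetical, and the NAT requirements are reduced to two flags.

```rust
/// Hypothetical flow state attached to a packet.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FlowState {
    Active,
    Expired,
}

/// Outcome of the (simplified) flow-filter stage.
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Bypass,  // #1342: established flow, skip the table lookup
    Forward, // lookup result with supported NAT requirements
    Drop,    // this PR: unsupported NAT requirements without a flow
}

/// Hypothetical packet view: flow state plus the NAT requirements that the
/// table lookup would return for it.
pub struct Packet {
    pub flow: Option<FlowState>,
    pub dst_masquerade: bool,
    pub src_port_forwarding: bool,
}

pub fn flow_filter(pkt: &Packet) -> Verdict {
    // The bypass only helps for valid, established flows.
    if pkt.flow == Some(FlowState::Active) {
        return Verdict::Bypass;
    }
    // No usable flow: validate the NAT requirements from the lookup.
    if pkt.dst_masquerade || pkt.src_port_forwarding {
        return Verdict::Drop;
    }
    Verdict::Forward
}
```

The two mechanisms cover disjoint paths in this model: the bypass never sees flow-less packets, and the check never runs for packets that were bypassed.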

@Fredi-raspall
Contributor


What issue does this relate to?

This PR was initially driven by a report on Slack about the confusing logs mentioned above. I don't think we have a corresponding issue on GitHub.

Ok. I remember seeing something, somewhere, but could not recall the source.


@qmonnet I believe that the approach in #1342 (which we wanted to implement anyway) solves the issue.

I don't believe it does; I think it addresses something different. #1342 bypasses flow-filter when we have a valid, established flow for the packet; if there's no flow in place, it doesn't change the logic (and I expect we'd still get the same logs from stateful NAT as described above, for example). By contrast, in the current PR, check_nat_requirements() actually checks (and may drop) when there is no flow for the packet.

OK, I see what you mean.

@qmonnet qmonnet force-pushed the pr/qmonnet/flow-filter-skip-nat branch from 9609b0e to 6fba446 Compare March 18, 2026 15:47
@qmonnet qmonnet marked this pull request as ready for review March 18, 2026 15:48
@qmonnet
Member Author

qmonnet commented Mar 18, 2026

@Fredi-raspall So I've updated my PR. Here are some considerations:

  • I added some clean-up commits for the surrounding code in flow-filter/src/lib.rs, but they should be simple to review.
  • I added (second commit) a fix for the flow lookup logic: do not drop the packet if the returned flow has expired.
  • In the last commit, the main one for this PR, I added the NAT requirements check for the case raised by Copilot, where the flow-filter table lookup returns multiple possible remotes. The change is light and I believe this is the only path where we need it; see the comments near the end of the commit description.
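The expired-flow fix in the second bullet can be pictured as follows. This is only a sketch: it assumes the flow status is stored in an atomic (in line with the "atomic check" mentioned in the review), and `FlowInfo`, `Packet`, `Next` and `next_step()` are hypothetical stand-ins; the real active_flow_for_packet() may differ. The idea is that an expired flow is treated like no flow at all, so the packet falls back to the table lookup instead of being dropped.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum FlowStatus {
    Active = 0,
    Expired = 1,
}

/// Hypothetical flow info with an atomically readable status.
pub struct FlowInfo {
    status: AtomicU8,
}

impl FlowInfo {
    pub fn new(status: FlowStatus) -> Self {
        Self { status: AtomicU8::new(status as u8) }
    }
    pub fn status(&self) -> FlowStatus {
        match self.status.load(Ordering::Relaxed) {
            0 => FlowStatus::Active,
            _ => FlowStatus::Expired,
        }
    }
    pub fn expire(&self) {
        self.status.store(FlowStatus::Expired as u8, Ordering::Relaxed);
    }
}

pub struct Packet {
    pub flow_info: Option<Arc<FlowInfo>>,
}

/// Returns the attached flow only if it is still active.
pub fn active_flow_for_packet(pkt: &Packet) -> Option<&Arc<FlowInfo>> {
    pkt.flow_info.as_ref().filter(|f| f.status() == FlowStatus::Active)
}

#[derive(Debug, PartialEq, Eq)]
pub enum Next {
    UseFlow,
    TableLookup,
}

pub fn next_step(pkt: &Packet) -> Next {
    match active_flow_for_packet(pkt) {
        Some(_) => Next::UseFlow,
        // Expired or missing flow: fall back to the table lookup, don't drop.
        None => Next::TableLookup,
    }
}
```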

I acknowledge the logic in there is sensitive. I believe this code should go in, but if you prefer, we could hold back until after the 26.01 release (and merge only the first 4 commits before the release) so that we have more time to test it. Let me know what you think.

@qmonnet qmonnet force-pushed the pr/qmonnet/flow-filter-skip-nat branch from 6fba446 to d1a9362 Compare March 18, 2026 17:23
@Fredi-raspall
Contributor

@qmonnet, some comments on this PR:

  • Although I like code reuse (and some of the changes are fine), IMO the first changes make the code less clear.
  • The first commit moves functions out of the impl FlowFilter. I see no real advantage to that; it adds confusion more than anything. Those functions are private. If anything, I'd make them associated functions. I'm fine if you don't want to make them methods, although future changes may require them to be.
  • active_flow_for_packet() is fine. That one could actually be a method of packet since it is pretty generic.
  • dst_vpcd_from_flow_info() isn't fine IMO. True, it hides ugly code, but by adding it, you end up needing to lock the flow info again unnecessarily.
  • set_nat_requirements_from_flow_info() now calls active_flow_for_packet(). I think this is not necessary, and we end up repeating some logic several times. For instance, suppose a packet has a flow. We call:
      1) bypass_with_flow_info(). This calls active_flow_for_packet() (atomic check),
         then locks for dst_vpcd,
         then calls set_nat_requirements (which calls active_flow_for_packet() again).
      2) Suppose set_nat_requirements_from_flow_info() fails (side note: I believe we should drop here).
         Then we don't bypass. Suppose the table lookup returns MultipleMatches.
         Then we'd call check_packet_flow_info() again. This is:
           - active_flow_for_packet() (atomic check for flow status)
           - dst_vpcd_from_flow_info() (locking again).
         If that returns a VPC, then we call set_nat_requirements_from_flow_info(), which:
           - calls active_flow_for_packet()
           - locks the flow info again
  • line 343 "port masquerading" I believe you meant port forwarding?
  • I'm not sure I understand some of the logic in deal_with_multiple_matches().
  • I think that the L4Protocol type adds confusion and we should reconsider it.
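The locking concern raised in this review boils down to acquiring the flow-info lock once per helper call rather than once per packet. A minimal sketch of the difference, using a hypothetical `FlowInfo` wrapping `std::sync::RwLock`, with a lock counter added purely for illustration (the real flow info type is different):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{RwLock, RwLockReadGuard};

/// Hypothetical per-flow data protected by the lock.
#[derive(Debug, Clone, Copy, Default)]
pub struct FlowData {
    pub dst_vpcd: Option<u32>,
    pub nat_ctx: Option<u32>,
}

pub struct FlowInfo {
    inner: RwLock<FlowData>,
    lock_count: AtomicUsize, // instrumentation for this sketch only
}

impl FlowInfo {
    pub fn new(data: FlowData) -> Self {
        Self { inner: RwLock::new(data), lock_count: AtomicUsize::new(0) }
    }

    fn read(&self) -> RwLockReadGuard<'_, FlowData> {
        self.lock_count.fetch_add(1, Ordering::Relaxed);
        self.inner.read().expect("flow info lock poisoned")
    }

    pub fn locks_taken(&self) -> usize {
        self.lock_count.load(Ordering::Relaxed)
    }
}

// Helper style: each helper takes (and releases) its own read lock.
pub fn dst_vpcd_from_flow_info(fi: &FlowInfo) -> Option<u32> {
    fi.read().dst_vpcd
}

pub fn nat_ctx_from_flow_info(fi: &FlowInfo) -> Option<u32> {
    fi.read().nat_ctx
}

pub fn lookup_per_helper(fi: &FlowInfo) -> (Option<u32>, Option<u32>) {
    (dst_vpcd_from_flow_info(fi), nat_ctx_from_flow_info(fi))
}

// Single-guard style: lock once and read everything needed from the guard.
pub fn lookup_single_guard(fi: &FlowInfo) -> (Option<u32>, Option<u32>) {
    let guard = fi.read();
    (guard.dst_vpcd, guard.nat_ctx)
}
```

Both styles return the same data; the helper style trades an extra lock acquisition per helper for tidier call sites, which is the cost being weighed in the comment above.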

Contributor

@Fredi-raspall Fredi-raspall left a comment


See comments in PR

@qmonnet
Member Author

qmonnet commented Mar 19, 2026

@Fredi-raspall Thanks for the review,

  • Although I like code reuse (and some of the changes are fine), IMO the first changes make the code less clear.

OK, I'll work toward improving that.

  • The first commit moves functions out of the impl FlowFilter. I see no real advantage to that; it adds confusion more than anything. Those functions are private. If anything, I'd make them associated functions. I'm fine if you don't want to make them methods, although future changes may require them to be.

OK, my objective here was to group together the functions called by the main FlowFilter.process_packet(). I moved them out because they didn't really rely on self, but I don't mind much either way; I'm happy to keep them in FlowFilter. In that case, I'll move them within the file but leave them in FlowFilter.

  • active_flow_for_packet() is fine. That one could actually be a method of packet since it is pretty generic.

I wanted to do that and thought I couldn't because of dependencies, but I forgot you had moved FlowInfo to net. OK, I'll do it.

  • dst_vpcd_from_flow_info() isn't fine IMO. True, it hides ugly code, but by adding it, you end up needing to lock the flow info again unnecessarily.

I don't follow. I don't understand where this change adds new locks; can you please elaborate?

  • set_nat_requirements_from_flow_info() now calls active_flow_for_packet(). I think this is not necessary and we end up repeating some logic several times. For instance, suppose a packet has a flow. We call: [...]

Right, I noticed, and I can add a commit to address it, but this is not related to this PR: we've had that at least since the introduction of the bypass.

  • line 343 "port masquerading" I believe you meant port forwarding?

Not from this PR, but good catch, I'll fix it.

  • I'm not sure I understand some of the logic in deal_with_multiple_matches().

Happy to walk you through if it helps.

  • I think that the L4Protocol type adds confusion and we should reconsider it.

I'm open to reconsidering, but this is beyond the scope of this PR.

I didn't expect the clean-ups to be so controversial, to the point that the discussion no longer focuses on the initial problem I was trying to address. I'm moving the clean-up commits to a separate PR (#1358) so that we can deal with the other change here.

qmonnet added 2 commits March 19, 2026 12:43
Add a helper function to fake flow session information for a packet,
setting the destination VPC, but using some mock-up data for stateful
NAT or port forwarding context, because we don't need to read it in the
tests (and setting the real data might involve sorting out circular
dependencies).

This is in preparation for the addition of more checks setting flow
session information.

Signed-off-by: Quentin Monnet <qmo@qmon.net>
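The helper introduced in this first commit can be sketched as follows. The types (`FlowInfo`, `FlowInfoLocked`, `VpcDiscriminant`, `MockNatState`, `Packet`) and the signature are hypothetical stand-ins, not the dataplane's actual definitions. The point is that only the destination VPC is real, while the NAT or port forwarding context is an opaque placeholder that the tests never read, which avoids pulling in the NAT crates.

```rust
use std::any::Any;
use std::sync::{Arc, RwLock};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct VpcDiscriminant(pub u32);

/// Placeholder for stateful NAT / port forwarding state; never inspected.
pub struct MockNatState;

#[derive(Default)]
pub struct FlowInfoLocked {
    pub dst_vpcd: Option<VpcDiscriminant>,
    // Type-erased, so the tests do not depend on the NAT crates.
    pub nat_state: Option<Box<dyn Any + Send + Sync>>,
}

pub struct FlowInfo {
    pub locked: RwLock<FlowInfoLocked>,
}

#[derive(Default)]
pub struct Packet {
    pub flow_info: Option<Arc<FlowInfo>>,
}

/// Attach fake flow session information to `pkt`: a real destination VPC,
/// plus (optionally) mock NAT state.
pub fn fake_flow_session(pkt: &mut Packet, dst_vpcd: u32, with_nat_state: bool) {
    let locked = FlowInfoLocked {
        dst_vpcd: Some(VpcDiscriminant(dst_vpcd)),
        nat_state: with_nat_state
            .then(|| Box::new(MockNatState) as Box<dyn Any + Send + Sync>),
    };
    pkt.flow_info = Some(Arc::new(FlowInfo { locked: RwLock::new(locked) }));
}
```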
Some NAT requirements are not currently supported, including:

- Masquerading for destination IP address, when the packet has no flow
  information attached
- Port forwarding for the source IP address/port, when the packet has no
  flow information attached

The flow-filter stage has visibility on these NAT requirements, and on
the availability of flow session information for the packet. And yet, on
non-ambiguous lookup results, it will let packets go through even if the
NAT requirements are not valid. One consequence is that additional
processing is required, because it falls to the relevant NAT stages to
check their context and drop the packet in that case. Another
consequence is that, once a NAT stage eventually drops the packet, it
may do so for reasons that are not obvious when looking at the log. For
example, we've observed logs such as:

    ERROR dp-worker-8 dataplane_nat::stateful::apalloc: 256: No address pool found for source address 10.50.2.2.
        Did we hit a bug when building the stateful NAT allocator?
    ERROR dp-worker-8 dataplane_nat::stateful: 513: stateful-NAT: Error processing packet: allocation failed:
        new NAT session creation denied

These logs are not incorrect, in the sense that in the context of the
stateful NAT stage, reaching that point might be a bug if we assume
that the packet does need to be NAT-ed.

So in this commit, we add a check to the flow-filter stage to check the
two cases described above, and to drop the packet with more helpful log
information when we get invalid NAT requirements.

There are two cases in which we need to check these requirements, and
one case in which we don't:

  1) When we find a single remote data information object from the
     flow-filter table lookup, to validate that the NAT requirements in
     that object are supported.

  2) When we find multiple remote data information objects from the
     flow-filter table lookup, and filtering them on the L4 protocol
     leaves only a single one to use in deal_with_multiple_matches().
     This situation is similar to the first case.

  3) However, when we have two matching objects after filtering on the
     L4 protocol in deal_with_multiple_matches(), then the logic of the
     function already discards invalid combinations (we'll find no valid
     case and will return None at the end of the function), so no
     additional check is required on that branch.

We also adjust the unit tests affected by the change.

Signed-off-by: Quentin Monnet <qmo@qmon.net>
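The three cases listed in the second commit description can be summarised with the following sketch. Everything in it is hypothetical: the `Remote` and `L4Proto` types, the rule that an entry with no protocol matches any protocol, and the `resolve_pair` callback, which stands in for the existing two-candidate logic of deal_with_multiple_matches() that the commit leaves unchanged.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum L4Proto {
    Tcp,
    Udp,
}

/// Hypothetical remote data object returned by the flow-filter table lookup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Remote {
    pub id: u32,
    pub proto: Option<L4Proto>, // None: applies to any L4 protocol (assumption)
    pub dst_masquerade: bool,
    pub src_port_forwarding: bool,
}

/// Same rule as for single-match lookups (case 1).
fn nat_requirements_ok(r: &Remote, has_flow_info: bool) -> bool {
    has_flow_info || !(r.dst_masquerade || r.src_port_forwarding)
}

pub fn deal_with_multiple_matches<F>(
    remotes: &[Remote],
    pkt_proto: L4Proto,
    has_flow_info: bool,
    resolve_pair: F,
) -> Option<Remote>
where
    F: Fn(&Remote, &Remote) -> Option<Remote>,
{
    // Keep only the candidates compatible with the packet's L4 protocol.
    let candidates: Vec<&Remote> = remotes
        .iter()
        .filter(|r| r.proto.map_or(true, |p| p == pkt_proto))
        .collect();

    match candidates.as_slice() {
        // Case 2: a single candidate left, validate it like a single match.
        [only] => nat_requirements_ok(*only, has_flow_info).then(|| **only),
        // Case 3: the pairing logic already returns None for invalid
        // combinations, so no extra check is added on this branch.
        [a, b] => resolve_pair(*a, *b),
        _ => None,
    }
}
```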
@qmonnet qmonnet force-pushed the pr/qmonnet/flow-filter-skip-nat branch from d1a9362 to e1e8d7e Compare March 19, 2026 12:52
@qmonnet qmonnet changed the base branch from main to pr/qmonnet/cleanup-flow-filter-lib March 19, 2026 12:52
@qmonnet qmonnet requested a review from Fredi-raspall March 19, 2026 12:52
@Fredi-raspall
Contributor

I don't follow. I don't understand where this change adds new locks; can you please elaborate?

It does not add a new lock per se. The prior code would lock once and do multiple things with the guard. Moving that code out to a function requires locking multiple times, whereas before a single lock was enough.

@Fredi-raspall
Contributor

I didn't expect the clean-ups to be so controversial, to the point that the discussion no longer focuses on the initial problem I was trying to address. I'm moving the clean-up commits to a separate PR (#1358) so that we can deal with the other change here.

Nor did I 😅, but since you added them to this PR, I was reviewing the whole thing.


Labels

area/nat Related to Network Address Translation (NAT)

Projects

None yet


3 participants