Drop packets with invalid NAT requirements in flow-filter#1341
Drop packets with invalid NAT requirements in flow-filter#1341qmonnet wants to merge 2 commits intopr/qmonnet/cleanup-flow-filter-libfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds early validation in the flow-filter stage to drop packets with unsupported NAT requirements (destination masquerade or source port-forwarding) when no flow session info is attached. Previously these packets would pass through to downstream NAT stages, which would fail with confusing error messages.
Changes:
- Added
check_nat_requirements()validation before setting NAT requirements for single-match lookups inflow-filter - Updated tests to expect packets to be filtered/dropped in invalid NAT scenarios, and added new test cases with flow info attached
- Extracted
fake_flow_sessiontest helper to reduce duplication
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| flow-filter/src/lib.rs | Added check_nat_requirements() function and call it before set_nat_requirements for single-match lookups |
| flow-filter/src/tests.rs | Updated assertions for now-dropped packets, added test cases with flow info, extracted fake_flow_session helper |
You can also share your feedback on Copilot code review. Take the survey.
a173135 to
9609b0e
Compare
|
What issue does this relate to? |
|
I think the changes are okay. I have the concern whether this could block legitimate traffic. Also, I believe the error occurs in the multiple matches case, which is not yet covered. I'd opt not to add risky logic there and let instead the next NFs drop as they deem, because the checks are already there, but may make a decision when the multiple case match is addressed. |
|
Moved to draft until we clear up the discussion on the relationship with #1342.
This PR was initially driven by a report about the confusing logs mentioned above, on Slack, I don't think we've got a corresponding issue on GitHub.
Conversely, if we assume in the NAT stages that we only receive traffic that we should indeed NAT, we risk letting packets go through even though they're not legit 🤷. I still prefer the risk of dropping legit traffic.
As I replied to Copilot, I've been looking into it.
👍 I'll work on completing it
I don't believe it does, I think it addresses something different. #1342 bypasses flow-filter in the case when we have a valid, established flow for the packet; if there's no flow in place, it doesn't change the logic (and I expect we'd still get the same logs from stateful NAT as described above, for example). On the contrary, in the current PR, |
Ok. I remember seeing something, somewhere, but could not recall the source.
Ok. I see what you mean |
9609b0e to
6fba446
Compare
|
@Fredi-raspall So I've updated my PR, here are some considerations:
I acknowledge the logic in there is sensitive. I believe this code should go in, although if you prefer, we could hold back until after the 26.01 release (and merge only the first 4 commits before the release), so we have more time to test with it? Let me know what you think. |
6fba446 to
d1a9362
Compare
|
@qmonnet, some comments on this PR:
|
Fredi-raspall
left a comment
There was a problem hiding this comment.
See comments in PR
|
@Fredi-raspall Thanks for the review,
OK I'll work toward improving that
OK, my objective here was to group the functions called by the main
I wanted to do that and thought I couldn't because of dependencies, but I forgot you moved
I don't follow. I don't understand where this change adds new locks, can you please develop?
Right, I noticed and I can add a commit to address it, but this is not related to this PR, we've had that at least since the introduction of the bypass.
Not from this PR, but good catch, I'll fix it.
Happy to walk you through if it helps.
I'm open to reconsidering, but this is beyond the scope of this PR. I didn't expect the clean-ups to be so controversial - to the point the discussion no longer focusses on the initial problem I was trying to address. I'm moving the clean-up commits to a separate PR so we can deal with the other change here: #1358 |
Add a helper function to fake flow session information for a packet, setting the destination VPC, but using some mock-up data for stateful NAT or port forwarding context, because we don't need to read it in the tests (and setting the real data might involve sorting out circular dependencies). This is in prevision for the addition of more checks setting flow session information. Signed-off-by: Quentin Monnet <qmo@qmon.net>
Some NAT requirements are not currently supported, including:
- Masquerading for destination IP address, when the packet has no flow
information attached
- Port forwarding for the source IP address/port, when the packet has no
flow information attached
The flow-filter stage has visibility on these NAT requirements, and on
the availability of flow session information for the packet. And yet, on
non-ambiguous lookup results, it will let packets go through even if the
NAT requirements are not valid. One consequence is that additional
processing is required, because it falls down to the relevant NAT stages
to check their context and dump the packet in that case. Another
consequence is that, once a NAT stage eventually dumps the packet, it
may do so for reasons that may not obvious when looking at the log. For
example, we've observed logs such as:
ERROR dp-worker-8 dataplane_nat::stateful::apalloc: 256: No address pool found for source address 10.50.2.2.
Did we hit a bug when building the stateful NAT allocator?
ERROR dp-worker-8 dataplane_nat::stateful: 513: stateful-NAT: Error processing packet: allocation failed:
new NAT session creation denied
These logs are not incorrect, in the sense that in the context of the
stateful NAT stage, reaching that point might be a bug if we assumed
that the packet did require to be NAT-ed.
So in this commit, we add a check to the flow-filter stage to check the
two cases described above, and to drop the packet with more helpful log
information when we get invalid NAT requirements.
There are two cases when we need to check these requirements:
1) When we find a single remote data information object from the
flow-filter table lookup, to validate that the NAT requirements in
that object are supported.
2) When we find multiple remote data information objects from the
flow-filter table lookup, and filtering them on the L4 protocol
leaves only a single one to use in deal_with_multiple_matches().
This situation is similar to the first case.
3) However, when we have two matching objects after filtering on the
L4 protocol in deal_with_multiple_matches(), then the logic of the
function already discards invalid combinations (we'll find no valid
case and will return None at the end of the function), so no
additional check is required on that branch.
We also adjust and compensate the unit tests affected by the change.
Signed-off-by: Quentin Monnet <qmo@qmon.net>
d1a9362 to
e1e8d7e
Compare
It does not add a new lock per se. Prior code would lock and do multiple things with the guard. Moving it out to a function requires locking multiple times while before a single lock would do it. |
Nor did I 😅 , but since you added it to this PR, I was reviewing the whole thing. |
NOT BLOCKING FOR 26.01
Some NAT requirements are not currently supported, including:
The flow-filter stage has visibility on these NAT requirements, and on the availability of flow session information for the packet. And yet, on non-ambiguous lookup results, it will let packets go through even if the NAT requirements are not valid. One consequence is that additional processing is required, because it falls down to the relevant NAT stages to check their context and dump the packet in that case. Another consequence is that, once a NAT stage eventually dumps the packet, it may do so for reasons that may not obvious when looking at the log. For example, we've observed logs such as:
These logs are not incorrect, in the sense that in the context of the stateful NAT stage, reaching that point might be a bug if we assumed that the packet did require to be NAT-ed.
So in this PR, we add a check to the flow-filter stage to check the two cases described above, and to drop the packet with more helpful log information when we get invalid NAT requirements.
@Fredi-raspall Let me know what you think of it, it sounds like the right place to drop packets when we have invalid NAT requirements, but that's pushing a bit more logic to the flow-filter stage.