Added retry for TD policy propagation for security_test by pawbhard · Pull Request #237 · grpc/psm-interop

pawbhard · 2026-03-24T13:03:38Z

The security test sometimes fails, usually in the first or second step, because it does not receive the policy from TD.

This PR adds a retry mechanism to account for this.
Internal bug b/280071258
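
For readers unfamiliar with the pattern, here is a minimal sketch of what a retry around a propagation-dependent assertion can look like. It is not the code from this PR: `retry_until`, its parameters, and the default values are hypothetical names and numbers chosen for illustration, and it uses only the standard library rather than the repository's own retry helpers.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until(
    check: Callable[[], T],
    *,
    attempts: int = 5,
    wait_sec: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call check() until it stops raising AssertionError or attempts run out.

    Hypothetical helper: `check` stands in for a test step that asserts the
    security policy has been received; it is expected to raise AssertionError
    while the policy has not propagated yet.
    """
    last_error: Optional[AssertionError] = None
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except AssertionError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                sleep(wait_sec)
    raise AssertionError(
        f"check still failing after {attempts} attempts"
    ) from last_error
```

The `sleep` parameter is injectable so the helper can be exercised without real delays. Only `AssertionError` is retried, so unrelated errors still surface immediately.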

pawbhard · 2026-03-24T13:09:08Z

Run: https://source.cloud.google.com/results/invocations/f4b9258a-5c46-4d9d-822d-5866fba8c086

arjan-bal · 2026-03-25T10:09:15Z

Is it possible to change the order in which GCP resources are created to ensure the connection uses TLS from the start? Initializing a server in plaintext and then upgrading to TLS poses a security risk for production workloads.

pawbhard · 2026-03-25T10:31:26Z

Yes, that is the correct way, and it needs infra changes. It is already covered in the description and analysis in b/331206277.

Yes, it poses a risk for production workloads, but not to us, as we are only using this for testing. We will take it as a tech debt item (to be created). There is no point in keeping a failing/flaky test open because of this, as this is not a test failure.

arjan-bal · 2026-03-25T11:15:34Z

> Yes, it poses a risk for production workloads, but not to us, as we are just using this to test.

Ideally, E2E tests should strictly mirror the customer's user journey. When they deviate, we risk missing real-world regressions. For example, if this test created resources with the desired state instead of patching them—and we still observed these failures—it would highlight a legitimate security issue. By introducing this change, the test loses its ability to catch such vulnerabilities, creating a false sense of security.

> We will track this as a tech debt item (ticket to be created). There is no point in keeping a failing/flaky test open because of this.

Do we have an estimate for the actual fix? In the meantime, is buggrep properly catching these failures, or is the on-call engineer spending time manually triaging them? I'd be much more comfortable approving a temporary workaround if it is accompanied by a concrete timeline for the real solution. Without that commitment, I believe we should prioritize the proper fix over masking the symptom.

pawbhard · 2026-03-25T11:31:50Z

Agreed, we want to mirror the customer journey. I disagree that this fix costs us the ability to catch failures, since we are currently patching the security config. (With or without the fix, we would not be able to catch what is being described.)

The on-caller will pick it up based on priority; I cannot commit to a timeline. Keeping the issue in the test will not help us prioritize, as I can see the issue was opened more than 3 years ago. I suggest we fix what we currently have and plan the improvement based on this discussion (mirror the user journey and stop patching the security config).

Due to the higher number of issues occurring, we see conflicts in matchers, which increases toil for the on-caller. (This bug's matcher is part of the conflicts.)

Let us discuss offline if more discussion is needed on this, and converge.

sergiitk

LGTM as long as the tests pass

pawbhard · 2026-03-31T05:56:10Z

Run: https://source.cloud.google.com/results/invocations/bca3d2e0-21df-4452-94bd-268718b2d389

Added retry

dd6e35d

pawbhard requested a review from sergiitk March 24, 2026 13:03

pawbhard requested a review from a team as a code owner March 24, 2026 13:03

pawbhard requested a review from arjan-bal March 24, 2026 13:03

lint

6a80307

pawbhard changed the title ~~Added retry for initial file for security_test~~ Added retry for TD policy propagation for security_test Mar 24, 2026

arjan-bal assigned pawbhard Mar 25, 2026

sergiitk approved these changes Mar 31, 2026

Comment thread framework/xds_k8s_testcase.py Outdated

review comment

337ab69

pawbhard merged commit 5ff4026 into grpc:main Mar 31, 2026
6 checks passed

pawbhard merged 3 commits into grpc:main from pawbhard:security_test_fix
