@sdodson sdodson commented Aug 11, 2025

The operator now retries the well-known endpoint check every second for up to 15 seconds before setting itself unavailable. This prevents premature "unavailable" status during temporary network issues or API server startup scenarios.

Changes:

  • Add retry loop with 15-second timeout and 1-second intervals
  • Preserve existing ControllerProgressingError handling logic
  • Set WellKnownAvailable=False with reason "NotReady" after retries exhausted
  • Maintain proper progressing status during retry attempts

This change improves operator reliability by giving well-known endpoints time to become ready while maintaining the same final error handling behavior.

Assisted-by: Cursor, Claude Sonnet 4

Related to https://issues.redhat.com/browse/OCPBUGS-20056. Since this addresses only one of the paths, I'm not linking the bug until I can confirm whether this change helps.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 11, 2025
@openshift-ci openshift-ci bot requested review from ibihim and liouk August 11, 2025 19:02

sdodson commented Aug 11, 2025

/test ?

openshift-ci bot commented Aug 11, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sdodson
Once this PR has been reviewed and has the lgtm label, please assign liouk for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot commented Aug 11, 2025

@sdodson: The following commands are available to trigger required jobs:

/test e2e-agnostic
/test e2e-agnostic-upgrade
/test e2e-console-login
/test e2e-gcp-operator-encryption-perf
/test e2e-gcp-operator-encryption-rotation
/test e2e-oidc-techpreview
/test e2e-operator
/test e2e-operator-encryption
/test images
/test okd-scos-images
/test unit
/test verify
/test verify-bindata
/test verify-deps

The following commands are available to trigger optional jobs:

/test e2e-agnostic-ipv6
/test e2e-aws-external-oidc
/test e2e-aws-single-node
/test e2e-azure-external-oidc
/test e2e-gcp-external-oidc
/test okd-scos-e2e-aws-ovn
/test test-operator-integration

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic
pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-ipv6
pull-ci-openshift-cluster-authentication-operator-master-e2e-agnostic-upgrade
pull-ci-openshift-cluster-authentication-operator-master-e2e-aws-single-node
pull-ci-openshift-cluster-authentication-operator-master-e2e-console-login
pull-ci-openshift-cluster-authentication-operator-master-e2e-operator
pull-ci-openshift-cluster-authentication-operator-master-images
pull-ci-openshift-cluster-authentication-operator-master-okd-scos-e2e-aws-ovn
pull-ci-openshift-cluster-authentication-operator-master-okd-scos-images
pull-ci-openshift-cluster-authentication-operator-master-test-operator-integration
pull-ci-openshift-cluster-authentication-operator-master-unit
pull-ci-openshift-cluster-authentication-operator-master-verify
pull-ci-openshift-cluster-authentication-operator-master-verify-bindata
pull-ci-openshift-cluster-authentication-operator-master-verify-deps

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

sdodson commented Aug 11, 2025

/payload-test periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

openshift-ci bot commented Aug 11, 2025

@sdodson: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info.

sdodson commented Aug 11, 2025

/payload-job periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

openshift-ci bot commented Aug 11, 2025

@sdodson: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-ci-4.20-e2e-aws-upgrade-ovn-single-node

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6e4f93f0-76e6-11f0-86f8-a1b16a4176a6-0

@sdodson sdodson marked this pull request as draft August 11, 2025 19:11

openshift-ci bot commented Oct 15, 2025

@sdodson: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 14, 2026

coderabbitai bot commented Jan 14, 2026

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Review skipped — only excluded labels are configured. (1)
  • do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.
