Honor retry after#10421

Open
taliandre49 wants to merge 8 commits into openshift:main from taliandre49:honor-retry-after

Conversation

@taliandre49
Honor Retry-After Header During PowerVS Cluster Destruction

This PR adds support for honoring the Retry-After response header when IBM Cloud API calls are rate limited (HTTP 429) during PowerVS cluster destruction.

Changes include:

  • Added waitForRetryAfter() helper function in powervs.go that reads the Retry-After header from a rate-limited response and sleeps for the specified duration before retrying, while also respecting context cancellation.
  • Updated loadSDKServices() to wrap ListZonesWithContext and ListDnszones calls in retry loops that invoke waitForRetryAfter() on 429 responses instead of immediately returning an error.
  • Added unit tests in powervs_test.go covering waitForRetryAfter() behavior (nil response, missing header, valid wait, context cancellation) and an integration-style test for the ListZones retry loop using a mock HTTP server.

Corresponding Jira ticket: https://redhat.atlassian.net/browse/MULTIARCH-5854

…ubnet destruction

Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>
…on error that failed before destroyPowerSubnets() logic ran

Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>
Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>
…guards for when to trigger logic

Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>
…roy powervs

Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>
@coderabbitai

coderabbitai bot commented Mar 23, 2026

Walkthrough

Adds PowerVS destroy-time recovery for resources blocked by attached network interfaces (DHCP servers and subnets), and implements Retry-After-driven retry handling for SDK List APIs returning HTTP 429; includes unit tests for the Retry-After behavior.

Changes

Cohort / File(s) Summary
DHCP network deletion & attachment handling
pkg/destroy/powervs/power-dhcp.go
Introduce helpers: isDHCPNetworkAttachedError, extractNetworkIDFromError, findNetworkIDByName. Update destroyDHCPNetwork to fetch DHCP server object, proactively delete NICs for the DHCP server's network before deletion, attempt NIC cleanup when deletion fails due to attachments, and remove the prior subnet-existence backoff.
Subnet NIC cleanup & error classification
pkg/destroy/powervs/power-subnet.go
Add deleteNetworkInterfaces(subnetID string) error to enumerate/delete NICs and isNetworkInterfaceError(err error) bool to detect NIC-blocking errors. Modify deletePowerSubnet to delete subnet-attached NICs on deletion failures caused by attached interfaces and return a wrapped error to trigger existing retry/backoff.
Rate-limit retry / Retry-After handling
pkg/destroy/powervs/powervs.go
Add ClusterUninstaller.waitForRetryAfter(ctx, response) to parse and wait on Retry-After (seconds). Wrap SDK list calls (zonesSvc.ListZonesWithContext, dnsZonesSvc.ListDnszones) with retry logic that, on HTTP 429, uses waitForRetryAfter to decide whether to wait and retry. Introduce local *core.DetailedResponse variables and import strconv.
Unit tests for Retry-After and retry loop
pkg/destroy/powervs/powervs_test.go
Add TestWaitForRetryAfter and TestListZonesRetryLogic to validate waitForRetryAfter behavior (nil response, missing header, positive wait, context cancellation) and to exercise a retry loop against an httptest server returning two 429+Retry-After responses followed by a 200, asserting timing and retry count.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions



@openshift-ci openshift-ci bot requested review from Prashanth684 and mkumatag March 23, 2026 18:55
@openshift-ci
Contributor

openshift-ci bot commented Mar 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign prashanth684 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/destroy/powervs/power-dhcp.go`:
- Around line 292-296: The current error return in the DHCP server deletion
block causes wait.ExponentialBackoffWithContext to abort because it receives
(false, error) — change the closure to signal a retry by returning (false, nil)
after attempting NIC deletion; keep the o.Logger.Warnf call and remove/replace
the fmt.Errorf(...) return with a retry signal so the backoff loop continues
(see the closure around wait.ExponentialBackoffWithContext and the call to
o.deleteNetworkInterfaces and the DHCP server deletion logic).

In `@pkg/destroy/powervs/power-subnet.go`:
- Around line 100-104: The current callback used with
ExponentialBackoffWithContext returns a non-nil error after calling
deleteNetworkInterfaces (via the fmt.Errorf(...) return), which causes the
backoff to stop; change the callback so that when NIC cleanup is attempted but
subnet is still blocked you return the retriable signal instead (i.e., return
(false, nil) from the callback) rather than a non-nil error — or alternatively
return a sentinel error value that your backoff wrapper maps to (false, nil).
Update the code path around deleteNetworkInterfaces and the fmt.Errorf(...)
return so the backoff receives the retriable indication and will retry the
subnet deletion.

In `@pkg/destroy/powervs/powervs.go`:
- Around line 414-419: The select that waits on ctx.Done() vs time.After(d)
treats a cancelled context as a successful wait by returning true; change the
ctx.Done() branch to return false (or otherwise propagate the cancellation) so
callers (e.g., the DNS zones loop that calls ListDnszones) stop retrying when
the context is cancelled. Locate the boolean wait helper (the select using
ctx.Done() and time.After(d)) and update it to return false on ctx.Done()
(optionally include ctx.Err() if you want to surface why), and update callers to
stop retrying when the helper returns false.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 30bb2d50-314b-4f3c-b28f-7655fbbc6b16

📥 Commits

Reviewing files that changed from the base of the PR and between ac9a370 and c1c68b0.

📒 Files selected for processing (4)
  • pkg/destroy/powervs/power-dhcp.go
  • pkg/destroy/powervs/power-subnet.go
  • pkg/destroy/powervs/powervs.go
  • pkg/destroy/powervs/powervs_test.go

Comment on lines +292 to +296
if nicErr := o.deleteNetworkInterfaces(networkID); nicErr != nil {
o.Logger.Warnf("Failed to delete network interfaces for DHCP subnet %q: %v", item.name, nicErr)
}
// Return error to trigger retry after NIC deletion
return fmt.Errorf("DHCP server deletion blocked by attached network interfaces: %w", err)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the DHCP code around lines 292-296
cd /tmp && find . -name "power-dhcp.go" -type f 2>/dev/null | head -5

Repository: openshift/installer

Length of output: 45


🏁 Script executed:

# List relevant files
git ls-files | grep -E "(power-dhcp|power-subnet)" | head -10

Repository: openshift/installer

Length of output: 132


🏁 Script executed:

# Read the DHCP code around the flagged lines
cat -n pkg/destroy/powervs/power-dhcp.go | sed -n '285,305p'

Repository: openshift/installer

Length of output: 1179


🏁 Script executed:

# Get more context around destroyDHCPNetwork function
cat -n pkg/destroy/powervs/power-dhcp.go | sed -n '250,300p'

Repository: openshift/installer

Length of output: 2773


🏁 Script executed:

# Get the full destroyDHCPNetwork function
cat -n pkg/destroy/powervs/power-dhcp.go | sed -n '260,390p'

Repository: openshift/installer

Length of output: 5369


🏁 Script executed:

# Check the subnet code for comparison
cat -n pkg/destroy/powervs/power-subnet.go | grep -A 50 "deleteNetworkInterfaces\|ExponentialBackoff" | head -80

Repository: openshift/installer

Length of output: 3405


🏁 Script executed:

# Check imports in power-dhcp.go to confirm which wait package is used
head -50 pkg/destroy/powervs/power-dhcp.go | grep -E "^import|wait"

Repository: openshift/installer

Length of output: 108


🏁 Script executed:

# Get all imports from power-dhcp.go
cat -n pkg/destroy/powervs/power-dhcp.go | sed -n '1,30p'

Repository: openshift/installer

Length of output: 1064


🌐 Web query:

k8s.io/apimachinery/pkg/util/wait ExponentialBackoffWithContext return value semantics bool error

💡 Result:

The function ExponentialBackoffWithContext in k8s.io/apimachinery/pkg/util/wait has the signature: func ExponentialBackoffWithContext(ctx context.Context, backoff Backoff, condition ConditionWithContextFunc) error It returns a single error value, not (bool, error). The condition function (ConditionWithContextFunc) is called repeatedly with exponential backoff until: - the condition returns (true, nil): function returns nil - the condition returns (false, err): function returns that err - context is cancelled: returns ctx.Err - backoff.Steps exhausted: returns ErrWaitTimeout There is no bool return value from the function itself; success is indicated by nil error, timeout by ErrWaitTimeout, condition error by the condition's error, or context error by ctx.Err.


🏁 Script executed:

# Verify the actual k8s source code to double-check the semantics
web_search "kubernetes wait.ExponentialBackoffWithContext condition function false error exit immediately"

Repository: openshift/installer

Length of output: 113


🌐 Web query:

k8s.io/apimachinery/pkg/util/wait ConditionWithContextFunc (false, error) behavior retry

💡 Result:

In the k8s.io/apimachinery/pkg/util/wait package, ConditionWithContextFunc is defined as func(context.Context) (done bool, err error). It returns true if the condition is satisfied, or an error if the loop should be aborted. This type was added in v0.22.0. When used in polling functions like ExponentialBackoffWithContext(ctx context.Context, backoff Backoff, condition ConditionWithContextFunc) error (or similar functions like PollImmediateUntilWithContext), the behavior for returning (false, error) is: the error is immediately propagated and returned by the polling function, terminating the retry loop without further attempts. Returning (false, nil) continues retrying according to the backoff policy until success (true, nil), timeout, context cancellation, or max steps. This mirrors the non-context ConditionFunc behavior: errors abort immediately, false continues polling. Context variants respect ctx.Done for cancellation. Example usage pattern from client-go/util/retry: err := wait.ExponentialBackoff(backoff, func (bool, error) { err := fn if err == nil { return true, nil } if retriable(err) { return false, nil } // retry return false, err // abort }) Returning (false, error) aborts retry immediately with that error.


Return value exits retry loop immediately instead of triggering retry

Lines 295-296 return an error expecting the retry loop to continue, but wait.ExponentialBackoffWithContext aborts on (false, error) instead of retrying. The error propagates immediately, bypassing the intended retry after NIC deletion. Use (false, nil) to signal retry continuation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/destroy/powervs/power-dhcp.go` around lines 292 - 296, The current error
return in the DHCP server deletion block causes
wait.ExponentialBackoffWithContext to abort because it receives (false, error) —
change the closure to signal a retry by returning (false, nil) after attempting
NIC deletion; keep the o.Logger.Warnf call and remove/replace the
fmt.Errorf(...) return with a retry signal so the backoff loop continues (see
the closure around wait.ExponentialBackoffWithContext and the call to
o.deleteNetworkInterfaces and the DHCP server deletion logic).

Comment on lines +100 to +104
if nicErr := o.deleteNetworkInterfaces(item.id); nicErr != nil {
o.Logger.Warnf("Failed to delete network interfaces for subnet %q: %v", item.name, nicErr)
}
// Return error to trigger retry after NIC deletion
return fmt.Errorf("subnet deletion blocked by network interfaces: %w", err)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

version=$(sed -nE 's/^[[:space:]]*k8s\.io\/apimachinery[[:space:]]+(v[^[:space:]]+).*/\1/p' go.mod | head -n1)

if [[ -z "${version}" ]]; then
  echo "Could not determine k8s.io/apimachinery version from go.mod" >&2
  exit 1
fi

for path in pkg/util/wait/backoff.go pkg/util/wait/wait.go; do
  url="https://raw.githubusercontent.com/kubernetes/apimachinery/${version}/${path}"
  if curl -fsSL "$url" | rg -n -C2 'func ExponentialBackoffWithContext|err != nil \|\| ok'; then
    exit 0
  fi
done

echo "Unable to locate ExponentialBackoffWithContext source for ${version}" >&2
exit 1

Repository: openshift/installer

Length of output: 740


Fix retry semantics: returning a non-nil error terminates the backoff instead of retrying

ExponentialBackoffWithContext stops immediately when the condition returns a non-nil error. Line 104 returns fmt.Errorf(...) after attempting NIC cleanup, which will exit the backoff loop and proceed directly to the fatal error handler on line 155-157, skipping any retry. To make this state retriable, signal retry intent by returning (false, nil) instead—use a sentinel value or error type that the callback maps to this return pair.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/destroy/powervs/power-subnet.go` around lines 100 - 104, The current
callback used with ExponentialBackoffWithContext returns a non-nil error after
calling deleteNetworkInterfaces (via the fmt.Errorf(...) return), which causes
the backoff to stop; change the callback so that when NIC cleanup is attempted
but subnet is still blocked you return the retriable signal instead (i.e.,
return (false, nil) from the callback) rather than a non-nil error — or
alternatively return a sentinel error value that your backoff wrapper maps to
(false, nil). Update the code path around deleteNetworkInterfaces and the
fmt.Errorf(...) return so the backoff receives the retriable indication and will
retry the subnet deletion.

Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>
@taliandre49
Author

update:

  • resolved golint test failures
  • resolving CodeRabbit review comments
  • TODO: squash commits

Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>
@taliandre49
Author

taliandre49 commented Mar 24, 2026

Note

Current code limitation: if the API keeps returning 429 indefinitely, this loops forever; there is no maximum number of retries. Capping the retry loop would be a better design. I will dig into what a reasonable cap would be.

Other TODOs to consider: code duplication, and the sleep logic being tangled with the retry signal.

…tley does not retry after

Signed-off-by: Natalia Jordan <natalia.jordan@ibm.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
pkg/destroy/powervs/powervs.go (2)

617-621: Use the context-aware DNS zones call if the SDK provides one.

Right now ctx only governs the backoff sleep. If ListDnszones itself blocks, cancellation/deadlines still won't interrupt the HTTP call. If the generated client has ListDnszonesWithContext, prefer it here so the request and the wait share the same cancellation behavior.

Possible fix if the SDK exposes a context-aware variant
-			listZonesResponse, detailedResponse, err = o.dnsZonesSvc.ListDnszones(options)
+			listZonesResponse, detailedResponse, err = o.dnsZonesSvc.ListDnszonesWithContext(ctx, options)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/destroy/powervs/powervs.go` around lines 617 - 621, The call to
o.dnsZonesSvc.ListDnszones should use a context-aware variant so the HTTP
request honors cancellation/deadlines; replace the invoke with
o.dnsZonesSvc.ListDnszonesWithContext(ctx, options) (or call that overload when
it exists) and keep assigning to listZonesResponse, detailedResponse, err;
retain the same rate-limit handling with o.waitForRetryAfter(ctx,
detailedResponse) and the Debugf log. If the SDK doesn't expose
ListDnszonesWithContext, add a small wrapper that uses the context-aware method
when available, otherwise fall back to the existing ListDnszones to preserve
behavior.

558-568: Deduplicate the 429 retry policy.

These branches already need the same wait/log/exit behavior, and the next policy change (cap, jitter, or clearer cancellation errors) will have to be kept in sync twice. A small shared helper would keep this flow consistent.

As per coding guidelines, Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Also applies to: 616-625

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/destroy/powervs/powervs.go` around lines 558 - 568, The 429
retry/log/wait logic in loadSDKServices around o.zonesSvc.ListZonesWithContext
is duplicated; extract that policy into a small shared helper (e.g., a method
like o.handleRateLimitRetry(ctx, detailedResponse, err, "ListZones") or
o.retryOnTooManyRequests(ctx, func() (res any, resp *http.Response, err error) {
... })) and replace both blocks (the ListZonesWithContext block and the similar
block at lines ~616-625) to call the helper; the helper should inspect
detailedResponse.StatusCode for StatusTooManyRequests, call
o.waitForRetryAfter(ctx, detailedResponse), log via o.Logger.Debugf with a clear
operation name, and return a boolean or wrapped error so the caller can either
continue the loop or return the formatted error consistently from
loadSDKServices.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/destroy/powervs/powervs.go`:
- Around line 401-413: The Retry-After handling in waitForRetryAfter incorrectly
rejects valid HTTP-date values and treats 0 as invalid; update the logic that
reads headers.Get("Retry-After") so it first tries strconv.Atoi and accepts 0
(remove the secs <= 0 rejection), and if Atoi fails then attempt to parse the
header as an HTTP-date (use time.Parse with RFC1123/RFC1123Z as appropriate) and
compute secs as the max of 0 and int(parsedTime.Sub(time.Now()).Seconds()); then
use secs to build the duration (d := time.Duration(secs) * time.Second) and log
the header and computed wait. Reference symbols: retryAfterHeader, headers.Get,
strconv.Atoi, time.Parse, secs, d, and the waitForRetryAfter logging calls.

---

Nitpick comments:
In `@pkg/destroy/powervs/powervs.go`:
- Around line 617-621: The call to o.dnsZonesSvc.ListDnszones should use a
context-aware variant so the HTTP request honors cancellation/deadlines; replace
the invoke with o.dnsZonesSvc.ListDnszonesWithContext(ctx, options) (or call
that overload when it exists) and keep assigning to listZonesResponse,
detailedResponse, err; retain the same rate-limit handling with
o.waitForRetryAfter(ctx, detailedResponse) and the Debugf log. If the SDK
doesn't expose ListDnszonesWithContext, add a small wrapper that uses the
context-aware method when available, otherwise fall back to the existing
ListDnszones to preserve behavior.
- Around line 558-568: The 429 retry/log/wait logic in loadSDKServices around
o.zonesSvc.ListZonesWithContext is duplicated; extract that policy into a small
shared helper (e.g., a method like o.handleRateLimitRetry(ctx, detailedResponse,
err, "ListZones") or o.retryOnTooManyRequests(ctx, func() (res any, resp
*http.Response, err error) { ... })) and replace both blocks (the
ListZonesWithContext block and the similar block at lines ~616-625) to call the
helper; the helper should inspect detailedResponse.StatusCode for
StatusTooManyRequests, call o.waitForRetryAfter(ctx, detailedResponse), log via
o.Logger.Debugf with a clear operation name, and return a boolean or wrapped
error so the caller can either continue the loop or return the formatted error
consistently from loadSDKServices.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: aad51eef-6a47-4fb7-a285-4583ee2ad366

📥 Commits

Reviewing files that changed from the base of the PR and between b2251b5 and d487dc2.

📒 Files selected for processing (1)
  • pkg/destroy/powervs/powervs.go

Comment on lines +401 to +413
retryAfterHeader := headers.Get("Retry-After")

if retryAfterHeader == "" {
o.Logger.Debugf("waitForRetryAfter: Retry-After header is not present, or could not be retrieved")
return false
}
secs, err := strconv.Atoi(retryAfterHeader)
if err != nil || secs <= 0 {
o.Logger.Debugf("waitForRetryAfter: Invalid Retry-After header value: %s", retryAfterHeader)
return false
}
d := time.Duration(secs) * time.Second
o.Logger.Debugf("waitForRetryAfter: Rate limited, honoring Retry-After header, waiting for %v", secs)

@coderabbitai coderabbitai bot Mar 24, 2026


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n pkg/destroy/powervs/powervs.go | sed -n '390,420p'

Repository: openshift/installer

Length of output: 1218


🌐 Web query:

RFC 7231 Retry-After header format specification delta-seconds HTTP-date

💡 Result:

The RFC 7231 Retry-After header format specification defines it as either an HTTP-date or delta-seconds (number of seconds to delay). ABNF: Retry-After = HTTP-date / delay-seconds delay-seconds = 1*DIGIT HTTP-date is defined in RFC 7231 Section 7.1.1.1 as IMF-fixdate / obs-date (preferred fixed-length format or obsolete formats for compatibility). delta-seconds is a non-negative decimal integer (1 or more digits) representing seconds after the response is received. It matches the general delta-seconds rule used across HTTP/1.1 specs (defined in RFC 7230 Section 3.2.6 or equivalent, but locally in Retry-After context). Examples from RFC: Retry-After: Fri, 31 Dec 1999 23:59:59 GMT Retry-After: 120 (2 minutes) Note: RFC 7231 is obsoleted by RFC 9110 (HTTP Semantics, 2022), but the format remains unchanged.


🏁 Script executed:

# Check if net/http provides ParseTime function
python3 << 'EOF'
import subprocess
result = subprocess.run(['go', 'doc', 'net/http', 'ParseTime'], 
                       capture_output=True, text=True, cwd='.')
print(result.stdout)
print(result.stderr)
EOF

Repository: openshift/installer

Length of output: 319


🏁 Script executed:

head -30 pkg/destroy/powervs/powervs.go

Repository: openshift/installer

Length of output: 964


🏁 Script executed:

rg "gohttp|import" pkg/destroy/powervs/powervs.go | head -40

Repository: openshift/installer

Length of output: 549


Support the full Retry-After header format per RFC 7231.

The Retry-After header can be sent as either an HTTP-date or delta-seconds format, and 0 is a valid delay value. The current implementation using strconv.Atoi() followed by secs <= 0 check only handles delta-seconds. This causes valid HTTP-date responses (e.g., Retry-After: Fri, 31 Dec 1999 23:59:59 GMT) to be silently dropped as errors, breaking HTTP specification compliance.

Suggested fix
 	retryAfterHeader := headers.Get("Retry-After")
 
 	if retryAfterHeader == "" {
 		o.Logger.Debugf("waitForRetryAfter: Retry-After header is not present, or could not be retrieved")
 		return false
 	}
-	secs, err := strconv.Atoi(retryAfterHeader)
-	if err != nil || secs <= 0 {
-		o.Logger.Debugf("waitForRetryAfter: Invalid Retry-After header value: %s", retryAfterHeader)
-		return false
-	}
-	d := time.Duration(secs) * time.Second
-	o.Logger.Debugf("waitForRetryAfter: Rate limited, honoring Retry-After header, waiting for %v", secs)
+	retryAfterHeader = strings.TrimSpace(retryAfterHeader)
+	var d time.Duration
+	if secs, err := strconv.Atoi(retryAfterHeader); err == nil {
+		if secs < 0 {
+			o.Logger.Debugf("waitForRetryAfter: Invalid Retry-After header value: %s", retryAfterHeader)
+			return false
+		}
+		d = time.Duration(secs) * time.Second
+	} else if t, err := gohttp.ParseTime(retryAfterHeader); err == nil {
+		d = time.Until(t)
+		if d < 0 {
+			d = 0
+		}
+	} else {
+		o.Logger.Debugf("waitForRetryAfter: Invalid Retry-After header value: %s", retryAfterHeader)
+		return false
+	}
+	o.Logger.Debugf("waitForRetryAfter: Rate limited, honoring Retry-After header, waiting for %v", d)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/destroy/powervs/powervs.go` around lines 401 - 413, The Retry-After
handling in waitForRetryAfter incorrectly rejects valid HTTP-date values and
treats 0 as invalid; update the logic that reads headers.Get("Retry-After") so
it first tries strconv.Atoi and accepts 0 (remove the secs <= 0 rejection), and
if Atoi fails then attempt to parse the header as an HTTP-date (use time.Parse
with RFC1123/RFC1123Z as appropriate) and compute secs as the max of 0 and
int(parsedTime.Sub(time.Now()).Seconds()); then use secs to build the duration
(d := time.Duration(secs) * time.Second) and log the header and computed wait.
Reference symbols: retryAfterHeader, headers.Get, strconv.Atoi, time.Parse,
secs, d, and the waitForRetryAfter logging calls.

Author

I agree; we need to incorporate something along these lines:

 retryTime, err := time.Parse(time.RFC1123, Retryheader)


@taliandre49, great — that's the right direction. One suggestion: since the file already imports gohttp "net/http", you can use gohttp.ParseTime() instead of time.Parse(time.RFC1123, ...) directly. net/http.ParseTime handles all three HTTP date formats (RFC1123, RFC850, and ANSI C asctime) per the HTTP spec, making it more robust:

retryAfterHeader = strings.TrimSpace(retryAfterHeader)
var d time.Duration
if secs, err := strconv.Atoi(retryAfterHeader); err == nil {
    if secs < 0 {
        o.Logger.Debugf("waitForRetryAfter: Invalid Retry-After header value: %s", retryAfterHeader)
        return false
    }
    d = time.Duration(secs) * time.Second
} else if t, err := gohttp.ParseTime(retryAfterHeader); err == nil {
    d = time.Until(t)
    if d < 0 {
        d = 0
    }
} else {
    o.Logger.Debugf("waitForRetryAfter: Invalid Retry-After header value: %s", retryAfterHeader)
    return false
}
o.Logger.Debugf("waitForRetryAfter: Rate limited, honoring Retry-After header, waiting for %v", d)
select {
case <-ctx.Done():
    return false
case <-time.After(d):
    return true
}

Key points vs. the original:

  • secs < 0 replaces secs <= 0 — a Retry-After: 0 is valid and means "retry immediately" (the time.After(0) fires right away and returns true).
  • gohttp.ParseTime already imports cleanly since gohttp is aliased in this file.
