Background
The discussion in #667, #771, and #772 has established partial handling for 409 Conflict errors. The current state (as of the merge of #772) is:
- 409 Conflict is generally non-retryable (terminal)
- Exception: Neutron quota-exceeded 409s are retryable (isNeutronQuotaError), because quota can free up without spec changes and Neutron is the only OpenStack service that uses 409 for quota (others use 413 or 403)
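For readers who have not followed the earlier PRs, the current rule can be sketched roughly as below. This is a minimal illustration, not the merged code: the apiError type and the conflictIsRetryable helper are hypothetical, and the body of isNeutronQuotaError here assumes Neutron reports quota failures as a JSON object with a NeutronError.type of "OverQuota". The real check may inspect the error differently.

```go
package main

import (
	"encoding/json"
	"net/http"
)

// apiError is a hypothetical stand-in for whatever HTTP error type the
// client library returns; only the fields needed for this sketch are kept.
type apiError struct {
	StatusCode int
	Body       []byte
}

// isNeutronQuotaError reports whether e looks like a Neutron quota failure.
// Assumption: the body has the shape {"NeutronError": {"type": "OverQuota"}}.
func isNeutronQuotaError(e apiError) bool {
	if e.StatusCode != http.StatusConflict {
		return false
	}
	var payload struct {
		NeutronError struct {
			Type string `json:"type"`
		} `json:"NeutronError"`
	}
	if err := json.Unmarshal(e.Body, &payload); err != nil {
		return false // an unparseable body is not treated as a quota error
	}
	return payload.NeutronError.Type == "OverQuota"
}

// conflictIsRetryable applies the whitelist rule described above:
// a 409 is terminal unless it matches a known-retryable pattern.
// Other status codes are out of scope for this sketch and return false.
func conflictIsRetryable(e apiError) bool {
	if e.StatusCode != http.StatusConflict {
		return false
	}
	return isNeutronQuotaError(e)
}
```

Under this rule, a 409 whose body cannot be parsed, or whose type is anything other than the quota type, falls through to terminal.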
Open question 1: transitional-state 409s
During the review of #771, a closely related scenario was raised but explicitly deferred to a follow-up: OpenStack services return 409 when an operation is attempted on a resource in a transitional state (e.g. deleting a LoadBalancer that is still in PROVISIONING status). These errors are retryable simply by waiting — the conflict resolves itself without any spec change. Unlike quota errors, transitional-state 409s are not specific to one service (Octavia, Neutron, Nova, etc. all have transitional states).
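One possible way to detect this case without parsing service-specific error bodies is to look at the resource's last observed status when the 409 arrives. The sketch below assumes the caller already has that status at hand. The function name, the status set, and the idea of keying on status rather than on the error body are all assumptions made for illustration. "PROVISIONING" is taken from the example above, and each service would need its own list of real status names.

```go
package main

import "net/http"

// transitionalStatuses is an illustrative placeholder set. Each OpenStack
// service uses its own status names, so a real list would be per service.
var transitionalStatuses = map[string]struct{}{
	"PROVISIONING": {},
}

// isTransitionalConflict reports whether a 409 was returned while the
// resource was in a state that is expected to resolve by waiting.
// resourceStatus is the status last observed for the resource.
func isTransitionalConflict(statusCode int, resourceStatus string) bool {
	if statusCode != http.StatusConflict {
		return false
	}
	_, ok := transitionalStatuses[resourceStatus]
	return ok
}
```

A check like this is service-agnostic in shape, but it still needs a maintained list of statuses, so it does not by itself settle the whitelist vs blacklist question below.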
Open question 2: whitelist vs blacklist for retryable 409s
The Neutron quota carve-out uses a whitelist approach: 409 is terminal by default, with specific known-retryable cases carved out. But the same logic used to justify retrying quota errors applies much more broadly:
- Quota exceeded → resolved by freeing quota externally (no spec change)
- Duplicate name → resolved by deleting the conflicting resource externally (no spec change)
- Resource in transitional state → resolved by waiting (no spec change)
If "can be resolved without a spec change" is the criterion for retryability, then most 409s arguably qualify. This raises the question of whether a blacklist approach is more appropriate: treat 409 as retryable by default, and mark only specific known-terminal cases as non-retryable.
It is worth asking: are there 409 scenarios that are only solvable by a spec change? If not, the whitelist approach may be the wrong default.
Options on the table
- Whitelist (current direction): 409 is terminal by default; carve out specific retryable patterns by inspecting the error body. Con: ongoing maintenance burden; easy to miss cases; risks being inconsistent across services.
- Blacklist: 409 is retryable by default; mark only specific known-terminal patterns as non-retryable. Pro: consistent with how most 409s behave in practice. Con: requires identifying which 409s truly are only fixable via spec change.
- All 409s retryable: Treat every Conflict as transient with exponential backoff. Pro: simple and handles all cases. Con: if any 409 can only be fixed by a spec change, it would spin indefinitely.
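To make the comparison concrete, the three options can be viewed as one policy with two knobs: the default verdict and a list of exceptions that flip it. The sketch below is only an illustration of that framing. conflictPolicy, matcher, and bodyContains are hypothetical names, and matching on a substring of the error body is an assumption, not a proposal for how bodies should actually be inspected.

```go
package main

import "strings"

// matcher reports whether a 409 error body matches a known pattern.
type matcher func(body string) bool

// conflictPolicy describes how 409s are classified: a default verdict
// plus exceptions that invert it.
type conflictPolicy struct {
	retryByDefault bool
	exceptions     []matcher
}

// retryable returns the verdict for a single 409 error body.
func (p conflictPolicy) retryable(body string) bool {
	for _, m := range p.exceptions {
		if m(body) {
			return !p.retryByDefault // an exception flips the default
		}
	}
	return p.retryByDefault
}

// bodyContains is a deliberately naive matcher used only for illustration.
func bodyContains(substr string) matcher {
	return func(body string) bool {
		return strings.Contains(body, substr)
	}
}
```

With this shape, the whitelist is conflictPolicy{retryByDefault: false} with retryable exceptions, the blacklist is conflictPolicy{retryByDefault: true} with terminal exceptions, and "all 409s retryable" is the blacklist with an empty exception list. The options differ only in what happens to a 409 that nobody anticipated, which is the case the questions above are really about.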
References
IsRetryable; deferred transitional-state 409s