Skip to content
Discussion options

You must be logged in to vote

Retries stop being helpful when they hide the root cause instead of buying time to fix it.

A few practical indicators:

Retries significantly increase tail latency or amplify load during partial outages.

Failures become correlated rather than isolated, causing retry storms.

Success depends on retry count rather than system health.

When retries shift from handling rare transient issues to being required for normal operation, that’s usually a signal that the system’s failure modes aren’t well understood.

At that point, redesigning for better isolation, backpressure, or clearer failure boundaries tends to yield more long-term value than further tuning retry parameters.

Replies: 1 comment

Comment options

You must be logged in to vote
0 replies
Answer selected by iaversao7-sketch
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Category
Q&A
Labels
None yet
2 participants