
Retry api calls on 'service unavailable' #266

Open
fwiesel wants to merge 1 commit into stable/rocky-m3 from status_code_retry

Conversation

@fwiesel
Member

@fwiesel fwiesel commented Oct 29, 2021

The keystoneauth1 adapter used as the basis for cinder, glance,
and neutron API calls already supports retrying on a 503 status code,
if the corresponding parameter is passed.
Currently, only connection failures are retried, but if the service
is behind a load-balancer, a connection failure is rather unlikely; instead a
Service Unavailable error would be raised.

Change-Id: I82cf1d6eecad1262841c49e10d30c1ec5ba26f80

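For reference, the retry-on-status behaviour the commit message describes can be sketched in plain Python. This is a hypothetical illustration of what passing keystoneauth1's `status_code_retries` parameter enables, not the library's actual internals; the function and parameter names below are invented for the example.

```python
import time


def call_with_retries(request, status_code_retries=3,
                      retriable_status_codes=(503,), interval=0.0):
    """Retry ``request()`` when it returns a retriable HTTP status.

    Hypothetical sketch of retry-on-status, in the spirit of the
    adapter's ``status_code_retries`` parameter: connection-level
    retries alone would never fire when a load-balancer answers the
    TCP connection and returns 503 on behalf of a dead backend.
    """
    attempt = 0
    while True:
        status, body = request()
        # Give up on a non-retriable status or once retries are exhausted.
        if status not in retriable_status_codes or attempt >= status_code_retries:
            return status, body
        attempt += 1
        time.sleep(interval)
```

With a backend that returns 503 twice before recovering, the third attempt succeeds; if every attempt returns 503, the last response is handed back once the retry budget is spent.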
@fwiesel fwiesel requested a review from joker-at-work October 29, 2021 10:11

@joker-at-work joker-at-work left a comment


Our admin/internal APIs don't go via loadbalancer, so it's more likely that connection errors occur.

Why did you opt for taking the same setting for different retries? We wouldn't be able to disable one, if it becomes problematic.

By default, this only retries 503, which is probably fine for requests changing anything in Cinder, but for GET-requests, we might also want to retry on 500 - e.g. the DB restarting and thus requests failing with "internal server error". What do you think?

@fwiesel
Member Author

fwiesel commented Oct 29, 2021

> Our admin/internal APIs don't go via loadbalancer, so it's more likely that connection errors occur.

The change is general and not specific to our situation. But agreed, then it doesn't help us much.

> Why did you opt for taking the same setting for different retries?

The option is called http_retries and its description says "Number of times xyzclient should retry on any failed http call.".
As it stands, the code doesn't do that; it only retries on failed connections.
Separate values make more sense to me if we need the granularity to actually tune them to different values.

> We wouldn't be able to disable one, if it becomes problematic.

We still can disable retries, just not separately. I would say that is good enough for fixing something which requires a fix.

> By default, this only retries 503, which is probably fine for requests changing anything in Cinder, but for GET-requests, we might also want to retry on 500 - e.g. the DB restarting and thus requests failing with "internal server error". What do you think?

The retries are enabled for all requests, including POST, PUT, etc. I would rather not retry anything other than 503 at this lowest level, as the APIs are not guaranteed to be idempotent.
I think a 500 error is a "real" server error, which should be fixed on the application/DB side.

E.g., fix the retry logic in the application-DB API (oslo.db) to handle the restart better (more likely in a short time frame), or fix it on the DB side so that the API is zero-downtime (quite a bit of effort).
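The idempotency concern in the reply above can be made concrete with a small, method-aware retry policy. This is an illustrative sketch of the trade-off being discussed, not code from the change itself; the function name and the exact set of methods are assumptions.

```python
def retriable_codes_for(method):
    """Return the HTTP status codes worth retrying for a given method.

    Sketch of the policy discussed above (an assumption, not the PR's
    code): 503 is safe to retry for every method, because the backend
    never processed the request, while 500 is only retried for safe
    reads, since POST and friends are not guaranteed to be idempotent
    at the API level.
    """
    codes = {503}
    if method.upper() in {"GET", "HEAD"}:
        codes.add(500)
    return codes
```

Under this policy a GET would additionally be retried on 500 (e.g. during a DB restart), while a POST would still only be retried on 503.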


@joker-at-work joker-at-work left a comment


Sounds good. Thank you for the explanations.


2 participants