Retry API calls on 'service unavailable' #266
The keystoneauth1 adapter used as the basis for Cinder, Glance, and Neutron API calls already supports retrying on a 503 status code if the corresponding parameter is passed. Currently, only connection failures are retried, but if the service is behind a load balancer, a connection failure is rather unlikely and a Service Unavailable error would be raised instead. Change-Id: I82cf1d6eecad1262841c49e10d30c1ec5ba26f80
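To make the behaviour concrete, here is a minimal standard-library sketch of a retry loop in which one retry count covers both connection failures and retriable status codes. It is not the keystoneauth1 implementation and not the code in this change; `call_with_retries`, `send`, and `delay` are hypothetical names chosen for illustration, and the default of retrying only 503 mirrors what is described in this thread.

```python
import time
from typing import Callable, Iterable, Tuple


def call_with_retries(
    send: Callable[[], Tuple[int, str]],
    retries: int,
    retriable_status_codes: Iterable[int] = (503,),
    delay: float = 0.0,
) -> Tuple[int, str]:
    """Call send() at most retries + 1 times (hypothetical helper).

    A ConnectionError or a status code in retriable_status_codes triggers
    another attempt. Both cases share the single `retries` budget.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    codes = frozenset(retriable_status_codes)
    for attempt in range(retries + 1):
        last_attempt = attempt == retries
        try:
            status, body = send()
        except ConnectionError:
            # Out of attempts: let the connection failure propagate.
            if last_attempt:
                raise
        else:
            # Return on success, on a non-retriable status, or when the
            # retry budget is used up (the caller then sees the 503).
            if status not in codes or last_attempt:
                return status, body
        time.sleep(delay)
    raise AssertionError("unreachable")
```

With `retries=0` the helper makes exactly one attempt, which is how a shared setting disables both kinds of retry at once.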
Our admin/internal APIs don't go through a load balancer, so connection errors are the more likely failure for us.
Why did you use the same setting for both kinds of retry? We wouldn't be able to disable one of them if it becomes problematic.
By default, this only retries on 503, which is probably fine for requests that change anything in Cinder. For GET requests, though, we might also want to retry on 500, e.g. when the DB is restarting and requests fail with "internal server error". What do you think?
The change is general and not specific to our situation. But agreed, then it doesn't help us much.
The option is called
We can still disable retries, just not separately. I would say that is good enough for fixing something that requires a fix.
The retries are enabled for all requests, including POST, PUT, etc. I would rather not retry those at the lowest level for anything except 503, as the APIs are not guaranteed to be idempotent. For the DB restart case, I would rather fix the retry logic in the application's DB API (oslo.db) to handle the restart better (more likely in a short time frame), or fix it on the DB side so that the API is zero-downtime (quite a bit of effort).
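For reference, the split discussed above (503 for every method, 500 only for read-only requests) could be sketched as follows. This is purely illustrative and not part of this change; `READ_ONLY_METHODS` and `retriable_status_codes_for` are hypothetical names, and treating only GET and HEAD as safe to retry on 500 is an assumption.

```python
# Assumption: only these methods are treated as side-effect free.
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def retriable_status_codes_for(method: str) -> frozenset:
    """Return the status codes worth retrying for an HTTP method.

    503 is retried for every method. 500 is retried only for read-only
    methods, since repeating a POST or PUT may not be idempotent.
    """
    if method.upper() in READ_ONLY_METHODS:
        return frozenset({500, 503})
    return frozenset({503})
```

A caller would pass the result as the set of retriable codes for each request, so a failed POST is never repeated on a 500.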
joker-at-work left a comment
Sounds good. Thank you for the explanations.