Add retry decorator to tests that are vulnerable to transient service issues#421
timmarkhuff merged 7 commits into main
Conversation
brandon-wada
left a comment
I'm fine with trying this out.
The central worry with something like this is that it might hide real flakiness issues. However, we're currently fairly comfortable attributing the flakiness we observe to irregular usage patterns caused by 12 copies of the same tests running in sync with each other via GHA. Given that, we're unlikely to be hiding issues relevant to any real usage, and we should still be able to observe real issues in our BE alerting.
Ideally, none of the SDK tests should be intended to uncover flakiness. Flakiness is better discovered through canary tests than through unit tests in python-sdk.
Some tests in python-sdk are vulnerable to bad responses from the cloud service. For example, an image query might get a result of STILL_PROCESSING, which means the cloud didn't have an answer in time. This is a transient error and will almost always be resolved with a retry.
This PR adds a retry decorator to protect such tests. Any test that submits an image query and asserts anything about the result is protected with this decorator.
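A minimal sketch of what such a retry decorator might look like; the name `retry`, its parameters, and the retried exception types are illustrative assumptions, not necessarily the implementation in this PR:

```python
import functools
import time


def retry(max_attempts: int = 3, delay_seconds: float = 5.0,
          exceptions: tuple = (AssertionError,)):
    """Retry a test a few times before letting a transient failure surface.

    NOTE: this is a hypothetical sketch. The real decorator in the PR may
    differ in name, defaults, and which exceptions it catches.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        # Out of retries: let the real failure propagate.
                        raise
                    # Transient failure (e.g. STILL_PROCESSING): wait and retry.
                    time.sleep(delay_seconds)
        return wrapper
    return decorator
```

A test that asserts on an image query result could then be wrapped with `@retry(...)`, so a single STILL_PROCESSING response doesn't fail the whole run.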
I also found a few instances of tests that were not using our standard `detector_name` function for naming detectors. I fixed those too.