Prevent infinite retries of autoscaling#9574
Conversation
@weizhouapache, I would like some advice on this issue. If the total number of VMs, including those in the Error and Stopped states, is >= the max size, do you think we should stop scaling any further? Or, if there are VMs in the Error state, should we retry for a few iterations?
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## 4.19 #9574 +/- ##
============================================
- Coverage 15.08% 4.30% -10.79%
============================================
Files 5406 366 -5040
Lines 472889 29514 -443375
Branches 57738 5162 -52576
============================================
- Hits 71352 1270 -70082
+ Misses 393593 28100 -365493
+ Partials 7944 144 -7800
  sc.setParameters("vmGroupId", vmGroupId);
  sc.setJoinParameters("vmSearch", "states",
- State.Starting, State.Running, State.Stopping, State.Migrating);
+ State.Starting, State.Running, State.Stopping, State.Migrating, State.Error, State.Stopped);
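For illustration, the effect of widening the state list can be sketched outside CloudStack. This is a minimal sketch, not the actual autoscale code: `GroupMemberCount`, its nested `VmState` enum, and `count` are hypothetical stand-ins, and the two sets only mirror the state lists before and after the change.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names are CloudStack APIs.
final class GroupMemberCount {
    // Stand-in for the six VM states mentioned in the diff.
    enum VmState { STARTING, RUNNING, STOPPING, MIGRATING, ERROR, STOPPED }

    // States counted as group members before the change.
    static final Set<VmState> BEFORE = EnumSet.of(
            VmState.STARTING, VmState.RUNNING, VmState.STOPPING, VmState.MIGRATING);

    // States counted as group members after the change (Error and Stopped added).
    static final Set<VmState> AFTER = EnumSet.of(
            VmState.STARTING, VmState.RUNNING, VmState.STOPPING, VmState.MIGRATING,
            VmState.ERROR, VmState.STOPPED);

    // Number of VMs whose state is in the counted set.
    static long count(List<VmState> vms, Set<VmState> counted) {
        return vms.stream().filter(counted::contains).count();
    }

    // Scale-up is allowed only while the counted members are below the maximum.
    static boolean mayScaleUp(List<VmState> vms, Set<VmState> counted, int maxMembers) {
        return count(vms, counted) < maxMembers;
    }
}
```

With two running VMs and three failed deployments in a group whose max size is 3, the narrower set counts 2 members and keeps allowing scale-up, while the wider set counts 5 and stops it.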
This may cause the autoscaler to stop retrying deployments if any deployment intermittently goes into the Error state. Should we count the Error state only after some n retries?
Yeah, that was my worry too; I wanted some input on it. I'll do that. Thanks @shwstppr
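One way the "count Error only after n retries" idea from this thread could look is sketched below. This is an assumption-laden sketch, not what the PR implements: `ErrorAwareScaleUpPolicy`, `retryLimit`, and `recordDeployment` are hypothetical names, and tracking consecutive failures in memory is only one possible design.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names are CloudStack APIs.
final class ErrorAwareScaleUpPolicy {
    enum VmState { STARTING, RUNNING, STOPPING, MIGRATING, ERROR, STOPPED }

    // States that always count towards the group size.
    private static final Set<VmState> ACTIVE_STATES = EnumSet.of(
            VmState.STARTING, VmState.RUNNING, VmState.STOPPING, VmState.MIGRATING);

    private final int retryLimit;        // assumed setting: failures tolerated before Error VMs count
    private int consecutiveFailures = 0; // reset on every successful deployment

    ErrorAwareScaleUpPolicy(int retryLimit) {
        if (retryLimit < 1) {
            throw new IllegalArgumentException("retryLimit must be at least 1");
        }
        this.retryLimit = retryLimit;
    }

    // Record the outcome of one scale-up deployment attempt.
    void recordDeployment(boolean succeeded) {
        consecutiveFailures = succeeded ? 0 : consecutiveFailures + 1;
    }

    // Error VMs are counted only once the failure streak reaches the limit,
    // so intermittent failures still allow a retry.
    long countedMembers(List<VmState> vms) {
        boolean countErrors = consecutiveFailures >= retryLimit;
        return vms.stream()
                .filter(s -> ACTIVE_STATES.contains(s) || (countErrors && s == VmState.ERROR))
                .count();
    }

    boolean mayScaleUp(List<VmState> vms, int maxMembers) {
        return countedMembers(vms) < maxMembers;
    }
}
```

In this sketch an intermittent failure leaves scale-up enabled, a streak of `retryLimit` failures makes Error VMs count against the max size, and a later successful deployment resets the streak.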
@Pearl1594 my other advice would be
I added the Stopped state with a scenario in mind where, say, a host goes down for whatever reason: all VMs of the autoscale group on that host would enter the Stopped state, which would cause additional VMs to be deployed. I saw that as a problematic scenario. But maybe that is how it should behave; I am not sure of the scope of autoscaling. Maybe you could shed some light on this, @weizhouapache.
Yeah, I think that works. As a user, even if there is something wrong with a VM, I'd still like it to be counted as part of the ASG. Otherwise, I would not know I had a faulty VM, as I would only find out by going through my list of VMs outside the ASG.
Closing this, as #11244 addresses it.
Description
This PR fixes: #9318
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?