-
Notifications
You must be signed in to change notification settings - Fork 261
A61 update: Avoid re-resolution without any connection failures #539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -130,9 +130,13 @@ address list. Specifically: | |||||||||||||||||||||||||||||||||||||||||||||||||
| reports READY, which happens after all handshakes are complete), | ||||||||||||||||||||||||||||||||||||||||||||||||||
| we choose that connection. If there is a timer running, we cancel | ||||||||||||||||||||||||||||||||||||||||||||||||||
| the timer. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - Each time a subchannel reports TRANSIENT_FAILURE, we will increase a | ||||||||||||||||||||||||||||||||||||||||||||||||||
| counter for the number of connection failures. The counter is reset | ||||||||||||||||||||||||||||||||||||||||||||||||||
| any time re-resolution is requested. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - We will wait for at least one connection attempt on every address to | ||||||||||||||||||||||||||||||||||||||||||||||||||
| fail before we consider the first pass to be complete. At that point, | ||||||||||||||||||||||||||||||||||||||||||||||||||
| we will request re-resolution. As per [gRFC A62][A62], we will report | ||||||||||||||||||||||||||||||||||||||||||||||||||
| we will request re-resolution if the number of connection failures is at | ||||||||||||||||||||||||||||||||||||||||||||||||||
| least the number of subchannels. As per [gRFC A62][A62], we will report | ||||||||||||||||||||||||||||||||||||||||||||||||||
| TRANSIENT_FAILURE state and will continue trying to connect. We will | ||||||||||||||||||||||||||||||||||||||||||||||||||
| stay in TRANSIENT_FAILURE until either (a) we become connected or (b) | ||||||||||||||||||||||||||||||||||||||||||||||||||
| the LB policy is destroyed by the channel shutting down or going IDLE. | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -143,18 +147,18 @@ all times, with no regard for the order of the addresses. Each | |||||||||||||||||||||||||||||||||||||||||||||||||
| individual subchannel will provide [backoff behavior][backoff-spec], | ||||||||||||||||||||||||||||||||||||||||||||||||||
| reporting TRANSIENT_FAILURE while in backoff and then IDLE when backoff | ||||||||||||||||||||||||||||||||||||||||||||||||||
| has finished. The pick_first policy will therefore automatically | ||||||||||||||||||||||||||||||||||||||||||||||||||
| request a connection whenever a subchannel reports IDLE. We will count | ||||||||||||||||||||||||||||||||||||||||||||||||||
| the number of connection failures, and when that number reaches the | ||||||||||||||||||||||||||||||||||||||||||||||||||
| number of subchannels, we will request re-resolution; note that because | ||||||||||||||||||||||||||||||||||||||||||||||||||
| request a connection whenever a subchannel reports IDLE. When the | ||||||||||||||||||||||||||||||||||||||||||||||||||
| number of connection failures reaches the number of subchannels (and not | ||||||||||||||||||||||||||||||||||||||||||||||||||
| in the first pass), we will request re-resolution; note that because | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+151
to
+152
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not in the first pass? If we're going to trigger this based solely on the number of connection attempt failures, then shouldn't we do that regardless of whether or not we're in the first pass? It seems like if we're already in TF and we get a new address list, we shouldn't effectively reset the counter.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I didn't think I changed any semantics here. I just made it more explicit. This paragraph only applies when we swap "modes" away from first pass. I don't think we need to re-resolve during the first pass as we haven't actually tried all the addresses yet. We trigger re-resolution after that completes. If we are doing a first pass, we're not guaranteed to be in TF... I could understand basing some of this off of "in TF" vs "not in TF," but the existing design uses "in first pass" vs "not in first pass." So changes there are more invasive. |
||||||||||||||||||||||||||||||||||||||||||||||||||
| the backoff state will differ across the subchannels, this may mean that | ||||||||||||||||||||||||||||||||||||||||||||||||||
| we have seen multiple failures of a single subchannel and no failures | ||||||||||||||||||||||||||||||||||||||||||||||||||
| from another subchannel, but this is a close enough approximation and | ||||||||||||||||||||||||||||||||||||||||||||||||||
| very simple to implement. | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
133
to
156
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it might be clearer to decouple the discussion of re-resolution behavior from the rest of the description here by putting it in its own paragraph:
Suggested change
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Your use of the github "suggestion" feature is broken. You might want to avoid using it next time or properly select the multiple lines necessary to make it useful. (Or... I don't even know. I see you selected multiple lines now. I have no clue what github is doing. Although the multiple lines selected still looks suspect.) That seems strictly worse to me, as it needs to alternate between "first pass" and "not first pass". It's also not as clear whether this paragraph is a continuation of the "not first pass" paragraph above; in fact, reading what you wrote I'd assume it is a continuation and only applies when we swap "modes", because the "(and not in the first pass)" is a parenthetical. I don't understand why "note that because the backoff" is combined with a semicolon with the previous sentence. They seem unrelated. It looks like your new reorganization broke the language, because now we're not requesting re-resolution when we leave the first pass. Was that your intention? I can't actually tell your intention with the language/organization present. |
||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| Note that every time the LB policy receives a new address list, it will | ||||||||||||||||||||||||||||||||||||||||||||||||||
| start an initial Happy Eyeballs pass over the new list, even if some of | ||||||||||||||||||||||||||||||||||||||||||||||||||
| start a new first pass over the new list using Happy Eyeballs, even if some of | ||||||||||||||||||||||||||||||||||||||||||||||||||
| the subchannels are not actually new due to their addresses having been | ||||||||||||||||||||||||||||||||||||||||||||||||||
| present on both the old and new lists. This means that on the initial | ||||||||||||||||||||||||||||||||||||||||||||||||||
| present on both the old and new lists. This means that on the first | ||||||||||||||||||||||||||||||||||||||||||||||||||
| pass through the address list for a subsequent address list update, when | ||||||||||||||||||||||||||||||||||||||||||||||||||
| pick_first decides to start a connection attempt on a given subchannel | ||||||||||||||||||||||||||||||||||||||||||||||||||
| (whether because it is the first subchannel in the list or because the | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
Uh oh!
There was an error while loading. Please reload this page.