sync_switch_configuration: Don't allow dueling Nexuses based on a stale blueprint#10336
sync_switch_configuration: Don't allow dueling Nexuses based on a stale blueprint#10336jgallagher wants to merge 8 commits intomainfrom
sync_switch_configuration: Don't allow dueling Nexuses based on a stale blueprint#10336Conversation
9a4f1a3 to
11d2e05
Compare
|
Starting to review this:
As discussed in the update watercooler in the morning, this isn't quite right I think, since it is a partial fix. You may want to remove this from the PR description so 10320 doesn't get auto-closed. |
Ehh, #10320 notes that it's specifically about the blueprint issue. I should open a separate issue for the network config side. |
sunshowers
left a comment
There was a problem hiding this comment.
Looks great, just a few questions.
| warn!( | ||
| log, | ||
| "skipping bootstore update due to stale blueprint"; | ||
| "our-blueprint-gen" => desired_gen, | ||
| "bootstore-blueprint-gen" => current_gen, | ||
| ); |
There was a problem hiding this comment.
Worth logging the values of rnc_differs and nat_differs here?
| // * "Definitely not" if our generation is older than the generation | ||
| // currently in the bootstore; this indicates we've produced | ||
| // `desired_blueprint_networking_config` based on a stale blueprint. | ||
| // * "Definitely yes" if our generation is not older than the current | ||
| // bootstore generation and there have been changes to the config. | ||
| // * "No information" if our NAT entries haven't changed, we fall through to | ||
| // checking the rack network config below. |
There was a problem hiding this comment.
could you rewrite this as a decision table? If I understand correctly "No information" is "desired generation >= current gen and NAT entries equal" -- a decision table would make this clearer and would help visually inspect that all cases are handled.
| // 2. Backwards compatibility: prior versions of this type did not store | ||
| // this information at all, and we must be able to cleanly handle that at | ||
| // runtime. |
There was a problem hiding this comment.
"At runtime" here includes within the bootstore, right?
There was a problem hiding this comment.
Yeah this is kind of vague isn't it? Do you have preference between cutting this off:
// 2. Backwards compatibility: prior versions of this type did not store
// this information at all.
or changing it to specify that we treat older versions as None:
// 2. Backwards compatibility: prior versions of this type did not store
// this information at all. If the bootstore contains an earlier
// `SystemNetworkingConfig` that we need to convert to the latest
// version, `blueprint_external_networking_config` will be `None`.
There was a problem hiding this comment.
I like the second version.
sure, that works :) |
This commit has two primary changes:
SystemNetworkingConfigtype to have an extra generation number inside it attached to the service NAT entries. This is a copy of the blueprint'sexternal_networking_generationnumber from the blueprint where the NAT entries were extracted. (This requires bumping the sled-agent API and all the extra machinery for changing the bootstore type; this is a little noisy but very mechanical.)sync_switch_configurationbg task to inspect this generation number and skip updating the bootstore if its currently-loaded blueprint has an olderexternal_networking_generation(i.e., is stale).Fixes #10320. Staged on top of #10331.