WIP: Hypervisor per ESXi#233

Closed
fwiesel wants to merge 7522 commits intostable/rocky-m3from
hypervisor_per_esxi

Conversation


@fwiesel fwiesel commented Aug 26, 2021

Currently, the code only collects stats for the different nodes... Now I have to look at how to represent it to placement.
After that, the question is, how to migrate the VMs between the hypervisor nodes.

@fwiesel force-pushed the hypervisor_per_esxi branch 2 times, most recently from 27166a8 to d5c92e7 on August 27, 2021 at 11:24
@fwiesel force-pushed the hypervisor_per_esxi branch 5 times, most recently from 59ccaf7 to 57ca92b on September 21, 2021 at 07:49
@fwiesel force-pushed the hypervisor_per_esxi branch from 57ca92b to 8999084 on October 21, 2021 at 14:20
@fwiesel force-pushed the hypervisor_per_esxi branch from 8999084 to e99307a on November 3, 2021 at 09:30
@fwiesel force-pushed the hypervisor_per_esxi branch 2 times, most recently from 7fd3978 to be8ed79 on November 10, 2021 at 10:02
@fwiesel force-pushed the hypervisor_per_esxi branch from be8ed79 to 8864d38 on December 1, 2021 at 08:45
@fwiesel force-pushed the hypervisor_per_esxi branch from 8864d38 to 244fb1f on December 21, 2021 at 10:17
@fwiesel force-pushed the hypervisor_per_esxi branch from 244fb1f to 67d41f4 on December 30, 2021 at 10:38
@fwiesel force-pushed the hypervisor_per_esxi branch from 67d41f4 to e77d660 on January 12, 2022 at 15:48
@fwiesel force-pushed the hypervisor_per_esxi branch from e77d660 to 3d4a22d on March 8, 2022 at 09:33
imitevm and others added 14 commits August 8, 2022 15:46
While nova already cleaned up its original image-cache, our patches
also added images as templates into the mix; these were not cleaned up
prior to this patch and thus filled up the datastores.

Change-Id: I2fc631d6ce0a9339be9237d63ae7e86d94779dcc
(cherry picked from commit 97780bb)
(cherry picked from commit ce5f026)
If we sort the tasks by creation time, we can stop looping over them
once we find an event that's too old. All other VMs we didn't find an
event for yet must be expired anyway.

Change-Id: I196bb9ab48867f314d9a5f3b7566384fa72778df
(cherry picked from commit 642bcc7)
(cherry picked from commit 2190382)
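The early-exit idea above can be sketched as follows; the names (`expired_templates`, `events`) are hypothetical and not nova's actual helpers:

```python
def expired_templates(templates, events, cutoff):
    """Return the template VMs with no event newer than ``cutoff``.

    ``events`` is a list of (vm_name, timestamp) pairs assumed to be
    sorted by creation time, newest first; ``templates`` are the VM
    names under consideration. A sketch of the early-exit loop only.
    """
    expired = set(templates)
    for vm, timestamp in events:
        if timestamp < cutoff:
            # Events are sorted newest-first, so every remaining event
            # is too old as well -- we can stop looping here.
            break
        expired.discard(vm)
    return expired
```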
Since we want to keep images in the owner's folder again instead of
uploading them to all projects using them, we need to also make sure we
clean up everywhere - even in projects that have no instances.

Change-Id: I9344a9bb8c0436cb5c87f3328fa3b9d31dedbdbc
(cherry picked from commit 943babe)
(cherry picked from commit b1af943)
We need to support Python3 in the future and this is a low-hanging
fruit.

Change-Id: I86804af29f4c6aff14fa5495e5344377488ba8fe
(cherry picked from commit 95c177f)
(cherry picked from commit bce80c6)
We want to make sure we only clean up image-cache templates and no other
VMs that might happen to lie in the "Images" folder.

Change-Id: I55b7fa7ebbe14b13f579ebc39dcae2549ddedc9a
(cherry picked from commit 67fda8f)
(cherry picked from commit d18fb11)
Since every nova-compute node will be running the image-cache cleanup,
we have to make sure this works in parallel. Therefore, we limit the
cleanup of image-cache templates to the datastores the nova-compute node
is responsible for - the ones configured in the datastore_regex config
option.

Change-Id: I5fae822a08bcc06565c64f959553cf7082bb2423
(cherry picked from commit 0d8d760)
(cherry picked from commit 5f21c7d)
Even with image-as-template disabled we have to run the image-template
cleanup to ensure removal after the setting is switched off and because
we switched all our image-upload to using those image-templates.

Change-Id: I924fbc42f014ecf4c0342246d48f27b6ba1d5c77
(cherry picked from commit 44b8620)
(cherry picked from commit 7331446)
Otherwise, we might copy the same image to the datastore multiple times,
just because VMs from multiple projects are deployed from that (shared)
image.

Change-Id: I02721da655ec505d7c3c5d3c9faca1be77dce813
(cherry picked from commit 95b6176)
(cherry picked from commit aea1268)
Every image we upload automatically ends up as template-VM. We can
leverage this in normal deployments and copy the VMDK from the
image-template, if it still exists. If it doesn't exist anymore, we can
try to copy another image-template from another datastore and use its
VMDK instead - that should still be faster than copying the image from
glance again.

If all fails, we turn back to the original approach of loading the image
from glance.

Change-Id: I659d22e494a86fe4e567306062784313432d11ee
(cherry picked from commit cfa3eb8)
(cherry picked from commit a0d9db9)
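The fallback chain described above (local image-template, template on another datastore, finally glance) can be sketched generically; `copy_image_vmdk` and the callable-based interface are illustrative, not the driver's real API:

```python
def copy_image_vmdk(disk_sources):
    """Try each disk source in order of decreasing speed -- the local
    image-template, a template on another datastore, then glance --
    and return the result of the first one that succeeds.

    ``disk_sources`` is a list of zero-argument callables; this is a
    generic sketch of the fallback chain, not the driver's real API."""
    last_exc = None
    for source in disk_sources:
        try:
            return source()
        except Exception as exc:
            # Remember the failure and fall through to the next,
            # slower source.
            last_exc = exc
    raise last_exc
```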
Until now, the logs during a clean-up showed "Destroyed VM" messages,
but one could not tell which VM was destroyed. Therefore, we add an
additional log message stating which image-template VM we're going to
destroy.

Change-Id: I7429fca0175ec4593689466be6dcc0cb2482cb9f
(cherry picked from commit 557e34e)
(cherry picked from commit 2d6f122)
There are cases where we don't want to blindly try and delete a VM,
but instead want to know the outcome for certain types of errors at
least. To support this, vm_util.destroy_vm() has to re-raise certain
types of exceptions. In the case we're looking for, that's the
"InvalidArgument" fault. For this to work like before, code calling
destroy_vm() needs to catch these exceptions, too.

Change-Id: I28d2f1b94b8439cfea88146746ae6e59d61f087c
(cherry picked from commit e112c5e)
(cherry picked from commit 73a9a27)
While we don't yet know why, we've seen image-cache templates missing
their directory on the datastore.

For the image-template cleanup this means that Nova cannot destroy
those templates, as this raises an InvalidArgument fault on VMware's
side, telling us that ConfigSpec.files.vmPathName is invalid. Since we
need to clean those templates up anyway to create a new, usable one,
we catch those errors and call UnregisterVM as a fallback.

For getting templates from other datastores or re-using the already
existing template to copy its disk, we have to catch
FileNotFoundException. If this situation occurs, we have to clean that
broken template up and also let nova continue the search for a source
of the disk for the new VM.

Change-Id: Id6d1cc1cd7a50958c77a1417e3f2aed7b9672e15
(cherry picked from commit 1b342cd)
(cherry picked from commit 13940c5)
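A minimal sketch of the destroy-with-fallback behaviour; the exception class and `session` methods below are stand-ins, not the real oslo.vmware/driver code:

```python
class InvalidArgument(Exception):
    """Stand-in for the vSphere InvalidArgument fault."""


def destroy_template(session, vm_ref):
    """Destroy an image-template VM; if its datastore directory is
    gone, the destroy raises InvalidArgument, so fall back to simply
    unregistering the stale inventory entry."""
    try:
        session.destroy_vm(vm_ref)
    except InvalidArgument:
        # The template lost its files on the datastore; destroying
        # fails, but unregistering still removes the stale entry.
        session.unregister_vm(vm_ref)
```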
If the DRS policy is not set to "fullyAutomated", i.e. if DRS will not
take action on existing, running VMs, we have no chance to free up a
host and thus don't have to try to find one. Instead, we error out to
tell nova-bigvm to search for another cluster.

Change-Id: Idbdfe82b4057844401e710fb9d87141478bb3353
(cherry picked from commit e98d6ae)
(cherry picked from commit 0e8f17e)
Support for the E1000 driver is phasing out in guest operating systems,
so we switch to E1000e, which is also faster and has more hardware
offloading capabilities.

Change-Id: I08ac32f914a57d3eb7328351a07a20a2ef212cb8
(cherry picked from commit 5e7556d)

fix unit tests for vmware: Switch default VIF model

(cherry picked from commit 425070f)
(cherry picked from commit a3b173a)
grandchild and others added 25 commits January 26, 2023 12:01
Original code can be found at: https://opendev.org/x/nova-mksproxy

Integrate the code into nova as a builtin command.

Subclass `base.SecurityProxy` into `MksSecurityProxy`. Because MKS'
`connect_info.internal_access_path` contains more than just a `path`
(namely a JSON object with "ticket", "cfgFile" & "thumbprint") add a
new `parse_internal_access_path()` method in
`NovaProxyRequestHandler`. This method tries to parse
`internal_access_path` as JSON and, if that fails, puts the contents
in the "path" key of a new `internal_access_path_data` dict.

Co-authored-by: Johannes Kulik <johannes.kulik@sap.com>
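The parsing behaviour described above amounts to roughly the following (a sketch, not the exact method body):

```python
import json


def parse_internal_access_path(internal_access_path):
    """Parse internal_access_path as JSON (MKS tickets carry a JSON
    object with "ticket", "cfgFile" and "thumbprint"); fall back to
    treating the value as a plain path for other console types."""
    try:
        return json.loads(internal_access_path)
    except ValueError:
        # Not JSON -- wrap the raw value under the "path" key instead.
        return {"path": internal_access_path}
```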
Has to be "nova-{NAME}proxy".
The old script had a --verbose flag that we are using in the k8s
deployment definition for all regions. This will include both Xena
and older Rocky deployments. The new name of this flag is
--mks-verbose.

Instead of adding an image version check in the helm-chart, add an
alias for the flag instead, for the duration of the update, and revert
this after the Xena upgrade is through everywhere.
Physical CPUs are faster than hyperthreaded CPUs. So by default VMware
spreads a VM's CPU cores onto physical cores, spanning multiple NUMA
nodes if needed although the vCPUs would fit onto a smaller number of
NUMA nodes.

HANA VMs prefer low latency (and thus NUMA locality) over raw
performance, so they set the VMware config `preferHT` to not get
spread out. Before, this setting was only applied if the VM's flavor
qualified as a "Big VM" flavor (i.e. >1GiB in size). This excluded
smaller HANA flavors, so a single-NUMA-node VM got spread over two
nodes.

Make `preferHT` depend on whether the flavor requires one of our
`CUSTOM_NUMASIZE_*` traits instead.
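The new condition can be sketched like this; the extra-spec key layout follows placement's `trait:<name>=required` convention, and the function name is hypothetical:

```python
def wants_prefer_ht(flavor_extra_specs):
    """Return True when the flavor requires one of the
    CUSTOM_NUMASIZE_* traits -- the condition the patch switches
    preferHT to, instead of the old Big-VM size check."""
    return any(
        key.startswith("trait:CUSTOM_NUMASIZE") and value == "required"
        for key, value in flavor_extra_specs.items()
    )
```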
Another best-practice recommendation for HANA-on-VMware.

The code pretty much duplicates the one for setting the DRS partially-
automated behavior. I decided against deduplication and abstraction in
favor of readability and at the cost of LOC.
Testing showed that single- and half-numa-node VMs get spawned across
multiple NUMA nodes due to `numa.autosize.vcpu.maxPerVirtualNode`
being too low. Set the value explicitly to the same value that's used
for the `CoresPerSocket` cluster-vm setting.
Introduced in 'vmware: Restart BigVMs with "high" priority', the
function "update_cluster_das_vm_override()" did not work, because it
created a "ClusterDasVmConfigInfo" instead of a "ClusterDasVmConfigSpec"
object. This led to us not being able to spawn BigVMs with the
following error:

    Exception in ReconfigureComputeResource_Task.
    Cause: Type not found: 'operation'

Change-Id: If9acf9ee07e373b7b24c14c642d0d99fe2a41db1
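The fix boils down to wrapping the info object in a spec that carries the `operation` field. A sketch using a suds-style client factory (type names mirror the vSphere API; `client_factory.create` follows the oslo.vmware convention, but this is not the exact driver code):

```python
def build_das_vm_override_spec(client_factory, vm_ref, restart_priority):
    """Build a per-VM DAS override for a cluster reconfigure call.

    The reconfigure task needs a ClusterDasVmConfigSpec -- which
    carries the ``operation`` field -- wrapping the
    ClusterDasVmConfigInfo, not the bare info object."""
    info = client_factory.create('ns0:ClusterDasVmConfigInfo')
    info.key = vm_ref
    info.dasSettings = client_factory.create('ns0:ClusterDasVmSettings')
    info.dasSettings.restartPriority = restart_priority
    # Wrap the info in the spec the API actually expects.
    spec = client_factory.create('ns0:ClusterDasVmConfigSpec')
    spec.operation = 'add'
    spec.info = info
    return spec
```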
Copy the relevant code from the libvirt driver.

This reintroduces the code that was changed (presumably by accident) in
  fd4f43b VmWare: Refactored resource and inventory collection
Fixup that commit with this one when forward porting!
`remove_unused_original_minimum_age_seconds` along with several other
config options related to image cache management were moved to their
own config group `image_cache` in Ussuri (nova 21), but the cherry-pick in
    9de18bd "vmware: Update expired image-template-VM handling"
failed to account for this.
Was removed in
    f3cc311 "vmware: Remove vestigial nova-network support"
but not taken into account by
    aba0dbb "VMware: Image as VM template".
In
    daeeafd "baremetal quota handling"
the `_build_per_flavor_limits()` method is incorrectly
rewritten to match surrounding code, esp. `_build_absolute_limits()`,
and uses not only the limit, but limit and in_use as value for the
resulting limit dict.

The original commit,
    16857ed "reintroduce baremetal quota handling"
used `absolute_limits` directly as method parameter.
For historic reasons, there is a mapping between the volume UUID and
the device[].backing.uuid in VMware through extraConfig. In most cases
they are the same, but sometimes they are not. And apparently, the
extraProperties can even hold wrong information.

Since the risk of a UUID collision is quite low, and the lookup is
just an iteration over a small list of volumes, we first try matching
directly, avoiding possibly incorrect information in the extraConfig.
Failing that, we still do the additional API call to get the mapping
and try that.

Work around type errors mypy raises in the changed file

Change-Id: Ifcdf96cfc6d00473299c1f2a7cb9d23d03294027
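The two-step lookup can be sketched as follows; the dict-based device representation, key names, and function name are all illustrative:

```python
def find_volume_device(devices, extra_config, volume_id):
    """Find the virtual disk backing a cinder volume.

    First compare the volume id with each device's backing uuid
    directly (they are usually identical); only when nothing matches,
    fall back to the extraConfig volume-to-device mapping."""
    for device in devices:
        if device.get("backing_uuid") == volume_id:
            return device
    # Direct match failed -- consult the (possibly stale) extraConfig
    # mapping from volume id to backing uuid.
    mapped_uuid = extra_config.get("volume-%s" % volume_id)
    for device in devices:
        if device.get("backing_uuid") == mapped_uuid:
            return device
    return None
```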
The ironic-python-agent uses qemu-img to convert
various image formats to the raw disk format.
QCow2 is already correctly marked accordingly,
but vmdk was missing.

Change-Id: Ifd868e6951452b291184bd848d4244d37649f1ed
We need to separate the arguments for tox
from the ones for the linter.

Change-Id: I6d7ee1d95f0ca17fa6c04a3546a7f907cc6393f1
Nested `ClusterDasVmSettings` inside new `ClusterDasVmConfigInfo`
objects don't get created properly. This seems to have worked on Rocky
but fails on Xena.
Otherwise it will try to decode as UTF-8-encoded text during read().
We have added custom filters and weighers for our common use-case,
VMware, and already excluded baremetal from them.

For libvirt as an additional hypervisor, we also want to skip those
filters/weighers, so we now test whether the request is scheduled
for vmware, either due to a flavor extra_spec or an image property.
If nothing is set, we default to vmware for backwards compatibility.

Change-Id: I3223aee433ba9009d2cd6387eeda8e70dbbb3cde
The code assumes that the libvirt version reported
by the hypervisor matches the version of the python library installed.
Since we run the agent in a container, that is not correct,
and the attribute will not be in the library, as it has been built
for an older version.

Change-Id: I156047b1f9829a49b429d51ca7f7777606b10a56
Change-Id: I3edf0d3c6deb8385adc1095c7e3c09526e2d4d65
This command detects changes in the flavors and syncs the instances'
flavor info with the new changes.

`nova-manage sap sync_instances_flavor`
The vCenter doesn't like a reconfiguration call with an empty spec, and
raises an error. So we skip that and save ourselves some work
on top.

Change-Id: I1da3309500a2cd384c5d7cd431e71297ef5d5a3a
You can now configure
[vmware]
hypervisor_mode=<cluster,esxi_to_cluster,cluster_to_esxi>

to choose to expose the individual ESXi hosts instead of the cluster.
It will only show the ESXi host as hypervisor node, but will not
honour the node parameter for placing the instance during instance
creation, migration, etc.

Change-Id: I264fd242c0de01ae8442c03bc726a0abfbe176ef
The code assumes that there is a single compute-node per host,
which is incorrect for ironic. By summarising over all hosts,
we get a correct response also for compute-hosts with more than
one compute-node.

Change-Id: Iaf6e2a72f6649e234660de47bec8b1da1ea1571e
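The fix amounts to summing per host instead of assuming one compute node per host. A sketch using vCPUs as the example resource (the dict field names are illustrative):

```python
from collections import defaultdict


def vcpus_per_host(compute_nodes):
    """Sum vCPU totals over all compute nodes that share a host, so a
    host with more than one compute node (the ironic case) reports the
    sum rather than a single node's value."""
    totals = defaultdict(int)
    for node in compute_nodes:
        totals[node["host"]] += node["vcpus"]
    return dict(totals)
```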
@fwiesel force-pushed the hypervisor_per_esxi branch from 3d4a22d to 30fc702 on March 31, 2023 at 14:02
@grandchild

that's quite a few commits. is this meant for xena?

@fwiesel fwiesel closed this May 2, 2023

fwiesel commented May 2, 2023

The PR was meant for rocky, but I've updated the branch for xena.
I've closed this in favour of #423

@grandchild

fyi, it should be possible to change the target branch within the existing PR -- but all good, let's continue in #423.
