
[VMware DRS] Adding new host to DRS cluster does not participate in load balancing. #1257

Merged
nvazquez merged 2 commits into apache:main from sureshanaparti:CLOUDSTACK-9175
Aug 27, 2021

Conversation

@sureshanaparti
Contributor

@sureshanaparti sureshanaparti commented Dec 17, 2015

Description

CLOUDSTACK-9175: [VMware DRS] Adding new host to DRS cluster does not participate in load balancing.

Summary: When a new host is added to a cluster, CloudStack does not create on it all the port groups that CloudStack created earlier on the other hosts in the cluster. Because the new host lacks these CloudStack networking port groups, it is not eligible to participate in DRS load balancing or HA.

Solution: When adding a host to a cluster in CloudStack, use the VMware API to find the list of unique port groups on a previously added host (an older host in the cluster), if one exists, and then create those port groups on the new host.

Fixes: #3156
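
The approach in the Solution above can be illustrated with a small, self-contained sketch. This is not the code from this PR: `PortGroupSync`, `PortGroup`, and `Host` are hypothetical in-memory stand-ins for the VMware API objects, the port group names are made up, and a port group is assumed to be uniquely identified by its name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PortGroupSync {

    /** Hypothetical stand-in for a standard vSwitch port group. */
    static final class PortGroup {
        final String name;
        final String vSwitch;
        final int vlanId;

        PortGroup(String name, String vSwitch, int vlanId) {
            this.name = name;
            this.vSwitch = vSwitch;
            this.vlanId = vlanId;
        }
    }

    /** Hypothetical stand-in for an ESXi host; port groups are keyed by name (assumption). */
    static final class Host {
        final String name;
        final Map<String, PortGroup> portGroups = new LinkedHashMap<>();

        Host(String name) {
            this.name = name;
        }

        void addPortGroup(PortGroup pg) {
            portGroups.put(pg.name, pg);
        }
    }

    /**
     * Creates on newHost every port group that olderHost has and newHost lacks.
     * Returns the names of the port groups that were created.
     */
    static List<String> syncPortGroups(Host olderHost, Host newHost) {
        List<String> created = new ArrayList<>();
        if (olderHost == null) {
            // No previously added host in the cluster: nothing to copy.
            return created;
        }
        for (PortGroup pg : olderHost.portGroups.values()) {
            if (!newHost.portGroups.containsKey(pg.name)) {
                newHost.addPortGroup(pg);
                created.add(pg.name);
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Host older = new Host("esxi-01");
        older.addPortGroup(new PortGroup("pg-guest-100", "vSwitch0", 100));
        older.addPortGroup(new PortGroup("pg-public-200", "vSwitch0", 200));

        Host added = new Host("esxi-02");
        added.addPortGroup(new PortGroup("pg-guest-100", "vSwitch0", 100));

        // Only the port group missing on the new host is created.
        System.out.println(syncPortGroups(older, added)); // prints [pg-public-200]
    }
}
```

Running the sync a second time creates nothing, so in this sketch the operation is safe to repeat when a host is re-added.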

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • [] Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

ACS_AddPortGroupsToNewHostAdded_VMware

How Has This Been Tested?

Manually tested by adding a new host to the VMware cluster and verifying that the port groups were created on the new host.

Contributor

Hi @sureshanaparti, could you consider creating a Jira issue? As a suggestion, could you remove the underscore "_" prefix from the variable names? That convention is not recommended in Java.
Furthermore, could you add JavaDoc? It would save time for those who read your code in the future and would make things easier and more organized.

Contributor Author

@rodrigo93 This fix is part of the Jira issue CLOUDSTACK-9175. I kept the conventions already used in this class, where the underscore "_" is prepended to all member variables.

@koushik-das
Contributor

@sureshanaparti There are some open items, please address them

@yadvr
Member

yadvr commented May 2, 2016

@sureshanaparti please rebase against latest master, thanks

tag:vmware-pickup

@resmo
Member

resmo commented May 24, 2016

would like to see this fixed, @sureshanaparti can I help you out?

@bvbharatk
Contributor

ACS CI BVT Run

Summary:
Build Number 70
Hypervisor xenserver
NetworkType Advanced
Passed=73
Failed=0
Skipped=3

Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0

Failed tests:

Skipped tests:
test_vm_nic_adapter_vmxnet3
test_static_role_account_acls
test_deploy_vgpu_enabled_vm

Passed test suites:
test_deploy_vm_with_userdata.py
test_affinity_groups_projects.py
test_portable_publicip.py
test_vpc_vpn.py
test_over_provisioning.py
test_global_settings.py
test_scale_vm.py
test_service_offerings.py
test_routers_iptables_default_policy.py
test_routers.py
test_reset_vm_on_reboot.py
test_snapshots.py
test_deploy_vms_with_varied_deploymentplanners.py
test_login.py
test_list_ids_parameter.py
test_public_ip_range.py
test_multipleips_per_nic.py
test_regions.py
test_affinity_groups.py
test_network_acl.py
test_pvlan.py
test_volumes.py
test_nic.py
test_deploy_vm_root_resize.py
test_resource_detail.py
test_secondary_storage.py
test_vm_life_cycle.py
test_disk_offerings.py

@sureshanaparti sureshanaparti force-pushed the CLOUDSTACK-9175 branch 2 times, most recently from c533665 to 01c95f9 Compare December 1, 2016 20:50
@sureshanaparti
Contributor Author

Addressed all the changes suggested and rebased against latest master.

  • Used CollectionUtils.isEmpty() as suggested.

@sureshanaparti
Contributor Author

@blueorangutan test centos6 vmware-55u3

@yadvr
Member

yadvr commented Dec 2, 2016

@sureshanaparti sorry, this is a restricted command to avoid resource abuse issues.
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-306

@yadvr
Member

yadvr commented Dec 2, 2016

@sureshanaparti do you think this would be useful for 4.9? If so, can you change PR's base branch to 4.9 and rebase your branch against 4.9?

Member

@sateesh-chodapuneedi sateesh-chodapuneedi left a comment

LGTM. Idea to get port groups from oldest host per DB sounds good.

Member

@sateesh-chodapuneedi sateesh-chodapuneedi left a comment

LGTM. @sureshanaparti this could be good candidate for 4.9 as well.

@sureshanaparti sureshanaparti changed the base branch from master to 4.9 December 2, 2016 12:29
@sureshanaparti
Contributor Author

@rhtyd Changed base branch to 4.9 and rebased against 4.9. This would be useful for 4.9.

@yadvr
Member

yadvr commented Dec 8, 2016

@sureshanaparti thanks, I'll kick some tests.
@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@yadvr
Member

yadvr commented Dec 8, 2016

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✖debian. JID-370

@blueorangutan

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-371

@yadvr
Member

yadvr commented Dec 9, 2016

@blueorangutan test centos7 vmware-55u3

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-55u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-637)
Environment: vmware-55u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35525 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1257-t637-vmware-55u3.zip
Test completed. 42 look ok, 6 have error(s)

Test Result Time (s) Test File
test_04_rvpc_privategw_static_routes Failure 186.66 test_privategw_acl.py
test_04_rvpc_internallb_haproxy_stats_on_all_interfaces Failure 131.04 test_internal_lb.py
test_01_vpc_site2site_vpn Error 480.96 test_vpc_vpn.py
test_01_redundant_vpc_site2site_vpn Error 777.23 test_vpc_vpn.py
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers Error 236.67 test_vpc_redundant.py
test_reboot_router Error 573.34 test_network.py
ContextSuite context=TestListIdsParams>:setup Error 0.00 test_list_ids_parameter.py
test_01_vpc_remote_access_vpn Success 156.38 test_vpc_vpn.py
test_02_VPC_default_routes Success 380.10 test_vpc_router_nics.py
test_01_VPC_nics_after_destroy Success 719.85 test_vpc_router_nics.py
test_05_rvpc_multi_tiers Success 696.58 test_vpc_redundant.py
test_04_rvpc_network_garbage_collector_nics Success 1613.09 test_vpc_redundant.py
test_02_redundant_VPC_default_routes Success 673.94 test_vpc_redundant.py
test_01_create_redundant_VPC_2tiers_4VMs_4IPs_4PF_ACL Success 1456.76 test_vpc_redundant.py
test_09_delete_detached_volume Success 30.73 test_volumes.py
test_06_download_detached_volume Success 60.60 test_volumes.py
test_05_detach_volume Success 100.22 test_volumes.py
test_04_delete_attached_volume Success 15.16 test_volumes.py
test_03_download_attached_volume Success 25.81 test_volumes.py
test_02_attach_volume Success 58.74 test_volumes.py
test_01_create_volume Success 516.54 test_volumes.py
test_03_delete_vm_snapshots Success 280.21 test_vm_snapshots.py
test_02_revert_vm_snapshots Success 232.37 test_vm_snapshots.py
test_01_test_vm_volume_snapshot Success 271.82 test_vm_snapshots.py
test_01_create_vm_snapshots Success 161.64 test_vm_snapshots.py
test_deploy_vm_multiple Success 252.05 test_vm_life_cycle.py
test_deploy_vm Success 0.02 test_vm_life_cycle.py
test_advZoneVirtualRouter Success 0.02 test_vm_life_cycle.py
test_10_attachAndDetach_iso Success 26.66 test_vm_life_cycle.py
test_09_expunge_vm Success 125.13 test_vm_life_cycle.py
test_08_migrate_vm Success 126.15 test_vm_life_cycle.py
test_07_restore_vm Success 0.10 test_vm_life_cycle.py
test_06_destroy_vm Success 10.12 test_vm_life_cycle.py
test_03_reboot_vm Success 5.10 test_vm_life_cycle.py
test_02_start_vm Success 20.18 test_vm_life_cycle.py
test_01_stop_vm Success 5.09 test_vm_life_cycle.py
test_CreateTemplateWithDuplicateName Success 316.90 test_templates.py
test_08_list_system_templates Success 0.04 test_templates.py
test_07_list_public_templates Success 0.04 test_templates.py
test_05_template_permissions Success 0.06 test_templates.py
test_04_extract_template Success 15.23 test_templates.py
test_03_delete_template Success 5.25 test_templates.py
test_02_edit_template Success 90.14 test_templates.py
test_01_create_template Success 176.07 test_templates.py
test_10_destroy_cpvm Success 236.56 test_ssvm.py
test_09_destroy_ssvm Success 268.45 test_ssvm.py
test_08_reboot_cpvm Success 186.36 test_ssvm.py
test_07_reboot_ssvm Success 158.86 test_ssvm.py
test_06_stop_cpvm Success 176.63 test_ssvm.py
test_05_stop_ssvm Success 203.78 test_ssvm.py
test_04_cpvm_internals Success 1.05 test_ssvm.py
test_03_ssvm_internals Success 3.58 test_ssvm.py
test_02_list_cpvm_vm Success 0.09 test_ssvm.py
test_01_list_sec_storage_vm Success 0.09 test_ssvm.py
test_01_snapshot_root_disk Success 81.33 test_snapshots.py
test_04_change_offering_small Success 96.93 test_service_offerings.py
test_03_delete_service_offering Success 0.03 test_service_offerings.py
test_02_edit_service_offering Success 0.06 test_service_offerings.py
test_01_create_service_offering Success 0.08 test_service_offerings.py
test_02_sys_template_ready Success 0.10 test_secondary_storage.py
test_01_sys_vm_start Success 0.17 test_secondary_storage.py
test_09_reboot_router Success 166.16 test_routers.py
test_08_start_router Success 156.13 test_routers.py
test_07_stop_router Success 25.18 test_routers.py
test_06_router_advanced Success 0.04 test_routers.py
test_05_router_basic Success 0.03 test_routers.py
test_04_restart_network_wo_cleanup Success 5.53 test_routers.py
test_03_restart_network_cleanup Success 140.79 test_routers.py
test_02_router_internal_adv Success 0.87 test_routers.py
test_01_router_internal_basic Success 0.46 test_routers.py
test_router_dns_guestipquery Success 77.15 test_router_dns.py
test_router_dns_externalipquery Success 0.07 test_router_dns.py
test_router_dhcphosts Success 125.57 test_router_dhcphosts.py
test_router_dhcp_opts Success 21.32 test_router_dhcphosts.py
test_01_updatevolumedetail Success 0.06 test_resource_detail.py
test_01_reset_vm_on_reboot Success 30.27 test_reset_vm_on_reboot.py
test_createRegion Success 0.04 test_regions.py
test_create_pvlan_network Success 5.31 test_pvlan.py
test_dedicatePublicIpRange Success 0.48 test_public_ip_range.py
test_03_vpc_privategw_restart_vpc_cleanup Success 1237.77 test_privategw_acl.py
test_02_vpc_privategw_static_routes Success 816.91 test_privategw_acl.py
test_01_vpc_privategw_acl Success 207.81 test_privategw_acl.py
test_01_primary_storage_nfs Success 38.74 test_primary_storage.py
test_createPortablePublicIPRange Success 15.14 test_portable_publicip.py
test_createPortablePublicIPAcquire Success 15.35 test_portable_publicip.py
test_isolate_network_password_server Success 91.96 test_password_server.py
test_UpdateStorageOverProvisioningFactor Success 0.13 test_over_provisioning.py
test_oobm_zchange_password Success 30.51 test_outofbandmanagement.py
test_oobm_multiple_mgmt_server_ownership Success 16.26 test_outofbandmanagement.py
test_oobm_issue_power_status Success 5.21 test_outofbandmanagement.py
test_oobm_issue_power_soft Success 15.33 test_outofbandmanagement.py
test_oobm_issue_power_reset Success 15.24 test_outofbandmanagement.py
test_oobm_issue_power_on Success 15.24 test_outofbandmanagement.py
test_oobm_issue_power_off Success 15.47 test_outofbandmanagement.py
test_oobm_issue_power_cycle Success 15.27 test_outofbandmanagement.py
test_oobm_enabledisable_across_clusterzones Success 92.20 test_outofbandmanagement.py
test_oobm_enable_feature_valid Success 5.12 test_outofbandmanagement.py
test_oobm_enable_feature_invalid Success 0.08 test_outofbandmanagement.py
test_oobm_disable_feature_valid Success 0.11 test_outofbandmanagement.py
test_oobm_disable_feature_invalid Success 0.07 test_outofbandmanagement.py
test_oobm_configure_invalid_driver Success 0.07 test_outofbandmanagement.py
test_oobm_configure_default_driver Success 0.06 test_outofbandmanagement.py
test_oobm_background_powerstate_sync Success 29.36 test_outofbandmanagement.py
test_extendPhysicalNetworkVlan Success 15.30 test_non_contigiousvlan.py
test_01_nic Success 645.33 test_nic.py
test_releaseIP Success 453.08 test_network.py
test_public_ip_user_account Success 10.21 test_network.py
test_public_ip_admin_account Success 40.20 test_network.py
test_network_rules_acquired_public_ip_3_Load_Balancer_Rule Success 76.53 test_network.py
test_network_rules_acquired_public_ip_2_nat_rule Success 61.40 test_network.py
test_network_rules_acquired_public_ip_1_static_nat_rule Success 124.91 test_network.py
test_delete_account Success 307.23 test_network.py
test_02_port_fwd_on_non_src_nat Success 55.53 test_network.py
test_01_port_fwd_on_src_nat Success 111.63 test_network.py
test_nic_secondaryip_add_remove Success 232.11 test_multipleips_per_nic.py
login_test_saml_user Success 17.91 test_login.py
test_assign_and_removal_lb Success 148.77 test_loadbalance.py
test_02_create_lb_rule_non_nat Success 207.18 test_loadbalance.py
test_01_create_lb_rule_src_nat Success 207.61 test_loadbalance.py
test_07_list_default_iso Success 0.06 test_iso.py
test_05_iso_permissions Success 0.05 test_iso.py
test_04_extract_Iso Success 5.12 test_iso.py
test_03_delete_iso Success 95.18 test_iso.py
test_02_edit_iso Success 0.07 test_iso.py
test_01_create_iso Success 20.73 test_iso.py
test_03_vpc_internallb_haproxy_stats_on_all_interfaces Success 474.55 test_internal_lb.py
test_02_internallb_roundrobin_1RVPC_3VM_HTTP_port80 Success 1131.90 test_internal_lb.py
test_01_internallb_roundrobin_1VPC_3VM_HTTP_port80 Success 917.44 test_internal_lb.py
test_dedicateGuestVlanRange Success 10.20 test_guest_vlan_range.py
test_UpdateConfigParamWithScope Success 0.11 test_global_settings.py
test_rolepermission_lifecycle_update Success 5.79 test_dynamicroles.py
test_rolepermission_lifecycle_list Success 5.82 test_dynamicroles.py
test_rolepermission_lifecycle_delete Success 5.58 test_dynamicroles.py
test_rolepermission_lifecycle_create Success 5.59 test_dynamicroles.py
test_rolepermission_lifecycle_concurrent_updates Success 5.69 test_dynamicroles.py
test_role_lifecycle_update_role_inuse Success 5.61 test_dynamicroles.py
test_role_lifecycle_update Success 5.65 test_dynamicroles.py
test_role_lifecycle_list Success 5.61 test_dynamicroles.py
test_role_lifecycle_delete Success 5.64 test_dynamicroles.py
test_role_lifecycle_create Success 5.61 test_dynamicroles.py
test_role_inuse_deletion Success 5.58 test_dynamicroles.py
test_role_account_acls_multiple_mgmt_servers Success 7.14 test_dynamicroles.py
test_role_account_acls Success 7.69 test_dynamicroles.py
test_default_role_deletion Success 5.82 test_dynamicroles.py
test_04_create_fat_type_disk_offering Success 0.05 test_disk_offerings.py
test_03_delete_disk_offering Success 0.03 test_disk_offerings.py
test_02_edit_disk_offering Success 0.04 test_disk_offerings.py
test_02_create_sparse_type_disk_offering Success 0.05 test_disk_offerings.py
test_01_create_disk_offering Success 0.08 test_disk_offerings.py
test_deployvm_userdispersing Success 60.60 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userconcentrated Success 86.11 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_firstfit Success 171.03 test_deploy_vms_with_varied_deploymentplanners.py
test_deployvm_userdata_post Success 60.50 test_deploy_vm_with_userdata.py
test_deployvm_userdata Success 145.93 test_deploy_vm_with_userdata.py
test_02_deploy_vm_root_resize Success 5.56 test_deploy_vm_root_resize.py
test_01_deploy_vm_root_resize Success 5.56 test_deploy_vm_root_resize.py
test_00_deploy_vm_root_resize Success 5.65 test_deploy_vm_root_resize.py
test_deploy_vm_from_iso Success 166.57 test_deploy_vm_iso.py
test_DeployVmAntiAffinityGroup Success 186.28 test_affinity_groups.py
test_08_resize_volume Skipped 10.11 test_volumes.py
test_07_resize_fail Skipped 10.21 test_volumes.py
test_06_copy_template Skipped 0.00 test_templates.py
test_static_role_account_acls Skipped 0.02 test_staticroles.py
test_01_scale_vm Skipped 64.34 test_scale_vm.py
test_01_primary_storage_iscsi Skipped 0.03 test_primary_storage.py
test_06_copy_iso Skipped 0.00 test_iso.py
test_deploy_vgpu_enabled_vm Skipped 0.00 test_deploy_vgpu_enabled_vm.py

@sureshanaparti sureshanaparti marked this pull request as ready for review August 18, 2021 13:18
@sureshanaparti sureshanaparti changed the title from "CLOUDSTACK-9175: [VMware DRS] Adding new host to DRS cluster does not participate in load balancing." to "[VMware DRS] Adding new host to DRS cluster does not participate in load balancing." Aug 18, 2021
@sureshanaparti
Contributor Author

@blueorangutan package

@sureshanaparti
Contributor Author

ping @sureshanaparti ?

rebased, and tested @rhtyd

@blueorangutan

@sureshanaparti a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian. SL-JID 911

@sureshanaparti
Contributor Author

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1699)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 42527 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1257-t1699-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Smoke tests completed. 89 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@sureshanaparti
Contributor Author

@blueorangutan package

@blueorangutan

@sureshanaparti a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 998

@sureshanaparti
Contributor Author

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@sureshanaparti a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-1764)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 36895 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr1257-t1764-vmware-67u3.zip
Smoke tests completed. 89 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@nvazquez nvazquez merged commit 7f4f3f7 into apache:main Aug 27, 2021
@yadvr
Member

yadvr commented Aug 27, 2021

👏 6 year old PR gets merged

Development

Successfully merging this pull request may close these issues.

VMware standard vSwitch port group configuration when ESXi host is added to CloudStack