Fix false positives in Usage sanity checks of templates and network offerings #8136
Codecov Report
Attention:
Additional details and impacted files
@@ Coverage Diff @@
## main #8136 +/- ##
============================================
- Coverage 30.37% 29.90% -0.48%
+ Complexity 32633 32023 -610
============================================
Files 5352 5352
Lines 374419 374423 +4
Branches 54609 54609
============================================
- Hits 113719 111957 -1762
- Misses 245523 247415 +1892
+ Partials 15177 15051 -126
DaanHoogland
left a comment
good fixes @winterhazel
clgtm
|
@winterhazel, you can add yourself to the contributors list in .asf.yaml |
yadvr
left a comment
LGTM but needs additional review/testing
|
@blueorangutan package |
|
@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
@blueorangutan test alma8 kvm-alma8 |
|
@DaanHoogland a [SL] Trillian-Jenkins test job (alma8 mgmt + kvm-alma8) has been kicked to run smoke tests |
|
@DaanHoogland @harikrishna-patnala @shwstppr could you guys please validate the changes in this PR for merging? Thank you. |
shwstppr
left a comment
Code LGTM. Needs testing
|
@blueorangutan package |
|
@shwstppr a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el7 ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 8329 |
|
@blueorangutan package |
|
@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8338 |
|
@blueorangutan test |
|
@shwstppr a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
Co-authored-by: João Jandre <48719461+JoaoJandre@users.noreply.github.com>
|
@blueorangutan package |
|
@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✖️ el7 ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 8594 |
|
@blueorangutan package |
|
@winterhazel a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
@blueorangutan test |
|
@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
|
[SF] Trillian test result (tid-9149)
|
|
@blueorangutan test alma9 kvm-alma9 keepEnv |
|
@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + kvm-alma9) has been kicked to run smoke tests |
|
[SF] Trillian test result (tid-9182)
|
|
@blueorangutan package |
|
@GutoVeronezi a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8655 |
|
I guess it kind of worked because after running unfortunately I had to look up the issue in the code: Out of scope for this PR but I think we can improve there, @winterhazel , cc @vishesh92 . |
JoaoJandre
left a comment
CLGTM, didn't test it
Description
This PR addresses two false positives in the Usage sanity checker:
(1)
A sanity check verifies if there are usage records of a template/ISO created after the template/ISO was removed.
In an environment with two zones, if a template gets removed from a zone and added back to it, cloud.template_zone_ref will have two entries related to the template, one of them marked as removed. In this situation, the operator will receive a false positive stating that there are usage records for the template/ISO after it was removed, even though the template/ISO is available in that zone.
This check was changed to verify if the template/ISO is currently available in that zone.
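The idea behind the changed check in (1) can be sketched in Python. This is only an illustration of the logic, not the query used by the Usage sanity checker: the ZoneRef type, its field names, the usage_after_removal helper and the dates are all hypothetical, chosen to mirror the "removed from a zone and added back" scenario.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class ZoneRef(NamedTuple):
    """Hypothetical stand-in for one cloud.template_zone_ref entry."""
    template_id: int
    zone_id: int
    removed: Optional[datetime]  # None means the entry is still active


def usage_after_removal(refs: Sequence[ZoneRef], template_id: int,
                        zone_id: int, usage_created: datetime) -> bool:
    """Return True only if the usage record should be reported."""
    matching = [r for r in refs
                if r.template_id == template_id and r.zone_id == zone_id]
    # If any entry is still active, the template/ISO is currently
    # available in the zone, so there is nothing to report.
    if any(r.removed is None for r in matching):
        return False
    removals = [r.removed for r in matching]
    # Otherwise, report usage created after the latest removal.
    return bool(removals) and usage_created > max(removals)


# Template 1 was removed from zone 1 and later added back (made-up dates).
refs = [
    ZoneRef(template_id=1, zone_id=1, removed=datetime(2023, 10, 1)),
    ZoneRef(template_id=1, zone_id=1, removed=None),
]
flagged = usage_after_removal(refs, 1, 1, datetime(2023, 10, 5))
print(flagged)
```

Looking only at the entry marked as removed would flag this usage record; checking for an active entry first is what avoids the false positive in this sketch.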
(2)
A sanity check verifies if there are usage records with a raw usage greater than the aggregation range.
If a VM has more than one NIC using the same network offering, CloudStack will group the usage of both NICs into a single network offering usage record; for example, two NICs used for 1 hour will result in an entry with a raw usage of 2 hours. In this situation, the operator will receive a false positive stating that there are usage records with a raw usage greater than the aggregation range.
This check was changed to consider the average usage of the NICs, instead of the raw usage.
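As a rough sketch of the arithmetic in (2), the comparison can be written in Python as follows. The function name and parameters are hypothetical and this is not the checker's actual implementation; it assumes raw usage is expressed in hours and the aggregation range in minutes, as in the example above.

```python
def exceeds_aggregation_range(raw_usage_hours: float, nic_count: int,
                              aggregation_range_minutes: int) -> bool:
    """Compare the average usage per NIC against the aggregation range."""
    if nic_count <= 0:
        raise ValueError("nic_count must be positive")
    average_usage_hours = raw_usage_hours / nic_count
    return average_usage_hours > aggregation_range_minutes / 60


# Two NICs used for 1 hour each are grouped into a raw usage of 2 hours.
two_nics_ok = exceeds_aggregation_range(2.0, 2, 60)
# A raw usage of 3 hours for the same two NICs averages 1.5 hours per NIC.
two_nics_bad = exceeds_aggregation_range(3.0, 2, 60)
print(two_nics_ok, two_nics_bad)
```

With an hourly aggregation range, the first record averages exactly 1 hour per NIC and is not reported, while the second averages 1.5 hours and is.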
Types of changes
Feature/Enhancement Scale or Bug Severity
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
I changed the initial delay in UsageManagerImpl.java#L315 to 0 in order to run the sanity checks immediately when Usage starts. Then, whenever I wanted to run the sanity checks, I would set the last checked id in /usr/local/libexec/sanity-check-last-id to 1 and restart Usage.
To simulate the scenario in (1), I created the zone zone2, copied a template that was being used in zone1 to zone2 and deleted the template from zone1. Then, I copied it from zone2 to zone1.
Before applying the changes:
After applying the changes:
zone1, ran the sanity checks again and verified that CloudStack reported template/ISO usage records created after it was removed.
To simulate the scenario in (2), I set usage.stats.job.aggregation.range to 60 (hourly), created a VM with two NICs and waited until there were network offering usage records corresponding to the NICs. I verified that raw_usage was indeed 2.
Before applying the changes:
raw_usage greater than 1.
After applying the changes:
raw_usage greater than 1;
raw_usage of a usage record from 2 to 3, ran the sanity checks again and verified that CloudStack reported network offering usage records with raw_usage greater than 1.