[v2] Add more granular metrics to benchmarking framework: plugin-load time, static imports time, client creation time.#10292
[v2] Add more granular metrics to benchmarking framework: plugin-load time, static imports time, client creation time.#10292aemous wants to merge 9 commits into
Conversation
hssyoo
left a comment
There was a problem hiding this comment.
Looks fine but the static import measurement seems a bit dubious. I'd like to understand how we plan on using this measurement, what we expect it to catch, how we would react.
| before_imports = start_time | ||
| # We import from awscli lazily to ensure import time is measured in | ||
| # total runtime. | ||
| from awscli.botocore.hooks import HierarchicalEmitter | ||
| from awscli.clidriver import AWSCLIEntryPoint, create_clidriver | ||
| after_imports = time.time() |
There was a problem hiding this comment.
What are we measuring here exactly?
There was a problem hiding this comment.
This is the time to import all modules used by the AWS CLI at runtime. It's similar to the granular benchmarking that you previously implemented where you measured "Python Library Imports."
It let's us measure the total amount of time spent during the AWS CLI runtime (i.e. after interpreter started) that is spent importing modules, before we start running any initialization code.
Looks fine but the static import measurement seems a bit dubious. I'd like to understand how we plan on using this measurement, what we expect it to catch, how we would react.
It is largely prompted by our previous observation that prompt_toolkit took up a significant fraction of runtime. It embeds the philosophy "if module imports previously took up too much runtime, we should monitor the metric over time so we don't lose track of it, to prevent future regressions."
During my dev builds, we're observing 119 ms of time taken during static imports. This starts off as a baseline, that we track changes in over time.
It also automatically implements the measure that we previously did ad-hoc. When we benchmarked it ad-hoc we found useful results (particularly hefty imports). So in my view, if it was useful once, then it should be useful to track in the long term.
"What we expect it to catch, how we would react"
For the most part, it should catch large regressions. If we notice this number change significantly, it would prompt us to investigate the regression, and consider refactoring code (e.g. lazy initialization). Overall, this should help us reduce the time between (A) unintentionally increasing command execution time due to a new hefty import and (B) allocating time to investigate and mitigate the regression.
Description of changes:
./scripts/performance/run-benchmarks) now emits plugin-load time, static imports time, and client creation time.before-create-ciientandafter-create-clientbefore and after client creation, respectively.create_clidriver, add a new optionalevent_hooksargument to allow callers to provide their own event emitter. This change was needed so the benchmark framework can register against theload_pluginsevent, which is emitted duringcreate_clidriver.--debug-dirparameter inrun-benchmarksscript so that it supports relative paths (previously it only supported absolute paths).Description of tests:
--debug-dirparameter with relative and absolute paths, and verified expected behavior in both cases.Example plugin-load time emission:
{ "name": "cloudwatch.getmetricdata.plugins.import.time", "description": "Total time spent loading all built-in plugins.", "unit": "Seconds", "dimensions": [], "measurements": [ 0.08212709426879883 ] }By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.