[v2] Add more granular metrics to benchmarking framework: plugin-load time, static imports time, client creation time. by aemous · Pull Request #10292 · aws/aws-cli

aemous · 2026-05-08T13:47:33Z

Description of changes:

The benchmarking framework (callable from ./scripts/performance/run-benchmarks) now emits plugin-load time, static imports time, and client creation time.
Fixed bug with performance framework on IMDS-compatible instances where the CLI would attempt to reach the IMDS endpoints to retrieve credentials but the stubbed responses did not support this. Now, we supply mock credentials in the stubbed config files in each of our stubbed JSON benchmarks.
In vendored botocore, emit before-create-ciient and after-create-client before and after client creation, respectively.
In create_clidriver, add a new optional event_hooks argument to allow callers to provide their own event emitter. This change was needed so the benchmark framework can register against the load_plugins event, which is emitted during create_clidriver.
Modified --debug-dir parameter in run-benchmarks script so that it supports relative paths (previously it only supported absolute paths).

Description of tests:

Manually ran the benchmarking framework locally and on an IMDS-compatible EC2 instance, and verified the numbers are successfully being emitted, and are in an expected range aligned with previous benchmarking efforts.
Manually tested the --debug-dir parameter with relative and absolute paths, and verified expected behavior in both cases.
Successfully ran an internal pre-prod build workflow.

Example plugin-load time emission:

{
      "name": "cloudwatch.getmetricdata.plugins.import.time",
      "description": "Total time spent loading all built-in plugins.",
      "unit": "Seconds",
      "dimensions": [],
      "measurements": [
        0.08212709426879883
      ]
}

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

hssyoo

Looks fine but the static import measurement seems a bit dubious. I'd like to understand how we plan on using this measurement, what we expect it to catch, how we would react.

hssyoo · 2026-05-20T19:09:07Z

+        before_imports = start_time
+        # We import from awscli lazily to ensure import time is measured in
+        # total runtime.
+        from awscli.botocore.hooks import HierarchicalEmitter
+        from awscli.clidriver import AWSCLIEntryPoint, create_clidriver
+        after_imports = time.time()


What are we measuring here exactly?

This is the time to import all modules used by the AWS CLI at runtime. It's similar to the granular benchmarking that you previously implemented where you measured "Python Library Imports."

It let's us measure the total amount of time spent during the AWS CLI runtime (i.e. after interpreter started) that is spent importing modules, before we start running any initialization code.

Looks fine but the static import measurement seems a bit dubious. I'd like to understand how we plan on using this measurement, what we expect it to catch, how we would react.

It is largely prompted by our previous observation that prompt_toolkit took up a significant fraction of runtime. It embeds the philosophy "if module imports previously took up too much runtime, we should monitor the metric over time so we don't lose track of it, to prevent future regressions."

During my dev builds, we're observing 119 ms of time taken during static imports. This starts off as a baseline, that we track changes in over time.

It also automatically implements the measure that we previously did ad-hoc. When we benchmarked it ad-hoc we found useful results (particularly hefty imports). So in my view, if it was useful once, then it should be useful to track in the long term.

"What we expect it to catch, how we would react"

For the most part, it should catch large regressions. If we notice this number change significantly, it would prompt us to investigate the regression, and consider refactoring code (e.g. lazy initialization). Overall, this should help us reduce the time between (A) unintentionally increasing command execution time due to a new hefty import and (B) allocating time to investigate and mitigate the regression.

aemous added 9 commits May 5, 2026 15:56

Update scripts for benchmarking to comput plugin-imports

d30c8bf

Emit the load-plugins event.

6a7b2e2

some fixes

df1153d

Add mock credentials to all benchmark stubs.

e9d63c4

Fix stubs template.

513f611

Update debug dir parameter so it supports relative paths.

f02244c

Finalize changes.

d385117

Add client creation time.

c6684d5

Remove debug print statement.

647b19c

aemous requested a review from a team May 8, 2026 13:47

aemous added v2 benchmarking labels May 8, 2026

hssyoo assigned hssyoo and unassigned hssyoo May 20, 2026

hssyoo self-requested a review May 20, 2026 18:27

hssyoo reviewed May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[v2] Add more granular metrics to benchmarking framework: plugin-load time, static imports time, client creation time.#10292

[v2] Add more granular metrics to benchmarking framework: plugin-load time, static imports time, client creation time.#10292
aemous wants to merge 9 commits into
aws:v2from
aemous:plugin-load-time-perf

aemous commented May 8, 2026

Uh oh!

hssyoo left a comment

Uh oh!

hssyoo May 20, 2026

Uh oh!

aemous May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

aemous commented May 8, 2026

Uh oh!

hssyoo left a comment

Choose a reason for hiding this comment

Uh oh!

hssyoo May 20, 2026

Choose a reason for hiding this comment

Uh oh!

aemous May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants