machine: add SSHSession to reuse SSH connections#28736

Open
inknos wants to merge 1 commit into containers:main from inknos:run-4470

@inknos
Collaborator

@inknos inknos commented May 19, 2026

Each ssh.Dial to localhost costs ~77ms. I counted 6 SSH calls during a machine start, so ~464ms is spent on connection init. More time can be saved by combining several bash commands into one.

  • Merge the two separate SSH calls (write tmpfiles config + run systemd-tmpfiles) into a single bash -c command joined with &&.

  • MountVolumesToVM now opens a single SSHSession for all volumes.

  • Combine mkdir and mount steps into a single bash -c command per volume, further halving the number of SSH round-trips.
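
The two ideas in the list above (dial once, then send one `&&`-joined command per volume) can be sketched roughly as follows. This is an illustration, not the code in this PR: `runner`, `sharedSession`, `joinSteps`, and `recorder` are hypothetical names, the dial function is injected so the sketch runs without a VM, and the mount targets and arguments are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// runner stands in for an open SSH connection. Hypothetical interface for
// this sketch; it is not the SSHSession type added by the PR.
type runner interface {
	Run(cmd string) error
}

// sharedSession dials on first use and reuses that connection afterwards.
type sharedSession struct {
	dial  func() (runner, error) // injected so the sketch runs without a VM
	conn  runner
	dials int // number of times dial was actually called
}

// Run executes cmd, paying the dial cost only on the first call.
func (s *sharedSession) Run(cmd string) error {
	if s.conn == nil {
		conn, err := s.dial()
		if err != nil {
			return fmt.Errorf("dial: %w", err)
		}
		s.conn = conn
		s.dials++
	}
	return s.conn.Run(cmd)
}

// joinSteps chains shell steps with && so a failing step stops the rest.
func joinSteps(steps ...string) string {
	return strings.Join(steps, " && ")
}

// recorder is a fake connection that only records the commands it receives.
type recorder struct{ cmds []string }

func (r *recorder) Run(cmd string) error {
	r.cmds = append(r.cmds, cmd)
	return nil
}

func main() {
	rec := &recorder{}
	sess := &sharedSession{dial: func() (runner, error) { return rec, nil }}

	// Placeholder targets and mount arguments, for illustration only.
	for _, target := range []string{"/mnt/vol1", "/mnt/vol2"} {
		cmd := joinSteps("mkdir -p "+target, "mount -t 9p vol "+target)
		if err := sess.Run(cmd); err != nil {
			fmt.Println("error:", err)
			return
		}
	}
	fmt.Printf("dials=%d commands=%d\n", sess.dials, len(rec.cmds))
}
```

With two volumes the sketch prints `dials=1 commands=2`: one connection and one round-trip per volume. A real implementation would also need to quote paths containing spaces or shell metacharacters before joining them into a single command.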

Time measurements are done by running
bin/podman machine start --log-level=debug
and checking the time in the logs (this and the table were done by Claude):

Init (unchanged, not optimized):
+---+--------+--------+---------+------------------------------+
| # |  Dial  |  Run   |  Total  | Purpose                      |
+---+--------+--------+---------+------------------------------+
| 1 |  95ms  | 2156ms | 2251ms  | SSH readiness probe          |
| 2 |  85ms  |  738ms |  823ms  | inject proxy env             |
+---+--------+--------+---------+------------------------------+

Volume mounts — before (4 connections, 4 dials):
+---+--------+--------+---------+------------------------------+
| # |  Dial  |  Run   |  Total  | Purpose                      |
+---+--------+--------+---------+------------------------------+
| 1 |  66ms  |  341ms |  408ms  | vol 1: mkdir                 |
| 2 |  70ms  |  329ms |  400ms  | vol 1: mount                 |
| 3 |  76ms  |  423ms |  499ms  | vol 2: chattr + mkdir        |
| 4 |  72ms  |  270ms |  342ms  | vol 2: mount                 |
+---+--------+--------+---------+------------------------------+
    |  284ms | 1363ms | 1649ms  |
    +--------+--------+---------+

Volume mounts — after (1 dial, shared session):
+---+--------+--------+---------+------------------------------+
| # |  Dial  |  Run   |  Total  | Purpose                      |
+---+--------+--------+---------+------------------------------+
| 1 |  64ms  |    --  |    --   | open shared session          |
| 2 |   0ms  |  586ms |  586ms  | vol 1: mkdir + mount         |
| 3 |   0ms  |  527ms |  527ms  | vol 2: chattr + mkdir + mount|
+---+--------+--------+---------+------------------------------+
    |  64ms  | 1113ms | 1113ms  |
    +--------+--------+---------+
Saved: ~220ms dial, ~250ms run (3 fewer connections)
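
The savings figures can be re-derived from the Dial and Run columns of the two volume-mount tables. The short sketch below does that arithmetic; the `call` type and `sum` helper are hypothetical names used only for this check, and the "--" cell of the shared-session row is entered as 0.

```go
package main

import "fmt"

// call is one measured SSH call, in milliseconds.
type call struct{ dial, run int }

// sum adds up the dial and run columns separately.
func sum(calls []call) (dial, run int) {
	for _, c := range calls {
		dial += c.dial
		run += c.run
	}
	return dial, run
}

func main() {
	// Values copied from the "before" and "after" volume-mount tables.
	before := []call{{66, 341}, {70, 329}, {76, 423}, {72, 270}}
	after := []call{{64, 0}, {0, 586}, {0, 527}} // "--" entered as 0

	bd, br := sum(before)
	ad, ar := sum(after)
	fmt.Printf("dial: %dms -> %dms (saved %dms)\n", bd, ad, bd-ad)
	fmt.Printf("run:  %dms -> %dms (saved %dms)\n", br, ar, br-ar)
}
```

The dial column drops from 284ms to 64ms (220ms saved) and the run column from 1363ms to 1113ms (250ms saved).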

Fixes: https://redhat.atlassian.net/browse/RUN-4470

Checklist

Ensure you have completed the following checklist for your pull request to be reviewed:

  • Certify you wrote the patch or otherwise have the right to pass it on as an open-source patch by signing all
    commits. (git commit -s). (If needed, use git commit -s --amend). The author email must match
    the sign-off email address. See CONTRIBUTING.md
    for more information.
  • Referenced issues using Fixes: #00000 in commit message (if applicable)
  • Tests have been added/updated (or no tests are needed)
  • Documentation has been updated (or no documentation changes are needed)
  • All commits pass make validatepr (format/lint checks)
  • Release note entered in the section below (or None if no user-facing changes)

Does this PR introduce a user-facing change?

None

Signed-off-by: Nicola Sella <nsella@redhat.com>
@packit-as-a-service

[NON-BLOCKING] Packit jobs failed. @containers/packit-build please check. Everyone else, feel free to ignore.

@inknos
Collaborator Author

inknos commented May 19, 2026

@Honny1 PTAL. I am not sure which tests I should write for this change.

@Honny1
Member

Honny1 commented May 19, 2026

I don't think we need new tests for these changes. I am adding the No New Tests label.

@Honny1 Honny1 added the "No New Tests" label (Allow PR to proceed without adding regression tests) May 19, 2026
@Luap99
Member

Luap99 commented May 19, 2026

Ideally we switch the qemu mounts over to systemd units just like we have on applehv.
For hyperV 9p mounts I think we will always need the manual ssh mounts though.

Machine tests are so flaky that I agree we cannot really have performance tests, so no new tests seems fine for this. Existing tests should cover these code paths.

@baude
Member

baude commented May 19, 2026

looks like the qemu failures are legit?

@Luap99 what would you like to do here? Do you want to merge this and go back or just require @inknos to do that change now?

@Luap99
Member

Luap99 commented May 19, 2026

If we touch the code now, we might as well do it right with systemd. The code for that should already be in applehv, so we just need to share it here in qemu as well.

That feels nicer than joining one huge command.

@inknos
Collaborator Author

inknos commented May 20, 2026

got it, thanks for the feedback!

@baude
Member

baude commented May 20, 2026

> got it, thanks for the feedback!

thanks @inknos !

Labels

machine, No New Tests (Allow PR to proceed without adding regression tests)
