[SAP] modify select_datastore_by_name to use a cache #322

Merged
hemna merged 1 commit into stable/2023.1-m3 from cache-ds-lookup on Apr 1, 2026

hemna commented Mar 31, 2026

Optimize the select_datastore_by_name method to significantly reduce vCenter API load during high-volume operations like boot-from-volume.

Problem:
The original implementation called _get_datastores(), which fetches ALL datastores with their 'host' and 'summary' properties. With ~40 datastores and ~50 hosts each, this transferred ~500KB-2MB of data per call. Under high load (e.g., 96 volume creates in 5 minutes), this caused vCenter connection pool exhaustion and cascading timeouts, with each _select_ds_for_volume call averaging 530 seconds.

Solution:

  1. Add a _get_datastore_by_name() method that fetches properties for only the specific datastore needed, reducing data transfer to ~10-20KB.

  2. Add a 5-minute TTL cache for datastore name -> moref mappings in get_ds_ref_by_name(). Since volume creates are bursty and typically target the same datastores, this eliminates repeated vCenter queries for the lightweight name lookup on cache hits.

  3. The host availability check still queries vCenter on every call to ensure we always have fresh data about which hosts are connected and not in maintenance mode.
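The three points above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the `DatastoreRefCache` class, the `fetch_all_refs` and `fetch_datastore_props` callables, and the dictionary shapes are all hypothetical stand-ins for the real vCenter queries. Only the names `get_ds_ref_by_name` and `select_datastore_by_name` and the 5-minute TTL come from the description.

```python
import time


class DatastoreRefCache:
    """Hypothetical TTL cache for datastore name -> moref mappings."""

    def __init__(self, fetch_all_refs, ttl_seconds=300, clock=time.monotonic):
        # fetch_all_refs is an assumed callable returning {name: moref};
        # it stands in for the lightweight names-only vCenter query.
        self._fetch_all_refs = fetch_all_refs
        self._ttl = ttl_seconds
        self._clock = clock
        self._refs = {}
        self._expires_at = None  # None means "never populated"

    def get_ds_ref_by_name(self, name):
        now = self._clock()
        expired = self._expires_at is None or now >= self._expires_at
        if expired or name not in self._refs:
            # Cache miss or stale entry: one lightweight query, then restart
            # the TTL window.
            self._refs = dict(self._fetch_all_refs())
            self._expires_at = now + self._ttl
        return self._refs.get(name)


def select_datastore_by_name(name, cache, fetch_datastore_props):
    """Resolve the moref via the cache, then check hosts with fresh data."""
    ref = cache.get_ds_ref_by_name(name)
    if ref is None:
        return None
    # The targeted query is never cached, so host state is always current.
    # fetch_datastore_props is an assumed callable returning
    # {"summary": ..., "hosts": [{"connected": bool, "in_maintenance": bool}]}.
    props = fetch_datastore_props(ref)
    usable_hosts = [
        host for host in props["hosts"]
        if host["connected"] and not host["in_maintenance"]
    ]
    if not usable_hosts:
        return None
    return ref, props["summary"], usable_hosts
```

In this sketch an unknown name triggers a refresh on every call, which keeps newly created datastores visible at the cost of extra lookups for names that never resolve. Whether the actual change behaves the same way is not stated in the description.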

Performance impact:

  • First call: 1 lightweight query (names only) + 1 targeted query
  • Subsequent calls (within 5 min): 1 targeted query only (cache hit)
  • Original: 1 heavy query fetching all datastores with all host mounts
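As a back-of-the-envelope check of these counts, the following sketch models a burst using the figures quoted in the description (96 creates in 5 minutes, ~500KB-2MB per heavy query, ~10-20KB per targeted query). It assumes every call targets the same datastore, treats 2MB as 2000KB, and ignores the size of the names-only lookup, which the description does not quantify. The function names are hypothetical.

```python
def count_queries(call_times, ttl_seconds=300):
    """Count vCenter queries for calls (in seconds) against one datastore."""
    name_lookups = 0
    expires_at = None
    for t in sorted(call_times):
        # A names-only lookup happens only when the cached entry is
        # missing or its TTL has elapsed.
        if expires_at is None or t >= expires_at:
            name_lookups += 1
            expires_at = t + ttl_seconds
    total_calls = len(call_times)
    return {
        "original_heavy": total_calls,     # one full fetch per call
        "new_name_lookups": name_lookups,  # cache misses only
        "new_targeted": total_calls,       # host check is never cached
    }


def transfer_range_kb(num_queries, low_kb, high_kb):
    """Scale a per-query size range by the number of queries."""
    return num_queries * low_kb, num_queries * high_kb


# 96 volume creates spread evenly across 5 minutes.
burst = [i * 300 / 96 for i in range(96)]
counts = count_queries(burst)
old_kb = transfer_range_kb(counts["original_heavy"], 500, 2000)
new_kb = transfer_range_kb(counts["new_targeted"], 10, 20)

print(counts)
print("original transfer (KB):", old_kb)
print("targeted transfer (KB):", new_kb)
```

Under these assumptions the burst drops from roughly 48-192MB of property data to roughly 1-2MB, plus a single names-only lookup.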

Change-Id: If9d46fe833418b67393535384af075a95f2ca4cb

hemna requested a review from jagoleni on April 1, 2026 14:01
hemna merged commit 1ef5bef into stable/2023.1-m3 on Apr 1, 2026
2 checks passed
hemna deleted the cache-ds-lookup branch on April 1, 2026 14:08
3 participants