
[scheduler-extender] v2 refactor: reservation API, cache, RBAC, space accounting, agent fixes#193

Merged
AleksZimin merged 18 commits into add-common-scheduler-extender-v2 from add-common-scheduler-extender-v2-refactor on Mar 16, 2026

Conversation


@AleksZimin AleksZimin commented Feb 9, 2026

Description

This PR refines the SDS common scheduler extender and related agent/module pieces on top of add-common-scheduler-extender-v2. Main changes:

Scheduler extender (images/sds-common-scheduler-extender/)

  • Evolved HTTP API: from /api/v1/volumes/* to /v1/lvg/filter-and-score and /v1/lvg/narrow-reservation with matching request/response types (FilterAndScore*, NarrowReservation*, LVMVolumeGroupInput, etc.).
  • Cache refactor: replaced nested LVG-in-cache structure with a pure reservation store (reservation ID → pools + TTL); LVG data comes from the controller-runtime client/informer. Unified StoragePoolKey, TTL cleanup, lazy TTL in reads (removed redundant pools map).
  • Scheduling logic: filter creates reservations across candidate pools; prioritize narrows reservations; PVC/LLV watchers call NarrowReservation / RemoveReservation on bind and lifecycle events; non-Ready LVGs excluded via isLVGSchedulable.
  • Capacity / races: available space uses an LLV-based formula with per-pool unaccounted-space calibration and LVG watcher recalibration; fixes thin-pool totalCapacity handling; reduces wrong free-space assumptions vs VGFree-only.
  • Ops / safety: request-scoped context with timeout in filter/prioritize handlers (avoids hanging on slow/blocked API); RBAC for ReplicatedStorageClass / ReplicatedStoragePool, moduleconfigs.deckhouse.io; newControlPlane module setting (replaces useLinstor) to choose extender vs LINSTOR for replicated PVCs; /stat extended with filter-and-score counters.
  • Tests: unit tests for cache, filter-and-score, narrow-reservation, helpers; linter cleanups (unparam, etc.).

Module / Kubernetes

  • Helm manifests for extender: Deployment, Service, webhook config, RBAC, Secret, ConfigMap; hook common-scheduler-extender-certs for TLS material.
  • OpenAPI values (openapi/values*.yaml) for new settings.
  • API: ReplicatedStorageClass / ReplicatedStoragePool types aligned with CRDs; registration updates.

Agent

  • Support for several thin pools per LVG, where applicable.
  • After lvcreate / lvextend / snapshot operations, use GetLV instead of waiting on scanner/cache to avoid extra requeues and busy-wait.
  • BlockDeviceFilter: sanitize label selectors so In/NotIn with nil/empty values no longer crash reconciliation.

Misc

  • .gitignore updates; optional debug helper under hack/ for watching resources vs pod logs.

Impact on the cluster: deploys/updates the scheduler extender workload and kube-scheduler webhook configuration; may cause scheduler extender pod restarts and webhook reloads during rollout. Does not by itself restart control-plane components beyond what a normal module upgrade does.

Why do we need it, and what problem does it solve?

The v2 extender stack needs a maintainable reservation model, correct free-space and thin-pool accounting under concurrent scheduling, and safe handler behavior (timeouts, RBAC, non-Ready LVGs). Without these, users can see hung filter/prioritize calls, wrong capacity decisions, stale reservations, or agent reconciliation crashes on malformed BlockDeviceFilter selectors.

This PR delivers those fixes and refactors on the existing v2 branch so replicated/local PVC scheduling stays correct and observable.

What is the expected result?

  • After enabling/upgrading the module: sds-common-scheduler-extender pods become Ready; MutatingWebhookConfiguration for the scheduler points to the extender service; no endless hangs in extender logs on List/Watch (RBAC fixed).
  • Scheduling local/replicated PVCs: extender returns filter/prioritize results within the request timeout; reservations align with chosen nodes; NotReady LVGs are not scored as usable pools.
  • Agent: fewer unnecessary requeues after LVM resize/create; BlockDeviceFilter with empty In/NotIn values no longer breaks the BD reconciliation loop.
  • ModuleConfig newControlPlane: when set to true, the extender participates in replicated PVC scheduling; when false or absent, behavior falls back to the LINSTOR-managed path per design.

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
- Add POST /api/v1/volumes/bind endpoint
- Add BindVolumeRequest/BindVolumeResponse API types
- Add LVGRef struct and RemoveVolumeReservationsExcept method to cache
- Implement bindVolume handler: decode request, validate, clear unselected
  LVG/thinpool reservations for the volume
- Add cache tests (thick, thin, multiple keep, empty keep, idempotency)
- Add handler tests (valid bind, validation errors, method not allowed)

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
…Name

- Remove Type field from VolumeInput (filter-prioritize and bind requests)
- Infer type: any LVG with thinPoolName → thin; all empty → thick
- Add consistency validation: reject mixed thinPoolName in LVGs
- RemoveVolumeReservationsExcept: drop volumeType param; infer from keep; when
  keep empty, remove from both thick and thin
- Add TestCache_RemoveVolumeReservationsExcept_EmptyKeep_RemovesFromBoth
- Remove invalid volume type test from bind_volume_test

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
Simplify the scheduler-extender cache from a deeply nested structure
(lvgEntry -> thickByPVC/thinByPool -> pvcEntry/volumeEntry) to a clean
two-map reservation store (pools + reservations). LVG resources are no
longer stored in the custom cache; they are read from the
controller-runtime informer cache via client.Client.

Cache changes:
- Unified StoragePoolKey (LVGName + optional ThinPoolName) replaces
  separate thick/thin handling everywhere
- pools map: pre-calculated reservedSize per pool for O(1) lookups
- reservations map: reservationID -> size + set of pools + TTL
- Methods: AddReservation, RemoveReservation, NarrowReservation,
  GetReservedSpace, HasReservation, GetAllPools, GetAllReservations
- Background goroutine for TTL-based cleanup of expired reservations

API changes:
- New routes: /v1/lvg/filter-and-score, /v1/lvg/narrow-reservation
  (replace /api/v1/volumes/filter-prioritize and /api/v1/volumes/bind)
- New request/response types: FilterAndScoreRequest/Response,
  NarrowReservationRequest/Response, LVMVolumeGroupInput,
  ScoredLVMVolumeGroup

Scheduler logic changes:
- filter: collects StoragePoolKeys across all filtered nodes per PVC,
  creates one reservation with N pool keys via AddReservation
- prioritize: after scoring, calls NarrowReservation to release
  reservations on nodes filtered out by kube or other extenders
- Helper functions (getAvailableSpace, checkPoolHasSpace,
  calculatePoolScore) combine client.Client for LVG capacity with
  cache.GetReservedSpace for reservations

Controller changes:
- PVC watcher: on selectedNode -> NarrowReservation to node's pools;
  on bound/delete -> RemoveReservation
- LVG watcher: deleted entirely; informer is started by field indexer
  registration in main.go, stale data expires via TTL
- LLV watcher (new): watches LVMLogicalVolume; on Phase=Created or
  Delete -> RemoveReservation to prevent double-counting

Infrastructure:
- Field indexer on LVMVolumeGroup.Status.Nodes.Name for efficient
  node-to-LVG lookups via client.MatchingFields
- Simplified cache constructor: NewCache(logger, cleanupInterval)
- Removed PVCExpiredDurationSec config, CacheSize config

Files deleted: filter_prioritize.go, bind_volume.go, bind_volume_test.go,
  lvg_watcher_cache.go, lvg_watcher_cache_test.go
Files created: filter_and_score.go, narrow_reservation.go,
  llv_watcher_cache.go

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@AleksZimin AleksZimin force-pushed the add-common-scheduler-extender-v2-refactor branch from 931f375 to 9cccf0e on February 10, 2026 at 21:48
…hedulable check

Add isLVGSchedulable(*LVMVolumeGroup) that checks Status.Phase == Ready.
LVGs in NotReady, Terminating, etc. are excluded from:
- /v1/lvg/filter-and-score (via getAvailableSpace)
- /scheduler/filter (local and replicated PVCs)
- /scheduler/prioritize (local and replicated PVCs)

Integration points:
- getAvailableSpace(): return error if LVG not schedulable
- findMatchedSCLVG(): only consider schedulable LVGs when matching
- findLVGForNodeInRSP(): skip non-schedulable LVGs

Designed for future extension (e.g. Unschedulable field) in one place.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
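The gate itself is a one-liner; a minimal sketch, with the LVMVolumeGroup type reduced to the one field the check inspects:

```go
package main

import "fmt"

type LVMVolumeGroupStatus struct{ Phase string }
type LVMVolumeGroup struct{ Status LVMVolumeGroupStatus }

// isLVGSchedulable centralizes the readiness check so future conditions
// (e.g. an Unschedulable field) can be added in one place, as the
// commit message notes.
func isLVGSchedulable(lvg *LVMVolumeGroup) bool {
	return lvg.Status.Phase == "Ready"
}

func main() {
	fmt.Println(isLVGSchedulable(&LVMVolumeGroup{Status: LVMVolumeGroupStatus{Phase: "NotReady"}})) // false
}
```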
- Remove PoolEntry type and pools map from cache
- GetReservedSpace and GetAllPools now compute reserved size by iterating
  reservations and skipping expired entries (lazy TTL check)
- Expired reservations are effectively ignored immediately after TTL
  without waiting for the 30s cleanup ticker
- AddReservation, removeReservation, NarrowReservation simplified
- HasReservation and GetReservation return false for expired entries
- Add Expired field to ReservationInfo; GetAllReservations returns all
  entries including expired for debug visibility
- Debug endpoints: mark expired reservations with [EXPIRED] in /cache,
  show active/expired counts in /stat
- Add tests: TestGetReservedSpace_SkipsExpired,
  TestGetAllPools_SkipsExpired, TestGetAllReservations_MarksExpired

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@AleksZimin AleksZimin force-pushed the add-common-scheduler-extender-v2-refactor branch from b7372b3 to 231dd6a on February 10, 2026 at 22:58
…am, whitespace)

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@AleksZimin AleksZimin force-pushed the add-common-scheduler-extender-v2-refactor branch from 231dd6a to 4013d50 on February 10, 2026 at 23:02
Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@AleksZimin AleksZimin force-pushed the add-common-scheduler-extender-v2-refactor branch from f14e2e0 to 265ed41 on February 10, 2026 at 23:31
- Add handler_test_helpers_test.go: shared test helpers (newTestScheduler,
  readyLVG, readyLVGWithThinPool, notReadyLVG, newTestCache, newFakeClient)
- Add filter_and_score_test.go: 12 tests for validation, filtering, scoring,
  cache, NotReady LVGs, thin pools, and idempotent reservation replace
- Add narrow_reservation_test.go: 8 tests for validation, narrowing,
  non-existent reservation, and cache state verification

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
After lvcreate and lvextend, the in-memory cache contains stale data
until the scanner runs. This caused unnecessary 5s requeues and
blocking busy-wait loops.

- LLV create: replace getLVActualSize with commands.GetLV after lvcreate
- LLV resize: replace getLVActualSize with commands.GetLV after lvextend
- LLV extender: replace FindLV busy-wait loop with GetLV after lvextend
- LLVS snapshot: use GetLV after CreateThinLogicalVolumeSnapshot instead
  of requeueing for cache discovery

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@dmgtn dmgtn force-pushed the add-common-scheduler-extender-v2-refactor branch from a75b782 to a2511e8 on March 8, 2026 at 22:56
@AleksZimin AleksZimin self-assigned this Mar 12, 2026
- Replace getUseLinstor with getNewControlPlane (inverted semantics):
  newControlPlane=true means the extender handles replicated PVC scheduling,
  false/absent means LINSTOR manages it.
- Add RBAC permissions for moduleconfigs.deckhouse.io to fix
  "cannot list resource" error in reflector logs.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
BlockDeviceFilter resources with In/NotIn matchExpressions and nil/empty
values caused metav1.LabelSelectorAsSelector to fail, breaking the
entire block device reconciliation loop.

Add sanitizeLabelSelector() that drops such vacuous expressions before
parsing. Add tests for nil values, empty values, and mixed cases.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
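The sanitizer can be sketched as below. The real sanitizeLabelSelector operates on metav1.LabelSelector; local stand-in types are used here so the sketch runs without Kubernetes dependencies:

```go
package main

import "fmt"

// Requirement and LabelSelector are simplified stand-ins for
// metav1.LabelSelectorRequirement / metav1.LabelSelector.
type Requirement struct {
	Key      string
	Operator string // "In", "NotIn", "Exists", "DoesNotExist"
	Values   []string
}

type LabelSelector struct {
	MatchExpressions []Requirement
}

// sanitizeLabelSelector drops In/NotIn requirements with nil or empty
// Values — the vacuous expressions that made LabelSelectorAsSelector
// fail and abort the whole block device reconciliation loop.
func sanitizeLabelSelector(sel *LabelSelector) *LabelSelector {
	out := &LabelSelector{}
	for _, req := range sel.MatchExpressions {
		if (req.Operator == "In" || req.Operator == "NotIn") && len(req.Values) == 0 {
			continue // skip instead of failing the parse
		}
		out.MatchExpressions = append(out.MatchExpressions, req)
	}
	return out
}

func main() {
	sel := &LabelSelector{MatchExpressions: []Requirement{
		{Key: "a", Operator: "In"},                           // dropped: no values
		{Key: "b", Operator: "Exists"},                       // kept
		{Key: "c", Operator: "NotIn", Values: []string{"x"}}, // kept
	}}
	fmt.Println(len(sanitizeLabelSelector(sel).MatchExpressions)) // 2
}
```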
…oragePool and add debug tool

Add missing get/list/watch permissions for replicatedstorageclasses and
replicatedstoragepools to the scheduler-extender ClusterRole. Without them
the controller-runtime cached client blocks forever on informer sync,
hanging filter/prioritize handlers.

Add hack/debug.go — a standalone diagnostic tool that watches Kubernetes
resources via kubectl and prints colored diffs interleaved with pod logs.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
…r and prioritize handlers

Replace s.ctx (global) with a 4s timeout context derived from r.Context()
so that blocked API calls (e.g. informer waiting for RBAC-denied List/Watch)
return an error instead of hanging the handler goroutine indefinitely.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@AleksZimin AleksZimin force-pushed the add-common-scheduler-extender-v2-refactor branch from 46567d2 to d825918 on March 13, 2026 at 13:45
…ffset approach

Replace stale VGFree-only computation with LLV-based available space
formula: min(totalCapacity - sumAllLLV - unaccountedSpace, reportedFree) - reserved.

Key changes:
- Add per-pool unaccounted space offset to reservation cache
- Add sumLLVSpace helper to sum spec.size for all LLVs on a storage pool
- Add CalibratePoolUnaccountedSpace to compute non-LLV volume offset
- Add LVG watcher controller to recalibrate on LVG status updates
- Register LLV field indexer (spec.lvmVolumeGroupName) for efficient queries
- Update getAvailableSpace to use min(llvBased, reportedFree) as safety net
- Add 16 unit tests covering thick/thin pools, calibration, and edge cases

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
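A numeric sketch of the formula from this commit; inputs are plain int64 bytes here, whereas in the extender they come from the LVG status, the LLV field index, and the reservation cache:

```go
package main

import "fmt"

// availableSpace implements the commit's formula:
//   min(totalCapacity - sumAllLLV - unaccounted, reportedFree) - reserved
func availableSpace(totalCapacity, sumAllLLV, unaccounted, reportedFree, reserved int64) int64 {
	llvBased := totalCapacity - sumAllLLV - unaccounted
	if reportedFree < llvBased {
		llvBased = reportedFree // safety net: never exceed what LVM reports
	}
	return llvBased - reserved
}

func main() {
	// 100 GiB pool, 30 GiB in LLVs, 5 GiB unaccounted, LVM reports 60 GiB
	// free, 10 GiB reserved by in-flight scheduling.
	g := int64(1 << 30)
	fmt.Println(availableSpace(100*g, 30*g, 5*g, 60*g, 10*g) / g) // 50
}
```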
…thin pool tests

totalCapacity for thin pools was incorrectly set to AllocatedSize (space
already handed out to thin LVs), which is 0 for empty pools. This caused
all filter requests to reject every node with "not enough space".

Fix: totalCapacity = AllocatedSize + AvailableSpace (full overprovisioned
capacity of the thin pool, analogous to VGSize for thick pools).

Add 7 thin pool unit tests covering empty pool, in-flight LLVs,
reservations, unaccounted space, and calibration scenarios.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
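The fix itself reduces to one sum; a minimal sketch, with field values passed as plain integers:

```go
package main

import "fmt"

// thinPoolTotalCapacity reflects the corrected formula: AllocatedSize
// alone is 0 for an empty pool (rejecting every node), so the fix adds
// the remaining AvailableSpace to get the full overprovisioned capacity,
// analogous to VGSize for thick pools.
func thinPoolTotalCapacity(allocatedSize, availableSpace int64) int64 {
	return allocatedSize + availableSpace
}

func main() {
	// Empty pool: nothing allocated yet, 40 GiB available.
	g := int64(1 << 30)
	fmt.Println(thinPoolTotalCapacity(0, 40*g) / g) // 40, not 0 as before the fix
}
```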
…lution

RSC spec.storagePool is now deprecated; the RSP name is stored in
status.storagePoolName by the sds-replicated-volume RSC controller.
Update RSC type to include spec.storage, status.storagePoolName, and
add GetStoragePoolName() helper. Update RSP type to include
status.eligibleNodes. Fix all extender code to resolve RSP name via
GetStoragePoolName() instead of the empty spec.storagePool field.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
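The helper can be sketched as below. The preference for status.storagePoolName follows this commit; the fallback to the deprecated spec field and the struct layout are illustrative assumptions:

```go
package main

import "fmt"

// ReplicatedStorageClass is a simplified stand-in for the real CRD type.
type ReplicatedStorageClass struct {
	Spec   struct{ StoragePool string }
	Status struct{ StoragePoolName string }
}

// GetStoragePoolName resolves the RSP name from status.storagePoolName
// (set by the sds-replicated-volume RSC controller), falling back to the
// deprecated spec field for older objects (assumed behavior).
func (rsc *ReplicatedStorageClass) GetStoragePoolName() string {
	if rsc.Status.StoragePoolName != "" {
		return rsc.Status.StoragePoolName
	}
	return rsc.Spec.StoragePool
}

func main() {
	var rsc ReplicatedStorageClass
	rsc.Status.StoragePoolName = "rsp-1"
	fmt.Println(rsc.GetStoragePoolName()) // rsp-1
}
```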
Remove always-constant parameters from readyLVGWithThinPool (vgSize)
and thinLLV (lvgName), hardcoding the values inside the helpers.

Signed-off-by: Aleksandr Zimin <alexandr.zimin@flant.com>
@AleksZimin AleksZimin force-pushed the add-common-scheduler-extender-v2-refactor branch from 05adc48 to 031f3b8 on March 16, 2026 at 10:15
@AleksZimin AleksZimin merged commit b05ab4d into add-common-scheduler-extender-v2 Mar 16, 2026
10 of 11 checks passed
@AleksZimin AleksZimin deleted the add-common-scheduler-extender-v2-refactor branch March 16, 2026 11:57
@duckhawk duckhawk changed the title from "add-common-scheduler-extender-v2 refactor" to "[scheduler-extender] v2 refactor: reservation API, cache, RBAC, space accounting, agent fixes" on Mar 19, 2026