[SPIKE] CSPL-4530: Per-Pod Rolling Restart with Finalizers and Intent Annotations#1710
vivekr-splunk wants to merge 100 commits into `develop`
Conversation
- test
- Use endpoint to update conf file
- CSPL-3895: Matching changes from Ingestor to Indexer

Co-authored-by: Kasia Koziol <kkoziol@splunk.com>
Co-authored-by: igor.grzankowski <@splunk.com>
…on for empty values
Add kubebuilder RBAC markers for rolling restart functionality:
- pods (get;list;watch;delete): check pod status, delete for restart
- pods/eviction (create): use the Eviction API (respects PDB)
- poddisruptionbudgets: create/manage PDBs for safe restarts
- secrets (get;list;watch): get Splunk credentials for the REST API
- statefulsets (get;list;watch;update;patch): manage the StatefulSet

These permissions are required for:
- Checking the restart_required endpoint on pods
- Calling the Splunk reload API
- Gracefully evicting pods during rolling restart
- Ensuring PDB constraints during restart

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add a RestartStatus struct to track rolling restart state:
- Phase: current restart phase (Pending, InProgress, Completed, Failed)
- Message: human-readable progress message
- TotalPods: total pod count in the cluster
- PodsNeedingRestart: count of pods requiring restart
- PodsRestarted: count of pods successfully restarted
- LastCheckTime: when the restart_required endpoint was last checked
- LastRestartTime: when the current operation started (for timeout detection)

The design uses aggregate counts (not arrays) to scale to 1000+ pods. All progress details live in the Message field for user visibility.

Regenerated the CRD with the new status fields.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add two new methods to SplunkClient:

1. CheckRestartRequired():
   - Checks the /services/messages/restart_required endpoint
   - Returns (bool, string, error) for restart status and reason
   - Used to detect when Splunk config changes require a restart

2. ReloadSplunk():
   - Calls /services/server/control/restart?mode=reload
   - Reloads configuration without restarting splunkd
   - Non-disruptive for config changes that don't need a full restart
   - Accepts HTTP 200 or 202 status codes

These methods enable the rolling restart mechanism to:
- Check all pods for restart requirements
- Try reload first before a full restart (optimization)
- Only restart pods where reload is insufficient

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
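As a rough illustration of the two decisions these methods encode, here is a pure-function sketch: the JSON envelope shape and function names are assumptions for illustration, not the operator's actual code. It covers the 404-tolerant restart check (standalone instances may return 404 when no restart message exists) and the 200/202 acceptance rule for reload.

```go
package main

import "encoding/json"

// restartRequiredFromBody sketches how a CheckRestartRequired result could
// be derived from the response of /services/messages/restart_required.
// The envelope shape assumed here follows Splunk's output_mode=json style.
func restartRequiredFromBody(statusCode int, body []byte) (bool, string, error) {
	// Standalone instances can return 404 when no restart message exists;
	// the commits above note this is accepted as "no restart needed".
	if statusCode == 404 {
		return false, "", nil
	}
	var envelope struct {
		Entry []struct {
			Content struct {
				Message string `json:"message"`
			} `json:"content"`
		} `json:"entry"`
	}
	if err := json.Unmarshal(body, &envelope); err != nil {
		return false, "", err
	}
	if len(envelope.Entry) == 0 {
		return false, "", nil
	}
	return true, envelope.Entry[0].Content.Message, nil
}

// reloadAccepted mirrors the note that ReloadSplunk treats both
// HTTP 200 and 202 as success.
func reloadAccepted(statusCode int) bool {
	return statusCode == 200 || statusCode == 202
}
```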
- Add helper functions: isPodReady, shouldCheckRestartRequired, checkPodsRestartRequired
- Add tryReloadAllPods for sequential reload (one pod at a time)
- Implement the handleRollingRestart state machine with five phases:
  * None/Completed: check whether a restart is needed (rate limited to 5 minutes)
  * Pending: try reload first before restarting
  * InProgress: rolling restart, one pod per reconcile
  * Completed: all pods restarted successfully
  * Failed: retry after 5 minutes
- Integrate the state machine into the reconciliation loop
- Replace the immediate restart loop with a rolling restart trigger on secret changes
- Add timeout protection (2 hours) for stuck restart operations

Benefits:
- Respects the PodDisruptionBudget via Delete (not the Eviction API yet)
- One pod at a time to maintain availability
- Tries reload first (non-disruptive) before restart
- Proper state tracking in RestartStatus
- Scales to 1000+ pods with aggregate counts
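The five-phase flow described above can be sketched as a pure transition function. This is a simplified model of the control flow only; the function name, parameters, and exact transitions are assumptions, not the operator's implementation.

```go
package main

// Phase mirrors the RestartStatus.Phase values described above.
type Phase string

const (
	PhaseNone       Phase = ""
	PhasePending    Phase = "Pending"
	PhaseInProgress Phase = "InProgress"
	PhaseCompleted  Phase = "Completed"
	PhaseFailed     Phase = "Failed"
)

// nextPhase is a hypothetical, simplified transition function for the
// five-phase rolling restart state machine.
func nextPhase(current Phase, restartNeeded bool, podsRestarted, totalPods int, timedOut bool) Phase {
	switch current {
	case PhaseNone, PhaseCompleted:
		if restartNeeded {
			return PhasePending // restart detected; try reload first
		}
		return current
	case PhasePending:
		// Reload was attempted; if a restart is still needed, start evicting.
		if restartNeeded {
			return PhaseInProgress
		}
		return PhaseCompleted
	case PhaseInProgress:
		if timedOut {
			return PhaseFailed // 2-hour timeout protection
		}
		if podsRestarted >= totalPods {
			return PhaseCompleted
		}
		return PhaseInProgress // restart one pod per reconcile
	case PhaseFailed:
		return PhasePending // retried after 5 minutes
	}
	return current
}
```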
- Modified secret change detection to trigger the InProgress phase directly
- Skip the reload attempt for secret changes (secrets require a pod restart)
- Implement sequential pod restart based on the PodsRestarted counter
- Remove the dependency on CheckRestartRequired() for secret-triggered restarts
- Accept 404 responses from the restart_required endpoint on standalone instances

This fixes the rolling restart mechanism for IngestorCluster, which uses standalone Splunk instances that don't generate restart_required messages for external secret changes.

Tested: successfully performed a rolling restart of 3 pods when the secret was updated. Pods restarted sequentially with proper status tracking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit introduces a comprehensive pod lifecycle management system using Kubernetes finalizers and intent annotations to ensure safe pod deletion during restarts and scale-down operations.

Key Features:
- Pod Controller with finalizer management (splunk.com/pod-cleanup)
- Pod deletion handler for role-specific cleanup
- Intent annotation system (serve, scale-down, restart)
- Per-pod eviction for IndexerCluster and SearchHeadCluster
- PVC lifecycle management (preserve on restart, delete on scale-down)
- Secret change detection and rolling restart triggers

Components Added:
- internal/controller/pod_controller.go (NEW)
- pkg/splunk/enterprise/pod_deletion_handler.go (NEW)

Components Modified:
- pkg/splunk/enterprise/indexercluster.go (per-pod eviction)
- pkg/splunk/enterprise/searchheadcluster.go (per-pod eviction)
- pkg/splunk/enterprise/standalone.go (secret change restart)
- pkg/splunk/splkcontroller/statefulset.go (scale-down intent marking)
- pkg/splunk/client/enterprise.go (restart/reload REST API)
- api/v4/*_types.go (RestartStatus fields)

RBAC Changes:
- Added pod watch/get/list/update/patch permissions
- Added pod/eviction create permission
- Added secret watch/get/list permission
- Added PVC delete permission

Testing:
- Unit tests for the pod controller and deletion handler
- Integration tests for restart and scale-down scenarios
- Test plan documented in FINALIZER_TEST_PLAN.md

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
This user-facing documentation explains the rolling restart feature from an operator's perspective, focusing on practical usage and benefits.

Key sections:
- Overview and benefits for users
- How it works from a user's perspective
- Common scenarios with step-by-step examples
- Monitoring and troubleshooting guidance
- Best practices and FAQ

Audience: Splunk Operator users managing Kubernetes clusters
Format: practical guide with copy/paste commands

🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
…luster

Removed automatic restart_required detection and pod eviction from the IndexerCluster and SearchHeadCluster controllers. These components have in-product orchestration:
- IndexerCluster: the Cluster Manager (CM) handles restart coordination
- SearchHeadCluster: the Deployer and Captain handle restart coordination

The operator should not interfere with their built-in orchestration logic.

Removed functions:
- checkAndEvictIndexersIfNeeded() and helpers
- checkAndEvictSearchHeadsIfNeeded() and helpers
- policyv1 imports (no longer needed)

Retained for ALL controllers:
- Pod finalizers for scale-down/restart cleanup
- PreStop lifecycle hooks
- Intent annotations (serve vs scale-down)
- PVC lifecycle management
- StatefulSet rolling update support

restart_required detection remains for:
- IngestorCluster (no in-product orchestrator)
- Standalone (no in-product orchestrator)

Changes: 2 files changed, 12 insertions(+), 222 deletions(-)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Updated test dependencies (ginkgo, gomega, pprof)
- Applied go fmt formatting to pod_controller and pod_deletion_handler
- Updated golang.org/x dependencies to the latest versions

These changes were generated by running 'make build'.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
…rcentage-based updates

This commit completes three major enhancements to the per-pod rolling restart mechanism:

1. PreStop Hook Implementation
- Created the tools/k8_probes/preStop.sh script (10KB)
- Role-based shutdown: indexer decommission, SH detention, graceful stop for others
- Intent-aware: reads the splunk.com/pod-intent annotation to determine behavior
- Scale-down: enforce_counts=1 (rebalance buckets/members)
- Restart: enforce_counts=0 (no rebalancing)
- Status monitoring with configurable timeouts
- Added POD_NAME, POD_NAMESPACE, SPLUNK_PASSWORD env vars to pods

2. Refactored Decommission/Detention to PreStop Hooks
- Moved decommission execution from the operator to the preStop hook (indexercluster.go)
- Moved detention execution from the operator to the preStop hook (searchheadclusterpodmanager.go)
- The operator now only waits for and verifies completion instead of executing
- Implemented waitForSearchHeadDetention in pod_deletion_handler.go
- Better separation of concerns: execution in the hook, verification in the operator

3. Percentage-Based Rolling Update Support
- Added the RollingUpdateConfig type to api/v4/common_types.go
- Support for maxPodsUnavailable as a percentage (e.g., "25%") or an absolute number
- Support for partition-based canary deployments
- Implemented the buildUpdateStrategy function in configuration.go
- Backward compatible: defaults to existing behavior (1 pod at a time)

Benefits:
- Faster pod termination (decommission during SIGTERM, not before)
- Flexible rollout control (percentage-based or absolute)
- Better error visibility (preStop failures appear in pod events)
- Consistent pod lifecycle operations
- Support for canary deployments

Documentation:
- IMPLEMENTATION_SUMMARY.md: complete implementation details
- CURRENT_IMPLEMENTATION_ANALYSIS.md: code analysis and requirements
- per-pod-rolling-restart-architecture.png: C4 architecture diagram

Testing Required:
- PreStop hook execution for all roles
- Decommission/detention with intent annotations
- Percentage-based rolling updates
- Canary deployments with partition
- PDB interaction with custom maxPodsUnavailable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ster and SearchHeadCluster

Removed unused restart_required detection functions that were left behind as dead code:
- shouldCheckIndexerRestartRequired()
- checkIndexerPodsRestartRequired()
- shouldCheckSearchHeadRestartRequired()
- checkSearchHeadPodsRestartRequired()

These functions were never called after the earlier refactoring that removed restart_required detection for IndexerCluster and SearchHeadCluster.

Rationale:
- IndexerCluster: the Cluster Manager (CM) handles restart coordination
- SearchHeadCluster: the Deployer and Captain handle restart coordination
- The operator should not interfere with in-product orchestration

Kept functions:
- triggerIndexerRollingRestart(): used for secret changes (legitimate)
- triggerSearchHeadRollingRestart(): used for secret changes (legitimate)

Changes:
- Removed 240 lines of dead code
- Added clarifying comments about restart orchestration responsibility

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit fixes 7 critical and high-priority issues identified in the Kubernetes native patterns review:

1. CRITICAL: Add duplicate finalizer prevention
- Check if the finalizer exists before appending
- Prevents multiple cleanup runs and delayed pod deletion
- Location: configuration.go (added the containsString() helper)

2. CRITICAL: Add mutual exclusion for eviction vs RollingUpdate
- Check whether a StatefulSet rolling update is in progress before evicting
- Prevents PDB violations from simultaneous pod terminations
- Applied to IngestorCluster and Standalone
- Location: ingestorcluster.go, standalone.go

3. HIGH: Add a timeout to preStop API calls
- Added --max-time 10 to curl commands
- Prevents preStop hooks from hanging indefinitely
- Location: preStop.sh:get_pod_intent()

4. HIGH: Add environment variable validation
- Validate POD_NAME, POD_NAMESPACE, SPLUNK_ROLE at startup
- Warn if SPLUNK_CLUSTER_MANAGER_URL is missing for indexers
- Location: preStop.sh:main()

5. HIGH: Fix decommission/detention timeout behavior
- Return an error (exit 1) instead of success when the timeout is exceeded
- Allows the operator/finalizer to detect incomplete operations
- Location: preStop.sh - decommission_indexer(), detain_search_head()

6. MEDIUM: Fix PDB for single-replica deployments
- Allow eviction for a single replica (minAvailable = 0)
- Previously blocked all evictions (minAvailable = 1)
- Location: util.go:2618-2620

7. MEDIUM: Add update staleness detection
- Check whether the RollingUpdate has stalled (no pods ready)
- Warn if fewer than half the pods are ready
- Return PhaseError if the update appears stuck
- Location: statefulset.go:205-232

8. Use a mounted secret file instead of an env var
- Read the password from /mnt/splunk-secrets/password
- Remove the SPLUNK_PASSWORD environment variable
- More secure and aligns with the existing pattern
- Location: preStop.sh, configuration.go

Testing:
- All code compiles successfully
- make build passes

Documentation:
- KUBERNETES_NATIVE_REVIEW_FINDINGS.md: complete review analysis

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
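The duplicate-finalizer fix above (item 1) comes down to a contains-then-append guard. A minimal sketch, assuming the helper signature from the commit notes (the `ensureFinalizer` wrapper is a hypothetical name added here for illustration):

```go
package main

const podCleanupFinalizer = "splunk.com/pod-cleanup"

// containsString reports whether s is present in slice; this is the
// helper the commit describes adding to configuration.go (signature assumed).
func containsString(slice []string, s string) bool {
	for _, item := range slice {
		if item == s {
			return true
		}
	}
	return false
}

// ensureFinalizer appends the cleanup finalizer only when it is absent,
// preventing the duplicate-finalizer problem described in fix 1.
func ensureFinalizer(finalizers []string) []string {
	if containsString(finalizers, podCleanupFinalizer) {
		return finalizers
	}
	return append(finalizers, podCleanupFinalizer)
}
```

Calling `ensureFinalizer` twice on the same slice leaves exactly one copy of the finalizer, so cleanup runs once and pod deletion is not delayed.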
Test Coverage (18 passing tests):
- PodDisruptionBudget creation and updates for all cluster types
- Intent annotation handling (scale-down vs restart)
- Finalizer management and duplicate prevention
- Percentage-based rolling update configuration
- Mutual exclusion between eviction and StatefulSet rolling updates
- Pod eviction logic with PDB protection
- Cluster-specific behavior (CM/Captain orchestration vs operator eviction)

Test Files:
- pkg/splunk/enterprise/pod_lifecycle_test.go: pod lifecycle management tests
- pkg/splunk/enterprise/pod_eviction_test.go: pod eviction and intent tests
- TEST_COVERAGE.md: comprehensive test documentation

All tests use the fake Kubernetes client for fast, isolated execution. Integration tests requiring the preStop.sh file are marked for separate execution.

Related: CSPL-4530
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Critical Fix:
- The operator now detects and respects user-created PDBs
- Check whether the PDB has an ownerReference to the CR before updating
- If the PDB has no owner reference → user-created → DO NOT MODIFY
- If the PDB has an owner reference → operator-managed → update as needed

Problem Solved:
- Previously, the operator would overwrite user-created PDBs on every reconcile
- Customers could not customize availability requirements
- A user's custom minAvailable/maxUnavailable settings would be lost

Solution:
- Added an owner reference check in ApplyPodDisruptionBudget()
- The operator logs "user-created PDB detected" and skips the update
- User PDBs take precedence over operator defaults

Test Coverage:
- TestUserCreatedPDB: verifies a user PDB is preserved (minAvailable=1 stays 1)
- TestOperatorManagedPDB: verifies the operator can update its own PDBs

Documentation:
- USER_CREATED_PDB.md: complete guide for user-created PDB support
  * Use cases (high availability, maintenance windows, faster updates)
  * Lifecycle management (creation, updates, deletion)
  * Best practices and troubleshooting
  * Examples for production and dev environments
- TEST_COVERAGE.md: updated to reflect 20 passing tests

Benefits:
✅ Customers can define custom availability requirements
✅ The operator respects user configuration
✅ Backward compatible (existing behavior unchanged)
✅ Fully tested with unit tests
✅ Clear documentation for users

Related: CSPL-4530
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
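The ownership test at the heart of this fix can be sketched as a small predicate. The struct here is a simplified stand-in for `metav1.OwnerReference` (only the UID field is modeled), and the function name is an assumption for illustration:

```go
package main

// OwnerRef is a simplified stand-in for metav1.OwnerReference,
// carrying only the field this check needs.
type OwnerRef struct {
	UID string
}

// isOperatorManaged sketches the ownership check: a PDB is treated as
// operator-managed only if one of its ownerReferences points back at the
// custom resource. A PDB with no matching owner reference is assumed to
// be user-created and must not be modified.
func isOperatorManaged(owners []OwnerRef, crUID string) bool {
	for _, o := range owners {
		if o.UID == crUID {
			return true
		}
	}
	return false
}
```

The reconcile path would then skip the update (and log "user-created PDB detected") whenever `isOperatorManaged` returns false.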
…on handling, and pod intent

This commit addresses all 7 issues identified in the code review plus the open question about pod intent RBAC.

## Issue #1: Eviction RBAC in Wrong API Group [HIGH - FIXED]
**Problem:** RBAC granted pods/eviction under the core group, but eviction is policy/v1
**Impact:** Runtime RBAC forbidden errors when calling the Eviction API
**Fix:** Changed the //+kubebuilder:rbac annotation from groups=core to groups=policy
**Files:** standalone_controller.go, ingestorcluster_controller.go, role.yaml (regenerated)

## Issue #2: Scale-Down Intent Never Applied [HIGH - FALSE POSITIVE]
**Problem:** The reviewer thought the scale-down intent wasn't applied
**Analysis:** It actually IS applied, via markPodForScaleDown() in statefulset.go:156
**Status:** No changes needed; already implemented correctly

## Issue #3: preStop Cluster Manager URL Malformed [HIGH - FIXED]
**Problem:** SPLUNK_CLUSTER_MANAGER_URL was set to the service name without https:// or :8089
**Impact:** Peer status checks always fail; decommission verification doesn't work
**Fix:**
- Construct the full URL: https://<service-name>:8089
- Add a SPLUNK_CLUSTER_MANAGER_SERVICE env var for the service name
- Fix peer name construction in preStop.sh (use POD_NAME directly)
**Files:** configuration.go, preStop.sh

## Issue #4: preStop Timeout Exceeds Grace Period [MEDIUM - FIXED]
**Problem:** preStop could wait 300s, but non-indexers only get a 120s grace period
**Impact:** Kubelet SIGKILL before the hook finishes; incomplete cleanup
**Fix:** Align timeouts with grace periods:
- Indexers: 270s max wait (300s grace period - 30s buffer)
- Others: 90s max wait (120s grace period - 30s buffer)
**Files:** preStop.sh

## Issue #5: PDB Selector Mismatch with ClusterManagerRef [MEDIUM - FIXED]
**Problem:** The PDB selector uses an empty partOfIdentifier, but pods use ClusterManagerRef.Name
**Impact:** The PDB doesn't select any pods, so there is no disruption protection
**Fix:** Apply the same partOfIdentifier logic to PDB labels as to pod labels
- Type assertion to get ClusterManagerRef from the IndexerCluster CR
- Use ClusterManagerRef.Name or ClusterMasterRef.Name as the partOfIdentifier
**Files:** util.go

## Issue #6: Partition Blocks Eviction Forever [MEDIUM - FIXED]
**Problem:** When RollingUpdateConfig.Partition is set, UpdatedReplicas < Replicas is always true
**Impact:** restart_required evictions never happen with canary deployments
**Fix:** Check whether the partition-based update is complete:
- Calculate expectedUpdatedReplicas = replicas - partition
- If UpdatedReplicas >= expectedUpdatedReplicas, allow eviction
- Only block eviction if partitioned pods are still updating
**Files:** standalone.go, ingestorcluster.go

## Issue #7: PDB Violation Detection is Brittle [LOW - FIXED]
**Problem:** String matching on "Cannot evict pod" is fragile and locale-dependent
**Impact:** May miss PDB violations if the error message changes
**Fix:** Use k8serrors.IsTooManyRequests(err) to check for HTTP 429 status
- More reliable than string matching
- Matches Kubernetes Eviction API behavior
**Files:** standalone.go, ingestorcluster.go

## Open Question: preStop Pod Intent RBAC Dependency [FIXED]
**Problem:** preStop needs GET pods RBAC to read the intent annotation
**Impact:** If the RBAC is not granted, intent always defaults to "serve"
**Fix:** Use the Kubernetes Downward API instead of an API call:
- Add a SPLUNK_POD_INTENT env var via the downward API
- Read from metadata.annotations['splunk.com/pod-intent']
- No RBAC required, no network calls, no timeouts
- More reliable and works in restricted environments
**Files:** configuration.go, preStop.sh

## Summary of Changes
**3 High Priority Fixes:**
- ✅ Eviction RBAC now in the correct API group (policy)
- ✅ Cluster Manager URL properly constructed
- ✅ Pod intent via the Downward API (no RBAC needed)

**3 Medium Priority Fixes:**
- ✅ preStop timeout aligned with the grace period
- ✅ PDB selector matches pod labels (ClusterManagerRef support)
- ✅ Partition-based updates don't block eviction forever

**1 Low Priority Fix:**
- ✅ PDB violation detection uses IsTooManyRequests()

**Documentation:**
- REVIEW_FINDINGS_RESPONSE.md: complete analysis of all findings

## Testing
- Code compiles successfully
- All changes follow Kubernetes best practices
- Backward compatible (no breaking changes)

Related: CSPL-4530
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
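Issue #6's completion check is simple arithmetic worth pinning down: with a StatefulSet partition, only ordinals >= partition are updated, so `UpdatedReplicas == Replicas` never holds while the partition is set. A minimal sketch of the corrected predicate (the function name is an assumption):

```go
package main

// partitionUpdateComplete sketches the Issue #6 fix: with a
// partition-based canary rollout, only replicas at ordinal >= partition
// are updated, so eviction should be allowed once UpdatedReplicas reaches
// replicas - partition rather than waiting for UpdatedReplicas == Replicas.
func partitionUpdateComplete(replicas, partition, updatedReplicas int32) bool {
	expectedUpdated := replicas - partition
	if expectedUpdated < 0 {
		expectedUpdated = 0
	}
	return updatedReplicas >= expectedUpdated
}
```

For example, with 5 replicas and partition=2, the rollout is considered complete (and restart_required eviction unblocked) once 3 pods have been updated.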
Pull request overview
This spike implements a comprehensive pod lifecycle management system for the Splunk Operator, enabling graceful pod termination, role-specific cleanup, and flexible rolling update strategies. The implementation introduces three new Custom Resource Definitions (IngestorCluster, Queue, ObjectStorage) to support index and ingestion separation architecture, along with Kubernetes-native patterns using finalizers, preStop hooks, and PodDisruptionBudgets.
Changes:
- Adds 3 new CRDs (IngestorCluster, Queue, ObjectStorage) for index/ingestion separation
- Implements pod finalizer-based cleanup and preStop lifecycle hooks for graceful shutdown
- Adds PodDisruptionBudget support and rolling update enhancements with percentage-based configurations
- Updates StatefulSet management to use RollingUpdate strategy instead of OnDelete
- Adds extensive test infrastructure for the new functionality
Reviewed changes
Copilot reviewed 148 out of 151 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| api/v4/*_types.go | New CRD types and status fields for IngestorCluster, Queue, ObjectStorage, RestartStatus |
| pkg/splunk/splkcontroller/statefulset.go | Critical refactoring of rolling update logic with intent marking and PVC cleanup changes |
| pkg/splunk/enterprise/configuration.go | PreStop hooks, finalizers, environment variables, and rolling update strategy |
| pkg/splunk/enterprise/standalone.go | Per-pod eviction logic with restart_required detection |
| pkg/splunk/enterprise/searchheadcluster.go | Updated to monitor detention instead of triggering it |
| internal/controller/pod_controller.go | New pod finalizer controller for cleanup orchestration |
| test/testenv/*.go | Test utilities for IngestorCluster and related resources |
| test/index_and_ingestion_separation/* | New test suite for index/ingestion separation |
| config/crd/bases/* | Generated CRD manifests for new and updated resources |
| helm-chart/* | Helm chart templates and RBAC for new CRDs |
| config/rbac/* | RBAC permissions for pod eviction and PDB management |
```shell
response=$(curl -s -k -u "${SPLUNK_USER}:${SPLUNK_PASSWORD}" \
    "${cluster_manager_url}/services/cluster/manager/peers?output_mode=json" 2>/dev/null)
```
The curl call to the Cluster Manager in get_indexer_peer_status uses -k (disables TLS certificate verification) while sending Splunk admin credentials to SPLUNK_CLUSTER_MANAGER_URL. If an attacker can influence DNS or the network path to that URL inside the cluster, they can man-in-the-middle this connection and steal admin credentials or spoof Cluster Manager responses. Replace -k with proper certificate verification (for example by mounting the CM CA cert and using --cacert or --capath), and ensure SPLUNK_CLUSTER_MANAGER_URL always points to a trusted internal endpoint.
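A hardened version of that call could look like the sketch below. The CA bundle path (`/mnt/splunk-certs/ca.crt`) is an assumption — the CM CA certificate would need to be mounted into the pod at whatever path the deployment actually uses.

```shell
# Hypothetical hardened form of the peer-status call from preStop.sh.
# CM_CA_CERT default path is an assumption for illustration.
CM_CA_CERT="${CM_CA_CERT:-/mnt/splunk-certs/ca.crt}"

get_indexer_peer_status() {
    # --cacert replaces -k so the Cluster Manager certificate is verified
    # against a trusted CA; --max-time keeps the preStop hook from hanging.
    curl -s --cacert "${CM_CA_CERT}" --max-time 10 \
        -u "${SPLUNK_USER}:${SPLUNK_PASSWORD}" \
        "${cluster_manager_url}/services/cluster/manager/peers?output_mode=json"
}
```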
Overview
This spike implements a comprehensive pod lifecycle management system for the Splunk Operator that enables graceful pod termination, role-specific cleanup, and flexible rolling update strategies. The implementation follows Kubernetes-native patterns using finalizers, preStop hooks, PodDisruptionBudgets, and the Eviction API.
Key Features
1. Graceful Pod Lifecycle Management
PreStop Lifecycle Hooks
Intent-Aware Operations
The `splunk.com/pod-intent` annotation distinguishes operations:
- `serve`: normal operation
- `scale-down`: permanent removal with full cleanup
- `restart`: temporary termination with data preservation

2. Safe Cluster Operations
Finalizer-Based Cleanup
PodDisruptionBudget Integration
`minAvailable = replicas - 1`

3. Flexible Rolling Update Strategies
Percentage-Based Updates
Canary Deployments
Intelligent Update Management
4. Automatic Restart Detection (IngestorCluster & Standalone)
Per-Pod Restart Monitoring
`restart_required` messages for configuration changes

Configuration-Driven Restarts
5. Kubernetes-Native Design
Follows Best Practices
Production-Ready Error Handling
Architecture
See the complete C4 architecture diagram:
`per-pod-rolling-restart-architecture.png`

Key Components:
Implementation Details
Cluster Type Behaviors
IngestorCluster & Standalone
`restart_required` monitoring

IndexerCluster
SearchHeadCluster
Environment Variables
All pods receive via Kubernetes downward API:
- `POD_NAME`: pod name for API queries
- `POD_NAMESPACE`: namespace for API queries
- `SPLUNK_ROLE`: role type for preStop hook logic
- `SPLUNK_CLUSTER_MANAGER_URL`: Cluster Manager URL (for indexers)

Passwords are read from the mounted secret at `/mnt/splunk-secrets/password`.

Termination Grace Periods
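The grace-period alignment described earlier (270s wait for indexers under a 300s grace period, 90s under 120s for others) reduces to subtracting a fixed buffer. A sketch, with the function name and the assumption that the buffer is a constant 30 seconds taken from the review-fix notes:

```go
package main

// preStopBudget sketches the timeout alignment from the review fixes:
// the preStop hook must finish inside terminationGracePeriodSeconds, so
// its wait budget is the grace period minus a fixed buffer reserved for
// the final graceful stop before the kubelet sends SIGKILL.
func preStopBudget(graceSeconds int32) int32 {
	const buffer = 30 // seconds reserved after the wait loop
	if graceSeconds <= buffer {
		return 0
	}
	return graceSeconds - buffer
}
```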
Benefits
Operational
Technical
Developer
Configuration Examples
Basic Rolling Update with Percentage
Canary Deployment
Conservative Update (Default)
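As a rough model of how a `maxPodsUnavailable` value such as `"25%"` or `"2"` could be resolved into a pod count, here is a sketch. The function name and the exact rounding rule (round the percentage up, never below one pod) are assumptions for illustration, not necessarily what `buildUpdateStrategy` does.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveMaxUnavailable sketches turning a maxPodsUnavailable value
// ("25%" or "2") into an absolute pod count. Percentages are rounded up
// and clamped to at least 1 so the rollout can always make progress;
// the operator's actual rounding rule is an assumption here.
func resolveMaxUnavailable(value string, replicas int) (int, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil {
			return 0, fmt.Errorf("invalid percentage %q: %w", value, err)
		}
		n := (replicas*pct + 99) / 100 // ceil(replicas * pct / 100)
		if n < 1 {
			n = 1
		}
		return n, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil {
		return 0, fmt.Errorf("invalid count %q: %w", value, err)
	}
	return n, nil
}
```

Under this model, `"25%"` of 10 replicas allows 3 pods to be unavailable at once, while omitting the field falls back to the conservative one-pod-at-a-time default described above.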
Testing
Completed
Recommended Testing
Files Changed
New Files
- `tools/k8_probes/preStop.sh`: PreStop lifecycle hook implementation
- `per-pod-rolling-restart-architecture.png`: C4 architecture diagram
- `KUBERNETES_NATIVE_REVIEW_FINDINGS.md`: K8s patterns review and validation

Modified Core Logic
- `pkg/splunk/enterprise/configuration.go`: StatefulSet creation, preStop hooks, finalizers
- `pkg/splunk/enterprise/pod_deletion_handler.go`: finalizer handler with intent-based cleanup
- `pkg/splunk/enterprise/ingestorcluster.go`: restart detection and pod eviction
- `pkg/splunk/enterprise/standalone.go`: restart detection and pod eviction
- `pkg/splunk/enterprise/util.go`: PodDisruptionBudget creation and management
- `pkg/splunk/splkcontroller/statefulset.go`: rolling update coordination

API Changes
- `api/v4/common_types.go`: RollingUpdateConfig type for percentage-based updates

User Guide
Complete user guide available in:
`per-pod-rolling-restart-user-guide.md`

Covers:
Documentation
- `per-pod-rolling-restart-architecture.png`: complete C4 diagram showing all components and interactions
- `per-pod-rolling-restart-user-guide.md`: comprehensive guide for operators
- `KUBERNETES_NATIVE_REVIEW_FINDINGS.md`: K8s native patterns validation

Backward Compatibility
The `rollingUpdateConfig` field is optional (defaults to existing behavior).

Future Enhancements
🔬 This is a SPIKE - For evaluation and architectural review
🤖 Generated with Claude Code