reconciliation logic for connection poolers #1713
limak9182 wants to merge 5 commits into feature/database-controllers from
Conversation
CLA Assistant Lite bot: I have read the CLA Document and I hereby sign the CLA. 1 out of 2 committers have signed the CLA.

CLA Assistant Lite bot: All contributors have NOT signed the COC Document. I have read the Code of Conduct and I hereby accept the Terms. You can retrigger this bot by commenting recheck in this Pull Request.
  if getClusterClassErr := r.Get(ctx, client.ObjectKey{Name: postgresCluster.Spec.Class}, postgresClusterClass); getClusterClassErr != nil {
      logger.Error(getClusterClassErr, "Unable to fetch referenced PostgresClusterClass", "className", postgresCluster.Spec.Class)
-     r.setCondition(postgresCluster, metav1.ConditionFalse, "ClusterClassNotFound", getClusterClassErr.Error())
+     r.setCondition(postgresCluster, "Ready", metav1.ConditionFalse, "ClusterClassNotFound", getClusterClassErr.Error())
we should probably use r.syncStatus everywhere
  // It ensures that we always attempt to sync the status of the PostgresCluster based on the final state of the CNPG Cluster and any errors that may have occurred.
  defer func() {
-     if syncErr := r.syncStatus(ctx, postgresCluster, cnpgCluster, err); syncErr != nil {
+     if syncErr := r.syncStatus(ctx, postgresCluster, cnpgCluster, poolerEnabled, err); syncErr != nil {
this defer function is probably not needed if we map our state properly
  if buildCNPGClusterErr := r.Create(ctx, cnpgCluster); buildCNPGClusterErr != nil {
      logger.Error(buildCNPGClusterErr, "Failed to create CNPG Cluster")
-     r.setCondition(postgresCluster, metav1.ConditionFalse, "ClusterBuildFailed", buildCNPGClusterErr.Error())
+     r.setCondition(postgresCluster, "Ready", metav1.ConditionFalse, "ClusterBuildFailed", buildCNPGClusterErr.Error())
we are setting the condition everywhere, but we are never syncing it via a status update
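One way to make the "set condition, then sync" invariant hard to violate is to fuse both steps into a single helper, so no code path can set a condition without persisting it. The sketch below is a minimal illustration with simplified stand-in types; `Condition`, `StatusWriter`, and `setConditionAndSync` are assumptions, not the actual controller code:

```go
package main

import "fmt"

// Condition and PostgresCluster are simplified stand-ins for the real CRD
// types (hypothetical, for illustration only).
type Condition struct {
	Type, Status, Reason, Message string
}

type PostgresCluster struct {
	Conditions []Condition
}

// StatusWriter models the subset of the controller-runtime status client
// needed here (an assumption, not the real interface).
type StatusWriter interface {
	UpdateStatus(pc *PostgresCluster) error
}

// setConditionAndSync sets (or replaces) a condition by Type and immediately
// persists the status, fusing the two steps into one call.
func setConditionAndSync(w StatusWriter, pc *PostgresCluster, c Condition) error {
	for i := range pc.Conditions {
		if pc.Conditions[i].Type == c.Type {
			pc.Conditions[i] = c
			return w.UpdateStatus(pc)
		}
	}
	pc.Conditions = append(pc.Conditions, c)
	return w.UpdateStatus(pc)
}

// fakeWriter records how many times status was persisted.
type fakeWriter struct{ updates int }

func (f *fakeWriter) UpdateStatus(pc *PostgresCluster) error {
	f.updates++
	return nil
}

func main() {
	w := &fakeWriter{}
	pc := &PostgresCluster{}
	_ = setConditionAndSync(w, pc, Condition{"Ready", "False", "ClusterBuildFailed", "create failed"})
	_ = setConditionAndSync(w, pc, Condition{"Ready", "True", "Reconciled", ""})
	fmt.Println(len(pc.Conditions), w.updates)
}
```

With this shape, every error branch that sets "Ready=False" also persists it, so the cluster status never silently drifts from what the reconciler observed.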
  requeuePooler, poolerErr := r.reconcileConnectionPooler(ctx, postgresCluster, postgresClusterClass, cnpgCluster)
  if poolerErr != nil {
      logger.Error(poolerErr, "Failed to reconcile connection pooler")
      r.setCondition(postgresCluster, "Ready", metav1.ConditionFalse, "PoolerReconciliationFailed", poolerErr.Error())
we should update the state after we set the condition
  // isConnectionPoolerEnabled determines if connection pooler should be active.
  func (r *PostgresClusterReconciler) isConnectionPoolerEnabled(class *enterprisev4.PostgresClusterClass, cluster *enterprisev4.PostgresCluster) bool {
      if cluster.Spec.ConnectionPoolerEnabled != nil {
do we really need this? we already have a function that merges the class and cluster specs; we can rely on it to get the effective pooler value as well
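The merge the comment refers to usually follows the "optional override" pattern: the class supplies the default and the cluster spec can override it via an optional pointer. A minimal sketch, assuming the class default is a plain bool and the cluster override is a `*bool` (names are illustrative, not the real API):

```go
package main

import "fmt"

// effectivePoolerEnabled resolves the effective pooler setting: a non-nil
// cluster override wins, otherwise the class default applies. Both parameter
// names are assumptions for illustration.
func effectivePoolerEnabled(classDefault bool, clusterOverride *bool) bool {
	if clusterOverride != nil {
		return *clusterOverride
	}
	return classDefault
}

func main() {
	off := false
	fmt.Println(effectivePoolerEnabled(true, nil))  // no override: class default applies
	fmt.Println(effectivePoolerEnabled(true, &off)) // override: cluster spec wins
}
```

If the shared spec-merge function already implements this resolution, a dedicated `isConnectionPoolerEnabled` duplicates it and the two can drift apart.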
  }
  // reconcileConnectionPooler creates or deletes CNPG Pooler resources based on the effective enabled state.
  // Returns (requeue, error) — requeue is true when poolers were just created and may not be ready yet.
this should be the reconciler's decision to requeue when a new connection pooler is created. Before running reconcileConnectionPooler we should also check the diff and the status; if the status shows it is still reconciling, we should requeue one more time
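One way to move the decision up, as suggested: instead of returning a bare `requeue bool`, the pooler sync could report a state summary and let the reconcile loop map it to a result. The sketch below uses a stand-in `Result` shaped like controller-runtime's `ctrl.Result`; `PoolerState` and `requeueFor` are hypothetical names:

```go
package main

import (
	"fmt"
	"time"
)

// PoolerState is a hypothetical summary the pooler sync could return,
// leaving the requeue decision entirely to the reconciler.
type PoolerState int

const (
	PoolerAbsent PoolerState = iota // not created yet
	PoolerProvisioning              // created, status not ready
	PoolerReady                     // fully ready
)

// Result mirrors the shape of controller-runtime's ctrl.Result.
type Result struct {
	RequeueAfter time.Duration
}

// requeueFor maps a pooler state to a reconcile result; only the caller
// decides whether and when to come back.
func requeueFor(s PoolerState) Result {
	switch s {
	case PoolerAbsent, PoolerProvisioning:
		return Result{RequeueAfter: 15 * time.Second}
	default:
		return Result{}
	}
}

func main() {
	fmt.Println(requeueFor(PoolerProvisioning).RequeueAfter)
	fmt.Println(requeueFor(PoolerReady).RequeueAfter)
}
```

This keeps reconcileConnectionPooler purely about creating/deleting resources, while the requeue policy lives in one place in Reconcile.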
      return false, nil
  }
  if cnpgCluster.Status.Phase != cnpgv1.PhaseHealthy {
I think checking the pooler status should be a separate function from creation, if only from a testing perspective
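A readiness predicate that takes only the pooler object is trivially unit-testable without a client or fixtures. A minimal sketch with simplified stand-in types; the field names (`ObservedGeneration`, `ReadyInstances`, `DesiredInstances`) are assumptions about the shape of the CNPG Pooler status, not its actual API:

```go
package main

import "fmt"

// PoolerStatus and Pooler are simplified stand-ins; field names are
// assumptions for illustration.
type PoolerStatus struct {
	ObservedGeneration int64
	ReadyInstances     int32
	DesiredInstances   int32
}

type Pooler struct {
	Generation int64
	Status     PoolerStatus
}

// isPoolerReady keeps the readiness check out of the create/delete path:
// the status must reflect the current spec generation and all desired
// instances must be ready.
func isPoolerReady(p *Pooler) bool {
	return p.Status.ObservedGeneration == p.Generation &&
		p.Status.ReadyInstances == p.Status.DesiredInstances &&
		p.Status.DesiredInstances > 0
}

func main() {
	p := &Pooler{Generation: 2, Status: PoolerStatus{ObservedGeneration: 2, ReadyInstances: 2, DesiredInstances: 2}}
	fmt.Println(isPoolerReady(p))
}
```

The creation path then calls this predicate rather than embedding the checks inline, and tests can cover edge cases (stale observedGeneration, partial readiness) with plain struct literals.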
      return false, fmt.Errorf("failed to reconcile RO pooler: %w", err)
  }
  // Check if poolers are ready — requeue if they're still provisioning.
I'm not sure this works: k8s caches resources, so if you call a get operation immediately you will get cached information without the updated status, and it takes more than a millisecond to provision poolers. We should have a workflow similar to this:
Reconciliation loop iteration 1:
├── Check if Pooler exists func → No
├── Create Pooler func
└── return RequeueAfter: 15s in reconciliation loop
Reconciliation loop iteration 2 (15s later):
├── Check if Pooler exists func → Yes
├── r.Get(Pooler) → read status from cache (now populated by CNPG)
├── Pooler ready? → No → return RequeueAfter: 15s
└── Pooler ready? → Yes → continue
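The two-iteration flow above can be sketched with a toy model where a map stands in for the (cached) API server; every name here is illustrative, not the real controller code:

```go
package main

import (
	"fmt"
	"time"
)

// pooler and reconciler are toy stand-ins for the CNPG Pooler and the
// controller; the store map models the informer cache / API server.
type pooler struct{ ready bool }

type reconciler struct{ store map[string]*pooler }

// reconcilePooler implements the suggested workflow: create and requeue,
// requeue while not ready, continue once ready.
func (r *reconciler) reconcilePooler(name string) (requeueAfter time.Duration) {
	p, exists := r.store[name]
	if !exists {
		// Iteration 1: create and come back later. A freshly created
		// object will not yet have a populated status in the cache.
		r.store[name] = &pooler{}
		return 15 * time.Second
	}
	if !p.ready {
		// Iteration 2..n: object exists but status is not ready yet.
		return 15 * time.Second
	}
	return 0 // ready: continue with the rest of the reconcile
}

func main() {
	r := &reconciler{store: map[string]*pooler{}}
	fmt.Println(r.reconcilePooler("rw")) // iteration 1: creates, requeues
	fmt.Println(r.reconcilePooler("rw")) // iteration 2: exists, not ready, requeues
	r.store["rw"].ready = true           // CNPG eventually marks it ready
	fmt.Println(r.reconcilePooler("rw")) // iteration 3: continues
}
```

The key point of the design is that no single reconcile pass waits for readiness; each pass observes the current (possibly stale) cache state and either requeues or proceeds.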
  logger := logs.FromContext(ctx)
  if !r.isConnectionPoolerEnabled(class, postgresCluster) {
      // Skip deletion if the cluster is not healthy — owner references handle cleanup via GC.
the pooler is separate from the cluster, correct? So why do we link removal to the cluster state? Shouldn't we instead check whether the pooler exists first and return quickly if it doesn't?
But do we need to let the pooler stay after the cluster is removed? We may get orphaned resources in k8s, no?
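Orphans are normally avoided by setting an owner reference on the pooler pointing at the cluster, so Kubernetes garbage collection deletes the pooler when its owner goes away. A minimal sketch with simplified stand-in types; the real code would use `metav1.OwnerReference` (typically via `controllerutil.SetControllerReference`), and `setOwner`/`Object` here are illustrative assumptions:

```go
package main

import "fmt"

// OwnerReference and Object are simplified stand-ins for metav1.OwnerReference
// and a Kubernetes object (hypothetical, for illustration).
type OwnerReference struct {
	Kind, Name, UID    string
	Controller         bool
	BlockOwnerDeletion bool
}

type Object struct {
	Name            string
	UID             string
	OwnerReferences []OwnerReference
}

// setOwner links the pooler to the cluster so garbage collection removes the
// pooler when the cluster is deleted, preventing orphaned resources.
func setOwner(pooler *Object, cluster *Object) {
	pooler.OwnerReferences = append(pooler.OwnerReferences, OwnerReference{
		Kind:               "Cluster",
		Name:               cluster.Name,
		UID:                cluster.UID,
		Controller:         true,
		BlockOwnerDeletion: true,
	})
}

func main() {
	cl := &Object{Name: "pg-main", UID: "abc-123"}
	po := &Object{Name: "pg-main-pooler-rw"}
	setOwner(po, cl)
	fmt.Println(po.OwnerReferences[0].UID)
}
```

With the owner reference in place, the "skip deletion when unhealthy" branch cannot leak the pooler past the cluster's lifetime, which is what the snippet's comment is relying on.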
      UID: cnpgCluster.UID,
  }
  if poolerEnabled {
the status sync function shouldn't have domain-specific switches to do smarter logic. Maybe the pooler here can simply be another set of phases and conditions that the controller needs to trigger?
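One reading of this suggestion: each component (CNPG cluster, pooler, backups, ...) contributes a generic readiness condition, and the status sync derives the phase uniformly with no `if poolerEnabled` branches. A minimal sketch; `componentCondition` and `derivePhase` are hypothetical names, not the controller's API:

```go
package main

import "fmt"

// componentCondition is a hypothetical generic readiness record contributed
// by each component the controller manages.
type componentCondition struct {
	Type  string
	Ready bool
}

// derivePhase folds all component conditions into a single phase without any
// knowledge of which component is which.
func derivePhase(conds []componentCondition) string {
	for _, c := range conds {
		if !c.Ready {
			return "Reconciling"
		}
	}
	return "Healthy"
}

func main() {
	conds := []componentCondition{
		{Type: "ClusterReady", Ready: true},
		{Type: "PoolerReady", Ready: false}, // pooler is just another condition
	}
	fmt.Println(derivePhase(conds))
	conds[1].Ready = true
	fmt.Println(derivePhase(conds))
}
```

When pooler support is disabled, the pooler simply contributes no condition, so syncStatus needs no `poolerEnabled` parameter at all.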
We can add it to our Confluence PostgresCluster Controller Design doc, I assume?
72cfc39 to d661e84
Description
What does this PR have in it?
Key Changes
Highlight the updates in specific files
Testing and Verification
How did you test these changes? What automated tests are added?
Related Issues
Jira tickets, GitHub issues, Support tickets...
PR Checklist