
tiproxy: add spec field gracefulShutdownDeleteDelaySeconds to gracefully mark unhealthy before deleting the pods (#6829) #6894

Open

ti-chi-bot wants to merge 7 commits into pingcap:release-2.1 from ti-chi-bot:cherry-pick-6829-to-release-2.1

Conversation

@ti-chi-bot
Member

This is an automated cherry-pick of #6829

Background

I'm designing graceful restarts of TiProxy in a cloud environment.

The intended behavior is:

  1. Use a big maxSurge.
  2. Patch the existing TiProxyGroup to enable a long graceful shutdown delete delay.
  3. Restart.
  4. Old TiProxy instances are first marked unhealthy, then kept alive for a while before the old pods are actually deleted.

This is mainly for cloud load balancers: existing long-lived connections can continue to be served by the old TiProxy instances (with a large enough target_health_state.unhealthy.draining_interval_seconds on AWS, and with ConnectionDrainEnabled disabled on Alibaba Cloud), while new connections are routed to the new TiProxy instances.
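As a concrete illustration, the AWS draining interval can be raised on the load balancer's target group with the AWS CLI. This is only a sketch: the TARGET_GROUP_ARN variable and the 300-second value are placeholders, and the value should be at least as long as the delete delay you plan to configure.

aws elbv2 modify-target-group-attributes \
  --target-group-arn "$TARGET_GROUP_ARN" \
  --attributes Key=target_health_state.unhealthy.draining_interval_seconds,Value=300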

We cannot rely on changing terminationGracePeriodSeconds for existing pods, because that would itself require restarting them. So this PR adds a controller-side graceful delete flow for TiProxy.

Design

  1. Add a new spec field spec.template.spec.gracefulShutdownDeleteDelaySeconds to TiProxyGroup / TiProxy.
  2. This field is treated as reloadable, so patching it does not trigger a rolling restart by itself.
  3. When a TiProxy object is being deleted and this field is set to a positive value:
    1. The operator first tries to call POST /api/debug/health/unhealthy.
    2. If the API is not supported (404), the operator falls back to sending SIGTERM to the TiProxy process via pods/exec.
    3. Only after TiProxy is confirmed unhealthy does the operator write core.pingcap.com/tiproxy-graceful-shutdown-begin-time on the pod and start the delete-delay timer.
    4. After the timer expires, the operator deletes the pod.
  4. If TiProxy cannot be marked unhealthy, the operator keeps retrying and does not start the delete-delay timer.
  5. When the whole Cluster is being deleted, this graceful delay is skipped and the pod is deleted directly.

This design keeps the user-facing control in spec, avoids changing terminationGracePeriodSeconds, and supports both new TiProxy versions (with the unhealthy API) and older ones (with the SIGTERM fallback), as sketched below.
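For reference, the per-pod delete flow the operator performs roughly corresponds to the following manual steps. This is a sketch only: the pod name, the TiProxy API port 3080, the availability of curl inside the container, and the timestamp format are assumptions for illustration, not part of this PR.

# Hypothetical pod name, for illustration only.
POD=pg-tiproxy-0

# 1. Mark the TiProxy instance unhealthy via its debug API
#    (assumes the default API port 3080 and curl in the container).
kubectl -n "$NS" exec "$POD" -- curl -s -X POST http://127.0.0.1:3080/api/debug/health/unhealthy

# 1b. Fallback for older TiProxy versions without that API:
#     send SIGTERM to the TiProxy process (assumed here to be PID 1).
# kubectl -n "$NS" exec "$POD" -- kill -TERM 1

# 2. Record when graceful shutdown began on the pod (timestamp format is illustrative).
kubectl -n "$NS" annotate pod "$POD" \
  core.pingcap.com/tiproxy-graceful-shutdown-begin-time="$(date -u +%FT%TZ)"

# 3. Wait out gracefulShutdownDeleteDelaySeconds, then delete the pod.
sleep 20
kubectl -n "$NS" delete pod "$POD"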

Usage

Patch an existing TiProxyGroup, so old TiProxy pods will be kept for a while after they are marked unhealthy:

kubectl --context "$CONTEXT" -n "$NS" patch tiproxygroup pg --type merge -p '{
  "spec": {
    "template": {
      "spec": {
        "gracefulShutdownDeleteDelaySeconds": 20
      }
    }
  }
}'

Then trigger a rolling restart, for example by changing the config or the image. With a large maxSurge, new TiProxy pods can come up first, and old TiProxy pods will only be deleted after entering graceful shutdown and waiting for the configured delay.
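For example, a restart could be triggered by bumping the TiProxy version. The field path spec.template.spec.version is shown here only as an assumption, and the version value is a placeholder; any change that triggers a rolling update works the same way.

kubectl --context "$CONTEXT" -n "$NS" patch tiproxygroup pg --type merge -p '{
  "spec": {
    "template": {
      "spec": {
        "version": "v1.4.0"
      }
    }
  }
}'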

@codecov-commenter

Codecov Report

❌ Patch coverage is 67.12329% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.95%. Comparing base (e5ddd8b) to head (a4aef15).

Additional details and impacted files
@@               Coverage Diff               @@
##           release-2.1    #6894      +/-   ##
===============================================
+ Coverage        37.61%   37.95%   +0.33%     
===============================================
  Files              392      393       +1     
  Lines            22483    22603     +120     
===============================================
+ Hits              8458     8579     +121     
+ Misses           14025    14024       -1     
Flag Coverage Δ
unittest 37.95% <67.12%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown.


@liubog2008
Member

/lgtm

@ti-chi-bot ti-chi-bot Bot added the lgtm label May 14, 2026
@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 14, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liubog2008

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 14, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-14 14:56:56.390273398 +0000 UTC m=+363984.923052717: ☑️ agreed by liubog2008.

@ti-chi-bot ti-chi-bot Bot added the approved label May 14, 2026