fix: handle TiKV metrics fetch errors #6897

Merged
ti-chi-bot[bot] merged 1 commit into pingcap:main from immanuwell:fix-tikv-leader-count-fetch-error on May 15, 2026

Conversation

@immanuwell
Contributor

@immanuwell immanuwell commented May 14, 2026

What

GetLeaderCount could report that the leader metric was missing when the /metrics fetch itself had failed. The fetch runs in a goroutine, but its error was stored in a shared variable, so the caller could return before seeing it. Tiny fix, but the old behavior was misleading.

This waits for the fetch result through a buffered error channel before returning the fallback "metric not found" error.

Related: #4281 touches the same code path, but it is a different, older goroutine-leak issue.

Repro

The added test reproduces it on the old code:

go test ./pkg/tikvapi/v1 -run TestTiKVClient_GetLeaderCountFetchError -count=1

It serves /metrics with HTTP 500. The old code returns "metric not found"; the new code returns the real fetch error. In practice this can happen whenever the TiKV metrics endpoint is unavailable, times out, hits TLS/transport errors, or returns a non-200 status; no weird cloud quota edge case needed.
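For readers who want to see the shape of such a repro, here is a rough sketch using the standard `net/http/httptest` package. It is not the test added in this PR: `fetchBody` and `failingMetricsServer` are hypothetical helpers, and the assumption is only that a non-200 response from /metrics should surface as an error.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchBody is a hypothetical helper: it GETs url and treats any
// non-200 status as a fetch error.
func fetchBody(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", fmt.Errorf("get %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("get %s: unexpected status %d", url, resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("read body: %w", err)
	}
	return string(body), nil
}

// failingMetricsServer starts a local test server whose /metrics
// endpoint always answers with HTTP 500.
func failingMetricsServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		http.Error(w, "boom", http.StatusInternalServerError)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := failingMetricsServer()
	defer srv.Close()

	_, err := fetchBody(srv.Client(), srv.URL+"/metrics")
	fmt.Println("error:", err)
}
```

A real test would call the client under test against `srv.URL` and assert that the returned error is the fetch error rather than the "metric not found" fallback.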

Tests

go test -race ./pkg/tikvapi/v1 -count=1
go test ./pkg/tikvapi/v1 -run TestTiKVClient_GetLeaderCountFetchError -count=100
go test ./pkg/pdapi/... ./pkg/tikvapi/... ./pkg/resourcemanagerapi/... ./pkg/ticdcapi/...
git diff --check
make unit

@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 14, 2026

Hi @immanuwell. Thanks for your PR.

I'm waiting for a pingcap member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 14, 2026

Welcome @immanuwell! It looks like this is your first PR to pingcap/tidb-operator 🎉

@pingcap-cla-assistant

pingcap-cla-assistant Bot commented May 14, 2026

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot ti-chi-bot Bot added the size/S label May 14, 2026
@github-actions github-actions Bot added the v2 for operator v2 label May 14, 2026
@liubog2008
Member

/lgtm

@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 15, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liubog2008


@ti-chi-bot ti-chi-bot Bot added the lgtm label May 15, 2026
@ti-chi-bot
Contributor

ti-chi-bot Bot commented May 15, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-15 03:22:53 UTC: ☑️ agreed by liubog2008.

@ti-chi-bot ti-chi-bot Bot added the approved label May 15, 2026
@liubog2008
Member

/cherry-pick release-2.1
/cherry-pick release-2.0

@ti-chi-bot
Member

@liubog2008: once the present PR merges, I will cherry-pick it on top of release-2.0/release-2.1 in the new PR and assign it to you.


@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 37.97%. Comparing base (cc26af7) to head (c77b3a3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6897      +/-   ##
==========================================
+ Coverage   37.95%   37.97%   +0.01%     
==========================================
  Files         393      393              
  Lines       22603    22605       +2     
==========================================
+ Hits         8579     8584       +5     
+ Misses      14024    14021       -3     
Flag Coverage Δ
unittest 37.97% <100.00%> (+0.01%) ⬆️


@ti-chi-bot ti-chi-bot Bot merged commit fe02609 into pingcap:main May 15, 2026
10 checks passed
@ti-chi-bot
Member

@liubog2008: new pull request created to branch release-2.1: #6898.


@ti-chi-bot
Member

@liubog2008: new pull request created to branch release-2.0: #6899.

