
fix: handle TiKV metrics fetch errors (#6897) #6899

Open
ti-chi-bot wants to merge 1 commit into pingcap:release-2.0 from ti-chi-bot:cherry-pick-6897-to-release-2.0

@ti-chi-bot (Member)

This is an automated cherry-pick of #6897

What

GetLeaderCount could report that the leader metric was missing when the /metrics fetch itself had failed. The fetch runs in a goroutine, but its error was stored in a shared variable, so the caller could miss it and fall through to the generic error. Small fix, but the old behavior hid the real failure.

This change waits for the fetch result on a buffered error channel and returns the fetch error, if there is one, before falling back to the "metric not found" error.

Related: #4281 touches the same code path but covers a different, older goroutine-leak issue.

Repro

The added test reproduces it on the old code:

go test ./pkg/tikvapi/v1 -run TestTiKVClient_GetLeaderCountFetchError -count=1

The test serves /metrics with HTTP 500. The old code returns "metric not found"; the new code returns the real fetch error. This can happen in practice whenever the TiKV metrics endpoint is unavailable, times out, hits TLS/transport errors, or returns a non-200 status; no unusual edge case such as a cloud quota limit is needed.

Tests

go test -race ./pkg/tikvapi/v1 -count=1
go test ./pkg/tikvapi/v1 -run TestTiKVClient_GetLeaderCountFetchError -count=100
go test ./pkg/pdapi/... ./pkg/tikvapi/... ./pkg/resourcemanagerapi/... ./pkg/ticdcapi/...
git diff --check
make unit

@ti-chi-bot (Contributor) commented May 15, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fengou1 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot requested a review from shonge on May 15, 2026 05:36
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.01%. Comparing base (3051251) to head (28799f2).

Additional details and impacted files
@@               Coverage Diff               @@
##           release-2.0    #6899      +/-   ##
===============================================
+ Coverage        42.98%   43.01%   +0.02%     
===============================================
  Files              334      334              
  Lines            18730    18732       +2     
===============================================
+ Hits              8052     8057       +5     
+ Misses           10678    10675       -3     
Flag        Coverage Δ
unittest    43.01% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.
