feat(tidb): add TiFlash replication lag, PD metrics, and dashboard-aligned metrics #2982
Open
premal wants to merge 3 commits into
…trics

- Add tiflash_syncing_data_freshness histogram (TiFlash replication lag from TiKV)
- Add PD_METRICS list: pd_client_cmd_handle_cmds_duration_seconds, pd_client_request_handle_requests_duration_seconds
- Add TiDB session phase duration metrics (parse/compile/execute/transaction)
- Add TiDB connection metrics (get_token, conn_idle) and server metrics (query_total, disconnection, plan_cache)
- Add tidb_tikvclient_request_seconds (TiKV client latency seen from TiDB)
- Add TiKV raftstore metrics (append/apply/commit log, store/apply duration)
- Add TiKV gRPC, engine flow, and async storage request metrics
- Update check.py to include PD_METRICS in default metric list
- Add fixture data and unit tests for all new metrics (EXPECTED_PD, extended EXPECTED_TIDB/TIKV)
- Update metadata.csv with all new metric definitions

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Add 4 new dashboard groups covering metrics not previously visualized:

- TiDB query internals: total QPS, plan cache, parse/compile/execute duration, TiKV client request latency, connection idle duration
- TiFlash replication: replication lag histogram (avg/p50/p95/p99)
- TiKV raftstore & gRPC: raftstore log/store/apply duration, gRPC message duration, engine flow bytes, async storage request duration
- PD client: PD command and request handling duration (avg/p99)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
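The avg/p50/p95/p99 widgets listed in this commit summarize histogram metrics. As a rough illustration of how such values can be estimated from cumulative buckets (Prometheus-style linear interpolation, which is not necessarily what the dashboard queries do), here is a minimal Python sketch. The bucket bounds, counts, and sum are hypothetical, as are the function names `estimate_quantile` and `average`.

```python
from typing import List, Tuple

# Hypothetical cumulative buckets: (upper bound in seconds, cumulative count).
# Illustrative values only; they are not taken from the PR's fixtures.
BUCKETS: List[Tuple[float, int]] = [
    (0.5, 10),
    (1.0, 60),
    (2.0, 90),
    (float("inf"), 100),
]
TOTAL_SUM = 95.0  # hypothetical value of the histogram's sum series


def estimate_quantile(q: float, buckets: List[Tuple[float, int]]) -> float:
    """Estimate quantile q by linear interpolation inside the matching bucket."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                # The open-ended bucket has no width; report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


def average(total_sum: float, buckets: List[Tuple[float, int]]) -> float:
    """Average observation: sum of observations divided by their count."""
    return total_sum / buckets[-1][1]


if __name__ == "__main__":
    for q in (0.5, 0.95, 0.99):
        print(f"p{int(q * 100)} ~ {estimate_quantile(q, BUCKETS):.3f}s")
    print(f"avg ~ {average(TOTAL_SUM, BUCKETS):.3f}s")
```

Quantiles that fall into the open-ended bucket are clamped to the last finite bound, so the accuracy of p95/p99 depends on how the real bucket boundaries are chosen.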
….count expectation

OpenMetrics base check emits histogram count/sum rows with upper_bound:none tag.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
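To make the tag shape in this commit message concrete, the sketch below models the rows a test might expect for one histogram: bucket rows carry their own `upper_bound` value, while count and sum rows carry `upper_bound:none`. This is an illustration of the behavior the commit describes, not the base check's actual code; the `expected_rows` helper, the sample values, and the `.bucket`/`.count`/`.sum` suffixes are assumptions.

```python
from typing import Dict, List, Tuple

# (sample name, labels, value) as they might appear in a Prometheus exposition.
Sample = Tuple[str, Dict[str, str], float]
# (metric name, value, tags) as a test might assert them.
Row = Tuple[str, float, List[str]]

# Hypothetical samples for the histogram named in the PR.
SAMPLES: List[Sample] = [
    ("tiflash_syncing_data_freshness_bucket", {"le": "0.5"}, 1.0),
    ("tiflash_syncing_data_freshness_bucket", {"le": "+Inf"}, 3.0),
    ("tiflash_syncing_data_freshness_count", {}, 3.0),
    ("tiflash_syncing_data_freshness_sum", {}, 1.2),
]


def expected_rows(base: str, samples: List[Sample]) -> List[Row]:
    """Map raw histogram samples to (metric, value, tags) rows (assumed shape)."""
    rows: List[Row] = []
    for name, labels, value in samples:
        if name == base + "_bucket":
            # Bucket rows are tagged with their own upper bound.
            rows.append((base + ".bucket", value, ["upper_bound:" + labels["le"]]))
        elif name == base + "_count":
            # Per the commit message, count rows carry upper_bound:none.
            rows.append((base + ".count", value, ["upper_bound:none"]))
        elif name == base + "_sum":
            # Sum rows are tagged the same way as count rows.
            rows.append((base + ".sum", value, ["upper_bound:none"]))
    return rows


if __name__ == "__main__":
    for row in expected_rows("tiflash_syncing_data_freshness", SAMPLES):
        print(row)
```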
joepeeples approved these changes on Apr 24, 2026
Summary
- Add `tiflash_syncing_data_freshness` histogram to track TiFlash replication lag from TiKV (avg/p50/p95/p99)
- Add `PD_METRICS` list: `pd_client_cmd_handle_cmds_duration_seconds`, `pd_client_request_handle_requests_duration_seconds`
- Add TiDB connection and server metrics: `get_token`, `conn_idle`, `query_total`, `disconnection_total`, `plan_cache_total`, `plan_cache_miss_total`
- Add `tidb_tikvclient_request_seconds` (TiKV client latency from TiDB's perspective)
- Update `check.py` to include `PD_METRICS` in default metric list
- Add fixture data and unit tests (`EXPECTED_PD`, extended `EXPECTED_TIDB`/`EXPECTED_TIKV`)
- Update `metadata.csv` with all new metric definitions
- Add 4 new dashboard groups in `overview.json`: TiDB query internals, TiFlash replication, TiKV raftstore & gRPC, PD client

Motivation
The existing integration collected only a small subset of the metrics visible on the Monitoring page of TiDB Dashboard. This PR aligns the Datadog integration with the full set of metrics that TiDB operators use for day-to-day monitoring, and adds TiFlash replication lag, which was previously missing entirely.
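The `check.py` change listed in the summary (including `PD_METRICS` in the default metric list) could look roughly like the sketch below. Only `PD_METRICS` and the metric names come from this PR; the `TIDB_METRICS` and `TIFLASH_METRICS` groupings, the `build_default_metrics` helper, and `DEFAULT_METRICS` are hypothetical, and the real module layout may differ.

```python
from typing import Iterable, List

# Metric names taken from the PR summary; the grouping into lists is assumed.
TIDB_METRICS: List[str] = ["tidb_tikvclient_request_seconds"]
TIFLASH_METRICS: List[str] = ["tiflash_syncing_data_freshness"]
PD_METRICS: List[str] = [
    "pd_client_cmd_handle_cmds_duration_seconds",
    "pd_client_request_handle_requests_duration_seconds",
]


def build_default_metrics(*groups: Iterable[str]) -> List[str]:
    """Concatenate metric groups, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged: List[str] = []
    for group in groups:
        for name in group:
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged


# Hypothetical default list: PD metrics are collected unless the user overrides it.
DEFAULT_METRICS = build_default_metrics(TIDB_METRICS, TIFLASH_METRICS, PD_METRICS)

if __name__ == "__main__":
    for name in DEFAULT_METRICS:
        print(name)
```

Deduplicating while merging keeps the default list stable if the same metric name is ever listed in more than one group.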
Test plan
- … `tests/fixtures/`
- `test_pd_mock_metrics` unit test added
- `EXPECTED_TIDB` and `EXPECTED_TIKV` extended with representative tags for each new metric
- `metadata.csv` updated with type, unit, and description for all new entries
- … `metadata.csv`)

🤖 Generated with Claude Code