backendcluster: add cluster manager and cluster-scoped topology runtime#1104
backendcluster: add cluster manager and cluster-scoped topology runtime#1104YangKeao wants to merge 2 commits intopingcap:mainfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
98ea284 to
3993ee3
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1104 +/- ##
=======================================
Coverage ? 67.13%
=======================================
Files ? 143
Lines ? 15067
Branches ? 0
=======================================
Hits ? 10115
Misses ? 4258
Partials ? 694
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
3993ee3 to
a7c5384
Compare
|
/test all |
|
/retest |
25a54ca to
cd317e4
Compare
|
/test all |
9eea95f to
f97603c
Compare
f97603c to
1add4f8
Compare
Introduce the backend cluster manager, cluster-scoped InfoSync runtime, topology aggregation, and single-cluster compatibility hooks.
1add4f8 to
af488c9
Compare
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 82ae03a9dc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if cluster := srv.clusterManager.PrimaryCluster(); cluster != nil { | ||
| srv.etcdCli = cluster.EtcdClient() | ||
| promFetcher = cluster | ||
| } |
There was a problem hiding this comment.
Handle multi-cluster startup without nil ETCD client
When more than one backend cluster is configured, PrimaryCluster() returns nil and this block leaves both srv.etcdCli and promFetcher unset, but the metrics reader is still started afterward. In that configuration, Prometheus fetch is unavailable and backend metrics fallback eventually calls BackendReader.queryAllOwners/etcd.GetKVs with a nil etcd client, which panics and stops the metrics loop; this makes multi-cluster deployments lose metrics-driven balancing right after startup.
Useful? React with 👍 / 👎.
| { | ||
| healthCheckCfg := config.NewDefaultHealthCheckConfig() | ||
| srv.metricsReader = metricsreader.NewDefaultMetricsReader(lg.Named("mr"), srv.infoSyncer, srv.infoSyncer, srv.httpCli, srv.etcdCli, healthCheckCfg, srv.configManager) | ||
| srv.metricsReader = metricsreader.NewDefaultMetricsReader(lg.Named("mr"), promFetcher, srv.clusterManager, srv.httpCli, srv.etcdCli, healthCheckCfg, srv.configManager) |
There was a problem hiding this comment.
Avoid pinning readers to the initial primary cluster
This constructs metricsReader with the one-time promFetcher/srv.etcdCli selected at startup, but backend-cluster runtime is now hot-reloaded and old clusters are explicitly closed during syncClusters. After a backend-cluster PD update in a running node, the reader/VIP paths keep using stale handles from the old cluster instead of the newly active runtime, so topology/prom/election operations can fail permanently after config reload.
Useful? React with 👍 / 👎.
| // Namespace always receives a topology fetcher from the cluster manager. PDFetcher preserves | ||
| // legacy static backend.instances compatibility by falling back internally before any backend | ||
| // cluster is configured. | ||
| fetcher := observer.NewPDFetcher(mgr.tpFetcher, cfg.Backend.Instances, logger.Named("be_fetcher"), healthCheckCfg) |
There was a problem hiding this comment.
I don't understand why you wrap the StaticFetcher in the PDFetcher. It makes PDFetcher even more complicated. StaticFetcher is only used for testing, especially mysql-connector-test.
| } | ||
|
|
||
| func (dhc *DefaultHealthCheck) Check(ctx context.Context, addr string, info *BackendInfo, lastBh *BackendHealth) *BackendHealth { | ||
| func (dhc *DefaultHealthCheck) Check(ctx context.Context, _ string, info *BackendInfo, lastBh *BackendHealth) *BackendHealth { |
There was a problem hiding this comment.
If you do not need addr, remove it from the param list.
What problem does this PR solve?
Issue Number: close #1098
What is changed and how it works:
Introduce a backend-cluster manager that owns cluster-scoped runtime instances.
This PR adds:
Check List
Tests
Notable changes
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.