Skip to content

[fix](fe) add --drop_backends param to start_fe.sh#63306

Open
mymeiyi wants to merge 2 commits into
apache:masterfrom
mymeiyi:drop-backends
Open

[fix](fe) add --drop_backends param to start_fe.sh#63306
mymeiyi wants to merge 2 commits into
apache:masterfrom
mymeiyi:drop-backends

Conversation

@mymeiyi
Copy link
Copy Markdown
Contributor

@mymeiyi mymeiyi commented May 15, 2026

This PR adds a --drop_backends startup flag for Doris FE, wiring it through bin/start_fe.sh into FE argument parsing, and executing backend removal when the FE becomes master

./bin/start_fe.sh --drop_backends

Copilot AI review requested due to automatic review settings May 15, 2026 12:55
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a --drop_backends startup flag for Doris FE, wiring it through bin/start_fe.sh into FE argument parsing, and executing backend removal when the FE becomes master.

Changes:

  • Add --drop_backends option to bin/start_fe.sh and pass it through to org.apache.doris.DorisFE.
  • Register/parse a new FE CLI option drop_backends and set a corresponding system property.
  • When transferring to master, drop all registered backends if the system property is enabled.

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.

File Description
fe/fe-core/src/main/java/org/apache/doris/DorisFE.java Adds the --drop_backends CLI option and sets a system property when present.
fe/fe-core/src/main/java/org/apache/doris/common/FeConstants.java Introduces the DROP_BACKENDS_KEY constant.
fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java Drops all backends during transferToMaster() when the property is enabled.
bin/start_fe.sh Adds --drop_backends getopt parsing and forwards the flag to FE.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java Outdated
Comment thread fe/fe-core/src/main/java/org/apache/doris/DorisFE.java Outdated
Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java Outdated
@mymeiyi
Copy link
Copy Markdown
Contributor Author

mymeiyi commented May 15, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking correctness issue. The new startup flag is wired through the shell script and FE argument parsing, and bash -n bin/start_fe.sh passes, but the backend removal path currently treats individual drop failures as non-fatal. That can leave the cluster in a partially dropped state while the FE continues to become ready, which violates the semantics of a destructive recovery/startup option.

Critical checkpoint conclusions:

  • Goal/test: The PR aims to add --drop_backends and remove all BEs when the FE becomes MASTER. The wiring is present, but the implementation does not guarantee all requested BEs are removed; I did not find tests proving success/failure behavior.
  • Scope: The change is small and focused.
  • Concurrency/locking: No new explicit locks; the drop path uses existing SystemInfoService immutable-map replacement. The main concern is failure atomicity rather than lock ordering.
  • Lifecycle: The flag is a process system property observed during transferToMaster; no static initialization ordering issue found. Repeated leadership transitions are intentionally tied to the process flag per the current help text.
  • Config/compatibility: No persistent config or protocol/storage-format compatibility changes.
  • Parallel paths: The PR routes through existing dropBackend; cloud mode overrides can fail via meta-service RPC, making the non-fatal catch especially problematic.
  • Tests: No regression/unit test coverage was added for successful forwarding or partial-failure behavior.
  • Observability: Existing logs are useful, but observability cannot compensate for continuing after a failed destructive operation.
  • Transactions/persistence/data writes: Backend metadata changes are journaled through existing drop paths; however, partial failure can persist only a subset of requested drops.
  • Performance: No hot-path performance concern found for this startup/master-transfer option.

Focus points: The provided focus file says there is no additional review focus; no focus-specific issue was found.

Comment thread fe/fe-core/src/main/java/org/apache/doris/catalog/Env.java Outdated
@mymeiyi
Copy link
Copy Markdown
Contributor Author

mymeiyi commented May 15, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 30924 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a62b4c4f8bb2a6be32fe4398818ef818d1b797dc, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17759	3823	3810	3810
q2	q3	10743	1367	791	791
q4	4684	467	350	350
q5	7669	2255	2078	2078
q6	370	172	138	138
q7	925	793	634	634
q8	9384	1616	1583	1583
q9	6977	4927	4883	4883
q10	6466	2111	1835	1835
q11	429	279	245	245
q12	694	416	292	292
q13	18291	3382	2788	2788
q14	262	254	229	229
q15	q16	819	776	698	698
q17	927	864	873	864
q18	7118	5614	5564	5564
q19	1231	1331	1101	1101
q20	519	386	257	257
q21	5752	2517	2489	2489
q22	437	356	295	295
Total cold run time: 101456 ms
Total hot run time: 30924 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4176	4075	4085	4075
q2	q3	4510	4887	4375	4375
q4	2079	2175	1386	1386
q5	4410	4226	4248	4226
q6	228	175	129	129
q7	2163	1878	1662	1662
q8	2399	2087	2031	2031
q9	7813	7817	7669	7669
q10	4548	4483	4042	4042
q11	573	399	379	379
q12	828	783	510	510
q13	3248	3658	3044	3044
q14	299	290	264	264
q15	q16	740	739	655	655
q17	1328	1296	1247	1247
q18	7981	7312	6998	6998
q19	1131	1068	1091	1068
q20	2223	2207	1937	1937
q21	5257	4549	4447	4447
q22	524	473	422	422
Total cold run time: 56458 ms
Total hot run time: 50566 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169317 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a62b4c4f8bb2a6be32fe4398818ef818d1b797dc, data reload: false

query5	4315	652	516	516
query6	331	225	213	213
query7	4281	580	301	301
query8	335	235	221	221
query9	8816	3954	3945	3945
query10	453	329	293	293
query11	5753	2355	2182	2182
query12	176	125	120	120
query13	1305	591	416	416
query14	5928	5308	5029	5029
query14_1	4304	4313	4304	4304
query15	206	202	186	186
query16	987	436	429	429
query17	954	701	573	573
query18	2447	501	363	363
query19	229	211	181	181
query20	150	133	132	132
query21	220	137	123	123
query22	13569	13615	13401	13401
query23	17168	16412	16025	16025
query23_1	16181	16239	16147	16147
query24	7493	1745	1313	1313
query24_1	1297	1312	1328	1312
query25	596	502	444	444
query26	1312	331	169	169
query27	2700	549	335	335
query28	4496	1959	1935	1935
query29	984	670	517	517
query30	304	236	200	200
query31	1110	1060	937	937
query32	90	75	76	75
query33	562	358	308	308
query34	1170	1163	661	661
query35	760	798	671	671
query36	1328	1317	1185	1185
query37	169	118	101	101
query38	3240	3093	3053	3053
query39	926	929	886	886
query39_1	872	889	896	889
query40	239	158	142	142
query41	72	74	67	67
query42	112	115	111	111
query43	319	335	283	283
query44	
query45	212	206	196	196
query46	1075	1182	730	730
query47	2338	2336	2279	2279
query48	410	434	313	313
query49	647	514	394	394
query50	988	349	245	245
query51	4402	4346	4246	4246
query52	109	111	96	96
query53	270	296	212	212
query54	323	290	269	269
query55	100	94	99	94
query56	318	318	325	318
query57	1461	1422	1347	1347
query58	321	283	283	283
query59	1574	1613	1434	1434
query60	336	316	291	291
query61	159	151	152	151
query62	673	623	562	562
query63	238	205	200	200
query64	2384	788	627	627
query65	
query66	1730	472	359	359
query67	29983	29939	29146	29146
query68	
query69	442	338	304	304
query70	1057	934	962	934
query71	299	273	276	273
query72	2971	2695	2382	2382
query73	831	718	428	428
query74	5065	4906	4715	4715
query75	2689	2599	2268	2268
query76	2277	1127	788	788
query77	392	408	323	323
query78	12017	12056	11658	11658
query79	1452	1037	758	758
query80	1342	538	452	452
query81	506	280	244	244
query82	1334	155	121	121
query83	360	282	240	240
query84	309	141	111	111
query85	929	538	442	442
query86	445	333	332	332
query87	3416	3383	3180	3180
query88	3498	2608	2627	2608
query89	444	392	335	335
query90	1916	178	178	178
query91	177	165	134	134
query92	78	75	70	70
query93	1479	1524	896	896
query94	740	352	296	296
query95	674	379	434	379
query96	1061	776	353	353
query97	2691	2712	2541	2541
query98	237	231	227	227
query99	1113	1116	986	986
Total cold run time: 253578 ms
Total hot run time: 169317 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 21.43% (3/14) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 21.43% (3/14) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants