[opt](memory) limit max block bytes per batch #61821

Open
sollhui wants to merge 2 commits into apache:master from sollhui:limit-load-reader-max-block-bytes-200mb

@sollhui sollhui commented Mar 27, 2026

Problem

During import/load, file format readers accumulate rows into a block until batch_size
(row count) is reached. When individual rows are large (e.g., wide schemas, complex nested
types), a single batch can exceed hundreds of MBs, causing excessive memory pressure.
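
To make the scale concrete, here is a rough back-of-the-envelope sketch. The row count and per-row size are illustrative assumptions, not measurements from this PR, and `estimated_block_bytes` is a hypothetical helper rather than anything in the Doris code base.

```cpp
#include <cstdint>

// Hypothetical helper: rough in-memory size of one block when the reader
// stops only on row count. Ignores per-column and allocator overhead.
constexpr uint64_t estimated_block_bytes(uint64_t batch_rows, uint64_t avg_row_bytes) {
    return batch_rows * avg_row_bytes;
}

// Assumed numbers for illustration only: 4096 rows per batch, ~100 KiB per row.
constexpr uint64_t kExampleBytes = estimated_block_bytes(4096, 100 * 1024);
static_assert(kExampleBytes == 400ULL * 1024 * 1024, "4096 rows x 100 KiB = 400 MiB");
```

Under these assumed numbers a single batch is already twice the 200MB default proposed in this PR.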

Solution

Add a new BE config load_reader_max_block_bytes (default 200MB, set to 0 to disable)
and enforce it across all load file format readers:

| Reader | Mechanism |
|---|---|
| CsvReader | Track cumulative line bytes in the read loop; break early when limit reached |
| NewJsonReader | Check block->bytes() in the loop guard before reading each JSON object |
| ParquetReader | After each next_batch(), adaptively reduce _batch_size based on observed bytes/row |
| OrcReader | Same adaptive reduction + recreate _batch with the new size for subsequent reads |
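
The adaptive reduction used by the columnar readers could look roughly like the sketch below. This is not the PR's actual code: `next_batch_rows` and its parameters are hypothetical names, and the policy shown (shrink only after a batch overshoots the limit, never grow back) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the PR's actual code: given the rows and bytes
// observed in the batch just read, pick a row count for the next batch so
// that it is expected to stay under max_block_bytes.
// max_block_bytes == 0 means the limit is disabled.
inline size_t next_batch_rows(size_t current_batch_rows, size_t rows_read,
                              uint64_t bytes_read, uint64_t max_block_bytes) {
    if (max_block_bytes == 0 || rows_read == 0 || bytes_read <= max_block_bytes) {
        return current_batch_rows;  // limit disabled or not exceeded: no change
    }
    // Observed average row width, rounded up so we never underestimate.
    const uint64_t bytes_per_row = (bytes_read + rows_read - 1) / rows_read;
    const uint64_t rows_that_fit = max_block_bytes / bytes_per_row;
    // Always read at least one row so the reader makes progress; never grow.
    const uint64_t clamped = std::max<uint64_t>(1, rows_that_fit);
    return static_cast<size_t>(std::min<uint64_t>(clamped, current_batch_rows));
}
```

With a 200MB limit, a 4096-row batch that came back at 400 MiB would shrink to 2048 rows for the next read. Because the adjustment happens after a batch has been read, the first oversized batch still goes through; the limit acts as a soft cap for subsequent reads.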

Config

# be.conf
load_reader_max_block_bytes=209715200 # 200MB (default)
load_reader_max_block_bytes=0 # disable limit
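
For the row-wise readers, the interaction between the byte limit and the "0 disables" setting might be sketched as follows. The names `reached_byte_limit` and `take_batch` are hypothetical, and a vector of strings stands in for the real line reader; the actual CsvReader and NewJsonReader loops are more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: true when the byte limit is enabled (non-zero)
// and the accumulated size has reached it.
inline bool reached_byte_limit(uint64_t accumulated_bytes, uint64_t max_block_bytes) {
    return max_block_bytes != 0 && accumulated_bytes >= max_block_bytes;
}

// Simplified stand-in for a row-wise read loop: consumes lines starting at
// `start` until batch_rows rows were taken, the input ends, or the byte
// limit is hit. Returns the number of lines consumed.
inline size_t take_batch(const std::vector<std::string>& lines, size_t start,
                         size_t batch_rows, uint64_t max_block_bytes) {
    size_t rows = 0;
    uint64_t bytes = 0;
    while (start + rows < lines.size() && rows < batch_rows &&
           !reached_byte_limit(bytes, max_block_bytes)) {
        bytes += lines[start + rows].size();
        ++rows;
    }
    return rows;
}
```

Checking the limit before each row means a batch always contains at least one row, at the cost of overshooting the limit by at most one row's worth of bytes.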


sollhui commented Mar 28, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 26918 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e0942707d32f9e9102aefb6725dd375bed3f5d42, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17592	4618	4321	4321
q2	q3	10643	825	534	534
q4	4735	358	249	249
q5	7758	1257	1029	1029
q6	180	173	149	149
q7	810	849	682	682
q8	9824	1501	1397	1397
q9	6017	4685	4700	4685
q10	6319	1930	1685	1685
q11	497	252	262	252
q12	750	595	464	464
q13	18056	2658	1974	1974
q14	232	240	214	214
q15	q16	727	726	676	676
q17	750	875	412	412
q18	6043	5406	5224	5224
q19	1116	975	639	639
q20	547	484	372	372
q21	4486	1904	1669	1669
q22	425	337	291	291
Total cold run time: 97507 ms
Total hot run time: 26918 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4702	4731	4576	4576
q2	q3	3902	4307	4006	4006
q4	877	1269	794	794
q5	4075	4414	4358	4358
q6	188	179	144	144
q7	1760	1661	1555	1555
q8	2486	2710	2594	2594
q9	7720	7291	7443	7291
q10	3860	4032	3636	3636
q11	520	446	415	415
q12	505	578	447	447
q13	2618	2947	2107	2107
q14	295	302	280	280
q15	q16	741	770	695	695
q17	1198	1433	1361	1361
q18	7165	6786	6891	6786
q19	903	908	926	908
q20	2064	2123	1986	1986
q21	4076	3549	3393	3393
q22	461	430	390	390
Total cold run time: 50116 ms
Total hot run time: 47722 ms

@doris-robot

TPC-DS: Total hot run time: 169542 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e0942707d32f9e9102aefb6725dd375bed3f5d42, data reload: false

query5	4330	640	527	527
query6	341	229	201	201
query7	4211	466	275	275
query8	340	234	223	223
query9	8725	2730	2720	2720
query10	502	390	331	331
query11	7012	5084	4862	4862
query12	184	126	123	123
query13	1290	461	354	354
query14	5737	3700	3506	3506
query14_1	2803	2797	2815	2797
query15	203	194	174	174
query16	971	454	445	445
query17	924	716	590	590
query18	2432	438	343	343
query19	218	213	183	183
query20	126	122	126	122
query21	213	134	106	106
query22	13279	14796	14973	14796
query23	16990	16387	16101	16101
query23_1	16159	15820	15575	15575
query24	7200	1621	1215	1215
query24_1	1216	1217	1250	1217
query25	559	459	407	407
query26	1233	263	147	147
query27	2780	480	293	293
query28	4446	1829	1842	1829
query29	859	588	488	488
query30	296	228	191	191
query31	1011	947	877	877
query32	78	70	71	70
query33	501	329	280	280
query34	883	867	539	539
query35	642	716	571	571
query36	1103	1144	992	992
query37	147	95	81	81
query38	2961	2949	2899	2899
query39	848	816	816	816
query39_1	795	794	785	785
query40	226	148	132	132
query41	65	59	60	59
query42	255	254	252	252
query43	245	249	221	221
query44	
query45	203	187	183	183
query46	882	980	600	600
query47	2100	2504	2036	2036
query48	307	309	222	222
query49	643	454	387	387
query50	698	269	211	211
query51	4060	4120	4017	4017
query52	265	271	251	251
query53	291	334	283	283
query54	326	271	257	257
query55	91	89	83	83
query56	305	325	308	308
query57	1910	1699	1637	1637
query58	286	272	282	272
query59	2780	2893	2777	2777
query60	348	337	349	337
query61	148	148	152	148
query62	618	588	532	532
query63	312	279	271	271
query64	5166	1279	1072	1072
query65	
query66	1485	477	371	371
query67	24242	24296	24210	24210
query68	
query69	418	319	304	304
query70	982	998	981	981
query71	342	321	310	310
query72	2957	2759	2447	2447
query73	548	546	320	320
query74	9611	9558	9429	9429
query75	2864	2730	2448	2448
query76	2287	1034	679	679
query77	390	390	309	309
query78	10903	11144	10418	10418
query79	1121	770	569	569
query80	727	630	535	535
query81	500	263	231	231
query82	1332	154	123	123
query83	336	265	246	246
query84	297	122	94	94
query85	852	497	486	486
query86	402	313	292	292
query87	3214	3099	2988	2988
query88	3539	2672	2672	2672
query89	439	368	341	341
query90	1957	186	177	177
query91	175	179	145	145
query92	79	80	72	72
query93	909	854	498	498
query94	471	326	301	301
query95	596	345	321	321
query96	645	531	229	229
query97	2473	2443	2382	2382
query98	241	224	237	224
query99	1056	997	899	899
Total cold run time: 249220 ms
Total hot run time: 169542 ms
