
[fix](be) Reject super wildcard path in json keys #63300

Open
mrhhsg wants to merge 1 commit into apache:master from mrhhsg:codex/reject-json-keys-super-wildcard


mrhhsg (Member) commented May 15, 2026

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Fixes Jira DORIS-25570. `json_keys`/`jsonb_keys` rejected ordinary wildcard paths but allowed super wildcard paths such as `$**.a` to fall through and return NULL. The function only supports reading keys from a single object, so super wildcard paths should fail with the same `INVALID_JSON_PATH` error as other wildcard paths.

### Release note

Reject unsupported super wildcard JSON paths in `json_keys`/`jsonb_keys` instead of returning NULL.

### Check List (For Author)

- Test: Unit Test / Regression test / Static check
    - Unit Test: `./run-be-ut.sh --run --filter=FunctionJsonbTEST.JsonbKeysRejectSuperWildcardPath`
    - Regression test: Added `regression-test/suites/jsonb_p0/test_jsonb_keys_invalid_path.groovy` (not run locally; no FE/BE cluster was listening on the configured regression ports)
    - Static check: `build-support/check-format.sh`
    - Static check: `build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN` (failed due to pre-existing `function_jsonb.cpp` complexity diagnostics and toolchain header/`NOLINTEND` errors)
- Behavior changed: Yes (`json_keys`/`jsonb_keys` now return `INVALID_JSON_PATH` for `$**` paths instead of NULL)
- Does this need documentation: No
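
To make the behavior change concrete, here is a minimal C++ sketch of the old and new checks. It assumes the path parser exposes two boolean flags: one modeled on the `is_supper_wildcard()` check named in the review, and an assumed flag for ordinary wildcards. `PathFlags`, `classify`, `KeysStatus`, and the `check_path_*` functions are hypothetical names, not the Doris BE code, and the substring-based `classify` is only a toy stand-in for real path parsing.

```cpp
#include <string>

// Hypothetical stand-in for flags a JSON path parser might expose.
// `super_wildcard` is modeled on the is_supper_wildcard() check named in the
// review; `wildcard` is an assumed flag for ordinary wildcards.
struct PathFlags {
    bool wildcard = false;        // assumed example: "$.*"
    bool super_wildcard = false;  // example from this PR: "$**.a"
};

// Toy classifier based on substrings; NOT Doris's JSON path parser.
PathFlags classify(const std::string& path) {
    PathFlags flags;
    if (path.find("**") != std::string::npos) {
        flags.super_wildcard = true;
    } else if (path.find('*') != std::string::npos) {
        flags.wildcard = true;
    }
    return flags;
}

enum class KeysStatus { OK, INVALID_JSON_PATH };

// Old check: only ordinary wildcards are rejected, so "$**.a" passes
// validation (per the PR description, the real function then returned NULL).
KeysStatus check_path_before_fix(const PathFlags& flags) {
    return flags.wildcard ? KeysStatus::INVALID_JSON_PATH : KeysStatus::OK;
}

// New check: super wildcards are rejected with the same error.
KeysStatus check_path_after_fix(const PathFlags& flags) {
    return (flags.wildcard || flags.super_wildcard) ? KeysStatus::INVALID_JSON_PATH
                                                    : KeysStatus::OK;
}
```

With these definitions, `check_path_before_fix(classify("$**.a"))` yields `KeysStatus::OK`, which is the gap described above, while `check_path_after_fix` yields `KeysStatus::INVALID_JSON_PATH`.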
hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

mrhhsg (Member, Author) commented May 15, 2026

/review

github-actions (bot, Contributor) left a comment

Automated review summary for PR 63300 at 7aabb3f.

No blocking issues found.

Critical checkpoint conclusions:

  • Goal and proof: The PR aims to reject unsupported ** JSON path tokens in json_keys/jsonb_keys. The implementation adds is_supper_wildcard() checks in both const-path and non-const-path branches, and adds BE unit/regression coverage for both paths.
  • Scope: The code change is small and focused on the existing wildcard rejection logic.
  • Concurrency: No new concurrency, shared state, locks, or lifecycle-sensitive objects are introduced.
  • Lifecycle/static initialization: No new static/global initialization dependency is introduced.
  • Configuration: No configuration item is added.
  • Compatibility/behavior: This is an intentional user-visible behavior change from returning NULL to returning INVALID_JSON_PATH for unsupported super-wildcard paths.
  • Parallel paths: jsonb_keys is an alias of json_keys, so the BE change covers both names through the shared implementation. Existing wildcard rejection in the related JSON modification path already checks is_supper_wildcard().
  • Conditional checks: The added condition directly extends the existing unsupported wildcard validation to the parser's super-wildcard flag.
  • Tests: A BE unit test covers const and non-const path execution, and a regression test covers SQL const and query-derived path expressions. I attempted ./run-be-ut.sh --run --filter=FunctionJsonbTEST.JsonbKeysRejectSuperWildcardPath, but it timed out during first-time contrib/apache-orc submodule cloning before executing the test.
  • Observability: No additional observability appears necessary for this validation-only change.
  • Transaction/persistence/data writes: Not applicable.
  • FE/BE variable passing: Not applicable.
  • Performance: The added checks are constant-time after path parsing and do not add meaningful overhead.
  • User focus points: No additional user-provided review focus was present.

Residual risk: The regression test was not executed in this review environment, and the targeted BE unit test could not be completed because dependency initialization exceeded the command timeout.
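
The review notes that the check was added to both the const-path and non-const-path branches. A rough C++ sketch of why both are needed, assuming a constant path is validated once before the row loop while a column-valued path is validated per row. `is_supported_keys_path` and `first_invalid_row` are hypothetical names, and the real function operates on Doris columns rather than `std::vector<std::string>`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical validator: rejects any path containing '*', which covers both
// ordinary ("*") and super ("**") wildcards in this simplified model.
bool is_supported_keys_path(const std::string& path) {
    return path.find('*') == std::string::npos;
}

// Hypothetical driver mirroring the two branches described in the review.
// Returns the first row whose path is rejected, or std::nullopt if all pass.
//  - path_is_const: `paths` holds one shared path, validated once up front.
//  - otherwise: `paths` holds one path per row, validated inside the loop.
std::optional<std::size_t> first_invalid_row(const std::vector<std::string>& paths,
                                             bool path_is_const) {
    if (path_is_const) {
        if (!is_supported_keys_path(paths.at(0))) {
            return std::size_t{0};  // the error applies to the whole batch
        }
        return std::nullopt;
    }
    for (std::size_t row = 0; row < paths.size(); ++row) {
        if (!is_supported_keys_path(paths[row])) {
            return row;
        }
    }
    return std::nullopt;
}
```

Validating a constant path once keeps the per-row loop free of extra work, which fits the review's note that the added checks are constant-time after path parsing; a fix applied to only one branch would leave the other still returning NULL.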

mrhhsg (Member, Author) commented May 15, 2026

run buildall

hello-stephen (Contributor)

TPC-H: Total hot run time: 31190 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7aabb3f3c4e3430165a0fa806ebf8a8521d96d4c, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run (all ms)
q1	17629	3895	3953	3895
q2	q3	10781	1394	829	829
q4	4686	471	354	354
q5	7604	2236	2143	2143
q6	235	177	138	138
q7	953	784	619	619
q8	9436	1778	1589	1589
q9	5134	4914	4847	4847
q10	6411	2083	1761	1761
q11	421	280	247	247
q12	633	428	295	295
q13	18136	3375	2738	2738
q14	264	250	234	234
q15	q16	821	797	704	704
q17	940	940	1025	940
q18	6946	5780	5699	5699
q19	1320	1251	1067	1067
q20	500	410	260	260
q21	6031	2543	2529	2529
q22	424	367	302	302
Total cold run time: 99305 ms
Total hot run time: 31190 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run (all ms)
q1	4171	4148	4176	4148
q2	q3	4510	4893	4277	4277
q4	2090	2216	1388	1388
q5	4356	4254	4254	4254
q6	225	177	132	132
q7	1764	2134	1702	1702
q8	2532	2139	2141	2139
q9	8011	7890	7737	7737
q10	4524	4427	4066	4066
q11	560	439	397	397
q12	744	761	518	518
q13	3405	3651	2979	2979
q14	281	305	278	278
q15	q16	707	722	637	637
q17	1327	1292	1282	1282
q18	7885	7381	7326	7326
q19	1191	1129	1122	1122
q20	2188	2208	1934	1934
q21	5256	4577	4393	4393
q22	519	466	407	407
Total cold run time: 56246 ms
Total hot run time: 51116 ms
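
The summary lines can be reproduced from the per-query rows: in both rounds, "Total cold run time" equals the sum of the first timing column, and "Total hot run time" equals the sum of the last column, which is the smaller of the two middle (hot) runs in every row. A minimal C++ sketch checking this for Round 1, assuming those column meanings (they are inferred from the sums, not stated in the report) and counting the merged `q2 q3` row as a single row; `QueryTiming`, `Totals`, and `sum_totals` are hypothetical helpers.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical row type for the per-query timings above (all in ms). The
// meaning of each column is inferred from the totals, not stated in the report.
struct QueryTiming {
    long cold;  // first timing column
    long hot1;  // second timing column
    long hot2;  // third timing column
};

struct Totals {
    long cold = 0;
    long hot = 0;
};

// The last printed column matches the faster of the two hot runs.
long best_hot(const QueryTiming& t) { return std::min(t.hot1, t.hot2); }

// Sums the cold column and the best-hot column over all rows.
Totals sum_totals(const std::vector<QueryTiming>& rows) {
    Totals totals;
    for (const QueryTiming& t : rows) {
        totals.cold += t.cold;
        totals.hot += best_hot(t);
    }
    return totals;
}
```

Applied to Round 1 this gives 99305 ms cold and 31190 ms hot; the same relationship holds for Round 2 (56246 ms and 51116 ms).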

hello-stephen (Contributor)

TPC-DS: Total hot run time: 169150 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7aabb3f3c4e3430165a0fa806ebf8a8521d96d4c, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (all ms)
query5	4301	657	508	508
query6	340	216	196	196
query7	4258	607	322	322
query8	323	235	221	221
query9	8830	4013	4010	4010
query10	456	357	311	311
query11	5744	2408	2195	2195
query12	181	130	125	125
query13	1295	606	436	436
query14	5857	5327	4999	4999
query14_1	4315	4312	4366	4312
query15	204	201	177	177
query16	984	440	412	412
query17	948	711	593	593
query18	2448	491	370	370
query19	217	215	184	184
query20	133	136	134	134
query21	216	139	117	117
query22	13647	13474	13358	13358
query23	17196	16304	15996	15996
query23_1	16125	16140	16111	16111
query24	7493	1716	1309	1309
query24_1	1306	1315	1307	1307
query25	566	496	435	435
query26	1334	311	179	179
query27	2745	587	342	342
query28	4449	1977	1949	1949
query29	984	654	522	522
query30	312	235	200	200
query31	1125	1066	937	937
query32	96	76	75	75
query33	553	367	306	306
query34	1169	1146	637	637
query35	772	793	670	670
query36	1342	1381	1160	1160
query37	156	109	104	104
query38	3207	3128	3031	3031
query39	933	924	898	898
query39_1	878	903	871	871
query40	231	153	132	132
query41	71	69	70	69
query42	115	113	117	113
query43	328	329	286	286
query44	
query45	218	207	201	201
query46	1070	1182	738	738
query47	2319	2356	2171	2171
query48	418	449	307	307
query49	632	489	387	387
query50	995	350	259	259
query51	4275	4282	4242	4242
query52	106	107	95	95
query53	260	278	203	203
query54	320	284	245	245
query55	97	92	88	88
query56	291	295	298	295
query57	1398	1425	1302	1302
query58	309	270	267	267
query59	1564	1604	1405	1405
query60	317	312	296	296
query61	161	158	156	156
query62	656	623	556	556
query63	247	200	207	200
query64	2393	797	635	635
query65	
query66	1727	464	351	351
query67	29918	29314	29923	29314
query68	
query69	445	337	303	303
query70	952	966	962	962
query71	317	292	266	266
query72	3034	2732	2370	2370
query73	805	768	424	424
query74	5060	4890	4717	4717
query75	2661	2661	2285	2285
query76	2261	1127	768	768
query77	391	402	328	328
query78	12101	12049	11697	11697
query79	1325	1040	735	735
query80	644	536	438	438
query81	452	282	243	243
query82	1414	157	125	125
query83	360	273	243	243
query84	309	146	109	109
query85	876	538	482	482
query86	394	326	310	310
query87	3377	3350	3204	3204
query88	3513	2668	2679	2668
query89	434	379	334	334
query90	1935	183	179	179
query91	178	168	165	165
query92	79	77	73	73
query93	1475	1442	854	854
query94	531	348	317	317
query95	672	384	346	346
query96	1072	799	322	322
query97	2682	2693	2559	2559
query98	232	229	236	229
query99	1135	1105	992	992
Total cold run time: 251578 ms
Total hot run time: 169150 ms

hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.51% (27776/37787)
Line Coverage 57.48% (301012/523694)
Region Coverage 54.73% (251662/459823)
Branch Coverage 56.22% (108709/193350)

mrhhsg marked this pull request as ready for review May 16, 2026 01:26