[feature](be) Support flexible variant partial update#63182
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
I reviewed the actual GitHub PR diff and did not find a blocking issue.\n\nCritical checkpoint conclusions:\n- Goal/test coverage: the PR enables flexible partial update for VARIANT columns and adds BE unit tests plus regression coverage for object patch merge, invalid patch values, doc-mode rejection, same-batch sequence behavior, and publish conflicts.\n- Scope/focus: the actual PR diff is focused on VARIANT flexible partial update support and related FE validation/tests.\n- Concurrency/lifecycle: publish-conflict handling remains in the existing MoW publish/read-plan flow; I did not find a new lock-order or lifecycle issue in the changed paths.\n- Compatibility/persistence: no new persisted metadata or FE-BE protocol fields were introduced in the actual PR diff.\n- Data correctness/MoW: the changed read-plan and merge paths account for old row delete signs, sequence rows, same-batch duplicate keys, and path-level VARIANT conflict markers; no concrete version/delete-bitmap correctness issue found.\n- Error handling/memory: new VARIANT merge errors propagate through Status in the reviewed write/publish paths; no unchecked Status or obvious untracked large allocation issue found.\n- Tests: coverage is relevant; I did not run the test suite in this review environment.\n- User focus: no additional user-provided review focus was supplied.
ae3ea35 to
c773c69
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found one blocking correctness issue in the flexible VARIANT partial update path. The main feature direction is covered by unit/regression tests for same-batch merges, publish conflicts, row-store/column-store, sequence columns, doc-mode rejection, and invalid patch values, but the new aggregation path is not safe for sliced RowsInBlock input.
Critical checkpoint conclusions:
- Goal/test: The PR aims to support JSON object patch semantics for VARIANT flexible partial update, and tests cover many core cases, but they do not cover
RowsInBlock.row_pos > 0. - Scope: The feature is mostly focused, but the no-sequence aggregation change broadens write-path behavior and needs the row-slice invariant fixed.
- Concurrency/lifecycle: No new shared-state locking or lifecycle issue found in the reviewed paths.
- Compatibility/config: No new config or wire/storage format compatibility issue found beyond reusing the hidden skip bitmap; marker handling is filtered from read plans.
- Parallel paths: Row-store/column-store and publish-conflict paths were reviewed; the sliced-block issue applies before both downstream paths.
- Tests: Missing coverage for flexible variant partial update when a segment writer receives a non-zero
row_posslice with duplicate keys. - Observability/performance: No separate blocking observability or obvious hot-path performance issue found.
- User focus: No additional user-provided review focus was present.
FE UT Coverage ReportIncrement line coverage |
c773c69 to
0553706
Compare
|
run buildall |
|
/review |
TPC-H: Total hot run time: 29465 ms |
TPC-H: Total hot run time: 29810 ms |
TPC-DS: Total hot run time: 171133 ms |
TPC-DS: Total hot run time: 172478 ms |
There was a problem hiding this comment.
Summary: I am requesting changes based on the existing inline thread at be/src/storage/segment/vertical_segment_writer.cpp:772 (#63182 (comment)). I did not add a duplicate inline comment because that thread already covers the concrete blocking issue: the new flexible VARIANT aggregation path processes rows from offset 0 while the segment writer API can pass a RowsInBlock slice with non-zero row_pos, which can aggregate/merge the wrong rows and leave later bitmap/delete-sign/key-column accesses inconsistent.
Critical checkpoint conclusions:
- Goal/test: The PR aims to support flexible partial updates for VARIANT object patches with same-batch merging and publish-conflict replay. The included unit and regression coverage exercises normal object patching, conflict replay, doc-mode rejection, null rejection, sequence cases, and row-store/non-row-store paths, but the row-slice case from the existing thread is not covered.
- Scope: The change is focused on VARIANT flexible partial update, but it touches multiple hot ingestion/write paths.
- Concurrency/lifecycle: No new thread/lifecycle ownership issue found in the reviewed paths; the main correctness risk is within segment write row slicing and downstream mutation of the block.
- Config/compatibility: No new config or persisted format version was added. The feature stores extra marker values in the existing skip bitmap; this needs the row-slice fix to be safe before relying on it.
- Parallel paths: JSON reader, memtable aggregation, vertical segment writer, read-plan fill, and publish conflict replay were checked. I did not find another distinct issue beyond the existing row_pos bug.
- Tests: Existing tests are broad for common and conflict paths, but missing non-zero row_pos coverage for flexible VARIANT partial update with duplicate keys, as already requested in the inline thread.
- Observability/performance: No separate blocking observability or performance issue found.
User focus: No additional user-provided review focus was specified.
TPC-H: Total hot run time: 29736 ms |
TPC-DS: Total hot run time: 171930 ms |
FE UT Coverage ReportIncrement line coverage |
0553706 to
ac082f9
Compare
|
run buildall |
|
/review |
There was a problem hiding this comment.
I found two correctness/validation gaps that should be addressed before this is merged. The main data-correctness concern is that nested empty-object VARIANT patches are accepted but lose their intended path, so stale subpaths can survive same-batch and publish-conflict merges. I also found that the new doc-mode restriction is not enforced when enabling UPDATE_FLEXIBLE_COLUMNS via ALTER TABLE, so unsupported tables can still be enabled and fail later at load time.
Critical checkpoint conclusions: goal is clear and partially achieved, with tests for ordinary object-path merges but missing empty-object patch coverage; the change is mostly focused but adds parallel validation paths that are not fully synchronized; no new unsafe locking/concurrency pattern was found in the reviewed code, while publish-conflict behavior is affected by the marker issue; no new config item or persistence format was added; FE/BE behavior is mostly aligned except the ALTER-enable validation gap; transaction/delete-bitmap paths still rely on existing MoW mechanisms, but path-marker completeness is required for correctness; observability is not a blocker; user focus file had no additional focus points.
TPC-H: Total hot run time: 30855 ms |
d73b6ab to
aaef4cc
Compare
|
run buildall |
|
/review |
TPC-DS: Total hot run time: 169372 ms |
TPC-DS: Total hot run time: 170246 ms |
TPC-DS: Total hot run time: 170497 ms |
TPC-DS: Total hot run time: 169833 ms |
There was a problem hiding this comment.
I found one blocking issue. The PR goal is to support flexible partial updates, including new VARIANT patch handling and routine-load validation/persistence. The implementation is mostly focused, but one table-level invariant is still missing at the feature-enable/validation boundary.
Critical checkpoint conclusions:
- Goal/test: The PR adds broad regression and unit coverage for flexible partial update and VARIANT paths, but cluster-key tables are not covered and can still be put into an unusable enabled state.
- Scope: The changes are generally focused on flexible partial update, routine-load metadata, BE marker handling, and validation.
- Concurrency: No new distinct concurrency issue found beyond existing review threads.
- Lifecycle/persistence: Routine-load persistence risks are already covered by existing threads; no additional distinct issue found.
- Config/compatibility: Existing review threads already cover BE marker compatibility/capability gating.
- Parallel paths: The table-level validator omits the same cluster-key restriction enforced by stream/routine load planning, causing a distinct enable-time gap.
- Tests: Missing negative coverage for enabling UPDATE_FLEXIBLE_COLUMNS on MoW unique tables with cluster keys.
- Observability/transactions/data writes/performance: No additional distinct blocker found beyond existing threads.
User focus: no additional user-provided review focus was specified.
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
### What problem does this PR solve? Issue Number: close apache#48122 Related PR: N/A Problem Summary: Allow MOW unique flexible partial update loads to patch VARIANT object paths while preserving undeclared paths, including duplicate-key batching, sequence semantics, routine-load replay, publish conflict replay, and rejected-row filtering for invalid VARIANT patch values. V1 accepts JSON object patches only and rejects unsupported VARIANT modes. Also reject explicit merge_type request shapes, WHERE, delete, and sequence semantics for flexible partial update across stream load and routine load so unsupported semantics cannot be accepted accidentally. Gate VARIANT flexible partial update on an explicit BE heartbeat capability so FE cannot enable marker-producing loads against old or actually dead backends. ### Release note MOW unique tables can use JSON flexible partial update loads to patch object fields inside supported VARIANT columns. Flexible partial update rejects explicit merge_type, WHERE, delete, and sequence options, and VARIANT flexible partial update requires each backend in the current cluster to be alive and advertise the required heartbeat capability. ### Check List (For Author) - Test: Unit Test and Regression test - ./build.sh --be --fe - ./build.sh --fe - ./run-fe-ut.sh --run org.apache.doris.catalog.OlapTableTest - ./run-fe-ut.sh --run org.apache.doris.nereids.load.NereidsStreamLoadPlannerTest - ./run-fe-ut.sh --run org.apache.doris.catalog.OlapTableTest,org.apache.doris.system.SystemInfoServiceTest - ./run-fe-ut.sh --run org.apache.doris.system.SystemInfoServiceTest - ./run-fe-ut.sh --run org.apache.doris.load.routineload.RoutineLoadJobTest,org.apache.doris.persist.AlterRoutineLoadOperationLogTest - ./run-fe-ut.sh --run org.apache.doris.load.routineload.RoutineLoadJobTest,org.apache.doris.load.routineload.KinesisRoutineLoadJobTest,org.apache.doris.persist.AlterRoutineLoadOperationLogTest,org.apache.doris.catalog.OlapTableTest - ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.AlterTableCommandTest,org.apache.doris.catalog.OlapTableTest,org.apache.doris.load.routineload.RoutineLoadJobTest,org.apache.doris.load.routineload.KinesisRoutineLoadJobTest,org.apache.doris.persist.AlterRoutineLoadOperationLogTest - ./run-be-ut.sh --run --filter='*VariantUtilTest*' - ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d unique_with_mow_p0/flexible -s test_flexible_partial_update_variant - ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d unique_with_mow_p0/flexible -s test_flexible_partial_update_restricts - ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d unique_with_mow_p0/flexible -s test_flexible_partial_update_property - ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d load_p0/routine_load -s test_routine_load_flexible_partial_update - ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d load_p0/routine_load -s test_routine_load_flexible_partial_update_validate - git diff --check - git diff --cached --check - clang-format lint action script over modified C++ files - Behavior changed: Yes. JSON flexible partial update on MOW unique tables now merges declared VARIANT object paths and preserves undeclared paths. Flexible partial update now rejects explicit merge_type, WHERE, delete, and sequence clauses for stream/routine load request shapes. VARIANT flexible partial update now requires each backend in the current cluster to be alive and advertise the required heartbeat capability. - Does this need documentation: No
TPC-H: Total hot run time: 31388 ms |
aaef4cc to
c59e395
Compare
|
run buildall |
|
/review |
TPC-DS: Total hot run time: 169226 ms |
TPC-H: Total hot run time: 31121 ms |
FE UT Coverage ReportIncrement line coverage |
There was a problem hiding this comment.
I completed a full pass over the PR diff and the existing inline review context. I did not add new inline comments because the blocking issues I would rely on are already raised in existing threads and the instructions prohibit duplicating them.
Critical checkpoint conclusions:
- Goal/test proof: the PR aims to support flexible partial update for VARIANT and tighten FE load validation. There is substantial BE/FE/regression coverage, but the existing review threads still identify uncovered negative cases and compatibility gaps.
- Scope/clarity: the implementation is broad across FE validation, BE write/merge paths, heartbeat capability plumbing, and persistence. Several existing threads show places where the feature gate is still not consistently applied.
- Concurrency/lifecycle: existing routine-load ALTER locking/persistence concerns and schema-change revalidation paths were reviewed; no additional distinct concurrency issue was found in this pass.
- Configuration/dynamic behavior: no new config item issue found.
- Compatibility: existing threads still cover blocking rolling-upgrade/marker capability and legacy persistence risks; these remain the main blockers.
- Parallel paths: existing threads cover stream load vs routine load and RowsetBuilder vs DeltaWriterV2 path gaps; I did not find a new distinct omitted path beyond those.
- Tests/results: many tests were added, but existing review threads request additional negative/compatibility coverage for the blocking paths.
- Observability: no additional observability blocker found.
- Transaction/persistence/data correctness: the already-open threads include data correctness and persistence failures significant enough to block approval.
- User focus: no additional user-provided focus points were present.
Overall opinion: request changes based on the existing non-duplicate blocking inline threads; no additional inline comments are submitted in this review to avoid repeating already-known issues.
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Automated review result: request changes. I did not add duplicate inline comments because the existing review context already contains the blocking findings. The unresolved blockers are still material: flexible partial update can still be enabled or altered into table/job shapes that later cannot execute, routine-load descriptor persistence/replay paths still need complete validation, and VARIANT skip-bitmap marker compatibility must be gated by an explicit BE capability across all participating backends.
Critical checkpoint conclusions:
- Goal/test proof: the PR aims to support flexible partial update for VARIANT and tighten routine-load/stream-load validation, with substantial unit/regression coverage, but existing review threads show uncovered negative and compatibility cases remain.
- Scope/clarity: the change is large and touches FE validation, BE storage/write paths, heartbeat capability, and persistence; several paths remain parallel and easy to miss.
- Concurrency/lifecycle: routine-load ALTER serialization was improved, but existing threads still identify validation/replay paths that can leave persisted jobs in an invalid state; no new distinct concurrency issue found in this pass.
- Configuration/compatibility: the marker format changes persisted skip-bitmap semantics; the current gating is still insufficient per the existing compatibility threads and must be resolved before merge.
- Parallel code paths: stream load, routine load, load-v2, cloud/non-cloud rowset builders, and schema-change enable paths all need consistent validation; existing threads identify remaining mismatches.
- Test coverage/results: many tests were added, but existing review comments request additional negative/replay/compatibility coverage. I did not run tests in this review environment.
- Observability/performance: no additional observability issue found beyond the existing marker-size amplification thread, which appears addressed with a total-byte cap in current code.
- Transaction/persistence/data correctness: existing routine-load image/edit-log persistence and BE marker compatibility findings are blocking because they can cause accepted metadata to replay incorrectly or old BEs to misinterpret persisted rowset data.
User focus: no additional user-provided review focus was specified.
TPC-DS: Total hot run time: 169231 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 30779 ms |
TPC-DS: Total hot run time: 169020 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
Issue Number: close #48122
Related PR: N/A
Problem Summary: Allow MOW unique flexible partial update loads to patch VARIANT object paths while preserving undeclared paths, including duplicate-key batching and publish conflict replay. V1 accepts JSON object patches only and rejects root scalar/array/null values and doc-mode VARIANT columns.
Release note
MOW unique tables can use JSON flexible partial update loads to patch object fields inside VARIANT columns.
Check List (For Author)
build-support/clang-format.shgit diff --check 058d97897f611b0bf68965d073a5888aafba9805...HEAD./build.sh --be --fe./run-be-ut.sh --run --filter='VariantUtilTest.*VariantPatch*'./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d unique_with_mow_p0/flexible -s test_flexible_partial_update_variant./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d unique_with_mow_p0/flexible/publish -s test_flexible_partial_update_variant_publish_conflict./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d unique_with_mow_p0/flexible -s test_flexible_partial_update_restricts./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d load_p0/routine_load -s test_routine_load_flexible_partial_updatebuild-support/run-clang-tidy.sh --base 058d97897f611b0bf68965d073a5888aafba9805 --build-dir be/build_Release(blocked locally by existing/toolchain include issues:stddef.hnot found,types.hunmatchedNOLINTEND, and BE test include resolution under the Release compile database)