From b7402399a565dc0a3e621ecb4d61cfd48a4eb2fc Mon Sep 17 00:00:00 2001
From: lihangyu
Date: Wed, 13 May 2026 03:23:15 +0800
Subject: [PATCH] [feature](iceberg) Support reading Iceberg variant from Parquet

### What problem does this PR solve?

Issue Number: N/A

Related PR: #63192

Problem Summary:

Doris could not read Iceberg v3 VARIANT columns from Parquet files. This change maps Iceberg VARIANT to Doris VARIANT, validates the Parquet VariantShredding wrapper shape, decodes metadata/value residual data, reads shredded typed_value columns, and prunes shredded Parquet leaves for accessed variant paths. The VARIANT reader and planner changes stay scoped to the Iceberg/Parquet VARIANT path instead of coupling generic nested-column code to Iceberg-only behavior.

Typed-only shredded projections stay on native Parquet typed columns when residual value columns are not selected, with counter coverage to catch row-wise performance regressions. Selected residual or complex layouts still fall back to row-wise reconstruction.

This also preserves VARIANT subpaths through casts, validates the actual Iceberg data-file format for VARIANT reads, rejects duplicate VariantShredding structural children, preserves null temporal typed leaves without reading their physical value, and keeps delete-only Iceberg MERGE projections from reading unused visible target data columns.

### Release note

Support reading Iceberg v3 VARIANT Parquet columns, including shredded typed_value column pruning and binary/UUID/primitive residual VARIANT values. Writing Iceberg VARIANT columns is rejected with an explicit unsupported error.
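The VariantShredding wrapper-shape validation described above can be sketched roughly as follows. This is a hypothetical standalone check, not the actual Doris `schema_desc.cpp` code: the real validator inspects physical types and logical annotations as well, while this sketch only models the structural-child rules (a root variant group must carry `metadata`, may carry an unannotated residual `value` and/or a `typed_value`, and duplicate structural children are rejected).

```cpp
// Hypothetical sketch of the VariantShredding wrapper check, NOT the actual
// Doris implementation. A variant root group holds a binary "metadata" field
// plus an optional unannotated binary "value" (the residual) and/or an
// optional "typed_value" child; any other structural child, a duplicate
// child, or a missing "metadata" makes the wrapper invalid.
#include <cassert>
#include <set>
#include <string>
#include <vector>

bool is_valid_variant_group(const std::vector<std::string>& children) {
    std::set<std::string> seen;
    bool has_metadata = false;
    for (const auto& name : children) {
        if (name != "metadata" && name != "value" && name != "typed_value") {
            return false; // unknown structural child
        }
        if (!seen.insert(name).second) {
            return false; // duplicate structural children are rejected
        }
        has_metadata |= (name == "metadata");
    }
    return has_metadata; // the root variant group must carry metadata
}
```

Nested shredded fields inside `typed_value` follow a looser rule in the spec (no `metadata` of their own), which this root-level sketch does not model.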
### Check List (For Author)

- Test: Regression test / Unit Test / Manual test
- Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.DirectTypedOnlyReaderCountersUseNativePath:ParquetVariantReaderTest.VariantReaderCountersUseRowWiseWhenResidualValueSelected:ParquetVariantReaderTest.RowWisePreservesExplicitVariantNullShreddedArrayElement:ParquetVariantReaderTest.RowWiseRejectsMissingShreddedArrayElement' (4 tests passed)
- Unit Test: ./run-be-ut.sh --run -f 'ParquetVariantReaderTest.RowWisePreservesNullComplexTypedArrayElement:ParquetVariantReaderTest.RowWiseRejectsMissingShreddedArrayElement' (2 tests passed)
- Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*' (85 tests passed on rerun; the first attempt failed before tests in the OpenBLAS CMake getarch bootstrap)
- Unit Test: ./run-be-ut.sh --run --filter='ParquetVariantReaderTest.*:NestedColumnAccessHelperTest.*' (127 tests passed)
- Unit Test: ./run-be-ut.sh --run --filter='IcebergReaderCreateColumnIdsTest.*' (9 tests passed)
- Unit Test: ./run-be-ut.sh --run --filter=ParquetVariantReaderTest.RejectVariantSchemaWithDuplicateStructuralChild:ParquetVariantReaderTest.DirectTypedOnlyPreservesTemporalLeafNull (2 tests passed; rerun after clang-format also passed)
- Unit Test: ./run-be-ut.sh --run --filter=ParquetVariantReaderTest.DirectTypedOnlyReaderCountersUseNativePath (1 test passed after latest changes)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testVariantComparisonPredicateCollectsWholeVariantOperand (1 test passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest#testVariantCastProjectionKeepsSubPathWithSiblingPredicate (1 test passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.PruneNestedColumnTest (70 tests passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest#testExplodeSubqueryJoinAggAccessPaths (1 test passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.source.IcebergScanNodeTest#testValidateVariantDataFileFormatRejectsOrcSplit (1 test passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.source.IcebergScanNodeTest (6 tests passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.trees.plans.commands.IcebergMergeCommandTest#testDeleteProjectionDoesNotReadVisibleTargetColumns (1 test passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest (11 tests passed; Maven reactor succeeded)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.datasource.iceberg.IcebergUtilsTest (passed)
- Unit Test: ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.SlotTypeReplacerTest (5 tests passed)
- Regression test: performance regression coverage is included in regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy, including profile assertions that typed-only projections increment VariantDirectTypedValueReadRows and keep VariantRowWiseReadRows at 0. Not run locally in this worktree because no local Doris cluster/output BE+FE runtime is available.
- Regression test: added regression-test/suites/external_table_p0/iceberg/test_iceberg_variant_table_path.groovy to exercise the Iceberg REST catalog table path with nested VARIANT access and profile read-column assertions. Not run locally because Docker access to spark-iceberg is unavailable in this worktree.
- Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/clang-format.sh
- Manual test: PATH=/mnt/disk6/common/ldb_toolchain_toucan/bin:$PATH build-support/check-format.sh
- Manual test: git diff --check
- Manual test: cd fe && mvn -pl fe-core checkstyle:check -DskipTests
- Static analysis: CLANG_TIDY_BINARY=/tmp/clang-tidy-resource-filter build-support/run-clang-tidy.sh --build-dir be/ut_build_ASAN (passed for changed lines after adding the clang-tidy resource-dir and filtering a pre-existing be/src/core/types.h clang-tidy-nolint diagnostic; the unwrapped script was blocked by that existing header diagnostic)
- Behavior changed: Yes.
  - Doris can read Iceberg v3 VARIANT Parquet columns; typed-only shredded projections are pruned onto native typed columns, while selected residual or complex layouts are reconstructed row-wise.
  - Malformed VariantShredding schemas and missing present shredded array payloads are rejected.
  - Null complex/temporal typed values and explicit Variant null array elements are preserved.
  - Whole-VARIANT scalar/comparison consumers force root access, while literal subpath pruning is preserved for typed reads.
  - Iceberg VARIANT reads from non-Parquet data files are recursively rejected during scan planning.
  - Delete-only Iceberg MERGE no longer reads unused visible target data columns.
  - Iceberg VARIANT data-file writes are rejected explicitly.
- Does this need documentation: No

---
 .../format/parquet/delta_bit_pack_decoder.h   |    4 +-
 .../format/parquet/parquet_column_convert.cpp |   76 +
 .../parquet/parquet_nested_column_utils.cpp   |  533 +++
 .../parquet/parquet_nested_column_utils.h     |   40 +
 .../format/parquet/parquet_variant_reader.cpp | 1161 +++++++
 .../format/parquet/parquet_variant_reader.h   |   38 +
 be/src/format/parquet/schema_desc.cpp         |  127 +-
 be/src/format/parquet/schema_desc.h           |   25 +
 .../parquet/vparquet_column_chunk_reader.cpp  |    8 +-
 .../parquet/vparquet_column_chunk_reader.h    |    3 +-
 .../format/parquet/vparquet_column_reader.cpp | 2078 +++++++++++-
 .../format/parquet/vparquet_column_reader.h   |  121 +-
 be/src/format/parquet/vparquet_reader.cpp     |  244 +-
 be/src/format/parquet/vparquet_reader.h       |   15 +-
 .../hive/hive_parquet_nested_column_utils.cpp |  144 +-
 .../hive/hive_parquet_nested_column_utils.h   |    5 +-
 be/src/format/table/hive_reader.cpp           |   54 +-
 be/src/format/table/hive_reader.h             |    4 +-
 .../table/iceberg/arrow_schema_util.cpp       |    3 +
 .../iceberg_parquet_nested_column_utils.cpp   |  146 +-
 .../iceberg_parquet_nested_column_utils.h     |    6 +-
 be/src/format/table/iceberg/types.cpp         |    2 +
 be/src/format/table/iceberg/types.h           |   10 +
 be/src/format/table/iceberg_reader.cpp        |    8 +-
 .../parquet/delta_byte_array_decoder_test.cpp |   90 +-
 be/test/format/parquet/parquet_expr_test.cpp  |   62 +
 .../parquet/parquet_variant_reader_test.cpp   | 2994 +++++++++++++++++
 .../hive_reader_create_column_ids_test.cpp    |   21 +-
 .../iceberg_reader_create_column_ids_test.cpp |  155 +-
 .../nested_column_access_helper_test.cpp      | 1113 ++++++
 .../connector/iceberg/IcebergTypeMapping.java |    4 +
 .../datasource/iceberg/IcebergUtils.java      |   50 +
 .../iceberg/source/IcebergScanNode.java       |   59 +
 .../translator/PhysicalPlanTranslator.java    |    3 +-
 ...rgMergeSinkToPhysicalIcebergMergeSink.java |    1 +
 .../AccessPathExpressionCollector.java        |  213 +-
 .../rewrite/AccessPathPlanCollector.java      |  108 +-
 .../rules/rewrite/NestedColumnPruning.java    |   32 +
 .../rules/rewrite/SlotTypeReplacer.java       |   19 +-
 .../plans/commands/IcebergMergeCommand.java   |   26 +-
 .../insert/IcebergInsertExecutor.java         |   14 +-
 .../logical/LogicalIcebergMergeSink.java      |   35 +-
 .../physical/PhysicalIcebergMergeSink.java    |   35 +-
 .../doris/planner/IcebergMergeSink.java       |   10 +
 .../doris/planner/IcebergTableSink.java       |    1 +
 .../datasource/iceberg/IcebergUtilsTest.java  |    6 +
 .../iceberg/source/IcebergScanNodeTest.java   |   84 +
 .../rules/rewrite/PruneNestedColumnTest.java  |  325 ++
 .../rules/rewrite/SlotTypeReplacerTest.java   |  210 ++
 .../rewrite/VariantPruningLogicTest.java      |   53 +-
 .../commands/IcebergMergeCommandTest.java     |   60 +
 .../doris/planner/IcebergMergeSinkTest.java   |   33 +
 .../doris/planner/IcebergTableSinkTest.java   |   89 +
 .../tvf/iceberg_variant_binary_typed.parquet  |  Bin 0 -> 743 bytes
 .../iceberg_variant_binary_unshredded.parquet |  Bin 0 -> 764 bytes
 .../tvf/iceberg_variant_shredded.parquet      |  Bin 0 -> 1865 bytes
 .../iceberg_variant_temporal_typed.parquet    |  Bin 0 -> 1348 bytes
 ...ceberg_variant_temporal_unshredded.parquet |  Bin 0 -> 917 bytes
 .../tvf/iceberg_variant_typed_only.parquet    |  Bin 0 -> 1724 bytes
 .../tvf/iceberg_variant_unshredded.parquet    |  Bin 0 -> 1561 bytes
 .../tvf/test_local_tvf_iceberg_variant.out    |   51 +
 .../test_iceberg_variant_table_path.groovy    |  137 +
 .../tvf/test_local_tvf_iceberg_variant.groovy |  448 +++
 63 files changed, 10874 insertions(+), 522 deletions(-)
 create mode 100644 be/src/format/parquet/parquet_nested_column_utils.cpp
 create mode 100644 be/src/format/parquet/parquet_nested_column_utils.h
 create mode 100644 be/src/format/parquet/parquet_variant_reader.cpp
 create mode 100644 be/src/format/parquet/parquet_variant_reader.h
 create mode 100644 be/test/format/parquet/parquet_variant_reader_test.cpp
 create mode 100644 be/test/format/table/nested_column_access_helper_test.cpp
 create mode 100644 fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacerTest.java
 create mode 100644
fe/fe-core/src/test/java/org/apache/doris/planner/IcebergTableSinkTest.java create mode 100644 regression-test/data/external_table_p0/tvf/iceberg_variant_binary_typed.parquet create mode 100644 regression-test/data/external_table_p0/tvf/iceberg_variant_binary_unshredded.parquet create mode 100644 regression-test/data/external_table_p0/tvf/iceberg_variant_shredded.parquet create mode 100644 regression-test/data/external_table_p0/tvf/iceberg_variant_temporal_typed.parquet create mode 100644 regression-test/data/external_table_p0/tvf/iceberg_variant_temporal_unshredded.parquet create mode 100644 regression-test/data/external_table_p0/tvf/iceberg_variant_typed_only.parquet create mode 100644 regression-test/data/external_table_p0/tvf/iceberg_variant_unshredded.parquet create mode 100644 regression-test/data/external_table_p0/tvf/test_local_tvf_iceberg_variant.out create mode 100644 regression-test/suites/external_table_p0/iceberg/test_iceberg_variant_table_path.groovy create mode 100644 regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy diff --git a/be/src/format/parquet/delta_bit_pack_decoder.h b/be/src/format/parquet/delta_bit_pack_decoder.h index 52d45ea2297b33..d547909fafd7dc 100644 --- a/be/src/format/parquet/delta_bit_pack_decoder.h +++ b/be/src/format/parquet/delta_bit_pack_decoder.h @@ -31,6 +31,7 @@ #include "common/status.h" #include "core/data_type/data_type.h" +#include "core/data_type/data_type_nullable.h" #include "format/parquet/decoder.h" #include "format/parquet/fix_length_plain_decoder.h" #include "format/parquet/parquet_common.h" @@ -329,7 +330,8 @@ class DeltaByteArrayDecoder : public DeltaDecoder { RETURN_IF_ERROR(_get_internal(_values.data(), cast_set(num_values - null_count), &num_valid_values)); DCHECK_EQ(num_values - null_count, num_valid_values); - if (doris_column->is_column_string()) { + if (doris_column->is_column_string() || + remove_nullable(data_type)->get_primitive_type() == TYPE_VARBINARY) { return 
decode_byte_array(_values, doris_column, data_type, select_vector); } else { return decode_fixed_byte_array(_values, doris_column, data_type, diff --git a/be/src/format/parquet/parquet_column_convert.cpp b/be/src/format/parquet/parquet_column_convert.cpp index 940e95bd973306..981bd5b461acb0 100644 --- a/be/src/format/parquet/parquet_column_convert.cpp +++ b/be/src/format/parquet/parquet_column_convert.cpp @@ -29,6 +29,68 @@ namespace doris::parquet { const cctz::time_zone ConvertParams::utc0 = cctz::utc_time_zone(); +namespace { + +struct TimeToMicroScale { + int64_t numerator; + int64_t denominator; +}; + +TimeToMicroScale time_unit_to_micro_scale(const tparquet::TimeUnit& time_unit) { + if (time_unit.__isset.MILLIS) { + return {1000, 1}; + } + if (time_unit.__isset.MICROS) { + return {1, 1}; + } + DCHECK(time_unit.__isset.NANOS); + return {1, 1000}; +} + +TimeToMicroScale parquet_time_to_micro_scale(const tparquet::SchemaElement& schema) { + if (schema.__isset.logicalType && schema.logicalType.__isset.TIME) { + return time_unit_to_micro_scale(schema.logicalType.TIME.unit); + } + DCHECK(schema.__isset.converted_type); + if (schema.converted_type == tparquet::ConvertedType::TIME_MILLIS) { + return {1000, 1}; + } + DCHECK(schema.converted_type == tparquet::ConvertedType::TIME_MICROS); + return {1, 1}; +} + +template +class VariantIntToTimeV2 final : public PhysicalToLogicalConverter { +public: + explicit VariantIntToTimeV2(TimeToMicroScale scale) : _scale(scale) {} + + Status physical_convert(ColumnPtr& src_physical_col, ColumnPtr& src_logical_column) override { + using SrcColumnType = typename PrimitiveTypeTraits::ColumnType; + using TimeType = typename PrimitiveTypeTraits::CppType; + + ColumnPtr src_col = remove_nullable(src_physical_col); + MutableColumnPtr dst_col = remove_nullable(src_logical_column)->assume_mutable(); + + size_t rows = src_col->size(); + size_t start_idx = dst_col->size(); + dst_col->resize(start_idx + rows); + + const auto& src_data = 
static_cast(src_col.get())->get_data(); + auto& data = static_cast(dst_col.get())->get_data(); + + for (int i = 0; i < rows; i++) { + data[start_idx + i] = + static_cast(src_data[i] * _scale.numerator / _scale.denominator); + } + return Status::OK(); + } + +private: + TimeToMicroScale _scale; +}; + +} // namespace + #define FOR_LOGICAL_DECIMAL_TYPES(M) \ M(TYPE_DECIMAL32) \ M(TYPE_DECIMAL64) \ @@ -246,6 +308,20 @@ std::unique_ptr PhysicalToLogicalConverter::get_conv convert_params.get(), physical_converter); } else if (src_logical_primitive == TYPE_DATEV2) { physical_converter = std::make_unique(); + } else if (src_logical_primitive == TYPE_TIMEV2) { + if (!field_schema->is_in_variant) { + physical_converter = + std::make_unique(src_physical_type, src_logical_type); + } else if (src_physical_type == tparquet::Type::INT32) { + physical_converter = std::make_unique>( + parquet_time_to_micro_scale(parquet_schema)); + } else if (src_physical_type == tparquet::Type::INT64) { + physical_converter = std::make_unique>( + parquet_time_to_micro_scale(parquet_schema)); + } else { + physical_converter = + std::make_unique(src_physical_type, src_logical_type); + } } else if (src_logical_primitive == TYPE_DATETIMEV2) { if (src_physical_type == tparquet::Type::INT96) { // int96 only stores nanoseconds in standard parquet file diff --git a/be/src/format/parquet/parquet_nested_column_utils.cpp b/be/src/format/parquet/parquet_nested_column_utils.cpp new file mode 100644 index 00000000000000..d43767da4bb1ef --- /dev/null +++ b/be/src/format/parquet/parquet_nested_column_utils.cpp @@ -0,0 +1,533 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "format/parquet/parquet_nested_column_utils.h" + +#include +#include +#include +#include +#include + +#include "core/data_type/data_type_nullable.h" +#include "format/parquet/schema_desc.h" + +namespace doris { +namespace { + +enum class NestedPathMode { + NAME, + FIELD_ID, +}; + +void add_column_id_range(const FieldSchema& field_schema, std::set& column_ids) { + const uint64_t start_id = field_schema.get_column_id(); + const uint64_t max_column_id = field_schema.get_max_column_id(); + for (uint64_t id = start_id; id <= max_column_id; ++id) { + column_ids.insert(id); + } +} + +const FieldSchema* find_child_by_structural_name(const FieldSchema& field_schema, + std::string_view name) { + std::string lower_name(name); + std::transform(lower_name.begin(), lower_name.end(), lower_name.begin(), + [](unsigned char c) { return static_cast(std::tolower(c)); }); + for (const auto& child : field_schema.children) { + if (child.name == name || child.lower_case_name == lower_name) { + return &child; + } + } + return nullptr; +} + +const FieldSchema* find_child_by_exact_name(const FieldSchema& field_schema, + std::string_view name) { + for (const auto& child : field_schema.children) { + if (child.name == name) { + return &child; + } + } + return nullptr; +} + +const FieldSchema* find_variant_typed_child_by_key(const FieldSchema& field_schema, + std::string_view key) { + return find_child_by_exact_name(field_schema, key); +} + +void add_variant_metadata(const FieldSchema& variant_field, std::set& column_ids) { + if (const auto* metadata = 
find_child_by_structural_name(variant_field, "metadata")) { + add_column_id_range(*metadata, column_ids); + } +} + +bool is_unannotated_variant_value_field(const FieldSchema& field) { + // VARIANT residual value is raw binary; annotated strings named value are user fields. + return field.lower_case_name == "value" && field.physical_type == tparquet::Type::BYTE_ARRAY && + !field.parquet_schema.__isset.logicalType && + !field.parquet_schema.__isset.converted_type; +} + +const FieldSchema* find_variant_value_field(const FieldSchema& field_schema) { + for (const auto& child : field_schema.children) { + if (is_unannotated_variant_value_field(child)) { + return &child; + } + } + return nullptr; +} + +void add_variant_value(const FieldSchema& variant_field, std::set& column_ids) { + add_variant_metadata(variant_field, column_ids); + if (const auto* value = find_variant_value_field(variant_field)) { + add_column_id_range(*value, column_ids); + } +} + +struct VariantColumnIdExtractionResult { + bool has_child_columns = false; + bool needs_metadata = false; +}; + +using VariantPathMap = std::unordered_map>>; + +bool is_shredded_variant_field(const FieldSchema& field_schema) { + bool has_value = false; + const FieldSchema* typed_value = nullptr; + for (const auto& child : field_schema.children) { + if (child.lower_case_name == "value") { + if (!is_unannotated_variant_value_field(child)) { + return false; + } + has_value = true; + continue; + } + if (child.lower_case_name == "typed_value") { + typed_value = &child; + continue; + } + return false; + } + if (has_value) { + return true; + } + if (typed_value == nullptr) { + return false; + } + const auto type = remove_nullable(typed_value->data_type); + return type->get_primitive_type() == TYPE_STRUCT || type->get_primitive_type() == TYPE_ARRAY; +} + +bool add_shredded_variant_field_value(const FieldSchema& shredded_field, + std::set& column_ids) { + if (const auto* value = find_variant_value_field(shredded_field)) { + 
add_column_id_range(*value, column_ids); + return true; + } + return false; +} + +bool is_variant_array_subscript(std::string_view path) { + return !path.empty() && + std::all_of(path.begin(), path.end(), [](unsigned char c) { return std::isdigit(c); }); +} + +bool is_terminal_variant_meta_component(std::string_view path) { + return path == "NULL" || path == "OFFSET"; +} + +const std::vector& effective_variant_path(const std::vector& raw_path, + std::vector& stripped_path) { + if (!raw_path.empty() && is_terminal_variant_meta_component(raw_path.back())) { + stripped_path.assign(raw_path.begin(), raw_path.end() - 1); + return stripped_path; + } + return raw_path; +} + +bool contains_inherited_metadata_value(const FieldSchema& field_schema) { + if (is_shredded_variant_field(field_schema) && + find_variant_value_field(field_schema) != nullptr) { + return true; + } + return std::any_of( + field_schema.children.begin(), field_schema.children.end(), + [](const FieldSchema& child) { return contains_inherited_metadata_value(child); }); +} + +VariantColumnIdExtractionResult extract_variant_typed_nested_column_ids( + const FieldSchema& field_schema, const std::vector>& paths, + std::set& column_ids, NestedPathMode mode); + +VariantColumnIdExtractionResult extract_typed_value_path(const FieldSchema& typed_value, + const std::vector& path, + std::set& column_ids, + NestedPathMode mode) { + VariantColumnIdExtractionResult result; + const auto typed_value_type = remove_nullable(typed_value.data_type); + if (typed_value_type->get_primitive_type() != TYPE_STRUCT) { + result = extract_variant_typed_nested_column_ids(typed_value, {path}, column_ids, mode); + } else if (const auto* typed_child = find_variant_typed_child_by_key(typed_value, path[0])) { + if (path.size() == 1) { + add_column_id_range(*typed_child, column_ids); + result.has_child_columns = true; + result.needs_metadata = contains_inherited_metadata_value(*typed_child); + } else { + std::vector> child_paths { + 
std::vector(path.begin() + 1, path.end())}; + result = extract_variant_typed_nested_column_ids(*typed_child, child_paths, column_ids, + mode); + } + } + + if (result.has_child_columns) { + column_ids.insert(typed_value.get_column_id()); + } + return result; +} + +void add_variant_typed_path(PrimitiveType field_type, const FieldSchema& field_schema, + const std::vector& path, + VariantColumnIdExtractionResult* result, std::set& column_ids, + VariantPathMap* child_paths) { + if (path.empty()) { + add_column_id_range(field_schema, column_ids); + result->has_child_columns = true; + result->needs_metadata |= contains_inherited_metadata_value(field_schema); + return; + } + + const bool is_list = field_type == PrimitiveType::TYPE_ARRAY; + const bool is_map = field_type == PrimitiveType::TYPE_MAP; + std::vector remaining; + std::string child_key; + if (is_list) { + child_key = "*"; + if (!is_variant_array_subscript(path[0])) { + remaining.assign(path.begin(), path.end()); + } else if (path.size() > 1) { + remaining.assign(path.begin() + 1, path.end()); + } + } else if (is_map) { + (*child_paths)["KEYS"].emplace_back(); + child_key = "VALUES"; + if (path.size() > 1) { + remaining.assign(path.begin() + 1, path.end()); + } + } else { + child_key = path[0]; + if (path.size() > 1) { + remaining.assign(path.begin() + 1, path.end()); + } + } + (*child_paths)[child_key].push_back(std::move(remaining)); +} + +std::string variant_typed_child_key(PrimitiveType field_type, const FieldSchema& field_schema, + uint64_t child_index) { + if (field_type == PrimitiveType::TYPE_ARRAY) { + return "*"; + } + if (field_type == PrimitiveType::TYPE_MAP) { + if (child_index == 0) { + return "KEYS"; + } + return child_index == 1 ? 
"VALUES" : ""; + } + return field_schema.children[child_index].name; +} + +void append_variant_child_paths(const VariantPathMap& paths_by_name, const std::string& key, + std::vector>& child_paths) { + auto child_paths_it = paths_by_name.find(key); + if (child_paths_it != paths_by_name.end()) { + child_paths.insert(child_paths.end(), child_paths_it->second.begin(), + child_paths_it->second.end()); + } +} + +std::vector> collect_variant_typed_child_paths( + const VariantPathMap& paths_by_name, const std::string& child_key) { + std::vector> child_paths; + append_variant_child_paths(paths_by_name, child_key, child_paths); + return child_paths; +} + +void extract_variant_typed_child_column_ids( + const FieldSchema& child, const std::vector>& child_paths, + std::set& column_ids, NestedPathMode mode, + VariantColumnIdExtractionResult* result) { + const bool needs_full_child = + std::any_of(child_paths.begin(), child_paths.end(), + [](const std::vector& path) { return path.empty(); }); + if (needs_full_child) { + add_column_id_range(child, column_ids); + result->has_child_columns = true; + result->needs_metadata |= contains_inherited_metadata_value(child); + return; + } + + auto child_result = + extract_variant_typed_nested_column_ids(child, child_paths, column_ids, mode); + result->has_child_columns |= child_result.has_child_columns; + result->needs_metadata |= child_result.needs_metadata; +} + +VariantColumnIdExtractionResult extract_shredded_variant_field_ids( + const FieldSchema& shredded_field, const std::vector>& paths, + std::set& column_ids, NestedPathMode mode) { + const auto* typed_value = find_child_by_structural_name(shredded_field, "typed_value"); + VariantColumnIdExtractionResult result; + + for (const auto& raw_path : paths) { + std::vector stripped_path; + const auto& path = effective_variant_path(raw_path, stripped_path); + if (path.empty()) { + add_column_id_range(shredded_field, column_ids); + result.has_child_columns = true; + result.needs_metadata |= 
contains_inherited_metadata_value(shredded_field); + continue; + } + + VariantColumnIdExtractionResult typed_result; + if (typed_value != nullptr) { + typed_result = extract_typed_value_path(*typed_value, path, column_ids, mode); + result.needs_metadata |= typed_result.needs_metadata; + } + const bool has_residual_value = + add_shredded_variant_field_value(shredded_field, column_ids); + if (has_residual_value) { + result.needs_metadata = true; + } + if (!typed_result.has_child_columns) { + result.has_child_columns |= has_residual_value; + continue; + } + result.has_child_columns = true; + } + + if (result.has_child_columns) { + column_ids.insert(shredded_field.get_column_id()); + } + return result; +} + +VariantColumnIdExtractionResult extract_variant_nested_column_ids( + const FieldSchema& variant_field, const std::vector>& paths, + std::set& column_ids, NestedPathMode mode) { + const auto* typed_value = find_child_by_structural_name(variant_field, "typed_value"); + VariantColumnIdExtractionResult result; + + for (const auto& raw_path : paths) { + std::vector stripped_path; + const auto& path = effective_variant_path(raw_path, stripped_path); + if (path.empty()) { + add_column_id_range(variant_field, column_ids); + result.has_child_columns = true; + continue; + } + + VariantColumnIdExtractionResult typed_result; + if (typed_value != nullptr) { + typed_result = extract_typed_value_path(*typed_value, path, column_ids, mode); + if (typed_result.needs_metadata) { + add_variant_metadata(variant_field, column_ids); + } + } + + if (!typed_result.has_child_columns) { + add_variant_value(variant_field, column_ids); + } + result.has_child_columns = true; + } + + if (result.has_child_columns) { + column_ids.insert(variant_field.get_column_id()); + } + return result; +} + +VariantColumnIdExtractionResult extract_variant_typed_nested_column_ids( + const FieldSchema& field_schema, const std::vector>& paths, + std::set& column_ids, NestedPathMode mode) { + if 
(remove_nullable(field_schema.data_type)->get_primitive_type() == + PrimitiveType::TYPE_VARIANT) { + return extract_variant_nested_column_ids(field_schema, paths, column_ids, mode); + } + if (is_shredded_variant_field(field_schema)) { + return extract_shredded_variant_field_ids(field_schema, paths, column_ids, mode); + } + + VariantColumnIdExtractionResult result; + VariantPathMap child_paths_by_name; + const auto field_type = remove_nullable(field_schema.data_type)->get_primitive_type(); + for (const auto& path : paths) { + add_variant_typed_path(field_type, field_schema, path, &result, column_ids, + &child_paths_by_name); + } + + for (uint64_t i = 0; i < field_schema.children.size(); ++i) { + const auto& child = field_schema.children[i]; + const std::string child_key = variant_typed_child_key(field_type, field_schema, i); + auto child_paths = collect_variant_typed_child_paths(child_paths_by_name, child_key); + if (child_paths.empty()) { + continue; + } + extract_variant_typed_child_column_ids(child, child_paths, column_ids, mode, &result); + } + + if (result.has_child_columns) { + column_ids.insert(field_schema.get_column_id()); + } + return result; +} + +void normalize_map_wildcard( + std::unordered_map>>& child_paths) { + auto wildcard_it = child_paths.find("*"); + if (wildcard_it == child_paths.end()) { + return; + } + + auto wildcard_paths = std::move(wildcard_it->second); + child_paths.erase(wildcard_it); + auto& values_paths = child_paths["VALUES"]; + values_paths.insert(values_paths.end(), wildcard_paths.begin(), wildcard_paths.end()); + child_paths["KEYS"].emplace_back(); +} + +std::string get_nested_child_key(const FieldSchema& field_schema, uint64_t child_index, + NestedPathMode mode) { + const auto field_type = remove_nullable(field_schema.data_type)->get_primitive_type(); + if (field_type == PrimitiveType::TYPE_ARRAY) { + return "*"; + } + if (field_type == PrimitiveType::TYPE_MAP) { + if (child_index == 0) { + return "KEYS"; + } + return child_index 
== 1 ? "VALUES" : ""; + } + + const auto& child = field_schema.children[child_index]; + if (mode == NestedPathMode::NAME) { + return child.lower_case_name; + } + return std::to_string(child.field_id); +} + +bool should_skip_nested_child_key(std::string_view child_key, NestedPathMode mode) { + return child_key.empty() || (mode == NestedPathMode::FIELD_ID && child_key == "-1"); +} + +void extract_nested_column_ids_impl(const FieldSchema& field_schema, + const std::vector>& paths, + std::set& column_ids, NestedPathMode mode) { + const auto field_type = remove_nullable(field_schema.data_type)->get_primitive_type(); + if (field_type == PrimitiveType::TYPE_VARIANT) { + static_cast(extract_variant_nested_column_ids(field_schema, paths, column_ids, mode)); + return; + } + + std::unordered_map>> child_paths_by_key; + for (const auto& path : paths) { + if (path.empty()) { + continue; + } + std::vector remaining; + if (path.size() > 1) { + remaining.assign(path.begin() + 1, path.end()); + } + child_paths_by_key[path[0]].push_back(std::move(remaining)); + } + + if (field_type == PrimitiveType::TYPE_MAP) { + normalize_map_wildcard(child_paths_by_key); + } + + bool has_child_columns = false; + if (field_type == PrimitiveType::TYPE_ARRAY && + child_paths_by_key.find("OFFSET") != child_paths_by_key.end()) { + has_child_columns = true; + } + for (uint64_t i = 0; i < field_schema.children.size(); ++i) { + const auto& child = field_schema.children[i]; + const std::string child_key = get_nested_child_key(field_schema, i, mode); + if (should_skip_nested_child_key(child_key, mode)) { + continue; + } + + if (field_type == PrimitiveType::TYPE_MAP && i == 0) { + const bool has_keys_access = + child_paths_by_key.find("KEYS") != child_paths_by_key.end(); + const bool has_values_access = + child_paths_by_key.find("VALUES") != child_paths_by_key.end(); + const bool has_offset_access = + child_paths_by_key.find("OFFSET") != child_paths_by_key.end(); + const bool has_null_access = + 
child_paths_by_key.find("NULL") != child_paths_by_key.end(); + if (!has_keys_access && (has_values_access || has_offset_access || has_null_access)) { + add_column_id_range(child, column_ids); + has_child_columns = true; + continue; + } + } + + auto child_paths_it = child_paths_by_key.find(child_key); + if (child_paths_it == child_paths_by_key.end()) { + continue; + } + + const auto& child_paths = child_paths_it->second; + const bool needs_full_child = + std::any_of(child_paths.begin(), child_paths.end(), + [](const std::vector<std::string>& path) { return path.empty(); }); + + if (needs_full_child) { + add_column_id_range(child, column_ids); + has_child_columns = true; + continue; + } + + const size_t before_size = column_ids.size(); + extract_nested_column_ids_impl(child, child_paths, column_ids, mode); + if (column_ids.size() > before_size) { + has_child_columns = true; + } + } + + if (has_child_columns) { + column_ids.insert(field_schema.get_column_id()); + } +} + +} // namespace + +void ParquetNestedColumnUtils::extract_nested_column_ids_by_name( + const FieldSchema& field_schema, const std::vector<std::vector<std::string>>& paths, + std::set<int32_t>& column_ids) { + extract_nested_column_ids_impl(field_schema, paths, column_ids, NestedPathMode::NAME); +} + +void ParquetNestedColumnUtils::extract_nested_column_ids_by_field_id( + const FieldSchema& field_schema, const std::vector<std::vector<std::string>>& paths, + std::set<int32_t>& column_ids) { + extract_nested_column_ids_impl(field_schema, paths, column_ids, NestedPathMode::FIELD_ID); +} + +} // namespace doris diff --git a/be/src/format/parquet/parquet_nested_column_utils.h b/be/src/format/parquet/parquet_nested_column_utils.h new file mode 100644 index 00000000000000..181fba58faee7b --- /dev/null +++ b/be/src/format/parquet/parquet_nested_column_utils.h @@ -0,0 +1,40 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.
See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include <cstdint> +#include <set> +#include <string> +#include <vector> + +namespace doris { + +struct FieldSchema; + +class ParquetNestedColumnUtils { +public: + static void extract_nested_column_ids_by_name( + const FieldSchema& field_schema, const std::vector<std::vector<std::string>>& paths, + std::set<int32_t>& column_ids); + + static void extract_nested_column_ids_by_field_id( + const FieldSchema& field_schema, const std::vector<std::vector<std::string>>& paths, + std::set<int32_t>& column_ids); +}; + +} // namespace doris diff --git a/be/src/format/parquet/parquet_variant_reader.cpp b/be/src/format/parquet/parquet_variant_reader.cpp new file mode 100644 index 00000000000000..8d63065a13a92e --- /dev/null +++ b/be/src/format/parquet/parquet_variant_reader.cpp @@ -0,0 +1,1161 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License.
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "format/parquet/parquet_variant_reader.h" + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "core/column/column_variant.h" +#include "core/data_type/data_type_decimal.h" +#include "core/value/jsonb_value.h" +#include "exec/common/variant_util.h" + +namespace doris::parquet { + +std::string format_variant_uuid(const uint8_t* ptr) { + static constexpr char hex[] = "0123456789abcdef"; + std::string uuid; + uuid.reserve(36); + for (int i = 0; i < 16; ++i) { + if (i == 4 || i == 6 || i == 8 || i == 10) { + uuid.push_back('-'); + } + uuid.push_back(hex[ptr[i] >> 4]); + uuid.push_back(hex[ptr[i] & 0x0f]); + } + return uuid; +} + +namespace { + +struct VariantMetadata { + std::vector<std::string> dictionary; +}; + +struct VariantObjectLayout { + std::vector<uint64_t> field_ids; + std::vector<uint64_t> field_offsets; + std::vector<uint64_t> field_ends; + const uint8_t* fields = nullptr; + uint64_t total_size = 0; +}; + +struct VariantArrayLayout { + std::vector<uint64_t> field_offsets; + const uint8_t* fields = nullptr; + uint64_t total_size = 0; +}; + +uint64_t read_unsigned_le(const uint8_t* ptr, int size) { + uint64_t value = 0; + for (int i = 0; i < size; ++i) { + value |= static_cast<uint64_t>(ptr[i]) << (i * 8); + } + return value; +} + +int64_t read_signed_le(const uint8_t* ptr, int size) { + uint64_t value = read_unsigned_le(ptr, size); + if (size < 8) { + uint64_t sign_bit = uint64_t {1} << (size * 8 - 1); + if ((value & sign_bit) != 0) { + uint64_t mask = ~((uint64_t {1} << (size * 8)) - 1); + value |= mask; + } + } + return static_cast<int64_t>(value); +} +
+__int128 read_signed_int128_le(const uint8_t* ptr) { + unsigned __int128 unsigned_value = 0; + for (int i = 15; i >= 0; --i) { + unsigned_value <<= 8; + unsigned_value |= ptr[i]; + } + static constexpr unsigned __int128 sign_bit = static_cast<unsigned __int128>(1) << 127; + if ((unsigned_value & sign_bit) == 0) { + return static_cast<__int128>(unsigned_value); + } + static constexpr __int128 signed_half_range = static_cast<__int128>(1) << 126; + return (static_cast<__int128>(unsigned_value & (sign_bit - 1)) - signed_half_range) - + signed_half_range; +} + +Status require_available(const uint8_t* ptr, const uint8_t* end, size_t size, + std::string_view context) { + if (ptr > end) { + return Status::Corruption("Invalid Parquet VARIANT {} encoding", context); + } + if (size > static_cast<size_t>(end - ptr)) { + return Status::Corruption("Invalid Parquet VARIANT {} encoding", context); + } + return Status::OK(); +} + +Status require_available_entries(const uint8_t* ptr, const uint8_t* end, uint64_t entries, + size_t entry_size, std::string_view context) { + if (entries > std::numeric_limits<size_t>::max() / entry_size) { + return Status::Corruption("Invalid Parquet VARIANT {} encoding", context); + } + return require_available(ptr, end, static_cast<size_t>(entries) * entry_size, context); +} + +bool variant_string_less(std::string_view lhs, std::string_view rhs) { + return std::lexicographical_compare( + lhs.begin(), lhs.end(), rhs.begin(), rhs.end(), [](char left, char right) { + return static_cast<uint8_t>(left) < static_cast<uint8_t>(right); + }); +} + +bool is_valid_utf8(std::string_view value) { + const auto* data = reinterpret_cast<const uint8_t*>(value.data()); + const auto* end = data + value.size(); + while (data < end) { + const uint8_t first = *data++; + if (first <= 0x7f) { + continue; + } + + uint32_t code_point = 0; + size_t continuation_bytes = 0; + if (first >= 0xc2 && first <= 0xdf) { + code_point = first & 0x1f; + continuation_bytes = 1; + } else if (first >= 0xe0 && first <= 0xef) { + code_point = first & 0x0f; +
continuation_bytes = 2; + } else if (first >= 0xf0 && first <= 0xf4) { + code_point = first & 0x07; + continuation_bytes = 3; + } else { + return false; + } + + if (static_cast<size_t>(end - data) < continuation_bytes) { + return false; + } + for (size_t i = 0; i < continuation_bytes; ++i) { + const uint8_t byte = *data++; + if ((byte & 0xc0) != 0x80) { + return false; + } + code_point = (code_point << 6) | (byte & 0x3f); + } + + if ((continuation_bytes == 2 && code_point < 0x800) || + (continuation_bytes == 3 && code_point < 0x10000) || + (code_point >= 0xd800 && code_point <= 0xdfff) || code_point > 0x10ffff) { + return false; + } + } + return true; +} + +Status require_valid_utf8(std::string_view value, std::string_view context) { + if (!is_valid_utf8(value)) { + return Status::Corruption("Invalid Parquet VARIANT {} UTF-8 string", context); + } + return Status::OK(); +} + +Status validate_array_field_offsets(const std::vector<uint64_t>& field_offsets, uint64_t total_size, + std::string_view context) { + if (field_offsets.empty() || field_offsets.front() != 0) { + return Status::Corruption("Invalid Parquet VARIANT {} field offsets", context); + } + for (size_t i = 0; i < field_offsets.size(); ++i) { + if (field_offsets[i] > total_size) { + return Status::Corruption("Invalid Parquet VARIANT {} field offset {}", context, + field_offsets[i]); + } + if (i > 0 && field_offsets[i] < field_offsets[i - 1]) { + return Status::Corruption("Invalid Parquet VARIANT {} field offsets", context); + } + } + return Status::OK(); +} + +Status compute_object_field_ends(const std::vector<uint64_t>& field_offsets, uint64_t total_size, + std::vector<uint64_t>* field_ends) { + if (field_offsets.empty()) { + return Status::Corruption("Invalid Parquet VARIANT object field offsets"); + } + size_t num_elements = field_offsets.size() - 1; + if (num_elements == 0) { + if (total_size != 0) { + return Status::Corruption("Invalid Parquet VARIANT object field offsets"); + } + return Status::OK(); + } + + std::vector<std::pair<uint64_t, size_t>>
physical_offsets; + physical_offsets.reserve(num_elements); + for (size_t i = 0; i < num_elements; ++i) { + if (field_offsets[i] >= total_size) { + return Status::Corruption("Invalid Parquet VARIANT object field offset {}", + field_offsets[i]); + } + physical_offsets.emplace_back(field_offsets[i], i); + } + std::sort(physical_offsets.begin(), physical_offsets.end()); + if (physical_offsets.front().first != 0) { + return Status::Corruption("Invalid Parquet VARIANT object field offsets"); + } + + field_ends->assign(num_elements, 0); + for (size_t i = 0; i < physical_offsets.size(); ++i) { + if (i > 0 && physical_offsets[i].first == physical_offsets[i - 1].first) { + return Status::Corruption("Invalid Parquet VARIANT object field offsets"); + } + uint64_t child_end = + i + 1 < physical_offsets.size() ? physical_offsets[i + 1].first : total_size; + (*field_ends)[physical_offsets[i].second] = child_end; + } + return Status::OK(); +} + +void append_json_string(std::string_view value, std::string* json, bool escape_non_ascii = false) { + json->push_back('"'); + static constexpr char hex[] = "0123456789abcdef"; + for (unsigned char c : value) { + switch (c) { + case '"': + json->append("\\\""); + break; + case '\\': + json->append("\\\\"); + break; + case '\b': + json->append("\\b"); + break; + case '\f': + json->append("\\f"); + break; + case '\n': + json->append("\\n"); + break; + case '\r': + json->append("\\r"); + break; + case '\t': + json->append("\\t"); + break; + default: + if (c < 0x20 || (escape_non_ascii && c >= 0x80)) { + json->append("\\u00"); + json->push_back(hex[c >> 4]); + json->push_back(hex[c & 0x0f]); + } else { + json->push_back(static_cast<char>(c)); + } + break; + } + } + json->push_back('"'); +} + +template <typename T> +Status append_floating_json(T value, std::string* json) { + std::ostringstream oss; + oss << std::setprecision(std::numeric_limits<T>::max_digits10) << value; + json->append(oss.str()); + return Status::OK(); +} + +std::string int128_to_string(__int128
value) { + if (value == 0) { + return "0"; + } + bool negative = value < 0; + unsigned __int128 unsigned_value = negative ? static_cast<unsigned __int128>(-(value + 1)) + 1 + : static_cast<unsigned __int128>(value); + std::string digits; + while (unsigned_value > 0) { + digits.push_back(static_cast<char>('0' + unsigned_value % 10)); + unsigned_value /= 10; + } + if (negative) { + digits.push_back('-'); + } + std::reverse(digits.begin(), digits.end()); + return digits; +} + +void append_decimal_json(__int128 unscaled, int scale, std::string* json) { + std::string value = int128_to_string(unscaled); + bool negative = !value.empty() && value[0] == '-'; + std::string digits = negative ? value.substr(1) : value; + if (scale == 0) { + json->append(value); + return; + } + if (scale > 0) { + if (digits.size() <= static_cast<size_t>(scale)) { + digits.insert(0, static_cast<size_t>(scale) + 1 - digits.size(), '0'); + } + digits.insert(digits.end() - scale, '.'); + if (negative) { + json->push_back('-'); + } + json->append(digits); + return; + } + if (negative) { + json->push_back('-'); + } + json->append(digits); + json->append(static_cast<size_t>(-scale), '0'); +} + +Status decode_primitive(uint8_t primitive_header, const uint8_t* ptr, const uint8_t* end, + std::string* json, const uint8_t** next); +Status decode_value(const uint8_t* ptr, const uint8_t* end, const VariantMetadata& metadata, + std::string* json, const uint8_t** next); + +void append_uuid_json(const uint8_t* ptr, std::string* json) { + json->push_back('"'); + json->append(format_variant_uuid(ptr)); + json->push_back('"'); +} + +Status make_jsonb_field(std::string_view json, FieldWithDataType* value) { + JsonBinaryValue jsonb_value; + RETURN_IF_ERROR(jsonb_value.from_json_string(json.data(), json.size())); + value->field = + Field::create_field<TYPE_JSONB>(JsonbField(jsonb_value.value(), jsonb_value.size())); + value->base_scalar_type_id = TYPE_JSONB; + value->num_dimensions = 0; + value->precision = 0; + value->scale = 0; + return Status::OK(); +} + +std::string
make_null_array_json(size_t elements) { + std::string json = "["; + for (size_t i = 0; i < elements; ++i) { + if (i != 0) { + json.push_back(','); + } + json.append("null"); + } + json.push_back(']'); + return json; +} + +Status insert_empty_object_marker(const PathInData& path, VariantMap* values) { + FieldWithDataType value; + RETURN_IF_ERROR(make_jsonb_field("{}", &value)); + (*values)[path] = std::move(value); + return Status::OK(); +} + +Status parse_json_to_variant_map(std::string_view json, const PathInData& prefix, + VariantMap* values) { + auto parsed_column = ColumnVariant::create(0, false); + ParseConfig parse_config; + StringRef json_ref(json.data(), json.size()); + RETURN_IF_CATCH_EXCEPTION( + variant_util::parse_json_to_variant(*parsed_column, json_ref, nullptr, parse_config)); + Field parsed = (*parsed_column)[0]; + if (parsed.is_null()) { + (*values)[prefix] = FieldWithDataType {.field = Field()}; + return Status::OK(); + } + + PathInDataBuilder path; + path.append(prefix.get_parts(), false); + for (auto& [parsed_path, value] : parsed.get<VariantMap>()) { + path.append(parsed_path.get_parts(), false); + (*values)[path.build()] = std::move(value); + for (size_t i = 0; i < parsed_path.get_parts().size(); ++i) { + path.pop_back(); + } + } + return Status::OK(); +} + +void fill_field_type_info(FieldWithDataType* value) { + FieldInfo info; + variant_util::get_field_info(value->field, &info); + value->base_scalar_type_id = info.scalar_type_id; + value->num_dimensions = static_cast(info.num_dimensions); + value->precision = info.precision; + value->scale = info.scale; +} + +template <PrimitiveType T> +void set_primitive_variant_field(const typename PrimitiveTypeTraits<T>::CppType& data, + FieldWithDataType* value) { + value->field = Field::create_field<T>(data); + fill_field_type_info(value); +} + +Status read_decimal_primitive_field(uint8_t primitive_header, const uint8_t* ptr, + const uint8_t* end, FieldWithDataType* value, + const uint8_t** next) { + int value_size = 16; + if
(primitive_header == 8) { + value_size = 4; + } else if (primitive_header == 9) { + value_size = 8; + } + RETURN_IF_ERROR(require_available(ptr, end, 1 + value_size, "decimal value")); + int scale = static_cast<int>(*ptr++); + if (scale < 0 || scale > BeConsts::MAX_DECIMAL128_PRECISION) { + return Status::Corruption("Invalid Parquet VARIANT decimal scale {}", scale); + } + + if (primitive_header == 8) { + set_primitive_variant_field<TYPE_DECIMAL32>( + Decimal32(static_cast<int32_t>(read_signed_le(ptr, value_size))), value); + value->precision = BeConsts::MAX_DECIMAL32_PRECISION; + } else if (primitive_header == 9) { + set_primitive_variant_field<TYPE_DECIMAL64>( + Decimal64(static_cast<int64_t>(read_signed_le(ptr, value_size))), value); + value->precision = BeConsts::MAX_DECIMAL64_PRECISION; + } else { + set_primitive_variant_field<TYPE_DECIMAL128I>(Decimal128V3(read_signed_int128_le(ptr)), + value); + value->precision = BeConsts::MAX_DECIMAL128_PRECISION; + } + value->scale = scale; + *next = ptr + value_size; + return Status::OK(); +} + +Status read_integral_primitive_field(uint8_t primitive_header, const uint8_t* ptr, + const uint8_t* end, FieldWithDataType* value, + const uint8_t** next) { + int value_size = 8; + if (primitive_header == 3) { + value_size = 1; + } else if (primitive_header == 4) { + value_size = 2; + } else if (primitive_header == 5 || primitive_header == 11) { + value_size = 4; + } + RETURN_IF_ERROR(require_available(ptr, end, value_size, "integer value")); + const auto data = static_cast<int64_t>(read_signed_le(ptr, value_size)); + + switch (primitive_header) { + case 3: + set_primitive_variant_field<TYPE_TINYINT>(static_cast<int8_t>(data), value); + break; + case 4: + set_primitive_variant_field<TYPE_SMALLINT>(static_cast<int16_t>(data), value); + break; + case 5: + set_primitive_variant_field<TYPE_INT>(static_cast<int32_t>(data), value); + break; + case 6: + case 11: + case 12: + case 13: + case 17: + set_primitive_variant_field<TYPE_BIGINT>(data, value); + break; + case 18: + case 19: + set_primitive_variant_field<TYPE_BIGINT>(data / 1000, value); + break; + default: + return Status::Corruption("Unsupported
Parquet VARIANT primitive header {}", + primitive_header); + } + *next = ptr + value_size; + return Status::OK(); +} + +Status read_floating_primitive_field(uint8_t primitive_header, const uint8_t* ptr, + const uint8_t* end, FieldWithDataType* value, + const uint8_t** next) { + if (primitive_header == 14) { + RETURN_IF_ERROR(require_available(ptr, end, 4, "float value")); + auto bits = static_cast<uint32_t>(read_unsigned_le(ptr, 4)); + float data; + std::memcpy(&data, &bits, sizeof(data)); + set_primitive_variant_field<TYPE_FLOAT>(data, value); + *next = ptr + 4; + return Status::OK(); + } + + DCHECK_EQ(primitive_header, 7); + RETURN_IF_ERROR(require_available(ptr, end, 8, "double value")); + uint64_t bits = read_unsigned_le(ptr, 8); + double data; + std::memcpy(&data, &bits, sizeof(data)); + set_primitive_variant_field<TYPE_DOUBLE>(data, value); + *next = ptr + 8; + return Status::OK(); +} + +Status read_binary_primitive_field(const uint8_t* ptr, const uint8_t* end, FieldWithDataType* value, + std::deque<std::string>* string_values, const uint8_t** next) { + RETURN_IF_ERROR(require_available(ptr, end, 4, "binary length")); + uint64_t size = read_unsigned_le(ptr, 4); + ptr += 4; + RETURN_IF_ERROR(require_available(ptr, end, size, "binary value")); + string_values->emplace_back(reinterpret_cast<const char*>(ptr), static_cast<size_t>(size)); + value->field = Field::create_field<TYPE_STRING>(StringView(string_values->back())); + fill_field_type_info(value); + *next = ptr + size; + return Status::OK(); +} + +Status read_string_primitive_field(const uint8_t* ptr, const uint8_t* end, FieldWithDataType* value, + const uint8_t** next) { + RETURN_IF_ERROR(require_available(ptr, end, 4, "binary or string length")); + uint64_t size = read_unsigned_le(ptr, 4); + ptr += 4; + RETURN_IF_ERROR(require_available(ptr, end, size, "string value")); + std::string_view data(reinterpret_cast<const char*>(ptr), static_cast<size_t>(size)); + RETURN_IF_ERROR(require_valid_utf8(data, "string value")); + value->field = Field::create_field<TYPE_STRING>(String(data)); + fill_field_type_info(value); + *next =
ptr + size; + return Status::OK(); +} + +Status read_uuid_primitive_field(const uint8_t* ptr, const uint8_t* end, FieldWithDataType* value, + const uint8_t** next) { + RETURN_IF_ERROR(require_available(ptr, end, 16, "uuid value")); + value->field = Field::create_field<TYPE_STRING>(format_variant_uuid(ptr)); + fill_field_type_info(value); + *next = ptr + 16; + return Status::OK(); +} + +Status read_array_layout(uint8_t value_header, const uint8_t* ptr, const uint8_t* end, + VariantArrayLayout* layout) { + int field_offset_size = (value_header & 0x03) + 1; + int num_elements_size = (value_header & 0x04) != 0 ? 4 : 1; + + RETURN_IF_ERROR(require_available(ptr, end, num_elements_size, "array element count")); + uint64_t num_elements = read_unsigned_le(ptr, num_elements_size); + ptr += num_elements_size; + + RETURN_IF_ERROR(require_available_entries(ptr, end, num_elements + 1, field_offset_size, + "array field offsets")); + layout->field_offsets.resize(num_elements + 1); + for (uint64_t i = 0; i <= num_elements; ++i) { + layout->field_offsets[i] = read_unsigned_le(ptr, field_offset_size); + ptr += field_offset_size; + } + + layout->total_size = layout->field_offsets.back(); + layout->fields = ptr; + RETURN_IF_ERROR( + require_available(layout->fields, end, layout->total_size, "array field values")); + RETURN_IF_ERROR( + validate_array_field_offsets(layout->field_offsets, layout->total_size, "array")); + return Status::OK(); +} + +Status read_object_layout(uint8_t value_header, const uint8_t* ptr, const uint8_t* end, + const VariantMetadata& metadata, VariantObjectLayout* layout) { + int field_offset_size = (value_header & 0x03) + 1; + int field_id_size = ((value_header >> 2) & 0x03) + 1; + int num_elements_size = (value_header & 0x10) != 0 ?
4 : 1; + + RETURN_IF_ERROR(require_available(ptr, end, num_elements_size, "object element count")); + uint64_t num_elements = read_unsigned_le(ptr, num_elements_size); + ptr += num_elements_size; + + RETURN_IF_ERROR( + require_available_entries(ptr, end, num_elements, field_id_size, "object field ids")); + layout->field_ids.resize(num_elements); + for (uint64_t i = 0; i < num_elements; ++i) { + layout->field_ids[i] = read_unsigned_le(ptr, field_id_size); + ptr += field_id_size; + if (layout->field_ids[i] >= metadata.dictionary.size()) { + return Status::Corruption("Invalid Parquet VARIANT object field id {}", + layout->field_ids[i]); + } + if (i > 0 && !variant_string_less(metadata.dictionary[layout->field_ids[i - 1]], + metadata.dictionary[layout->field_ids[i]])) { + return Status::Corruption("Invalid Parquet VARIANT object field names"); + } + } + + RETURN_IF_ERROR(require_available_entries(ptr, end, num_elements + 1, field_offset_size, + "object field offsets")); + layout->field_offsets.resize(num_elements + 1); + for (uint64_t i = 0; i <= num_elements; ++i) { + layout->field_offsets[i] = read_unsigned_le(ptr, field_offset_size); + ptr += field_offset_size; + } + + layout->total_size = layout->field_offsets.back(); + layout->fields = ptr; + RETURN_IF_ERROR( + require_available(layout->fields, end, layout->total_size, "object field values")); + RETURN_IF_ERROR(compute_object_field_ends(layout->field_offsets, layout->total_size, + &layout->field_ends)); + return Status::OK(); +} + +Status decode_value_to_variant_map(const uint8_t* ptr, const uint8_t* end, + const VariantMetadata& metadata, PathInDataBuilder* path, + VariantMap* values, std::deque<std::string>* string_values, + const uint8_t** next); + +Status decode_primitive_to_variant_map(uint8_t primitive_header, const uint8_t* ptr, + const uint8_t* end, const VariantMetadata&, + PathInDataBuilder* path, VariantMap* values, + std::deque<std::string>* string_values, + const uint8_t** next) { + FieldWithDataType value; + switch
(primitive_header) { + case 0: + value.field = Field(); + value.base_scalar_type_id = INVALID_TYPE; + *next = ptr; + break; + case 1: + set_primitive_variant_field<TYPE_BOOLEAN>(true, &value); + *next = ptr; + break; + case 2: + set_primitive_variant_field<TYPE_BOOLEAN>(false, &value); + *next = ptr; + break; + case 3: + case 4: + case 5: + case 6: + case 11: + case 12: + case 13: + case 17: + case 18: + case 19: + RETURN_IF_ERROR(read_integral_primitive_field(primitive_header, ptr, end, &value, next)); + break; + case 7: + case 14: + RETURN_IF_ERROR(read_floating_primitive_field(primitive_header, ptr, end, &value, next)); + break; + case 8: + case 9: + case 10: + RETURN_IF_ERROR(read_decimal_primitive_field(primitive_header, ptr, end, &value, next)); + break; + case 15: + RETURN_IF_ERROR(read_binary_primitive_field(ptr, end, &value, string_values, next)); + break; + case 16: + RETURN_IF_ERROR(read_string_primitive_field(ptr, end, &value, next)); + break; + case 20: + RETURN_IF_ERROR(read_uuid_primitive_field(ptr, end, &value, next)); + break; + default: + return Status::Corruption("Unsupported Parquet VARIANT primitive header {}", + primitive_header); + } + (*values)[path->build()] = std::move(value); + return Status::OK(); +} + +Status decode_object_to_variant_map(uint8_t value_header, const uint8_t* ptr, const uint8_t* end, + const VariantMetadata& metadata, PathInDataBuilder* path, + VariantMap* values, std::deque<std::string>* string_values, + const uint8_t** next) { + VariantObjectLayout layout; + RETURN_IF_ERROR(read_object_layout(value_header, ptr, end, metadata, &layout)); + + if (layout.field_ids.empty()) { + RETURN_IF_ERROR(insert_empty_object_marker(path->build(), values)); + } + + for (uint64_t i = 0; i < layout.field_ids.size(); ++i) { + const uint8_t* child_begin = layout.fields + layout.field_offsets[i]; + const uint8_t* child_end = layout.fields + layout.field_ends[i]; + const uint8_t* child_next = nullptr; + path->append(metadata.dictionary[layout.field_ids[i]], false); +
RETURN_IF_ERROR(decode_value_to_variant_map(child_begin, child_end, metadata, path, values, + string_values, &child_next)); + path->pop_back(); + if (child_next != child_end) { + return Status::Corruption("Invalid Parquet VARIANT object child value length"); + } + } + *next = layout.fields + layout.total_size; + return Status::OK(); +} + +void move_variant_map_to_field(VariantMap&& element_values, FieldWithDataType* value) { + if (element_values.size() == 1 && element_values.begin()->first.empty()) { + *value = std::move(element_values.begin()->second); + return; + } + value->field = Field::create_field<TYPE_VARIANT>(std::move(element_values)); + fill_field_type_info(value); +} + +Status decode_array_element_to_field(const uint8_t* ptr, const uint8_t* end, + const VariantMetadata& metadata, FieldWithDataType* value, + std::deque<std::string>* string_values, const uint8_t** next) { + RETURN_IF_ERROR(require_available(ptr, end, 1, "array child value")); + const uint8_t value_metadata = *ptr++; + const uint8_t basic_type = value_metadata & 0x03; + const uint8_t value_header = value_metadata >> 2; + + if (basic_type == 0) { + VariantMap element_values; + PathInDataBuilder element_path; + RETURN_IF_ERROR(decode_primitive_to_variant_map(value_header, ptr, end, metadata, + &element_path, &element_values, + string_values, next)); + move_variant_map_to_field(std::move(element_values), value); + return Status::OK(); + } + + if (basic_type == 1) { + const size_t size = value_header; + RETURN_IF_ERROR(require_available(ptr, end, size, "short string value")); + std::string_view data(reinterpret_cast<const char*>(ptr), size); + RETURN_IF_ERROR(require_valid_utf8(data, "short string value")); + value->field = Field::create_field<TYPE_STRING>(String(data)); + fill_field_type_info(value); + *next = ptr + size; + return Status::OK(); + } + + if (basic_type == 2 || basic_type == 3) { + VariantMap element_values; + PathInDataBuilder element_path; + RETURN_IF_ERROR(decode_value_to_variant_map(ptr - 1, end, metadata, &element_path, +
&element_values, string_values, next)); + move_variant_map_to_field(std::move(element_values), value); + return Status::OK(); + } + + std::string json; + RETURN_IF_ERROR(decode_value(ptr - 1, end, metadata, &json, next)); + VariantMap element_values; + RETURN_IF_ERROR(parse_json_to_variant_map(json, PathInData(), &element_values)); + move_variant_map_to_field(std::move(element_values), value); + return Status::OK(); +} + +Status decode_array_to_variant_map(uint8_t value_header, const uint8_t* ptr, const uint8_t* end, + const VariantMetadata& metadata, PathInDataBuilder* path, + VariantMap* values, std::deque<std::string>* string_values, + const uint8_t** next) { + VariantArrayLayout layout; + RETURN_IF_ERROR(read_array_layout(value_header, ptr, end, &layout)); + + Array array; + array.reserve(layout.field_offsets.size() - 1); + for (uint64_t i = 0; i + 1 < layout.field_offsets.size(); ++i) { + const uint8_t* child_begin = layout.fields + layout.field_offsets[i]; + const uint8_t* child_end = layout.fields + layout.field_offsets[i + 1]; + const uint8_t* child_next = nullptr; + FieldWithDataType child; + RETURN_IF_ERROR(decode_array_element_to_field(child_begin, child_end, metadata, &child, + string_values, &child_next)); + if (child_next != child_end) { + return Status::Corruption("Invalid Parquet VARIANT array child value length"); + } + array.push_back(std::move(child.field)); + } + + FieldWithDataType value; + const size_t elements = array.size(); + value.field = Field::create_field<TYPE_ARRAY>(std::move(array)); + fill_field_type_info(&value); + if (value.base_scalar_type_id == INVALID_TYPE) { + RETURN_IF_ERROR(make_jsonb_field(make_null_array_json(elements), &value)); + } + (*values)[path->build()] = std::move(value); + *next = layout.fields + layout.total_size; + return Status::OK(); +} + +Status decode_value_to_variant_map(const uint8_t* ptr, const uint8_t* end, + const VariantMetadata& metadata, PathInDataBuilder* path, + VariantMap* values, std::deque<std::string>* string_values, + const
uint8_t** next) { + RETURN_IF_ERROR(require_available(ptr, end, 1, "value")); + uint8_t value_metadata = *ptr++; + uint8_t basic_type = value_metadata & 0x03; + uint8_t value_header = value_metadata >> 2; + + switch (basic_type) { + case 0: + return decode_primitive_to_variant_map(value_header, ptr, end, metadata, path, values, + string_values, next); + case 2: + return decode_object_to_variant_map(value_header, ptr, end, metadata, path, values, + string_values, next); + case 1: + [[fallthrough]]; + case 3: { + if (basic_type == 3) { + Status array_st = decode_array_to_variant_map(value_header, ptr, end, metadata, path, + values, string_values, next); + if (array_st.ok()) { + return array_st; + } + if (!array_st.is()) { + return array_st; + } + } + std::string json; + RETURN_IF_ERROR(decode_value(ptr - 1, end, metadata, &json, next)); + return parse_json_to_variant_map(json, path->build(), values); + } + default: + return Status::Corruption("Unsupported Parquet VARIANT basic type {}", basic_type); + } +} + +Status decode_metadata(const StringRef& metadata, VariantMetadata* result) { + const auto* ptr = reinterpret_cast<const uint8_t*>(metadata.data); + const auto* end = ptr + metadata.size; + RETURN_IF_ERROR(require_available(ptr, end, 1, "metadata")); + uint8_t header = *ptr++; + uint8_t version = header & 0x0f; + if (version != 1) { + return Status::Corruption("Unsupported Parquet VARIANT metadata version {}", version); + } + if ((header & 0x20) != 0) { + return Status::Corruption("Invalid Parquet VARIANT metadata header {}", header); + } + const bool sorted_strings = (header & 0x10) != 0; + int offset_size = ((header >> 6) & 0x03) + 1; + RETURN_IF_ERROR(require_available(ptr, end, offset_size, "metadata dictionary size")); + uint64_t dictionary_size = read_unsigned_le(ptr, offset_size); + ptr += offset_size; + + RETURN_IF_ERROR(require_available_entries(ptr, end, dictionary_size + 1, offset_size, + "metadata dictionary offsets")); + std::vector<uint64_t> offsets(dictionary_size + 1); +
for (uint64_t i = 0; i <= dictionary_size; ++i) { + offsets[i] = read_unsigned_le(ptr, offset_size); + ptr += offset_size; + if (i > 0 && offsets[i] < offsets[i - 1]) { + return Status::Corruption("Invalid Parquet VARIANT metadata dictionary offsets"); + } + } + if (offsets.front() != 0) { + return Status::Corruption("Invalid Parquet VARIANT metadata dictionary offsets"); + } + + RETURN_IF_ERROR(require_available(ptr, end, offsets.back(), "metadata dictionary bytes")); + if (ptr + offsets.back() != end) { + return Status::Corruption("Invalid Parquet VARIANT metadata dictionary bytes"); + } + result->dictionary.clear(); + result->dictionary.reserve(dictionary_size); + for (uint64_t i = 0; i < dictionary_size; ++i) { + std::string entry(reinterpret_cast<const char*>(ptr + offsets[i]), + offsets[i + 1] - offsets[i]); + RETURN_IF_ERROR(require_valid_utf8(entry, "metadata dictionary")); + if (sorted_strings && !result->dictionary.empty() && + !variant_string_less(result->dictionary.back(), entry)) { + return Status::Corruption("Invalid Parquet VARIANT sorted metadata dictionary key"); + } + result->dictionary.emplace_back(std::move(entry)); + } + return Status::OK(); +} + +// NOLINTNEXTLINE(readability-function-cognitive-complexity, readability-function-size): VARIANT primitive tags are a compact spec switch.
+Status decode_primitive(uint8_t primitive_header, const uint8_t* ptr, const uint8_t* end,
+                        std::string* json, const uint8_t** next) {
+    switch (primitive_header) {
+    case 0:
+        json->append("null");
+        *next = ptr;
+        return Status::OK();
+    case 1:
+        json->append("true");
+        *next = ptr;
+        return Status::OK();
+    case 2:
+        json->append("false");
+        *next = ptr;
+        return Status::OK();
+    case 3:
+        RETURN_IF_ERROR(require_available(ptr, end, 1, "int8 value"));
+        json->append(std::to_string(static_cast<int8_t>(*ptr)));
+        *next = ptr + 1;
+        return Status::OK();
+    case 4:
+        RETURN_IF_ERROR(require_available(ptr, end, 2, "int16 value"));
+        json->append(std::to_string(read_signed_le(ptr, 2)));
+        *next = ptr + 2;
+        return Status::OK();
+    case 5:
+        RETURN_IF_ERROR(require_available(ptr, end, 4, "int32 value"));
+        json->append(std::to_string(read_signed_le(ptr, 4)));
+        *next = ptr + 4;
+        return Status::OK();
+    case 6:
+        RETURN_IF_ERROR(require_available(ptr, end, 8, "int64 value"));
+        json->append(std::to_string(read_signed_le(ptr, 8)));
+        *next = ptr + 8;
+        return Status::OK();
+    case 7: {
+        RETURN_IF_ERROR(require_available(ptr, end, 8, "double value"));
+        uint64_t bits = read_unsigned_le(ptr, 8);
+        double value;
+        std::memcpy(&value, &bits, sizeof(value));
+        RETURN_IF_ERROR(append_floating_json(value, json));
+        *next = ptr + 8;
+        return Status::OK();
+    }
+    case 8:
+    case 9:
+    case 10: {
+        int value_size = 16;
+        if (primitive_header == 8) {
+            value_size = 4;
+        } else if (primitive_header == 9) {
+            value_size = 8;
+        }
+        RETURN_IF_ERROR(require_available(ptr, end, 1 + value_size, "decimal value"));
+        int scale = static_cast<int8_t>(*ptr++);
+        if (scale < 0 || scale > 38) {
+            return Status::Corruption("Invalid Parquet VARIANT decimal scale {}", scale);
+        }
+        __int128 unscaled = 0;
+        if (value_size == 16) {
+            unscaled = read_signed_int128_le(ptr);
+        } else {
+            unscaled = read_signed_le(ptr, value_size);
+        }
+        append_decimal_json(unscaled, scale, json);
+        *next = ptr + value_size;
+        return Status::OK();
+    }
+    case 11:
+        RETURN_IF_ERROR(require_available(ptr, end, 4, "date value"));
+        json->append(std::to_string(read_signed_le(ptr, 4)));
+        *next = ptr + 4;
+        return Status::OK();
+    case 12:
+    case 13:
+    case 17:
+        RETURN_IF_ERROR(require_available(ptr, end, 8, "time or timestamp value"));
+        json->append(std::to_string(read_signed_le(ptr, 8)));
+        *next = ptr + 8;
+        return Status::OK();
+    case 18:
+    case 19:
+        RETURN_IF_ERROR(require_available(ptr, end, 8, "nanosecond timestamp value"));
+        json->append(std::to_string(read_signed_le(ptr, 8) / 1000));
+        *next = ptr + 8;
+        return Status::OK();
+    case 14: {
+        RETURN_IF_ERROR(require_available(ptr, end, 4, "float value"));
+        auto bits = static_cast<uint32_t>(read_unsigned_le(ptr, 4));
+        float value;
+        std::memcpy(&value, &bits, sizeof(value));
+        RETURN_IF_ERROR(append_floating_json(value, json));
+        *next = ptr + 4;
+        return Status::OK();
+    }
+    case 15: {
+        RETURN_IF_ERROR(require_available(ptr, end, 4, "binary length"));
+        uint64_t size = read_unsigned_le(ptr, 4);
+        ptr += 4;
+        RETURN_IF_ERROR(require_available(ptr, end, size, "binary value"));
+        std::string_view value(reinterpret_cast<const char*>(ptr), static_cast<size_t>(size));
+        append_json_string(value, json, true);
+        *next = ptr + size;
+        return Status::OK();
+    }
+    case 16: {
+        RETURN_IF_ERROR(require_available(ptr, end, 4, "binary or string length"));
+        uint64_t size = read_unsigned_le(ptr, 4);
+        ptr += 4;
+        RETURN_IF_ERROR(require_available(ptr, end, size, "string value"));
+        std::string_view value(reinterpret_cast<const char*>(ptr), static_cast<size_t>(size));
+        RETURN_IF_ERROR(require_valid_utf8(value, "string value"));
+        append_json_string(value, json);
+        *next = ptr + size;
+        return Status::OK();
+    }
+    case 20:
+        RETURN_IF_ERROR(require_available(ptr, end, 16, "uuid value"));
+        append_uuid_json(ptr, json);
+        *next = ptr + 16;
+        return Status::OK();
+    default:
+        return Status::Corruption("Unsupported Parquet VARIANT primitive header {}",
+                                  primitive_header);
+    }
+}
+
+Status decode_object(uint8_t value_header, const uint8_t* ptr, const uint8_t* end,
+                     const VariantMetadata& metadata, std::string* json, const uint8_t** next) {
+    int field_offset_size = (value_header & 0x03) + 1;
+    int field_id_size = ((value_header >> 2) & 0x03) + 1;
+    int num_elements_size = (value_header & 0x10) != 0 ? 4 : 1;
+
+    RETURN_IF_ERROR(require_available(ptr, end, num_elements_size, "object element count"));
+    uint64_t num_elements = read_unsigned_le(ptr, num_elements_size);
+    ptr += num_elements_size;
+
+    RETURN_IF_ERROR(
+            require_available_entries(ptr, end, num_elements, field_id_size, "object field ids"));
+    std::vector<uint64_t> field_ids(num_elements);
+    for (uint64_t i = 0; i < num_elements; ++i) {
+        field_ids[i] = read_unsigned_le(ptr, field_id_size);
+        ptr += field_id_size;
+        if (field_ids[i] >= metadata.dictionary.size()) {
+            return Status::Corruption("Invalid Parquet VARIANT object field id {}", field_ids[i]);
+        }
+        if (i > 0 && !variant_string_less(metadata.dictionary[field_ids[i - 1]],
+                                          metadata.dictionary[field_ids[i]])) {
+            return Status::Corruption("Invalid Parquet VARIANT object field names");
+        }
+    }
+
+    RETURN_IF_ERROR(require_available_entries(ptr, end, num_elements + 1, field_offset_size,
+                                              "object field offsets"));
+    std::vector<uint64_t> field_offsets(num_elements + 1);
+    for (uint64_t i = 0; i <= num_elements; ++i) {
+        field_offsets[i] = read_unsigned_le(ptr, field_offset_size);
+        ptr += field_offset_size;
+    }
+
+    uint64_t total_size = field_offsets.back();
+    const uint8_t* fields = ptr;
+    RETURN_IF_ERROR(require_available(fields, end, total_size, "object field values"));
+    std::vector<uint64_t> field_ends;
+    RETURN_IF_ERROR(compute_object_field_ends(field_offsets, total_size, &field_ends));
+
+    json->push_back('{');
+    for (uint64_t i = 0; i < num_elements; ++i) {
+        if (i != 0) {
+            json->push_back(',');
+        }
+        append_json_string(metadata.dictionary[field_ids[i]], json);
+        json->push_back(':');
+        const uint8_t* child_begin = fields + field_offsets[i];
+        const uint8_t* child_end = fields + field_ends[i];
+        const uint8_t* child_next = nullptr;
+        RETURN_IF_ERROR(decode_value(child_begin, child_end, metadata, json, &child_next));
+        if (child_next != child_end) {
+            return Status::Corruption("Invalid Parquet VARIANT object child value length");
+        }
+    }
+    json->push_back('}');
+    *next = fields + total_size;
+    return Status::OK();
+}
+
+Status decode_array(uint8_t value_header, const uint8_t* ptr, const uint8_t* end,
+                    const VariantMetadata& metadata, std::string* json, const uint8_t** next) {
+    VariantArrayLayout layout;
+    RETURN_IF_ERROR(read_array_layout(value_header, ptr, end, &layout));
+
+    json->push_back('[');
+    for (uint64_t i = 0; i + 1 < layout.field_offsets.size(); ++i) {
+        if (i != 0) {
+            json->push_back(',');
+        }
+        const uint8_t* child_begin = layout.fields + layout.field_offsets[i];
+        const uint8_t* child_end = layout.fields + layout.field_offsets[i + 1];
+        const uint8_t* child_next = nullptr;
+        RETURN_IF_ERROR(decode_value(child_begin, child_end, metadata, json, &child_next));
+        if (child_next != child_end) {
+            return Status::Corruption("Invalid Parquet VARIANT array child value length");
+        }
+    }
+    json->push_back(']');
+    *next = layout.fields + layout.total_size;
+    return Status::OK();
+}
+
+Status decode_value(const uint8_t* ptr, const uint8_t* end, const VariantMetadata& metadata,
+                    std::string* json, const uint8_t** next) {
+    RETURN_IF_ERROR(require_available(ptr, end, 1, "value"));
+    uint8_t value_metadata = *ptr++;
+    uint8_t basic_type = value_metadata & 0x03;
+    uint8_t value_header = value_metadata >> 2;
+
+    switch (basic_type) {
+    case 0:
+        return decode_primitive(value_header, ptr, end, json, next);
+    case 1: {
+        size_t size = value_header;
+        RETURN_IF_ERROR(require_available(ptr, end, size, "short string value"));
+        std::string_view value(reinterpret_cast<const char*>(ptr), static_cast<size_t>(size));
+        RETURN_IF_ERROR(require_valid_utf8(value, "short string value"));
+        append_json_string(value, json);
+        *next = ptr + size;
+        return Status::OK();
+    }
+    case 2:
+        return decode_object(value_header, ptr, end, metadata, json, next);
+    case 3:
+        return decode_array(value_header, ptr, end, metadata, json, next);
+    default:
+        return Status::Corruption("Unsupported Parquet VARIANT basic type {}", basic_type);
+    }
+}
+
+} // namespace
+
+Status decode_variant_to_json(const StringRef& metadata, const StringRef& value,
+                              std::string* json) {
+    VariantMetadata decoded_metadata;
+    RETURN_IF_ERROR(decode_metadata(metadata, &decoded_metadata));
+    json->clear();
+    const auto* ptr = reinterpret_cast<const uint8_t*>(value.data);
+    const auto* end = ptr + value.size;
+    const uint8_t* next = nullptr;
+    RETURN_IF_ERROR(decode_value(ptr, end, decoded_metadata, json, &next));
+    if (next != end) {
+        return Status::Corruption("Invalid Parquet VARIANT value has {} trailing bytes",
+                                  end - next);
+    }
+    return Status::OK();
+}
+
+Status decode_variant_to_variant_map(const StringRef& metadata, const StringRef& value,
+                                     const PathInData& prefix, VariantMap* values,
+                                     std::deque* string_values) {
+    VariantMetadata decoded_metadata;
+    RETURN_IF_ERROR(decode_metadata(metadata, &decoded_metadata));
+    const auto* ptr = reinterpret_cast<const uint8_t*>(value.data);
+    const auto* end = ptr + value.size;
+    const uint8_t* next = nullptr;
+    PathInDataBuilder path;
+    path.append(prefix.get_parts(), false);
+    RETURN_IF_ERROR(decode_value_to_variant_map(ptr, end, decoded_metadata, &path, values,
+                                                string_values, &next));
+    if (next != end) {
+        return Status::Corruption("Invalid Parquet VARIANT value has {} trailing bytes",
+                                  end - next);
+    }
+    return Status::OK();
+}
+
+} // namespace doris::parquet
diff --git a/be/src/format/parquet/parquet_variant_reader.h b/be/src/format/parquet/parquet_variant_reader.h
new file mode 100644
index 00000000000000..8289113f5fc963
--- /dev/null
+++ b/be/src/format/parquet/parquet_variant_reader.h
@@ -0,0 +1,38 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.
See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#pragma once + +#include +#include +#include + +#include "common/status.h" +#include "core/field.h" +#include "core/string_ref.h" + +namespace doris::parquet { + +std::string format_variant_uuid(const uint8_t* ptr); + +Status decode_variant_to_json(const StringRef& metadata, const StringRef& value, std::string* json); + +Status decode_variant_to_variant_map(const StringRef& metadata, const StringRef& value, + const PathInData& prefix, VariantMap* values, + std::deque* string_values); + +} // namespace doris::parquet diff --git a/be/src/format/parquet/schema_desc.cpp b/be/src/format/parquet/schema_desc.cpp index 972ce6f969b74c..61684ceb13d12c 100644 --- a/be/src/format/parquet/schema_desc.cpp +++ b/be/src/format/parquet/schema_desc.cpp @@ -17,8 +17,6 @@ #include "format/parquet/schema_desc.h" -#include - #include #include #include @@ -29,6 +27,7 @@ #include "core/data_type/data_type_factory.hpp" #include "core/data_type/data_type_map.h" #include "core/data_type/data_type_struct.h" +#include "core/data_type/data_type_variant.h" #include "core/data_type/define_primitive_type.h" #include "format/generic_reader.h" #include "format/table/table_schema_change_helper.h" @@ -66,6 +65,23 @@ static bool is_optional_node(const tparquet::SchemaElement& schema) { schema.repetition_type 
== tparquet::FieldRepetitionType::OPTIONAL; } +static bool is_variant_node(const tparquet::SchemaElement& schema) { + return schema.__isset.logicalType && schema.logicalType.__isset.VARIANT; +} + +static void mark_variant_subfields(FieldSchema* field) { + field->is_in_variant = true; + for (auto& child : field->children) { + mark_variant_subfields(&child); + } +} + +static bool is_unannotated_binary_field(const FieldSchema& field) { + return field.physical_type == tparquet::Type::BYTE_ARRAY && + !field.parquet_schema.__isset.logicalType && + !field.parquet_schema.__isset.converted_type; +} + static int num_children_node(const tparquet::SchemaElement& schema) { return schema.__isset.num_children ? schema.num_children : 0; } @@ -305,7 +321,8 @@ std::pair FieldDescriptor::convert_to_doris_type( } } } else if (logicalType.__isset.TIME) { - ans.first = DataTypeFactory::instance().create_data_type(TYPE_TIMEV2, nullable); + ans.first = DataTypeFactory::instance().create_data_type( + TYPE_TIMEV2, nullable, 0, logicalType.TIME.unit.__isset.MILLIS ? 
3 : 6); } else if (logicalType.__isset.TIMESTAMP) { if (_enable_mapping_timestamp_tz) { if (logicalType.TIMESTAMP.isAdjustedToUTC) { @@ -351,9 +368,10 @@ std::pair FieldDescriptor::convert_to_doris_type( ans.first = DataTypeFactory::instance().create_data_type(TYPE_DATEV2, nullable); break; case tparquet::ConvertedType::type::TIME_MILLIS: - [[fallthrough]]; + ans.first = DataTypeFactory::instance().create_data_type(TYPE_TIMEV2, nullable, 0, 3); + break; case tparquet::ConvertedType::type::TIME_MICROS: - ans.first = DataTypeFactory::instance().create_data_type(TYPE_TIMEV2, nullable); + ans.first = DataTypeFactory::instance().create_data_type(TYPE_TIMEV2, nullable, 0, 6); break; case tparquet::ConvertedType::type::TIMESTAMP_MILLIS: ans.first = DataTypeFactory::instance().create_data_type(TYPE_DATETIMEV2, nullable, 0, 3); @@ -398,7 +416,10 @@ std::pair FieldDescriptor::convert_to_doris_type( Status FieldDescriptor::parse_group_field(const std::vector& t_schemas, size_t curr_pos, FieldSchema* group_field) { - auto& group_schema = t_schemas[curr_pos]; + const auto& group_schema = t_schemas[curr_pos]; + if (is_variant_node(group_schema)) { + return parse_variant_field(t_schemas, curr_pos, group_field); + } if (is_map_node(group_schema)) { // the map definition: // optional group (MAP) { @@ -446,6 +467,67 @@ Status FieldDescriptor::parse_group_field(const std::vector& t_schemas, + size_t curr_pos, FieldSchema* variant_field) { + RETURN_IF_ERROR(parse_struct_field(t_schemas, curr_pos, variant_field)); + + bool has_metadata = false; + bool metadata_required = false; + bool has_value = false; + bool has_typed_value = false; + for (const auto& child : variant_field->children) { + if (child.lower_case_name == "metadata") { + if (has_metadata) { + return Status::InvalidArgument( + "Parquet VARIANT field '{}' has duplicate metadata child", + variant_field->name); + } + if (!is_unannotated_binary_field(child)) { + return Status::InvalidArgument( + "Parquet VARIANT field '{}' 
metadata child must be unannotated binary", + variant_field->name); + } + has_metadata = true; + metadata_required = !child.data_type->is_nullable(); + } else if (child.lower_case_name == "value") { + if (has_value) { + return Status::InvalidArgument( + "Parquet VARIANT field '{}' has duplicate value child", + variant_field->name); + } + if (!is_unannotated_binary_field(child)) { + return Status::InvalidArgument( + "Parquet VARIANT field '{}' value child must be unannotated binary", + variant_field->name); + } + has_value = true; + } else if (child.lower_case_name == "typed_value") { + if (has_typed_value) { + return Status::InvalidArgument( + "Parquet VARIANT field '{}' has duplicate typed_value child", + variant_field->name); + } + has_typed_value = true; + } else { + return Status::InvalidArgument("Parquet VARIANT field '{}' has unexpected child '{}'", + variant_field->name, child.name); + } + } + if (!has_metadata || !metadata_required || (!has_value && !has_typed_value)) { + return Status::InvalidArgument( + "Parquet VARIANT field '{}' must contain required binary metadata and at least one " + "binary value or typed_value field", + variant_field->name); + } + + variant_field->data_type = std::make_shared(0, false); + if (is_optional_node(t_schemas[curr_pos])) { + variant_field->data_type = make_nullable(variant_field->data_type); + } + mark_variant_subfields(variant_field); + return Status::OK(); +} + Status FieldDescriptor::parse_list_field(const std::vector& t_schemas, size_t curr_pos, FieldSchema* list_field) { // the list definition: @@ -641,6 +723,32 @@ FieldSchema* FieldDescriptor::get_column(const std::string& name) const { return nullptr; } +namespace { + +void collect_physical_fields(FieldSchema* field, std::vector* physical_fields) { + if (field->children.empty()) { + if (field->physical_column_index >= 0) { + field->physical_column_index = cast_set(physical_fields->size()); + physical_fields->push_back(field); + } + return; + } + for (auto& child : 
field->children) { + collect_physical_fields(&child, physical_fields); + } +} + +} // namespace + +void FieldDescriptor::rebuild_indexes() { + _physical_fields.clear(); + _name_to_field.clear(); + for (auto& field : _fields) { + _name_to_field.emplace(field.name, &field); + collect_physical_fields(&field, &_physical_fields); + } +} + void FieldDescriptor::get_column_names(std::unordered_set* names) const { names->clear(); for (const FieldSchema& f : _fields) { @@ -668,6 +776,13 @@ void FieldDescriptor::assign_ids() { } } +FieldDescriptor FieldDescriptor::copy_with_assigned_ids() const { + FieldDescriptor copy = *this; + copy.rebuild_indexes(); + copy.assign_ids(); + return copy; +} + const FieldSchema* FieldDescriptor::find_column_by_id(uint64_t column_id) const { for (const auto& field : _fields) { if (auto result = field.find_column_by_id(column_id)) { diff --git a/be/src/format/parquet/schema_desc.h b/be/src/format/parquet/schema_desc.h index 544050a8516ea6..17027e5016cb09 100644 --- a/be/src/format/parquet/schema_desc.h +++ b/be/src/format/parquet/schema_desc.h @@ -58,6 +58,7 @@ struct FieldSchema { //For UInt8 -> Int16,UInt16 -> Int32,UInt32 -> Int64,UInt64 -> Int128. 
bool is_type_compatibility = false; + bool is_in_variant = false; FieldSchema() : data_type(std::make_shared()), column_id(UNASSIGNED_COLUMN_ID) {} @@ -101,6 +102,9 @@ class FieldDescriptor { Status parse_map_field(const std::vector& t_schemas, size_t curr_pos, FieldSchema* map_field); + Status parse_variant_field(const std::vector& t_schemas, + size_t curr_pos, FieldSchema* variant_field); + Status parse_struct_field(const std::vector& t_schemas, size_t curr_pos, FieldSchema* struct_field); @@ -110,6 +114,8 @@ class FieldDescriptor { Status parse_node_field(const std::vector& t_schemas, size_t curr_pos, FieldSchema* node_field); + void rebuild_indexes(); + std::pair convert_to_doris_type(tparquet::LogicalType logicalType, bool nullable); std::pair convert_to_doris_type( @@ -119,6 +125,23 @@ class FieldDescriptor { public: FieldDescriptor() = default; + FieldDescriptor(const FieldDescriptor& other) + : _fields(other._fields), + _next_schema_pos(other._next_schema_pos), + _enable_mapping_varbinary(other._enable_mapping_varbinary), + _enable_mapping_timestamp_tz(other._enable_mapping_timestamp_tz) { + rebuild_indexes(); + } + FieldDescriptor& operator=(const FieldDescriptor& other) { + if (this != &other) { + _fields = other._fields; + _next_schema_pos = other._next_schema_pos; + _enable_mapping_varbinary = other._enable_mapping_varbinary; + _enable_mapping_timestamp_tz = other._enable_mapping_timestamp_tz; + rebuild_indexes(); + } + return *this; + } ~FieldDescriptor() = default; /** @@ -161,6 +184,8 @@ class FieldDescriptor { */ void assign_ids(); + FieldDescriptor copy_with_assigned_ids() const; + const FieldSchema* find_column_by_id(uint64_t column_id) const; void set_enable_mapping_varbinary(bool enable) { _enable_mapping_varbinary = enable; } void set_enable_mapping_timestamp_tz(bool enable) { _enable_mapping_timestamp_tz = enable; } diff --git a/be/src/format/parquet/vparquet_column_chunk_reader.cpp b/be/src/format/parquet/vparquet_column_chunk_reader.cpp 
index b4b919f187073c..b03f7335e2bf5c 100644 --- a/be/src/format/parquet/vparquet_column_chunk_reader.cpp +++ b/be/src/format/parquet/vparquet_column_chunk_reader.cpp @@ -83,10 +83,16 @@ Status ColumnChunkReader::init() { template Status ColumnChunkReader::skip_nested_values( const std::vector& def_levels) { + return skip_nested_values(def_levels, 0, def_levels.size()); +} + +template +Status ColumnChunkReader::skip_nested_values( + const std::vector& def_levels, size_t begin, size_t end) { size_t no_value_cnt = 0; size_t value_cnt = 0; - for (size_t idx = 0; idx < def_levels.size(); idx++) { + for (size_t idx = begin; idx < end; idx++) { level_t def_level = def_levels[idx]; if (IN_COLLECTION && def_level < _field_schema->repeated_parent_def_level) { no_value_cnt++; diff --git a/be/src/format/parquet/vparquet_column_chunk_reader.h b/be/src/format/parquet/vparquet_column_chunk_reader.h index b117f6c6652e7e..bfa0ad73174d4a 100644 --- a/be/src/format/parquet/vparquet_column_chunk_reader.h +++ b/be/src/format/parquet/vparquet_column_chunk_reader.h @@ -191,12 +191,13 @@ class ColumnChunkReader { Status seek_to_nested_row(size_t left_row); Status skip_nested_values(const std::vector& def_levels); + Status skip_nested_values(const std::vector& def_levels, size_t begin, size_t end); Status fill_def(std::vector& def_values) { auto before_sz = def_values.size(); auto append_sz = _remaining_def_nums - _remaining_rep_nums; def_values.resize(before_sz + append_sz, 0); if (max_def_level() != 0) { - auto ptr = def_values.data() + before_sz; + auto* ptr = def_values.data() + before_sz; _def_level_decoder.get_levels(ptr, append_sz); } _remaining_def_nums -= append_sz; diff --git a/be/src/format/parquet/vparquet_column_reader.cpp b/be/src/format/parquet/vparquet_column_reader.cpp index ba7d42a5aed84e..ac096af06e733e 100644 --- a/be/src/format/parquet/vparquet_column_reader.cpp +++ b/be/src/format/parquet/vparquet_column_reader.cpp @@ -17,29 +17,53 @@ #include 
"format/parquet/vparquet_column_reader.h" +#include #include -#include +#include #include #include +#include +#include +#include +#include +#include #include +#include +#include "common/exception.h" #include "common/status.h" #include "core/column/column.h" #include "core/column/column_array.h" #include "core/column/column_map.h" #include "core/column/column_nullable.h" +#include "core/column/column_string.h" #include "core/column/column_struct.h" +#include "core/column/column_varbinary.h" +#include "core/column/column_variant.h" #include "core/data_type/data_type_array.h" +#include "core/data_type/data_type_factory.hpp" +#include "core/data_type/data_type_jsonb.h" #include "core/data_type/data_type_map.h" #include "core/data_type/data_type_nullable.h" +#include "core/data_type/data_type_number.h" +#include "core/data_type/data_type_string.h" #include "core/data_type/data_type_struct.h" +#include "core/data_type/data_type_variant.h" #include "core/data_type/define_primitive_type.h" +#include "core/data_type_serde/data_type_serde.h" +#include "core/string_buffer.hpp" +#include "core/value/jsonb_value.h" +#include "core/value/timestamptz_value.h" +#include "core/value/vdatetime_value.h" +#include "exec/common/variant_util.h" #include "format/parquet/level_decoder.h" +#include "format/parquet/parquet_variant_reader.h" #include "format/parquet/schema_desc.h" #include "format/parquet/vparquet_column_chunk_reader.h" #include "io/fs/tracing_file_reader.h" #include "runtime/runtime_profile.h" +#include "util/jsonb_document.h" namespace doris { static void fill_struct_null_map(FieldSchema* field, NullMap& null_map, @@ -103,6 +127,1837 @@ static void fill_array_offset(FieldSchema* field, ColumnArray::Offsets64& offset } } +static constexpr int64_t UNIX_EPOCH_DAYNR = 719528; +static constexpr int64_t MICROS_PER_SECOND = 1000000; + +static int64_t variant_date_value(const VecDateTimeValue& value) { + return value.daynr() - UNIX_EPOCH_DAYNR; +} + +static int64_t 
variant_date_value(const DateV2Value& value) { + return value.daynr() - UNIX_EPOCH_DAYNR; +} + +static int64_t variant_datetime_value(const VecDateTimeValue& value) { + int64_t timestamp = 0; + value.unix_timestamp(×tamp, cctz::utc_time_zone()); + return timestamp * MICROS_PER_SECOND; +} + +static int64_t variant_datetime_value(const DateV2Value& value) { + int64_t timestamp = 0; + value.unix_timestamp(×tamp, cctz::utc_time_zone()); + return timestamp * MICROS_PER_SECOND + value.microsecond(); +} + +static int64_t variant_datetime_value(const TimestampTzValue& value) { + int64_t timestamp = 0; + value.unix_timestamp(×tamp, cctz::utc_time_zone()); + return timestamp * MICROS_PER_SECOND + value.microsecond(); +} + +static int find_child_idx(const FieldSchema& field, std::string_view name) { + for (int i = 0; i < field.children.size(); ++i) { + if (field.children[i].lower_case_name == name) { + return i; + } + } + return -1; +} + +static bool is_variant_wrapper_typed_value_child(const FieldSchema& field) { + auto type = remove_nullable(field.data_type); + return type->get_primitive_type() == TYPE_STRUCT || type->get_primitive_type() == TYPE_ARRAY; +} + +static bool is_unannotated_variant_value_field(const FieldSchema& field) { + // VARIANT residual value is raw binary; annotated strings named value are user fields. 
+ return field.lower_case_name == "value" && field.physical_type == tparquet::Type::BYTE_ARRAY && + !field.parquet_schema.__isset.logicalType && + !field.parquet_schema.__isset.converted_type; +} + +static bool is_unannotated_variant_metadata_field(const FieldSchema& field) { + return field.lower_case_name == "metadata" && + field.physical_type == tparquet::Type::BYTE_ARRAY && + !field.parquet_schema.__isset.logicalType && + !field.parquet_schema.__isset.converted_type; +} + +static bool is_variant_wrapper_field(const FieldSchema& field, + bool allow_scalar_typed_value_only_wrapper) { + auto type = remove_nullable(field.data_type); + if (type->get_primitive_type() != TYPE_STRUCT && type->get_primitive_type() != TYPE_VARIANT) { + return false; + } + + bool has_metadata = false; + bool has_value = false; + const FieldSchema* typed_value = nullptr; + for (const auto& child : field.children) { + if (child.lower_case_name == "metadata") { + if (!is_unannotated_variant_metadata_field(child)) { + return false; + } + has_metadata = true; + continue; + } + if (child.lower_case_name == "value") { + if (!is_unannotated_variant_value_field(child)) { + return false; + } + has_value = true; + continue; + } + if (child.lower_case_name == "typed_value") { + typed_value = &child; + continue; + } + return false; + } + if (has_metadata) { + return type->get_primitive_type() == TYPE_VARIANT && (has_value || typed_value != nullptr); + } + if (has_value) { + return typed_value != nullptr; + } + return typed_value != nullptr && (allow_scalar_typed_value_only_wrapper || + is_variant_wrapper_typed_value_child(*typed_value)); +} + +static bool is_value_only_variant_wrapper_candidate(const FieldSchema& field) { + auto type = remove_nullable(field.data_type); + if (type->get_primitive_type() != TYPE_STRUCT && type->get_primitive_type() != TYPE_VARIANT) { + return false; + } + + bool has_value = false; + for (const auto& child : field.children) { + if 
(is_unannotated_variant_value_field(child)) { + has_value = true; + continue; + } + return false; + } + return has_value; +} + +static Status get_binary_field(const Field& field, std::string* value, bool* present) { + if (field.is_null()) { + *present = false; + return Status::OK(); + } + *present = true; + switch (field.get_type()) { + case TYPE_STRING: + *value = field.get(); + return Status::OK(); + case TYPE_CHAR: + *value = field.get(); + return Status::OK(); + case TYPE_VARCHAR: + *value = field.get(); + return Status::OK(); + case TYPE_VARBINARY: { + auto ref = field.get().to_string_ref(); + value->assign(ref.data, ref.size); + return Status::OK(); + } + default: + return Status::Corruption("Parquet VARIANT binary field has unexpected Doris type {}", + field.get_type_name()); + } +} + +static PathInData append_path(const PathInData& prefix, const PathInData& suffix) { + if (prefix.empty()) { + return suffix; + } + if (suffix.empty()) { + return prefix; + } + PathInDataBuilder builder; + builder.append(prefix.get_parts(), false); + builder.append(suffix.get_parts(), false); + return builder.build(); +} + +static Status make_jsonb_field(std::string_view json, FieldWithDataType* value) { + JsonBinaryValue jsonb_value; + RETURN_IF_ERROR(jsonb_value.from_json_string(json.data(), json.size())); + value->field = + Field::create_field(JsonbField(jsonb_value.value(), jsonb_value.size())); + value->base_scalar_type_id = TYPE_JSONB; + value->num_dimensions = 0; + value->precision = 0; + value->scale = 0; + return Status::OK(); +} + +static std::string make_null_array_json(size_t elements) { + std::string json = "["; + for (size_t i = 0; i < elements; ++i) { + if (i != 0) { + json.push_back(','); + } + json.append("null"); + } + json.push_back(']'); + return json; +} + +static Status make_empty_object_field(Field* field) { + FieldWithDataType value; + RETURN_IF_ERROR(make_jsonb_field("{}", &value)); + *field = std::move(value.field); + return Status::OK(); +} + +static 
Status insert_jsonb_value(const PathInData& path, std::string_view json, + VariantMap* values) { + FieldWithDataType value; + RETURN_IF_ERROR(make_jsonb_field(json, &value)); + (*values)[path] = std::move(value); + return Status::OK(); +} + +static Status insert_empty_object_marker(const PathInData& path, VariantMap* values) { + return insert_jsonb_value(path, "{}", values); +} + +static bool is_empty_object_marker(const FieldWithDataType& value) { + if (value.field.get_type() != TYPE_JSONB) { + return false; + } + const auto& jsonb = value.field.get(); + const JsonbDocument* document = nullptr; + Status st = + JsonbDocument::checkAndCreateDocument(jsonb.get_value(), jsonb.get_size(), &document); + if (!st.ok() || document == nullptr || document->getValue() == nullptr || + !document->getValue()->isObject()) { + return false; + } + return document->getValue()->unpack()->numElem() == 0; +} + +static Status collect_empty_object_markers(const rapidjson::Value& value, PathInDataBuilder* path, + VariantMap* values) { + if (!value.IsObject()) { + return Status::OK(); + } + if (value.MemberCount() == 0) { + return insert_empty_object_marker(path->build(), values); + } + for (auto it = value.MemberBegin(); it != value.MemberEnd(); ++it) { + if (it->value.IsObject()) { + path->append(std::string_view(it->name.GetString(), it->name.GetStringLength()), false); + RETURN_IF_ERROR(collect_empty_object_markers(it->value, path, values)); + path->pop_back(); + } + } + return Status::OK(); +} + +static Status add_empty_object_markers_from_json(const std::string& json, const PathInData& prefix, + VariantMap* values) { + if (json.find("{}") == std::string::npos) { + return Status::OK(); + } + rapidjson::Document document; + document.Parse(json.data(), json.size()); + if (document.HasParseError()) { + return Status::Corruption("Invalid Parquet VARIANT decoded JSON"); + } + PathInDataBuilder path; + path.append(prefix.get_parts(), false); + return collect_empty_object_markers(document, 
&path, values); +} + +static Status parse_json_to_variant_map(const std::string& json, const PathInData& prefix, + VariantMap* values) { + auto parsed_column = ColumnVariant::create(0, false); + ParseConfig parse_config; + StringRef json_ref(json.data(), json.size()); + RETURN_IF_CATCH_EXCEPTION( + variant_util::parse_json_to_variant(*parsed_column, json_ref, nullptr, parse_config)); + Field parsed = (*parsed_column)[0]; + if (!parsed.is_null()) { + auto& parsed_values = parsed.get(); + for (auto& [path, value] : parsed_values) { + (*values)[append_path(prefix, path)] = std::move(value); + } + } + RETURN_IF_ERROR(add_empty_object_markers_from_json(json, prefix, values)); + return Status::OK(); +} + +static Status variant_map_to_json(VariantMap values, std::string* json) { + auto variant_column = ColumnVariant::create(0, false); + RETURN_IF_CATCH_EXCEPTION( + variant_column->insert(Field::create_field(std::move(values)))); + DataTypeSerDe::FormatOptions options; + variant_column->serialize_one_row_to_string(0, json, options); + return Status::OK(); +} + +static bool path_has_prefix(const PathInData& path, const PathInData& prefix) { + const auto& parts = path.get_parts(); + const auto& prefix_parts = prefix.get_parts(); + if (parts.size() < prefix_parts.size()) { + return false; + } + for (size_t i = 0; i < prefix_parts.size(); ++i) { + if (parts[i] != prefix_parts[i]) { + return false; + } + } + return true; +} + +static bool has_descendant_path(const VariantMap& values, const PathInData& prefix) { + const size_t prefix_size = prefix.get_parts().size(); + return std::ranges::any_of(values, [&](const auto& entry) { + const auto& path = entry.first; + return path.get_parts().size() > prefix_size && path_has_prefix(path, prefix); + }); +} + +static void erase_shadowed_empty_object_markers(VariantMap* values, + const VariantMap& shadowing_values) { + for (auto it = values->begin(); it != values->end();) { + if (is_empty_object_marker(it->second) && + 
(has_descendant_path(*values, it->first) || + has_descendant_path(shadowing_values, it->first))) { + it = values->erase(it); + continue; + } + ++it; + } +} + +static void erase_shadowed_empty_object_markers(VariantMap* value_values, + VariantMap* typed_values) { + erase_shadowed_empty_object_markers(value_values, *typed_values); + erase_shadowed_empty_object_markers(typed_values, *value_values); +} + +static Status check_no_shredded_value_typed_duplicates(const VariantMap& value_values, + const VariantMap& typed_values, + const PathInData& prefix) { + const size_t prefix_size = prefix.get_parts().size(); + for (const auto& value_entry : value_values) { + const auto& value_path = value_entry.first; + if (!path_has_prefix(value_path, prefix)) { + continue; + } + if (value_path.get_parts().size() == prefix_size) { + if (is_empty_object_marker(value_entry.second) && + !has_descendant_path(typed_values, value_path)) { + continue; + } + if (!typed_values.empty()) { + return Status::Corruption( + "Parquet VARIANT residual value conflicts with typed_value at path {}", + value_path.get_path()); + } + continue; + } + for (const auto& typed_entry : typed_values) { + const auto& typed_path = typed_entry.first; + if (!path_has_prefix(typed_path, prefix)) { + continue; + } + if (typed_path.get_parts().size() == prefix_size) { + if (is_empty_object_marker(typed_entry.second) && + !has_descendant_path(value_values, typed_path)) { + continue; + } + return Status::Corruption( + "Parquet VARIANT residual value and typed_value contain duplicate field {}", + value_path.get_parts()[prefix_size].key); + } + if (value_path.get_parts()[prefix_size] == typed_path.get_parts()[prefix_size]) { + if (value_path == typed_path && is_empty_object_marker(value_entry.second) && + is_empty_object_marker(typed_entry.second)) { + continue; + } + return Status::Corruption( + "Parquet VARIANT residual value and typed_value contain duplicate field {}", + value_path.get_parts()[prefix_size].key); + } + } + 
} + return Status::OK(); +} + +static bool has_direct_typed_parent_null(const std::vector& null_maps, size_t row) { + return std::ranges::any_of(null_maps, [&](const NullMap* null_map) { + DCHECK_LT(row, null_map->size()); + return (*null_map)[row]; + }); +} + +static void insert_direct_typed_leaf_range(const IColumn& column, size_t start, size_t rows, + const std::vector& parent_null_maps, + IColumn* variant_leaf) { + auto& nullable_leaf = assert_cast(*variant_leaf); + const IColumn* value_column = &column; + const NullMap* leaf_null_map = nullptr; + if (const auto* nullable_column = check_and_get_column(&column)) { + value_column = &nullable_column->get_nested_column(); + leaf_null_map = &nullable_column->get_null_map_data(); + } + + nullable_leaf.get_nested_column().insert_range_from(*value_column, start, rows); + auto& null_map = nullable_leaf.get_null_map_data(); + null_map.reserve(null_map.size() + rows); + for (size_t i = 0; i < rows; ++i) { + const size_t row = start + i; + const bool leaf_is_null = leaf_null_map != nullptr && (*leaf_null_map)[row]; + null_map.push_back(leaf_is_null || has_direct_typed_parent_null(parent_null_maps, row)); + } +} + +static bool is_temporal_variant_leaf_type(PrimitiveType type) { + switch (type) { + case TYPE_TIMEV2: + case TYPE_DATE: + case TYPE_DATETIME: + case TYPE_DATEV2: + case TYPE_DATETIMEV2: + case TYPE_TIMESTAMPTZ: + return true; + default: + return false; + } +} + +static bool is_floating_point_variant_leaf_type(PrimitiveType type) { + switch (type) { + case TYPE_FLOAT: + case TYPE_DOUBLE: + return true; + default: + return false; + } +} + +static bool is_uuid_typed_value_field(const FieldSchema& field_schema); +static bool contains_uuid_typed_value_field(const FieldSchema& field_schema); + +static DataTypePtr direct_variant_leaf_type(const DataTypePtr& data_type) { + const auto& type = remove_nullable(data_type); + if (is_temporal_variant_leaf_type(type->get_primitive_type())) { + return std::make_shared(); + } + 
return type; +} + +static DataTypePtr direct_variant_leaf_type(const FieldSchema& field_schema) { + const auto& type = remove_nullable(field_schema.data_type); + if (is_uuid_typed_value_field(field_schema)) { + return std::make_shared(); + } + if (type->get_primitive_type() == TYPE_ARRAY) { + DORIS_CHECK(!field_schema.children.empty()); + DataTypePtr nested_type = direct_variant_leaf_type(field_schema.children[0]); + if (field_schema.children[0].data_type->is_nullable()) { + nested_type = make_nullable(nested_type); + } + return std::make_shared(nested_type); + } + return direct_variant_leaf_type(field_schema.data_type); +} + +static bool contains_temporal_variant_leaf_type(const DataTypePtr& data_type) { + const auto& type = remove_nullable(data_type); + if (is_temporal_variant_leaf_type(type->get_primitive_type())) { + return true; + } + if (type->get_primitive_type() == TYPE_ARRAY) { + return contains_temporal_variant_leaf_type( + assert_cast(type.get())->get_nested_type()); + } + return false; +} + +static bool contains_floating_point_variant_leaf_type(const DataTypePtr& data_type) { + const auto& type = remove_nullable(data_type); + if (is_floating_point_variant_leaf_type(type->get_primitive_type())) { + return true; + } + if (type->get_primitive_type() == TYPE_ARRAY) { + return contains_floating_point_variant_leaf_type( + assert_cast(type.get())->get_nested_type()); + } + return false; +} + +static int64_t direct_temporal_variant_value(PrimitiveType type, const IColumn& column, + size_t row) { + switch (type) { + case TYPE_TIMEV2: + return static_cast( + std::llround(assert_cast(column).get_data()[row])); + case TYPE_DATE: + return variant_date_value(assert_cast(column).get_data()[row]); + case TYPE_DATETIME: + return variant_datetime_value(assert_cast(column).get_data()[row]); + case TYPE_DATEV2: + return variant_date_value(assert_cast(column).get_data()[row]); + case TYPE_DATETIMEV2: + return variant_datetime_value(assert_cast(column).get_data()[row]); + 
case TYPE_TIMESTAMPTZ: + return variant_datetime_value( + assert_cast(column).get_data()[row]); + default: + DORIS_CHECK(false); + return 0; + } +} + +static void insert_direct_typed_temporal_leaf_range( + PrimitiveType type, const IColumn& column, size_t start, size_t rows, + const std::vector& parent_null_maps, IColumn* variant_leaf) { + auto& nullable_leaf = assert_cast(*variant_leaf); + const IColumn* value_column = &column; + const NullMap* leaf_null_map = nullptr; + if (const auto* nullable_column = check_and_get_column(&column)) { + value_column = &nullable_column->get_nested_column(); + leaf_null_map = &nullable_column->get_null_map_data(); + } + + auto& data = assert_cast(nullable_leaf.get_nested_column()).get_data(); + data.reserve(data.size() + rows); + auto& null_map = nullable_leaf.get_null_map_data(); + null_map.reserve(null_map.size() + rows); + for (size_t i = 0; i < rows; ++i) { + const size_t row = start + i; + const bool leaf_is_null = leaf_null_map != nullptr && (*leaf_null_map)[row]; + const bool is_null = leaf_is_null || has_direct_typed_parent_null(parent_null_maps, row); + if (is_null) { + data.push_back(0); + null_map.push_back(1); + continue; + } + data.push_back(direct_temporal_variant_value(type, *value_column, row)); + null_map.push_back(0); + } +} + +static Status insert_direct_typed_uuid_leaf_range( + const IColumn& column, size_t start, size_t rows, + const std::vector& parent_null_maps, IColumn* variant_leaf) { + auto& nullable_leaf = assert_cast(*variant_leaf); + const IColumn* value_column = &column; + const NullMap* leaf_null_map = nullptr; + if (const auto* nullable_column = check_and_get_column(&column)) { + value_column = &nullable_column->get_nested_column(); + leaf_null_map = &nullable_column->get_null_map_data(); + } + + auto& data = assert_cast(nullable_leaf.get_nested_column()); + auto& null_map = nullable_leaf.get_null_map_data(); + null_map.reserve(null_map.size() + rows); + for (size_t i = 0; i < rows; ++i) { + const 
size_t row = start + i; + const bool leaf_is_null = leaf_null_map != nullptr && (*leaf_null_map)[row]; + const bool is_null = leaf_is_null || has_direct_typed_parent_null(parent_null_maps, row); + if (is_null) { + data.insert_default(); + null_map.push_back(1); + continue; + } + StringRef bytes = value_column->get_data_at(row); + if (bytes.size != 16) { + return Status::Corruption("Parquet VARIANT UUID typed_value has invalid length {}", + bytes.size); + } + std::string uuid = + parquet::format_variant_uuid(reinterpret_cast(bytes.data)); + data.insert_data(uuid.data(), uuid.size()); + null_map.push_back(0); + } + return Status::OK(); +} + +static void append_json_string(std::string_view value, std::string* json) { + auto column = ColumnString::create(); + VectorBufferWriter writer(*column); + writer.write_json_string(value); + writer.commit(); + json->append(column->get_data_at(0).data, column->get_data_at(0).size); +} + +static bool is_column_selected(const FieldSchema& field_schema, + const std::set& column_ids) { + return column_ids.empty() || column_ids.find(field_schema.get_column_id()) != column_ids.end(); +} + +static bool has_selected_column(const FieldSchema& field_schema, + const std::set& column_ids) { + if (is_column_selected(field_schema, column_ids)) { + return true; + } + return std::any_of(field_schema.children.begin(), field_schema.children.end(), + [&column_ids](const FieldSchema& child) { + return has_selected_column(child, column_ids); + }); +} + +static bool is_direct_variant_leaf_type(const DataTypePtr& data_type) { + const auto& type = remove_nullable(data_type); + switch (type->get_primitive_type()) { + case TYPE_BOOLEAN: + case TYPE_TINYINT: + case TYPE_SMALLINT: + case TYPE_INT: + case TYPE_BIGINT: + case TYPE_LARGEINT: + case TYPE_DECIMALV2: + case TYPE_DECIMAL32: + case TYPE_DECIMAL64: + case TYPE_DECIMAL128I: + case TYPE_DECIMAL256: + case TYPE_FLOAT: + case TYPE_DOUBLE: + case TYPE_STRING: + case TYPE_CHAR: + case TYPE_VARCHAR: + case 
TYPE_VARBINARY: + return true; + case TYPE_TIMEV2: + case TYPE_DATE: + case TYPE_DATETIME: + case TYPE_DATEV2: + case TYPE_DATETIMEV2: + case TYPE_TIMESTAMPTZ: + return true; + case TYPE_ARRAY: { + const auto* array_type = assert_cast(type.get()); + return is_direct_variant_leaf_type(array_type->get_nested_type()); + } + default: + return false; + } +} + +static bool can_direct_read_typed_value(const FieldSchema& field_schema, bool allow_variant_wrapper, + const std::set& column_ids) { + if (!has_selected_column(field_schema, column_ids)) { + return true; + } + if (allow_variant_wrapper && is_variant_wrapper_field(field_schema, false)) { + const int value_idx = find_child_idx(field_schema, "value"); + const int typed_value_idx = find_child_idx(field_schema, "typed_value"); + return (value_idx < 0 || + !has_selected_column(field_schema.children[value_idx], column_ids)) && + typed_value_idx >= 0 && + can_direct_read_typed_value(field_schema.children[typed_value_idx], false, + column_ids); + } + + const auto& type = remove_nullable(field_schema.data_type); + if (type->get_primitive_type() == TYPE_STRUCT) { + return std::all_of(field_schema.children.begin(), field_schema.children.end(), + [&column_ids](const FieldSchema& child) { + return can_direct_read_typed_value(child, true, column_ids); + }); + } + return is_direct_variant_leaf_type(field_schema.data_type); +} + +static bool has_selected_direct_typed_leaf(const FieldSchema& field_schema, + bool allow_variant_wrapper, + const std::set& column_ids) { + if (!has_selected_column(field_schema, column_ids)) { + return false; + } + if (allow_variant_wrapper && is_variant_wrapper_field(field_schema, false)) { + const int typed_value_idx = find_child_idx(field_schema, "typed_value"); + DCHECK_GE(typed_value_idx, 0); + return has_selected_direct_typed_leaf(field_schema.children[typed_value_idx], false, + column_ids); + } + + const auto& type = remove_nullable(field_schema.data_type); + if (type->get_primitive_type() == 
TYPE_STRUCT) { + return std::any_of(field_schema.children.begin(), field_schema.children.end(), + [&column_ids](const FieldSchema& child) { + return has_selected_direct_typed_leaf(child, true, column_ids); + }); + } + return is_direct_variant_leaf_type(field_schema.data_type); +} + +static bool can_use_direct_typed_only_value(const FieldSchema& variant_field, + const std::set& column_ids) { + const int value_idx = find_child_idx(variant_field, "value"); + const int typed_value_idx = find_child_idx(variant_field, "typed_value"); + return (value_idx < 0 || !has_selected_column(variant_field.children[value_idx], column_ids)) && + typed_value_idx >= 0 && + has_selected_direct_typed_leaf(variant_field.children[typed_value_idx], false, + column_ids) && + can_direct_read_typed_value(variant_field.children[typed_value_idx], false, column_ids); +} + +static DataTypePtr make_variant_struct_reader_type(const FieldSchema& field) { + DataTypes child_types; + Strings child_names; + child_types.reserve(field.children.size()); + child_names.reserve(field.children.size()); + for (const auto& child : field.children) { + child_types.push_back(make_nullable(child.data_type)); + child_names.push_back(child.name); + } + return std::make_shared(child_types, child_names); +} + +static ColumnPtr make_variant_struct_read_column(const FieldSchema& field, + const DataTypePtr& variant_struct_type) { + if (field.data_type->is_nullable()) { + return make_nullable(variant_struct_type)->create_column(); + } + return variant_struct_type->create_column(); +} + +static void fill_variant_field_info(FieldWithDataType* value) { + FieldInfo info; + variant_util::get_field_info(value->field, &info); + DCHECK_LE(info.num_dimensions, std::numeric_limits::max()); + value->base_scalar_type_id = info.scalar_type_id; + value->num_dimensions = static_cast(info.num_dimensions); +} + +static void fill_variant_leaf_type_info(const DataTypePtr& data_type, FieldWithDataType* value) { + auto leaf_type = 
remove_nullable(data_type); + size_t num_dimensions = 0; + while (leaf_type->get_primitive_type() == TYPE_ARRAY) { + ++num_dimensions; + leaf_type = remove_nullable( + assert_cast(leaf_type.get())->get_nested_type()); + } + DCHECK_LE(num_dimensions, std::numeric_limits::max()); + if (value->base_scalar_type_id == INVALID_TYPE) { + value->base_scalar_type_id = leaf_type->get_primitive_type(); + } + if (value->num_dimensions == 0 && num_dimensions > 0) { + value->num_dimensions = static_cast(num_dimensions); + } + if (is_decimal(leaf_type->get_primitive_type())) { + value->precision = leaf_type->get_precision(); + value->scale = leaf_type->get_scale(); + } +} + +static Status fill_floating_point_variant_field(const Field& field, FieldWithDataType* value) { + value->field = field; + fill_variant_field_info(value); + return Status::OK(); +} + +static Status fill_floating_point_variant_field(PrimitiveType type, const Field& field, + FieldWithDataType* value) { + DORIS_CHECK(type == TYPE_FLOAT || type == TYPE_DOUBLE); + return fill_floating_point_variant_field(field, value); +} + +static bool is_uuid_typed_value_field(const FieldSchema& field_schema) { + return field_schema.parquet_schema.__isset.logicalType && + field_schema.parquet_schema.logicalType.__isset.UUID; +} + +static bool contains_uuid_typed_value_field(const FieldSchema& field_schema) { + return is_uuid_typed_value_field(field_schema) || + std::any_of( + field_schema.children.begin(), field_schema.children.end(), + [](const FieldSchema& child) { return contains_uuid_typed_value_field(child); }); +} + +static Status uuid_field_to_string(const Field& field, std::string* uuid) { + StringRef bytes; + switch (field.get_type()) { + case TYPE_STRING: + bytes = StringRef(field.get()); + break; + case TYPE_CHAR: + bytes = StringRef(field.get()); + break; + case TYPE_VARCHAR: + bytes = StringRef(field.get()); + break; + case TYPE_VARBINARY: + bytes = field.get().to_string_ref(); + break; + default: + return 
Status::Corruption("Parquet VARIANT UUID typed_value has unexpected Doris type {}", + field.get_type_name()); + } + if (bytes.size != 16) { + return Status::Corruption("Parquet VARIANT UUID typed_value has invalid length {}", + bytes.size); + } + *uuid = parquet::format_variant_uuid(reinterpret_cast(bytes.data)); + return Status::OK(); +} + +static Status fill_uuid_variant_field(const Field& field, FieldWithDataType* value) { + std::string uuid; + RETURN_IF_ERROR(uuid_field_to_string(field, &uuid)); + value->field = Field::create_field(std::move(uuid)); + value->base_scalar_type_id = TYPE_STRING; + return Status::OK(); +} + +static Status fill_temporal_variant_field(PrimitiveType type, const Field& field, + FieldWithDataType* value) { + switch (type) { + case TYPE_TIMEV2: + value->field = Field::create_field( + static_cast(std::llround(field.get()))); + value->base_scalar_type_id = TYPE_BIGINT; + return Status::OK(); + case TYPE_DATE: + value->field = Field::create_field(variant_date_value(field.get())); + value->base_scalar_type_id = TYPE_BIGINT; + return Status::OK(); + case TYPE_DATETIME: + value->field = Field::create_field( + variant_datetime_value(field.get())); + value->base_scalar_type_id = TYPE_BIGINT; + return Status::OK(); + case TYPE_DATEV2: + value->field = + Field::create_field(variant_date_value(field.get())); + value->base_scalar_type_id = TYPE_BIGINT; + return Status::OK(); + case TYPE_DATETIMEV2: + value->field = Field::create_field( + variant_datetime_value(field.get())); + value->base_scalar_type_id = TYPE_BIGINT; + return Status::OK(); + case TYPE_TIMESTAMPTZ: + value->field = Field::create_field( + variant_datetime_value(field.get())); + value->base_scalar_type_id = TYPE_BIGINT; + return Status::OK(); + default: + DORIS_CHECK(false); + return Status::OK(); + } +} + +static uint8_t direct_array_dimensions(const DataTypePtr& data_type) { + uint8_t num_dimensions = 0; + auto type = remove_nullable(data_type); + while (type->get_primitive_type() 
== TYPE_ARRAY) { + ++num_dimensions; + type = remove_nullable(assert_cast(type.get())->get_nested_type()); + } + return num_dimensions; +} + +static PrimitiveType direct_array_base_scalar_type(const FieldSchema& field_schema) { + auto leaf_type = remove_nullable(direct_variant_leaf_type(field_schema)); + while (leaf_type->get_primitive_type() == TYPE_ARRAY) { + leaf_type = remove_nullable( + assert_cast(leaf_type.get())->get_nested_type()); + } + return leaf_type->get_primitive_type(); +} + +static Status convert_direct_array_value(const FieldSchema& field_schema, const Field& field, + Field* converted) { + if (field.is_null()) { + *converted = Field(); + return Status::OK(); + } + + const auto& type = remove_nullable(field_schema.data_type); + if (type->get_primitive_type() == TYPE_ARRAY) { + if (field_schema.children.empty()) { + return Status::Corruption("Parquet VARIANT array typed_value has no element schema"); + } + Array converted_elements; + const auto& elements = field.get(); + converted_elements.reserve(elements.size()); + for (const auto& element : elements) { + Field converted_element; + RETURN_IF_ERROR(convert_direct_array_value(field_schema.children[0], element, + &converted_element)); + converted_elements.push_back(std::move(converted_element)); + } + *converted = Field::create_field(std::move(converted_elements)); + return Status::OK(); + } + + if (is_uuid_typed_value_field(field_schema)) { + FieldWithDataType value; + RETURN_IF_ERROR(fill_uuid_variant_field(field, &value)); + *converted = std::move(value.field); + return Status::OK(); + } + if (is_temporal_variant_leaf_type(type->get_primitive_type())) { + FieldWithDataType value; + RETURN_IF_ERROR(fill_temporal_variant_field(type->get_primitive_type(), field, &value)); + *converted = std::move(value.field); + return Status::OK(); + } + if (is_floating_point_variant_leaf_type(type->get_primitive_type())) { + FieldWithDataType value; + RETURN_IF_ERROR( + 
fill_floating_point_variant_field(type->get_primitive_type(), field, &value)); + *converted = std::move(value.field); + return Status::OK(); + } + + *converted = field; + return Status::OK(); +} + +static Status insert_direct_typed_array_leaf_range( + const FieldSchema& field_schema, const IColumn& column, size_t start, size_t rows, + const std::vector& parent_null_maps, IColumn* variant_leaf) { + auto& nullable_leaf = assert_cast(*variant_leaf); + const IColumn* value_column = &column; + const NullMap* leaf_null_map = nullptr; + if (const auto* nullable_column = check_and_get_column(&column)) { + value_column = &nullable_column->get_nested_column(); + leaf_null_map = &nullable_column->get_null_map_data(); + } + + auto& data = nullable_leaf.get_nested_column(); + auto& null_map = nullable_leaf.get_null_map_data(); + null_map.reserve(null_map.size() + rows); + for (size_t i = 0; i < rows; ++i) { + const size_t row = start + i; + const bool leaf_is_null = leaf_null_map != nullptr && (*leaf_null_map)[row]; + const bool is_null = leaf_is_null || has_direct_typed_parent_null(parent_null_maps, row); + if (is_null) { + data.insert_default(); + null_map.push_back(1); + continue; + } + + Field field; + value_column->get(row, field); + Field converted; + RETURN_IF_ERROR(convert_direct_array_value(field_schema, field, &converted)); + data.insert(converted); + null_map.push_back(0); + } + return Status::OK(); +} + +static Status fill_direct_array_variant_field(const FieldSchema& field_schema, const Field& field, + FieldWithDataType* value, bool* present) { + if (field.is_null()) { + *present = false; + return Status::OK(); + } + *present = true; + RETURN_IF_ERROR(convert_direct_array_value(field_schema, field, &value->field)); + value->base_scalar_type_id = direct_array_base_scalar_type(field_schema); + value->num_dimensions = direct_array_dimensions(field_schema.data_type); + return Status::OK(); +} + +static Status field_to_variant_field(const FieldSchema& field_schema, 
const Field& field, + FieldWithDataType* value, bool* present) { + if (field.is_null()) { + *present = false; + return Status::OK(); + } + *present = true; + if (is_uuid_typed_value_field(field_schema)) { + return fill_uuid_variant_field(field, value); + } + const DataTypePtr& type = remove_nullable(field_schema.data_type); + if (is_temporal_variant_leaf_type(type->get_primitive_type())) { + return fill_temporal_variant_field(type->get_primitive_type(), field, value); + } + switch (type->get_primitive_type()) { + case TYPE_BOOLEAN: + case TYPE_TINYINT: + case TYPE_SMALLINT: + case TYPE_INT: + case TYPE_BIGINT: + case TYPE_LARGEINT: + case TYPE_DECIMALV2: + case TYPE_DECIMAL32: + case TYPE_DECIMAL64: + case TYPE_DECIMAL128I: + case TYPE_DECIMAL256: + case TYPE_STRING: + case TYPE_CHAR: + case TYPE_VARCHAR: + case TYPE_VARBINARY: + case TYPE_ARRAY: + value->field = field; + fill_variant_field_info(value); + fill_variant_leaf_type_info(type, value); + return Status::OK(); + case TYPE_FLOAT: + case TYPE_DOUBLE: + return fill_floating_point_variant_field(field, value); + default: + return Status::Corruption("Unsupported Parquet VARIANT typed_value Doris type {}", + type->get_name()); + } +} + +static Status typed_value_to_json(const FieldSchema& typed_value_field, const Field& field, + const std::string& metadata, std::string* json, bool* present); +static Status typed_map_to_variant_map(const FieldSchema& typed_value_field, const Field& field, + const std::string& metadata, PathInDataBuilder* path, + VariantMap* values, bool* present, + std::deque* string_values); + +static Status serialize_field_to_json(const DataTypePtr& data_type, const Field& field, + std::string* json) { + MutableColumnPtr column = data_type->create_column(); + column->insert(field); + + auto json_column = ColumnString::create(); + VectorBufferWriter writer(*json_column); + auto serde = data_type->get_serde(); + DataTypeSerDe::FormatOptions options; + 
RETURN_IF_ERROR(serde->serialize_one_cell_to_json(*column, 0, writer, options)); + writer.commit(); + *json = json_column->get_data_at(0).to_string(); + return Status::OK(); +} + +static Status scalar_typed_value_to_json(const FieldSchema& field_schema, const Field& field, + std::string* json, bool* present) { + FieldWithDataType value; + RETURN_IF_ERROR(field_to_variant_field(field_schema, field, &value, present)); + if (!*present) { + return Status::OK(); + } + if (value.field.is_null()) { + *json = "null"; + return Status::OK(); + } + if (!is_uuid_typed_value_field(field_schema) && + remove_nullable(field_schema.data_type)->get_primitive_type() == TYPE_VARBINARY) { + return Status::NotSupported( + "Parquet VARIANT binary typed_value cannot be serialized to JSON"); + } + + DataTypePtr json_type; + if (value.base_scalar_type_id != PrimitiveType::INVALID_TYPE) { + json_type = DataTypeFactory::instance().create_data_type(value.base_scalar_type_id, false, + value.precision, value.scale); + } else { + json_type = remove_nullable(field_schema.data_type); + } + return serialize_field_to_json(json_type, value.field, json); +} + +static Status resolve_variant_metadata(const FieldSchema& variant_field, const Struct& fields, + const std::string* inherited_metadata, std::string* metadata, + bool* has_metadata) { + *has_metadata = false; + if (inherited_metadata != nullptr) { + *metadata = *inherited_metadata; + *has_metadata = true; + } + + const int metadata_idx = find_child_idx(variant_field, "metadata"); + if (metadata_idx >= 0) { + bool metadata_present = false; + RETURN_IF_ERROR(get_binary_field(fields[metadata_idx], metadata, &metadata_present)); + *has_metadata = metadata_present; + } + return Status::OK(); +} + +static Status variant_typed_value_to_json(const FieldSchema& variant_field, const Struct& fields, + const std::string& metadata, std::string* typed_json, + bool* typed_present) { + *typed_present = false; + const int typed_value_idx = 
find_child_idx(variant_field, "typed_value"); + if (typed_value_idx < 0) { + return Status::OK(); + } + return typed_value_to_json(variant_field.children[typed_value_idx], fields[typed_value_idx], + metadata, typed_json, typed_present); +} + +static Status variant_residual_value_to_json(const FieldSchema& variant_field, const Struct& fields, + const std::string& metadata, bool has_metadata, + std::string* value_json, bool* value_present) { + *value_present = false; + const int value_idx = find_child_idx(variant_field, "value"); + if (value_idx < 0) { + return Status::OK(); + } + + std::string value; + RETURN_IF_ERROR(get_binary_field(fields[value_idx], &value, value_present)); + if (!*value_present) { + return Status::OK(); + } + if (!has_metadata) { + return Status::Corruption("Parquet VARIANT value is present without metadata"); + } + return parquet::decode_variant_to_json(StringRef(metadata.data(), metadata.size()), + StringRef(value.data(), value.size()), value_json); +} + +static Status merge_variant_value_and_typed_json(const std::string& value_json, + const std::string& typed_json, std::string* json) { + VariantMap value_values; + RETURN_IF_ERROR(parse_json_to_variant_map(value_json, PathInData(), &value_values)); + VariantMap typed_values; + RETURN_IF_ERROR(parse_json_to_variant_map(typed_json, PathInData(), &typed_values)); + erase_shadowed_empty_object_markers(&value_values, &typed_values); + auto root_value = value_values.find(PathInData()); + if (root_value != value_values.end() && !is_empty_object_marker(root_value->second)) { + return Status::Corruption( + "Parquet VARIANT has conflicting non-object value and typed_value"); + } + RETURN_IF_ERROR( + check_no_shredded_value_typed_duplicates(value_values, typed_values, PathInData())); + value_values.merge(std::move(typed_values)); + return variant_map_to_json(std::move(value_values), json); +} + +static Status variant_to_json(const FieldSchema& variant_field, const Field& field, + const std::string* 
inherited_metadata, std::string* json, + bool* present) { + if (field.is_null()) { + *present = false; + return Status::OK(); + } + + const auto& fields = field.get(); + std::string metadata; + bool has_metadata = false; + RETURN_IF_ERROR(resolve_variant_metadata(variant_field, fields, inherited_metadata, &metadata, + &has_metadata)); + + std::string typed_json; + bool typed_present = false; + RETURN_IF_ERROR(variant_typed_value_to_json(variant_field, fields, metadata, &typed_json, + &typed_present)); + + std::string value_json; + bool value_present = false; + RETURN_IF_ERROR(variant_residual_value_to_json(variant_field, fields, metadata, has_metadata, + &value_json, &value_present)); + + if (value_present && typed_present) { + RETURN_IF_ERROR(merge_variant_value_and_typed_json(value_json, typed_json, json)); + *present = true; + return Status::OK(); + } + + if (typed_present) { + *json = std::move(typed_json); + *present = true; + return Status::OK(); + } + if (value_present) { + *json = std::move(value_json); + *present = true; + return Status::OK(); + } + + *present = false; + return Status::OK(); +} + +static Status shredded_field_to_json(const FieldSchema& field_schema, const Field& field, + const std::string& metadata, std::string* json, bool* present, + bool allow_scalar_typed_value_only_wrapper) { + if (is_variant_wrapper_field(field_schema, allow_scalar_typed_value_only_wrapper)) { + return variant_to_json(field_schema, field, &metadata, json, present); + } + if (is_value_only_variant_wrapper_candidate(field_schema)) { + Status st = variant_to_json(field_schema, field, &metadata, json, present); + if (st.ok()) { + return st; + } + if (!st.is()) { + return st; + } + } + return typed_value_to_json(field_schema, field, metadata, json, present); +} + +static Status typed_array_to_json(const FieldSchema& typed_value_field, const Field& field, + const std::string& metadata, std::string* json, bool* present) { + if (field.is_null()) { + *present = false; + return 
Status::OK(); + } + if (typed_value_field.children.empty()) { + return Status::Corruption("Parquet VARIANT array typed_value has no element schema"); + } + + const auto& elements = field.get(); + const auto& element_schema = typed_value_field.children[0]; + json->clear(); + json->push_back('['); + for (size_t i = 0; i < elements.size(); ++i) { + if (i != 0) { + json->push_back(','); + } + std::string element_json; + bool element_present = false; + RETURN_IF_ERROR(shredded_field_to_json(element_schema, elements[i], metadata, &element_json, + &element_present, true)); + if (!element_present) { + if (elements[i].is_null()) { + json->append("null"); + continue; + } + return Status::Corruption("Parquet VARIANT array element is missing"); + } + json->append(element_json); + } + json->push_back(']'); + *present = true; + return Status::OK(); +} + +static Status typed_struct_to_json(const FieldSchema& typed_value_field, const Field& field, + const std::string& metadata, std::string* json, bool* present) { + if (field.is_null()) { + *present = false; + return Status::OK(); + } + + const auto& fields = field.get(); + json->clear(); + json->push_back('{'); + bool first = true; + for (int i = 0; i < typed_value_field.children.size(); ++i) { + std::string child_json; + bool child_present = false; + RETURN_IF_ERROR(shredded_field_to_json(typed_value_field.children[i], fields[i], metadata, + &child_json, &child_present, false)); + if (!child_present) { + continue; + } + if (!first) { + json->push_back(','); + } + append_json_string(typed_value_field.children[i].name, json); + json->push_back(':'); + json->append(child_json); + first = false; + } + json->push_back('}'); + *present = true; + return Status::OK(); +} + +static Status typed_value_to_json(const FieldSchema& typed_value_field, const Field& field, + const std::string& metadata, std::string* json, bool* present) { + const DataTypePtr& typed_type = remove_nullable(typed_value_field.data_type); + switch 
(typed_type->get_primitive_type()) { + case TYPE_STRUCT: + return typed_struct_to_json(typed_value_field, field, metadata, json, present); + case TYPE_ARRAY: + return typed_array_to_json(typed_value_field, field, metadata, json, present); + case TYPE_MAP: { + VariantMap values; + PathInDataBuilder path; + std::deque string_values; + RETURN_IF_ERROR(typed_map_to_variant_map(typed_value_field, field, metadata, &path, &values, + present, &string_values)); + if (!*present) { + return Status::OK(); + } + return variant_map_to_json(std::move(values), json); + } + default: + return scalar_typed_value_to_json(typed_value_field, field, json, present); + } +} + +static Status typed_value_to_variant_map(const FieldSchema& typed_value_field, const Field& field, + const std::string& metadata, PathInDataBuilder* path, + VariantMap* values, bool* present, + std::deque* string_values); + +static Status variant_to_variant_map(const FieldSchema& variant_field, const Field& field, + const std::string* inherited_metadata, PathInDataBuilder* path, + VariantMap* values, bool* present, + std::deque* string_values) { + if (field.is_null()) { + *present = false; + return Status::OK(); + } + const auto& fields = field.get(); + const int metadata_idx = find_child_idx(variant_field, "metadata"); + const int value_idx = find_child_idx(variant_field, "value"); + const int typed_value_idx = find_child_idx(variant_field, "typed_value"); + + std::string metadata; + bool has_metadata = false; + if (inherited_metadata != nullptr) { + metadata = *inherited_metadata; + has_metadata = true; + } + if (metadata_idx >= 0) { + bool metadata_present = false; + RETURN_IF_ERROR(get_binary_field(fields[metadata_idx], &metadata, &metadata_present)); + has_metadata = metadata_present; + } + + VariantMap value_values; + bool value_present = false; + const PathInData current_path = path->build(); + if (value_idx >= 0) { + std::string value; + RETURN_IF_ERROR(get_binary_field(fields[value_idx], &value, 
&value_present));
+        if (value_present) {
+            if (!has_metadata) {
+                return Status::Corruption("Parquet VARIANT value is present without metadata");
+            }
+            RETURN_IF_ERROR(parquet::decode_variant_to_variant_map(
+                    StringRef(metadata.data(), metadata.size()),
+                    StringRef(value.data(), value.size()), current_path, &value_values,
+                    string_values));
+        }
+    }
+
+    VariantMap typed_values;
+    bool typed_present = false;
+    if (typed_value_idx >= 0) {
+        RETURN_IF_ERROR(typed_value_to_variant_map(variant_field.children[typed_value_idx],
+                                                   fields[typed_value_idx], metadata, path,
+                                                   &typed_values, &typed_present, string_values));
+    }
+
+    erase_shadowed_empty_object_markers(&value_values, &typed_values);
+    auto current_value = value_values.find(current_path);
+    if (value_present && typed_present && current_value != value_values.end() &&
+        !is_empty_object_marker(current_value->second)) {
+        return Status::Corruption(
+                "Parquet VARIANT has conflicting non-object value and typed_value");
+    }
+    RETURN_IF_ERROR(
+            check_no_shredded_value_typed_duplicates(value_values, typed_values, current_path));
+    values->merge(std::move(value_values));
+    values->merge(std::move(typed_values));
+    *present = value_present || typed_present;
+    return Status::OK();
+}
+
+static Status shredded_field_to_variant_map(const FieldSchema& field_schema, const Field& field,
+                                            const std::string& metadata, PathInDataBuilder* path,
+                                            VariantMap* values, bool* present,
+                                            std::deque<std::string>* string_values) {
+    if (is_variant_wrapper_field(field_schema, false)) {
+        return variant_to_variant_map(field_schema, field, &metadata, path, values, present,
+                                      string_values);
+    }
+    if (is_value_only_variant_wrapper_candidate(field_schema)) {
+        Status st = variant_to_variant_map(field_schema, field, &metadata, path, values, present,
+                                           string_values);
+        if (st.ok()) {
+            return st;
+        }
+        if (!st.is()) {
+            return st;
+        }
+    }
+    return typed_value_to_variant_map(field_schema, field, metadata, path, values, present,
+                                      string_values);
+}
+
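The wrapper handling above merges fields recovered from the residual binary `value` column with fields read from shredded `typed_value` leaves, and treats a path that appears non-trivially in both as corruption. A minimal standalone sketch of that merge rule, using an illustrative `std::map`-based path map and an exception instead of the Doris `VariantMap`/`Status` types (names here are hypothetical, not Doris APIs):

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Toy model: each reconstructed row maps a dotted path to a scalar value.
// "residual" holds fields decoded from the binary `value` column,
// "typed" holds fields read from shredded `typed_value` leaves.
using PathMap = std::map<std::string, std::string>;

PathMap merge_shredded_row(const PathMap& residual, const PathMap& typed) {
    PathMap merged = residual;
    for (const auto& [path, value] : typed) {
        // A path may come from only one source; a duplicate means the writer
        // emitted the same field both shredded and residual, which the reader
        // rejects as corruption.
        auto [it, inserted] = merged.emplace(path, value);
        if (!inserted) {
            throw std::runtime_error("conflicting value and typed_value at " + path);
        }
    }
    return merged;
}
```

The real reader additionally tolerates empty-object markers on the conflicting path, which this sketch omits.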
+static Status append_typed_field_to_variant_map(const FieldSchema& typed_value_field,
+                                                const Field& field, PathInDataBuilder* path,
+                                                VariantMap* values, bool* present) {
+    FieldWithDataType value;
+    RETURN_IF_ERROR(field_to_variant_field(typed_value_field, field, &value, present));
+    if (*present) {
+        (*values)[path->build()] = std::move(value);
+    }
+    return Status::OK();
+}
+
+static void move_variant_map_to_field(VariantMap&& element_values, FieldWithDataType* value) {
+    if (element_values.size() == 1 && element_values.begin()->first.empty()) {
+        *value = std::move(element_values.begin()->second);
+        return;
+    }
+    value->field = Field::create_field<TYPE_VARIANT>(std::move(element_values));
+    fill_variant_field_info(value);
+}
+
+static Status typed_array_to_variant_map(const FieldSchema& typed_value_field, const Field& field,
+                                         const std::string& metadata, PathInDataBuilder* path,
+                                         VariantMap* values, bool* present,
+                                         std::deque<std::string>* string_values) {
+    if ((contains_uuid_typed_value_field(typed_value_field) ||
+         contains_temporal_variant_leaf_type(typed_value_field.data_type) ||
+         contains_floating_point_variant_leaf_type(typed_value_field.data_type)) &&
+        is_direct_variant_leaf_type(typed_value_field.data_type)) {
+        FieldWithDataType value;
+        RETURN_IF_ERROR(fill_direct_array_variant_field(typed_value_field, field, &value, present));
+        if (*present) {
+            (*values)[path->build()] = std::move(value);
+        }
+        return Status::OK();
+    }
+    if (is_direct_variant_leaf_type(typed_value_field.data_type)) {
+        return append_typed_field_to_variant_map(typed_value_field, field, path, values, present);
+    }
+
+    if (field.is_null()) {
+        *present = false;
+        return Status::OK();
+    }
+    if (typed_value_field.children.empty()) {
+        return Status::Corruption("Parquet VARIANT array typed_value has no element schema");
+    }
+
+    const auto& elements = field.get<Array>();
+    const auto& element_schema = typed_value_field.children[0];
+    Array array;
+    array.reserve(elements.size());
+    for (const auto& element :
elements) {
+        VariantMap element_values;
+        bool element_present = false;
+        PathInDataBuilder element_path;
+        RETURN_IF_ERROR(shredded_field_to_variant_map(element_schema, element, metadata,
+                                                      &element_path, &element_values,
+                                                      &element_present, string_values));
+        if (!element_present) {
+            if (element.is_null()) {
+                array.push_back(Field());
+                continue;
+            }
+            return Status::Corruption("Parquet VARIANT array element is missing");
+        }
+
+        FieldWithDataType element_value;
+        move_variant_map_to_field(std::move(element_values), &element_value);
+        array.push_back(std::move(element_value.field));
+    }
+
+    FieldWithDataType value;
+    const size_t elements_count = array.size();
+    value.field = Field::create_field<TYPE_ARRAY>(std::move(array));
+    fill_variant_field_info(&value);
+    if (value.base_scalar_type_id == INVALID_TYPE) {
+        RETURN_IF_ERROR(make_jsonb_field(make_null_array_json(elements_count), &value));
+    }
+    (*values)[path->build()] = std::move(value);
+    *present = true;
+    return Status::OK();
+}
+
+static Status typed_map_to_variant_map(const FieldSchema& typed_value_field, const Field& field,
+                                       const std::string& metadata, PathInDataBuilder* path,
+                                       VariantMap* values, bool* present,
+                                       std::deque<std::string>* string_values) {
+    if (field.is_null()) {
+        *present = false;
+        return Status::OK();
+    }
+    if (typed_value_field.children.size() != 2) {
+        return Status::Corruption("Parquet VARIANT map typed_value has {} child fields",
+                                  typed_value_field.children.size());
+    }
+
+    const auto& map = field.get<Map>();
+    DORIS_CHECK(map.size() == 2);
+    DORIS_CHECK(map[0].get_type() == TYPE_ARRAY);
+    DORIS_CHECK(map[1].get_type() == TYPE_ARRAY);
+    const auto& keys = map[0].get<Array>();
+    const auto& value_fields = map[1].get<Array>();
+    DORIS_CHECK(keys.size() == value_fields.size());
+
+    if (keys.empty()) {
+        RETURN_IF_ERROR(insert_empty_object_marker(path->build(), values));
+        *present = true;
+        return Status::OK();
+    }
+
+    std::set<std::string> object_keys;
+    const FieldSchema& key_field = typed_value_field.children[0];
+    const
FieldSchema& value_field = typed_value_field.children[1];
+    for (size_t i = 0; i < keys.size(); ++i) {
+        std::string key;
+        bool key_present = false;
+        RETURN_IF_ERROR(get_binary_field(keys[i], &key, &key_present));
+        if (!key_present) {
+            return Status::Corruption("Parquet VARIANT map typed_value has null key {}",
+                                      key_field.name);
+        }
+        if (!object_keys.insert(key).second) {
+            return Status::Corruption("Parquet VARIANT map typed_value has duplicate key {}", key);
+        }
+
+        path->append(key, false);
+        bool value_present = false;
+        Status st = shredded_field_to_variant_map(value_field, value_fields[i], metadata, path,
+                                                  values, &value_present, string_values);
+        if (!st.ok()) {
+            path->pop_back();
+            return st;
+        }
+        if (!value_present) {
+            (*values)[path->build()] = FieldWithDataType {.field = Field()};
+        }
+        path->pop_back();
+    }
+    *present = true;
+    return Status::OK();
+}
+
+static Status typed_value_to_variant_map(const FieldSchema& typed_value_field, const Field& field,
+                                         const std::string& metadata, PathInDataBuilder* path,
+                                         VariantMap* values, bool* present,
+                                         std::deque<std::string>* string_values) {
+    if (field.is_null()) {
+        *present = false;
+        return Status::OK();
+    }
+    const DataTypePtr& typed_type = remove_nullable(typed_value_field.data_type);
+    if (typed_type->get_primitive_type() == TYPE_STRUCT) {
+        const auto& fields = field.get<Tuple>();
+        *present = true;
+        bool has_present_child = false;
+        for (int i = 0; i < typed_value_field.children.size(); ++i) {
+            path->append(typed_value_field.children[i].name, false);
+            bool child_present = false;
+            RETURN_IF_ERROR(shredded_field_to_variant_map(typed_value_field.children[i], fields[i],
+                                                          metadata, path, values, &child_present,
+                                                          string_values));
+            has_present_child |= child_present;
+            path->pop_back();
+        }
+        if (!has_present_child) {
+            RETURN_IF_ERROR(insert_empty_object_marker(path->build(), values));
+        }
+        return Status::OK();
+    }
+    if (typed_type->get_primitive_type() == TYPE_ARRAY) {
+        return
typed_array_to_variant_map(typed_value_field, field, metadata, path, values, present,
+                                          string_values);
+    }
+    if (typed_type->get_primitive_type() == TYPE_MAP) {
+        return typed_map_to_variant_map(typed_value_field, field, metadata, path, values, present,
+                                        string_values);
+    }
+
+    return append_typed_field_to_variant_map(typed_value_field, field, path, values, present);
+}
+
+static bool direct_typed_value_present_at(const FieldSchema& field_schema, const IColumn& column,
+                                          size_t row, bool allow_variant_wrapper,
+                                          const std::set& column_ids,
+                                          const std::vector<const NullMap*>& parent_null_maps) {
+    if (!has_selected_column(field_schema, column_ids) ||
+        has_direct_typed_parent_null(parent_null_maps, row)) {
+        return false;
+    }
+
+    const IColumn* value_column = &column;
+    if (const auto* nullable_column = check_and_get_column<ColumnNullable>(&column)) {
+        const auto& null_map = nullable_column->get_null_map_data();
+        DCHECK_LT(row, null_map.size());
+        if (null_map[row]) {
+            return false;
+        }
+        value_column = &nullable_column->get_nested_column();
+    }
+
+    if (allow_variant_wrapper && is_variant_wrapper_field(field_schema, false)) {
+        const int typed_value_idx = find_child_idx(field_schema, "typed_value");
+        DCHECK_GE(typed_value_idx, 0);
+        const auto& typed_struct = assert_cast<const ColumnStruct&>(*value_column);
+        return direct_typed_value_present_at(field_schema.children[typed_value_idx],
+                                             typed_struct.get_column(typed_value_idx), row, false,
+                                             column_ids, parent_null_maps);
+    }
+
+    return true;
+}
+
+static Status append_direct_typed_empty_object_markers(
+        const FieldSchema& field_schema, const ColumnStruct& struct_column, size_t start,
+        size_t rows, PathInDataBuilder* path, ColumnVariant* batch,
+        const std::set& column_ids, const std::vector<const NullMap*>& parent_null_maps) {
+    DataTypePtr marker_type = make_nullable(std::make_shared());
+    MutableColumnPtr marker_column = marker_type->create_column();
+    marker_column->insert_default();
+    bool has_marker = false;
+
+    const PathInData marker_path = path->build();
+    Field
empty_object;
+    RETURN_IF_ERROR(make_empty_object_field(&empty_object));
+    for (size_t i = 0; i < rows; ++i) {
+        const size_t row = start + i;
+        if (has_direct_typed_parent_null(parent_null_maps, row)) {
+            marker_column->insert_default();
+            has_marker |= marker_path.empty();
+            continue;
+        }
+
+        bool has_present_child = false;
+        for (int child_idx = 0; child_idx < field_schema.children.size(); ++child_idx) {
+            if (direct_typed_value_present_at(field_schema.children[child_idx],
+                                              struct_column.get_column(child_idx), row, true,
+                                              column_ids, parent_null_maps)) {
+                has_present_child = true;
+                break;
+            }
+        }
+
+        if (has_present_child) {
+            marker_column->insert_default();
+            continue;
+        }
+        marker_column->insert(empty_object);
+        has_marker = true;
+    }
+
+    if (!has_marker) {
+        return Status::OK();
+    }
+    if (!batch->add_sub_column(marker_path, std::move(marker_column), marker_type)) {
+        return Status::Corruption("Failed to add Parquet VARIANT empty typed object marker {}",
+                                  marker_path.get_path());
+    }
+    return Status::OK();
+}
+
+static Status append_direct_typed_column_to_batch(const FieldSchema& field_schema,
+                                                  const IColumn& column, size_t start, size_t rows,
+                                                  PathInDataBuilder* path, ColumnVariant* batch,
+                                                  bool allow_variant_wrapper,
+                                                  const std::set& column_ids,
+                                                  std::vector<const NullMap*> parent_null_maps) {
+    if (!has_selected_column(field_schema, column_ids)) {
+        return Status::OK();
+    }
+
+    const IColumn* value_column = &column;
+    if (const auto* nullable_column = check_and_get_column<ColumnNullable>(&column)) {
+        parent_null_maps.push_back(&nullable_column->get_null_map_data());
+        value_column = &nullable_column->get_nested_column();
+    }
+
+    if (allow_variant_wrapper && is_variant_wrapper_field(field_schema, false)) {
+        const int typed_value_idx = find_child_idx(field_schema, "typed_value");
+        DCHECK_GE(typed_value_idx, 0);
+        const auto& typed_struct = assert_cast<const ColumnStruct&>(*value_column);
+        return append_direct_typed_column_to_batch(
+                field_schema.children[typed_value_idx],
typed_struct.get_column(typed_value_idx),
+                start, rows, path, batch, false, column_ids, parent_null_maps);
+    }
+
+    const auto& type = remove_nullable(field_schema.data_type);
+    if (type->get_primitive_type() == TYPE_STRUCT) {
+        const auto& struct_column = assert_cast<const ColumnStruct&>(*value_column);
+        for (int i = 0; i < field_schema.children.size(); ++i) {
+            if (!has_selected_column(field_schema.children[i], column_ids)) {
+                continue;
+            }
+            path->append(field_schema.children[i].name, false);
+            RETURN_IF_ERROR(append_direct_typed_column_to_batch(
+                    field_schema.children[i], struct_column.get_column(i), start, rows, path,
+                    batch, true, column_ids, parent_null_maps));
+            path->pop_back();
+        }
+        return append_direct_typed_empty_object_markers(field_schema, struct_column, start, rows,
+                                                        path, batch, column_ids, parent_null_maps);
+    }
+
+    DataTypePtr variant_leaf_type = make_nullable(direct_variant_leaf_type(field_schema));
+    MutableColumnPtr variant_leaf = variant_leaf_type->create_column();
+    variant_leaf->insert_default();
+    if (type->get_primitive_type() == TYPE_ARRAY &&
+        (contains_uuid_typed_value_field(field_schema) ||
+         contains_temporal_variant_leaf_type(field_schema.data_type) ||
+         contains_floating_point_variant_leaf_type(field_schema.data_type))) {
+        RETURN_IF_ERROR(insert_direct_typed_array_leaf_range(
+                field_schema, *value_column, start, rows, parent_null_maps, variant_leaf.get()));
+    } else if (is_uuid_typed_value_field(field_schema)) {
+        RETURN_IF_ERROR(insert_direct_typed_uuid_leaf_range(*value_column, start, rows,
+                                                            parent_null_maps, variant_leaf.get()));
+    } else if (is_temporal_variant_leaf_type(type->get_primitive_type())) {
+        insert_direct_typed_temporal_leaf_range(type->get_primitive_type(), *value_column, start,
+                                                rows, parent_null_maps, variant_leaf.get());
+    } else {
+        insert_direct_typed_leaf_range(*value_column, start, rows, parent_null_maps,
+                                       variant_leaf.get());
+    }
+    if (!batch->add_sub_column(path->build(), std::move(variant_leaf), variant_leaf_type))
{
+        return Status::Corruption("Failed to add Parquet VARIANT typed subcolumn {}",
+                                  path->build().get_path());
+    }
+    return Status::OK();
+}
+
+static Status append_variant_struct_rows_to_column(
+        const FieldSchema& field_schema, const ColumnStruct& variant_struct_column,
+        const NullMap* struct_null_map, size_t start, size_t rows,
+        const std::set& column_ids, ColumnPtr& doris_column,
+        ParquetColumnReader::ColumnStatistics* variant_statistics) {
+    DCHECK_LE(start + rows, variant_struct_column.size());
+
+    MutableColumnPtr variant_column_ptr;
+    NullMap* null_map_ptr = nullptr;
+    auto mutable_column = doris_column->assume_mutable();
+    if (doris_column->is_nullable()) {
+        auto* nullable_column = assert_cast<ColumnNullable*>(mutable_column.get());
+        variant_column_ptr = nullable_column->get_nested_column_ptr();
+        null_map_ptr = &nullable_column->get_null_map_data();
+    } else {
+        if (field_schema.data_type->is_nullable()) {
+            return Status::Corruption("Not nullable column has null values in parquet file");
+        }
+        variant_column_ptr = std::move(mutable_column);
+    }
+    auto* variant_column = assert_cast<ColumnVariant*>(variant_column_ptr.get());
+
+    const int typed_value_idx = find_child_idx(field_schema, "typed_value");
+    if (can_use_direct_typed_only_value(field_schema, column_ids)) {
+        variant_statistics->variant_direct_typed_value_read_rows += static_cast<int64_t>(rows);
+        MutableColumnPtr batch_variant_column =
+                ColumnVariant::create(variant_column->max_subcolumns_count(),
+                                      variant_column->enable_doc_mode(), rows + 1);
+        auto* batch_variant = assert_cast<ColumnVariant*>(batch_variant_column.get());
+        PathInDataBuilder path;
+        RETURN_IF_ERROR(append_direct_typed_column_to_batch(
+                field_schema.children[typed_value_idx],
+                variant_struct_column.get_column(typed_value_idx), start, rows, &path,
+                batch_variant, false, column_ids, {}));
+        variant_column->insert_range_from(*batch_variant_column, 1, rows);
+        if (null_map_ptr != nullptr) {
+            for (size_t i = start; i < start + rows; ++i) {
+
null_map_ptr->push_back(struct_null_map != nullptr && (*struct_null_map)[i]);
+            }
+        }
+        return Status::OK();
+    }
+
+    variant_statistics->variant_rowwise_read_rows += static_cast<int64_t>(rows);
+    for (size_t i = start; i < start + rows; ++i) {
+        if (struct_null_map != nullptr && (*struct_null_map)[i]) {
+            if (null_map_ptr == nullptr) {
+                return Status::Corruption("Not nullable column has null values in parquet file");
+            }
+            variant_column->insert_default();
+            null_map_ptr->push_back(1);
+            continue;
+        }
+        VariantMap values;
+        bool present = false;
+        PathInDataBuilder path;
+        std::deque<std::string> string_values;
+        RETURN_IF_ERROR(variant_to_variant_map(field_schema, variant_struct_column[i], nullptr,
+                                               &path, &values, &present, &string_values));
+        if (!present) {
+            values[PathInData()] = FieldWithDataType {.field = Field()};
+        }
+        RETURN_IF_CATCH_EXCEPTION(
+                variant_column->insert(Field::create_field<TYPE_VARIANT>(std::move(values))));
+        if (null_map_ptr != nullptr) {
+            null_map_ptr->push_back(0);
+        }
+    }
+    return Status::OK();
+}
+
+#ifdef BE_TEST
+namespace parquet_variant_reader_test {
+bool can_direct_read_typed_value_for_test(const FieldSchema& typed_value_field) {
+    const std::set column_ids;
+    return can_direct_read_typed_value(typed_value_field, false, column_ids);
+}
+
+bool can_use_direct_typed_only_value_for_test(const FieldSchema& variant_field,
+                                              const std::set& column_ids) {
+    return can_use_direct_typed_only_value(variant_field, column_ids);
+}
+
+Status append_direct_typed_column_to_batch_for_test(const FieldSchema& typed_value_field,
+                                                    const IColumn& typed_value_column, size_t start,
+                                                    size_t rows, ColumnVariant* batch) {
+    PathInDataBuilder path;
+    const std::set column_ids;
+    return append_direct_typed_column_to_batch(typed_value_field, typed_value_column, start, rows,
+                                               &path, batch, false, column_ids, {});
+}
+
+Status read_variant_row_for_test(const FieldSchema& variant_field, const Field& field,
+                                 bool output_nullable, Field* result, bool* sql_null) {
+    if (field.is_null()) {
+        if (!output_nullable) {
+            return Status::Corruption("Not nullable column has null values in parquet file");
+        }
+        *sql_null = true;
+        return Status::OK();
+    }
+
+    VariantMap values;
+    bool present = false;
+    PathInDataBuilder path;
+    std::deque<std::string> string_values;
+    RETURN_IF_ERROR(variant_to_variant_map(variant_field, field, nullptr, &path, &values, &present,
+                                           &string_values));
+    if (!present) {
+        values[PathInData()] = FieldWithDataType {.field = Field()};
+    }
+
+    auto variant_column = ColumnVariant::create(0, false);
+    RETURN_IF_CATCH_EXCEPTION(
+            variant_column->insert(Field::create_field<TYPE_VARIANT>(std::move(values))));
+    variant_column->get(0, *result);
+    *sql_null = false;
+    return Status::OK();
+}
+
+Status read_variant_rows_for_test(const FieldSchema& variant_field, const IColumn& struct_column,
+                                  const std::set& column_ids, ColumnPtr& doris_column,
+                                  int64_t* direct_rows, int64_t* rowwise_rows) {
+    const IColumn* struct_source = &struct_column;
+    const NullMap* struct_null_map = nullptr;
+    if (const auto* nullable_struct = check_and_get_column<ColumnNullable>(struct_source)) {
+        struct_null_map = &nullable_struct->get_null_map_data();
+        struct_source = &nullable_struct->get_nested_column();
+    }
+    const auto& variant_struct_column = assert_cast<const ColumnStruct&>(*struct_source);
+
+    ParquetColumnReader::ColumnStatistics variant_statistics;
+    RETURN_IF_ERROR(append_variant_struct_rows_to_column(
+            variant_field, variant_struct_column, struct_null_map, 0, variant_struct_column.size(),
+            column_ids, doris_column, &variant_statistics));
+    *direct_rows = variant_statistics.variant_direct_typed_value_read_rows;
+    *rowwise_rows = variant_statistics.variant_rowwise_read_rows;
+    return Status::OK();
+}
+
+Status variant_to_json_for_test(const FieldSchema& variant_field, const Field& field,
+                                const std::string& inherited_metadata, std::string* json,
+                                bool* present) {
+    return variant_to_json(variant_field, field, &inherited_metadata, json, present);
+}
+
+bool
variant_struct_reader_type_is_nullable_for_test(const FieldSchema& variant_field) { + return make_variant_struct_reader_type(variant_field)->is_nullable(); +} + +bool variant_struct_reader_column_is_nullable_for_test(const FieldSchema& variant_field) { + auto variant_struct_type = make_variant_struct_reader_type(variant_field); + return make_variant_struct_read_column(variant_field, variant_struct_type)->is_nullable(); +} +} // namespace parquet_variant_reader_test +#endif + +// Existing recursive factory keeps nested reader wiring and shared state in one dispatch point. +// NOLINTNEXTLINE(readability-function-cognitive-complexity,readability-function-size) Status ParquetColumnReader::create(io::FileReaderSPtr file, FieldSchema* field, const tparquet::RowGroup& row_group, const RowRanges& row_ranges, const cctz::time_zone* ctz, io::IOContext* io_ctx, @@ -113,29 +1968,33 @@ Status ParquetColumnReader::create(io::FileReaderSPtr file, FieldSchema* field, const std::set& column_ids, const std::set& filter_column_ids) { size_t total_rows = row_group.num_rows; - if (field->data_type->get_primitive_type() == TYPE_ARRAY) { + const auto field_primitive_type = remove_nullable(field->data_type)->get_primitive_type(); + if (field_primitive_type == TYPE_ARRAY) { + const bool offset_only = !column_ids.empty() && + column_ids.contains(field->get_column_id()) && + !column_ids.contains(field->children[0].get_column_id()); std::unique_ptr element_reader; - RETURN_IF_ERROR(create(file, &field->children[0], row_group, row_ranges, ctz, io_ctx, + RETURN_IF_ERROR(create(file, field->children.data(), row_group, row_ranges, ctz, io_ctx, element_reader, max_buf_size, col_offsets, state, true, column_ids, filter_column_ids)); auto array_reader = ArrayColumnReader::create_unique(row_ranges, total_rows, ctz, io_ctx); element_reader->set_column_in_nested(); - RETURN_IF_ERROR(array_reader->init(std::move(element_reader), field)); + RETURN_IF_ERROR(array_reader->init(std::move(element_reader), 
field, offset_only)); array_reader->_filter_column_ids = filter_column_ids; reader.reset(array_reader.release()); - } else if (field->data_type->get_primitive_type() == TYPE_MAP) { + } else if (field_primitive_type == TYPE_MAP) { std::unique_ptr key_reader; std::unique_ptr value_reader; if (column_ids.empty() || column_ids.find(field->children[0].get_column_id()) != column_ids.end()) { // Create key reader - RETURN_IF_ERROR(create(file, &field->children[0], row_group, row_ranges, ctz, io_ctx, + RETURN_IF_ERROR(create(file, field->children.data(), row_group, row_ranges, ctz, io_ctx, key_reader, max_buf_size, col_offsets, state, true, column_ids, filter_column_ids)); } else { auto skip_reader = std::make_unique(row_ranges, total_rows, ctz, - io_ctx, &field->children[0]); + io_ctx, field->children.data()); key_reader = std::move(skip_reader); } @@ -147,7 +2006,7 @@ Status ParquetColumnReader::create(io::FileReaderSPtr file, FieldSchema* field, filter_column_ids)); } else { auto skip_reader = std::make_unique(row_ranges, total_rows, ctz, - io_ctx, &field->children[0]); + io_ctx, &field->children[1]); value_reader = std::move(skip_reader); } @@ -157,7 +2016,7 @@ Status ParquetColumnReader::create(io::FileReaderSPtr file, FieldSchema* field, RETURN_IF_ERROR(map_reader->init(std::move(key_reader), std::move(value_reader), field)); map_reader->_filter_column_ids = filter_column_ids; reader.reset(map_reader.release()); - } else if (field->data_type->get_primitive_type() == TYPE_STRUCT) { + } else if (field_primitive_type == TYPE_STRUCT) { std::unordered_map> child_readers; child_readers.reserve(field->children.size()); int non_skip_reader_idx = -1; @@ -184,7 +2043,7 @@ Status ParquetColumnReader::create(io::FileReaderSPtr file, FieldSchema* field, // If all children are SkipReadingReader, force the first child to call create if (non_skip_reader_idx == -1) { std::unique_ptr child_reader; - RETURN_IF_ERROR(create(file, &field->children[0], row_group, row_ranges, ctz, io_ctx, 
+ RETURN_IF_ERROR(create(file, field->children.data(), row_group, row_ranges, ctz, io_ctx, child_reader, max_buf_size, col_offsets, state, in_collection, column_ids, filter_column_ids)); child_reader->set_column_in_nested(); @@ -194,6 +2053,13 @@ Status ParquetColumnReader::create(io::FileReaderSPtr file, FieldSchema* field, RETURN_IF_ERROR(struct_reader->init(std::move(child_readers), field)); struct_reader->_filter_column_ids = filter_column_ids; reader.reset(struct_reader.release()); + } else if (field_primitive_type == TYPE_VARIANT) { + auto variant_reader = + VariantColumnReader::create_unique(row_ranges, total_rows, ctz, io_ctx); + RETURN_IF_ERROR(variant_reader->init(file, field, row_group, max_buf_size, col_offsets, + state, in_collection, column_ids, filter_column_ids)); + variant_reader->_filter_column_ids = filter_column_ids; + reader.reset(variant_reader.release()); } else { auto physical_index = field->physical_column_index; const tparquet::OffsetIndex* offset_index = @@ -288,7 +2154,7 @@ Status ScalarColumnReader::_skip_values(size_t num_ size_t loop_skip = def_decoder.get_next_run(&def_level, num_values - skipped); if (loop_skip == 0) { std::stringstream ss; - auto& bit_reader = def_decoder.rle_decoder().bit_reader(); + const auto& bit_reader = def_decoder.rle_decoder().bit_reader(); ss << "def_decoder buffer (hex): "; for (size_t i = 0; i < bit_reader.max_bytes(); ++i) { ss << std::hex << std::setw(2) << std::setfill('0') @@ -346,7 +2212,7 @@ Status ScalarColumnReader::_read_values(size_t num_ size_t loop_read = def_decoder.get_next_run(&def_level, num_values - has_read); if (loop_read == 0) { std::stringstream ss; - auto& bit_reader = def_decoder.rle_decoder().bit_reader(); + const auto& bit_reader = def_decoder.rle_decoder().bit_reader(); ss << "def_decoder buffer (hex): "; for (size_t i = 0; i < bit_reader.max_bytes(); ++i) { ss << std::hex << std::setw(2) << std::setfill('0') @@ -377,7 +2243,7 @@ Status ScalarColumnReader::_read_values(size_t 
num_ } data_column = doris_column->assume_mutable(); } - if (null_map.size() == 0) { + if (null_map.empty()) { size_t remaining = num_values; while (remaining > USHRT_MAX) { null_map.emplace_back(USHRT_MAX); @@ -402,6 +2268,8 @@ Status ScalarColumnReader::_read_values(size_t num_ * whether the reader should read the remaining value of the last row in previous page. */ template +// Existing nested scalar reader is the central row/page alignment loop for complex values. +// NOLINTNEXTLINE(readability-function-cognitive-complexity,readability-function-size) Status ScalarColumnReader::_read_nested_column( ColumnPtr& doris_column, DataTypePtr& type, FilterMap& filter_map, size_t batch_size, size_t* read_rows, bool* eof, bool is_dict_filter) { @@ -455,7 +2323,7 @@ Status ScalarColumnReader::_read_nested_column( RETURN_IF_ERROR( _chunk_reader->decode_values(data_column, type, select_vector, is_dict_filter)); - if (ancestor_null_indices.size() != 0) { + if (!ancestor_null_indices.empty()) { RETURN_IF_ERROR(_chunk_reader->skip_values(ancestor_null_indices.size(), false)); } if (filter_map.has_filter()) { @@ -503,6 +2371,77 @@ Status ScalarColumnReader::_read_nested_column( return Status::OK(); } +template +Status ScalarColumnReader::_read_and_skip_nested_levels( + FilterMap& filter_map, size_t before_rep_level_sz, size_t filter_map_index, + std::vector& nested_filter_map_data) { + RETURN_IF_ERROR(_chunk_reader->fill_def(_def_levels)); + RETURN_IF_ERROR(_chunk_reader->skip_nested_values(_def_levels, before_rep_level_sz, + _def_levels.size())); + if (!filter_map.has_filter()) { + return Status::OK(); + } + + std::unique_ptr nested_filter_map = std::make_unique(); + RETURN_IF_ERROR(gen_filter_map(filter_map, filter_map_index, before_rep_level_sz, + _rep_levels.size(), nested_filter_map_data, &nested_filter_map)); + auto new_rep_sz = before_rep_level_sz; + for (size_t idx = before_rep_level_sz; idx < _rep_levels.size(); idx++) { + if (nested_filter_map_data[idx - 
before_rep_level_sz]) { + _rep_levels[new_rep_sz] = _rep_levels[idx]; + _def_levels[new_rep_sz] = _def_levels[idx]; + new_rep_sz++; + } + } + _rep_levels.resize(new_rep_sz); + _def_levels.resize(new_rep_sz); + return Status::OK(); +} + +template +Status ScalarColumnReader::read_nested_levels(FilterMap& filter_map, + size_t batch_size, + size_t* read_rows, + bool* eof) { + _rep_levels.clear(); + _def_levels.clear(); + *read_rows = 0; + + std::vector nested_filter_map_data; + + while (_current_range_idx < _row_ranges.range_size()) { + size_t left_row = + std::max(_current_row_index, _row_ranges.get_range_from(_current_range_idx)); + size_t right_row = std::min(left_row + batch_size - *read_rows, + (size_t)_row_ranges.get_range_to(_current_range_idx)); + _current_row_index = left_row; + RETURN_IF_ERROR(_chunk_reader->seek_to_nested_row(left_row)); + size_t load_rows = 0; + bool cross_page = false; + size_t before_rep_level_sz = _rep_levels.size(); + RETURN_IF_ERROR(_chunk_reader->load_page_nested_rows(_rep_levels, right_row - left_row, + &load_rows, &cross_page)); + RETURN_IF_ERROR(_read_and_skip_nested_levels(filter_map, before_rep_level_sz, + _filter_map_index, nested_filter_map_data)); + _filter_map_index += load_rows; + while (cross_page) { + before_rep_level_sz = _rep_levels.size(); + RETURN_IF_ERROR(_chunk_reader->load_cross_page_nested_row(_rep_levels, &cross_page)); + RETURN_IF_ERROR(_read_and_skip_nested_levels(filter_map, before_rep_level_sz, + _filter_map_index - 1, + nested_filter_map_data)); + } + *read_rows += load_rows; + _current_row_index += load_rows; + _current_range_idx += (_current_row_index == _row_ranges.get_range_to(_current_range_idx)); + if (*read_rows == batch_size) { + break; + } + } + *eof = _current_range_idx == _row_ranges.range_size(); + return Status::OK(); +} + template Status ScalarColumnReader::read_dict_values_to_column( MutableColumnPtr& doris_column, bool* has_dict) { @@ -530,6 +2469,8 @@ Status 
ScalarColumnReader::_try_load_dict_page(bool } template +// Existing scalar read path handles page iteration, filtering, and conversion in one dispatch loop. +// NOLINTNEXTLINE(readability-function-cognitive-complexity,readability-function-size) Status ScalarColumnReader::read_column_data( ColumnPtr& doris_column, const DataTypePtr& type, const std::shared_ptr& root_node, FilterMap& filter_map, @@ -645,9 +2586,10 @@ Status ScalarColumnReader::read_column_data( } Status ArrayColumnReader::init(std::unique_ptr element_reader, - FieldSchema* field) { + FieldSchema* field, bool offset_only) { _field_schema = field; _element_reader = std::move(element_reader); + _offset_only = offset_only; return Status::OK(); } @@ -678,10 +2620,15 @@ Status ArrayColumnReader::read_column_data( ColumnPtr& element_column = assert_cast(*data_column).get_data_ptr(); const DataTypePtr& element_type = (assert_cast(remove_nullable(type).get()))->get_nested_type(); - // read nested column - RETURN_IF_ERROR(_element_reader->read_column_data(element_column, element_type, - root_node->get_element_node(), filter_map, - batch_size, read_rows, eof, is_dict_filter)); + if (_offset_only) { + // Cardinality needs collection levels and offsets, but not element payloads. 
+ RETURN_IF_ERROR( + _element_reader->read_nested_levels(filter_map, batch_size, read_rows, eof)); + } else { + RETURN_IF_ERROR(_element_reader->read_column_data( + element_column, element_type, root_node->get_element_node(), filter_map, batch_size, + read_rows, eof, is_dict_filter)); + } if (*read_rows == 0) { return Status::OK(); } @@ -690,6 +2637,11 @@ Status ArrayColumnReader::read_column_data( // fill offset and null map fill_array_offset(_field_schema, offsets_data, null_map_ptr, _element_reader->get_rep_level(), _element_reader->get_def_level()); + if (_offset_only && offsets_data.back() > element_column->size()) { + auto mutable_element_column = element_column->assume_mutable(); + mutable_element_column->insert_many_defaults(offsets_data.back() - element_column->size()); + element_column = std::move(mutable_element_column); + } DCHECK_EQ(element_column->size(), offsets_data.back()); #ifndef NDEBUG doris_column->sanity_check(); @@ -782,6 +2734,25 @@ Status StructColumnReader::init( _child_readers = std::move(child_readers); return Status::OK(); } + +Status StructColumnReader::read_nested_levels(FilterMap& filter_map, size_t batch_size, + size_t* read_rows, bool* eof) { + _read_column_names.clear(); + for (const auto& child : _field_schema->children) { + auto it = _child_readers.find(child.name); + if (it == _child_readers.end() || + dynamic_cast(it->second.get()) != nullptr) { + continue; + } + _read_column_names.emplace_back(child.name); + return it->second->read_nested_levels(filter_map, batch_size, read_rows, eof); + } + return Status::Corruption("Cannot read struct '{}' levels without a reference column", + _field_schema->name); +} + +// Existing struct reader coordinates child readers, missing columns, and selection state. 
+// NOLINTNEXTLINE(readability-function-cognitive-complexity,readability-function-size) Status StructColumnReader::read_column_data( ColumnPtr& doris_column, const DataTypePtr& type, const std::shared_ptr& root_node, FilterMap& filter_map, @@ -818,8 +2789,8 @@ Status StructColumnReader::read_column_data( for (size_t i = 0; i < doris_struct.tuple_size(); ++i) { ColumnPtr& doris_field = doris_struct.get_column_ptr(i); - auto& doris_type = doris_struct_type->get_element(i); - auto& doris_name = doris_struct_type->get_element_name(i); + const auto& doris_type = doris_struct_type->get_element(i); + const auto& doris_name = doris_struct_type->get_element_name(i); if (!root_node->children_column_exists(doris_name)) { missing_column_idxs.push_back(i); VLOG_DEBUG << "[ParquetReader] Missing column in schema: column_idx[" << i @@ -984,7 +2955,7 @@ Status StructColumnReader::read_column_data( // Fill truly missing columns (not in root_node) with null or default value for (auto idx : missing_column_idxs) { auto& doris_field = doris_struct.get_column_ptr(idx); - auto& doris_type = doris_struct_type->get_element(idx); + const auto& doris_type = doris_struct_type->get_element(idx); DCHECK(doris_type->is_nullable()); auto mutable_column = doris_field->assume_mutable(); auto* nullable_column = static_cast(mutable_column.get()); @@ -1001,6 +2972,69 @@ Status StructColumnReader::read_column_data( return Status::OK(); } +Status VariantColumnReader::init(io::FileReaderSPtr file, FieldSchema* field, + const tparquet::RowGroup& row_group, size_t max_buf_size, + std::unordered_map& col_offsets, + RuntimeState* state, bool in_collection, + const std::set& column_ids, + const std::set& filter_column_ids) { + _field_schema = field; + _column_ids = column_ids; + _variant_struct_field = std::make_unique(*field); + + DataTypePtr variant_struct_type = make_variant_struct_reader_type(*field); + _variant_struct_field->data_type = variant_struct_type; + + 
RETURN_IF_ERROR(ParquetColumnReader::create(file, _variant_struct_field.get(), row_group, + _row_ranges, _ctz, _io_ctx, _struct_reader, + max_buf_size, col_offsets, state, in_collection, + column_ids, filter_column_ids)); + _struct_reader->set_column_in_nested(); + return Status::OK(); +} + +Status VariantColumnReader::read_column_data( + ColumnPtr& doris_column, const DataTypePtr& type, + const std::shared_ptr<TableSchemaChangeHelper::Node>& root_node, FilterMap& filter_map, + size_t batch_size, size_t* read_rows, bool* eof, bool is_dict_filter, + int64_t real_column_size) { + (void)root_node; + if (remove_nullable(type)->get_primitive_type() != PrimitiveType::TYPE_VARIANT) { + return Status::Corruption( + "Wrong data type for column '{}', expected Variant type, actual type: {}.", + _field_schema->name, type->get_name()); + } + + const auto& variant_struct_type = _variant_struct_field->data_type; + ColumnPtr struct_column = make_variant_struct_read_column(*_field_schema, variant_struct_type); + const size_t old_struct_rows = struct_column->size(); + auto const_node = TableSchemaChangeHelper::ConstNode::get_instance(); + RETURN_IF_ERROR(_struct_reader->read_column_data(struct_column, variant_struct_type, const_node, + filter_map, batch_size, read_rows, eof, + is_dict_filter, real_column_size)); + + const size_t new_struct_rows = struct_column->size() - old_struct_rows; + if (new_struct_rows == 0) { + return Status::OK(); + } + + const IColumn* variant_struct_source = struct_column.get(); + const NullMap* struct_null_map = nullptr; + if (const auto* nullable_struct = check_and_get_column<ColumnNullable>(variant_struct_source)) { + struct_null_map = &nullable_struct->get_null_map_data(); + variant_struct_source = &nullable_struct->get_nested_column(); + } + const auto& variant_struct_column = assert_cast<const ColumnStruct&>(*variant_struct_source); + + RETURN_IF_ERROR(append_variant_struct_rows_to_column( + *_field_schema, variant_struct_column, struct_null_map, old_struct_rows, + new_struct_rows, _column_ids, doris_column, 
&_variant_statistics)); +#ifndef NDEBUG + doris_column->sanity_check(); +#endif + return Status::OK(); +} + template class ScalarColumnReader; template class ScalarColumnReader; template class ScalarColumnReader; diff --git a/be/src/format/parquet/vparquet_column_reader.h b/be/src/format/parquet/vparquet_column_reader.h index 9d9fd2280c88f8..f05276d4a574ba 100644 --- a/be/src/format/parquet/vparquet_column_reader.h +++ b/be/src/format/parquet/vparquet_column_reader.h @@ -18,13 +18,16 @@ #pragma once #include #include -#include -#include +#include +#include #include #include #include +#include +#include #include +#include #include #include "common/status.h" @@ -48,11 +51,35 @@ struct IOContext; } // namespace doris::io namespace doris { +class Field; struct FieldSchema; +class IColumn; +class ColumnVariant; template class ColumnStr; using ColumnString = ColumnStr; +#ifdef BE_TEST +namespace parquet_variant_reader_test { +bool can_direct_read_typed_value_for_test(const FieldSchema& typed_value_field); +bool can_use_direct_typed_only_value_for_test(const FieldSchema& variant_field, + const std::set& column_ids); +Status append_direct_typed_column_to_batch_for_test(const FieldSchema& typed_value_field, + const IColumn& typed_value_column, size_t start, + size_t rows, ColumnVariant* batch); +Status read_variant_row_for_test(const FieldSchema& variant_field, const Field& field, + bool output_nullable, Field* result, bool* sql_null); +Status read_variant_rows_for_test(const FieldSchema& variant_field, const IColumn& struct_column, + const std::set& column_ids, ColumnPtr& doris_column, + int64_t* direct_rows, int64_t* rowwise_rows); +Status variant_to_json_for_test(const FieldSchema& variant_field, const Field& field, + const std::string& inherited_metadata, std::string* json, + bool* present); +bool variant_struct_reader_type_is_nullable_for_test(const FieldSchema& variant_field); +bool variant_struct_reader_column_is_nullable_for_test(const FieldSchema& variant_field); 
+} // namespace parquet_variant_reader_test +#endif + class ParquetColumnReader { public: struct ColumnStatistics { @@ -76,7 +103,9 @@ class ParquetColumnReader { page_cache_hit_counter(0), page_cache_missing_counter(0), page_cache_compressed_hit_counter(0), - page_cache_decompressed_hit_counter(0) {} + page_cache_decompressed_hit_counter(0), + variant_direct_typed_value_read_rows(0), + variant_rowwise_read_rows(0) {} ColumnStatistics(ColumnChunkReaderStatistics& cs, int64_t null_map_time, int64_t convert_time_) @@ -99,7 +128,9 @@ class ParquetColumnReader { page_cache_hit_counter(cs.page_cache_hit_counter), page_cache_missing_counter(cs.page_cache_missing_counter), page_cache_compressed_hit_counter(cs.page_cache_compressed_hit_counter), - page_cache_decompressed_hit_counter(cs.page_cache_decompressed_hit_counter) {} + page_cache_decompressed_hit_counter(cs.page_cache_decompressed_hit_counter), + variant_direct_typed_value_read_rows(0), + variant_rowwise_read_rows(0) {} int64_t page_index_read_calls; int64_t decompress_time; @@ -121,6 +152,8 @@ class ParquetColumnReader { int64_t page_cache_missing_counter; int64_t page_cache_compressed_hit_counter; int64_t page_cache_decompressed_hit_counter; + int64_t variant_direct_typed_value_read_rows; + int64_t variant_rowwise_read_rows; void merge(ColumnStatistics& col_statistics) { page_index_read_calls += col_statistics.page_index_read_calls; @@ -146,6 +179,9 @@ class ParquetColumnReader { page_cache_compressed_hit_counter += col_statistics.page_cache_compressed_hit_counter; page_cache_decompressed_hit_counter += col_statistics.page_cache_decompressed_hit_counter; + variant_direct_typed_value_read_rows += + col_statistics.variant_direct_typed_value_read_rows; + variant_rowwise_read_rows += col_statistics.variant_rowwise_read_rows; } }; @@ -158,6 +194,10 @@ class ParquetColumnReader { FilterMap& filter_map, size_t batch_size, size_t* read_rows, bool* eof, bool is_dict_filter, int64_t real_column_size = -1) = 0; + virtual 
Status read_nested_levels(FilterMap& filter_map, size_t batch_size, size_t* read_rows, + bool* eof) { + return Status::NotSupported("read_nested_levels is not supported for parquet field"); + } virtual Status read_dict_values_to_column(MutableColumnPtr& doris_column, bool* has_dict) { return Status::NotSupported("read_dict_values_to_column is not supported"); @@ -211,11 +251,10 @@ class ScalarColumnReader : public ParquetColumnReader { ENABLE_FACTORY_CREATOR(ScalarColumnReader) public: ScalarColumnReader(const RowRanges& row_ranges, size_t total_rows, - const tparquet::ColumnChunk& chunk_meta, - const tparquet::OffsetIndex* offset_index, const cctz::time_zone* ctz, - io::IOContext* io_ctx) + tparquet::ColumnChunk chunk_meta, const tparquet::OffsetIndex* offset_index, + const cctz::time_zone* ctz, io::IOContext* io_ctx) : ParquetColumnReader(row_ranges, total_rows, ctz, io_ctx), - _chunk_meta(chunk_meta), + _chunk_meta(std::move(chunk_meta)), _offset_index(offset_index) {} ~ScalarColumnReader() override { close(); } Status init(io::FileReaderSPtr file, FieldSchema* field, size_t max_buf_size, @@ -325,6 +364,11 @@ class ScalarColumnReader : public ParquetColumnReader { Status _read_nested_column(ColumnPtr& doris_column, DataTypePtr& type, FilterMap& filter_map, size_t batch_size, size_t* read_rows, bool* eof, bool is_dict_filter); + Status _read_and_skip_nested_levels(FilterMap& filter_map, size_t before_rep_level_sz, + size_t filter_map_index, + std::vector& nested_filter_map_data); + Status read_nested_levels(FilterMap& filter_map, size_t batch_size, size_t* read_rows, + bool* eof) override; Status _try_load_dict_page(bool* loaded, bool* has_dict); }; @@ -335,11 +379,16 @@ class ArrayColumnReader : public ParquetColumnReader { io::IOContext* io_ctx) : ParquetColumnReader(row_ranges, total_rows, ctz, io_ctx) {} ~ArrayColumnReader() override { close(); } - Status init(std::unique_ptr element_reader, FieldSchema* field); + Status init(std::unique_ptr element_reader, 
FieldSchema* field, + bool offset_only); Status read_column_data(ColumnPtr& doris_column, const DataTypePtr& type, const std::shared_ptr& root_node, FilterMap& filter_map, size_t batch_size, size_t* read_rows, bool* eof, bool is_dict_filter, int64_t real_column_size = -1) override; + Status read_nested_levels(FilterMap& filter_map, size_t batch_size, size_t* read_rows, + bool* eof) override { + return _element_reader->read_nested_levels(filter_map, batch_size, read_rows, eof); + } const std::vector& get_rep_level() const override { return _element_reader->get_rep_level(); } @@ -353,6 +402,7 @@ class ArrayColumnReader : public ParquetColumnReader { private: std::unique_ptr _element_reader; + bool _offset_only = false; }; class MapColumnReader : public ParquetColumnReader { @@ -369,6 +419,10 @@ class MapColumnReader : public ParquetColumnReader { const std::shared_ptr& root_node, FilterMap& filter_map, size_t batch_size, size_t* read_rows, bool* eof, bool is_dict_filter, int64_t real_column_size = -1) override; + Status read_nested_levels(FilterMap& filter_map, size_t batch_size, size_t* read_rows, + bool* eof) override { + return _key_reader->read_nested_levels(filter_map, batch_size, read_rows, eof); + } const std::vector& get_rep_level() const override { return _key_reader->get_rep_level(); @@ -411,6 +465,8 @@ class StructColumnReader : public ParquetColumnReader { const std::shared_ptr& root_node, FilterMap& filter_map, size_t batch_size, size_t* read_rows, bool* eof, bool is_dict_filter, int64_t real_column_size = -1) override; + Status read_nested_levels(FilterMap& filter_map, size_t batch_size, size_t* read_rows, + bool* eof) override; const std::vector& get_rep_level() const override { if (!_read_column_names.empty()) { @@ -460,6 +516,49 @@ class StructColumnReader : public ParquetColumnReader { //Need to use vector instead of set,see `get_rep_level()` for the reason. 
}; +class VariantColumnReader : public ParquetColumnReader { + ENABLE_FACTORY_CREATOR(VariantColumnReader) +public: + VariantColumnReader(const RowRanges& row_ranges, size_t total_rows, const cctz::time_zone* ctz, + io::IOContext* io_ctx) + : ParquetColumnReader(row_ranges, total_rows, ctz, io_ctx) {} + ~VariantColumnReader() override { close(); } + + Status init(io::FileReaderSPtr file, FieldSchema* field, const tparquet::RowGroup& row_group, + size_t max_buf_size, std::unordered_map& col_offsets, + RuntimeState* state, bool in_collection, const std::set& column_ids, + const std::set& filter_column_ids); + Status read_column_data(ColumnPtr& doris_column, const DataTypePtr& type, + const std::shared_ptr& root_node, + FilterMap& filter_map, size_t batch_size, size_t* read_rows, bool* eof, + bool is_dict_filter, int64_t real_column_size = -1) override; + Status read_nested_levels(FilterMap& filter_map, size_t batch_size, size_t* read_rows, + bool* eof) override { + return _struct_reader->read_nested_levels(filter_map, batch_size, read_rows, eof); + } + + const std::vector& get_rep_level() const override { + return _struct_reader->get_rep_level(); + } + const std::vector& get_def_level() const override { + return _struct_reader->get_def_level(); + } + ColumnStatistics column_statistics() override { + auto statistics = _struct_reader->column_statistics(); + statistics.merge(_variant_statistics); + return statistics; + } + void close() override {} + + void reset_filter_map_index() override { _struct_reader->reset_filter_map_index(); } + +private: + std::unique_ptr _variant_struct_field; + std::unique_ptr _struct_reader; + std::set _column_ids; + ColumnStatistics _variant_statistics; +}; + // A special reader that skips actual reading but provides empty data with correct structure // This is used when a column is not needed but its structure is required (e.g., for map keys) class SkipReadingReader : public ParquetColumnReader { @@ -532,9 +631,7 @@ class SkipReadingReader 
: public ParquetColumnReader { } // Implement required pure virtual methods from base class - ColumnStatistics column_statistics() override { - return ColumnStatistics(); // Return empty statistics - } + ColumnStatistics column_statistics() override { return {}; } void close() override { // Nothing to close for skip reading diff --git a/be/src/format/parquet/vparquet_reader.cpp b/be/src/format/parquet/vparquet_reader.cpp index a2f2356085b171..9e1b7e700fab01 100644 --- a/be/src/format/parquet/vparquet_reader.cpp +++ b/be/src/format/parquet/vparquet_reader.cpp @@ -24,6 +24,7 @@ #include #include +#include #include #include "common/config.h" @@ -46,12 +47,14 @@ #include "format/column_type_convert.h" #include "format/parquet/parquet_block_split_bloom_filter.h" #include "format/parquet/parquet_common.h" +#include "format/parquet/parquet_nested_column_utils.h" #include "format/parquet/parquet_predicate.h" #include "format/parquet/parquet_thrift_util.h" #include "format/parquet/schema_desc.h" #include "format/parquet/vparquet_file_metadata.h" #include "format/parquet/vparquet_group_reader.h" #include "format/parquet/vparquet_page_index.h" +#include "format/table/nested_column_access_helper.h" #include "information_schema/schema_scanner.h" #include "io/file_factory.h" #include "io/fs/buffered_reader.h" @@ -194,6 +197,7 @@ void ParquetReader::set_file_reader(io::FileReaderSPtr file_reader) { } #endif +// NOLINTNEXTLINE(readability-function-size): existing Parquet counter initialization stays grouped. 
void ParquetReader::_init_profile() { if (_profile != nullptr) { static const char* parquet_profile = "ParquetReader"; @@ -287,6 +291,10 @@ void ParquetReader::_init_profile() { ADD_CHILD_TIMER_WITH_LEVEL(_profile, "ConvertTime", parquet_profile, 1); _parquet_profile.bloom_filter_read_time = ADD_CHILD_TIMER_WITH_LEVEL(_profile, "BloomFilterReadTime", parquet_profile, 1); + _parquet_profile.variant_direct_typed_value_read_rows = ADD_CHILD_COUNTER_WITH_LEVEL( + _profile, "VariantDirectTypedValueReadRows", TUnit::UNIT, parquet_profile, 1); + _parquet_profile.variant_rowwise_read_rows = ADD_CHILD_COUNTER_WITH_LEVEL( + _profile, "VariantRowWiseReadRows", TUnit::UNIT, parquet_profile, 1); } } @@ -372,10 +380,21 @@ Status ParquetReader::_open_file() { Status ParquetReader::get_file_metadata_schema(const FieldDescriptor** ptr) { RETURN_IF_ERROR(_open_file()); DCHECK(_file_metadata != nullptr); - *ptr = &_file_metadata->schema(); + *ptr = &parquet_file_schema(); return Status::OK(); } +const FieldDescriptor& ParquetReader::parquet_file_schema() const { + if (_file_schema_with_ids.has_value()) { + return *_file_schema_with_ids; + } + return _file_metadata->schema(); +} + +void ParquetReader::prepare_parquet_file_schema_with_ids(const FieldDescriptor* field_desc) { + _file_schema_with_ids = field_desc->copy_with_assigned_ids(); +} + void ParquetReader::_init_system_properties() { if (_scan_range.__isset.file_type) { // for compatibility @@ -430,13 +449,115 @@ Status ParquetReader::on_before_init_reader(ReaderInitContext* ctx) { if (ctx->tuple_descriptor != nullptr) { const FieldDescriptor* field_desc = nullptr; RETURN_IF_ERROR(get_file_metadata_schema(&field_desc)); + prepare_parquet_file_schema_with_ids(field_desc); + field_desc = &parquet_file_schema(); RETURN_IF_ERROR(TableSchemaChangeHelper::BuildTableInfoUtil::by_parquet_name( ctx->tuple_descriptor, *field_desc, ctx->table_info_node)); + auto column_id_result = _create_column_ids_by_name(field_desc, 
ctx->tuple_descriptor); + ctx->column_ids = std::move(column_id_result.column_ids); + ctx->filter_column_ids = std::move(column_id_result.filter_column_ids); } return Status::OK(); } +ColumnIdResult ParquetReader::_create_column_ids_by_name(const FieldDescriptor* field_desc, + const TupleDescriptor* tuple_descriptor) { + FieldDescriptor field_desc_with_ids = field_desc->copy_with_assigned_ids(); + field_desc = &field_desc_with_ids; + + std::unordered_map table_col_name_to_field_schema_map; + for (int i = 0; i < field_desc->size(); ++i) { + const auto* field_schema = field_desc->get_column(i); + if (!field_schema) { + continue; + } + table_col_name_to_field_schema_map[field_schema->lower_case_name] = field_schema; + } + + std::set column_ids; + std::set filter_column_ids; + + auto process_access_paths = [](const FieldSchema* parquet_field, + const std::vector& access_paths, + std::set& out_ids) { + process_nested_access_paths( + parquet_field, access_paths, out_ids, + [](const FieldSchema* field) { return field->get_column_id(); }, + [](const FieldSchema* field) { return field->get_max_column_id(); }, + ParquetNestedColumnUtils::extract_nested_column_ids_by_name); + }; + + for (const auto* slot : tuple_descriptor->slots()) { + auto it = table_col_name_to_field_schema_map.find(slot->col_name_lower_case()); + if (it == table_col_name_to_field_schema_map.end()) { + continue; + } + const auto* field_schema = it->second; + + if ((slot->col_type() != TYPE_STRUCT && slot->col_type() != TYPE_ARRAY && + slot->col_type() != TYPE_MAP && slot->col_type() != TYPE_VARIANT)) { + column_ids.insert(field_schema->column_id); + if (slot->is_predicate()) { + filter_column_ids.insert(field_schema->column_id); + } + continue; + } + + process_access_paths(field_schema, slot->all_access_paths(), column_ids); + if (!slot->predicate_access_paths().empty()) { + process_access_paths(field_schema, slot->predicate_access_paths(), filter_column_ids); + } + } + + return {std::move(column_ids), 
std::move(filter_column_ids)}; +} + +std::string ParquetReader::_selected_leaf_column_paths() const { + if (_file_metadata == nullptr) { + return ""; + } + + std::vector<std::string> leaf_paths; + const auto& schema_desc = parquet_file_schema(); + std::function<void(const FieldSchema*, const std::string&)> collect = + [&](const FieldSchema* field, const std::string& path) { + if (!_column_ids.empty() && !_column_ids.contains(field->get_column_id())) { + return; + } + + if (field->children.empty()) { + if (field->physical_column_index >= 0) { + leaf_paths.push_back(path); + } + return; + } + + for (const auto& child : field->children) { + collect(&child, path + "." + child.name); + } + }; + + for (const auto& read_col : _read_file_columns) { + const FieldSchema* field = schema_desc.get_column(read_col); + if (field != nullptr) { + collect(field, field->name); + } + } + + std::sort(leaf_paths.begin(), leaf_paths.end()); + leaf_paths.erase(std::unique(leaf_paths.begin(), leaf_paths.end()), leaf_paths.end()); + + std::stringstream result; + for (size_t i = 0; i < leaf_paths.size(); ++i) { + if (i != 0) { + result << ", "; + } + result << leaf_paths[i]; + } + return result.str(); +} + Status ParquetReader::_open_file_reader(ReaderInitContext* /*ctx*/) { return _open_file(); } @@ -490,6 +611,9 @@ Status ParquetReader::_do_init_reader(ReaderInitContext* base_ctx) { // _init_read_columns handles both normal path (missing cols populated above) // and standalone path (_fill_missing_cols empty, _table_info_node_ptr may be null). _init_read_columns(base_ctx->column_names); + if (_profile != nullptr) { + _profile->add_info_string("ParquetReadColumnPaths", _selected_leaf_column_paths()); + } // build column predicates for column lazy read if (ctx->conjuncts != nullptr) { @@ -534,7 +658,7 @@ void ParquetReader::_init_read_columns(const std::vector<std::string>& column_na // Build file_col_name → table_col_name map, skipping missing columns. 
// Must iterate file schema in physical order so that _generate_random_access_ranges // sees monotonically increasing chunk offsets. - auto schema_desc = _file_metadata->schema(); + const auto& schema_desc = parquet_file_schema(); std::map required_file_columns; for (const auto& col_name : column_names) { if (_fill_missing_cols.contains(col_name)) { @@ -572,7 +696,7 @@ bool ParquetReader::_type_matches(const int cid) const { const auto& file_col_name = _table_info_node_ptr->children_file_column_name(slot->col_name()); const auto& file_col_type = - remove_nullable(_file_metadata->schema().get_column(file_col_name)->data_type); + remove_nullable(parquet_file_schema().get_column(file_col_name)->data_type); return (table_col_type->get_primitive_type() == file_col_type->get_primitive_type()) && !is_complex_type(table_col_type->get_primitive_type()); @@ -635,7 +759,7 @@ void ParquetReader::_classify_columns_for_lazy_read( const std::unordered_map>& partition_columns, const std::unordered_map& missing_columns) { - const FieldDescriptor& schema = _file_metadata->schema(); + const FieldDescriptor& schema = parquet_file_schema(); auto predicate_columns = predicate_conjuncts_columns; #ifndef BE_TEST for (const auto& [col_name, _] : _generated_col_handlers) { @@ -745,7 +869,7 @@ Status ParquetReader::init_schema_reader() { Status ParquetReader::get_parsed_schema(std::vector* col_names, std::vector* col_types) { _total_groups = _t_metadata->row_groups.size(); - auto schema_desc = _file_metadata->schema(); + const auto& schema_desc = parquet_file_schema(); for (int i = 0; i < schema_desc.size(); ++i) { // Get the Column Reader for the boolean column col_names->emplace_back(schema_desc.get_column(i)->name); @@ -756,7 +880,7 @@ Status ParquetReader::get_parsed_schema(std::vector* col_names, Status ParquetReader::_get_columns_impl( std::unordered_map* name_to_type) { - const auto& schema_desc = _file_metadata->schema(); + const auto& schema_desc = parquet_file_schema(); 
std::unordered_set column_names; schema_desc.get_column_names(&column_names); for (auto& name : column_names) { @@ -839,7 +963,7 @@ Status ParquetReader::_do_get_next_block(Block* block, size_t* read_rows, bool* RowGroupReader::PositionDeleteContext ParquetReader::_get_position_delete_ctx( const tparquet::RowGroup& row_group, const RowGroupReader::RowGroupIndex& row_group_index) { if (_delete_rows == nullptr) { - return RowGroupReader::PositionDeleteContext(row_group.num_rows, row_group_index.first_row); + return {row_group.num_rows, row_group_index.first_row}; } const int64_t* delete_rows = &(*_delete_rows)[0]; const int64_t* delete_rows_end = delete_rows + _delete_rows->size(); @@ -890,7 +1014,7 @@ Status ParquetReader::_next_row_group_reader() { }; int64_t group_size = 0; // only calculate the needed columns for (auto& read_col : _read_file_columns) { - const FieldSchema* field = _file_metadata->schema().get_column(read_col); + const FieldSchema* field = parquet_file_schema().get_column(read_col); group_size += column_compressed_size(field); } @@ -960,7 +1084,7 @@ Status ParquetReader::_next_row_group_reader() { _current_group_reader->set_table_format_reader(this); _current_group_reader->_table_info_node_ptr = _table_info_node_ptr; - return _current_group_reader->init(_file_metadata->schema(), candidate_row_ranges, _col_offsets, + return _current_group_reader->init(parquet_file_schema(), candidate_row_ranges, _col_offsets, _tuple_descriptor, _row_descriptor, _colname_to_slot_id, _not_single_slot_filter_conjuncts, _slot_id_to_filter_conjuncts); @@ -975,14 +1099,15 @@ std::vector ParquetReader::_generate_random_access_ranges( [&](const FieldSchema* field, const tparquet::RowGroup& row_group) { if (_column_ids.empty() || _column_ids.find(field->get_column_id()) != _column_ids.end()) { - if (field->data_type->get_primitive_type() == TYPE_ARRAY) { - scalar_range(&field->children[0], row_group); - } else if (field->data_type->get_primitive_type() == TYPE_MAP) { - 
scalar_range(&field->children[0], row_group); - scalar_range(&field->children[1], row_group); - } else if (field->data_type->get_primitive_type() == TYPE_STRUCT) { - for (int i = 0; i < field->children.size(); ++i) { - scalar_range(&field->children[i], row_group); + const auto field_type = remove_nullable(field->data_type)->get_primitive_type(); + if (field_type == TYPE_ARRAY) { + scalar_range(field->children.data(), row_group); + } else if (field_type == TYPE_MAP) { + scalar_range(field->children.data(), row_group); + scalar_range(field->children.data() + 1, row_group); + } else if (field_type == TYPE_STRUCT || field_type == TYPE_VARIANT) { + for (const auto& child : field->children) { + scalar_range(&child, row_group); } } else { const tparquet::ColumnChunk& chunk = @@ -1001,7 +1126,7 @@ std::vector ParquetReader::_generate_random_access_ranges( }; const tparquet::RowGroup& row_group = _t_metadata->row_groups[group.row_group_id]; for (const auto& read_col : _read_file_columns) { - const FieldSchema* field = _file_metadata->schema().get_column(read_col); + const FieldSchema* field = parquet_file_schema().get_column(read_col); scalar_range(field, row_group); } if (!result.empty()) { @@ -1025,8 +1150,12 @@ bool ParquetReader::_is_misaligned_range_group(const tparquet::RowGroup& row_gro } int64_t ParquetReader::get_total_rows() const { - if (!_t_metadata) return 0; - if (!_filter_groups) return _t_metadata->num_rows; + if (!_t_metadata) { + return 0; + } + if (!_filter_groups) { + return _t_metadata->num_rows; + } int64_t total = 0; for (const auto& rg : _t_metadata->row_groups) { if (!_is_misaligned_range_group(rg)) { @@ -1079,22 +1208,23 @@ Status ParquetReader::_process_page_index_filter( if (!_colname_to_slot_id->contains(read_table_col)) { continue; } - auto* field = _file_metadata->schema().get_column(read_file_col); + const auto* field = parquet_file_schema().get_column(read_file_col); - std::function f = [&](FieldSchema* field) { + std::function f = [&](const 
FieldSchema* field) { if (!_column_ids.empty() && _column_ids.find(field->get_column_id()) == _column_ids.end()) { return; } - if (field->data_type->get_primitive_type() == TYPE_ARRAY) { - f(&field->children[0]); - } else if (field->data_type->get_primitive_type() == TYPE_MAP) { - f(&field->children[0]); - f(&field->children[1]); - } else if (field->data_type->get_primitive_type() == TYPE_STRUCT) { - for (int i = 0; i < field->children.size(); ++i) { - f(&field->children[i]); + const auto field_type = remove_nullable(field->data_type)->get_primitive_type(); + if (field_type == TYPE_ARRAY) { + f(field->children.data()); + } else if (field_type == TYPE_MAP) { + f(field->children.data()); + f(field->children.data() + 1); + } else if (field_type == TYPE_STRUCT || field_type == TYPE_VARIANT) { + for (const auto& child : field->children) { + f(&child); } } else { int parquet_col_id = field->physical_column_index; @@ -1175,7 +1305,7 @@ Status ParquetReader::_process_page_index_filter( const auto& file_col_name = _table_info_node_ptr->children_file_column_name(slot->col_name()); - const FieldSchema* col_schema = _file_metadata->schema().get_column(file_col_name); + const FieldSchema* col_schema = parquet_file_schema().get_column(file_col_name); int parquet_col_id = col_schema->physical_column_index; if (parquet_col_id < 0) { @@ -1322,6 +1452,18 @@ Status ParquetReader::_process_column_stat_filter( // when there are multiple predicates on the same column std::unordered_map> bloom_filter_cache; + auto find_physical_column = [&](const SlotDescriptor* slot, const FieldSchema** col_schema, + int* parquet_col_id) -> bool { + if (!_table_info_node_ptr->children_column_exists(slot->col_name())) { + return false; + } + const auto& file_col_name = + _table_info_node_ptr->children_file_column_name(slot->col_name()); + *col_schema = parquet_file_schema().get_column(file_col_name); + *parquet_col_id = (*col_schema)->physical_column_index; + return *parquet_col_id >= 0; + }; + // 
        // Initialize output parameters
        *filtered_by_min_max = false;
        *filtered_by_bloom_filter = false;
@@ -1333,15 +1475,12 @@ Status ParquetReader::_process_column_stat_filter(
         if (!_enable_filter_by_min_max) {
             return false;
         }
+        const FieldSchema* col_schema = nullptr;
+        int parquet_col_id = -1;
         auto* slot = _tuple_descriptor->slots()[cid];
-        if (!_table_info_node_ptr->children_column_exists(slot->col_name())) {
+        if (!find_physical_column(slot, &col_schema, &parquet_col_id)) {
             return false;
         }
-        const auto& file_col_name =
-                _table_info_node_ptr->children_file_column_name(slot->col_name());
-        const FieldSchema* col_schema =
-                _file_metadata->schema().get_column(file_col_name);
-        int parquet_col_id = col_schema->physical_column_index;
         auto meta_data = row_group.columns[parquet_col_id].meta_data;
         stat->col_schema = col_schema;
         return ParquetPredicate::read_column_stats(col_schema, meta_data,
@@ -1351,15 +1490,12 @@ Status ParquetReader::_process_column_stat_filter(
     };
     std::function get_bloom_filter_func = [&](ParquetPredicate::ColumnStat* stat,
                                               const int cid) {
+        const FieldSchema* col_schema = nullptr;
+        int parquet_col_id = -1;
         auto* slot = _tuple_descriptor->slots()[cid];
-        if (!_table_info_node_ptr->children_column_exists(slot->col_name())) {
+        if (!find_physical_column(slot, &col_schema, &parquet_col_id)) {
             return false;
         }
-        const auto& file_col_name =
-                _table_info_node_ptr->children_file_column_name(slot->col_name());
-        const FieldSchema* col_schema =
-                _file_metadata->schema().get_column(file_col_name);
-        int parquet_col_id = col_schema->physical_column_index;
         auto meta_data = row_group.columns[parquet_col_id].meta_data;
         if (!meta_data.__isset.bloom_filter_offset) {
             return false;
@@ -1423,16 +1559,14 @@ Status ParquetReader::_process_column_stat_filter(
         if (stat.bloom_filter) {
             // Find the column id for caching
             for (auto* slot : _tuple_descriptor->slots()) {
-                if (_table_info_node_ptr->children_column_exists(slot->col_name())) {
-                    const auto& file_col_name =
-                            _table_info_node_ptr->children_file_column_name(slot->col_name());
-                    const FieldSchema* col_schema =
-                            _file_metadata->schema().get_column(file_col_name);
-                    int parquet_col_id = col_schema->physical_column_index;
-                    if (stat.col_schema == col_schema) {
-                        bloom_filter_cache[parquet_col_id] = std::move(stat.bloom_filter);
-                        break;
-                    }
+                const FieldSchema* col_schema = nullptr;
+                int parquet_col_id = -1;
+                if (!find_physical_column(slot, &col_schema, &parquet_col_id)) {
+                    continue;
+                }
+                if (stat.col_schema == col_schema) {
+                    bloom_filter_cache[parquet_col_id] = std::move(stat.bloom_filter);
+                    break;
+                }
             }
         }
@@ -1522,6 +1656,10 @@ void ParquetReader::_collect_profile() {
     COUNTER_UPDATE(_parquet_profile.decode_dict_time, _column_statistics.decode_dict_time);
     COUNTER_UPDATE(_parquet_profile.decode_level_time, _column_statistics.decode_level_time);
    COUNTER_UPDATE(_parquet_profile.decode_null_map_time, _column_statistics.decode_null_map_time);
+    COUNTER_UPDATE(_parquet_profile.variant_direct_typed_value_read_rows,
+                   _column_statistics.variant_direct_typed_value_read_rows);
+    COUNTER_UPDATE(_parquet_profile.variant_rowwise_read_rows,
+                   _column_statistics.variant_rowwise_read_rows);
 }
 
 void ParquetReader::_collect_profile_before_close() {
diff --git a/be/src/format/parquet/vparquet_reader.h b/be/src/format/parquet/vparquet_reader.h
index 68979bf9e4f027..e40714ffe84c6d 100644
--- a/be/src/format/parquet/vparquet_reader.h
+++ b/be/src/format/parquet/vparquet_reader.h
@@ -18,11 +18,13 @@
 #pragma once
 
 #include
-#include
-#include
+#include
+#include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -239,8 +241,14 @@ class ParquetReader : public TableFormatReader {
     const TupleDescriptor* get_tuple_descriptor() const { return _tuple_descriptor; }
     const RowDescriptor* get_row_descriptor() const { return _row_descriptor; }
     const FileMetaData* get_file_metadata() const { return _file_metadata; }
+    const FieldDescriptor& parquet_file_schema() const;
+    void prepare_parquet_file_schema_with_ids(const FieldDescriptor* field_desc);
 
 private:
+    static ColumnIdResult _create_column_ids_by_name(const FieldDescriptor* field_desc,
+                                                     const TupleDescriptor* tuple_descriptor);
+    std::string _selected_leaf_column_paths() const;
+
     struct ParquetProfile {
         RuntimeProfile::Counter* filtered_row_groups = nullptr;
         RuntimeProfile::Counter* filtered_row_groups_by_min_max = nullptr;
@@ -286,6 +294,8 @@ class ParquetReader : public TableFormatReader {
         RuntimeProfile::Counter* dict_filter_rewrite_time = nullptr;
         RuntimeProfile::Counter* convert_time = nullptr;
         RuntimeProfile::Counter* bloom_filter_read_time = nullptr;
+        RuntimeProfile::Counter* variant_direct_typed_value_read_rows = nullptr;
+        RuntimeProfile::Counter* variant_rowwise_read_rows = nullptr;
     };
 
     // ---- set_fill_columns sub-functions ----
@@ -361,6 +371,7 @@ class ParquetReader : public TableFormatReader {
     // after _file_reader. Otherwise, there may be heap-use-after-free bug.
     ObjLRUCache::CacheHandle _meta_cache_handle;
     std::unique_ptr _file_metadata_ptr;
+    std::optional _file_schema_with_ids;
     const tparquet::FileMetaData* _t_metadata = nullptr;
 
     // _tracing_file_reader wraps _file_reader.
diff --git a/be/src/format/table/hive/hive_parquet_nested_column_utils.cpp b/be/src/format/table/hive/hive_parquet_nested_column_utils.cpp
index d9d7642afeb888..b0d222f9d36797 100644
--- a/be/src/format/table/hive/hive_parquet_nested_column_utils.cpp
+++ b/be/src/format/table/hive/hive_parquet_nested_column_utils.cpp
@@ -17,154 +17,14 @@
 
 #include "format/table/hive/hive_parquet_nested_column_utils.h"
 
-#include
-#include
-#include
-#include
-#include
-#include
-
-#include "format/parquet/schema_desc.h"
-#include "format/table/table_schema_change_helper.h"
+#include "format/parquet/parquet_nested_column_utils.h"
 
 namespace doris {
 
 void HiveParquetNestedColumnUtils::extract_nested_column_ids(
         const FieldSchema& field_schema, const std::vector>& paths,
         std::set& column_ids) {
-    // Group paths by first field_id
-    std::unordered_map>>
-            child_paths_by_table_col_name;
-
-    for (const auto& path : paths) {
-        if (!path.empty()) {
-            std::string first_table_col_name = path[0];
-            std::vector remaining;
-            if (path.size() > 1) {
-                remaining.assign(path.begin() + 1, path.end());
-            }
-            child_paths_by_table_col_name[first_table_col_name].push_back(std::move(remaining));
-        }
-    }
-
-    // Track whether any child column was added to determine if parent should be included
-    bool has_child_columns = false;
-
-    // For MAP type, normalize wildcard "*" to explicit KEYS/VALUES access
-    // Wildcard in MAP context means accessing both map keys and values
-    // Normalization logic:
-    //   path: ["map_col", "*"] → ["map_col", "VALUES"] + ["map_col", "KEYS"]
-    //   path: ["map_col", "*", "field"] → ["map_col", "VALUES", "field"] + ["map_col", "KEYS"]
-    if (field_schema.data_type->get_primitive_type() == PrimitiveType::TYPE_MAP) {
-        auto wildcard_it = child_paths_by_table_col_name.find("*");
-        if (wildcard_it != child_paths_by_table_col_name.end()) {
-            auto& wildcard_paths = wildcard_it->second;
-
-            // All wildcard paths go to VALUES
-            auto& values_paths = child_paths_by_table_col_name["VALUES"];
-            values_paths.insert(values_paths.end(), wildcard_paths.begin(), wildcard_paths.end());
-
-            // Always add KEYS for wildcard access
-            auto& keys_paths = child_paths_by_table_col_name["KEYS"];
-            // Add an empty path to request full KEYS
-            std::vector empty_path;
-            keys_paths.push_back(empty_path);
-
-            // Remove wildcard entry as it's been expanded
-            child_paths_by_table_col_name.erase(wildcard_it);
-        }
-    }
-
-    // Efficiently traverse children
-    for (uint64_t i = 0; i < field_schema.children.size(); ++i) {
-        const auto& child = field_schema.children[i];
-        std::string child_field_name;
-
-        bool is_list = field_schema.data_type->get_primitive_type() == PrimitiveType::TYPE_ARRAY;
-        bool is_map = field_schema.data_type->get_primitive_type() == PrimitiveType::TYPE_MAP;
-
-        if (is_list) {
-            child_field_name = "*";
-        } else if (is_map) {
-            // After wildcard normalization above, all MAP accesses are explicit KEYS/VALUES
-            // Simply assign the appropriate field name based on which child we're processing
-            if (i == 0) {
-                child_field_name = "KEYS";
-            } else if (i == 1) {
-                child_field_name = "VALUES";
-            }
-
-            // Special handling for Parquet MAP structure:
-            // When accessing only VALUES, we still need KEY structure for levels
-            // Check if we're at key child (i==0) and only VALUES is requested (no KEYS)
-            if (i == 0) {
-                bool has_keys_access = child_paths_by_table_col_name.find("KEYS") !=
-                                       child_paths_by_table_col_name.end();
-                bool has_values_access = child_paths_by_table_col_name.find("VALUES") !=
-                                         child_paths_by_table_col_name.end();
-
-                // If only VALUES is accessed (not KEYS), still include key structure for RL/DL
-                if (!has_keys_access && has_values_access) {
-                    // For map_values() queries, we need key's structure for correct RL/DL parsing.
-                    // If key is a nested type (e.g., STRUCT), RL/DL info is stored at leaf columns.
-                    // Add all column IDs from key's start to max (all leaves + intermediate nodes).
-                    uint64_t key_start_id = child.get_column_id();
-                    uint64_t key_max_id = child.get_max_column_id();
-                    for (uint64_t id = key_start_id; id <= key_max_id; ++id) {
-                        column_ids.insert(id);
-                    }
-                    has_child_columns = true;
-                    continue; // Skip further processing of key child
-                }
-            }
-
-        } else {
-            child_field_name = child.lower_case_name;
-        }
-
-        if (child_field_name.empty()) {
-            continue;
-        }
-
-        auto child_paths_it = child_paths_by_table_col_name.find(child_field_name);
-        if (child_paths_it != child_paths_by_table_col_name.end()) {
-            const auto& child_paths = child_paths_it->second;
-
-            // Check if any child path is empty (meaning full child needed)
-            bool needs_full_child =
-                    std::any_of(child_paths.begin(), child_paths.end(),
-                                [](const std::vector& path) { return path.empty(); });
-
-            if (needs_full_child) {
-                // Add all column IDs from current child node to max_column_id
-                // This efficiently handles all nested/complex cases in one loop
-                uint64_t start_id = child.get_column_id();
-                uint64_t max_column_id = child.get_max_column_id();
-                for (uint64_t id = start_id; id <= max_column_id; ++id) {
-                    column_ids.insert(id);
-                }
-                has_child_columns = true;
-            } else {
-                // Store current size to check if recursive call added any columns
-                size_t before_size = column_ids.size();
-
-                // Recursively extract from child
-                extract_nested_column_ids(child, child_paths, column_ids);
-
-                // Check if recursive call added any columns
-                if (column_ids.size() > before_size) {
-                    has_child_columns = true;
-                }
-            }
-        }
-    }
-
-    // If any child columns were added, also add the parent column ID
-    // This ensures parent struct/container nodes are included when their children are needed
-    if (has_child_columns) {
-        // Set automatically handles deduplication, so no need to check if it already exists
-        column_ids.insert(field_schema.get_column_id());
-    }
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field_schema, paths, column_ids);
 }
 
 } // namespace doris
diff --git a/be/src/format/table/hive/hive_parquet_nested_column_utils.h b/be/src/format/table/hive/hive_parquet_nested_column_utils.h
index be960c9da8fcd1..ddd237877859af 100644
--- a/be/src/format/table/hive/hive_parquet_nested_column_utils.h
+++ b/be/src/format/table/hive/hive_parquet_nested_column_utils.h
@@ -17,14 +17,11 @@
 
 #pragma once
 
-#include
+#include
 #include
 #include
-#include
 #include
 
-#include "format/table/table_schema_change_helper.h"
-
 namespace doris {
 
 struct FieldSchema;
diff --git a/be/src/format/table/hive_reader.cpp b/be/src/format/table/hive_reader.cpp
index 1a8d8f79bd9774..4a917398932a8f 100644
--- a/be/src/format/table/hive_reader.cpp
+++ b/be/src/format/table/hive_reader.cpp
@@ -137,7 +137,7 @@ ColumnIdResult HiveOrcReader::_create_column_ids(const orc::Type* orc_type,
         // primitive (non-nested) types
         if ((slot->col_type() != TYPE_STRUCT && slot->col_type() != TYPE_ARRAY &&
-             slot->col_type() != TYPE_MAP)) {
+             slot->col_type() != TYPE_MAP && slot->col_type() != TYPE_VARIANT)) {
             column_ids.insert(orc_field->getColumnId());
             if (slot->is_predicate()) {
                 filter_column_ids.insert(orc_field->getColumnId());
@@ -193,7 +193,7 @@ ColumnIdResult HiveOrcReader::_create_column_ids_by_top_level_col_index(
         // primitive (non-nested) types
         if ((slot->col_type() != TYPE_STRUCT && slot->col_type() != TYPE_ARRAY &&
-             slot->col_type() != TYPE_MAP)) {
+             slot->col_type() != TYPE_MAP && slot->col_type() != TYPE_VARIANT)) {
             column_ids.insert(orc_field->getColumnId());
             if (slot->is_predicate()) {
                 filter_column_ids.insert(orc_field->getColumnId());
@@ -240,6 +240,8 @@ Status HiveParquetReader::on_before_init_reader(ReaderInitContext* ctx) {
     const FieldDescriptor* field_desc = nullptr;
     RETURN_IF_ERROR(get_file_metadata_schema(&field_desc));
     DCHECK(field_desc != nullptr);
+    prepare_parquet_file_schema_with_ids(field_desc);
+    field_desc = &parquet_file_schema();
 
     // Build table_info_node based on config
     if (get_state()->query_options().hive_parquet_use_column_names) {
@@ -279,8 +281,9 @@ Status HiveParquetReader::on_before_init_reader(ReaderInitContext* ctx) {
     if (get_state()->query_options().hive_parquet_use_column_names) {
         column_id_result = _create_column_ids(field_desc, ctx->tuple_descriptor);
     } else {
-        column_id_result =
-                _create_column_ids_by_top_level_col_index(field_desc, ctx->tuple_descriptor);
+        column_id_result = _create_column_ids_by_top_level_col_index(
+                field_desc, ctx->tuple_descriptor, ctx->column_names,
+                get_scan_params().column_idxs);
     }
     ctx->column_ids = std::move(column_id_result.column_ids);
     ctx->filter_column_ids = std::move(column_id_result.filter_column_ids);
@@ -291,9 +294,8 @@ Status HiveParquetReader::on_before_init_reader(ReaderInitContext* ctx) {
 
 ColumnIdResult HiveParquetReader::_create_column_ids(const FieldDescriptor* field_desc,
                                                      const TupleDescriptor* tuple_descriptor) {
-    // First, assign column IDs to the field descriptor
-    auto* mutable_field_desc = const_cast(field_desc);
-    mutable_field_desc->assign_ids();
+    FieldDescriptor field_desc_with_ids = field_desc->copy_with_assigned_ids();
+    field_desc = &field_desc_with_ids;
 
     // map top-level table column name (lower-cased) -> FieldSchema*
     std::unordered_map table_col_name_to_field_schema_map;
@@ -328,7 +330,7 @@ ColumnIdResult HiveParquetReader::_create_column_ids(const FieldDescriptor* fiel
         // primitive (non-nested) types
         if ((slot->col_type() != TYPE_STRUCT && slot->col_type() != TYPE_ARRAY &&
-             slot->col_type() != TYPE_MAP)) {
+             slot->col_type() != TYPE_MAP && slot->col_type() != TYPE_VARIANT)) {
             column_ids.insert(field_schema->column_id);
 
             if (slot->is_predicate()) {
@@ -351,18 +353,24 @@ ColumnIdResult HiveParquetReader::_create_column_ids(const FieldDescriptor* fiel
 }
 
 ColumnIdResult HiveParquetReader::_create_column_ids_by_top_level_col_index(
-        const FieldDescriptor* field_desc, const TupleDescriptor* tuple_descriptor) {
-    // First, assign column IDs to the field descriptor
-    auto* mutable_field_desc = const_cast(field_desc);
-    mutable_field_desc->assign_ids();
-
-    // map top-level table column position -> FieldSchema*
-    std::unordered_map table_col_pos_to_field_schema_map;
-    for (int i = 0; i < field_desc->size(); ++i) {
-        auto field_schema = field_desc->get_column(i);
-        if (!field_schema) continue;
-
-        table_col_pos_to_field_schema_map[i] = field_schema;
+        const FieldDescriptor* field_desc, const TupleDescriptor* tuple_descriptor,
+        const std::vector& table_column_names,
+        const std::vector& file_column_idxs) {
+    FieldDescriptor field_desc_with_ids = field_desc->copy_with_assigned_ids();
+    field_desc = &field_desc_with_ids;
+
+    // map top-level table column name -> file FieldSchema* using the same by-position mapping
+    // that builds table_info_node.
+    DORIS_CHECK(table_column_names.size() == file_column_idxs.size());
+    std::unordered_map table_col_name_to_field_schema_map;
+    const auto& parquet_fields_schema = field_desc->get_fields_schema();
+    for (size_t idx = 0; idx < file_column_idxs.size(); ++idx) {
+        const int32_t file_index = file_column_idxs[idx];
+        if (file_index >= parquet_fields_schema.size()) {
+            continue;
+        }
+        table_col_name_to_field_schema_map[to_lower(table_column_names[idx])] =
+                &parquet_fields_schema[file_index];
     }
 
     std::set column_ids;
@@ -380,8 +388,8 @@ ColumnIdResult HiveParquetReader::_create_column_ids_by_top_level_col_index(
     };
 
     for (const auto* slot : tuple_descriptor->slots()) {
-        auto it = table_col_pos_to_field_schema_map.find(slot->col_pos());
-        if (it == table_col_pos_to_field_schema_map.end()) {
+        auto it = table_col_name_to_field_schema_map.find(slot->col_name_lower_case());
+        if (it == table_col_name_to_field_schema_map.end()) {
             // Column not found in file
             continue;
         }
@@ -389,7 +397,7 @@ ColumnIdResult HiveParquetReader::_create_column_ids_by_top_level_col_index(
         // primitive (non-nested) types
         if ((slot->col_type() != TYPE_STRUCT && slot->col_type() != TYPE_ARRAY &&
-             slot->col_type() != TYPE_MAP)) {
+             slot->col_type() != TYPE_MAP && slot->col_type() != TYPE_VARIANT)) {
             column_ids.insert(field_schema->column_id);
 
             if (slot->is_predicate()) {
diff --git a/be/src/format/table/hive_reader.h b/be/src/format/table/hive_reader.h
index 9bcaa0536e7374..57ea175264219d 100644
--- a/be/src/format/table/hive_reader.h
+++ b/be/src/format/table/hive_reader.h
@@ -72,7 +72,9 @@ class HiveParquetReader final : public ParquetReader, public TableSchemaChangeHe
                                              const TupleDescriptor* tuple_descriptor);
 
     static ColumnIdResult _create_column_ids_by_top_level_col_index(
-            const FieldDescriptor* field_desc, const TupleDescriptor* tuple_descriptor);
+            const FieldDescriptor* field_desc, const TupleDescriptor* tuple_descriptor,
+            const std::vector& table_column_names,
+            const std::vector& file_column_idxs);
 
     const std::set* _is_file_slot = nullptr;
 };
diff --git a/be/src/format/table/iceberg/arrow_schema_util.cpp b/be/src/format/table/iceberg/arrow_schema_util.cpp
index e0bf830dfc8168..aa6e6a7ad60e5f 100644
--- a/be/src/format/table/iceberg/arrow_schema_util.cpp
+++ b/be/src/format/table/iceberg/arrow_schema_util.cpp
@@ -119,6 +119,9 @@ Status ArrowSchemaUtil::convert_to(const iceberg::NestedField& field,
         break;
     }
 
+    case iceberg::TypeID::VARIANT:
+        return Status::NotSupported("Iceberg VARIANT write is not supported");
+
     case iceberg::TypeID::TIME:
     default:
         return Status::InternalError("Unsupported field type:" + field.field_type()->to_string());
diff --git a/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp b/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp
index 726a66b580f541..d8a51f2ec17e05 100644
--- a/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp
+++ b/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.cpp
@@ -17,155 +17,15 @@
 
 #include "format/table/iceberg/iceberg_parquet_nested_column_utils.h"
 
-#include
-#include
-#include
-#include
-#include
-#include
-#include
-
-#include "format/parquet/schema_desc.h"
-#include "format/table/table_schema_change_helper.h"
+#include "format/parquet/parquet_nested_column_utils.h"
 
 namespace doris {
 
 void IcebergParquetNestedColumnUtils::extract_nested_column_ids(
         const FieldSchema& field_schema, const std::vector>& paths,
         std::set& column_ids) {
-    // Group paths by first field_id
-    std::unordered_map>> child_paths_by_field_id;
-
-    for (const auto& path : paths) {
-        if (!path.empty()) {
-            std::string first_field_id = path[0];
-            std::vector remaining;
-            if (path.size() > 1) {
-                remaining.assign(path.begin() + 1, path.end());
-            }
-            child_paths_by_field_id[first_field_id].push_back(std::move(remaining));
-        }
-    }
-
-    // Track whether any child column was added to determine if parent should be included
-    bool has_child_columns = false;
-
-    // For MAP type, normalize wildcard "*" to explicit KEYS/VALUES access
-    // Wildcard in MAP context means accessing both map keys and values
-    // Normalization logic:
-    //   path: ["map_col", "*"] → ["map_col", "VALUES"] + ["map_col", "KEYS"]
-    //   path: ["map_col", "*", "field"] → ["map_col", "VALUES", "field"] + ["map_col", "KEYS"]
-    if (field_schema.data_type->get_primitive_type() == PrimitiveType::TYPE_MAP) {
-        auto wildcard_it = child_paths_by_field_id.find("*");
-        if (wildcard_it != child_paths_by_field_id.end()) {
-            auto& wildcard_paths = wildcard_it->second;
-
-            // All wildcard paths go to VALUES
-            auto& values_paths = child_paths_by_field_id["VALUES"];
-            values_paths.insert(values_paths.end(), wildcard_paths.begin(), wildcard_paths.end());
-
-            // Always add KEYS for wildcard access
-            auto& keys_paths = child_paths_by_field_id["KEYS"];
-            // Add an empty path to request full KEYS
-            std::vector empty_path;
-            keys_paths.push_back(empty_path);
-
-            // Remove wildcard entry as it's been expanded
-            child_paths_by_field_id.erase(wildcard_it);
-        }
-    }
-
-    // Efficiently traverse children
-    for (uint64_t i = 0; i < field_schema.children.size(); ++i) {
-        const auto& child = field_schema.children[i];
-
-        std::string child_field_id;
-
-        bool is_list = field_schema.data_type->get_primitive_type() == PrimitiveType::TYPE_ARRAY;
-        bool is_map = field_schema.data_type->get_primitive_type() == PrimitiveType::TYPE_MAP;
-
-        if (is_list) {
-            child_field_id = "*";
-        } else if (is_map) {
-            // After wildcard normalization above, all MAP accesses are explicit KEYS/VALUES
-            // Simply assign the appropriate field name based on which child we're processing
-            if (i == 0) {
-                child_field_id = "KEYS";
-            } else if (i == 1) {
-                child_field_id = "VALUES";
-            }
-
-            // Special handling for Parquet MAP structure:
-            // When accessing only VALUES, we still need KEY structure for levels
-            // Check if we're at key child (i==0) and only VALUES is requested (no KEYS)
-            if (i == 0) {
-                bool has_keys_access =
-                        child_paths_by_field_id.find("KEYS") != child_paths_by_field_id.end();
-                bool has_values_access =
-                        child_paths_by_field_id.find("VALUES") != child_paths_by_field_id.end();
-
-                // If only VALUES is accessed (not KEYS), still include key structure for RL/DL
-                if (!has_keys_access && has_values_access) {
-                    // For map_values() queries, we need key's structure for correct RL/DL parsing.
-                    // If key is a nested type (e.g., STRUCT), RL/DL info is stored at leaf columns.
-                    // Add all column IDs from key's start to max (all leaves + intermediate nodes).
-                    uint64_t key_start_id = child.get_column_id();
-                    uint64_t key_max_id = child.get_max_column_id();
-                    for (uint64_t id = key_start_id; id <= key_max_id; ++id) {
-                        column_ids.insert(id);
-                    }
-                    has_child_columns = true;
-                    continue; // Skip further processing of key child
-                }
-            }
-
-        } else {
-            child_field_id = std::to_string(child.field_id);
-        }
-
-        if (child_field_id.empty() || child_field_id == "-1") {
-            continue;
-        }
-
-        auto child_paths_it = child_paths_by_field_id.find(child_field_id);
-        if (child_paths_it != child_paths_by_field_id.end()) {
-            const auto& child_paths = child_paths_it->second;
-
-            // Check if any child path is empty (meaning full child needed)
-            bool needs_full_child =
-                    std::any_of(child_paths.begin(), child_paths.end(),
-                                [](const std::vector& path) { return path.empty(); });
-
-            if (needs_full_child) {
-                // Add all column IDs from current child node to max_column_id
-                // This efficiently handles all nested/complex cases in one loop
-                uint64_t start_id = child.get_column_id();
-                uint64_t max_column_id = child.get_max_column_id();
-                for (uint64_t id = start_id; id <= max_column_id; ++id) {
-                    column_ids.insert(id);
-                }
-                has_child_columns = true;
-            } else {
-                // Store current size to check if recursive call added any columns
-                size_t before_size = column_ids.size();
-
-                // Recursively extract from child
-                extract_nested_column_ids(child, child_paths, column_ids);
-
-                // Check if recursive call added any columns
-                if (column_ids.size() > before_size) {
-                    has_child_columns = true;
-                }
-            }
-        }
-    }
-
-    // If any child columns were added, also add the parent column ID
-    // This ensures parent struct/container nodes are included when their children are needed
-    if (has_child_columns) {
-        // Set automatically handles deduplication, so no need to check if it already exists
-        column_ids.insert(field_schema.get_column_id());
-    }
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_field_id(field_schema, paths,
+                                                                    column_ids);
 }
 
 } // namespace doris
diff --git a/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.h b/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.h
index 39c1b90fac0977..bf54823b7a32f8 100644
--- a/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.h
+++ b/be/src/format/table/iceberg/iceberg_parquet_nested_column_utils.h
@@ -17,17 +17,13 @@
 
 #pragma once
 
-#include
+#include
 #include
 #include
-#include
 #include
 
-#include "format/table/table_schema_change_helper.h"
-
 namespace doris {
 
-class FieldDescriptor;
 struct FieldSchema;
 
 class IcebergParquetNestedColumnUtils {
diff --git a/be/src/format/table/iceberg/types.cpp b/be/src/format/table/iceberg/types.cpp
index 252f9035518c0b..75a9f5c939d467 100644
--- a/be/src/format/table/iceberg/types.cpp
+++ b/be/src/format/table/iceberg/types.cpp
@@ -170,6 +170,8 @@ std::unique_ptr Types::from_primitive_string(const std::string& t
         return std::make_unique();
     } else if (lower_type_string == "binary") {
         return std::make_unique();
+    } else if (lower_type_string == "variant") {
+        return std::make_unique();
     } else {
         std::regex fixed(R"(fixed\[\s*(\d+)\s*\])");
         std::regex decimal(R"(decimal\(\s*(\d+)\s*,\s*(\d+)\s*\))");
diff --git a/be/src/format/table/iceberg/types.h b/be/src/format/table/iceberg/types.h
index 53c54e238fa255..09b00defb9bcde 100644
--- a/be/src/format/table/iceberg/types.h
+++ b/be/src/format/table/iceberg/types.h
@@ -46,6 +46,7 @@ enum TypeID {
     FIXED,
     BINARY,
     DECIMAL,
+    VARIANT,
     STRUCT,
     LIST,
     MAP
@@ -394,6 +395,15 @@ class BooleanType : public PrimitiveType {
     std::string to_string() const override { return "boolean"; }
 };
 
+class VariantType : public PrimitiveType {
+public:
+    ~VariantType() override = default;
+
+    TypeID type_id() const override { return TypeID::VARIANT; }
+
+    std::string to_string() const override { return "variant"; }
+};
+
 class Types {
 public:
     static std::unique_ptr from_primitive_string(const std::string& type_string);
diff --git a/be/src/format/table/iceberg_reader.cpp b/be/src/format/table/iceberg_reader.cpp
index 7a74431a05851b..2c1e40c236fbdc 100644
--- a/be/src/format/table/iceberg_reader.cpp
+++ b/be/src/format/table/iceberg_reader.cpp
@@ -136,6 +136,8 @@ Status IcebergParquetReader::on_before_init_reader(ReaderInitContext* ctx) {
     const FieldDescriptor* field_desc = nullptr;
     RETURN_IF_ERROR(this->get_file_metadata_schema(&field_desc));
    DCHECK(field_desc != nullptr);
+    this->prepare_parquet_file_schema_with_ids(field_desc);
+    field_desc = &this->parquet_file_schema();
 
     // Build table_info_node by field_id or name matching.
     // This must happen BEFORE column classification so we can use children_column_exists
@@ -312,8 +314,8 @@ Status IcebergParquetReader::on_before_init_reader(ReaderInitContext* ctx) {
 // ============================================================================
 
 ColumnIdResult IcebergParquetReader::_create_column_ids(const FieldDescriptor* field_desc,
                                                         const TupleDescriptor* tuple_descriptor) {
-    auto* mutable_field_desc = const_cast(field_desc);
-    mutable_field_desc->assign_ids();
+    FieldDescriptor field_desc_with_ids = field_desc->copy_with_assigned_ids();
+    field_desc = &field_desc_with_ids;
 
     std::unordered_map iceberg_id_to_field_schema_map;
     for (int i = 0; i < field_desc->size(); ++i) {
@@ -344,7 +346,7 @@ ColumnIdResult IcebergParquetReader::_create_column_ids(const FieldDescriptor* f
         auto field_schema = it->second;
 
         if ((slot->col_type() != TYPE_STRUCT && slot->col_type() != TYPE_ARRAY &&
-             slot->col_type() != TYPE_MAP)) {
+             slot->col_type() != TYPE_MAP && slot->col_type() != TYPE_VARIANT)) {
             column_ids.insert(field_schema->column_id);
             if (slot->is_predicate()) {
                 filter_column_ids.insert(field_schema->column_id);
diff --git a/be/test/format/parquet/delta_byte_array_decoder_test.cpp b/be/test/format/parquet/delta_byte_array_decoder_test.cpp
index 1b039da3d2344d..4ebab87320f3f1 100644
--- a/be/test/format/parquet/delta_byte_array_decoder_test.cpp
+++ b/be/test/format/parquet/delta_byte_array_decoder_test.cpp
@@ -20,9 +20,13 @@
 #include
 
 #include "arrow/api.h"
+#include "core/column/column_nullable.h"
+#include "core/column/column_varbinary.h"
 #include "core/column/column_vector.h"
+#include "core/data_type/data_type_nullable.h"
 #include "core/data_type/data_type_number.h"
 #include "core/data_type/data_type_string.h"
+#include "core/data_type/data_type_varbinary.h"
 #include "format/parquet/delta_bit_pack_decoder.h"
 #include "parquet/encoding.h"
 #include "parquet/schema.h"
@@ -38,6 +42,28 @@ class DeltaByteArrayDecoderTest : public ::testing::Test {
     std::unique_ptr _decoder;
 };
 
+static std::vector make_byte_array_values(
+        const std::vector& values) {
+    std::vector byte_array_values;
+    byte_array_values.reserve(values.size());
+    for (const auto& value : values) {
+        byte_array_values.emplace_back(static_cast(value.size()),
+                                       reinterpret_cast(value.data()));
+    }
+    return byte_array_values;
+}
+
+static Status init_all_selected_nullable_vector(size_t num_values,
+                                                std::vector* run_length_null_map,
+                                                std::vector* filter_data,
+                                                FilterMap* filter_map, NullMap* null_map,
+                                                ColumnSelectVector* select_vector) {
+    run_length_null_map->assign(num_values, 1);
+    filter_data->assign(num_values, 1);
+    RETURN_IF_ERROR(filter_map->init(filter_data->data(), filter_data->size(), false));
+    return select_vector->init(*run_length_null_map, num_values, null_map, filter_map, 0);
+}
+
 // Test basic decoding byte array functionality
 TEST_F(DeltaByteArrayDecoderTest, test_basic_decode_byte_array) {
     // Create ColumnDescriptor
@@ -47,12 +73,7 @@ TEST_F(DeltaByteArrayDecoderTest, test_basic_decode_byte_array) {
 
     // Prepare original data
     std::vector values = {"Hello", "World", "Foobar", "ABCDEF"};
-    std::vector byte_array_values;
-    for (const auto& value : values) {
-        byte_array_values.emplace_back(
-                parquet::ByteArray {static_cast(value.size()),
-                                    reinterpret_cast(value.data())});
-    }
+    auto byte_array_values = make_byte_array_values(values);
 
     // Create encoder
     auto encoder = MakeTypedEncoder(parquet::Encoding::DELTA_BYTE_ARRAY,
@@ -100,12 +121,7 @@ TEST_F(DeltaByteArrayDecoderTest, test_decode_byte_array_with_filter) {
 
     // Prepare original data
     std::vector values = {"Hello", "World", "Foobar", "ABCDEF"};
-    std::vector byte_array_values;
-    for (const auto& value : values) {
-        byte_array_values.emplace_back(
-                parquet::ByteArray {static_cast(value.size()),
-                                    reinterpret_cast(value.data())});
-    }
+    auto byte_array_values = make_byte_array_values(values);
 
     // Create encoder
     auto encoder = MakeTypedEncoder(parquet::Encoding::DELTA_BYTE_ARRAY,
@@ -152,12 +168,7 @@ TEST_F(DeltaByteArrayDecoderTest, test_decode_byte_array_with_filter_and_null) {
 
     // Prepare original data
     std::vector values = {"Hello", "World", "ABCDEF"};
-    std::vector byte_array_values;
-    for (const auto& value : values) {
-        byte_array_values.emplace_back(
-                parquet::ByteArray {static_cast(value.size()),
-                                    reinterpret_cast(value.data())});
-    }
+    auto byte_array_values = make_byte_array_values(values);
 
     // Create encoder
     auto encoder = MakeTypedEncoder(parquet::Encoding::DELTA_BYTE_ARRAY,
@@ -209,6 +220,49 @@ TEST_F(DeltaByteArrayDecoderTest, test_decode_byte_array_with_filter_and_null) {
     }
 }
 
+TEST_F(DeltaByteArrayDecoderTest, test_decode_nullable_varbinary) {
+    auto node = parquet::schema::PrimitiveNode::Make("test_column", parquet::Repetition::OPTIONAL,
+                                                     parquet::Type::BYTE_ARRAY);
+    auto descr = std::make_shared(node, 0, 1);
+
+    std::vector values = {"hello", std::string("\x01\xff", 2)};
+    auto byte_array_values = make_byte_array_values(values);
+
+    auto encoder = MakeTypedEncoder(parquet::Encoding::DELTA_BYTE_ARRAY,
+                                    /*use_dictionary=*/false, descr.get());
+    ASSERT_NO_THROW(
+            encoder->Put(byte_array_values.data(), static_cast(byte_array_values.size())));
+
+    auto encoded_buffer = encoder->FlushValues();
+    Slice data_slice(encoded_buffer->data(), encoded_buffer->size());
+    ASSERT_TRUE(_decoder->set_data(&data_slice).ok());
+
+    DataTypePtr data_type = make_nullable(std::make_shared());
+    MutableColumnPtr column = data_type->create_column();
+
+    constexpr size_t num_values = 3;
+    std::vector run_length_null_map;
+    std::vector filter_data;
+    FilterMap filter_map;
+    ColumnSelectVector select_vector;
+    NullMap null_map;
+    ASSERT_TRUE(init_all_selected_nullable_vector(num_values, &run_length_null_map, &filter_data,
+                                                  &filter_map, &null_map, &select_vector)
+                        .ok());
+
+    ASSERT_TRUE(_decoder->decode_values(column, data_type, select_vector, false).ok());
+
+    ASSERT_EQ(column->size(), num_values);
+    const auto* nullable_column = assert_cast(column.get());
+    const auto& result_column =
+            assert_cast(nullable_column->get_nested_column());
+    EXPECT_EQ(nullable_column->get_null_map_data()[0], 0);
+    EXPECT_EQ(nullable_column->get_null_map_data()[1], 1);
+    EXPECT_EQ(nullable_column->get_null_map_data()[2], 0);
+    EXPECT_EQ(result_column.get_data_at(0).to_string(), values[0]);
+    EXPECT_EQ(result_column.get_data_at(2).to_string(), values[1]);
+}
+
 // Test skipping values for byte array decoding
 TEST_F(DeltaByteArrayDecoderTest, test_skip_value_for_byte_array) {
     // Create ColumnDescriptor
diff --git a/be/test/format/parquet/parquet_expr_test.cpp b/be/test/format/parquet/parquet_expr_test.cpp
index 83a83e71d3098d..def801cf39cd48 100644
--- a/be/test/format/parquet/parquet_expr_test.cpp
+++ b/be/test/format/parquet/parquet_expr_test.cpp
@@ -83,6 +83,37 @@ class VExprContext;
 //using namespace iceberg;
 using namespace parquet;
 
+namespace {
+
+std::vector make_variant_root_schema(const std::string& column_name) {
+    tparquet::SchemaElement root;
+    root.__set_name("schema");
+    root.__set_num_children(1);
+
+    tparquet::LogicalType variant_type;
+    variant_type.__set_VARIANT(tparquet::VariantType());
+
+    tparquet::SchemaElement variant;
+    variant.__set_name(column_name);
+    variant.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED);
+    variant.__set_num_children(2);
+    variant.__set_logicalType(variant_type);
+
+    tparquet::SchemaElement metadata;
+    metadata.__set_name("metadata");
+    metadata.__set_type(tparquet::Type::BYTE_ARRAY);
+    metadata.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED);
+
+    tparquet::SchemaElement value;
+    value.__set_name("value");
+    value.__set_type(tparquet::Type::BYTE_ARRAY);
+    value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL);
+
+    return {root, variant, metadata, value};
+}
+
+} // namespace
+
 class ParquetExprTest : public testing::Test {
 public:
     ParquetExprTest() {}
@@ -1173,6 +1204,37 @@ TEST_F(ParquetExprTest, test_expr_push_down_and) {
     ASSERT_TRUE(filter_group);
 }
 
+TEST_F(ParquetExprTest, test_row_group_stats_skip_top_level_variant_root) {
+    FieldDescriptor descriptor;
+    Status st = descriptor.parse_from_thrift(make_variant_root_schema("int64_col"));
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    p_reader->prepare_parquet_file_schema_with_ids(&descriptor);
+
+    std::unique_ptr pred = AndBlockColumnPredicate::create_unique();
+    pred->add_column_predicate(SingleColumnBlockPredicate::create_unique(
+            ComparisonPredicateBase::create_shared(
+                    2, "", Field::create_field(10000000001))));
+
+    p_reader->_push_down_predicates.clear();
+    p_reader->_push_down_predicates.push_back(std::move(pred));
+    p_reader->_enable_filter_by_min_max = true;
+    p_reader->_enable_filter_by_bloom_filter = true;
+
+    tparquet::RowGroup row_group;
+    row_group.__set_num_rows(3);
+
+    bool filter_group = false;
+    bool filtered_by_min_max = false;
+    bool filtered_by_bloom_filter = false;
+    ASSERT_TRUE(p_reader->_process_column_stat_filter(row_group, p_reader->_push_down_predicates,
+                                                      &filter_group, &filtered_by_min_max,
+                                                      &filtered_by_bloom_filter)
+                        .ok());
+    EXPECT_FALSE(filter_group);
+    EXPECT_FALSE(filtered_by_min_max);
+    EXPECT_FALSE(filtered_by_bloom_filter);
+}
+
 TEST_F(ParquetExprTest, test_expr_push_down_or_string) {
     auto or_expr = std::make_shared();
     or_expr->_op = TExprOpcode::COMPOUND_OR;
diff --git a/be/test/format/parquet/parquet_variant_reader_test.cpp b/be/test/format/parquet/parquet_variant_reader_test.cpp
new file mode 100644
index 00000000000000..74ad54453b7428
--- /dev/null
+++ b/be/test/format/parquet/parquet_variant_reader_test.cpp
@@ -0,0 +1,2994 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+ +#include "format/parquet/parquet_variant_reader.h" + +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "core/column/column_nullable.h" +#include "core/column/column_variant.h" +#include "core/column/column_vector.h" +#include "core/data_type/data_type_array.h" +#include "core/data_type/data_type_decimal.h" +#include "core/data_type/data_type_map.h" +#include "core/data_type/data_type_nullable.h" +#include "core/data_type/data_type_number.h" +#include "core/data_type/data_type_string.h" +#include "core/data_type/data_type_struct.h" +#include "core/data_type/data_type_time.h" +#include "core/data_type/data_type_varbinary.h" +#include "core/data_type/data_type_variant.h" +#include "core/data_type/primitive_type.h" +#include "core/data_type_serde/data_type_serde.h" +#include "core/field.h" +#include "format/parquet/parquet_column_convert.h" +#include "format/parquet/schema_desc.h" +#include "format/parquet/vparquet_column_reader.h" + +namespace doris::parquet { +namespace { + +StringRef bytes_ref(const std::vector& bytes) { + return {bytes.data(), bytes.size()}; +} + +void append_int64_le(std::vector* bytes, int64_t value) { + auto unsigned_value = static_cast(value); + for (int i = 0; i < 8; ++i) { + bytes->push_back(static_cast(unsigned_value >> (i * 8))); + } +} + +std::vector make_metadata(std::initializer_list keys, + bool sorted_strings = false) { + const uint8_t header = sorted_strings ? 
0x11 : 0x01; + std::vector metadata {header, static_cast(keys.size())}; + uint8_t offset = 0; + metadata.push_back(offset); + for (std::string_view key : keys) { + offset += static_cast(key.size()); + metadata.push_back(offset); + } + for (std::string_view key : keys) { + metadata.insert(metadata.end(), key.begin(), key.end()); + } + return metadata; +} + +void expect_variant_json(const std::vector& metadata, std::initializer_list value, + std::string_view expected) { + std::vector value_bytes(value); + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value_bytes), &json); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_EQ(expected, json); +} + +void expect_variant_corruption(const std::vector& metadata, + std::initializer_list value) { + std::vector value_bytes(value); + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value_bytes), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +FieldSchema make_int32_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::INT32; + field.data_type = make_nullable(std::make_shared()); + return field; +} + +FieldSchema make_int64_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::INT64; + field.data_type = make_nullable(std::make_shared()); + return field; +} + +FieldSchema make_float_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::FLOAT; + field.data_type = make_nullable(std::make_shared()); + return field; +} + +FieldSchema make_double_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::DOUBLE; + 
field.data_type = make_nullable(std::make_shared()); + return field; +} + +FieldSchema make_varbinary_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::BYTE_ARRAY; + field.data_type = make_nullable(std::make_shared()); + return field; +} + +Field make_varbinary_field(std::initializer_list bytes) { + const auto* data = reinterpret_cast(bytes.begin()); + return Field::create_field( + StringView(data, static_cast(bytes.size()))); +} + +Field make_varbinary_field(std::string_view bytes) { + return Field::create_field(StringView(bytes)); +} + +std::string varbinary_field_bytes(const Field& field) { + auto ref = field.get().to_string_ref(); + return {ref.data, ref.size}; +} + +std::string test_uuid_bytes() { + return {"\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f", 16}; +} + +FieldSchema make_uuid_field_schema(std::string name) { + FieldSchema field = make_varbinary_field_schema(std::move(name)); + field.physical_type = tparquet::Type::FIXED_LEN_BYTE_ARRAY; + field.parquet_schema.__set_logicalType(tparquet::LogicalType()); + field.parquet_schema.logicalType.__set_UUID(tparquet::UUIDType()); + field.parquet_schema.__set_type_length(16); + return field; +} + +FieldSchema make_datev2_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::INT32; + field.data_type = make_nullable(std::make_shared()); + return field; +} + +FieldSchema make_timev2_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::INT64; + field.data_type = make_nullable(std::make_shared(6)); + return field; +} + +FieldSchema make_datetimev2_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = 
field.name; + field.physical_type = tparquet::Type::INT64; + field.data_type = make_nullable(std::make_shared(6)); + return field; +} + +FieldSchema make_required_int64_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::INT64; + field.data_type = std::make_shared(); + return field; +} + +FieldSchema make_binary_field_schema(std::string name, bool nullable) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::BYTE_ARRAY; + field.data_type = std::make_shared(); + if (nullable) { + field.data_type = make_nullable(field.data_type); + } + return field; +} + +FieldSchema make_string_field_schema(std::string name, bool nullable) { + FieldSchema field = make_binary_field_schema(std::move(name), nullable); + tparquet::LogicalType logical_type; + logical_type.__set_STRING(tparquet::StringType()); + field.parquet_schema.__set_logicalType(logical_type); + return field; +} + +FieldSchema make_required_shredded_variant_schema() { + FieldSchema field; + field.name = "measurement"; + field.lower_case_name = field.name; + field.data_type = std::make_shared(0, false); + field.children = {make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true), + make_int64_field_schema("typed_value")}; + return field; +} + +std::string serialize_variant_field(const Field& field) { + auto variant_column = ColumnVariant::create(0, false); + variant_column->insert(field); + std::string json; + DataTypeSerDe::FormatOptions options; + variant_column->serialize_one_row_to_string(0, &json, options); + return json; +} + +} // namespace + +TEST(ParquetVariantReaderTest, ParseTypedOnlyVariantSchemaWithoutTopLevelValue) { + tparquet::SchemaElement root; + root.__set_name("schema"); + root.__set_num_children(1); + + tparquet::LogicalType variant_type; + 
variant_type.__set_VARIANT(tparquet::VariantType()); + + tparquet::SchemaElement variant; + variant.__set_name("v"); + variant.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + variant.__set_num_children(2); + variant.__set_logicalType(variant_type); + + tparquet::SchemaElement metadata; + metadata.__set_name("metadata"); + metadata.__set_type(tparquet::Type::BYTE_ARRAY); + metadata.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + + tparquet::SchemaElement typed_value; + typed_value.__set_name("typed_value"); + typed_value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + typed_value.__set_num_children(1); + + tparquet::SchemaElement metric; + metric.__set_name("metric"); + metric.__set_type(tparquet::Type::INT64); + metric.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + FieldDescriptor descriptor; + Status st = descriptor.parse_from_thrift({root, variant, metadata, typed_value, metric}); + ASSERT_TRUE(st.ok()) << st.to_string(); + + const auto* variant_field = descriptor.get_column("v"); + ASSERT_NE(variant_field, nullptr); + EXPECT_EQ(variant_field->data_type->get_primitive_type(), TYPE_VARIANT); + ASSERT_EQ(variant_field->children.size(), 2); + EXPECT_EQ(variant_field->children[0].name, "metadata"); + EXPECT_EQ(variant_field->children[1].name, "typed_value"); +} + +TEST(ParquetVariantReaderTest, RejectVariantSchemaWithUnexpectedChild) { + tparquet::SchemaElement root; + root.__set_name("schema"); + root.__set_num_children(1); + + tparquet::LogicalType variant_type; + variant_type.__set_VARIANT(tparquet::VariantType()); + + tparquet::SchemaElement variant; + variant.__set_name("v"); + variant.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + variant.__set_num_children(3); + variant.__set_logicalType(variant_type); + + tparquet::SchemaElement metadata; + metadata.__set_name("metadata"); + metadata.__set_type(tparquet::Type::BYTE_ARRAY); + 
metadata.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + + tparquet::SchemaElement value; + value.__set_name("value"); + value.__set_type(tparquet::Type::BYTE_ARRAY); + value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + tparquet::SchemaElement extra; + extra.__set_name("extra"); + extra.__set_type(tparquet::Type::INT32); + extra.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + FieldDescriptor descriptor; + Status st = descriptor.parse_from_thrift({root, variant, metadata, value, extra}); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectVariantSchemaWithDuplicateStructuralChild) { + tparquet::SchemaElement root; + root.__set_name("schema"); + root.__set_num_children(1); + + tparquet::LogicalType variant_type; + variant_type.__set_VARIANT(tparquet::VariantType()); + + tparquet::SchemaElement variant; + variant.__set_name("v"); + variant.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + variant.__set_num_children(3); + variant.__set_logicalType(variant_type); + + tparquet::SchemaElement metadata; + metadata.__set_name("metadata"); + metadata.__set_type(tparquet::Type::BYTE_ARRAY); + metadata.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + + tparquet::SchemaElement value; + value.__set_name("value"); + value.__set_type(tparquet::Type::BYTE_ARRAY); + value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + tparquet::SchemaElement duplicate_value = value; + + FieldDescriptor descriptor; + Status st = descriptor.parse_from_thrift({root, variant, metadata, value, duplicate_value}); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectVariantSchemaWithNonBinaryValueChild) { + tparquet::SchemaElement root; + root.__set_name("schema"); + root.__set_num_children(1); + + tparquet::LogicalType variant_type; + variant_type.__set_VARIANT(tparquet::VariantType()); + + tparquet::SchemaElement variant; + 
variant.__set_name("v"); + variant.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + variant.__set_num_children(3); + variant.__set_logicalType(variant_type); + + tparquet::SchemaElement metadata; + metadata.__set_name("metadata"); + metadata.__set_type(tparquet::Type::BYTE_ARRAY); + metadata.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + + tparquet::SchemaElement value; + value.__set_name("value"); + value.__set_type(tparquet::Type::INT32); + value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + tparquet::SchemaElement typed_value; + typed_value.__set_name("typed_value"); + typed_value.__set_type(tparquet::Type::INT64); + typed_value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + FieldDescriptor descriptor; + Status st = descriptor.parse_from_thrift({root, variant, metadata, value, typed_value}); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectVariantSchemaWithAnnotatedMetadataChild) { + tparquet::SchemaElement root; + root.__set_name("schema"); + root.__set_num_children(1); + + tparquet::LogicalType variant_type; + variant_type.__set_VARIANT(tparquet::VariantType()); + + tparquet::SchemaElement variant; + variant.__set_name("v"); + variant.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + variant.__set_num_children(2); + variant.__set_logicalType(variant_type); + + tparquet::SchemaElement metadata; + metadata.__set_name("metadata"); + metadata.__set_type(tparquet::Type::BYTE_ARRAY); + metadata.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + tparquet::LogicalType metadata_type; + metadata_type.__set_STRING(tparquet::StringType()); + metadata.__set_logicalType(metadata_type); + + tparquet::SchemaElement typed_value; + typed_value.__set_name("typed_value"); + typed_value.__set_type(tparquet::Type::INT64); + typed_value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + FieldDescriptor descriptor; + Status 
st = descriptor.parse_from_thrift({root, variant, metadata, typed_value}); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectVariantSchemaWithAnnotatedValueChild) { + tparquet::SchemaElement root; + root.__set_name("schema"); + root.__set_num_children(1); + + tparquet::LogicalType variant_type; + variant_type.__set_VARIANT(tparquet::VariantType()); + + tparquet::SchemaElement variant; + variant.__set_name("v"); + variant.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + variant.__set_num_children(3); + variant.__set_logicalType(variant_type); + + tparquet::SchemaElement metadata; + metadata.__set_name("metadata"); + metadata.__set_type(tparquet::Type::BYTE_ARRAY); + metadata.__set_repetition_type(tparquet::FieldRepetitionType::REQUIRED); + + tparquet::SchemaElement value; + value.__set_name("value"); + value.__set_type(tparquet::Type::BYTE_ARRAY); + value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + tparquet::LogicalType value_type; + value_type.__set_STRING(tparquet::StringType()); + value.__set_logicalType(value_type); + + tparquet::SchemaElement typed_value; + typed_value.__set_name("typed_value"); + typed_value.__set_type(tparquet::Type::INT64); + typed_value.__set_repetition_type(tparquet::FieldRepetitionType::OPTIONAL); + + FieldDescriptor descriptor; + Status st = descriptor.parse_from_thrift({root, variant, metadata, value, typed_value}); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, OptionalTopLevelVariantUsesNullableReadColumnOnly) { + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared(0, false)); + variant_field.children = {make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true)}; + + EXPECT_FALSE(parquet_variant_reader_test::variant_struct_reader_type_is_nullable_for_test( + variant_field)); + 
EXPECT_TRUE(parquet_variant_reader_test::variant_struct_reader_column_is_nullable_for_test( + variant_field)); + + variant_field.data_type = std::make_shared(0, false); + EXPECT_FALSE(parquet_variant_reader_test::variant_struct_reader_column_is_nullable_for_test( + variant_field)); +} + +TEST(ParquetVariantReaderTest, DecodeSimpleObject) { + auto metadata = make_metadata({"a"}); + std::vector value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x01, // one field + 0x00, // dictionary id 0 + 0x00, 0x02, // field value offsets + 0x0c, 0x07 // int8(7) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_EQ("{\"a\":7}", json); +} + +TEST(ParquetVariantReaderTest, DecodeUnsortedMetadataMayContainDuplicateDictionaryStrings) { + auto metadata = make_metadata({"a", "a"}); + std::vector value { + 0x02, // object + 0x01, // one field + 0x01, // dictionary id 1 + 0x00, 0x02, // field value offsets + 0x0c, 0x07 // int8(7) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_EQ("{\"a\":7}", json); +} + +TEST(ParquetVariantReaderTest, RejectDuplicateObjectKeysFromDuplicateMetadataEntries) { + auto metadata = make_metadata({"a", "a"}); + std::vector value { + 0x02, // object + 0x02, // two fields + 0x00, 0x01, // strictly increasing dictionary ids + 0x00, 0x02, 0x04, // valid physical value offsets + 0x0c, 0x01, 0x0c, 0x02 // int8(1), int8(2) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectInvalidSortedMetadataDictionaryStrings) { + expect_variant_corruption(make_metadata({"a", "a"}, true), {0x00}); + expect_variant_corruption(make_metadata({"b", "a"}, true), {0x00}); +} + 
+TEST(ParquetVariantReaderTest, RejectMetadataTrailingBytes) { + auto metadata = make_metadata({"a"}); + metadata.push_back(0xff); + expect_variant_corruption(metadata, {0x00}); +} + +TEST(ParquetVariantReaderTest, RejectMetadataFirstDictionaryOffsetNotZero) { + std::vector metadata { + 0x01, // version 1, one-byte offsets + 0x01, // one dictionary entry + 0x01, 0x02, // invalid dictionary offsets: first offset must be zero + 'x', 'a' // no trailing bytes; offset[1] consumes both bytes + }; + expect_variant_corruption(metadata, {0x00}); +} + +TEST(ParquetVariantReaderTest, RejectMetadataReservedHeaderBits) { + std::vector metadata { + 0x21, // reserved bit 5 is set + 0x00, // zero dictionary entries + 0x00 // offset[0] + }; + expect_variant_corruption(metadata, {0x00}); +} + +TEST(ParquetVariantReaderTest, RejectInvalidUtf8MetadataAndStrings) { + std::vector invalid_metadata { + 0x01, // version 1, one-byte offsets + 0x01, // one dictionary entry + 0x00, 0x01, // dictionary offsets + 0xff // invalid UTF-8 dictionary key + }; + expect_variant_corruption(invalid_metadata, {0x00}); + + auto metadata = make_metadata({}); + expect_variant_corruption(metadata, {0x05, 0xff}); + expect_variant_corruption(metadata, {0x40, 0x01, 0x00, 0x00, 0x00, 0xff}); +} + +TEST(ParquetVariantReaderTest, DecodeObjectUsesUnsignedByteFieldOrder) { + const std::string e_acute("\xc3\xa9", 2); + auto metadata = make_metadata({std::string_view("z"), std::string_view(e_acute)}); + std::vector value { + 0x02, // object + 0x02, // two fields + 0x00, 0x01, // dictionary ids are sorted by unsigned UTF-8 bytes: z, e acute + 0x00, 0x02, 0x04, // valid physical value offsets + 0x0c, 0x01, 0x0c, 0x02 // int8(1), int8(2) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + ASSERT_TRUE(st.ok()) << st.to_string(); + std::string expected = R"({"z":1,")"; + expected.append(e_acute); + expected.append(R"(":2})"); + EXPECT_EQ(expected, json); +} + 
+TEST(ParquetVariantReaderTest, RejectObjectFieldOrderUsingUnsignedBytes) { + const std::string e_acute("\xc3\xa9", 2); + auto metadata = make_metadata({std::string_view(e_acute), std::string_view("z")}); + std::vector value { + 0x02, // object + 0x02, // two fields + 0x00, 0x01, // dictionary ids are not sorted by unsigned UTF-8 bytes + 0x00, 0x02, 0x04, // valid physical value offsets + 0x0c, 0x02, 0x0c, 0x01 // int8(2), int8(1) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, DecodeDecimal128MinimumValue) { + auto metadata = make_metadata({}); + std::vector value { + 0x28, // decimal128 primitive + 0x00, // scale 0 + 0x00, 0x00, 0x00, 0x00, // + 0x00, 0x00, 0x00, 0x00, // + 0x00, 0x00, 0x00, 0x00, // + 0x00, 0x00, 0x00, 0x80 // -2^127 in little-endian two's complement + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_EQ("-170141183460469231731687303715884105728", json); +} + +TEST(ParquetVariantReaderTest, RejectInvalidDecimalScale) { + auto metadata = make_metadata({}); + + expect_variant_corruption(metadata, {0x20, 0xff, 0x00, 0x00, 0x00, 0x00}); + expect_variant_corruption(metadata, {0x20, 0x27, 0x00, 0x00, 0x00, 0x00}); +} + +TEST(ParquetVariantReaderTest, DecodePrimitiveCoverageExtras) { + auto metadata = make_metadata({}); + + expect_variant_json(metadata, {0x00}, "null"); + expect_variant_json(metadata, {0x10, 0xff, 0xff}, "-1"); + expect_variant_json(metadata, {0x20, 0x02, 0x85, 0xff, 0xff, 0xff}, "-1.23"); + expect_variant_json(metadata, {0x24, 0x00, 0xb0, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, + "1200"); + expect_variant_json(metadata, + {0x28, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, + "1"); + expect_variant_json(metadata, {0x1c, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0xf8, 0x3f}, "1.5"); + expect_variant_json(metadata, {0x38, 0x00, 0x00, 0xc0, 0x3f}, "1.5"); + expect_variant_json(metadata, {0x2c, 0x2a, 0x00, 0x00, 0x00}, "42"); + expect_variant_json(metadata, {0x30, 0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00}, "42"); + expect_variant_json(metadata, {0x40, 0x04, 0x00, 0x00, 0x00, 't', 'e', 'x', 't'}, "\"text\""); + expect_variant_json(metadata, {0x3c, 0x03, 0x00, 0x00, 0x00, 0xff, 0x00, 'A'}, + R"("\u00ff\u0000A")"); + expect_variant_json(metadata, {0x21, '"', '\\', '\b', '\f', '\n', '\r', '\t', 0x01}, + R"("\"\\\b\f\n\r\t\u0001")"); + expect_variant_json(metadata, + {0x50, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, + 0x0b, 0x0c, 0x0d, 0x0e, 0x0f}, + "\"00010203-0405-0607-0809-0a0b0c0d0e0f\""); +} + +TEST(ParquetVariantReaderTest, DecodeResidualPrimitiveToVariantMapPreservesScalarTypes) { + auto metadata = make_metadata({}); + auto decode_root = [&](std::initializer_list value_bytes) { + std::vector value(value_bytes); + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), + PathInData(), &values, &string_values); + EXPECT_TRUE(st.ok()) << st.to_string(); + auto root = values.find(PathInData()); + EXPECT_NE(root, values.end()); + if (!st.ok() || root == values.end()) { + return FieldWithDataType {}; + } + return root->second; + }; + + auto int16_value = decode_root({0x10, 0xff, 0xff}); + EXPECT_EQ(int16_value.base_scalar_type_id, TYPE_SMALLINT); + EXPECT_EQ(int16_value.field.get(), -1); + + auto decimal_value = decode_root({0x20, 0x02, 0x85, 0xff, 0xff, 0xff}); + EXPECT_EQ(decimal_value.base_scalar_type_id, TYPE_DECIMAL32); + EXPECT_EQ(decimal_value.precision, BeConsts::MAX_DECIMAL32_PRECISION); + EXPECT_EQ(decimal_value.scale, 2); + EXPECT_EQ(decimal_value.field.to_debug_string(decimal_value.scale), "-1.23"); + + auto string_value = decode_root({0x40, 0x04, 0x00, 0x00, 0x00, 't', 'e', 'x', 't'}); 
+ EXPECT_EQ(string_value.base_scalar_type_id, TYPE_STRING); + EXPECT_EQ(string_value.field.get(), "text"); +} + +TEST(ParquetVariantReaderTest, RejectInvalidVariantEncodingsCoverageExtras) { + expect_variant_corruption(std::vector {0x02}, {0x00}); + expect_variant_corruption(std::vector {0x01, 0x01, 0x01, 0x00}, {0x00}); + + auto metadata = make_metadata({}); + expect_variant_corruption(metadata, {0x0c, 0x01, 0x00}); + expect_variant_corruption(metadata, {0x03, 0x01, 0x01, 0x02, 0x0c, 0x07}); + expect_variant_corruption(metadata, {0x03, 0x02, 0x00, 0x02, 0x01, 0x0c, 0x01}); + expect_variant_corruption(metadata, {0x02, 0x00, 0x01, 0x00}); + expect_variant_corruption(metadata, {0x54}); + + auto object_metadata = make_metadata({"a"}); + expect_variant_corruption(object_metadata, {0x02, 0x01, 0x00, 0x01, 0x02, 0x0c, 0x07}); + expect_variant_corruption(metadata, {0x02, 0x01, 0x00, 0x00, 0x02, 0x0c, 0x07}); +} + +TEST(ParquetVariantReaderTest, DecodeResidualRootBinaryToVariantMap) { + auto metadata = make_metadata({}); + std::vector value {0x3c, // binary primitive, 3 bytes + 0x03, 0x00, 0x00, 0x00, 0xff, 0x00, 0x41}; + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + const auto& binary = values.at(PathInData()); + EXPECT_EQ(binary.base_scalar_type_id, TYPE_VARBINARY); + EXPECT_EQ(varbinary_field_bytes(binary.field), std::string("\xff\x00\x41", 3)); +} + +TEST(ParquetVariantReaderTest, DecodeResidualBinaryToVariantMap) { + auto metadata = make_metadata({"b"}); + std::vector value {0x02, // object + 0x01, // one field + 0x00, // dictionary id 0: b + 0x00, 0x08, // field value offsets + 0x3c, 0x03, 0x00, 0x00, 0x00, // binary primitive, 3 bytes + 0xff, 0x00, 0x41}; + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), 
PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + const auto& binary = values.at(PathInData("b")); + EXPECT_EQ(binary.base_scalar_type_id, TYPE_VARBINARY); + EXPECT_EQ(varbinary_field_bytes(binary.field), std::string("\xff\x00\x41", 3)); +} + +TEST(ParquetVariantReaderTest, DecodeResidualBinaryArrayToVariantMap) { + auto metadata = make_metadata({}); + std::vector value {0x03, // array + 0x02, // two elements + 0x00, 0x07, 0x08, // element value offsets + 0x3c, 0x02, 0x00, 0x00, 0x00, // binary primitive, 2 bytes + 0xc3, 0x28, 0x00}; // variant null + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + const auto& binary_array = values.at(PathInData()); + EXPECT_EQ(binary_array.base_scalar_type_id, TYPE_VARBINARY); + EXPECT_EQ(binary_array.num_dimensions, 1); + const auto& array = binary_array.field.get(); + ASSERT_EQ(array.size(), 2); + EXPECT_EQ(varbinary_field_bytes(array[0]), std::string("\xc3\x28", 2)); + EXPECT_TRUE(array[1].is_null()); +} + +TEST(ParquetVariantReaderTest, DecodeResidualNonFiniteDoubleArrayToVariantMap) { + auto metadata = make_metadata({}); + std::vector value {0x03, // array + 0x02, // two elements + 0x00, 0x09, 0x12, // element value offsets + 0x1c}; // double primitive + append_int64_le(&value, static_cast(0x7ff8000000000000ULL)); + value.push_back(0x1c); // double primitive + append_int64_le(&value, static_cast(0x7ff0000000000000ULL)); + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + const auto& double_array = values.at(PathInData()); + EXPECT_EQ(double_array.base_scalar_type_id, TYPE_DOUBLE); + EXPECT_EQ(double_array.num_dimensions, 1); + const auto& array = 
double_array.field.get(); + ASSERT_EQ(array.size(), 2); + EXPECT_TRUE(std::isnan(array[0].get())); + EXPECT_TRUE(std::isinf(array[1].get())); +} + +TEST(ParquetVariantReaderTest, DecodeResidualBinaryObjectArrayToVariantMap) { + auto metadata = make_metadata({"b"}); + std::vector value {0x03, // array + 0x01, // one element + 0x00, 0x0c, // element value offsets + 0x02, // object + 0x01, // one field + 0x00, // dictionary id 0: b + 0x00, 0x07, // field value offsets + 0x3c, 0x02, 0x00, 0x00, 0x00, // binary primitive, 2 bytes + 0xc3, 0x28}; + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + const auto& object_array = values.at(PathInData()); + EXPECT_EQ(object_array.base_scalar_type_id, TYPE_VARIANT); + EXPECT_EQ(object_array.num_dimensions, 1); + const auto& array = object_array.field.get(); + ASSERT_EQ(array.size(), 1); + ASSERT_EQ(array[0].get_type(), TYPE_VARIANT); + const auto& object = array[0].get(); + const auto& binary = object.at(PathInData("b")); + EXPECT_EQ(binary.base_scalar_type_id, TYPE_VARBINARY); + EXPECT_EQ(varbinary_field_bytes(binary.field), std::string("\xc3\x28", 2)); +} + +TEST(ParquetVariantReaderTest, DecodeObjectOutOfOrderPhysicalValuesToVariantMap) { + auto metadata = make_metadata({"a", "b", "c"}); + std::vector value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x03, // three fields + 0x00, 0x01, 0x02, // dictionary ids: a, b, c + 0x04, 0x02, 0x00, 0x06, // field offsets in key order; values are c, b, a + 0x0c, 0x03, // c: int8(3) + 0x0c, 0x02, // b: int8(2) + 0x0c, 0x01 // a: int8(1) + }; + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + Field result = 
Field::create_field(std::move(values)); + EXPECT_EQ("{\"a\":1,\"b\":2,\"c\":3}", serialize_variant_field(result)); +} + +TEST(ParquetVariantReaderTest, DecodeResidualNullToVariantMap) { + auto metadata = make_metadata({}); + std::vector value {0x00}; // variant null + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + auto root = values.find(PathInData()); + ASSERT_NE(root, values.end()); + EXPECT_TRUE(root->second.field.is_null()); +} + +TEST(ParquetVariantReaderTest, DecodeResidualObjectNullChildToVariantMap) { + auto metadata = make_metadata({"a"}); + std::vector value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x01, // one field + 0x00, // dictionary id 0: a + 0x00, 0x01, // field value offsets + 0x00 // variant null + }; + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + auto child = values.find(PathInData("a")); + ASSERT_NE(child, values.end()); + EXPECT_TRUE(child->second.field.is_null()); +} + +TEST(ParquetVariantReaderTest, DecodeNonFiniteDoublePrimitive) { + auto metadata = make_metadata({}); + std::vector value {0x1c}; // primitive double + append_int64_le(&value, static_cast(0x7ff8000000000000ULL)); + + VariantMap values; + std::deque string_values; + Status st = decode_variant_to_variant_map(bytes_ref(metadata), bytes_ref(value), PathInData(), + &values, &string_values); + ASSERT_TRUE(st.ok()) << st.to_string(); + auto root = values.find(PathInData()); + ASSERT_NE(root, values.end()); + ASSERT_EQ(root->second.field.get_type(), TYPE_DOUBLE); + EXPECT_TRUE(std::isnan(root->second.field.get())); + + std::string json; + st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), 
&json);
+    EXPECT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(json.empty());
+}
+
+TEST(ParquetVariantReaderTest, DecodeNanosecondTimestampAsMicros) {
+    auto metadata = make_metadata({});
+    std::vector<uint8_t> value {0x48}; // primitive timestamptz nanos
+    append_int64_le(&value, 1234567890);
+
+    std::string json;
+    Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_EQ("1234567", json);
+}
+
+TEST(ParquetVariantReaderTest, TimeV2ConverterRequiresVariantContext) {
+    FieldSchema time_field;
+    time_field.name = "timestamp";
+    time_field.lower_case_name = time_field.name;
+    time_field.physical_type = tparquet::Type::INT64;
+    time_field.parquet_schema.__set_name(time_field.name);
+    time_field.parquet_schema.__set_type(tparquet::Type::INT64);
+    time_field.parquet_schema.__set_converted_type(tparquet::ConvertedType::TIME_MICROS);
+    time_field.data_type = make_nullable(std::make_shared<DataTypeTimeV2>(6));
+
+    auto converter = PhysicalToLogicalConverter::get_converter(
+            &time_field, time_field.data_type, time_field.data_type, nullptr, false);
+    EXPECT_FALSE(converter->support());
+
+    time_field.is_in_variant = true;
+    converter = PhysicalToLogicalConverter::get_converter(&time_field, time_field.data_type,
+                                                          time_field.data_type, nullptr, false);
+    EXPECT_TRUE(converter->support());
+
+    auto physical_column = ColumnInt64::create();
+    physical_column->insert_value(3723004005);
+    ColumnPtr physical = std::move(physical_column);
+    ColumnPtr logical = time_field.data_type->create_column();
+    Status st = converter->convert(physical, time_field.data_type, time_field.data_type, logical,
+                                   false);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    const auto& nullable = assert_cast<const ColumnNullable&>(*logical);
+    const auto& time_column = assert_cast<const ColumnFloat64&>(nullable.get_nested_column());
+    ASSERT_EQ(1, time_column.size());
+    EXPECT_DOUBLE_EQ(3723004005, time_column.get_data()[0]);
+}
+
+TEST(ParquetVariantReaderTest,
DirectTypedOnlyKeepsStructuralNameUserKeys) {
+    auto int_type = make_nullable(std::make_shared<DataTypeInt32>());
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = "typed_value";
+    typed_value_field.children = {make_int32_field_schema("typed_value"),
+                                  make_int32_field_schema("value")};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {int_type, int_type}, Strings {"typed_value", "value"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    Struct row;
+    row.push_back(Field::create_field<TYPE_INT>(42));
+    row.push_back(Field::create_field<TYPE_INT>(7));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(row));
+
+    auto batch = ColumnVariant::create(0, false, 2);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 1, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field result;
+    batch_variant->get(1, result);
+    const auto& values = result.get<VariantMap>();
+    EXPECT_EQ(values.find(PathInData()), values.end());
+    EXPECT_EQ(values.at(PathInData("typed_value")).field.get<Int64>(), 42);
+    EXPECT_EQ(values.at(PathInData("value")).field.get<Int64>(), 7);
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyKeepsNestedStructuralNameUserKeys) {
+    auto int_type = make_nullable(std::make_shared<DataTypeInt32>());
+    FieldSchema nested_field;
+    nested_field.name = "nested";
+    nested_field.lower_case_name = nested_field.name;
+    nested_field.children = {make_int32_field_schema("typed_value"),
+                             make_int32_field_schema("value")};
+    nested_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {int_type, int_type}, Strings {"typed_value", "value"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {nested_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {nested_field.data_type}, Strings {"nested"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    Struct nested;
+    nested.push_back(Field::create_field<TYPE_INT>(42));
+    nested.push_back(Field::create_field<TYPE_INT>(7));
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRUCT>(nested));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(row));
+
+    auto batch = ColumnVariant::create(0, false, 2);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 1, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field result;
+    batch_variant->get(1, result);
+    const auto& values = result.get<VariantMap>();
+    EXPECT_EQ(values.find(PathInData("nested")), values.end());
+    EXPECT_EQ(values.at(PathInData("nested.typed_value")).field.get<Int64>(), 42);
+    EXPECT_EQ(values.at(PathInData("nested.value")).field.get<Int64>(), 7);
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyConvertsTemporalLeavesToVariantMicros) {
+    FieldSchema date_field = make_datev2_field_schema("d");
+    FieldSchema time_field = make_timev2_field_schema("t");
+    FieldSchema timestamp_field = make_datetimev2_field_schema("ts");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {date_field, time_field, timestamp_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {date_field.data_type, time_field.data_type, timestamp_field.data_type},
+            Strings {"d", "t", "ts"}));
+
+    DateV2Value<DateV2ValueType> date;
+    std::string date_text = "1970-01-03";
+    std::string date_format = "%Y-%m-%d";
+    ASSERT_TRUE(date.from_date_format_str(date_format.data(),
date_format.size(), date_text.data(),
+                                          date_text.size()));
+
+    DateV2Value<DateTimeV2ValueType> timestamp;
+    std::string timestamp_text = "1970-01-01 00:00:01.000002";
+    std::string timestamp_format = "%Y-%m-%d %H:%i:%s.%f";
+    ASSERT_TRUE(timestamp.from_date_format_str(timestamp_format.data(), timestamp_format.size(),
+                                               timestamp_text.data(), timestamp_text.size()));
+    int64_t timestamp_seconds = 0;
+    timestamp.unix_timestamp(&timestamp_seconds, cctz::utc_time_zone());
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_DATEV2>(date));
+    row.push_back(Field::create_field<TYPE_TIMEV2>(3723004005.0));
+    row.push_back(Field::create_field<TYPE_DATETIMEV2>(timestamp));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(row));
+
+    auto batch = ColumnVariant::create(0, false, 2);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 1, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field result;
+    batch_variant->get(1, result);
+    const auto& values = result.get<VariantMap>();
+    EXPECT_EQ(values.at(PathInData("d")).field.get<Int64>(), 2);
+    EXPECT_EQ(values.at(PathInData("t")).field.get<Int64>(), 3723004005);
+    EXPECT_EQ(values.at(PathInData("ts")).field.get<Int64>(),
+              timestamp_seconds * 1000000 + timestamp.microsecond());
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesTemporalLeafNull) {
+    FieldSchema date_field = make_datev2_field_schema("d");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {date_field};
+    typed_value_field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {date_field.data_type}, Strings {"d"}));
+
+    Struct row;
+    row.push_back(Field());
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(row));
+
+    auto batch = ColumnVariant::create(0, false, 2);
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 1, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    const auto* date_subcolumn = batch_variant->get_subcolumn(PathInData("d"));
+    ASSERT_NE(date_subcolumn, nullptr);
+    EXPECT_TRUE(date_subcolumn->is_null_at(1));
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyConvertsTemporalArrayLeavesToVariantMicros) {
+    FieldSchema element = make_timev2_field_schema("element");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    Array array;
+    array.push_back(Field::create_field<TYPE_TIMEV2>(3723004005.0));
+    array.push_back(Field());
+    typed_value_column->insert(Field::create_field<TYPE_ARRAY>(array));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field null_result;
+    batch_variant->get(1, null_result);
+    EXPECT_TRUE(null_result.is_null());
+    const auto* root_subcolumn = batch_variant->get_subcolumn(PathInData());
+    ASSERT_NE(root_subcolumn, nullptr);
+    EXPECT_TRUE(root_subcolumn->is_null_at(1));
+
+    Field present_result;
+    batch_variant->get(2, present_result);
+    const auto& values =
present_result.get<VariantMap>();
+    auto array_value = values.find(PathInData());
+    ASSERT_NE(array_value, values.end());
+    EXPECT_EQ(array_value->second.base_scalar_type_id, TYPE_BIGINT);
+    EXPECT_EQ(array_value->second.num_dimensions, 1);
+    const auto& result_array = array_value->second.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 2);
+    EXPECT_EQ(result_array[0].get<Int64>(), 3723004005);
+    EXPECT_TRUE(result_array[1].is_null());
+}
+
+TEST(ParquetVariantReaderTest, TypedOnlyKeepsUserMetadataAndValueFields) {
+    FieldSchema object_field;
+    object_field.name = "obj";
+    object_field.lower_case_name = object_field.name;
+    object_field.children = {make_binary_field_schema("metadata", true),
+                             make_binary_field_schema("value", true)};
+    object_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {object_field.children[0].data_type, object_field.children[1].data_type},
+            Strings {"metadata", "value"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {object_field};
+    typed_value_field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {object_field.data_type}, Strings {"obj"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct object;
+    object.push_back(Field::create_field<TYPE_STRING>(String("user-metadata")));
+    object.push_back(Field::create_field<TYPE_STRING>(String("\0", 1)));
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_STRUCT>(object));
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    EXPECT_EQ(values.at(PathInData("obj.metadata")).field.get<String>(), "user-metadata");
+    EXPECT_EQ(values.at(PathInData("obj.value")).field.get<String>(), std::string("\0", 1));
+}
+
+TEST(ParquetVariantReaderTest, TypedOnlyKeepsUserValueOnlyField) {
+    FieldSchema object_field;
+    object_field.name = "obj";
+    object_field.lower_case_name = object_field.name;
+    object_field.children = {make_string_field_schema("value", true)};
+    object_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {object_field.children[0].data_type}, Strings {"value"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {object_field};
+    typed_value_field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {object_field.data_type}, Strings {"obj"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct object;
+    object.push_back(Field::create_field<TYPE_STRING>(String("\0", 1)));
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_STRUCT>(object));
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values =
result.get<VariantMap>();
+    EXPECT_EQ(values.at(PathInData("obj.value")).field.get<String>(), std::string("\0", 1));
+}
+
+TEST(ParquetVariantReaderTest, TypedOnlyKeepsAnnotatedValueAndTypedValueUserFields) {
+    auto int_type = make_nullable(std::make_shared<DataTypeInt32>());
+    FieldSchema nested_typed_value;
+    nested_typed_value.name = "typed_value";
+    nested_typed_value.lower_case_name = nested_typed_value.name;
+    nested_typed_value.children = {make_int32_field_schema("x")};
+    nested_typed_value.data_type =
+            make_nullable(std::make_shared<DataTypeStruct>(DataTypes {int_type}, Strings {"x"}));
+
+    FieldSchema object_field;
+    object_field.name = "obj";
+    object_field.lower_case_name = object_field.name;
+    object_field.children = {make_string_field_schema("value", true), nested_typed_value};
+    object_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {object_field.children[0].data_type, nested_typed_value.data_type},
+            Strings {"value", "typed_value"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {object_field};
+    typed_value_field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {object_field.data_type}, Strings {"obj"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct nested;
+    nested.push_back(Field::create_field<TYPE_INT>(42));
+    Struct object;
+    object.push_back(Field::create_field<TYPE_STRING>(String("abc")));
+    object.push_back(Field::create_field<TYPE_STRUCT>(nested));
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_STRUCT>(object));
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    EXPECT_EQ(values.at(PathInData("obj.value")).field.get<String>(), "abc");
+    EXPECT_EQ(values.at(PathInData("obj.typed_value.x")).field.get<Int64>(), 42);
+}
+
+TEST(ParquetVariantReaderTest, TypedOnlyKeepsUserMetadataAndTypedValueFields) {
+    auto int_type = make_nullable(std::make_shared<DataTypeInt32>());
+    FieldSchema nested_typed_value;
+    nested_typed_value.name = "typed_value";
+    nested_typed_value.lower_case_name = nested_typed_value.name;
+    nested_typed_value.children = {make_int32_field_schema("x")};
+    nested_typed_value.data_type =
+            make_nullable(std::make_shared<DataTypeStruct>(DataTypes {int_type}, Strings {"x"}));
+
+    FieldSchema object_field;
+    object_field.name = "obj";
+    object_field.lower_case_name = object_field.name;
+    object_field.children = {make_string_field_schema("metadata", true), nested_typed_value};
+    object_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {object_field.children[0].data_type, nested_typed_value.data_type},
+            Strings {"metadata", "typed_value"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {object_field};
+    typed_value_field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {object_field.data_type}, Strings {"obj"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct nested;
+    nested.push_back(Field::create_field<TYPE_INT>(42));
+    Struct object;
+    object.push_back(Field::create_field<TYPE_STRING>(String("user-metadata")));
+    object.push_back(Field::create_field<TYPE_STRUCT>(nested));
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_STRUCT>(object));
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    EXPECT_EQ(values.at(PathInData("obj.metadata")).field.get<String>(), "user-metadata");
+    EXPECT_EQ(values.at(PathInData("obj.typed_value.x")).field.get<Int64>(), 42);
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyRequiresSelectedTypedLeaf) {
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {make_int64_field_schema("metric")};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {typed_value_field.children[0].data_type}, Strings {"metric"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    uint64_t next_id = 1;
+    variant_field.assign_ids(next_id);
+    const auto& typed_value = variant_field.children[1];
+    const auto& metric = typed_value.children[0];
+
+    std::set<uint64_t> missing_path_ids {variant_field.get_column_id(),
+                                         variant_field.children[0].get_column_id()};
+    EXPECT_FALSE(parquet_variant_reader_test::can_use_direct_typed_only_value_for_test(
+            variant_field, missing_path_ids));
+
+    std::set<uint64_t> typed_root_only_ids {typed_value.get_column_id()};
+    EXPECT_FALSE(parquet_variant_reader_test::can_use_direct_typed_only_value_for_test(
+            variant_field, typed_root_only_ids));
+
+    std::set<uint64_t> metric_ids {metric.get_column_id()};
+    EXPECT_TRUE(parquet_variant_reader_test::can_use_direct_typed_only_value_for_test(variant_field,
+                                                                                      metric_ids));
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyAllowsUnselectedTopLevelResidualValue) {
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {make_int64_field_schema("metric")};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {typed_value_field.children[0].data_type}, Strings {"metric"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true), typed_value_field};
+
+    uint64_t next_id = 1;
+    variant_field.assign_ids(next_id);
+    const auto& value = variant_field.children[1];
+    const auto& metric = variant_field.children[2].children[0];
+
+    std::set<uint64_t> metric_ids {metric.get_column_id()};
+    EXPECT_TRUE(parquet_variant_reader_test::can_use_direct_typed_only_value_for_test(variant_field,
+                                                                                      metric_ids));
+
+    std::set<uint64_t> metric_with_residual_ids {value.get_column_id(), metric.get_column_id()};
+    EXPECT_FALSE(parquet_variant_reader_test::can_use_direct_typed_only_value_for_test(
+            variant_field, metric_with_residual_ids));
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyReaderCountersUseNativePath) {
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {make_int64_field_schema("metric")};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes
{typed_value_field.children[0].data_type}, Strings {"metric"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    uint64_t next_id = 1;
+    variant_field.assign_ids(next_id);
+    const auto& metric = variant_field.children[1].children[0];
+
+    auto variant_struct_type = std::make_shared<DataTypeStruct>(
+            DataTypes {variant_field.children[0].data_type, typed_value_field.data_type},
+            Strings {"metadata", "typed_value"});
+    MutableColumnPtr struct_column = variant_struct_type->create_column();
+    for (int64_t metric_value : {7, 11}) {
+        Struct typed_value;
+        typed_value.push_back(Field::create_field<TYPE_BIGINT>(metric_value));
+        Struct row;
+        row.push_back(Field::create_field<TYPE_STRING>(String("")));
+        row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+        struct_column->insert(Field::create_field<TYPE_STRUCT>(row));
+    }
+
+    ColumnPtr output = ColumnVariant::create(0, false);
+    int64_t direct_rows = 0;
+    int64_t rowwise_rows = 0;
+    Status st = parquet_variant_reader_test::read_variant_rows_for_test(
+            variant_field, *struct_column, {metric.get_column_id()}, output, &direct_rows,
+            &rowwise_rows);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_EQ(2, direct_rows);
+    EXPECT_EQ(0, rowwise_rows);
+    ASSERT_EQ(2, output->size());
+
+    Field first;
+    output->get(0, first);
+    const auto& first_values = first.get<VariantMap>();
+    EXPECT_EQ(first_values.at(PathInData("metric")).field.get<Int64>(), 7);
+}
+
+TEST(ParquetVariantReaderTest, VariantReaderCountersUseRowWiseWhenResidualValueSelected) {
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {make_int64_field_schema("metric")};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {typed_value_field.children[0].data_type}, Strings {"metric"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true), typed_value_field};
+
+    uint64_t next_id = 1;
+    variant_field.assign_ids(next_id);
+    const auto& value = variant_field.children[1];
+    const auto& metric = variant_field.children[2].children[0];
+
+    auto variant_struct_type = std::make_shared<DataTypeStruct>(
+            DataTypes {variant_field.children[0].data_type, value.data_type,
+                       typed_value_field.data_type},
+            Strings {"metadata", "value", "typed_value"});
+    MutableColumnPtr struct_column = variant_struct_type->create_column();
+    for (int64_t metric_value : {7, 11}) {
+        Struct typed_value;
+        typed_value.push_back(Field::create_field<TYPE_BIGINT>(metric_value));
+        Struct row;
+        row.push_back(Field::create_field<TYPE_STRING>(String("")));
+        row.push_back(Field());
+        row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+        struct_column->insert(Field::create_field<TYPE_STRUCT>(row));
+    }
+
+    ColumnPtr output = ColumnVariant::create(0, false);
+    int64_t direct_rows = 0;
+    int64_t rowwise_rows = 0;
+    Status st = parquet_variant_reader_test::read_variant_rows_for_test(
+            variant_field, *struct_column, {value.get_column_id(), metric.get_column_id()}, output,
+            &direct_rows, &rowwise_rows);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_EQ(0, direct_rows);
+    EXPECT_EQ(2, rowwise_rows);
+    ASSERT_EQ(2, output->size());
+
+    Field second;
+    output->get(1, second);
+    const auto& second_values = second.get<VariantMap>();
+    EXPECT_EQ(second_values.at(PathInData("metric")).field.get<Int64>(), 11);
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesNullableTypedStructNull) {
+    FieldSchema metric_field = make_required_int64_field_schema("metric");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children =
{metric_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {metric_field.data_type}, Strings {"metric"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_BIGINT>(7));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field null_result;
+    batch_variant->get(1, null_result);
+    EXPECT_TRUE(null_result.is_null());
+
+    Field present_result;
+    batch_variant->get(2, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    EXPECT_EQ(values.at(PathInData("metric")).field.get<Int64>(), 7);
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesEmptyTypedObject) {
+    FieldSchema metric_field = make_int64_field_schema("metric");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {metric_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {metric_field.data_type}, Strings {"metric"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    Struct empty_object;
+    empty_object.push_back(Field());
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(empty_object));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field null_result;
+    batch_variant->get(1, null_result);
+    EXPECT_TRUE(null_result.is_null());
+
+    Field empty_result;
+    batch_variant->get(2, empty_result);
+    EXPECT_FALSE(empty_result.is_null());
+
+    std::string json;
+    DataTypeSerDe::FormatOptions options;
+    batch_variant->serialize_one_row_to_string(2, &json, options);
+    EXPECT_EQ(json, "{}");
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesNestedEmptyTypedObject) {
+    FieldSchema metric_field = make_int64_field_schema("metric");
+
+    FieldSchema nested_field;
+    nested_field.name = "nested";
+    nested_field.lower_case_name = nested_field.name;
+    nested_field.children = {metric_field};
+    nested_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {metric_field.data_type}, Strings {"metric"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {nested_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {nested_field.data_type}, Strings {"nested"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    Struct nested_object;
+    nested_object.push_back(Field());
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_STRUCT>(nested_object));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    std::string json;
+    DataTypeSerDe::FormatOptions options;
+    batch_variant->serialize_one_row_to_string(2, &json, options);
+    EXPECT_EQ(json, "{\"nested\":{}}");
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesVarbinaryLeaf) {
+    FieldSchema payload_field = make_varbinary_field_schema("payload");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {payload_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {payload_field.data_type}, Strings {"payload"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    Struct typed_value;
+    typed_value.push_back(make_varbinary_field({0xff, 0x00, 0x41}));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field null_result;
+    batch_variant->get(1, null_result);
+    EXPECT_TRUE(null_result.is_null());
+
+    Field present_result;
+    batch_variant->get(2, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    const auto& payload = values.at(PathInData("payload"));
+    EXPECT_EQ(payload.base_scalar_type_id, TYPE_VARBINARY);
+    EXPECT_EQ(varbinary_field_bytes(payload.field), std::string("\xff\x00\x41", 3));
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesFloatingPointLeaves) {
+    FieldSchema float_field = make_float_field_schema("f");
+    FieldSchema double_field = make_double_field_schema("d");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {float_field, double_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {float_field.data_type, double_field.data_type}, Strings {"f", "d"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+    uint64_t next_id = 1;
+    variant_field.assign_ids(next_id);
+    std::set<uint64_t> typed_leaf_ids {variant_field.children[1].children[0].get_column_id(),
+                                       variant_field.children[1].children[1].get_column_id()};
+    EXPECT_TRUE(parquet_variant_reader_test::can_use_direct_typed_only_value_for_test(
+            variant_field, typed_leaf_ids));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_FLOAT>(1.25F));
+    typed_value.push_back(Field::create_field<TYPE_DOUBLE>(2.5));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field null_result;
+    batch_variant->get(1, null_result);
+    EXPECT_TRUE(null_result.is_null());
+
+    Field present_result;
+    batch_variant->get(2, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    const auto& float_value = values.at(PathInData("f"));
+    EXPECT_EQ(float_value.base_scalar_type_id, TYPE_FLOAT);
+    EXPECT_FLOAT_EQ(float_value.field.get<Float32>(), 1.25F);
+    const auto& double_value = values.at(PathInData("d"));
+    EXPECT_EQ(double_value.base_scalar_type_id, TYPE_DOUBLE);
+    EXPECT_DOUBLE_EQ(double_value.field.get<Float64>(), 2.5);
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesNonFiniteFloatingPointLeaf) {
+    FieldSchema nan_field = make_double_field_schema("nan");
+    FieldSchema inf_field = make_double_field_schema("inf");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {nan_field, inf_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {nan_field.data_type, inf_field.data_type}, Strings {"nan", "inf"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    Struct typed_value;
+    typed_value.push_back(
+            Field::create_field<TYPE_DOUBLE>(std::numeric_limits<double>::quiet_NaN()));
+    typed_value.push_back(
+            Field::create_field<TYPE_DOUBLE>(std::numeric_limits<double>::infinity()));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    auto batch = ColumnVariant::create(0, false, 2);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 1, batch.get());
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field present_result;
+    batch->get(1, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    EXPECT_TRUE(std::isnan(values.at(PathInData("nan")).field.get<Float64>()));
+    EXPECT_TRUE(std::isinf(values.at(PathInData("inf")).field.get<Float64>()));
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesFloatingPointArrayLeaf) {
+    FieldSchema element = make_double_field_schema("element");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type =
+            make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    Array array;
+    array.push_back(Field::create_field<TYPE_DOUBLE>(1.5));
+    array.push_back(Field());
+    array.push_back(Field::create_field<TYPE_DOUBLE>(2.25));
+    typed_value_column->insert(Field::create_field<TYPE_ARRAY>(array));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch.get());
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    const auto* root_subcolumn = batch->get_subcolumn(PathInData());
+    ASSERT_NE(root_subcolumn, nullptr);
+    EXPECT_TRUE(root_subcolumn->is_null_at(1));
+
+    Field present_result;
+    batch->get(2, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    auto array_value = values.find(PathInData());
+    ASSERT_NE(array_value, values.end());
+    EXPECT_EQ(array_value->second.base_scalar_type_id, TYPE_DOUBLE);
+    EXPECT_EQ(array_value->second.num_dimensions, 1);
+    const auto& result_array = array_value->second.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 3);
+    EXPECT_DOUBLE_EQ(result_array[0].get<Float64>(), 1.5);
+    EXPECT_TRUE(result_array[1].is_null());
+    EXPECT_DOUBLE_EQ(result_array[2].get<Float64>(), 2.25);
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesNonFiniteFloatingPointArrayLeaf) {
+    FieldSchema element = make_double_field_schema("element");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    Array array;
+    array.push_back(Field::create_field<TYPE_DOUBLE>(std::numeric_limits<double>::quiet_NaN()));
+    typed_value_column->insert(Field::create_field<TYPE_ARRAY>(array));
+
+    auto batch = ColumnVariant::create(0, false, 2);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 1, batch.get());
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field present_result;
+    batch->get(1, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    auto array_value = values.find(PathInData());
+    ASSERT_NE(array_value, values.end());
+    const auto& result_array = array_value->second.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 1);
+    EXPECT_TRUE(std::isnan(result_array[0].get<Float64>()));
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesUuidSemantics) {
+    FieldSchema uuid_field = make_uuid_field_schema("u");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {uuid_field};
+    typed_value_field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {uuid_field.data_type}, Strings {"u"}));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    std::string uuid_bytes = test_uuid_bytes();
+    Struct typed_value;
+    typed_value.push_back(make_varbinary_field(uuid_bytes));
+    typed_value_column->insert(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field present_result;
+    batch_variant->get(2, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    const auto& uuid = values.at(PathInData("u"));
+    EXPECT_EQ(uuid.base_scalar_type_id, TYPE_STRING);
+    EXPECT_EQ(uuid.field.get<String>(), "00010203-0405-0607-0809-0a0b0c0d0e0f");
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesTypedUuidSemantics) {
+    FieldSchema uuid_field = make_uuid_field_schema("u");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {uuid_field};
+    typed_value_field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {uuid_field.data_type}, Strings {"u"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    std::string uuid_bytes = test_uuid_bytes();
+    Struct typed_value;
+    typed_value.push_back(make_varbinary_field(uuid_bytes));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    const auto& uuid = values.at(PathInData("u"));
+    EXPECT_EQ(uuid.base_scalar_type_id, TYPE_STRING);
+    EXPECT_EQ(uuid.field.get<String>(), "00010203-0405-0607-0809-0a0b0c0d0e0f");
+}
+
+TEST(ParquetVariantReaderTest, DirectTypedOnlyPreservesTypedUuidArraySemantics) {
+    FieldSchema element = make_uuid_field_schema("element");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    MutableColumnPtr typed_value_column = typed_value_field.data_type->create_column();
+    typed_value_column->insert(Field());
+
+    std::string uuid_bytes = test_uuid_bytes();
+    Array array;
+    array.push_back(make_varbinary_field(uuid_bytes));
+    array.push_back(Field());
+    typed_value_column->insert(Field::create_field<TYPE_ARRAY>(array));
+
+    auto batch = ColumnVariant::create(0, false, 3);
+    ASSERT_TRUE(
+            parquet_variant_reader_test::can_direct_read_typed_value_for_test(typed_value_field));
+    auto* batch_variant = batch.get();
+    Status st = parquet_variant_reader_test::append_direct_typed_column_to_batch_for_test(
+            typed_value_field, *typed_value_column, 0, 2, batch_variant);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+
+    Field null_result;
+    batch_variant->get(1, null_result);
+    EXPECT_TRUE(null_result.is_null());
+
+    Field present_result;
+    batch_variant->get(2, present_result);
+    const auto& values = present_result.get<VariantMap>();
+    auto array_value = values.find(PathInData());
+    ASSERT_NE(array_value, values.end());
+    EXPECT_EQ(array_value->second.base_scalar_type_id, TYPE_STRING);
+    EXPECT_EQ(array_value->second.num_dimensions, 1);
+    const auto& result_array = array_value->second.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 2);
+    EXPECT_EQ(result_array[0].get<String>(), "00010203-0405-0607-0809-0a0b0c0d0e0f");
+    EXPECT_TRUE(result_array[1].is_null());
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesTypedUuidArraySemantics) {
+    FieldSchema element = make_uuid_field_schema("element");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    std::string uuid_bytes = test_uuid_bytes();
+    Array array;
+    array.push_back(make_varbinary_field(uuid_bytes));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_ARRAY>(array));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    auto array_value = values.find(PathInData());
+    ASSERT_NE(array_value, values.end());
+    EXPECT_EQ(array_value->second.base_scalar_type_id, TYPE_STRING);
+    EXPECT_EQ(array_value->second.num_dimensions, 1);
+    const auto& result_array = array_value->second.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 1);
+    EXPECT_EQ(result_array[0].get<String>(), "00010203-0405-0607-0809-0a0b0c0d0e0f");
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesExplicitVariantNullShreddedArrayElement) {
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.children = {make_binary_field_schema("value", true),
+                        make_int64_field_schema("typed_value")};
+    element.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {element.children[0].data_type, element.children[1].data_type},
+            Strings {"value", "typed_value"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true), typed_value_field};
+
+    auto metadata = make_metadata({});
+    std::vector<uint8_t> variant_null {0x00};
+    Struct element_row;
+    element_row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(variant_null.data()), variant_null.size())));
+    element_row.push_back(Field());
+    Array array;
+    array.push_back(Field::create_field<TYPE_STRUCT>(element_row));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field());
+    row.push_back(Field::create_field<TYPE_ARRAY>(array));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    auto array_value = values.find(PathInData());
+    ASSERT_NE(array_value, values.end());
+    EXPECT_EQ("[null]", serialize_variant_field(result));
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesNullComplexTypedArrayElement) {
+    FieldSchema payload_field = make_int64_field_schema("payload");
+
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.children = {payload_field};
+    element.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {payload_field.data_type}, Strings {"payload"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct element_value;
+    element_value.push_back(Field::create_field<TYPE_BIGINT>(7));
+    Array array;
+    array.push_back(Field());
+    array.push_back(Field::create_field<TYPE_STRUCT>(element_value));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_ARRAY>(array));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    const auto& object_array = values.at(PathInData());
+    EXPECT_EQ(object_array.base_scalar_type_id, TYPE_VARIANT);
+    EXPECT_EQ(object_array.num_dimensions, 1);
+    const auto& result_array = object_array.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 2);
+    EXPECT_TRUE(result_array[0].is_null());
+    ASSERT_EQ(result_array[1].get_type(), TYPE_VARIANT);
+    const auto& object = result_array[1].get<VariantMap>();
+    const auto& payload = object.at(PathInData("payload"));
+    EXPECT_EQ(payload.base_scalar_type_id, TYPE_BIGINT);
+    EXPECT_EQ(payload.field.get<Int64>(), 7);
+}
+
+TEST(ParquetVariantReaderTest, RowWiseRejectsMissingShreddedArrayElement) {
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.children = {make_binary_field_schema("value", true),
+                        make_int64_field_schema("typed_value")};
+    element.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {element.children[0].data_type, element.children[1].data_type},
+            Strings {"value", "typed_value"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct element_row;
+    element_row.push_back(Field());
+    element_row.push_back(Field());
+    Array array;
+    array.push_back(Field::create_field<TYPE_STRUCT>(element_row));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field());
+    row.push_back(Field::create_field<TYPE_ARRAY>(array));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    EXPECT_TRUE(st.is<ErrorCode::INVALID_ARGUMENT>()) << st.to_string();
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesTypedDecimalArrayMetadata) {
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.physical_type = tparquet::Type::INT64;
+    element.data_type = make_nullable(std::make_shared<DataTypeDecimal64>(18, 2));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Array array;
+    array.push_back(Field::create_field<TYPE_DECIMAL64>(Decimal64(12345)));
+    array.push_back(Field::create_field<TYPE_DECIMAL64>(Decimal64(67890)));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field());
+    row.push_back(Field::create_field<TYPE_ARRAY>(array));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    auto array_value = values.find(PathInData());
+    ASSERT_NE(array_value, values.end());
+    EXPECT_EQ(array_value->second.base_scalar_type_id, TYPE_DECIMAL64);
+    EXPECT_EQ(array_value->second.num_dimensions, 1);
+    EXPECT_EQ(array_value->second.precision, 18);
+    EXPECT_EQ(array_value->second.scale, 2);
+    const auto& result_array = array_value->second.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 2);
+    EXPECT_EQ(result_array[0].get<Decimal64>(), Decimal64(12345));
+    EXPECT_EQ(result_array[1].get<Decimal64>(), Decimal64(67890));
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesTypedVarbinaryObjectField) {
+    FieldSchema payload_field = make_varbinary_field_schema("payload");
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {payload_field};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {payload_field.data_type}, Strings {"payload"}));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct typed_value;
+    typed_value.push_back(make_varbinary_field({0xc3, 0x28}));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    const auto& payload = values.at(PathInData("payload"));
+    EXPECT_EQ(payload.base_scalar_type_id, TYPE_VARBINARY);
+    EXPECT_EQ(varbinary_field_bytes(payload.field), std::string("\xc3\x28", 2));
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesTypedVarbinaryObjectArrayField) {
+    FieldSchema payload_field = make_varbinary_field_schema("payload");
+
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.children = {payload_field};
+    element.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {payload_field.data_type}, Strings {"payload"}));
+
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {element};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field};
+
+    auto metadata = make_metadata({});
+    Struct element_value;
+    element_value.push_back(make_varbinary_field({0xc3, 0x28}));
+    Array array;
+    array.push_back(Field::create_field<TYPE_STRUCT>(element_value));
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_ARRAY>(array));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    const auto& object_array = values.at(PathInData());
+    EXPECT_EQ(object_array.base_scalar_type_id, TYPE_VARIANT);
+    EXPECT_EQ(object_array.num_dimensions, 1);
+    const auto& result_array = object_array.field.get<Array>();
+    ASSERT_EQ(result_array.size(), 1);
+    ASSERT_EQ(result_array[0].get_type(), TYPE_VARIANT);
+    const auto& object = result_array[0].get<VariantMap>();
+    const auto& payload = object.at(PathInData("payload"));
+    EXPECT_EQ(payload.base_scalar_type_id, TYPE_VARBINARY);
+    EXPECT_EQ(varbinary_field_bytes(payload.field), std::string("\xc3\x28", 2));
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesResidualBinaryObjectField) {
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true)};
+
+    auto metadata = make_metadata({"b"});
+    std::vector<uint8_t> residual_value {0x02,                         // object
+                                         0x01,                         // one field
+                                         0x00,                         // dictionary id 0: b
+                                         0x00, 0x07,                   // field value offsets
+                                         0x3c, 0x02, 0x00, 0x00, 0x00, // binary primitive, 2 bytes
+                                         0xc3, 0x28};
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size())));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    const auto& payload = values.at(PathInData("b"));
+    EXPECT_EQ(payload.base_scalar_type_id, TYPE_VARBINARY);
+    EXPECT_EQ(varbinary_field_bytes(payload.field), std::string("\xc3\x28", 2));
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesResidualBinaryArray) {
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true)};
+
+    auto metadata = make_metadata({});
+    std::vector<uint8_t> residual_value {0x03,                         // array
+                                         0x02,                         // two elements
+                                         0x00, 0x07, 0x08,             // element value offsets
+                                         0x3c, 0x02, 0x00, 0x00, 0x00, // binary primitive, 2 bytes
+                                         0xc3, 0x28, 0x00};            // variant null
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size())));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    const auto& binary_array = values.at(PathInData());
+    EXPECT_EQ(binary_array.base_scalar_type_id, TYPE_VARBINARY);
+    EXPECT_EQ(binary_array.num_dimensions, 1);
+    const auto& array = binary_array.field.get<Array>();
+    ASSERT_EQ(array.size(), 2);
+    EXPECT_EQ(varbinary_field_bytes(array[0]), std::string("\xc3\x28", 2));
+    EXPECT_TRUE(array[1].is_null());
+}
+
+TEST(ParquetVariantReaderTest, RowWisePreservesResidualBinaryObjectArray) {
+    FieldSchema variant_field;
+    variant_field.name = "v";
+    variant_field.lower_case_name = variant_field.name;
+    variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false));
+    variant_field.children = {make_binary_field_schema("metadata", false),
+                              make_binary_field_schema("value", true)};
+
+    auto metadata = make_metadata({"b"});
+    std::vector<uint8_t> residual_value {0x03,                         // array
+                                         0x01,                         // one element
+                                         0x00, 0x0c,                   // element value offsets
+                                         0x02,                         // object
+                                         0x01,                         // one field
+                                         0x00,                         // dictionary id 0: b
+                                         0x00, 0x07,                   // field value offsets
+                                         0x3c, 0x02, 0x00, 0x00, 0x00, // binary primitive, 2 bytes
+                                         0xc3, 0x28};
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(metadata.data()), metadata.size())));
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size())));
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+
+    const auto& values = result.get<VariantMap>();
+    const auto& object_array = values.at(PathInData());
+    EXPECT_EQ(object_array.base_scalar_type_id, TYPE_VARIANT);
+    EXPECT_EQ(object_array.num_dimensions, 1);
+    const auto& array = object_array.field.get<Array>();
+    ASSERT_EQ(array.size(), 1);
+    ASSERT_EQ(array[0].get_type(), TYPE_VARIANT);
+    const auto& object = array[0].get<VariantMap>();
+    const auto& binary = object.at(PathInData("b"));
+    EXPECT_EQ(binary.base_scalar_type_id, TYPE_VARBINARY);
+    EXPECT_EQ(varbinary_field_bytes(binary.field), std::string("\xc3\x28", 2));
+}
+
+TEST(ParquetVariantReaderTest, RequiredMissingPayloadIsVariantNull) {
+    FieldSchema variant_field = make_required_shredded_variant_schema();
+
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(String("")));
+    row.push_back(Field());
+    row.push_back(Field());
+
+    Field result;
+    bool sql_null = true;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(
+            variant_field, Field::create_field<TYPE_STRUCT>(row), true, &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_FALSE(sql_null);
+    EXPECT_TRUE(result.is_null());
+}
+
+TEST(ParquetVariantReaderTest, NullableTopLevelGroupIsSqlNull) {
+    FieldSchema variant_field = make_required_shredded_variant_schema();
+
+    Field result;
+    bool sql_null = false;
+    Status st = parquet_variant_reader_test::read_variant_row_for_test(variant_field, Field(), true,
+                                                                       &result, &sql_null);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_TRUE(sql_null);
+}
+
+TEST(ParquetVariantReaderTest, NestedWrapperMergesResidualValueAndTypedValue) {
+    FieldSchema typed_value_field;
+    typed_value_field.name = "typed_value";
+    typed_value_field.lower_case_name = typed_value_field.name;
+    typed_value_field.children = {make_int32_field_schema("metric")};
+    typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {make_nullable(std::make_shared<DataTypeInt32>())}, Strings {"metric"}));
+
+    FieldSchema wrapper_field;
+    wrapper_field.name = "element";
+    wrapper_field.lower_case_name = wrapper_field.name;
+    wrapper_field.children = {make_binary_field_schema("value", true), typed_value_field};
+    wrapper_field.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {wrapper_field.children[0].data_type, typed_value_field.data_type},
+            Strings {"value", "typed_value"}));
+
+    auto metadata = make_metadata({"extra"});
+    std::vector<uint8_t> residual_value {
+            0x02,       // object, 1-byte offsets, 1-byte field ids, 1-byte element count
+            0x01,       // one field
+            0x00,       // dictionary id 0: extra
+            0x00, 0x02, // field value offsets
+            0x0c, 0x07  // int8(7)
+    };
+
+    Struct typed_value;
+    typed_value.push_back(Field::create_field<TYPE_INT>(1));
+    Struct row;
+    row.push_back(Field::create_field<TYPE_STRING>(
+            String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size())));
+    row.push_back(Field::create_field<TYPE_STRUCT>(typed_value));
+
+    std::string json;
+    bool present = false;
+    Status st = parquet_variant_reader_test::variant_to_json_for_test(
+            wrapper_field, Field::create_field<TYPE_STRUCT>(row),
+            std::string(reinterpret_cast<const char*>(metadata.data()), metadata.size()), &json,
+            &present);
+    ASSERT_TRUE(st.ok()) << st.to_string();
+    EXPECT_TRUE(present);
EXPECT_NE(json.find("\"extra\":7"), std::string::npos); + EXPECT_NE(json.find("\"metric\":1"), std::string::npos); +} + +TEST(ParquetVariantReaderTest, NestedWrapperMergesEmptyResidualObjectAndTypedValue) { + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {make_int32_field_schema("metric")}; + typed_value_field.data_type = make_nullable(std::make_shared( + DataTypes {typed_value_field.children[0].data_type}, Strings {"metric"})); + + FieldSchema wrapper_field; + wrapper_field.name = "element"; + wrapper_field.lower_case_name = wrapper_field.name; + wrapper_field.children = {make_binary_field_schema("value", true), typed_value_field}; + wrapper_field.data_type = make_nullable(std::make_shared( + DataTypes {wrapper_field.children[0].data_type, typed_value_field.data_type}, + Strings {"value", "typed_value"})); + + auto metadata = make_metadata({}); + std::vector residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x00, // zero fields + 0x00 // total field value size + }; + + Struct typed_value; + typed_value.push_back(Field::create_field(1)); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast(residual_value.data()), residual_value.size()))); + row.push_back(Field::create_field(typed_value)); + + std::string json; + bool present = false; + Status st = parquet_variant_reader_test::variant_to_json_for_test( + wrapper_field, Field::create_field(row), + std::string(reinterpret_cast(metadata.data()), metadata.size()), &json, + &present); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_TRUE(present); + EXPECT_EQ("{\"metric\":1}", json); +} + +TEST(ParquetVariantReaderTest, NestedWrapperRejectsResidualTypedKeyCollision) { + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = 
{make_int32_field_schema("metric")}; + typed_value_field.data_type = make_nullable(std::make_shared( + DataTypes {make_nullable(std::make_shared())}, Strings {"metric"})); + + FieldSchema wrapper_field; + wrapper_field.name = "element"; + wrapper_field.lower_case_name = wrapper_field.name; + wrapper_field.children = {make_binary_field_schema("value", true), typed_value_field}; + wrapper_field.data_type = make_nullable(std::make_shared( + DataTypes {wrapper_field.children[0].data_type, typed_value_field.data_type}, + Strings {"value", "typed_value"})); + + auto metadata = make_metadata({"metric"}); + std::vector residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x01, // one field + 0x00, // dictionary id 0: metric + 0x00, 0x02, // field value offsets + 0x0c, 0x02 // int8(2) + }; + + Struct typed_value; + typed_value.push_back(Field::create_field(1)); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast(residual_value.data()), residual_value.size()))); + row.push_back(Field::create_field(typed_value)); + + std::string json; + bool present = false; + Status st = parquet_variant_reader_test::variant_to_json_for_test( + wrapper_field, Field::create_field(row), + std::string(reinterpret_cast(metadata.data()), metadata.size()), &json, + &present); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RowWiseRejectsResidualTypedKeyCollision) { + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {make_int32_field_schema("metric")}; + typed_value_field.data_type = make_nullable(std::make_shared( + DataTypes {make_nullable(std::make_shared())}, Strings {"metric"})); + + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared(0, false)); + variant_field.children 
= {make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true), typed_value_field}; + + auto metadata = make_metadata({"metric"}); + std::vector<uint8_t> residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x01, // one field + 0x00, // dictionary id 0: metric + 0x00, 0x02, // field value offsets + 0x0c, 0x02 // int8(2) + }; + + Struct typed_value; + typed_value.push_back(Field::create_field(1)); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size()))); + row.push_back(Field::create_field(typed_value)); + + Field result; + bool sql_null = false; + Status st = parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RowWisePreservesEmptyTypedObject) { + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {make_int32_field_schema("metric")}; + typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {typed_value_field.children[0].data_type}, Strings {"metric"})); + + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false)); + variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field}; + + auto metadata = make_metadata({}); + Struct typed_value; + typed_value.push_back(Field()); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field::create_field(typed_value)); + + Field result; + bool sql_null = true; + Status st = 
parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_FALSE(sql_null); + EXPECT_EQ("{}", serialize_variant_field(result)); + + const auto& values = result.get<VariantMap>(); + EXPECT_NE(values.find(PathInData()), values.end()); +} + +TEST(ParquetVariantReaderTest, RowWiseReadsRootTypedMapObject) { + FieldSchema key_field = make_binary_field_schema("key", false); + FieldSchema value_field = make_int32_field_schema("value"); + + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {key_field, value_field}; + typed_value_field.data_type = make_nullable( + std::make_shared<DataTypeMap>(key_field.data_type, value_field.data_type)); + + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false)); + variant_field.children = {make_binary_field_schema("metadata", false), typed_value_field}; + + Array keys; + keys.push_back(Field::create_field(String("a"))); + keys.push_back(Field::create_field(String("b"))); + Array values; + values.push_back(Field::create_field(7)); + values.push_back(Field::create_field(8)); + Map typed_map {Field::create_field(keys), Field::create_field(values)}; + + auto metadata = make_metadata({}); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field::create_field(typed_map)); + + Field result; + bool sql_null = true; + Status st = parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_FALSE(sql_null); + EXPECT_EQ("{\"a\":7,\"b\":8}", serialize_variant_field(result)); + + const auto& variant_values = result.get<VariantMap>(); + 
EXPECT_EQ(variant_values.at(PathInData("a")).field.get(), 7); + EXPECT_EQ(variant_values.at(PathInData("b")).field.get(), 8); +} + +TEST(ParquetVariantReaderTest, RowWisePreservesEmptyResidualObject) { + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false)); + variant_field.children = {make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true)}; + + auto metadata = make_metadata({}); + std::vector<uint8_t> residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x00, // zero fields + 0x00 // total field value size + }; + + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size()))); + + Field result; + bool sql_null = true; + Status st = parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_FALSE(sql_null); + EXPECT_EQ("{}", serialize_variant_field(result)); + + const auto& values = result.get<VariantMap>(); + EXPECT_NE(values.find(PathInData()), values.end()); +} + +TEST(ParquetVariantReaderTest, RowWiseMergesEmptyResidualObjectAndTypedValue) { + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {make_int32_field_schema("metric")}; + typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {typed_value_field.children[0].data_type}, Strings {"metric"})); + + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false)); + variant_field.children = 
{make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true), typed_value_field}; + + auto metadata = make_metadata({}); + std::vector<uint8_t> residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x00, // zero fields + 0x00 // total field value size + }; + + Struct typed_value; + typed_value.push_back(Field::create_field(1)); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size()))); + row.push_back(Field::create_field(typed_value)); + + Field result; + bool sql_null = true; + Status st = parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_FALSE(sql_null); + EXPECT_EQ("{\"metric\":1}", serialize_variant_field(result)); + + const auto& values = result.get<VariantMap>(); + EXPECT_EQ(values.find(PathInData()), values.end()); + EXPECT_NE(values.find(PathInData("metric")), values.end()); +} + +TEST(ParquetVariantReaderTest, RowWiseMergesResidualObjectAndEmptyTypedValue) { + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {make_int32_field_schema("metric")}; + typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {typed_value_field.children[0].data_type}, Strings {"metric"})); + + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false)); + variant_field.children = {make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true), typed_value_field}; + + auto metadata = make_metadata({"x"}); + std::vector<uint8_t> residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x01, // one field + 0x00, // dictionary id 0: x + 0x00, 0x02, // field value offsets + 0x0c, 0x07 // int8(7) + }; + + Struct typed_value; + typed_value.push_back(Field()); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size()))); + row.push_back(Field::create_field(typed_value)); + + Field result; + bool sql_null = true; + Status st = parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_FALSE(sql_null); + EXPECT_EQ("{\"x\":7}", serialize_variant_field(result)); + + const auto& values = result.get<VariantMap>(); + EXPECT_EQ(values.find(PathInData()), values.end()); + EXPECT_NE(values.find(PathInData("x")), values.end()); +} + +TEST(ParquetVariantReaderTest, RowWiseMergesMatchingEmptyResidualAndTypedObjects) { + FieldSchema metric_field; + metric_field.name = "metric"; + metric_field.lower_case_name = metric_field.name; + metric_field.children = {make_int32_field_schema("x")}; + metric_field.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {metric_field.children[0].data_type}, Strings {"x"})); + + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {metric_field}; + typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {metric_field.data_type}, Strings {"metric"})); + + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false)); + variant_field.children = {make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true), typed_value_field}; + + auto metadata = 
make_metadata({"metric"}); + std::vector<uint8_t> residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x01, // one field + 0x00, // dictionary id 0: metric + 0x00, 0x03, // field value offsets + 0x02, 0x00, 0x00 // metric: empty object + }; + + Struct metric; + metric.push_back(Field()); + Struct typed_value; + typed_value.push_back(Field::create_field(metric)); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size()))); + row.push_back(Field::create_field(typed_value)); + + Field result; + bool sql_null = true; + Status st = parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_FALSE(sql_null); + EXPECT_EQ("{\"metric\":{}}", serialize_variant_field(result)); + + const auto& values = result.get<VariantMap>(); + EXPECT_NE(values.find(PathInData("metric")), values.end()); +} + +TEST(ParquetVariantReaderTest, RowWiseReadsValueOnlyNestedResidualField) { + FieldSchema metric_field; + metric_field.name = "metric"; + metric_field.lower_case_name = metric_field.name; + metric_field.children = {make_binary_field_schema("value", true)}; + metric_field.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {metric_field.children[0].data_type}, Strings {"value"})); + + FieldSchema typed_value_field; + typed_value_field.name = "typed_value"; + typed_value_field.lower_case_name = typed_value_field.name; + typed_value_field.children = {metric_field}; + typed_value_field.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {metric_field.data_type}, Strings {"metric"})); + + FieldSchema variant_field; + variant_field.name = "v"; + variant_field.lower_case_name = variant_field.name; + variant_field.data_type = make_nullable(std::make_shared<DataTypeVariant>(0, false)); + 
variant_field.children = {make_binary_field_schema("metadata", false), + make_binary_field_schema("value", true), typed_value_field}; + + auto metadata = make_metadata({"x"}); + std::vector<uint8_t> residual_value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x01, // one field + 0x00, // dictionary id 0: x + 0x00, 0x02, // field value offsets + 0x0c, 0x07 // int8(7) + }; + + Struct metric; + metric.push_back(Field::create_field( + String(reinterpret_cast<const char*>(residual_value.data()), residual_value.size()))); + Struct typed_value; + typed_value.push_back(Field::create_field(metric)); + Struct row; + row.push_back(Field::create_field( + String(reinterpret_cast<const char*>(metadata.data()), metadata.size()))); + row.push_back(Field()); + row.push_back(Field::create_field(typed_value)); + + Field result; + bool sql_null = true; + Status st = parquet_variant_reader_test::read_variant_row_for_test( + variant_field, Field::create_field(row), true, &result, &sql_null); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_FALSE(sql_null); + + const auto& values = result.get<VariantMap>(); + auto metric_x = values.find(PathInData("metric.x")); + ASSERT_NE(metric_x, values.end()); + EXPECT_EQ(metric_x->second.field.get(), 7); + EXPECT_EQ(values.find(PathInData("metric.value")), values.end()); +} + +TEST(ParquetVariantReaderTest, DecodeObjectWithOutOfOrderPhysicalValues) { + auto metadata = make_metadata({"a", "b", "c"}); + std::vector<uint8_t> value { + 0x02, // object, 1-byte offsets, 1-byte field ids, 1-byte element count + 0x03, // three fields + 0x00, 0x01, 0x02, // dictionary ids: a, b, c + 0x04, 0x02, 0x00, 0x06, // field offsets in key order; values are c, b, a + 0x0c, 0x03, // c: int8(3) + 0x0c, 0x02, // b: int8(2) + 0x0c, 0x01 // a: int8(1) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_EQ("{\"a\":1,\"b\":2,\"c\":3}", json); +} + +TEST(ParquetVariantReaderTest, 
RejectObjectChildTrailingBytes) { + auto metadata = make_metadata({"a"}); + std::vector<uint8_t> value { + 0x02, // object + 0x01, // one field + 0x00, // dictionary id 0 + 0x00, 0x03, // child is declared as 3 bytes + 0x0c, 0x07, 0x00 // int8(7) plus one trailing byte inside the child range + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectObjectDuplicatePhysicalOffsets) { + auto metadata = make_metadata({"a", "b"}); + std::vector<uint8_t> value { + 0x02, // object + 0x02, // two fields + 0x00, 0x01, // dictionary ids + 0x00, 0x00, 0x02, // both fields point at the same physical value + 0x0c, 0x07 // int8(7) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectObjectDuplicateFieldIds) { + auto metadata = make_metadata({"a"}); + std::vector<uint8_t> value { + 0x02, // object + 0x02, // two fields + 0x00, 0x00, // duplicate dictionary id 0 + 0x00, 0x02, 0x04, // valid physical value offsets + 0x0c, 0x01, 0x0c, 0x02 // int8(1), int8(2) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, DecodeObjectWithLexicographicFieldOrderAndNonMonotonicIds) { + auto metadata = make_metadata({"b", "a"}); + std::vector<uint8_t> value { + 0x02, // object + 0x02, // two fields + 0x01, 0x00, // dictionary ids are sorted by field name: a, b + 0x00, 0x02, 0x04, // valid physical value offsets + 0x0c, 0x01, 0x0c, 0x02 // int8(1), int8(2) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + ASSERT_TRUE(st.ok()) << st.to_string(); + EXPECT_EQ("{\"a\":1,\"b\":2}", json); +} + +TEST(ParquetVariantReaderTest, RejectObjectOutOfOrderFieldNames) { + 
auto metadata = make_metadata({"b", "a"}); + std::vector<uint8_t> value { + 0x02, // object + 0x02, // two fields + 0x00, 0x01, // dictionary ids are not sorted by field name + 0x00, 0x02, 0x04, // valid physical value offsets + 0x0c, 0x02, 0x0c, 0x01 // int8(2), int8(1) + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectArrayChildTrailingBytes) { + auto metadata = make_metadata({}); + std::vector<uint8_t> value { + 0x03, // array, 1-byte offsets, 1-byte element count + 0x01, // one element + 0x00, 0x03, // element is declared as 3 bytes + 0x0c, 0x07, 0x00 // int8(7) plus one trailing byte inside the element range + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +TEST(ParquetVariantReaderTest, RejectOversizedPrimitiveLength) { + auto metadata = make_metadata({}); + std::vector<uint8_t> value { + 0x40, // primitive string + 0xff, 0xff, 0xff, 0xff // length exceeds the remaining buffer + }; + + std::string json; + Status st = decode_variant_to_json(bytes_ref(metadata), bytes_ref(value), &json); + EXPECT_TRUE(st.is()) << st.to_string(); +} + +} // namespace doris::parquet diff --git a/be/test/format/table/hive/hive_reader_create_column_ids_test.cpp b/be/test/format/table/hive/hive_reader_create_column_ids_test.cpp index 7a884359027d73..27b16c9041b34f 100644 --- a/be/test/format/table/hive/hive_reader_create_column_ids_test.cpp +++ b/be/test/format/table/hive/hive_reader_create_column_ids_test.cpp @@ -722,7 +722,8 @@ class HiveReaderCreateColumnIdsTest : public ::testing::Test { const std::vector& access_configs, const std::set& expected_column_ids, const std::set& expected_filter_column_ids, - bool use_top_level_method = false, bool should_skip_assertion = false) { + bool use_top_level_method = false, bool should_skip_assertion = false, + const 
std::vector& top_level_file_column_idxs = {}) { std::string test_file = "./be/test/exec/test_data/nested_user_profiles_parquet/" "part-00000-64a7a390-1a03-4efc-ab51-557e9369a1f9-c000.snappy.parquet"; @@ -775,8 +776,13 @@ class HiveReaderCreateColumnIdsTest : public ::testing::Test { // Execute test based on method choice ColumnIdResult actual_result; if (use_top_level_method) { + std::vector file_column_idxs = top_level_file_column_idxs; + if (file_column_idxs.empty()) { + file_column_idxs.assign(table_column_positions.begin(), + table_column_positions.end()); + } actual_result = HiveParquetReader::_create_column_ids_by_top_level_col_index( - field_desc, tuple_descriptor); + field_desc, tuple_descriptor, table_column_names, file_column_idxs); } else { actual_result = HiveParquetReader::_create_column_ids(field_desc, tuple_descriptor); } @@ -931,6 +937,15 @@ TEST_F(HiveReaderCreateColumnIdsTest, test_create_column_ids_2) { expected_filter_column_ids, true); } +TEST_F(HiveReaderCreateColumnIdsTest, test_parquet_top_level_index_uses_scan_column_mapping) { + std::vector table_column_names = {"friends"}; + std::set expected_column_ids = {26, 27, 28, 29, 30, 31, 32}; + std::set expected_filter_column_ids = {}; + + run_parquet_test(table_column_names, {}, expected_column_ids, expected_filter_column_ids, true, + false, {5}); +} + TEST_F(HiveReaderCreateColumnIdsTest, test_create_column_ids_3) { // ORC column IDs are assigned in a tree-like incremental manner: the root node is 0, and child nodes increase sequentially. // Currently, Parquet uses a similar design. 
@@ -1171,4 +1186,4 @@ TEST_F(HiveReaderCreateColumnIdsTest, test_create_column_ids_6) { } } -} // namespace doris \ No newline at end of file +} // namespace doris diff --git a/be/test/format/table/iceberg/iceberg_reader_create_column_ids_test.cpp b/be/test/format/table/iceberg/iceberg_reader_create_column_ids_test.cpp index e32153d1ef7f74..cb0fb7264354e7 100644 --- a/be/test/format/table/iceberg/iceberg_reader_create_column_ids_test.cpp +++ b/be/test/format/table/iceberg/iceberg_reader_create_column_ids_test.cpp @@ -175,7 +175,8 @@ class IcebergReaderCreateColumnIdsTest : public ::testing::Test { {"id", 1}, {"name", 2}, {"profile", 3}, {"tags", 4}, {"friends", 5}, {"recent_activity", 6}, - {"attributes", 7}, {"complex_attributes", 8}}; + {"attributes", 7}, {"complex_attributes", 8}, + {"v", 100}}; auto it = column_to_field_id.find(column_name); if (it != column_to_field_id.end()) { @@ -185,6 +186,7 @@ class IcebergReaderCreateColumnIdsTest : public ::testing::Test { } // Helper function to create tuple descriptor + // NOLINTNEXTLINE(readability-function-size): test descriptor setup mirrors thrift fixtures. 
const TupleDescriptor* create_tuple_descriptor( DescriptorTbl** desc_tbl, ObjectPool& obj_pool, TDescriptorTable& t_desc_table, TTableDescriptor& t_table_desc, const std::vector& column_names, @@ -573,6 +575,16 @@ class IcebergReaderCreateColumnIdsTest : public ::testing::Test { hobby_level_node.__set_scalar_type(hobby_level_scalar); type.types.push_back(hobby_level_node); tslot_desc.__set_slotType(type); + } else if (types[i] == TPrimitiveType::VARIANT) { + TTypeNode node; + node.__set_type(TTypeNodeType::SCALAR); + TScalarType scalar_type; + scalar_type.__set_type(TPrimitiveType::VARIANT); + scalar_type.__set_variant_max_subcolumns_count(2048); + scalar_type.__set_variant_enable_doc_mode(false); + node.__set_scalar_type(scalar_type); + type.types.push_back(node); + tslot_desc.__set_slotType(type); } else { // 普通类型 TTypeNode node; @@ -621,6 +633,68 @@ class IcebergReaderCreateColumnIdsTest : public ::testing::Test { return (*desc_tbl)->get_tuple_descriptor(0); } + static tparquet::SchemaElement make_root_schema(int num_children) { + tparquet::SchemaElement schema; + schema.__set_name("schema"); + schema.__set_num_children(num_children); + return schema; + } + + static tparquet::SchemaElement make_group_schema( + std::string name, int num_children, tparquet::FieldRepetitionType::type repetition_type, + int field_id = -1, bool is_variant = false) { + tparquet::SchemaElement schema; + schema.__set_name(name); + schema.__set_num_children(num_children); + schema.__set_repetition_type(repetition_type); + if (field_id >= 0) { + schema.__set_field_id(field_id); + } + if (is_variant) { + tparquet::LogicalType logical_type; + logical_type.__set_VARIANT(tparquet::VariantType()); + schema.__set_logicalType(logical_type); + } + return schema; + } + + static tparquet::SchemaElement make_primitive_schema( + std::string name, tparquet::Type::type type, + tparquet::FieldRepetitionType::type repetition_type, int field_id = -1) { + tparquet::SchemaElement schema; + 
schema.__set_name(name); + schema.__set_type(type); + schema.__set_repetition_type(repetition_type); + if (field_id >= 0) { + schema.__set_field_id(field_id); + } + return schema; + } + + FieldDescriptor make_iceberg_variant_field_id_descriptor() { + std::vector schemas; + schemas.push_back(make_root_schema(1)); + schemas.push_back( + make_group_schema("v", 3, tparquet::FieldRepetitionType::OPTIONAL, 100, true)); + schemas.push_back(make_primitive_schema("metadata", tparquet::Type::BYTE_ARRAY, + tparquet::FieldRepetitionType::REQUIRED, 101)); + schemas.push_back(make_primitive_schema("value", tparquet::Type::BYTE_ARRAY, + tparquet::FieldRepetitionType::OPTIONAL, 102)); + schemas.push_back( + make_group_schema("typed_value", 1, tparquet::FieldRepetitionType::OPTIONAL, 103)); + schemas.push_back( + make_group_schema("metric", 1, tparquet::FieldRepetitionType::REQUIRED, 104)); + schemas.push_back( + make_group_schema("typed_value", 1, tparquet::FieldRepetitionType::OPTIONAL, 105)); + schemas.push_back(make_group_schema("x", 1, tparquet::FieldRepetitionType::REQUIRED, 106)); + schemas.push_back(make_primitive_schema("typed_value", tparquet::Type::INT64, + tparquet::FieldRepetitionType::OPTIONAL, 107)); + + FieldDescriptor field_desc; + EXPECT_TRUE(field_desc.parse_from_thrift(schemas).ok()); + return field_desc; + } + // Helper function: set column access paths on a slot descriptor void set_column_access_paths(TSlotDescriptor& tslot_desc, const ColumnAccessPathConfig& config) { @@ -1166,4 +1240,81 @@ TEST_F(IcebergReaderCreateColumnIdsTest, test_create_column_ids_6) { } } -} // namespace doris \ No newline at end of file +TEST_F(IcebergReaderCreateColumnIdsTest, test_variant_field_id_pruning_uses_typed_value_columns) { + auto field_desc = make_iceberg_variant_field_id_descriptor(); + + ColumnAccessPathConfig access_config; + access_config.column_name = "v"; + access_config.all_column_paths = {{"100", "metric", "x"}}; + access_config.predicate_paths = {{"100", "metric", 
"x"}}; + + DescriptorTbl* desc_tbl; + ObjectPool obj_pool; + TDescriptorTable t_desc_table; + TTableDescriptor t_table_desc; + const TupleDescriptor* tuple_descriptor = + create_tuple_descriptor(&desc_tbl, obj_pool, t_desc_table, t_table_desc, {"v"}, {0}, + {TPrimitiveType::VARIANT}, {access_config}); + + auto actual_result = IcebergParquetReader::_create_column_ids(&field_desc, tuple_descriptor); + + const std::set expected_typed_value_column_ids = {1, 4, 5, 6, 7, 8}; + EXPECT_EQ(actual_result.column_ids, expected_typed_value_column_ids); + EXPECT_EQ(actual_result.filter_column_ids, expected_typed_value_column_ids); + EXPECT_FALSE(actual_result.column_ids.contains(2)); // top-level metadata + EXPECT_FALSE(actual_result.column_ids.contains(3)); // top-level residual value +} + +TEST_F(IcebergReaderCreateColumnIdsTest, test_parquet_column_id_creation_does_not_mutate_schema) { + auto field_desc = make_iceberg_variant_field_id_descriptor(); + ASSERT_EQ(field_desc.get_column(0)->get_column_id(), UNASSIGNED_COLUMN_ID); + + ColumnAccessPathConfig access_config; + access_config.column_name = "v"; + access_config.all_column_paths = {{"100", "metric", "x"}}; + access_config.predicate_paths = {{"100", "metric", "x"}}; + + DescriptorTbl* desc_tbl; + ObjectPool obj_pool; + TDescriptorTable t_desc_table; + TTableDescriptor t_table_desc; + const TupleDescriptor* tuple_descriptor = + create_tuple_descriptor(&desc_tbl, obj_pool, t_desc_table, t_table_desc, {"v"}, {0}, + {TPrimitiveType::VARIANT}, {access_config}); + + auto actual_result = IcebergParquetReader::_create_column_ids(&field_desc, tuple_descriptor); + + EXPECT_FALSE(actual_result.column_ids.empty()); + EXPECT_EQ(field_desc.get_column(0)->get_column_id(), UNASSIGNED_COLUMN_ID); + EXPECT_EQ(field_desc.get_column(0)->get_max_column_id(), 0); +} + +TEST_F(IcebergReaderCreateColumnIdsTest, + test_variant_field_id_pruning_treats_numeric_keys_as_variant_names) { + auto field_desc = make_iceberg_variant_field_id_descriptor(); + 
+ ColumnAccessPathConfig access_config; + access_config.column_name = "v"; + access_config.all_column_paths = {{"100", "104", "106"}}; + access_config.predicate_paths = {{"100", "104", "106"}}; + + DescriptorTbl* desc_tbl; + ObjectPool obj_pool; + TDescriptorTable t_desc_table; + TTableDescriptor t_table_desc; + const TupleDescriptor* tuple_descriptor = + create_tuple_descriptor(&desc_tbl, obj_pool, t_desc_table, t_table_desc, {"v"}, {0}, + {TPrimitiveType::VARIANT}, {access_config}); + + auto actual_result = IcebergParquetReader::_create_column_ids(&field_desc, tuple_descriptor); + + const std::set expected_residual_value_column_ids = {1, 2, 3}; + EXPECT_EQ(actual_result.column_ids, expected_residual_value_column_ids); + EXPECT_EQ(actual_result.filter_column_ids, expected_residual_value_column_ids); + EXPECT_FALSE(actual_result.column_ids.contains(4)); // typed_value + EXPECT_FALSE(actual_result.column_ids.contains(5)); // typed_value.metric + EXPECT_FALSE(actual_result.column_ids.contains(6)); // typed_value.metric.typed_value + EXPECT_FALSE(actual_result.column_ids.contains(7)); // typed_value.metric.typed_value.x +} + +} // namespace doris diff --git a/be/test/format/table/nested_column_access_helper_test.cpp b/be/test/format/table/nested_column_access_helper_test.cpp new file mode 100644 index 00000000000000..7967cd08f33c4d --- /dev/null +++ b/be/test/format/table/nested_column_access_helper_test.cpp @@ -0,0 +1,1113 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. 
You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +#include "format/table/nested_column_access_helper.h" + +#include + +#include +#include +#include +#include + +#include "common/exception.h" +#include "core/data_type/data_type_array.h" +#include "core/data_type/data_type_map.h" +#include "core/data_type/data_type_nullable.h" +#include "core/data_type/data_type_number.h" +#include "core/data_type/data_type_string.h" +#include "core/data_type/data_type_struct.h" +#include "core/data_type/data_type_variant.h" +#include "format/parquet/parquet_nested_column_utils.h" +#include "format/parquet/schema_desc.h" +#include "format/table/hive/hive_parquet_nested_column_utils.h" +#include "format/table/iceberg/iceberg_parquet_nested_column_utils.h" + +namespace doris { +namespace { + +FieldSchema make_variant_field_for_access_path_test() { + FieldSchema field; + field.name = "v"; + field.lower_case_name = "v"; + field.column_id = 10; + field.max_column_id = 16; + return field; +} + +FieldSchema make_binary_field_schema(std::string name, bool nullable) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::BYTE_ARRAY; + field.data_type = std::make_shared<DataTypeString>(); + if (nullable) { + field.data_type = make_nullable(field.data_type); + } + return field; +} + +FieldSchema make_string_field_schema(std::string name, bool nullable) { + FieldSchema field = make_binary_field_schema(std::move(name), nullable); + tparquet::LogicalType logical_type; + logical_type.__set_STRING(tparquet::StringType()); + 
field.parquet_schema.__set_logicalType(logical_type); + return field; +} + +FieldSchema make_int32_field_schema(std::string name) { + FieldSchema field; + field.name = std::move(name); + field.lower_case_name = field.name; + field.physical_type = tparquet::Type::INT32; + field.data_type = make_nullable(std::make_shared<DataTypeInt32>()); + return field; +} + +FieldSchema make_variant_field_with_nested_structural_name_keys() { + FieldSchema nested; + nested.name = "nested"; + nested.lower_case_name = nested.name; + nested.children = {make_int32_field_schema("typed_value"), make_int32_field_schema("value")}; + nested.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {nested.children[0].data_type, nested.children[1].data_type}, + Strings {"typed_value", "value"})); + + FieldSchema typed_value; + typed_value.name = "typed_value"; + typed_value.lower_case_name = typed_value.name; + typed_value.children = {nested}; + typed_value.data_type = make_nullable( + std::make_shared<DataTypeStruct>(DataTypes {nested.data_type}, Strings {"nested"})); + + FieldSchema field; + field.name = "v"; + field.lower_case_name = field.name; + field.data_type = std::make_shared<DataTypeVariant>(0, false); + field.children = {make_binary_field_schema("metadata", false), typed_value}; + uint64_t next_id = 10; + field.assign_ids(next_id); + return field; +} + +FieldSchema make_variant_field_with_annotated_value_user_field() { + FieldSchema object; + object.name = "obj"; + object.lower_case_name = object.name; + object.children = {make_string_field_schema("value", true)}; + object.data_type = make_nullable(std::make_shared<DataTypeStruct>( + DataTypes {object.children[0].data_type}, Strings {"value"})); + + FieldSchema typed_value; + typed_value.name = "typed_value"; + typed_value.lower_case_name = typed_value.name; + typed_value.children = {object}; + typed_value.data_type = make_nullable( + std::make_shared<DataTypeStruct>(DataTypes {object.data_type}, Strings {"obj"})); + + FieldSchema field; + field.name = "v"; + field.lower_case_name = field.name; + 
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_variant_field_with_typed_only_nested_shredded_object() {
+    FieldSchema nested_x = make_int32_field_schema("x");
+
+    FieldSchema nested_typed_value;
+    nested_typed_value.name = "typed_value";
+    nested_typed_value.lower_case_name = nested_typed_value.name;
+    nested_typed_value.children = {nested_x};
+    nested_typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {nested_x.data_type}, Strings {"x"}));
+
+    FieldSchema nested;
+    nested.name = "nested";
+    nested.lower_case_name = nested.name;
+    nested.children = {nested_typed_value};
+    nested.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {nested_typed_value.data_type}, Strings {"typed_value"}));
+
+    FieldSchema typed_value;
+    typed_value.name = "typed_value";
+    typed_value.lower_case_name = typed_value.name;
+    typed_value.children = {nested};
+    typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {nested.data_type}, Strings {"nested"}));
+
+    FieldSchema field;
+    field.name = "v";
+    field.lower_case_name = field.name;
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_variant_field_with_typed_only_array_field() {
+    FieldSchema element_n = make_int32_field_schema("n");
+
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.children = {element_n};
+    element.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {element_n.data_type}, Strings {"n"}));
+
+    FieldSchema items_typed_value;
+    items_typed_value.name = "typed_value";
+    items_typed_value.lower_case_name = items_typed_value.name;
+    items_typed_value.children = {element};
+    items_typed_value.data_type =
+            make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema items;
+    items.name = "items";
+    items.lower_case_name = items.name;
+    items.children = {items_typed_value};
+    items.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {items_typed_value.data_type}, Strings {"typed_value"}));
+
+    FieldSchema typed_value;
+    typed_value.name = "typed_value";
+    typed_value.lower_case_name = typed_value.name;
+    typed_value.children = {items};
+    typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {items.data_type}, Strings {"items"}));
+
+    FieldSchema field;
+    field.name = "v";
+    field.lower_case_name = field.name;
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_variant_field_with_root_typed_only_array() {
+    FieldSchema element_n = make_int32_field_schema("n");
+
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.children = {element_n};
+    element.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {element_n.data_type}, Strings {"n"}));
+
+    FieldSchema typed_value;
+    typed_value.name = "typed_value";
+    typed_value.lower_case_name = typed_value.name;
+    typed_value.children = {element};
+    typed_value.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+
+    FieldSchema field;
+    field.name = "v";
+    field.lower_case_name = field.name;
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_variant_field_with_typed_only_map_field() {
+    FieldSchema key = make_binary_field_schema("key", false);
+
+    FieldSchema value_n = make_int32_field_schema("n");
+    FieldSchema value;
+    value.name = "value";
+    value.lower_case_name = value.name;
+    value.children = {value_n};
+    value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {value_n.data_type}, Strings {"n"}));
+
+    FieldSchema attrs;
+    attrs.name = "attrs";
+    attrs.lower_case_name = attrs.name;
+    attrs.children = {key, value};
+    attrs.data_type = make_nullable(std::make_shared<DataTypeMap>(key.data_type, value.data_type));
+
+    FieldSchema typed_value;
+    typed_value.name = "typed_value";
+    typed_value.lower_case_name = typed_value.name;
+    typed_value.children = {attrs};
+    typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {attrs.data_type}, Strings {"attrs"}));
+
+    FieldSchema field;
+    field.name = "v";
+    field.lower_case_name = field.name;
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_variant_field_with_value_only_residual_field() {
+    FieldSchema metric;
+    metric.name = "metric";
+    metric.lower_case_name = metric.name;
+    metric.children = {make_binary_field_schema("value", true)};
+    metric.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {metric.children[0].data_type}, Strings {"value"}));
+
+    FieldSchema typed_value;
+    typed_value.name = "typed_value";
+    typed_value.lower_case_name = typed_value.name;
+    typed_value.children = {metric};
+    typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {metric.data_type}, Strings {"metric"}));
+
+    FieldSchema field;
+    field.name = "v";
+    field.lower_case_name = field.name;
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false),
+                      make_binary_field_schema("value", true), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_variant_field_with_partially_shredded_metric() {
+    FieldSchema metric_x = make_int32_field_schema("x");
+
+    FieldSchema metric_typed_value;
+    metric_typed_value.name = "typed_value";
+    metric_typed_value.lower_case_name = metric_typed_value.name;
+    metric_typed_value.children = {metric_x};
+    metric_typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {metric_x.data_type}, Strings {"x"}));
+
+    FieldSchema metric;
+    metric.name = "metric";
+    metric.lower_case_name = metric.name;
+    metric.children = {make_binary_field_schema("value", true), metric_typed_value};
+    metric.data_type = make_nullable(std::make_shared<DataTypeStruct>(
+            DataTypes {metric.children[0].data_type, metric_typed_value.data_type},
+            Strings {"value", "typed_value"}));
+
+    FieldSchema typed_value;
+    typed_value.name = "typed_value";
+    typed_value.lower_case_name = typed_value.name;
+    typed_value.children = {metric};
+    typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {metric.data_type}, Strings {"metric"}));
+
+    FieldSchema field;
+    field.name = "v";
+    field.lower_case_name = field.name;
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false),
+                      make_binary_field_schema("value", true), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_variant_field_with_numeric_key_field_id_collision() {
+    FieldSchema metric = make_int32_field_schema("metric");
+    metric.field_id = 20;
+
+    FieldSchema typed_value;
+    typed_value.name = "typed_value";
+    typed_value.lower_case_name = typed_value.name;
+    typed_value.children = {metric};
+    typed_value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {metric.data_type}, Strings {"metric"}));
+
+    FieldSchema field;
+    field.name = "v";
+    field.lower_case_name = field.name;
+    field.data_type = std::make_shared<DataTypeVariant>(0, false);
+    field.children = {make_binary_field_schema("metadata", false),
+                      make_binary_field_schema("value", true), typed_value};
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_optional_array_field_for_pruning() {
+    FieldSchema element_n = make_int32_field_schema("n");
+    element_n.field_id = 102;
+
+    FieldSchema element;
+    element.name = "element";
+    element.lower_case_name = element.name;
+    element.field_id = 101;
+    element.children = {element_n};
+    element.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {element_n.data_type}, Strings {"n"}));
+
+    FieldSchema field;
+    field.name = "items";
+    field.lower_case_name = field.name;
+    field.field_id = 100;
+    field.children = {element};
+    field.data_type = make_nullable(std::make_shared<DataTypeArray>(element.data_type));
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_optional_map_field_for_pruning() {
+    FieldSchema key = make_binary_field_schema("key", false);
+    key.field_id = 101;
+
+    FieldSchema value_n = make_int32_field_schema("n");
+    value_n.field_id = 103;
+    FieldSchema value;
+    value.name = "value";
+    value.lower_case_name = value.name;
+    value.field_id = 102;
+    value.children = {value_n};
+    value.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {value_n.data_type}, Strings {"n"}));
+
+    FieldSchema field;
+    field.name = "attrs";
+    field.lower_case_name = field.name;
+    field.field_id = 100;
+    field.children = {key, value};
+    field.data_type = make_nullable(std::make_shared<DataTypeMap>(key.data_type, value.data_type));
+    uint64_t next_id = 10;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_struct_with_optional_map_field_for_pruning() {
+    FieldSchema attrs = make_optional_map_field_for_pruning();
+
+    FieldSchema field;
+    field.name = "s";
+    field.lower_case_name = field.name;
+    field.field_id = 200;
+    field.children = {attrs};
+    field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {attrs.data_type}, Strings {"attrs"}));
+    uint64_t next_id = 20;
+    field.assign_ids(next_id);
+    return field;
+}
+
+FieldSchema make_struct_with_optional_array_field_for_pruning() {
+    FieldSchema items = make_optional_array_field_for_pruning();
+
+    FieldSchema field;
+    field.name = "s";
+    field.lower_case_name = field.name;
+    field.field_id = 200;
+    field.children = {items};
+    field.data_type = make_nullable(
+            std::make_shared<DataTypeStruct>(DataTypes {items.data_type}, Strings {"items"}));
+    uint64_t next_id = 20;
+    field.assign_ids(next_id);
+    return field;
+}
+
+const FieldSchema& child_by_name(const FieldSchema& field, const std::string& name) {
+    for (const auto& child : field.children) {
+        if (child.name == name) {
+            return child;
+        }
+    }
+    throw Exception(Status::InternalError("missing test child {}", name));
+}
+
+std::set<uint64_t> collect_ids(const FieldSchema& field,
+                               const std::vector<TColumnAccessPath>& access_paths) {
+    std::set<uint64_t> ids;
+    process_nested_access_paths(
+            &field, access_paths, ids,
+            [](const FieldSchema* field) { return field->get_column_id(); },
+            [](const FieldSchema* field) { return field->get_max_column_id(); },
+            [](const FieldSchema&, const std::vector<std::vector<std::string>>&,
+               std::set<uint64_t>&) { FAIL() << "full projection should not call extractor"; });
+    return ids;
+}
+
+void expect_map_offset_only_ids(const FieldSchema& field, const std::set<uint64_t>& ids) {
+    const auto& key = child_by_name(field, "key");
+    const auto& value = child_by_name(field, "value");
+    const auto& value_n = child_by_name(value, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(key.get_column_id()));
+    EXPECT_FALSE(ids.contains(value.get_column_id()));
+    EXPECT_FALSE(ids.contains(value_n.get_column_id()));
+}
+
+void expect_array_offset_only_ids(const FieldSchema& field, const std::set<uint64_t>& ids) {
+    const auto& element = child_by_name(field, "element");
+    const auto& element_n = child_by_name(element, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(element.get_column_id()));
+    EXPECT_FALSE(ids.contains(element_n.get_column_id()));
+}
+
+void expect_nested_array_offset_only_ids(const FieldSchema& field, const std::set<uint64_t>& ids) {
+    const auto& items = child_by_name(field, "items");
+    const auto& element = child_by_name(items, "element");
+    const auto& element_n = child_by_name(element, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(items.get_column_id()));
+    EXPECT_FALSE(ids.contains(element.get_column_id()));
+    EXPECT_FALSE(ids.contains(element_n.get_column_id()));
+}
+
+void expect_nested_map_offset_only_ids(const FieldSchema& field, const std::set<uint64_t>& ids) {
+    const auto& attrs = child_by_name(field, "attrs");
+    const auto& key = child_by_name(attrs, "key");
+    const auto& value = child_by_name(attrs, "value");
+    const auto& value_n = child_by_name(value, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(attrs.get_column_id()));
+    EXPECT_TRUE(ids.contains(key.get_column_id()));
+    EXPECT_FALSE(ids.contains(value.get_column_id()));
+    EXPECT_FALSE(ids.contains(value_n.get_column_id()));
+}
+
+} // namespace
+
+TEST(NestedColumnAccessHelperTest, EmptyAccessPathsSelectFullFieldRange) {
+    const auto field = make_variant_field_for_access_path_test();
+    const std::set<uint64_t> expected {10, 11, 12, 13, 14, 15, 16};
+    EXPECT_EQ(collect_ids(field, {}), expected);
+}
+
+TEST(NestedColumnAccessHelperTest, NoRecognizedAccessPathsDoNotSelectFieldRange) {
+    const auto field = make_variant_field_for_access_path_test();
+    TColumnAccessPath ignored_path;
+    ignored_path.__set_type(static_cast<TAccessPathType::type>(0));
+
+    const std::set<uint64_t> expected;
+    EXPECT_EQ(collect_ids(field, {ignored_path}), expected);
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetPruningUnwrapsOptionalVariant) {
+    auto field = make_variant_field_with_typed_only_nested_shredded_object();
+    field.data_type = make_nullable(field.data_type);
+
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"nested", "x"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_x = child_by_name(nested_typed_value, "x");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_x.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetPruningUnwrapsOptionalArray) {
+    const auto field = make_optional_array_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"*", "n"}}, ids);
+
+    const auto& element = child_by_name(field, "element");
+    const auto& element_n = child_by_name(element, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(element.get_column_id()));
+    EXPECT_TRUE(ids.contains(element_n.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetArrayOffsetOnlyKeepsArrayContainer) {
+    const auto field = make_optional_array_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"OFFSET"}}, ids);
+
+    expect_array_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetNestedArrayOffsetOnlyKeepsArrayContainer) {
+    const auto field = make_struct_with_optional_array_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"items", "OFFSET"}}, ids);
+
+    expect_nested_array_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetPruningUnwrapsOptionalMap) {
+    const auto field = make_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"*", "n"}}, ids);
+
+    const auto& key = child_by_name(field, "key");
+    const auto& value = child_by_name(field, "value");
+    const auto& value_n = child_by_name(value, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(key.get_column_id()));
+    EXPECT_TRUE(ids.contains(value.get_column_id()));
+    EXPECT_TRUE(ids.contains(value_n.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetMapOffsetOnlyKeepsKeyReference) {
+    const auto field = make_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"OFFSET"}}, ids);
+
+    expect_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetMapNullOnlyKeepsKeyReference) {
+    const auto field = make_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"NULL"}}, ids);
+
+    expect_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetNestedMapOffsetOnlyKeepsKeyReference) {
+    const auto field = make_struct_with_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"attrs", "OFFSET"}}, ids);
+
+    expect_nested_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetNestedMapNullOnlyKeepsKeyReference) {
+    const auto field = make_struct_with_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(field, {{"attrs", "NULL"}}, ids);
+
+    expect_nested_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, HiveArrayOffsetOnlyKeepsArrayContainer) {
+    const auto field = make_optional_array_field_for_pruning();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"OFFSET"}}, ids);
+
+    expect_array_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, HiveNestedArrayOffsetOnlyKeepsArrayContainer) {
+    const auto field = make_struct_with_optional_array_field_for_pruning();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"items", "OFFSET"}}, ids);
+
+    expect_nested_array_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergArrayOffsetOnlyKeepsArrayContainer) {
+    const auto field = make_optional_array_field_for_pruning();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"OFFSET"}}, ids);
+
+    expect_array_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergNestedArrayOffsetOnlyKeepsArrayContainer) {
+    const auto field = make_struct_with_optional_array_field_for_pruning();
+    const auto& items = child_by_name(field, "items");
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(
+            field, {{std::to_string(items.field_id), "OFFSET"}}, ids);
+
+    expect_nested_array_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, HiveMapOffsetOnlyKeepsKeyReference) {
+    const auto field = make_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"OFFSET"}}, ids);
+
+    expect_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, HiveMapNullOnlyKeepsKeyReference) {
+    const auto field = make_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"NULL"}}, ids);
+
+    expect_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, HiveNestedMapOffsetOnlyKeepsKeyReference) {
+    const auto field = make_struct_with_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"attrs", "OFFSET"}}, ids);
+
+    expect_nested_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, HiveNestedMapNullOnlyKeepsKeyReference) {
+    const auto field = make_struct_with_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"attrs", "NULL"}}, ids);
+
+    expect_nested_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergMapOffsetOnlyKeepsKeyReference) {
+    const auto field = make_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"OFFSET"}}, ids);
+
+    expect_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergMapNullOnlyKeepsKeyReference) {
+    const auto field = make_optional_map_field_for_pruning();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"NULL"}}, ids);
+
+    expect_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergNestedMapOffsetOnlyKeepsKeyReference) {
+    const auto field = make_struct_with_optional_map_field_for_pruning();
+    const auto& attrs = child_by_name(field, "attrs");
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(
+            field, {{std::to_string(attrs.field_id), "OFFSET"}}, ids);
+
+    expect_nested_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergNestedMapNullOnlyKeepsKeyReference) {
+    const auto field = make_struct_with_optional_map_field_for_pruning();
+    const auto& attrs = child_by_name(field, "attrs");
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(
+            field, {{std::to_string(attrs.field_id), "NULL"}}, ids);
+
+    expect_nested_map_offset_only_ids(field, ids);
+}
+
+TEST(NestedColumnAccessHelperTest, HiveVariantPruningKeepsNestedStructuralNameUserKey) {
+    const auto field = make_variant_field_with_nested_structural_name_keys();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"nested", "typed_value"}},
+                                                            ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_value = child_by_name(nested, "value");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested_value.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergVariantPruningKeepsNestedStructuralNameUserKey) {
+    const auto field = make_variant_field_with_nested_structural_name_keys();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"nested", "typed_value"}},
+                                                               ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_value = child_by_name(nested, "value");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested_value.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, HiveVariantPruningKeepsAnnotatedValueUserField) {
+    const auto field = make_variant_field_with_annotated_value_user_field();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"obj", "value"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& object = child_by_name(top_typed_value, "obj");
+    const auto& object_value = child_by_name(object, "value");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(object.get_column_id()));
+    EXPECT_TRUE(ids.contains(object_value.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergVariantPruningKeepsAnnotatedValueUserField) {
+    const auto field = make_variant_field_with_annotated_value_user_field();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"obj", "value"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& object = child_by_name(top_typed_value, "obj");
+    const auto& object_value = child_by_name(object, "value");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(object.get_column_id()));
+    EXPECT_TRUE(ids.contains(object_value.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, HiveVariantPruningSkipsTypedOnlyNestedMissingKey) {
+    const auto field = make_variant_field_with_typed_only_nested_shredded_object();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"nested", "missing"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_x = child_by_name(nested_typed_value, "x");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(metadata.get_column_id()));
+    EXPECT_FALSE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested_x.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergVariantPruningSkipsTypedOnlyNestedMissingKey) {
+    const auto field = make_variant_field_with_typed_only_nested_shredded_object();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"nested", "missing"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_x = child_by_name(nested_typed_value, "x");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(metadata.get_column_id()));
+    EXPECT_FALSE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_FALSE(ids.contains(nested_x.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, HiveVariantPruningStripsTerminalMetaSuffix) {
+    const auto field = make_variant_field_with_typed_only_nested_shredded_object();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(
+            field, {{"nested", "x", "NULL"}, {"nested", "x", "OFFSET"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_x = child_by_name(nested_typed_value, "x");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_x.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, GenericParquetVariantPruningStripsTerminalMetaSuffix) {
+    const auto field = make_variant_field_with_typed_only_nested_shredded_object();
+    std::set<uint64_t> ids;
+    ParquetNestedColumnUtils::extract_nested_column_ids_by_name(
+            field, {{"nested", "x", "NULL"}, {"nested", "x", "OFFSET"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_x = child_by_name(nested_typed_value, "x");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_x.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergVariantPruningStripsTerminalMetaSuffix) {
+    const auto field = make_variant_field_with_typed_only_nested_shredded_object();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(
+            field, {{"nested", "x", "NULL"}, {"nested", "x", "OFFSET"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& nested = child_by_name(top_typed_value, "nested");
+    const auto& nested_typed_value = child_by_name(nested, "typed_value");
+    const auto& nested_x = child_by_name(nested_typed_value, "x");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(nested_x.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, HiveVariantPruningSelectsValueOnlyResidualField) {
+    const auto field = make_variant_field_with_value_only_residual_field();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"metric", "x"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_value = child_by_name(field, "value");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& metric = child_by_name(top_typed_value, "metric");
+    const auto& metric_value = child_by_name(metric, "value");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(metadata.get_column_id()));
+    EXPECT_FALSE(ids.contains(top_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(metric.get_column_id()));
+    EXPECT_TRUE(ids.contains(metric_value.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, HiveVariantPruningKeepsResidualForTypedNestedField) {
+    const auto field = make_variant_field_with_partially_shredded_metric();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"metric", "x"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_value = child_by_name(field, "value");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& metric = child_by_name(top_typed_value, "metric");
+    const auto& metric_value = child_by_name(metric, "value");
+    const auto& metric_typed_value = child_by_name(metric, "typed_value");
+    const auto& metric_x = child_by_name(metric_typed_value, "x");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_TRUE(ids.contains(metadata.get_column_id()));
+    EXPECT_FALSE(ids.contains(top_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(metric.get_column_id()));
+    EXPECT_TRUE(ids.contains(metric_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(metric_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(metric_x.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, HiveVariantPruningMapsArraySubscriptToTypedElement) {
+    const auto field = make_variant_field_with_typed_only_array_field();
+    std::set<uint64_t> ids;
+    HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"items", "1", "n"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& items = child_by_name(top_typed_value, "items");
+    const auto& items_typed_value = child_by_name(items, "typed_value");
+    const auto& element = child_by_name(items_typed_value, "element");
+    const auto& element_n = child_by_name(element, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
+    EXPECT_TRUE(ids.contains(top_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(items.get_column_id()));
+    EXPECT_TRUE(ids.contains(items_typed_value.get_column_id()));
+    EXPECT_TRUE(ids.contains(element.get_column_id()));
+    EXPECT_TRUE(ids.contains(element_n.get_column_id()));
+}
+
+TEST(NestedColumnAccessHelperTest, IcebergVariantPruningMapsArraySubscriptToTypedElement) {
+    const auto field = make_variant_field_with_typed_only_array_field();
+    std::set<uint64_t> ids;
+    IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"items", "1", "n"}}, ids);
+
+    const auto& metadata = child_by_name(field, "metadata");
+    const auto& top_typed_value = child_by_name(field, "typed_value");
+    const auto& items = child_by_name(top_typed_value, "items");
+    const auto& items_typed_value = child_by_name(items, "typed_value");
+    const auto& element = child_by_name(items_typed_value, "element");
+    const auto& element_n = child_by_name(element, "n");
+    EXPECT_TRUE(ids.contains(field.get_column_id()));
+    EXPECT_FALSE(ids.contains(metadata.get_column_id()));
EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(items.get_column_id())); + EXPECT_TRUE(ids.contains(items_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(element.get_column_id())); + EXPECT_TRUE(ids.contains(element_n.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, HiveVariantPruningMapsArrayElementPathToTypedElement) { + const auto field = make_variant_field_with_typed_only_array_field(); + std::set ids; + HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"items", "n"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& items = child_by_name(top_typed_value, "items"); + const auto& items_typed_value = child_by_name(items, "typed_value"); + const auto& element = child_by_name(items_typed_value, "element"); + const auto& element_n = child_by_name(element, "n"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_FALSE(ids.contains(metadata.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(items.get_column_id())); + EXPECT_TRUE(ids.contains(items_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(element.get_column_id())); + EXPECT_TRUE(ids.contains(element_n.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, IcebergVariantPruningMapsArrayElementPathToTypedElement) { + const auto field = make_variant_field_with_typed_only_array_field(); + std::set ids; + IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"items", "n"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& items = child_by_name(top_typed_value, "items"); + const auto& items_typed_value = child_by_name(items, "typed_value"); + const auto& element = child_by_name(items_typed_value, "element"); + const auto& element_n = 
child_by_name(element, "n"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_FALSE(ids.contains(metadata.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(items.get_column_id())); + EXPECT_TRUE(ids.contains(items_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(element.get_column_id())); + EXPECT_TRUE(ids.contains(element_n.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, HiveVariantPruningMapsRootArraySubscriptToTypedElement) { + const auto field = make_variant_field_with_root_typed_only_array(); + std::set ids; + HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"1", "n"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& element = child_by_name(top_typed_value, "element"); + const auto& element_n = child_by_name(element, "n"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_FALSE(ids.contains(metadata.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(element.get_column_id())); + EXPECT_TRUE(ids.contains(element_n.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, IcebergVariantPruningMapsRootArraySubscriptToTypedElement) { + const auto field = make_variant_field_with_root_typed_only_array(); + std::set ids; + IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"1", "n"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& element = child_by_name(top_typed_value, "element"); + const auto& element_n = child_by_name(element, "n"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_FALSE(ids.contains(metadata.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(element.get_column_id())); + 
EXPECT_TRUE(ids.contains(element_n.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, HiveVariantPruningMapsTypedMapKeyToValueSubtree) { + const auto field = make_variant_field_with_typed_only_map_field(); + std::set ids; + HiveParquetNestedColumnUtils::extract_nested_column_ids(field, {{"attrs", "k", "n"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& attrs = child_by_name(top_typed_value, "attrs"); + const auto& key = child_by_name(attrs, "key"); + const auto& value = child_by_name(attrs, "value"); + const auto& value_n = child_by_name(value, "n"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_FALSE(ids.contains(metadata.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(attrs.get_column_id())); + EXPECT_TRUE(ids.contains(key.get_column_id())); + EXPECT_TRUE(ids.contains(value.get_column_id())); + EXPECT_TRUE(ids.contains(value_n.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, IcebergVariantPruningMapsTypedMapKeyToValueSubtree) { + const auto field = make_variant_field_with_typed_only_map_field(); + std::set ids; + IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"attrs", "k", "n"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& attrs = child_by_name(top_typed_value, "attrs"); + const auto& key = child_by_name(attrs, "key"); + const auto& value = child_by_name(attrs, "value"); + const auto& value_n = child_by_name(value, "n"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_FALSE(ids.contains(metadata.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(attrs.get_column_id())); + EXPECT_TRUE(ids.contains(key.get_column_id())); + EXPECT_TRUE(ids.contains(value.get_column_id())); + 
EXPECT_TRUE(ids.contains(value_n.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, IcebergVariantPruningSelectsValueOnlyResidualField) { + const auto field = make_variant_field_with_value_only_residual_field(); + std::set ids; + IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"metric", "x"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_value = child_by_name(field, "value"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& metric = child_by_name(top_typed_value, "metric"); + const auto& metric_value = child_by_name(metric, "value"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_TRUE(ids.contains(metadata.get_column_id())); + EXPECT_FALSE(ids.contains(top_value.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(metric.get_column_id())); + EXPECT_TRUE(ids.contains(metric_value.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, IcebergVariantPruningKeepsResidualForTypedNestedField) { + const auto field = make_variant_field_with_partially_shredded_metric(); + std::set ids; + IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"metric", "x"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_value = child_by_name(field, "value"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& metric = child_by_name(top_typed_value, "metric"); + const auto& metric_value = child_by_name(metric, "value"); + const auto& metric_typed_value = child_by_name(metric, "typed_value"); + const auto& metric_x = child_by_name(metric_typed_value, "x"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_TRUE(ids.contains(metadata.get_column_id())); + EXPECT_FALSE(ids.contains(top_value.get_column_id())); + EXPECT_TRUE(ids.contains(top_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(metric.get_column_id())); + 
EXPECT_TRUE(ids.contains(metric_value.get_column_id())); + EXPECT_TRUE(ids.contains(metric_typed_value.get_column_id())); + EXPECT_TRUE(ids.contains(metric_x.get_column_id())); +} + +TEST(NestedColumnAccessHelperTest, IcebergVariantPruningTreatsNumericKeyAsNameNotFieldId) { + const auto field = make_variant_field_with_numeric_key_field_id_collision(); + std::set ids; + IcebergParquetNestedColumnUtils::extract_nested_column_ids(field, {{"20"}}, ids); + + const auto& metadata = child_by_name(field, "metadata"); + const auto& top_value = child_by_name(field, "value"); + const auto& top_typed_value = child_by_name(field, "typed_value"); + const auto& metric = child_by_name(top_typed_value, "metric"); + EXPECT_TRUE(ids.contains(field.get_column_id())); + EXPECT_TRUE(ids.contains(metadata.get_column_id())); + EXPECT_TRUE(ids.contains(top_value.get_column_id())); + EXPECT_FALSE(ids.contains(top_typed_value.get_column_id())); + EXPECT_FALSE(ids.contains(metric.get_column_id())); +} + +} // namespace doris diff --git a/fe/fe-connector/fe-connector-iceberg/src/main/java/org/apache/doris/connector/iceberg/IcebergTypeMapping.java b/fe/fe-connector/fe-connector-iceberg/src/main/java/org/apache/doris/connector/iceberg/IcebergTypeMapping.java index 9539e2547d4a01..e894d29f9070e7 100644 --- a/fe/fe-connector/fe-connector-iceberg/src/main/java/org/apache/doris/connector/iceberg/IcebergTypeMapping.java +++ b/fe/fe-connector/fe-connector-iceberg/src/main/java/org/apache/doris/connector/iceberg/IcebergTypeMapping.java @@ -52,6 +52,8 @@ public static ConnectorType fromIcebergType(Type icebergType, enableMappingVarbinary, enableMappingTimestampTz); } switch (icebergType.typeId()) { + case VARIANT: + return ConnectorType.of("VARIANT"); case LIST: Types.ListType list = (Types.ListType) icebergType; ConnectorType elemType = fromIcebergType( @@ -118,6 +120,8 @@ private static ConnectorType fromPrimitive(Type.PrimitiveType primitive, return ConnectorType.of("DATETIMEV2", 
ICEBERG_DATETIME_SCALE_MS, 0); case TIME: return ConnectorType.of("UNSUPPORTED"); + case VARIANT: + return ConnectorType.of("VARIANT"); default: return ConnectorType.of("UNSUPPORTED"); } diff --git a/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java b/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java index dd600b13725aeb..4b44bee8bcdec2 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java +++ b/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/IcebergUtils.java @@ -606,6 +606,8 @@ private static Type icebergPrimitiveTypeToDorisType(org.apache.iceberg.types.Typ return ScalarType.createDatetimeV2Type(ICEBERG_DATETIME_SCALE_MS); case TIME: return Type.UNSUPPORTED; + case VARIANT: + return Type.VARIANT; default: throw new IllegalArgumentException("Cannot transform unknown type: " + primitive); } @@ -618,6 +620,8 @@ public static Type icebergTypeToDorisType(org.apache.iceberg.types.Type type, bo enableMappingVarbinary, enableMappingTimestampTz); } switch (type.typeId()) { + case VARIANT: + return Type.VARIANT; case LIST: Types.ListType list = (Types.ListType) type; return ArrayType.create( @@ -1680,6 +1684,52 @@ public static Schema appendRowLineageFieldsForV3(Schema schema) { MetadataColumns.ROW_ID, MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER)); } + public static void validateVariantWriteUnsupported(Schema schema) throws AnalysisException { + Optional<String> variantPath = findVariantFieldPath(schema); + if (variantPath.isPresent()) { + throw new AnalysisException("Writing Iceberg VARIANT columns is not supported: " + + variantPath.get()); + } + } + + private static Optional<String> findVariantFieldPath(Schema schema) { + for (NestedField field : schema.columns()) { + Optional<String> variantPath = findVariantFieldPath(field.type(), field.name()); + if (variantPath.isPresent()) { + return variantPath; + } + } + return Optional.empty(); + } + + private static Optional<String> 
findVariantFieldPath( + org.apache.iceberg.types.Type type, String path) { + switch (type.typeId()) { + case VARIANT: + return Optional.of(path); + case STRUCT: + for (NestedField field : type.asStructType().fields()) { + Optional<String> variantPath = + findVariantFieldPath(field.type(), path + "." + field.name()); + if (variantPath.isPresent()) { + return variantPath; + } + } + return Optional.empty(); + case LIST: + return findVariantFieldPath(type.asListType().elementType(), path + "[]"); + case MAP: + Optional<String> keyVariantPath = + findVariantFieldPath(type.asMapType().keyType(), path + ".key"); + if (keyVariantPath.isPresent()) { + return keyVariantPath; + } + return findVariantFieldPath(type.asMapType().valueType(), path + ".value"); + default: + return Optional.empty(); + } + } + public static int getFormatVersion(Table table) { int formatVersion = 2; // default format version : 2 if (table instanceof BaseTable) { diff --git a/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java b/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java index adc2507e2490a3..5f380a7e4c0075 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java +++ b/fe/fe-core/src/main/java/org/apache/doris/datasource/iceberg/source/IcebergScanNode.java @@ -22,9 +22,14 @@ import org.apache.doris.analysis.TableScanParams; import org.apache.doris.analysis.TableSnapshot; import org.apache.doris.analysis.TupleDescriptor; +import org.apache.doris.catalog.ArrayType; import org.apache.doris.catalog.Column; import org.apache.doris.catalog.Env; +import org.apache.doris.catalog.MapType; +import org.apache.doris.catalog.StructField; +import org.apache.doris.catalog.StructType; import org.apache.doris.catalog.TableIf; +import org.apache.doris.catalog.Type; import org.apache.doris.common.DdlException; import org.apache.doris.common.UserException; import org.apache.doris.common.profile.SummaryProfile; 
@@ -817,6 +822,7 @@ private LocationPath createLocationPathWithCache(String path) { private Split createIcebergSplit(FileScanTask fileScanTask) { DataFile dataFile = fileScanTask.file(); String originalPath = dataFile.path().toString(); + validateVariantDataFileFormat(dataFile.format(), originalPath); LocationPath locationPath = createLocationPathWithCache(originalPath); IcebergSplit split = new IcebergSplit( locationPath, @@ -1058,6 +1064,7 @@ public TFileFormatType getFileFormatType() throws UserException { if (icebergFormat.equalsIgnoreCase("parquet")) { type = TFileFormatType.FORMAT_PARQUET; } else if (icebergFormat.equalsIgnoreCase("orc")) { + validateVariantReadSupported(icebergFormat); type = TFileFormatType.FORMAT_ORC; } else { throw new DdlException(String.format("Unsupported format name: %s for iceberg table.", icebergFormat)); @@ -1065,6 +1072,58 @@ public TFileFormatType getFileFormatType() throws UserException { return type; } + private void validateVariantReadSupported(String icebergFormat) throws DdlException { + String variantColumnName = findVariantReadColumnName(); + if (variantColumnName != null) { + throw new DdlException("Reading Iceberg VARIANT columns is only supported for Parquet files, " + + "but table file format is " + icebergFormat + ": " + variantColumnName); + } + } + + @VisibleForTesting + void validateVariantDataFileFormat(FileFormat dataFileFormat, String path) { + if (dataFileFormat == FileFormat.PARQUET) { + return; + } + String variantColumnName = findVariantReadColumnName(); + if (variantColumnName != null) { + throw new NotSupportedException("Reading Iceberg VARIANT columns is only supported for Parquet files, " + + "but data file format is " + dataFileFormat.name() + ": " + variantColumnName + + " (" + path + ")"); + } + } + + private String findVariantReadColumnName() { + for (SlotDescriptor slot : desc.getSlots()) { + Column column = slot.getColumn(); + if (containsVariantType(column.getType())) { + return column.getName(); 
+ } + return null; + } + + private static boolean containsVariantType(Type type) { + if (type.isVariantType()) { + return true; + } + if (type.isArrayType()) { + return containsVariantType(((ArrayType) type).getItemType()); + } + if (type.isMapType()) { + MapType mapType = (MapType) type; + return containsVariantType(mapType.getKeyType()) || containsVariantType(mapType.getValueType()); + } + if (type.isStructType()) { + for (StructField field : ((StructType) type).getFields()) { + if (containsVariantType(field.getType())) { + return true; + } + } + } + return false; + } + @Override public List<String> getPathPartitionKeys() throws UserException { // return icebergTable.spec().fields().stream().map(PartitionField::name).map(String::toLowerCase) diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java index b2ca6cfa2b622a..07d5859893c532 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java +++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/glue/translator/PhysicalPlanTranslator.java @@ -637,7 +638,8 @@ public PlanFragment visitPhysicalIcebergMergeSink(PhysicalIcebergMergeSink builderPath = context.accessPathBuilder.getPathList(); + if (dataType instanceof VariantType && builderPath.size() == 1 + && AccessPathInfo.ACCESS_NULL.equals(builderPath.get(0))) { + recordVariantRootAccessPath(slotReference, context); + return null; + } if (dataType instanceof VariantType && (slotReference.hasSubColPath() || !context.accessPathBuilder.isEmpty())) { List<String> path = new ArrayList<>(); @@ -122,7 +153,6 @@ public Void visitSlotReference(SlotReference slotReference, CollectorContext con } // Strip NULL suffix for variant sub-column access — null-flag-only optimization // does not apply to variant sub-column data layout. 
- List<String> builderPath = context.accessPathBuilder.getPathList(); if (builderPath.size() > 1 && AccessPathInfo.ACCESS_NULL.equals(builderPath.get(builderPath.size() - 1))) { builderPath = new ArrayList<>(builderPath.subList(0, builderPath.size() - 1)); @@ -133,6 +163,10 @@ public Void visitSlotReference(SlotReference slotReference, CollectorContext con path, context.bottomFilter, ColumnAccessPathType.DATA)); return null; } + if (dataType instanceof VariantType && context.collectVariantRoot) { + recordVariantRootAccessPath(slotReference, context); + return null; + } if (dataType instanceof NestedColumnPrunable) { context.accessPathBuilder.addPrefix(slotReference.getName().toLowerCase()); ImmutableList<String> path = Utils.fastToImmutableList(context.accessPathBuilder.accessPath); @@ -265,39 +299,94 @@ public Void visitAlias(Alias alias, CollectorContext context) { @Override public Void visitCast(Cast cast, CollectorContext context) { + Expression child = cast.child(0); + if (child.getDataType() instanceof VariantType && context.accessPathBuilder.isEmpty()) { + if (isVariantLiteralPathAccess(child)) { + return continueCollectAccessPath(child, context); + } + CollectorContext variantRootContext = context.copy(); + variantRootContext.setCollectVariantRoot(true); + return continueCollectAccessPath(child, variantRootContext); + } + if (child.getDataType() instanceof VariantType && !context.accessPathBuilder.isEmpty() + && (cast.getDataType() instanceof VariantType + || cast.getDataType() instanceof NestedColumnPrunable)) { + return continueCollectAccessPath(child, context); + } if (!context.accessPathBuilder.isEmpty() && cast.getDataType() instanceof NestedColumnPrunable - && cast.child().getDataType() instanceof NestedColumnPrunable - && !mapTypeIsChanged(cast.child().getDataType(), cast.getDataType(), false)) { + && child.getDataType() instanceof NestedColumnPrunable + && !mapTypeIsChanged(child.getDataType(), cast.getDataType(), false)) { DataTypeAccessTree castTree = 
DataTypeAccessTree.of( cast.getDataType(), ColumnAccessPathType.DATA); DataTypeAccessTree originTree = DataTypeAccessTree.of( - cast.child().getDataType(), ColumnAccessPathType.DATA); + child.getDataType(), ColumnAccessPathType.DATA); List<String> replacePath = new ArrayList<>(context.accessPathBuilder.getPathList()); if (originTree.replacePathByAnotherTree(castTree, replacePath, 0)) { CollectorContext castContext = new CollectorContext(context.statementContext, context.bottomFilter); castContext.accessPathBuilder.accessPath.addAll(replacePath); - return continueCollectAccessPath(cast.child(), castContext); + return continueCollectAccessPath(child, castContext); } } - return cast.child(0).accept(this, + return child.accept(this, new CollectorContext(context.statementContext, context.bottomFilter) ); } + @Override + public Void visitGetVariantType(GetVariantType getVariantType, CollectorContext context) { + Expression child = getVariantType.child(0); + if (child.getDataType() instanceof VariantType && context.accessPathBuilder.isEmpty()) { + CollectorContext variantRootContext = context.copy(); + variantRootContext.setCollectVariantRoot(true); + return continueCollectAccessPath(child, variantRootContext); + } + return visit(getVariantType, context); + } + + @Override + public Void visitComparisonPredicate(ComparisonPredicate comparisonPredicate, CollectorContext context) { + if (context.collectVariantRoot) { + return visit(comparisonPredicate, context); + } + for (Expression child : comparisonPredicate.children()) { + CollectorContext childContext = + new CollectorContext(context.statementContext, context.bottomFilter); + if (child.getDataType() instanceof VariantType && context.accessPathBuilder.isEmpty() + && !isVariantLiteralPathAccess(child)) { + childContext.setCollectVariantRoot(true); + } + child.accept(this, childContext); + } + return null; + } + + private boolean isVariantLiteralPathAccess(Expression expression) { + if (expression instanceof SlotReference) { + 
return ((SlotReference) expression).hasSubColPath(); + } + if (!(expression instanceof ElementAt)) { + return false; + } + ElementAt elementAt = (ElementAt) expression; + return elementAt.child(0).getDataType().isVariantType() && elementAt.child(1).isLiteral(); + } + // array element at @Override public Void visitElementAt(ElementAt elementAt, CollectorContext context) { List<Expression> arguments = elementAt.getArguments(); Expression first = arguments.get(0); if (first.getDataType().isArrayType() || first.getDataType().isMapType()) { - context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_ALL); - continueCollectAccessPath(first, context); + CollectorContext valueContext = context.copy(); + valueContext.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_ALL); + continueCollectAccessPath(first, valueContext); for (int i = 1; i < arguments.size(); i++) { - visit(arguments.get(i), context); + arguments.get(i).accept(this, + new CollectorContext(context.statementContext, context.bottomFilter)); } return null; } else if (first.getDataType().isVariantType() && arguments.size() >= 2 @@ -313,6 +402,18 @@ public Void visitElementAt(ElementAt elementAt, CollectorContext context) { return continueCollectAccessPath(first, context); } return visit(elementAt, context); + } else if (first.getDataType().isVariantType() && arguments.size() >= 2) { + // Dynamic keys can hit any VARIANT field. Drop any outer literal suffix, e.g. + // v[cast(id AS string)]['x'], and require the first argument's root instead. 
+ CollectorContext variantRootContext = + new CollectorContext(context.statementContext, context.bottomFilter); + variantRootContext.setCollectVariantRoot(true); + continueCollectAccessPath(first, variantRootContext); + for (int i = 1; i < arguments.size(); i++) { + arguments.get(i).accept(this, + new CollectorContext(context.statementContext, context.bottomFilter)); + } + return null; } else { return visit(elementAt, context); } @@ -346,6 +447,55 @@ public Void visitStructElement(StructElement structElement, CollectorContext con return null; } + @Override + public Void visitCreateNamedStruct(CreateNamedStruct createNamedStruct, CollectorContext context) { + List<String> path = context.accessPathBuilder.getPathList(); + if (!path.isEmpty()) { + String fieldName = path.get(0); + for (int i = 0; i + 1 < createNamedStruct.arity(); i += 2) { + Expression fieldNameExpr = createNamedStruct.child(i); + if (fieldNameExpr.isLiteral() && fieldNameExpr.getDataType().isStringLikeType() + && fieldName.equalsIgnoreCase(((Literal) fieldNameExpr).getStringValue())) { + return collectConstructedStructField(createNamedStruct.child(i + 1), context); + } + } + } + return context.accessPathBuilder.isEmpty() + ? visit(createNamedStruct, context) + : collectChildrenWithoutAccessPath(createNamedStruct, context); + } + + @Override + public Void visitCreateStruct(CreateStruct createStruct, CollectorContext context) { + List<String> path = context.accessPathBuilder.getPathList(); + if (!path.isEmpty()) { + String fieldName = path.get(0); + for (int i = 0; i < createStruct.arity(); i++) { + if (fieldName.equalsIgnoreCase(StructLiteral.COL_PREFIX + (i + 1))) { + return collectConstructedStructField(createStruct.child(i), context); + } + } + } + return context.accessPathBuilder.isEmpty() + ? 
visit(createStruct, context) + : collectChildrenWithoutAccessPath(createStruct, context); + } + + private Void collectConstructedStructField(Expression fieldValue, CollectorContext context) { + List<String> path = context.accessPathBuilder.getPathList(); + CollectorContext fieldContext = new CollectorContext(context.statementContext, context.bottomFilter); + fieldContext.setType(context.type); + fieldContext.getAccessPathBuilder().addSuffix(path.subList(1, path.size())); + return continueCollectAccessPath(fieldValue, fieldContext); + } + + private Void collectChildrenWithoutAccessPath(Expression expression, CollectorContext context) { + for (Expression child : expression.children()) { + child.accept(this, new CollectorContext(context.statementContext, context.bottomFilter)); + } + return null; + } + @Override public Void visitMapKeys(MapKeys mapKeys, CollectorContext context) { LinkedList<String> suffixPath = context.accessPathBuilder.accessPath; @@ -388,22 +538,39 @@ private static boolean isFunctionNullCheckPath(List<String> suffixPath) { return suffixPath.size() == 1 && AccessPathInfo.ACCESS_NULL.equals(suffixPath.get(0)); } + private void collectArgumentsAfterFirst( + List<Expression> arguments, CollectorContext context) { + for (int i = 1; i < arguments.size(); i++) { + arguments.get(i).accept(this, + new CollectorContext(context.statementContext, context.bottomFilter)); + } + } + @Override public Void visitMapContainsKey(MapContainsKey mapContainsKey, CollectorContext context) { - context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_KEYS); - return continueCollectAccessPath(mapContainsKey.getArgument(0), context); + CollectorContext keyContext = context.copy(); + keyContext.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_KEYS); + continueCollectAccessPath(mapContainsKey.getArgument(0), keyContext); + collectArgumentsAfterFirst(mapContainsKey.getArguments(), context); + return null; } @Override public Void visitMapContainsValue(MapContainsValue mapContainsValue, CollectorContext 
context) { - context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_VALUES); - return continueCollectAccessPath(mapContainsValue.getArgument(0), context); + CollectorContext valueContext = context.copy(); + valueContext.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_MAP_VALUES); + continueCollectAccessPath(mapContainsValue.getArgument(0), valueContext); + collectArgumentsAfterFirst(mapContainsValue.getArguments(), context); + return null; } @Override public Void visitMapContainsEntry(MapContainsEntry mapContainsEntry, CollectorContext context) { - context.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_ALL); - return continueCollectAccessPath(mapContainsEntry.getArgument(0), context); + CollectorContext entryContext = context.copy(); + entryContext.accessPathBuilder.addPrefix(AccessPathInfo.ACCESS_ALL); + continueCollectAccessPath(mapContainsEntry.getArgument(0), entryContext); + collectArgumentsAfterFirst(mapContainsEntry.getArguments(), context); + return null; } @Override @@ -619,12 +786,14 @@ public static class CollectorContext { private AccessPathBuilder accessPathBuilder; private boolean bottomFilter; private ColumnAccessPathType type; + private boolean collectVariantRoot; public CollectorContext(StatementContext statementContext, boolean bottomFilter) { this.statementContext = statementContext; this.accessPathBuilder = new AccessPathBuilder(); this.bottomFilter = bottomFilter; this.type = ColumnAccessPathType.DATA; + this.collectVariantRoot = false; } public ColumnAccessPathType getType() { @@ -638,6 +807,18 @@ public void setType(ColumnAccessPathType type) { public AccessPathBuilder getAccessPathBuilder() { return accessPathBuilder; } + + public void setCollectVariantRoot(boolean collectVariantRoot) { + this.collectVariantRoot = collectVariantRoot; + } + + public CollectorContext copy() { + CollectorContext context = new CollectorContext(statementContext, bottomFilter); + 
context.accessPathBuilder.accessPath.addAll(accessPathBuilder.accessPath); + context.type = type; + context.collectVariantRoot = collectVariantRoot; + return context; + } } /** AccessPathBuilder */ diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathPlanCollector.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathPlanCollector.java index d7ff817f12c6d8..c7ff996f3ddc1f 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathPlanCollector.java +++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/AccessPathPlanCollector.java @@ -25,11 +25,13 @@ import org.apache.doris.nereids.trees.expressions.Expression; import org.apache.doris.nereids.trees.expressions.NamedExpression; import org.apache.doris.nereids.trees.expressions.Slot; +import org.apache.doris.nereids.trees.expressions.SlotReference; import org.apache.doris.nereids.trees.expressions.functions.Function; import org.apache.doris.nereids.trees.expressions.functions.generator.Explode; import org.apache.doris.nereids.trees.expressions.functions.generator.ExplodeMap; import org.apache.doris.nereids.trees.expressions.functions.generator.ExplodeMapOuter; import org.apache.doris.nereids.trees.expressions.functions.generator.ExplodeOuter; +import org.apache.doris.nereids.trees.expressions.functions.generator.ExplodeVariantArray; import org.apache.doris.nereids.trees.expressions.functions.generator.PosExplode; import org.apache.doris.nereids.trees.expressions.functions.generator.PosExplodeOuter; import org.apache.doris.nereids.trees.expressions.literal.StructLiteral; @@ -64,6 +66,7 @@ public class AccessPathPlanCollector extends DefaultPlanVisitor { private Multimap allSlotToAccessPaths = LinkedHashMultimap.create(); private Map> scanSlotToAccessPaths = new LinkedHashMap<>(); + private boolean collectWholeVariantOutputEnabled = true; public Map> collect(Plan root, StatementContext context) { root.accept(this, 
context); @@ -89,11 +92,12 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State Function function = generators.get(i); Collection accessPaths = allSlotToAccessPaths.get( generatorOutput.getExprId().asInt()); - if (function instanceof Explode || function instanceof ExplodeOuter) { + if (function instanceof Explode || function instanceof ExplodeOuter + || function instanceof ExplodeVariantArray) { if (accessPaths.isEmpty()) { // use the whole column for (Expression child : function.children()) { - exprCollector.collect(child); + exprCollector.collectWholeVariantExpression(child); } } else { for (CollectAccessPathResult accessPath : accessPaths) { @@ -105,6 +109,7 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State if (function.child(0).getDataType().isVariantType()) { argumentContext.getAccessPathBuilder() .addSuffix(path.subList(1, path.size())); + argumentContext.setCollectVariantRoot(path.size() == 1); } else { argumentContext.getAccessPathBuilder() .addSuffix(AccessPathInfo.ACCESS_ALL) @@ -122,6 +127,7 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State if (function.child(colIndex).getDataType().isVariantType()) { argumentContext.getAccessPathBuilder() .addSuffix(path.subList(2, path.size())); + argumentContext.setCollectVariantRoot(path.size() == 2); } else { argumentContext.getAccessPathBuilder() .addSuffix(AccessPathInfo.ACCESS_ALL) @@ -132,7 +138,7 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State } // use the whole column for (Expression child : function.children()) { - exprCollector.collect(child); + exprCollector.collectWholeVariantExpression(child); } } } @@ -140,7 +146,7 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State if (accessPaths.isEmpty()) { // use the whole column for (Expression child : function.children()) { - exprCollector.collect(child); + exprCollector.collectWholeVariantExpression(child); } } else { for (CollectAccessPathResult accessPath : accessPaths) { @@ 
-171,14 +177,14 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State } } // use the whole column - exprCollector.collect(function.child(0)); + exprCollector.collectWholeVariantExpression(function.child(0)); } } } else if (function instanceof PosExplode || function instanceof PosExplodeOuter) { if (accessPaths.isEmpty()) { // use the whole column for (Expression child : function.children()) { - exprCollector.collect(child); + exprCollector.collectWholeVariantExpression(child); } } else { boolean useWholeItem = false; @@ -209,7 +215,7 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State if (useWholeItem) { // use the whole column for (Expression child : function.children()) { - exprCollector.collect(child); + exprCollector.collectWholeVariantExpression(child); } } else { for (int j = 0; j < function.arity(); j++) { @@ -223,7 +229,13 @@ public Void visitLogicalGenerate(LogicalGenerate generate, State exprCollector.collect(function); } } - return generate.child().accept(this, context); + boolean previousCollectWholeVariantOutputEnabled = collectWholeVariantOutputEnabled; + collectWholeVariantOutputEnabled = false; + try { + return generate.child().accept(this, context); + } finally { + collectWholeVariantOutputEnabled = previousCollectWholeVariantOutputEnabled; + } } @Override @@ -231,34 +243,52 @@ public Void visitLogicalProject(LogicalProject project, Statemen AccessPathExpressionCollector exprCollector = new AccessPathExpressionCollector(context, allSlotToAccessPaths, false); for (NamedExpression output : project.getProjects()) { + Collection outputAccessPaths = + allSlotToAccessPaths.get(output.getExprId().asInt()); // e.g. 
select struct_element(s, 'city') from (select s from tbl)a; // we will not treat the inner `s` access all path - if (output instanceof Slot && allSlotToAccessPaths.containsKey(output.getExprId().asInt())) { + if (collectWholeVariantOutputEnabled && output instanceof Slot + && collectWholeVariantOutput((Slot) output)) { continue; - } else if (output instanceof Alias && output.child(0) instanceof Slot - && allSlotToAccessPaths.containsKey(output.getExprId().asInt())) { - Slot innerSlot = (Slot) output.child(0); - Collection outerSlotAccessPaths = allSlotToAccessPaths.get( - output.getExprId().asInt()); - for (CollectAccessPathResult outerSlotAccessPath : outerSlotAccessPaths) { - List outerPath = outerSlotAccessPath.getPath(); - List replaceSlotNamePath = new ArrayList<>(); - replaceSlotNamePath.add(innerSlot.getName()); - replaceSlotNamePath.addAll(outerPath.subList(1, outerPath.size())); - allSlotToAccessPaths.put( - innerSlot.getExprId().asInt(), - new CollectAccessPathResult( - replaceSlotNamePath, - outerSlotAccessPath.isPredicate(), - outerSlotAccessPath.getType() - ) - ); - } + } else if (output instanceof Slot && !outputAccessPaths.isEmpty()) { + continue; + } else if (output instanceof Alias && !outputAccessPaths.isEmpty()) { + collectAliasAccessPaths((Alias) output, outputAccessPaths, exprCollector, context); + } else if (collectWholeVariantOutputEnabled && output instanceof Alias && output.child(0) instanceof Slot + && collectWholeVariantOutput((Slot) output.child(0))) { + continue; + } else if (collectWholeVariantOutputEnabled && output.getDataType().isVariantType()) { + exprCollector.collectWholeVariantExpression(output); } else { exprCollector.collect(output); } } - return project.child().accept(this, context); + boolean previousCollectWholeVariantOutputEnabled = collectWholeVariantOutputEnabled; + collectWholeVariantOutputEnabled = false; + try { + return project.child().accept(this, context); + } finally { + collectWholeVariantOutputEnabled = 
previousCollectWholeVariantOutputEnabled; + } + } + + private void collectAliasAccessPaths(Alias alias, Collection aliasAccessPaths, + AccessPathExpressionCollector exprCollector, StatementContext context) { + Expression child = alias.child(0); + if (collectWholeVariantOutputEnabled && child.getDataType().isVariantType()) { + exprCollector.collectWholeVariantExpression(child); + } + for (CollectAccessPathResult aliasAccessPath : aliasAccessPaths) { + List aliasPath = aliasAccessPath.getPath(); + CollectorContext childContext = new CollectorContext(context, aliasAccessPath.isPredicate()); + childContext.setType(aliasAccessPath.getType()); + if (aliasPath.size() == 1 && child.getDataType().isVariantType()) { + childContext.setCollectVariantRoot(true); + } else { + childContext.getAccessPathBuilder().addSuffix(aliasPath.subList(1, aliasPath.size())); + } + child.accept(exprCollector, childContext); + } } @Override @@ -305,7 +335,13 @@ public Void visitLogicalCTEConsumer(LogicalCTEConsumer cteConsumer, StatementCon @Override public Void visitLogicalCTEProducer(LogicalCTEProducer cteProducer, StatementContext context) { - return cteProducer.child().accept(this, context); + boolean previousCollectWholeVariantOutputEnabled = collectWholeVariantOutputEnabled; + collectWholeVariantOutputEnabled = false; + try { + return cteProducer.child().accept(this, context); + } finally { + collectWholeVariantOutputEnabled = previousCollectWholeVariantOutputEnabled; + } } @Override @@ -391,6 +427,18 @@ private void collectByExpressions(Plan plan, StatementContext context, boolean b } } + private boolean collectWholeVariantOutput(Slot slot) { + if (!slot.getDataType().isVariantType() + || (slot instanceof SlotReference && ((SlotReference) slot).hasSubColPath())) { + return false; + } + List path = new ArrayList<>(); + path.add(slot.getName()); + allSlotToAccessPaths.put(slot.getExprId().asInt(), + new CollectAccessPathResult(path, false, ColumnAccessPathType.DATA)); + return true; + } + 
    static List<CollectAccessPathResult> normalizeDataSkippingOnlyAccessPaths(
            Collection<CollectAccessPathResult> accessPaths) {
        List<CollectAccessPathResult> normalizedAccessPaths = new ArrayList<>();
diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/NestedColumnPruning.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/NestedColumnPruning.java
index 0da73d3e936cff..3621b3db9c45a0 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/NestedColumnPruning.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/NestedColumnPruning.java
@@ -391,6 +391,7 @@ && containsDataSkippingOnlyAccessPath(collectAccessPathResults)) {
         for (Entry kv : variantSlots.entrySet()) {
             Slot slot = kv.getKey();
+            stripVariantSubpathsCoveredByFullPath(slot, allAccessPaths);
             List allPaths = buildColumnAccessPaths(slot, allAccessPaths);
             result.put(slot.getExprId().asInt(),
                     new AccessPathInfo(slot.getDataType(), allPaths, new ArrayList<>()));
@@ -421,6 +422,34 @@ && containsDataSkippingOnlyAccessPath(collectAccessPathResults)) {
         return result;
     }

+    private static void stripVariantSubpathsCoveredByFullPath(
+            Slot slot, Multimap<Integer, Pair<ColumnAccessPathType, List<String>>> allAccessPaths) {
+        int slotId = slot.getExprId().asInt();
+        Collection<Pair<ColumnAccessPathType, List<String>>> paths = allAccessPaths.get(slotId);
+        boolean hasFullPath = false;
+        for (Pair<ColumnAccessPathType, List<String>> path : paths) {
+            if (path.first == ColumnAccessPathType.DATA
+                    && path.second.size() == 1
+                    && path.second.get(0).equalsIgnoreCase(slot.getName())) {
+                hasFullPath = true;
+                break;
+            }
+        }
+        if (!hasFullPath) {
+            return;
+        }
+
+        List<Pair<ColumnAccessPathType, List<String>>> pathsToRemove = new ArrayList<>();
+        for (Pair<ColumnAccessPathType, List<String>> path : paths) {
+            if (path.first == ColumnAccessPathType.DATA
+                    && path.second.size() > 1
+                    && path.second.get(0).equalsIgnoreCase(slot.getName())) {
+                pathsToRemove.add(path);
+            }
+        }
+        paths.removeAll(pathsToRemove);
+    }
+
     private static boolean containsDataSkippingOnlyAccessPath(
             List<CollectAccessPathResult> collectAccessPathResults) {
         for (CollectAccessPathResult collectAccessPathResult : collectAccessPathResults) {
@@ -1007,6 +1036,9 @@ public
void setAccessByPath(List path, int accessIndex, ColumnAccessPath // Any other sub-path on a string column means full data is needed. accessAll = true; return; + } else if (type.isVariantType()) { + accessAll = true; + return; } else if (isRoot) { children.get(path.get(accessIndex).toLowerCase()).setAccessByPath(path, accessIndex + 1, pathType); return; diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacer.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacer.java index 6a8fd28b902ba7..b734499c65e510 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacer.java +++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacer.java @@ -64,6 +64,7 @@ import org.apache.doris.nereids.types.MapType; import org.apache.doris.nereids.types.NestedColumnPrunable; import org.apache.doris.nereids.types.StructType; +import org.apache.doris.nereids.types.VariantType; import org.apache.doris.nereids.util.MoreFieldsThread; import com.google.common.collect.ImmutableCollection; @@ -236,8 +237,8 @@ public Plan visitLogicalExcept(LogicalExcept except, Void context) { = replaceExpressions(except.getOutputs(), true, false); if (replacedRegularChildrenOutputs.first || replacedOutputs.first) { - return new LogicalExcept(except.getQualifier(), except.getOutputs(), - except.getRegularChildrenOutputs(), except.children()); + return new LogicalExcept(except.getQualifier(), replacedOutputs.second, + replacedRegularChildrenOutputs.second, except.children()); } return except; @@ -254,8 +255,8 @@ public Plan visitLogicalIntersect(LogicalIntersect intersect, Void context) { = replaceExpressions(intersect.getOutputs(), true, false); if (replacedRegularChildrenOutputs.first || replacedOutputs.first) { - return new LogicalIntersect(intersect.getQualifier(), intersect.getOutputs(), - intersect.getRegularChildrenOutputs(), intersect.children()); + return new 
LogicalIntersect(intersect.getQualifier(), replacedOutputs.second, + replacedRegularChildrenOutputs.second, intersect.children()); } return intersect; } @@ -654,16 +655,22 @@ private void replaceIcebergAccessPathToId(List originPath, int index, Da replaceIcebergAccessPathToId( originPath, index + 1, ((MapType) type).getValueType(), column.getChildren().get(1) ); + } else if (fieldName.equals(AccessPathInfo.ACCESS_MAP_KEYS)) { + replaceIcebergAccessPathToId( + originPath, index + 1, ((MapType) type).getKeyType(), column.getChildren().get(0) + ); } } else if (type instanceof StructType) { for (Column child : column.getChildren()) { - if (child.getName().equals(fieldName)) { + if (child.getName().equalsIgnoreCase(fieldName)) { originPath.set(index, String.valueOf(child.getUniqueId())); - DataType childType = ((StructType) type).getNameToFields().get(fieldName).getDataType(); + DataType childType = ((StructType) type).getField(fieldName).getDataType(); replaceIcebergAccessPathToId(originPath, index + 1, childType, child); break; } } + } else if (type instanceof VariantType) { + replaceIcebergAccessPathToId(originPath, index + 1, type, column); } else { originPath.set(index, String.valueOf(column.getUniqueId())); } diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommand.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommand.java index 7770d147812682..e66557108cc2c3 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommand.java +++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommand.java @@ -237,12 +237,16 @@ private List buildDeleteProjection(Expression rowIdExpr, List nameParts = Lists.newArrayList(targetNameInPlan); + nameParts.add(column.getName()); + projection.add(new UnboundSlot(nameParts)); + continue; + } + if (!column.isVisible()) { continue; } - List nameParts = 
Lists.newArrayList(targetNameInPlan);
-            nameParts.add(column.getName());
-            projection.add(new UnboundSlot(nameParts));
+            projection.add(new NullLiteral(DataType.fromCatalogType(column.getType())));
         }
         return projection;
     }
@@ -462,11 +466,25 @@ private LogicalPlan buildMergePlan(ConnectContext ctx, IcebergExternalTable iceb
                 icebergTable.getBaseSchema(true),
                 outputExprs,
                 deleteCtx,
+                writesDataFiles(matchedClauses, notMatchedClauses),
                 Optional.empty(),
                 Optional.empty(),
                 projectPlan);
     }

+    static boolean writesDataFiles(List<MergeMatchedClause> matchedClauses,
+            List notMatchedClauses) {
+        if (!notMatchedClauses.isEmpty()) {
+            return true;
+        }
+        for (MergeMatchedClause clause : matchedClauses) {
+            if (!clause.isDelete()) {
+                return true;
+            }
+        }
+        return false;
+    }
+
     private boolean executeMergePlan(ConnectContext ctx, StmtExecutor executor,
             IcebergExternalTable icebergTable, LogicalPlan logicalPlan) throws Exception {
diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/IcebergInsertExecutor.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/IcebergInsertExecutor.java
index 6f9b951a9a6e06..be068206a89ae6 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/IcebergInsertExecutor.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/insert/IcebergInsertExecutor.java
@@ -17,10 +17,12 @@
 package org.apache.doris.nereids.trees.plans.commands.insert;

+import org.apache.doris.common.AnalysisException;
 import org.apache.doris.common.UserException;
 import org.apache.doris.datasource.NameMapping;
 import org.apache.doris.datasource.iceberg.IcebergExternalTable;
 import org.apache.doris.datasource.iceberg.IcebergTransaction;
+import org.apache.doris.datasource.iceberg.IcebergUtils;
 import org.apache.doris.nereids.NereidsPlanner;
 import org.apache.doris.qe.ConnectContext;
 import org.apache.doris.transaction.TransactionType;
@@ -46,10 +48,20 @@ public
IcebergInsertExecutor(ConnectContext ctx, IcebergExternalTable table,
         super(ctx, table, labelName, planner, insertCtx, emptyInsert, jobId);
     }

+    private static void rejectVariantWrites(IcebergExternalTable table) throws UserException {
+        try {
+            IcebergUtils.validateVariantWriteUnsupported(table.getIcebergTable().schema());
+        } catch (AnalysisException e) {
+            throw new UserException(e.getMessage(), e);
+        }
+    }
+
     @Override
     protected void beforeExec() throws UserException {
+        IcebergExternalTable icebergTable = (IcebergExternalTable) table;
+        rejectVariantWrites(icebergTable);
         IcebergTransaction transaction = (IcebergTransaction) transactionManager.getTransaction(txnId);
-        transaction.beginInsert((IcebergExternalTable) table, insertCtx);
+        transaction.beginInsert(icebergTable, insertCtx);
     }

     @Override
diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/logical/LogicalIcebergMergeSink.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/logical/LogicalIcebergMergeSink.java
index 7f528020890dde..0e1acae31877a9 100644
--- a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/logical/LogicalIcebergMergeSink.java
+++ b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/logical/LogicalIcebergMergeSink.java
@@ -47,6 +47,7 @@ public class LogicalIcebergMergeSink extends LogicalTab
     private final IcebergExternalDatabase database;
     private final IcebergExternalTable targetTable;
     private final DeleteCommandContext deleteContext;
+    private final boolean writeDataFiles;

     /**
      * Constructor
      */
@@ -59,10 +60,24 @@ public LogicalIcebergMergeSink(IcebergExternalDatabase database,
             Optional<GroupExpression> groupExpression,
             Optional<LogicalProperties> logicalProperties,
             CHILD_TYPE child) {
+        this(database, targetTable, cols, outputExprs, deleteContext, true, groupExpression,
+                logicalProperties, child);
+    }
+
+    public LogicalIcebergMergeSink(IcebergExternalDatabase database,
+            IcebergExternalTable targetTable,
+            List<Column> cols,
+            List<NamedExpression> outputExprs,
+            DeleteCommandContext
deleteContext, + boolean writeDataFiles, + Optional groupExpression, + Optional logicalProperties, + CHILD_TYPE child) { super(PlanType.LOGICAL_ICEBERG_MERGE_SINK, outputExprs, groupExpression, logicalProperties, cols, child); this.database = Objects.requireNonNull(database, "database != null in LogicalIcebergMergeSink"); this.targetTable = Objects.requireNonNull(targetTable, "targetTable != null in LogicalIcebergMergeSink"); this.deleteContext = Objects.requireNonNull(deleteContext, "deleteContext != null in LogicalIcebergMergeSink"); + this.writeDataFiles = writeDataFiles; } public Plan withChildAndUpdateOutput(Plan child) { @@ -70,19 +85,19 @@ public Plan withChildAndUpdateOutput(Plan child) { .map(NamedExpression.class::cast) .collect(ImmutableList.toImmutableList()); return new LogicalIcebergMergeSink<>(database, targetTable, cols, output, - deleteContext, Optional.empty(), Optional.empty(), child); + deleteContext, writeDataFiles, Optional.empty(), Optional.empty(), child); } @Override public Plan withChildren(List children) { Preconditions.checkArgument(children.size() == 1, "LogicalIcebergMergeSink only accepts one child"); return new LogicalIcebergMergeSink<>(database, targetTable, cols, outputExprs, - deleteContext, Optional.empty(), Optional.empty(), children.get(0)); + deleteContext, writeDataFiles, Optional.empty(), Optional.empty(), children.get(0)); } public LogicalIcebergMergeSink withOutputExprs(List outputExprs) { return new LogicalIcebergMergeSink<>(database, targetTable, cols, outputExprs, - deleteContext, Optional.empty(), Optional.empty(), child()); + deleteContext, writeDataFiles, Optional.empty(), Optional.empty(), child()); } public IcebergExternalDatabase getDatabase() { @@ -97,6 +112,10 @@ public DeleteCommandContext getDeleteContext() { return deleteContext; } + public boolean writeDataFiles() { + return writeDataFiles; + } + @Override public boolean equals(Object o) { if (this == o) { @@ -112,12 +131,13 @@ public boolean equals(Object 
o) { return Objects.equals(database, that.database) && Objects.equals(targetTable, that.targetTable) && Objects.equals(deleteContext, that.deleteContext) + && writeDataFiles == that.writeDataFiles && Objects.equals(cols, that.cols); } @Override public int hashCode() { - return Objects.hash(super.hashCode(), database, targetTable, cols, deleteContext); + return Objects.hash(super.hashCode(), database, targetTable, cols, deleteContext, writeDataFiles); } @Override @@ -127,7 +147,8 @@ public String toString() { "database", database.getFullName(), "targetTable", targetTable.getName(), "cols", cols, - "deleteFileType", deleteContext.getDeleteFileType()); + "deleteFileType", deleteContext.getDeleteFileType(), + "writeDataFiles", writeDataFiles); } @Override @@ -138,13 +159,13 @@ public R accept(PlanVisitor visitor, C context) { @Override public Plan withGroupExpression(Optional groupExpression) { return new LogicalIcebergMergeSink<>(database, targetTable, cols, outputExprs, - deleteContext, groupExpression, Optional.of(getLogicalProperties()), child()); + deleteContext, writeDataFiles, groupExpression, Optional.of(getLogicalProperties()), child()); } @Override public Plan withGroupExprLogicalPropChildren(Optional groupExpression, Optional logicalProperties, List children) { return new LogicalIcebergMergeSink<>(database, targetTable, cols, outputExprs, - deleteContext, groupExpression, logicalProperties, children.get(0)); + deleteContext, writeDataFiles, groupExpression, logicalProperties, children.get(0)); } } diff --git a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalIcebergMergeSink.java b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalIcebergMergeSink.java index 0281ad23243496..d5fa3d1fabc4c0 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalIcebergMergeSink.java +++ 
b/fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/physical/PhysicalIcebergMergeSink.java @@ -56,6 +56,7 @@ */ public class PhysicalIcebergMergeSink extends PhysicalBaseExternalTableSink { private final DeleteCommandContext deleteContext; + private final boolean writeDataFiles; /** * Constructor @@ -68,10 +69,23 @@ public PhysicalIcebergMergeSink(IcebergExternalDatabase database, Optional groupExpression, LogicalProperties logicalProperties, CHILD_TYPE child) { - this(database, targetTable, cols, outputExprs, deleteContext, groupExpression, logicalProperties, + this(database, targetTable, cols, outputExprs, deleteContext, true, groupExpression, logicalProperties, PhysicalProperties.GATHER, null, child); } + public PhysicalIcebergMergeSink(IcebergExternalDatabase database, + IcebergExternalTable targetTable, + List cols, + List outputExprs, + DeleteCommandContext deleteContext, + boolean writeDataFiles, + Optional groupExpression, + LogicalProperties logicalProperties, + CHILD_TYPE child) { + this(database, targetTable, cols, outputExprs, deleteContext, writeDataFiles, groupExpression, + logicalProperties, PhysicalProperties.GATHER, null, child); + } + /** * Constructor */ @@ -80,6 +94,7 @@ public PhysicalIcebergMergeSink(IcebergExternalDatabase database, List cols, List outputExprs, DeleteCommandContext deleteContext, + boolean writeDataFiles, Optional groupExpression, LogicalProperties logicalProperties, PhysicalProperties physicalProperties, @@ -89,17 +104,22 @@ public PhysicalIcebergMergeSink(IcebergExternalDatabase database, logicalProperties, physicalProperties, statistics, child); this.deleteContext = Objects.requireNonNull( deleteContext, "deleteContext != null in PhysicalIcebergMergeSink"); + this.writeDataFiles = writeDataFiles; } public DeleteCommandContext getDeleteContext() { return deleteContext; } + public boolean writeDataFiles() { + return writeDataFiles; + } + @Override public Plan withChildren(List children) { return new 
PhysicalIcebergMergeSink<>( (IcebergExternalDatabase) database, (IcebergExternalTable) targetTable, - cols, outputExprs, deleteContext, groupExpression, + cols, outputExprs, deleteContext, writeDataFiles, groupExpression, getLogicalProperties(), physicalProperties, statistics, children.get(0)); } @@ -112,7 +132,7 @@ public R accept(PlanVisitor visitor, C context) { public Plan withGroupExpression(Optional groupExpression) { return new PhysicalIcebergMergeSink<>( (IcebergExternalDatabase) database, (IcebergExternalTable) targetTable, cols, outputExprs, - deleteContext, groupExpression, getLogicalProperties(), child()); + deleteContext, writeDataFiles, groupExpression, getLogicalProperties(), child()); } @Override @@ -120,14 +140,15 @@ public Plan withGroupExprLogicalPropChildren(Optional groupExpr Optional logicalProperties, List children) { return new PhysicalIcebergMergeSink<>( (IcebergExternalDatabase) database, (IcebergExternalTable) targetTable, cols, outputExprs, - deleteContext, groupExpression, logicalProperties.get(), children.get(0)); + deleteContext, writeDataFiles, groupExpression, logicalProperties.get(), children.get(0)); } @Override public PhysicalPlan withPhysicalPropertiesAndStats(PhysicalProperties physicalProperties, Statistics statistics) { return new PhysicalIcebergMergeSink<>( (IcebergExternalDatabase) database, (IcebergExternalTable) targetTable, cols, outputExprs, - deleteContext, groupExpression, getLogicalProperties(), physicalProperties, statistics, child()); + deleteContext, writeDataFiles, groupExpression, getLogicalProperties(), physicalProperties, + statistics, child()); } @Override @@ -142,12 +163,12 @@ public boolean equals(Object o) { return false; } PhysicalIcebergMergeSink that = (PhysicalIcebergMergeSink) o; - return Objects.equals(deleteContext, that.deleteContext); + return Objects.equals(deleteContext, that.deleteContext) && writeDataFiles == that.writeDataFiles; } @Override public int hashCode() { - return 
Objects.hash(super.hashCode(), deleteContext); + return Objects.hash(super.hashCode(), deleteContext, writeDataFiles); } /** diff --git a/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergMergeSink.java b/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergMergeSink.java index 4af4ba17e18578..d1b468a5f1f46a 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergMergeSink.java +++ b/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergMergeSink.java @@ -64,6 +64,7 @@ public class IcebergMergeSink extends BaseExternalTableDataSink { private final IcebergExternalTable targetTable; private final DeleteCommandContext deleteContext; + private final boolean writeDataFiles; private List rewritableDeleteFileSets = Collections.emptyList(); private static final HashSet supportedTypes = new HashSet() {{ @@ -75,12 +76,18 @@ public class IcebergMergeSink extends BaseExternalTableDataSink { private Map storagePropertiesMap; public IcebergMergeSink(IcebergExternalTable targetTable, DeleteCommandContext deleteContext) { + this(targetTable, deleteContext, true); + } + + public IcebergMergeSink(IcebergExternalTable targetTable, DeleteCommandContext deleteContext, + boolean writeDataFiles) { super(); if (targetTable.isView()) { throw new UnsupportedOperationException("UPDATE on iceberg view is not supported"); } this.targetTable = targetTable; this.deleteContext = deleteContext; + this.writeDataFiles = writeDataFiles; IcebergExternalCatalog catalog = (IcebergExternalCatalog) targetTable.getCatalog(); storagePropertiesMap = VendedCredentialsFactory.getStoragePropertiesMapWithVendedCredentials( @@ -129,6 +136,9 @@ public void bindDataSink(Optional insertCtx) if (formatVersion >= 3) { schema = IcebergUtils.appendRowLineageFieldsForV3(schema); } + if (writeDataFiles) { + IcebergUtils.validateVariantWriteUnsupported(schema); + } tSink.setFormatVersion(formatVersion); tSink.setSchemaJson(SchemaParser.toJson(schema)); diff --git 
a/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergTableSink.java b/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergTableSink.java index 0f3b1bb24d26bc..307d1893003d50 100644 --- a/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergTableSink.java +++ b/fe/fe-core/src/main/java/org/apache/doris/planner/IcebergTableSink.java @@ -134,6 +134,7 @@ public void bindDataSink(Optional insertCtx) // iceberg v3 format requires additional row lineage fields when rewrite data files. schema = IcebergUtils.appendRowLineageFieldsForV3(schema); } + IcebergUtils.validateVariantWriteUnsupported(schema); tSink.setSchemaJson(SchemaParser.toJson(schema)); // partition spec diff --git a/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/IcebergUtilsTest.java b/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/IcebergUtilsTest.java index 1fcde27aa95416..a8bf7bd3db1cd3 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/IcebergUtilsTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/IcebergUtilsTest.java @@ -159,6 +159,12 @@ public void testAppendRowLineageFieldsForV3AddsMetadataFields() { Assert.assertNotNull(schemaWithRowLineage.findField(MetadataColumns.LAST_UPDATED_SEQUENCE_NUMBER.fieldId())); } + @Test + public void testIcebergVariantTypeToDorisVariant() { + Assert.assertEquals(Type.VARIANT, + IcebergUtils.icebergTypeToDorisType(Types.VariantType.get(), false, false)); + } + @Test public void testGetPartitionInfoMapSkipBinaryIdentityPartition() { Schema schema = new Schema( diff --git a/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/source/IcebergScanNodeTest.java b/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/source/IcebergScanNodeTest.java index f55b4f6f8e8027..4f796b58bb5afb 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/source/IcebergScanNodeTest.java +++ 
b/fe/fe-core/src/test/java/org/apache/doris/datasource/iceberg/source/IcebergScanNodeTest.java @@ -17,13 +17,24 @@ package org.apache.doris.datasource.iceberg.source; +import org.apache.doris.analysis.SlotDescriptor; +import org.apache.doris.analysis.SlotId; import org.apache.doris.analysis.TupleDescriptor; import org.apache.doris.analysis.TupleId; +import org.apache.doris.catalog.ArrayType; +import org.apache.doris.catalog.Column; +import org.apache.doris.catalog.MapType; +import org.apache.doris.catalog.StructField; +import org.apache.doris.catalog.StructType; +import org.apache.doris.catalog.Type; +import org.apache.doris.common.DdlException; import org.apache.doris.common.util.LocationPath; import org.apache.doris.datasource.TableFormatType; +import org.apache.doris.nereids.exceptions.NotSupportedException; import org.apache.doris.planner.PlanNodeId; import org.apache.doris.planner.ScanContext; import org.apache.doris.qe.SessionVariable; +import org.apache.doris.thrift.TFileFormatType; import org.apache.doris.thrift.TFileRangeDesc; import org.apache.doris.thrift.TIcebergDeleteFileDesc; @@ -51,6 +62,10 @@ private static class TestIcebergScanNode extends IcebergScanNode { public boolean isBatchMode() { return false; } + + TupleDescriptor tupleDescriptor() { + return desc; + } } @Test @@ -143,4 +158,73 @@ public void testSetIcebergParamsPropagatesPositionDeleteFileFormat() throws Exce .get(0); Assert.assertEquals(org.apache.doris.thrift.TFileFormatType.FORMAT_ORC, deleteFileDesc.getFileFormat()); } + + @Test + public void testGetFileFormatTypeRejectsVariantForOrc() throws Exception { + SessionVariable sv = new SessionVariable(); + TestIcebergScanNode node = new TestIcebergScanNode(sv); + addSlot(node.tupleDescriptor(), new Column("v", Type.VARIANT, true)); + IcebergSource source = Mockito.mock(IcebergSource.class); + Mockito.when(source.getFileFormat()).thenReturn("orc"); + setSource(node, source); + + DdlException exception = 
Assert.assertThrows(DdlException.class, () -> node.getFileFormatType()); + Assert.assertTrue(exception.getMessage().contains( + "Reading Iceberg VARIANT columns is only supported for Parquet files")); + Assert.assertTrue(exception.getMessage().contains("v")); + } + + @Test + public void testGetFileFormatTypeRejectsNestedVariantForOrc() throws Exception { + SessionVariable sv = new SessionVariable(); + TestIcebergScanNode node = new TestIcebergScanNode(sv); + Type nestedVariantType = new StructType(new StructField("events", + ArrayType.create(new MapType(Type.STRING, Type.VARIANT), true))); + addSlot(node.tupleDescriptor(), new Column("payload", nestedVariantType, true)); + IcebergSource source = Mockito.mock(IcebergSource.class); + Mockito.when(source.getFileFormat()).thenReturn("orc"); + setSource(node, source); + + DdlException exception = Assert.assertThrows(DdlException.class, () -> node.getFileFormatType()); + Assert.assertTrue(exception.getMessage().contains( + "Reading Iceberg VARIANT columns is only supported for Parquet files")); + Assert.assertTrue(exception.getMessage().contains("payload")); + } + + @Test + public void testGetFileFormatTypeAllowsVariantForParquet() throws Exception { + SessionVariable sv = new SessionVariable(); + TestIcebergScanNode node = new TestIcebergScanNode(sv); + addSlot(node.tupleDescriptor(), new Column("v", Type.VARIANT, true)); + IcebergSource source = Mockito.mock(IcebergSource.class); + Mockito.when(source.getFileFormat()).thenReturn("parquet"); + setSource(node, source); + + Assert.assertEquals(TFileFormatType.FORMAT_PARQUET, node.getFileFormatType()); + } + + @Test + public void testValidateVariantDataFileFormatRejectsOrcSplit() { + SessionVariable sv = new SessionVariable(); + TestIcebergScanNode node = new TestIcebergScanNode(sv); + addSlot(node.tupleDescriptor(), new Column("v", Type.VARIANT, true)); + + NotSupportedException exception = Assert.assertThrows(NotSupportedException.class, + () -> 
node.validateVariantDataFileFormat(org.apache.iceberg.FileFormat.ORC, "file:///tmp/v.orc")); + Assert.assertTrue(exception.getMessage().contains( + "Reading Iceberg VARIANT columns is only supported for Parquet files")); + Assert.assertTrue(exception.getMessage().contains("file:///tmp/v.orc")); + } + + private static void addSlot(TupleDescriptor desc, Column column) { + SlotDescriptor slot = new SlotDescriptor(new SlotId(desc.getSlots().size()), desc.getId()); + slot.setColumn(column); + desc.addSlot(slot); + } + + private static void setSource(IcebergScanNode node, IcebergSource source) throws Exception { + Field sourceField = IcebergScanNode.class.getDeclaredField("source"); + sourceField.setAccessible(true); + sourceField.set(node, source); + } } diff --git a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PruneNestedColumnTest.java b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PruneNestedColumnTest.java index 8d8d3441fa6008..97020fbe634b82 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PruneNestedColumnTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/PruneNestedColumnTest.java @@ -29,13 +29,18 @@ import org.apache.doris.nereids.rules.rewrite.NestedColumnPruning.DataTypeAccessTree; import org.apache.doris.nereids.trees.expressions.Alias; import org.apache.doris.nereids.trees.expressions.ArrayItemReference; +import org.apache.doris.nereids.trees.expressions.EqualTo; import org.apache.doris.nereids.trees.expressions.Expression; import org.apache.doris.nereids.trees.expressions.NamedExpression; import org.apache.doris.nereids.trees.expressions.Slot; import org.apache.doris.nereids.trees.expressions.SlotReference; import org.apache.doris.nereids.trees.expressions.functions.scalar.Coalesce; +import org.apache.doris.nereids.trees.expressions.functions.scalar.CreateNamedStruct; +import org.apache.doris.nereids.trees.expressions.functions.scalar.CreateStruct; +import 
org.apache.doris.nereids.trees.expressions.functions.scalar.ElementAt; import org.apache.doris.nereids.trees.expressions.functions.scalar.StructElement; import org.apache.doris.nereids.trees.expressions.literal.NullLiteral; +import org.apache.doris.nereids.trees.expressions.literal.VarcharLiteral; import org.apache.doris.nereids.trees.plans.Plan; import org.apache.doris.nereids.trees.plans.logical.LogicalOlapScan; import org.apache.doris.nereids.trees.plans.physical.PhysicalCTEConsumer; @@ -45,6 +50,10 @@ import org.apache.doris.nereids.types.DataType; import org.apache.doris.nereids.types.NestedColumnPrunable; import org.apache.doris.nereids.types.NullType; +import org.apache.doris.nereids.types.StringType; +import org.apache.doris.nereids.types.StructField; +import org.apache.doris.nereids.types.StructType; +import org.apache.doris.nereids.types.VariantType; import org.apache.doris.nereids.util.MemoPatternMatchSupported; import org.apache.doris.nereids.util.PlanChecker; import org.apache.doris.planner.OlapScanNode; @@ -52,6 +61,7 @@ import org.apache.doris.utframe.TestWithFeService; import com.google.common.collect.ImmutableList; +import com.google.common.collect.LinkedHashMultimap; import org.junit.jupiter.api.Assertions; import org.junit.jupiter.api.BeforeAll; import org.junit.jupiter.api.Test; @@ -101,6 +111,20 @@ public void createTable() throws Exception { + " v variant\n" + ") properties ('replication_num'='1')"); + createTable("create table variant_container_tbl(\n" + + " id int,\n" + + " arr array<int>,\n" + + " m map<string, int>,\n" + + " v variant\n" + + ") properties ('replication_num'='1')"); + + createTable("create table variant_expr_tbl(\n" + + " id int,\n" + + " flag boolean,\n" + + " v1 variant,\n" + + " v2 variant\n" + + ") properties ('replication_num'='1')"); + // Table for string-length offset-only optimization tests createTable("create table str_tbl(\n" + " id int,\n" @@ -183,6 +207,48 @@ public void testVariantAccessPath() throws Exception { ); } + @Test + 
public void testNonVariantInsideNamedStructConstructorCollectsSubPath() throws Exception { + StructType structType = new StructType(ImmutableList.of( + new StructField("city", StringType.INSTANCE, true, ""))); + SlotReference slot = new SlotReference("s", structType, true); + Expression expression = new StructElement( + new StructElement( + new CreateNamedStruct(new VarcharLiteral("a"), slot), + new VarcharLiteral("a")), + new VarcharLiteral("city")); + + LinkedHashMultimap<Integer, CollectAccessPathResult> slotToAccessPaths = + LinkedHashMultimap.create(); + AccessPathExpressionCollector collector = + new AccessPathExpressionCollector(null, slotToAccessPaths, false); + collector.collect(expression); + + Assertions.assertEquals(ImmutableList.of(new CollectAccessPathResult( + ImmutableList.of("s", "city"), false, ColumnAccessPathType.DATA)), + ImmutableList.copyOf(slotToAccessPaths.get(slot.getExprId().asInt()))); + } + + @Test + public void testNonVariantInsideStructConstructorCollectsSubPath() throws Exception { + StructType structType = new StructType(ImmutableList.of( + new StructField("city", StringType.INSTANCE, true, ""))); + SlotReference slot = new SlotReference("s", structType, true); + Expression expression = new StructElement( + new StructElement(new CreateStruct(slot), new VarcharLiteral("col1")), + new VarcharLiteral("city")); + + LinkedHashMultimap<Integer, CollectAccessPathResult> slotToAccessPaths = + LinkedHashMultimap.create(); + AccessPathExpressionCollector collector = + new AccessPathExpressionCollector(null, slotToAccessPaths, false); + collector.collect(expression); + + Assertions.assertEquals(ImmutableList.of(new CollectAccessPathResult( + ImmutableList.of("s", "city"), false, ColumnAccessPathType.DATA)), + ImmutableList.copyOf(slotToAccessPaths.get(slot.getExprId().asInt()))); + } + @Test public void testVariantMultiProjectionAccessPaths() throws Exception { assertVariantSubColumnSlots("select v['a'], v['b']['c'] from variant_tbl", @@ -201,6 +267,177 @@ public void testVariantPredicateAccessPath() throws 
Exception { ); } + @Test + public void testVariantTopLevelNullPredicateUsesRootAccessPath() throws Exception { + assertColumn("select 1 from variant_tbl where v is null", + "variant", + ImmutableList.of(path("v")), + ImmutableList.of(path("v")) + ); + assertColumn("select 1 from variant_tbl where v is not null", + "variant", + ImmutableList.of(path("v")), + ImmutableList.of(path("v")) + ); + } + + @Test + public void testVariantWholeColumnWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select v from variant_tbl where v['k'] is not null"); + } + + @Test + public void testVariantWholeColumnWithSiblingSubPathAccessPath() throws Exception { + assertAllAccessPathsContain( + "select v from (select v, v['k'] as k from variant_tbl) t where k is not null", + ImmutableList.of(path("v")), + ImmutableList.of()); + } + + @Test + public void testVariantAliasWholeOutputWithOrderSubPathAccessPath() throws Exception { + assertAllAccessPathsContain( + "select v as a from variant_tbl order by cast(a['k'] as string)", + ImmutableList.of(path("v"), path("v", "k")), + ImmutableList.of()); + } + + @Test + public void testVariantAliasWholeOutputWithPredicateSubPathAccessPath() throws Exception { + assertAllAccessPathsContain( + "select a from (select v as a from variant_tbl) t where a['k'] is not null", + ImmutableList.of(path("v"), path("v", "k")), + ImmutableList.of()); + } + + @Test + public void testVariantWholeExpressionWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select cast(v as string) from variant_tbl where v['k'] is not null"); + } + + @Test + public void testVariantTypeWholeExpressionWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select variant_type(v) from variant_tbl where v['k'] is not null"); + } + + @Test + public void testVariantComparisonPredicateCollectsWholeVariantOperand() { + SlotReference slot = new 
SlotReference("v", VariantType.INSTANCE, true); + Expression expression = new EqualTo(slot, new NullLiteral(VariantType.INSTANCE)); + + LinkedHashMultimap<Integer, CollectAccessPathResult> slotToAccessPaths = + LinkedHashMultimap.create(); + AccessPathExpressionCollector collector = + new AccessPathExpressionCollector(null, slotToAccessPaths, true); + collector.collect(expression); + + Assertions.assertEquals(ImmutableList.of(new CollectAccessPathResult( + ImmutableList.of("v"), true, ColumnAccessPathType.DATA)), + ImmutableList.copyOf(slotToAccessPaths.get(slot.getExprId().asInt()))); + } + + @Test + public void testVariantLiteralPathComparisonKeepsSubPathOperand() { + SlotReference slot = new SlotReference("v", VariantType.INSTANCE, true); + Expression expression = new EqualTo( + new ElementAt(slot, new VarcharLiteral("k")), + new NullLiteral(VariantType.INSTANCE)); + + LinkedHashMultimap<Integer, CollectAccessPathResult> slotToAccessPaths = + LinkedHashMultimap.create(); + AccessPathExpressionCollector collector = + new AccessPathExpressionCollector(null, slotToAccessPaths, true); + collector.collect(expression); + + Assertions.assertEquals(ImmutableList.of(new CollectAccessPathResult( + ImmutableList.of("v", "k"), true, ColumnAccessPathType.DATA)), + ImmutableList.copyOf(slotToAccessPaths.get(slot.getExprId().asInt()))); + } + + @Test + public void testVariantWholeOrderExpressionWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select id from variant_tbl where v['k'] is not null order by cast(v as string)"); + } + + @Test + public void testVariantWholeGroupExpressionWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select cast(v as string), count(*) from variant_tbl where v['k'] is not null " + + "group by cast(v as string)"); + } + + @Test + public void testVariantDynamicElementAtWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select v[cast(id as string)] from variant_tbl where 
v['k'] is not null"); + } + + @Test + public void testVariantChainedDynamicElementAtWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select v[cast(id as string)]['x'] from variant_tbl where v['k'] is not null"); + } + + @Test + public void testVariantWholeNonSlotExpressionWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select cast(if(id = 1, v, v) as string) from variant_tbl where v['k'] is not null"); + } + + @Test + public void testVariantWholeExpressionOutputWithSiblingPredicateAccessPath() throws Exception { + assertAllAccessPathsContain( + "select if(flag, v1, v2) as a from variant_expr_tbl where v1['p'] is not null", + ImmutableList.of(path("v1"), path("v2"), path("v1", "p")), + ImmutableList.of()); + } + + @Test + public void testVariantDynamicElementAtNonSlotExpressionWithPredicateAccessPath() throws Exception { + assertVariantWholeColumnAndPredicateAccessPaths( + "select element_at(if(id = 1, v, v), cast(id as string)) " + + "from variant_tbl where v['k'] is not null"); + } + + @Test + public void testVariantLiteralElementAtNonSlotExpressionKeepsSubPath() throws Exception { + assertVariantSubColumnSlots( + "select element_at(if(id = 1, v, v), 'a') from variant_tbl where v['k'] is not null", + ImmutableList.of( + ImmutableList.of("a"), + ImmutableList.of("k") + )); + } + + private void assertVariantWholeColumnAndPredicateAccessPaths(String sql) throws Exception { + Pair<?, List<SlotDescriptor>> result = collectComplexSlots(sql); + List<SlotDescriptor> slotDescriptors = result.second; + Assertions.assertEquals(2, slotDescriptors.size()); + + boolean hasWholeColumnSlot = false; + boolean hasPredicateSlot = false; + for (SlotDescriptor slotDescriptor : slotDescriptors) { + TreeSet<ColumnAccessPath> allAccessPaths = + new TreeSet<>(slotDescriptor.getAllAccessPaths()); + TreeSet<ColumnAccessPath> predicateAccessPaths = + new TreeSet<>(slotDescriptor.getPredicateAccessPaths()); + if (allAccessPaths.equals(new 
TreeSet<>(ImmutableList.of(path("v")))) + && predicateAccessPaths.isEmpty()) { + hasWholeColumnSlot = true; + } + if (allAccessPaths.equals(new TreeSet<>(ImmutableList.of(path("v", "k")))) + && predicateAccessPaths.equals(new TreeSet<>(ImmutableList.of(path("v", "k"))))) { + hasPredicateSlot = true; + } + } + Assertions.assertTrue(hasWholeColumnSlot); + Assertions.assertTrue(hasPredicateSlot); + } + @Test public void testVariantProjectAndPredicateAccessPaths() throws Exception { assertVariantSubColumnSlots("select v['a'] from variant_tbl where v['b']['c'] = 1", @@ -210,6 +447,75 @@ public void testVariantProjectAndPredicateAccessPaths() throws Exception { )); } + @Test + public void testVariantCastProjectionKeepsSubPathWithSiblingPredicate() throws Exception { + assertAllAccessPathsContain( + "select cast(v as variant)['a'] from variant_tbl where v['k'] is not null", + ImmutableList.of( + path("v", "a"), + path("v", "k") + ), + ImmutableList.of(path("v"))); + assertAllAccessPathsContain( + "select struct_element(cast(v['obj'] as struct), 'a') " + + "from variant_tbl where v['k'] is not null", + ImmutableList.of( + path("v", "obj", "a"), + path("v", "k") + ), + ImmutableList.of(path("v"))); + } + + @Test + public void testVariantIndexExpressionDoesNotInheritContainerPath() throws Exception { + assertAllAccessPathsContain( + "select arr[cast(v['idx'] as int)] from variant_container_tbl where v['k'] is not null", + ImmutableList.of( + path("v", "idx"), + path("v", "k") + ), + ImmutableList.of(path("v", "idx", AccessPathInfo.ACCESS_ALL))); + assertAllAccessPathsContain( + "select m[cast(v['map_key'] as string)] from variant_container_tbl where v['k'] is not null", + ImmutableList.of( + path("v", "map_key"), + path("v", "k") + ), + ImmutableList.of(path("v", "map_key", AccessPathInfo.ACCESS_ALL))); + } + + @Test + public void testMapFunctionsCollectVariantArguments() throws Exception { + assertAllAccessPathsContain( + "select map_contains_key(m, cast(v['key'] as 
string)) " + + "from variant_container_tbl where v['p'] is not null", + ImmutableList.of( + path("m", "KEYS"), + path("v", "key"), + path("v", "p") + ), + ImmutableList.of()); + assertAllAccessPathsContain( + "select map_contains_value(m, cast(v['value'] as int)) " + + "from variant_container_tbl where v['p'] is not null", + ImmutableList.of( + path("m", "VALUES"), + path("v", "value"), + path("v", "p") + ), + ImmutableList.of()); + assertAllAccessPathsContain( + "select map_contains_entry(m, cast(v['key'] as string), cast(v['value'] as int)) " + + "from variant_container_tbl where v['p'] is not null", + ImmutableList.of( + path("m", AccessPathInfo.ACCESS_ALL), + path("v", "key"), + path("v", "value"), + path("v", "p") + ), + ImmutableList.of()); + } + @Test public void testVariantAliasAccessPathPropagation() throws Exception { assertColumn("select x['b'] from (select v['a'] as x from variant_tbl) t", @@ -228,6 +534,16 @@ public void testVariantCteAccessPathPropagation() throws Exception { ); } + @Test + public void testVariantUnusedCteDoesNotCollectWholeColumn() throws Exception { + assertColumn("with t as (select v from variant_tbl) select 1 from t", null, null, null); + } + + @Test + public void testVariantUnusedSubqueryDoesNotCollectWholeColumn() throws Exception { + assertColumn("select 1 from (select v from variant_tbl) t", null, null, null); + } + @Test public void testVariantJoinAccessPathPropagation() throws Exception { assertVariantSubColumnSlotCount( @@ -247,6 +563,14 @@ public void testExplodeVariantAccessPath() throws Exception { ); } + @Test + public void testExplodeVariantWholeOutputWithPredicateAccessPath() throws Exception { + assertAllAccessPathsContain( + "select x from variant_tbl lateral view explode(v) tmp as x where x['k'] is not null", + ImmutableList.of(path("v")), + ImmutableList.of()); + } + @Test public void testExplodeVariantProjectAndFilterAccessPath() throws Exception { assertColumn("select x['x'] from variant_tbl lateral view 
explode(v['arr']) tmp as x where x['y'] is not null", @@ -956,6 +1280,7 @@ public void testDataTypeAccessTree() { ), "STRUCT>>>" ); + } @Test diff --git a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacerTest.java b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacerTest.java new file mode 100644 index 00000000000000..c4f6b20cdc5081 --- /dev/null +++ b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/SlotTypeReplacerTest.java @@ -0,0 +1,210 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +package org.apache.doris.nereids.rules.rewrite; + +import org.apache.doris.analysis.ColumnAccessPath; +import org.apache.doris.catalog.Column; +import org.apache.doris.catalog.MapType; +import org.apache.doris.catalog.StructField; +import org.apache.doris.catalog.StructType; +import org.apache.doris.catalog.Type; +import org.apache.doris.datasource.iceberg.IcebergExternalTable; +import org.apache.doris.nereids.trees.expressions.Slot; +import org.apache.doris.nereids.trees.expressions.SlotReference; +import org.apache.doris.nereids.trees.plans.Plan; +import org.apache.doris.nereids.trees.plans.RelationId; +import org.apache.doris.nereids.trees.plans.algebra.SetOperation; +import org.apache.doris.nereids.trees.plans.logical.LogicalExcept; +import org.apache.doris.nereids.trees.plans.logical.LogicalFileScan; +import org.apache.doris.nereids.trees.plans.logical.LogicalFileScan.SelectedPartitions; +import org.apache.doris.nereids.trees.plans.logical.LogicalIntersect; +import org.apache.doris.nereids.types.DataType; + +import com.google.common.collect.ImmutableList; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.mockito.Mockito; + +import java.util.Collections; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Optional; + +class SlotTypeReplacerTest { + + @Test + void testIcebergAccessPathReplacesMixedCaseStructChildByFieldId() { + Column column = mixedCaseStructColumn(); + LogicalFileScan scan = newIcebergScan(column); + SlotReference slot = (SlotReference) scan.getOutput().get(0); + List<ColumnAccessPath> allPaths = ImmutableList.of(path("payload", "mixedfield")); + List<ColumnAccessPath> predicatePaths = ImmutableList.of(path("payload", "mixedfield")); + + SlotReference replacedSlot = replaceSlot(scan, slot, column, allPaths, predicatePaths); + + List<ColumnAccessPath> expectedPaths = ImmutableList.of(path("10", "11")); + Assertions.assertEquals(expectedPaths, replacedSlot.getAllAccessPaths().get()); + 
Assertions.assertEquals(expectedPaths, replacedSlot.getPredicateAccessPaths().get()); + Assertions.assertEquals(allPaths, replacedSlot.getDisplayAllAccessPaths().get()); + Assertions.assertEquals(predicatePaths, replacedSlot.getDisplayPredicateAccessPaths().get()); + } + + @Test + void testIcebergVariantFullProjectionKeepsSubpathNamesAfterRootFieldId() { + Column column = new Column("v", Type.VARIANT, true); + column.setUniqueId(100); + LogicalFileScan scan = newIcebergScan(column); + SlotReference slot = (SlotReference) scan.getOutput().get(0); + List<ColumnAccessPath> allPaths = ImmutableList.of(path("v")); + List<ColumnAccessPath> predicatePaths = ImmutableList.of(path("v", "Metric", "x")); + + SlotReference replacedSlot = replaceSlot(scan, slot, column, allPaths, predicatePaths); + + Assertions.assertEquals(ImmutableList.of(path("100")), replacedSlot.getAllAccessPaths().get()); + Assertions.assertEquals(ImmutableList.of(path("100", "Metric", "x")), + replacedSlot.getPredicateAccessPaths().get()); + Assertions.assertEquals(allPaths, replacedSlot.getDisplayAllAccessPaths().get()); + Assertions.assertEquals(predicatePaths, replacedSlot.getDisplayPredicateAccessPaths().get()); + } + + @Test + void testIcebergMapKeyAccessPathReplacesNestedKeyStructChildByFieldId() { + Column column = mapWithStructKeyColumn(); + LogicalFileScan scan = newIcebergScan(column); + SlotReference slot = (SlotReference) scan.getOutput().get(0); + List<ColumnAccessPath> allPaths = ImmutableList.of(path("payload", AccessPathInfo.ACCESS_MAP_KEYS, + "keyfield")); + List<ColumnAccessPath> predicatePaths = ImmutableList.of(path("payload", AccessPathInfo.ACCESS_MAP_KEYS, + "keyfield")); + + SlotReference replacedSlot = replaceSlot(scan, slot, column, allPaths, predicatePaths); + + List<ColumnAccessPath> expectedPaths = + ImmutableList.of(path("20", AccessPathInfo.ACCESS_MAP_KEYS, "22")); + Assertions.assertEquals(expectedPaths, replacedSlot.getAllAccessPaths().get()); + Assertions.assertEquals(expectedPaths, replacedSlot.getPredicateAccessPaths().get()); + 
Assertions.assertEquals(allPaths, replacedSlot.getDisplayAllAccessPaths().get()); + Assertions.assertEquals(predicatePaths, replacedSlot.getDisplayPredicateAccessPaths().get()); + } + + @Test + void testExceptUsesReplacedOutputsAndChildrenOutputs() { + assertSetOperationUsesReplacedOutputs(true); + } + + @Test + void testIntersectUsesReplacedOutputsAndChildrenOutputs() { + assertSetOperationUsesReplacedOutputs(false); + } + + private void assertSetOperationUsesReplacedOutputs(boolean isExcept) { + Column column = twoFieldStructColumn(); + LogicalFileScan leftScan = newIcebergScan(column); + LogicalFileScan rightScan = newIcebergScan(column); + SlotReference leftSlot = (SlotReference) leftScan.getOutput().get(0); + SlotReference rightSlot = (SlotReference) rightScan.getOutput().get(0); + DataType prunedType = new org.apache.doris.nereids.types.StructType(ImmutableList.of( + new org.apache.doris.nereids.types.StructField( + "keep", org.apache.doris.nereids.types.IntegerType.INSTANCE, true, ""))); + List<ColumnAccessPath> allPaths = ImmutableList.of(path("payload", "keep")); + Map<Integer, AccessPathInfo> accessPaths = new LinkedHashMap<>(); + accessPaths.put(leftSlot.getExprId().asInt(), new AccessPathInfo(prunedType, allPaths, allPaths)); + accessPaths.put(rightSlot.getExprId().asInt(), new AccessPathInfo(prunedType, allPaths, allPaths)); + + Plan setOperation = isExcept + ? new LogicalExcept(SetOperation.Qualifier.DISTINCT, ImmutableList.of(leftSlot), + ImmutableList.of(ImmutableList.of(leftSlot), ImmutableList.of(rightSlot)), + ImmutableList.of(leftScan, rightScan)) + : new LogicalIntersect(SetOperation.Qualifier.DISTINCT, ImmutableList.of(leftSlot), + ImmutableList.of(ImmutableList.of(leftSlot), ImmutableList.of(rightSlot)), + ImmutableList.of(leftScan, rightScan)); + + Plan replacedPlan = new SlotTypeReplacer(accessPaths, setOperation).replace(); + List<Slot> firstChildOutputs = isExcept + ? 
((LogicalExcept) replacedPlan).getRegularChildOutput(0) + : ((LogicalIntersect) replacedPlan).getRegularChildOutput(0); + List<Slot> secondChildOutputs = isExcept + ? ((LogicalExcept) replacedPlan).getRegularChildOutput(1) + : ((LogicalIntersect) replacedPlan).getRegularChildOutput(1); + List<Slot> outputs = replacedPlan.getOutput(); + + Assertions.assertEquals("STRUCT", outputs.get(0).getDataType().toSql()); + Assertions.assertEquals("STRUCT", firstChildOutputs.get(0).getDataType().toSql()); + Assertions.assertEquals("STRUCT", secondChildOutputs.get(0).getDataType().toSql()); + } + + private SlotReference replaceSlot(LogicalFileScan scan, SlotReference slot, Column column, + List<ColumnAccessPath> allPaths, List<ColumnAccessPath> predicatePaths) { + Map<Integer, AccessPathInfo> accessPaths = new LinkedHashMap<>(); + accessPaths.put(slot.getExprId().asInt(), + new AccessPathInfo(DataType.fromCatalogType(column.getType()), allPaths, predicatePaths)); + + Plan replacedPlan = new SlotTypeReplacer(accessPaths, scan).replace(); + Slot replacedSlot = ((LogicalFileScan) replacedPlan).getOutput().get(0); + return (SlotReference) replacedSlot; + } + + private LogicalFileScan newIcebergScan(Column column) { + IcebergExternalTable table = Mockito.mock(IcebergExternalTable.class); + Mockito.when(table.initSelectedPartitions(Mockito.any())).thenReturn(SelectedPartitions.NOT_PRUNED); + Mockito.when(table.getFullSchema()).thenReturn(Collections.singletonList(column)); + Mockito.when(table.getName()).thenReturn("iceberg_tbl"); + return new LogicalFileScan(new RelationId(1), table, + ImmutableList.of("iceberg_catalog", "iceberg_db"), Collections.emptyList(), + Optional.empty(), Optional.empty(), Optional.empty(), Optional.empty()); + } + + private Column mixedCaseStructColumn() { + StructType type = new StructType(new StructField("MixedField", Type.INT)); + Column column = new Column("payload", type, true); + column.setUniqueId(10); + column.getChildren().get(0).setName("MixedField"); + column.getChildren().get(0).setUniqueId(11); + return column; + } + 
+ private Column twoFieldStructColumn() { + StructType type = new StructType( + new StructField("keep", Type.INT), + new StructField("drop", Type.INT)); + Column column = new Column("payload", type, true); + column.setUniqueId(10); + column.getChildren().get(0).setName("keep"); + column.getChildren().get(0).setUniqueId(11); + column.getChildren().get(1).setName("drop"); + column.getChildren().get(1).setUniqueId(12); + return column; + } + + private Column mapWithStructKeyColumn() { + StructType keyType = new StructType(new StructField("KeyField", Type.INT)); + Column column = new Column("payload", new MapType(keyType, Type.INT), true); + column.setUniqueId(20); + Column keyColumn = column.getChildren().get(0); + keyColumn.setUniqueId(21); + keyColumn.getChildren().get(0).setName("KeyField"); + keyColumn.getChildren().get(0).setUniqueId(22); + column.getChildren().get(1).setUniqueId(23); + return column; + } + + private ColumnAccessPath path(String... parts) { + return ColumnAccessPath.data(ImmutableList.copyOf(parts)); + } +} diff --git a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java index 7a7b4b9ae4aa4b..e2910bfad74a74 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java @@ -120,6 +120,33 @@ public void testExplodeVariantArrayWithOuterFilterAccessPaths() throws Exception ); } + @Test + public void testExplodeVariantArrayFunctionAccessPaths() throws Exception { + assertAllAccessPathsContain( + "select x['x'] from variant_tbl lateral view explode_variant_array(v['arr']) tmp as x " + + "where v['filter']['k'] = 1 and x['y'] is not null", + ImmutableList.of( + path("v", "arr", "x"), + path("v", "arr", "y"), + path("v", "filter", "k") + ), + ImmutableList.of() + ); + } + + @Test + 
public void testExplodeVariantArrayFunctionFullOutputAccessPath() throws Exception { + assertAllAccessPathsContain( + "select x from variant_tbl lateral view explode_variant_array(v['arr']) tmp as x " + + "where x['k'] is not null", + ImmutableList.of( + path("v", "arr"), + path("v", "arr", "k") + ), + ImmutableList.of() + ); + } + @Test public void testExplodeVariantDeepNestedAccessPaths() throws Exception { assertAllAccessPathsContain( @@ -131,17 +158,25 @@ public void testExplodeVariantDeepNestedAccessPaths() throws Exception { @Test public void testExplodeSubqueryJoinAggAccessPaths() throws Exception { + String sql = "select cast(t2.v['k'] as string) as k, count(*) from (select id, v from variant_tbl) t1 " + + "lateral view explode(t1.v['arr']) tmp as x " + + "join variant_tbl t2 on t1.id=t2.id " + + "where x['a']['b'] = 1 and t2.v['k'] is not null " + + "group by cast(t2.v['k'] as string)"; + assertVariantSubColumnSlots( + sql, + ImmutableList.of( + ImmutableList.of("arr"), + ImmutableList.of("k") + ) + ); assertAllAccessPathsContain( - "select cast(t2.v['k'] as string) as k, count(*) from (select id, v from variant_tbl) t1 " - + "lateral view explode(t1.v['arr']) tmp as x " - + "join variant_tbl t2 on t1.id=t2.id " - + "where x['a']['b'] = 1 and t2.v['k'] is not null " - + "group by cast(t2.v['k'] as string)", + sql, ImmutableList.of( path("v", "arr", "a", "b"), path("v", "k") ), - ImmutableList.of() + ImmutableList.of(path("v")) ); } @@ -226,10 +261,12 @@ private void assertAllAccessPathsContain( allAccessPaths.addAll(slotDescriptor.getAllAccessPaths()); } for (ColumnAccessPath accessPath : expectedContain) { - Assertions.assertTrue(allAccessPaths.contains(accessPath)); + Assertions.assertTrue(allAccessPaths.contains(accessPath), + "expected access path " + accessPath + " in " + allAccessPaths); } for (ColumnAccessPath accessPath : expectedNotContain) { - Assertions.assertFalse(allAccessPaths.contains(accessPath)); + 
Assertions.assertFalse(allAccessPaths.contains(accessPath), + "unexpected access path " + accessPath + " in " + allAccessPaths); } } diff --git a/fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommandTest.java b/fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommandTest.java index 531e8e30855a2e..1659a540b1bd6a 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommandTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/commands/IcebergMergeCommandTest.java @@ -17,11 +17,28 @@ package org.apache.doris.nereids.trees.plans.commands; +import org.apache.doris.catalog.Column; +import org.apache.doris.catalog.Type; +import org.apache.doris.datasource.iceberg.IcebergUtils; +import org.apache.doris.nereids.analyzer.UnboundSlot; +import org.apache.doris.nereids.trees.expressions.Expression; +import org.apache.doris.nereids.trees.expressions.StatementScopeIdGenerator; +import org.apache.doris.nereids.trees.expressions.literal.BooleanLiteral; +import org.apache.doris.nereids.trees.expressions.literal.NullLiteral; +import org.apache.doris.nereids.trees.plans.commands.merge.MergeMatchedClause; +import org.apache.doris.nereids.trees.plans.commands.merge.MergeNotMatchedClause; +import org.apache.doris.nereids.trees.plans.logical.LogicalEmptyRelation; +import org.apache.doris.nereids.types.DataType; import org.apache.doris.qe.ConnectContext; +import com.google.common.collect.ImmutableList; import org.junit.jupiter.api.Assertions; import org.junit.jupiter.api.Test; +import java.lang.reflect.Method; +import java.util.List; +import java.util.Optional; + public class IcebergMergeCommandTest { @Test @@ -52,4 +69,47 @@ public void testExecuteWithExternalTableBatchModeDisabledRestoresValueOnExceptio Assertions.assertEquals("expected", exception.getMessage()); Assertions.assertFalse(ctx.getSessionVariable().enableExternalTableBatchMode); } + + @Test 
+ public void testWritesDataFilesForMergeClauses() { + Assertions.assertFalse(IcebergMergeCommand.writesDataFiles( + ImmutableList.of(new MergeMatchedClause(Optional.empty(), ImmutableList.of(), true)), + ImmutableList.of())); + + Assertions.assertTrue(IcebergMergeCommand.writesDataFiles( + ImmutableList.of(new MergeMatchedClause(Optional.empty(), ImmutableList.of(), false)), + ImmutableList.of())); + + Assertions.assertTrue(IcebergMergeCommand.writesDataFiles( + ImmutableList.of(new MergeMatchedClause(Optional.empty(), ImmutableList.of(), true)), + ImmutableList.of(new MergeNotMatchedClause(Optional.empty(), ImmutableList.of(), + ImmutableList.of())))); + } + + @Test + public void testDeleteProjectionDoesNotReadVisibleTargetColumns() throws Exception { + IcebergMergeCommand command = new IcebergMergeCommand( + ImmutableList.of("catalog", "db", "target"), + Optional.of("t"), + Optional.empty(), + new LogicalEmptyRelation(StatementScopeIdGenerator.newRelationId(), ImmutableList.of()), + BooleanLiteral.TRUE, + ImmutableList.of(new MergeMatchedClause(Optional.empty(), ImmutableList.of(), true)), + ImmutableList.of()); + Method method = IcebergMergeCommand.class.getDeclaredMethod( + "buildDeleteProjection", Expression.class, List.class); + method.setAccessible(true); + + Column id = new Column("id", Type.INT, true); + Column variant = new Column("v", Type.VARIANT, true); + Column rowId = new Column(IcebergUtils.ICEBERG_ROW_ID_COL, Type.BIGINT, true); + List<?> projection = (List<?>) method.invoke( + command, new NullLiteral(DataType.fromCatalogType(Type.BIGINT)), ImmutableList.of(id, variant, rowId)); + + Assertions.assertTrue(projection.get(2) instanceof NullLiteral); + Assertions.assertTrue(projection.get(3) instanceof NullLiteral); + Assertions.assertTrue(projection.get(4) instanceof UnboundSlot); + Assertions.assertEquals(ImmutableList.of("t", IcebergUtils.ICEBERG_ROW_ID_COL), + ((UnboundSlot) projection.get(4)).getNameParts()); + } } diff --git 
a/fe/fe-core/src/test/java/org/apache/doris/planner/IcebergMergeSinkTest.java b/fe/fe-core/src/test/java/org/apache/doris/planner/IcebergMergeSinkTest.java index 23dcb4403ce731..71f96c3bd54596 100644 --- a/fe/fe-core/src/test/java/org/apache/doris/planner/IcebergMergeSinkTest.java +++ b/fe/fe-core/src/test/java/org/apache/doris/planner/IcebergMergeSinkTest.java @@ -18,6 +18,7 @@ package org.apache.doris.planner; import org.apache.doris.catalog.DatabaseIf; +import org.apache.doris.common.AnalysisException; import org.apache.doris.datasource.CatalogProperty; import org.apache.doris.datasource.iceberg.IcebergExternalCatalog; import org.apache.doris.datasource.iceberg.IcebergExternalTable; @@ -74,6 +75,34 @@ public void testBindDataSinkSkipsRewritableDeleteFileSetsAndRowLineageSchemaForV IcebergUtils.ICEBERG_LAST_UPDATED_SEQUENCE_NUMBER_COL)); } + @Test + public void testBindDataSinkRejectsVariantSchema() { + Schema schema = new Schema( + Types.NestedField.required(1, "id", Types.IntegerType.get()), + Types.NestedField.optional(2, "payload", Types.VariantType.get())); + IcebergMergeSink sink = new IcebergMergeSink( + mockIcebergExternalTable(2, schema), new DeleteCommandContext()); + + AnalysisException exception = Assertions.assertThrows( + AnalysisException.class, () -> sink.bindDataSink(Optional.empty())); + Assertions.assertTrue(exception.getMessage().contains( + "Writing Iceberg VARIANT columns is not supported: payload")); + } + + @Test + public void testBindDataSinkAllowsVariantSchemaForDeleteOnlyMerge() throws Exception { + Schema schema = new Schema( + Types.NestedField.required(1, "id", Types.IntegerType.get()), + Types.NestedField.optional(2, "payload", Types.VariantType.get())); + IcebergMergeSink sink = new IcebergMergeSink( + mockIcebergExternalTable(2, schema), new DeleteCommandContext(), false); + + sink.bindDataSink(Optional.empty()); + + TIcebergMergeSink thriftSink = sink.tDataSink.getIcebergMergeSink(); + 
Assertions.assertTrue(thriftSink.getSchemaJson().contains("\"payload\"")); + } + private static TIcebergRewritableDeleteFileSet buildDeleteFileSet() { TIcebergDeleteFileDesc deleteFileDesc = new TIcebergDeleteFileDesc(); deleteFileDesc.setPath("file:///tmp/delete.puffin"); @@ -85,6 +114,10 @@ private static TIcebergRewritableDeleteFileSet buildDeleteFileSet() { private static IcebergExternalTable mockIcebergExternalTable(int formatVersion) { Schema schema = new Schema(Types.NestedField.required(1, "id", Types.IntegerType.get())); + return mockIcebergExternalTable(formatVersion, schema); + } + + private static IcebergExternalTable mockIcebergExternalTable(int formatVersion, Schema schema) { PartitionSpec spec = PartitionSpec.unpartitioned(); Map properties = new HashMap<>(); properties.put(TableProperties.FORMAT_VERSION, String.valueOf(formatVersion)); diff --git a/fe/fe-core/src/test/java/org/apache/doris/planner/IcebergTableSinkTest.java b/fe/fe-core/src/test/java/org/apache/doris/planner/IcebergTableSinkTest.java new file mode 100644 index 00000000000000..7ac49bea63d6d2 --- /dev/null +++ b/fe/fe-core/src/test/java/org/apache/doris/planner/IcebergTableSinkTest.java @@ -0,0 +1,89 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. 
See the License for the +// specific language governing permissions and limitations +// under the License. + +package org.apache.doris.planner; + +import org.apache.doris.catalog.DatabaseIf; +import org.apache.doris.common.AnalysisException; +import org.apache.doris.datasource.CatalogProperty; +import org.apache.doris.datasource.iceberg.IcebergExternalCatalog; +import org.apache.doris.datasource.iceberg.IcebergExternalTable; + +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.SortOrder; +import org.apache.iceberg.Table; +import org.apache.iceberg.TableProperties; +import org.apache.iceberg.types.Types; +import org.junit.jupiter.api.Assertions; +import org.junit.jupiter.api.Test; +import org.mockito.Mockito; + +import java.util.Collections; +import java.util.HashMap; +import java.util.Map; +import java.util.Optional; + +public class IcebergTableSinkTest { + @Test + public void testBindDataSinkRejectsVariantSchema() { + Schema schema = new Schema( + Types.NestedField.required(1, "id", Types.IntegerType.get()), + Types.NestedField.optional(2, "payload", Types.VariantType.get())); + IcebergTableSink sink = new IcebergTableSink(mockIcebergExternalTable(schema)); + + AnalysisException exception = Assertions.assertThrows( + AnalysisException.class, () -> sink.bindDataSink(Optional.empty())); + Assertions.assertTrue(exception.getMessage().contains( + "Writing Iceberg VARIANT columns is not supported: payload")); + } + + private static IcebergExternalTable mockIcebergExternalTable(Schema schema) { + PartitionSpec spec = PartitionSpec.unpartitioned(); + Map properties = new HashMap<>(); + properties.put(TableProperties.FORMAT_VERSION, "2"); + properties.put(TableProperties.DEFAULT_FILE_FORMAT, "parquet"); + properties.put(TableProperties.PARQUET_COMPRESSION, "snappy"); + properties.put(TableProperties.WRITE_DATA_LOCATION, "file:///tmp/iceberg_tbl/data"); + + Table icebergTable = Mockito.mock(Table.class); + 
Mockito.when(icebergTable.properties()).thenReturn(properties); + Mockito.when(icebergTable.spec()).thenReturn(spec); + Mockito.when(icebergTable.specs()).thenReturn(Collections.singletonMap(spec.specId(), spec)); + Mockito.when(icebergTable.location()).thenReturn("file:///tmp/iceberg_tbl"); + Mockito.when(icebergTable.schema()).thenReturn(schema); + Mockito.when(icebergTable.sortOrder()).thenReturn(SortOrder.unsorted()); + Mockito.when(icebergTable.name()).thenReturn("db.tbl"); + + CatalogProperty catalogProperty = Mockito.mock(CatalogProperty.class); + Mockito.when(catalogProperty.getMetastoreProperties()).thenReturn(null); + Mockito.when(catalogProperty.getStoragePropertiesMap()).thenReturn(Collections.emptyMap()); + + IcebergExternalCatalog catalog = Mockito.mock(IcebergExternalCatalog.class); + Mockito.when(catalog.getCatalogProperty()).thenReturn(catalogProperty); + + DatabaseIf database = Mockito.mock(DatabaseIf.class); + IcebergExternalTable table = Mockito.mock(IcebergExternalTable.class); + Mockito.when(table.isView()).thenReturn(false); + Mockito.when(table.getCatalog()).thenReturn(catalog); + Mockito.when(table.getDatabase()).thenReturn(database); + Mockito.when(table.getDbName()).thenReturn("db"); + Mockito.when(table.getName()).thenReturn("tbl"); + Mockito.when(table.getIcebergTable()).thenReturn(icebergTable); + return table; + } +} diff --git a/regression-test/data/external_table_p0/tvf/iceberg_variant_binary_typed.parquet b/regression-test/data/external_table_p0/tvf/iceberg_variant_binary_typed.parquet new file mode 100644 index 0000000000000000000000000000000000000000..fae29b093cd58eefb1aa91a64b7c39a588c438ac GIT binary patch literal 743 zcmYjP&1%~~5T0ELSE$BGsCSWy4!YDRhZ-yXkqe@bODV;NP;xAomAxqfTTK;NkXy-9 z1)x^CD zPw1N2Vr2b=<1c}Gz+J~AA1B&@;I2f6!q%Pkj=8hUk?k0$QpFBmAlU@hla`pF0$Hab zCoSodbHhYdc{b%(z4X$BPIUfuUaCA*Sus!2Oy!sJ;_@7;F%k3x9G)&=%pFz=^G(}- zMEewNj9Y76sHG|ttv;+MsI%H=!3ltj1f=PDZrlS8(+S%pErCHfle3-`*VWdgO9-mk 
zYY#Vz&9qS|29Q4t(UqUM=mROWUD0s^R^I2Y6myyPPUEIujfKkhU2LLth7M)G-hljV zh{tlv#q%b|l$E<>v7}0r$*gz7;t2%QcLz^kSIYi>1|Y9Y;mMDN7=ir6rP;G^>H-9> z&Q<5O+%I*0k-p3PupdVKfgdEBbhY%))MBuRMw9XINQc8=qLXJwQx%;}RHVmI ZG7bmHD2h~g=*AvE{lYgqmlM3={{W?AeFp#l literal 0 HcmV?d00001 diff --git a/regression-test/data/external_table_p0/tvf/iceberg_variant_binary_unshredded.parquet b/regression-test/data/external_table_p0/tvf/iceberg_variant_binary_unshredded.parquet new file mode 100644 index 0000000000000000000000000000000000000000..3e33f256e13baf802f012f620344da6672eb2036 GIT binary patch literal 764 zcmY*Xzi-n(6n=NU1Y?Cr6?rGSiclDc+<_=g8XF`p-4M!9p$-UDiq7#BEVVAUNhG%Z z0Sp}(`4?bhECWBp1PgP-Kfwg=?4&e)w%@&X-}~O(celUyB)~eh$lfl$FY99z2=qDh zhb@3}COGFn51~S`ughP5YHL)}pb2GJ1_zG8nUQnTj2Hu!P}x(mpP%3V=Kss;Rhd(w z!Cu#Gmvx*0*s(>)zp!^BHr+q@-WK;y`(0+X(lo!0)^M8i;BF}Sxjm`mJF z2q3^~OB=^TXGRB@8)Dfq!6m~BR)z!9@$k{zY^oDoJfAF7k*Rz(NwQoOuP5{T>_wr| zRHrBic@JEkr7+?yTc9{>bZ(oaV}tVciJqxc&6EkbTU)5(xdth`wikDyOB!nqrf}zgN4sP ze&bSfZF1DrfBaGvujYDo|D@*hQmZB j4+jrabQG#ckD_E0^po8vQo)WFw*jgbzR*$Z)4l%($mDZA literal 0 HcmV?d00001 diff --git a/regression-test/data/external_table_p0/tvf/iceberg_variant_shredded.parquet b/regression-test/data/external_table_p0/tvf/iceberg_variant_shredded.parquet new file mode 100644 index 0000000000000000000000000000000000000000..d1a75e2526bf8df21b5b9196ba2ba76b7a7119af GIT binary patch literal 1865 zcmb7FL1+^}6#cu|Y`19(rr>TW-`qDnLqFU_vioF z(NpJ=1c;8UeDB5WaFj$)REPo~0Dy=+gk8cO#!e)10o%>>4|ipghy>SlTo#%@lv@uS zIogr63s#=xS<%U63T(E>?Agq;!zOdql;t=GeGuEnpWmJ<+>%6OZppQB3_`el0Nc)u zx9>xqBkneLAaq{vcRxQ_eX49R8O5@rOVD z(}qJFvT)~RxuxEQjCh=UPqI4etIqnXvw`Z&(Aq>zmAGjf-0+GTMS(BtjmGppREZBS zXewF+E=IZrA;L}WSvv#XBKo0JLWk38lsHZoDxub&Ps&~}G}W6JuJy{PC{@T9as@3) zw*_*N)27QIF0P8ibVYY_bC0wVB1?jQY)uO4ZAqlecQk5pVy}nQY=)(oGc{4NoUBzS zPRw3%m~Ar~la`2`po&==3{e$IM84P>JIt?vwBovg*=yNW(c+@0hD%m%j&aSF;aa%UpcGAW}g-2 z*FLOV$U06~sbwL)h0vh;uY>L1+^}6#ct9*)BEORx`t{EG3sLE!5DoP14FD2%-{NDS8nMlua|0z@~}Y?W*9# zixl)Ah&>hbD1siOpcD^16mNpyp|L6br-pv2I)90_| 
zD9~OUzqgjgO-j(xs0JVbfQ%1%k~m(y+k36pRHh_yOVCRD`0UB!Zjnc*D)K~*qJVb}wOF2^Yi)%cL9r?fiiL%wF_OtNYNISLOcu~{LLk$G zu_}fBP_oH(C0c4lt;w|9S}rTm5)oG}_t$25g?l$=0>`U4^Zty_>&=GexHJCUCa+>y zshj~_u2mr~>rz0~ax#5N+z(BnU#oN9sXD$ROj%C^j=RW3rZItv zC04R_B|Nd{DV>N06DQuc4iHR4-3IJI7iH`Rp=U#Uax}gl3N5p;wI zy6-)TZ$s~MOZvRHGg={aY1 zc(yoJ$d8Wj+~{b9S0+Zro#I@CP&k-fQ4Ce5rC`eRMg(qoT; zd(ew6d-UXg;7$AkdJqqL^RkGD=+(1t60})Z*i4d{_ws%3d-L98=gwo7(zmxyMA^ zM{i%fF0R=5lx-xW6@4qAQ$>ed>B^1bHo#r^;>TANwfW)a_NU*M^YG^pddv3rmxl5_ z8O!@oGY{q(H7hzn7fvd1Moo@}avQ6R&KMbpb&;xB$w1CeAqlt92GGH*@&fCizFO_F zxypn)%2*P1LK1U9RE~QOuJwB|kkQlbOhkP#io3BK9*iR~=uStIXORp;88RKQ*P%lD zA+%_P%$UAcwAR&jSrPq(p^QZ+Vxdf0$<4%IDuL@})%0zsq7E}m#b)T_9p14Eep;rq zW*nTPXsLyHV&@D-!3W?UbF9qYX{-h+Qqg23(;>6}ib=#dpWC~Bw$Rx%M$fb$`z2K> zDJ3iq@Gm)bm4DXQy)#389v+D3`BcU?hmkdt(WE~fS#HhsYIVyAru{)^?TcQ$=QZ1{#-?-|jX(ys iH*X1Vzb!o3@`9FI51O7Q+zr#O0;D%w!t=U=KjdFmBA*%n literal 0 HcmV?d00001 diff --git a/regression-test/data/external_table_p0/tvf/iceberg_variant_typed_only.parquet b/regression-test/data/external_table_p0/tvf/iceberg_variant_typed_only.parquet new file mode 100644 index 0000000000000000000000000000000000000000..103b2f4b9769fe453e883add06f7119e8bebc335 GIT binary patch literal 1724 zcmZuyPiP!f7=JVSW;#uyHMQ?CQ|7Rj3|X*CHp%X$FqjA;QhF$!JcK4Q*|+V&?q=O= z))abZp-8Zof>3&DPoAtIQcygI2f<7AV6S?qBIqd>C7__c@4eZ{H0iwEo%j8Dzwgid zzS+xfyi*fQyn=P`%ieM*I3!Paj9H8^8((Oqu|ECk@6SrHumv}2i)F`tc=*xXDLTi6 zOXuiJsl-Y!zl?S3+o#_-23DI(WOnFq73;%4{`%?EQR`_o1t+@Bnj3j9VBPuo`=>7z zypakkooCF!J?RdvC8S8zD9OtF>SW{YBuf=u$H#I-SqxE*|8{F%#wP!a%f z)mq=`cdvs*!k%Tm-D$D9?OS8vu1%L;rIw0m=)E4EVk;RXB-*|+PP#jaJWm-6ErV@> z!|K9$*x4kr+gIBo)e`nGU)vyAyR+<4NMWXM(rS~R(;2DW_KVI4_fibsI`?_Evppgg z=ATeq)l+>f9Z3j5Y=Z4G$%8Ps5wQ@hm_;|KQUR@uWgL02p^f2?WOJHCMnXqHK;SS$ zmbX1|R{qIFl@Zi*Q0JZ-%vs~SKN@o&2hlisGbG;87l_%81+VGJL}E0^0L7za^1dg| z$@`9oCa@mUu%^#>o{z1#0jlK*ft{McOn%{sP=4cxISs&rIsq(OlWWnw z6(4fl&j$d><`ZzJEkwOwCW6TyJt5`qj<{IN`rRWz4FQUISi3rf4Y?6LvEv51P2=R7 z;VQ(DXDXBFV)8dC@N-wJ>R3}REYND_wLL1KHZTPjU!Cj@R;ae)+Ct2{wwotW+rFFC 
z2F{?Ld(W39upY#-FVmIjxx{CQ12g&fg9W^@N8261s2D1$OMESslC(HKVC;kh zQ`u053NbJu2F8j7A@N6a0#OzgBqSJlXNS;A9$azf7vFt(`QG>L-Cfph-K|le73?ox ze|)aFw1`uK8UP9a5VQiCil(7amEObt^wsAVdO!(9yn?VY14K73UAZJJ{oA`C@{ z=Yf(1ut^kxM{H$G)eW3PPH-bjs&ox<2peZT%=zB!;wFyh_ZFI zUN(ZXE@_b%RFW#CGt`Gi217ysz@C}ZVtWRyf}}DXPgf16qKv0O6V4-r9Vz8wrF`Gb zU$y+6C@}Y4$j=JS65bUmCW_h0S0Jkc?9ilh?2SR+D}QqaJ;^y;*(71C-icN|Ai*kB z{(HY~{we$c<|PV9 literal 0 HcmV?d00001 diff --git a/regression-test/data/external_table_p0/tvf/test_local_tvf_iceberg_variant.out b/regression-test/data/external_table_p0/tvf/test_local_tvf_iceberg_variant.out new file mode 100644 index 00000000000000..bf0da1da64118f --- /dev/null +++ b/regression-test/data/external_table_p0/tvf/test_local_tvf_iceberg_variant.out @@ -0,0 +1,51 @@ +-- This file is automatically generated. You should know what you did if you want to edit this +-- !desc_unshredded -- +id int Yes false \N NONE +v variant Yes false \N NONE + +-- !desc_typed_only -- +id int Yes false \N NONE +v variant Yes false \N NONE + +-- !unshredded_complex -- +1 1 name-1 100 10 false 1 name-1 +2 2 name-2 200 20 true 2 name-2 +3 3 name-3 300 30 false 3 name-3 +4 4 name-4 400 40 true 4 name-4 +5 5 name-5 500 50 false 5 name-5 + +-- !shredded_fields -- +1 100 name-1 +2 200 name-2 +3 300 name-3 +4 400 name-4 +5 500 name-5 + +-- !shredded_full_variant_with_scalar -- +1 true true +2 true true +3 true true +4 true true +5 true true + +-- !typed_only_fields -- +1 10 alpha true [{"n":1},{"n":2}] +2 20 beta true [{"n":3}] + +-- !typed_only_missing_field -- +1 true +2 true + +-- !typed_only_nested_missing_field -- +1 true +2 true + +-- !temporal_parity -- +1 19724 19724 3723004005 3723004005 1704164645006007 1704164645006007 true true true +2 20214 20214 45296789012 45296789012 1746515289010011 1746515289010011 true true true + +-- !complex_join -- +2 name-2 200 +3 name-3 300 +4 name-4 400 +5 name-5 500 diff --git a/regression-test/suites/external_table_p0/iceberg/test_iceberg_variant_table_path.groovy 
b/regression-test/suites/external_table_p0/iceberg/test_iceberg_variant_table_path.groovy new file mode 100644 index 00000000000000..40ca6bb84bd27f --- /dev/null +++ b/regression-test/suites/external_table_p0/iceberg/test_iceberg_variant_table_path.groovy @@ -0,0 +1,137 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +import org.apache.doris.regression.action.ProfileAction + +suite("test_iceberg_variant_table_path", "p0,external,iceberg,external_docker,external_docker_iceberg") { + def enabled = context.config.otherConfigs.get("enableIcebergTest") + if (enabled == null || !enabled.equalsIgnoreCase("true")) { + logger.info("Iceberg test is disabled") + return + } + + def catalogName = "test_iceberg_variant_table_path" + def dbName = "test_iceberg_variant_table_path_db" + def restPort = context.config.otherConfigs.get("iceberg_rest_uri_port") + def minioPort = context.config.otherConfigs.get("iceberg_minio_port") + def externalEnvIp = context.config.otherConfigs.get("externalEnvIp") + + def profileAction = new ProfileAction(context) + def getProfileByToken = { token -> + for (int i = 0; i < 60; ++i) { + List profileData = profileAction.getProfileList() + for (final def profileItem in profileData) { + if (profileItem["Sql Statement"].toString().contains(token)) { + def profileText = profileAction.getProfile(profileItem["Profile ID"].toString()).toString() + if (profileText.contains("ParquetReadColumnPaths")) { + return profileText + } + } + } + Thread.sleep(1000) + } + assertTrue(false) + } + def getParquetReadColumnPathSet = { profileText -> + def parquetReadColumnPaths = profileText.readLines().find { it.contains("ParquetReadColumnPaths") } + assertTrue(parquetReadColumnPaths != null) + logger.info("Iceberg variant table path ${parquetReadColumnPaths}") + def separatorIndex = parquetReadColumnPaths.indexOf(":") + assertTrue(separatorIndex >= 0) + return parquetReadColumnPaths.substring(separatorIndex + 1) + .split(",") + .collect { it.trim() } + .findAll { !it.isEmpty() } as Set + } + + sql """drop catalog if exists ${catalogName}""" + spark_iceberg """CREATE DATABASE IF NOT EXISTS demo.${dbName}""" + spark_iceberg """DROP TABLE IF EXISTS demo.${dbName}.variant_table_path""" + + try { + spark_iceberg_multi """ + CREATE TABLE demo.${dbName}.variant_table_path ( + id INT, + v 
VARIANT + ) USING iceberg + TBLPROPERTIES ( + 'format-version' = '3', + 'write.format.default' = 'parquet' + ); + + INSERT INTO demo.${dbName}.variant_table_path + SELECT 1, parse_json('{"metric":10,"nested":{"x":"a"},"items":[1,2]}') + UNION ALL + SELECT 2, parse_json('{"metric":20,"nested":{"x":"b"},"items":[3,4]}') + UNION ALL + SELECT 3, parse_json('null'); + """, 300 + + sql """ + create catalog if not exists ${catalogName} properties ( + "type" = "iceberg", + "iceberg.catalog.type" = "rest", + "uri" = "http://${externalEnvIp}:${restPort}", + "s3.access_key" = "admin", + "s3.secret_key" = "password", + "s3.endpoint" = "http://${externalEnvIp}:${minioPort}", + "s3.region" = "us-east-1" + ) + """ + + sql """switch ${catalogName}""" + sql """use ${dbName}""" + + def rows = sql """ + select id, + cast(v['metric'] as bigint) as metric, + cast(v['nested']['x'] as string) as nested_x, + cast(v['missing'] as string) is null as missing_is_null + from variant_table_path + order by id + """ + assertEquals(3, rows.size()) + assertEquals("1", rows[0][0].toString()) + assertEquals("10", rows[0][1].toString()) + assertEquals("a", rows[0][2].toString()) + assertEquals("true", rows[0][3].toString()) + assertEquals("2", rows[1][0].toString()) + assertEquals("20", rows[1][1].toString()) + assertEquals("b", rows[1][2].toString()) + assertEquals("true", rows[1][3].toString()) + assertEquals("3", rows[2][0].toString()) + assertEquals(null, rows[2][1]) + assertEquals(null, rows[2][2]) + assertEquals("true", rows[2][3].toString()) + + sql """ set enable_profile = true """ + sql """ set profile_level = 2 """ + def profileToken = UUID.randomUUID().toString() + sql """ + select "${profileToken}", sum(cast(v['metric'] as bigint)) + from variant_table_path + """ + def profile = getProfileByToken(profileToken) + def columnPaths = getParquetReadColumnPathSet(profile) + assertTrue(columnPaths.contains("v.metadata")) + assertTrue(columnPaths.contains("v.value")) + } finally { + sql """ set 
enable_profile = false """ + sql """ set profile_level = 0 """ + sql """drop catalog if exists ${catalogName}""" + } +} diff --git a/regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy b/regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy new file mode 100644 index 00000000000000..52f5d551444e02 --- /dev/null +++ b/regression-test/suites/external_table_p0/tvf/test_local_tvf_iceberg_variant.groovy @@ -0,0 +1,448 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. 
+ +import java.net.InetAddress +import java.net.NetworkInterface +import java.nio.file.Files +import java.nio.file.StandardCopyOption +import org.apache.doris.regression.action.ProfileAction + +suite("test_local_tvf_iceberg_variant", "p0,external") { + List> backends = sql """ show backends """ + assertTrue(backends.size() > 0) + + def dataFilePath = context.config.dataPath + "/external_table_p0/tvf/" + def beId = backends[0][0] + def outFilePath = "/" + def unshreddedData = "${dataFilePath}/iceberg_variant_unshredded.parquet" + def shreddedData = "${dataFilePath}/iceberg_variant_shredded.parquet" + def typedOnlyData = "${dataFilePath}/iceberg_variant_typed_only.parquet" + def temporalUnshreddedData = "${dataFilePath}/iceberg_variant_temporal_unshredded.parquet" + def temporalTypedData = "${dataFilePath}/iceberg_variant_temporal_typed.parquet" + def binaryUnshreddedData = "${dataFilePath}/iceberg_variant_binary_unshredded.parquet" + def binaryTypedData = "${dataFilePath}/iceberg_variant_binary_typed.parquet" + + def localHosts = ["localhost", "127.0.0.1", InetAddress.localHost.hostAddress, InetAddress.localHost.hostName] as Set + NetworkInterface.networkInterfaces.each { networkInterface -> + networkInterface.inetAddresses.each { inetAddress -> + localHosts.add(inetAddress.hostAddress) + localHosts.add(inetAddress.hostName) + } + } + + def dorisHome = new File(context.config.dataPath).parentFile.parentFile + def localBeHome = new File(dorisHome, "output/be") + def localJdbc = context.config.jdbcUrl.contains("127.0.0.1") || context.config.jdbcUrl.contains("localhost") + if (localJdbc && backends.size() == 1 && localHosts.contains(backends[0][1]) && localBeHome.exists()) { + outFilePath = "" + Files.copy(new File(unshreddedData).toPath(), new File(localBeHome, "iceberg_variant_unshredded.parquet").toPath(), + StandardCopyOption.REPLACE_EXISTING) + Files.copy(new File(shreddedData).toPath(), new File(localBeHome, "iceberg_variant_shredded.parquet").toPath(), + 
StandardCopyOption.REPLACE_EXISTING) + Files.copy(new File(typedOnlyData).toPath(), new File(localBeHome, "iceberg_variant_typed_only.parquet").toPath(), + StandardCopyOption.REPLACE_EXISTING) + Files.copy(new File(temporalUnshreddedData).toPath(), new File(localBeHome, "iceberg_variant_temporal_unshredded.parquet").toPath(), + StandardCopyOption.REPLACE_EXISTING) + Files.copy(new File(temporalTypedData).toPath(), new File(localBeHome, "iceberg_variant_temporal_typed.parquet").toPath(), + StandardCopyOption.REPLACE_EXISTING) + Files.copy(new File(binaryUnshreddedData).toPath(), new File(localBeHome, "iceberg_variant_binary_unshredded.parquet").toPath(), + StandardCopyOption.REPLACE_EXISTING) + Files.copy(new File(binaryTypedData).toPath(), new File(localBeHome, "iceberg_variant_binary_typed.parquet").toPath(), + StandardCopyOption.REPLACE_EXISTING) + } else { + for (List backend : backends) { + def beHost = backend[1] + scpFiles("root", beHost, unshreddedData, outFilePath, false) + scpFiles("root", beHost, shreddedData, outFilePath, false) + scpFiles("root", beHost, typedOnlyData, outFilePath, false) + scpFiles("root", beHost, temporalUnshreddedData, outFilePath, false) + scpFiles("root", beHost, temporalTypedData, outFilePath, false) + scpFiles("root", beHost, binaryUnshreddedData, outFilePath, false) + scpFiles("root", beHost, binaryTypedData, outFilePath, false) + } + } + + def unshredded = outFilePath + "iceberg_variant_unshredded.parquet" + def shredded = outFilePath + "iceberg_variant_shredded.parquet" + def typedOnly = outFilePath + "iceberg_variant_typed_only.parquet" + def temporalUnshredded = outFilePath + "iceberg_variant_temporal_unshredded.parquet" + def temporalTyped = outFilePath + "iceberg_variant_temporal_typed.parquet" + def binaryUnshredded = outFilePath + "iceberg_variant_binary_unshredded.parquet" + def binaryTyped = outFilePath + "iceberg_variant_binary_typed.parquet" + def profileAction = new ProfileAction(context) + def getProfileByToken = { 
token -> + for (int i = 0; i < 60; ++i) { + List profileData = profileAction.getProfileList() + for (final def profileItem in profileData) { + if (profileItem["Sql Statement"].toString().contains(token)) { + def profileText = profileAction.getProfile(profileItem["Profile ID"].toString()).toString() + if (profileText.contains("ParquetReadColumnPaths")) { + return profileText + } + } + } + Thread.sleep(1000) + } + assertTrue(false) + } + def getParquetReadColumnPathSet = { profileText -> + def parquetReadColumnPaths = profileText.readLines().find { it.contains("ParquetReadColumnPaths") } + assertTrue(parquetReadColumnPaths != null) + logger.info("Iceberg variant shredding ${parquetReadColumnPaths}") + def separatorIndex = parquetReadColumnPaths.indexOf(":") + assertTrue(separatorIndex >= 0) + return parquetReadColumnPaths.substring(separatorIndex + 1) + .split(",") + .collect { it.trim() } + .findAll { !it.isEmpty() } as Set + } + def getProfileCounter = { profileText, counterName -> + def counterLine = profileText.readLines().find { it.contains(counterName) } + assertTrue(counterLine != null) + def matcher = counterLine =~ /${counterName}:\s*([0-9,]+)/ + assertTrue(matcher.find()) + return matcher.group(1).replace(",", "").toLong() + } + + qt_desc_unshredded """ + desc function local( + "file_path" = "${unshredded}", + "backend_id" = "${beId}", + "format" = "parquet") + """ + + qt_desc_typed_only """ + desc function local( + "file_path" = "${typedOnly}", + "backend_id" = "${beId}", + "format" = "parquet") + """ + + order_qt_unshredded_complex """ + select id, + cast(v['id'] as int) as variant_id, + cast(v['name'] as string) as name, + cast(v['metric'] as bigint) as metric, + cast(v['nested']['score'] as int) as score, + cast(v['nested']['flag'] as boolean) as flag, + cast(v['arr'] as array)[1] as first_arr, + cast(v['arr'] as array)[2] as second_arr + from local( + "file_path" = "${unshredded}", + "backend_id" = "${beId}", + "format" = "parquet") + order by id + """ 
+ + order_qt_shredded_fields """ + select id, + cast(v['metric'] as bigint) as metric, + cast(v['name'] as string) as name + from local( + "file_path" = "${shredded}", + "backend_id" = "${beId}", + "format" = "parquet") + order by id + """ + + order_qt_shredded_full_variant_with_scalar """ + select id, + cast(v as string) like '%"name":"name-%' as has_name, + cast(v as string) like '%"metric":%' as has_metric + from local( + "file_path" = "${shredded}", + "backend_id" = "${beId}", + "format" = "parquet") + order by id + """ + + order_qt_typed_only_fields """ + select id, + cast(v['metric'] as bigint) as metric, + cast(v['nested']['x'] as string) as nested_x, + cast(v['f'] as string) is null as non_finite_float_is_null, + cast(v['items'] as string) as items + from local( + "file_path" = "${typedOnly}", + "backend_id" = "${beId}", + "format" = "parquet") + order by id + """ + + order_qt_typed_only_missing_field """ + select id, + cast(v['missing'] as string) is null as missing_is_null + from local( + "file_path" = "${typedOnly}", + "backend_id" = "${beId}", + "format" = "parquet") + order by id + """ + + order_qt_typed_only_nested_missing_field """ + select id, + cast(v['nested']['missing'] as string) is null as missing_is_null + from local( + "file_path" = "${typedOnly}", + "backend_id" = "${beId}", + "format" = "parquet") + order by id + """ + + order_qt_temporal_parity """ + select u.id, + cast(u.v['d'] as bigint) as unshredded_date, + cast(t.v['d'] as bigint) as typed_date, + cast(u.v['t'] as bigint) as unshredded_time, + cast(t.v['t'] as bigint) as typed_time, + cast(u.v['ts'] as bigint) as unshredded_ts, + cast(t.v['ts'] as bigint) as typed_ts, + cast(u.v['d'] as bigint) = cast(t.v['d'] as bigint) as same_date, + cast(u.v['t'] as bigint) = cast(t.v['t'] as bigint) as same_time, + cast(u.v['ts'] as bigint) = cast(t.v['ts'] as bigint) as same_ts + from local( + "file_path" = "${temporalUnshredded}", + "backend_id" = "${beId}", + "format" = "parquet") u + join 
local( + "file_path" = "${temporalTyped}", + "backend_id" = "${beId}", + "format" = "parquet") t + on u.id = t.id + order by u.id + """ + + def binaryUnshreddedRows = sql """ + select id, hex(cast(v['b'] as varbinary)) + from local( + "file_path" = "${binaryUnshredded}", + "backend_id" = "${beId}", + "format" = "parquet", + "enable_mapping_varbinary" = "true") + order by id + """ + assertEquals(2, binaryUnshreddedRows.size()) + assertEquals("1", binaryUnshreddedRows[0][0].toString()) + assertEquals("FF0041", binaryUnshreddedRows[0][1].toString()) + assertEquals("2", binaryUnshreddedRows[1][0].toString()) + assertEquals("C328", binaryUnshreddedRows[1][1].toString()) + + def binaryTypedRows = sql """ + select id, hex(cast(v['b'] as varbinary)) + from local( + "file_path" = "${binaryTyped}", + "backend_id" = "${beId}", + "format" = "parquet", + "enable_mapping_varbinary" = "true") + order by id + """ + assertEquals(2, binaryTypedRows.size()) + assertEquals("1", binaryTypedRows[0][0].toString()) + assertEquals("FF0041", binaryTypedRows[0][1].toString()) + assertEquals("2", binaryTypedRows[1][0].toString()) + assertEquals("C328", binaryTypedRows[1][1].toString()) + + try { + sql """ set enable_profile = true """ + sql """ set profile_level = 2 """ + def profileToken = UUID.randomUUID().toString() + sql """ + select "${profileToken}", sum(cast(v['metric'] as bigint)) + from local( + "file_path" = "${shredded}", + "backend_id" = "${beId}", + "format" = "parquet") + """ + def profile = getProfileByToken(profileToken) + def metricColumnPaths = getParquetReadColumnPathSet(profile) + assertTrue(metricColumnPaths.contains("v.metadata")) + // typed_value.metric.value is the field-level residual fallback for mixed-type metric rows. + // The top-level v.value stores non-shredded object fields and is not needed for this projection. 
+        assertTrue(metricColumnPaths.contains("v.typed_value.metric.value"))
+        assertTrue(metricColumnPaths.contains("v.typed_value.metric.typed_value"))
+        assertFalse(metricColumnPaths.contains("v.value"))
+        assertFalse(metricColumnPaths.contains("v.typed_value.name"))
+
+        def nestedProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${nestedProfileToken}", count(cast(v['metric']['x'] as string))
+            from local(
+                "file_path" = "${shredded}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+        """
+        def nestedProfile = getProfileByToken(nestedProfileToken)
+        def nestedColumnPaths = getParquetReadColumnPathSet(nestedProfile)
+        assertTrue(nestedColumnPaths.contains("v.typed_value.metric.typed_value.x"))
+        // metric.value is required for rows that store metric as field-level residual instead of metric.typed_value.
+        assertTrue(nestedColumnPaths.contains("v.metadata"))
+        assertTrue(nestedColumnPaths.contains("v.typed_value.metric.value"))
+        assertFalse(nestedColumnPaths.contains("v.value"))
+        assertFalse(nestedColumnPaths.contains("v.typed_value.name"))
+        assertEquals(0, getProfileCounter(nestedProfile, "VariantDirectTypedValueReadRows"))
+        assertTrue(getProfileCounter(nestedProfile, "VariantRowWiseReadRows") > 0)
+
+        def typedOnlyProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${typedOnlyProfileToken}", sum(cast(v['metric'] as bigint))
+            from local(
+                "file_path" = "${typedOnly}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+        """
+        def typedOnlyProfile = getProfileByToken(typedOnlyProfileToken)
+        def typedOnlyColumnPaths = getParquetReadColumnPathSet(typedOnlyProfile)
+        assertTrue(typedOnlyColumnPaths.contains("v.typed_value.metric"))
+        assertFalse(typedOnlyColumnPaths.contains("v.metadata"))
+        assertFalse(typedOnlyColumnPaths.contains("v.value"))
+        assertTrue(getProfileCounter(typedOnlyProfile, "VariantDirectTypedValueReadRows") > 0)
+        assertEquals(0, getProfileCounter(typedOnlyProfile, "VariantRowWiseReadRows"))
+
+        def typedOnlyNestedProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${typedOnlyNestedProfileToken}", count(cast(v['nested']['x'] as string))
+            from local(
+                "file_path" = "${typedOnly}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+        """
+        def typedOnlyNestedProfile = getProfileByToken(typedOnlyNestedProfileToken)
+        def typedOnlyNestedColumnPaths = getParquetReadColumnPathSet(typedOnlyNestedProfile)
+        assertTrue(typedOnlyNestedColumnPaths.contains("v.typed_value.nested.typed_value.x"))
+        assertFalse(typedOnlyNestedColumnPaths.contains("v.metadata"))
+        assertFalse(typedOnlyNestedColumnPaths.contains("v.typed_value.nested.value"))
+        assertFalse(typedOnlyNestedColumnPaths.contains("v.value"))
+        assertTrue(getProfileCounter(typedOnlyNestedProfile, "VariantDirectTypedValueReadRows") > 0)
+        assertEquals(0, getProfileCounter(typedOnlyNestedProfile, "VariantRowWiseReadRows"))
+
+        def binaryTypedProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${binaryTypedProfileToken}", max(hex(cast(v['b'] as varbinary)))
+            from local(
+                "file_path" = "${binaryTyped}",
+                "backend_id" = "${beId}",
+                "format" = "parquet",
+                "enable_mapping_varbinary" = "true")
+        """
+        def binaryTypedProfile = getProfileByToken(binaryTypedProfileToken)
+        def binaryTypedColumnPaths = getParquetReadColumnPathSet(binaryTypedProfile)
+        assertTrue(binaryTypedColumnPaths.contains("v.typed_value.b"))
+        assertFalse(binaryTypedColumnPaths.contains("v.metadata"))
+        assertFalse(binaryTypedColumnPaths.contains("v.value"))
+        assertTrue(getProfileCounter(binaryTypedProfile, "VariantDirectTypedValueReadRows") > 0)
+        assertEquals(0, getProfileCounter(binaryTypedProfile, "VariantRowWiseReadRows"))
+
+        def typedOnlyMissingProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${typedOnlyMissingProfileToken}", count(cast(v['missing'] as string))
+            from local(
+                "file_path" = "${typedOnly}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+        """
+        def typedOnlyMissingProfile = getProfileByToken(typedOnlyMissingProfileToken)
+        def typedOnlyMissingColumnPaths = getParquetReadColumnPathSet(typedOnlyMissingProfile)
+        assertTrue(typedOnlyMissingColumnPaths.contains("v.metadata"))
+        assertFalse(typedOnlyMissingColumnPaths.any { it.startsWith("v.typed_value") })
+        assertFalse(typedOnlyMissingColumnPaths.contains("v.value"))
+
+        def typedOnlyNestedMissingProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${typedOnlyNestedMissingProfileToken}", count(cast(v['nested']['missing'] as string))
+            from local(
+                "file_path" = "${typedOnly}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+        """
+        def typedOnlyNestedMissingProfile = getProfileByToken(typedOnlyNestedMissingProfileToken)
+        def typedOnlyNestedMissingColumnPaths = getParquetReadColumnPathSet(typedOnlyNestedMissingProfile)
+        assertTrue(typedOnlyNestedMissingColumnPaths.contains("v.metadata"))
+        assertFalse(typedOnlyNestedMissingColumnPaths.any { it.startsWith("v.typed_value.nested.typed_value") })
+        assertFalse(typedOnlyNestedMissingColumnPaths.contains("v.typed_value.nested.value"))
+        assertFalse(typedOnlyNestedMissingColumnPaths.contains("v.value"))
+
+        def temporalProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${temporalProfileToken}",
+                   sum(cast(v['d'] as bigint) + cast(v['t'] as bigint) + cast(v['ts'] as bigint))
+            from local(
+                "file_path" = "${temporalTyped}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+        """
+        def temporalProfile = getProfileByToken(temporalProfileToken)
+        def temporalColumnPaths = getParquetReadColumnPathSet(temporalProfile)
+        assertTrue(temporalColumnPaths.contains("v.typed_value.d"))
+        assertTrue(temporalColumnPaths.contains("v.typed_value.t"))
+        assertTrue(temporalColumnPaths.contains("v.typed_value.ts"))
+        assertFalse(temporalColumnPaths.contains("v.metadata"))
+        assertFalse(temporalColumnPaths.contains("v.value"))
+        assertTrue(getProfileCounter(temporalProfile, "VariantDirectTypedValueReadRows") > 0)
+        assertEquals(0, getProfileCounter(temporalProfile, "VariantRowWiseReadRows"))
+
+        def caseProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${caseProfileToken}", count(cast(v['Name'] as string))
+            from local(
+                "file_path" = "${shredded}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+        """
+        def caseProfile = getProfileByToken(caseProfileToken)
+        def caseColumnPaths = getParquetReadColumnPathSet(caseProfile)
+        assertTrue(caseColumnPaths.contains("v.metadata"))
+        assertTrue(caseColumnPaths.contains("v.value"))
+        assertFalse(caseColumnPaths.contains("v.typed_value.name"))
+
+        def fullVariantWithPredicateProfileToken = UUID.randomUUID().toString()
+        sql """
+            select "${fullVariantWithPredicateProfileToken}", cast(v as string)
+            from local(
+                "file_path" = "${shredded}",
+                "backend_id" = "${beId}",
+                "format" = "parquet")
+            where cast(v['metric'] as bigint) >= 20
+        """
+        def fullVariantWithPredicateProfile = getProfileByToken(fullVariantWithPredicateProfileToken)
+        def fullVariantWithPredicateColumnPaths = getParquetReadColumnPathSet(fullVariantWithPredicateProfile)
+        assertTrue(fullVariantWithPredicateColumnPaths.contains("v.metadata"))
+        assertTrue(fullVariantWithPredicateColumnPaths.contains("v.value"))
+        assertTrue(fullVariantWithPredicateColumnPaths.contains("v.typed_value.metric.value"))
+        assertTrue(fullVariantWithPredicateColumnPaths.contains("v.typed_value.metric.typed_value"))
+        assertTrue(fullVariantWithPredicateColumnPaths.contains("v.typed_value.name"))
+
+        order_qt_complex_join """
+            select u.id,
+                   cast(u.v['name'] as string) as name,
+                   cast(s.v['metric'] as bigint) as metric
+            from local(
+                "file_path" = "${unshredded}",
+                "backend_id" = "${beId}",
+                "format" = "parquet") u
+            join local(
+                "file_path" = "${shredded}",
+                "backend_id" = "${beId}",
+                "format" = "parquet") s
+            on u.id = s.id
+            where cast(u.v['nested']['score'] as int) >= 20
+            order by u.id
+        """
+    } finally {
+        sql """ set enable_profile = false """
+        sql """ set profile_level = 0 """
+    }
+}