[VL] Support native Parquet write for complex types (Struct/Array/Map) #11788
Open
Zouxxyy wants to merge 2 commits into apache:main
Conversation
Run Gluten Clickhouse CI on x86
Contributor
Author
Generated-by: Kiro (Claude Opus 4.6)
Contributor
Pull request overview
This PR enables Velox’s native Parquet write path to support complex Spark SQL types (Struct/Array/Map) by removing earlier type-gating and adjusting the Velox write validation to allow nested types for Parquet.
Changes:
- Removes the schema-based “native write supported” gate (supportNativeWrite) and relies on the WriteFiles validation path instead.
- Updates Velox write validation to allow Parquet StructType (still blocks YearMonthIntervalType) and reorders validation checks.
- Simplifies Delta Parquet native-writability checks and adds new Velox Parquet write tests for complex/nested types.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| gluten-substrait/src/main/scala/org/apache/spark/sql/execution/datasources/GlutenWriterColumnarRules.scala | Removes schema gate before enabling native write properties/adaptor injection. |
| gluten-substrait/src/main/scala/org/apache/gluten/backendsapi/BackendSettingsApi.scala | Deletes supportNativeWrite from the backend settings API. |
| backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxBackend.scala | Allows Parquet struct types; refactors/reorders native write validation chain. |
| backends-velox/src/test/scala/org/apache/spark/sql/execution/VeloxParquetWriteSuite.scala | Adds native Parquet write coverage for struct/array/map and nested struct. |
| backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/files/GlutenDeltaFileFormatWriter.scala | Removes dependency on deleted Parquet companion helper; forces native-writable flag. |
| backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenParquetFileFormat.scala | Removes fallback branch + companion object; always uses Gluten Parquet OutputWriterFactory. |
Comment on lines 101 to 106

      case rc @ DataWritingCommandExec(cmd, child) =>
        // The same thread can set these properties in the last query submission.
        val format =
    -     if (
    -       BackendsApiManager.getSettings.supportNativeWrite(child.schema.fields) &&
    -       BackendsApiManager.getSettings.enableNativeWriteFiles()
    -     ) {
    +     if (BackendsApiManager.getSettings.enableNativeWriteFiles()) {
            getNativeFormat(cmd)
          } else {
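
The change above leaves a single flag check at this call site and defers type checks to the later write validation. A minimal sketch of that two-stage decision follows. It is not Gluten code: `Backend`, `WriteRequest`, `WritePlanner`, and `validateWrite` are hypothetical names used only for illustration, and the real fallback behaviour may differ.

```scala
// Hypothetical stand-ins for illustration only; these are NOT Gluten APIs.
final case class WriteRequest(format: String, columnTypes: Seq[String])

trait Backend {
  // Mirrors the flag check that remains at the call site in the diff.
  def enableNativeWriteFiles(): Boolean
  // Assumed validation hook: Some(reason) means "fall back", None means "supported".
  def validateWrite(req: WriteRequest): Option[String]
}

object WritePlanner {
  sealed trait Decision
  case object Native extends Decision
  final case class Fallback(reason: String) extends Decision

  // Stage 1: the feature flag. Stage 2: validation decides native vs. fallback,
  // so no separate schema gate is needed before this point.
  def plan(backend: Backend, req: WriteRequest): Decision =
    if (!backend.enableNativeWriteFiles()) {
      Fallback("native write disabled")
    } else {
      backend.validateWrite(req) match {
        case Some(reason) => Fallback(reason)
        case None         => Native
      }
    }
}
```

Keeping the type rules in one validation step means a newly supported type only has to be allowed in one place.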
Comment on lines 33 to 42

      import org.slf4j.LoggerFactory

      class GlutenParquetFileFormat
        extends ParquetFileFormat
        with DataSourceRegister
        with Logging
        with Serializable {
    -   import GlutenParquetFileFormat._

        private val logger = LoggerFactory.getLogger(classOf[GlutenParquetFileFormat])
What changes are proposed in this pull request?
Enable native Parquet write for complex types (Struct/Array/Map) in the Velox backend.
Velox's Parquet writer converts vectors to Arrow and then writes them via Arrow's Parquet writer, which natively supports nested types. The previous Scala-side type restrictions were unnecessary.
Changes:
- Remove the supportNativeWrite gate; it is no longer needed since supportWriteFilesExec handles validation.
- Relax validateDataTypes for Parquet (only YearMonthIntervalType remains blocked, as Arrow has no mapping for it).
- Make validateDataTypes recursively check nested types for YearMonthIntervalType.
How was this patch tested?
New tests in VeloxParquetWriteSuite. Existing tests pass.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Kiro (Claude Opus 4.6)
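
The recursive check for YearMonthIntervalType listed under Changes can be illustrated with a small sketch. It uses a hypothetical miniature type model rather than Spark's `org.apache.spark.sql.types`, and `firstUnsupported` is an illustrative name, not the PR's actual validateDataTypes implementation.

```scala
// Hypothetical mini type model for illustration; the real code operates on
// Spark's DataType hierarchy, which is not reproduced here.
sealed trait DataType
case object IntType extends DataType
case object StringType extends DataType
case object YearMonthIntervalType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructField(name: String, dataType: DataType)
final case class StructType(fields: Seq[StructField]) extends DataType

object NestedTypeValidation {
  /** Returns Some(reason) if the type, or any type nested inside it, is unsupported. */
  def firstUnsupported(dt: DataType, path: String = "root"): Option[String] = dt match {
    case YearMonthIntervalType =>
      Some(s"$path: YearMonthIntervalType is not supported")
    case ArrayType(elem) =>
      firstUnsupported(elem, s"$path.element")
    case MapType(k, v) =>
      // Check the key first, then the value.
      firstUnsupported(k, s"$path.key").orElse(firstUnsupported(v, s"$path.value"))
    case StructType(fields) =>
      // The iterator keeps the traversal lazy: it stops at the first offending field.
      fields.iterator
        .map(f => firstUnsupported(f.dataType, s"$path.${f.name}"))
        .collectFirst { case Some(reason) => reason }
    case _ =>
      None // primitive types in this sketch are always accepted
  }
}
```

A top-level-only check would accept a struct that hides an interval column inside an array or map; walking the whole type tree closes that gap.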