Conversation

@kazantsev-maksim kazantsev-maksim commented Dec 28, 2025

Which issue does this PR close?

  • N/A

Rationale for this change

Adds a basic implementation of the Spark `to_csv` function - https://spark.apache.org/docs/latest/api/sql/index.html#to_csv

  1. Handling of complex types must be implemented in a future iteration.
  2. The processing of types such as DateType, TimestampType, TimestampNTZType, and BinaryType is currently inconsistent with Spark's behavior.
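For context, `to_csv` turns a struct column into one CSV-formatted string per row. A minimal Rust sketch of the per-row formatting step, assuming RFC 4180-style quoting (the function name `format_csv_row` and its structure are illustrative, not the code in this PR):

```rust
/// Hypothetical sketch of a per-row CSV formatter: join field values with a
/// delimiter, quoting any field that contains the delimiter, the quote
/// character, or a newline. Not the PR's actual implementation.
fn format_csv_row(fields: &[String], delimiter: char, quote: char) -> String {
    fields
        .iter()
        .map(|f| {
            if f.contains(delimiter) || f.contains(quote) || f.contains('\n') {
                // Quote the field and double any embedded quote characters.
                let escaped = f.replace(quote, &format!("{quote}{quote}"));
                format!("{quote}{escaped}{quote}")
            } else {
                f.clone()
            }
        })
        .collect::<Vec<_>>()
        .join(&delimiter.to_string())
}

fn main() {
    let row = vec!["1".to_string(), "a,b".to_string(), "plain".to_string()];
    println!("{}", format_csv_row(&row, ',', '"'));
}
```

This only covers scalar string fields; null handling and the Spark type-specific formatting mentioned above are out of scope for the sketch.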

What changes are included in this PR?

How are these changes tested?

  1. Added unit tests
  2. Added benchmark tests

Benchmark results (optimization still needed) are attached as the `to_csv_benchmark_result` image.

@kazantsev-maksim kazantsev-maksim marked this pull request as draft December 28, 2025 13:38
@kazantsev-maksim kazantsev-maksim marked this pull request as ready for review January 9, 2026 10:56
CsvWriteOptions options = 2;
}

message CsvWriteOptions {


codecov-commenter commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 90.24390% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.58%. Comparing base (f09f8af) to head (cf544c7).
⚠️ Report is 845 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...rc/main/scala/org/apache/comet/serde/structs.scala | 89.74% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3004      +/-   ##
============================================
+ Coverage     56.12%   59.58%   +3.45%     
- Complexity      976     1381     +405     
============================================
  Files           119      167      +48     
  Lines         11743    15562    +3819     
  Branches       2251     2577     +326     
============================================
+ Hits           6591     9272    +2681     
- Misses         4012     4992     +980     
- Partials       1140     1298     +158     

☔ View full report in Codecov by Sentry.
@parthchandra parthchandra left a comment


(Sorry for the delay in reviewing.) This looks pretty good to me, pending CI.
Also a minor comment on escaping. Can you confirm that this behaviour is consistent with Spark's?

fn escape_value(value: &str, quote: &str, escape: &str, output: &mut String) {
    for ch in value.chars() {
        let ch_str = ch.to_string();
        if ch_str == quote || ch_str == escape {
            output.push_str(escape);
        }
        output.push(ch);
    }
}

The CSV spec does not define a special escape character, and the preferred way to escape a double quote is with another double quote (but only when the field is enclosed in double quotes) - https://datatracker.ietf.org/doc/html/rfc4180#section-2
Not sure what Spark does here.
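To illustrate the RFC 4180 convention being referenced: a field containing a double quote is wrapped in double quotes and the embedded quote is doubled, with no separate escape character. A small sketch of that convention (not this PR's code; Spark exposes separate `quote` and `escape` options, so its default behaviour may differ):

```rust
/// RFC 4180-style quoting: enclose the field in double quotes and represent
/// an embedded double quote by doubling it. Illustrative only.
fn rfc4180_quote(field: &str) -> String {
    let mut out = String::from("\"");
    for ch in field.chars() {
        if ch == '"' {
            out.push('"'); // double the embedded quote
        }
        out.push(ch);
    }
    out.push('"');
    out
}

fn main() {
    // `say "hi"` becomes `"say ""hi"""`
    println!("{}", rfc4180_quote("say \"hi\""));
}
```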
