Conversation

@kazantsev-maksim kazantsev-maksim commented Dec 28, 2025

Which issue does this PR close?

  • N/A

Rationale for this change

Adds a basic implementation of the Spark `to_csv` function - https://spark.apache.org/docs/latest/api/sql/index.html#to_csv

  1. Handling of complex types must be implemented in a future iteration.
  2. The processing of types such as DateType, TimestampType, TimestampNTZType, and BinaryType is currently inconsistent with Spark's behavior.
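For context, `to_csv` turns a struct column into one CSV-formatted string per row. A minimal Rust sketch of the per-row formatting step, assuming RFC 4180-style quoting (the function name `format_csv_row` and its structure are illustrative, not the code in this PR):

```rust
/// Hypothetical sketch of a per-row CSV formatter: join field values with a
/// delimiter, quoting any field that contains the delimiter, the quote
/// character, or a newline. Not the PR's actual implementation.
fn format_csv_row(fields: &[String], delimiter: char, quote: char) -> String {
    fields
        .iter()
        .map(|f| {
            if f.contains(delimiter) || f.contains(quote) || f.contains('\n') {
                // Quote the field and double any embedded quote characters.
                let escaped = f.replace(quote, &format!("{quote}{quote}"));
                format!("{quote}{escaped}{quote}")
            } else {
                f.clone()
            }
        })
        .collect::<Vec<_>>()
        .join(&delimiter.to_string())
}

fn main() {
    let row = vec!["1".to_string(), "a,b".to_string(), "plain".to_string()];
    println!("{}", format_csv_row(&row, ',', '"'));
}
```

This only covers scalar string fields; null handling and the Spark type-specific formatting mentioned above are out of scope for the sketch.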

What changes are included in this PR?

How are these changes tested?

  1. Added unit tests
  2. Added benchmark tests

Benchmark results (optimization still needed) are attached as the `to_csv_benchmark_result` image.

@kazantsev-maksim kazantsev-maksim marked this pull request as draft December 28, 2025 13:38
@kazantsev-maksim kazantsev-maksim marked this pull request as ready for review January 9, 2026 10:56
CsvWriteOptions options = 2;
}

message CsvWriteOptions {


codecov-commenter commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 90.24390% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.58%. Comparing base (f09f8af) to head (cf544c7).
⚠️ Report is 845 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...rc/main/scala/org/apache/comet/serde/structs.scala | 89.74% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3004      +/-   ##
============================================
+ Coverage     56.12%   59.58%   +3.45%     
- Complexity      976     1381     +405     
============================================
  Files           119      167      +48     
  Lines         11743    15562    +3819     
  Branches       2251     2577     +326     
============================================
+ Hits           6591     9272    +2681     
- Misses         4012     4992     +980     
- Partials       1140     1298     +158     

☔ View full report in Codecov by Sentry.
@parthchandra parthchandra left a comment


(Sorry for the delay in reviewing.) This looks pretty good to me, pending CI.
Also a minor comment on escaping. Can you confirm that this behaviour is consistent with Spark's?

fn escape_value(value: &str, quote: &str, escape: &str, output: &mut String) {
    for ch in value.chars() {
        let ch_str = ch.to_string();
        if ch_str == quote || ch_str == escape {
            output.push_str(escape);
        }
        output.push(ch);
    }
}

The CSV spec does not define a special escape character, and the preferred way to escape a double quote is with another double quote (but only when the field is enclosed in double quotes) - https://datatracker.ietf.org/doc/html/rfc4180#section-2
Not sure what Spark does here.
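To illustrate the RFC 4180 convention being referenced: a field containing a double quote is wrapped in double quotes and the embedded quote is doubled, with no separate escape character. A small sketch of that convention (not this PR's code; Spark exposes separate `quote` and `escape` options, so its default behaviour may differ):

```rust
/// RFC 4180-style quoting: enclose the field in double quotes and represent
/// an embedded double quote by doubling it. Illustrative only.
fn rfc4180_quote(field: &str) -> String {
    let mut out = String::from("\"");
    for ch in field.chars() {
        if ch == '"' {
            out.push('"'); // double the embedded quote
        }
        out.push(ch);
    }
    out.push('"');
    out
}

fn main() {
    // `say "hi"` becomes `"say ""hi"""`
    println!("{}", rfc4180_quote("say \"hi\""));
}
```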
