2PC and staging output #3068
    outputPath: String,
    committer: Option[FileCommitProtocol] = None,
    jobTrackerID: String = Utils.createTempDir().getName)
case class CometNativeWriteExec(nativeOp: Operator, child: SparkPlan, outputPath: String)
basic execution that delegates to native writer
@Shekharrajak Thank you for your work. The file commit protocol has already been implemented in #2828, and work_dir is the staging dir. Is my understanding correct? cc @comphead @andygrove
I think the original implementation duplicated what InsertIntoHadoopFsRelationCommand already does. With the changes in this PR, we no longer manage FileCommitProtocol ourselves but delegate it to Spark.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3068 +/- ##
============================================
+ Coverage 56.12% 59.54% +3.41%
- Complexity 976 1374 +398
============================================
Files 119 167 +48
Lines 11743 15461 +3718
Branches 2251 2570 +319
============================================
+ Hits 6591 9206 +2615
- Misses 4012 4961 +949
- Partials 1140 1294 +154
Which issue does this PR close?
Closes #3015.
Rationale for this change
The native Parquet writer needed a fix to use output_path as the base directory for file writes when work_dir is not set. Without this fix, files were being written to root (/) instead of the intended output directory.
What changes are included in this PR?
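The fallback described in the rationale can be sketched roughly as follows. This is an illustration only, not the code in this PR: the helper names `resolve_base_dir`, `part_file_path`, and `naive_join` are hypothetical, and the idea that an empty base joined by string concatenation is what sent files to root is an assumption made for the example.

```rust
use std::path::PathBuf;

/// Hypothetical helper: pick the directory that part files are written under.
/// Falls back to `output_path` when `work_dir` is empty.
fn resolve_base_dir(work_dir: &str, output_path: &str) -> PathBuf {
    let base = if work_dir.is_empty() { output_path } else { work_dir };
    PathBuf::from(base)
}

/// Hypothetical helper: full path of a part file under the resolved base directory.
fn part_file_path(work_dir: &str, output_path: &str, file_name: &str) -> PathBuf {
    resolve_base_dir(work_dir, output_path).join(file_name)
}

/// Assumed failure mode, for illustration: joining an empty base with "/"
/// produces a path directly under the filesystem root.
fn naive_join(base: &str, file_name: &str) -> String {
    format!("{}/{}", base, file_name)
}
```

With an empty `work_dir`, `part_file_path("", "/tmp/out", "part-00000.parquet")` resolves under `/tmp/out`, while `naive_join("", "part-00000.parquet")` would point at `/part-00000.parquet`.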
- staging_file_path field to ParquetWriter message for future 2PC support
- output_path as fallback when work_dir is empty
- CometNativeWriteExec to write directly to output path
- CometParquetWriter2PCSuite with basic write functionality tests
How are these changes tested?
Added CometParquetWriter2PCSuite with 5 tests: