[CARBONDATA-4083] Refactor Update and Support Update Atomicity #4004
marchpure wants to merge 1 commit into apache:master from …
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3019/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4777/
Force-pushed from ee7b4c4 to 67a0dad
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4805/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3048/
Force-pushed from 67a0dad to 5119fdc
retest this please
retest this please
Force-pushed from 68ad12c to 3733726
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4837/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3083/
Force-pushed from 3733726 to be3af3f
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4838/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3084/
Force-pushed from be3af3f to c5175e4
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3086/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4840/
Force-pushed from c5175e4 to 7cfbc8a
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3087/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4841/
Force-pushed from 7cfbc8a to e9ff937
retest this please
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4842/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3089/
Force-pushed from e9ff937 to b6bb945
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4843/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3090/
Force-pushed from b6bb945 to 4d569df
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5150/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3388/
Force-pushed from 7ccef61 to 44e5997
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3389/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5151/
Force-pushed from 44e5997 to 6b51881
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3390/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5152/
Force-pushed from 6b51881 to 23b99ae
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5153/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3391/
Why is this PR needed?
Currently, the update flow modifies the tablestatus file several times. In total, 4 tablestatus write operations break the atomicity of UPDATE to some extent, which may leave dirty data behind when an update fails.
The first tablestatus update happens when the delta files are written: the updatedeltastarttime and updatedeltaendtime are first updated in tablestatus, and then some segments are deleted, which results in 2 tablestatus write operations.
The second tablestatus update happens when the new data is inserted. As with the first, it results in 2 tablestatus write operations.
Also, auto compaction doesn't work for UPDATE: UPDATE won't trigger MINOR compaction even when carbon.merge.auto.compaction is turned on.
What changes were proposed in this PR?
1. Code cleanup for Update and Delete.
2. Update tablestatus only once in the whole update flow. During loading in the UPDATE flow, the tablestatus update is skipped and the load metadata details are cached in UpdateTableModel.addedLoadDetail instead.
For non-partition tables (CarbonDataRDDFactory), we use 'updateModel.get.addedLoadDetail = Some(metadataDetails)'.
For partition tables, we deserialize 'serializedNewMetaEntry' in CommonLoadUtils.scala.
Finally, the load details write, the update delta time update, and the segment deletion are all completed together in DeleteExecution.checkAndUpdateStatusFiles.
3. Trigger minor compaction in UPDATE if needed. It's disabled by default.
Does this PR introduce any user interface change?
No
Is any new testcase added?
Yes
Co-authored-by: shenjiayu17 <shenjiayu.hust@foxmail.com>
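The idea of staging every tablestatus change and writing the file only once can be sketched in Java as below. This is a simplified illustration under stated assumptions, not the PR's implementation: LoadDetail, UpdateStatusBatch and commit are hypothetical names, while the real flow caches the new entry in UpdateTableModel.addedLoadDetail and finishes in DeleteExecution.checkAndUpdateStatusFiles. The "Marked for Delete" status string and the reuse of one timestamp for both update delta times are also simplifications. Each stage only records its change in memory; commit() applies all of them to a copy of the current entries and calls the writer exactly once, so a failure before that call leaves the old tablestatus untouched.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical, heavily simplified stand-in for one tablestatus entry;
// CarbonData's real LoadMetadataDetails carries many more fields.
final class LoadDetail {
    final String segmentId;
    String status;               // simplified status label, e.g. "Success"
    String updateDeltaStartTime; // empty until the segment is updated
    String updateDeltaEndTime;

    LoadDetail(String segmentId, String status) {
        this.segmentId = segmentId;
        this.status = status;
        this.updateDeltaStartTime = "";
        this.updateDeltaEndTime = "";
    }

    LoadDetail copy() {
        LoadDetail c = new LoadDetail(segmentId, status);
        c.updateDeltaStartTime = updateDeltaStartTime;
        c.updateDeltaEndTime = updateDeltaEndTime;
        return c;
    }
}

// Hypothetical accumulator for one UPDATE: nothing is written until commit().
final class UpdateStatusBatch {
    private Optional<LoadDetail> addedLoadDetail = Optional.empty();
    private final Set<String> updatedSegments = new LinkedHashSet<>();
    private final Set<String> deletedSegments = new LinkedHashSet<>();

    // Loading step: cache the new segment's entry instead of writing it.
    void setAddedLoadDetail(LoadDetail detail) { addedLoadDetail = Optional.of(detail); }

    // Delta-file step: remember which segments got update delta files.
    void markUpdated(String segmentId) { updatedSegments.add(segmentId); }

    // Delta-file step: remember which segments must be marked deleted.
    void markDeleted(String segmentId) { deletedSegments.add(segmentId); }

    // Applies every staged change to a copy of the current entries and
    // hands the full new list to the writer in a single call.
    List<LoadDetail> commit(List<LoadDetail> current, String timestamp,
                            Consumer<List<LoadDetail>> writeTableStatus) {
        List<LoadDetail> next = new ArrayList<>();
        for (LoadDetail d : current) {
            LoadDetail c = d.copy(); // never mutate the caller's entries
            if (updatedSegments.contains(c.segmentId)) {
                c.updateDeltaStartTime = timestamp;
                c.updateDeltaEndTime = timestamp;
            }
            if (deletedSegments.contains(c.segmentId)) {
                c.status = "Marked for Delete";
            }
            next.add(c);
        }
        addedLoadDetail.ifPresent(d -> next.add(d.copy()));
        writeTableStatus.accept(next); // the only tablestatus write
        return next;
    }
}
```

Because every change reaches the writer in one call, the writer is the only place where atomicity still has to be guaranteed, for example by writing a temporary file and renaming it over tablestatus; this PR does not state exactly how CarbonData's writer does that.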
Force-pushed from 23b99ae to 55f6221
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5154/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3392/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5159/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3397/
retest this please
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3407/
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5169/
retest this please
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5172/
Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3410/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3305/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5064/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3306/
Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3302/
Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5066/