
[CARBONDATA-4083] Refactor Update and Support Update Atomicity#4004

Open
marchpure wants to merge 1 commit into apache:master from marchpure:update_benchmark

@marchpure
Contributor

@marchpure marchpure commented Nov 5, 2020

Why is this PR needed?

Currently, the update flow modifies the tablestatus file several times. In total, 4 tablestatus write ops break the atomicity of UPDATE to a certain extent, which may leave dirty data behind if the update fails partway.
The first round of tablestatus updates happens when writing delta files: first we update updatedeltastarttime and updatedeltaendtime in the tablestatus, then we delete some segments, which results in 2 tablestatus write ops.
The second round happens when inserting new data. As in the first round, this results in 2 tablestatus write ops.
Also, auto compaction doesn't work for UPDATE: UPDATE won't trigger MINOR compaction even when carbon.merge.auto.compaction is turned on.

What changes were proposed in this PR?

  1. Code cleanup for Update and Delete.
  2. Modify the tablestatus only once in the whole update flow. During the load step of the UPDATE flow, we skip the tablestatus update and instead cache the load metadata details (loadmetadetails) in UpdateTableModel.addedLoadDetail.
    For non-partition tables (CarbonDataRDDFactory), we use 'updateModel.get.addedLoadDetail = Some(metadataDetails)'.
    For partition tables, we deserialize 'serializedNewMetaEntry' in CommonLoadUtils.scala.
    At the end, we perform the load details write, the update delta time update, and the segment deletion together in DeleteExecution.checkAndUpdateStatusFiles.
  3. Trigger minor compaction in UPDATE if needed. It is disabled by default.
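
The single-write idea in item 2 can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `SegmentEntry`, `InMemoryTableStatus`, `PendingUpdate` and `SingleCommitSketch.commitOnce` are hypothetical names invented for illustration, the status strings are placeholders, and the in-memory store only stands in for the tablestatus file so that the number of writes can be counted.

```scala
// Hypothetical, simplified model; none of these types are CarbonData classes.
final case class SegmentEntry(
    segmentId: String,
    status: String, // illustrative status strings only
    updateDeltaStartTime: Option[Long] = None,
    updateDeltaEndTime: Option[Long] = None)

// Stand-in for the tablestatus file; counts how often it is rewritten.
final class InMemoryTableStatus(initial: Seq[SegmentEntry]) {
  private var entries: Seq[SegmentEntry] = initial
  private var writes: Int = 0

  def read: Seq[SegmentEntry] = entries
  def writeCount: Int = writes

  // One call models one rewrite of the whole status file.
  def write(newEntries: Seq[SegmentEntry]): Unit = {
    entries = newEntries
    writes += 1
  }
}

// Changes collected while the UPDATE runs; nothing is persisted yet.
final case class PendingUpdate(
    deltaTimestamp: Long,
    updatedSegments: Set[String],
    deletedSegments: Set[String],
    addedLoadDetail: Option[SegmentEntry])

object SingleCommitSketch {
  // Applies every pending change in memory, then writes the status exactly once.
  def commitOnce(store: InMemoryTableStatus, pending: PendingUpdate): Unit = {
    val touched = store.read.map { e =>
      if (pending.deletedSegments.contains(e.segmentId)) {
        e.copy(status = "Marked for Delete")
      } else if (pending.updatedSegments.contains(e.segmentId)) {
        e.copy(
          // Keep an existing start time; otherwise start at this update.
          updateDeltaStartTime = e.updateDeltaStartTime.orElse(Some(pending.deltaTimestamp)),
          updateDeltaEndTime = Some(pending.deltaTimestamp))
      } else {
        e
      }
    }
    // Append the cached entry for the newly loaded segment, if any.
    store.write(touched ++ pending.addedLoadDetail.toSeq)
  }
}
```

In this sketch, one call to `commitOnce` raises `writeCount` by exactly one no matter how many segments are updated, deleted or added, so a failure before the call leaves the stored status untouched.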

Does this PR introduce any user interface change?

  • No

Is any new testcase added?

  • Yes

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3019/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4777/

@CarbonDataQA1

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4805/

@CarbonDataQA1

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3048/

@marchpure
Contributor Author

retest this please

1 similar comment
@ydvpankaj99
Contributor

retest this please

@marchpure force-pushed the update_benchmark branch 2 times, most recently from 68ad12c to 3733726 on November 21, 2020 08:09
@marchpure
Contributor Author

retest this please

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4837/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3083/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4838/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3084/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3086/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4840/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3087/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4841/

@marchpure
Contributor Author

retest this please

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4842/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3089/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4843/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3090/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5150/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3388/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3389/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5151/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3390/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5152/

@marchpure changed the title from "[WIP] update actomity" to "[CARBONDATA-4083] Refactor Update and Support Update Atomicity" on Dec 13, 2020
@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5153/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3391/

Why is this PR needed?
Currently, the update flow modifies the tablestatus file several times. In total, 4 tablestatus write ops break the atomicity of UPDATE to a certain extent, which may leave dirty data behind if the update fails partway.
The first round of tablestatus updates happens when writing delta files: first we update updatedeltastarttime and updatedeltaendtime in the tablestatus, then we delete some segments, which results in 2 tablestatus write ops.
The second round happens when inserting new data. As in the first round, this results in 2 tablestatus write ops.
Also, auto compaction doesn't work for UPDATE: UPDATE won't trigger MINOR compaction even when carbon.merge.auto.compaction is turned on.

What changes were proposed in this PR?
1. Code cleanup for Update and Delete.
2. Update the tablestatus only once in the whole update flow. During the load step of the UPDATE flow, we skip the tablestatus update and instead cache the load metadata details (loadmetadetails) in UpdateTableModel.addedLoadDetail.
For non-partition tables (CarbonDataRDDFactory), we use 'updateModel.get.addedLoadDetail = Some(metadataDetails)'.
For partition tables, we deserialize 'serializedNewMetaEntry' in CommonLoadUtils.scala.
At the end, we perform the load details write, the update delta time update, and the segment deletion together in DeleteExecution.checkAndUpdateStatusFiles.
3. Trigger minor compaction in UPDATE if needed. It is disabled by default.

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes

Co-authored-by: shenjiayu17 <shenjiayu.hust@foxmail.com>
@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5154/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3392/

@marchpure
Contributor Author

retest this please

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5159/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3397/

@Zhangshunyu
Contributor

retest this please

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3407/

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5169/

@Zhangshunyu
Contributor

retest this please

@CarbonDataQA2

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5172/

@CarbonDataQA2

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3410/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3305/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5064/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3306/

@CarbonDataQA2

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3302/

@CarbonDataQA2

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5066/

6 participants