[core] Skip schema update when no changes detected #7545

Open
zhongyujiang wants to merge 2 commits into apache:master from zhongyujiang:schema-update-noop

@zhongyujiang
Contributor

@zhongyujiang zhongyujiang commented Mar 27, 2026

Purpose

This PR fixes an issue where Paimon creates a new schema version even when there are no actual changes to the schema content.

Previously, when executing schema update operations like setOption("foo", "bar") where the option already had the same value, Paimon would still create a new schema version with an incremented ID. This resulted in unnecessary schema versions being created.

Changes

  1. Added sameContent() method to TableSchema to compare schema content while ignoring version and timeMillis
  2. Modified commitChanges() to check if new schema has same content as old schema before committing
  3. Fixed equals() and hashCode() in TableSchema to include highestFieldId for proper comparison
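
A minimal sketch of the comparison described in the list above. `SchemaSketch` is a hypothetical, simplified stand-in, not Paimon's actual `TableSchema`; the field set and the choice to include `timeMillis` in `equals()` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for a table schema; not Paimon's TableSchema.
final class SchemaSketch {
    final long id;                     // schema version, bumped on each committed change
    final List<String> fields;         // column names only, to keep the sketch small
    final int highestFieldId;
    final Map<String, String> options;
    final String comment;              // may be null
    final long timeMillis;             // creation time of this version

    SchemaSketch(long id, List<String> fields, int highestFieldId,
                 Map<String, String> options, String comment, long timeMillis) {
        this.id = id;
        this.fields = new ArrayList<>(fields);
        this.highestFieldId = highestFieldId;
        this.options = new HashMap<>(options);
        this.comment = comment;
        this.timeMillis = timeMillis;
    }

    // Content comparison: everything except the version id and the timestamp.
    boolean sameContent(SchemaSketch other) {
        return other != null
                && fields.equals(other.fields)
                && highestFieldId == other.highestFieldId
                && options.equals(other.options)
                && Objects.equals(comment, other.comment);
    }

    // Full equality additionally compares id and timeMillis (an assumption of this sketch).
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SchemaSketch)) return false;
        SchemaSketch that = (SchemaSketch) o;
        return id == that.id && timeMillis == that.timeMillis && sameContent(that);
    }

    // Hashes the same fields that equals() compares, including highestFieldId.
    @Override
    public int hashCode() {
        return Objects.hash(id, fields, highestFieldId, options, comment, timeMillis);
    }
}
```

With this shape, two versions that differ only in `id` and `timeMillis` are not `equals()` but do have the same content, which is the condition a commit path could use to skip writing a new version.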

Tests

Added testNoChangeCommitDoesNotCreateNewSchema() in SchemaManagerTest:

  • Creates a table with an option foo=bar
  • Calls commitChanges(SchemaChange.setOption("foo", "bar")) with the same value
  • Verifies that no new schema version is created (ID remains the same)
  • Also tests updateComment with unchanged comment value
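
The behaviour the test checks can be sketched with a small in-memory model. `NoOpCommitSketch` and its methods are hypothetical names for illustration only; the real test exercises Paimon's `SchemaManager`, whose API is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of the behaviour under test; not Paimon's SchemaManager.
final class NoOpCommitSketch {
    static final class Version {
        final long id;
        final Map<String, String> options;
        final String comment;

        Version(long id, Map<String, String> options, String comment) {
            this.id = id;
            this.options = new HashMap<>(options);
            this.comment = comment;
        }
    }

    private final List<Version> versions = new ArrayList<>();

    NoOpCommitSketch(Map<String, String> options, String comment) {
        versions.add(new Version(0L, options, comment));
    }

    Version latest() {
        return versions.get(versions.size() - 1);
    }

    Version setOption(String key, String value) {
        Map<String, String> updated = new HashMap<>(latest().options);
        updated.put(key, value);
        return commit(updated, latest().comment);
    }

    Version updateComment(String comment) {
        return commit(latest().options, comment);
    }

    // Writes a new version only when the resulting content differs from the latest one.
    private Version commit(Map<String, String> options, String comment) {
        Version old = latest();
        if (old.options.equals(options) && Objects.equals(old.comment, comment)) {
            return old; // no-op: keep the existing version and its id
        }
        Version next = new Version(old.id + 1, options, comment);
        versions.add(next);
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> initial = new HashMap<>();
        initial.put("foo", "bar");
        NoOpCommitSketch table = new NoOpCommitSketch(initial, "demo");
        System.out.println(table.setOption("foo", "bar").id);  // same value: id unchanged
        System.out.println(table.updateComment("demo").id);    // same comment: id unchanged
        System.out.println(table.setOption("foo", "baz").id);  // real change: id incremented
    }
}
```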

@zhongyujiang
Contributor Author

@JingsongLi can you please help review this? Thanks!

Contributor

@JingsongLi JingsongLi left a comment

I think we should modify generateTableSchema to remove useless changes and return an Optional<TableSchema>.
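
One possible reading of the `Optional` part of this suggestion, as a rough sketch: the generating step returns an empty `Optional` when applying the changes leaves the schema as it was. `OptionalSchemaSketch.generate` is a hypothetical stand-in that models a schema as just an options map; it is not the signature of Paimon's `generateTableSchema`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in: a "schema" is reduced to its options map for illustration.
final class OptionalSchemaSketch {
    // Applies the requested option updates and returns the new options,
    // or Optional.empty() when the result is identical to the current options.
    static Optional<Map<String, String>> generate(
            Map<String, String> current, Map<String, String> setOptions) {
        Map<String, String> updated = new HashMap<>(current);
        updated.putAll(setOptions);
        return updated.equals(current) ? Optional.empty() : Optional.of(updated);
    }
}
```

A caller could then commit only when the returned `Optional` is present, which keeps the "nothing to do" case explicit in the return type.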

@zhongyujiang
Contributor Author

zhongyujiang commented Mar 30, 2026

@JingsongLi Thanks for reviewing.
If we try to track every change to remove useless changes, the code gets way too messy. We’d also have to check for updates on table and column comments. Plus, someone might move a column and then move it back (logically possible), which basically means no change at all.

So, I think just comparing the final result is better. Calculating the new schema is super cheap, and it keeps the logic a lot simpler. What do you think?
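
To illustrate the move-and-move-back case, here is a small sketch in which two non-trivial changes cancel out, so only a comparison of the final result detects the no-op. `MoveRoundTripSketch.moveAfter` is a hypothetical helper operating on a plain list of column names, not Paimon's column-move implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper on a plain list of column names; not Paimon's move logic.
final class MoveRoundTripSketch {
    // Returns a new list in which `column` sits directly after `after`.
    static List<String> moveAfter(List<String> columns, String column, String after) {
        if (!columns.contains(column) || !columns.contains(after) || column.equals(after)) {
            throw new IllegalArgumentException("invalid move: " + column + " after " + after);
        }
        List<String> result = new ArrayList<>(columns);
        result.remove(column);
        result.add(result.indexOf(after) + 1, column);
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("a", "b", "c");
        List<String> moved = moveAfter(original, "c", "a");      // c now follows a
        List<String> movedBack = moveAfter(moved, "c", "b");     // c is last again
        // Each step is a real change, yet the end result equals the starting point.
        System.out.println(movedBack.equals(original));
    }
}
```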

@JingsongLi
Contributor

JingsongLi commented Mar 31, 2026

@zhongyujiang I think the changes may not be significant. Can you give it a try?

@zhongyujiang zhongyujiang force-pushed the schema-update-noop branch 4 times, most recently from 9be6bc3 to 4145de2 Compare April 10, 2026 10:54

2 participants