Skip to content

feat(catalog/sql): implement TransactionalCatalog to support atomic multi-table transactions#925

Open
xixipi-lining wants to merge 3 commits intoapache:mainfrom
xixipi-lining:feature/multi_table_transaction_sql
Open

feat(catalog/sql): implement TransactionalCatalog to support atomic multi-table transactions#925
xixipi-lining wants to merge 3 commits intoapache:mainfrom
xixipi-lining:feature/multi_table_transaction_sql

Conversation

@xixipi-lining
Copy link
Copy Markdown
Contributor

This PR implements the catalog.TransactionalCatalog interface for the SQL catalog to support atomic, all-or-nothing commits across multiple tables.

Key implementations:

Two-phase commit: Metadatas validations and I/O writes are performed outside the DB transaction to minimize database lock contention.
Optimistic Concurrency Control (OCC): Performs an atomic DB update using WHERE metadata_location = ? to prevent concurrent modification issues.
Add comprehensive Unit & Integration tests for multi-table rollbacks and high-level wrappers.

Copy link
Copy Markdown
Member

@zeroshade zeroshade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, can we just add two more test cases?

  1. Test cases that write data for the transactions
  2. The current tests which test failure and rollback use bogus requirements, can we have a test case for a version conflict? Such as another transaction being committed between when the transaction is started and when the multitable transaction is attempted? i.e. version conflicts?

Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work and nice shape overall!

Splitting I/O outside the DB tx and doing the pointer swap inside with per-row OCC mirrors CommitTable cleanly, and keeping internal.UpdateAndStageTable in the staging path means requirements validation stays consistent with single-table commits.

Couple things worth discussing before merge.

First, the orphaned-metadata-on-rollback behavior is documented in the code comment here but not on `catalog.TransactionalCatalog itself — it's really a cross-implementation property (any 2PC backend will leak on Phase-2 failure), so worth lifting to the interface godoc so REST/SQL/future backends agree on the contract.

Second, for the "all-or-nothing" claim: since LoadTable reads are outside withWriteTx, we have commit-point atomicity rather than preparation isolation, and per-row OCC is what actually guards us. A note in the method doc calling this out, plus confirming withWriteTx isolation is sufficient on Postgres/MySQL, not just SQLite WAL, would set expectations clearly.

Happy to land as-is with the docs tightened, but a concurrency test on a non-SQLite backend would meaningfully raise my confidence here.

Comment thread catalog/sql/sql.go Outdated
}

if n == 0 {
return fmt.Errorf("table has been updated by another process: %s.%s", sc.ns, sc.tblName)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think wrapping this in a typed sentinel (say catalog.ErrCommitConflict, or reusing something shared with REST) would let MultiTableTransaction and future retry wrappers do errors.Is across backends

right now REST returns rest.ErrCommitFailed-wrapped errors and SQL returns a stringly-typed fmt.Errorf, so callers can't distinguish "conflict, retry me" from "I/O failure" portably.

Worth doing here so we don't bake divergent error contracts into each catalog

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point. I’ve added catalog.ErrCommitFailed to align with Java/Python and ensure a consistent error contract. Now the SQL catalog wraps failures with this error, making it portable for errors.Is checks
across different backends.

Comment thread catalog/sql/sql.go
Comment thread catalog/sql/sql.go Outdated
Comment thread catalog/sql/sql_test.go
}
}

func (s *SqliteCatalogTestSuite) TestCommitTransactionPartialRollback() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this test actually fails during Phase 1 staging (the bogus AssertTableUUID is rejected before we ever enter withWriteTx), so the DB rollback path the name promises never runs.

Worth adding one that forces a Phase-2 failure after at least one UPDATE has landed in the tx, e.g. mutate one table's metadata_location via raw SQL between stage and commit so the second UPDATE sees zero rows.

That's the invariant the PR is adding.

…ort commits by identifier in SQL TransactionalCatalog to avoid deadlocks.
@xixipi-lining
Copy link
Copy Markdown
Contributor Author

Hi @zeroshade @laskoviymishka , thanks for the reviews. I’ve updated the PR to address your feedback:

  1. Data Writing Tests: Added TestMultiTableTransactionWriteData to verify that multi-table transactions work correctly with actual data appends.
  2. Phase 2 Conflict Verification: Added TestCommitTransactionDeterministicRollback using a mockFS to simulate conflicts during the commit phase, ensuring DB atomicity.
  3. Non-SQLite Testing & Deadlocks: I’ve added identifier sorting to ensure a deterministic locking order and prevent deadlocks in row-level locking DBs (like Postgres/MySQL).
  4. Error Handling: Added catalog.ErrCommitFailed to provide a consistent error contract across backends, aligning with Java/Python implementations.

Copy link
Copy Markdown
Contributor

@laskoviymishka laskoviymishka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. One blocking thing before merge:

CommitTransaction will nil-deref on a missing table. Phase 1 swallows ErrNoSuchTable and leaves current = nil, but Phase 2 no longer guards against that and calls sc.current.MetadataLocation(). This method is for updating existing tables: drop the errors.Is(err, catalog.ErrNoSuchTable) branch and just return the error.

Everything else I asked for is in:

  • Orphaned-metadata note lifted to the TransactionalCatalog interface godoc
  • "Commit-point atomicity" + per-row OCC wording in the method doc
  • catalog.ErrCommitFailed wrapping, portable across backends
  • TestCommitTransactionDeterministicRollback, exactly the Phase-2 invariant test I asked for; the mockFS→DB mutation trick is clean
  • Identifier sorting to prevent deadlocks on Postgres/MySQL (reasonable mitigation while a non-SQLite test harness is deferred)

Fix the nil-deref and this is good to land.

Comment thread catalog/sql/sql.go Outdated
tblName := catalog.TableNameFromIdent(commit.Identifier)

current, err := c.LoadTable(ctx, commit.Identifier)
if err != nil && !errors.Is(err, catalog.ErrNoSuchTable) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This swallows ErrNoSuchTable and leaves current = nil. But Phase 2 no longer guards against nil: sc.current.MetadataLocation() will panic.

CommitTransaction is for updating existing tables, not creating. Drop the errors.Is branch:

  if err != nil {
      return err
  }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the sharp catch on the nil-dereference! I’ve updated the code to return the error immediately if a table is missing, exactly as you suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants