feat(catalog/sql): implement TransactionalCatalog to support atomic multi-table transactions#925
Conversation
…port atomic multi-table commits
zeroshade
left a comment
There was a problem hiding this comment.
Looks good to me, can we just add two more test cases?
- Test cases that write data for the transactions
- The current tests which test failure and rollback use bogus requirements, can we have a test case for a version conflict? Such as another transaction being committed between when the transaction is started and when the multitable transaction is attempted? i.e. version conflicts?
laskoviymishka
left a comment
There was a problem hiding this comment.
Great work and nice shape overall!
Splitting I/O outside the DB tx and doing the pointer swap inside with per-row OCC mirrors CommitTable cleanly, and keeping internal.UpdateAndStageTable in the staging path means requirements validation stays consistent with single-table commits.
Couple things worth discussing before merge.
First, the orphaned-metadata-on-rollback behavior is documented in the code comment here but not on `catalog.TransactionalCatalog itself — it's really a cross-implementation property (any 2PC backend will leak on Phase-2 failure), so worth lifting to the interface godoc so REST/SQL/future backends agree on the contract.
Second, for the "all-or-nothing" claim: since LoadTable reads are outside withWriteTx, we have commit-point atomicity rather than preparation isolation, and per-row OCC is what actually guards us. A note in the method doc calling this out, plus confirming withWriteTx isolation is sufficient on Postgres/MySQL, not just SQLite WAL, would set expectations clearly.
Happy to land as-is with the docs tightened, but a concurrency test on a non-SQLite backend would meaningfully raise my confidence here.
| } | ||
|
|
||
| if n == 0 { | ||
| return fmt.Errorf("table has been updated by another process: %s.%s", sc.ns, sc.tblName) |
There was a problem hiding this comment.
think wrapping this in a typed sentinel (say catalog.ErrCommitConflict, or reusing something shared with REST) would let MultiTableTransaction and future retry wrappers do errors.Is across backends
right now REST returns rest.ErrCommitFailed-wrapped errors and SQL returns a stringly-typed fmt.Errorf, so callers can't distinguish "conflict, retry me" from "I/O failure" portably.
Worth doing here so we don't bake divergent error contracts into each catalog
There was a problem hiding this comment.
Great point. I’ve added catalog.ErrCommitFailed to align with Java/Python and ensure a consistent error contract. Now the SQL catalog wraps failures with this error, making it portable for errors.Is checks
across different backends.
| } | ||
| } | ||
|
|
||
| func (s *SqliteCatalogTestSuite) TestCommitTransactionPartialRollback() { |
There was a problem hiding this comment.
i think this test actually fails during Phase 1 staging (the bogus AssertTableUUID is rejected before we ever enter withWriteTx), so the DB rollback path the name promises never runs.
Worth adding one that forces a Phase-2 failure after at least one UPDATE has landed in the tx, e.g. mutate one table's metadata_location via raw SQL between stage and commit so the second UPDATE sees zero rows.
That's the invariant the PR is adding.
…ort commits by identifier in SQL TransactionalCatalog to avoid deadlocks.
|
Hi @zeroshade @laskoviymishka , thanks for the reviews. I’ve updated the PR to address your feedback:
|
laskoviymishka
left a comment
There was a problem hiding this comment.
Almost there. One blocking thing before merge:
CommitTransaction will nil-deref on a missing table. Phase 1 swallows ErrNoSuchTable and leaves current = nil, but Phase 2 no longer guards against that and calls sc.current.MetadataLocation(). This method is for updating existing tables: drop the errors.Is(err, catalog.ErrNoSuchTable) branch and just return the error.
Everything else I asked for is in:
- Orphaned-metadata note lifted to the TransactionalCatalog interface godoc
- "Commit-point atomicity" + per-row OCC wording in the method doc
- catalog.ErrCommitFailed wrapping, portable across backends
- TestCommitTransactionDeterministicRollback, exactly the Phase-2 invariant test I asked for; the mockFS→DB mutation trick is clean
- Identifier sorting to prevent deadlocks on Postgres/MySQL (reasonable mitigation while a non-SQLite test harness is deferred)
Fix the nil-deref and this is good to land.
| tblName := catalog.TableNameFromIdent(commit.Identifier) | ||
|
|
||
| current, err := c.LoadTable(ctx, commit.Identifier) | ||
| if err != nil && !errors.Is(err, catalog.ErrNoSuchTable) { |
There was a problem hiding this comment.
This swallows ErrNoSuchTable and leaves current = nil. But Phase 2 no longer guards against nil: sc.current.MetadataLocation() will panic.
CommitTransaction is for updating existing tables, not creating. Drop the errors.Is branch:
if err != nil {
return err
}
There was a problem hiding this comment.
Thanks for the sharp catch on the nil-dereference! I’ve updated the code to return the error immediately if a table is missing, exactly as you suggested.
This PR implements the catalog.TransactionalCatalog interface for the SQL catalog to support atomic, all-or-nothing commits across multiple tables.
Key implementations:
Two-phase commit: Metadatas validations and I/O writes are performed outside the DB transaction to minimize database lock contention.
Optimistic Concurrency Control (OCC): Performs an atomic DB update using WHERE metadata_location = ? to prevent concurrent modification issues.
Add comprehensive Unit & Integration tests for multi-table rollbacks and high-level wrappers.