High priority MDL_BACKUP_COMMIT_HIGH_PRIO and Slave-side BACKUP-COMMIT tracking to prevent parallel-slave/backup deadlock by hemantdangi-gc · Pull Request #5123 · MariaDB/server

hemantdangi-gc · 2026-05-25T12:32:46Z

A parallel-slave worker that has acquired MDL_BACKUP_COMMIT and is
waiting for a prior commit can be deadlocked by an incoming
BACKUP STAGE BLOCK_COMMIT: the prior commit needs MDL_BACKUP_COMMIT too
but the standard fairness rule denies it while the backup X-request is
in the waiting queue.

Design

Add MDL_BACKUP_COMMIT_HIGH_PRIO, a sibling lock mode of
MDL_BACKUP_COMMIT. The only matrix difference is one cell of
m_waiting_incompatible: HIGH_PRIO is granted even when a BACKUP
X-request is waiting.

A committing parallel-slave worker:

registers its sub_id in backup_team,
queries has_higher() — true iff a teammate in the same domain has
a strictly greater sub_id,
requests MDL_BACKUP_COMMIT_HIGH_PRIO if true, otherwise
MDL_BACKUP_COMMIT,
unregisters after release_lock().

Register-before-decide ordering ensures that any teammate arriving
later observes our sub_id, and we observe at least our own back.

How it solves the problem

When backup's X-request is in the waiting queue and a worker that is
needed to unblock a prior commit arrives, that worker observes a
later-numbered teammate already holding the BACKUP lock, requests
HIGH_PRIO, and is granted past the X-request via the matrix. The
prior-commit chain drains; the backup X-request is then granted.

The teammate approach made the MDL scheduler call a slave-supplied callback during lock acquisition: when MDL_BACKUP_COMMIT was blocked, MDL itself walked m_granted.
This new approach inverts the ownership: the slave maintains its own per-(rli, domain) multiset of in-flight gtid_sub_ids on Relay_log_info, and before requesting the lock, the worker reads this slave-owned state to pick its lock mode — MDL_BACKUP_COMMIT_HIGH_PRIO if a teammate with greater sub_id is registered, else MDL_BACKUP_COMMIT. MDL then just consults its compatibility/priority matrices like for any other lock; the leapfrog is a pure matrix property of the new mode, not a scheduler-side override.

…backup.. The problem the patch tackles is that the binlogging parallel slave can expose prepared state of transactions to the backup process. That is the latter can enroll such transactions, typically waiting for prior commits - in other words *undecided* (of whether they are going to commit at all - sic!), into backup image. The technical possibility of that owes to earlier deadlocks fixes in this area that made the parallel slave worker to re-acquire its BACKUP MDL within the wait-for-prior commit phase (which is post- engine prepare) one. Note the --skip-log-bin or --log-slave-update=OFF parallel slave is not vulnerable to the exposure of its prepared transactions. The fixes this patch provides make sure the backup image contains only transactions that are (binlog-order) committed, while parallel slave does not deadlock (MDEV-23586). The principal part of the fixes implements leapfrogging of a waiting "high-priority" BACKUP MDL by a slave parallel worker which is exemplified in the following. First mind about notations: here (as elsewhere in the patch) `Fn` group commit followers indexed by the gtid seq-no `n`; `F|xyz` indicates the execution status (timing) of the MDL request. `B` - stands for backup thread; `Si,Xk` are BACKUP share and exclusive MDL sequenced by `i,k` that are logical times of the MDL acquisition; The semi-column separates the granted head of the lock queue from the waiters. The queue below is the BACKUP MDL queue. | which initially is | | [Initial Queue State] | GRANTED ; WAITING QUEUE | ---------------------------------- | S1(F2) ; X2(B) | | Next it receives | <- Incoming Request: S3(L1) | the scheduler having the following context | [Scheduler Evaluation] | Requestor: L1 (gtid_sub_id = 1) | Holder : F2 (gtid_sub_id = 2) | Condition: holder->gtid_sub_id > requestor->gtid_sub_id (2 > 1) -> TRUE! | Action : ignore_mdl_priority = true. L1 bypasses the waiting X2 lock. | | decides where to place it | | [Final Queue State] | GRANTED ; WAITING QUEUE | ---------------------------------- | S1(F2), S3(L1) ; X2(B) | so it chooses to team up L1,F2 correctly. Any next S4(F_n), where n > 2 | won't be granted which means F_n will wait for the backup completion. | Like this | | [Final Queue State] | GRANTED ; WAITING QUEUE | ---------------------------------- | S1(F2), S3(L1) ; X2(B),S4(F3) The decision to allow S3 leapfrog X2 requires the current lock holder F2 to have a greater gtid_sub_id (2 > 1). To implement this scheduling policy requires the following modifications. P1 sql/mdl.h The new method determines whether a BACKUP lock requester is a teammate of a group that can share the MDL lock and that there exists at least on granted member of the group. +MDL_request::bool (*is_teammate_callback)(const THD*, const THD*); P2. sql/log.cc dismantling of mdl_context.{release,acquire}_lock() in the parallel worker wait-for-prior commit P3. sql/handler.cc setting the P1 MDL_request::is_teammate_callback to the slave parallel worker as hint to try acquiring the MDL lock by team membership. P4. sql/rpl_rli.cc defines the teammate callback for the parallel slave. +rpl_group_info::ignore_mdl_priority P5. sql/handler.cc Logics of "careful" release of the MDL lock could not be streamlined. The failed parallel slave worker still has to abide with the former policy of ``` as there is extra replication book-keeping to be done before rolling back and allowing a conflicting transaction to continue (MDEV-7458).``` [ha_commit_trans]. For that reason of MDEV-7458 the BACKUP MDL request's memory is now allocated for the worker in THD::st_transaction::mem_root. Also guards are deployed to not let the mem-root be be cleaned too early, before the lock gets released. It can be released fast to follow with transaction->cleanup() for not failing trx:s. Failing to commit parallel workers defer that to Relay_log_info::cleanup_context() from where now ha_rollback_trans() will make it. Aslo necessary changes are caused by to the gtid implicit statement's design. 1. sql/sql_class.cc Removed earlier backup-on-parallel-slave bugfixes of MDEV-23586. The parallel worker does not release anymore BACKUP MDL at its wait-for-prior commit stage. 2. sql/sql_lex.h, sql/sql_base.cc, sql/rpl_gtid.cc Has to be introduced +#define TL_OPTION_GTID_TABLE_SLAVE 64 as a part of a method to find out (see open_table() hunk) at executing record_gtid() by implicit GTID statement that its BACKUP MDL lock (however it is necessary - it is not challenged here) needs the ignore-priority hint. Note this hint applies for granting a "special" S' share locked not yet the commit time ha_commit_trans()'s MDL_BACKUP_COMMIT (denoted as S) lock, which is going to be requested later. Let's exemplify it on the following diagrams, calling S' a sort of shared lock, compatible with with S of the commit time. *Without* the teammate hint the request for S'3(F2) at record_gtid() time S1(F3|wait_for_prior_commit); X2(B) <- S'3(F2|record_gtid) would end up to wait behind X2 S1; /* waiting */ X2, S'3 and we'd be regress back to hang/deadlock of earlier bugs: F3 would not be awaken by F2 who is blocked by B. Tested with extended set of parallel slave with backup scenarios. This commit must require more and extensive testing and may cause followup amendments, through review comments as well. Co-authored-with: hemant.dangi@mariadb.com.

When the standard request of MDL_BACKUP_COMMIT is not granted to a worker it first searches for a possible grantee that is a parallel slave worker and if there exist one with greater commit order the request type is escalate to the highest which is re-submitted. To the changes in sql_base.cc, note there also exist a request of the highest (new) type at time of Xid_apply_log_event::do_record_gtid which is a committing phase of slave transaction when it is executing an implicit statement. It must be able to bypass any waiting lock by backup process even without yet knowing whether the transaction is going to be enrolled into a pending to commit slave group. When later it turns out not the case the strong MDL_STATEMENT requested lock would've been already relinquished. Tested with previous commit's refined and extended BASE's rpl.parallel_backup, rpl.rpl_parallel_backup_worker_retry and new added rpl.rpl_parallel_backup_waits_worker_retry Todo: According to Svoj's review, this patch still needs removing the slave specific code from the MDL. That is changes in MDL_context::acquire_lock(). Probably ha_commit_trans() would call first try_acquire_lock_impl() when it returns empty ticket a part of the added to acquire_lock Teammate-Aware Escalation Check would be invoked just to compute `has_teammate` and when that is true to call the whole BASE version of MDL_context::acquire_lock() having the highest type of request.

CLAassistant · 2026-05-25T12:32:53Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ hemantdangi-gc
❌ emoonrain
_{You have signed the CLA already but the status is still pending? Let us recheck it.}

gemini-code-assist

Code Review

This pull request addresses deadlocks between parallel replication and backup operations (MDEV-36025) by introducing a high-priority Metadata Lock (MDL_BACKUP_COMMIT_HIGH_PRIO) and a Backup_commit_team mechanism. Slave workers now register their gtid_sub_id and can escalate their lock priority to bypass waiting backup locks when a teammate with a higher ID is already in flight, replacing previous logic that released locks during waits. Feedback was provided to optimize memory usage in the Backup_commit_team class by erasing empty domain entries from the internal map and simplifying the is_empty function accordingly.

tracking to prevent parallel-slave/backup deadlock A parallel-slave worker that has acquired MDL_BACKUP_COMMIT and is waiting for a prior commit can be deadlocked by an incoming BACKUP STAGE BLOCK_COMMIT: the prior commit needs MDL_BACKUP_COMMIT too but the standard fairness rule denies it while the backup X-request is in the waiting queue. Design ------ Add MDL_BACKUP_COMMIT_HIGH_PRIO, a sibling lock mode of MDL_BACKUP_COMMIT. The only matrix difference is one cell of m_waiting_incompatible: HIGH_PRIO is granted even when a BACKUP X-request is waiting. A committing parallel-slave worker: 1. registers its sub_id in backup_team, 2. queries has_higher() — true iff a teammate in the same domain has a strictly greater sub_id, 3. requests MDL_BACKUP_COMMIT_HIGH_PRIO if true, otherwise MDL_BACKUP_COMMIT, 4. unregisters after release_lock(). Register-before-decide ordering ensures that any teammate arriving later observes our sub_id, and we observe at least our own back. How it solves the problem ------------------------- When backup's X-request is in the waiting queue and a worker that is needed to unblock a prior commit arrives, that worker observes a later-numbered teammate already holding the BACKUP lock, requests HIGH_PRIO, and is granted past the X-request via the matrix. The prior-commit chain drains; the backup X-request is then granted.

hemantdangi-gc · 2026-05-25T17:26:59Z

@svoj
Please review:
High priority MDL_BACKUP_COMMIT_HIGH_PRIO and Slave-side BACKUP-COMMIT tracking to prevent parallel-slave/backup deadlock
a20023b

emoonrain added 2 commits May 20, 2026 16:18

gemini-code-assist Bot reviewed May 25, 2026

View reviewed changes

Comment thread sql/rpl_rli.cc Outdated

Comment thread sql/rpl_rli.cc Outdated

hemantdangi-gc force-pushed the bb-10.11_MDEV-36025_1 branch from db6a6c6 to a20023b Compare May 25, 2026 16:10

hemantdangi-gc requested a review from svoj May 25, 2026 17:24

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

High priority MDL_BACKUP_COMMIT_HIGH_PRIO and Slave-side BACKUP-COMMIT tracking to prevent parallel-slave/backup deadlock#5123

High priority MDL_BACKUP_COMMIT_HIGH_PRIO and Slave-side BACKUP-COMMIT tracking to prevent parallel-slave/backup deadlock#5123
hemantdangi-gc wants to merge 3 commits into
10.11from
bb-10.11_MDEV-36025_1

hemantdangi-gc commented May 25, 2026 •

edited

Loading

Uh oh!

CLAassistant commented May 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

hemantdangi-gc commented May 25, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

hemantdangi-gc commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design

How it solves the problem

Uh oh!

CLAassistant commented May 25, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

hemantdangi-gc commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

hemantdangi-gc commented May 25, 2026 •

edited

Loading

hemantdangi-gc commented May 25, 2026 •

edited

Loading