
[fix][broker] fix error process in admin.updateTopicPartition #25042

Open
TakaHiR07 wants to merge 1 commit into apache:master from TakaHiR07:fix_update_partition

Conversation

@TakaHiR07
Contributor

Main Issue: #25041

Motivation

Fix the incorrect ordering of operations in the update-partition process.

Modifications

Restore the topic partition update process to:

  1. create missing partitions
  2. create subscription for new partition
  3. update topic metadata
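The restored sequence in the Modifications list can be sketched as a chain of async stages. This is a minimal illustration, not the broker's actual code: the helper names and signatures are hypothetical stand-ins for the async methods in `PersistentTopicsBase`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class UpdatePartitionSketch {
    // Records the order in which the stages run, so the ordering is observable.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-ins for the broker's async helpers; names and
    // signatures are illustrative only.
    static CompletableFuture<Void> tryCreatePartitionsAsync(int n) {
        LOG.add("create missing partitions up to " + n);
        return CompletableFuture.completedFuture(null);
    }

    static CompletableFuture<Void> createSubscriptionsAsync(int n) {
        LOG.add("copy existing subscriptions onto the new partitions");
        return CompletableFuture.completedFuture(null);
    }

    static CompletableFuture<Void> updateTopicMetadataAsync(int n) {
        LOG.add("persist metadata: partitions = " + n);
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        int newPartitions = 4;
        // Metadata is written last: if an earlier stage fails, the stored
        // partition count never claims partitions that were not created.
        tryCreatePartitionsAsync(newPartitions)
                .thenCompose(v -> createSubscriptionsAsync(newPartitions))
                .thenCompose(v -> updateTopicMetadataAsync(newPartitions))
                .join();
        LOG.forEach(System.out::println);
    }
}
```

Because `thenCompose` propagates exceptions, a failure in either of the first two stages prevents the metadata update from running at all.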

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes the partition update process in PersistentTopicsBase.internalUpdatePartitionedTopicAsync by reordering operations to prevent inconsistent states. Previously, metadata was updated before creating partitions and subscriptions, which could lead to the metadata indicating more partitions exist than are actually created if subsequent operations failed. The fix ensures metadata is updated only after successfully creating partitions and subscriptions.

Key changes:

  • Reordered the partition update sequence: create partitions → create subscriptions → update metadata (previously: update metadata → create partitions → create subscriptions)
  • Code formatting and indentation improvements for better readability


@mattisonchao
Member

Hi @TakaHiR07

Could you help write a test for it?

@TakaHiR07
Contributor Author

TakaHiR07 commented Dec 12, 2025

Could you help write a test for it?

@mattisonchao I found that AdminApi2Test#testFailedUpdatePartitionedTopic cannot pass. The reason is that other PRs on the master branch, such as #24118, are implemented based on the new process of admin.updateTopicPartition.

It seems this PR may not be a good fit for the master branch and can only be used in branch-3.0, since I don't know how many new PRs on the master branch are designed around the new process.

@Shawyeok
Contributor

Shawyeok commented Mar 9, 2026

@TakaHiR07 @mattisonchao

This PR would reintroduce a silent, unrecoverable failure mode. Here is why:

The root cause of #25041 is that the update-partition-count operation is not atomic — it can fail mid-way with a persistent side effect. The key difference between the old and new ordering:

Old order (pre #19166, what this PR reverts to):

  1. tryCreatePartitionsAsync(N) — writes managed-ledger ZK nodes for new partitions
  2. createSubscriptions(N) — ⚠️ a failure here leaves orphaned ZK nodes
  3. Update partition metadata

When step 2 fails (e.g. because the bundle is not yet assigned, exhausting 307 redirects), the ZK nodes for the new partitions already exist but the metadata still shows the old count. These orphaned nodes appear as non-partitioned topics, causing any retry to permanently fail with:

"Already have non partition topic … could cause conflict"

New order (introduced in #19166, what this PR reverts):

  1. updatePartitionedTopicAsync(N) — update metadata first
  2. tryCreatePartitionsAsync(N) — idempotent ZK writes
  3. createSubscriptionAsync(...) — ConflictException is tolerated

With metadata updated first, any mid-operation failure leaves the system in a self-consistent, retryable state — the metadata already reflects N partitions, so no orphaned nodes exist outside the valid range.
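The retryability argument above can be illustrated with a small sketch. The metadata store and ZK node set here are simulated (plain fields, not the broker's real state), but the shape matches the metadata-first ordering: because the create step tolerates "already exists" conflicts, re-running the whole operation converges instead of failing.

```java
import java.util.HashSet;
import java.util.Set;

public class MetadataFirstRetry {
    static class ConflictException extends RuntimeException {}

    // Simulated state: the stored partition count and per-partition ZK nodes.
    static int metadataPartitions = 1;
    static final Set<String> zkNodes = new HashSet<>(Set.of("topic-partition-0"));

    static void createPartitionNode(String name) {
        // Creating a node that already exists raises a conflict, mirroring
        // the "Already have non partition topic" situation in the issue.
        if (!zkNodes.add(name)) {
            throw new ConflictException();
        }
    }

    // Metadata-first ordering (the post-#19166 process): update the count,
    // then create nodes, tolerating conflicts so a retry is a no-op for
    // whatever a previous (failed) attempt already created.
    static void updatePartitions(int n) {
        metadataPartitions = n;                 // 1. metadata first
        for (int i = 0; i < n; i++) {           // 2. idempotent node creation
            try {
                createPartitionNode("topic-partition-" + i);
            } catch (ConflictException ignored) {
                // node left over from an earlier attempt: safe to skip
            }
        }
    }

    public static void main(String[] args) {
        updatePartitions(4);
        updatePartitions(4); // retrying converges instead of failing
        System.out.println(metadataPartitions + " " + zkNodes.size()); // prints "4 4"
    }
}
```

Under the old ordering, the equivalent retry would hit the conflict *before* the metadata was ever updated, with no tolerance for it, which is exactly the permanently stuck state described above.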

Although the operator can still retry when the endpoint returns an exception, the old ordering makes retries permanently broken rather than transiently failing. This was verified on 2.8.1 clusters at my company.


Labels

doc-not-needed Your PR changes do not impact docs
