
[fix][broker] fix error process in admin.updateTopicPartition #25042

Open
TakaHiR07 wants to merge 1 commit into apache:master from TakaHiR07:fix_update_partition

Conversation

@TakaHiR07
Contributor

Main Issue: #25041

Motivation

Fix the incorrect ordering of operations in the update-partition process.

Modifications

Restore the topic partition update process to:

  1. create missing partitions
  2. create subscription for new partition
  3. update topic metadata
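The restored sequence in the Modifications list can be sketched as a chain of async stages. This is a minimal illustration, not the broker's actual code: the helper names and signatures are hypothetical stand-ins for the async methods in `PersistentTopicsBase`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class UpdatePartitionSketch {
    // Records the order in which the stages run, so the ordering is observable.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-ins for the broker's async helpers; names and
    // signatures are illustrative only.
    static CompletableFuture<Void> tryCreatePartitionsAsync(int n) {
        LOG.add("create missing partitions up to " + n);
        return CompletableFuture.completedFuture(null);
    }

    static CompletableFuture<Void> createSubscriptionsAsync(int n) {
        LOG.add("copy existing subscriptions onto the new partitions");
        return CompletableFuture.completedFuture(null);
    }

    static CompletableFuture<Void> updateTopicMetadataAsync(int n) {
        LOG.add("persist metadata: partitions = " + n);
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        int newPartitions = 4;
        // Metadata is written last: if an earlier stage fails, the stored
        // partition count never claims partitions that were not created.
        tryCreatePartitionsAsync(newPartitions)
                .thenCompose(v -> createSubscriptionsAsync(newPartitions))
                .thenCompose(v -> updateTopicMetadataAsync(newPartitions))
                .join();
        LOG.forEach(System.out::println);
    }
}
```

Because `thenCompose` propagates exceptions, a failure in either of the first two stages prevents the metadata update from running at all.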

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes the partition update process in PersistentTopicsBase.internalUpdatePartitionedTopicAsync by reordering operations to prevent inconsistent states. Previously, metadata was updated before creating partitions and subscriptions, which could lead to the metadata indicating more partitions exist than are actually created if subsequent operations failed. The fix ensures metadata is updated only after successfully creating partitions and subscriptions.

Key changes:

  • Reordered the partition update sequence: create partitions → create subscriptions → update metadata (previously: update metadata → create partitions → create subscriptions)
  • Code formatting and indentation improvements for better readability


@mattisonchao
Member

Hi @TakaHiR07

Could you help write a test for it?

@TakaHiR07
Contributor Author

TakaHiR07 commented Dec 12, 2025

Could you help write a test for it?

@mattisonchao I found that AdminApi2Test#testFailedUpdatePartitionedTopic cannot pass. The reason is that other PRs on the master branch, such as #24118, are implemented based on the new process of admin.updateTopicPartition.

It seems this PR may not be a good fit for the master branch and can only be used in branch-3.0, since I don't know how many new PRs on the master branch are designed around the new process.

@Shawyeok
Contributor

Shawyeok commented Mar 9, 2026

@TakaHiR07 @mattisonchao

This PR would reintroduce a silent, unrecoverable failure mode. Here is why:

The root cause of #25041 is that the update-partition-count operation is not atomic — it can fail mid-way with a persistent side effect. The key difference between the old and new ordering:

Old order (pre #19166, what this PR reverts to):

  1. tryCreatePartitionsAsync(N) — writes managed-ledger ZK nodes for new partitions
  2. createSubscriptions(N) — ⚠️ a failure here leaves orphaned ZK nodes
  3. Update partition metadata

When step 2 fails (e.g. because the bundle is not yet assigned, exhausting 307 redirects), the ZK nodes for the new partitions already exist but the metadata still shows the old count. These orphaned nodes appear as non-partitioned topics, causing any retry to permanently fail with:

"Already have non partition topic … could cause conflict"

New order (introduced in #19166, what this PR reverts):

  1. updatePartitionedTopicAsync(N) — update metadata first
  2. tryCreatePartitionsAsync(N) — idempotent ZK writes
  3. createSubscriptionAsync(...) — ConflictException is tolerated

With metadata updated first, any mid-operation failure leaves the system in a self-consistent, retryable state — the metadata already reflects N partitions, so no orphaned nodes exist outside the valid range.
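The retryability argument above can be illustrated with a small sketch. The metadata store and ZK node set here are simulated (plain fields, not the broker's real state), but the shape matches the metadata-first ordering: because the create step tolerates "already exists" conflicts, re-running the whole operation converges instead of failing.

```java
import java.util.HashSet;
import java.util.Set;

public class MetadataFirstRetry {
    static class ConflictException extends RuntimeException {}

    // Simulated state: the stored partition count and per-partition ZK nodes.
    static int metadataPartitions = 1;
    static final Set<String> zkNodes = new HashSet<>(Set.of("topic-partition-0"));

    static void createPartitionNode(String name) {
        // Creating a node that already exists raises a conflict, mirroring
        // the "Already have non partition topic" situation in the issue.
        if (!zkNodes.add(name)) {
            throw new ConflictException();
        }
    }

    // Metadata-first ordering (the post-#19166 process): update the count,
    // then create nodes, tolerating conflicts so a retry is a no-op for
    // whatever a previous (failed) attempt already created.
    static void updatePartitions(int n) {
        metadataPartitions = n;                 // 1. metadata first
        for (int i = 0; i < n; i++) {           // 2. idempotent node creation
            try {
                createPartitionNode("topic-partition-" + i);
            } catch (ConflictException ignored) {
                // node left over from an earlier attempt: safe to skip
            }
        }
    }

    public static void main(String[] args) {
        updatePartitions(4);
        updatePartitions(4); // retrying converges instead of failing
        System.out.println(metadataPartitions + " " + zkNodes.size()); // prints "4 4"
    }
}
```

Under the old ordering, the equivalent retry would hit the conflict *before* the metadata was ever updated, with no tolerance for it, which is exactly the permanently stuck state described above.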

Although the operator can still retry when the endpoint returns an exception, the old ordering makes retries permanently broken rather than transiently failing. This was verified on 2.8.1 clusters at my company.


Labels

doc-not-needed Your PR changes do not impact docs
