
Conversation

@oneby-wang
Contributor

Fixes #25083

Motivation

Use client-side looping instead of increasing broker settings, to avoid a potential HTTP call timeout in the analyze-backlog method of Topics.
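
For context, a minimal usage sketch of the intent. The topic and subscription names are placeholders, and the exact shape of the new overload with a backlogScanMaxEntries cap is an assumption based on this PR's description; only the existing three-argument call is today's API.

import java.util.Optional;
import org.apache.pulsar.client.admin.PulsarAdmin;

public class AnalyzeBacklogUsageSketch {
    public static void main(String[] args) throws Exception {
        try (PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build()) {
            // Existing behavior: one broker-side scan, bounded only by broker settings,
            // which can exceed the HTTP call timeout on a large backlog.
            var singleCall = admin.topics().analyzeSubscriptionBacklog(
                    "persistent://public/default/my-topic", "my-sub", Optional.empty());

            // Assumed new overload from this PR: the client loops over several shorter
            // broker calls and stops once about 1,000,000 entries have been examined.
            var looped = admin.topics().analyzeSubscriptionBacklog(
                    "persistent://public/default/my-topic", "my-sub", Optional.empty(), 1_000_000L);

            System.out.println(singleCall);
            System.out.println(looped);
        }
    }
}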

Modifications

Add client-side looping and add a test.

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: oneby-wang#22

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 6, 2026
@oneby-wang oneby-wang marked this pull request as draft January 6, 2026 04:53

Copilot AI left a comment


Pull request overview

This PR adds client-side looping to the analyzeSubscriptionBacklog method to avoid HTTP call timeouts when analyzing large backlogs. Instead of relying solely on broker-side configuration limits, the client can now iteratively call the analyze-backlog endpoint and merge results until completion or a specified entry limit is reached.

  • Adds new overloaded methods with a backlogScanMaxEntries parameter to control client-side loop termination
  • Implements iterative scanning logic that merges results from multiple server calls (see the sketch after this list)
  • Includes comprehensive integration tests covering various scenarios
  • Modifies OpScan behavior by removing individual deleted entry filtering
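
A rough sketch of that client-side loop follows. The names are illustrative, not the actual TopicsImpl code, and it assumes each broker call reports a completion flag and a resume position; the real implementation presumably merges the full backlog result fields.

import java.util.Optional;
import org.apache.pulsar.client.api.MessageId;

final class BacklogScanLoopSketch {

    // Hypothetical accumulator for the merged result of several broker calls.
    static final class MergedResult {
        long entries;
        long messages;
        boolean complete;
        MessageId nextPosition;
    }

    // Hypothetical wrapper around a single analyze-backlog REST call.
    interface SingleScan {
        MergedResult scanOnce(Optional<MessageId> startPosition) throws Exception;
    }

    // Repeatedly scan bounded chunks, merge the partial results, and resume from the
    // position the broker reported, until the scan completes or the client-side cap is hit.
    static MergedResult analyzeWithClientSideLoop(SingleScan scan,
                                                  Optional<MessageId> start,
                                                  long backlogScanMaxEntries) throws Exception {
        MergedResult merged = new MergedResult();
        Optional<MessageId> position = start;
        while (true) {
            MergedResult partial = scan.scanOnce(position);
            merged.entries += partial.entries;
            merged.messages += partial.messages;
            if (partial.complete || merged.entries >= backlogScanMaxEntries) {
                merged.complete = partial.complete;
                return merged;
            }
            position = Optional.ofNullable(partial.nextPosition);
        }
    }
}

Looping on the client keeps each HTTP call short, so brokers no longer need larger timeouts or scan limits to cover the whole backlog in a single request.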

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

File Description
pulsar-client-admin-api/src/main/java/org/apache/pulsar/client/admin/Topics.java Adds new API methods with backlogScanMaxEntries parameter and detailed JavaDoc documentation
pulsar-client-admin/src/main/java/org/apache/pulsar/client/admin/internal/TopicsImpl.java Implements client-side looping logic with result merging and next position calculation
pulsar-broker/src/test/java/org/apache/pulsar/broker/admin/AnalyzeBacklogSubscriptionTest.java Adds comprehensive integration tests for the new looping behavior
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/OpScan.java Removes filterReadEntries call and adjusts entry counting logic


Comment on lines +72 to 83
for (Entry entry : entries) {
    if (remainingEntries.getAndDecrement() <= 0) {
        log.warn("[{}] Scan abort after reading too many entries", OpScan.this.cursor);
        callback.scanComplete(lastSeenPosition, ScanOutcome.ABORTED, OpScan.this.ctx);
        return;
    }
    if (!condition.test(entry)) {
        log.warn("[{}] Scan abort due to user code", OpScan.this.cursor);
        callback.scanComplete(lastSeenPosition, ScanOutcome.USER_INTERRUPTED, OpScan.this.ctx);
        return;
    }
}

Copilot AI Jan 8, 2026


This change fundamentally alters the behavior of the scan operation by removing the filtering of individually deleted entries. The removed code filtered out entries that were already individually deleted and didn't count them against the scan limits. Without this filtering, deleted entries are now counted and processed, which may cause the scan to abort prematurely when hitting the entry limit. This appears to be an unintended change that's not mentioned in the PR description, which only discusses adding client-side looping to avoid HTTP timeouts. Consider whether this behavioral change is intentional and document it if so.

Contributor Author

@oneby-wang oneby-wang Jan 8, 2026


The filter process is already handled by OpReadEntry, so I think filtering is not needed here.

