@Demogorgon314
Member

Motivation

Long periods in metadata state "Unstable" prevent ownership monitoring and cleanup. This can leave orphan service units and stale bundle ownership, and it can trigger a flood of "not served by this instance" / "Please redo the lookup" errors.

The last metadata session event and its timestamp are updated by one thread and read by another without a visibility guarantee. The reading thread can therefore act on stale values, which can keep the broker stuck in an incorrect “Unstable” view and makes incident debugging difficult.

Modifications

  • Mark lastMetadataSessionEvent and lastMetadataSessionEventTimestamp as volatile to guarantee cross-thread visibility.
  • Enrich the “Skipping ownership monitor” warning with the last session event and timestamp.
  • Export new metrics for metadata state and last session event timestamp/age.
  • Add tests to assert the new metrics report expected values.
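
The visibility fix and the age metric can be illustrated with a minimal, self-contained sketch. This is not the code in ServiceUnitStateChannelImpl: the class SessionEventTracker, its methods, and the enum constants below are hypothetical stand-ins, and only the idea (volatile fields written by one thread and read by another, plus an age derived from the timestamp) comes from this PR.

```java
// Hypothetical stand-in, not the actual Pulsar class.
class SessionEventTracker {
    // Hypothetical stand-in for the metadata store's session event type.
    enum SessionEvent { CONNECTION_LOST, RECONNECTED, SESSION_LOST, SESSION_REESTABLISHED }

    // volatile: a write made by the session-callback thread is guaranteed to be
    // visible to a later read on the monitor or metrics thread.
    private volatile SessionEvent lastEvent;
    private volatile long lastEventTimestampMs;

    // Called on the thread that receives metadata session notifications.
    void onSessionEvent(SessionEvent event, long nowMs) {
        lastEventTimestampMs = nowMs;
        lastEvent = event;
    }

    // Called on the monitor or metrics thread.
    // Returns -1 when no session event has been recorded yet.
    long lastEventAgeMs(long nowMs) {
        long ts = lastEventTimestampMs;
        return ts == 0 ? -1 : nowMs - ts;
    }

    // Text that a warning log line could append for debugging (assumed format).
    String describe(long nowMs) {
        return "lastEvent=" + lastEvent + ", ageMs=" + lastEventAgeMs(nowMs);
    }
}
```

Note that volatile guarantees visibility of each field on its own, not atomicity across the pair: a reader may briefly see a new timestamp together with the previous event. If both values must always be consistent, one option is to hold them in a single immutable object behind one volatile reference.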

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@Demogorgon314 Demogorgon314 requested a review from Copilot January 9, 2026 07:10
@Demogorgon314 Demogorgon314 self-assigned this Jan 9, 2026
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 9, 2026
Copilot AI left a comment

Pull request overview

This PR addresses a critical visibility issue where metadata session state fields were not properly visible across threads, potentially causing brokers to remain stuck in an "Unstable" state and preventing ownership monitoring. The changes enhance observability by adding volatile modifiers, enriching log messages, and introducing new metrics.

  • Marked lastMetadataSessionEvent and lastMetadataSessionEventTimestamp as volatile to ensure proper cross-thread visibility
  • Enhanced warning log message to include session event details for better debugging
  • Added four new metrics to track metadata state and session event information

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| ServiceUnitStateChannelImpl.java | Made session event fields volatile, enriched logging with session event details, and added 4 new metrics for metadata state observability |
| ServiceUnitStateChannelTest.java | Added test for verifying metadata state metric values under different session event scenarios |

Member

@lhotari lhotari left a comment

LGTM

Labels

  • doc-not-needed (Your PR changes do not impact docs)
  • ready-to-test
