Skip to content

[refactor](profile) Decouple profile node reports from display tree#63299

Open
foxtail463 wants to merge 2 commits into
apache:masterfrom
foxtail463:refactor/profile-structured-node-report
Open

[refactor](profile) Decouple profile node reports from display tree#63299
foxtail463 wants to merge 2 commits into
apache:masterfrom
foxtail463:refactor/profile-structured-node-report

Conversation

@foxtail463
Copy link
Copy Markdown
Contributor

Problem Summary:

BE previously reported pipeline profiles as an ordered list, and FE inferred the
meaning of each profile node from its position in that list. The first element
was treated as the fragment-level profile, while the remaining elements were
treated as pipeline profiles.

This couples the profile transport protocol to the display tree layout and makes
the logic fragile when the profile tree structure changes. It also forces FE to
recover semantic information from implicit ordering instead of using explicit
metadata.

This change adds structured profile node reports with explicit node type and
pipeline id. BE now reports fragment-level and pipeline-level profiles with
their semantic type, and FE builds the RuntimeProfile display tree from that
structured data. The aggregated multi-BE pipeline profile only consumes pipeline
reports.

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morrySnow
Copy link
Copy Markdown
Contributor

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found blocking rolling-upgrade compatibility issues in the profile-reporting protocol.

Critical checkpoint conclusions:

  • Goal and tests: The profile-node typing change is partially covered by a new FE unit test, but mixed-version FE/BE compatibility is not covered and currently regresses profile reporting during rolling upgrades.
  • Scope: The core profile struct change is focused, but compatibility handling for the old fragment_id_to_profile field and old FE exec-status behavior is missing.
  • Concurrency: The reviewed changes do not introduce new shared mutable state requiring new lock ordering; existing profile maps remain protected by their existing mutexes/locks.
  • Lifecycle/static initialization: No new static/global lifecycle hazards were found in the changed code.
  • Configuration: No new configuration items were added.
  • Compatibility: Blocking issues found for both old-BE/new-FE and new-BE/old-FE profile reporting during rolling upgrades.
  • Parallel paths: Both legacy fragment_id_to_profile and new fragment_id_to_profile_node_reports profile transport paths need to be handled while the old thrift field still exists.
  • Tests: Missing mixed-version compatibility coverage for profile reporting.
  • Observability: Existing logs are sufficient for investigation, but the failures would surface only as dropped/rejected profile updates or status-less exec report handling.
  • Transaction/persistence/data correctness: No direct transaction or storage visibility changes found.
  • Performance: No blocking performance regression found in the changed paths.
  • User focus: No additional user-provided review focus was specified.

Because these issues can break profile reporting or exec-status handling in rolling upgrades, I am requesting changes.

Comment thread be/src/runtime/runtime_query_statistics_mgr.cpp Outdated
if (!profile.isSetFragmentIdToProfile()) {
LOG.warn("{} FragmentIdToProfile is not set", DebugUtil.printId(profile.getQueryId()));
return new Status(TStatusCode.INVALID_ARGUMENT, "FragmentIdToProfile is not set");
if (!profile.isSetFragmentIdToProfileNodeReports()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes a new FE reject profile reports from old BEs during a rolling upgrade. The thrift struct still has the legacy fragment_id_to_profile field, and old BEs will continue to send only that field until they are upgraded; with this guard, updateProfile() returns INVALID_ARGUMENT and drops those profiles instead of translating the old TDetailedReportParams format. Please keep a fallback for fragment_id_to_profile until mixed-version reporting is no longer supported.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The impact is limited to profile visibility during rolling
upgrade. Query execution status, load progress and correctness are not affected.
Since this report path is only for profile collection and the new structured field is
the intended format after decoupling profile node reports from the display tree, I
would prefer not to add legacy fallback

@foxtail463
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.06% (1854/2375)
Line Coverage 64.52% (33329/51653)
Region Coverage 65.23% (16527/25335)
Branch Coverage 55.71% (8829/15848)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants