Skip to content

fix: surface state:modified nodes in summary lineage graph#1192

Open
dtaniwaki wants to merge 1 commit intoDataRecce:mainfrom
dtaniwaki:feat/macro-change-detection
Open

fix: surface state:modified nodes in summary lineage graph#1192
dtaniwaki wants to merge 1 commit intoDataRecce:mainfrom
dtaniwaki:feat/macro-change-detection

Conversation

@dtaniwaki
Copy link
Contributor

Thanks for maintaining this project! I'd appreciate a review of this bug fix when you have a chance.

PR checklist

  • Ensure you have added or ran the appropriate tests for your PR.
  • DCO signed

What type of PR is this?

fix

What this PR does / why we need it:

When a dbt macro or project config changes, state:modified correctly returns the affected downstream nodes — even if their SQL body (and thus file checksum) is unchanged. However, generate_markdown_summary was building the lineage graph from checksums alone, never passing the lineage_diff.diff that the adapter had already computed. As a result, nodes changed only by macros or configs were silently invisible in the PR summary comment.

The fix is a one-line change in generate_markdown_summary: pass lineage_diff.diff to _build_lineage_graph. A small supporting mechanism (apply_diff / _forced_change_status) lets the graph override the checksum-based status for nodes that state:modified flagged but whose SQL didn't change.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

The diff dict was already being populated correctly by the adapter (it calls state:modified and maps each node to a NodeDiff). The bug was purely in the summary layer not consuming it.

Does this PR introduce a user-facing change?:

Nodes modified by macro or config changes (with no SQL body change) now correctly appear as changed in the Recce PR summary comment lineage graph.

Signed-off-by: Daisuke Taniwaki <daisuketaniwaki@gmail.com>
@dtaniwaki dtaniwaki changed the title feat: detect macro and config changes in lineage diff and summary fix: surface state:modified nodes in summary lineage graph Mar 9, 2026
@codecov
Copy link

codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 98.73418% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
recce/summary.py 91.66% 1 Missing ⚠️
Files with missing lines Coverage Δ
tests/test_summary.py 100.00% <100.00%> (ø)
recce/summary.py 66.40% <91.66%> (+1.07%) ⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the PR summary lineage graph so it can surface state:modified nodes even when their SQL checksum hasn’t changed (e.g., macro/config-only changes), by plumbing the adapter-computed lineage_diff.diff into the summary graph builder and allowing that diff to override checksum-derived status.

Changes:

  • Pass lineage_diff.diff into _build_lineage_graph from generate_markdown_summary.
  • Add a Node.apply_diff() override mechanism (_forced_change_status) so externally computed diffs can drive change_status.
  • Add unit tests covering apply_diff behavior and _build_lineage_graph(..., diff=...).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
recce/summary.py Allows lineage graph node status to be overridden by adapter diff and wires diff into markdown summary generation.
tests/test_summary.py Adds tests ensuring diff-driven “modified” nodes appear in the graph and in “What’s Changed”.

Comment on lines +287 to +288
graph_no_diff = _build_lineage_graph(curr_lineage, base_lineage)
graph_with_none = _build_lineage_graph(curr_lineage, base_lineage, None)
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this new test, _build_lineage_graph’s signature is (base, current, ...), but the call passes curr_lineage as the first argument and base_lineage as the second. That reverses the meaning of added/removed nodes and can hide regressions because modified_set doesn’t distinguish added vs removed. Please swap the arguments to _build_lineage_graph(base_lineage, curr_lineage, ...) in both calls here so the test matches production usage (see generate_markdown_summary calling it with lineage_diff.base then lineage_diff.current).

Suggested change
graph_no_diff = _build_lineage_graph(curr_lineage, base_lineage)
graph_with_none = _build_lineage_graph(curr_lineage, base_lineage, None)
graph_no_diff = _build_lineage_graph(base_lineage, curr_lineage)
graph_with_none = _build_lineage_graph(base_lineage, curr_lineage, None)

Copilot uses AI. Check for mistakes.
@gcko
Copy link
Contributor

gcko commented Mar 10, 2026

Code Review — PR 1192

Summary

Fixes generate_markdown_summary to pass the adapter's pre-computed lineage_diff.diff into _build_lineage_graph, so nodes flagged by state:modified (e.g., macro/config-only changes) are correctly surfaced in the PR summary lineage graph. Clean, minimal fix with good test coverage.

Findings

[Warning] "Code" label is misleading for macro/config-only changes

File: recce/summary.py:148-149
Issue: When a node is force-marked as modified via apply_diff (because a macro or project config changed, not the SQL body), _what_changed() labels it as "Code" changed. This is technically inaccurate — the node's raw SQL didn't change; a macro or config dependency did. Users may look at the node's SQL diff and see no difference, which could be confusing.
Suggestion: Consider adding a distinct label like "Macro" or "Config" (would require extending NodeDiff to carry the reason). Not blocking since the current behavior is a clear improvement over nodes being completely invisible, and this can be addressed as a follow-up.

[Warning] Pre-existing shared mutable class-level defaults in LineageGraph

File: recce/summary.py:260-262
Issue: LineageGraph defines nodes and edges as class-level mutable dicts (nodes: Dict[str, Node] = {}). Without an __init__ that creates instance-level copies, all LineageGraph instances share the same dict. This is pre-existing (not introduced by this PR), but the new tests in TestBuildLineageGraphWithDiff.test_diff_marks_state_modified_nodes create two graphs sequentially — the second graph silently mutates state from the first. The assertions happen to pass because they run in sequence, but this is fragile and could cause test pollution if test ordering changes or tests run in parallel.
Suggestion: Not blocking for this PR since it's pre-existing. Worth a follow-up to add __init__ to LineageGraph that initializes instance-level dicts.

Verification

  • Tests: 26/26 passed (tests/test_summary.py)
  • Lint: flake8 clean
  • New test coverage: TestNodeApplyDiff (2 tests), TestWhatChanged (1 test), TestBuildLineageGraphWithDiff (3 tests) — all exercise the new code paths including the backward-compatibility check with real manifests.

Verdict

No critical issues found. The core fix (passing lineage_diff.diff to _build_lineage_graph) is correct and well-tested. The warnings above are non-blocking suggestions for follow-up.

@gcko
Copy link
Contributor

gcko commented Mar 10, 2026

@dtaniwaki thanks for the PR! Can you please resolve the conflict in tests/test_summary.py and look at addressing the copilot and findings above?

If you feel the issues are not valid, please say as such - and address them as necessary.

@gcko gcko self-requested a review March 10, 2026 01:46
Copy link
Contributor

@gcko gcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my above comment #1192 (comment)

@gcko gcko added the bug Something isn't working label Mar 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants