Use query analyzer fields for artifact definition regeneration#8573
Use query analyzer fields for artifact definition regeneration#8573
Conversation
WalkthroughAdded a 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
94c75d3 to
c31ffd8
Compare
c31ffd8 to
c56e1d7
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/infrahub/proposed_change/tasks.py (1)
724-727:⚠️ Potential issue | 🔴 CriticalThe non-unique fallback still skips most existing artifacts.
run_proposed_change_pipeline()now fillsbranch_diff.subscribersvia_get_subscribers_from_diff(), which only fetches subscribers for modified node IDs. On this branch,get_subscribers_ids(InfrahubKind.ARTIFACT)therefore only contains artifacts attached to changed members, even though the non-unique case is supposed to re-render every artifact for this definition once any queried field changes. The current component test masks this by preloadingbranch_diff.subscriberswith all artifacts.Suggested fix
elif relevant_node_changes: # The query does not guarantee unique targets, so we cannot determine which # specific artifacts are affected. At least one relevant field changed, so we # must fall back to regenerating all artifacts for this definition to be safe. - impacted_artifacts = model.branch_diff.get_subscribers_ids(kind=InfrahubKind.ARTIFACT) + impacted_artifacts = list(artifacts_by_member.values())Also applies to: 774-778
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/proposed_change/tasks.py` around lines 724 - 727, The current mapping artifacts_by_member is built only from existing_artifacts discovered via branch_diff.subscribers (populated by _get_subscribers_from_diff), so in the non-unique fallback we skip most artifacts; update the logic in run_proposed_change_pipeline() where artifacts_by_member is constructed (and the similar block at the 774-778 area) to ensure that when handling InfrahubKind.ARTIFACT in the non-unique case you query and include all artifact subscriber IDs (e.g. via get_subscribers_ids(InfrahubKind.ARTIFACT) or by fetching all artifacts for the impacted definition) and then populate artifacts_by_member[artifact.object.peer.id] = artifact.id for every returned artifact so every artifact for that definition will be re-rendered even if its member ID wasn’t in branch_diff.subscribers.
🧹 Nitpick comments (1)
backend/tests/component/proposed_change/conftest.py (1)
32-55: Makemake_node_diff()honor itsactionargument.The helper lets callers vary the node-level action, but every
NodeDiffElementis still emitted as"updated"with an updated summary. A future test that passes"added"or"removed"will silently exercise the wrong diff shape.Possible cleanup
NodeDiffElement( name=field_name, element_type=DiffElementType.ATTRIBUTE.value, - action="updated", - summary=NodeDiffSummary(added=0, updated=1, removed=0), + action=action, + summary=NodeDiffSummary( + added=1 if action == "added" else 0, + updated=1 if action == "updated" else 0, + removed=1 if action == "removed" else 0, + ), )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/tests/component/proposed_change/conftest.py` around lines 32 - 55, The helper make_node_diff currently hardcodes each NodeDiffElement.action to "updated" and uses an updated summary; change it so each element honors the make_node_diff action argument (e.g., element.action = action) and compute NodeDiffSummary per-element based on action: for "added" set added=1 updated=0 removed=0, for "removed" set removed=1 others 0, for "updated" set updated=1 others 0; update references to NodeDiffElement, NodeDiffSummary, and DiffElementType in make_node_diff accordingly so future tests passing "added"/"removed" produce the correct diff shape.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@backend/tests/component/proposed_change/test_validate_artifacts_generation.py`:
- Around line 85-103: Save the original service._client before assigning the
test SDK client and restore it after the fixture yields so process-wide state is
not left mutated; specifically, in the async fixture client capture
original_client = service._client, assign service._client = sdk_client, use the
existing dependency_provider.scope(build_client, ...) block to yield the
sdk_client, and then in a finally/teardown ensure service._client =
original_client is restored.
In `@changelog/`+15041d55.changed.md:
- Line 1: The release note sentence has agreement errors ("change prevent",
"change ... limit") and reads unfinished; update the changelog entry replacing
the current paragraph (starting "Updated process that regenerates artifacts
within a proposed change...") with the suggested rewritten sentence: "This
change prevents redundant artifact regenerations, reducing the number of
artifacts that need to be regenerated and speeding up the pipeline." Ensure the
old phrasing is removed and the new sentence is the complete entry.
---
Outside diff comments:
In `@backend/infrahub/proposed_change/tasks.py`:
- Around line 724-727: The current mapping artifacts_by_member is built only
from existing_artifacts discovered via branch_diff.subscribers (populated by
_get_subscribers_from_diff), so in the non-unique fallback we skip most
artifacts; update the logic in run_proposed_change_pipeline() where
artifacts_by_member is constructed (and the similar block at the 774-778 area)
to ensure that when handling InfrahubKind.ARTIFACT in the non-unique case you
query and include all artifact subscriber IDs (e.g. via
get_subscribers_ids(InfrahubKind.ARTIFACT) or by fetching all artifacts for the
impacted definition) and then populate
artifacts_by_member[artifact.object.peer.id] = artifact.id for every returned
artifact so every artifact for that definition will be re-rendered even if its
member ID wasn’t in branch_diff.subscribers.
---
Nitpick comments:
In `@backend/tests/component/proposed_change/conftest.py`:
- Around line 32-55: The helper make_node_diff currently hardcodes each
NodeDiffElement.action to "updated" and uses an updated summary; change it so
each element honors the make_node_diff action argument (e.g., element.action =
action) and compute NodeDiffSummary per-element based on action: for "added" set
added=1 updated=0 removed=0, for "removed" set removed=1 others 0, for "updated"
set updated=1 others 0; update references to NodeDiffElement, NodeDiffSummary,
and DiffElementType in make_node_diff accordingly so future tests passing
"added"/"removed" produce the correct diff shape.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 140260b8-a685-4d70-8c64-5733e7fedc20
📒 Files selected for processing (7)
backend/infrahub/graphql/analyzer.pybackend/infrahub/proposed_change/tasks.pybackend/tests/adapters/workflow.pybackend/tests/component/proposed_change/__init__.pybackend/tests/component/proposed_change/conftest.pybackend/tests/component/proposed_change/test_validate_artifacts_generation.pychangelog/+15041d55.changed.md
c56e1d7 to
96987a4
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/infrahub/proposed_change/tasks.py (1)
853-860: Normalize the touched docstring to the backend style.This is still free-form prose rather than the Google-style
Args/Returnslayout used in backend Python files.As per coding guidelines,
**/*.py: Follow Google-style docstring format in Python; Include Args/Parameters section in Python docstrings without typing information; Include Returns section in Python docstrings.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/proposed_change/tasks.py` around lines 853 - 860, Update the free-form docstring for the function that decides whether to generate an artifact (references artifact_id, managed_branch, impacted_artifacts) to Google-style format: add an Args section listing parameters (artifact_id, managed_branch, impacted_artifacts, etc.) with brief descriptions (no type annotations), and a Returns section describing the boolean result and meaning; preserve the same logic notes (when it returns True/False) but move them into the Returns description or brief bullet points under Args/Returns as appropriate to match other backend Python docstrings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/infrahub/proposed_change/tasks.py`:
- Around line 853-860: Update the free-form docstring for the function that
decides whether to generate an artifact (references artifact_id, managed_branch,
impacted_artifacts) to Google-style format: add an Args section listing
parameters (artifact_id, managed_branch, impacted_artifacts, etc.) with brief
descriptions (no type annotations), and a Returns section describing the boolean
result and meaning; preserve the same logic notes (when it returns True/False)
but move them into the Returns description or brief bullet points under
Args/Returns as appropriate to match other backend Python docstrings.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 93a3a280-ccfe-4b3d-8bab-7048dcf036ea
📒 Files selected for processing (7)
backend/infrahub/graphql/analyzer.pybackend/infrahub/proposed_change/tasks.pybackend/tests/adapters/workflow.pybackend/tests/component/proposed_change/__init__.pybackend/tests/component/proposed_change/conftest.pybackend/tests/component/proposed_change/test_validate_artifacts_generation.pychangelog/+15041d55.changed.md
✅ Files skipped from review due to trivial changes (2)
- backend/infrahub/graphql/analyzer.py
- backend/tests/component/proposed_change/test_validate_artifacts_generation.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/tests/component/proposed_change/conftest.py
- backend/tests/adapters/workflow.py
ba21412 to
5582f4b
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
backend/infrahub/proposed_change/tasks.py (2)
1704-1728: Add docstrings to new helper functions.Both
_get_subscribers_for_nodesand_get_subscribers_from_diffare missing docstrings. As per coding guidelines, Python functions should include docstrings following Google-style format with a brief description and Args/Returns sections.📝 Suggested docstrings
async def _get_subscribers_for_nodes( node_ids: list[str], branch: str, client: InfrahubClient ) -> list[ProposedChangeSubscriber]: + """Fetch subscribers for the given node IDs via GraphQL query groups. + + Args: + node_ids: List of node IDs to find subscribers for. + branch: Branch name to query against. + client: Infrahub client for executing GraphQL queries. + + Returns: + List of subscribers associated with the nodes through query groups. + """ + if not node_ids: + return [] result = await client.execute_graphql( query=GATHER_GRAPHQL_QUERY_SUBSCRIBERS, branch_name=branch, variables={"members": node_ids}, ) # ... rest of function async def _get_subscribers_from_diff( diff_summary: list[NodeDiff], branch: str, client: InfrahubClient ) -> list[ProposedChangeSubscriber]: + """Extract modified node IDs from diff summary and fetch their subscribers. + + Args: + diff_summary: List of node diffs from the branch. + branch: Branch name to query against. + client: Infrahub client for executing GraphQL queries. + + Returns: + List of subscribers for all modified non-schema nodes. + """ return await _get_subscribers_for_nodes( node_ids=get_modified_node_ids(diff_summary=diff_summary, branch=branch), branch=branch, client=client, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/proposed_change/tasks.py` around lines 1704 - 1728, Add Google-style docstrings to both helper functions _get_subscribers_for_nodes and _get_subscribers_from_diff: for each function include a one-line description, an Args section listing parameters (node_ids: list[str], branch: str, client: InfrahubClient) for _get_subscribers_for_nodes and (diff_summary: list[NodeDiff], branch: str, client: InfrahubClient) for _get_subscribers_from_diff, and a Returns section documenting the returned list[ProposedChangeSubscriber]; ensure the docstrings appear immediately below each def and follow the project’s Google-style formatting.
764-788: Consider checking for emptyrelevant_node_changesfirst to avoid unnecessary GraphQL call.When
only_has_unique_targetsis True butrelevant_node_changesis empty, the code still makes a GraphQL call that will return no results. A minor efficiency improvement would be to check for empty changes upfront.♻️ Suggested optimization
- if only_has_unique_targets: + if not relevant_node_changes: + impacted_artifacts = [] + log.info( + f"Artifact definition {artifact_definition.name.value} has no applicable node changes. Existing artifacts will not be re-rendered." + ) + elif only_has_unique_targets: # The query targets specific nodes by ID or unique constraint, so we can # look up exactly which artifacts are linked to the changed nodes and limit # regeneration to only those artifacts. subscribers = await _get_subscribers_for_nodes( node_ids=relevant_node_changes, branch=model.source_branch, client=client ) impacted_artifacts = [ subscriber.subscriber_id for subscriber in subscribers if subscriber.kind == InfrahubKind.ARTIFACT ] - elif relevant_node_changes: + else: # The query does not guarantee unique targets, so we cannot determine which # specific artifacts are affected. At least one relevant field changed, so we # must fall back to regenerating all artifacts for this definition to be safe. impacted_artifacts = list(artifacts_by_member.values()) log.warning( f"Artifact definition {artifact_definition.name.value} query does not guarantee unique targets. All targets will be processed." ) - else: - # No node of a queried kind had any of its queried fields modified, so no - # artifact can be stale regardless of query targeting capability. - impacted_artifacts = [] - log.info( - f"Artifact definition {artifact_definition.name.value} has no applicable node changes. Existing artifacts will not be re-rendered." - )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/proposed_change/tasks.py` around lines 764 - 788, When only_has_unique_targets is True but relevant_node_changes is empty, avoid calling _get_subscribers_for_nodes; instead check if relevant_node_changes is falsy first and set impacted_artifacts = [] immediately, otherwise call _get_subscribers_for_nodes as before and compute impacted_artifacts from returned subscribers. Update the conditional around only_has_unique_targets to short-circuit on empty relevant_node_changes to prevent an unnecessary GraphQL call while preserving the existing elif/else behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/infrahub/proposed_change/tasks.py`:
- Around line 1704-1728: Add Google-style docstrings to both helper functions
_get_subscribers_for_nodes and _get_subscribers_from_diff: for each function
include a one-line description, an Args section listing parameters (node_ids:
list[str], branch: str, client: InfrahubClient) for _get_subscribers_for_nodes
and (diff_summary: list[NodeDiff], branch: str, client: InfrahubClient) for
_get_subscribers_from_diff, and a Returns section documenting the returned
list[ProposedChangeSubscriber]; ensure the docstrings appear immediately below
each def and follow the project’s Google-style formatting.
- Around line 764-788: When only_has_unique_targets is True but
relevant_node_changes is empty, avoid calling _get_subscribers_for_nodes;
instead check if relevant_node_changes is falsy first and set impacted_artifacts
= [] immediately, otherwise call _get_subscribers_for_nodes as before and
compute impacted_artifacts from returned subscribers. Update the conditional
around only_has_unique_targets to short-circuit on empty relevant_node_changes
to prevent an unnecessary GraphQL call while preserving the existing elif/else
behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8b0feadf-060d-4ab8-b451-589b0c46771a
📒 Files selected for processing (3)
backend/infrahub/proposed_change/tasks.pybackend/tests/component/proposed_change/test_validate_artifacts_generation.pychangelog/+15041d55.changed.md
✅ Files skipped from review due to trivial changes (2)
- changelog/+15041d55.changed.md
- backend/tests/component/proposed_change/test_validate_artifacts_generation.py
gmazoyer
left a comment
There was a problem hiding this comment.
That is a proper PR, you made it easy to review thanks to the thorough description 👍
| from infrahub.context import InfrahubContext | ||
|
|
||
|
|
||
| class WorkflowRecorder(InfrahubWorkflow): |
5582f4b to
9536d00
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
backend/infrahub/proposed_change/tasks.py (1)
1704-1728: Add docstrings to new helper functions.Both
_get_subscribers_for_nodesand_get_subscribers_from_diffare missing docstrings. Per coding guidelines, Python functions should include Google-style docstrings with Args and Returns sections.📝 Proposed docstrings
async def _get_subscribers_for_nodes( node_ids: list[str], branch: str, client: InfrahubClient ) -> list[ProposedChangeSubscriber]: + """Fetch subscribers for nodes that are members of GraphQL query groups. + + Args: + node_ids: List of node IDs to look up as group members. + branch: Branch name to query against. + client: Infrahub client for GraphQL execution. + + Returns: + List of subscribers (e.g., artifacts, generator instances) linked to + query groups containing the specified nodes. + """ result = await client.execute_graphql( query=GATHER_GRAPHQL_QUERY_SUBSCRIBERS, branch_name=branch, variables={"members": node_ids}, ) subscribers = [] for group in result[InfrahubKind.GRAPHQLQUERYGROUP]["edges"]: for subscriber in group["node"]["subscribers"]["edges"]: subscribers.append( ProposedChangeSubscriber(subscriber_id=subscriber["node"]["id"], kind=subscriber["node"]["__typename"]) ) return subscribers async def _get_subscribers_from_diff( diff_summary: list[NodeDiff], branch: str, client: InfrahubClient ) -> list[ProposedChangeSubscriber]: + """Fetch subscribers for all modified nodes in a diff summary. + + Args: + diff_summary: List of node diffs from the branch comparison. + branch: Branch name to filter diffs and query against. + client: Infrahub client for GraphQL execution. + + Returns: + List of subscribers linked to query groups containing modified nodes. + """ return await _get_subscribers_for_nodes( node_ids=get_modified_node_ids(diff_summary=diff_summary, branch=branch), branch=branch, client=client, )As per coding guidelines: "Always use triple quotes (""") for Python docstrings" and "Follow Google-style docstring format in Python".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/infrahub/proposed_change/tasks.py` around lines 1704 - 1728, Add Google-style triple-quoted docstrings to both helper functions _get_subscribers_for_nodes and _get_subscribers_from_diff: for _get_subscribers_for_nodes describe purpose (fetch subscribers for given node_ids), document Args (node_ids: list[str], branch: str, client: InfrahubClient) and Returns (list[ProposedChangeSubscriber]) and briefly mention any side-effects/errors; for _get_subscribers_from_diff describe purpose (derive node ids from diff_summary and delegate to _get_subscribers_for_nodes), document Args (diff_summary: list[NodeDiff], branch: str, client: InfrahubClient) and Returns (list[ProposedChangeSubscriber]). Ensure triple quotes (""") and Google-style Args/Returns sections are used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/infrahub/proposed_change/tasks.py`:
- Around line 1704-1728: Add Google-style triple-quoted docstrings to both
helper functions _get_subscribers_for_nodes and _get_subscribers_from_diff: for
_get_subscribers_for_nodes describe purpose (fetch subscribers for given
node_ids), document Args (node_ids: list[str], branch: str, client:
InfrahubClient) and Returns (list[ProposedChangeSubscriber]) and briefly mention
any side-effects/errors; for _get_subscribers_from_diff describe purpose (derive
node ids from diff_summary and delegate to _get_subscribers_for_nodes), document
Args (diff_summary: list[NodeDiff], branch: str, client: InfrahubClient) and
Returns (list[ProposedChangeSubscriber]). Ensure triple quotes (""") and
Google-style Args/Returns sections are used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 10a8b78a-2df2-408b-9c8c-9f393144e6ff
📒 Files selected for processing (7)
backend/infrahub/graphql/analyzer.pybackend/infrahub/proposed_change/tasks.pybackend/tests/adapters/workflow.pybackend/tests/component/proposed_change/__init__.pybackend/tests/component/proposed_change/conftest.pybackend/tests/component/proposed_change/test_validate_artifacts_generation.pychangelog/+15041d55.changed.md
✅ Files skipped from review due to trivial changes (2)
- changelog/+15041d55.changed.md
- backend/tests/component/proposed_change/test_validate_artifacts_generation.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/infrahub/graphql/analyzer.py
- backend/tests/component/proposed_change/conftest.py
- backend/tests/adapters/workflow.py
Why
Improve artifact definition regeneration within pipeline to avoid regenerating artifacts that doesn't need to be touched.
Current state
Some background for clarity: Currently when you have setup an artifact definition to create or update artifacts within a pipeline we have three scenarios when an artifact is regenerated (if an artifact for whatever reason doesn't exist it should always be created)
only_has_unique_targetsrequirements: In this case we will only regenerate artifacts that has modifications to one of the members of its GraphQL query groupThe
only_has_unique_targetsis true if the query and the query parameter is enough to always target a single known node.As an example using this query:
Since the name of a tag has a uniqueness constraint required along with the fact that the
$tag_nameparameter is required in the query we can be sure that this query would always capture the required initial node along with its relationships if any. The way to breakonly_has_unique_targetswould be to have the variable be optionalTagQuery($tag_name: String)without "!".only_has_unique_targets. If this is the case all artifacts would be regenerated.If we look at this query instead:
Here the tag part is still fine, however when we want to use all of the IpamNamespace in the query we don't have a current way in the pipeline to determine if a new namespace has been created or deleted and as such we default to the safer option of over re-generation instead of risking an invalid artifact. (Down the road additional improvements could be done here to specifically trigger regeneration if in this case a new IpamNamespace has been created)
What currently gets regenerated
Even with
only_has_unique_targets, the way this works is that whenever we create an artifact from an artifact definition the query that goes with the artifact will have a matchingCoreGraphQLQueryGroupthat contains the members of all of the nodes included in the query. For larger environments or queries there could be over 1000 members in such a group and the criteria to regenerate an artifact would be if anything on at least one of those nodes got modified within a branch. As an example looking atTagQueryUniquefrom above we can see that the query uses the "name" attribute from the BuiltinTag (which is assumed to be used to render the artifact), however the current implementation would regenerated an artifact for a given BuiltinTag even if it was just the "description" attribute that was modified.If we consider a network environment with any type of mesh setup there's a big chance that there is a big overlap between multiple artifacts and even unrelated changes to nodes multiple relationships away from the initial node of the artifact could trigger the regeneration of an artifact.
The goal of this PR
Within this PR we further utilize the InfrahubGraphQLQueryAnalyzer class and instead of only consulting
only_has_unique_targetswithin theGraphQLQueryReportwe also look at therequested_readproperty. This allows us to specifically look at what fields (i.e attributes and or relationships) that are being read by the GraphQL query and only trigger a regeneration if any of those has been changed on the node.We still need to combine this with
only_has_unique_targets, but we get a few new states:only_has_unique_targets: We only regenerate artifacts if any of the members of its GraphQL query group has any updated fields that matches therequested_readproperty of the GraphQL query group.only_has_unique_targets: We can look at therequested_readproperty and combine this with the diff. If the modifications within the branch doesn't touch any of the fields we are targeting no artifacts get regenerated.only_has_unique_targets: We can look at therequested_readproperty and combine this with the diff. If the modifications within the branch matches any of the fields we are targeting all artifacts get regenerated.What changed
ObjectAccesswithin the GraphQL query analyzer for easier access to all relationships and attributes that are being read by a query._populate_subscribers(), replaced by_get_subscribers_for_nodes()and__get_subscribers_from_diff(). There were two reasons first we had different input types then__populate_subscribers()mutated the branch_diff param in a weird way that wasn't obvious to the caller. This has been cleaned up so that we instead return a list of subscribers that are added to the branch_diff where it's calledSummary by CodeRabbit