Fix #335 fix #2: parse-asymmetry fallback to per-target hash diff#353
Merged
Conversation
When the two module-graph JSON payloads passed to get-impacted-targets are not byte-equal but one of them fails to parse, findChangedModules(emptyMap, fullMap) reports every module in the populated graph as "added". The impacted set then explodes -- either fanning out into one rdeps subprocess per matched repo (the multi-hour fan-out in #335) or, when no BazelQueryService is bound, falling through to allTargets.keys so every hashed label is reported as impacted. Both outcomes are far worse than a per-target hash diff. PR #336 made the parser tolerant of stderr-polluted JSON, which covers the historical 18.0.x base graph case, but the asymmetry remains a real failure mode for genuinely unparseable input (corrupted base graphs, truncation, future serialization format changes). The reproducer adds two @ignore'd tests in CalculateImpactedTargetsInteractorTest covering both code paths that branch on changedModules.isNotEmpty(): execute() and executeWithDistances(). Both feed a non-JSON fromGraph and a parseable toGraph and assert that only the actually-hash-changed target appears in the impacted set. Today both tests fail (every target is reported); the @ignore'd state keeps CI green until fix #2 lands. Generated with Claude Code (https://claude.com/claude-code)
`bazel mod graph --output=json` always emits at least the root module, so a parsed module-graph map that is empty really does mean parse failure (a truncated/corrupted base graph from object storage, a future serialisation change, an old stderr-polluted 18.0.x capture from before #336). When the two JSON payloads are not byte-equal but exactly one side parses to empty, the historical behaviour was for `findChangedModules(emptyMap, fullMap)` to report every module on the parseable side as "added" and the impacted set exploded: * With a `BazelQueryService` bound, every "added" module fanned out into an `rdeps` query against its canonical repo(s). On the workspace in #335 that produced ~5,000 serial subprocesses and the run took multiple hours. * With no `BazelQueryService` bound (or one that errored), the failure-tolerant fallback path returned `allTargets.keys` -- every hashed label was reported as impacted, which on a large workspace defeats the point of running bazel-diff. The fix detects this asymmetry in `detectChangedModules` and returns an empty changed-modules set, so callers fall through to `computeSimpleImpactedTargets` (the per-target hash diff that would have run if no module graph had been supplied). A per-target hash diff is bounded by the size of the hash set; the module-fan-out path is not. The two regression tests added in the parent reproducer commit have their `@Ignore` annotations dropped and now pass: 21/21 tests in `CalculateImpactedTargetsInteractorTest` (19 pre-existing + 2 new). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements the asymmetry-detection fallback described in #335 fix #2 and proposed in #352. Supersedes #352 — this branch contains the reproducer commit from #352 with the
@Ignoreannotations dropped, plus the fix that makes those tests pass.The bug
When the two module-graph JSON payloads passed to
get-impacted-targetsare not byte-equal but one of them fails to parse,CalculateImpactedTargetsInteractor.detectChangedModulescomputedfindChangedModules(emptyMap, fullMap)and reported every module in the parseable graph as "added". The impacted set then exploded in one of two ways:BazelQueryServicebound, every "added" module fanned out into anrdepsquery against its canonical repo(s). On the workspace in Upgrading 18.0.x → 18.1.0 with a reused base graph triggers multi-hourqueryTargetsDependingOnModulesfan-out #335 this produced ~5,000 serial subprocesses and the run took multiple hours.BazelQueryServicebound (or one that errored), the failure-tolerant fallback returnedallTargets.keys— every hashed label was reported as impacted, defeating the point of running bazel-diff.Both outcomes are far worse than the per-target hash diff that would have run if no module graph had been supplied. #336 made the parser tolerant of stderr-polluted JSON, which covered the historical 18.0.x base graph case, but parse asymmetry remained a real failure mode for any genuinely unparseable input (corrupted base graphs pulled from object storage, truncation, a future
bazel mod graphserialisation change).The fix
bazel mod graph --output=jsonalways emits at least the root module, so a parsed module-graph map that is empty really does mean parse failure — not "no modules". IndetectChangedModules, when the two JSON strings differ but exactly one parsed graph is empty, return an empty changed-modules set instead of feeding the asymmetric pair intofindChangedModules. Callers (executeandexecuteWithDistances) then fall through tocomputeSimpleImpactedTargets/computeAllDistances— a per-target hash diff that is bounded by the size of the hash set.A warning is logged so an operator can attribute a fall-through back to a specific bad payload.
Tests
Two regression tests added (with
@Ignoredropped relative to #352):execute_parseAsymmetryFallsBackToSimpleHashDiff_regressionForIssue335Fix2— exercises the defaultexecute()path.executeWithDistances_parseAsymmetryFallsBackToSimpleHashDiff_regressionForIssue335Fix2— exercises the--depsFilepath.Both feed a
fromModuleGraphJsonwith no{at all ("garbage-non-json-payload") soModuleGraphParser.parseModuleGraphfalls through toif (start < 0) return emptyMap(), paired with atoModuleGraphJsonthat parses cleanly to one or two modules and a hash pair where only//:changedactually changed. Both assert the impacted set is exactly{//:changed}.Verification
bazel test //cli:CalculateImpactedTargetsInteractorTest— PASSED (21/21: 19 pre-existing + 2 new regression tests).CalculateImpactedTargetsInteractor.kt— both failed with the expected "every target reported" assertion error.Test plan
bazel test //cli:CalculateImpactedTargetsInteractorTestpasses with the fix landed and@Ignoreremoved.🤖 Generated with Claude Code