Skip to content

fix(install): skip cache for unpinned virtual deps when on-disk hash matches lockfile (closes #1548)#1553

Open
danielmeppiel wants to merge 1 commit into
mainfrom
danielmeppiel/fix-1548-content-hash-mismatch
Open

fix(install): skip cache for unpinned virtual deps when on-disk hash matches lockfile (closes #1548)#1553
danielmeppiel wants to merge 1 commit into
mainfrom
danielmeppiel/fix-1548-content-hash-mismatch

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

Summary

Closes #1548. apm install was failing on the second run with a false-positive Content hash mismatch ... supply-chain attack error for unpinned git/virtual-file dependencies. This PR adds the missing cache-skip branch so the redundant re-download (and the false alert it triggered) goes away, without weakening supply-chain detection.

Problem

When a git or virtual-file dep was recorded in apm.lock.yaml without a resolved_commit -- typically because the underlying transport fell back from partial-clone to full bare-clone (observed on Azure DevOps repos whose server does not support partial-clone filter v2) and APM had no SHA to pin against the floating ref -- the cache-skip logic in _resolve_download_strategy (src/apm_cli/install/phases/integrate.py:113-160) had no branch covering the case. The two existing branches required either:

  1. locked_dep.resolved_commit to be set and not "cached", or
  2. locked_dep.source == "registry".

Unpinned git deps fell through both, lockfile_match stayed False, and the package was re-downloaded on every install. A re-clone of a floating ref can produce a fresh hash that does not byte-match the lockfile-recorded content_hash, so the supply-chain check at src/apm_cli/install/sources.py:766-787 then sys.exit(1) with "supply-chain attack" wording -- the false positive reported in #1548.

Fix

Add cache-skip parity for the unpinned class. In _resolve_download_strategy, after the existing registry branch, add an elif for (locked_dep, locked_dep.content_hash, not ref_changed, not update_refs). When _should_skip_redownload confirms the on-disk content still hashes to the lockfile-recorded value, set lockfile_match = True (with the _via_content_hash_only flag for the existing self-heal pipeline). The package is intact, so re-downloading it is wasted work and the source of the false positive.

This matches Option 1 from the triage brief. Option 2 (record a resolved_commit even on partial-clone-fallback paths) is more architecturally correct but ripples through the lockfile schema and self-heal chain; this PR keeps the change isolated and reversible.

Supply-chain boundary

The supply-chain check is preserved. The new branch only fires when verify_package_hash(install_path, locked_dep.content_hash) returns True -- i.e. on-disk bytes already hash to the lockfile-recorded value. We never suppress a fresh download that would otherwise have produced a different hash than the lockfile; we only suppress a redundant download whose result we have already verified against the lockfile via the on-disk content. Real divergence (file actually tampered) still falls through to the fresh-download path where sources.py runs the existing mismatch check.

Tests

tests/unit/install/test_integrate_phase3w5.py -- three new cases on TestResolveDownloadStrategy:

  • test_unpinned_git_dep_skips_when_on_disk_hash_matches -- positive: with resolved_commit=None, content_hash="deadbeef", source="git" and verify_package_hash -> True, asserts skip_download is True. Fails at HEAD without the fix; passes after. Verified the mutation-break gate by git stash of the integrate.py change: this test fails, confirming the fix is the cause of the green.
  • test_unpinned_git_dep_redownloads_when_on_disk_hash_differs -- negative gate: same setup, verify_package_hash -> False, asserts skip_download is False. Proves a real on-disk mismatch is NOT masked by the new branch.
  • test_unpinned_git_dep_no_skip_when_lockfile_lacks_content_hash -- guard: when neither resolved_commit nor content_hash are present there is no integrity anchor, so the branch must not fire.

Full suite: 1638 passed across tests/unit/install/, tests/unit/test_install_command.py, tests/unit/test_drift_registry.py, tests/unit/test_drift_detection.py.

How to test locally

  1. Construct an apm.yml with a git dep on a repo whose server cannot serve a partial-clone (or any virtual-file dep that lands without a resolved_commit).
  2. apm install once -- this succeeds and writes a lockfile entry with content_hash set and resolved_commit empty.
  3. apm install a second time. Before this PR: [x] Content hash mismatch ... supply-chain attack. After this PR: install succeeds, the cached package is reused, no re-download, no false alert.

Validation evidence

  • uv run --extra dev ruff check src/ tests/ -- silent.
  • uv run --extra dev ruff format --check src/ tests/ -- silent.
  • uv run --extra dev python -m pylint --disable=all --enable=R0801 --min-similarity-lines=10 --fail-on=R0801 src/apm_cli/ -- 10.00/10.
  • bash scripts/lint-auth-signals.sh -- clean.
  • Targeted suite: 22 / 22 in test_integrate_phase3w5.py; broader: 1638 / 1638 across install + drift modules.

ASCII-only.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

…matches lockfile

When a git/virtual-file dependency was recorded in the lockfile without a
resolved_commit (e.g. ADO partial-clone fallback could not pin a SHA, or
a virtual-file dep was carved out without a commit anchor), the second
install bypassed the content-hash cache-skip and re-downloaded the
package every time. Re-clones of floating refs can yield a slightly
different fresh hash, which then trips the supply-chain mismatch check
in sources.py with a false-positive 'supply-chain attack' alert.

Add cache-skip parity with the existing resolved_commit and registry
branches: when the lockfile recorded a content_hash for an unpinned dep
and the on-disk content still hashes to that value, skip re-download.
Supply-chain detection is preserved -- the new branch only fires when
on-disk content is verified unchanged; any real divergence falls
through to the fresh-download flow where the existing mismatch check
still runs.

closes #1548

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 29, 2026 21:09
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a false-positive supply-chain Content hash mismatch failure on the second apm install run for unpinned git/virtual-file dependencies by allowing cache reuse when the on-disk content still matches the lockfile-recorded content_hash.

Changes:

  • Extend _resolve_download_strategy cache-skip logic to handle lockfile entries that have a content_hash but no resolved_commit (and no ref drift / update mode).
  • Add unit tests covering the new skip/no-skip behavior for unpinned git dependencies based on verify_package_hash.
  • Add a changelog entry describing the fix.
Show a summary per file
File Description
src/apm_cli/install/phases/integrate.py Adds a cache-skip branch for unpinned deps when on-disk content matches the lockfile content_hash.
tests/unit/install/test_integrate_phase3w5.py Adds regression tests for the new unpinned-dependency cache-skip behavior (positive + negative + guard).
CHANGELOG.md Documents the fix under Unreleased.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 1

Comment thread CHANGELOG.md

### Fixed

- `apm install` no longer fails the second run with a false-positive `Content hash mismatch ... supply-chain attack` error for unpinned git/virtual-file dependencies whose lockfile entry could not record a `resolved_commit` (e.g. ADO partial-clone fallback paths). When the on-disk content still hashes to the lockfile-recorded value, the redundant re-download is now skipped, mirroring the existing cache-skip parity with pinned and registry deps. Real content divergence still falls through to the supply-chain verification path. (closes #1548) -- by @OskarKlintrot
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] apm install fail second time with Content hash mismatch

3 participants