
perf(integrate): narrow project-scope local discovery to .apm/ to avoid full-tree walks (closes #1507)#1554

Open
danielmeppiel wants to merge 1 commit into main from danielmeppiel/fix-1507-narrow-local-walk


@danielmeppiel
Collaborator

Symptom

On a 50k+ file monorepo with only 2 instructions and 1 prompt under .apm/, apm install hangs for ~13 minutes during the integrate phase before completing. Reported by @ioannispoulios in #1507.

Trace

  1. src/apm_cli/install/services.py:465-477 builds a synthetic _local PackageInfo with install_path = project_root and calls integrate_package_primitives().
  2. src/apm_cli/integration/prompt_integrator.py:234 invokes self.init_link_resolver(package_info, project_root).
  3. src/apm_cli/integration/base_integrator.py:531-553 (pre-fix) sets scan_root = package_info.install_path and only narrows it to scan_root/.apm when scan_root == Path.home() (#830, "User-scope install (--global) unexpectedly enters local .apm integration and can spend a long time scanning $HOME"). For project-scope local content, scan_root therefore stays at the project root.
  4. src/apm_cli/primitives/discovery.py:590 walks base_dir with os.walk. Because LOCAL_PRIMITIVE_PATTERNS contains generic patterns like **/*.instructions.md, the walk traverses the entire repo even when no local primitive lives outside .apm/. Result: the work is O(repo size), proportional to the whole monorepo rather than to the amount of APM content.
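Step 4 is the crux: a **/ prefix means a match may sit at any depth, so the walker cannot prune any subdirectory. The minimal sketch below illustrates that cost on a synthetic tree. It is not APM's discover_primitives; walk_for_pattern and build_tree are hypothetical helpers, and fnmatch on the file name stands in for the real pattern matching.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def walk_for_pattern(base_dir: Path, name_glob: str) -> tuple[list[Path], int]:
    """Hypothetical helper emulating a '**/<name_glob>' match.

    The '**' prefix means a hit can appear at any depth, so every directory
    under base_dir must be visited; nothing can be pruned.
    """
    hits: list[Path] = []
    dirs_visited = 0
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        dirs_visited += 1
        hits.extend(Path(dirpath) / f for f in filenames if fnmatch.fnmatch(f, name_glob))
    return hits, dirs_visited


def build_tree(root: Path, noise_dirs: int) -> None:
    """Hypothetical fixture: one instruction under .apm/ plus unrelated noise dirs."""
    (root / ".apm" / "instructions").mkdir(parents=True)
    (root / ".apm" / "instructions" / "x.instructions.md").write_text("")
    for i in range(noise_dirs):
        d = root / "noise" / f"d{i}"
        d.mkdir(parents=True)
        (d / "f.txt").write_text("")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_tree(root, noise_dirs=200)
    full_hits, full_dirs = walk_for_pattern(root, "*.instructions.md")
    apm_hits, apm_dirs = walk_for_pattern(root / ".apm", "*.instructions.md")
    print(len(full_hits), full_dirs)  # -> 1 204  (root, .apm, .apm/instructions, noise, 200 noise dirs)
    print(len(apm_hits), apm_dirs)    # -> 1 2    (only .apm and .apm/instructions)
```

Both walks find the same single file, but the project-root walk grows with every unrelated directory, which is the O(repo size) behavior described in step 4.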

Fix

Extend the existing $HOME narrowing to also apply when install_path == project_root (i.e. the synthetic _local PackageInfo). For that case, scan only .apm/ and .github/ -- the two supported locations for local primitives per LOCAL_PRIMITIVE_PATTERNS. Real installed packages (apm_modules/<owner>/<repo>/...) keep scanning their install_path directly because their install_path differs from project_root.

The link_resolver.package_root sentinel for in-package asset rewriting (#1147) is preserved as the project root for the local case, so asset links inside .apm/ continue to resolve against the project tree.
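The three cases above ($HOME, the synthetic _local package, and an installed dependency) can be summarized as a small decision function. This is a hedged sketch, not the actual BaseIntegrator.init_link_resolver code: select_scan_roots and LOCAL_PRIMITIVE_DIRS are hypothetical names, and using install_path as package_root for installed dependencies is an assumption consistent with the description.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumption: the two supported local-primitive roots named by LOCAL_PRIMITIVE_PATTERNS.
LOCAL_PRIMITIVE_DIRS = (".apm", ".github")


def select_scan_roots(
    install_path: Path, project_root: Path, home: Path
) -> tuple[list[Path], Optional[Path]]:
    """Hypothetical helper: return (dirs to walk, package_root for link rewriting)."""
    if install_path == home:
        # User scope (#830): walk only ~/.apm and leave package_root unset.
        return [install_path / ".apm"], None
    if install_path == project_root:
        # Synthetic _local package (#1507): walk only the local roots that exist,
        # keeping the project root as package_root for asset links (#1147).
        roots = [install_path / d for d in LOCAL_PRIMITIVE_DIRS if (install_path / d).is_dir()]
        return roots, project_root
    # Installed dependency under apm_modules/<owner>/<repo>: walk it directly.
    return [install_path], install_path


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "repo"
    (project / ".apm").mkdir(parents=True)  # .github/ deliberately absent
    home = Path(tmp) / "home"
    dep = project / "apm_modules" / "owner" / "repo"
    print(select_scan_roots(project, project, home))  # narrowed to .apm/ only
    print(select_scan_roots(dep, project, home))      # dependency walked directly
```

When neither .apm/ nor .github/ exists, the local case returns an empty list, so nothing is walked at all. Checking $HOME first keeps the #830 behavior unchanged; whether the real code orders the checks the same way is not stated in this PR.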

Tests (TDD + mutation-break gate)

tests/unit/integration/test_base_integrator.py::TestInitLinkResolverLocalScoping:

  • test_narrows_to_apm_and_github_when_install_path_is_project_root -- mocks discover_primitives, asserts it is called with .apm/ and .github/ and never with project_root itself.
  • test_skips_missing_directories -- only .apm/ exists, so only .apm/ is scanned.
  • test_no_apm_or_github_means_no_walk -- no scannable dirs, discover_primitives is never called.
  • test_real_walk_does_not_traverse_noise_subtree -- end-to-end with a real walk; spies on os.walk to assert no noise/... directory is visited even though the noise subtree contains a file matching the generic **/*.instructions.md pattern.
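The idea behind the real-walk regression test can be pictured with a self-contained approximation like the one below. It is not the PR's test code: discover_instructions and discover_local are hypothetical stand-ins for discover_primitives and the narrowed init_link_resolver, and the spy is installed with unittest.mock.patch so every directory os.walk yields is recorded.

```python
import fnmatch
import os
import tempfile
from pathlib import Path
from unittest import mock


def discover_instructions(base_dir: Path) -> list[Path]:
    """Hypothetical stand-in for discover_primitives(): match '**/*.instructions.md'."""
    return [
        Path(dirpath) / name
        for dirpath, _dirs, files in os.walk(base_dir)
        for name in files
        if fnmatch.fnmatch(name, "*.instructions.md")
    ]


def discover_local(project_root: Path) -> list[Path]:
    """Hypothetical narrowed discovery: walk only .apm/ and .github/ when present."""
    roots = [project_root / d for d in (".apm", ".github") if (project_root / d).is_dir()]
    return [hit for root in roots for hit in discover_instructions(root)]


def test_real_walk_does_not_traverse_noise_subtree() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / ".apm" / "instructions").mkdir(parents=True)
        (root / ".apm" / "instructions" / "x.instructions.md").write_text("")
        (root / "noise" / "deep").mkdir(parents=True)
        # Decoy that matches the generic pattern but lives outside .apm/ and .github/.
        (root / "noise" / "deep" / "decoy.instructions.md").write_text("")

        visited: set[Path] = set()
        real_walk = os.walk

        def spy_walk(top, *args, **kwargs):
            # Record every directory the real walk yields, then pass it through.
            for entry in real_walk(top, *args, **kwargs):
                visited.add(Path(entry[0]))
                yield entry

        with mock.patch("os.walk", side_effect=spy_walk):
            found = discover_local(root)

        assert [p.name for p in found] == ["x.instructions.md"]
        assert visited, "the spy should have seen at least .apm/"
        assert all(p.relative_to(root).parts[0] != "noise" for p in visited)


test_real_walk_does_not_traverse_noise_subtree()
```

Depending on the Python version, os.walk may recurse through the patched module attribute, so the spy can see a directory more than once; collecting paths into a set keeps the assertion correct either way.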

Mutation-break gate: deleting the narrowing branch causes all four new tests to fail.

The pre-existing test_uses_install_path_when_not_home was updated to use install_path = tmp_path / "apm_modules" / "owner" / "repo" so it reflects the real-package contract (install_path != project_root) it was always meant to exercise.

How to test on a synthetic large-tree fixture

```bash
mkdir -p /tmp/big-repo/.apm/instructions /tmp/big-repo/noise
# printf writes real newlines; bash's builtin echo does not expand \n by default
printf '%s\n' '---' 'applyTo: "**"' '---' > /tmp/big-repo/.apm/instructions/x.instructions.md
# Generate ~50k noise files (500 directories x 100 files)
python3 -c "
import os
for i in range(500):
    d = f'/tmp/big-repo/noise/d{i}'
    os.makedirs(d, exist_ok=True)
    for j in range(100):
        open(f'{d}/f{j}.txt', 'w').close()
"
cd /tmp/big-repo && apm init --no-interactive && time apm install
```

Before this PR: minutes. After: sub-second integrate phase.

Validation evidence

  • uv run --extra dev pytest tests/unit/integration/ tests/unit/install/ tests/unit/test_local_content_install.py -q -> 2959 passed.
  • Lint chain (ruff check, ruff format --check, pylint R0801, lint-auth-signals.sh) -> all silent / 10.00.

Closes #1507.

…ithub/ (closes #1507)

Local-scope integrate phase used to call discover_primitives(project_root) for the
synthetic _local package, which walks the entire project tree via os.walk because
LOCAL_PRIMITIVE_PATTERNS contains generic patterns like '**/*.instructions.md'.
On a 50k+ file monorepo with only 2 instructions + 1 prompt under .apm/, this
caused a 13-minute hang.

Mirrors the existing $HOME narrowing (#830): when install_path == project_root
(i.e. the synthetic _local PackageInfo built in services.integrate_local_content),
restrict discover_primitives to .apm/ and .github/, which are the only supported
locations for local primitives per LOCAL_PRIMITIVE_PATTERNS.

The link_resolver.package_root sentinel for in-package asset rewriting (#1147)
is preserved as the project root for the local case so asset links inside
.apm/ are still resolved against the project tree.

Tests:
- Mock-based assertion that scan_root is .apm/ (and .github/ when present),
  never project_root.
- Spy on os.walk to confirm noise subtrees are not traversed.
- Mutation-break gate: removing the narrowing branch fails 4/4 new tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 29, 2026 21:10
Contributor

Copilot AI left a comment


Pull request overview

Fixes a performance bug (#1507) where apm install's integrate phase walks the entire project tree during local primitive discovery, causing multi-minute hangs in large monorepos. Extends the existing $HOME narrowing (#830) so that when install_path == project_root (the synthetic _local package), discovery is scoped to .apm/ and .github/ only.

Changes:

  • Refactor BaseIntegrator.init_link_resolver to support multiple scan_roots and add a _is_root_local_package helper that narrows discovery for the project-scope _local package.
  • Preserve link_resolver.package_root (asset rewriting from #1147) for installed deps and the project-scope local case; continue to skip it only for the $HOME user-scope case.
  • Add TestInitLinkResolverLocalScoping (4 new tests, including a real-walk regression spy) and update test_uses_install_path_when_not_home to reflect the real-package contract (apm_modules/owner/repo).
Summary per file

| File | Description |
| --- | --- |
| src/apm_cli/integration/base_integrator.py | Adds project-scope narrowing to .apm/ and .github/ and reworks the package_root assignment logic. |
| tests/unit/integration/test_base_integrator.py | Adds 4 unit tests covering narrowing, missing dirs, no-walk, and a real os.walk spy regression test. |
| CHANGELOG.md | Adds an Unreleased "Fixed" entry referencing #1507. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 0



Development

Successfully merging this pull request may close these issues.

[BUG] Local-scope integrate phase does os.walk() from project root, not .apm/ — causes multi-minute hangs in large repos

2 participants