Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Jan 21, 2026

Path lists are frequently cloned when passed to threads during environment discovery. Since these paths are immutable once created, Arc<[PathBuf]> provides O(1) cloning via atomic increment instead of O(n) deep copies.

Changes

  • Convert shared path lists to Arc<[PathBuf]>

    • environment_directories converted before thread spawns
    • global_env_search_paths converted in workspace directory loop
  • Update function signatures to accept &[PathBuf]

    • find_python_environments
    • find_python_environments_in_paths_with_locators
  • Remove unnecessary clones

    • Pass references where ownership isn't needed
    • find_executables accepts AsRef<Path>, no clone required

Before/After

// Before: O(n) clone per workspace folder
for workspace_folder in workspace_directories {
    let global_env_search_paths = global_env_search_paths.clone(); // deep copy
    // ...
}

// After: O(1) clone per workspace folder
let global_env_search_paths: Arc<[PathBuf]> = 
    get_search_paths_from_env_variables(environment).into();
for workspace_folder in workspace_directories {
    let global_env_search_paths = global_env_search_paths.clone(); // atomic increment
    // ...
}
Original prompt

This section details on the original issue you should resolve

<issue_title>Performance: Use Arc<[PathBuf]> to reduce cloning overhead for shared path lists</issue_title>
<issue_description>## Summary
The codebase frequently clones Vec<PathBuf> when passing path lists to threads. Since these paths are typically immutable once created, wrapping them in Arc<[PathBuf]> would eliminate deep cloning overhead.

Affected Locations

crates/pet/src/find.rs

// Line 128: Clones entire vector for each iteration
global_env_search_paths.clone(),

// Line 142: Clone before thread scope
let environment_directories_search = environment_directories.clone();

// Lines 212-213: Clones inside thread spawn
let global_env_search_paths = global_env_search_paths.clone();
let environment_directories = environment_directories.clone();

crates/pet-conda/src/lib.rs

// Line 94: Clones entire HashMap to iterate
let environments = self.environments.lock().unwrap().clone();

Proposed Solution

Option 1: Use Arc<[PathBuf]>

// Instead of:
let global_env_search_paths: Vec<PathBuf> = get_search_paths_from_env_variables(environment);

// Use:
let global_env_search_paths: Arc<[PathBuf]> = 
    get_search_paths_from_env_variables(environment).into();

// Then cloning is cheap (just atomic increment):
let paths = global_env_search_paths.clone(); // O(1) instead of O(n)

Option 2: Pass references where possible

For thread::scope contexts, references can often be passed directly:

thread::scope(|s| {
    let paths = &global_env_search_paths; // borrow instead of clone
    s.spawn(move || {
        // use paths
    });
});

Impact

  • Reduces memory allocations during environment discovery
  • Particularly impactful when there are many workspace directories or PATH entries
  • Arc<[PathBuf]> has same memory layout as Vec<PathBuf> but with shared ownership

Priority

Medium - Reduces allocations but requires API changes to helper functions.</issue_description>

<agent_instructions>Fix the issue. See if the recommendation makes sense and create a fix.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 2 commits January 21, 2026 00:15
- Convert environment_directories and global_env_search_paths to Arc<[PathBuf]>
  for O(1) cloning in thread spawns instead of O(n) Vec cloning
- Update find_python_environments and find_python_environments_in_paths_with_locators
  to accept &[PathBuf] slices instead of owned Vec<PathBuf>
- Properly clone Arc references before move closures to maintain thread safety

Co-authored-by: karthiknadig <3840081+karthiknadig@users.noreply.github.com>
find_executables accepts T: AsRef<Path>, so we can pass the reference
directly without cloning

Co-authored-by: karthiknadig <3840081+karthiknadig@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor shared path lists to use Arc<PathBuf> for performance improvement Use Arc<[PathBuf]> to reduce cloning overhead for shared path lists Jan 21, 2026
Copilot AI requested a review from karthiknadig January 21, 2026 00:20
@karthiknadig
Copy link
Member

PR Review

cc @copilot

Summary

The PR correctly reduces cloning overhead in find.rs by converting Vec<PathBuf> to Arc<[PathBuf]> for O(1) cloning and passing references where ownership isn't needed.

What's Good

  1. Correct Arc pattern for environment_directories — Arc is created before thread spawns, so both Step 3 and Step 4 benefit from O(1) cloning.

  2. Smart reference passing — Changing function signatures to accept &[PathBuf] eliminates unnecessary allocations when callees don't need ownership.

  3. Proper Arc usage in workspace loop — Converting global_env_search_paths to Arc before iterating over workspace folders is beneficial when there are many workspaces.

Concerns

  1. Incomplete implementation — The original issue (Performance: Use Arc<[PathBuf]> to reduce cloning overhead for shared path lists #287) also mentioned:

    // crates/pet-conda/src/lib.rs Line 94
    let environments = self.environments.lock().unwrap().clone();

    This HashMap clone is not addressed. Should this be a follow-up PR or added here?

  2. Trade-off in find_python_environments — The refactor from Vec<PathBuf> to &[PathBuf] introduces individual PathBuf clones inside the loop:

    for item in paths {
        let item = item.clone();  // New clone per iteration
        s.spawn(move || { ... });
    }

    This trades one Vec clone for N PathBuf clones. Given this function is often called with single-element slices (&[item], &[folder]), this is likely net-neutral or slightly positive. Just worth noting.

  3. Minor: naming conventionenvironment_directories_for_step3 / summary_for_step4 works but feels verbose. Not a blocker.

Verdict

Looks good overall. The implementation achieves the goal of reducing allocation overhead for the hot path (multiple workspace folders / PATH entries). Consider addressing the pet-conda case in a follow-up.

Copy link
Contributor Author

Copilot AI commented Jan 21, 2026

PR Review

cc @copilot...

The pet-conda HashMap clone is a different optimization pattern - it's cloning to release the mutex lock before calling report_missing_envs(), whereas this PR focuses on sharing immutable path lists across threads. The HashMap case would require restructuring around lock duration (e.g., Arc<RwLock<HashMap>>) rather than the Arc<[T]> pattern used here. I've proposed it as follow-up work.

Regarding the trade-off in find_python_environments - agreed, with single-element slices being the common case, the per-item clone is net-neutral or slightly positive compared to the original Vec clone.

@karthiknadig karthiknadig marked this pull request as ready for review January 21, 2026 02:07
@karthiknadig karthiknadig enabled auto-merge (squash) January 21, 2026 02:07
@vs-code-engineering vs-code-engineering bot added this to the January 2026 milestone Jan 21, 2026
@karthiknadig karthiknadig merged commit 72a6b2e into main Jan 21, 2026
26 checks passed
@karthiknadig karthiknadig deleted the copilot/optimize-path-cloning branch January 21, 2026 02:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance: Use Arc<[PathBuf]> to reduce cloning overhead for shared path lists

3 participants