Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Jan 21, 2026

Several locations spawn unbounded threads during environment discovery, which can cause thread exhaustion on systems with many conda environments or symlinks.

Changes

  • pet-conda/lib.rs: get_conda_environments() now uses par_iter() instead of spawning one thread per path
  • pet-conda/environment_locations.rs: get_conda_environment_paths() uses par_iter() for environment enumeration
  • pet-homebrew/sym_links.rs: get_known_symlinks() uses par_iter() for symlink resolution

Added rayon = "1.11.0" to both crates.

Before/After

// Before: unbounded thread spawning
let mut threads = vec![];
for path in paths {
    threads.push(thread::spawn(move || get_conda_environment_info(&path, &mgr)));
}
// join all...

// After: bounded parallel iteration via rayon
paths.par_iter()
    .filter_map(|path| get_conda_environment_info(path, manager))
    .collect()

crates/pet/src/find.rs left unchanged—it uses thread::scope with a fixed number of locators/directories, which is already bounded.

Original prompt

This section details on the original issue you should resolve

<issue_title>Refactor: Consider using thread pool instead of unbounded thread spawning</issue_title>
<issue_description>## Summary
Several locations spawn an unbounded number of threads during environment discovery. On systems with many Python environments (e.g., many conda envs or large PATH), this could lead to thread exhaustion or excessive context switching.

Affected Locations

1. crates/pet-conda/src/lib.rs (lines 368-378)

fn get_conda_environments(paths: &Vec<PathBuf>, manager: &Option<CondaManager>) -> Vec<CondaEnvironment> {
    let mut threads = vec![];
    for path in paths {
        let path = path.clone();
        let mgr = manager.clone();
        threads.push(thread::spawn(move || {
            // ...
        }));
    }
    // ...
}

2. crates/pet-homebrew/src/sym_links.rs (lines 33-55)

let threads = symlinks
    .iter()
    .map(|symlink| {
        std::thread::spawn(move || {
            // ...
        })
    })
    .collect::<Vec<_>>();

3. crates/pet/src/find.rs

Multiple thread::scope with spawns for each path/locator.

Proposed Solutions

Option 1: Use rayon for parallel iteration

use rayon::prelude::*;

fn get_conda_environments(paths: &[PathBuf], manager: &Option<CondaManager>) -> Vec<CondaEnvironment> {
    paths.par_iter()
        .filter_map(|path| get_conda_environment_info(path, manager))
        .collect()
}

Option 2: Use bounded thread pool

use std::sync::mpsc;
use threadpool::ThreadPool;

let pool = ThreadPool::new(num_cpus::get());
for path in paths {
    pool.execute(move || {
        // ...
    });
}
pool.join();

Option 3: Use thread::scope with chunking

thread::scope(|s| {
    for chunk in paths.chunks(num_cpus::get()) {
        s.spawn(|| {
            for path in chunk {
                // process path
            }
        });
    }
});

Benefits

  • Controlled parallelism based on CPU count
  • Better resource management
  • Avoid thread exhaustion on systems with hundreds of environments
  • rayon provides work-stealing for better load balancing

Considerations

  • rayon adds a dependency but is widely used and well-maintained
  • thread::scope (used in many places already) is good for structured concurrency
  • The impact depends on typical environment counts

Priority

Low - Current implementation works but could cause issues at scale.</issue_description>

<agent_instructions>Fix the issue. See if the recommendation makes sense and create a fix.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

…tors

- Add rayon 1.11.0 dependency to pet-conda and pet-homebrew crates
- Refactor get_conda_environments() in lib.rs to use par_iter()
- Refactor get_conda_environment_paths() in environment_locations.rs to use par_iter()
- Refactor get_known_symlinks() in sym_links.rs to use par_iter()

This provides controlled parallelism based on CPU count, better resource
management, and avoids potential thread exhaustion on systems with hundreds
of Python environments.

Co-authored-by: karthiknadig <3840081+karthiknadig@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor to use thread pool for environment discovery Refactor: Replace unbounded thread spawning with rayon parallel iterators Jan 21, 2026
Copilot AI requested a review from karthiknadig January 21, 2026 00:16
@karthiknadig
Copy link
Member

PR Review: Replace unbounded thread spawning with rayon

cc @copilot

Overall Assessment: Positive with minor observations

The PR correctly addresses a legitimate issue—unbounded thread spawning can cause thread exhaustion on systems with many conda environments (hundreds is common for data scientists).

Strengths

  1. Correct problem identification: The current code spawns one OS thread per path/symlink, which is inefficient and resource-intensive. On systems with 200+ conda environments, this creates 200+ threads.

  2. Good solution choice: Rayon is the idiomatic Rust solution for this:

    • Work-stealing thread pool (bounded to CPU count by default)
    • Efficient parallel iteration with par_iter()
    • Well-maintained and widely used in the Rust ecosystem
  3. Clean refactoring: The changes are minimal and semantically equivalent across all three locations.

  4. Correct decision to leave find.rs unchanged: The existing thread::scope with bounded spawns (fixed number of locators/workspace directories) is appropriate.

Observations

  1. Semantic equivalence in get_known_symlinks(): The transformation correctly preserves the original behavior—capturing known_symlinks before parallel iteration matches the original per-thread cloning.

  2. Rayon global pool interaction: Since find.rs uses thread::scope which spawns locators in parallel, and each Conda locator now internally uses rayon, all parallel work beneficially shares the same bounded thread pool.

  3. The filter_map in lib.rs: Nice simplification—it cleanly handles the Option return instead of the original vec wrapper pattern.

Verification

// Before (environment_locations.rs)
threads.push(thread::spawn(move || get_environments(&path)));

// After
.flat_map(|path| get_environments(path))

Semantically correct—par_iter() borrows by reference and get_environments takes &Path.

Recommendation

Looks good to merge once CI passes. The dependency addition (rayon 1.11) is reasonable and the refactoring is sound.

@karthiknadig karthiknadig marked this pull request as ready for review January 21, 2026 02:37
@karthiknadig karthiknadig enabled auto-merge (squash) January 21, 2026 02:37
@vs-code-engineering vs-code-engineering bot added this to the January 2026 milestone Jan 21, 2026
@karthiknadig karthiknadig merged commit 82e47f0 into main Jan 21, 2026
26 checks passed
@karthiknadig karthiknadig deleted the copilot/refactor-thread-spawning branch January 21, 2026 03:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor: Consider using thread pool instead of unbounded thread spawning

3 participants