Refactor: Consider using thread pool instead of unbounded thread spawning #288

@karthiknadig

Summary

Several locations spawn one thread per item during environment discovery, with no upper bound on the thread count. On systems with many Python environments (e.g., many conda envs or a large PATH), this could lead to thread exhaustion or excessive context switching.

Affected Locations

1. crates/pet-conda/src/lib.rs (lines 368-378)

fn get_conda_environments(paths: &Vec<PathBuf>, manager: &Option<CondaManager>) -> Vec<CondaEnvironment> {
    let mut threads = vec![];
    for path in paths {
        let path = path.clone();
        let mgr = manager.clone();
        threads.push(thread::spawn(move || {
            // ...
        }));
    }
    // ...
}

2. crates/pet-homebrew/src/sym_links.rs (lines 33-55)

let threads = symlinks
    .iter()
    .map(|symlink| {
        std::thread::spawn(move || {
            // ...
        })
    })
    .collect::<Vec<_>>();

3. crates/pet/src/find.rs

Multiple thread::scope blocks spawn one thread for each path or locator, so the thread count grows with the number of inputs.
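For illustration only, the per-item scoped-spawn pattern described above looks roughly like the sketch below. This is not the code in find.rs: `inspect` and `inspect_all_unbounded` are hypothetical names, and the per-path work is a placeholder.

```rust
use std::path::{Path, PathBuf};
use std::thread;

// Hypothetical stand-in for the per-path work done during discovery.
fn inspect(path: &Path) -> usize {
    path.as_os_str().len()
}

// One scoped thread per path: the number of threads equals paths.len(),
// so nothing caps it when the input is large.
fn inspect_all_unbounded(paths: &[PathBuf]) -> Vec<usize> {
    thread::scope(|s| {
        let handles: Vec<_> = paths
            .iter()
            .map(|p| s.spawn(move || inspect(p)))
            .collect();
        // Join in spawn order so results line up with the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

The scope guarantees every thread is joined before it returns, but it does not limit how many threads are alive at once.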

Proposed Solutions

Option 1: Use rayon for parallel iteration

use rayon::prelude::*;

fn get_conda_environments(paths: &[PathBuf], manager: &Option<CondaManager>) -> Vec<CondaEnvironment> {
    paths.par_iter()
        .filter_map(|path| get_conda_environment_info(path, manager))
        .collect()
}

Option 2: Use bounded thread pool

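use std::sync::mpsc;
use threadpool::ThreadPool;

let pool = ThreadPool::new(num_cpus::get());
let (tx, rx) = mpsc::channel();
for path in paths {
    // Pool jobs must be 'static, so move owned copies into the closure.
    let path = path.clone();
    let tx = tx.clone();
    pool.execute(move || {
        // ... process `path`, then send the result back with tx.send(..)
    });
}
drop(tx); // close the channel so rx.iter() ends once all jobs finish
pool.join();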

Option 3: Use thread::scope with chunking

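// chunks(n) yields chunks of n items, not n chunks, so derive the chunk
// size from the worker count; max(1) avoids a panic on empty input.
let chunk_size = paths.len().div_ceil(num_cpus::get()).max(1);
thread::scope(|s| {
    for chunk in paths.chunks(chunk_size) {
        // `move` copies the slice reference into the thread.
        s.spawn(move || {
            for path in chunk {
                // process path
            }
        });
    }
});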
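A fuller sketch of Option 3 that also collects results, using only the standard library. It assumes `std::thread::available_parallelism` as a substitute for `num_cpus::get()`; `process_path` and `process_in_chunks` are hypothetical names, and the per-path work is a placeholder.

```rust
use std::num::NonZeroUsize;
use std::path::{Path, PathBuf};
use std::thread;

// Hypothetical stand-in for per-path discovery work.
fn process_path(path: &Path) -> Option<String> {
    path.file_name().map(|n| n.to_string_lossy().into_owned())
}

// Spawns at most `workers` scoped threads, each handling one contiguous chunk.
fn process_in_chunks(paths: &[PathBuf]) -> Vec<String> {
    if paths.is_empty() {
        return Vec::new();
    }
    let workers = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    // Ceiling division: at most `workers` chunks, each with at least one path.
    let chunk_size = paths.len().div_ceil(workers);
    thread::scope(|s| {
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|p| process_path(p))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        // Joining in spawn order keeps results in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker panicked"))
            .collect()
    })
}
```

Static chunking is simple, but a chunk that happens to contain slow paths can leave the other threads idle while it finishes.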

Benefits

  • Controlled parallelism based on CPU count
  • Better resource management
  • Avoid thread exhaustion on systems with hundreds of environments
  • rayon provides work-stealing for better load balancing
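To make the load-balancing point concrete without adding a dependency, here is a minimal sketch of a bounded worker set that pulls items from a shared atomic index. This is simpler than rayon's work-stealing scheduler and is not a replacement for it; `map_bounded` is a hypothetical helper name.

```rust
use std::num::NonZeroUsize;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

// Applies `f` to every item using at most one thread per available core.
// Each worker claims the next unprocessed index, so a slow item does not
// hold up the items that would have shared its chunk.
fn map_bounded<T, R, F>(items: &[T], f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let workers = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
        .min(items.len());
    let next = AtomicUsize::new(0);
    let results = Mutex::new(Vec::with_capacity(items.len()));
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim the next index; stop once the input is exhausted.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(item) = items.get(i) else { break };
                let out = f(item);
                results.lock().unwrap().push((i, out));
            });
        }
    });
    // Restore input order, since workers finish in arbitrary order.
    let mut results = results.into_inner().unwrap();
    results.sort_by_key(|(i, _)| *i);
    results.into_iter().map(|(_, r)| r).collect()
}
```

For example, `map_bounded(&paths, |p| p.exists())` would check each path with a capped number of threads, assuming `paths` is a slice of `PathBuf`.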

Considerations

  • rayon adds a dependency but is widely used and well-maintained
  • thread::scope (used in many places already) is good for structured concurrency
  • The impact depends on typical environment counts

Priority

Low - Current implementation works but could cause issues at scale.

Labels: debt (Code quality issues)