Skip to content

Possible thread-unsafe initialization of wrapped modules? #343

@ngoldbaum

Description

@ngoldbaum

Consider the following script:

from concurrent.futures import ThreadPoolExecutor
from pkgutil import walk_packages
import scipy

def worker():
    for _ in walk_packages(scipy.__path__, scipy.__name__ + '.'):
        pass

n_threads=10

tpe = ThreadPoolExecutor(max_workers=min((n_threads, 4)))
futures = [None]*n_threads


for i in range(n_threads):
    futures[i] = tpe.submit(worker)

[f.result() for f in futures]

This is based on the scipy test scipy/_lib/tests/test_public_api.py::test_all_modules_are_expected running under pytest-run-parallel.

On both the free-threaded and GIL-enabled interpreter, this script eventually fails with the following error:

Traceback (most recent call last):
  File "/Users/goldbaum/Documents/test/test.py", line 16, in <module>
    [f.result() for f in futures]
     ~~~~~~~~^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/goldbaum/Documents/test/test.py", line 6, in worker
    for _ in walk_packages(scipy.__path__, scipy.__name__ + '.'):
             ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 88, in walk_packages
    path = getattr(sys.modules[info.name], '__path__', None) or []
                   ~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'scipy._lib.array_api_compat.dask.array'

It runs successfully if I set n_threads=1 in the script.

I think this is happening because there's a race to call clone_module:

def clone_module(mod_name: str, globals_: dict[str, object]) -> list[str]:
"""Import everything from module, updating globals().
Returns __all__.
"""
mod = importlib.import_module(mod_name)
# Neither of these two methods is sufficient by itself,
# depending on various idiosyncrasies of the libraries we're wrapping.
objs = {}
exec(f"from {mod.__name__} import *", objs)
for n in dir(mod):
if not n.startswith("_") and hasattr(mod, n):
objs[n] = getattr(mod, n)
globals_.update(objs)
return list(objs)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions