We recommend Miniforge over Conda, because Conda can be extremely slow at resolving dependencies for this setup. The commands below use mamba, which recent Miniforge installers include. See Miniforge.
It is important to ensure that the correct versions of the packages are used. See the following links:
- Build from source
- TensorFlow API Versions
- Installing TensorFlow Graphics
- Install TensorFlow with pip
- GPU, CUDA Toolkit, and CUDA Driver Requirements
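As a rough way to confirm the pins used in this document (`tensorflow<2.11` and `tensorflow-graphics==2021.12.3`), the sketch below reads the installed versions with the standard-library `importlib.metadata`. The function names and the simplified version parsing are illustrative assumptions, not part of the project.

```python
import re
from importlib import metadata

# Pins taken from the install commands in this document.
TF_UPPER_BOUND = (2, 11)   # "tensorflow<2.11"
TFG_PIN = "2021.12.3"      # tensorflow-graphics==2021.12.3


def version_tuple(version):
    """Turn '2.10.1' into (2, 10, 1), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_packages(lookup=installed_version):
    """Return a list of problems; an empty list means the pins are satisfied."""
    problems = []
    tf_version = lookup("tensorflow")
    if tf_version is None:
        problems.append("tensorflow is not installed")
    elif version_tuple(tf_version)[:2] >= TF_UPPER_BOUND:
        problems.append(f"tensorflow {tf_version} is not below 2.11")
    tfg_version = lookup("tensorflow-graphics")
    if tfg_version is None:
        problems.append("tensorflow-graphics is not installed")
    elif tfg_version != TFG_PIN:
        problems.append(f"tensorflow-graphics {tfg_version} differs from {TFG_PIN}")
    return problems


if __name__ == "__main__":
    for line in check_packages() or ["package versions match the pins"]:
        print(line)
```

Run it inside the activated environment. The `lookup` parameter exists only so the check can be exercised without the packages installed.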
You can re-create the environment from the provided environment YAML file as follows:
-
For Linux:
mamba env create -f support/docs/environment-linux.yaml
-
For Windows:
mamba env create -f support/docs/environment-win.yaml
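If the same setup script has to run on both platforms, the choice between the two files can be automated. The following is a minimal sketch that assumes the two paths listed above; `env_create_command` is a hypothetical helper, not something the project ships.

```python
import platform

# Environment files named in this document, keyed by platform.system() value.
ENV_FILES = {
    "Linux": "support/docs/environment-linux.yaml",
    "Windows": "support/docs/environment-win.yaml",
}


def env_create_command(system=None):
    """Build the mamba command for the given (or current) operating system."""
    system = system or platform.system()
    try:
        env_file = ENV_FILES[system]
    except KeyError:
        raise RuntimeError(f"no environment file for {system!r}") from None
    return ["mamba", "env", "create", "-f", env_file]
```

To run the command, pass the result to `subprocess.run(env_create_command(), check=True)` from a shell where `mamba` is on the path.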
Alternatively, build the environment step by step. The steps below must be executed in the order given:
-
Create and activate a new environment:
mamba create -n my_env python=3.9.19
mamba activate my_env
-
Upgrade pip:
python -m pip install --upgrade pip
-
Install NVIDIA libraries:
mamba install -y cudatoolkit=11.2.0
mamba install -y cudnn=8.1.0
mamba install -y cuda -c nvidia
-
Install TensorFlow libraries:
python -m pip install "tensorflow<2.11"
python -m pip install tensorflow-graphics==2021.12.3
-
Install additional needed packages:
mamba install -y scikit-learn
mamba install -y ipywidgets
mamba install -y simpleitk
python -m pip install notebook
python -m pip install nvidia-ml-py3
Note that the rest of the packages will be installed automatically as transitive dependencies of the above.
-
Check that the NVIDIA drivers recognize your GPUs:
nvidia-smi
-
Verify that TensorFlow detects all CPUs and GPUs:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices())"
-
TensorFlow versions above 2.10 require WSL2 and/or Docker for GPU use on Windows. This does not affect this project, because it uses an older version of TensorFlow.
-
On Windows, Jupyter notebooks in VS Code may not detect GPUs. To resolve this, activate your environment in the Windows command line, go to the source code directory, and start VS Code from there by running the `code .` command.
-
You may also need to manually copy the `nvml.dll` file from its installation location to the `C:\Program Files\NVIDIA Corporation` folder if TensorFlow warns you that it can't find it.
-
When you connect over SSH and need to run long-running processes, detach your terminal from the remote session. Otherwise, the running processes are terminated if you disconnect from the remote machine. Tools like Screen let you do this. A good reference is How To Use Linux Screen.
-
You may need to monitor your hardware usage remotely. A good tool for that is nvitop.
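For a scripted alternative to an interactive monitor, the `nvidia-ml-py3` package installed in the steps above provides the `pynvml` module. The sketch below prints memory use and utilization per GPU. The helper names are hypothetical, and the bytes handling is an assumption that some versions of the bindings return device names as bytes.

```python
def format_gpu_line(index, name, used_bytes, total_bytes, utilization):
    """Format one GPU's statistics as a single readable line."""
    if isinstance(name, bytes):  # some pynvml versions return bytes, others str
        name = name.decode()
    mib = 1024 * 1024
    return (f"GPU {index} ({name}): {used_bytes // mib}/{total_bytes // mib} MiB, "
            f"{utilization}% utilization")


def gpu_report():
    """Query NVML for every visible GPU; return one line per device."""
    try:
        import pynvml
    except ImportError:
        return ["pynvml is not installed"]
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError as error:
        return [f"NVML unavailable: {error}"]
    try:
        lines = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            load = pynvml.nvmlDeviceGetUtilizationRates(handle)
            lines.append(format_gpu_line(index, pynvml.nvmlDeviceGetName(handle),
                                         memory.used, memory.total, load.gpu))
        return lines
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    print("\n".join(gpu_report()))
```

Running this over SSH gives a one-shot snapshot. Wrapping the call in a loop with `time.sleep` would turn it into a simple poller.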