CUDA and PyTorch
This topic helps resolve issues that arise when working with CUDA and PyTorch.
Selecting a CUDA Version
Multiple CUDA installations are present on the cluster and can be activated using the module command:
module avail
module add cuda/12.1
If you want your selected CUDA version to be the default for future login sessions then run
Please note that selecting a CUDA version prior to 12.5 will automatically load the gcc/11 module, because newer compilers are incompatible with older CUDA versions.
Installing PyTorch
The installation command for PyTorch (after creating and activating a virtual environment) is:
pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu...
The last part of the URL is the desired CUDA version, written as the major and minor version numbers concatenated without a dot (12.1 becomes cu121). It must match the version that you activate with module:
module add cuda/12.1
pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
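The mapping from the module version to the wheel tag can be sketched as a small shell helper. The function name cuda_wheel_tag and the variables CUDA_VERSION and INDEX_URL are hypothetical and not part of the cluster setup; the helper only strips the dot from a major.minor version string.

```shell
# Hypothetical helper: turn a CUDA version such as 12.1 into the wheel tag cu121.
cuda_wheel_tag() {
    # Remove every dot from the version and prefix the result with "cu".
    printf 'cu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Assumed example version; use the one you activate with module.
CUDA_VERSION=12.1
INDEX_URL="https://download.pytorch.org/whl/$(cuda_wheel_tag "$CUDA_VERSION")"
echo "$INDEX_URL"
```

Deriving both the module name (cuda/$CUDA_VERSION) and the --index-url value from the same variable is one way to keep the two versions from drifting apart.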
Ubuntu comes with a default CUDA installation (11.5 for Ubuntu 22.04), so if you do not select a particular CUDA version, use cu115 when installing PyTorch.
Adding Additional Packages
When you install packages that depend on torch and do not come with pre-compiled code, you need to activate the same CUDA version that you used for installing torch. If you do not, you may get errors like the following:
pip install --no-cache-dir spatial-correlation-sampler
...
RuntimeError:
The detected CUDA version (11.5) mismatches the version that was used to compile
PyTorch (12.1). Please make sure to use the same CUDA versions.
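A quick check before installing such a package can catch the mismatch early. The following is a minimal sketch, assuming that nvcc from the active CUDA module and python from your virtual environment are on the PATH; the helper name cuda_versions_match is hypothetical.

```shell
# Hypothetical helper: succeed only if both versions are non-empty and equal.
cuda_versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Version reported by the active CUDA toolkit (empty if nvcc is not found).
toolkit=$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' || true)
# CUDA version the installed torch wheel was built with (empty if torch is missing).
built=$(python -c 'import torch; print(torch.version.cuda)' 2>/dev/null || true)

if cuda_versions_match "$toolkit" "$built"; then
    echo "CUDA $toolkit matches the PyTorch build"
else
    echo "Mismatch: toolkit='$toolkit', PyTorch built with='$built'"
fi
```

If the check reports a mismatch, activating the matching version with module add before running pip install should avoid the RuntimeError shown above.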
Running Jobs
The module command is not available in running jobs unless you add the following to your batch file or script right after the #SBATCH lines (note the leading dot):
. /etc/profile.d/modules.sh
module add cuda/12.1
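Put together, a batch file might look like the sketch below. The job name, the GPU request, and the virtual environment path are assumptions for illustration only; adjust them to what your cluster and project actually use.

```shell
#!/bin/bash
#SBATCH --job-name=cuda-check    # hypothetical job name
#SBATCH --gres=gpu:1             # assumed GPU request; your cluster may differ

# Make the module command available inside the job, then select CUDA.
. /etc/profile.d/modules.sh
module add cuda/12.1

# Hypothetical path to the virtual environment that holds torch.
. "$HOME/venvs/torch/bin/activate"

# Print the CUDA version torch was built with and whether a GPU is visible.
python -c 'import torch; print(torch.version.cuda, torch.cuda.is_available())'
```

The last line is only a sanity check; replace it with the command that starts your actual workload.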
Jupyter
For environments that are prepared by TAs for the JupyterHub of the cluster, the right version of CUDA is already active.
If you provide your own custom environment, you can supply the modules that you want to load at server startup.