GB10 Nodes
The GB10-based nodes (ASUS Ascent GX10) need special consideration when running jobs:
- These systems are ARM-based, not Intel x86.
- The CPU and GPU share the same memory.
- There are only a few of these nodes, so your job may be aborted if you use one for anything other than a course that requires it.
Interactive Sessions
When starting an interactive session via srun, make sure that you always use a login shell. Otherwise you will inherit the environment of the login node, including a module binary path that will not work:
srun --gpus gb10:1 --pty -A {tag} -t 60 bash --login
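Once the session is open, a quick check can confirm that bash really was started as a login shell. This is a minimal sketch: the function name check_login_shell is hypothetical, and it relies only on the bash builtin `shopt -q login_shell`.

```shell
# Report whether the current shell is a bash login shell.
# "shopt -q login_shell" is a bash builtin test; errors are silenced so that
# a non-bash shell simply falls through to the "not a login shell" branch.
check_login_shell() {
    if shopt -q login_shell 2>/dev/null; then
        echo "login shell"
    else
        echo "not a login shell"
    fi
}

check_login_shell
```

If this reports "not a login shell", the session was most likely started without `--login`.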
ARM Architecture
When building code or setting up python or conda environments for the GB10 nodes, do this on a GB10 node, not on the login nodes (student-cluster1 and student-cluster2).
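A build or setup script can guard against being run on the wrong machine by checking the architecture first. The sketch below assumes that the GB10 nodes report `aarch64` from `uname -m`, as 64-bit ARM Linux systems normally do. The function name is_arm64 is hypothetical.

```shell
# Succeed only on 64-bit ARM, which Linux reports as "aarch64".
# The optional argument overrides the detected architecture (useful for testing).
is_arm64() {
    [ "${1:-$(uname -m)}" = "aarch64" ]
}

if is_arm64; then
    echo "ARM node: OK to build here"
else
    echo "not an ARM node: do not build GB10 environments here" >&2
fi
```

Placing such a check at the top of a build script makes an accidental build on a login node fail early rather than produce an unusable environment.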
Available VRAM
Only completely unused RAM can be used as VRAM. Linux, however, uses free RAM as buffer cache for file operations, so the more files you read and write, the less RAM remains unused.
You can flush the buffer cache with this command:
Run it right before your application starts and reserves VRAM.
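To see how much RAM is unused and how much is held by the buffer cache, you can read the standard Linux file /proc/meminfo. This is a minimal sketch: the helper name mem_summary is hypothetical, and the path argument exists only so the function can be tried on a sample file.

```shell
# Print MemFree, Buffers and Cached in MiB from a meminfo-style file.
# /proc/meminfo reports values in kB, so divide by 1024.
# Defaults to /proc/meminfo when no path is given.
mem_summary() {
    awk '/^(MemFree|Buffers|Cached):/ { printf "%s %d MiB\n", $1, $2 / 1024 }' \
        "${1:-/proc/meminfo}"
}

# Only run against the live system when the file exists (Linux).
[ -r /proc/meminfo ] && mem_summary
```

A large Buffers or Cached value next to a small MemFree value suggests that file I/O has filled the cache and less memory is available as VRAM.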
CUDA Versions
Not all CUDA versions that are available for the other nodes via modules are available for the GB10 nodes.
Single User Node
The GB10 nodes are not shared, so you have all resources of the node to yourself:
- 128 GB RAM/VRAM
- 20 cores
- 850 GB of temporary local storage on an NVMe drive