Technical Information
This topic lists relevant technical information about the cluster hardware.
Nodes
The cluster currently contains the following GPU nodes:
| Count | GPU | CPU | RAM |
|-------|-----|-----|-----|
| 4 | 8 x NVidia RTX 5060 Ti (4608 CUDA cores, 16 GB RAM, compute capability 12.0) | 2 x 14 core Xeon E5-2680 v4 @ 2.4GHz | 256GB DDR4 @ 2400MHz |
| 28 | 8 x NVidia GTX 1080 Ti (3584 CUDA cores, 11 GB RAM, compute capability 6.1) | 2 x 10 core Xeon E5-2630 v4 @ 2.2GHz | 256GB DDR4 @ 2133MHz |
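The two GPU generations differ substantially in compute capability, which matters when compiling CUDA kernels or choosing a node type for a job. A minimal sketch for checking which node type a job landed on, assuming PyTorch is available on the node:

```python
import torch

# Query the compute capability of the first visible GPU to tell the two
# node types apart (12.0 = RTX 5060 Ti nodes, 6.1 = GTX 1080 Ti nodes).
major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")

if (major, minor) >= (12, 0):
    print("RTX 5060 Ti node: 16 GB per GPU, 4608 CUDA cores")
else:
    print("GTX 1080 Ti node: 11 GB per GPU, 3584 CUDA cores")
```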
For Jupyter notebooks that only require CPUs, there are three CPU-only nodes:
| Count | CPU | RAM |
|-------|-----|-----|
| 3 | 2 x 22 core Xeon E5-2699 v4 @ 2.2GHz | 512GB DDR4 @ 2400MHz |
Only one GPU node and one CPU node are always on. The remaining nodes are powered on when needed for running jobs and shut down again automatically after being idle for more than one hour.
The GPU nodes are distributed over four racks, and power-on is balanced across the racks. The nodes with NVidia RTX 5060 Ti GPUs are used first; their GPUs always run at a 180W power limit. The NVidia GTX 1080 Ti GPUs run at a 250W power limit until more than four nodes per rack are running. Beyond that, the per-GPU power limit in a rack is reduced gradually, down to 125W when all seven nodes are running. This is required to stay within the power budget of our server racks.
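The exact throttling schedule is not spelled out above; the sketch below assumes the limit is reduced linearly between four and seven running nodes, which is an assumption rather than a documented policy:

```python
def gtx1080ti_power_limit(nodes_running_in_rack: int) -> float:
    """Per-GPU power limit in watts for the GTX 1080 Ti nodes of one rack.

    Assumes a linear ramp: the text only fixes 250 W up to four running
    nodes and 125 W with all seven running; the intermediate steps are
    interpolated here.
    """
    if nodes_running_in_rack <= 4:
        return 250.0
    # Interpolate between 250 W (4 nodes) and 125 W (7 nodes).
    return 250.0 - (nodes_running_in_rack - 4) * (250.0 - 125.0) / 3

# 5 nodes -> ~208 W, 6 nodes -> ~167 W, 7 nodes -> 125 W per GPU.
```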
Local Scratch Space
All nodes have 350GB of local scratch space, shared by the jobs running on the node.
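Jobs with heavy intermediate I/O can stage data on the local scratch instead of repeatedly hitting the shared filesystems. A minimal sketch, assuming the scratch space is mounted at /scratch (the actual mount point may differ) and a hypothetical input file on /cluster:

```python
import os
import shutil
import tempfile

# Hypothetical mount point for the 350GB local scratch space.
SCRATCH = os.environ.get("SCRATCH", "/scratch")

# Work in a per-job directory and clean it up afterwards, since the
# scratch space is shared with other jobs running on the same node.
with tempfile.TemporaryDirectory(dir=SCRATCH) as workdir:
    local_copy = shutil.copy("/cluster/data/input.bin", workdir)  # hypothetical input path
    print(f"Staged {os.path.getsize(local_copy)} bytes to local scratch")
```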
Storage
A dedicated Ceph cluster provides all the storage using CephFS:
| Count | CPU | RAM |
|-------|-----|-----|
| 5 | 2 x 6 core Xeon Scalable 3204 @ 1.9GHz | 96GB DDR4 @ 2666MHz |
The following devices provide the storage:
| Count | Media | Redundancy | Used for |
|-------|-------|------------|----------|
| 5 | 7.68 TB Samsung PM893, SATA (550 MB/s read, 520 MB/s write; 98k IOPS read, 30k IOPS write) | 2 | `/work` |
| 2 | 12.8 TB Samsung PM1735, PCI-e x8 (8,000 MB/s read, 3,800 MB/s write; 1,500k IOPS read, 250k IOPS write) | 3 | `/home` and `/cluster` |
| 3 | 12.8 TB KIOXIA CD8-V, PCI-e x4 (6,600 MB/s read, 6,000 MB/s write; 1,050k IOPS read, 380k IOPS write) | 3 | `/home` and `/cluster` |
Triple redundancy is used for `/cluster` and `/home` for improved resilience and read speed, resulting in 10.6 TB of effective storage capacity. Performance is currently limited by the network and allows concurrent reads at 6 GB/s from the nodes and writes at 3 GB/s.

Dual redundancy is used for `/work`, which gives acceptable resilience and 16 TB of effective storage capacity. Performance is limited by the SATA bus and allows concurrent reads at 2.5 GB/s from the nodes and writes at 1.3 GB/s.
Login Nodes
Two login nodes are available for students to prepare and start jobs:
- student-cluster1.inf.ethz.ch and student-cluster2.inf.ethz.ch, each with:
  - 2 x 16 core Xeon E5-2697A v4 @ 2.6GHz
  - 512 GB RAM @ 2400MHz