Technical Information

This page summarizes the technical specifications of the student cluster.

Nodes

Currently the cluster contains the following GPU nodes:

  • 32 GPU nodes with:
    • 8 x NVIDIA GTX 1080 Ti
      • 3584 CUDA cores
      • 11 GB RAM
      • Compute Capability: 6.1
    • 2 x 10-core Xeon E5-2630 v4 @ 2.2 GHz
    • 256 GB RAM

For Jupyter notebooks that only require CPUs, there are three types of nodes:

  • 1 CPU node with:
    • 2 x 22-core Xeon E5-2699 v4 @ 2.2 GHz
    • 512 GB RAM
  • 1 CPU node with:
    • 2 x 18-core Xeon E5-2699 v3 @ 2.3 GHz
    • 384 GB RAM
  • 4 CPU nodes with:
    • 2 x 12-core Xeon E5-2697 v2 @ 2.7 GHz
    • 256 GB RAM

All nodes have 350 GB of local scratch space, shared by the jobs running on the node.

Only one GPU node and one CPU node are always on. The remaining nodes are powered on when needed for jobs and shut down again automatically once they have been idle for more than one hour.
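The power-down policy above can be sketched as follows. The one-hour threshold and the always-on exception come from the text; the function, its name, and its inputs are illustrative assumptions, not the cluster's actual implementation.

```python
from datetime import timedelta

# One-hour idle threshold, as stated in the policy above.
IDLE_SHUTDOWN_THRESHOLD = timedelta(hours=1)

def should_shut_down(always_on: bool, has_running_jobs: bool,
                     idle_time: timedelta) -> bool:
    """Illustrative sketch: a node powers off once it has been idle for
    more than one hour, unless it is one of the always-on nodes."""
    if always_on or has_running_jobs:
        return False
    return idle_time > IDLE_SHUTDOWN_THRESHOLD
```

For example, an ordinary node idle for 90 minutes would be shut down, while an always-on node never is.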

The GPU nodes are distributed over four racks, and power-ups are balanced across the racks. The GPUs run at a 250 W power limit until more than four nodes in a rack are running; beyond that, the limit for nodes in that rack is reduced gradually, down to 125 W per GPU when all eight nodes are running. This is required to stay within the power budget of our server racks.
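As a concrete illustration of that schedule: the text fixes 250 W for up to four running nodes and 125 W when all eight run, but does not specify the shape of the gradual reduction in between, so the linear ramp below is an assumption.

```python
def gpu_power_limit_watts(running_nodes_in_rack: int) -> float:
    """Per-GPU power limit as a function of running nodes in one rack.

    The 250 W plateau (<= 4 nodes) and the 125 W floor (8 nodes) are
    stated in the text; the linear ramp between them is an assumption."""
    if not 1 <= running_nodes_in_rack <= 8:
        raise ValueError("a rack holds between 1 and 8 running nodes")
    if running_nodes_in_rack <= 4:
        return 250.0
    # Assumed linear reduction: (250 - 125) / 4 = 31.25 W less per extra node.
    return 250.0 - (running_nodes_in_rack - 4) * (250.0 - 125.0) / 4
```

Under this assumption, five running nodes would give 218.75 W per GPU and eight running nodes the stated 125 W.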

Storage

A dedicated Ceph cluster provides all the storage:

  • 5 servers with
    • 2 x 6 core Xeon 3204 @ 1.9 GHz
    • 96 GB RAM

The following storage devices are used for providing storage:

  • For /home and /cluster:
    • 3 x 12.8 TB Samsung PM1735, PCI-e x8
      • 8000 MB/s read, 3800 MB/s write
      • 1500k IOPS read, 250k IOPS write
  • For /work:
    • 5 x 7.68 TB Samsung PM893, SATA
      • 550 MB/s read, 520 MB/s write
      • 98k IOPS read, 30k IOPS write

For /home and /cluster, triple redundancy is used for improved resilience and read speed, resulting in 10.6 TB of effective storage capacity. Performance is currently limited by the network and allows concurrent reads at 3 GB/s from the nodes and writes at 1 GB/s.

For /work, dual redundancy is used, which provides acceptable resilience and 16 TB of effective storage capacity. Performance is limited by the SATA bus and allows concurrent reads at 2.5 GB/s from the nodes and writes at 1.3 GB/s.

Login Nodes

Two login nodes are available for students to prepare and start jobs:

  • student-cluster.inf.ethz.ch and student-cluster2.inf.ethz.ch with:
    • 2 x 12-core Xeon E5-2697 v2 @ 2.7 GHz
    • 256 GB RAM

On both login nodes users are restricted to 2 cores and 16 GB of RAM.

Page URL: https://isg.inf.ethz.ch/bin/view/Main/ServicesClusterComputingStudentClusterTechnicalInformation
2024-12-21
© 2024 Eidgenössische Technische Hochschule Zürich