Aura

The aura.fi.muni.cz server is available to FI staff and PhD students for longer-term, more demanding, or GPU computations. For study or research purposes, FI staff can request access for other users via unix@fi.muni.cz. The Aura server is accessible only from the MU network.

Hardware configuration

The Aura server is built on the Asus RS720A-E11-RS24U platform in the following configuration:

  • Two 64-core AMD EPYC 7713 2.0 GHz processors (128 physical cores and 256 threads in total).
  • 2 TiB DDR4 RAM 3200 MHz
  • 10 Gbps Ethernet connection
  • 2 SATA SSDs with 960 GB capacity in RAID 1
  • 2 NVMe drives with 6 TB capacity in RAID 1
  • 2 NVIDIA A100 80 GB PCIe GPU cards with NVLink
  • Red Hat Enterprise Linux operating system

See also the blog post introducing this server.

How to work on compute servers

We recommend that you also familiarize yourself with the general information about running computations.

Run long-running processes (an hour or more) at a reduced priority (in the range 10-19, 19 being the lowest), for example nice ./your_program or nice -n 15 ./your_program.

To change the priority of an already running process, you can use the renice command, but beware that a process may consist of multiple threads, and changing the priority of the process may affect only one of them. You can get a listing of all threads of your processes, including their priorities, as follows:

ps x -Lo pgid,pid,tid,user,nice,tty,time,args
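
To lower the priority of every thread of one of your processes, you can renice each thread ID from such a listing. The following is only a sketch; the PID 12345 is a placeholder, and it relies on Linux accepting thread IDs where renice expects process IDs:

# renice all threads of the process with PID 12345 to the lowest priority
for tid in $(ps -Lo tid= -p 12345); do renice -n 19 -p "$tid"; done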

You can run short-term processes or interactive debugging of your programs with normal priority.

If your process does not adhere to the priority constraints and uses a large amount of computing power, all your processes will be set to the lowest priority 19 to prevent other users from being constrained. Repeated or more serious violations of this rule may result in temporary disabling of your faculty account.

Memory limitations using systemd

The upper limit on memory usage imposed by the system can be determined with the command below. When this limit is exceeded, the OOM killer is triggered and attempts to terminate an appropriate process.

systemctl show -p MemoryMax user-$UID.slice

However, you can create your own systemd scope in which a more stringent (lower) limit on usable memory can be set:

systemd-run --user --scope -p MemoryMax=9G program

The program can also be a command line (e.g. bash). The memory limit will be applied to it and all its children together. This is different from the ulimit mechanism, where the limit applies to each process separately.
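
For example, a whole pipeline started from one command line shares a single limit; this is a sketch with hypothetical program names:

# both programs together may use at most 9 GiB of memory
systemd-run --user --scope -p MemoryMax=9G bash -c './preprocess | ./train'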

It can be useful to monitor both the created scope and your user slice:

# monitoring of the memory and CPU usage of your processes
systemd-cgtop /user.slice/user-$UID.slice
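
If you give the scope a descriptive name, it is easier to spot in the systemd-cgtop output. A sketch, where the unit name myjob and ./your_program are placeholders:

# the scope will appear as myjob.scope under your user slice
systemd-run --user --scope --unit=myjob -p MemoryMax=9G ./your_program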

Resource constraints using ulimit

Resource limiting commands:

# limit available resources
help ulimit
# cap the size of virtual memory to 20000 kB
ulimit -v 20000
# cap the amount of total CPU time to 3600 seconds
ulimit -t 3600
# cap the number of concurrently running threads/processes
ulimit -u 100

The above commands limit the resources of the shell and all its children to the specified values. These cannot be rolled back; another separate shell will need to be run to restore the environment without the limits set. Note, however, that the resources set by ulimit apply to each process separately. Thus, if you set the limit to 20 MB of memory and run 10 processes in such an environment, they may allocate a total of 200 MB of memory. If you just want to limit the total memory to 20 MB, use systemd-run.
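
Because the limits cannot be undone, it is often convenient to set them only in a subshell, so that your login shell keeps its defaults. A sketch; ./your_program is a placeholder:

# the limits apply only inside the parentheses (and to the program's children)
( ulimit -v 2000000; ulimit -t 3600; nice ./your_program )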

Specific software

If you need to install libraries or tools for your work, you have several options (besides local compilation):

  • if they are part of the distribution (dnf search software-name), you can ask the administrators to install them,
  • you can create a module,
  • if it is a Python package, you can ask the administrators to install it into the python3 module. You can also install it locally using pip/pip3 install --user. If you use virtualenv, conda, etc., we recommend placing the environment in /var/tmp/login (see below for file lifetimes); a sketch of this follows below.
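
A minimal sketch of creating a virtual environment on the local volume, assuming your login is xlogin and the package name is a placeholder:

# create and activate a virtualenv on the fast local NVMe volume
python3 -m venv /var/tmp/xlogin/venv
source /var/tmp/xlogin/venv/bin/activate
pip install some-package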

Disk capacities

For temporary data that should be quickly available locally, two directories are available on the Aura server.

  • The /tmp directory is of type tmpfs. Since it resides in RAM, access is very fast, but the data does not survive server reboots and the capacity is very small. Do not store computation results here; filling it up causes system-wide problems.
  • The /var/tmp directory is on a fast NVMe RAID 1 volume.

Using them, especially for I/O-intensive computations, also reduces the load on the network and on the servers hosting the home and data storage.

To use this space, store your data in a directory named after your login. Data that is not accessed (according to atime) is deleted automatically: after a few days in /tmp and after a few months in /var/tmp (see /etc/tmpfiles.d/tmp.conf for the exact settings). Disk quotas do not apply here; however, please be considerate of others in your use of the space.
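
A minimal sketch of the intended usage ($LOGNAME expands to your login):

# create your own directory on the NVMe volume and check how much space it uses
mkdir -p /var/tmp/$LOGNAME
du -sh /var/tmp/$LOGNAME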

If fast access to the data is not required and the I/O load is not intensive over the long term, you can also consider the data storage, where we can temporarily increase your quota significantly.

GPU computing

The Aura server has two GPU cards, namely the NVIDIA A100 80 GB PCIe.

If you have any suggestions about the functionality or about how to work with GPUs on Aura, we would be happy to hear them.

GPU computations on Aura are currently not limited by the system in any way, so please be considerate of others.

Choosing a card

Unlike in the past, running computations concurrently on a single GPU is no longer outright problematic. If a GPU card is partitioned using MIG (Multi-Instance GPU) technology, it can host several mutually isolated virtual GPUs (instances).

Before starting a computation, we need to set the environment variable CUDA_VISIBLE_DEVICES appropriately. To select a suitable value, you can use the information from nvidia-smi or the nvisel tool:

[user@aura ~]$ nvidia-smi
Thu Jan 18 10:48:06 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100 80GB PCIe          On  | 00000000:21:00.0 Off |                   On |
| N/A   59C    P0             236W / 300W |  35141MiB / 81920MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80GB PCIe          On  | 00000000:61:00.0 Off |                    0 |
| N/A   43C    P0              44W / 300W |      4MiB / 81920MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| MIG devices:                                                                          |
+------------------+--------------------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |                   Memory-Usage |        Vol|      Shared           |
|      ID  ID  Dev |                     BAR1-Usage | SM     Unc| CE ENC DEC OFA JPG    |
|                  |                                |        ECC|                       |
|==================+================================+===========+=======================|
|  0    3   0   0  |              25MiB / 19968MiB  | 28      0 |  2   0    1    0    0 |
|                  |               0MiB / 32767MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0    4   0   1  |           17577MiB / 19968MiB  | 28      0 |  2   0    1    0    0 |
|                  |               2MiB / 32767MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0    6   0   2  |           17514MiB / 19968MiB  | 14      0 |  1   0    1    0    0 |
|                  |               2MiB / 32767MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0   11   0   3  |              12MiB /  9728MiB  | 14      0 |  1   0    1    1    1 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+
|  0   12   0   4  |              12MiB /  9728MiB  | 14      0 |  1   0    0    0    0 |
|                  |               0MiB / 16383MiB  |           |                       |
+------------------+--------------------------------+-----------+-----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0    4    0     902451      C   ./computation0                            17544MiB |
|    0    6    0     902453      C   ./computation1                            17494MiB |
+---------------------------------------------------------------------------------------+

In the first table we see a listing of the GPU cards; the last field in particular shows whether MIG is enabled (Enabled or Disabled). In this example, the GPU 0 card is partitioned by MIG, while the second card, GPU 1, is not.

In the second table, we can see the individual GPU instances with their allocated resources. If none of the cards are partitioned, the MIG devices: table is not displayed.

In the last table we see the running computations. In this example, computations are running on the partitioned GPU 0 card, on the instances with GI 4 and 6. We can therefore choose either the unpartitioned GPU 1 card with 80 GB of memory, or one of the free instances on the partitioned GPU 0 card: GI 3 (20 GB), GI 11 (10 GB), or GI 12 (10 GB).

If we choose an unpartitioned card, we set CUDA_VISIBLE_DEVICES to its GPU index (CUDA_VISIBLE_DEVICES=1). If we choose an instance on the partitioned GPU 0 card, for example the one with GI 11 (MIG Dev 3), we need to find out its UUID using nvidia-smi -L:

[user@aura ~]$ nvidia-smi -L
GPU 0: NVIDIA A100 80GB PCIe (UUID: GPU-309d72fd-b4f8-d6e8-6a66-e3f2253e8540)
  MIG 2g.20gb     Device  0: (UUID: MIG-ee0daf5f-9543-5e3f-8157-308a15c318b4)
  MIG 2g.20gb     Device  1: (UUID: MIG-fbb89bfe-6460-508c-ab51-9b961def7e01)
  MIG 1g.20gb     Device  2: (UUID: MIG-102d7a8b-5941-5275-be02-72ff5819ead4)
  MIG 1g.10gb     Device  3: (UUID: MIG-c4dc2f6b-2c55-566d-8738-fa8176580fda)
  MIG 1g.10gb     Device  4: (UUID: MIG-cd46e799-21e5-54d8-b751-f4a3afb52a46)
GPU 1: NVIDIA A100 80GB PCIe (UUID: GPU-04712e69-7356-4de5-f983-84083131460e)

We set the variable CUDA_VISIBLE_DEVICES to:
CUDA_VISIBLE_DEVICES=MIG-c4dc2f6b-2c55-566d-8738-fa8176580fda
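
The variable can also be set just for the computation itself, for example (./your_gpu_program is a placeholder):

# run a single computation on the chosen MIG instance at reduced priority
CUDA_VISIBLE_DEVICES=MIG-c4dc2f6b-2c55-566d-8738-fa8176580fda nice ./your_gpu_program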

Computation monitoring

We can monitor our computations either with the nvidia-smi command or with the interactive tools nvitop or nvtop (a terminal window larger than 80x24 is recommended). For MIG-partitioned cards, monitoring tools cannot display GPU utilization (Util) and show N/A instead.
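
For a simple periodic overview without any extra tools, you can also just refresh the nvidia-smi output, for example:

# refresh the overview every 5 seconds
watch -n 5 nvidia-smi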

Changing GPU partitioning

If the existing configuration of MIG instances does not suit you, it can be changed by arrangement via unix@fi.muni.cz, provided that circumstances (other running computations) reasonably allow it.

Container support - Podman

For computations on the Aura server, the Podman tool is also available, providing the same functionality as Docker. Each user is assigned a range of subuids and subgids derived from their UID and can therefore use rootless containers. The range has a size of 100000 and starts at UID*100000. By default, container data is stored in /var/tmp/containers/xlogin. No quotas are currently applied to this volume, so please keep track of your files and delete containers and images you no longer need.
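
If you want to check which range is mapped to your containers, one way is to look at the UID map inside a Podman user namespace; a sketch (the exact numbers depend on your UID):

# the second column of the second line is the start of your subuid range
podman unshare cat /proc/self/uid_map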

Podman and GPU

Unlike a normal (non-container) run, the GPU or MIG instance to use is specified not by its UUID but in the GPU:Device format, where GPU and Device are the numbers from the nvidia-smi -L listing. To use GPU 0 and Device 4, add --device nvidia.com/gpu=0:4 as a parameter.

Sample

If you would like to use JupyterLab with TensorFlow and GPU support, an example gpu-jupyter image can be found on Docker Hub (it is a large image, so the first run may take a while). On the Aura server you just need to run:

podman run --rm --security-opt label=disable -p 127.0.0.1:11000:8888 \
-v "${PWD}":/home/jovyan/work --device nvidia.com/gpu=0:4 \
cschranz/gpu-jupyter:v1.5_cuda-11.6_ubuntu-20.04_python-only

Beware: a container published this way is reachable only from the Aura server itself. For wider availability, create an SSH tunnel, for example ssh -L 8888:localhost:11000 aura. The port can also be mapped using -p 11000:8888; it will then be accessible from the whole FI network.

The -v "${PWD}":/home/jovyan/work part maps the current working directory inside the container to /home/jovyan/work. This directory is used by JupyterLab as the working directory.

We can then verify the functionality using, for example, this code, which returns the number of available GPUs (the expected output is 1):

import tensorflow as tf
print("Number of available GPUs:", len(tf.config.list_physical_devices('GPU')))

Subsequent runs are faster, since the downloaded image is not automatically deleted when the container exits. To remove leftovers (unused images, containers, ...), you can use the command podman system prune -a.