...
- At SkyHiGh (IIK's production instance):
  - General purpose flavors, all IIK affiliates are eligible:
    - dx5.8c90r.v100-8g: A flavor with 90GB RAM, 8 vCPUs and 1/4 of a Tesla V100 (8GB GPU-RAM)
    - de3.12c60r.a100-10g: A flavor with 60GB RAM, 12 vCPUs and 1/4 of a Tesla A100 (10GB GPU-RAM)
  - Flavors only available for SFI NORCICS:
    - de3.24c120r.a100-20g: A flavor with 120GB RAM, 24 vCPUs and 1/2 of a Tesla A100 (20GB GPU-RAM)
    - de3.48c240r.a100-40g: A flavor with 240GB RAM, 48 vCPUs and 1/1 of a Tesla A100 (40GB GPU-RAM)
  - Flavors only available for Norwegian Biometrics Lab:
    - dx4.24c60r.p40-24g: A flavor with 60GB RAM, 24 vCPUs and 1/1 of a Tesla P40 (24GB GPU-RAM)
    - de2.24c240r.a100-20g: A flavor with 240GB RAM, 24 vCPUs and 1/2 of a Tesla A100 (20GB GPU-RAM)
    - de3.24c120r.a100d-20g: A flavor with 120GB RAM, 24 vCPUs and 1/4 of a Tesla A100 80GB (20GB GPU-RAM)
- At SkyLow (IIK's development instance):
  - dx4.8c20r.m10-8G: A flavor with 20GB RAM, 8 vCPUs and one core of a Tesla M10 card (8GB GPU-RAM)
  - dx4.24c110r.p100: A flavor with 110GB RAM, 24 vCPUs and a Tesla P100 card (16GB GPU-RAM)
  - dx4.48c220r.2p100: A flavor with 220GB RAM, 48 vCPUs and two Tesla P100 cards (2*16GB GPU-RAM)
- At stackit (NTNU IT's production platform):
  - dx4.28c120r.a100-20g: A flavor with 120GB RAM, 28 vCPUs and 1/2 of a Tesla A100 (20GB GPU-RAM)
  - dx5s.96c470r.a100d-80g.e3400g: A flavor with 470GB RAM, 96 vCPUs, a Tesla A100D (80GB GPU-RAM) and 3.4TiB of compute-local flash storage
    - Only available for an IV-EPT project.
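The flavor names in the list above appear to encode the resources they provide. The following is a minimal sketch that decodes a name into its parts. The naming pattern is an assumption inferred from the names listed here, not an official specification, and the function name `describe_flavor` is made up for illustration.

```shell
#!/bin/sh
# Sketch: decode a GPU flavor name such as "de3.24c120r.a100-20g".
# Assumed pattern (inferred from the flavor list, not an official spec):
#   <family>.<vCPUs>c<RAM in GB>r.<GPU model>[-<GPU RAM in GB>g][.<extras>]
# gpu_ram_gb is left empty when the name does not carry a GPU-RAM suffix.
describe_flavor() (
  name=$1
  re='^([a-z0-9]+)\.([0-9]+)c([0-9]+)r\.([0-9]*[a-z]+[0-9]+[a-z]*)(-([0-9]+)[gG])?(\..*)?$'
  out=$(printf '%s\n' "$name" |
    sed -n -E "s/$re/family=\1 vcpus=\2 ram_gb=\3 gpu=\4 gpu_ram_gb=\6/p")
  if [ -z "$out" ]; then
    echo "unrecognized flavor name: $name" >&2
    return 1
  fi
  printf '%s\n' "$out"
)

describe_flavor de3.24c120r.a100-20g
# prints: family=de3 vcpus=24 ram_gb=120 gpu=a100 gpu_ram_gb=20
```

The fraction of the physical card (1/4, 1/2, 1/1) is not part of the name, so it still has to be looked up in the list.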
GPU-enabled images
We provide an image with pre-installed Nvidia driver and CUDA package. This image contains the word "GRID" in its name and is a regular Ubuntu Server LTS image with the following additions:
...
The installation of the drivers requires a restart, so the newly created instance will reboot shortly after creation.
...
Many of our GPU users will probably need Nvidia's cuDNN library. It is not pre-installed in our image, because Nvidia requires all users to register for the Nvidia Developer Program before downloading. Please follow the instructions here to install it on your VM, and use the tar file option. DO NOT USE THE DEB OR RPM ALTERNATIVE. Be sure to download the cuDNN version that corresponds to our current CUDA version.
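To pick a matching cuDNN download you need to know the CUDA version on the instance. A minimal sketch, assuming the usual `nvidia-smi` banner format with a `CUDA Version:` field; the helper name `cuda_version` is made up for illustration. It reads the banner on stdin, so it can also be tried on saved output.

```shell
#!/bin/sh
# Sketch: print the CUDA version found in nvidia-smi banner text on stdin.
# Assumes the banner contains a field of the form "CUDA Version: <major>.<minor>".
cuda_version() {
  sed -n -E 's/.*CUDA Version: *([0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# On a GPU instance you would run:
#   nvidia-smi | cuda_version
```

Note that the banner reports the CUDA version supported by the driver. If `nvcc` is on your PATH, `nvcc --version` reports the installed toolkit version, which is worth comparing before choosing a cuDNN build.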
...
Code Block |
---|
ubuntu@gputest:~$ journalctl -u nvidia-gridd | tail
Sep 04 08:54:23 jammy-gpu systemd[1]: Starting NVIDIA Grid Daemon...
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Started (724)
Sep 04 08:54:24 jammy-gpu systemd[1]: Started NVIDIA Grid Daemon.
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Configuration parameter ( ServerAddress ) not set
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: vGPU Software package (0)
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Ignore service provider and node-locked licensing
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: NLS initialized
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Acquiring license. (Info: nvidiadls02.it.ntnu.no; NVIDIA Virtual Compute Server)
Sep 04 08:54:27 jammy-gpu nvidia-gridd[724]: License acquired successfully. (Info: nvidiadls02.it.ntnu.no, NVIDIA Virtual Compute Server; Expiry: 2023-9-5 8:54:16 GMT) |
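If you want to check the license state from a script instead of reading the journal by eye, something like the following might do. It is a sketch that assumes the log lines look like the ones shown above; the helper name `license_expiry` is made up for illustration.

```shell
#!/bin/sh
# Sketch: read nvidia-gridd journal text on stdin and print the expiry of the
# most recent "License acquired successfully" entry.
# Returns non-zero when no such entry is found.
license_expiry() (
  expiry=$(sed -n -E 's/.*License acquired successfully.*Expiry: ([^)]*)\).*/\1/p' |
    tail -n 1)
  [ -n "$expiry" ] && printf '%s\n' "$expiry"
)

# On a GPU instance you would run:
#   journalctl -u nvidia-gridd | license_expiry || echo "no license acquired yet"
```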
The "nvidia-smi" tool will show you the GPU status
Code Block |
---|
ubuntu@gputest:~$ nvidia-smi
Mon Sep  4 08:58:11 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                    0 |
| N/A   N/A    P0    N/A /  N/A |      0MiB /  8192MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+ |
Starting from CUDA 12.x, the samples are no longer included in the install packages. If you want to verify that the GPU and license work, you have to download the samples from GitHub:
Code Block |
---|
$ git clone https://github.com/NVIDIA/cuda-samples.git |
You can then compile and run a sample to test that the GPU works. For instance, you could do like so:
Code Block |
---|
ubuntu@jammy-gpu:~$ cd cuda-samples/Samples/1_Utilities/deviceQuery
ubuntu@jammy-gpu:~/cuda-samples/Samples/1_Utilities/deviceQuery$ make
... lots-of-text-from-make ...
ubuntu@jammy-gpu:~/cuda-samples/Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GRID V100D-8C"
  CUDA Driver Version / Runtime Version          12.0 / 12.0
  CUDA Capability Major/Minor version number:    7.0
  Total amount of global memory:                 8192 MBytes (8589934592 bytes)
  (080) Multiprocessors, (064) CUDA Cores/MP:    5120 CUDA Cores
  GPU Max Clock rate:                            1380 MHz (1.38 GHz)
  Memory Clock rate:                             877 Mhz
  Memory Bus Width:                              4096-bit
  L2 Cache Size:                                 6291456 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 7 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                No
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 5
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 12.0, NumDevs = 1
Result = PASS |
...
Code Block |
---|
# Enable the repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the package
sudo apt update && sudo apt -y install nvidia-docker2

# Restart the docker daemon
sudo systemctl restart docker

# Run a test to verify that it works
sudo docker run --rm --gpus all nvidia/cuda:12.0.1-base-ubuntu22.04 nvidia-smi

# Optionally run a test with Tensorflow that actually runs a bit of code on the GPU via docker
sudo docker run --gpus all -it --rm tensorflow/tensorflow:2.14.0-gpu \
   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))" |
...