...
- At SkyHiGh (IIK's production instance):
- General purpose flavors, all IIK affiliates are eligible:
- dx5.8c90r.v100-8g: A flavor with 90GB RAM, 8 vCPUs and 1/4 of a Tesla V100 (8GB GPU-RAM)
- de3.12c60r.a100-10g: A flavor with 60GB RAM, 12 vCPUs and 1/4 of a Tesla A100 (10GB GPU-RAM)
- Flavors only available for SFI NORCICS:
- de3.24c120r.a100-20g: A flavor with 120GB RAM, 24 vCPUs and 1/2 of a Tesla A100 (20GB GPU-RAM)
- Only available for SFI NORCICS
- de3.48c240r.a100-40g: A flavor with 240GB RAM, 48 vCPUs and 1/1 of a Tesla A100 (40GB GPU-RAM)
- Only available for SFI NORCICS
- Flavors only available for Norwegian Biometrics Lab:
- dx4.24c60r.p40-24g: A flavor with 60GB RAM, 24 vCPUs and 1/1 of a Tesla P40 (24GB GPU-RAM)
- de2.24c240r.a100-20g: A flavor with 240GB RAM, 24 vCPUs and 1/2 of a Tesla A100 (20GB GPU-RAM)
- Only available for Norwegian Biometrics Lab
- de3.24c120r.a100d-20g: A flavor with 120GB RAM, 24 vCPUs and 1/4 of a Tesla A100 80GB (20GB GPU-RAM)
- Only available for Norwegian Biometrics Lab
- At SkyLow (IIK's development instance):
- dx4.8c20r.m10-8G: A flavor with 20GB RAM, 8 vCPUs and one core of a Tesla M10 card (8GB GPU-RAM)
- dx4.24c110r.p100: A flavor with 110GB RAM, 24 vCPUs and a Tesla p100 card (16GB GPU-RAM)
- dx4.48c220r.2p100: A flavor with 220GB RAM, 48 vCPUs and two Tesla p100 cards (2*16GB GPU-RAM)
- At stackit (NTNU IT's production platform):
- dx4.28c120r.a100-20g: A flavor with 120GB RAM, 28 vCPUs and 1/2 of a Tesla A100 (20GB GPU-RAM)
- dx5s.96c470r.a100d-80g.e3400g: A flavor with 470GB RAM, 96 vCPUs, a Tesla a100d (80GB GPU-RAM) and 3.4TiB of compute-local flash storage
- Only available for an IV-EPT project.
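The flavor names above appear to encode the resources they provide. The sketch below splits a name into its parts, assuming the pattern `<series>.<N>c<M>r.<gpu>[-<K>g]` (N vCPUs, M GB RAM, K GB GPU-RAM). That pattern is inferred from the list and is not an official naming scheme, and `parse_flavor` is a hypothetical helper name. Names with extra fields, such as the `.e3400g` storage suffix, do not match and produce no output.

```shell
# Split a GPU flavor name into its parts.
# Assumption: the <series>.<N>c<M>r.<gpu>[-<K>g] pattern is inferred
# from the flavor list and is not an official naming scheme.
parse_flavor() {
    printf '%s\n' "$1" | sed -nE \
        's/^([a-z0-9]+)\.([0-9]+)c([0-9]+)r\.([a-z0-9]+)(-([0-9]+)[gG])?$/series=\1 vcpus=\2 ram_gb=\3 gpu=\4 gpu_ram_gb=\6/p'
}

# Prints: series=de3 vcpus=12 ram_gb=60 gpu=a100 gpu_ram_gb=10
parse_flavor de3.12c60r.a100-10g

# No GPU-RAM suffix in the name, so gpu_ram_gb is left empty.
parse_flavor dx4.48c220r.2p100
```

This can be handy when filtering a long flavor list for a minimum amount of RAM or GPU-RAM.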
GPU-enabled images
We provide an image with a pre-installed Nvidia driver and CUDA package. This image contains the word "GRID" in its name and is a regular Ubuntu Server LTS image with the following additions:
...
The installation of the drivers requires a restart, so the newly created instance will reboot shortly after creation.
...
Many of our GPU users will probably need Nvidia's cuDNN library. It is not pre-installed in our image, because Nvidia requires all users to register for the Nvidia Developer Program before downloading it. Please follow the instructions here to install it on your VM, and use the tar file option. DO NOT USE THE DEB OR RPM ALTERNATIVE. Be sure to download the cuDNN version that corresponds to our current CUDA version.
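To pick a matching cuDNN download you need to know which CUDA version the VM runs. As a minimal sketch, the hypothetical helper `cuda_version` below extracts the version from the header that `nvidia-smi` prints. The sample header line is only there to make the example self-contained. Note that `nvidia-smi` reports the CUDA version supported by the driver; if a CUDA toolkit is installed separately, `nvcc --version` shows the toolkit version.

```shell
# Extract the "CUDA Version: X.Y" value from nvidia-smi output on stdin.
# cuda_version is a hypothetical helper name, not part of any Nvidia tool.
cuda_version() {
    sed -nE 's/.*CUDA Version: ([0-9]+\.[0-9]+).*/\1/p' | head -n 1
}

# Demo with a sample header line; prints 12.0
printf '%s\n' '| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |' | cuda_version
```

On a GPU VM the same function would be used as `nvidia-smi | cuda_version`.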
...
Code Block |
---|
ubuntu@gputest:~$ lspci | grep NVIDIA
00:05.0 3D controller: NVIDIA Corporation Device 20f1 (rev a1) |
You can also verify that a license for the GPU has been acquired successfully (yes, we need licenses to use our GPUs...):
Code Block |
---|
ubuntu@gputest:~$ journalctl -u nvidia-gridd | tail
....
Sep 04 08:54:23 jammy-gpu systemd[1]: Starting NVIDIA Grid Daemon...
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Started (724)
Sep 04 08:54:24 jammy-gpu systemd[1]: Started NVIDIA Grid Daemon.
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Configuration parameter ( ServerAddress ) not set
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: vGPU Software package (0)
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Ignore service provider and node-locked licensing
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: NLS initialized
Sep 04 08:54:24 jammy-gpu nvidia-gridd[724]: Acquiring license. (Info: nvidiadls02.it.ntnu.no; NVIDIA Virtual Compute Server)
Sep 04 08:54:27 jammy-gpu nvidia-gridd[724]: License acquired successfully. (Info: nvidiadls02.it.ntnu.no, NVIDIA Virtual Compute Server; Expiry: 2023-9-5 8:54:16 GMT) |
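If you want to check the license state from a script rather than by reading the log, a small wrapper around `grep` is enough. This is a sketch only: `license_ok` is a hypothetical helper name, and it assumes the daemon keeps logging the exact phrase "License acquired successfully" shown above.

```shell
# Succeeds (exit status 0) if the log on stdin reports an acquired license.
# Assumption: the daemon logs the phrase "License acquired successfully".
license_ok() {
    grep -q 'License acquired successfully'
}

# Demo with a log line copied from the output above.
sample='Sep 04 08:54:27 jammy-gpu nvidia-gridd[724]: License acquired successfully. (Info: nvidiadls02.it.ntnu.no, NVIDIA Virtual Compute Server; Expiry: 2023-9-5 8:54:16 GMT)'
if printf '%s\n' "$sample" | license_ok; then
    echo "GPU license acquired"
else
    echo "no GPU license found in the log"
fi
```

On a GPU VM the input would come from `journalctl -u nvidia-gridd | license_ok`.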
The "nvidia-smi" tool will show you the GPU status
Code Block |
---|
ubuntu@gputest:~$ nvidia-smi
Mon Sep  4 08:58:11 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GRID V100D-8C       On   | 00000000:00:05.0 Off |                  N/A |
| N/A   N/A    P0    N/A /  N/A |      0MiB /  8192MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+ |
Starting from CUDA 12.x, the samples are no longer included in the install packages. If you want to verify that the GPU and license work, you have to download them from GitHub:
Code Block |
---|
$ git clone https://github.com/NVIDIA/cuda-samples.git |
You can then compile and run a sample, for instance like so:
Code Block |
---|
ubuntu@jammy-gpu:~$ cd cuda-samples/Samples/1_Utilities/deviceQuery
ubuntu@jammy-gpu:~/cuda-samples/Samples/1_Utilities/deviceQuery$ make
... lots-of-text-from-make ...
ubuntu@jammy-gpu:~/cuda-samples/Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GRID V100D-8C"
  CUDA Driver Version / Runtime Version          12.0 / 12.0
  CUDA Capability Major/Minor version number:    7.0
  Total amount of global memory:                 8192 MBytes (8589934592 bytes)
  (080) Multiprocessors, (064) CUDA Cores/MP:    5120 CUDA Cores
  GPU Max Clock rate:                            1380 MHz (1.38 GHz)
  Memory Clock rate:                             877 Mhz
  Memory Bus Width:                              4096-bit
  L2 Cache Size:                                 6291456 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        98304 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 7 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                No
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 5
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.0, CUDA Runtime Version = 12.0, NumDevs = 1
Result = PASS |
...
Code Block |
---|
# Enable the repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
   && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
   && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
# Install the package
sudo apt update && sudo apt -y install nvidia-docker2
# Restart the docker daemon
sudo systemctl restart docker
# Run a test to verify that it works
sudo docker run --rm --gpus all nvidia/cuda:12.0.1-base-ubuntu22.04 nvidia-smi
# Optionally run a test with Tensorflow that actually runs a bit of code on the GPU via docker
sudo docker run --gpus all -it --rm tensorflow/tensorflow:2.14.0-gpu \
   python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))" |
...