I will give a demonstration of the various aspects involved in using GPUs to speed up the training of deep learning models.
I will talk a bit about
The terminal commands should be applicable.
Requirements:
You also need to be able to log on to the UNIX server.
Deep learning (DL) applications can involve
Example where training a single model required
“Tensorflow is a software library for machine learning and artificial intelligence”
We focus on Python.
On the UNIX server, the Foswiki information resource provides an overview of Tungregning-servere (heavy-computation servers).
GPU types residing on the UNIX servers gorina4 and gorina6.
These require manual reservation.
GPU types residing on the UNIX servers gorina7, gorina8 and gorina9.
These require using the SLURM job-queue system on gorina11.
“The NVIDIA® CUDA® Toolkit provides a development environment for creating high-performance, GPU-accelerated applications.”
“The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.”
NVIDIA® TensorRT™ is an ecosystem of APIs for high-performance deep learning inference.
We will look at an example: training a neural network on MNIST with Keras.
We will train a neural network to recognize handwritten digits.
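The training itself can be sketched as follows. This is a minimal illustrative model; the architecture and hyperparameters are my assumptions, not necessarily those used in the demonstration:

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of handwritten digits.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixel values to [0, 1]

# A small fully connected network; one output logit per digit class.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# TensorFlow places the computation on a GPU automatically when one is visible.
model.fit(x_train, y_train, epochs=2, batch_size=128)
model.evaluate(x_test, y_test)
```

If a GPU is visible to TensorFlow, the fit and evaluate calls run on it without further changes; otherwise they fall back to the CPU.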
For installing TensorFlow, we refer to “Install TensorFlow with pip” from the TensorFlow web pages. Note that we are using venv instead of conda in the following example.
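Setting up the environment could then look like this sketch (the environment name tf-env is an arbitrary choice of mine):

```shell
# Create and activate a virtual environment named tf-env.
python3 -m venv tf-env
. tf-env/bin/activate

# Upgrade pip, then install TensorFlow into the environment.
pip install --upgrade pip
pip install tensorflow
```

Pin the TensorFlow version (for example pip install tensorflow==2.12.0) if you need to match specific CUDA and cuDNN versions.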
It is important to be aware of which versions of Python, TensorFlow, CUDA, cuDNN and TensorRT you will be using.
Check the compatibility table to ensure you are using compatible versions of TensorFlow, CUDA and cuDNN.
For example, if you know you will be using TensorFlow 2.12.0, the table tells you that it is compatible with Python 3.8–3.11, CUDA 11.8 and cuDNN 8.6.
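A quick way to check what you actually have is to ask Python and TensorFlow directly (a sketch; the GPU listing is empty on a CPU-only machine):

```python
import sys

# The Python version of the current interpreter.
print("Python:", sys.version.split()[0])

try:
    import tensorflow as tf
    print("TensorFlow:", tf.__version__)
    # The CUDA/cuDNN versions this TensorFlow build was compiled against.
    print("Build info:", tf.sysconfig.get_build_info())
    # The GPUs TensorFlow can see.
    print("GPUs:", tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed in this environment")
```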
Use the uenv-avail command to see which libraries are available.
You can filter using grep:
uenv-avail | grep -i miniconda | grep -i 310
uenv-avail | grep -i cuda | grep -i 11.8
uenv-avail | grep -i cudnn | grep 11. | grep 8.6
uenv-avail | grep -i tensorrt | grep 11.x-8.6
returning cuda-11.8.0, cudnn-11.x-8.6.0 and TensorRT-11.x-8.6-8.5.3.1, respectively.
We will add these libraries to the LD_LIBRARY_PATH to make them available to applications in the environment.
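A sketch of how this can be done (the installation prefixes below are placeholders of mine; substitute the actual paths that the uenv modules report on your system):

```shell
# Placeholder installation prefixes -- replace with the real paths
# for cuda-11.8.0, cudnn-11.x-8.6.0 and TensorRT-11.x-8.6-8.5.3.1.
CUDA_HOME=/path/to/cuda-11.8.0
CUDNN_HOME=/path/to/cudnn-11.x-8.6.0
TENSORRT_HOME=/path/to/TensorRT-11.x-8.6-8.5.3.1

# Prepend the library directories so that applications started from this
# shell can locate the CUDA, cuDNN and TensorRT shared libraries.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$CUDNN_HOME/lib:$TENSORRT_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```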
In the following example, we demonstrate both manual reservation and the SLURM system for running on the GPUs. For an example using PyTorch, see the MNIST demonstration.
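For the SLURM route, a batch script along these lines could be submitted from gorina11 (the resource limits and the script name mnist_train.py are my assumptions; adapt them to the local configuration):

```shell
# Write a minimal SLURM batch script that requests one GPU.
cat > mnist_job.sh <<'EOF'
#!/bin/sh
#SBATCH --job-name=mnist-keras
#SBATCH --gres=gpu:1
#SBATCH --time=00:30:00

# Activate the prepared environment, then run the training script.
. tf-env/bin/activate
python3 mnist_train.py
EOF

# Submit with: sbatch mnist_job.sh
# Check status with: squeue -u "$USER"
```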