This site is for helping you get started with Docker and containers.

 

Docker is a platform for running applications inside containers. If you are familiar with python virtual environments, you can think of a container as a system-wide virtual environment; it provides what looks and feels like a clean Ubuntu installation where your application can live and run, without the hassle and overhead of setting up a full-fledged virtual machine.

Containers are ubiquitous in the software industry, so I would recommend taking the time to read the official Docker documentation (https://docs.docker.com/get-started/). However, if you're in a hurry and just want to get a GPU-powered python environment up and running as soon as possible, the guide below should help you get started.


Step-by-step guide

1 - Making sure you have access to docker

By default (for security reasons), you will not have access to docker, even if you have access to the host machine. To check whether you have access, log into the host machine and run any docker command, e.g.:

Show docker images/processes
# Get a list of all the docker images
docker images 
# or
docker image list 

# To view a list of docker processes
docker ps

If you get an error message complaining about access rights, it means that you are not part of the docker group on the machine. Ask your supervisor to relay a request to one of the engineers at IDI, who should be able to grant you the necessary privileges.


2 - The Dockerfile - building an image

Our end goal is to make a container from which we can run our own code. However, to achieve this, we first need to create something called an image. An image is a prototype of a container; it serves as a premade snapshot that can be used to spawn any number of containers. An image is created from something called a Dockerfile, which in its most basic form is just a list of the prerequisites you want installed and the commands you want to run when building the image. The example below should be a nice starting point.

The Dockerfile - Building an image
# Use the latest tf GPU image as parent. This operation is analogous to inheritance in OOP.
# The image ships with tensorflow-gpu and jupyter installed for python 2. It is also
# configured so that a jupyter server will be launched at container startup. Note that you
# don't have to use this image as parent.
FROM tensorflow/tensorflow:latest-gpu 

# Set working directory for container 
WORKDIR /app  

# Make ssh directory (useful for adding ssh keys later) 
RUN mkdir -p /root/.ssh 

# Update repositories 
RUN apt-get update 

# Install git  
RUN apt-get install git -y 

# Install pip3 (parent image only comes with python2 stuff) 
RUN apt-get install python3-pip -y 

# Install your python packages  
RUN pip3 install --upgrade pip 
RUN pip3 install numpy 

# Add more pip installs here. Alternatively, move everything to a dedicated requirements file (see the sketch at the end of this section).


The snippet above should be saved as a simple text file called Dockerfile. To build an image from it, I would recommend putting it in a dedicated folder, e.g.:

Move to dedicated folder
mkdir -p ~/docker/myproject
mv Dockerfile ~/docker/myproject/Dockerfile

 

Then change working directory to the newly created folder and build the image:

Change working directory and build image
cd ~/docker/myproject 
docker build -t <image_name> . 

 

Note the '.' (dot) at the end of the command; don't forget it, as it tells docker where to look for a Dockerfile. <image_name> is a user-specified name used to identify the created image. By convention, since all images created on the machine are stored in one place, it is common to include your username in the image name, e.g. olanorm/testproject
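For example, following that convention (olanorm is just a placeholder username):

Build image - example
docker build -t olanorm/testproject .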

If the build command executed smoothly, your image should now be ready.  You can verify this by running:

Show docker images
docker image list

 

It prints a list of all the available images on the current machine. You should find your newly created one at the top.
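As an aside: the final comment in the example Dockerfile suggests moving your python dependencies to a dedicated requirements file. A minimal sketch of how that could look, assuming a file called requirements.txt next to the Dockerfile:

Using a requirements file
# Copy the requirements file into the image and install everything listed in it
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r /app/requirements.txt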

 

3 - Running the container

Once the image is successfully built, we can run a container from it. The docker run command has many different options that you might want to explore through the official reference. However, to keep things simple, here's a command for running a container capable of providing a jupyter notebook that can be accessed from the outside (a filled-in example follows the option list below):

Run container
docker run -d --rm -p YYYY:8888 --name <container_name> <image_name>

 

  • -d means that the container will run in detached mode, i.e. it will run in the background while freeing up your current shell. 

  • --rm means that the container will be cleaned up (everything inside it will be deleted) after it has exited. A container exits when its root process (which, if you used the tensorflow:latest-gpu parent image, is a jupyter notebook) terminates. Skip this flag if you want to keep the container around after it has exited; just remember to clean it up yourself so that you don't clutter up the system. 

  • -p maps a network port from the host machine to a port in the container. In the command above, we map YYYY, which should be an unused port number on the host machine, to port 8888 in the container, which is where the default jupyter process will listen. 

  • --name sets a name for the container. It is not needed, since the container also gets a hash ID, but it is good practice to mark it with a human readable string as well. For instance, if you called your image olanorm/myproject, you can call the container olanorm_myproject (since slashes are not allowed in container names). This name can be used to access the container when running extra commands in it or when you wish to shut it down. 

  • The final, positional argument is just the name of the image created in the previous section.
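Putting it all together with concrete (though hypothetical) values, assuming port 8888 also happens to be free on the host:

Run container - example
docker run -d --rm -p 8888:8888 --name olanorm_myproject olanorm/myproject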

 

After executing the run command, docker will print the ID of the newly created container. The container is now running and ready; you should be able to see it by executing:

Show docker process
docker ps

 

The command prints a table of all running containers; you should find yours there, along with its ID, image, name and port mappings.

 

4 - Connecting to jupyter

Normally, when using jupyter notebook-like apps on a computer, we just run the server and access it through a web browser. However, since the server process is running on a different machine that most likely does not expose the serving port on the network, we need some extra magic to make it accessible. To achieve this, open a new SSH connection to the server that is running your docker container with the following command:

Connect to server
ssh -L XXXX:localhost:YYYY <username>@<hostname>

where XXXX corresponds to any unused (>1023) port on your local machine, and YYYY corresponds to the docker host machine port you mapped to 8888 when running the container.
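For example, tunnelling local port 8888 to host port 8888 (with a hypothetical username and hostname):

Connect to server - example
ssh -L 8888:localhost:8888 olanorm@mlserver.example.com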

 

If you're working on Windows, you might not be able to run the command above. However, the program you used to establish your initial connection with the server most likely has an option for setting up an SSH tunnel as well. In PuTTY, for instance, tunnels can be configured under Connection > SSH > Tunnels.


Now, if you open your favorite web browser and go to localhost:XXXX, you should be met with a jupyter notebook sign-in page asking for a token or password.


The last piece of the puzzle is to get past this login screen. One way would be to configure jupyter to run without any authentication. However, it is just as easy to simply authenticate once. To get the token required for logging in, execute the following docker command:

Get token for login
docker exec <container name> jupyter notebook list


This should return a list of all running notebooks in your container (just the one started by the root process, unless you've done anything else). Simply copy the token part of the URL in the notebook list (the string following token=), paste it into the password prompt in your web browser, and you should be good to go.
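If you would rather grab just the token on the command line, something like the following should work (assuming your container is named olanorm_myproject):

Get token for login - one-liner
docker exec olanorm_myproject jupyter notebook list | grep -o 'token=[a-zA-Z0-9]*'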

 

5 - Further steps

Below are some extra tips and commands that might come in handy. 

 

Executing a command in a currently running container
docker exec <container name> <command> [arguments]

We ran into this command when accessing jupyter in the section above. The exec command allows you to execute arbitrary commands in a running container, e.g. a python script.
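For instance, running a (hypothetical) training script inside the container created earlier:

Executing a command - example
docker exec olanorm_myproject python3 /app/train.py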

 

Getting a shell in a running container
docker exec -it <container name> bash

This returns a shell that gives you terminal-level access to your container. It can be highly useful for setting up things in the container that were not already handled by the Dockerfile, or for just getting more familiar with the environment you've created. Note the -it switches; they make the session interactive, which means that stdin from your current shell will be hooked up to the shell granted by the container. 

 

Moving files in and out of a container
docker cp <container_name>:/path/in/container /path/on/host

This command allows copying single files to or from a container. Usage-wise it is more or less like the normal cp command, except that the container-side path is prefixed with the container name and a colon. Swap the order of the two path arguments to copy from the host into the container instead.
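For example, copying a result file out of a container and a config file into it (all names here are hypothetical):

Moving files - examples
# Container to host
docker cp olanorm_myproject:/app/results.csv ~/results.csv
# Host to container
docker cp ~/config.yaml olanorm_myproject:/app/config.yaml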

 

Mounting external directories
docker run -v /host/path/to/dir:/container/mount/location --name <container_name> <image_name>

If you want to keep whole directories of files in sync, or a permanent place to store output, then the copy command is not going to cut it. Fortunately, docker comes with the option to mount whole directories from the host machine into the container. You can do this by adding an extra argument to the docker run command, specifying the source directory on the host and the location where you want it to show up in the container. While it would often be convenient to mount your own home directory, this is in my experience currently not feasible on most of the docker setups we have, because the ticket used to access your home folder is not transferred to the docker program. One alternative is to use the host machine's /tmp folder, but this is naturally a bad idea if you're looking for permanent storage. Therefore, most machines are set up with a /data folder, where users of the system can get their own subfolder which they can mount into the containers they spawn.
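Combined with the run command from section 3, mounting a (hypothetical) /data subfolder into the container could look like this:

Mounting external directories - example
docker run -d --rm -p 8888:8888 -v /data/olanorm/myproject:/app/data --name olanorm_myproject olanorm/myproject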

 

 

Security issues 

On most NTNU systems docker runs as root, meaning that anything you do through docker will be run as root, not just in your container but also when interacting with the host machine. Please keep this in mind when working, and be responsible and careful. Misuse of trust may lead to exclusion.

Please feel free to share your skills and knowledge if you think some of them would be a nice addition to this wiki. To contribute, you can send an email to: joakim.g.antonsen@ntnu.no