Software Containers (Singularity)

Overview

Teaching: 45 min
Exercises: 15 min
Questions
  • What is a software container and hoy to use it?

Objectives
  • Use singularity containers

Singularity Containers

Containers are a software technology that allows us to keep control of the environment where a given code runs. Consider for example that you want to run a code in such a way the same code runs on several machines or clusters ensuring that the same libraries are loaded and the same general environment is present. Different clusters could come installed with different compilers, different Linux distributions and different libraries in general. Containers can be used to package entire scientific workflows, software and libraries, and even data and move them to several compute infrastructures with complete reproducibility.

Containers are similar to Virtual Machines, however, the differences are enough to consider them different technologies and those differences are very important for HPC. Virtual Machines takes up a lot of system resources. Each Virtual Machine (VM) runs not just a full copy of an operating system, but a virtual copy of all the hardware that the operating system needs to run. This quickly adds up to a lot of precious RAM and CPU cycles, valuable resources for HPC.

In contrast, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program. From the user perspective, a container is in most cases a single file that contains the file system, ie a rather complete Unix filesystem tree with all libraries, executables, and data that are needed for a given workflow or scientific computation.

There are several container solutions, one prominent example is Docker, however, the main issue with containers is security, despite the name, containers do not actually contain the powers of the user who executes code on them. That is why you do not see Docker installed on an HPC cluster. Using dockers requires superuser access something that on shared resources like an HPC cluster is not widely offered.

Singularity offers an alternative solution to Docker, users can run the prepared images that we are offering on Spruce or bring their own.

For more information about Singularity and complete documentation see: https://singularity.lbl.gov/quickstart

How to use a singularity Image

There are basically two scenarios, interactive execution and job submission.

Interactive Job

If you are using Visit or RStudio, programs that uses the X11 forwarding, ensure to connect first to the cluster with X11 forwarding, before asking for an interactive job. In order to connect into Spruce with X11 forwarding use:

ssh -X <username>@spruce.hpc.wvu.edu

Once you have login into the cluster, create an interactive job with the following command line, in this case we are using standby as queue but any other queue is valid.

qsub -X -I -q standby

Once you get inside a compute node, load the module:

module load singularity/2.5.1

After loading the module the command singularity is available for usage, and you can get a shell inside the image with:

singularity shell /shared/software/containers/<Image Name>

Job Submission

In this case you do not need to export X11, just login into Spruce

ssh <username>@spruce.hpc.wvu.edu

Once you have login into the cluster, create a submission script (“runjob.pbs” for this example), in this case we are using standby as queue but any other queue is valid.

#!/bin/sh

#PBS -N JOB
#PBS -l nodes=1:ppn=1
#PBS -l walltime=04:00:00
#PBS -m ae
#PBS -q standby

module load singularity/2.5.1

singularity exec /shared/software/containers/<Image Name> <command_or_script_to_run>

Submit your job with

qsub runjob.pbs

Exercise: Singularity

Interactive

This exercise propose the use of singularity to access RStudio 1.1 and R 3.4.3

Follow the instructions for accessing an interactive session

The image is located at:

/shared/software/containers/RStudio-desktop-1.1.423_R-3.4.3.simg

Non-interactive

Create an small script for R and create a submission script using the Singularity image above.

Key Points

  • Containers provide a easy way to keep code and data reproducible by creating a consistent environment for a workflow.