Simon's musings on containers on the grid

There are quite a lot of options for running containers on the grid; I've tried to describe them below. The short version is roughly: "If you can build a docker image of your software, you can unpack it onto CVMFS and run it with singularity on the grid; there are other approaches that can also work."

Singularity works in a very similar way to docker, but with some tweaks that make it better suited to a grid environment. Docker needs a daemon running on the node, which brings some security issues on multi-user machines; singularity avoids this by doing everything in a single binary with no daemon. Otherwise "singularity exec" behaves much like "docker run".
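
For example, the two look almost the same on the command line. A rough comparison (the image name here is just an example):

# Docker: the client talks to the dockerd daemon, which runs as root
docker run --rm -it centos:7 /bin/bash

# Singularity: one binary, no daemon, runs as the calling user
singularity exec docker://centos:7 /bin/bash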

The usual system would be to either:
- Build a singularity image (or docker image) for your software locally and then unpack it to a directory on CVMFS.
- Use a pre-existing container (such as CernVM, which is essentially a build of Scientific Linux 6/7), compile your software against that, and then place only the binaries in your CVMFS area.

The latter option is simpler, but offers less flexibility since you can't add system packages to the image.
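
As a rough sketch of the first option (the image name and CVMFS paths here are placeholders, and how you actually publish into CVMFS depends on who runs the repository):

# Build the image locally with docker
docker build -t myexperiment/analysis:v1 .

# Convert it to an unpacked ("sandbox") directory with singularity
singularity build --sandbox analysis-v1/ docker-daemon://myexperiment/analysis:v1

# Copy the directory tree into the CVMFS repository
# (done on the repository's publisher node, inside a cvmfs_server transaction)
rsync -a analysis-v1/ /cvmfs/myexperiment.example.org/containers/analysis-v1/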

If you make your own container image, you should always aim to have an unpacked container on CVMFS rather than an image file (.sif), as image files don't work everywhere. An example of what an unpacked image should look like is the CernVM one here:
/cvmfs/cernvm-prod.cern.ch/cvm4/
(It should look just like the root file-system of a machine.)
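
In practice that means the top level of the unpacked directory contains the usual root-filesystem entries, something like (path is the placeholder one from above):

ls /cvmfs/myexperiment.example.org/containers/analysis-v1/
bin  boot  dev  etc  home  lib  lib64  opt  root  run  sbin  srv  tmp  usr  var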

I would try to avoid building anything in the jobs themselves: any software or images should be pre-built and then just run by the job. The whole point of using containers is to get the same environment everywhere, so nothing needs to be recreated at a different site.

Running with DIRAC doesn't make much difference to the container set-up. You just have to make sure you do all of the DIRAC operations (any downloading or uploading of files) outside of the container. The usual workflow would be a job script that looks like:


- Download any input data
- Run singularity to do the real processing
- Upload any outputs after singularity has finished
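
A minimal sketch of such a script (the LFNs, storage element, image path and run script are placeholders, and the DIRAC data-management commands should be checked against your VO's setup; finding the singularity binary itself is covered next):

#!/bin/bash
# 1) Download the input data with DIRAC, outside the container
dirac-dms-get-file /myvo/user/s/someone/input.dat

# 2) Do the real processing inside the container
singularity exec -B /cvmfs -B $PWD \
    /cvmfs/myexperiment.example.org/containers/analysis-v1/ \
    ./run_analysis.sh input.dat output.dat

# 3) Upload the results with DIRAC, again outside the container
dirac-dms-add-file /myvo/user/s/someone/output.dat output.dat MY-DISK-SE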

The only real complexity in the above is finding the singularity binary in the first place: we're in a transitional period at the moment where some sites have singularity installed and others expect users to pick it up from CVMFS instead. A script snippet like the following is probably what you'd need:


SINGULARITY_OPTS=""
# Check whether unprivileged user namespaces are available on this node
if [ "$(sysctl -n user.max_user_namespaces 2>/dev/null || echo 0)" -gt 100 ]; then
    # Use the CVMFS build of singularity, running unprivileged with user namespaces
    SINGULARITY="/cvmfs/oasis.opensciencegrid.org/mis/singularity/current/bin/singularity"
    SINGULARITY_OPTS="--userns"
else
    # Fall back to a site-installed (setuid) singularity found on $PATH
    SINGULARITY=singularity
fi
# You almost certainly want -B /cvmfs to get CVMFS inside the container at
# the very least! (-d just turns on debug output, which is handy while testing.)
$SINGULARITY -d exec $SINGULARITY_OPTS [your other options go here]
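
Putting it together, a full invocation might then look something like this (the image path and run script are placeholders again):

$SINGULARITY exec $SINGULARITY_OPTS -B /cvmfs -B $PWD \
    /cvmfs/myexperiment.example.org/containers/analysis-v1/ \
    ./run_analysis.sh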