MasterCode on the Grid

The official MasterCode webpage.

How to run 'mastercode' on the grid:

(0) Just try it out
on lx05/lx06:
mkdir mastercode
cd mastercode
source /vols/grid/glite/ui/3.2.10-1/external/etc/profile.d/grid-env.sh
voms-proxy-init --voms vo.londongrid.ac.uk
[type in your password]
myproxy-init -n -d
[type in your password]
wget http://www.hep.ph.ic.ac.uk/~dbauer/mastercode/code/config_grid.tar
tar -xf config_grid.tar
cd config
[edit rscan_grid.sh to replace 'dbauer' with your own username]
cd ..
rm -rf config_grid.tar; tar -cf config_grid.tar config
wget http://www.hep.ph.ic.ac.uk/~dbauer/mastercode/scripts/run_mastercode_grid.sh
wget http://www.hep.ph.ic.ac.uk/~dbauer/mastercode/jdls/glite-run.imperial.jdl
glite-wms-job-submit -a -o grid.log glite-run.imperial.jdl
glite-wms-job-status -i grid.log
[once this shows 'Job Terminated Successfully'] do:
glite-wms-job-output --dir test -i grid.log
(this will get all your log files)
wget http://www.hep.ph.ic.ac.uk/~dbauer/mastercode/scripts/copy_and_delete.sh
chmod u+x copy_and_delete.sh
[modify script to point to your directory, i.e. most likely it's just a case of replacing 'dbauer' with your own username and specifying the output directory of your choice.]
This script will then copy your root file(s) to the directory specified.
That's it.

(1) And now in more detail.....

(1a) Introduction
The current version of MasterCode deployed on the four London-based Tier-2 sites (Imperial, QMUL, RHUL and Brunel) is based on the tag 'clean-start-NOV11'. No changes have been made to the code, only to the masterfitter Makefile.
As part of the code within mastercode is compiled on the fly at runtime, it is necessary to provide at least one executable per user. On the grid there is no guarantee that a user will be mapped to the same internal user every time, hence the executable is compiled at the beginning of each job, i.e. one executable for each individual job. At Imperial an executable where some of the libraries have been precompiled is provided (as our environment has already been modified cluster-wide to accommodate mastercode); at the other universities all code is compiled from scratch every time a grid job starts. Here are the links to the tarballs (the grid jobs get the tarballs from the local SE, not my webpage): Imperial, not-Imperial.
Additionally, the rscan.sh script has been modified to include a number of grid-specific environment variables. To avoid confusion it has been renamed to rscan_grid.sh.
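For illustration, the setup performed at the start of each grid job by run_mastercode_grid.sh looks roughly like the sketch below (the tarball name, SE path and directory layout are placeholders; the actual logic lives in the script itself):

# fetch the mastercode tarball from the local SE and unpack it
lcg-cp --vo=vo.londongrid.ac.uk srm://<local-SE>/<path>/mastercode.tar file://$PWD/mastercode.tar
tar -xf mastercode.tar
# compile the masterfitter for this job, then start the scan
cd mastercode/masterfitter
make
./rscan_grid.sh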


(1b) Setting up a grid environment
This assumes that you have a valid grid certificate and are a member of the londongrid VO ('Virtual Organization'). If you haven't/aren't, here is how you get there.
On lx05/lx06, set up a UI:
source /vols/grid/glite/ui/3.2.10-1/external/etc/profile.d/grid-env.sh
then, get a proxy:
voms-proxy-init --voms vo.londongrid.ac.uk
If at this point you would like to send out a 'hello world' job to see if everything is working, try these instructions.
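For reference, a minimal test could look something like the following sketch (this is a generic 'hello world' JDL, not one of the mastercode JDL files; the file names are arbitrary):

Executable = "/bin/hostname";
StdOutput = "hello.out";
StdError = "hello.err";
OutputSandbox = {"hello.out", "hello.err"};

Save this as hello.jdl, then submit, monitor and retrieve it with:
glite-wms-job-submit -a -o hello.log hello.jdl
glite-wms-job-status -i hello.log
glite-wms-job-output --dir hello -i hello.log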
A default proxy will last for 12 h. As mastercode jobs tend to last longer, you now need to store a long-lived proxy on the myproxy server at RAL:
myproxy-init -n -d
This proxy will last for 7 days. To check if you still have a proxy on the myproxy server use: myproxy-info -d

(1c) Mastercode specific files
You need to replace your rscan.sh file with rscan_grid.sh, which contains, in addition to the mastercode steering parameters, some settings needed by the grid. These are clearly marked and should be left unchanged, except that you should please replace my username with yours :-).
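Schematically, the marked block looks something like this (the variable name below is only an illustrative placeholder, not the actual content of rscan_grid.sh; follow the comments in the file itself):

# --- grid-specific settings: do not change, except for the username ---
GRID_USER=dbauer   # replace 'dbauer' with your own username
# -----------------------------------------------------------------------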
You also need run_mastercode_grid.sh, a steering script sent with the grid job which takes care of setting up the working area etc. You should not have to modify this file.
Finally you need a JDL file to describe your job to the grid, so it can be sent to the correct site. A bunch of possible jdl files is given here. They should be fairly self-explanatory.
Parametric Jobs: If you want to send a collection of jobs that are basically identical (as mastercode jobs are, since they rely on the random number generator to create their input settings), you can use parametric job submission. The examples show how to submit a collection of 50 jobs in one go; a sketch is given below.
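As an illustration, the parametric part of such a JDL looks roughly like this (a sketch only; take the sandbox contents, requirements and output file names from the actual JDL files linked above):

JobType = "Parametric";
Parameters = 50;
ParameterStart = 0;
ParameterStep = 1;
Executable = "run_mastercode_grid.sh";
InputSandbox = {"run_mastercode_grid.sh", "config_grid.tar"};
StdOutput = "mastercode_PARAM_.out";
StdError = "mastercode_PARAM_.err";
OutputSandbox = {"mastercode_PARAM_.out", "mastercode_PARAM_.err"};

The WMS replaces _PARAM_ with the values 0..49, so each of the 50 jobs gets its own log files.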

(1d) Submitting a mastercode job and getting the output back
Submitting a collection of 50 jobs:
glite-wms-job-submit -a -o mastercodegrid.log glite-run.imperial.parametric.jdl
Checking the status:
glite-wms-job-status -i mastercodegrid.log
Retrieving the output:
This will retrieve the log files etc:
glite-wms-job-output --dir test -i mastercodegrid.log
The actual results are staged out to the local SE ('Storage Element') at the site where the job is run. This is to avoid the results getting lost if for some reason the file transfer at the end of a grid job times out. Here is a short shell script which shows how to transfer all the root files to a directory of your choice.
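In outline, such a script does little more than list the root files on the SE and copy them down, e.g. (a sketch using the Imperial SE; the subdirectory on the SE and the local output directory are placeholders you need to adapt, just as in copy_and_delete.sh):

#!/bin/bash
VO=vo.londongrid.ac.uk
SRM=srm://gfe02.grid.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/london/mastercode/dbauer   # replace 'dbauer'
OUTDIR=/path/of/your/choice
for f in $(lcg-ls --vo=$VO $SRM | grep '\.root$'); do
    lcg-cp --vo=$VO $SRM/$(basename $f) file://$OUTDIR/$(basename $f)
done

copy_and_delete.sh additionally removes the files from the SE once they have been copied.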




(2) Technical details

(a) Root
MasterCode seems to depend on a specific version of root (5.29), which is no longer available from the official root webpage. I have used the version installed on lx06 and distributed it to all the sites; here is a link to the tarball and the install script.
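Setting this root version up by hand amounts to unpacking the tarball and pointing the environment at it, roughly as follows (the tarball name and install location are placeholders):

tar -xf root_5.29.tar -C /your/install/area
export ROOTSYS=/your/install/area/root
export PATH=$ROOTSYS/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH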

(b) cernlib
cernlib is not usually installed on grid nodes (it is installed on the Imperial ones). Unfortunately, because of the way mastercode is compiled, it relies on the shared object libraries being present. In the end I gave up and just tarred up the version installed on lx06 and installed it in the software area on all the sites except Imperial.
This required the masterfitter/Makefile to be adapted and the LD_LIBRARY_PATH modified to include cernlib. At QMUL liblapack, which is used by cernlib, is also missing, so I copied the version used at Imperial to QMUL. This also required modifying the LD_RUN_PATH variable at QMUL.
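The LD_LIBRARY_PATH change amounts to something like the following (the cernlib location is site-dependent and only a placeholder here):

export LD_LIBRARY_PATH=/path/to/cernlib/lib:$LD_LIBRARY_PATH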
(c) Fortran compilers
f77 seems to be the most popular fortran compiler out there. QMUL however uses an Intel fortran compiler which seems to be sufficiently different from f77 to cause problems down the line. As QMUL has f77 installed as well, removing the Intel compiler from the PATH is sufficient to ...

(3) Verification
As I am not familiar with the workings of MasterCode, I use a standard set of parameters which I found more or less by trial and error: Debug Parameters, which runs for 2000 tries; I then scan the resulting root file by eye using tree->Scan(). [...]

Updates and versioning

(1) Feb12
All scripts used for the Feb12 release are here. I now compile all code at Imperial as well; the overhead is very small, and that way it is consistent among all the sites.
run_mastercode_grid_debug.sh is used to produce a set of root files for verification; plot_debug.C is then used to make a plot (it needs the same root version as mastercode uses).
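With that root version set up in your environment, the macro can be run in batch mode, e.g. (how it picks up the debug root files is defined inside the macro itself):

root -l -b -q plot_debug.C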
The Makefile in this directory is the one used by non-IC sites.
The tarballs are located in:
Imperial: lcg-ls --vo=vo.londongrid.ac.uk srm://gfe02.grid.hep.ph.ic.ac.uk/pnfs/hep.ph.ic.ac.uk/data/london/mastercode/tarballs
RHUL: lcg-ls --vo=vo.londongrid.ac.uk srm://se2.ppgrid1.rhul.ac.uk/dpm/ppgrid1.rhul.ac.uk/home/vo.londongrid.ac.uk/mastercode/tarballs/
Brunel: lcg-ls --vo=vo.londongrid.ac.uk srm://dc2-grid-64.brunel.ac.uk/dpm/brunel.ac.uk/home/vo.londongrid.ac.uk/mastercode/tarballs/
QMUL: lcg-ls --vo=vo.londongrid.ac.uk srm://se03.esc.qmul.ac.uk/vo.londongrid.ac.uk/mastercode/tarballs/