6 Example of a DAVINCI analysis job

The example here puts together the pieces from above in a non-trivial way. It analyses a signal dataset with an algorithm from the Tutorial package of DAVINCI. The first analysis runs on the local machine over just 1000 events. Afterwards the same analysis runs on the Grid over many events, split into many subjobs.

First we define a DAVINCI application. We place the code in a non-default location to avoid interfering with other development work. We use a supplied options file and add just a single line that limits the number of events analysed.


v = 'v20r0'
topdir = '~/public/cmtTutorial'
master = 'Tutorial/Analysis/v7r4'
tutdir = topdir+'/DaVinci_'+v+'/'+master+'/solutions/DaVinci3'

dv = DaVinci(version=v,
             user_release_area=topdir,
             masterpackage=master,
             platform='slc4_ia32_gcc34')
dv.optsfile = [tutdir+'/DVTutorial_3.py']
dv.extraopts = 'ApplicationMgr().EvtMax = 1000'

t = JobTemplate(name='TutorialAnalysis',
                application=dv,
                backend=Local())

Notice that we set the platform to 32-bit (slc4_ia32_gcc34), since we plan to run with the DIRAC backend afterwards and this is the only architecture it supports.
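The extraopts line can limit the number of events because option assignments are applied in order, so a later assignment to the same property overrides an earlier one. The sketch below assumes, without confirmation from this guide, that GANGA applies extraopts after the files listed in optsfile. FakeApplicationMgr is a hypothetical stand-in for the Gaudi ApplicationMgr configurable: every call returns the same shared object, which is how the real configurable is assumed to behave for its default instance name.

```python
# Hypothetical stand-in for Gaudi's ApplicationMgr configurable: every call
# returns the same shared object, so property assignments accumulate.
class FakeApplicationMgr(object):
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = object.__new__(cls)
            cls._instance.EvtMax = -1  # assumed default: process all events
        return cls._instance


def apply_options(*option_texts):
    """Execute option snippets in order; later assignments override earlier ones."""
    namespace = {"ApplicationMgr": FakeApplicationMgr}
    for text in option_texts:
        exec(text, namespace)
    return FakeApplicationMgr()


# Assumed content of the supplied options file (illustrative only).
optsfile_text = "ApplicationMgr().EvtMax = 500"
# The line added through dv.extraopts in the example.
extraopts_text = "ApplicationMgr().EvtMax = 1000"

app = apply_options(optsfile_text, extraopts_text)
print(app.EvtMax)  # prints 1000: the extraopts value wins
```

If the order were reversed, the options file would win instead, which is why the extra line is assumed to be applied last.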

Next we check out the code from CVS and compile the package. To skip the tutorial exercise, we cheat by copying the solution code from the solutions directory into the package's src directory before compiling.


dv.getpack('Tutorial/Analysis v7r4')
!cp $tutdir/*.{cpp,h} $tutdir/../../src
dv.make()
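The !cp line relies on the shell performing brace expansion of *.{cpp,h}, which not every /bin/sh supports. A sketch of the same copy in plain Python, using only the standard glob, os and shutil modules, is shown here. The function name copy_solutions is hypothetical, and, like the cp command, it assumes that the src directory lies two levels above the solutions directory and already exists after getpack.

```python
import glob
import os
import shutil


def copy_solutions(tutdir):
    """Copy tutdir/*.cpp and tutdir/*.h into tutdir/../../src, like the cp line."""
    tutdir = os.path.expanduser(tutdir)  # topdir in the example starts with '~'
    src_dir = os.path.join(tutdir, "..", "..", "src")
    if not os.path.isdir(src_dir):
        raise FileNotFoundError("no src directory at " + src_dir)
    copied = []
    for pattern in ("*.cpp", "*.h"):  # the two alternatives of {cpp,h}
        for path in glob.glob(os.path.join(tutdir, pattern)):
            shutil.copy(path, src_dir)
            copied.append(os.path.basename(path))
    return sorted(copied)
```

In the GANGA session this would be called as copy_solutions(tutdir) in place of the !cp line, before dv.make().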

We now define the dataset for the local analysis. The files are simply chosen from the bookkeeping database, making sure that the data is available at CERN:

dataCERN = LHCbDataset(files=[
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000001_2.dst',
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000003_2.dst'])
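All the files used in this example follow the same naming pattern within production 00002042, so the LFN strings can be generated instead of typed. The helper dc06_lfns is hypothetical and only reproduces the pattern visible in these LFNs, including the fixed 0000 subdirectory. It does not query the bookkeeping database, so it cannot tell which file numbers exist or where they are replicated.

```python
def dc06_lfns(file_numbers, production="00002042"):
    """Build LFN strings following the DC06 DST naming pattern used in this example."""
    template = ("LFN:/lhcb/production/DC06/v1r0/{prod}/DST/0000/"
                "{prod}_{num:08d}_2.dst")
    return [template.format(prod=production, num=n) for n in file_numbers]


lfns = dc06_lfns([1, 3])
```

With this helper the local dataset could be written as LHCbDataset(files=dc06_lfns([1, 3])).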

We then create our local test job from the template, adding only our dataset:

j = Job(t, inputdata=dataCERN)

While developing the code, we typically iterate over the following steps: submitting, checking the output, editing and rebuilding.

j.submit()

< wait for job to finish >

< look at output >
j.peek()
total 57K
-rw-r--r--     0 Aug 12 13:10 __syslog__
-rw-r--r--   35K Aug 12 13:10 stdout
-rw-r--r--   340 Aug 12 13:10 stderr
-rw-r--r--    86 Aug 12 13:10 __jobstatus__
-rw-r--r--   20K Aug 12 13:10 DVHistos_3.root

< look at stdout file >
j.peek('stdout')

< edit the code >

j.application.make()

< create new job >

j = Job(t, inputdata=dataCERN)

Once we are happy with the results, we create our analysis job for the Grid. We assign a dataset of logical filenames without the restriction that they are located at CERN, raise the number of events to analyse, and tell the job to split into subjobs with at most 3 files each.

data = LHCbDataset(files=[
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000001_2.dst',
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000003_2.dst',
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000005_2.dst',
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000006_2.dst',
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000008_2.dst',
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000009_2.dst',
'LFN:/lhcb/production/DC06/v1r0/00002042/DST/0000/00002042_00000010_2.dst'
])
j = Job(t,inputdata=data, backend=Dirac())
j.application.extraopts='ApplicationMgr().EvtMax = 10000000'
j.splitter=DiracSplitter(filesPerJob = 3)
j.submit()

As the dataset used here has 7 files, the job is split into 3 subjobs on submission. We recommend 10 files per subjob as the default, but split the data more finely here to illustrate the functionality. The master job j is completed only when all of its subjobs are completed.
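The number of subjobs follows from grouping the 7 files into chunks of at most filesPerJob. The sketch below shows only that arithmetic, using a hypothetical split_files function. The real DiracSplitter may also take into account where each file is replicated, so its actual grouping, and possibly the number of subjobs, can differ.

```python
def split_files(files, files_per_job):
    """Group files into consecutive chunks of at most files_per_job."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]


# Stand-ins for the 7 LFNs of the Grid dataset above.
files = ["file%d.dst" % n for n in (1, 3, 5, 6, 8, 9, 10)]
subjobs = split_files(files, 3)
print(len(subjobs), [len(s) for s in subjobs])  # prints: 3 [3, 3, 1]
```

With the recommended default of 10 files per subjob, the same 7 files would fit into a single subjob.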

for js in j.subjobs:
   js.peek('DVHistos_3.root','ls -sh')

1.0K /afs/.../LocalAMGA/51/0/output/DVHistos_3.root
1.0K /afs/.../LocalAMGA/51/1/output/DVHistos_3.root
1.0K /afs/.../LocalAMGA/51/2/output/DVHistos_3.root
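The listing shows that each subjob keeps its own copy of DVHistos_3.root under <job id>/<subjob id>/output/. To inspect or combine these files later, it can help to collect their paths in one list. A minimal sketch, assuming exactly that directory layout; the function subjob_outputs and its workspace argument (standing in for the elided start of the path) are hypothetical.

```python
import os


def subjob_outputs(workspace, job_id, filename="DVHistos_3.root"):
    """Return (subjob id, path, size in bytes) for each subjob that produced filename."""
    job_dir = os.path.join(workspace, str(job_id))
    results = []
    for entry in os.listdir(job_dir):
        if not entry.isdigit():
            continue  # skip anything that is not a numbered subjob directory
        path = os.path.join(job_dir, entry, "output", filename)
        if os.path.isfile(path):
            results.append((int(entry), path, os.path.getsize(path)))
    return sorted(results)
```

Called with the job's workspace directory and job id 51, it would return one entry per subjob shown in the listing.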

Finally, let's store the definition of our Grid job as a template for easy reuse in a later GANGA session:

t = JobTemplate(j)
t.name='MyFirstGridAnalysis'
templates

where the last command simply prints all your templates.
