4 Input and Output

4.1 Input sandbox

The files listed in the inputsandbox attribute of a job are copied to the local execution environment. Options files and shared libraries are taken care of by GANGA, so you should not specify them.

4.2 Input data

The data that your GAUDI job will read should be specified as an LHCbDataset in the inputdata attribute of your job.

The following will create a dataset object with logical filenames from the stripped DC06 dataset:

In [22]: dataLFN = LHCbDataset(files=[
'LFN:/lhcb/production/DC06/phys-v2-lumi2/00001758/DST/0000/00001758_00000001_5.dst',
'LFN:/lhcb/production/DC06/phys-v2-lumi2/00001758/DST/0000/00001758_00000002_5.dst',
])

There is no need to enter the same lines into your options file as GANGA will take care of this at submission time.
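For longer datasets the list of LFNs can be generated rather than typed by hand. The following is a minimal sketch in plain Python. The helper name make_lfns and its parameters are hypothetical, and the naming pattern (including the fixed 0000 subdirectory and the trailing _5) is only inferred from the two example files above; it may not hold for other productions. The helper only builds strings and does not check that the files exist.

```python
def make_lfns(production, first, last, suffix=5,
              base='/lhcb/production/DC06/phys-v2-lumi2'):
    """Return LFN strings for file numbers first..last (inclusive).

    Hypothetical helper: the pattern is inferred from the example
    files in this section and is not guaranteed in general.
    """
    prod = '%08d' % production
    return ['LFN:%s/%s/DST/0000/%s_%08d_%d.dst' % (base, prod, prod, n, suffix)
            for n in range(first, last + 1)]


lfns = make_lfns(1758, 1, 2)
for lfn in lfns:
    print(lfn)
```

Inside GANGA the resulting list could then be passed on as LHCbDataset(files=lfns).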

If you create a DAVINCI job in GANGA without specifying an input dataset in the j.inputdata attribute, the input data will be extracted from the options file, just as happens when you run a DAVINCI job outside GANGA. The specification of input data in the options file is kept for backwards compatibility but will eventually disappear. This is to ensure a clear separation between the configuration of the application and the data it will process.

Note: The dataset defined in the inputdata attribute takes precedence over what is in the options file. If you have a specification in both places, a warning will be issued and the input data given in the options file will be ignored.
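The precedence rule can be pictured with a short sketch in plain Python. This is an illustration of the rule as described here, not GANGA's actual implementation; the function name resolve_input_data and the wording of the warning are assumptions.

```python
import warnings


def resolve_input_data(job_inputdata, options_inputdata):
    """Pick the file list a job would read (illustration only).

    job_inputdata:     files from the job's inputdata attribute
    options_inputdata: files found in the options file
    """
    if job_inputdata:
        if options_inputdata:
            # Both places specify data: the job attribute wins.
            warnings.warn('inputdata is set on the job; the input data '
                          'in the options file is ignored')
        return list(job_inputdata)
    # No dataset on the job: fall back to the options file.
    return list(options_inputdata)
```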

You can use logical filenames for submission to the Local and Batch backends as well. The translation to physical file names is taken care of, but it is your own responsibility to give LFNs which are actually present at the site where you run (i.e. GANGA will not copy the files from other sites for you).

4.3 Output sandbox

When a job has finished, its output is copied back to the local file workspace (by default gangadir/workspace in your home directory). The outputdir attribute will give you the exact location. As an example:

In [4]: j.outputdir
Out[4]: /afs/cern.ch/user/u/uegede/gangadir/workspace/uegede/LocalAMGA/51/output

Normally you should just leave the outputsandbox attribute empty, and all output files from your GAUDI job will get copied back. For ROOT and GAUDIPYTHON jobs you need to specify every file you want returned, apart from standard output and standard error.

To look into the output sandbox it is very convenient to use the peek method on a job j.

# Look at what is in the output sandbox
j.peek()

# Look in the input sandbox
j.peek( "../input" )

# View ROOT histograms, running root.exe in a new terminal window
j.peek( "histograms.root", "root.exe &&" )

See the reference for the full documentation on what is possible with the peek method and how it can be configured.

If an output sandbox file of a job using the DIRAC backend exceeds 10 MB, it will automatically be treated as output data instead and copied to a Grid Storage Element.
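To estimate in advance which files will come back in the sandbox and which will end up on a Storage Element, the size rule can be sketched in plain Python. The function name split_outputs and the example file sizes are hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption; the backend may define the threshold differently.

```python
# Assumption: "10 MB" is read as 10 * 1024 * 1024 bytes.
SANDBOX_LIMIT_BYTES = 10 * 1024 * 1024


def split_outputs(file_sizes, limit=SANDBOX_LIMIT_BYTES):
    """Split {filename: size in bytes} into (sandbox, outputdata) lists."""
    sandbox, outputdata = [], []
    for name in sorted(file_sizes):
        if file_sizes[name] > limit:
            # Larger than the limit: treated as output data.
            outputdata.append(name)
        else:
            sandbox.append(name)
    return sandbox, outputdata


sandbox, outputdata = split_outputs({
    'stdout': 20000,
    'histograms.root': 3000000,
    'myDST.dst': 250000000,
})
print(sandbox)
print(outputdata)
```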

4.4 Output data

Rather than returning large files to your local file system you might want to store them on a mass storage system. This is done by default for data files created by GaussTape, DigiWriter or the DstWriter. The location of files in the mass storage depends on the backend used:

Local, LSF

Files are stored in the location $CASTOR_HOME/gangadir/j.id/outputdata/

Dirac

The files are registered in the LCG file catalogue (LFC) and can be used as input to Grid jobs in exactly the same way as other input data. The logical file name follows a structure similar to that of the home directories on AFS at CERN:

      LFN:/lhcb/user/<initial>/<username>/<diracid>/<fname>

The DIRAC id is obtained as j.backend.id. The LFN given above can be used in a new Grid job for further analysis. If you would like to retrieve a local copy of the file pointed to by the LFN, do something like:

      j.backend.getOutputData(names=['myDST.dst'])

See the online help or the reference for the full documentation of the getOutputData method.
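The two storage locations described in this section can be put together with a few lines of plain Python. This is only a sketch: the function names are hypothetical, <initial> is assumed to be the first letter of the username, and the CASTOR home path and the DIRAC id 1234 are made-up example values.

```python
def castor_output_dir(castor_home, job_id):
    """Output data directory for the Local and LSF backends."""
    return '%s/gangadir/%s/outputdata/' % (castor_home.rstrip('/'), job_id)


def dirac_output_lfn(username, dirac_id, fname):
    """LFN of an output data file for the Dirac backend.

    Assumes <initial> is the first letter of the username.
    """
    return 'LFN:/lhcb/user/%s/%s/%s/%s' % (username[0], username,
                                           dirac_id, fname)


# Example values only: the CASTOR home and the DIRAC id are invented.
print(castor_output_dir('/castor/cern.ch/user/u/uegede', 51))
print(dirac_output_lfn('uegede', 1234, 'myDST.dst'))
```

In a GANGA session the job id and the DIRAC id would be taken from j.id and j.backend.id respectively.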
