Usage

This page contains the pyHepGrid usage guide.

pyHepGrid instruction set

  1. For NNLOJET developers only: initialise LHAPDF

    pyHepGrid ini -L
    
  2. Initialise the runcard

    pyHepGrid ini runcard.py -B
    

    where -B selects production on ARC, -D production on Dirac, and -A warmup on ARC. To provide your own warmup files, add -w warmup.file.

  3. Test your setup locally

    pyHepGrid test runcard.py -B
    

    Make sure that you only run a small, quick setup locally, e.g. by limiting the number of events. The test command always uses seed 0.

  4. Send the jobs to run with one of:

    pyHepGrid run <runcard.py> -A # ARC WARMUP
    pyHepGrid run <runcard.py> -B # ARC PRODUCTION
    pyHepGrid run <runcard.py> -D # DIRAC PRODUCTION
    
  5. Manage the jobs and view the database of runs with:

    pyHepGrid man <runcard.py> -(A/B/D)
    

    Include one or more of the following flags:

    • -S/-s for job status

    • -p to print the stdout of the job (selects the last job of the set for production runs)

    • -P to print the job log file (selects the last job of the set for production runs)

    • -I/-i for job information

For running anything on the grid, the help text in pyHepGrid -h is useful for finding options that are not all documented here. These include warmup continuation, retrieving warmup data from running warmup jobs, initialising with your own warmup from elsewhere, database management, and running on the test queue.

When running, the Python runfile (e.g. nnlorun.py) is sent to the run location. This script then runs automatically and pulls all of the appropriate files from grid storage (e.g. the NNLOJET executable, runcard, and warmups). It then runs NNLOJET in the appropriate mode, before tarring up the output files and sending them back to grid storage.
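
The sequence described above can be sketched roughly as follows. This is a minimal illustration and not the actual nnlorun.py: every function name here is hypothetical, a local directory stands in for grid storage, and the command to execute is passed in by the caller rather than being a real NNLOJET invocation.

```python
import shutil
import subprocess
import sys
import tarfile
import tempfile
from pathlib import Path


def fetch_inputs(storage, workdir, names):
    """Copy the input files from the (stand-in) storage into the work directory."""
    for name in names:
        shutil.copy(storage / name, workdir / name)


def run_job(command, workdir):
    """Run the executable in the work directory, capturing its output in job.log."""
    with open(workdir / "job.log", "w") as log:
        subprocess.run(command, cwd=workdir, stdout=log,
                       stderr=subprocess.STDOUT, check=True)


def pack_outputs(workdir, tarball, patterns=("*.dat", "*.log")):
    """Tar up the files matching the patterns and return their names."""
    members = sorted(p for pattern in patterns for p in workdir.glob(pattern))
    with tarfile.open(tarball, "w:gz") as tar:
        for path in members:
            tar.add(path, arcname=path.name)
    return [path.name for path in members]


def run_on_node(storage, inputs, command, output_name):
    """Fetch inputs, run the job, and send the tarred output back to storage."""
    storage = Path(storage)
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        fetch_inputs(storage, workdir, inputs)
        run_job(command, workdir)
        tarball = workdir / output_name
        pack_outputs(workdir, tarball)
        shutil.copy(tarball, storage / output_name)
    return storage / output_name
```

In a real runfile the copy steps would be replaced by grid transfer commands, and the command would be the executable pulled from storage together with the run mode (warmup or production).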

Finalising results

Finalisation is the process of pulling the production results from grid storage to the gridui. You have a choice of setups for this, or you can implement your own.

Default setup

By default, pyHepGrid ships a "--get_data" script that retrieves jobs by looking them up in the database. For ARC runs (either production or warmup):

pyHepGrid man -A --get_data

or, for Dirac runs:

pyHepGrid man -D --get_data

The script will then ask which database entry you want to retrieve and will put the contents in the folders defined in your header: warmup_base_dir/date/of/job/RUNNAME for warmups or production_base_dir/date/of/job/RUNNAME for production.

For instance, suppose you sent 4000 production runs to Dirac on the 1st of March and this job's database entry is 14. You can then run

pyHepGrid man -D -g -j 14

and it will download all .dat and .log files to production_base_dir/March/1/RUNNAME

Custom setups

For your own custom setup, write a finalisation script that exposes a function called do_finalise(), which does the pulling from grid storage. Then set the variable finalisation_script to the name of your script (without the .py suffix). For example, for a script at

./finalise.py

set finalisation_script = "finalise" in your header and run

pyHepGrid man --get_data

This will find all of the runcards specified at the top of finalise_runcard.py (or another file, as specified in finalise_runcards) and pull all of the data it can find for them from grid storage. The output will be stored in production_base_dir (as set in the header), with one folder for each set of runs and the prefix set in finalise_prefix. Corrupted data in grid storage will be deleted.
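
A custom finalisation script might look like the sketch below. Only the do_finalise() entry point comes from the description above. The runcard mapping, the directory names, the file-naming scheme, and the copy_from_storage helper are hypothetical placeholders, and a local directory stands in for grid storage, so substitute whatever transfer mechanism your setup uses.

```python
"""finalise.py: a hypothetical custom finalisation script."""
import shutil
from pathlib import Path

# Assumed settings for this sketch; in practice they would come from your header.
STORAGE_DIR = Path("grid_storage")         # stand-in for the grid storage area
PRODUCTION_BASE_DIR = Path("results")      # where the results are collected
RUNCARDS = {"example.run": "example_tag"}  # hypothetical runcard -> tag mapping


def copy_from_storage(storage, prefix, destination):
    """Copy every stored file whose name starts with prefix; return the names."""
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(storage.glob(prefix + "*")):
        if path.is_file():
            shutil.copy(path, destination / path.name)
            copied.append(path.name)
    return copied


def do_finalise(storage=STORAGE_DIR, base_dir=PRODUCTION_BASE_DIR, runcards=RUNCARDS):
    """Pull the output of every listed runcard into one folder per set of runs."""
    retrieved = {}
    for runcard, tag in runcards.items():
        run_name = f"{runcard}-{tag}"  # assumed naming scheme for stored output
        retrieved[runcard] = copy_from_storage(storage, run_name, base_dir / run_name)
    return retrieved
```

The default arguments let do_finalise() be called with no arguments, while still allowing the paths to be overridden when trying the script out locally.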

Normal workflow

  1. Make sure you have a working proxy

  2. Initialise the warmup runcard (optional)

  3. Run the warmup runcard (optional)

  4. Switch warmup -> production in the runcard

  5. When the warmup is complete, reinitialise the runcard for production

  6. Run the production runcard as many times as you like with different seeds

  7. Pull down the results (finalisation)
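
Using only the commands shown earlier on this page, the workflow above could be scripted along these lines. This is a sketch: the workflow_commands helper is hypothetical and only builds the command lines without running them. Proxy setup, editing the runcard between warmup and production, and seed handling are not covered.

```python
import shlex


def workflow_commands(runcard, production_flag="-B", with_warmup=True):
    """Return the pyHepGrid command lines for one warmup + production cycle."""
    if production_flag not in ("-B", "-D"):
        raise ValueError("production_flag must be -B (ARC) or -D (Dirac)")
    commands = []
    if with_warmup:
        commands.append(["pyHepGrid", "ini", runcard, "-A"])  # initialise warmup
        commands.append(["pyHepGrid", "run", runcard, "-A"])  # run warmup on ARC
    # Once the warmup is complete and the runcard is switched to production:
    commands.append(["pyHepGrid", "ini", runcard, production_flag])
    commands.append(["pyHepGrid", "run", runcard, production_flag])
    # This page shows -A for retrieving ARC runs and -D for Dirac runs.
    get_flag = "-D" if production_flag == "-D" else "-A"
    commands.append(["pyHepGrid", "man", get_flag, "--get_data"])
    return commands


for cmd in workflow_commands("runcard.py"):
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments means it could be handed to subprocess.run unchanged if you later decide to execute the steps rather than print them.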

runcard.py file details

  • Includes a dictionary of all of the runcards you want to submit/initialise/manage, along with an identification tag that you can use for local accounting

  • template_runcard.py is the canonical example

  • Must be valid Python to be used

  • Any parameter in your header file can be overridden by specifying it in the runcard file. For example, you can specify a different submission location for specific runs, or give different starting seeds or numbers of production runs.

  • You can even link/import functions, e.g. to dynamically find the best submission location
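
A runcard file along the following lines illustrates the points above. It is a sketch only: dictCard, producRun, baseSeed, and ce_base are assumed variable names that may not match your version, so check template_runcard.py for the real ones. The runcard names, endpoint addresses, and queue lengths are made up for illustration.

```python
# Hypothetical runcard.py; all variable names are assumptions (see template_runcard.py).

# Runcards to submit/initialise/manage, each with a tag for local accounting.
dictCard = {
    "example_LO.run": "example_LO_tag",
    "example_NLO.run": "example_NLO_tag",
}

# Header overrides (assumed names): number of production runs and starting seed.
producRun = 100
baseSeed = 1000


def pick_submission_location(queue_lengths):
    """Return the endpoint with the fewest queued jobs (ties broken by name)."""
    return min(sorted(queue_lengths), key=lambda endpoint: queue_lengths[endpoint])


# Made-up endpoints and queue lengths; a real runcard would look these up dynamically.
ce_base = pick_submission_location({"ce1.example.org": 40, "ce2.example.org": 12})
```

Because the file is ordinary Python, the selection function could equally be imported from a shared module instead of being defined in each runcard.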