.. _usage-label: =============== Usage =============== This page contains the pyHepGrid usage guide. .. contents:: :local: :depth: 3 pyHepGrid instruction set ========================= 0. For nnlojet developers only: initialise LHAPDF .. code-block:: bash pyHepGrid ini -L #. initialise runcard .. code-block:: bash pyHepGrid ini runcard.py -B where ``-B`` is production in arc ``-D`` in dirac ``-A`` is warmup in arc. If you want to provide warmup files add ``-w warmup.file``. #. test your setup locally .. code-block:: bash pyHepGrid test runcard.py -B Make sure that you only run a small, quick setup locally, e.g. limit the number of events. ``test`` will always use seed 0. #. send the jobs to run with one of: .. code-block:: bash pyHepGrid run -A # ARC WARMUP pyHepGrid run -B # ARC PRODUCTION pyHepGrid run -D # DIRAC PRODUCTION #. manage the jobs/view the database of runs with: .. code-block:: bash pyHepGrid man -(A/B/D) include one or multiple of the following flags: - ``-S/-s`` for job status - ``-p`` to print the stdout of the job (selects last job of set if production) - ``-P`` to print the job log file (selects last job of set if production) - ``-I/-i`` for job information For running anything on the grid, the help text in ``pyHepGrid -h`` is useful for hidden options that aren't all necessarily documented(!). These features include warmup continuation, getting warmup data from running warmup jobs, initialising with your own warmup from elsewhere, database management stuff, running on the test queue. When running, the python script ``runfile`` (e.g. ``nnlorun.py``) is sent to the run location. This script then runs automatically, and pulls all of the appropriate files from grid storage (e.g. NNLOJET executable, runcard, warmups). It then runs NNLOJET in the appropriate mode, before tarring up output files and sending them back to the grid storage. Finalising results ================== The process of pulling the production results from grid storage to the gridui. You have a choice of setups for this (or you can implement your own) Default setup ------------- By default, ``pyHepGrid`` ships a ``"--get_data"`` script that allows you to retrieve jobs looking at the database. .. code-block:: bash pyHepGrid man -A --get_data For ARC runs (either production or warmup) or for Dirac runs: .. code-block:: bash pyHepGrid man -D --get_data The script will then ask you which database entry do you want to retrieve and will put the contents in the folders defined in your header. ``warmup_base_dir/date/of/job/RUNNAME`` for warm-ups or ``production_base_dir/date/of/job/RUNNAME`` for production. For instance, let's suppose you sent 4000 production runs to Dirac on the 1st of March and this job's entry is 14, you can do .. code-block:: bash pyHepGrid man -D -g -j 14 and it will download all .dat and .log files to ``warmup_base_dir/March/1/RUNNAME`` Custom setups ------------- For your own custom setup, you just need to write a finalisation script which exposes a function called ``do_finalise()``. This function does the pulling from the grid storage. You then set the variable ``finalisation_script`` to the name of your script (without the ``.py`` suffix). For example: .. code-block:: bash ./finalise.py set ``finalisation_script = "finalise"`` in your header and just do .. code-block:: bash pyHepGrid man --get_data This will find all of the runcards specified at the top of ``finalise_runcard.py`` (or other as specified in ``finalise_runcards``) and pull all of the data it can find for them from the grid storage. The output will be stored in ``production_base_dir`` (as set in the header) with one folder for each set of runs, and the prefix as set in ``finalise_prefix``. Corrupted data in the grid storage will be deleted. Normal workflow =============== 0. Make sure you have a working proxy #. initialise warmup runcard (optional) #. run warmup runcard (optional) #. switch warmup -> production in runcard #. When warmup complete, reinitialise runcard for production #. run production runcard as many times as you like w/ different seeds #. pull down the results (finalisation) runcard.py files details ======================== - Include a dictionary of all of the runcards you want to submit/initialise/manage, along with an identification tag that you can use for local accounting - ``template_runcard.py`` is the canonical example - Must be valid python to be used - Has a functionality whereby you can override any parameters in your header file by specifying them in the runcard file. So you can e.g specify a different submission location for specific runs, give different starting seeds/numbers of production runs. - You can even link/import functions to e.g dynamically find the best submission location