Submitting Jobs to Queue Luna

All luna worker nodes run the batch system PBS Professional (Portable Batch System).

Submit your jobs from the frontend tarkil.metacentrum.cz or from the worker node luna.fzu.cz, which is dedicated to interactive work.

In the PBSPro batch system, a job can request resources either at the node level, in so-called chunks defined in a selection statement, or as job-wide resource requests.

Syntax for per-chunk resource requests in the qsub command:

-l select=N:<resource_name1=value1>:<resource_name2=value2> ...

where N stands for the number of chunks; if not specified, it defaults to 1. One chunk represents an indivisible set of resources allocated to a job on one worker node. It is also possible to influence how these chunks are placed on worker nodes.

Syntax for job-wide resource requests in the qsub command:

-l <resource_name=value>

Per-chunk resources:

  • number of cores
    • ncpus=
  • amount of memory (the default is only 400 MB per node)
    • mem=
  • size and type of scratch (no scratch is assigned by default)
    • scratch_local=
  • node property (see the illustrative sketch after the examples below)
    • property_name=True/False

Job-wide resources:

  • estimated walltime
    • -l walltime=HH:MM:SS
  • chunk placement
    • -l place=free/pack/scatter/excl
  • InfiniBand
    • -l place=group=infiniband
  • licenses
    • -l <lic_name>=<amount>
  • notifications via e-mail (a = job aborted, b = job begins, e = job ends)
    • -m abe
  • job's dependency
    • -W depend=afterok:<Job Id>

Examples:

qsub -q luna -l select=1:ncpus=8:mem=10gb:scratch_local=20gb -l walltime=48:00:00 job_script.sh
qsub -q luna -l select=1:ncpus=4:mem=12gb:scratch_local=20gb -l matlab=1 -l matlab_Optimization_Toolbox=4 -l walltime=72:00:00 job_script.sh
qsub -q luna -l select=1:ncpus=2:mem=20gb:scratch_local=40gb -W depend=afterok:123456.meta-pbs.metacentrum.cz -l walltime=10:00:00 job_script.sh
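
A node property from the per-chunk resources above is requested in the same way as any other chunk resource. A minimal sketch, assuming a purely illustrative boolean property named cl_luna (check the actual property names of the luna nodes before using it):

qsub -q luna -l select=1:ncpus=4:mem=8gb:scratch_local=10gb:cl_luna=True -l walltime=24:00:00 job_script.sh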

Chunk placement:

  • on one or more worker nodes according to current resource availability (the default, i.e. -l place=free)
  • only on one worker node
    • -l place=pack
  • only on different worker nodes (the number of nodes equals the number of chunks)
    • -l place=scatter
  • on exclusively allocated worker node(s)
    • -l place=excl

Example:

qsub -q luna -l select=2:ncpus=4:mem=10gb -l walltime=72:00:00 -l place=scatter:excl job_script.sh


Parallel jobs:
In case a job runs on two or more worker nodes, it is necessary to add -l place=group=infiniband to the qsub command so that the job runs within a single cluster only.
Only nodes of the cluster Luna2019 (luna65-99) are connected via the InfiniBand network.

Example:

qsub -q luna -l select=2:ncpus=4:mem=10gb -l walltime=72:00:00 -l place=group=infiniband job_script.sh

Note: Do not forget to set the environment variable OMP_NUM_THREADS appropriately. Usually, setting it to the requested number of ncpus is sufficient. Otherwise your job will use all cores available on the assigned worker node and will slow down processes of other jobs running there.

export OMP_NUM_THREADS=$PBS_NUM_PPN

To run a job on a specific group of luna nodes (luna65-95, luna96-99, luna100-105), set the requirement -l select=luna=<year>, where the year of purchase is 2019, 2020, 2021, or 2022.

Example:

qsub -q luna -l select=luna=2021 ...
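
A fuller sketch combining the year selector with other per-chunk resources, assuming luna=<year> can be combined with ordinary chunk resources:

qsub -q luna -l select=1:ncpus=8:mem=16gb:scratch_local=20gb:luna=2021 -l walltime=24:00:00 job_script.sh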

Job requirements for memory: the maximum amount of memory a job may request on one particular worker node is stated in the table below.

Node Type       Total RAM   Usable RAM
luna65-95       512 GB      504 GB
luna96-99       512 GB      504 GB
luna100-105     1024 GB     1008 GB
luna201-206     512 GB      504 GB
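
For example, a single-node job that needs close to 1 TB of memory has to fit within the 1008 GB usable on luna100-105; an illustrative request might look like:

qsub -q luna -l select=1:ncpus=16:mem=1000gb:scratch_local=50gb -l walltime=24:00:00 job_script.sh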

In case you did not correctly estimate the required walltime, you can extend the walltime value with the command:

qextend <full JobId> <additional_walltime>

Example:

qextend 8642013.meta-pbs.metacentrum.cz 12:00:00

This command is subject to user quotas; see the detailed information on the MetaCentrum Wiki.

Job requirements may also be specified by special directive lines in the job script. The job script starts with #!/bin/bash and continues with a series of lines starting with #PBS. The PBS directives must precede any executable lines, otherwise they are ignored. Then just submit your job:

qsub <job script>

A job script example:

#!/bin/bash
#PBS -q luna
#PBS -l select=1:ncpus=8:mem=16gb:scratch_local=20gb
#PBS -l walltime=72:00:00
...
<commands> 
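
For illustration, a fuller sketch of such a job script that stages data through the assigned scratch directory; the storage path, input file and program name are placeholders, and the SCRATCHDIR variable is assumed to be set by the batch environment:

#!/bin/bash
#PBS -q luna
#PBS -l select=1:ncpus=8:mem=16gb:scratch_local=20gb
#PBS -l walltime=72:00:00

# placeholders: adjust the storage locality, user name, input and program to your case
DATADIR=/storage/locality/home/username/project

# work in the assigned scratch directory (SCRATCHDIR is assumed to be set by the batch system)
cd $SCRATCHDIR || exit 1

# stage the input, run with the requested number of threads, copy the results back
cp $DATADIR/input.dat .
export OMP_NUM_THREADS=$PBS_NUM_PPN
$DATADIR/my_program input.dat > output.dat
cp output.dat $DATADIR/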

Hints:

  • Use the full path to the data storage arrays, e.g. /storage/locality/home/username/ instead of just plain /home/username/, in your job script.
  • By default, user home directories are readable by all users, except for the job output files. You can change the access permissions yourself (see the example after this list).
  • Use the qsub command helper on the MetaCentrum website (https://metavo.metacentrum.cz/en/state/index.html – the options “Personal view” and "Qsub assembler for PBSPro") to check whether your requested resources are available.
  • In case you are running hundreds of jobs at the same time, be aware that notification e-mails might overload your mailbox or even get the MetaCentrum mail servers blacklisted.
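
To change the access permissions mentioned above, you can, for example, restrict your home directory on a given storage to yourself (the path is a placeholder):

chmod 700 /storage/locality/home/username/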

The command qstat displays only running and waiting jobs. Use the option -x to list jobs which have finished recently.

qstat -x -u username
qstat -x -f JobID
qstat -xwn -u username ... the option -n lists assigned worker nodes

A completed job is in the state F (finished).
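
To check whether a finished job ended successfully, you can inspect the Exit_status field in its full listing, for example:

qstat -x -f JobID | grep Exit_status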

In case it is not possible to delete a job with the command qdel JobID, use the following option:

qdel -W force JobID
