Submitting Jobs to Queue Luna
All luna worker nodes run the batch system OpenPBS.
Submit your jobs from the frontend tarkil.metacentrum.cz or from the worker node luna.fzu.cz, which is dedicated to interactive work.
In OpenPBS, a job can request resources either at the node level, in so-called chunks defined in a selection statement, or through job-wide resource requests.
Syntax for per-chunk resource requests in the qsub command:
-l select=N:<resource_name1=value1>:<resource_name2=value2> ...
where N stands for the number of chunks (1 if not specified). One chunk represents an indivisible set of resources allocated to a job on one worker node. It is also possible to influence how these chunks are placed on worker nodes (see Chunk placement below).
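For instance, the following selection statement (with illustrative values) requests two chunks, each with 4 cores and 8 GB of memory:
-l select=2:ncpus=4:mem=8gb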
Syntax for job-wide resource requests in the qsub command:
-l <resource_name=value>
Per-chunk resources:
- number of cores: ncpus=
- amount of memory (only 400 MB per node by default): mem=
- size and type of scratch (no scratch assigned by default): scratch_local=
- node property: <property_name>=True/False
Job-wide resources:
- estimated walltime: -l walltime=HH:MM:SS
- chunk placement: -l place=free/pack/scatter/excl
- Infiniband: -l place=group=infiniband
- licenses: -l <lic_name>=<amount>
- notifications via e-mail: -m abe (a = job aborted, b = job begins, e = job ends)
- job's dependency: -W depend=afterok:<JobID>
Examples:
qsub -q luna -l select=1:ncpus=8:mem=10gb:scratch_local=20gb -l walltime=48:00:00 job_script.sh
qsub -q luna -l select=1:ncpus=4:mem=12gb:scratch_local=20gb -l matlab=1 -l matlab_Optimization_Toolbox=4 -l walltime=72:00:00 job_script.sh
qsub -q luna -l select=1:ncpus=2:mem=20gb:scratch_local=40gb -W depend=afterok:123456.pbs-m1.metacentrum.cz -l walltime=10:00:00 job_script.sh
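For example, to receive e-mail when a job is aborted, begins, or ends (resource values are illustrative):
qsub -q luna -m abe -l select=1:ncpus=1:mem=4gb -l walltime=24:00:00 job_script.sh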
Chunk placement:
- on one or more worker nodes according to actual resource availability (default): -l place=free
- only on one worker node: -l place=pack
- only on different worker nodes (the number of nodes equals the number of chunks): -l place=scatter
- on an exclusive worker node or nodes: -l place=excl
Example:
qsub -q luna -l select=2:ncpus=4:mem=10gb -l walltime=72:00:00 -l place=scatter:excl job_script.sh
Parallel jobs:
If a job runs on two or more worker nodes, it is necessary to add -l place=group=infiniband to the qsub command so that the job runs only within one cluster.
Only nodes from the cluster Luna2019 (luna65-99) are connected via the Infiniband network.
Example:
qsub -q luna -l select=2:ncpus=4:mem=10gb -l walltime=72:00:00 -l place=group=infiniband job_script.sh
Note: Do not forget to set the environment variable OMP_NUM_THREADS appropriately. Setting it to the requested number of ncpus is usually fine; otherwise your job will use all cores available on the assigned worker node and will limit processes of other jobs running there.
export OMP_NUM_THREADS=$PBS_NUM_PPN
To run a job on a specific group of luna nodes (luna65-95, luna96-99, luna100-105, luna201-206, luna106-108), set the requirement -l select=luna=<year> with the year of purchase (2019, 2020, 2021, 2022, or 2023).
Example:
qsub -q luna -l select=luna=2021 ...
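Assuming the year can be combined with other per-chunk resources in the selection statement, a fuller request might look like this (values are illustrative):
qsub -q luna -l select=1:ncpus=8:mem=16gb:luna=2021 -l walltime=24:00:00 job_script.sh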
Job requirements for memory: the maximum amount of memory a job may request on one particular worker node is given in the table below.
Node Type | Total RAM | Usable RAM |
---|---|---|
luna65-95 | 512 GB | 504 GB |
luna96-99 | 512 GB | 504 GB |
luna100-105 | 1024 GB | 1008 GB |
luna106-108 | 1536 GB | |
luna201-206 | 512 GB | 504 GB |
In case you underestimated the required walltime, you can extend it with the command:
qextend <full JobId> <additional_walltime>
Example:
qextend 8642013.pbs-m1.metacentrum.cz 12:00:00
This command is subject to user quotas; see the MetaCentrum Wiki for details.
Job requirements may also be specified by special directive lines in the job script. The job script starts with #!/bin/bash and continues with a series of lines starting with #PBS. The PBS directives must precede any executable lines; otherwise they are ignored. Then just submit your job:
qsub <job script>
A job script example:
#!/bin/bash
#PBS -q luna
#PBS -l select=1:ncpus=8:mem=16gb:scratch_local=20gb
#PBS -l walltime=72:00:00
...
<commands>
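For illustration, a fuller script might look like the following sketch. DATADIR, input.dat, and my_program are placeholders; SCRATCHDIR is set by the batch system when scratch is requested, and the trap uses the clean_scratch utility following MetaCentrum conventions.
#!/bin/bash
#PBS -q luna
#PBS -l select=1:ncpus=8:mem=16gb:scratch_local=20gb
#PBS -l walltime=72:00:00

# Clean up the scratch directory on exit or if the job is killed
# (clean_scratch is a MetaCentrum utility available on worker nodes).
trap 'clean_scratch' TERM EXIT

# DATADIR is a placeholder; use the full storage path (see Hints below).
DATADIR=/storage/locality/home/username/project

# Work inside the assigned scratch directory.
cd $SCRATCHDIR || exit 1
cp $DATADIR/input.dat . || exit 1

# Limit OpenMP threads to the requested number of cores.
export OMP_NUM_THREADS=$PBS_NUM_PPN

# my_program is a placeholder for your application.
$DATADIR/my_program input.dat > output.dat

# Copy results back to permanent storage; keep the scratch if the copy fails.
cp output.dat $DATADIR/ || export CLEAN_SCRATCH=false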
Hints:
- Use the full path to the data storage arrays, i.e. /storage/locality/home/username/ instead of just plain /home/username/, in your job script.
- By default, user home directories are readable by all users (job output files are an exception). You can change the access permissions yourself; see the sketch after this list.
- Use the qsub command helper on the MetaCentrum website (https://metavo.metacentrum.cz/en/state/index.html, the “Qsub assembler” option) to check whether your requested resources are available.
- If you are running hundreds of jobs at the same time, be aware that notification e-mails might overload your mailbox or even get the MetaCentrum mail servers blacklisted.
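A minimal sketch of tightening home directory permissions; the storage path is a placeholder:
chmod o-rx /storage/locality/home/username/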
The command qstat displays only running and waiting jobs. Use the option -x to list jobs which have finished recently.
qstat -x -u username
qstat -x -f JobID
qstat -xwn -u username (the option -n lists the assigned worker nodes)
A completed job is in the state F (finished).
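To check how a finished job ended, you can filter the full listing; job_state and Exit_status are standard PBS job attributes, and the JobID below is illustrative:
qstat -xf 8642013.pbs-m1.metacentrum.cz | grep -E 'job_state|Exit_status'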
In case it is not possible to delete a job with the command qdel JobID, use the force option:
qdel -W force JobID
Documentation:
- Quick Start Guide: https://wiki.metacentrum.cz/w/images/9/9f/Quickstart-pbspro-small.pdf
- Batch system PBSPro: https://docs.metacentrum.cz/computing/run-basic-job/