Memory
Default limit: 2 GB per core
Default max limit: 512 GB per job
No swap allowed for jobs
The batch system has to know a job's memory (resource) requirements in advance in order to schedule jobs optimally and make good use of the available resources. If your job requires more than 2 GB of RAM, you must add a memory requirement to the job submission file, e.g.
request_memory = 3GB
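For context, a complete submission file with this setting might look like the minimal sketch below. The file name job.sub, the executable my_job.sh and the log, output and error file names are hypothetical placeholders; only the request_memory line comes from the text above.

```shell
# Write a minimal HTCondor submission file (all file names are hypothetical placeholders).
cat > job.sub <<'EOF'
executable     = my_job.sh
log            = job.log
output         = job.out
error          = job.err
# request 3 GB of memory instead of the 2 GB per-core default
request_memory = 3GB
queue
EOF

# Submit it with: condor_submit job.sub
```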
Jobs will be killed (preempted) by the batch system if they use an excessive amount of memory relative to the requested memory limit, that is, when usage exceeds the requested value by more than ten percent. This configuration protects our worker nodes from resource exhaustion and protects other jobs from badly behaving ones. HTCondor records the maximum memory used by each running or finished job in the MemoryUsage job ClassAd attribute:
# list your running jobs and their maximum memory usage
condor_q -constraint 'JobStatus =?= 2 && MemoryUsage isnt undefined' -af GlobalJobId RemoteHost Owner RequestMemory MemoryUsage
# maximum memory usage of the last 100 finished / terminated jobs (replace "username" with your user name)
condor_history -limit 100 -constraint 'Owner =?= "username"' -af GlobalJobId LastRemoteHost Owner RequestMemory MemoryUsage
Memory usage history is also stored in the job log file, whose location can be set with the log directive in the submission file.
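As a rough worked example of the ten percent tolerance described above: RequestMemory and MemoryUsage are expressed in MB, so a 3 GB request corresponds to 3072 MB. The helper name kill_threshold_mb is hypothetical, and the exact rounding applied by the batch system is an assumption, so treat the result as an estimate only.

```shell
# Estimate the memory usage (in MB) above which a job may be killed,
# assuming a 10% tolerance over the requested memory (integer math, rounded down).
kill_threshold_mb() {
    echo $(( $1 * 110 / 100 ))
}

kill_threshold_mb 3072   # 3 GB request
```

For a 3072 MB request this gives roughly 3379 MB. A job whose MemoryUsage approaches that value is a candidate for a larger request_memory.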
Preempted jobs are not immediately removed from the condor queue; instead they are kept on hold for seven days. Users can modify the job memory requirements and then release the held jobs:
# list held jobs
condor_q -hold -wide
# set the memory request to 5 GB (5120 MB) for all held jobs
condor_qedit -constraint 'JobStatus =?= 5' RequestMemory=5120
# release all held jobs
condor_release -constraint 'JobStatus =?= 5'
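The value 5120 used with condor_qedit is 5 GB expressed in MB, the unit of the RequestMemory attribute. A small sketch for building the same command for another size follows; the helper name gb_to_mb and the 8 GB figure are hypothetical, and the command is only printed here, not executed.

```shell
# Convert whole gigabytes to megabytes (1 GB = 1024 MB here).
gb_to_mb() {
    echo $(( $1 * 1024 ))
}

# Print the condor_qedit command for an 8 GB request so it can be reviewed before running.
new_mb=$(gb_to_mb 8)
echo "condor_qedit -constraint 'JobStatus =?= 5' RequestMemory=${new_mb}"
```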
Reasonably low memory requirements lead to the best scheduling and job execution. If your job requires more than 3 GB of memory, it can take longer for the batch system to find a worker node with enough available memory. Currently, jobs that require more than 16 GB of memory can stay queued for a very long time; please discuss such requirements with the batch system administrators.
