Memory

Default limit: 2GB

The batch system has to know job memory (resource) requirements in advance for optimal scheduling and utilization of available resources. If your jobs require more than 2GB of RAM, you must add a memory requirement to the job submission file, e.g.

request_memory = 3GB
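
For reference, a minimal submit file with an increased memory request could look like the sketch below (the executable name and file paths are only illustrative):

# example.sub - minimal submit file requesting 3GB of memory
executable     = my_job.sh
log            = my_job.log
output         = my_job.out
error          = my_job.err
request_memory = 3GB
queue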

Jobs are killed (preempted) by the batch system if they start to use an excessive amount of memory with respect to the requested limit (when memory usage exceeds three times the requested value). This configuration protects the worker nodes from resource exhaustion and protects other jobs from badly behaving ones. HTCondor reports the maximum memory used by each running / finished job in the MemoryUsage job ClassAd attribute:

# list user running jobs and their max memory usage
condor_q -constraint 'JobStatus =?= 2 && MemoryUsage isnt undefined' -af GlobalJobId RemoteHost Owner RequestMemory MemoryUsage

# maximum memory usage for last 100 finished / terminated jobs
condor_history -limit 100 -constraint 'Owner =?= "username"' -af GlobalJobId LastRemoteHost Owner RequestMemory MemoryUsage

The memory usage history is also stored in the job log file, which can be specified with the log directive in the submission file.
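
Assuming the submit file contains, for example, log = my_job.log, the periodic image-size events that HTCondor writes to that log (including MemoryUsage in MB) can be extracted with a simple grep; the log file name below is illustrative:

# show memory / image size update events from the job event log
grep -A 2 "Image size of job updated" my_job.log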

Preempted jobs are not immediately removed from the condor queue; instead they are kept on hold for one day. Users can modify the job memory requirements and release the held jobs:

# list held jobs
condor_q -hold -wide

# change memory requirements to 5GB for all held jobs
condor_qedit -constraint 'JobStatus =?= 5' RequestMemory=5120

# release all held jobs
condor_release -constraint 'JobStatus =?= 5'
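
If only a single job should be modified, the same commands also accept a job ID instead of a constraint; the cluster.proc ID below is just an example:

# change memory requirements to 5GB for one held job (job ID is an example)
condor_qedit 1234.0 RequestMemory 5120

# release that job
condor_release 1234.0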

Reasonably low memory requirements lead to optimal scheduling and job execution. If your job requires more than 3GB of memory, it can take longer for the batch system to find worker nodes with a sufficient amount of available memory. Currently, jobs that require more than 20GB of memory can stay queued for a very long time (potentially forever); please consult the batch system administrators about such requirements.
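
To get a rough idea of how much memory the worker nodes advertise before requesting large amounts, a condor_status query like the following can help (Machine and Memory are standard slot ClassAd attributes; values are in MB and the exact slot layout depends on the pool configuration):

# list advertised slot memory (MB), largest values last
condor_status -af Machine Memory | sort -k2 -n | tail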
