HTCondor monitoring scripts

HTCondor can be monitored with the condor_* commands. We also wrote extended Python scripts, which can be found in our git repository:

  • htc-history.py
  • htc-info.py
  • htc-run.py

The job history is limited to a maximum of 10,000 entries per scheduler (approx. 4 days).

For each scheduler

1. How many jobs are currently running on each scheduler, how long they have been running (walltime), and how much CPU time (cputime) each has consumed (listed per job, not summed)

htc-run.py ... show number of running jobs, walltime and cputime for each scheduler
htc-run.py -a ... show all jobs
htc-run.py -g alice
htc-run.py -u nova
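
As a rough illustration of the per-job walltime and cputime distinction, the sketch below derives both values from job ClassAd attributes. It assumes the standard HTCondor attributes JobStatus, JobCurrentStartDate, RemoteUserCpu and RemoteSysCpu; the function name running_job_times and the sample records are hypothetical, and htc-run.py may compute these values differently.

```python
def running_job_times(jobs, now):
    """Return (job_id, walltime_s, cputime_s) for each running job.

    `jobs` is a list of dicts shaped like job ClassAds (hypothetical sample
    input); `now` is the current Unix time in seconds.
    """
    RUNNING = 2  # HTCondor JobStatus code for a running job
    rows = []
    for job in jobs:
        if job.get("JobStatus") != RUNNING:
            continue
        job_id = f"{job['ClusterId']}.{job['ProcId']}"
        # Walltime: seconds elapsed since the current run started.
        walltime = now - job["JobCurrentStartDate"]
        # Cputime: user plus system CPU seconds reported so far.
        cputime = job.get("RemoteUserCpu", 0) + job.get("RemoteSysCpu", 0)
        rows.append((job_id, walltime, cputime))
    return rows


# Hypothetical sample data: one running job and one idle job.
sample = [
    {"ClusterId": 10, "ProcId": 0, "JobStatus": 2,
     "JobCurrentStartDate": 1000, "RemoteUserCpu": 500.0, "RemoteSysCpu": 20.0},
    {"ClusterId": 11, "ProcId": 0, "JobStatus": 1},  # idle, skipped
]
for row in running_job_times(sample, now=1600):
    print(row)
```

A cputime well below the walltime for a single-core job usually means the job spends much of its time waiting (for example on I/O) rather than computing.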

2. How many free slots are on the schedulers

htc-info.py ... show number of free/draining slots, number of running jobs according to schedulers, number of running / idle jobs (users) according to groups and users
htc-info.py -a ... show all slots
htc-info.py -m mikan ... show slots containing 'mikan'
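
The -m option is described as showing slots containing a given string. A minimal sketch of that kind of filter, assuming a plain substring match on the slot name; the function slots_matching and the host names are hypothetical, and the real script may match differently (for example case-insensitively or by regular expression).

```python
def slots_matching(slot_names, pattern):
    """Return the slot names that contain `pattern` as a plain substring."""
    return [name for name in slot_names if pattern in name]


# Hypothetical slot names in the usual slotN@host form.
slots = [
    "slot1@mikan01.example.org",
    "slot2@mikan01.example.org",
    "slot1@other07.example.org",
]
print(slots_matching(slots, "mikan"))
```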

3. Number of running / idle jobs according to groups

htc-info.py ... show number of free/draining slots, number of running jobs according to schedulers, number of running / idle jobs (users) according to groups and users
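
The per-group counts of running and idle jobs can be pictured as a simple tally over job ClassAds. The sketch below assumes the HTCondor JobStatus codes (1 = idle, 2 = running) and that the group is available in the AcctGroup attribute; jobs_per_group and the sample records are hypothetical and not taken from htc-info.py.

```python
from collections import Counter

STATUS_NAMES = {1: "idle", 2: "running"}  # HTCondor JobStatus codes


def jobs_per_group(jobs):
    """Count idle and running jobs per accounting group."""
    counts = Counter()
    for job in jobs:
        status = STATUS_NAMES.get(job.get("JobStatus"))
        if status is None:
            continue  # ignore held, completed, removed, ... jobs
        group = job.get("AcctGroup", "unknown")
        counts[(group, status)] += 1
    return counts


# Hypothetical sample data.
queue = [
    {"AcctGroup": "atlas", "JobStatus": 2},
    {"AcctGroup": "atlas", "JobStatus": 1},
    {"AcctGroup": "atlas", "JobStatus": 2},
    {"AcctGroup": "alice", "JobStatus": 1},
    {"AcctGroup": "alice", "JobStatus": 5},  # held, not counted
]
for (group, status), n in sorted(jobs_per_group(queue).items()):
    print(group, status, n)
```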

4. How many jobs have finished over the last X hours

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -b 24 ... number of completed jobs according to schedulers, groups and users over the last 24 hours (with walltime and cputime)
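
A "last X hours" query amounts to a lower bound on the job completion time. The sketch below shows one way to express it, assuming the standard CompletionDate job attribute (Unix seconds); the helper last_hours_constraint is hypothetical, and whether htc-history.py builds its query this way is not confirmed here.

```python
import time


def last_hours_constraint(hours, now=None):
    """Build a ClassAd constraint selecting jobs completed in the last `hours`."""
    now = int(time.time()) if now is None else int(now)
    start = now - int(hours * 3600)
    return f"CompletionDate >= {start}"


# Fixed `now` so the example is reproducible.
print(last_hours_constraint(24, now=1540032300))
```

A string of this form could be passed to condor_history with its -constraint option. Note that results can only go back as far as the history retained by each scheduler.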

5. How many jobs finished in the period from X to Y

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -f "2018-10-19 02:45:00" -t "2018-10-20 12:45:00"
htc-history.py -r 1539909900 -o 1540032300
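
The -f/-t and -r/-o examples appear to describe the same interval: the epoch values match the date strings when those are read as local time at UTC+2 (Central European Summer Time). The conversion can be checked with the sketch below; the fixed UTC+2 offset and the helper to_epoch are assumptions made for this example, and the script itself may simply use the local time zone of the machine.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # assumed local offset for October 2018


def to_epoch(text, tz=CEST):
    """Convert 'YYYY-MM-DD HH:MM:SS' in time zone `tz` to Unix seconds."""
    local = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=tz)
    return int(local.timestamp())


print(to_epoch("2018-10-19 02:45:00"))  # 1539909900
print(to_epoch("2018-10-20 12:45:00"))  # 1540032300
```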

6. How much walltime / cputime was used by completed jobs in the period from X to Y

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -f "2018-10-19 02:45:00" -t "2018-10-20 12:45:00"
htc-history.py -r 1539909900 -o 1540032300

For each group (e.g. atlas, alice, auger, nova, …)

7. Number of users in groups

htc-info.py ... show number of free/draining slots, number of running jobs according to schedulers, number of running / idle jobs (users) according to groups and users

8. Number of running and idle jobs for each group

htc-info.py ... show number of free/draining slots, number of running jobs according to schedulers, number of running / idle jobs (users) according to groups and users

9. Number of finished jobs over the last X hours

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -b 24 ... number of completed jobs according to schedulers, groups and users over the last 24 hours (with walltime and cputime)

10. Number of finished jobs for the period from X to Y

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -f "2018-10-19 02:45:00" -t "2018-10-20 12:45:00"
htc-history.py -r 1539909900 -o 1540032300

11. How much walltime / cputime the group has used for completed jobs in the period from X to Y

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -f "2018-10-19 02:45:00" -t "2018-10-20 12:45:00"
htc-history.py -r 1539909900 -o 1540032300
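
Summing walltime and cputime per group over a period can be sketched as a filter on completion time followed by an aggregation. The example assumes the standard history attributes CompletionDate, RemoteWallClockTime, RemoteUserCpu and RemoteSysCpu, and that the group is stored in AcctGroup; usage_by_group and the sample records are hypothetical, not the code of htc-history.py.

```python
from collections import defaultdict


def usage_by_group(jobs, start, end):
    """Sum job count, walltime and cputime per group for jobs completed
    between `start` and `end` (Unix seconds, inclusive)."""
    totals = defaultdict(lambda: {"jobs": 0, "walltime": 0.0, "cputime": 0.0})
    for job in jobs:
        if not (start <= job["CompletionDate"] <= end):
            continue
        entry = totals[job.get("AcctGroup", "unknown")]
        entry["jobs"] += 1
        entry["walltime"] += job.get("RemoteWallClockTime", 0.0)
        entry["cputime"] += job.get("RemoteUserCpu", 0.0) + job.get("RemoteSysCpu", 0.0)
    return dict(totals)


# Hypothetical sample data; the last job completed outside the period.
history = [
    {"AcctGroup": "atlas", "CompletionDate": 1539910000,
     "RemoteWallClockTime": 3600.0, "RemoteUserCpu": 3400.0, "RemoteSysCpu": 50.0},
    {"AcctGroup": "atlas", "CompletionDate": 1540000000,
     "RemoteWallClockTime": 1800.0, "RemoteUserCpu": 1700.0, "RemoteSysCpu": 10.0},
    {"AcctGroup": "alice", "CompletionDate": 1540100000,
     "RemoteWallClockTime": 900.0, "RemoteUserCpu": 800.0, "RemoteSysCpu": 5.0},
]
print(usage_by_group(history, 1539909900, 1540032300))
```

The same aggregation works per user or per scheduler by changing the attribute used as the grouping key.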

For each user

12. Number of running jobs

htc-info.py ... show number of free/draining slots, number of running jobs according to schedulers, number of running / idle jobs (users) according to groups and users

13. Number of idle jobs

htc-info.py ... show number of free/draining slots, number of running jobs according to schedulers, number of running / idle jobs (users) according to groups and users

14. On which schedulers the user’s jobs are running

htc-info.py ... show number of free/draining slots, number of running jobs according to schedulers, number of running / idle jobs (users) according to groups and users
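
Finding the schedulers on which a user has running jobs is essentially a grouping of running jobs by owner. In the sketch below, Owner and JobStatus are standard job attributes, while the "Schedd" key, the function schedulers_per_user, and the user and scheduler names are hypothetical and stand in for however the scheduler name is recorded when the queues are collected.

```python
from collections import defaultdict


def schedulers_per_user(jobs):
    """Map each user to the sorted list of schedulers running their jobs."""
    result = defaultdict(set)
    for job in jobs:
        if job.get("JobStatus") == 2:  # 2 = running
            result[job["Owner"]].add(job["Schedd"])
    return {user: sorted(names) for user, names in result.items()}


# Hypothetical sample data gathered from several schedulers.
queue = [
    {"Owner": "user1", "JobStatus": 2, "Schedd": "schedd-a"},
    {"Owner": "user1", "JobStatus": 2, "Schedd": "schedd-b"},
    {"Owner": "user1", "JobStatus": 1, "Schedd": "schedd-c"},  # idle, ignored
    {"Owner": "user2", "JobStatus": 2, "Schedd": "schedd-a"},
]
print(schedulers_per_user(queue))
```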

15. Number of finished jobs over the last X hours

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -b 24 ... number of completed jobs according to schedulers, groups and users over the last 24 hours (with walltime and cputime)

16. Number of finished jobs for the period from X to Y

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -f "2018-10-19 02:45:00" -t "2018-10-20 12:45:00"
htc-history.py -r 1539909900 -o 1540032300

17. How much walltime / cputime the user has used for completed jobs in the period from X to Y

htc-history.py ... number of completed jobs according to schedulers, groups and users (with walltime and cputime)
htc-history.py -f "2018-10-19 02:45:00" -t "2018-10-20 12:45:00"
htc-history.py -r 1539909900 -o 1540032300