The Computing Center (CC) is an organizational unit of the Institute of Physics of the Czech Academy of Sciences (FZU), established in 2008. Since its inception, it has been managed by RNDr. Jiří Chudoba, Ph.D. The CC ensures the operation of computing clusters, storage servers, and other devices in the server room.

The Computing Center of FZU operates several computing clusters and storage servers involved in national and international grid projects. The largest cluster, Golias, provided approximately 10,000 logical (5,000 physical) computing cores in 2023. The cluster is continuously expanded with new servers, so it contains several generations of Intel and AMD CPUs. Together with storage servers with a total capacity of over 8 PB (August 2023), it participates in the international EGI grid, the WLCG project (distributed data processing for experiments at the LHC accelerator) and the Open Science Grid (mainly for projects in the USA). Thanks to cooperation with the national e-infrastructure eINFRA, it has excellent external connectivity (100 Gbps to the private network for the LHC and 40 Gbps to the Internet).

We manage all servers with Puppet and track configuration changes in Git. Extensive local monitoring is provided by Prometheus, with visualization in Grafana. The distribution of jobs to servers is handled by the HTCondor batch system. Usage statistics are published on the EGI portal. The computing capacity is extended by forwarding selected jobs to the IT4I national supercomputing center, and backups and additional storage space use CESNET resources.
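Users submit work to HTCondor through a submit description file. A minimal sketch is shown below; the file names and resource requests are illustrative assumptions, not actual FZU defaults:

```
# Minimal HTCondor submit description file (illustrative values).
universe       = vanilla
executable     = analysis.sh      # hypothetical user script
arguments      = input.dat        # hypothetical input file
request_cpus   = 1
request_memory = 2 GB
output         = job.out
error          = job.err
log            = job.log
queue 1
```

Such a file would be submitted with `condor_submit job.sub`, and `condor_q` shows the job's state in the queue.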

Other HPC clusters are intended for demanding parallel jobs of local users.

The LUNA cluster, intended mainly for FZU users from the Solid State Physics Division, is connected to the MetaCentrum national grid environment. As of mid-2022, it provided 3,072 AMD computing cores in 48 servers with a total of 27.6 TB of RAM. FZU users have at their disposal a priority job queue in the PBSPro batch system, which schedules jobs across all MetaCentrum clusters. If some servers are temporarily not fully occupied, shorter jobs of other MetaCentrum users can run on them. A local disk array provides 100 TB for fast file sharing between servers; backup space for home directories is available on remote CESNET servers. If a workload needs a different operating system, Singularity is available for containerized virtualization, as on the other CC clusters. Many applications are available on the AFS shared file system and, more recently, on CVMFS.
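A job for LUNA would go through PBSPro. The following is a minimal sketch of a submit script; the job name and resource values are illustrative assumptions, not actual queue settings:

```shell
#!/bin/bash
# Minimal PBSPro job script (illustrative; name and resources are assumptions).
#PBS -N luna_example
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=02:00:00

# The #PBS directives above are shell comments; PBSPro parses them at submission.
# PBS_O_WORKDIR is set by PBSPro at runtime; fall back to $PWD when run directly.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir"
echo "Job started in ${workdir} on $(hostname)"
```

The script would be submitted with `qsub script.sh`; `qstat` then shows its position in the queue.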

The Koios cluster for the CEICO group (Central European Institute for Cosmology and Fundamental Physics) consists of 30 powerful servers connected by a low-latency, high-throughput InfiniBand EDR network (100 Gb/s) and backed-up shared storage with a capacity of 100 TB. The computing capacity of 960 CPU cores is complemented by 14,336 GPU cores, and users can use up to 11 TB of RAM. The system supports batch and interactive jobs, with access from the command line as well as through a graphical user interface. The distribution of jobs to servers is handled by the Slurm batch system. A development environment is prepared for users, containing the latest tools from the GNU GCC and Intel compiler families, the interactive Wolfram Mathematica tool, and tools for interactive work with data, e.g. in Python with JupyterHub/Notebook. Modularity and portability of code are ensured by Singularity containers and the EasyBuild framework for building scientific tools.

The second HPC cluster of the CEICO group, Phoebe, consists of 20 computing nodes with a total of 1,280 cores and 10.2 TB of RAM. The cluster is further expanded by two GPU servers equipped with 8 NVIDIA A100 GPU cards. Software and data can use 218 TB of hybrid storage consisting of both solid-state and rotational drives. All components of the cluster are connected by a low-latency 100 Gb/s InfiniBand network.

Parameters of the large server room (we have additional dedicated server rooms):

  • area 62 m^2
  • UPS: 1× 250 kVA, 2× 100 kVA, total 400 kVA
  • the larger UPS is backed by a 350 kVA diesel generator, which also covers air-conditioning consumption in the event of a longer power outage
  • air cooling with a capacity of 108 kW
  • water cooling with a capacity of 144 kW