Poster Session I

Project Type

Poster

Faculty Mentor’s Full Name

Jacob Downs

Faculty Mentor’s Department

Computer Science

Abstract / Artist's Statement

Slurm is a workload manager widely used in university research clusters and High-Performance Computing (HPC) environments to allocate computational resources in multi-user systems. Its primary function is to schedule submitted jobs efficiently while ensuring equitable access to shared infrastructure. To achieve this, Slurm assigns each job a priority score, which is influenced by several factors, including a “Fairshare” score. Fairshare is designed to balance resource usage over time by prioritizing users or groups who have consumed fewer computational resources relative to their allocated share. Two algorithms make up Fairshare: the classic algorithm, which was previously the base algorithm and is still used on some clusters, and the Tree Level algorithm, which is currently used and was introduced with additional factors in Slurm 19.05. Together, they are used to calculate the Fairshare score, which is assigned to the user and then fed into the Multifactor Priority plugin. This paper walks through the steps of these algorithms to show how a Fairshare score is assigned. Other works (see Yaslim, Rodrigo et al.) focus primarily on HPC administrators and the advanced calculations behind these algorithms. This paper focuses on the end-user perspective: what a researcher on a research cluster needs to understand in order to submit jobs effectively without disrupting their own or other users' ability to use the cluster.
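
To make the idea of "usage relative to allocated share" concrete, the following is a minimal sketch of the classic Fairshare factor as it is commonly stated in Slurm's documentation, F = 2^(-(U/S)/d). The function name, parameter names, and sample values are hypothetical and chosen only for illustration; a real cluster also applies usage decay and account hierarchy normalization that this sketch leaves out.

```python
def classic_fairshare_factor(effective_usage: float,
                             normalized_shares: float,
                             dampening: float = 1.0) -> float:
    """Illustrative classic Fairshare factor: F = 2 ** (-(U / S) / d).

    effective_usage   -- fraction of recent cluster usage attributed to the user (U)
    normalized_shares -- fraction of the cluster the user is entitled to (S)
    dampening         -- assumed dampening factor (d); 1.0 means no dampening
    All names here are assumptions for this sketch, not Slurm identifiers.
    """
    if normalized_shares <= 0 or dampening <= 0:
        # No allocated share (or an invalid dampening value): lowest factor.
        return 0.0
    return 2.0 ** (-(effective_usage / normalized_shares) / dampening)


# Hypothetical users who are each entitled to 10% of the cluster.
idle_user = classic_fairshare_factor(0.0, 0.10)    # has used nothing recently
on_target = classic_fairshare_factor(0.10, 0.10)   # has used exactly their share
heavy_user = classic_fairshare_factor(0.20, 0.10)  # has used twice their share
```

Under these assumptions, the factor falls as recent usage rises above the allocated share, so a user who has run fewer jobs recently receives a higher Fairshare contribution to job priority than a user who has exceeded their share.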

Category

Physical Sciences

Apr 17th, 10:45 AM Apr 17th, 11:45 AM

Analyzing the Slurm Scheduler Fairshare Scoring

UC South Ballroom
