Poster Session I
Project Type
Poster
Faculty Mentor’s Full Name
Jacob Downs
Faculty Mentor’s Department
Computer Science
Abstract / Artist's Statement
Slurm is a workload manager widely used in university research clusters and High-Performance Computing (HPC) environments to allocate computational resources in multi-user systems. Its primary function is to schedule submitted jobs efficiently while ensuring equitable access to shared infrastructure. To achieve this, Slurm assigns each job a priority score, which is influenced by several factors, including a “Fairshare” score. Fairshare is designed to balance resource usage over time by prioritizing users or groups who have consumed fewer computational resources relative to their allocated share. Two algorithms underlie Fairshare: the classic algorithm, which was previously the base algorithm and is still used on some clusters, and the Tree Level algorithm, which is currently used and was introduced with additional factors in Slurm 19.05. These algorithms are used to calculate the Fairshare score, which is assigned to the user and then applied in the Multifactor Priority plugin. This paper walks through the steps of these algorithms to show how a Fairshare score is assigned. Other works (see Yaslim, Rodrigo et al.) focus primarily on HPC administrators and the advanced calculations behind these algorithms. This paper focuses on the end-user perspective: what a researcher on a research cluster needs to understand to submit jobs effectively without disrupting their own or other users' ability to use the cluster.
Category
Physical Sciences
Analyzing the Slurm Scheduler Fairshare Scoring
UC South Ballroom