Generic Fair Share Scheduling Using Priority-Based Queuing

Statement of the Problem

Assume we have a stream of incoming jobs arriving for execution. Assume jobs are categorized by certain abstract groups, for which a fair share allocation has been defined in some sense (to be explained later). In addition, assume that zero or more groups are priority groups such that no job of from a lower-priority group should run in the presence of a pending, ready to run job from a higher-priority group.

In our problem, we do not execute the jobs directly; rather, all the jobs have to be submitted into an underlying queuing system (QS) for execution. We further assume that the QS allows to assign priority to each job. In particular, it is possible to submit a job in a way that will let it run either before or after all the other already pending jobs, if any. Note that this capability assumes that the underlying QS does not implement any implicit (i.e., not specified by the submitter) aging.

We state the problem as finding an algorithm to assign priorities to the incoming jobs and submitting them into the QS so as to enforce fair share scheduling of jobs by the groups, possibly in the presence of priority jobs.

Applicability to the SAM Resource Management

The problem, as we state generally state it, arises in at least two places in SAM resource management. First, the Optimizer (Site Resource Manager) will regulate access to the Mass Storage System (MSS). Here, the jobs are requests to transfer files and the QS is the MSS, such as Enstore, which queues file transfer requests with submitter-assigned priorities. The jobs are grouped by the Access Mode (and possibly, further grouped by physics groups). At least one Access Mode (physics data taking) is a priority mode, i.e., its file transfers must be honored whenever possible.

Second, SAM's Station Master - Batch System intregration module (SMBSM) will recieve requests from users to run projects, and will define and submit Batch System jobs (naturally, the BS is the QS). Here, groups with which jobs are associated are the real physics groups.

Fair Share Allocation (FSA)

There may be multiple ways to specify fair share allocation for our abstract algorithm. In a simplest case, one defines FSA by usage of a single resource (example: the LSF Batch System monitors CPU usage). More generally, one can compute a weighted usage of several resources. One can also define FSA by allocating the number of processes running (in the spirit of FBS). In a yet more interesting scenario one tracks FSA by the amount ow work (however defined) done by each group.

In all cases, each abstract group receives a benefit when its job is run. The essence of the benefit is irrelevant to our algorithm, we only assume it can be computed for every job using an externally supplied function: B=B(j,g(j)). (In practice, we will provide several possible functions with administrators being able to parameterize them.) Consequently, we assume that we can compute the cumulative benefit received by a group g. The FSA is specified as the fractions of the benefit allocated to each group.

The Algorithm

Let b[g_i] be the fraction of the benefit allocated to group g_i. Let B[g_i] be the cumulative benefit received by the group.

No priority groups, pure fair share. For every incoming job j belonging to the abstract group g=g(j), we submit the job to the front of the QS if B[g_i] + B(j, g(j)) < b[g_i] and to the back of the QS otherwise.

Generalization for the presence of priority groups. For every priority group G_i, we choose a priority p[g_i]  in the underlying QS; p[g_i]'s are numerically ordered in the same way way as the groups, let the minimum be p_0. For all other groups, we arbitrarily choose two priorities p_f and p_b such that p_b < p_f < p_0. We submit every incoming priority job j from group g_i with priority p[g_i]  and every fair share job j with priority:
 p = B[g_i] + B(j, g(j)) > b[g_i]  ? p_b : p_f.