RESOURCE MANAGEMENT GOALS

We identify the following goals in SAM resource management (followed by explanation and then prioritization):

1) Resource allocation by access mode:
   a) There will be one or more priority modes such that the mode of highest priority uses as many resources as it requires, contingent on their total availability; the second mode uses as many resources as it needs after the first mode's allocation; and so forth.
   b) All other modes receive a fraction of the remaining resources, as determined by the dynamic experiment policies.
2) For a given access mode, dynamic fair share allocation by working groups (also known as research groups or analysis groups, although we prefer the most general definition). As in 1b), the fractions are determined by the experiment policies.
3) Jobs associated with each group will be scheduled so as to maximize throughput and minimize turnaround.

So far, several access modes have been identified by the D0 collaboration, with the possibility of adding more in the future. For the purposes of resource management, access modes are distinguished primarily by how critical they are for the experiment mission. It is possible that there will be only one priority mode (physics data taking), but we will not impose such a restriction.

With "fair share" scheduling, a group's usage of a resource may at any given time exceed its share when the resource is not used by other groups, as long as the overall (time-averaged) usage proportions are maintained. Fair share scheduling is more flexible than assigning quotas but may reduce resource availability. For example, if a group starts a project after a long period of inactivity, it may find all its files removed from the cache because other groups have been active recently.

We are required to keep the number of tunable parameters as small as reasonably possible [1,2]. Therefore, as the number of research groups grows in D0, we may need to revise the notion of a group.
For example, it may be necessary to separate resource allocation groups from, e.g., processing groups used to establish a "consumer". Resource allocation by group will apply to every scarce resource; it will be determined by the collaboration and executed by the SAM system administrators.

In this context, a "job" is either a user project or an activity to import or export data. Throughput is defined as the number of units of work done per unit time. Since SAM is a data access system, we define a unit of work as the processing of a unit of data (at present, a data file). For a given access mode and a given group, all units of work will have equal weight; thus, we will not discriminate among different users within a group. We hope that scheduling problems within each group will be resolved internally. We assume that the group administrators will have some control over other users' jobs.

We suggest that the goals be prioritized in the order given above. This is why we state the second goal in the context of the access mode and the third goal in the context of the group. Otherwise, the goals may conflict with each other, as exemplified below.

- Goal 2 conflicts with goal 3 when group A runs a large number of projects whose files are all cached on disk, while group B submits a project that will require several MSS (Mass Storage System) transfers and several tape mounts. Maximization of the number of jobs done (and minimization of turnaround) suggests scheduling group A projects only. This is clearly "unfair" to group B, which is entitled to a certain MSS bandwidth, number of tape mounts, etc.
- Goal 1 conflicts with goal 3 when a large number of small jobs of the "file on demand" mode compete with one production farm reconstruction job. The Shortest Job First (SJF) scheduling algorithm will minimize the average job turnaround (goal 3) but will compromise the D0 mission as a whole (goal 1, production activity prevails).
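The conflict between goal 1 and goal 3 can be illustrated with a small sketch. It is not the SAM scheduler; it assumes a single serial resource, all jobs submitted at time zero, and made-up job durations. The names Job, MODE_PRIORITY, shortest_job_first and priority_mode_first are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    mode: str        # access mode, e.g. "production" or "file_on_demand"
    duration: float  # assumed service time, arbitrary units


# Hypothetical ranking: a lower number means more critical to the mission.
MODE_PRIORITY = {"production": 0, "file_on_demand": 1}


def turnaround_times(ordered_jobs):
    """Completion time of each job when jobs run back to back from t = 0."""
    clock = 0.0
    finished = {}
    for job in ordered_jobs:
        clock += job.duration
        finished[job.name] = clock
    return finished


def shortest_job_first(jobs):
    """Order jobs by duration only (goal 3 considered in isolation)."""
    return sorted(jobs, key=lambda j: j.duration)


def priority_mode_first(jobs):
    """Order by access-mode priority (goal 1), then by duration within a mode."""
    return sorted(jobs, key=lambda j: (MODE_PRIORITY[j.mode], j.duration))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# One long production reconstruction job and five short file-on-demand jobs.
jobs = [Job("reco", "production", 10.0)] + [
    Job(f"fod{i}", "file_on_demand", 1.0) for i in range(5)
]

sjf = turnaround_times(shortest_job_first(jobs))
prio = turnaround_times(priority_mode_first(jobs))

print("SJF:      mean turnaround", mean(sjf.values()), "reco done at", sjf["reco"])
print("Priority: mean turnaround", mean(prio.values()), "reco done at", prio["reco"])
```

With these assumed numbers, SJF gives the lower mean turnaround (5.0 versus 12.5 time units) but finishes the production job last (at 15 instead of 10), which is the trade-off described in the second example above.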
SAM resource management will strive to make optimal use of, rather than replace, resource management in the other data handling components of D0. Therefore, the following are NOT goals of SAM resource management:

1) SAM effectively replacing a traditional batch system such as LSF. Instead, SAM will complement the batch system by concentrating on resources that are beyond the batch system's realm, such as MSS bandwidth.
2) SAM interfering with the work queues in Enstore (and other MSSs). To ensure cleaner, smaller interfaces and subsystem integrity, SAM will never manipulate requests that have already been submitted to the mass storage system.

REFERENCES

1. Requirements for the Sequential Access Model Data Access System, D0 Note 3465, CD Note TN0088, URL: http://d0db.fnal.gov/sam/doc/requirements/sam_vw_req_arch.ps
2. The Station Master design document, with specialization on disk caching, URL: http://d0db.fnal.gov/sam/doc/design/station.html