
Administering and Using SAM Station

 

1. Creating SAM Station

A station must first be created on a machine (or collection of machines). Send a request to the SAM administrators (sam-design@fnal.gov) to create a new station. Specify which machine(s) the station will control and who is authorized to be the station administrator.
 

2. Starting SAM Station

The SAM station master is started at boot time by the "ups start sam" command. The relevant command is:

$ sam start station [--name=<name>] [--quiet|--verbose] [--nofork] [--auto-stagers=yes|no] [<config-options>]

If the name flag is omitted, the environment variable SAM_STATION must be set.  The --auto-stagers option defaults to "yes" and specifies whether the station should automatically start a stager for each local disk. The config-options are described in the next section.

3. Configuring the SAM Station

Station configuration options can be supplied either when the station is started or when it is explicitly configured via the following command:

$ sam configure station [--station=<name>] [<config-options>]

where config-options are:
 --min-delivery=<KBytes>
 --preferred-loc=<location>
 --honor-opter-order=yes|no
 --file-release-timeout=<minutes>
 --max-project-file-usage=<num>
 --default-batch-system=<name>
 

The min-delivery flag works as follows. The SAM optimizer (global resource manager) groups requests for file deliveries by the tape on which the files reside, in order to minimize tape mounts. Normally, the station will deliver all or none of the files in a group. If the entire group cannot fit on disk and the min-delivery flag is set to a number in KBytes, then the station will attempt to deliver a fraction of the group, but at least as much as the value of the flag. If the value of min-delivery is zero, there is no minimum unit of delivery (i.e., a single file may be delivered).
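The min-delivery rule can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the station's actual code: the function name, the use of None for "flag not set", and the greedy in-order choice of files are all assumptions made for the example.

```python
from typing import List, Optional


def plan_delivery(file_sizes_kb: List[int], free_kb: int,
                  min_delivery_kb: Optional[int]) -> List[int]:
    """Return the indices of the files in one tape group to deliver now.

    min_delivery_kb=None models the flag being unset (all-or-none delivery).
    All names here are hypothetical; they are not part of SAM.
    """
    if sum(file_sizes_kb) <= free_kb:
        # The whole group fits on disk: deliver everything.
        return list(range(len(file_sizes_kb)))
    if min_delivery_kb is None:
        # Flag not set: all or none, and "all" does not fit.
        return []

    # Assumption: pick files greedily, in order, while they still fit.
    chosen, used = [], 0
    for i, size in enumerate(file_sizes_kb):
        if used + size <= free_kb:
            chosen.append(i)
            used += size

    # Deliver the fraction only if it reaches the configured minimum.
    # With min_delivery_kb == 0, even a single file qualifies.
    if chosen and used >= min_delivery_kb:
        return chosen
    return []
```

For example, with three 400 KByte files, 900 KBytes free and a minimum of 500 KBytes, the sketch delivers the first two files; with the flag unset it delivers none.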

The preferred-loc flag specifies the preferred source location for the files to deliver. It affects all of the subsequently started projects and none of the already running projects.

The honor-opter-order flag determines whether the station master (SM) strictly follows the order of delivery units (request groups) authorized by the optimizer. If the flag is set to "no", the SM will try to deliver other units if the unit at the head of the "queue" is undeliverable for any reason.

The file-release-timeout flag specifies the maximum time a project is allowed to hold files without releasing any of them. The feature is intended to reap seemingly abandoned projects (when a user abandons a project, the project still "uses" any files given to it, as far as the station is concerned).

The max-project-file-usage flag limits the number of files that a project can hold (use) at the same time. If more files are available in cache at the time the project is started, they may also be given to the project; however, no more files will be requested from HSM on the project's behalf once it is using "too many" files. The feature is designed to guard HSM from requests for files for projects that are abandoned or proceed too slowly to "deserve file deliveries".
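The two project limits above (file-release-timeout and max-project-file-usage) amount to two simple checks. The Python sketch below is only illustrative: the ProjectState type, the function names, and the exact boundary conditions (strictly greater than the timeout, strictly fewer than the limit) are assumptions, not documented station behaviour.

```python
from dataclasses import dataclass


@dataclass
class ProjectState:
    """Hypothetical snapshot of a project as the station might see it."""
    files_in_use: int             # files currently held by the project
    minutes_since_release: float  # time since the project last released a file


def looks_abandoned(p: ProjectState, file_release_timeout: float) -> bool:
    # A project holding files without releasing any for longer than the
    # timeout is a candidate for reaping.
    return p.files_in_use > 0 and p.minutes_since_release > file_release_timeout


def may_request_from_hsm(p: ProjectState, max_project_file_usage: int) -> bool:
    # Cached files may push a project over the limit, but no new HSM
    # requests are made once the project holds "too many" files.
    return p.files_in_use < max_project_file_usage
```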

The default-batch-system flag sets the default batch system to which user jobs will be submitted unless users specify otherwise.

Configuring the station requires station administrator privilege (check using sam dump station --groups).

4. Configuring Groups on the SAM Station (Group Rules)

At any time during the station operation, the station administrators may define what physics groups are authorized to use the station's resources and what rules those groups obey:

$ sam add|configure group --group=<group_name> [--station=<station_name>] [--max-disk=<KBytes>] [--max-lock=<KBytes>] [--max-projects=<num>] [--admin=<name1>[,<name2>...]]

The add group command applies to a new group, whereas the configure group command applies to a group that is already known to the station. The group itself must be known to SAM; use the SAM Web browsing tools to look up the valid groups (requests to create a new group should also be directed to sam-design@fnal.gov). Note the difference between a group being known to the SAM system as a whole and a group being known to a particular station. If the station flag is omitted, the environment variable SAM_STATION must be set.

The max-disk flag specifies how much disk space the group may use on this station. The argument is the size allocated and a unit; for example, 10G is 10 Gigabytes and 10M is 10 Megabytes. The max-lock flag specifies the maximum amount of disk space that may be occupied by files that the group locks on disk (see the section on file locking). The max-projects flag specifies the maximum number of projects the group may run simultaneously. The admin flag specifies the administrators of the group on the station. A group administrator on one station is not necessarily an administrator of that group on another station; for example, a member of a group may set up a 'private' station on her desktop and make herself an administrator of that group on that station. The value of the flag is a comma-separated list of the UNIX user names of the group administrators. These users must already be members of the group from SAM's perspective (see the SAM Web browsing tools).
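As an illustration of the size-plus-unit notation mentioned for max-disk, the Python sketch below converts such a string to KBytes. It is not SAM code: the text above only names the G and M suffixes, so the K suffix and the use of binary multiples (1M = 1024 KBytes) are assumptions of the example.

```python
import re

# Assumption: binary multiples; only "G" and "M" are named in the text.
_UNITS_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def size_to_kbytes(text: str) -> int:
    """Convert a size such as '10G' or '10M' to KBytes (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([KMG])", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS_KB[unit]
```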
 

5. Managing Station's Disks

At any time during the station operation, the station's administrators may add a disk (a simple disk, a partition thereof or a striped array such as RAID) to the station. The disk must be mounted, writable by the user name sam, and not used for other purposes:

$ sam add disk [--station=<station_name>] --mount=<path> --sizeK=<size> [--machine=<machine>]

If the machine argument is omitted, the current node is used; the disk size must be given in KBytes. It is also possible to remove a disk from a station at run time, provided there are no cached files on the disk:
 

$ sam remove disk [--station=<station_name>] --mount=<path> [--machine=<machine>]

For removal of a disk that does have files cached on it, contact SAM administrators.

6. Setting a Group's Cache Replacement Policy

At any time during the station operation, the station's group administrators may change that group's cache replacement policy:

$ sam set policy --group=<group_name> [--station=<station_name>] --policy=RANDOM|FIFO|LRU [--param=<policy-dependent-value>]

The policy determines which files are erased from disk when new deliveries are required. Note that the cache replacement policy does not affect files that the group has locked on disk. Those files will not be removed until explicitly unlocked by the group (see the section on file locking). The currently implemented policies are Least Recently Used, First In First Out and Random. Some policies require a parameter whose meaning is policy-dependent. Currently, only the Random algorithm requires a parameter, which must be a positive integer seed.
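The three policies can be illustrated with a small Python sketch that picks the next file to erase. It only mirrors the description above and is not the station's implementation; the CachedFile type, its fields, and the way the seed is fed to the random generator are assumptions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CachedFile:
    """Hypothetical record of a file in the group's cache."""
    name: str
    cached_at: int   # time the file arrived in cache
    last_used: int   # time the file was last accessed
    locked: bool = False


def pick_victim(files: List[CachedFile], policy: str,
                param: Optional[int] = None) -> Optional[CachedFile]:
    """Choose the next file to erase; locked files are never candidates."""
    candidates = [f for f in files if not f.locked]
    if not candidates:
        return None
    if policy == "FIFO":
        # First In First Out: the file that has been cached longest.
        return min(candidates, key=lambda f: f.cached_at)
    if policy == "LRU":
        # Least Recently Used: the file with the oldest access time.
        return min(candidates, key=lambda f: f.last_used)
    if policy == "RANDOM":
        if param is None or param <= 0:
            raise ValueError("RANDOM requires a positive integer seed")
        return random.Random(param).choice(candidates)
    raise ValueError(f"unknown policy: {policy}")
```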
 

7. Locking of Files on Disk

At any time during the station operation, the station's group administrators may lock a cached file on its current disk location. The locked file will reside on disk until a group administrator unlocks it. Since file locking effectively reduces the swappable disk space, it should only be used on files that require minimum access latency at all times.

$ sam lock|unlock file --file=<file_name> --group=<group_name> [--station=<station_name>]
 

8. Managing Batch Systems

The station administrators may add batch systems to, or remove them from, the station's configuration using the following commands:

$ sam add batch system [--station=<station_name>] --name=<bs_name>

$ sam remove batch system [--station=<station_name>] --name=<bs_name>

If a particular batch system is the first one that is being added to the station, it will also become the default batch system. Similarly, if a batch system that is being removed from the station is configured as default, the new default batch system will be chosen randomly from the list of the remaining batch systems.
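The default-selection rule described above can be sketched as follows in Python. The class and its methods are hypothetical and exist only to make the rule concrete; batch system names used with it are placeholders.

```python
import random
from typing import List, Optional


class BatchSystems:
    """Hypothetical model of a station's batch system list."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.names: List[str] = []
        self.default: Optional[str] = None
        self._rng = rng or random.Random()

    def add(self, name: str) -> None:
        if name in self.names:
            return
        self.names.append(name)
        if self.default is None:
            # The first batch system added becomes the default.
            self.default = name

    def remove(self, name: str) -> None:
        self.names.remove(name)
        if self.default == name:
            # Removing the default: pick a new one at random, if any remain.
            self.default = self._rng.choice(self.names) if self.names else None
```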

9. Viewing the Station State

Web interfaces exist to view the station's configuration. In addition, the station's configuration, together with information about currently cached files (and files being delivered, if any), can be displayed with a simple text utility:
 

$ sam dump station [--disks | --projects | --groups | --files={cached|requested|all} | --all] [--station=<station_name>]
 

10. Intra- and inter-station Routing. Global File Routing.

If every station node had access to all data store locations, there would be almost no need for routing. In practice, there are two reasons why off-site stations need to route (items 1 and 2 below); item 3 describes the options for global file routing:
  1. Off-site nodes have no access to enstore. A remote stager running on-site is needed (ask sam-design). To ensure that all tape requests are routed through this stager, use
    --route=enstore::d0mino.fnal.gov

  2. You may wish incoming transfers to go only to a restricted set of nodes. This could be to limit the number of nodes enabled for external transfers, or to route everything through your big cache, i.e., to slow the cache turnover. In this case use --constrain-delivery=node.name.gov,node2,...
    This only applies to inter-station transfers, so to ensure that the transfers from the remote stager are also constrained, use
    --route=d0mino::node.name.gov,node2,...
    Multiple --route arguments are allowed.

  3. --routing-station=<regexp::station_name> --routing-group=<group> --routing-user=<user>
    These options cover the situation in which a station does not have direct access to a group of files but is allowed to request them on behalf of another station. regexp is a regular expression that defines the location pattern of the file group to be requested by the remote station "station_name". routing-group and routing-user are the credentials under which the request should be made. These options offer high-level Global File Routing capabilities that make administrative efforts to set up remote staging areas unnecessary. Within the same station, the regular routing rules apply (as described in 1 and 2).
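The "source::node1,node2,..." form used by the --route values above can be illustrated with a short Python parser. This is a sketch for reading the syntax only: the function name is hypothetical, and merging repeated sources into one list is an assumption about how multiple --route arguments combine.

```python
from typing import Dict, List


def parse_routes(values: List[str]) -> Dict[str, List[str]]:
    """Parse --route values of the form 'source::node1,node2,...'."""
    routes: Dict[str, List[str]] = {}
    for value in values:
        source, sep, nodes = value.partition("::")
        if not sep or not source or not nodes:
            raise ValueError(f"expected 'source::node[,node...]': {value!r}")
        # Assumption: repeated sources accumulate their node lists.
        routes.setdefault(source, []).extend(n for n in nodes.split(",") if n)
    return routes
```

Applied to the two example values from items 1 and 2, the sketch maps "enstore" to d0mino.fnal.gov and "d0mino" to the list node.name.gov, node2.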

11. Other Station Arguments

  1. --retry-attempts=<num>
    This argument defines the number of attempts the station makes to deliver a file from a single source. If the file has multiple locations, the station tries each of them in turn until a delivery succeeds; otherwise a delivery error is reported.

  2. --retry-interval=<num><s|m>
    How long the station waits before the next attempt to deliver a file that has had delivery failures.

  3. --stager-arg="--max-transfers=5"
    This argument can be used to pass an arbitrary option to the auto-stager. The auto-stager is a stager process that the station spawns automatically during startup.

  4. --max-prefetched-files=<num>
    This option allows the station to balance delivery performance between projects. In particular, it allows only the configured number of external files to be staged by the station in parallel. It is useful for ensuring that no project jams the station delivery queue and the underlying transport resources.

  5. --excess-satisfaction=<num>
    Defines a project's share of the cache by specifying how many files on every project node should be protected from removal during cache turnover, in addition to those already protected by the project's direct use. A file is in use if it has been given to any of the project's consumer processes. This option also defines the "size" of the file packet that is requested when a consumer process is detected to need files.

  6. --aggressive-replication
    Turns off the station mode in which replication within the cache is driven by the immediate need for files of the project's consumer processes.

  7. --pmaster-arg
    Passes an arbitrary option to the pmaster process. Example: --pmaster-arg=--consumption-map=fnal.gov::d0mino.

  8. --consumption-map
    This argument sets the mapping between a consumer process location and the list of locations accessible to that consumer process. Syntax:
    --consumption-map=<regular expr>::node1,node2,node3
    The regular expression defines the locations of the consumer processes to which this option applies. The list of nodes defines the locations to which the station stages files for the consumer processes matched by the regular expression. For an example, see the --pmaster-arg option.
  9. --routing-station-metrics=<station::metrics>
    Sets the number of files transferred concurrently from "station" via global routing.
  10. --routing-station-group=<station::group>
    Overrides the default group used to authorize routing requests within the router.
  11. --routing-public-node=<node>
    Designates the node through which remote stations route.
  12. --bad-loc-tolerance=<num> --bad-loc-expire=<min>
    A "bad location" is a source location that is avoided when several replicas exist for the same file.
    The tolerance defines the number of files that must fail to be transferred from a particular location within the expire time before that location is considered "bad" for all files that reside there. The default values are 3 and 24 hours, respectively.
  13. --max-delivery-unit-size=<num>
    If the optimizer authorizes a unit with more than --max-delivery-unit-size files, the unit is broken into smaller units of at most --max-delivery-unit-size files each.
  14. Non-conflicting options have a cumulative effect; otherwise, the last option in the list takes precedence.
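
Two of the arguments above, --retry-attempts (item 1) and --max-delivery-unit-size (item 13), describe simple procedures that can be sketched in Python. The functions below are illustrative only and are not station code: their names and the fetch callback are hypothetical, the retry count is assumed to mean total attempts per source, and the wait between attempts (--retry-interval) is left out.

```python
from typing import Callable, List, Sequence


def deliver_with_retries(locations: Sequence[str],
                         fetch: Callable[[str], bool],
                         retry_attempts: int) -> str:
    """Try each source location in turn, up to retry_attempts times each.

    Returns the location that succeeded, or raises if every one failed.
    """
    for location in locations:
        for _ in range(retry_attempts):
            if fetch(location):
                return location
    raise RuntimeError("delivery error: all locations failed")


def split_unit(files: Sequence[str], max_unit_size: int) -> List[List[str]]:
    """Break a delivery unit into units of at most max_unit_size files."""
    return [list(files[i:i + max_unit_size])
            for i in range(0, len(files), max_unit_size)]
```

With five files and a maximum unit size of 2, split_unit produces three units holding 2, 2 and 1 files.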

=============================================================================
Project : SAM
Package : sam
$Id: Station.html,v 1.16 2005/04/15 19:21:45 lauri Exp $

This work is part of a development project, called SAM, which consists of a
number of coordinated packages each named sam_xxxx .

Notice of authorship, copyright status, and terms and conditions, should
the software eventually become available for use outside Fermilab, can be
found in the README and LICENCE files in the top level directory of the main
sam package.

==============================================================================