Administering and Using the File Storage Server (FSS)

 

1. Starting the FSS

The FSS is started by the "ups start sam_bootstrap" command. The relevant command is:

$ samcmd start fss [--name=<name>] [--quiet|--verbose] [--nofork] [<route-options>] [<retrial-options>]

If the name flag is omitted, the environment variable SAM_STATION must be set. The route and retrial options are described in the following two sections.
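
As a small illustration of the rule above, the following shell sketch prints the start command that a wrapper script might run. The function name fss_start_command and the station name my-station are hypothetical; only the samcmd command line and the SAM_STATION variable come from this document.

```shell
# Hypothetical helper: print the command that would start the FSS.
# $1 is an optional station name; when it is empty, SAM_STATION must be set.
fss_start_command() {
    if [ -n "${1:-}" ]; then
        printf 'samcmd start fss --name=%s\n' "$1"
    elif [ -n "${SAM_STATION:-}" ]; then
        # No name flag given: rely on SAM_STATION being set in the environment.
        printf 'samcmd start fss\n'
    else
        echo "error: pass a station name or set SAM_STATION" >&2
        return 1
    fi
}

# Example (my-station is a made-up name):
#   fss_start_command my-station
```

Printing the command instead of running it keeps the check separate from the actual start, so the same helper can be used in a dry run.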

2. Routing options

These are used for file routing only. The online system is not, at present, envisioned to use them.

--default-route=<via>

Specifies the default route for file stores from this station. Here <via> is a colon-separated 3-tuple, <node>:<machine>:<dir>, where <node> is a distributed cache network node capable of routing (i.e., the name of a station), and <machine> and <dir> give the exact details of the interim location.

--route=<node>,<via>

Specifies the route for the given node. For the FNAL Enstore system, use "enstore". Example:

$ samcmd start fss --route=enstore,central-analysis:d0mino.fnal.gov:/sam/cache1/import/poo

When no routing (default or for a particular node) is specified, direct accessibility is assumed. The default routing for off-site SAM stations is such that stores to Enstore proceed via the central-analysis disk. Each of these two flags may be repeated to specify alternatives.
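
To make the <via> format concrete, here is a minimal shell sketch that splits a route value into its three fields. The function name split_via is hypothetical, and treating everything after the second colon as the directory is an assumption of this sketch, not documented FSS behavior.

```shell
# Hypothetical helper: split "<node>:<machine>:<dir>" into its three fields.
# The parenthesised body runs in a subshell, so its variables do not leak.
split_via() (
    via="$1"
    case "$via" in
        *:*:*) ;;                       # at least two colons are required
        *) echo "error: expected <node>:<machine>:<dir>" >&2; exit 1 ;;
    esac
    node="${via%%:*}"                   # text before the first colon
    rest="${via#*:}"                    # text after the first colon
    machine="${rest%%:*}"               # text before the second colon
    dir="${rest#*:}"                    # everything after the second colon
    printf 'node=%s machine=%s dir=%s\n' "$node" "$machine" "$dir"
)

# Example, using the interim location from the --route example above:
split_via central-analysis:d0mino.fnal.gov:/sam/cache1/import/poo
# prints: node=central-analysis machine=d0mino.fnal.gov dir=/sam/cache1/import/poo
```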
 

3. Retrial options

FSS retrial configuration options can be supplied either when the station is started or when it is explicitly configured via the following command:

$ sam configure fss [--name=<name>] [<config-options>]

where the <config-options> all pertain to the retrial of failing operations:
 --opter-retrial-count=
 --opter-retrial-interval=
 --auth-retrial-count=
 --auth-timeout=
 --stager-retrial-count=
 --stager-retrial-interval=
 --xfer-retrial-count=
 --xfer-retrial-interval=
 --relay-retrial-count=
 --relay-retrial-interval=
 --dbs-retrial-count=
 --dbs-retrial-interval=

Every retrial count is a positive integer, and every retrial interval is a positive integer number of minutes. Most flags control the behavior when a SAM server (DB server, optimizer, stager, or downstream FSS in the case of multi-stage routing) cannot be reached, either because that server is temporarily unavailable or because of transient CORBA communication errors.
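
Because every count and interval must be a positive integer, a wrapper script may want to check values before passing them to the configure command. The sketch below is one way to do that in POSIX shell. The function is_positive_int and the station name my-station are hypothetical, and the values 3 and 30 are arbitrary examples rather than recommended settings.

```shell
# Hypothetical helper: succeed only if $1 is a positive decimal integer.
is_positive_int() {
    case "${1:-}" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]                 # rejects 0
}

# Print (rather than run) a configure command once both values pass the check.
count=3        # arbitrary example value
interval=30    # arbitrary example value, in minutes
if is_positive_int "$count" && is_positive_int "$interval"; then
    echo "sam configure fss --name=my-station" \
         "--xfer-retrial-count=$count --xfer-retrial-interval=$interval"
fi
```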

When the FSS requests authorization for a file transfer, the request may never be granted if the optimizer restarts during the wait. To protect against this situation, the opter-retrial-count and opter-retrial-interval flags set up a finite retrial mechanism for obtaining authorization.

If the FSS requests a file transfer from the stager and the delivery fails, the FSS will in general retry after the time interval specified by the xfer-retrial-interval flag. This interval is assumed to be long enough for the operators to ensure that, for example, bad volume or bad drive conditions are cleared with Enstore. The same flag determines the timeout for the file transfer callback from the "eworker". The number of transfer retrials is set by the xfer-retrial-count flag. To treat all transfer errors as fatal, set this flag to one.
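
The retry behavior described above can be pictured with the following shell sketch. It is an illustration only, not the FSS implementation: the function retry_transfer is hypothetical, and it assumes the count is the total number of attempts, which is consistent with a value of one making every error fatal.

```shell
# Hypothetical illustration: run a command up to <count> times,
# waiting <interval> minutes between attempts.
# usage: retry_transfer <count> <interval-minutes> <command> [args...]
retry_transfer() {
    rt_count="$1"
    rt_interval="$2"
    shift 2
    rt_attempt=1
    while :; do
        "$@" && return 0                               # success: stop retrying
        [ "$rt_attempt" -ge "$rt_count" ] && return 1  # attempts exhausted: fatal
        rt_attempt=$((rt_attempt + 1))
        sleep $((rt_interval * 60))                    # interval is in minutes
    done
}

# With a count of one, the first failure is final and nothing is retried:
retry_transfer 1 30 false || echo "transfer failed, no retry"
```

The wait happens only between attempts, so a count of one never sleeps, whatever the interval.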

The default values of these parameters may vary; check the FSS dump for the current values.
 

4. Viewing the FSS State

No web interface yet exists for viewing the FSS configuration or state. The configuration, together with brief information about current requests and a summary of request fulfillment, is provided by:

$ sam dump fss [--station=<station_name>]

=============================================================================
Project : SAM
Package : sam
$Id: FSS.html,v 1.2 2000/10/25 22:04:10 terekhov Exp $

This work is part of a development project, called SAM, which consists of a
number of coordinated packages each named sam_xxxx .

Notice of authorship, copyright status, and terms and conditions, should
the software eventually become available for use outside Fermilab, can be
found in the README and LICENCE files in the top level directory of the main
sam package.

==============================================================================