Overview
station <station_environment> <station_version> <station_name> [ station options ]
<station_environment> is the name of the sam_config environment you'd like the station to use.
<station_version> and <station_name> are self-explanatory.
Example: station sm_central_analysis_prd v4_2_1_65 central-analysis
To run the station manually, you'll need to set up the following packages: sam_station, sam_config, sam_cp.
Example:
setup sam_station v4_2_1_65 -q GCC-2.95.2
setup sam_config -q prd
setup sam_cp
smaster start [--name=<name>] [--quiet|--verbose] [--nofork] [ other station options ]
If the name flag is omitted, the environment variable SAM_STATION must be set.
If you want SAM to be started whenever the system reboots, you will need to make sure that your sysadmin has enabled automatic startup of UPS products (documented at http://www.fnal.gov/docs/products/ups/). Then, as a product maintainer, you will need to edit the file ${PRODUCTS}/.upsfiles/startup/<node>.products and insert commands similar to the examples shown below:
echo "now invoking ups start sam_bootstrap"
/bin/su - sam -c \
". /usr/local/etc/setups.sh; \
setup setpath; \
export SAM_BOOTSTRAP_ENV=/fnal/ups/db/sam_bootstrap/sam_bootstrap_d0lxac1.env; \
ups start sam_bootstrap"
echo "sam_bootstrap startup complete."
Note that the name of the SAM_BOOTSTRAP_ENV file will be different on your system!
ups stop sam_bootstrap
ps -ef | grep sam
Note that ups stop sam_bootstrap does not stop station-associated processes, so you can start the station again and keep all user requests.
To bring the station to a complete stop, you'll need to kill all station processes manually. Kill (kill -9) all sam processes (e.g. pmaster, eworkerng, smaster, stagerng, sleep), but be careful not to kill your shell! Repeat the ps command to ensure that all unwanted processes have been killed.
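As a cautious aid for the manual cleanup described above, the following minimal sketch picks candidate PIDs out of `ps -ef` output without killing anything. The process names come from the list above; the `sam` account name, the `sam_pids` helper and the assumed `ps -ef` column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) are illustrative assumptions, not part of SAM.

```python
# Process names taken from the text above; everything else is illustrative.
SAM_PROCESS_NAMES = ("pmaster", "eworkerng", "smaster", "stagerng", "sleep")


def sam_pids(ps_output, user="sam", protect=()):
    """Return PIDs of SAM-looking processes found in `ps -ef` output text.

    `user` is the account the station runs under (assumed to be 'sam').
    `protect` holds PIDs that must never be returned, e.g. your own shell.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something unexpected
        uid, pid, cmd = fields[0], int(fields[1]), fields[7]
        # Compare only the program name, without directory or arguments.
        program = cmd.split()[0].rsplit("/", 1)[-1]
        if uid == user and program in SAM_PROCESS_NAMES and pid not in protect:
            pids.append(pid)
    return pids
```

Review the returned list before passing anything to kill; putting `os.getpid()` and `os.getppid()` into `protect` is one way to keep the calling shell out of it.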
The station views its manageable universe as a collection of nodes. A node groups file locations that share common characteristics, such as residing on one physical machine, being reachable by a common protocol, or having similar data access performance. In practice, a station node can group files that reside on a shared NFS area, an HPSS server, or a single machine.
Each node must have at least one disk. Generally speaking, disks are what nodes are made of: atomic units that sub-group files and subject them to "cache" management rules, i.e. special selection criteria for adding files to or removing files from the sub-group. When the station adds files to a disk, a transfer event is triggered; when files are removed from a disk, an erase/release is triggered. Each disk is characterized by its size and its "mount" point. The mount point is the location on the disk where incoming files are stored; it must be interpretable by samcp.
Groups are designed to aggregate all individual disks across the various nodes into a common logical space that defines high-level "cache" and station management policies. A group defines a named share of the station's resources, such as disk space, number of running projects, and maximum number of locked files, which an application must reference in order to proceed with data analysis.
Stagers are the station's "hands" when it comes to making requests to a particular node. A stager runs samcp in its own context on the machine where it is installed.
The station works on nodes and disks, deciding which files should be transferred, from where, and when. The station itself does not move any data. Instead, it relies on a configured samcp to recognize the appropriate protocols and locations and do the job. The station supplies the disk mount point to hint samcp about the destination, and the replica catalog location it has chosen as the source location for a file. The station admin needs to break the hardware configuration down into station nodes and disks so that samcp can be configured adequately.
Obviously, configuring the station requires the station administrator privilege (check using sam dump station --groups).
$ sam add disk [--station=<station_name>] --mount=<nodename:path> --sizeK=<size>
When determining the SAM disk size, please account for the space the filesystem needs to maintain its own records. This means the SAM disk size should be 1-2% less than the total filesystem size.
If the machine argument is omitted, the current hostname is assumed as the nodename; the disk space must be given in KBytes. It is also possible to remove a disk from a station at run time, provided there are no cached files on the disk:
$ sam remove disk [--station=<station_name>] --mount=<nodename:path> [--machine=<machine>]
For removal of a disk that does have files cached on it,
$ sam disallow disk [--station=<station_name>] --mount=<nodename:path>
should be used to uncache all files on the disk.
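The sizing advice for --sizeK (leave 1-2% of the filesystem for its own bookkeeping) can be worked out with a small helper. This is only a sketch: the function name and the default 2% reserve are assumptions chosen for illustration.

```python
def sam_disk_size_kb(filesystem_total_kb, reserve_percent=2):
    """Suggest a --sizeK value: total filesystem size minus a small reserve.

    Integer arithmetic is used so the result is an exact number of KBytes.
    """
    if not 0 <= reserve_percent < 100:
        raise ValueError("reserve_percent must be in the range [0, 100)")
    return filesystem_total_kb * (100 - reserve_percent) // 100
```

For a filesystem of 52428800 KBytes and a 2% reserve, this suggests 51380224 KBytes as the value for --sizeK.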
After a disk is configured, stagers need to be launched.
Stagers start automatically if the nodename equals the hostname of the machine the station runs on. To override this behavior, give the --auto-stager=no option.
Stagers can be started manually via sam_bootstrap. A stager must be given a name to associate it with the station node that it serves. This name can be supplied via the --node-name=<station node> argument; if this argument is missing, the current hostname is used.
$ sam add|configure group --group=<group_name> [--station=<station_name>] [--max-disk=<KBytes>] [--max-lock=<KBytes>] [--max-projects=<num>] [--admin=<name1>[,<name2>...]]
The add group command applies to a new group, whereas the configure group command applies to a group that is already known to the station. The group itself must be known to SAM; use the SAM Web browsing tools to find out which groups are valid (requests to create a new group should also be directed to sam-admin@fnal.gov). Note the difference between a group being known to the SAM system as a whole and a group being known to a particular station. If the station flag is omitted, the environment variable SAM_STATION must be set.
The max-disk flag specifies how much disk space the group may use on this station. The argument is the size allocated and a unit, for example 10G is 10 Gigabytes, and 10M is 10 Megabytes.
The max-lock flag specifies the maximum amount of disk space that may be occupied by files the group locks on disk (see the section on file locking). Note that if you decrease the max-disk value, the station will uncache and erase files that belong to the group, according to its cache replacement policy, to match the new group size.
The max-projects flag specifies the maximum number of projects the group may run simultaneously.
The admin flag specifies the administrators of the group on the station. A group administrator on one station is not necessarily an administrator of that group on another station; for example, a member of a group may set up a 'private' station on her desktop and dub herself an administrator of that group on the station. The value of the flag is a comma-separated list of UNIX user names of group administrators. The corresponding persons must already be members of the group (from SAM's perspective, see the SAM Web browsing tools).
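To compare a max-disk value such as 10G with the KByte figures the station reports, the size-plus-unit form can be converted as in the sketch below. It assumes binary multiples (1M = 1024 KBytes, 1G = 1024 x 1024 KBytes); the K suffix and the `size_to_kb` helper are additions for illustration, and how the station itself parses the value is not stated here.

```python
# Multipliers to KBytes. Binary multiples are an assumption.
_UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def size_to_kb(text):
    """Convert a size such as '10G' or '10M' to a whole number of KBytes."""
    text = text.strip().upper()
    if len(text) < 2 or text[-1] not in _UNIT_KB:
        raise ValueError("expected a number followed by K, M or G: %r" % text)
    return int(text[:-1]) * _UNIT_KB[text[-1]]
```

Under these assumptions, 10G corresponds to 10485760 KBytes and 10M to 10240 KBytes.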
$ sam set policy --group=<group_name> [--station=<station_name>] --policy=RANDOM|FIFO|LRU [--param=<policy-dependent-value>]
The policy determines which files are erased from disk when new deliveries are required. Note that the cache replacement policy does not affect files that the group has locked on disk. Those files will not be removed until explicitly unlocked by the group (see the section on file locking).
The currently implemented policies are Least Recently Used, First In First Out and Random. Some policies require a parameter whose meaning is policy-dependent. Currently, only the Random algorithm requires a parameter, which must be a positive integer seed.
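The three policies can be pictured with the following minimal sketch of how a single file might be chosen for erasure. It illustrates the general LRU, FIFO and Random techniques with locked files skipped; it is not the station's actual implementation, and the `CachedFile` record and `pick_victim` function are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    cached_at: float   # when the file arrived on disk
    last_used: float   # when the file was last given to a consumer
    locked: bool = False


def pick_victim(files, policy, param=None):
    """Pick one unlocked file to erase, or None if every file is locked."""
    candidates = [f for f in files if not f.locked]
    if not candidates:
        return None
    if policy == "LRU":
        return min(candidates, key=lambda f: f.last_used)
    if policy == "FIFO":
        return min(candidates, key=lambda f: f.cached_at)
    if policy == "RANDOM":
        if not isinstance(param, int) or param <= 0:
            raise ValueError("RANDOM requires a positive integer seed")
        return random.Random(param).choice(candidates)
    raise ValueError("unknown policy: %r" % policy)
```

Locked files never enter the candidate list, which mirrors the rule that the replacement policy does not affect locked files.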
$ sam lock|unlock file --file=<file_name> --group=<group_name> [--station=<station_name>]
CDF Sam at a glance, D0 Sam at a glance: a short overview with a little information about a particular station.
CDF diagnostics, D0 diagnostics: detailed information about one selected station.
CDF SAM TV, D0 SAM TV: the current projects and deliveries to the stations.
In principle, these pages provide the same information as the sam dump station command. Because the pages are updated only every few minutes, they may show your station as down even though you have just restarted it. Also, a station in the yellow state does not necessarily have problems. It can also mean that your server is behind a firewall or some other network filter, or that the server was temporarily busy processing another request and could not respond to a ping before the timeout (3 seconds).
In addition, the station's configuration, together with information about currently cached files (and files being delivered, if any), is available through a simple text utility:
$ sam dump station [--disks | --projects | --groups | --files={cached|requested|all} | --all] [--station=<station_name>]
$ sam dump station --station=d0ppdg-wisconsin --groups --disks --projects --fsman
*** BEGIN DUMP STATION d0ppdg-wisconsin version v4_2_1_57 running at diet.cs.wisc.edu 65 days 22 hours 20 minutes 55 seconds, admins: abaranov garzogli terekhov
Replica selection: prefer (d0mino enstore), avoid (empty)
There are 0 authorized transfer groups
Full delivery unit is enforced; external deliveries are unconstrained
Excess consumer satisfaction: 1
AUTHORIZED GROUPS:
group dzero: admins: abaranov garzogli terekhov , swap policy: LRU, fair share: 1, quotas (cur/max): projects = 0/-1, disk: 44558656KB/52428800KB, locks:0B/52428800KB
STATION DISKS:
disk 1635 diet.cs.wisc.edu:/export/sam/cache, 7870143KB/52428800KB = 15% free
station disk total: 7870143KB/52428800KB = 15% free
The disk dump shows how much space is physically free out of the total space available. A disk may also be shown as "INACTIVE", in which case you should check the availability of stagers, problems with the stage area, and discrepancies in disk sizes.
PROJECT MANAGER: fileReleaseTO = 1 days : maxConsumer Wait time = 1 days, max prefetched files : 5
NO PROJECTS (746 already ended, 163 prematurely)
FAIR SHARE MAN:
Benefit weights: volumes mounted: 0.2, CPU: 0.2, KBytes transferred from MSS: 0.2, KBytes transferred inter-station: 0.2, files consumed: 0.2
*** END OF STATION DUMP ***
The project dump shows the name and ID of each project. It also shows how many files have been delivered to the project, how many are left, and how many are locked but not yet given.
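For monitoring scripts, the per-disk lines of the dump shown above can be picked apart with a regular expression. The sketch below assumes exactly the line layout seen in the sample output ("disk <id> <node:path>, <free>KB/<total>KB = <n>% free"); other layouts, such as an INACTIVE disk, are not handled, and the `parse_disk_line` helper is a hypothetical name.

```python
import re

# Matches lines such as:
#   disk 1635 diet.cs.wisc.edu:/export/sam/cache, 7870143KB/52428800KB = 15% free
_DISK_LINE = re.compile(
    r"^disk\s+(?P<id>\d+)\s+(?P<mount>\S+),\s+(?P<free>\d+)KB/(?P<total>\d+)KB"
)


def parse_disk_line(line):
    """Return a dict describing one disk line, or None if the line differs."""
    match = _DISK_LINE.match(line.strip())
    if match is None:
        return None
    free, total = int(match["free"]), int(match["total"])
    return {
        "id": int(match["id"]),
        "mount": match["mount"],
        "free_kb": free,
        "total_kb": total,
        "percent_free": 100 * free // total if total else 0,
    }
```

Lines that do not start with "disk", such as the "station disk total" summary, yield None and can simply be skipped.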
After the station is installed and the disks, nodes, groups and samcp are configured, you'll need to supply arguments that define the station's criteria for selecting source locations for the files in your projects. You'll also need to configure global routing rules so that you can benefit from other stations' caches if the protocols at your site do not allow immediate access to the experiment's mass storage systems.
D0 installations must use global routing to gain access to enstore. Example:
--prefer-loc=enstore --routing-station=enstore::central-router --routing-group=dzero --routing-user=<your station name>
Setup examples:
station prd <version> <your station_name> --prefer-loc=<your domain>,enstore --routing-station=enstore::central-router --routing-group=<your group> --routing-user=<your station> --pmaster-arg=--consumption-map=\.\*::<your NFS head node> --min-delivery=1k --log-file=sm_log
--prefer-loc=<your domain>,enstore - select your domain locations as a first preference, enstore as a second preference.
--pmaster-arg=--consumption-map=\.\*::<your NFS head node> - restrict all project consumer processes to receive files from NFS node only.
--min-delivery=1k - deliver as much as possible. Most of the remote stations need this option.
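The ordering expressed by --prefer-loc (first preference, then second, and so on) can be pictured with the sketch below. It assumes that a preference matches a replica location when it occurs as a substring of the location string; the station's real matching rule is not given here, and the `choose_source` helper and the sample location strings in the note are hypothetical.

```python
def choose_source(locations, prefer):
    """Return the location matching the earliest preference.

    `locations` is a list of replica location strings and `prefer` is the
    ordered list taken from --prefer-loc. Substring matching is an
    assumption. If nothing matches, the first known location is returned.
    """
    for wanted in prefer:
        for location in locations:
            if wanted in location:
                return location
    return locations[0] if locations else None
```

With a preference list of d0mino.fnal.gov followed by enstore, a replica on a d0mino.fnal.gov cache would be picked ahead of an enstore copy, whichever order the replicas are listed in.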
sam_bootstrap config line :
station sm_central_analysis_prd v4_2_1_64 central-analysis --routing-station=in2p3::central-router --routing-user=central-analysis --routing-group=dzero --revival=fast --preferred-loc=d0mino.fnal.gov:/sam/cache,enstore --excess-satisfaction=10 --retry-attempts=1 --min-delivery=1k --log-file=sm_central_analysis_log
Start station version v4_2_1_64 in the sm_central_analysis_prd environment. Have files that have the "in2p3" string in their location routed via central-router. Select d0mino cache locations first, then ask enstore. Lock 10 files per project consumer process. Retry once.
Station setup:
station sm_cab_prd v4_2_1_64 cab -OAhost d0mino-sam.fnal.gov --max-prefetched-files=5 --max-delivery-unit-size=3 --retry-attempts=1 --preferred-loc=d0mino.fnal.gov,enstore --min-delivery=1k --bad-loc-expire=240 --log-file=sm_log
Set bad location expiration time to 240 minutes. Set delivery unit to 3 to decrease granularity of deliveries among distributed nodes. Bind d0mino-sam interface as CORBA service.
Worker node stager setup:
stager cabsrv1_worker v4_2_1_56 fnal-cabsrv1 --with-sm --without-fss --log-file=stager_log --max-transfers=5 --node-name=d0cs181
Run stager version v4_2_1_56 in cabsrv1_worker environment. Set max-transfers (maximum concurrent transfers) to 5. Assign stager name to d0cs181.
station prd v4_2_1_64 ccin2p3-analysis --prefer-loc=cchpssd0,in2p3.fr,enstore --routing-station=enstore::central-router --routing-group=dzero --routing-user=ccin2p3-analysis --pmaster-arg=--consumption-map=\.\*::rfio://in2p3.fr --route=rfio://in2p3::rfio://in2p3.fr,ccd0.in2p3.fr --route=hpss::rfio://in2p3.fr -OAport 4501 --min-delivery=1k --constrain-delivery=ccd0.in2p3.fr --routing-public-node=ccd0.in2p3.fr --log-file=sm_log --fileReleaseTimeout=172800
--prefer-loc=cchpssd0,in2p3.fr,enstore - prefer the cchpssd0 location first, then in2p3, and finally enstore. enstore locations are routed via central-router.
--pmaster-arg=--consumption-map=\.\*::rfio://in2p3.fr - restrict all project consumer processes to receive files from rfio://in2p3.fr node only.
--route=rfio://in2p3::rfio://in2p3.fr,ccd0.in2p3.fr - permit intrastation transfers from "rfio://in2p3" to itself and ccd0.in2p3.fr node only.
--route=hpss::rfio://in2p3.fr - permit transfers of hpss locations to rfio://in2p3.fr node only.
--routing-public-node=ccd0.in2p3.fr - if the station is used as a routing station, ccd0.in2p3.fr is the only exporting node (not rfio://in2p3).
--constrain-delivery=ccd0.in2p3.fr - have all external files routed to ccd0.in2p3.fr unless --route flags direct otherwise.
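The --route values above follow a "source::target1,target2" shape. When reviewing a long option line, it can help to split them into a table, as in the sketch below. The parsing reflects only the form of the examples shown here; `parse_route` and `route_table` are hypothetical helpers, not part of SAM.

```python
def parse_route(value):
    """Split 'source::target1,target2' into (source, [targets])."""
    source, separator, targets = value.partition("::")
    if not separator or not source or not targets:
        raise ValueError("expected <source>::<target>[,<target>...]: %r" % value)
    return source, [t for t in targets.split(",") if t]


def route_table(values):
    """Build a {source: [targets]} mapping from several --route values."""
    table = {}
    for value in values:
        source, targets = parse_route(value)
        table.setdefault(source, []).extend(targets)
    return table
```

The first "::" is taken as the separator, so a source such as rfio://in2p3 is kept intact.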
=============================================================================
Project : SAM
Package : sam
$Id: stationConfig.shtml,v 1.18 2004/07/15 19:27:24 abaranov Exp $
This work is part of a development project, called SAM, which consists of a number of coordinated packages each named sam_xxxx .
Notice of authorship, copyright status, and terms and conditions, should the software eventually become available for use outside Fermilab, can be found in the README and LICENCE files in the top level directory of the main sam package.
==============================================================================