File Retrieval Using SAM: Running Projects

0.  Quick Start for the Impatient

For non-framework retrieval of files, see the Python-script-based "sam run" command. When using the framework, you must start the project before running the framework and stop the project after the framework exits. When starting a project right after it has been defined, use the "new" keyword for the snapshot version.
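
The start/run/stop sequence described above can be sketched as a small Bourne-shell wrapper. This is only an illustration under stated assumptions: run_with_project is a hypothetical helper name, the framework command is whatever the caller passes in, and SAM_STATION and SAM_PROJECT are assumed to be exported already. Only the "sam start project" and "sam stop project" commands and their options come from this document.

```shell
# Sketch: bracket a framework run with project start and stop.
# run_with_project is a hypothetical helper, not part of SAM.
# Usage: run_with_project <definition> <framework command> [args...]
run_with_project() {
    defname=$1
    shift
    # "new" requests a fresh snapshot, as advised for a just-defined project.
    sam start project --defname="$defname" --snapvers=new || return 1
    # Run the caller-supplied framework command and remember its exit status.
    "$@"
    status=$?
    # Stop the project even if the framework failed.
    sam stop project
    return $status
}
```

The wrapper returns the framework's exit status, so a calling script can still tell whether the run itself succeeded.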

1. Initial setup


To use SAM, the user needs to do some basic UPS setups first:

$ setup n32 #IRIX only
$ setup sam

For the following, assume that the (analysis) project "mcc99_1_ttbar" exists at the station "protofarm". Defining a project (from constraints to snapshot) and attaching it to a station are described earlier.

From this point on, there are two methods for receiving files to process.
 

sam run project: we do lots of stuff for you

To run a project easily, the user needs to do two things:
  • Create a version of a project setup.
  • Type 'sam run project project.setup.py' from the directory in which the project setup has been placed.
The SAM system will then do as the user asks for every file in the project.

Manual way: total control

  • Most subsequent commands use the station name and the project name. It is best to define the SAM_STATION and SAM_PROJECT environment variables as follows (assuming a Bourne-compatible UNIX shell):


$ export SAM_STATION=protofarm
$ export SAM_PROJECT=mcc99_1_ttbar
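
Since the later commands rely on these two variables, a script may want to check them before going further. The following is a minimal sketch; require_sam_env is a hypothetical helper name, not a SAM command.

```shell
# Sketch: fail early if the station or project variable is missing.
# require_sam_env is a hypothetical helper, not part of SAM.
require_sam_env() {
    for name in SAM_STATION SAM_PROJECT; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            return 1
        fi
    done
}
```

A script would call require_sam_env once, right after the setup commands, and exit if it returns non-zero.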
 

  • To "start" the project (perhaps launch is a better word) right after it has been defined:


$ sam start project --defname=<definition> --snapvers=<version|new|last> [--quiet|--verbose] [--file-cut=<num>]

This command starts the project master, which serves the subsequent requests for files. The prefloc parameter is a string naming the preferred source location for the project's files: if some or all of the input files are available in several different places (such as d02ka disk or the tape robot), the user may choose where to fetch them from.

On occasion a project may need to be restarted. This may be done with the "sam restart project" command.

$ sam restart project [--keep-orig-procs] [--skip-skipped]
  • --keep-orig-procs By default, all consumers are recreated but not the consumers' processes. This switch forces the consumers to recreate their processes as well.
  • --skip-skipped By default, all files that have not been successfully SEEN by the consumer are redelivered to it. This flag skips all files that the consumer marked as in error when it released them.

Once the Project Master is running, the user can type: 

$ sam dump project
 

  • To gain control over when the actual physical retrieval of files begins, the user should use: 


$ sam suspend delivery
$ sam resume delivery

This is especially useful when pre-fetch of files is undesirable. For example, in the farm environment one may want to suspend delivery until enough stagers have been connected to the Project Master, thus distributing the delivery load uniformly among the worker jobs (and different buffers). 
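
The suspend/resume pattern can be sketched as a wrapper that holds delivery back while some preparatory step runs. with_delivery_suspended is a hypothetical helper name, and the preparatory command (for example, a script that waits until enough worker jobs are connected) is supplied by the caller; only the two sam commands come from this document.

```shell
# Sketch: keep delivery suspended while a preparatory command runs.
# with_delivery_suspended is a hypothetical helper, not part of SAM.
# Usage: with_delivery_suspended <command> [args...]
with_delivery_suspended() {
    sam suspend delivery || return 1
    # Run the caller-supplied preparation step.
    "$@"
    status=$?
    # Resume delivery whether or not the preparation succeeded.
    sam resume delivery
    return $status
}
```

Resuming unconditionally is a design choice of this sketch: it avoids leaving the project suspended after a failed preparation step, at the cost of starting delivery in a possibly incomplete setup.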

Actual File Consumption

  • To start a consumer and consumer process: 
    $ sam establish consumer --appname=<appname> [--group=<working_group>] \
        --version=<version> [--rtfile=<cid_file>]
    $ sam establish process --cid=`cut -f2 -d' ' <cid_file>` --rtfile=<cpid_file>

Here appname and version are the name and the version of the consumer application (such as a reconstruction program), and working_group is a working group that must be known to the database. The user, as identified by the UNIX uid, must be known to the database as a member of that group. For the list of valid groups, consult the query forms (developers go here). The first command creates a consumer ID (CID), which the second command uses to create a consumer process ID (CPID), provided the CID has been written into cid_file. The group option to the first command is needed only if the consumer's group differs from the project creator's group (see the create analysis command in the project setup phase).

  • To request the next file from the project set:
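
The two establish commands can be chained in a script roughly as follows. This is a sketch under assumptions: establish_consumer_process is a hypothetical helper name, and the CPID file is assumed to use the same layout as the CID file (the ID in the second space-separated field), which this document confirms only for the CID file.

```shell
# Sketch: create a consumer, then a consumer process, and print the CPID.
# establish_consumer_process is a hypothetical helper, not part of SAM.
# Usage: establish_consumer_process <appname> <version> <cid_file> <cpid_file>
establish_consumer_process() {
    appname=$1
    version=$2
    cid_file=$3
    cpid_file=$4
    sam establish consumer --appname="$appname" --version="$version" \
        --rtfile="$cid_file" || return 1
    # The CID is the second space-separated field of the CID file.
    cid=$(cut -f2 -d' ' "$cid_file")
    [ -n "$cid" ] || return 1
    sam establish process --cid="$cid" --rtfile="$cpid_file" || return 1
    # Assumption: the CPID file has the same layout as the CID file.
    cut -f2 -d' ' "$cpid_file"
}
```

A caller would capture the result with something like cpid=$(establish_consumer_process ...) and pass it to the later commands that take a CPID.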


$ sam get next file --cpid=<cpid> --rtfile=<fname_file>

This command uses the consumer ID (CID) and consumer process ID (CPID) established in the previous step. It writes the name of the input project file into fname_file. If the content of that file is the string "END OF STREAM", proceed to stopping the project master (sam stop project) below.

  • What the user does with the input file whose name is in fname_file is of no concern to SAM. When done with the file, inform the Project Master:


$ sam release --file=<file> --status={ok|error} --cpid=<cpid>

Here file is the name of the input file (not fname_file from the previous command). The file may be physically erased from the SAM buffer after all the interested consumers have released it. 
 

  • Repeat the last two steps until the file name returned from the get next file command is "END OF STREAM" or "ERROR".
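
The get/process/release cycle above can be sketched as a shell loop. consume_files is a hypothetical helper name, and the per-file command is supplied by the caller. The sketch assumes that fname_file contains exactly the file name, or one of the two sentinel strings, and nothing else.

```shell
# Sketch: request files one by one, process each, and release it.
# consume_files is a hypothetical helper, not part of SAM.
# Usage: consume_files <cpid> <fname_file> <command> [args...]
consume_files() {
    cpid=$1
    fname_file=$2
    shift 2
    while :; do
        sam get next file --cpid="$cpid" --rtfile="$fname_file" || return 1
        # Assumption: the file holds only the name or a sentinel string.
        file=$(cat "$fname_file")
        case $file in
            "END OF STREAM") return 0 ;;
            "ERROR") return 1 ;;
        esac
        # Run the caller's command on the file and map its result to a status.
        if "$@" "$file"; then
            status=ok
        else
            status=error
        fi
        sam release --file="$file" --status="$status" --cpid="$cpid"
    done
}
```

The loop returns 0 on "END OF STREAM" and non-zero on "ERROR", so the caller can decide whether to stop the project normally or investigate first.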

  • To stop the running project master: 
   $ sam stop project

Graceful shutdown is strongly recommended over killing the physical UNIX process in which the Project Master runs. 
 

  • At any time after the project is started but before it is stopped, the user can request the consumer statistics as follows: 
   $ sam dump consumer --cid=<cid>

(Note that the statistics describe the consumer as a whole rather than a particular consumer process; a consumer is a collection of consumer processes.) The output contains the number of files in the project, the number of files consumed and not consumed, etc. In the future, we plan to display CPU usage per file (or per event) and other useful statistics. The recommended place to view the persistently stored information about the consumer is, however, the database (developers go here).
 

2. Some common problems


Aside from configuration problems (not finding the right station, stager, optimizer, etc.), which should concern the SAM administrator rather than the user (see, however, the list of CORBA names of the SAM servers), the following error conditions may occur:


=============================================================================
Project : SAM
Package : sam
$Id: using_sam_projects.html,v 1.2 1999/06/18 21:57:18 terekhov Exp $

This work is part of a development project, called SAM, which consists of a
number of coordinated packages each named sam_xxxx .

Notice of authorship, copyright status, and terms and conditions, should
the software eventually become available for use outside Fermilab, can be
found in the README and LICENCE files in the top level directory of the main
sam package.

==============================================================================