File Retrieval Using SAM: Running Jobs with SAM Projects

Pre-requisites: defining SAM datasets and using SAM with the D0 framework.

  1. Quick Start for the Impatient

    Using d0tools is the quickest, easiest way to access SAM.

    For those who want a bit more detail: in general, you will want a script for your processing job that sets up the necessary environment variables and so on. If, however, all you want to do is run a framework executable (either interactively or as an LSF batch job), you can take the following shortcut. After all the necessary setups, including RCP paths, type, e.g.:

        d0mino> sam submit --defname=sam_demo \
                           --framework-exe=SAMTest \
                           --framework-params="-rcp SAMTest.rcp" \
                           --cpu-per-event=1m \
                           [--interactive]
    

    Here SAMTest is a framework executable compiled and configured for use with SAM. The --defname argument specifies the name of a dataset definition, which must already exist.

    Even if you succeeded, you must eventually read this entire document (bummer).

  2. Projects and Jobs

    In SAM, the term "analysis project" refers to the activity of retrieving and processing SAM files. Every time you process a dataset, SAM creates and starts an analysis project. The identifying string (name) of the project, which is not to be confused with the name of the underlying dataset definition, is passed to the user job.

    In this document, the term "user job" refers to any user executable whose primary goal is to process SAM project files. The following example script uses the framework demo program SAMTest:

       #!/bin/sh
    
       # initialize environment:
       . /usr/local/etc/setups.sh
       setup n32
       setup D0RunII <version>
       d0setwa
       setup sam
    
       sam submit --framework-exe=SAMTest --framework-params="-rcp SAMTest.rcp" \
          --cpu-per-event=1m --batch-system-flags="-J sam_demo" \
          --defname=sam_demo --dataset-version=new --group=demo
    

    If, instead of the framework executable, you must use other methods to access SAM files (such as a shell or Python script), modify the sam submit command to something like:

       sam submit --script=mySAMscript --script-params="arg1 arg2" \
                  --cpu-per-event=1m --batch-system-flags="-J mySAM" \
                  --defname=sam_demo --dataset-version=new --group=demo
    
    For historical purposes only: read the older document on using the command-line consumer interface.

  3. The Submit Command

    The sam submit command guarantees that the SAM project is started by the time the user job is dispatched, and it submits the user job to the batch system (or runs it interactively). The syntax is:

        sam submit [project arguments] \
                   [script OR framework arguments] \
                   --cpu-per-event=<time> \
                   [--batch-system-flags="<flags>"] \
                   [--interactive]
    
    where:
    project arguments: --defname=definitionName --dataset-version=definitionVersion
    OR
    --dataset=dataSetIdNumber
     
    script arguments: --script=scriptName [--script-params="arg1 arg2..."]
     
    framework arguments: --framework-exe=frameworkExe --framework-params="-rcp rcpFile..."

    Project Arguments:

    The project arguments specify the dataset to be processed. This is done either by dataset ID, --dataset=<number>, or by a dataset definition name together with a dataset version: --defname=<name> --dataset-version=<version>. The dataset version is either a (small) integer or one of the special words "last" or "new". "last" means that the dataset with the latest defined version is used. "new" means that a new dataset is created from the definition (this is the default value). Browse the SAM meta-data catalog for valid datasets. Go to the dataset definition pages to make new dataset definitions.
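
    The two ways of naming a dataset can be sketched as a small shell helper. The function name project_args and its calling convention are hypothetical and not part of SAM; it only assembles and sanity-checks the argument strings described above, and assumes "new" when no version is given.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAM): print the project arguments for
# either a dataset ID or a definition name plus dataset version.
#   project_args dataset <number>
#   project_args defname <name> [<version>|last|new]
project_args() {
    case ${1:-} in
        dataset)
            # the dataset ID must be a plain number
            case ${2:-} in
                ''|*[!0-9]*) echo "dataset ID must be a number" >&2; return 1 ;;
            esac
            echo "--dataset=$2"
            ;;
        defname)
            [ -n "${2:-}" ] || { echo "missing definition name" >&2; return 1; }
            version=${3:-new}      # "new" is the documented default
            case $version in
                last|new) ;;       # the two special words
                *[!0-9]*) echo "bad dataset version: $version" >&2; return 1 ;;
            esac
            echo "--defname=$2 --dataset-version=$version"
            ;;
        *)
            echo "usage: project_args dataset <id> | defname <name> [version]" >&2
            return 1
            ;;
    esac
}
```

    For example, project_args defname sam_demo last prints the pair of arguments that selects the latest defined version of the sam_demo dataset.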

    The script argument specifies the user job script conforming to the above rule. Optional user-supplied arguments to the script are specified by the --script-params argument.

    The CPU per event parameter is a rough estimate of how CPU-intensive your job is. It helps SAM allocate resources for the job. The value of the parameter is an integer followed by one of s, m, h (seconds, minutes, hours).
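
    As an illustration of the value format only, the following sketch converts a --cpu-per-event value into seconds and rejects anything that is not an integer followed by s, m or h. The function name cpu_per_event_seconds is hypothetical; SAM does its own parsing, and this helper is merely a way to check a value before submitting.

```shell
#!/bin/sh
# Hypothetical helper (not part of SAM): convert a --cpu-per-event value
# such as 30s, 1m or 2h into seconds; print nothing and fail otherwise.
cpu_per_event_seconds() {
    value=${1:-}
    number=${value%?}            # everything except the last character
    unit=${value#"$number"}      # the last character
    case $number in
        ''|*[!0-9]*) echo "bad --cpu-per-event value: $value" >&2; return 1 ;;
    esac
    # strip leading zeros so the shell does not read the number as octal
    while [ "${#number}" -gt 1 ] && [ "${number#0}" != "$number" ]; do
        number=${number#0}
    done
    case $unit in
        s) echo "$number" ;;
        m) echo $((number * 60)) ;;
        h) echo $((number * 3600)) ;;
        *) echo "bad --cpu-per-event unit in: $value" >&2; return 1 ;;
    esac
}
```

    With this helper, cpu_per_event_seconds 1m prints 60, while a value without a unit, such as 5, is rejected.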

    The --group argument specifies the group under which the job is run. These are the major groups to which the D0 computing resources are allocated according to the experiment's policies. These groups must be authorized to run at the station and may be different from the processing groups used to describe the actual consumer.

    If you need to pass additional arguments to the batch system when the job is submitted, use the --batch-system-flags= parameter. The value of that parameter is batch system-specific, of course.

    As the name implies, the --interactive flag will cause your script to be executed interactively rather than be submitted to the batch system. The command sam submit --interactive ... is equivalent to sam run job ....
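
    One way to switch between batch and interactive running without editing the command by hand is a thin wrapper that only assembles the command line. The sketch below is a dry run: it prints the sam submit command rather than executing it. The function name print_framework_submit and the INTERACTIVE variable are hypothetical conveniences, not SAM features; the flags themselves are the ones described above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of SAM): print the sam submit command for
# a framework job. Set INTERACTIVE=yes to add the --interactive flag.
#   print_framework_submit <defname> <executable> <rcp file> <cpu per event>
print_framework_submit() {
    defname=$1
    exe=$2
    rcp=$3
    cpu=$4
    cmd="sam submit --defname=$defname --framework-exe=$exe"
    cmd="$cmd --framework-params=\"-rcp $rcp\" --cpu-per-event=$cpu"
    if [ "${INTERACTIVE:-no}" = yes ]; then
        cmd="$cmd --interactive"   # run in the foreground, not in batch
    fi
    echo "$cmd"
}
```

    Calling print_framework_submit sam_demo SAMTest SAMTest.rcp 1m prints the batch form of the command; prefixing the call with INTERACTIVE=yes prints the same command with --interactive appended.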

    Important Notes:

    1. Do not use the batch system commands, such as "bsub", directly to submit SAM jobs to the batch system!
    2. (Earlier users only) Do not use the "sam start/stop project" commands in your script any longer.
    3. Instead of writing a simple script, you may choose to do all the setups prior to the "sam submit" command, and then take advantage of the fact that the LSF system passes the user environment to the batch job. In that case, you need to supply to SAM the name of the framework executable and its parameters (with the exception of the SAM-specific -project and -station which will be added by SAM on your behalf):
        d0mino> sam submit <project_arguments> --framework-exe=<executable> \
              --cpu-per-event=<time> [--framework-params=] \
              [--batch-system-flags="..."] [--interactive]
        

  4. Examples

    If the above demo script is called "runsam" and it is in your path, try (assuming LSF, which understands the -J argument):

            sam submit --defname=sam_demo --cpu-per-event=1m --script=runsam \
               --script-params="SAMTest.rcp" --group=demo \
               --batch-system-flags="-J myjob"
    

    This will result in the "SAMTest" executable being run with arguments "-rcp SAMTest.rcp" in a batch job with the name "myjob". The analysis project will be created anew from the dataset definition called "sam_demo". The same can be achieved without writing any script:

            sam submit --defname=sam_demo --cpu-per-event=1m --framework-exe="SAMTest" \
                 --framework-params="-rcp SAMTest.rcp" --group=demo \
                 --batch-system-flags="-J myjob"
    

    (Note the difference between --script-params and --framework-params).

    Getting help:

            sam help submit
    
=============================================================================
Project : SAM
Package : sam
$Id: using_sam_projects.html,v 1.23 2002/08/13 17:28:02 veseli Exp $

This work is part of a development project, called SAM, which consists of a
number of coordinated packages each named sam_xxxx .

Notice of authorship, copyright status, and terms and conditions, should
the software eventually become available for use outside Fermilab, can be
found in the README and LICENCE files in the top level directory
of the main sam package.

==============================================================================