An alternative method for adding information about files to the SAM catalog is described in the sam declare documentation. This method is used, for example, when files are brought to the system on exchange media and added directly to the tape library without an intermediate disk staging operation.
To use SAM, the user needs to do some basic UPS setups first:
$ setup n32 #IRIX only
$ setup sam
$ setenv SAM_STATION <station_name>
On d0mino the <station_name> is "central-analysis", and this
is set up automatically when you set up sam.
To verify that the FSS is running and operational, the user can type:
$sam dump fss
to see the state of the server, including its known stagers
and file store requests. Initially, the server is blank. A typical
user should never need to start the FSS or stagers. However, in certain
cases this information is needed and is provided in a later section of
this document.
The general command for storing a file or files using FSS is:
$sam store --descrip=<descrip_file> --source=<source_dir> [--dest=<dest_location>] [--timeout=<mins>] [--resubmit] [--copy=<n>] [--create-rm-file]
This call returns 0 if all files were stored to the destination and the database without error and non-0 otherwise.
Here descrip_file is a description of the file(s), explained below; source_dir is the full path to the local directory containing the data file(s); and dest_location is the destination location, either an ENSTORE location beginning with "/pnfs" or a remote disk location of the form <node>:<remote_disk_dir>. If no --dest is specified, a destination is produced based on the information in the description file and compared to the list of valid destinations stored in the database. Use the --dest flag only if you are an expert. The cpid, the ID of the process that created the file, can be included in the description file but is not required for certain applications, such as adding Monte Carlo files. For additional details about this parameter, see how to run projects, or the datalogger example later in this document. This operation executes synchronously, i.e., it returns to the user's command prompt upon completion of the request or after the time-out period, if one is specified. Giving a value of zero for the timeout (equivalent to --asynchronous) causes the command to exit immediately without waiting for the store operation to complete.
The descrip_file parameter is the name of the Python file containing the description of the file being stored. If you choose to add the .py suffix to the filename, the suffix must also be used in the flag. You can also specify the complete path to the description file.
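The options described above can be assembled programmatically before a job submits them. The following is a minimal sketch, not part of SAM: the function name build_sam_store_command and its keyword arguments are hypothetical, and only the flags listed in the synopsis above are emitted.

```python
import shlex


def build_sam_store_command(descrip, source, dest=None, timeout=None,
                            resubmit=False, copies=None, create_rm_file=False):
    """Assemble a `sam store` argument list (hypothetical helper, not a SAM API)."""
    cmd = ["sam", "store", f"--descrip={descrip}", f"--source={source}"]
    if dest is not None:
        # The text advises that only experts should pass --dest.
        cmd.append(f"--dest={dest}")
    if timeout is not None:
        # Minutes; zero makes the command return immediately.
        cmd.append(f"--timeout={timeout}")
    if resubmit:
        cmd.append("--resubmit")
    if copies is not None:
        cmd.append(f"--copy={copies}")
    if create_rm_file:
        cmd.append("--create-rm-file")
    return cmd


# Print the command line instead of running it, so it can be reviewed first.
example = build_sam_store_command("mc_metadata.py", "/home/samuser/outbuffer",
                                  timeout=30, create_rm_file=True)
print(shlex.join(example))
```

Building a list rather than a single string avoids quoting problems if a path contains spaces; the list can be passed unchanged to a process launcher.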
The user is encouraged to view newly imported files in the database by using the SAM Data Browsing web pages.
In the case of errors, the user can pass the --resubmit flag and SAM will attempt to fix what went wrong. It is very important not to use this as your default submission, as it turns off a number of SAM's error-checking mechanisms.
The --storeagain flag allows the duplicate storage of files. As with --resubmit, this disables some error checking in the system (i.e., those checks which verify that the file hasn't been stored before) and should only be used when you are trying to store a file for a second or subsequent time.
The --create-rm-file flag tells SAM to create a file named rmfile in
the current directory with the commands needed to delete the files that
were successfully stored. The SAM system will not remove the files
for you, just create this file. It is the responsibility of the user to
make certain that they are ready to delete the files before executing this
output file. You can run sam store several times until all files are in the
system, and SAM will keep appending successful transfers to the file.
$sam get file store status <file_name>
Here the <file_name> is the name of the file being stored rather than that of the meta-data file.
These description files are, in fact, Python scripts and use Python objects to record information about each file. Before such a file is sent to SAM, it is recommended that basic syntax checking be done by running the description file through the Python interpreter, e.g.,
$python -c "import descrip_file" && echo "My file is good"
If the message "My file is good" doesn't appear,
there is a problem with descrip_file. This is not an acid
test to see if your data will be accepted by the system; it only tests the
format of the description file. The user needn't know anything about Python
to be able to submit file store requests so the example files below
should be viewed merely as templates. If you are not familiar with Python,
just follow the punctuation and the indentation of the examples. Please
note the time format for start_time and end_time.
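The same kind of syntax check can be done from a script without importing the description file, so the SAM classes it refers to do not need to be available. This is a minimal sketch; the function name description_syntax_error is hypothetical. It must be run with the same Python version that SAM uses, since syntax accepted by one version (for example the long-integer suffix L in the data logger example) may be rejected by another.

```python
def description_syntax_error(path):
    """Return None if the file compiles, otherwise a short error description.

    Only the syntax is checked: the file is compiled but never executed,
    so undefined names such as SAM classes are not reported.
    """
    with open(path, "r") as handle:
        source = handle.read()
    try:
        compile(source, path, "exec")
    except SyntaxError as err:
        return f"{path}:{err.lineno}: {err.msg}"
    return None
```

A caller would treat a None result as the equivalent of the "My file is good" message above; like that test, it says nothing about whether SAM will accept the metadata.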
Assume the user has created a file called project_metadata.py in
her current directory /home/samuser/tmp, whose contents are exactly the
same as the sample description file for reconstructed data in this
document, and assume she has a file fifo.ace of size 12345 KBytes in the
directory /home/samuser/outbuffer. Further, assume that the user is running
a consumer process in a project and the process ID is 100. The user can then type:
$sam store --descrip=project_metadata.py --source=/home/samuser/outbuffer --dest=/pnfs/samson/NULL
This example has deliberately chosen a non-existent (as far as the SAM database is concerned) file ssh-kurino as the parent, among other things. SAM should respond with a clear error message. While the file is being stored, the user can view the status of this (and all other) requests via
$sam dump fss
For example, the sam store command could have been executed in the background by using the usual & symbol at the end of the command line, thus allowing the user to continue the job and perhaps submit more requests. Note that the FSS is capable of handling any number of submitted requests (up to a reasonable limit determined by system resources) in parallel. The fulfillment of requests is regulated by SAM resource management mechanisms, primarily by the optimizer.
Once a request has been submitted, it cannot be re-submitted or canceled unless the request has entered the error state, of which the submitting process will be notified (i.e., the sam store command returns with a non-zero status and a brief message). Thus, users are advised to catalogue and store files responsibly. Although SAM will make every effort to verify the files' description and accept only files that are consistent with the supporting data in the SAM database, it cannot ensure that the description is 100% correct.
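A script that submits store requests can act on the exit status and message described above. The sketch below is generic: run_and_report is a hypothetical helper, and the command it runs here is a stand-in Python one-liner rather than sam store, so that the example runs on any machine.

```python
import subprocess
import sys


def run_and_report(cmd, timeout_seconds=None):
    """Run a command and return (succeeded, output text).

    A zero exit status counts as success; standard error is merged into
    standard output so that a brief error message is not lost.
    """
    result = subprocess.run(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True,
                            timeout=timeout_seconds)
    return result.returncode == 0, result.stdout.strip()


# Stand-in for a failing store request: prints a message and exits non-zero.
ok, message = run_and_report(
    [sys.executable, "-c", "import sys; print('store failed'); sys.exit(1)"])
if not ok:
    print("request entered the error state:", message)
```

In a real job the argument list would be the sam store command line; only after a failure like this would resubmission be considered.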
The format of the description file for reconstructed data is exemplified below.
# A sample description file for SAM store
from import_classes import *
TheFile = ReconstructedFile(name='fifo.ace', sizeK=12345,
                            events=Events(1, 100, 70),
                            tier='reconstructed',
                            start_time='08/01/1998 17:00:00',
                            end_time='08/01/1998 18:00:00',
                            parent_name='ssh-kurino',
                            pid=656)
In terms of the Python programming language, the file contains an instantiation
of a ReconstructedFile object. (This and other relevant classes
are defined in the import_processed.py file under the sam_user
package.) From the user's perspective, the file describes the following
attributes of the file being stored (fifo.ace): the name, the
size in kilobytes, the event information (first and last event numbers,
number of events in the file), the data tier for the file (in this example,
the generic reconstructed tier is used; in the future, a more specific
tier such as EDU250 must be supplied), the start and end times for file
creation, the name of the parent file, and the Consumer Process ID of the
creator. There has to be exactly one parent for the ReconstructedFile
class.
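The start and end times in the sample are written as month/day/year followed by a 24-hour time. The sketch below produces and checks strings in that form. The format string is an assumption inferred from the examples in this document (a value such as 08/16/2000 can only be month first), and the helper names are hypothetical. The Monte Carlo examples omit the seconds field, which this sketch does not handle.

```python
from datetime import datetime

# Assumed from sample values such as '08/01/1998 17:00:00'.
SAM_TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def format_sam_time(moment):
    """Render a datetime in the form used by start_time and end_time."""
    return moment.strftime(SAM_TIME_FORMAT)


def parse_sam_time(text):
    """Parse a timestamp string; raises ValueError if it is malformed."""
    return datetime.strptime(text, SAM_TIME_FORMAT)


def times_are_ordered(start_text, end_text):
    """Return True if the start time is not later than the end time."""
    return parse_sam_time(start_text) <= parse_sam_time(end_text)


print(format_sam_time(datetime(1998, 8, 1, 17, 0, 0)))
```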
$sam store --descrip=mc_metadata.py --source=/home/samuser/outbuffer --dest=/pnfs/sam/NULL
In this case, the description file is called mc_metadata.py. Example
Monte Carlo description files are shown below, illustrating
the format used for Monte Carlo data. Although not shown, more
than one file can be stored at a time. These files were generated
automatically by the Monte Carlo launching tool called runMCjob, described
in the Monte Carlo D0 documentation.
from import_classes import *
#
# Generated by runMCwin
#
my_generator = AppFamily( "generator","psim01.00.01","single" )
class MyProcess(ProcFamily):
    group="mcc99"
    origin_location="FNAL"
    origin_facility="d0mino"
    produced_for="Qizhong Li"
    phase="mcp03"
    def __init__(self, stream, param_file, produced_by):
        self.stream=stream
        self.param_file=param_file
        self.produced_by=produced_by
class Generator(MyProcess):
    appfamily=my_generator
channel = Channel("pdgid13","incl")
gen_fil=Generator(stream="notstreamed", \
    param_file="spw_single_test.params", \
    produced_by="Greg Graham")
gen_fil_import = PrimaryMCFile("lees-sam-v2.1-test.gen",
    gen_fil, 133, Events(1, 10, 10), \
    "08/16/2000 09:38", "08/16/2000 09:38", 2.000, channel)
#
# Generated by runMCwin
#
my_d0gstar = AppFamily( "simulator","pmc03.00.01","d0gstar"
)
class MyProcess(ProcFamily):
    group="mcc99"
    origin_location="FNAL"
    origin_facility="d0mino"
    produced_for="Qizhong Li"
    phase="mcp03"
    def __init__(self, stream, param_file, produced_by):
        self.stream=stream
        self.param_file=param_file
        self.produced_by=produced_by
class Simulator(MyProcess):
    appfamily=my_d0gstar
channel = Channel("pdgid13","incl")
d0g_fil=Simulator(stream="notstreamed", \
    param_file="spw_d0gstar_test.params", \
    produced_by="Greg Graham")
d0g_file_import = SimulatedFile("import_d0g_test.py",\
    d0g_fil, 128, Events(1, 10, 10),\
    "08/16/2000 09:40", "08/16/2000 09:40","lees-sam-v2.1-test.gen",
    1, 1, channel)
#
# Generated by runMCwin
#
my_d0sim = AppFamily( "digitizer","psim01.00.01","d0sim"
)
class MyProcess(ProcFamily):
    group="mcc99"
    origin_location="FNAL"
    origin_facility="d0mino"
    produced_for="Qizhong Li"
    phase="mcp03"
    def __init__(self, stream, param_file, produced_by):
        self.stream=stream
        self.param_file=param_file
        self.produced_by=produced_by
class Digitizer(MyProcess):
    appfamily=my_d0sim
channel = Channel("pdgid13","incl")
minbi = MinBias("none","0.0")
dig_fil=Digitizer(stream="notstreamed", \
    param_file="spw_d0sim_test.params", \
    produced_by="Greg Graham")
dig_file_import = DigitizedFile("lees-sam-v2.1-test.psim",
    dig_fil, 613, Events(1, 10, 10),
    "08/16/2000 09:44", "08/16/2000 09:44",
    "lees-sam-v2.1-test.d0g", 1, 1, channel, minbi)
#
# Generated by runMCwin
#
my_reco = AppFamily( "reconstruction","preco04.00.02","d0reco"
)
class MyProcess(ProcFamily):
    group="HiT"
    origin_location="FNAL"
    origin_facility="d0mino"
    produced_for="Qizhong Li"
    phase="mcp03"
    def __init__(self, stream, param_file, produced_by):
        self.stream=stream
        self.param_file=param_file
        self.produced_by=produced_by
class Reconstruction(MyProcess):
    appfamily=my_reco
channel = Channel("pdgid13","incl")
minbias = MinBias("none","0.0")
rec_fil=Reconstruction(stream="notstreamed", \
    param_file="spw_d0reco_test.params", \
    produced_by="Greg Graham")
rec_file_import = ReconstructedMCFile("lees-sam-v2.1-test.reco",
    rec_fil, 671, Events(1, 10, 10),
    "08/16/2000 09:50", "08/16/2000 09:50",
    "lees-sam-v2.1-test.psim", 1, 1,channel,minbias)
sam establish online process --appfamily=datalogger --version=<data_logger_version> [--start-time=<process_starting_time>]
This command returns a process ID, pid, which is needed to properly identify subsequent entries in the database and is needed when the process is ended. Next, a run is established with the sam establish run command.
sam establish run --number=<run_number> --type=<run_type> --cme=<center_of_mass_energy> --start-time=<run_starting_time>
This command returns the run_id, which is needed in the data description file and when the run is ended. The file is stored using the store command.
sam store file --descrip=<description_file> --source=<source_directory> --dest=<destination_directory> [--keep-description]
The --keep-description qualifier allows one to store the meta-data to the database even if the physical file transfer fails. Finally, at the end of each run, the end run command is issued.
sam end run --end-time=<run_ending_time> --runID=<run_id>
To finish the process, possibly at the end of each run, the end online process command is used.
sam end online process --pid=<process_id> [--end-time=<process_ending_time>]
If the end-time parameter is not supplied it will be obtained from the local system clock.
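The order of the data logger commands above can be summarized as a list of command lines. The following is a sketch only: datalogger_commands is a hypothetical helper, the values used in the example (version, run type, center-of-mass energy, paths) are placeholders, and the pid and run_id would in practice come from the output of the two establish commands, whose output format is not described here.

```python
def datalogger_commands(version, run_number, run_type, cme, run_start,
                        descrip, source, dest, run_end, run_id, pid):
    """Return the data logger command lines in the order they are issued."""
    return [
        # 1. Establish the online process; SAM returns the process ID (pid).
        ["sam", "establish", "online", "process",
         "--appfamily=datalogger", f"--version={version}"],
        # 2. Establish the run; SAM returns the run_id.
        ["sam", "establish", "run", f"--number={run_number}",
         f"--type={run_type}", f"--cme={cme}", f"--start-time={run_start}"],
        # 3. Store each file (pid and run_id go into the description file).
        ["sam", "store", "file", f"--descrip={descrip}",
         f"--source={source}", f"--dest={dest}"],
        # 4. End the run.
        ["sam", "end", "run", f"--end-time={run_end}", f"--runID={run_id}"],
        # 5. End the online process; the end time defaults to the local clock.
        ["sam", "end", "online", "process", f"--pid={pid}"],
    ]


# Placeholder values for illustration only.
for command in datalogger_commands(
        version="v1", run_number=12346, run_type="test", cme=1960,
        run_start="09/09/1999 17:42:25", descrip="raw_metadata.py",
        source="/data/buffer", dest="/pnfs/sam/NULL",
        run_end="09/09/1999 17:44:26", run_id=102939, pid=15042):
    print(" ".join(command))
```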
The data logger description file is more complete than the others, since for each file to be added to the SAM database there is an event list. The event list contains the event number, level 1, level 2, level 3, and luminosity block information for each event. This format is generated by the data logger and stored by sam in a fashion similar to other transfers described above.
from import_classes import *
TheEventList = [
RawEvent(
ev_num = 4, lum_block = 911,
level_1 = 0x41C6967E2781C46BF94B95FBD9E29CFBL,
level_2 = 0xBF540FF60ABD31DF237CAF1C7DE1C487E201D2BFE231E3DEE9569372500F2847L,
level_3 = 0x2C6775664287B3594DAAE488F73CEF59EEEA5656E113CA7B31D2AD8599A169D8L),
RawEvent(
ev_num = 5, lum_block = 911,
level_1 = 0xB53C3B547D55102F1B377AAEDE65B45BL,
level_2 = 0xE3DA61027A79839828CCE0E39F1A4B76858EFA5F28D98799388F751F493F8F36L,
level_3 = 0x48EE2043BF781E4D3D0D33FAEFBE36A6ADDA30E40586148EC2DC59290C6DB34EL),
RawEvent(
ev_num = 6, lum_block = 911,
level_1 = 0x62FF9F56ABE11D70A620A6FBF18F84B1L,
level_2 = 0xD35833055690DDC5F8091DDCEB537BCDE3AAB73B5648E799145223D31152EE9DL,
level_3 = 0xE0061A9F11EAA5B5E6C21C06C813DB989949FEB22001371E60AC6E32F288FD31L),
...
RawEvent(
ev_num = 4000, lum_block = 2000,
level_1 = 0xB53C3B547D55102F1B377AAEDE65B45BL,
level_2 = 0xE3DA61027A79839828CCE0E39F1A4B76858EFA5F28D98799388F751F493F8F36L,
level_3 = 0x48EE2043BF781E4D3D0D33FAEFBE36A6ADDA30E40586148EC2DC59290C6DB34EL)]
TheRawDataFile = RawDataFile(name = 'STREAM-000_0000012346_001.raw',
    stream = 'STREAM-000', part_nr = 1,
    start_time = '09/09/1999 17:42:25', end_time = '09/09/1999 17:44:26',
    sizeK = 0,
    lum_min = 911, lum_max = 911, ev_min = 4, ev_max = 63,
    ev_list = TheEventList,
    pid = 15042, run_id = 102939)
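The file-level fields ev_min, ev_max, lum_min and lum_max summarize the event list. A program that writes such a description file could derive them from the events rather than entering them by hand. This is a minimal sketch: Event is a stand-in tuple and not SAM's RawEvent class, summarize_events is a hypothetical helper, and whether SAM requires these fields to agree with the event list is not stated in this document.

```python
from collections import namedtuple

# Stand-in for the per-event record; not SAM's RawEvent class.
Event = namedtuple("Event", ["ev_num", "lum_block"])


def summarize_events(events):
    """Return the event-number and luminosity-block ranges of an event list."""
    if not events:
        raise ValueError("the event list is empty")
    ev_nums = [event.ev_num for event in events]
    lum_blocks = [event.lum_block for event in events]
    return {
        "ev_min": min(ev_nums),
        "ev_max": max(ev_nums),
        "lum_min": min(lum_blocks),
        "lum_max": max(lum_blocks),
    }


events = [Event(4, 911), Event(5, 911), Event(6, 911)]
print(summarize_events(events))
```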
NB: The information in this section is for experts only. Do not,
under any circumstances, start your own FSS or stager on machines where
a SAM-supported FSS or stager is already running.
On a properly configured station, a server called FSS (File Store Server) runs and manages user requests to store processed files. The following information is not needed by general users, but may be required in special cases, e.g., on the reconstruction farm. The server's CORBA name is /SAMStations/<station_name>/FSS:Sewer. If the server is not running, which may be the case for a farm station, it must be started with the following command:
$sam start fss [--quiet|--verbose]
On a farm, starting the FSS typically occurs in the beginning section of the job. The above command assumes that the SAM_STATION environment variable names a valid station. See the list of valid SAM stations available on the Quickie Query Lists in the SAM Data Browsing web pages. For developers: use --opter-suffix=devel in the development environment (created by setup sam -q dev) to communicate with the development, rather than production, optimizer.
The processed files that the user wishes to store with SAM must reside on a node that is a part of the station. More precisely, the files to be stored must be on a file system that is managed or at least read-accessible by the station. Thus, there has to be at least one stager running at the node of original submission. Again, a properly configured station has stagers running at all of its component nodes. If, however, there is no stager known to the FSS running at the node, as may be the case for a farm station, a stager has to be started as follows:
$sam start stager [--quiet|--verbose] [--rtfile=<pid_file>]
On a farm, starting the stager typically occurs in the worker-node script. If there is already a stager running at the node, such as a stager used to deliver input files, that stager can be used for storing output files as well. Next, a running stager must be connected to the FSS via:
$sam add stager --pid=<pid> --fss=FSS
where the pid of the affected stager may be extracted from the return file
(<pid_file>) of the previous command. This command merely connects the
stager to the FSS rather than creating a new stager. If a stager has to
be restarted for any reason, it must be reconnected to the FSS afterwards.
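For a worker-node script, the two stager steps above can be sketched as follows. The helper names are hypothetical, and the layout of the return file is not documented here: the sketch assumes that the first integer in the file is the stager's pid, which should be checked against a real return file before relying on it.

```python
import re


def start_stager_command(pid_file):
    """Command line that starts a stager and writes its return file."""
    return ["sam", "start", "stager", f"--rtfile={pid_file}"]


def read_stager_pid(pid_file):
    """Return the first integer in the return file (assumed to be the pid)."""
    with open(pid_file, "r") as handle:
        match = re.search(r"\d+", handle.read())
    if match is None:
        raise ValueError(f"no pid found in {pid_file}")
    return int(match.group())


def add_stager_command(pid):
    """Command line that connects an already running stager to the FSS."""
    return ["sam", "add", "stager", f"--pid={pid}", "--fss=FSS"]
```

After a stager restart, the same read and connect steps would be repeated with the new return file.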
Note: in the next version of SAM, when stations are configured and
have station masters running permanently, the above steps of starting
the FSS and/or stagers will be absent.
Aside from configuration problems such as not finding the right
station, stager, optimizer, etc., which should concern the SAM administrator
rather than the user, the following error conditions may occur.
Problem: FSS server is not running
Message: CORBA Exception, server is probably dead (Minor: 0 Completed:
COMPLETED_NO)
Solution: Contact sam-design and have server(s) restarted.
Problem: Missing description file
Message: No module named foo, where foo appears
as the --descrip value.
Solution: Ensure that there is a file foo.py
in your current directory.
Problem: Syntax errors in the description file.
Message: description file is not a valid Python file.
Solution: The Python interpreter will hopefully describe the nature
of the error. Fix the description file format.
Problem: Invalid data tier, parent file name, etc.
Message: These are semantic errors in the description file found by
the SAM database.
Solution: Check the spelling of the data tier, file names, etc. If no problems
are found, check the SAM browser for correct options. Contact
sam-design if new options are required.
Problem: Invalid source location. The source directory containing
the data file is invalid or inaccessible to SAM.
Message: stream of python error messages
Solution: Check spelling and location.
Problem: File delivery problems or Invalid destination.
Message: stream of python error messages
Solution: These errors are reported by the mover agent, such as encp (for disk
to Enstore transfers) or rcp. Check for valid locations using the
browser. If a new location is required, contact sam-design.
Problem: Invalid location is reported by the SAM database at the last
step of storing the file when the FSS attempts to store the new location
of the file.
Message: Invalid location
Solution: All valid file locations must be known to SAM; if a location
is not known and it is not the user's mistake, contact sam-design.
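A wrapper script can map the messages listed above to the corresponding problem and suggested action. The sketch below uses simple substring matching, which is an assumption: the exact wording and capitalization of real messages may differ, and the cases reported only as a stream of Python error messages cannot be recognized this way. The names KNOWN_ERRORS and diagnose are hypothetical.

```python
# (message fragment, problem, suggested action), taken from the list above.
KNOWN_ERRORS = [
    ("CORBA Exception", "FSS server is not running",
     "Contact sam-design and have the server(s) restarted."),
    ("No module named", "Missing description file",
     "Ensure that the description file is in your current directory."),
    ("not a valid Python file", "Syntax errors in the description file",
     "Fix the description file format."),
    ("Invalid location", "Invalid location reported by the SAM database",
     "If it is not your mistake, contact sam-design."),
]


def diagnose(message):
    """Return (problem, action) for a known message, or None if unrecognized."""
    for fragment, problem, action in KNOWN_ERRORS:
        if fragment in message:
            return problem, action
    return None


print(diagnose("No module named foo"))
```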
=============================================================================
Project : SAM
Package : sam
$Id: SamStore.html,v 1.24 2005/04/15 19:21:45 lauri Exp $
This work is part of a development project, called SAM, which consists of a
number of coordinated packages each named sam_xxxx.
Notice of authorship, copyright status, and terms and conditions, should
the software eventually become available for use outside Fermilab, can be
found in the README and LICENCE files in the top level directory of the main
sam package.
==============================================================================