Rules for Creating Monte Carlo Meta Data Import Files
This document has 3 sections.
There are 2 steps in processing a monte carlo import. First, the python meta data input script must be verified; then the database update is allowed to run. The meta data must be in python format, in a file named with a .py extension.
The monte carlo verification and import_process code is written in python. Pay particular attention to indentation in the template (section 3). Invalid indentation in python results in a python syntax error. The verify program will catch python syntax errors but does not translate the error into something more understandable. All definitions under a class must be indented at least 2 spaces from the class. All classes should start at the left margin. Comments are denoted with a #.
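Because the verify program reports python syntax errors without explaining them, it can help to check the file yourself before submitting it. The following is a minimal sketch and is not part of the verification code; the function name check_syntax and the file name are hypothetical, and it relies only on python's built-in compile().

```python
def check_syntax(source, filename="import_meta_data.py"):
    """Return None if the source compiles, else a readable error message."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        # IndentationError is a subclass of SyntaxError, so it is caught here too.
        return "%s, line %s: %s" % (filename, err.lineno, err.msg)
    return None


good = "class runs:\n  runs_number = 1\n"
bad = "class runs:\nruns_number = 1\n"  # body is not indented under the class

print(check_syntax(good))
print(check_syntax(bad))
```

The second call reports the line number of the badly indented definition, which is usually enough to find the problem.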
A template is provided at the end of this document to be used to create a python meta data input script.
The template uses the concept of classes and inheritance to process the meta data. Mandatory classes include runs and file. All other classes are optional and inherited by the file class.
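As an illustration of the inheritance idea only (this is not the template from section 3), a file class picks up every parameter defined in the classes it inherits from. The class names general and file1 and all values shown are hypothetical placeholders; only the parameter names come from this document.

```python
# Hypothetical values throughout; only the parameter names come from this document.
class runs:
  runs_number = 1001
  runs_type = 'monte carlo'


class general:
  data_tier = 'example_tier'    # placeholder, not a verified data_tiers value
  split_parent = 'n'


class file1(general):
  filename = 'example_file_0001'
  seqn = 1


# file1 defines filename and seqn itself and inherits the rest from general.
print(file1.filename, file1.data_tier, file1.split_parent)
```

Parameters shared by many files can be written once in a general class instead of being repeated in every file class.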
What follows is a review of the mandatory meta data fields, and where correct values can be found for those fields. Those fields designated mandatory MUST exist with a valid value, or the verification code will not let them through. All date fields must be in YYYYMMDD.HHMI format.
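The YYYYMMDD.HHMI format can be checked before the file is submitted. The sketch below is an assumption about how such a check might look, not the verification code itself; the name parse_meta_date is hypothetical, and HH is assumed to be a 24-hour value, which this document does not state.

```python
import re
from datetime import datetime

# Exactly eight digits, a dot, then four digits.
DATE_PATTERN = re.compile(r"\d{8}\.\d{4}")


def parse_meta_date(value):
    """Parse a YYYYMMDD.HHMI string; raise ValueError if it is malformed."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError("expected YYYYMMDD.HHMI, got %r" % (value,))
    # strptime rejects impossible dates and times such as month 13 or hour 25.
    return datetime.strptime(value, "%Y%m%d.%H%M")


print(parse_meta_date("20010315.1430"))
```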
number_of_tapes
The number of tapes to be processed. User supplied, must be numeric, not a mandatory field.
number_of_files
The number of files to be processed. Must correspond to the number of file classes that follow in the import data file. User supplied, mandatory field.
tape_media_type
To be defined later.
tape_format
The format the tape is in. Mandatory field. This field is normally set to one of tar, ascii-standard, ascii-nonstandard, or cpio.
class runs parameters
A runs class must exist in the import meta data. The runs class parameters define the initial run to be added or updated. Normally the run defined in the runs class will be inherited by the first file. Runs is the first class in the meta data file. The runs class can define an existing run/run_type, or a new run that will be added to the database. If the run/run_type already exists, the times, description and center_of_mass_energy fields will be ignored, and can be 'null'.
runs_number
The run_number for this run. This can be an existing run, or a new run_number. To find valid existing runs, query the runs.run_number in the database.
runs_type
The runs_type defines the run. It is a mandatory field and cannot be null. Values valid for use in this field can be found in the run_types.run_type table. Most likely this field should be set to 'monte carlo'.
runs_begin_time
Denotes the begin time of the run being defined in the runs class. It is a mandatory field. It must be a valid date and time, and must be less than the runs_end_time. If the run_number referenced in the runs_number parameter is an existing run in the database, this field can be 'null'. If this parameter is not 'null' and the run_number exists in the database, the parameter is ignored and will not be used to update the database.
runs_end_time
Denotes the end time of the run being defined in the runs class. It is a mandatory field. It must be a valid date and time, and must be greater than the runs_begin_time. If the run_number referenced in the runs_number parameter is an existing run in the database, this field can be 'null'. If this parameter is not 'null' and the run_number exists in the database, the parameter is ignored and will not be used to update the database.
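The rules for runs_begin_time and runs_end_time can be sketched as a single check. This is an illustrative reading of the rules above, not the verification code; the function name check_run_times and the run_exists flag (standing in for a database lookup) are hypothetical, and a 24-hour HH is assumed.

```python
from datetime import datetime


def check_run_times(begin, end, run_exists):
    """Return a list of error strings for runs_begin_time / runs_end_time."""
    if run_exists:
        # For an existing run both values are ignored, so 'null' is acceptable.
        return []
    errors = []
    parsed = {}
    for name, value in (("runs_begin_time", begin), ("runs_end_time", end)):
        try:
            parsed[name] = datetime.strptime(value, "%Y%m%d.%H%M")
        except ValueError:
            errors.append("%s is not a valid date and time: %r" % (name, value))
    if not errors and not parsed["runs_begin_time"] < parsed["runs_end_time"]:
        errors.append("runs_begin_time must be less than runs_end_time")
    return errors


print(check_run_times("20010101.0000", "20010102.0000", run_exists=False))
print(check_run_times("null", "null", run_exists=True))
```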
runs_description
A short (50 bytes or less), free form description of the run being defined in the runs class. It is a mandatory field, but it can be null (''). If this is a new run that will be added to the database, the run will be added with this description, which becomes the permanent description of the run. If it is an existing run, the value in runs_description is ignored and will not be used to update the database.
center_of_mass_energy
A number field, indicating the center of mass energy for the run. The format is number (5,3).
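One possible reading of number (5,3), assuming it follows the Oracle-style NUMBER(precision, scale) convention (an assumption; this document does not say so), is at most 5 significant digits with 3 of them after the decimal point, so the magnitude must stay below 100. A minimal sketch of that check, with a hypothetical function name:

```python
from decimal import Decimal, InvalidOperation


def fits_number_5_3(value):
    """True if value fits precision 5, scale 3 (assumed NUMBER(5,3) semantics)."""
    try:
        number = Decimal(str(value))
    except InvalidOperation:
        return False
    if not number.is_finite() or abs(number) >= Decimal("100"):
        return False
    # Scale 3: round to three decimal places, then make sure rounding
    # did not push the value up to 100.000.
    rounded = number.quantize(Decimal("0.001"))
    return abs(rounded) < Decimal("100")


print(fits_number_5_3(1.96))
print(fits_number_5_3(1960))
```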
general parameters:
These are parameters to be defined in a class of your choosing. The class would then be inherited by fileN.
process_family, process_name, & process_version
All three of these parameters are tied together. They are all columns in the application_families table. Process_family defines the family, process_name defines the name and process_version defines the version. All three are mandatory. Values valid for use in these fields can be found in the application_families table. The fields are valid when exactly 1 record exists in the application_families table that has all 3 columns corresponding to the 3 parameters. If there is not a 1 to 1 correspondence, the record will be flagged in error.
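The exactly-one-record rule can be sketched with an in-memory list standing in for the application_families table. The rows and function names below are hypothetical; the real check runs against the database.

```python
def count_family_matches(records, family, name, version):
    """Count records whose three columns equal the three parameters."""
    return sum(1 for rec in records if rec == (family, name, version))


def process_is_valid(records, family, name, version):
    # Valid only when exactly one record matches all three columns.
    return count_family_matches(records, family, name, version) == 1


# Hypothetical stand-in for rows of the application_families table.
application_families = [
    ("generator", "example_gen", "v1"),
    ("generator", "example_gen", "v2"),
    ("simulation", "example_sim", "v1"),
]

print(process_is_valid(application_families, "generator", "example_gen", "v2"))
print(process_is_valid(application_families, "generator", "example_sim", "v1"))
```

The second call fails because each value exists somewhere in the table, but no single record carries all three together.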
parameter_input_file
The parameter input file is the parameter file used in the application, e.g. the Geant card file. The file should be included on the tape. Parameter_input_file is stored in the import_processes table. It is a free form string field up to 50 bytes long. It is not mandatory and can be 'null'.
physics_channel
Defines the physics_channel for the import process. It is a free form string up to 50 bytes long. It is mandatory, but can be 'null'. A standard is to be developed.
physical_stream
Defines the physical_datastream for the files. It is a mandatory field and cannot be 'null'. Values valid for use in the field can be found in the physical_datastreams.physical_datastream_name table.
data_tier
Defines the data_tier for the files. It is a mandatory field and cannot be 'null'. Values valid for use in the field can be found in the data_tiers.data_tier table.
center_of_mass_energy
Center of mass energy for the run. You may use the field, but no checking or updating to the run is done using center_of_mass_energy at the general level.
physics_group
Defines the physics working group. It is a mandatory field and cannot be 'null'. Values valid for use in the field can be found in the working_groups.work_grp_name table.
origin_location
Free form string, up to 50 bytes, defining the original location of the import process. It is a mandatory field, and must not be null.
origin_facility
Free form string, up to 50 bytes, defining the original facility of the import process. It is a mandatory field, and must not be null.
produced_for
Identifies the USERNAME the process was produced for. This username must exist in the persons table in the database. A person can be put into the database via a registration form. This person must also be assigned to the physics_group referenced above, and exist in the persons table in the database. It is a mandatory field and cannot be null. Values valid for use in the field can be found in the persons.username table.
produced_by
Identifies the USERNAME the process was produced by. This username must exist in the persons table in the database. A person can be put into the database via a registration form. This person must also be assigned to the physics_group referenced above, and exist in the persons table in the database. It is a mandatory field and cannot be null. Values valid for use in the field can be found in the persons.username table.
project_description
Description of the import process. Must be 50 bytes or less. This is a mandatory field, but it can be 'null'.
split_parent
Split parent defines whether the parents defined for a file were split to create the new file. If a parent is split, not all events of that parent go to each child. It is a mandatory field, and must be y/n/t/f. A y/t value indicates the parent was split; an n/f value indicates the parent was not split to create the child file.
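The y/n/t/f rule maps onto a simple true/false value. The sketch below uses a hypothetical function name and assumes the values are lower case, since this document does not say whether upper case is accepted.

```python
def parent_was_split(value):
    """Map a split_parent value to True (split) or False (not split)."""
    if value in ("y", "t"):
        return True
    if value in ("n", "f"):
        return False
    raise ValueError("split_parent must be y, n, t or f, got %r" % (value,))


print(parent_was_split("y"))
print(parent_was_split("f"))
```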
run_number
This second run_number is the means to attach files in the import to a run other than the run defined in the runs class. This run_number overrides the run_number in the runs class. It is not a mandatory field. However, if the run_number is used at a general class level, the run requested must already exist in the database, or an error will be flagged. By default this run will use the same run_type as defined in the runs class. If used, this value must be an integer.
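The override can be pictured as an attribute lookup with a fallback to the runs class. This is only a sketch of the idea; the class names general, other_run, file1 and file2, the run numbers, and the helper effective_run_number are all hypothetical.

```python
# Hypothetical classes and run numbers, used only to show the override.
class runs:
  runs_number = 1001


class general:
  split_parent = 'n'


class other_run:
  run_number = 2002   # per the rule above, this run must already exist


class file1(general):
  filename = 'example_file_0001'


class file2(other_run, general):
  filename = 'example_file_0002'


def effective_run_number(file_class):
  """Use the inherited run_number if one is set, else the runs class value."""
  return getattr(file_class, 'run_number', runs.runs_number)


print(effective_run_number(file1))
print(effective_run_number(file2))
```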
tape
The name of the tape volume. It is a mandatory field and must not be 'null'.
volume_type
Defines the volume type. It is a mandatory field and cannot be null. Values valid for use in the field can be found in the volume_types.volume_type table.
volume_location
Defines the volume location. It is a mandatory field and cannot be null. Values valid for use in the field can be found in the volume_locations.volume_location table. This parameter is case sensitive.
File Class(es)
File classes define in detail the file to be created. These new files will be stored in the data_files table in the database. It is critical that the number of files defined equal the number_of_files parameter. You risk not processing all the files if this match is not correct. The file class must also inherit from previously defined classes. They can inherit from more general classes which define parameters used by many files.
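A quick way to guard against a mismatch is to count the file classes and compare the count with number_of_files. The sketch below assumes the file classes are named file1, file2, and so on (following the fileN wording used earlier); that naming rule, the function name, and the sample meta data are assumptions.

```python
import inspect


def count_file_classes(namespace):
    """Count classes in a namespace whose names start with 'file'."""
    return sum(
        1
        for name, obj in namespace.items()
        if inspect.isclass(obj) and name.startswith("file")
    )


# Hypothetical meta data, executed into its own namespace.
source = """
number_of_files = 2
class runs: pass
class general: pass
class file1(general): pass
class file2(general): pass
"""
meta = {}
exec(source, meta)

print(count_file_classes(meta) == meta["number_of_files"])
```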
parent_files
A list of 1 or more file names identifying the parent(s) of the file. It is imperative that the parents listed either 1) already exist in the database or 2) were defined in the import meta data file in one of the file classes above the file that is using the parents. When a file is processed, verification must be able to find all of its parents, either by previous definition or by existence in the database. This is a mandatory field. If the file being defined has no parent, parent_files should be set to 'null', and split_parent to 'n'.
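The ordering rule for parents can be sketched as a single pass over the files in the order they are defined. The function name, the file names, and the set standing in for the database are hypothetical.

```python
def find_missing_parents(files, existing_in_db):
    """files: list of (filename, parent_files) in the order they are defined.

    Returns a dict mapping each filename to the parents that could not be found.
    """
    known = set(existing_in_db)
    missing = {}
    for filename, parents in files:
        if parents != 'null':
            unresolved = [p for p in parents if p not in known]
            if unresolved:
                missing[filename] = unresolved
        # A file becomes a valid parent only for files defined after it.
        known.add(filename)
    return missing


files = [
    ("gen_0001", 'null'),
    ("sim_0001", ["gen_0001"]),          # parent defined above: found
    ("reco_0001", ["sim_0002"]),         # parent defined below: not found
    ("sim_0002", ["old_db_file"]),       # parent already in the database: found
]
print(find_missing_parents(files, existing_in_db={"old_db_file"}))
```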
filename
The name of the file to be created by this file definition. This filename must be unique to the database. Any filenames that already exist in the database will result in an error. This is a mandatory field and must not be null.
seqn
Seqn is the sequence number of the file on the volume defined for that file. This parameter must be an integer >=1. It is a mandatory field and cannot be null.
size
Size defines the size of the file in kilobytes. This value must be an integer >=1. It is a mandatory field.
start_event
Defines the first event_number on the file being defined. This value must be an integer >=1. It is a mandatory field.
end_event
Defines the last event_number on the file being defined. This value must be an integer >=1. It is a mandatory field.
num_event
Defines the number of events on the file being defined. This value must be an integer >=1. It is a mandatory field.
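The integer >=1 rule shared by seqn, size, start_event, end_event and num_event can be sketched as one loop. The function name and sample values are hypothetical, and no relationship between the three event parameters is checked because this document does not define one.

```python
def check_positive_integers(params):
    """Return error strings for file parameters that must be integers >= 1."""
    names = ("seqn", "size", "start_event", "end_event", "num_event")
    errors = []
    for name in names:
        value = params.get(name)
        # bool is a subclass of int in python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            errors.append("%s must be an integer >= 1, got %r" % (name, value))
    return errors


sample = {"seqn": 1, "size": 2048, "start_event": 1, "end_event": 500, "num_event": 0}
print(check_positive_integers(sample))
```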
data_files_start_time
Denotes the start time of the file. It is a mandatory field. Must be a valid date and time, and must be less than the data_files_end_time.
data_files_end_time
Denotes the end time of the file. It is a mandatory field. Must be a valid date and time, and must be greater than the data_files_start_time.
An example of a python meta data import data file. If you are unfamiliar with python, please note the 2-space indentation pattern.
A python meta data import data file template