The new SAM Dataset Definition tools allow dataset definitions to be created using one of three methods: the SAM Command Line Interface, a SAM Python API, or the SAM Dataset Definition web GUI. Each dataset definition is defined by supplying a series of dimension names with their constraint operators and values, as well as the set operators to use when combining different dimensions/constraints. The command line and Python APIs both allow you to supply the dimensions and their constraints in either the original format, which only allowed a fixed set of dimensions, or the more flexible dimension entry format, which allows more options. The web GUI for editing dataset definitions accepts only the more flexible dimension/constraint pairs. The details regarding the command syntax and the allowed dimension entry formats are included below.
SAM Project Definition Grammar
When supplying dimensions and constraints for SAM Dataset Definitions, you must supply the following arguments in the format noted below. These options are used when translating constraints, creating a dataset definition, etc.

arguments:
    name = the dimension name for the query, see Valid Dimension Names
    conOper = the constraint operator, see Valid Constraint Operators
    value = the constraint value, see Constraint Value Rules & Short Hand
    setOper = the set operator, see Valid Set Operators

syntax:
    --dim="[(]name [conOper] value [setOper name [conOper] value][)] ..."
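To make the grammar concrete, the following is a minimal Python sketch that assembles a dimension string from individual clauses. It is not part of SAM: the `Clause` type and the `build_dim` helper are hypothetical names introduced only for illustration, and the sketch ignores the optional parentheses. The lowercase `and` follows the sample string used in the Python API section of this document.

```python
from typing import NamedTuple, Optional


class Clause(NamedTuple):
    """One 'name [conOper] value' clause (hypothetical helper type, not a SAM class)."""
    name: str                       # dimension name, e.g. "file_name"
    value: str                      # constraint value, full or shorthand
    con_oper: Optional[str] = None  # constraint operator; None means shorthand is used
    set_oper: Optional[str] = None  # set operator joining this clause to the previous one


def build_dim(clauses):
    """Join clauses as: name [conOper] value [setOper name [conOper] value] ..."""
    parts = []
    for index, clause in enumerate(clauses):
        if index > 0:
            if clause.set_oper is None:
                raise ValueError("every clause after the first needs a set operator")
            parts.append(clause.set_oper)
        parts.append(clause.name)
        if clause.con_oper:
            parts.append(clause.con_oper)
        parts.append(clause.value)
    return " ".join(parts)


dim = build_dim([
    Clause("file_name", "%ztautau%"),
    Clause("run_number", "1-1000", set_oper="and"),
])
print(f'--dim="{dim}"')
```

The resulting string is the value that would be passed to the --dim option.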
Three different example dimension clauses are included here:
Valid Dimension Names
The list of valid dimension names can be found in the Dimensions table. Alternatively, when using the SAM Command Line Interface (CLI), you can use the --dim=help option of the "sam translate constraints" command to show all the valid dimensions.
One special dimension name that you can use is __set__. A dimension name of __set__ indicates that the entry is not a dimension at all, but the inclusion of a previously created dataset definition in this new definition. When using a dimension name of __set__, the constraint value must be the name of an existing dataset definition. Including an existing set is most useful when you want to see the results of a new set minus any files already found in your previous dataset, or when you want to see the results of your old set but exclude files that meet some new criteria.
Valid Constraint Operators
The following list denotes all the valid SQL operators that can be used as your constraint operators.
=, !=, >, <, >=, <=, like, not like, in, not in, between, is null, is not null
Valid Set Operators
The valid set operators are AND, OR, MINUS. A set operator of AND is equivalent to an intersection of the two sets, while the operator OR is equivalent to a union, and the set operator MINUS subtracts the second set of files from the first.
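The set semantics described above map directly onto Python's built-in set operations. The sketch below uses made-up file names purely for illustration; it does not query SAM.

```python
# Hypothetical file lists returned by two different constraint clauses.
first = {"a.raw", "b.raw", "c.raw"}
second = {"b.raw", "c.raw", "d.raw"}

and_result = first & second    # AND: files present in both sets (intersection)
or_result = first | second     # OR: files present in either set (union)
minus_result = first - second  # MINUS: files in the first set but not the second

print(sorted(and_result))    # ['b.raw', 'c.raw']
print(sorted(or_result))     # ['a.raw', 'b.raw', 'c.raw', 'd.raw']
print(sorted(minus_result))  # ['a.raw']
```

The MINUS case corresponds to the __set__ use described earlier: a new query minus the files already found in a previous dataset definition.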
Constraint Value Rules & Short Hand
A series of constraint entry shorthands were created to make it easier for you to enter your query conditions. The following table shows the allowable formats for constraint values. You may use a full operator/value combination, e.g. file_name like '%ztautau%'. Or, you can use the simpler shorthand notation, e.g. file_name %ztautau%.
| Constraint | Shorthand | Description |
|---|---|---|
| = 1 | 1 | No quotes needed for numbers. |
| = 'a' | a | Quotes needed for full notation but not for shorthand. Shorthand could also be 'a'. Single quotes are optional for shorthand, but required when the shorthand text contains spaces. |
| like 'a%' | a% | Presence of a wildcard (%) implies like operator in shorthand. Same quote rules as above. |
| in (4,5) | 4,5 | Comma separated shorthand implies or predicate, or more simply, an in predicate if there are only simple values. |
| in ('a','b') | a,b | Comma separated shorthand implies or predicate. Same quote rules apply for character data. |
| between 1 and 8 | 1-8 | Dash in shorthand implies between operator. For between, notice that the low value must precede the high value. Also, while between is allowed for characters, it is only recommended for use with numeric constraints. As such, the dash shorthand only applies to numeric dimensions. |
| != 22 | !22 | Only a minor shorthand, but may be easier for users. |
| not in (12,24,48) | !(12,24,48) | Exclamation point preceding parenthesis indicates a not in clause. |
| > 34 | no short-hand allowed | |
| >= 67 | no short-hand allowed | |
| < 51 | no short-hand allowed | |
| <= 40 | no short-hand allowed | |
| not like 'qid%' | no short-hand allowed | |
| is null | no short-hand allowed | |
| is not null | no short-hand allowed | |
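The shorthand rules in the table can be read as a small rewriting procedure. The Python sketch below is one possible interpretation, not SAM's actual parser: `expand_shorthand` is a hypothetical function, and it assumes the caller already knows whether the dimension is numeric (which decides both quoting and whether a dash means between). Comma lists that contain wildcards, which the table says become an or predicate, are deliberately rejected rather than guessed at.

```python
import re


def expand_shorthand(value, numeric=False):
    """Expand a shorthand constraint value into 'operator value' form (illustrative only)."""

    def literal(item):
        # Drop optional single quotes, then re-quote character data.
        item = item.strip().strip("'")
        return item if numeric else f"'{item}'"

    def literal_list(text):
        items = text.split(",")
        if any("%" in item for item in items):
            raise ValueError("wildcards in a comma list are not handled by this sketch")
        return "(" + ",".join(literal(item) for item in items) + ")"

    value = value.strip()
    if value.startswith("!(") and value.endswith(")"):
        return "not in " + literal_list(value[2:-1])  # !(12,24,48)
    if value.startswith("!"):
        return "!= " + literal(value[1:])             # !22
    if numeric and re.fullmatch(r"\d+-\d+", value):
        low, high = value.split("-")
        return f"between {low} and {high}"            # 1-8, numeric dimensions only
    if "," in value:
        return "in " + literal_list(value)            # 4,5 or a,b
    if "%" in value:
        return "like " + literal(value)               # a%
    return "= " + literal(value)                      # 1 or a


for shorthand, is_numeric in [("1", True), ("a%", False), ("a,b", False),
                              ("1-8", True), ("!(12,24,48)", True)]:
    print(f"{shorthand:14} -> {expand_shorthand(shorthand, is_numeric)}")
```

Because the dash rule is applied only to numeric dimensions, a character value such as %test-query% is still treated as a like pattern.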
SAM Command Line Syntax
When using the SAM command line interface (CLI), you may enter your constraints in either the original format or the new dimension/constraint pair format. The original format included the dimension name as the --option, as noted below. This format only allows for a small subset of all available dimensions.
Usage: sam translate constraints --option=<value> --option=<value> ...
Where option is one of:
    runnum
    eventnum
    datatier
    filename
    physicaldatastream
    logicaldatastream
    physicaldataset
    applicationfamily
    applicationfamilyversion
The new command line options allow for entry of dimension/constraint pairs using the --dim option. A sample dimension query is shown in the example below.
sam translate constraints --dim="file_name %test-query%"
For help with the new dimension/constraint pairing and a list of all available dimensions, you can use the option --dim=help, e.g.
sam translate constraints --dim=help
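When the command is driven from a script, the quoting around the --dim value is easiest to get right by building the argument list directly rather than a shell string. The following Python sketch only constructs and prints the command; `translate_command` is a hypothetical helper, and actually running the command (for example with `subprocess.run(argv)`) assumes the `sam` executable is installed and on the PATH.

```python
import shlex


def translate_command(dim):
    """Return the argument list for 'sam translate constraints' with one --dim option."""
    # The whole dimension string is a single argument, spaces included.
    return ["sam", "translate", "constraints", f"--dim={dim}"]


argv = translate_command("file_name %test-query%")

# shlex.join (Python 3.8+) renders the list as a safely quoted shell command line.
print(shlex.join(argv))
```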
SAM Python API
The SAM Python API provides Python objects that can be imported and used to translate project definitions into the resulting sets of files, and to store those project definitions for later use. The syntax for this is still being determined. Samples of an early convention that was considered have been left in this document for reference, but you should expect them to change as the Python API is delivered.
from SAM import DatasetDefiner
datasetDefiner = DatasetDefiner.DatasetDefiner()
translation = datasetDefiner.translate('file_name %ztautau% and run_number 1-1000')
# translation is a DatasetTranslation object, which contains the following elements:
#   datasetSummary = dictionary of dataset summary information, including
#     fileCount = count of files found for this set of constraints
#     avgFileSize = avg size (KBytes) of the files found
#   volumeList = list of volume summary information, including
#     volumeType = the type of volume
#     fileCount = files found on this volume
#     avgFileSize = avg size (KBytes) of the files found on this volume
#   fileList = detailed list of file names found
#   saveDefinition = method to create a definition from the translation,
#     returns dataset definition object
datasetDefinition = translation.saveDefinition(name='my-favorite-project-1')
datasetSummary = datasetDefinition.translate()
# datasetSummary is identical in structure to the above translation.datasetSummary